The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
Support in Sequential Pattern Mining algorithms
Posted by: Mariana T
Date: September 07, 2017 06:43PM

Hi Philippe,

First, thanks for developing SPMF, it's a great tool!

I am using it to find sequential patterns in data from a sensor on a telescope.

I have a .txt file with 58613 sequences, but I picked the first five sequences to test your tool. Here they are:

13 -1 16 -1 17 -1 19 -1 18 -1 17 -1 16 -1 14 -1 11 -1 20 -1 10 -1 17 -1 16 -1 25 -1 19 -1 15 -1 13 -1 18 -1 15 -1 18 -1 18 -1 16 -1 14 -1 17 -1 15 -1 19 -1 14 -1 16 -1 15 -1 15 -1 13 -1 16 -1 22 -1 12 -1 6 -1 18 -1 16 -1 12 -1 17 -1 18 -1 15 -1 16 -1 11 -1 18 -1 16 -1 22 -1 16 -1 21 -1 18 -1 13 -1 20 -1 17 -1 19 -1 -2
16 -1 16 -1 5 -1 11 -1 15 -1 16 -1 14 -1 9 -1 20 -1 16 -1 17 -1 24 -1 11 -1 14 -1 15 -1 14 -1 20 -1 11 -1 12 -1 14 -1 17 -1 20 -1 19 -1 16 -1 15 -1 16 -1 16 -1 9 -1 19 -1 19 -1 22 -1 11 -1 9 -1 19 -1 15 -1 14 -1 15 -1 19 -1 12 -1 17 -1 13 -1 12 -1 11 -1 13 -1 13 -1 16 -1 12 -1 11 -1 15 -1 16 -1 10 -1 22 -1 14 -1 17 -1 12 -1 -2
12 -1 11 -1 16 -1 15 -1 10 -1 17 -1 13 -1 14 -1 12 -1 13 -1 26 -1 13 -1 13 -1 15 -1 16 -1 23 -1 16 -1 18 -1 16 -1 22 -1 9 -1 14 -1 15 -1 12 -1 15 -1 14 -1 26 -1 15 -1 13 -1 9 -1 15 -1 11 -1 16 -1 13 -1 18 -1 13 -1 14 -1 9 -1 9 -1 12 -1 13 -1 14 -1 13 -1 10 -1 10 -1 9 -1 19 -1 16 -1 12 -1 14 -1 11 -1 5 -1 17 -1 15 -1 -2
13 -1 14 -1 12 -1 17 -1 10 -1 12 -1 12 -1 14 -1 12 -1 22 -1 18 -1 26 -1 16 -1 9 -1 15 -1 14 -1 13 -1 9 -1 14 -1 19 -1 15 -1 20 -1 9 -1 17 -1 18 -1 11 -1 16 -1 20 -1 17 -1 14 -1 13 -1 19 -1 16 -1 10 -1 13 -1 19 -1 20 -1 17 -1 14 -1 22 -1 16 -1 18 -1 23 -1 16 -1 15 -1 16 -1 19 -1 11 -1 15 -1 19 -1 17 -1 14 -1 12 -1 20 -1 -2
20 -1 17 -1 19 -1 16 -1 16 -1 10 -1 15 -1 13 -1 18 -1 7 -1 13 -1 18 -1 13 -1 10 -1 12 -1 8 -1 14 -1 10 -1 12 -1 20 -1 8 -1 11 -1 18 -1 15 -1 20 -1 8 -1 24 -1 16 -1 17 -1 8 -1 16 -1 18 -1 14 -1 16 -1 17 -1 10 -1 12 -1 19 -1 17 -1 12 -1 14 -1 20 -1 11 -1 10 -1 15 -1 10 -1 14 -1 15 -1 9 -1 12 -1 12 -1 18 -1 21 -1 11 -1 -2

I chose PrefixSpan with minsup 0.99, and these are the first five values in the output file (it originally has 4714 values):

10 -1 #SUP: 5
10 -1 17 -1 #SUP: 5
10 -1 17 -1 12 -1 #SUP: 5
10 -1 12 -1 #SUP: 5
10 -1 14 -1 #SUP: 5

However, I found that 10 -1 appears 13 times in my input file, 10 -1 17 -1 appears 2 times, 10 -1 17 -1 12 -1 does not appear, 10 -1 12 -1 appears 4 times, and finally 10 -1 14 -1 appears one time. I got the same values using SPAM and SPADE too (with the same minsup), and I think it's strange that I get these supports, which do not match the values inside the file. Can you explain to me why this is happening?

Thanks in advance!

Re: Support in Sequential Pattern Mining algorithms
Posted by: Philippe
Date: September 07, 2017 07:05PM

Hello,

Glad the software is useful.

These values are normal. The reason is that gaps are allowed in sequential pattern mining. In other words, if you have the pattern:

10 -1 17 -1 #SUP: 5

it means that 10 is followed by 17 in five sequences (the support counts sequences, not occurrences). However, it does not mean that 17 must appear right after 10. It just means that 17 must appear somewhere after 10 in that sequence. So, for example, this pattern is said to appear in the following sequence:

16 -1 16 -1 5 -1 11 -1 15 -1 16 -1 14 -1 9 -1 20 -1 16 -1 17 -1 24 -1 11 -1 14 -1 15 -1 14 -1 20 -1 11 -1 12 -1 14 -1 17 -1 20 -1 19 -1 16 -1 15 -1 16 -1 16 -1 9 -1 19 -1 19 -1 22 -1 11 -1 9 -1 19 -1 15 -1 14 -1 15 -1 19 -1 12 -1 17 -1 13 -1 12 -1 11 -1 13 -1 13 -1 16 -1 12 -1 11 -1 15 -1 16 -1 10 -1 22 -1 14 -1 17 -1 12 -1 -2
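
To make the difference concrete, here is a minimal Python sketch (not SPMF code) that contrasts the two ways of counting. It assumes every itemset holds a single item, as in the sequences above; the function names and the three toy sequences are made up for illustration.

```python
def parse_spmf_line(line):
    """Parse one SPMF-style sequence line into a list of items.

    Assumes single-item itemsets: -1 ends an itemset, -2 ends the sequence.
    """
    return [int(tok) for tok in line.split() if tok not in ("-1", "-2")]


def contains_with_gaps(sequence, pattern):
    """True if the pattern items occur in order in the sequence, gaps allowed."""
    it = iter(sequence)
    # 'item in it' advances the iterator, so each item is searched
    # only after the position where the previous item was found.
    return all(item in it for item in pattern)


def support(sequences, pattern):
    """Number of sequences that contain the pattern (at most 1 per sequence)."""
    return sum(contains_with_gaps(s, pattern) for s in sequences)


def contiguous_occurrences(sequences, pattern):
    """Number of places where the pattern appears with no gap at all."""
    n = len(pattern)
    return sum(
        1
        for s in sequences
        for i in range(len(s) - n + 1)
        if s[i:i + n] == pattern
    )


# Toy data, hypothetical and much shorter than the real sequences.
lines = [
    "10 -1 3 -1 17 -1 -2",
    "5 -1 10 -1 17 -1 -2",
    "10 -1 4 -1 4 -1 17 -1 10 -1 -2",
]
sequences = [parse_spmf_line(l) for l in lines]

print(support(sequences, [10, 17]))                 # 3
print(contiguous_occurrences(sequences, [10, 17]))  # 1
print(contiguous_occurrences(sequences, [10]))      # 4
```

The pattern 10 -1 17 -1 has support 3 here even though 17 directly follows 10 only once, and item 10 appears 4 times although its support can be at most 3 (one per sequence).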


Now, having said that, if you only want patterns without gaps or if you want to specify a maximum gap, some algorithms in SPMF provide a maximum gap constraint. For example, you can use CM-SPAM. It lets you specify the maximum gap, as well as the minimum and maximum pattern length, which can be very useful to reduce the number of patterns found.
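
As a rough illustration of what a maximum gap constraint changes, the Python sketch below checks whether a pattern occurs in one sequence when consecutive matched items may be at most max_gap positions apart. This is not how CM-SPAM is implemented; it only shows the matching rule. The convention that max_gap = 1 means "no gap" is an assumption here and should be checked against the SPMF documentation, and the function name is made up.

```python
def contains_with_max_gap(sequence, pattern, max_gap):
    """True if the pattern occurs in order in the sequence, with each matched
    item at most max_gap positions after the previous matched item.

    Assumed convention: max_gap=1 means the items must be adjacent.
    """
    if not pattern:
        return True

    def match_from(pos, k):
        # pattern[k - 1] was matched at index pos; try to match pattern[k].
        if k == len(pattern):
            return True
        last = min(pos + max_gap, len(sequence) - 1)
        for nxt in range(pos + 1, last + 1):
            # Try every candidate position, since the first match may not
            # be the one that lets the rest of the pattern fit.
            if sequence[nxt] == pattern[k] and match_from(nxt, k + 1):
                return True
        return False

    return any(
        sequence[i] == pattern[0] and match_from(i, 1)
        for i in range(len(sequence))
    )


# Toy sequence, hypothetical.
seq = [10, 4, 4, 17, 10]
print(contains_with_max_gap(seq, [10, 17], 1))  # False
print(contains_with_max_gap(seq, [10, 17], 3))  # True
```

With a small max_gap, a sequence only counts toward the support when the items are close together, which is closer to the counts obtained by searching the input file for adjacent values.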

Best regards,

Re: Support in Sequential Pattern Mining algorithms
Posted by: Mariana T
Date: September 08, 2017 12:16PM

The algorithms that provide a maximum gap constraint are exactly what I'm looking for.

Thanks a lot for your explanation and patience, Philippe.

Best regards,

This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).