The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
Help getting started with looking for patters.
Posted by: Greg
Date: September 20, 2017 05:38PM

I have a list of data that I want to look for patterns in. Each line consists of 20 numbers separated by spaces. It does not matter what order they occur in, just that each line is treated as a new selection. Posted below is a small sample of how the data is presented. I am trying to determine which groups of numbers are frequently found together, and I would like to be able to do this for groups of 4, 5, 6, 7, 8, 9, and 10 numbers. I do not mind learning how to do this myself, but at the moment I am pretty overwhelmed about how to even get started. Any help is greatly appreciated. I have learned that having a space between the numbers is preferable to having a comma, so that is how I currently have it formatted. If I need to change it to commas, semicolons, etc., it is easy for me to do.

40 66 11 4 26 16 23 17 54 73 24 15 30 19 58 13 12 65 47 76
14 29 45 66 41 6 22 34 17 55 4 11 16 24 52 5 73 58 62 15
38 3 50 37 35 9 57 68 46 40 10 44 39 76 48 49 42 11 12 61
50 49 60 46 28 26 66 34 52 72 53 22 27 47 39 30 16 2 40 21
79 9 15 53 28 24 3 13 44 47 46 67 22 26 31 18 57 40 38 59
50 38 75 60 73 9 56 40 57 37 10 44 39 18 11 61 76 12 55 68
53 20 57 63 36 49 67 8 23 50 1 72 80 70 16 46 39 32 38 24
41 51 65 3 18 25 23 21 13 68 73 67 74 66 5 57 32 47 64 71
28 8 79 76 32 63 31 4 55 11 54 3 49 36 34 37 30 6 70 13
8 52 43 3 10 1 60 73 80 56 66 61 53 12 15 70 63 41 36 39
6 65 51 16 40 35 33 63 4 71 10 3 52 39 2 59 67 26 38 79
5 33 61 77 53 39 47 64 27 20 65 38 19 62 55 58 35 9 29 8
66 15 37 68 12 65 57 28 75 55 25 80 23 9 29 78 42 50 53 67
20 61 72 34 11 30 38 79 27 73 58 25 67 6 24 76 41 42 57 48
77 64 1 25 57 70 7 73 44 67 49 30 41 56 26 45 3 31 32 69
6 79 74 66 13 22 19 28 18 36 52 72 65 54 1 70 16 23 15 46
78 35 33 62 13 36 60 63 51 1 3 50 41 56 74 16 30 79 70 73
59 1 17 21 9 63 54 51 5 11 62 31 73 20 12 29 4 32 14 6
30 80 36 47 63 75 5 4 10 21 77 20 76 27 33 56 7 57 79 51
53 2 34 38 55 44 60 1 74 69 73 78 56 18 36 16 71 79 15 63
16 59 37 69 20 30 42 44 32 27 11 14 3 8 76 5 60 56 35 74
26 8 36 35 25 30 34 62 64 80 44 17 72 60 18 43 59 79 48 23
52 2 58 28 59 3 54 79 51 24 41 21 26 68 11 73 65 61 57 35
3 40 73 67 50 1 37 56 65 51 55 29 4 7 70 57 71 28 38 30
4 43 79 18 62 16 48 70 41 46 23 21 59 28 2 67 54 27 68 36

Re: Help getting started with looking for patters.
Posted by: Greg
Date: September 20, 2017 05:40PM

I see I missed the "N" in Patterns but I do not know how to edit the subject to fix it.

Re: Help getting started with looking for patters.
Date: September 21, 2017 07:17AM

Hi Greg,

I think that what you want to do is "frequent pattern mining" (also called frequent itemset mining). It is a classical problem in data mining, where you have many lines that each contain items (numbers in your case). The goal is to find the sets of items that appear together in many lines.

To do that, you can try the FPGrowth algorithm offered in SPMF.

It will let you specify a minimum support threshold as a parameter. This indicates the minimum frequency for the patterns that you want to find. For example, if you set that parameter to 0.4, the algorithm will find all the sets of items appearing in at least 40% of the lines in your file.
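To make the idea of "support" concrete, here is a minimal Python sketch (not SPMF code, just an illustration) that computes the fraction of lines containing a given set of numbers. The function name `support` is my own choice, and the three lines are simply the first three lines of the sample posted above.

```python
def support(itemset, transactions):
    """Return the fraction of transactions (lines) containing every item in itemset."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


# The first three lines of the sample data from the question.
lines = [
    "40 66 11 4 26 16 23 17 54 73 24 15 30 19 58 13 12 65 47 76",
    "14 29 45 66 41 6 22 34 17 55 4 11 16 24 52 5 73 58 62 15",
    "38 3 50 37 35 9 57 68 46 40 10 44 39 76 48 49 42 11 12 61",
]

# Order within a line does not matter, so each line becomes a set of integers.
transactions = [set(map(int, line.split())) for line in lines]

# 11 and 66 appear together in the first two lines but not in the third.
pair_support = support({11, 66}, transactions)
print(pair_support)
```

With a minimum support of 0.4, the pair {11, 66} would be reported for these three lines, because it appears in two of the three.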

Besides, the FPGrowth implementation in SPMF allows you to specify a maximum length. Thus, if you set that to 5, for example, you will only find sets containing at most 5 items. You could then sort the results by size if you need to.
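If it helps to see how a minimum support and a maximum length work together, below is a rough Python sketch. Please note that it is not FPGrowth and not the SPMF implementation: it is a simple level-wise (Apriori-style) search, which is easier to read but much slower on real data. The function name `frequent_itemsets` and the toy data are made up for the illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_length):
    """Level-wise (Apriori-style) search.

    Returns a dict mapping each frequent itemset (a frozenset) to the
    number of transactions that contain it.
    """
    min_count = min_support * len(transactions)

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(current)

    size = 1
    while current and size < max_length:
        size += 1
        # Build candidates of the next size by joining frequent itemsets,
        # keeping only those whose smaller subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in current
                for sub in combinations(union, size - 1)
            ):
                candidates.add(union)
        # Count each candidate and keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_count}
        result.update(current)
    return result


# Made-up toy data: four lines, each a set of numbers.
toy = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 6, 7}, {3, 8, 9, 10}]

# Sets appearing in at least 50% of the lines, with at most 2 items.
found = frequent_itemsets(toy, min_support=0.5, max_length=2)
for itemset in sorted(found, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), found[itemset])
```

For groups of 4 to 10 numbers, you would raise the maximum length to 10 and then keep only the results with at least 4 items, for example with `{s: c for s, c in found.items() if len(s) >= 4}`.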

So I think that you should check this out. I think this is exactly what you need.

Best regards,

Philippe

This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).