Quest for a specific SPM algorithm
Date: August 02, 2017 12:39AM
Your forum, blog and SPMF implementation have made quite an impression on me over the past few weeks. I can see your work is in high demand amongst data scientists, making it all the more impressive!
I'm a Computer Science master's student working on my graduation project, and I'm currently in need of some advice: a nudge in the right direction.
What I have is a sequence database in which every itemset contains a single item:
What I want to get from this is the pattern <a,b,c> and, preferably, the information that it occurs 5 times in the first sequence and 2 times in the second sequence. In other words, I am looking for the subpattern that repeats within each sequence, not merely the patterns that satisfy the minimum support across sequences. Algorithms such as CM-ClaSP and VMSP return only the <a,b,c,a,b,c> pattern. So I'm not looking for maximal patterns, and not quite for closed patterns either.
A few other requirements:
- I must be able to specify the max gap, which will be set to 1.
- I must be able to specify the minimum pattern length, which will probably be set to 2 or 3.
- I must be able to specify the minimum number of pattern repetitions, which will probably be set somewhere between 5 and 10.
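
To make these requirements concrete, here is a minimal sketch of the per-sequence counting described above. It is not SPMF code: the class name RepetitionCounter and its methods are hypothetical, and the example sequence in main is made up. The sketch assumes that a max gap of 1 means the matched items must sit at consecutive positions, and that repetitions are counted without overlap using a greedy left-to-right scan, which is not guaranteed to find the largest possible count for larger gaps.

```java
import java.util.Arrays;
import java.util.List;

public class RepetitionCounter {

    /**
     * Tries to match pattern[pIdx..] so that each item lies at most maxGap
     * positions after the previously matched one (assumption: maxGap = 1
     * means strictly consecutive). Returns the index of the last matched
     * item, or -1 if no match exists. Backtracks over candidate positions.
     */
    static int matchFrom(List<String> seq, List<String> pattern,
                         int pIdx, int prevPos, int maxGap) {
        if (pIdx == pattern.size()) {
            return prevPos; // whole pattern matched
        }
        // long arithmetic avoids overflow for very large maxGap values
        int limit = (int) Math.min((long) seq.size() - 1, (long) prevPos + maxGap);
        for (int pos = prevPos + 1; pos <= limit; pos++) {
            if (seq.get(pos).equals(pattern.get(pIdx))) {
                int end = matchFrom(seq, pattern, pIdx + 1, pos, maxGap);
                if (end >= 0) {
                    return end;
                }
            }
        }
        return -1;
    }

    /** Counts non-overlapping occurrences of pattern in one sequence (greedy scan). */
    static int countRepetitions(List<String> seq, List<String> pattern, int maxGap) {
        if (pattern.isEmpty()) {
            return 0;
        }
        int count = 0;
        int start = 0;
        while (start < seq.size()) {
            if (seq.get(start).equals(pattern.get(0))) {
                int end = matchFrom(seq, pattern, 1, start, maxGap);
                if (end >= 0) {
                    count++;
                    start = end + 1; // resume after this occurrence
                    continue;
                }
            }
            start++;
        }
        return count;
    }

    /** Applies the minimum-length and minimum-repetitions requirements. */
    static boolean qualifies(List<String> seq, List<String> pattern,
                             int maxGap, int minLength, int minRepetitions) {
        return pattern.size() >= minLength
                && countRepetitions(seq, pattern, maxGap) >= minRepetitions;
    }

    public static void main(String[] args) {
        // Hypothetical sequence, not taken from the database in this post
        List<String> seq = Arrays.asList("a", "b", "c", "a", "b", "c", "x", "a", "b", "c");
        List<String> pattern = Arrays.asList("a", "b", "c");
        System.out.println(countRepetitions(seq, pattern, 1)); // prints 3
    }
}
```

This only checks one given candidate pattern against one sequence; generating the candidates and aggregating the counts over the whole database would still have to come from a mining algorithm.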
I'm somewhat skilled in Java, so I'm perfectly fine with modifying one of your implemented algorithms. I just don't know where to start. Do you have any suggestions?