The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
Re: How to use custom separator for end of itemset and sequence in PrefixSpan?
Date: May 26, 2020 01:57AM
1) If you have an input file where other separators are used instead of -1 and -2 and you want to use SPMF, then you could just open the file in a text editor and replace the separators by -1 and -2 using the "find and replace" function.
Another way is to write a small program using any programming language to convert your file.
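For example, such a conversion program could look like the minimal Python sketch below. It assumes a hypothetical input format where "|" ends each itemset and "#" ends each sequence, with all tokens separated by spaces; the function names and the two separator symbols are only illustrative and should be adapted to your own file.

```python
def to_spmf(line, itemset_end="|", sequence_end="#"):
    """Replace custom separator tokens by the -1 / -2 separators used by SPMF.

    itemset_end and sequence_end are assumptions about the input file;
    change them to whatever your file really uses.
    """
    mapping = {itemset_end: "-1", sequence_end: "-2"}
    # Split on whitespace so that only whole tokens are replaced,
    # never a character that happens to be inside an item name.
    return " ".join(mapping.get(token, token) for token in line.split())


def convert_file(src_path, dst_path, itemset_end="|", sequence_end="#"):
    """Convert a whole file, one sequence per line, skipping blank lines."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():
                dst.write(to_spmf(line, itemset_end, sequence_end) + "\n")
```

Calling convert_file("input_custom.txt", "input_spmf.txt") (both file names are made up) would then write a file that uses -1 and -2 and can be given to SPMF.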
2) Now, if you want to see a different type of separator in the output file, then you could check the documentation of PrefixSpan:
There is an example input file like this:
1 -1 1 2 3 -1 1 3 -1 4 -1 3 6 -1 -2
1 4 -1 3 -1 2 3 -1 1 5 -1 -2
5 6 -1 1 2 -1 4 6 -1 3 -1 2 -1 -2
5 -1 7 -1 1 6 -1 3 -1 2 -1 3 -1 -2
and then the result file would look like this:
apple | #SUP: 4
orange | #SUP: 4
tomato | #SUP: 4
apple | orange | #SUP: 4
As you can see in the above results, the -1 separator has been replaced by | in the output file.
But note that this feature works in the GUI of SPMF.
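If your output file still contains -1 between itemsets, another option is to post-process that file yourself. The Python sketch below is only an illustration: it assumes that each output line has the form "pattern #SUP: support" with -1 after each itemset, and the function name relabel_pattern and the default "|" separator are hypothetical choices.

```python
def relabel_pattern(line, separator="|"):
    """Replace the -1 itemset separator in one output line by a custom symbol.

    Assumes the line looks like "1 -1 2 -1 #SUP: 4"; the part starting
    at "#SUP:" is left unchanged.
    """
    pattern, marker, support = line.rstrip("\n").partition("#SUP:")
    # Only whole "-1" tokens are replaced, so items are left as they are.
    tokens = [separator if token == "-1" else token for token in pattern.split()]
    result = " ".join(tokens)
    if marker:
        result += " " + marker + support
    return result
```

Applying this function to every line of the result file and writing the returned strings to a new file would give patterns in the same style as the example above.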
3) If you want to modify the code of PrefixSpan and you are a bit familiar with Java, it is not very hard: you could find the -1 and -2 in the code of PrefixSpan and replace them by something else.
So that is the main idea. It could be a good idea to add a feature in SPMF to let the user choose a separator. This has not been implemented, but it could be done in the future.