The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
How to use custom separator for end of itemset and sequence in PrefixSpan?
Posted by: Huang
Date: May 25, 2020 11:22PM

How to use custom separator for end of itemset and sequence instead of -1 and -2 in PrefixSpan?

Re: How to use custom separator for end of itemset and sequence in PrefixSpan?
Date: May 26, 2020 01:57AM

Hi,

1) If your input file uses other separators instead of -1 and -2 and you want to use SPMF, then you could just open the file in a text editor and replace the separators with -1 and -2 using the "find and replace" function.

Another way is to write a small program using any programming language to convert your file.
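
For example, here is a minimal sketch of such a conversion program in Python. It assumes a hypothetical input format where items are separated by spaces, itemsets by "|" and each sequence ends with ";". These two symbols and the function names are only assumptions for illustration, so adjust them to whatever your file really uses.

```python
def convert_line(line, itemset_sep="|", sequence_sep=";"):
    """Convert one sequence from a custom-separator format to the SPMF format.

    itemset_sep and sequence_sep are assumed separators, not something
    defined by SPMF. Items are assumed to be separated by whitespace.
    """
    body = line.strip()
    # Drop the custom end-of-sequence marker if it is present.
    if sequence_sep and body.endswith(sequence_sep):
        body = body[:-len(sequence_sep)]
    tokens = []
    for itemset in body.split(itemset_sep):
        items = itemset.split()
        if not items:
            continue  # ignore empty itemsets such as a trailing separator
        tokens.extend(items)
        tokens.append("-1")  # end of itemset
    tokens.append("-2")  # end of sequence
    return " ".join(tokens)


def convert_file(in_path, out_path, itemset_sep="|", sequence_sep=";"):
    """Convert a whole file, one sequence per line, skipping blank lines."""
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():
                dst.write(convert_line(line, itemset_sep, sequence_sep) + "\n")
```

With these assumed separators, the line "1 | 1 2 3 | 1 3 | 4 | 3 6 ;" would be converted to "1 -1 1 2 3 -1 1 3 -1 4 -1 3 6 -1 -2". Note that this only changes the separators. The items themselves still have to be positive integers for SPMF.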

2) Now, if you want to see a different type of separator in the output file, then you could check the documentation of PrefixSpan:
http://www.philippe-fournier-viger.com/spmf/PrefixSpan.php

There is an example input file like this:

@CONVERTED_FROM_TEXT
@ITEM=1=apple
@ITEM=2=orange
@ITEM=3=tomato
@ITEM=4=milk
@ITEM=5=bread
@ITEM=6=noodle
@ITEM=7=rice
@ITEM=-1=|
1 -1 1 2 3 -1 1 3 -1 4 -1 3 6 -1 -2
1 4 -1 3 -1 2 3 -1 1 5 -1 -2
5 6 -1 1 2 -1 4 6 -1 3 -1 2 -1 -2
5 -1 7 -1 1 6 -1 3 -1 2 -1 3 -1 -2

and then the result file would look like this:

apple | #SUP: 4
orange | #SUP: 4
tomato | #SUP: 4
apple | orange | #SUP: 4

As you can see in the above results, the -1 separator has been replaced by | in the output file.

Note that this feature works in the GUI of SPMF.
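
If you have many items, the @ITEM lines could also be generated by a small script instead of being typed by hand. Below is a minimal Python sketch that only reproduces the header layout of the example above. The function name and the dictionary of item names are hypothetical, and I assume that the separator symbol is declared with the @ITEM=-1= line exactly as in the example.

```python
def build_header(item_names, itemset_symbol="|"):
    """Build the metadata lines placed before the sequences.

    item_names maps each positive integer item to its name.
    itemset_symbol is the text that should replace -1 in the output.
    """
    lines = ["@CONVERTED_FROM_TEXT"]
    for item_id in sorted(item_names):
        lines.append(f"@ITEM={item_id}={item_names[item_id]}")
    # Declare the symbol to show instead of the -1 separator.
    lines.append(f"@ITEM=-1={itemset_symbol}")
    return lines


header = build_header({1: "apple", 2: "orange", 3: "tomato"})
print("\n".join(header))
```

The printed lines can then be pasted at the top of the input file, before the sequences.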

3) If you want to modify the code of PrefixSpan and you are a bit familiar with Java, it is not very hard: you could find the -1 and -2 in the code of PrefixSpan and replace them with something else.
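
If you prefer not to touch the Java code, another possibility is to rewrite the output file after running the algorithm. Here is a minimal Python sketch of that idea. It assumes that each output line looks like "1 -1 2 -1 #SUP: 4", that is, a pattern where each itemset ends with -1, followed by the support. This layout is an assumption based on the example result above, and the function name is hypothetical, so please check it against your own output file.

```python
def rewrite_pattern(line, itemset_sep="|"):
    """Replace each -1 in the pattern part of an output line by itemset_sep.

    The "#SUP:" part is kept unchanged. Only whole tokens equal to "-1"
    are replaced, so other tokens are left as they are.
    """
    pattern, marker, rest = line.strip().partition("#SUP:")
    tokens = [itemset_sep if tok == "-1" else tok for tok in pattern.split()]
    if marker:
        tokens.append(marker + rest)  # put the support back at the end
    return " ".join(tokens)


print(rewrite_pattern("1 -1 2 -1 #SUP: 4"))
```

This could be applied to each line of the output file in a loop, writing the results to a new file.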

----
So that is the main idea. It could be a good idea to add a feature in SPMF to let the user choose a separator. This has not been implemented yet, but it could be done in the future.

Best regards,

Philippe



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).