The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
Questions on understanding PrefixSpan algorithm
Posted by: walkman
Date: October 07, 2013 12:21PM

Hello! Can someone please help me understand the PrefixSpan algorithm? I ran the PrefixSpan implementation in SPMF, but it didn't give me the output I expected.

My test file was:
1 2 3 4 5
2 3 4
2 1 3 4 5
3 1 4 5

I can see patterns such as 2 3 4, 3 4 5, 4 5 and 2 3.

However, when I used the PrefixSpan implementation:
java -jar spmf.jar run PrefixSpan test.txt output.txt 20% 100

I got the following output:
=======PREFIXSPAN - STATISTICS=========
Totl time ~0ms
Frequent sequences count: 0
MaxMemory(mb): 0.00
=======================================

I can't figure out why the output is empty. Can someone please help? Any advice will be greatly appreciated!

Re: Questions on understanding PrefixSpan algorithm
Date: October 07, 2013 04:23PM

Hi,

I answered your e-mail, but I will copy the answer here for other people who may have the same question.

===

The problem is the format of the input file. The format is defined as follows:

Each line represents a sequence.
A sequence is a list of itemsets.
An itemset contains one or more distinct items.
An item is an integer.
Each itemset is followed by -1.
Each sequence must end with -2.
If an itemset contains more than one item, its items are assumed to be sorted.
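For readers who want to check a file against these rules before running SPMF, here is a minimal Java sketch of a strict line reader. It is not SPMF's own parser: the class `SequenceLineParser` and its method are hypothetical, and it assumes "sorted" means strictly ascending integer order within an itemset.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical strict reader for one line of the -1/-2 sequence format (not SPMF's parser). */
public class SequenceLineParser {

    /** Parses e.g. "1 2 -1 3 -1 -2" into [[1, 2], [3]]. */
    public static List<List<Integer>> parseLine(String line) {
        List<List<Integer>> sequence = new ArrayList<>();
        List<Integer> itemset = new ArrayList<>();
        boolean ended = false;
        for (String token : line.trim().split("\\s+")) {
            int value = Integer.parseInt(token);
            if (value == -1) {                 // -1 closes the current itemset
                if (itemset.isEmpty()) {
                    throw new IllegalArgumentException("empty itemset in: " + line);
                }
                sequence.add(itemset);
                itemset = new ArrayList<>();
            } else if (value == -2) {          // -2 closes the whole sequence
                ended = true;
                break;
            } else {
                // Assumption: items in an itemset are distinct and in ascending order.
                if (!itemset.isEmpty() && itemset.get(itemset.size() - 1) >= value) {
                    throw new IllegalArgumentException("itemset not sorted/distinct: " + line);
                }
                itemset.add(value);
            }
        }
        // Every itemset must be closed by -1 and the line must end with -2.
        if (!ended || !itemset.isEmpty()) {
            throw new IllegalArgumentException("sequence must end with -1 -2: " + line);
        }
        return sequence;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("1 -1 2 -1 3 -1 4 -1 5 -1 -2"));
        // prints [[1], [2], [3], [4], [5]]
        try {
            parseLine("1 2 3 4 5");            // a line from the original file: no -1 / -2 markers
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This reader rejects the lines of the original test file because they contain no separators; how SPMF itself treats such lines is not shown by this sketch.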

So your dataset should look like this:
1 -1 2 -1 3 -1 4 -1 5 -1 -2
2 -1 3 -1 4 -1 -2
2 -1 1 -1 3 -1 4 -1 5 -1 -2
3 -1 1 -1 4 -1 5 -1 -2

This means that your dataset contains 4 sequences. The first sequence means that item 1 is followed by item 2, which is followed by item 3, then item 4, then item 5.
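The conversion can be automated, and it also helps to see how often the expected patterns occur. The minimal Java sketch below (hypothetical class `PlainToSequence`, not part of SPMF) turns each space-separated line into the -1/-2 format, placing every item in its own itemset, and counts a pattern's support as the number of sequences containing it in order, with gaps allowed. This ordered containment is the notion of support that sequential pattern mining uses for single-item itemsets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: plain "one item per position" lines -> -1/-2 format, plus support counting. */
public class PlainToSequence {

    /** "1 2 3" -> "1 -1 2 -1 3 -1 -2": every item becomes its own itemset. */
    public static String toSpmfLine(String plainLine) {
        StringBuilder sb = new StringBuilder();
        for (String item : plainLine.trim().split("\\s+")) {
            sb.append(item).append(" -1 ");
        }
        return sb.append("-2").toString();
    }

    /** True if the pattern's items appear in the sequence in the same order (gaps allowed). */
    public static boolean contains(int[] sequence, int[] pattern) {
        int matched = 0;
        for (int item : sequence) {
            if (matched < pattern.length && item == pattern[matched]) {
                matched++;                     // greedy leftmost match of the next pattern item
            }
        }
        return matched == pattern.length;
    }

    /** Absolute support: number of sequences that contain the pattern. */
    public static int support(List<int[]> database, int[] pattern) {
        int count = 0;
        for (int[] sequence : database) {
            if (contains(sequence, pattern)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] plain = {"1 2 3 4 5", "2 3 4", "2 1 3 4 5", "3 1 4 5"};
        List<int[]> database = new ArrayList<>();
        for (String line : plain) {
            System.out.println(toSpmfLine(line));   // e.g. "2 -1 3 -1 4 -1 -2"
            database.add(Arrays.stream(line.split(" ")).mapToInt(Integer::parseInt).toArray());
        }
        System.out.println("support of 2 3 4 = " + support(database, new int[]{2, 3, 4})); // 3
        System.out.println("support of 4 5 = " + support(database, new int[]{4, 5}));       // 3
    }
}
```

Each of the patterns mentioned in the question (2 3 4, 3 4 5, 4 5, 2 3) is contained in 3 of the 4 sequences, i.e. 75%, which is well above a 20% threshold.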

Hope this helps,

Best,

Philippe

Re: Questions on understanding PrefixSpan algorithm
Posted by: walkman
Date: October 08, 2013 04:58AM

Thank you very much! I was thinking about pasting your answer here and then saw you've already posted it. It works for me now :-)

Re: Questions on understanding PrefixSpan algorithm
Posted by: Kruthika
Date: April 05, 2017 05:06AM

Hello, can anyone please send me the code of the PrefixSpan algorithm, with screenshots of the input and output, so that I can understand it better?

Re: Questions on understanding PrefixSpan algorithm
Date: April 05, 2017 06:36AM

Hello,

You can get the Java code of PrefixSpan in the SPMF library:

http://www.philippe-fournier-viger.com/spmf/

If you go to the documentation page on the website, you will find an example of input and output for PrefixSpan.
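To get a feel for the idea before reading the SPMF source, here is a deliberately simplified Java sketch of prefix-projected pattern growth. It assumes every itemset holds exactly one item (as in the dataset earlier in this thread) and takes an absolute support threshold. Projected databases are kept as (sequence index, start position) pairs. The class `MiniPrefixSpan` is hypothetical; SPMF's implementation also handles multi-item itemsets and differs in many details.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Simplified PrefixSpan sketch for single-item itemsets (illustration only, not SPMF's code). */
public class MiniPrefixSpan {

    private final List<int[]> database;
    private final int minSupport;                        // absolute threshold (number of sequences)
    private final Map<String, Integer> patterns = new TreeMap<>();

    public MiniPrefixSpan(List<int[]> database, int minSupport) {
        this.database = database;
        this.minSupport = minSupport;
    }

    public Map<String, Integer> run() {
        List<int[]> projected = new ArrayList<>();       // each entry: {sequence index, start position}
        for (int i = 0; i < database.size(); i++) {
            projected.add(new int[]{i, 0});
        }
        grow(new ArrayList<>(), projected);
        return patterns;
    }

    private void grow(List<Integer> prefix, List<int[]> projected) {
        // Count each item at most once per projected sequence.
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int[] p : projected) {
            int[] seq = database.get(p[0]);
            Set<Integer> seen = new HashSet<>();
            for (int pos = p[1]; pos < seq.length; pos++) {
                if (seen.add(seq[pos])) {
                    counts.merge(seq[pos], 1, Integer::sum);
                }
            }
        }
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() < minSupport) {
                continue;                                // infrequent item: no extension
            }
            int item = e.getKey();
            List<Integer> extended = new ArrayList<>(prefix);
            extended.add(item);
            patterns.put(extended.toString(), e.getValue());
            // Project: keep the suffix after the first occurrence of the item.
            List<int[]> next = new ArrayList<>();
            for (int[] p : projected) {
                int[] seq = database.get(p[0]);
                for (int pos = p[1]; pos < seq.length; pos++) {
                    if (seq[pos] == item) {
                        next.add(new int[]{p[0], pos + 1});
                        break;
                    }
                }
            }
            grow(extended, next);                        // recurse on the smaller projected database
        }
    }

    public static void main(String[] args) {
        List<int[]> db = List.of(
                new int[]{1, 2, 3, 4, 5},
                new int[]{2, 3, 4},
                new int[]{2, 1, 3, 4, 5},
                new int[]{3, 1, 4, 5});
        Map<String, Integer> result = new MiniPrefixSpan(db, 3).run();
        result.forEach((pattern, support) -> System.out.println(pattern + " support=" + support));
    }
}
```

With a threshold of 3 sequences, the printed patterns include [2, 3, 4] with support 3 and [3, 4] with support 4, while [1, 2] (found in only one sequence) is not reported. Because each recursive call scans only the suffixes that follow the current prefix, the search shrinks as patterns grow. This is the core idea of PrefixSpan.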

Best,

This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).