The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
SPMF input file containing text
Posted by: Boyan
Date: December 16, 2016 01:26AM

I would like to run several of SPMF's sequential pattern mining algorithms on an input file containing text. I understand that each sentence forms a separate sequence of words. How can we define an itemset containing more than one word?

Thanks!

Re: SPMF input file containing text
Date: December 16, 2016 06:28AM

Hello,

OK. Here is an explanation of the format.

Let's say that we have a text file containing a sentence "Hello how are you."

It would be represented as follows:

@CONVERTED_FROM_TEXT
@ITEM=1=Hello
@ITEM=2=how
@ITEM=3=are
@ITEM=4=you
1 -1 2 -1 3 -1 4 -1 -2

The line "1 -1 2 -1 3 -1 4 -1 -2" is a sequence where the positive numbers are items (words), each -1 marks the end of an itemset, and the -2 marks the end of the sequence. The @ITEM lines map each number to its word.
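
For anyone who wants to generate this format from a script rather than by hand, here is a minimal Python sketch. It is not part of SPMF: the function name encode_sentences and the simple word-splitting rule (runs of letters and digits, punctuation dropped) are assumptions made for illustration. It puts each word in its own itemset and numbers the words in order of first appearance.

```python
import re


def encode_sentences(sentences):
    """Encode each sentence as one sequence, with one word per itemset.

    Hypothetical helper for illustration; it is not an SPMF function.
    """
    item_ids = {}   # word -> positive integer id, in order of first appearance
    sequences = []
    for sentence in sentences:
        words = re.findall(r"\w+", sentence)   # assumed tokenization rule
        if not words:
            continue                           # skip empty sentences
        tokens = []
        for word in words:
            item_id = item_ids.setdefault(word, len(item_ids) + 1)
            tokens.append(f"{item_id} -1")     # -1 closes the itemset
        sequences.append(" ".join(tokens) + " -2")   # -2 closes the sequence
    header = ["@CONVERTED_FROM_TEXT"]
    header += [f"@ITEM={item_id}={word}" for word, item_id in item_ids.items()]
    return "\n".join(header + sequences)


print(encode_sentences(["Hello how are you."]))
```

Run on the sentence "Hello how are you.", this prints the same six lines as the example above.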

Let's say that you want to put more than one word in the same itemset. For example, we may want to put Hello and some of its synonyms in the same itemset.
We can do this as follows:


@CONVERTED_FROM_TEXT
@ITEM=1=Hello
@ITEM=2=Hola
@ITEM=3=Bonjour
@ITEM=4=How
@ITEM=5=are
@ITEM=6=you
1 2 3 -1 4 -1 5 -1 6 -1 -2

This means that the words Hello, Hola and Bonjour appear simultaneously (they are in the same itemset, separated by single spaces) and are followed by How, which is followed by are, which is followed by you.
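
If the grouping of words into itemsets is decided in your own code, the file can be written with a small script. The following Python sketch is only an illustration: encode_itemsets is a hypothetical name, not an SPMF function, and it assumes the input is already a list of sequences, each being a list of itemsets, each itemset being a list of words. It also sorts the item numbers inside each itemset and drops duplicates, on the assumption that the mining algorithm expects itemsets in that form.

```python
def encode_itemsets(sequences):
    """Write sequences of multi-word itemsets in the text-converted format.

    sequences: list of sequences; each sequence is a list of itemsets;
    each itemset is a list of words. Hypothetical helper, not an SPMF API.
    """
    item_ids = {}   # word -> positive integer id, in order of first appearance
    lines = []
    for sequence in sequences:
        parts = []
        for itemset in sequence:
            ids = set()
            for word in itemset:
                ids.add(item_ids.setdefault(word, len(item_ids) + 1))
            # Items of one itemset are space-separated; -1 closes the itemset.
            parts.append(" ".join(str(i) for i in sorted(ids)) + " -1")
        lines.append(" ".join(parts) + " -2")   # -2 closes the sequence
    header = ["@CONVERTED_FROM_TEXT"]
    header += [f"@ITEM={item_id}={word}" for word, item_id in item_ids.items()]
    return "\n".join(header + lines)


greeting = [["Hello", "Hola", "Bonjour"], ["How"], ["are"], ["you"]]
print(encode_itemsets([greeting]))
```

For the greeting sequence, the last printed line is "1 2 3 -1 4 -1 5 -1 6 -1 -2", as in the example above.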

This is for a single sequence, but the same format can be used for several sequences, with one sequence per line. For example, if we add a second sequence "Hello you", the file would be:

@CONVERTED_FROM_TEXT
@ITEM=1=Hello
@ITEM=2=Hola
@ITEM=3=Bonjour
@ITEM=4=How
@ITEM=5=are
@ITEM=6=you
1 2 3 -1 4 -1 5 -1 6 -1 -2
1 -1 6 -1 -2
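
To check that a file says what you intend, it can help to read it back into words. Here is a minimal Python sketch of such a check; decode_spmf_text is a hypothetical helper written for this post, not an SPMF function, and it assumes the only metadata lines are the ones starting with @ as shown above.

```python
def decode_spmf_text(content):
    """Turn the file content back into sequences of itemsets of words.

    Hypothetical helper for checking a file; it is not an SPMF function.
    """
    names = {}       # item id -> word, read from the @ITEM lines
    sequences = []
    for line in content.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("@ITEM="):
            _, item_id, word = line.split("=", 2)
            names[int(item_id)] = word
        elif line.startswith("@"):
            continue                          # other metadata lines
        else:
            sequence, itemset = [], []
            for token in line.split():
                value = int(token)
                if value == -1:               # end of the current itemset
                    sequence.append(itemset)
                    itemset = []
                elif value == -2:             # end of the sequence
                    break
                else:
                    itemset.append(names.get(value, token))
            sequences.append(sequence)
    return sequences


example = """@CONVERTED_FROM_TEXT
@ITEM=1=Hello
@ITEM=2=Hola
@ITEM=3=Bonjour
@ITEM=4=How
@ITEM=5=are
@ITEM=6=you
1 2 3 -1 4 -1 5 -1 6 -1 -2
1 -1 6 -1 -2"""

for sequence in decode_spmf_text(example):
    print(sequence)
```

For the two-sequence file above, the first sequence comes back as four itemsets (the first containing Hello, Hola and Bonjour) and the second as two itemsets, Hello then you.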

Hope that this is clear!

Best regards

Philippe



Edited 1 time(s). Last edit at 12/16/2016 06:29AM by webmasterphilfv.

Re: SPMF input file containing text
Posted by: vinita
Date: May 01, 2017 08:22PM

Where can I get the input file for running the Apriori algorithm?

Re: SPMF input file containing text
Date: May 10, 2017 07:14PM

The SPMF website provides all the instructions for running the Apriori algorithm.

This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).