The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
create a text file for SPMF
Posted by: dmeloni10
Date: March 06, 2019 06:48AM

Hi! I'm a CS student and I'm using SPMF to discover sequential patterns. I have a CSV file that contains 3 integer columns: a user ID, a timestamp and a numeric value that is distinct for every purchased product.

UID TIMESTAMP SKU

1 1 20
1 1 5
1 2 3
1 2 7
2 1 16
2 4 1
3 1 3
3 1 4
3 1 8
3 2 7

.. ... ...

Using Knime, without code, I grouped by UID and timestamp and concatenated the product values as a string, separated by a blank space. I then added -1 at the end of every transaction and -2 at the end of every row, deleted the UID, and got the sequences.
Finally I saved the result as a text file, like this:

20 5 -1 3 7 -1 -2
16 -1 1 -1 -2
3 4 8 -1 7 -1 -2

The problem is that every row of this file is a string ("20 5 -1 3 7 -1 -2") and SPMF reads integer values.
What would be the right way to create a sequence for each different customer in the SPMF format? I tried to use the "Generate a sequence database" algorithm, but it is not able to distinguish and separate the transactions of each individual customer.

Thank you


...

Re: create a text file for SPMF
Date: March 06, 2019 03:30PM

Hi,

Thanks for using SPMF! I think that the format that you have shown:

>
> 20 5 -1 3 7 -1 -2
> 16 -1 1 -1 -2
> 3 4 8 -1 7 -1 -2
>

looks correct. The only problem that I see is that the items in each itemset are not sorted. As you probably know, in sequential pattern mining, items within an itemset are considered to be unordered. But for implementation purposes, many sequential pattern mining algorithms assume that there is a total order between the items inside each itemset, and that this order is always the same. This order can be anything, such as the alphabetical order, but there needs to be one. If the items are not sorted consistently, some algorithms may generate incorrect results.

If you use the order of increasing numbers, your data would be like this:

> 5 20 -1 3 7 -1 -2
> 16 -1 1 -1 -2
> 3 4 8 -1 7 -1 -2

Other than that, I think that your data looks fine.
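
For readers who prefer a script to a Knime workflow, here is a minimal sketch of the same conversion in Python. It assumes the CSV has already been read into (UID, TIMESTAMP, SKU) integer tuples; the function name rows_to_spmf is only illustrative and is not part of SPMF. It groups the rows by customer and timestamp, sorts the items inside each itemset, and writes -1 after each itemset and -2 after each sequence.

```python
from collections import defaultdict


def rows_to_spmf(rows):
    """Convert (uid, timestamp, sku) rows into SPMF sequence lines.

    Returns (lines, customer_ids): one line per customer, in increasing
    uid order, and the uid that each line belongs to.
    """
    # uid -> timestamp -> set of items bought at that time
    by_customer = defaultdict(lambda: defaultdict(set))
    for uid, ts, sku in rows:
        by_customer[uid][ts].add(sku)

    lines, customer_ids = [], []
    for uid in sorted(by_customer):
        parts = []
        for ts in sorted(by_customer[uid]):       # itemsets in time order
            items = sorted(by_customer[uid][ts])  # items in increasing order
            parts.append(" ".join(map(str, items)) + " -1")
        lines.append(" ".join(parts) + " -2")
        customer_ids.append(uid)
    return lines, customer_ids


# The sample rows from the first post.
rows = [(1, 1, 20), (1, 1, 5), (1, 2, 3), (1, 2, 7), (2, 1, 16),
        (2, 4, 1), (3, 1, 3), (3, 1, 4), (3, 1, 8), (3, 2, 7)]
lines, customer_ids = rows_to_spmf(rows)
print("\n".join(lines))
```

On the sample rows this prints the three sorted sequences shown above. The customer_ids list records which customer each line belongs to, which is useful if the UID column is dropped from the file.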


> What could be the right solution to create a
> sequence for each different customer,with SPMF
> format?

Currently, most algorithms do not support indicating that a sequence belongs to a customer, such that a customer could have several sequences. Here are two ways to solve that problem:
- You could consider algorithms such as SeqDIM in SPMF for doing multi-dimensional sequential pattern mining. Then you could give a customer ID to each sequence, as well as other information such as the age, country, etc.
- You could write some code around SPMF to remember which sequences belong to which customer in your input file. Then you can run a sequential pattern mining algorithm with the parameter "showSequenceIDs" set to true. It will then tell you the IDs of all the sequences where each pattern appears, and you can map these back to the customers using the information that you saved before running SPMF!
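
The second option can be sketched in Python as follows. This is only an illustration under assumptions: it supposes that each output line ends with "#SID:" followed by zero-based sequence numbers (the position of the sequence in the input file), which should be checked against the output of your SPMF version. The function name and the two sample output lines are hand-written for this example and are not real SPMF output.

```python
def patterns_by_customer(output_lines, customer_ids):
    """Map the sequence IDs listed after '#SID:' back to customer IDs.

    customer_ids[i] is the customer whose sequence is on line i of the
    input file (assumed to be numbered from 0).
    """
    results = []
    for line in output_lines:
        if "#SID:" not in line:
            continue  # no sequence IDs on this line
        head, sid_part = line.split("#SID:")
        pattern = head.split("#SUP:")[0].strip()
        sids = [int(s) for s in sid_part.split()]
        results.append((pattern, [customer_ids[s] for s in sids]))
    return results


# Saved before running SPMF: line 0 is customer 1, line 1 is customer 2, ...
customer_ids = [1, 2, 3]

# Hypothetical output lines, written by hand for this example.
output_lines = [
    "7 -1 #SUP: 2 #SID: 0 2",
    "3 -1 #SUP: 2 #SID: 0 2",
]
for pattern, customers in patterns_by_customer(output_lines, customer_ids):
    print(pattern, "-> customers", customers)
```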

Hope this helps!

Philippe

Re: create a text file for SPMF
Posted by: dmeloni10
Date: March 07, 2019 07:59AM

Thanks for showing me these solutions!

After ordering the itemsets, the second step is to transform the input file by adding the customer ID (cid) as a dimension before each sequence.
So my input file will become like this:

1 -3 5 20 -1 3 7 -1 -2
2 -3 16 -1 1 -1 -2
3 -3 3 4 8 -1 7 -1 -2

I will continue to work on it starting from your help!

Re: create a text file for SPMF
Date: March 07, 2019 04:26PM

OK, glad it helps.

Best regards,

Re: create a text file for SPMF
Posted by: dmeloni10
Date: March 12, 2019 01:17AM

Hi! I have another question:
I'm doing two kinds of analysis, one of them with time constraints.

My second input file starts with a timestamp:

<41> itemlist -1 <45> itemlist -1 -2

and the second row also starts with a timestamp:

<10> itemlist -1 <15> itemlist -1 <27> itemlist -1 -2
.....................................................
<3> itemlist -1 <4> itemlist -1 -2
......

This is because, when I created my transaction database, I ordered it by number of purchases per customer.
Should I order the entire dataset by date of sale instead?

Thank you!

Re: create a text file for SPMF
Date: March 13, 2019 06:25PM

Hi,

Do you mean that the itemsets in your sequences are ordered by the number of items? If so, then yes: if you want to analyze the data by timestamps, you should order the itemsets by timestamp!
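
As a rough illustration, the following Python sketch builds timestamped sequences in the "<time> itemlist -1 ... -2" layout shown in the question, with the itemsets of each customer sorted by timestamp. It assumes the data is available as (UID, TIMESTAMP, SKU) integer tuples; the function name and the sample rows are made up for this example.

```python
from collections import defaultdict


def rows_to_timed_spmf(rows):
    """Build one '<t> items -1 ... -2' line per customer.

    Itemsets are sorted by timestamp, and items inside each itemset
    are sorted in increasing order.
    """
    # uid -> timestamp -> set of items bought at that time
    by_customer = defaultdict(lambda: defaultdict(set))
    for uid, ts, sku in rows:
        by_customer[uid][ts].add(sku)

    lines = []
    for uid in sorted(by_customer):
        parts = []
        for ts in sorted(by_customer[uid]):
            items = " ".join(str(i) for i in sorted(by_customer[uid][ts]))
            parts.append(f"<{ts}> {items} -1")
        lines.append(" ".join(parts) + " -2")
    return lines


# Made-up rows, deliberately not in time order.
rows = [(1, 45, 7), (1, 41, 20), (1, 41, 5),
        (2, 27, 1), (2, 10, 16), (2, 15, 3)]
timed_lines = rows_to_timed_spmf(rows)
print("\n".join(timed_lines))
```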

Best regards,

Philippe

Re: create a text file for SPMF
Posted by: Daniela
Date: August 04, 2019 10:06AM

Hi! In my project I'm considering the time of each transaction, so I thought about using the Hirate-Yamana algorithm. I analyzed 8 years of purchases and associated each transaction with a timestamp, an integer from 1 to 96, where 1 corresponds to January 2010 (...) and 96 corresponds to December 2018. When I use SPMF to extract frequent patterns, I lose the real meaning of the transaction times. For example, if item 5 is bought in month 5 and is then followed by item 1 bought in month 10 (5 time units later), I read (0,5),(5,1) in the output file, but I want to keep month 5 and month 10 in my analysis.
Is there an algorithm that keeps the information about the month? I want to discover something like "a specific product is purchased in a particular period (every year in the same month, or twice every year), and after the same number of months another product is purchased".
Thanks for your support.
Daniela

Re: create a text file for SPMF
Date: August 08, 2019 01:17AM

Hi,

Sorry for the late reply. I have been travelling during the last few days.

Yes, indeed, the Hirate-Yamana algorithm only indicates the relative time in the patterns that it outputs.

I think there is no other sequential pattern mining algorithm in SPMF that handles time...

A possible solution could be to do some preprocessing on your data to encode the time information in your item names.

For example, item 5 in month 1 could be 10005, item 5 in month 2 could be 20005, item 5 in month 3 could be 30005, and so on. This is just an example; you could encode time in the item names in some other way. Then you could use any sequential pattern mining algorithm, and the items would be associated with their time.
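
A minimal Python sketch of this encoding, assuming that all item IDs are positive and smaller than 10000 (otherwise a larger factor is needed). The names MONTH_FACTOR, encode, decode and calendar_month are only illustrative. The calendar_month helper is an extra assumption for the case where the goal is to find purchases that recur in the same month every year: it maps a running timestamp, where 1 is a January, to a month of the year from 1 to 12.

```python
MONTH_FACTOR = 10000  # assumes every item ID is below 10000


def encode(item, month):
    """Pack a month and an item ID into one integer, e.g. (5, 1) -> 10005."""
    if not 0 < item < MONTH_FACTOR:
        raise ValueError("item ID must be between 1 and MONTH_FACTOR - 1")
    return month * MONTH_FACTOR + item


def decode(code):
    """Recover (month, item) from an encoded item."""
    return divmod(code, MONTH_FACTOR)


def calendar_month(timestamp):
    """Map a running month index (1 = a January) to 1..12."""
    return (timestamp - 1) % 12 + 1


print(encode(5, 1), encode(5, 2), encode(5, 3))  # 10005 20005 30005
print(decode(encode(1, 10)))                     # (10, 1)
print(calendar_month(13))                        # 1
```

After mining, each item in a pattern can be passed through decode to read back its month and its original item ID.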

Best regards,



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).