The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
What is the advantage of using numbers for items in the input file?
Posted by: azerty1
Date: November 07, 2019 12:37AM

Hello,

I would like to know the advantage of encoding items as numbers in SPMF input files. According to SPMF, we have to replace all items with numbers.

Re: What is the advantage of using numbers for items in the input file?
Date: November 11, 2019 06:55PM

Hi,

The reason for using numbers is efficiency. Most of the algorithms in SPMF use integers to represent items internally, because comparing integers is faster than comparing strings, and integers require less memory than strings. For example, comparing two integers (12 =? 13) requires only one CPU instruction, while comparing two strings such as "banana" and "banana juice" requires comparing many characters one by one. Moreover, the integer 12 may take 32 or 64 bits of memory, while "banana" may take around 7 x 32 or 64 bits, depending on the representation. This is why SPMF represents items internally as integers.
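The encoding step itself can be sketched in a few lines of Python. This is only an illustration, not SPMF code: the function name `encode_transactions` is hypothetical, and numbering items from 1 in order of first appearance, with each transaction stored as a sorted list without duplicates, is an assumption made for the example.

```python
# Hypothetical helper (not part of SPMF): map item names to integer IDs.
def encode_transactions(transactions):
    """Give each distinct item name a positive integer ID, in order of first
    appearance, and rewrite every transaction as a sorted list of IDs
    without duplicates (an assumption made for this illustration)."""
    item_to_id = {}
    encoded = []
    for transaction in transactions:
        ids = set()
        for item in transaction:
            if item not in item_to_id:
                item_to_id[item] = len(item_to_id) + 1  # IDs start at 1
            ids.add(item_to_id[item])
        encoded.append(sorted(ids))
    return item_to_id, encoded


if __name__ == "__main__":
    raw = [["apple", "tomato", "milk"], ["orange", "tomato", "bread"]]
    mapping, coded = encode_transactions(raw)
    print(mapping)  # {'apple': 1, 'tomato': 2, 'milk': 3, 'orange': 4, 'bread': 5}
    print(coded)    # [[1, 2, 3], [2, 4, 5]]
```

After this step, a mining algorithm only needs to compare and store small integers; the `item_to_id` dictionary is needed only to translate results back into names.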

Now, in the input files, you can use integers or, for most algorithms, you can also define names for items, as explained in the documentation. For example, the documentation of Apriori shows that you can use this format:

@CONVERTED_FROM_TEXT
@ITEM=1=apple
@ITEM=2=orange
@ITEM=3=tomato
@ITEM=4=milk
@ITEM=5=bread
1 3 4
2 3 5
1 2 3 5
2 5
1 2 3 5
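A file in this format can also be generated from transactions written with item names. The following Python sketch is hypothetical (it is not part of SPMF) and is based only on the example shown here: `to_converted_from_text` numbers items in the order given by `item_names` and, as an assumption for illustration, writes each transaction's IDs in ascending order without duplicates. With the five names and five transactions of the example, it reproduces the file above line by line.

```python
# Hypothetical helper (not part of SPMF), modeled on the example file above.
def to_converted_from_text(item_names, transactions):
    """Build the text of an input file with the @CONVERTED_FROM_TEXT header,
    one @ITEM=<id>=<name> line per item, and one line of space-separated
    item IDs per transaction."""
    item_to_id = {name: i for i, name in enumerate(item_names, start=1)}
    lines = ["@CONVERTED_FROM_TEXT"]
    lines += [f"@ITEM={i}={name}" for name, i in item_to_id.items()]
    for transaction in transactions:
        # Deduplicate and sort IDs (assumed layout, as in the example).
        ids = sorted(item_to_id[name] for name in set(transaction))
        lines.append(" ".join(str(i) for i in ids))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    names = ["apple", "orange", "tomato", "milk", "bread"]
    db = [
        ["apple", "tomato", "milk"],
        ["orange", "tomato", "bread"],
        ["apple", "orange", "tomato", "bread"],
        ["orange", "bread"],
        ["apple", "orange", "tomato", "bread"],
    ]
    print(to_converted_from_text(names, db), end="")
```

The returned string can be saved to a text file; it has only been checked against the example above, not against every rule in the SPMF documentation.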

In the @CONVERTED_FROM_TEXT format, each line @ITEM=<id>=<name> gives a name to an item; for example, @ITEM=1=apple defines that item 1 is "apple". You can also use the ARFF format with SPMF. These formats work with both the user interface and the command line of SPMF. Using them with the source code version of SPMF is also possible, but it requires some additional explanation, which I can provide if needed.
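For the source code route, one possible approach (an assumption here, not necessarily how SPMF handles it) is to read the @ITEM lines yourself and translate item IDs in result lines back to names. The helper names `read_item_names` and `decode` are hypothetical, and the sketch assumes that a result line lists item IDs first, followed by annotations starting with '#' (such as a support value) that must be left untouched.

```python
# Hypothetical helpers (not part of SPMF) for translating item IDs back to names.
def read_item_names(lines):
    """Collect the ID -> name mapping from lines of the form @ITEM=<id>=<name>."""
    id_to_name = {}
    for line in lines:
        if line.startswith("@ITEM="):
            # Split at most twice so that a name containing '=' stays intact.
            _, item_id, name = line.rstrip("\n").split("=", 2)
            id_to_name[int(item_id)] = name
    return id_to_name


def decode(line, id_to_name):
    """Replace known item IDs by their names; unknown IDs are kept. Everything
    from the first token starting with '#' onward (assumed to be an
    annotation) is kept as-is."""
    tokens = line.split()
    out = []
    for pos, token in enumerate(tokens):
        if token.startswith("#"):
            out.extend(tokens[pos:])
            break
        if token.isdigit():
            out.append(id_to_name.get(int(token), token))
        else:
            out.append(token)
    return " ".join(out)


if __name__ == "__main__":
    header = ["@CONVERTED_FROM_TEXT", "@ITEM=1=apple", "@ITEM=2=orange",
              "@ITEM=3=tomato", "@ITEM=4=milk", "@ITEM=5=bread", "1 3 4"]
    names = read_item_names(header)
    print(decode("2 3 5 #SUP: 3", names))  # orange tomato bread #SUP: 3
```

Stopping at the first '#' token matters: otherwise a trailing number such as a support count would be mistaken for an item ID.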

Thanks for using SPMF. Best regards.

Philippe



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).