The reason for using numbers is efficiency. Most of the algorithms in SPMF use integers to represent items internally, because comparing integers is faster than comparing strings, and integers require less memory than strings. For example, comparing two integers such as 12 and 13 typically takes a single CPU instruction, while comparing two strings such as "banana" and "banana juice" requires comparing them character by character. Moreover, 12 takes 32 or 64 bits of memory, while "banana" needs storage for each of its six characters plus some overhead, so it takes several times more, depending on the representation. This is why integers are used to represent items internally.
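To illustrate the idea, here is a minimal sketch of how item names can be mapped to integers before mining. It is not SPMF's actual code: the class ItemDictionary and its methods idOf and nameOf are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of SPMF: maps item names to compact integer ids.
public class ItemDictionary {
    private final Map<String, Integer> idByName = new HashMap<>();
    private final List<String> nameById = new ArrayList<>();

    // Returns the id of a name, assigning the next free id (starting at 1)
    // the first time the name is seen.
    public int idOf(String name) {
        Integer id = idByName.get(name);
        if (id == null) {
            nameById.add(name);
            id = nameById.size();
            idByName.put(name, id);
        }
        return id;
    }

    // Returns the name that was registered for an id.
    public String nameOf(int id) {
        return nameById.get(id - 1);
    }

    public static void main(String[] args) {
        ItemDictionary dict = new ItemDictionary();
        int banana = dict.idOf("banana");
        int juice = dict.idOf("banana juice");
        // After encoding, an equality test is one integer comparison
        // instead of a character-by-character string comparison.
        System.out.println(banana == juice);
        System.out.println(dict.nameOf(juice));
    }
}
```

The strings are touched only once, when the input is read and when the results are printed; everything in between works on integers.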
In the input files, however, you can use integers or, as explained in the documentation, define names for the items; this works for most algorithms. For example, the documentation of Apriori shows that you can use this format:
1 3 4
2 3 5
1 2 3 5
1 2 3 5
In this format, a line such as @ITEM=1=apple placed before the transactions defines that item 1 is named "apple". You can also use the ARFF format with SPMF. These formats work with both the user interface and the command line of SPMF. Using them with the source code version of SPMF is also possible, but I would need to explain how to do it.
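If you want to see how such names could be read back in your own code, here is a rough sketch. It assumes that the names are declared on lines of the form @ITEM=1=apple, as I recall from the SPMF documentation. The class ItemNameReader and its methods are hypothetical and are not the code SPMF uses internally.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, NOT part of SPMF.
public class ItemNameReader {

    // Collects the id -> name pairs declared on lines such as "@ITEM=1=apple"
    // (assumed line format). All other lines are ignored.
    static Map<Integer, String> readItemNames(List<String> lines) {
        Map<Integer, String> names = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.startsWith("@ITEM=")) {
                String[] parts = line.substring("@ITEM=".length()).split("=", 2);
                if (parts.length == 2) {
                    names.put(Integer.parseInt(parts[0].trim()), parts[1].trim());
                }
            }
        }
        return names;
    }

    // Rewrites a transaction such as "1 3 4" using the item names.
    // Ids without a declared name are kept as numbers.
    static String decode(String transaction, Map<Integer, String> names) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : transaction.trim().split("\\s+")) {
            out.add(names.getOrDefault(Integer.parseInt(token), token));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "@ITEM=1=apple", "@ITEM=3=tomato", "@ITEM=4=milk", "1 3 4");
        Map<Integer, String> names = readItemNames(lines);
        System.out.println(decode("1 3 4", names));
    }
}
```

The algorithm itself would still run on the integers; the names are only needed when reading the input and when displaying the results.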
Thanks for using SPMF. Best regards.