Re: Association Rule Mining
Date: September 30, 2017 08:01AM
Glad you like the tool.
As for the file format, it is explained in the documentation of each algorithm. Moreover, the ARFF format, which is used by some other data mining software, is also supported by some algorithms.
However, in general, each algorithm requires a strict input format. So if you want to use an algorithm, you will likely need to transform your data into that format first. This is necessary because I simply cannot support all possible file formats. The decision for this software has thus been to focus on the algorithms rather than on offering many preprocessing tools.
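As an illustration of the kind of transformation involved, here is a minimal Python sketch (SPMF itself is written in Java) that turns comma-separated item names into lines in what I understand to be the SPMF transaction format: one transaction per line, each item encoded as a positive integer, items separated by a single space, sorted in ascending order and without duplicates. The function `to_spmf_transactions` is hypothetical and not part of SPMF, so please check the documentation of the algorithm you use for its exact requirements.

```python
import csv
import io


def to_spmf_transactions(rows):
    """Encode rows of item names as SPMF-style transaction lines.

    Hypothetical helper (not part of SPMF). Returns (lines, mapping),
    where mapping gives the integer assigned to each item name.
    """
    mapping = {}
    lines = []
    for row in rows:
        ids = set()  # a set removes duplicate items within a transaction
        for name in row:
            name = name.strip()
            if not name:
                continue
            if name not in mapping:
                mapping[name] = len(mapping) + 1  # positive integer ids, starting at 1
            ids.add(mapping[name])
        if ids:  # skip empty rows
            # items written in ascending order, separated by single spaces
            lines.append(" ".join(str(i) for i in sorted(ids)))
    return lines, mapping


data = "bread,milk\nbread,diapers,beer,eggs\nmilk,diapers,beer,cola\n"
lines, mapping = to_spmf_transactions(csv.reader(io.StringIO(data)))
print("\n".join(lines))
```

Keeping `mapping` is important, because the patterns or rules found by an algorithm will be expressed with these integers and must be translated back to item names.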
That said, SPMF does include some preprocessing tools for specific tasks. For example:
A tool for generating a synthetic transaction database
A tool for generating a synthetic sequence database
A tool for generating a synthetic sequence database with timestamps
A tool for calculating statistics about a transaction database
A tool for calculating statistics about a sequence database
A tool for converting a sequence database to a transaction database
A tool for converting a transaction database to a sequence database
A tool for converting a text file to a sequence database (each sentence becomes a sequence)
A tool for converting a sequence database in various formats (CSV, KOSARAK, BMS, IBM...) to a sequence database in SPMF format
A tool for converting a transaction database in various formats (CSV...) to a transaction database in SPMF format
A tool for converting time-series to a sequence database
A tool to generate utility values for a transaction database
A tool to add timestamps to a sequence database
A tool for removing utility information from a database having utility information
A tool to resize a database in SPMF format (a text file) by keeping a given percentage of the lines of the original database
A tool for visualizing time-series
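To make the sequence format more concrete, the following Python sketch shows one plausible way to perform a sequence-to-transaction conversion, assuming the SPMF sequence format in which items are positive integers, -1 marks the end of an itemset and -2 marks the end of a sequence. Each sequence simply becomes the set of its distinct items, so the ordering information is lost. The function names are hypothetical, this is not the code of the SPMF tool, and it does not handle timestamps; treating lines that start with '#', '%' or '@' as comments or metadata is also an assumption.

```python
def sequence_line_to_transaction(line):
    """Turn one SPMF-style sequence line into a transaction line.

    Itemsets are separated by -1 and the sequence ends with -2; the
    transaction keeps every distinct item, sorted in ascending order.
    """
    items = set()
    for token in line.split():
        value = int(token)
        if value == -2:  # end of the sequence
            break
        if value == -1:  # end of an itemset: the boundary is simply dropped
            continue
        items.add(value)
    return " ".join(str(i) for i in sorted(items))


def sequences_to_transactions(lines):
    """Convert each sequence line; hypothetical helper, not part of SPMF."""
    out = []
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and lines starting with '#', '%' or '@'
        # are comments or metadata and are skipped.
        if not line or line[0] in "#%@":
            continue
        out.append(sequence_line_to_transaction(line))
    return out


seqdb = [
    "1 -1 1 2 3 -1 1 3 -1 4 -1 3 6 -1 -2",
    "1 4 -1 3 -1 2 3 -1 1 5 -1 -2",
]
result = sequences_to_transactions(seqdb)
print("\n".join(result))  # prints "1 2 3 4 6" then "1 2 3 4 5"
```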
Besides, if you work with time series, there are also some additional preprocessing options:
an algorithm for calculating the moving average of a time series (to remove noise)
an algorithm for calculating the piecewise aggregate approximation of a time series (to reduce the number of data points of a time series)
an algorithm for calculating the linear regression line of a time series (using the least squares method)
an algorithm for splitting a time series into segments of a given length
an algorithm for splitting a time series into a given number of segments
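For readers curious about what these time series operations compute, here is a minimal Python sketch using their textbook definitions. It is not SPMF's implementation, and the details may differ: for instance, this moving average returns only the n - window + 1 fully covered points, the PAA here uses integer-sized frames (differing by at most one point) rather than the fractional weighting of the original PAA definition when the length is not a multiple of the number of segments, and the regression uses the point index 0, 1, ..., n-1 as the x coordinate. All function names are hypothetical.

```python
def moving_average(series, window):
    """Simple moving average: mean of each run of `window` consecutive points.

    The output has len(series) - window + 1 points (no padding).
    """
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def split_by_length(series, length):
    """Consecutive segments of `length` points; the last one may be shorter."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return [series[i:i + length] for i in range(0, len(series), length)]


def split_into_count(series, count):
    """`count` consecutive segments whose sizes differ by at most one point."""
    if not 1 <= count <= len(series):
        raise ValueError("count must be between 1 and len(series)")
    base, extra = divmod(len(series), count)
    segments, start = [], 0
    for k in range(count):
        size = base + 1 if k < extra else base  # first `extra` segments get one more point
        segments.append(series[start:start + size])
        start += size
    return segments


def paa(series, count):
    """Piecewise aggregate approximation: the mean of each of `count` segments."""
    return [sum(seg) / len(seg) for seg in split_into_count(series, count)]


def least_squares_line(series):
    """Slope and intercept of the least squares line, using x = 0, 1, ..., n-1."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


series = [1, 3, 2, 4, 3, 5, 4, 6]
print(moving_average(series, 2))    # [2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
print(paa(series, 4))               # [2.0, 3.0, 4.0, 5.0]
print(split_by_length(series, 3))   # [[1, 3, 2], [4, 3, 5], [4, 6]]
print(split_into_count(series, 3))  # [[1, 3, 2], [4, 3, 5], [4, 6]]
print(least_squares_line(series))   # approximately (0.5714, 1.5)
```

Splitting by count with near-equal sizes keeps every point, which is why PAA is built on top of it here; splitting by length instead leaves a possibly shorter final segment.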
But if you need something else, you can always ask. If it is something that could be useful to more than one person, maybe I can implement it. Or, if you want to contribute code for new tools or features, that is also possible.
Edited 1 time(s). Last edit at 09/30/2017 08:03AM by webmasterphilfv.