The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
Delta Value in TNS
Posted by: Selwyn
Date: June 25, 2019 05:24PM

In the documentation for the TNS algorithm "Mining the Top-K Non-Redundant Sequential Rules", there is a parameter named Delta, which is an integer greater than 0 that increases the precision of the result. I have no idea what to put for this value or how it scales toward precision. Can anyone offer more context?

Re: Delta Value in TNS
Date: June 25, 2019 05:48PM

Hi, Thanks for using SPMF.
The TNS algorithm does not guarantee finding an exact result (it may miss some patterns). The delta parameter increases the number of patterns that TNS keeps in memory during the search by delta, and thus increases the probability that the result is exact.

If delta is set to a larger value, the algorithm is less likely to miss patterns, but the execution time and memory usage may increase.

If delta is set to a small value, the algorithm is more likely to miss patterns.

In the experiments of the TNS paper, I used different values such as delta = 300, delta = 1000 and delta = 6000. For some datasets, delta = 300 was enough to obtain an exact result, while for others it was not. How to set the parameter really depends on the dataset.

Intuitively, delta is the number of additional rules to keep in memory. If k = 100 and delta = 50, the algorithm keeps the best 150 rules while searching. Thus, how to set delta depends on the size of the search space, and therefore on the dataset.
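To make this concrete, here is a minimal, illustrative sketch in Java (not SPMF's actual code; the class and method names are hypothetical) of a top-k buffer that keeps k + delta candidates during the search and returns only the best k at the end:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch, not SPMF's implementation: a buffer that keeps
// the best k + delta candidate scores seen so far, as TNS conceptually does.
class TopKBuffer {
    private final int k;
    private final int delta;
    // Min-heap ordered by score: the weakest kept candidate sits on top.
    private final PriorityQueue<Double> heap = new PriorityQueue<>();

    TopKBuffer(int k, int delta) {
        this.k = k;
        this.delta = delta;
    }

    // Offer a candidate rule's score; evict the weakest if over capacity.
    void offer(double score) {
        heap.offer(score);
        if (heap.size() > k + delta) {
            heap.poll(); // drop the lowest-scoring candidate
        }
    }

    // At the end of the search, return only the best k scores, descending.
    List<Double> topK() {
        List<Double> all = new ArrayList<>(heap);
        all.sort(Collections.reverseOrder());
        return all.subList(0, Math.min(k, all.size()));
    }
}
```

The point of the extra delta slots is that a candidate that looks weak early in the search may later support rules that belong in the true top k; a larger buffer makes it less likely that such candidates are evicted too soon.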

Ideally, we could set delta to a very large value so that we are unlikely to miss any patterns, but that increases the runtime and memory usage. A reasonable approach is to set it somewhere in the 100 to 1000 range, depending on your data: if the algorithm runs fast, you may increase delta; if it is too slow, you may decrease it.

Hope this helps.


This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).