The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
Threshold raising strategies - LIU-LB
Date: February 01, 2020 02:19PM
Hello, Prof. Philippe Fournier-Viger.
Have you read the paper "Mining top-k high utility itemsets with effective threshold raising strategies" by Krishnamoorthy (2019)?
In Section 4.2.2, why is the maximum number of breakpoints considered to be 3 (q = 3, contiguous)?
If an itemset has more breakpoints, what should we do?
Re: Threshold raising strategies - LIU-LB
Date: February 01, 2020 09:34PM
It is a heuristic, and you can certainly use higher values of q. There is also a downside to using higher values of q, as mentioned in Section 4.2.2 (below Definition 18) of the paper. Please also refer to the example given in Figure 3, where the utilities of subsets are estimated using fdaec and daec. One can observe that the utility estimate of fac is 15 (actual 57), while the estimate of fc is -30 (actual 22). As you remove more items, the estimate is likely to fall dramatically, so generating these subsets can lead to wasted computation. One could also set a dynamic threshold (not considered in the paper) and stop generating subsets once the estimate becomes zero or negative.
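To make the dynamic-stopping idea concrete, here is a minimal sketch (not the paper's exact LIU-LB procedure, which only removes contiguous items with at most q breakpoints). It estimates a subset's utility by subtracting the utility contribution of each removed item from the full itemset's utility, and stops generating further subsets once the estimate drops to zero or below. The itemset, utility values, and the greedy removal order are all hypothetical, chosen only for illustration.

```python
def estimate_subsets(full_itemset, full_utility, item_utility):
    """Yield (subset, estimated_utility) pairs obtained by removing items
    one at a time, with a dynamic stop when the estimate is non-positive."""
    items = list(full_itemset)
    removed = []
    estimate = full_utility
    estimates = []
    # Remove the item with the largest utility contribution first, so the
    # estimate falls fastest (an illustrative choice, not the paper's rule).
    for item in sorted(items, key=lambda i: item_utility[i], reverse=True):
        estimate -= item_utility[item]
        if estimate <= 0:
            break  # dynamic stop: this and smaller subsets are not useful
        removed.append(item)
        subset = tuple(i for i in items if i not in removed)
        estimates.append((subset, estimate))
    return estimates

# Hypothetical per-item utility contributions (not from the paper):
utilities = {'f': 10, 'd': 20, 'a': 5, 'e': 25, 'c': 8}
subs = estimate_subsets('fdaec', 68, utilities)
# subs holds estimates for fdac (43), fac (23), ac (13), a (5);
# removing the last item would make the estimate 0, so generation stops.
```

With such a rule, the wasted work on subsets whose estimates have collapsed (like fc in the Figure 3 example) is avoided, at the cost of possibly missing a subset whose actual utility is much higher than its estimate.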
Hope this clarifies.