The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post call for papers, data mining job ads, link to source code of data mining algorithms or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
Mining infrequent association rules
Date: November 19, 2018 09:33AM
So, my motivation for using SPMF was to mine infrequent association rules.
Other platforms, Java libraries, and SPMF itself have already allowed me to mine association rules (with confidence as low as 12%).
After some data analysis, I found that some interesting association rules happen at a confidence of 1%.
None of AprioriInverse, AprioriRare, etc., have allowed me to go to a very low confidence. First, is that what they are meant for? Or are there other SPMF algorithms intended for mining infrequent association rules?
What would you suggest? Should I use SPMF library functions in my Java code and trick (override) them somehow?
Re: Mining infrequent association rules
Date: November 19, 2018 03:27PM
Do you really mean low confidence or low support?
The problem with finding low-support rules is that there may be a very large number of rules once you decrease the support threshold. But if you want to make it faster and reduce the number of rules, you can also apply some constraints. For example, if you use FPGrowth_Association_rules in SPMF, you can set a maximum antecedent length and a maximum consequent length. This puts an upper bound on the size of your rules and decreases the number of rules that may be found, so you may then be able to decrease the minimum support threshold further.
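To illustrate why size limits help, here is a minimal brute-force sketch in plain Python. It is not SPMF code and does not use FP-Growth; the function name `mine_rules`, its parameters, and the toy transactions are all made up for this example. It only shows how capping the antecedent and consequent lengths shrinks the set of rules that has to be enumerated.

```python
from itertools import combinations


def mine_rules(transactions, min_support, min_confidence,
               max_antecedent=1, max_consequent=1):
    """Brute-force rule enumeration with size limits (illustration only)."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted(set().union(*transactions))

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for a_len in range(1, max_antecedent + 1):
        for antecedent in combinations(items, a_len):
            a = frozenset(antecedent)
            sup_a = support(a)
            # A rule can never be more frequent than its antecedent.
            if sup_a == 0 or sup_a < min_support:
                continue
            rest = [i for i in items if i not in a]
            for c_len in range(1, max_consequent + 1):
                for consequent in combinations(rest, c_len):
                    sup_rule = support(a | frozenset(consequent))
                    if sup_rule < min_support:
                        continue
                    confidence = sup_rule / sup_a
                    if confidence >= min_confidence:
                        rules.append((antecedent, consequent,
                                      sup_rule, confidence))
    return rules


# Hypothetical toy database of five transactions.
toy_db = [{1, 2, 3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3, 4}]

short_rules = mine_rules(toy_db, 0.3, 0.0, max_antecedent=1, max_consequent=1)
longer_rules = mine_rules(toy_db, 0.3, 0.0, max_antecedent=2, max_consequent=1)
print(len(short_rules), len(longer_rules))
```

With the same thresholds, allowing longer antecedents can only add rules, so a tighter size limit leaves room to lower the minimum support before the output becomes unmanageable.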
Another possibility is to modify the code to add other constraints. For example, if you are only interested in some specific items in your rules, you could modify the code to only find rules with specific items. This would also reduce the number of possibilities and let you decrease the minimum support or confidence threshold.
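As a rough idea of what such an item constraint could look like, here is a small plain-Python sketch. It is not a modification of SPMF; the function `rules_with_items`, its parameters, and the sample data are assumptions made for illustration. It only considers itemsets that contain all the required items, which is what keeps the search small even at low thresholds.

```python
from itertools import combinations


def rules_with_items(transactions, required, min_support, min_confidence,
                     max_size=3):
    """Enumerate only rules whose items include every required item."""
    required = frozenset(required)
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Only these transactions can support an itemset containing `required`.
    relevant = [t for t in transactions if required <= t]
    other_items = sorted(set().union(*relevant) - required) if relevant else []

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    rules = []
    for extra_len in range(0, max_size - len(required) + 1):
        for extra in combinations(other_items, extra_len):
            itemset = required | frozenset(extra)
            if len(itemset) < 2:
                continue  # a rule needs at least two items
            sup_count = sum(1 for t in relevant if itemset <= t)
            if sup_count == 0 or sup_count / n < min_support:
                continue
            # Split the itemset into every antecedent / consequent pair.
            for a_len in range(1, len(itemset)):
                for antecedent in combinations(sorted(itemset), a_len):
                    a = frozenset(antecedent)
                    confidence = sup_count / count(a)
                    if confidence >= min_confidence:
                        rules.append((tuple(sorted(a)),
                                      tuple(sorted(itemset - a)),
                                      sup_count / n, confidence))
    return rules


# Hypothetical toy database; we only want rules involving items 1 and 15.
sample_db = [{1, 15, 7}, {1, 15}, {1, 7}, {15, 7}, {1, 15, 7, 9}, {2, 3}]
for rule in rules_with_items(sample_db, {1, 15}, 0.3, 0.7):
    print(rule)
```

Because every candidate must contain the required items, the number of itemsets examined depends on how many other items co-occur with them, not on the size of the whole item alphabet.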
But still, is a rule with 1% confidence any good? If you have a rule X --> Y with 1% confidence, it means that when X appears, Y is also there only 1% of the time. This is not a strong rule! So do you really need to find low-confidence rules?
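To make the difference between the two measures concrete, here is a tiny plain-Python sketch using the standard definitions (support = fraction of transactions containing X and Y together, confidence = that count divided by the number of transactions containing X). The helper name and the made-up database of 1000 transactions are only for illustration.

```python
def support_and_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent --> consequent."""
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)
    n_x = sum(1 for t in transactions if x <= t)
    n_xy = sum(1 for t in transactions if xy <= t)
    support = n_xy / len(transactions)
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


# Made-up database: X appears 200 times, and Y accompanies it only twice.
example_db = ([frozenset({"X", "Y"})] * 2
              + [frozenset({"X"})] * 198
              + [frozenset({"Z"})] * 800)

sup, conf = support_and_confidence(example_db, {"X"}, {"Y"})
print(f"support = {sup:.3f}, confidence = {conf:.2f}")
```

Here the rule is both rare (low support) and weak (low confidence). A rule can also be rare but strong, for example if X appeared only twice and Y was there both times, which is usually the more interesting case.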
You could also try the MEIT (Memory Efficient Itemset Tree). It lets you run targeted queries to find association rules. For example, you can ask for all rules containing the items 1 and 15. Maybe this is what you need!
Hope this helps.