The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to the source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
association rules in the FHM algo
Posted by: Mohammad S Alodadi
Date: December 30, 2017 11:26PM

Hi All,
Is there any implementation in SPMF for generating rules from the output of the FHM algorithm, or of a similar algorithm for high utility datasets?
I would like to be able to generate rules from the high utility itemsets found by the FHM algorithm.
If there is nothing, and I want to modify the code to produce rules with the confidence and lift measures, is there code in SPMF that can help me understand how to approach this problem?


Re: association rules in the FHM algo
Date: January 01, 2018 03:44AM


Thanks for using SPMF.

The traditional algorithm for generating association rules from itemsets is implemented in SPMF. It is called AlgoAgrawalFaster94 in the source code, and it is based on Agrawal's paper on the Apriori algorithm. However, this code is designed to be applied to frequent itemsets rather than high utility itemsets.

One might think that it is easy to apply the same algorithm to high utility itemsets to generate high utility association rules. In fact, it could be, but there is still one challenge, which I will explain.

In frequent itemset mining, if an itemset {a,b,c} is frequent, then all its subsets are also frequent. This property is very useful for generating rules. If I remember well, AlgoAgrawalFaster94 takes pairs of itemsets such as X = {a,b,c} and Y = {b,c} to generate a rule {a} --> {b,c}. The key point is that if {a,b,c} is frequent, then we know that {a} and {b,c} are also frequent, so their supports are available. This makes it easy to calculate the confidence of the rule, conf({a} --> {b,c}) = sup({a,b,c}) / sup({a}), and its lift, conf({a} --> {b,c}) / (sup({b,c}) / |D|), where |D| is the number of transactions.
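A minimal Python sketch of this idea follows. The code and all names in it (such as `generate_rules`) are illustrative assumptions, not SPMF's Java code: it simply tries every split of each itemset into antecedent and consequent, whereas Agrawal's original procedure grows consequents level by level and prunes. It assumes the supports of all frequent itemsets are stored in a dictionary, so every subset lookup succeeds.

```python
from itertools import combinations


def generate_rules(supports, n_transactions, minconf):
    """Brute-force rule generation from a dict of itemset supports.

    supports: dict mapping frozenset itemsets to absolute support counts.
    Assumes the dict is downward closed (every subset of a stored itemset is
    also stored), which holds for the complete set of frequent itemsets.
    Returns a list of (antecedent, consequent, confidence, lift) tuples.
    """
    rules = []
    for itemset, sup_xy in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        # Every non-empty proper subset is tried as the antecedent X.
        for k in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), k):
                x = frozenset(combo)
                y = itemset - x
                conf = sup_xy / supports[x]  # sup(X u Y) / sup(X)
                if conf >= minconf:
                    # lift = conf / relative support of the consequent Y
                    lift = conf / (supports[y] / n_transactions)
                    rules.append((x, y, conf, lift))
    return rules


# Toy example: supports of all itemsets with support >= 2 in the four
# hypothetical transactions {a,b,c}, {a,b,c}, {b,c}, {a,d}.
SUPPORTS = {
    frozenset("a"): 3, frozenset("b"): 3, frozenset("c"): 3,
    frozenset("ab"): 2, frozenset("ac"): 2, frozenset("bc"): 3,
    frozenset("abc"): 2,
}
for x, y, conf, lift in generate_rules(SUPPORTS, n_transactions=4, minconf=0.6):
    print(sorted(x), "-->", sorted(y), f"conf={conf:.3f} lift={lift:.3f}")
```

On this toy data, {a} --> {b,c} gets confidence 2/3 and lift (2/3) / (3/4) = 8/9, slightly below 1.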

In high utility itemset mining, on the other hand, if an itemset {a,b,c} is a high utility itemset, its subsets may or may not be high utility itemsets, because utility is not anti-monotone. This causes a problem: if you try to generate a rule {a} --> {b,c} from the high utility itemset X = {a,b,c} and its subset Y = {b,c}, the itemsets {a} and {b,c} may not be high utility itemsets, so they may be absent from the output. In that case, you don't have the information required to calculate the lift and confidence.
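The brute-force Python sketch below illustrates the problem on a hypothetical four-transaction database. The data and the names `utility` and `high_utility_itemsets` are assumptions for illustration, and the exhaustive enumeration is not how FHM searches. The utility of an itemset is taken as the sum, over the transactions containing it, of the utilities of its items.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to its utility
# in that transaction (e.g. quantity x unit profit, already multiplied).
DB = [
    {"a": 5, "b": 3, "c": 3},
    {"a": 6, "b": 3, "c": 2},
    {"b": 1, "c": 1},
    {"d": 3},
]


def utility(itemset, db):
    """Sum, over transactions containing the itemset, of its items' utilities."""
    return sum(sum(t[i] for i in itemset) for t in db if itemset.issubset(t))


def high_utility_itemsets(db, minutil):
    """Exhaustive enumeration of all itemsets (exponential, illustration only)."""
    items = sorted(set().union(*db))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            u = utility(x, db)
            if u >= minutil:
                result[x] = u
    return result


huis = high_utility_itemsets(DB, minutil=20)
print(huis)
```

With minutil = 20, only {a,b,c} (utility 22) is output, while {a} has utility 11 and {b,c} has utility 13, so the rule {a} --> {b,c} cannot be scored from this output alone.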

So I think that this is the key problem. How can it be solved?

- One solution could be to use FHMFreq, a version of FHM that offers both a utility and a support constraint. You could set a high minimum support threshold and set minutil = 0. This will generate many patterns that may have a low utility, but at least you will get the support values required for generating the association rules. Then you could try to apply AlgoAgrawalFaster94 to those patterns (but it would require some coding).

- Alternatively, if the algorithm does not have the information about an itemset {a} that is needed to generate a rule, you could scan the database again to get it. This may be slow, but it could be a solution. If you do so, you would change the calculateSupport() method in AlgoAgrawalFaster94.

- Or...
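To see why the FHMFreq option in the first bullet helps, the sketch below uses a brute-force stand-in (not FHMFreq itself; the function names are hypothetical) that keeps itemsets satisfying both a support and a utility threshold, on a hypothetical toy database. With minutil = 0 the utility constraint filters nothing, so the output is the complete set of frequent itemsets, which is downward closed, and every support needed for rule generation is present.

```python
from itertools import combinations

# Hypothetical toy database: item -> utility in each transaction.
DB = [
    {"a": 5, "b": 3, "c": 3},
    {"a": 6, "b": 3, "c": 2},
    {"b": 1, "c": 1},
    {"d": 3},
]


def mine_with_support_and_utility(db, minsup, minutil=0):
    """Keep itemsets with support >= minsup AND utility >= minutil,
    recording (support, utility) for each. Exhaustive, illustration only."""
    items = sorted(set().union(*db))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            covering = [t for t in db if x.issubset(t)]
            sup = len(covering)
            util = sum(sum(t[i] for i in x) for t in covering)
            if sup >= minsup and util >= minutil:
                result[x] = (sup, util)
    return result


def is_downward_closed(itemsets):
    """True if every non-empty proper subset of each stored itemset is stored."""
    return all(
        frozenset(sub) in itemsets
        for x in itemsets
        for k in range(1, len(x))
        for sub in combinations(x, k)
    )


patterns = mine_with_support_and_utility(DB, minsup=2, minutil=0)
# This support dict could be handed to an Agrawal-style rule generator.
supports = {x: sup for x, (sup, _) in patterns.items()}
print(is_downward_closed(patterns), len(supports))
```

Raising minutil to 20 on the same data keeps only {a,b,c}, and the output is then no longer downward closed.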
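The rescanning option in the second bullet amounts to a support lookup with a fallback. Below is a minimal Python sketch, assuming transactions are stored as sets of items and that the miner's output already provides some supports. The class `SupportOracle` is hypothetical; it only mirrors the idea of changing calculateSupport() and is not SPMF's actual code.

```python
class SupportOracle:
    """Return itemset supports from stored values when available, otherwise
    scan the whole database once and cache the result."""

    def __init__(self, known_supports, db):
        self.cache = dict(known_supports)  # frozenset -> support count
        self.db = db                       # list of transactions (sets of items)
        self.scans = 0                     # number of fallback database scans

    def support(self, itemset):
        itemset = frozenset(itemset)
        if itemset not in self.cache:
            self.scans += 1
            # One full pass over the database: this is the slow part.
            self.cache[itemset] = sum(1 for t in self.db if itemset.issubset(t))
        return self.cache[itemset]

    def confidence(self, antecedent, consequent):
        x = frozenset(antecedent)
        return self.support(x | frozenset(consequent)) / self.support(x)

    def lift(self, antecedent, consequent):
        y = frozenset(consequent)
        return self.confidence(antecedent, y) / (self.support(y) / len(self.db))


# Hypothetical data: only the support of {a,b,c} came out of the miner.
db = [{"a", "b", "c"}, {"a", "b", "c"}, {"b", "c"}, {"d"}]
oracle = SupportOracle({frozenset("abc"): 2}, db)
print(oracle.confidence({"a"}, {"b", "c"}), oracle.lift({"a"}, {"b", "c"}), oracle.scans)
```

Caching each rescanned support means an itemset used by many rules costs at most one extra scan, but every distinct missing itemset still costs a full pass over the database, which is why this may be slow on large data.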

By the way, the implementation of FHM in SPMF always saves the result to a file. If you want to use these results to generate rules, you may also want to modify FHM so that it keeps the high utility itemsets in memory for rule generation, instead of saving them to a file.
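If modifying FHM is not an option, one workaround is to read the output file back into a dictionary before generating rules. The sketch below assumes each output line lists space-separated integer items followed by tags such as `#UTIL: 30` (and possibly `#SUP:` for variants that also report support); please check this against the output of your SPMF version. The names `parse_spmf_line` and `load_results` are hypothetical.

```python
def parse_spmf_line(line):
    """Parse one output line such as '1 2 3 #UTIL: 30' (assumed format).

    Returns (frozenset of int items, dict mapping tag name -> int value).
    """
    parts = line.strip().split("#")
    items = frozenset(int(tok) for tok in parts[0].split())
    measures = {}
    for part in parts[1:]:
        tag, _, value = part.partition(":")
        measures[tag.strip()] = int(value)  # int() ignores surrounding spaces
    return items, measures


def load_results(lines):
    """Build an in-memory dict itemset -> measures, skipping blank lines."""
    results = {}
    for line in lines:
        if line.strip():
            items, measures = parse_spmf_line(line)
            results[items] = measures
    return results


# Usage (the file name is hypothetical):
# with open("output.txt") as f:
#     results = load_results(f)
```

Keeping the results in memory inside FHM, as suggested above, avoids this extra write-then-read round trip.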

Best regards,


This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).