The benefit is that the set of frequent maximal itemsets is usually much smaller than the set of all frequent itemsets.
For example, if there are 10 000 frequent itemsets in a database for some minsup value, there might be only 500 maximal itemsets. So you would save space to just discover the maximal itemsets.
However, there is a problem with using maximal itemsets. It is that you loose some information. It can be proven that if you have the maximal itemsets
, you can regenerate all the frequent itemsets from them but the support information is lost.
An alternative to maximal itemsets that is lossless (without lost of information) is to discover closed itemsets
. The set of closed itemsets is larger than the set of maximal itemsets (all maximal itemsets are closed itemsets but the reverse is not always true). If you mine closed itemsets, you can regenerate the set of all frequent itemsets with their support.
Now about your second question, algorithms for mining maximal itemsets or closed itemsets usually do not work exactly like frequent itemsets mining algorithm. For example, you could see MaxMiner
, ( http://www.cs.tau.ac.il/~fiat/dmsem03/Efficiently%20mining%20long%20patterns%20from%20databases%20-%201998.pdf ) and other more recent algorithms for mining maximal itemset mining. I have not implemented them in SPMF
For closed itemset mining, there are several algorithms. I have implemented DCI_Closed, Charm and AprioriClose
. Some of these algorithms can be much faster than traditional frequent itemset mining algorithm because they use some strategies to directly find the closed itemsets without generating all the frequent itemsets.
Here is a picture showing the relationship between freqeutn itemset, closed itemsets and maximal itemsets:
(from http://www.hypertextbookshop.com/dataminingbook/public_version/contents/chapters/chapter002/section004/blue/page002.html )
Hope this helps,
Edited 5 time(s). Last edit at 04/01/2013 06:57AM by webmasterphilfv.