The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
Precise definition of support in CMDeo
Date: April 03, 2018 07:08PM
I'm getting results I just can't understand from CMDeo.
I have singleton item lists in my sequences. For example, a typical sequence looks like:
3 -1 30 -1 39 -1 41 -1 41 -1 -2
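For reference, here is a minimal Python sketch of how such a line could be split into itemsets. It assumes the usual convention for this format, where -1 ends an itemset and -2 ends the sequence. The name parse_sequence is a hypothetical helper for illustration only, not part of any library.

```python
def parse_sequence(line):
    """Split one sequence line into a list of itemsets (lists of ints).

    Assumption: -1 closes the current itemset and -2 closes the sequence.
    """
    itemsets, current = [], []
    for token in line.split():
        value = int(token)
        if value == -1:
            # End of the current itemset
            itemsets.append(current)
            current = []
        elif value == -2:
            # End of the sequence
            break
        else:
            current.append(value)
    return itemsets


print(parse_sequence("3 -1 30 -1 39 -1 41 -1 41 -1 -2"))
# → [[3], [30], [39], [41], [41]]
```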
I'm getting rules with support values far higher than I think possible. For example, one that was generated is:
A ==> B
#SUP: 340 #CONF: 0.517503805175038
In my sequence database, A occurs in 1257 sequences and B occurs in only 5 sequences. If support is counting the number of sequences in which A occurs before B, how can this exceed 5? I assume I'm misunderstanding what's being counted here.
Re: Precise definition of support in CMDeo
Date: April 07, 2018 09:50PM
If the support is 340, it means that the pattern A followed by B appears in 340 sequences.
In that case, it would be impossible for B to appear in only 5 sequences, since every sequence that contains the rule must also contain B.
So it seems that something is wrong here.
1) Either there is a bug in the algorithm. In that case, you can try RuleGrowth instead of CMDeo to see if you obtain the same result. If the result is the same, it is probably not a bug. If the result is different, then it is a bug in CMDeo, so please let me know so that I can fix it.
2) Or there is a problem with the input file format of the database.
In any case, you can send me your file at philfv8 AT yahoo.com and I can run it to check the results, find the problem, and fix it if necessary. If you send me the file, please also tell me the parameters that you used for running the algorithm, and the rule that you think is incorrect.
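In the meantime, the counts can be checked independently of both algorithms. Below is a minimal Python sketch, not the CMDeo implementation. It assumes the input uses -1 to end an itemset and -2 to end a sequence, that lines starting with #, % or @ are comments or metadata, and that a sequence supports the rule A ==> B when some occurrence of A is in an itemset strictly before some occurrence of B. The names parse_sequence and rule_stats, and the small database in the example, are hypothetical and only for illustration.

```python
def parse_sequence(line):
    """Split one sequence line into itemsets (-1 ends an itemset, -2 ends the sequence)."""
    itemsets, current = [], []
    for token in line.split():
        value = int(token)
        if value == -1:
            itemsets.append(current)
            current = []
        elif value == -2:
            break
        else:
            current.append(value)
    return itemsets


def rule_stats(lines, a, b):
    """Return (sup_a, sup_b, sup_rule, conf) for the single-item rule a ==> b.

    Supports are counted in number of sequences, not number of occurrences.
    """
    sup_a = sup_b = sup_rule = 0
    for raw in lines:
        line = raw.strip()
        # Assumption: blank lines and lines starting with #, % or @ carry no sequence
        if not line or line[0] in "#%@":
            continue
        itemsets = parse_sequence(line)
        pos_a = [i for i, items in enumerate(itemsets) if a in items]
        pos_b = [i for i, items in enumerate(itemsets) if b in items]
        if pos_a:
            sup_a += 1
        if pos_b:
            sup_b += 1
        # The rule holds if the first a comes strictly before the last b
        if pos_a and pos_b and pos_a[0] < pos_b[-1]:
            sup_rule += 1
    conf = sup_rule / sup_a if sup_a else 0.0
    return sup_a, sup_b, sup_rule, conf


# Hypothetical toy database, with items 3 and 30 playing the roles of A and B
db = [
    "3 -1 30 -1 39 -1 41 -1 41 -1 -2",
    "30 -1 3 -1 -2",
    "3 -1 41 -1 -2",
    "41 -1 3 -1 30 -1 -2",
]
print(rule_stats(db, 3, 30))
# → (4, 3, 2, 0.5)
```

With this definition the rule support can never exceed the support of A or of B, so a result where sup_rule is larger than sup_b would point to a different item being counted or to a file format problem. Also, if the confidence is computed as sup(A ==> B) / sup(A), the reported values imply sup(A) = 340 / 0.5175... = 657 rather than 1257, which may be another hint that the counts being compared do not refer to the same items or the same file.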