The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms, or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
CPT+ Scoring
Posted by: lv1984
Date: March 15, 2018 03:20AM

I'm testing CPT+, but I can't understand how to interpret the scoring.
Is it already normalized?
What are the min and max values for the score?
Can I already interpret it as a probability, or must it be normalized first?

Re: CPT+ Scoring
Posted by: Luis
Date: March 20, 2018 07:04AM

Hey lv1984,

No, the scores are not normalized by default. If you want to normalize them yourself, you could do it in the CountTable::getBestSequence() method:

// Filling a sequence with the best |count| items
Sequence seq = new Sequence(-1);
sd.normalize(); // New method: implement it in the ScoreDistribution class
List<Integer> bestItems = sd.getBest(1.002);

However, the scores do not represent real proportions, because the individual subscores are multiplied in the CountTable::push() method.
You would have to rewrite the scoring system if you are interested in real proportional probabilities.
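
As a rough illustration of what such a normalize() step could look like, here is a minimal sketch. The internals of ScoreDistribution are not shown in this thread, so the sketch works on a plain Map from item to score instead; the class name ScoreNormalizer and its method are hypothetical and not part of CPT+. It only rescales the scores so that they sum to 1, which gives relative shares rather than real probabilities.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of CPT+: rescales raw scores so they sum to 1.
public class ScoreNormalizer {

    /** Returns a copy of the scores, each divided by the total of all scores. */
    public static Map<Integer, Double> normalize(Map<Integer, Double> scores) {
        double total = 0.0;
        for (double s : scores.values()) {
            total += s;
        }
        Map<Integer, Double> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, Double> e : scores.entrySet()) {
            // If there is no score mass at all, report 0 instead of dividing by zero.
            result.put(e.getKey(), total > 0.0 ? e.getValue() / total : 0.0);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up raw scores for two candidate items (7 and 9).
        Map<Integer, Double> raw = new LinkedHashMap<>();
        raw.put(7, 3.0);
        raw.put(9, 1.0);
        System.out.println(normalize(raw)); // prints {7=0.75, 9=0.25}
    }
}
```

The ranking of the items is unchanged by this rescaling, so it should not affect which item is predicted, only how the numbers read.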

Disclaimer: I am just a student who worked with this algorithm for half a year, so I cannot guarantee correctness ;-)

Best regards,
Luis

Re: CPT+ Scoring
Date: March 24, 2018 07:03AM

Thanks for answering, Luis :-)

Yes, the scores are not normalized in CPT+. The score for a prediction is the sum of its scores over all the sequences that are used to make that prediction. Thus, the sum can be greater than 1. Besides, it cannot be negative.
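
To make the "sum over sequences" point concrete, here is a small sketch of an additive score table. It is not the CPT+ code: the class AdditiveScoreSketch, the method accumulate, and the per-sequence weights are all made up for illustration, and the real per-sequence contribution in CPT+ is computed differently. It only shows why a sum of non-negative contributions stays non-negative but is not bounded by 1.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the CPT+ implementation.
public class AdditiveScoreSketch {

    /** Sums, per candidate item, the contributions coming from each sequence. */
    public static Map<Integer, Double> accumulate(List<Map<Integer, Double>> perSequence) {
        Map<Integer, Double> totals = new HashMap<>();
        for (Map<Integer, Double> contributions : perSequence) {
            for (Map.Entry<Integer, Double> e : contributions.entrySet()) {
                // Add this sequence's contribution to the item's running total.
                totals.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Three sequences each contribute a made-up, non-negative weight to item 5.
        List<Map<Integer, Double>> votes = List.of(
                Map.of(5, 0.6),
                Map.of(5, 0.5, 8, 0.2),
                Map.of(5, 0.4));
        System.out.println(accumulate(votes).get(5) > 1.0); // prints true
    }
}
```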

Yes, the scoring system could be replaced by something else. When designing CPT/CPT+, my student Ted actually tried different scoring systems, and the one provided in CPT+ is the one that we found to work best on our datasets. But other scoring systems may be better or have other advantages. We found that it was simpler to have scores that are not normalized.

Best regards,

Philippe



Edited 1 time(s). Last edit at 03/24/2018 07:04AM by webmasterphilfv.



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).