The Data Mining Forum
This forum is about data mining, data science and big data: algorithms, source code, datasets, implementations, optimizations, etc. You are welcome to post calls for papers, data mining job ads, links to source code of data mining algorithms or anything else related to data mining. The forum is hosted by P. Fournier-Viger. No registration is required to use this forum!
How to generate probabilities using data mining?
Posted by: some_math_guy
Date: July 13, 2012 07:42AM
Re: How to generate probabilities using data mining?
Date: July 13, 2012 07:00PM

Hi,

Welcome to the forum!

First, decision trees can produce probabilities. Each leaf of a decision tree corresponds to a set of training instances that the tree has classified together. If the decision tree can separate the data exactly, every leaf will contain only instances belonging to the same class (e.g. "buy" or "not buy"). This is equivalent to a probability of 0 or 1. But there are also cases where a decision tree cannot perfectly separate the data given the attributes that you have. In that case, the probability will lie strictly between 0 and 1. For example, a leaf may contain 55 % "buy" and 45 % "not buy" instances. You can treat these proportions as estimated probabilities: a new instance that reaches this leaf is predicted to be "buy" with probability 0.55.
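To illustrate the leaf proportions described above, here is a minimal Python sketch. The function name leaf_probabilities and the example leaf are made up for illustration; a real decision tree library would do this counting for you inside each leaf.

```python
from collections import Counter

def leaf_probabilities(labels):
    """Turn the class labels of the training instances in one leaf
    into class proportions (estimated probabilities)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Hypothetical leaf holding 55 "buy" and 45 "not buy" training instances.
leaf = ["buy"] * 55 + ["not buy"] * 45
probs = leaf_probabilities(leaf)
print(probs)
```

A leaf that contains only one class gives that class a probability of 1, which is the "perfectly separated" case.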

Second, you could consider using the naive Bayes classifier. This classifier is built on Bayes' theorem from the field of statistics, so its output is a probability. But you have to be careful about its underlying assumption that the attributes are independent of each other given the class. You can check Wikipedia for more information about this classifier: http://en.wikipedia.org/wiki/Naive_Bayes_classifier
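As a rough sketch of how a naive Bayes classifier turns counts into a probability, here is a small Python example for categorical attributes. Everything in it is an assumption made for illustration: the function names, the toy data, and the use of add-one (Laplace) smoothing through the alpha parameter so that an unseen attribute value does not force a probability of exactly 0.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, class)] -> Counter of attribute values
    value_counts = defaultdict(Counter)
    # distinct values seen for each attribute, used by the smoothing
    domains = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains

def predict_proba(model, row, alpha=1.0):
    """Return P(class | row) for every class, assuming the attributes
    are independent given the class (the naive Bayes assumption)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # smoothed estimate of P(attribute i = value | class)
            numerator = value_counts[(i, label)][value] + alpha
            denominator = count + alpha * len(domains[i])
            score *= numerator / denominator
        scores[label] = score
    norm = sum(scores.values())  # normalise so the scores sum to 1
    return {label: s / norm for label, s in scores.items()}

# Hypothetical training data: (age group, income) -> purchase decision.
rows = [("young", "high"), ("young", "low"), ("old", "high"), ("old", "low")]
labels = ["buy", "buy", "not buy", "not buy"]
model = train_naive_bayes(rows, labels)
print(predict_proba(model, ("young", "high")))
```

The returned values always sum to 1, but they are only as trustworthy as the independence assumption is for your data.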

Those are the two techniques that I can think of right now. There might be other techniques too.

Best,

Philippe

Edited 1 time(s). Last edit at 07/13/2012 07:01PM by webmasterphilfv.