
Bayesian Approach

Based on formal probability theory, the Bayesian approach[RN95, LS93] to uncertainty has proven useful in several fields of AI, including classification. By assuming a random distribution of events (or independence between attributes), it is possible to calculate more complex probabilities from previously known results. The following theorem, called Bayes' theorem, is used:

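P(H|E) = P(E|H) P(H) / P(E)

where P(H|E) is the probability that the hypothesis H is true given the evidence E, P(E|H) is the probability of observing E when H is true, and P(H) and P(E) are the probabilities of H and E on their own.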

Consider now the following example: We want to find the probability that an animal is a dove, given the information that it flies. This is denoted P(Dove|Fly). To calculate this, information from a database and calculated probabilities are used. In the database, the probability of a flying animal, P(Fly), is 0.1, the probability of a dove, P(Dove), is 0.01, and the probability that a dove flies, P(Fly|Dove), is 0.95. From this data, the chance of an observed flying animal being a dove is P(Dove|Fly) = P(Fly|Dove) P(Dove) / P(Fly) = 0.95 * 0.01 / 0.1 = 0.095.
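
The dove calculation can also be checked with a few lines of Python. This is only a minimal sketch: the function name `posterior` and its parameter names are chosen here for illustration and do not come from the original text; the three input probabilities are the ones given in the example.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E).

    likelihood -- P(E|H), e.g. P(Fly|Dove)
    prior      -- P(H),   e.g. P(Dove)
    evidence   -- P(E),   e.g. P(Fly)
    (Hypothetical helper, named for this illustration only.)
    """
    if evidence <= 0:
        raise ValueError("P(E) must be positive")
    return likelihood * prior / evidence


# Values taken from the example: P(Fly|Dove), P(Dove), P(Fly).
p_dove_given_fly = posterior(likelihood=0.95, prior=0.01, evidence=0.1)
print(round(p_dove_given_fly, 4))
```

Knowing that the animal flies raises the probability of it being a dove from the prior of 0.01 to roughly 0.095, almost a tenfold increase.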

This approach uses a mathematically well-founded and statistically correct handling of uncertainty, which is a great advantage. Unfortunately, its usefulness depends on the assumption that the attributes are independent. In real-life applications, as in the example above, they seldom are; instead, links of dependencies exist between the attributes. In our example the probability of something being a dove depends on the probability of something being a bird and vice versa. The results from the Bayesian approach may thus not always be satisfactory. Still, it is a widely used approach, and it is simple and accurate when applicable. The Bayesian approach may be used in a number of different ways, some of which can handle changes in the database without recalculating everything, and others of which have low time and memory usage.
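
One way to handle changes in the database without recalculating everything is to store counts rather than probabilities, so that each new record only increments a few counters. The sketch below illustrates that idea under stated assumptions: the class `IncrementalBayes`, its method names, and the toy records are invented for this illustration and are not taken from the original text or from any particular system.

```python
from collections import Counter


class IncrementalBayes:
    """Hypothetical count-based estimator of P(H|E) for illustration."""

    def __init__(self):
        self.total = 0
        self.hypothesis_counts = Counter()  # e.g. number of doves
        self.evidence_counts = Counter()    # e.g. number of flying animals
        self.joint_counts = Counter()       # e.g. number of flying doves

    def add(self, hypothesis, evidence):
        """Register one record in constant time, without rescanning old data."""
        self.total += 1
        self.hypothesis_counts[hypothesis] += 1
        self.evidence_counts[evidence] += 1
        self.joint_counts[(hypothesis, evidence)] += 1

    def posterior(self, hypothesis, evidence):
        """Estimate P(H|E) = P(E|H) * P(H) / P(E) from the current counts."""
        n_h = self.hypothesis_counts[hypothesis]
        n_e = self.evidence_counts[evidence]
        if n_h == 0 or n_e == 0:
            raise ValueError("no records for this hypothesis or evidence")
        likelihood = self.joint_counts[(hypothesis, evidence)] / n_h  # P(E|H)
        prior = n_h / self.total                                      # P(H)
        p_evidence = n_e / self.total                                 # P(E)
        return likelihood * prior / p_evidence


# Toy records (invented): (animal, does it fly?)
model = IncrementalBayes()
for animal, flies in [("dove", True), ("bat", True), ("dog", False), ("dove", False)]:
    model.add(animal, flies)

before = model.posterior("dove", True)

# A new record arrives; only three counters and the total change.
model.add("dove", True)
after = model.posterior("dove", True)
print(before, after)
```

The memory used grows with the number of distinct values rather than with the number of records, which is the kind of low time and memory usage the text refers to.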



Helge Grenager Solheim
Sat May 4 03:30:02 MET DST 1996