One goal of every data mining system is to predict the correct decision for previously unseen objects. Doing this for all possible objects is clearly impossible. This may be justified by referring to the halting problem of Turing machines: given a computer program as input, no system can decide, for all possible programs, whether they halt or not.
Therefore, we want to be able to say how good a certain set of rules really is. This is done by testing the rules against data other than those used to create them. If all objects of the domain are tested, we get the exact ratio of correctly classified objects to the total number of objects. If this ratio is 1, all objects are classified correctly.
Normally, it is impossible to test all objects in the domain. What is
usually done is to divide the original data set into two separate parts,
a training set and a testing set, as described
in
. First, rules are generated from the
training set, and then those rules are tested on the testing set. The
percentage of objects classified correctly gives a measure of how good
the rules are, assuming the testing set is representative with
respect to the domain.
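
The split-and-test procedure can be illustrated with a minimal sketch. It is not the rule generator of any particular system: the names `split_holdout`, `generate_rules`, `classify` and `accuracy`, the 30% test fraction, and the toy "majority decision per condition tuple" rule generator are all assumptions made for illustration. Each object is assumed to be a pair of condition attribute values and a decision.

```python
import random
from collections import Counter, defaultdict


def split_holdout(objects, test_fraction=0.3, seed=0):
    """Shuffle the objects and split them into (training set, testing set)."""
    rng = random.Random(seed)
    shuffled = list(objects)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def generate_rules(training_set):
    """Toy rule generator (assumption): one rule per condition tuple,
    predicting the majority decision seen for that tuple."""
    votes = defaultdict(Counter)
    for conditions, decision in training_set:
        votes[conditions][decision] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}


def classify(rules, conditions):
    """Return the decision of the matching rule, or None if no rule applies."""
    return rules.get(conditions)


def accuracy(rules, testing_set):
    """Ratio of correctly classified testing objects to all testing objects."""
    if not testing_set:
        raise ValueError("testing set is empty")
    correct = sum(1 for c, d in testing_set if classify(rules, c) == d)
    return correct / len(testing_set)


# Hypothetical data: two boolean condition attributes, decision is their XOR.
objects = [((a, b), a != b) for a in (False, True) for b in (False, True)] * 10
training_set, testing_set = split_holdout(objects, test_fraction=0.25)
rules = generate_rules(training_set)
print(f"accuracy on testing set: {accuracy(rules, testing_set):.2f}")
```

In this sketch an object for which no rule applies is counted as incorrectly classified, which is one possible convention among several.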
Sometimes it is of interest to find out how wrong the wrong answers were. Then, one may apply some distance measure between the different classifications. For some applications it is also interesting to know if the system responded with ``no answer'' or similar, and for what values of the condition attributes this happened.
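
As a sketch of these two ideas, the following assumes that the decision classes are ordered, so that the distance between two classifications can be taken as the absolute difference of their ranks. The ordering `rank`, the function name `error_report`, and the use of `None` to represent "no answer" are assumptions made for illustration; other distance measures may suit a given application better.

```python
def error_report(classify, testing_set, rank):
    """Summarise how wrong the wrong answers are and where no answer was given.

    classify    -- callable mapping condition values to a decision or None
    testing_set -- iterable of (conditions, actual_decision) pairs
    rank        -- dict giving each decision class a position on an ordinal scale
    Returns (mean distance over wrong answers, conditions with no answer).
    """
    wrong_distances = []
    unanswered = []
    for conditions, actual in testing_set:
        predicted = classify(conditions)
        if predicted is None:
            # Record the condition attribute values that got no answer.
            unanswered.append(conditions)
        elif predicted != actual:
            wrong_distances.append(abs(rank[predicted] - rank[actual]))
    if wrong_distances:
        mean_distance = sum(wrong_distances) / len(wrong_distances)
    else:
        mean_distance = 0.0
    return mean_distance, unanswered


# Hypothetical rules and testing objects on a three-level ordinal scale.
rank = {"low": 0, "medium": 1, "high": 2}
rules = {("x",): "low", ("y",): "high", ("w",): "medium"}
testing_set = [
    (("x",), "high"),   # wrong by two levels
    (("y",), "high"),   # correct
    (("z",), "low"),    # no rule applies: no answer
    (("w",), "high"),   # wrong by one level
]
mean_distance, unanswered = error_report(rules.get, testing_set, rank)
```

Keeping the unanswered condition values separate from the wrong answers makes it possible to inspect which parts of the domain the rules do not cover.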