
Syntactic and Semantic Simplicity of Result

Having the learned knowledge in a form that the user can read and understand is a big advantage. The user will then be able to find errors at an early stage and to provide additional, more clarifying test cases.

If the knowledge is expressed as propositional rules, we would like as few conjunctions and disjunctions as possible. For a decision tree, we would like the tree to be balanced, since a balanced tree is easier to read. According to Ockham's razor, the simplest explanation of a situation is preferable, and is the one most likely to capture the nature of the phenomenon under study. Thus, we would like our rules to be as simple as possible, so that they cover as many unseen objects as possible.
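
As an illustration only, the two syntactic criteria above can be turned into simple counts. The sketch below assumes that a rule is given in disjunctive normal form (a list of conjunctive terms) and that a decision tree is a plain nested structure. The names `rule_complexity`, `Node`, `leaf_depths` and `depth_spread` are hypothetical and chosen for this example; they do not come from any particular data mining system, and other measures of simplicity are equally possible.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: a rule is in disjunctive normal form, written as a list of
# conjunctive terms, where each term is a list of literals (plain strings).
Rule = List[List[str]]


def rule_complexity(rule: Rule) -> int:
    """Count the connectives in a rule: ANDs inside terms plus ORs between terms."""
    conjunctions = sum(len(term) - 1 for term in rule if term)
    disjunctions = max(len(rule) - 1, 0)
    return conjunctions + disjunctions


@dataclass
class Node:
    """Hypothetical decision tree node; a node without children is a leaf."""
    label: str
    children: List["Node"] = field(default_factory=list)


def leaf_depths(node: Node, depth: int = 0) -> List[int]:
    """Return the depth of every leaf below (and including) the given node."""
    if not node.children:
        return [depth]
    depths: List[int] = []
    for child in node.children:
        depths.extend(leaf_depths(child, depth + 1))
    return depths


def depth_spread(tree: Node) -> int:
    """Deepest leaf minus shallowest leaf; 0 means all leaves are equally deep."""
    depths = leaf_depths(tree)
    return max(depths) - min(depths)


# (outlook=sunny AND humidity=high) OR (outlook=rain AND wind=strong)
rule = [["outlook=sunny", "humidity=high"], ["outlook=rain", "wind=strong"]]
print(rule_complexity(rule))  # 2 conjunctions + 1 disjunction = 3

# A lopsided tree with leaves at depths 1, 2, 3 and 3.
skewed = Node("a", [
    Node("yes"),
    Node("b", [Node("no"), Node("c", [Node("yes"), Node("no")])]),
])
print(depth_spread(skewed))  # 3 - 1 = 2
```

A lower connective count or a smaller depth spread indicates a syntactically simpler result under these assumptions. Neither number says anything about semantic simplicity, that is, whether the attributes used are meaningful to the user.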

For a dataset of any considerable size, reading through all the rules will be an impossible task for a user, even if each rule is simple to understand. To make this feasible, one could apply a form of data mining to the rules themselves, or use different visualization techniques. With large data sets and large rule sets, properties other than simplicity will probably be more important.



Helge Grenager Solheim
Sat May 4 03:30:02 MET DST 1996