
Rough Sets as a Framework for Data Mining
Øyvind Tuseth Aasheim
and
Helge Grenager Solheim
Project report
Knowledge Systems Group,
Faculty of Computer Systems and Telematics,
The Norwegian University of Science and Technology
Trondheim, May 4, 1996

Supervisors: Prof. Jan Komorowski and Torulf Mollestad



Abstract:

In this report we investigate data mining in uncertain data. The main emphasis is on using Rough Sets as a framework for this purpose. In particular, we consider the possibility of a system that automatically recognizes different kinds of objects based on their characteristics. This task, known as classification, is an important part of data mining. We use an algorithm that creates default rules by traversing a lattice of attribute subsets. In addition, we have written a classification program that uses these rules, and the two programs are integrated through a graphical user interface for simple visualization.

The starting point for our work was a program for generating default rules written by Jon Petter Hjulstad. This program uses an algorithm first described by our supervisor Torulf Mollestad. The algorithm works by removing attributes, creating progressively simpler rules with fewer and fewer attributes. These rules become less certain as attributes are removed, but the resulting default rules cover a wider range of cases. The default rules are therefore very useful for classifying new objects. The subsets of the attribute set form a lattice, of which only some parts are visited. At each node in the lattice, a set of rules is created from the attributes available at that point, and the accuracy of each rule is recorded.
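The idea of creating rules at each node of the attribute lattice can be illustrated with a minimal sketch. It is not the program by Hjulstad or the algorithm as described by Mollestad. The names `rules_for` and `traverse` are hypothetical, accuracy is assumed here to be the fraction of objects matching a rule's conditions that also have the rule's decision, and the sketch visits every attribute subset, whereas the algorithm in this report visits only some parts of the lattice.

```python
from collections import Counter, defaultdict
from itertools import combinations


def rules_for(table, attrs, decision):
    """Create the rules for one lattice node (one subset of attributes).

    Each rule is a tuple (conditions, decision value, accuracy), where
    accuracy is ASSUMED to be the share of objects matching the conditions
    that also carry the decision value.
    """
    groups = defaultdict(Counter)
    for row in table:
        key = tuple(row[a] for a in attrs)
        groups[key][row[decision]] += 1

    rules = []
    for key, counts in groups.items():
        total = sum(counts.values())
        for value, n in counts.items():
            rules.append((dict(zip(attrs, key)), value, n / total))
    return rules


def traverse(table, attrs, decision, min_size=1):
    """Visit attribute subsets from the full set down to size min_size.

    Simplification: every subset is visited (exhaustive traversal).
    """
    lattice = {}
    for size in range(len(attrs), min_size - 1, -1):
        for subset in combinations(attrs, size):
            lattice[subset] = rules_for(table, subset, decision)
    return lattice


# A tiny, made-up decision table with condition attributes a, b and decision d.
table = [
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 1, "b": 1, "d": "yes"},
    {"a": 1, "b": 1, "d": "no"},
    {"a": 0, "b": 0, "d": "no"},
]

for node, rules in traverse(table, ("a", "b"), "d").items():
    print(node, rules)
```

In this toy table, dropping attribute b leaves rules over a alone. They apply to more objects than the rules over both attributes, but the rule "a = 1 implies d = yes" is no longer certain, which is the trade-off described above.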

After some time, we realized that a large collection of default rules would not be useful on its own. We therefore created a classification system that uses the default rules to classify new objects. Several methods for choosing which rules to use are available. First, it is possible to specify which points in the lattice to take rules from. Second, it is possible to select some attributes that must be taken into consideration in the classification process, and some that may be. Even after the usable rules have been limited in this way, several default rules may still apply. For that reason, different methods for selecting a classification from the remaining rules are available: 1) use the first matching rule, 2) use the rule with the highest accuracy, 3) voting based on accuracy, and 4) voting based on a weight of evidence measure. The last two methods seem to work best.
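The first three selection methods can be sketched as follows. This is an illustration under stated assumptions, not the classification program itself. A rule is assumed to be a tuple of (conditions, decision, accuracy), the function names are hypothetical, and voting is assumed to mean summing the accuracies of the matching rules per decision. The weight of evidence measure is not defined in this abstract, so the fourth method is left out.

```python
from collections import defaultdict


def matching(rules, obj):
    """Return the rules whose conditions all hold for the object."""
    return [r for r in rules
            if all(obj.get(attr) == val for attr, val in r[0].items())]


def first_match(rules, obj):
    """Method 1: the decision of the first rule that matches, else None."""
    hits = matching(rules, obj)
    return hits[0][1] if hits else None


def highest_accuracy(rules, obj):
    """Method 2: the decision of the most accurate matching rule, else None."""
    hits = matching(rules, obj)
    return max(hits, key=lambda r: r[2])[1] if hits else None


def accuracy_vote(rules, obj):
    """Method 3: each matching rule votes for its decision, weighted by accuracy.

    ASSUMPTION: votes are summed accuracies; the report may combine them
    differently.
    """
    votes = defaultdict(float)
    for _, decision, accuracy in matching(rules, obj):
        votes[decision] += accuracy
    return max(votes, key=votes.get) if votes else None


# Made-up rules: (conditions, decision, accuracy).
rules = [
    ({"a": 1}, "no", 0.4),
    ({"b": 1}, "yes", 0.9),
    ({"a": 1, "b": 1}, "no", 0.6),
]
obj = {"a": 1, "b": 1}

print(first_match(rules, obj), highest_accuracy(rules, obj), accuracy_vote(rules, obj))
```

With these made-up rules the methods disagree: the single most accurate rule says "yes", while the combined vote of the two weaker rules says "no". This shows why the choice of selection method matters once several default rules apply to the same object.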

Our system has many modes of operation and can be run with many different parameter settings. To simplify its use, we have created a graphical user interface. The interface lets the user enter the parameters in a simple way and has help functions that give the necessary explanations. It also visualizes the selection of which parts of the lattice to take rules from, and gives easy access to all available functionality.





Helge Grenager Solheim
Sat May 4 03:30:02 MET DST 1996


Last modified: Sat May 4 11:32:09 MET DST 1996