Next: References Up: Conclusions and Future Work Previous: Improvements to RGEN

Improvements to RCLASS

Here we give three ways of improving the classification program. The first is concerned with increasing the efficiency of the implementation. The second will reduce the time spent with RGEN, and the final one gives a brief explanation of how to introduce new classification methods in the system.
Optimizing the Code: In today's version of RCLASS, a rule file with the original attribute names and values is used. These names and values have to be mapped to integral values before classification can be performed. This mapping takes some time, and could be avoided by using pre-mapped values in the rule file. To do this, the way rules are loaded must be changed in the constructor of the class RuleList. In addition, a minor change in the way rules are written is required in RGEN.
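
The difference between the two loading strategies can be illustrated with a minimal sketch. The names ValueMapper and parsePreMapped are hypothetical and not taken from RCLASS, and the line format (whitespace-separated integer codes) is an assumption; the actual RuleList constructor is not shown in this text.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of RCLASS): assigns a dense integer code to
// each distinct name, in order of first appearance. This is the kind of
// lookup that must run for every name when rules are stored with names.
class ValueMapper {
public:
    // Returns the code for `name`, creating a new code if it is unseen.
    int encode(const std::string& name) {
        std::map<std::string, int>::const_iterator it = codes_.find(name);
        if (it != codes_.end()) return it->second;
        int code = static_cast<int>(names_.size());
        codes_[name] = code;
        names_.push_back(name);
        return code;
    }
    // Returns the original name for a code produced by encode().
    const std::string& decode(int code) const { return names_.at(code); }
    std::size_t size() const { return names_.size(); }

private:
    std::map<std::string, int> codes_;
    std::vector<std::string> names_;
};

// With pre-mapped values, a rule line (assumed format: whitespace-separated
// integer codes) can be parsed directly, with no name lookup at all.
std::vector<int> parsePreMapped(const std::string& line) {
    std::istringstream in(line);
    std::vector<int> codes;
    int code;
    while (in >> code) codes.push_back(code);
    return codes;
}
```

In this sketch, RGEN would call something like encode() once when writing the rule file, and RCLASS would only need the cheaper parsePreMapped() when loading it.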

Often, especially during experiments, it is desirable to perform several classifications using the same set of rules. In today's version, the rules have to be loaded each time. If the rules, once read, could instead be kept in memory between classifications, a lot of time could be saved.
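
One possible way to keep rules in memory is a small cache keyed by rule file name. The sketch below is only an illustration: Rule, RuleCache and the loader callback are hypothetical names, and it assumes that several classifications run within the same process, which would itself require restructuring RCLASS.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed rule; the real RuleList layout is not
// described in this text.
struct Rule {
    std::vector<int> conditions;
    int decision;
};

// Keeps loaded rule sets in memory, keyed by rule file name, so that
// repeated classifications reuse them instead of loading them again.
class RuleCache {
public:
    typedef std::function<std::vector<Rule>(const std::string&)> Loader;

    explicit RuleCache(Loader loader) : loader_(loader), loads_(0) {}

    // Returns the rules for `path`, calling the loader only on first use.
    const std::vector<Rule>& get(const std::string& path) {
        std::map<std::string, std::vector<Rule> >::iterator it = cache_.find(path);
        if (it == cache_.end()) {
            it = cache_.insert(std::make_pair(path, loader_(path))).first;
            ++loads_;
        }
        return it->second;
    }

    // Number of times the loader was actually invoked.
    int loadCount() const { return loads_; }

private:
    Loader loader_;
    std::map<std::string, std::vector<Rule> > cache_;
    int loads_;
};
```

The loader is passed in as a callback so that the cache does not depend on how rule files are actually parsed.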

Another issue is that RCLASS reads the whole data table to be classified into memory in one operation. This could pose memory problems when working with large tables. By introducing a function for reading one object at a time, this problem could easily be avoided.
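
A function for reading one object at a time might look like the sketch below. It assumes a table format of one object per line with whitespace-separated integer values; that format, and the names readNextObject and classifyStream, are assumptions made for illustration and not the RCLASS file format.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads the next object (one line of whitespace-separated integer values)
// from `in` into `object`. Lines that yield no integer values are skipped.
// Returns false when the input is exhausted.
bool readNextObject(std::istream& in, std::vector<int>& object) {
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        object.clear();
        int value;
        while (fields >> value) object.push_back(value);
        if (!object.empty()) return true;
    }
    return false;
}

// Processes a table one object at a time, so only a single row is held in
// memory. `classify` is a placeholder for the per-object classification.
// Returns the number of objects processed.
template <typename Classify>
std::size_t classifyStream(std::istream& in, Classify classify) {
    std::vector<int> object;
    std::size_t count = 0;
    while (readNextObject(in, object)) {
        classify(object);
        ++count;
    }
    return count;
}
```

Memory use then depends on the width of one object rather than on the number of objects in the table.
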
Allowing a Threshold Value: Instead of having to regenerate the rules with RGEN whenever the threshold value changes, the threshold could easily be applied by RCLASS. When reading the rules, all rules with accuracy less than a given threshold could be discarded. If the rules were generated without a threshold in RGEN, they would never have to be created again for a new threshold, as the user could simply set the threshold value to use in RCLASS.
Adding New Classification Methods: The current version of RCLASS can classify according to six different methods, as described earlier. In addition to these, new methods can be added. We believe that a method should be used which actually exploits the lattice information, rather than only voting by some measure that in effect treats the rules at all nodes as equal. This is motivated by the observation that the rules from, for instance, only the top 2 layers of the lattice sometimes give a much better classification than the whole set of rules.
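
The filtering step can be sketched as follows. The Rule struct and the function name filterByAccuracy are hypothetical, and the accuracy is assumed to be stored with each rule as a number between 0 and 1.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified rule record; only the fields needed here.
struct Rule {
    int decision;
    double accuracy;  // assumed to lie in [0, 1]
};

// Returns the rules whose accuracy is at least `threshold`; rules with
// accuracy below the threshold are discarded.
std::vector<Rule> filterByAccuracy(const std::vector<Rule>& rules, double threshold) {
    std::vector<Rule> kept;
    for (std::size_t i = 0; i < rules.size(); ++i) {
        if (rules[i].accuracy >= threshold) kept.push_back(rules[i]);
    }
    return kept;
}
```

A threshold of 0 keeps every rule, which corresponds to using the rule file exactly as it was generated.
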
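
As one illustration of using the lattice information, voting could be restricted to rules from the top layers. The sketch below is not one of the six existing methods. It assumes that each matching rule carries the layer of its lattice node (with 1 as the top layer) and a vote weight; the names FiredRule and voteTopLayers are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical record for a rule that matched the object being classified.
struct FiredRule {
    int decision;   // decision class the rule votes for
    int layer;      // assumed lattice layer of the rule's node, 1 = top
    double weight;  // vote weight, e.g. support or accuracy
};

// Sums the weights of rules in layers 1..maxLayer per decision class and
// returns the class with the largest total. Ties go to the smallest class
// value. Returns -1 if no rule lies within the allowed layers.
int voteTopLayers(const std::vector<FiredRule>& fired, int maxLayer) {
    std::map<int, double> votes;
    for (std::size_t i = 0; i < fired.size(); ++i) {
        if (fired[i].layer <= maxLayer) votes[fired[i].decision] += fired[i].weight;
    }
    int best = -1;
    double bestVotes = 0.0;
    bool found = false;
    for (std::map<int, double>::const_iterator it = votes.begin(); it != votes.end(); ++it) {
        if (!found || it->second > bestVotes) {
            best = it->first;
            bestVotes = it->second;
            found = true;
        }
    }
    return best;
}
```

Calling voteTopLayers with maxLayer set to 2 would correspond to the case mentioned above, where only the top 2 layers are consulted.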

In order to implement new classification methods, the files classifi.C, classifi.h and rclass.C must be altered.

In classifi.C, the new method can be added below the other methods in the classification block. The existing methods give guidelines for how to write a new one. Also, a call to the new method must be added in the switch statement of the function DoClassification.

In classifi.h, the class header must be updated with the new method. Finally, in rclass.C, a short description of the method should be added in the printout of the usage information seen when starting RCLASS without parameters.
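
The changes to classifi.h and classifi.C described above can be sketched in a simplified form. Only the name DoClassification comes from RCLASS; its real signature is not shown in this text, and the class Classifier, the enum values and the three methods below are hypothetical stand-ins chosen for illustration.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical, simplified stand-in for the class declared in classifi.h.
// It holds the decisions of the rules that matched one object.
class Classifier {
public:
    enum Method { FirstMatch = 0, MajorityVote = 1, LastMatch = 2 };

    explicit Classifier(const std::vector<int>& decisions) : decisions_(decisions) {}

    // Dispatches on the selected method; a new method needs one new case.
    int DoClassification(Method method) const {
        switch (method) {
            case FirstMatch:   return ClassifyFirstMatch();
            case MajorityVote: return ClassifyMajorityVote();
            case LastMatch:    return ClassifyLastMatch();  // newly added case
        }
        return -1;  // unknown method
    }

private:
    // Decision of the first matching rule, or -1 if none matched.
    int ClassifyFirstMatch() const {
        return decisions_.empty() ? -1 : decisions_.front();
    }

    // Most frequent decision; ties go to the smallest decision value.
    int ClassifyMajorityVote() const {
        std::map<int, int> counts;
        for (std::size_t i = 0; i < decisions_.size(); ++i) ++counts[decisions_[i]];
        int best = -1;
        int bestCount = 0;
        for (std::map<int, int>::const_iterator it = counts.begin(); it != counts.end(); ++it) {
            if (it->second > bestCount) {
                best = it->first;
                bestCount = it->second;
            }
        }
        return best;
    }

    // The "new" method, added below the existing ones and declared in the
    // class: decision of the last matching rule, or -1 if none matched.
    int ClassifyLastMatch() const {
        return decisions_.empty() ? -1 : decisions_.back();
    }

    std::vector<int> decisions_;
};
```

In this sketch, adding a method touches three places: the enum, the declaration in the class, and one case in the switch statement.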

To integrate the method with the window interface, the Tcl file RCLASS must be updated. How to go about this should be straightforward from looking at the file.



Helge Grenager Solheim
Sat May 4 03:30:02 MET DST 1996