Here we give three ways of improving the classification program. The
first is concerned with increasing the efficiency of the
implementation. The second reduces the time spent with RGEN, and the
final one briefly explains how to introduce new classification methods
into the system.
Optimizing the Code:
In today's version of RCLASS, a rule file with the original attribute
names and values is used. These names and values have to be mapped to
integer values before classification can be performed. This mapping
takes some time, and could be avoided by using pre-mapped values in the
rule file. To do this, the way rules are loaded must be changed in the
constructor of the class RuleList. In addition, a minor change in the
way rules are written is required in RGEN.
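The mapping in question can be illustrated with a small dictionary that assigns an integer code to each distinct name or value. This is only a sketch: the class ValueDictionary and its members are hypothetical and do not appear in RCLASS or RGEN. With pre-mapped rule files, RGEN would perform this encoding once when writing the rules, and RCLASS would read the integer codes directly.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not part of RCLASS/RGEN): assigns each distinct
// attribute name or value a small integer code, in order of first use.
class ValueDictionary {
public:
    // Returns the existing code for 'name', or assigns the next free one.
    int encode(const std::string& name) {
        auto it = codes_.find(name);
        if (it != codes_.end()) return it->second;
        int code = static_cast<int>(names_.size());
        codes_.emplace(name, code);
        names_.push_back(name);
        return code;
    }

    // Maps a code back to the original string, e.g. for printing results.
    // Throws std::out_of_range for an unknown code.
    const std::string& decode(int code) const { return names_.at(code); }

private:
    std::map<std::string, int> codes_;   // string -> code
    std::vector<std::string> names_;     // code -> string
};
```

The string lookups in encode are the work that would be moved out of the classification run.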
Often, especially during experiments, it is desirable to perform several classifications using the same set of rules. In today's version, the rules have to be loaded each time. If the rules, once read, could instead be kept in memory between classifications, a lot of time could be saved.
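One possible way to keep rules in memory is a cache keyed by rule file name. The sketch below is an assumption about how this could look, not a description of RCLASS: the types Rule, RuleSet and RuleCache are hypothetical, and the loader function stands in for whatever code actually parses a rule file.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical rule representation, for illustration only.
struct Rule { std::vector<int> conditions; int decision; double accuracy; };
using RuleSet = std::vector<Rule>;

// Loads each rule file at most once and hands out the stored copy afterwards.
class RuleCache {
public:
    explicit RuleCache(std::function<RuleSet(const std::string&)> loader)
        : loader_(std::move(loader)) {}

    std::shared_ptr<const RuleSet> get(const std::string& file) {
        auto it = cache_.find(file);
        if (it != cache_.end()) return it->second;      // already in memory
        std::shared_ptr<const RuleSet> rules =
            std::make_shared<RuleSet>(loader_(file));   // load once
        cache_[file] = rules;
        return rules;
    }

private:
    std::function<RuleSet(const std::string&)> loader_;
    std::map<std::string, std::shared_ptr<const RuleSet>> cache_;
};
```

This only helps if the classifications run within one process, for example when they are started from the same session.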
Another issue is that RCLASS reads the whole data table to be
classified into memory in one operation. This could pose memory
problems when working with large tables. By introducing a function for
reading one object at a time, this problem could easily be avoided.
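A function for reading one object at a time could look roughly as follows. The sketch assumes a table format with one object per line and whitespace-separated integer values. That format and the function name readNextObject are assumptions made for illustration, not the actual RCLASS table format.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads the next object from 'in' into 'object' (assumed format: one object
// per line, whitespace-separated integer values). Blank lines are skipped.
// Returns false when no further object is available.
bool readNextObject(std::istream& in, std::vector<int>& object) {
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        object.clear();
        int value;
        while (fields >> value) object.push_back(value);
        if (!object.empty()) return true;
    }
    return false;
}
```

A caller would then loop with `while (readNextObject(in, object))` and classify each object before reading the next, so that only one object is held in memory at a time.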
Allowing a Threshold Value:
Instead of regenerating the rules with RGEN every time the threshold
value changes, the threshold could easily be applied by RCLASS. When
reading the rules, all rules with accuracy less than a given threshold
could be discarded. If the rules were generated without a threshold in
RGEN, they would never have to be created again for a new threshold, as
the user could simply set the threshold value to use in RCLASS.
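The filtering step itself is small. As a sketch, assuming each loaded rule carries its accuracy (the struct ScoredRule and the function discardBelowThreshold are hypothetical names):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical rule record; only the fields needed for filtering are shown.
struct ScoredRule { int decision; double accuracy; };

// Removes all rules whose accuracy is strictly less than 'threshold'.
// Rules with accuracy equal to the threshold are kept.
void discardBelowThreshold(std::vector<ScoredRule>& rules, double threshold) {
    rules.erase(
        std::remove_if(rules.begin(), rules.end(),
                       [threshold](const ScoredRule& r) {
                           return r.accuracy < threshold;
                       }),
        rules.end());
}
```

In practice the test could equally be made while each rule is read, so that discarded rules are never stored at all.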
Adding New Classification Methods:
The current version of RCLASS can classify according to six different
methods, as described in
.
In addition to these, new methods can be added. We believe that a
method which actually uses the lattice information, rather than only
voting by some measure that in effect treats rules at all nodes
equally, should be used. This is motivated by the observation that
sometimes the rules at, for instance, only the top 2 layers of the
lattice give a much better classification than the whole set of rules.
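As one illustration of a method that uses the lattice information, voting could be restricted to rules from the topmost layers. The sketch below is only one possible reading of this idea. It assumes that each rule records the lattice layer it comes from, with layer 0 as the top, and that every fired rule votes with some weight. The names LatticeRule and voteTopLayers are hypothetical.

```cpp
#include <map>
#include <vector>

// Hypothetical record of a rule that matched the object being classified.
struct LatticeRule {
    int layer;      // assumed numbering: 0 is the top layer of the lattice
    int decision;   // decision class the rule votes for
    double weight;  // voting weight, e.g. accuracy or support
};

// Sums the weights per decision over fired rules in the top 'maxLayers'
// layers and returns the decision with the largest sum, or -1 if no rule
// from those layers fired.
int voteTopLayers(const std::vector<LatticeRule>& fired, int maxLayers) {
    std::map<int, double> votes;
    for (const LatticeRule& r : fired)
        if (r.layer < maxLayers) votes[r.decision] += r.weight;

    int best = -1;
    double bestWeight = 0.0;
    for (const auto& v : votes)
        if (best == -1 || v.second > bestWeight) {
            best = v.first;
            bestWeight = v.second;
        }
    return best;
}
```

With maxLayers set to the depth of the lattice, this reduces to ordinary weighted voting over all rules.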
In order to implement new classification methods, the files classifi.C, classifi.h and rclass.C must be altered.
In classifi.C, the new method can be added below the other methods in the classification block. The existing methods serve as guidelines for how to write a new one. Also, a call to the new method must be added in the switch statement of the function DoClassification.
In classifi.h, the class header must be updated with the new method. Finally, in rclass.C, a short description of the method should be added in the printout of the usage information seen when starting RCLASS without parameters.
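The general shape of these changes can be sketched as follows. This is a simplified stand-in, not the actual contents of classifi.h or classifi.C: the class name Classifier, the enum, the method names and all signatures are assumptions, and only one existing method is shown in place of the real ones.

```cpp
#include <map>
#include <vector>

// Hypothetical stand-in for the class declared in classifi.h.
class Classifier {
public:
    // One identifier per method; NewMethod is the addition.
    enum Method { MajorityVote, NewMethod };

    // Dispatches to the selected method through a switch statement.
    // 'firedDecisions' holds the decisions of the rules that matched.
    // Returns -1 if no rule fired.
    int DoClassification(Method method,
                         const std::vector<int>& firedDecisions) const {
        if (firedDecisions.empty()) return -1;
        switch (method) {
        case MajorityVote: return ClassifyMajority(firedDecisions);
        case NewMethod:    return ClassifyNew(firedDecisions);  // added call
        }
        return -1;
    }

private:
    // Existing method (stand-in): most frequent decision wins;
    // ties go to the smallest decision value.
    int ClassifyMajority(const std::vector<int>& fired) const {
        std::map<int, int> counts;
        for (int d : fired) ++counts[d];
        int best = -1, bestCount = 0;
        for (const auto& c : counts)
            if (c.second > bestCount) { best = c.first; bestCount = c.second; }
        return best;
    }

    // New method, declared in the header and defined below the others.
    // Placeholder logic: the decision of the first fired rule.
    int ClassifyNew(const std::vector<int>& fired) const {
        return fired.front();
    }
};
```

The new method thus touches three places: its declaration in the class, its definition, and the added case in the switch statement.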
To integrate the method with the window interface, the Tcl file RCLASS must be updated. How to go about this should be straightforward from looking at the file.