
Training and Test Sets

The database from which the data mining system tries to extract knowledge is called the training set. By examining the data in this database, the system tries to create general rules and descriptions of the patterns and relations it contains. The goal is to gain knowledge that is valid not only for the specific database considered, but also for other, similar data.

The knowledge may be tested against a test set, which is another database. This shows whether the patterns found in the training set also hold for other data. If the knowledge gained from the training set is general, it will be correct for most of the test set as well.
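
As a rough illustration of testing knowledge against a test set, the sketch below learns a very simple rule from a training set and measures how often it holds on a separate test set. The functions learn_rule and accuracy, the majority-label rule, and the toy weather records are all assumptions made up for this example. They stand in for whatever a real data mining system would produce.

```python
from collections import Counter, defaultdict


def learn_rule(training_set):
    """Map each attribute value to the label most often seen with it.

    Hypothetical stand-in for a data mining system: the "knowledge"
    is just a lookup table built from the training set.
    """
    counts = defaultdict(Counter)
    for value, label in training_set:
        counts[value][label] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def accuracy(rule, test_set, default=None):
    """Fraction of test records whose label the rule predicts correctly."""
    if not test_set:
        raise ValueError("test_set must not be empty")
    hits = sum(1 for value, label in test_set
               if rule.get(value, default) == label)
    return hits / len(test_set)


# Made-up toy records of the form (weather, played_outside).
training_set = [("sunny", "yes"), ("sunny", "yes"), ("rainy", "no"),
                ("rainy", "no"), ("sunny", "no")]
test_set = [("sunny", "yes"), ("rainy", "no"),
            ("rainy", "yes"), ("sunny", "yes")]

rule = learn_rule(training_set)
print(rule)                      # {'sunny': 'yes', 'rainy': 'no'}
print(accuracy(rule, test_set))  # 0.75
```

Here the rule is correct for three of the four test records. A high score on the test set suggests that the rule is general. A score much lower than on the training set suggests that it only describes the training data.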

It is common to divide a database into two parts to create a training set and a test set. One part, for instance 70% of the data, is used as the training set and examined by the data mining system. The rest of the original database is used as the test set to see whether the knowledge acquired from the training set is of a general nature.
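
A minimal sketch of such a split is shown below. It assumes the database fits in memory as a list of records. The function name split_records, the train_percent parameter, and the fixed random seed are choices made for this illustration only. The records are shuffled before the cut so that any ordering in the original database does not end up concentrated in one of the two sets.

```python
import random


def split_records(records, train_percent=70, seed=0):
    """Shuffle a copy of the records and return (training_set, test_set).

    train_percent is the whole-number percentage of records that go
    into the training set; the remainder forms the test set.
    """
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 1 and 99")
    shuffled = list(records)               # leave the original untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed: repeatable split
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


# Ten placeholder records standing in for rows of a database.
records = list(range(10))
training_set, test_set = split_records(records)
print(len(training_set), len(test_set))  # 7 3
```

Every record lands in exactly one of the two sets, so the test set contains only data the system has not examined.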



Helge Grenager Solheim
Sat May 4 03:30:02 MET DST 1996