This chapter deals with fundamental issues of the Rough Set theory, which has become a popular theory in the field of data mining. The theory was introduced by Zdzislaw Pawlak in the early 1980s, and it provides a formal framework for the automated transformation of data into knowledge. Pawlak has shown that the principles of learning from examples can be formulated on the basis of his theory. An important result of the theory is that it simplifies the search for dominating attributes that lead to specific properties, or simply for rules hidden in the data.
The Rough Set theory is mathematically relatively simple. Despite this, it has proven fruitful in a variety of data mining areas, among them information retrieval, decision support, machine learning, and knowledge-based systems. A wide range of applications utilize the ideas of the theory; medical data analysis, aircraft pilot performance evaluation, image processing, and voice recognition are a few examples.
Almost inevitably, the database used for data mining will contain imperfections, such as noise, unknown values, or errors due to inaccurate measuring equipment. The Rough Set theory comes in handy for dealing with these types of problems, as it is a tool for handling the vagueness and uncertainty inherent in decision situations. An advantage of the Rough Set methodology over the Bayesian approach is that no assumptions about the independence of the attributes are necessary, nor is any background knowledge about the data required.
In this chapter, a set of definitions from the world of Rough Sets is given. Most definitions are accompanied by an example.