In this chapter we describe the concept of data mining and give the basics needed for this report. We try to give an intuitive explanation of some common phenomena, with references to more formal literature. The emphasis of this report is on vague and uncertain data, and therefore the most used data mining method for dealing with such data is discussed.
The first section of this chapter defines data mining. In the next section some successful and widely used systems are described, before we discuss the different ways in which a computer system is able to extract knowledge from a database.
We then look at common ways of representing knowledge, and what distinguishes them. Afterwards, definite rules and default reasoning are discussed: what a rule means, and how default reasoning allows decisions to be made from rules even when not all information is given. Using default reasoning, it is also possible to extract very general information about a database.
Even though the principle of data mining may seem quite simple, there exist many pitfalls which should be avoided. The most common problems in data mining are discussed in a separate section. In the section following that, we take a look at how a data mining system may be evaluated.
We also look at some of the most common methods in data mining today, including the Bayesian approach, the ID3 method and the Rough Sets methodology. The Bayesian approach is based on statistics and formal probability theory. ID3 is one particular, widely used data mining method, and Rough Sets is a mathematical concept suitable for data mining.