From Data to Knowledge

In many areas of industry and research, collecting raw data has become far easier than it used to be. Molecular biology produces long sequences of biological information; environmental satellites provide a wealth of data; process monitoring yields masses of measurements; and the Internet gives easy access to a wide variety of data sources. Methods for extracting useful information or knowledge from the data have not kept pace with this overwhelming increase in data availability.

The From Data to Knowledge (FDK) research unit develops methods for forming useful knowledge from large masses of data. The unit works in a multidisciplinary fashion: its research groups combine excellence in discrete algorithms, statistical techniques, and application sciences.

The major methodological tools of the research unit are combinatorial pattern matching and data mining. The combination of these two is unique in the world. The work combines conceptual advances, algorithmic, statistical and analytical methods, and empirical work: theory and practice go hand in hand.

The unit's results have been applied in fields such as molecular biology, the process industry, telecommunications, genetics, ecology, and natural language processing, and they have attracted wide international attention. Many concepts created by the group are in use in the scientific community and are presented in textbooks. Software that incorporates methods invented at the unit has been commercialized in several countries.

FDK has been selected as one of the centres of excellence by the Academy of Finland for the period 2002-2007.

Research plans

The central goal of the FDK unit is to develop new computational methods for the analysis of large and complicated data sets, i.e., methods that help people extract knowledge from data. The group is well placed to achieve this, since it has an exceptional combination of skills in computing and good working contacts with experts in the various application sciences. Recently, approaches similar to the unit's style of work have gained popularity elsewhere as well. The methods for forming knowledge are based on algorithmic and statistical approaches, combinatorial pattern matching, database techniques, and machine learning. The research develops theory and applications at the same time. Methods developed for one area can often be used fruitfully in another, so the wide range of applications benefits both the developers of the methods and their users.

The research of the unit can be viewed as an intertwined combination of four research areas.

The research projects are closely connected: each of the four areas pursues basic research in computational methods with applications in view. Similarly, the topics of discrete algorithms and probabilistic approaches occur repeatedly. Most importantly, the projects share many researchers, so information moves naturally between the different parts of the unit.