Discovery group: Data Mining for Pattern and Link Discovery



Department of Computer Science

Finnish Centre of Excellence for Algorithmic Data Analysis Research

Helsinki Institute for Information Technology HIIT


Digital Language Typology: Mining from the Surface to the Core

Digital Language Typology (DLT) is a multidisciplinary project that aims to produce a computer-based platform for assessing the structurally manifested family relationships within any set of languages for which sufficiently large digital text and speech material is available. To this end, we have assembled a group of specialists in phonetics, linguistics, and computer science. The project focuses on comparing several Uralic languages with the Indo-European languages most relevant to their evolution in terms of geographical distribution and history of language contact. In addition to the relatively well-studied Finnish, Hungarian, and Estonian, we will include three much less well-resourced Samoyedic languages with a distinct phylogenetic history: Tundra Nenets, Forest Nenets, and Nganasan. The project will thus provide the research community with new tools, shed new light on the linguistic history of mankind, and advance the new field of Digital Humanities.

Digital Language Typology (DLT) is part of the Academy of Finland's Digital Humanities programme, which includes "novel methods and techniques in which digital technologies and state-of-the-art computational science methods are used for collecting, managing, and analysing data in humanities and social sciences research."

The DLT project is carried out by a consortium. The principal investigators are Martti Vainio (University of Helsinki, coordinator), Hannu Toivonen (University of Helsinki), and Markku Turunen (University of Tampere). The project runs for four years, from 1 January 2016 to 31 December 2019.
11.01.2016 - 13:57 Hannu Toivonen