Computational Overview of Finnish Hydronyms

Antti Leino

Presentation at Onomastika Mūsdienu zinātnes skatījumā, February 2004; article in Dzintra Hirsa (ed.), Onomastica Lettica II.

Abstract

The spatial distribution of a wide range of linguistic phenomena has traditionally been visualised in the form of maps. Distribution maps are very useful when dealing with only a few phenomena at a time, but they soon become unwieldy as the number of distributions increases. This is related to what is known in the field of data analysis as the "curse of dimensionality": many traditional methods become unusable when dealing simultaneously with a very large number of variables.

There are ways to cope with the problems that arise from massive dimensionality. This presentation shows how some of these methods, most notably principal component analysis, can be applied to onomastic data. Starting with raw data that consists of all hydronyms that appear on Finnish basic maps, the goal is to find a few of the most important trends that lie behind the distributions of individual names. Some of the results are rather predictable in view of present knowledge about Finnish dialects; others are less so.
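As a rough illustration of the idea, the following Python sketch runs principal component analysis on a small table of name occurrences. It is not the procedure used in the article: the layout (one row per municipality, one column per name), the function name principal_components and the toy counts are all assumptions made for this example, and the numbers are invented rather than taken from the Finnish basic maps.

```python
import numpy as np


def principal_components(counts, n_components=2):
    """PCA via singular value decomposition.

    counts: array of shape (municipalities, names); this layout is an
    assumption made for the sketch, not something the article specifies.
    Returns (scores, loadings, explained_ratio).
    """
    X = np.asarray(counts, dtype=float)
    X = X - X.mean(axis=0)  # centre each name's column on its mean
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # municipalities on each component
    loadings = Vt[:n_components]                      # weight of each name on each component
    explained = (s ** 2) / np.sum(s ** 2)             # share of total variance per component
    return scores, loadings, explained[:n_components]


# Invented toy data: 4 hypothetical municipalities x 4 hypothetical names.
# The first two names cluster in the first two municipalities,
# the last two names in the other two.
toy_counts = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 1],
    [0, 0, 4, 3],
    [0, 1, 3, 4],
])

scores, loadings, explained = principal_components(toy_counts)
```

In this toy case the first component separates the two groups of municipalities and accounts for most of the variance, which is the kind of underlying trend that a single summary map can show in place of one distribution map per name.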

Errata

In table 1 of the article and the table on slide 1, the last column shows the number of overall occurrences for each name, not the number of municipalities in which the name appears.

Last modified: Tue Feb 8 10:14:52 EET 2005