On the missing data tolerance of popular clustering algorithms in the case of nominal data

Jaana Heino

Many clustering algorithms have been described, but comprehensive experimental comparisons of them are still lacking, and very little attention has been paid to the fact that real-life data is often noisy, missing, or both. In this work, I compare the performance of three popular clustering algorithms (k-means, hierarchical clustering, and mixture model clustering with the naive Bayes assumption) on nominal data, including the case where some of that data is missing. As materials, I use several differently sized random datasets generated from dependency models with varying degrees of dependency between the variables, with different amounts of random noise and missing data added. The clusterings produced by each algorithm are compared to the known true clusters, using pairwise similarity and mutual information as measures of correctness. In addition to artificial data, I also consider the robustness of the algorithms under increasing amounts of missing values on some real-life (medical) datasets.
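
To make the two correctness measures concrete, the following is a minimal sketch of how a clustering could be scored against known true clusters. It is an illustration under stated assumptions, not necessarily the exact definitions used in this work: "pairwise similarity" is assumed here to mean the fraction of item pairs on which the two clusterings agree (the Rand index), and mutual information is computed unnormalized, in nats. The function names `pairwise_agreement` and `mutual_information` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def pairwise_agreement(true_labels, pred_labels):
    """Fraction of item pairs treated the same way by both clusterings.

    Assumption: "pairwise similarity" is read as the Rand index. A pair
    counts as agreeing if both clusterings put the two items together,
    or both put them apart.
    """
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def mutual_information(true_labels, pred_labels):
    """Unnormalized mutual information (in nats) between two labelings."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_labels))  # co-occurrence counts
    true_counts = Counter(true_labels)
    pred_counts = Counter(pred_labels)
    return sum(
        (c / n) * log(c * n / (true_counts[a] * pred_counts[b]))
        for (a, b), c in joint.items()
    )


true_clusters = [0, 0, 1, 1]
same_up_to_renaming = [1, 1, 0, 0]   # identical partition, labels swapped
unrelated = [0, 1, 0, 1]             # carries no information about the truth

print(pairwise_agreement(true_clusters, same_up_to_renaming))
print(mutual_information(true_clusters, same_up_to_renaming))
print(pairwise_agreement(true_clusters, unrelated))
print(mutual_information(true_clusters, unrelated))
```

Both measures depend only on the partition, not on the cluster label names, which is why a relabelled copy of the true clustering still scores as a perfect match.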