Department of Computer Science
Computational Generation and Dissection of Lexical Replacement Humor

Downloads

Individual files:

All files: taboo-words.zip

This readme file describes the taboo(-inducing) word lists used in the experiments of the paper referenced below.

Reference

Alessandro Valitutti, Antoine Doucet, Jukka M. Toivanen, and Hannu Toivonen: Computational Generation and Dissection of Lexical Replacement Humor. Submitted to Natural Language Engineering. 2015.

If you use the word lists in your research, please cite the above paper as the source of the words.

Files

The files containing the word lists are


Taboo word classes

Roughly speaking,

  • connotational taboo words are unspeakable words, where the taboo lies in the utterance itself;
  • taboo-inducing words are not taboo in themselves, but can induce taboo meanings depending on how they are used.

Please see the paper referenced above for further information about the classification of taboo words.

Original sources

The taboo(-inducing) words were hand-picked from three sources:

  1. words used as funny autocorrections from http://www.damnyouautocorrect.com
  2. profanities from http://www.urbandictionary.com and http://onlineslangdictionary.com
  3. words related to sex, from the sexuality domain of WordNet-Domains, cf.
    Magnini, B. and Cavaglià, G. (2000): Integrating Subject Field Codes into WordNet. In Proceedings of the 2nd International Conference on Language Resources and Evaluation (LREC2000), Athens, Greece.