abstract: Lexical iconicity, a direct relation between a word's meaning and its form, is an important aspect of every natural language, most commonly manifesting through sound-meaning associations. Since large language models' (LLMs') access to both the meaning and the sound of text is only mediated (meaning through textual context, sound through written representation, further complicated by tokenization), we might expect the encoding of iconicity in LLMs to be either insufficient or significantly different from human processing. This study addresses this hypothesis by having GPT-4 generate highly iconic pseudowords in artificial languages. To verify that these words actually carry iconicity, we had their meanings guessed by Czech and German participants (n=672) and subsequently by LLM-based participants (generated by GPT-4 and Claude 3.5 Sonnet). The results revealed that humans can guess the meanings of pseudowords in the generated iconic language more accurately than words in distant natural languages, and that LLM-based participants are even more successful than humans at this task. This core finding is accompanied by several additional analyses concerning the universality of the generated language and the cues that both human and LLM-based participants utilize.


I helped create a jsPsych plugin and collect data for this article. It can be found here.


Download the bibliographic data or copy it from here:
@misc{Marklova-Iconicity-in-Large-2025,
 abstract = {Lexical iconicity, a direct relation between a word's meaning and its form,
is an important aspect of every natural language, most commonly manifesting
through sound-meaning associations. Since Large language models' (LLMs') access
to both meaning and sound of text is only mediated (meaning through textual
context, sound through written representation, further complicated by
tokenization), we might expect that the encoding of iconicity in LLMs would be
either insufficient or significantly different from human processing. This
study addresses this hypothesis by having GPT-4 generate highly iconic
pseudowords in artificial languages. To verify that these words actually carry
iconicity, we had their meanings guessed by Czech and German participants
(n=672) and subsequently by LLM-based participants (generated by GPT-4 and
Claude 3.5 Sonnet). The results revealed that humans can guess the meanings of
pseudowords in the generated iconic language more accurately than words in
distant natural languages and that LLM-based participants are even more
successful than humans in this task. This core finding is accompanied by
several additional analyses concerning the universality of the generated
language and the cues that both human and LLM-based participants utilize.},
 archiveprefix = {arXiv},
 author = {Marklov{\'a}, Anna and Mili{\v c}ka, Ji{\v r}{\'i} and Ryvkin, Leonid and Bennet, {\v L}udmila Lackov{\'a} and Korman{\'i}kov{\'a}, Libu{\v s}e},
 eprint = {2501.05643},
 primaryclass = {cs.CL},
 title = {Iconicity in Large Language Models},
 year = {2025}
}