Detection of Semantic Compositionality Using Semantic Spaces

Any Natural Language Processing (NLP) system that performs semantic processing relies on the assumption of semantic compositionality: the meaning of a compound is determined by the meanings of its parts and the way they are combined. However, this assumption does not hold for many idiomatic expressions, such as “blue chip”. This paper focuses on the fully automatic detection of such expressions, referred to below as non-compositional compounds. We propose and test an intuitive approach based on replacing the parts of compounds with semantically related words. Our models for determining compositionality combine simple statistical ideas with the COALS semantic space. For evaluation, we use the shared dataset of the Distributional Semantics and Compositionality 2011 workshop (DISCO 2011). We also compare our approach with the traditionally used Pointwise Mutual Information (PMI). Our best models outperform all the systems that competed in DISCO 2011.
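For readers unfamiliar with the PMI baseline mentioned in the abstract, the following is a minimal sketch of the standard PMI formula for a two-word compound, log2(P(w1, w2) / (P(w1) P(w2))). It is not taken from the paper: the function name `pmi`, the choice of base 2, the maximum-likelihood probability estimates, and all counts are assumptions made purely for illustration.

```python
import math


def pmi(pair_count, first_count, second_count, total):
    """Pointwise mutual information (base 2) of a two-word compound.

    Probabilities are estimated as simple relative frequencies; `total`
    is the number of observations in a hypothetical corpus.
    """
    p_pair = pair_count / total
    p_first = first_count / total
    p_second = second_count / total
    return math.log2(p_pair / (p_first * p_second))


# Hypothetical counts, not real corpus statistics.
score = pmi(pair_count=50, first_count=1000, second_count=500, total=1_000_000)

# Words that co-occur exactly as often as chance predicts get a PMI near 0.
chance = pmi(pair_count=1, first_count=1000, second_count=1000, total=1_000_000)
```

A high PMI indicates that two words co-occur more often than chance would predict, which is commonly read as a sign of a collocation. It does not by itself establish that the compound is non-compositional.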

Keywords: DISCO 2011, compositionality, semantic space, collocations, COALS, PMI

Year: 2012

Authors of this publication:

Lubomír Krčmář


Luboš graduated from the University of West Bohemia in 2009 and is now a PhD student. His research focuses on natural language processing, information retrieval, and the semantic similarity of texts of varying length. He is particularly interested in the automatic extraction of collocations and idiomatic expressions from large corpora.

Karel Ježek

Phone:  +420 377632475, 377632400

Karel is the group coordinator and supervises the PhD students working on the group's research projects.

Related Projects:


Exploration of Semantic Spaces

Authors:  Karel Ježek, Lubomír Krčmář, Miloslav Konopík
Desc.: This work focuses on semantic relations between words and on the application of these relations in research fields such as information retrieval, machine translation, and document clustering.