
Automatic keyphrase extraction based on NLP and statistical methods
In this article we present our experimental approach to automatic keyphrase extraction based on statistical methods and WordNet-based pattern evaluation. Automatically generated keyphrases are important for automatic tagging and clustering because manually assigned keyphrases are insufficient in most cases. Keyphrase candidates are extracted in a new way derived from a combination of graph methods (TextRank) and statistical methods (TF*IDF). Keyword candidates are merged with named entities and stop words according to natural-language part-of-speech (POS) patterns. Automatic keyphrases are generated as TF*IDF-weighted unigrams. Keyphrases describe the main ideas of documents in a human-readable way. The approach is evaluated on articles extracted from news websites. Each article contains manually assigned topics/categories, which are used for keyword evaluation.
Keywords: keyphrase extraction, WordNet, TextRank, TFIDF, NLP
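
To make the statistical part of the abstract concrete, the following is a minimal sketch of TF*IDF weighting of unigrams in plain Python. It is not the authors' implementation: the function names `tfidf_scores` and `top_unigrams`, the whitespace tokenization, the toy corpus, and the particular TF*IDF variant (relative term frequency times the natural log of N/df) are all assumptions made for illustration. The TextRank, named-entity, and POS-pattern steps are not covered.

```python
import math
from collections import Counter


def tfidf_scores(documents):
    """Return one {term: TF*IDF weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_unigrams(weights, k=3):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    return sorted(weights, key=lambda term: (-weights[term], term))[:k]


# Hypothetical toy corpus; real input would be tokenized news articles.
docs = [
    "the election results surprised the voters".split(),
    "the team won the final match".split(),
    "the voters watched the match".split(),
]
scores = tfidf_scores(docs)
print(top_unigrams(scores[0]))
```

In this sketch a term that occurs in every document, such as "the", receives a weight of zero, while terms confined to a single document rank highest. That is the property that makes TF*IDF useful for picking document-specific unigrams.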
Year: 2011
Authors of this publication:

Martin Dostal
E-mail: madostal@kiv.zcu.cz

Karel Ježek
Phone: +420 377632475
E-mail: jezek_ka@kiv.zcu.cz
WWW: https://cs.wikipedia.org/wiki/Karel_Je%C5%BEek_(informatik)
Related Projects:

Document Clustering and Linked Data
Authors: Karel Ježek, Martin Dostal
Description: Unsupervised methods for automatic tagging and clustering based on information extraction from Linked Data.