Lemmatization of Multi-Word Entity Names for Polish Language Using Rules Automatically Generated Based on the Corpus Analysis
Jacek Małyszko, Witold Abramowicz, Agata Filipowska, Tomasz Wagner
Abstract: The article concerns automatic lemmatization of Multi-Word Units (MWUs) for highly inflective languages. We present an approach in which lemmatization is conducted using rules generated solely from corpus analysis. The experiments showed that the accuracy of automatic lemmatization of MWUs for the Polish language using this approach may reach up to 82%.
|Publication size in sheets||0.5|
|Book||Vetulani Zygmunt, Mariani Joseph, Kubis Marek (eds.): Human Language Technology. Challenges for Computer Science and Linguistics, Lecture Notes in Computer Science, no. 10930, 2018, Springer, ISBN 978-3-319-93781-6, [978-3-319-93782-3], 446 p., DOI:10.1007/978-3-319-93782-3|
|Keywords in English||Natural Language Processing, Multi-Word Units, Lemmatization|
|Score||= 20.0, 23-04-2020, ChapterFromConference|