Lemmatization of Multi-Word Entity Names for Polish Language Using Rules Automatically Generated Based on the Corpus Analysis

Jacek Małyszko, Witold Abramowicz, Agata Filipowska, Tomasz Wagner

Abstract

The article concerns automatic lemmatization of Multi-Word Units (MWUs) for highly inflective languages. We present an approach in which lemmatization is performed using rules generated solely from corpus analysis. Experiments showed that the accuracy of automatic MWU lemmatization for Polish with this approach can reach up to 82%.

Authors:
- Jacek Małyszko (WIiGE / KIE), Department of Information Systems
- Witold Abramowicz (WIiGE / KIE), Department of Information Systems
- Agata Filipowska (WIiGE / KIE), Department of Information Systems
- Tomasz Wagner (WIiGE), Faculty of Informatics and Electronic Economy
Pages: 74-84
Publication size in sheets: 0.5
Book: Vetulani Zygmunt, Mariani Joseph, Kubis Marek (eds.): Human Language Technology. Challenges for Computer Science and Linguistics, Lecture Notes in Computer Science, no. 10930, 2018, Springer, ISBN 978-3-319-93781-6, [978-3-319-93782-3], 446 p., DOI:10.1007/978-3-319-93782-3
Keywords in English: Natural Language Processing, Multi-Word Units, Lemmatization
DOI: 10.1007/978-3-319-93782-3_6
Language: English
Score (nominal): 20
Score source: publisherList
Score: Ministerial score = 20.0, 23-04-2020, ChapterFromConference