Using morphological and semantic features for the quality assessment of Russian Wikipedia
Włodzimierz Lewoniewski, Nina Khairova, Krzysztof Węcel, Nataliia Stratiienko, Witold Abramowicz
Abstract
Nowadays, assessing the quality and credibility of Wikipedia articles is becoming increasingly important. We propose using morphological and semantic features to estimate the quality of Wikipedia articles in the Russian language. We distinguished over 150 linguistic features and divided them into four groups. Within these groups, we considered features of encyclopedic style, readability, and subjectivism of the article text. Using Random Forest as the classification algorithm, we identify the linguistic features that most strongly affect the quality of Russian Wikipedia articles. We also compare the classification results of the four feature groups separately. We achieved an F-measure of 89.75%.
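The abstract summarizes classification performance as an F-measure. The sketch below is a minimal, standard-library illustration of what that figure computes: the harmonic mean of precision and recall for one class, plus an unweighted (macro) average over classes. It is not the authors' evaluation code. The class labels `"good"` and `"poor"` and the helper names `f_measure` and `macro_f_measure` are hypothetical, and this record does not state whether the reported 89.75% is a binary, weighted, or macro-averaged score, so the macro averaging shown here is an assumption.

```python
from typing import Sequence


def f_measure(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """F1 for one class treated as 'positive': harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # correctly predicted positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false alarms
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # missed positives
    if tp == 0:
        # No true positives: precision or recall is 0 (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f_measure(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f_measure(y_true, y_pred, c) for c in labels) / len(labels)


# Hypothetical labels for five articles (not data from the paper).
y_true = ["good", "good", "poor", "poor", "good"]
y_pred = ["good", "poor", "poor", "poor", "good"]
print(round(f_measure(y_true, y_pred, "good"), 3))  # 0.8
print(round(macro_f_measure(y_true, y_pred), 3))    # 0.8
```

For the "good" class, precision is 2/2 and recall is 2/3, giving an F1 of 0.8; the "poor" class happens to score the same, so the macro average is also 0.8.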
| Field | Value |
|---|---|
| Publication size in sheets | 0.5 |
| Book | Damaševičius Robertas, Mikašytė Vilma (eds.): Information and Software Technologies, Communications in Computer and Information Science, vol. 756, 2017, Springer, ISBN 978-3-319-67641-8, [978-3-319-67642-5], 624 p., DOI:10.1007/978-3-319-67642-5 |
| Keywords in English | quality assessment of texts, morphological and semantic features, Russian Wikipedia articles, random forests classification, encyclopedic, readability, subjectivism |
| Score | = 20.0, 24-03-2020, ChapterFromConference |
| Publication indicators | = 0; : 2017 = 0.354 |
| Citation count* | 3 (2020-09-23) |
* The presented citation count was obtained through Internet information analysis and is close to the number calculated by the Publish or Perish system.