Measures for Quality Assessment of Articles and Infoboxes in Multilingual Wikipedia
Abstract
Wikipedia is one of the most popular collaborative knowledge bases on the Internet. Articles of this free encyclopaedia are created and edited by users from different countries in about 300 languages. Depending on the topic and language version, the quality of information may vary. This study presents and classifies measures that can be extracted from Wikipedia articles for the purpose of automatic quality assessment in different languages. Based on a state-of-the-art analysis and our own experiments, specific measures for various aspects of quality have been defined. Additionally, this work defines measures for assessing the quality of data contained in the structural parts of Wikipedia articles (infoboxes). The study also describes extraction methods for the various sources of measures that can be used in quality assessment.
|Publication size in sheets||0.7|
|Book||Abramowicz Witold, Paschke Adrian (eds.): BIS 2018 International Workshops, Berlin, Germany, July 18–20, 2018, Revised Papers, Lecture Notes in Business Information Processing, vol. 339, 2019, Springer, ISBN 978-3-030-04848-8, [978-3-030-04849-5], 708 p., DOI:10.1007/978-3-030-04849-5|
|Keywords in Polish||Wikipedia, jakość danych, infoboks, eksploracja danych, uczenie maszynowe, zarządzanie wiedzą, gospodarka cyfrowa|
|Keywords in English||Wikipedia, data quality, infobox, data mining, machine learning, knowledge management, digital economy|
|Score||= 70.0, 11-09-2020, ChapterFromConference|
|Citation count*||7 (2020-09-16)|
* The presented citation count is obtained through analysis of Internet sources and is close to the number calculated by the Publish or Perish system.