Modelling the Quality of Attributes in Wikipedia Infoboxes

Krzysztof Węcel, Włodzimierz Lewoniewski


The quality of data in DBpedia depends on the underlying information provided in Wikipedia's infoboxes. Different language editions can provide different information about a given subject, both in the set of attributes and in the values of these attributes. Our research question is which language editions provide correct values for each attribute, so that data fusion can be carried out. Initial experiments showed that the quality of attributes is correlated with the overall quality of the Wikipedia article providing them. Wikipedia offers functionality to assign a quality class to an article, but unfortunately the majority of articles have not been graded by the community, or their grades are not reliable. In this paper we analyse the features and models that can be used to evaluate the quality of articles, providing a foundation for the relative quality assessment of infobox attributes, with the purpose of improving the quality of DBpedia.
Authors:
- Krzysztof Węcel (WIiGE / KIE), Department of Information Systems
- Włodzimierz Lewoniewski (WIiGE / KIE), Department of Information Systems
Publication size in sheets: 0.6
Book: Abramowicz Witold (ed.): BIS 2015 International Workshops, Poznań, Poland, June 24–26, 2015, Revised Papers, Lecture Notes in Business Information Processing, vol. 228, 2015, Springer International Publishing, ISBN 978-3-319-26761-6, [978-3-319-26762-3], 344 p., DOI:10.1007/978-3-319-26762-3
Keywords in English: Data quality; Information quality; DBpedia; Wikipedia; Infobox; Data mining; Wikirank
Language: en (English)
Score (nominal): 15
Score source: conferenceIndex
Score: Ministerial score = 15.0, 12-02-2020, BookChapterSeriesAndMatConfByIndicator
Ministerial score (2013-2016) = 15.0, 12-02-2020, BookChapterSeriesAndMatConfByIndicator
Publication indicators: WoS Citations = 12
Citation count*: 21 (2020-09-16)

* The presented citation count is obtained through Internet information analysis and is close to the number calculated by the Publish or Perish system.