Comparative Analysis of the Informativeness and Encyclopedic Style of the Popular Web Information Sources

Nina Khairova, Włodzimierz Lewoniewski, Krzysztof Węcel, Mamyrbayev Orken, Mukhsina Kuralai


Decision making today often relies on information found in various Internet sources. Texts in the encyclopedic style, which contain mostly factual information, are preferred. We propose combining the logic-linguistic model with the universal dependency treebank to extract facts of various quality levels from texts. Using Random Forest as the classification algorithm, we identify the types of facts and the types of words that most affect the encyclopedic style of a text. We evaluate our approach on four corpora based on Wikipedia, social media, and mass media texts. Our classifier achieves an F-measure of over 90%.
Authors:
Nina Khairova
Włodzimierz Lewoniewski (WIiGE / KIE) - Department of Information Systems
Krzysztof Węcel (WIiGE / KIE) - Department of Information Systems
Mamyrbayev Orken - Institute of Information and Computational Technologies, Kazakhstan
Mukhsina Kuralai - Al-Farabi Kazakh National University, Kazakhstan
Publication size in sheets: 0.55
Book: Abramowicz Witold, Paschke Adrian (eds.): Business Information Systems: 21st International Conference, BIS 2018, Berlin, Germany, July 18-20, 2018, Proceedings, Lecture Notes in Business Information Processing, vol. 320, 2018, Springer, ISBN 978-3-319-93930-8, [978-3-319-93931-5], 426 p., DOI:10.1007/978-3-319-93931-5
Keywords in English: encyclopedic, informativeness, universal dependency, random forest, facts extraction, Wikipedia, mass media
Language: en (English)
Score (nominal): 70
Score source: conferenceList
Score: Ministerial score = 70.0, 11-09-2020, ChapterFromConference
Citation count*: 7 (2020-09-16)
* The presented citation count is obtained through analysis of Internet information and is close to the number calculated by the Publish or Perish system.