Estimating the Quality of Articles in Russian Wikipedia Using the Logical-Linguistic Model of Fact Extraction

Khairova, Nina; Lewoniewski, Włodzimierz; Węcel, Krzysztof

doi:10.1007/978-3-319-59336-4_3

Conference paper
First Online: 28 May 2017

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 288)

Abstract

We present a method for estimating the quality of articles in Russian Wikipedia that is based on counting the number of facts in an article. To count the facts, we use our logical-linguistic model of fact extraction. The basic mathematical means of the model are logical-algebraic equations of the algebra of finite predicates. The model allows simple and complex types of facts to be extracted from Russian sentences. We experimentally compare the effect of the density of these types of facts on the quality of articles in Russian Wikipedia. Better articles tend to have a higher density of facts.
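As a rough illustration of what "density of facts" can mean, the sketch below divides the number of extracted fact triplets by the number of words in the text. This is only a minimal sketch under stated assumptions: the abstract does not give the normalization used in the paper, so facts-per-word is an assumption, and the `Fact` class, the `fact_density` function, and whitespace tokenization are hypothetical names and choices introduced here. The fact extraction step itself (the logical-linguistic model) is not reproduced; the triplet is written by hand.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fact:
    """Hypothetical container for one Subject -> Predicate -> Object triplet."""
    subject: str
    predicate: str
    obj: str


def fact_density(facts: List[Fact], text: str) -> float:
    """Return facts per word (assumed normalization, not taken from the paper).

    Words are counted by splitting on whitespace, which is a simplification.
    """
    n_words = len(text.split())
    if n_words == 0:
        # Avoid division by zero for empty input.
        return 0.0
    return len(facts) / n_words


# Hand-written example: one triplet for an eight-word sentence.
text = "The author wrote the novel in one year"
facts = [Fact(subject="author", predicate="wrote", obj="novel")]
density = fact_density(facts, text)
```

With such a measure, articles of different lengths can be compared on the same scale, which is the kind of comparison the abstract describes between quality classes.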

Notes

1. https://meta.wikimedia.org/wiki/List_of_Wikipedias.
2. http://wikirank.net.
3. We capitalize ‘Subject’, ‘Object’ and ‘Predicate’ to denote the elements of a fact triplet Subject \(\rightarrow\) Predicate \(\rightarrow\) Object.
4. https://pymorphy2.readthedocs.io.
5. http://opencorpora.org.

Author information

Authors and Affiliations

National Technical University “Kharkiv Polytechnic Institute”, NTU “KhPI” 2, Kyrpychova str., Kharkiv, 61002, Ukraine
Nina Khairova
Poznań University of Economics and Business, Al. Niepodległości 10, 61-875, Poznań, Poland
Włodzimierz Lewoniewski & Krzysztof Węcel

Corresponding author

Correspondence to Nina Khairova.

Editor information

Editors and Affiliations

Poznan University of Economics, Poznan, Poland
Witold Abramowicz

About this paper

Cite this paper

Khairova, N., Lewoniewski, W., Węcel, K. (2017). Estimating the Quality of Articles in Russian Wikipedia Using the Logical-Linguistic Model of Fact Extraction. In: Abramowicz, W. (eds) Business Information Systems. BIS 2017. Lecture Notes in Business Information Processing, vol 288. Springer, Cham. https://doi.org/10.1007/978-3-319-59336-4_3

DOI: https://doi.org/10.1007/978-3-319-59336-4_3
Published: 28 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59335-7
Online ISBN: 978-3-319-59336-4
eBook Packages: Computer Science, Computer Science (R0)
