Skip to main content

Estimating the Quality of Articles in Russian Wikipedia Using the Logical-Linguistic Model of Fact Extraction

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Business Information Processing ((LNBIP,volume 288))

Abstract

We present the method of estimating the quality of articles in Russian Wikipedia that is based on counting the number of facts in the article. For calculating the number of facts we use our logical-linguistic model of fact extraction. Basic mathematical means of the model are logical-algebraic equations of the finite predicates algebra. The model allows extracting of simple and complex types of facts in Russian sentences. We experimentally compare the effect of the density of these types of facts on the quality of articles in Russian Wikipedia. Better articles tend to have a higher density of facts.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

  1. 1.

    https://meta.wikimedia.org/wiki/List_of_Wikipedias.

  2. 2.

    http://wikirank.net.

  3. 3.

    We use ‘Subject’, ‘Object’ and ‘Predicate’ with the first upper-case letters to denote the element of a fact triplet Subject \(\text {-}{>}\) Predicate \(\text {-}{>}\) Object.

  4. 4.

    https://pymorphy2.readthedocs.io.

  5. 5.

    http://opencorpora.org.

References

  1. Anderka, M.: Analyzing and predicting quality flaws in user-generated content: the case of Wikipedia. PhD, Bauhaus-Universitaet Weimar Germany (2013)

    Google Scholar 

  2. Lipka, N., Stein, B.: Identifying featured articles in wikipedia: writing style matters. In: Proceedings of the 19th International Conference on World Wide Web, pp. 1147–1148 (2010)

    Google Scholar 

  3. Khairova, N.F., Petrasova, S., Gautam, A.P.S.: The logical-linguistic model of fact extraction from English texts. In: Dregvaite, G., Damasevicius, R. (eds.) ICIST 2016. CCIS, vol. 639, pp. 625–635. Springer, Cham (2016). doi:10.1007/978-3-319-46254-7_51

    Chapter  Google Scholar 

  4. Arthur, J.D., Stevens, K.T.: Document quality indicators: a framework for assessing documentation adequacy. J. Softw. Maint. Res. Pract. 4(3), 129–142 (1992)

    Article  Google Scholar 

  5. Knight, S.A., Burn, J.: Developing a framework for assessing information quality on the world wide web. Informing Sci. J. 8, 159–172 (2005)

    Google Scholar 

  6. Shpak, O., Löwe, W., Wingkvist, A., Ericsson, M.: A method to test the information quality of technical documentation on websites. In: 2014 14th International Conference on Quality Software, pp. 296–304, October 2014

    Google Scholar 

  7. Lex, E., Juffinger, A., Granitzer, M.: Objectivity classification in online media. In: Proceedings of the 21st ACM Conference on Hypertext and Hypermedia, HT 2010, pp. 293–294. ACM, New York (2010)

    Google Scholar 

  8. Weber, N., Schoefegger, K., Bimrose, J., Ley, T., Lindstaedt, S., Brown, A., Barnes, S.-A.: Knowledge maturing in the semantic mediawiki: a design study in career guidance. In: Cress, U., Dimitrova, V., Specht, M. (eds.) EC-TEL 2009. LNCS, vol. 5794, pp. 700–705. Springer, Heidelberg (2009). doi:10.1007/978-3-642-04636-0_71

    Chapter  Google Scholar 

  9. Blumenstock, J.E.: Size matters: word count as a measure of quality on wikipedia. In: WWW, pp. 1095–1096 (2008)

    Google Scholar 

  10. Wingkvist, A., Ericsson, M., Löwe, W.: Making sense of technical information quality - a software-based approach measuring the quality of technical data depends on developing models from which metrics can be extracted and analyzed. Using an open source tool the authors describe one approach to this (2012)

    Google Scholar 

  11. Fellbaum, C.: Wordnet: An Electronic Lexical Database. MIT Press, Cambridge (1998)

    MATH  Google Scholar 

  12. Lex, E., Voelske, M., Errecalde, M., Ferretti, E., Cagnina, L., Horn, C., Stein, B., Granitzer, M.: Measuring the quality of web content using factual information. In: Proceedings of the 2nd Joint WICOW/AIRWeb Workshop on Web Quality - WebQuality 2012, p. 7 (2012)

    Google Scholar 

  13. Horn, C., Zhila, A., Gelbukh, A., Kern, R., Lex, E.: Using factual density to measure informativeness of web documents. In: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013). NEALT Proceedings Series 16, Oslo University, Norway, 22–24 May 2013, Number 085, pp. 227–238. Linköping University Electronic Press (2013)

    Google Scholar 

  14. Etzioni, O., Banko, M., Soderland, S., Weld, D.S.: Open information extraction from the web. Commun. ACM 51(12), 68–74 (2008)

    Article  Google Scholar 

  15. Eugene, A., Luis, G.: Extracting relations from large plain-text collections. In: Proceedings of ACM 2000 (2000)

    Google Scholar 

  16. Fader, A., Soderland, S., Etzioni, O.: Identifying relations for open information extraction. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1535–1545. Association for Computational Linguistics (2011)

    Google Scholar 

  17. Bondarenko, M., Shabanov-Kushnarenko, J.: The intelligence theory. In: SMIT, Kharkiv, p. 576 (2007)

    Google Scholar 

  18. Petrasova, S., Khairova, N.: Automatic identification of collocation similarity. In: 2015 Xth International Scientific and Technical Conference, Computer Sciences and Information Technologies (CSIT), pp. 136–138, September 2015

    Google Scholar 

  19. Fillmore, C.J.: The case for case. In: Bach, E., Harms, R. (eds.) Universals in Linguistic Theory. Holt, Rinehart, and Winston, London (1968)

    Google Scholar 

  20. Osborne, T., Gross, T.: Constructions are catenae: construction grammar meets dependency grammar. Cogn. Linguist. 23(1), 165–216 (2012)

    Article  Google Scholar 

  21. Węcel, K., Lewoniewski, W.: Modelling the quality of attributes in wikipedia infoboxes. In: Abramowicz, W. (ed.) BIS 2015. LNBIP, vol. 228, pp. 308–320. Springer, Cham (2015). doi:10.1007/978-3-319-26762-3_27

    Chapter  Google Scholar 

  22. Lewoniewski, W., Węcel, K., Abramowicz, W.: Quality and importance of wikipedia articles in different languages. In: Dregvaite, G., Damasevicius, R. (eds.) ICIST 2016. CCIS, vol. 639, pp. 613–624. Springer, Cham (2016). doi:10.1007/978-3-319-46254-7_50

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Nina Khairova .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Khairova, N., Lewoniewski, W., Węcel, K. (2017). Estimating the Quality of Articles in Russian Wikipedia Using the Logical-Linguistic Model of Fact Extraction. In: Abramowicz, W. (eds) Business Information Systems. BIS 2017. Lecture Notes in Business Information Processing, vol 288. Springer, Cham. https://doi.org/10.1007/978-3-319-59336-4_3

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-59336-4_3

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59335-7

  • Online ISBN: 978-3-319-59336-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics