Līdzsvarots mūsdienu latviešu valodas tekstu korpuss un tā tekstu atlases kritēriji

Kristīne Levāne-Petrova

doi:10.15388/baltistica.0.8.2113


UL Institute of Mathematics and Computer Science

Published 2012-09-01


Keywords

The Balanced Corpus of Modern Latvian
computer linguistics

How to Cite

Levāne-Petrova, K. (2012) “Līdzsvarots mūsdienu latviešu valodas tekstu korpuss un tā tekstu atlases kritēriji”, Baltistica, 47(-), pp. 89–98. doi:10.15388/baltistica.0.8.2113.


Abstract

The Balanced Corpus of Modern Latvian and the Text Selection Criteria

Summary

The Balanced Corpus of Modern Latvian (~3.5 million running words) was recently created at the Institute of Mathematics and Computer Science (IMCS) (see http://www.korpuss.lv). The Corpus is compiled from printed and electronic materials created after 1990. It is automatically morphologically tagged: for each token, all morphologically valid interpretations are stored.

Texts for the Corpus were chosen according to several selection criteria, for instance time, medium, and domain. This article discusses the criteria chosen for this Corpus, the problems that arose in Corpus design and text selection, the solutions found for these problems, and future plans for the Corpus.


This work is licensed under a Creative Commons Attribution 4.0 International License.
