How to cite the Lexical Database of Lithuanian Language Usage

Authors should acknowledge their use of the database by citing it as follows:

Kovalevskaitė, Jolanta; Bielinskienė, Agnė; Boizou, Loic; Jancaitė, Laima; Rimkutė, Erika. Mokomasis lietuvių kalbos vartosenos leksikonas [e-resource]. Kaunas: Vytautas Magnus University, 2021. DOI: https://doi.org/10.7220/kalbu.vdu.lt.leksikonas

The Lexical Database of Lithuanian Language Usage is an electronic resource intended for teachers and learners of Lithuanian as a foreign language. The database was developed on the basis of the written data of the Pedagogic Corpus, which amounts to about 620,000 words. This small, monolingual corpus, which has automatic morphological annotation and was compiled for teaching and learning Lithuanian, was used to develop the list of headwords for the database and to study the usage regularities of the selected words and multi-word units.

The lexical database contains 3,700 lexical items including individual words and multi-word units (such as compound nouns, fixed expressions, and sayings). Its headword list consists of the following categories:

(1) words (verbs, nouns, adjectives, and adverbs) used at least 100 times in all four levels represented in the Pedagogic Corpus (A1 to B2); approximately 700 words in total;

(2) derivatives and multi-word units related to these approximately 700 most common words, which make up 3,000 lexical items in total.
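
For readers who work with corpus data, the frequency criterion in category (1) can be sketched in a few lines of Python. This is only an illustration, not the procedure the research team actually used. It assumes one possible reading of the criterion (a word must occur at least 100 times in total and be attested at every level from A1 to B2), and the input format, the part-of-speech labels, and the function name select_headwords are all hypothetical.

```python
from collections import Counter

# Proficiency levels represented in the corpus, as stated in the text.
LEVELS = ("A1", "A2", "B1", "B2")
# Frequency threshold stated in the text.
MIN_COUNT = 100
# Hypothetical part-of-speech labels for the four word classes named in the text.
CONTENT_POS = {"verb", "noun", "adjective", "adverb"}


def select_headwords(tokens, min_count=MIN_COUNT, levels=LEVELS):
    """Return (lemma, pos) pairs that pass the assumed frequency criterion.

    tokens: iterable of (lemma, pos, level) tuples (hypothetical format).
    Assumed reading: total frequency >= min_count AND attested at every level.
    """
    counts = Counter()
    levels_seen = {}
    for lemma, pos, level in tokens:
        if pos not in CONTENT_POS:
            continue  # skip word classes outside the four named in the text
        key = (lemma, pos)
        counts[key] += 1
        levels_seen.setdefault(key, set()).add(level)

    required = set(levels)
    return sorted(
        key
        for key, n in counts.items()
        if n >= min_count and required <= levels_seen[key]
    )
```

Under a stricter reading (at least 100 occurrences at each level separately), the per-level counts would have to be kept apart and each compared with the threshold.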

When developing the lexical database, the main goal was to collect data that could be applicable in teaching Lithuanian as a foreign language at higher proficiency levels. We aimed to provide as much information as possible about authentic present-day use of Lithuanian vocabulary (individual words and multi-word units) that would be relevant to language learning.

In this database, you will find information on how the described words and multi-word units are used in present-day Lithuanian: how they are spelt and pronounced, which forms they most often take, and which lexical and grammatical patterns they typically occur in. Corpus patterns are provided alongside examples to reveal the usage regularities of the most frequent lexical items; this approach aims to show how the different meanings of a word relate to the lexical and grammatical environment in which they appear. The database does not provide explicit definitions of word meanings; instead, the regularities recorded in corpus patterns help to distinguish and understand different word meanings. This principle of distinguishing and describing meaning comes from corpus linguistics, where the meaning of a word is understood as arising from the word together with its immediate context.

This corpus-based lexicographic resource was developed by a team of researchers at the Center of Computational Linguistics at Vytautas Magnus University:

Jolanta Kovalevskaitė (Research Team Coordinator),

Agnė Bielinskienė,

Loic Boizou,

Laima Jancaitė,

Erika Rimkutė.

The audio recordings as well as data on stress and pronunciation were prepared by Asta Kazlauskienė and Sigita Dereškevičiūtė.

Web Developer: Petras Pauliūnas

If you notice any inaccuracies in the lexical database or if you have any questions or suggestions, please write to: jolanta.kovalevskaite@vdu.lt.

Thank you!


Search form v.4.7, Lexicon DB v.3.3 (2021-11-22 11:13)