Investigation of VITS Text-to-Speech for the Lithuanian Language
Articles
Vytautas Lėveris
Vilnius University
Gražina Korvel
Vilnius University
Published 2026-05-08
https://doi.org/10.15388/LMITT.2026.15

Keywords

Text-to-Speech
Lithuanian language
VITS
Speech Synthesis
Phoneme-based modeling

Abstract

This study investigates the performance of the VITS model for Lithuanian speech synthesis under different training configurations. Experiments were conducted using datasets with phoneme-based and grapheme-based text representations, accented text, and both single-speaker and multi-speaker setups. The goal was to evaluate how linguistic pre-processing and speaker diversity influence synthesis quality. Model outputs were compared using objective measures. The results provide insights into the impact of phoneme representation and accent information on the quality of Lithuanian neural TTS systems.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 International License.
