Macrosociolinguistics and Minority Languages

2949-5997

Peoples' Friendship University of Russia

50694

10.22363/2949-5997-2025-3-2-131-145

HHYPZI

The Languages of the Peoples of the Russian Federation: Digital Documentation Tools and Media Accessibility

Языки народов Российской Федерации: цифровые инструменты документирования и медиадоступность

Research Article

Production of audiobooks in the languages of the peoples of Russia using speech synthesizers: problems and prospects

Производство аудиокниг на языках народов России с использованием синтезаторов речи: проблемы и перспективы

https://orcid.org/0000-0002-5006-5975

5459-0852

Pozhidaev

Mikhail S.

Пожидаев

Михаил Сергеевич

Ph.D. in computer science, Associate Professor at the Department of Theoretical Foundations of Computer Science at the Institute of Applied Mathematics and Computer Science

кандидат технических наук, доцент кафедры теоретических основ информатики института прикладной математики и компьютерных наук

msp@luwrain.org

https://orcid.org/0000-0002-1825-7379

7424-5366

Teplykh

Elena S.

Теплых

Елена Сергеевна

Psychologist, Junior Researcher at the Laboratory of Interdisciplinary Research

психолог, младший научный сотрудник лаборатории междисциплинарных исследований

elena@luwrain.org

https://orcid.org/0009-0005-6954-6874

6954-5385

Danilov

Sergey I.

Данилов

Сергей Ильич

PhD student at the Department of General and Russian Linguistics, Faculty of Philology

аспирант кафедры общего и русского языкознания филологического факультета

1042250116@rudn.ru

National Research Tomsk State University / Национальный исследовательский Томский государственный университет

RUDN University / Российский университет дружбы народов

2026-06-17

131–145

2026-06-18

2025

Pozhidaev M.S., Teplykh E.S., Danilov S.I.

Пожидаев М.С., Теплых Е.С., Данилов С.И.

http://creativecommons.org/licenses/by/4.0

https://macrosociolingusictics.ru/MML/article/view/50694

The creation of audiobooks in the languages of the peoples of Russia using speech synthesizers is a scientifically and socially significant task. The relevance of the research is driven by the development of speech technologies and by state policies supporting linguistic diversity, including in the digital space. The study examines a standard algorithm for audiobook creation and distinguishes between invariant and language-specific development stages. The main difficulties are associated with the stages that require linguistic adaptation of the text for speech synthesis: annotation and the expansion of abbreviations and acronyms. For low-resource languages, segmentation, tokenization, and contextual annotation, including the processing of homographs and language-specific phonetic features, pose particular challenges. The study concludes that full automation of audiobook creation for the languages of Russia’s peoples is not feasible at the current stage of speech synthesis technology. Developing audiobooks in such languages requires the prior creation of specialized linguistic resources; a necessary condition is a parallel corpus of texts and audio recordings produced by native speakers. The successful implementation of such projects therefore demands significant preliminary work on compiling training datasets and adapting algorithms to the specific features of each language.

Создание аудиокниг на языках народов России с применением синтезаторов речи - научно и социально значимая задача. Актуальность исследования обусловлена развитием речевых технологий и государственной политикой поддержки языкового разнообразия в т. ч. в цифровом пространстве. Рассмотрен типовой алгоритм создания аудиокниги, выделены инвариантные и лингво-специфичные этапы разработки. Отмечено, что основные сложности связаны с этапами, требующими языковой адаптации текста к озвучиванию синтезатором речи: аннотированием, расшифровкой аббревиатур и сокращений. Для малоресурсных языков особую проблему представляют задачи сегментации, токенизации и контекстного аннотирования, включая обработку омографов и фонетических особенностей конкретных языков. Сделан вывод о невозможности полной автоматизации процесса создания аудиокниг на языках народов России с использованием синтезаторов речи на данном этапе развития этой технологии. Создание аудиокниг на таких языках требует предварительной разработки специализированных лингвистических ресурсов. Необходимым условием является формирование параллельного корпуса текстов и аудиозаписей, созданных носителями языка. Таким образом, успешная реализация подобных проектов требует значительных предварительных работ по сбору обучающих датасетов и адаптации алгоритмов под специфику конкретного языка.
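The language-specific stages named in the abstract (expansion of abbreviations, segmentation, tokenization, and flagging of homographs for annotation) can be illustrated with a minimal Python sketch. It is not the authors' algorithm. The abbreviation table, the homograph list, the function names, and the use of Russian as the sample language are all assumptions made for illustration; a real project would need such resources compiled separately for each language.

```python
import re

# Hypothetical hand-made resources for one language (Russian is used only
# as an illustration); the article does not specify any such tables.
ABBREVIATIONS = {"в т. ч.": "в том числе", "т. е.": "то есть"}
HOMOGRAPHS = {"замок", "мука"}  # word forms whose stress depends on context


def expand_abbreviations(text, table=ABBREVIATIONS):
    """Replace known abbreviations with their full spoken form."""
    # Longest entries first, so longer abbreviations win over shorter ones.
    for abbr in sorted(table, key=len, reverse=True):
        # The lookbehind stops a match from starting in the middle of a word.
        pattern = r"(?<!\w)" + re.escape(abbr)
        text = re.sub(pattern, lambda m, full=table[abbr]: full, text)
    return text


def prepare_for_tts(text):
    """Expand abbreviations, split into sentences, tokenize, flag homographs."""
    # Abbreviations are expanded first so their periods do not split sentences.
    text = expand_abbreviations(text)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    prepared = []
    for sentence in sentences:
        tokens = re.findall(r"\w+", sentence)
        # Homographs are only flagged; resolving them needs context or a human.
        flagged = [t for t in tokens if t.lower() in HOMOGRAPHS]
        prepared.append(
            {"sentence": sentence, "tokens": tokens, "needs_review": flagged}
        )
    return prepared


if __name__ == "__main__":
    sample = "Книги, в т. ч. сказки, читают вслух. Старый замок стоит на горе."
    for item in prepare_for_tts(sample):
        print(item["sentence"], item["needs_review"])
```

The sketch deliberately leaves homographs unresolved and only marks them for review, which reflects the abstract's point that contextual annotation cannot yet be fully automated. It also does not handle an abbreviation whose final period doubles as the end of a sentence, one of the cases that makes segmentation language-specific.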

minority languages; low-resource languages; machine learning; text recognition; speech synthesis; recurrent neural networks

миноритарные языки; малоресурсные языки; машинное обучение; распознавание текста; синтезирование речи; рекуррентные нейронные сети
