Danijel Koržinek
Polish-Japanese Academy of Information Technology, Warsaw, Poland
Krzysztof Marasek
Polish-Japanese Academy of Information Technology, Warsaw, Poland
Łukasz Brocki
Polish-Japanese Academy of Information Technology, Warsaw, Poland
Krzysztof Wołk
Polish-Japanese Academy of Information Technology, Warsaw, Poland
Published in: Selected papers from the CLARIN Annual Conference 2016, Aix-en-Provence, 26–28 October 2016, CLARIN Common Language Resources and Technology Infrastructure
Linköping Electronic Conference Proceedings 136:4, p. 54-62
Published: 2017-05-23
ISBN: 978-91-7685-499-0
ISSN: 1650-3686 (print), 1650-3740 (online)
This paper describes the speech processing activities conducted by the Polish consortium of the CLARIN project. The purpose of this segment of the project was to develop specific tools that would allow for automatic and semi-automatic processing of large quantities of acoustic speech data. The tools include the following: grapheme-to-phoneme conversion, speech-to-text alignment, voice activity detection, speaker diarization, keyword spotting and automatic speech transcription. Furthermore, in order to develop these tools, a large high-quality studio speech corpus was recorded and released under an open license, to encourage development in the area of Polish speech research. Another purpose of the corpus was to serve as a reference for studies in phonetics and pronunciation. All the tools and resources were released on the Polish CLARIN website. This paper discusses the current status and future plans for the project.
Speech corpora, speech recognition, speech alignment, grapheme-to-phoneme, speaker diarization, voice activity detection, keyword spotting