Attentional Parallel RNNs for Generating Punctuation in Transcribed Speech

Öktem, Alp; Farrús, Mireia; Wanner, Leo

doi:10.1007/978-3-319-68456-7_11


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10583)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing


Abstract

Until very recently, punctuation marks for automatic speech recognition (ASR) output have mostly been generated by looking at the syntactic structure of the recognized utterances. Prosodic cues such as breaks, speech rate, and pitch intonation, which influence the placement of punctuation marks in speech transcripts, have seldom been used. We propose a method that uses recurrent neural networks and takes both prosodic and lexical information into account to predict punctuation marks for raw ASR output. Our experiments show that an attention mechanism over parallel sequences of prosodic cues aligned with the transcribed speech improves the accuracy of punctuation generation.
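The abstract does not spell out the architecture, so the following is only a minimal sketch of what an attention mechanism over parallel lexical and prosodic sequences can look like. It assumes Bahdanau-style additive attention and assumes that each word's encoder state is the concatenation of a lexical vector and a vector of prosodic features (for example, pause duration and a normalized pitch value). All function names, dimensions, weights, and feature values are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(states: List[Vector], query: Vector,
                       W: Matrix, U: Matrix, v: Vector):
    """Additive (Bahdanau-style) attention: score_i = v . tanh(W h_i + U q).

    states: one vector per word; here (an assumption) the concatenation of
            a lexical vector and a prosodic-feature vector for that word.
    query:  the current state of the hypothetical punctuation classifier.
    Returns (weights, context): the attention distribution over words and
    the weighted sum of the word states.
    """
    projected_query = matvec(U, query)
    scores = []
    for h in states:
        hidden = [math.tanh(a + b) for a, b in zip(matvec(W, h), projected_query)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * h[k] for w, h in zip(weights, states)) for k in range(dim)]
    return weights, context


# Toy usage: three words, 2-d lexical vectors plus 2 hypothetical prosodic
# features per word ([pause_seconds, pitch_zscore]), run as parallel streams.
lexical = [[0.1, 0.3], [0.4, -0.2], [0.0, 0.5]]
prosodic = [[0.05, 0.2], [0.60, -0.1], [0.02, 0.3]]
states = [lx + pr for lx, pr in zip(lexical, prosodic)]  # 4-d per word

# Hypothetical (untrained) parameters: W maps 4-d states to 2-d, U maps the
# 2-d query to 2-d, and v reduces the 2-d hidden vector to a scalar score.
W = [[0.2, -0.1, 0.5, 0.3], [-0.4, 0.2, 0.1, 0.6]]
U = [[0.3, 0.1], [0.0, -0.2]]
v = [1.0, -0.5]
query = [0.1, 0.2]

weights, context = additive_attention(states, query, W, U, v)
print([round(w, 3) for w in weights])
```

In a trained model, W, U and v would be learned jointly with the recurrent layers, so that words followed by long pauses or pitch resets can receive higher attention weights when the classifier decides whether to emit a punctuation mark; with the untrained toy values above, the weights only illustrate the mechanics.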


Notes

1. http://www.ted.com


Acknowledgements

We would like to thank Francesco Barbieri for offering his technical insights throughout this work. This work is part of the KRISTINA project, which has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under the Grant Agreement number H2020-RIA-645012. The second author is partially funded by the Spanish Ministry of Economy, Industry and Competitiveness through the Ramón y Cajal program.

Author information

Authors and Affiliations

Universitat Pompeu Fabra, Barcelona, Spain
Alp Öktem, Mireia Farrús & Leo Wanner
Catalan Institute for Research and Advanced Studies (ICREA), Barcelona, Spain
Leo Wanner


Corresponding author

Correspondence to Alp Öktem.

Editor information

Editors and Affiliations

University of Le Mans, Le Mans, France
Nathalie Camelin
University of Le Mans, Le Mans, France
Yannick Estève
Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide


About this paper

Cite this paper

Öktem, A., Farrús, M., Wanner, L. (2017). Attentional Parallel RNNs for Generating Punctuation in Transcribed Speech. In: Camelin, N., Estève, Y., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2017. Lecture Notes in Computer Science, vol 10583. Springer, Cham. https://doi.org/10.1007/978-3-319-68456-7_11


DOI: https://doi.org/10.1007/978-3-319-68456-7_11
Published: 27 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68455-0
Online ISBN: 978-3-319-68456-7
eBook Packages: Computer Science, Computer Science (R0)
