Attentional Parallel RNNs for Generating Punctuation in Transcribed Speech

  • Conference paper
Statistical Language and Speech Processing (SLSP 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10583)

Abstract

Until very recently, punctuation marks for automatic speech recognition (ASR) output have mostly been generated by looking at the syntactic structure of the recognized utterances. Prosodic cues such as breaks, speech rate, and pitch intonation, which influence the placement of punctuation marks in speech transcripts, have seldom been used. We propose a method that uses recurrent neural networks and takes both prosodic and lexical information into account to predict punctuation marks for raw ASR output. Our experiments show that an attention mechanism over parallel sequences of prosodic cues aligned with transcribed speech improves the accuracy of punctuation generation.
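The idea of attending over prosodic cues aligned with words can be illustrated with a small sketch. This is not the authors' model: it leaves out the recurrent layers and any trained parameters, and shows only plain dot-product attention over per-word vectors. The function names, the toy feature values, and the choice of pause length and pitch as prosodic features are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def align(word_vectors, prosody_vectors):
    """Concatenate each word vector with the prosodic features of the same word."""
    if len(word_vectors) != len(prosody_vectors):
        raise ValueError("parallel sequences must have the same length")
    return [w + p for w, p in zip(word_vectors, prosody_vectors)]


def attend(query, states):
    """Dot-product attention: return (weights, context vector)."""
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [
        sum(w * state[i] for w, state in zip(weights, states))
        for i in range(dim)
    ]
    return weights, context


# Toy, made-up inputs: a 2-d vector per word and
# [pause_seconds, normalized_pitch] per word (hypothetical features).
word_vectors = [[0.1, 0.3], [0.4, -0.2], [0.0, 0.5]]
prosody_vectors = [[0.05, 0.2], [0.60, -1.1], [0.02, 0.4]]

states = align(word_vectors, prosody_vectors)

# Hypothetical query that scores each position by its pause length only.
query = [0.0, 0.0, 1.0, 0.0]
weights, context = attend(query, states)
```

In this toy setup the second word is followed by the longest pause, so it receives the largest attention weight. In a trained model the query and the per-word states would come from learned recurrent layers rather than hand-picked values.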

Notes

  1. http://www.ted.com

Acknowledgements

We would like to thank Francesco Barbieri for offering his technical insights throughout this work. This work is part of the KRISTINA project, which has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under the Grant Agreement number H2020-RIA-645012. The second author is partially funded by the Spanish Ministry of Economy, Industry and Competitiveness through the Ramón y Cajal program.

Author information

Corresponding author

Correspondence to Alp Öktem.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Öktem, A., Farrús, M., Wanner, L. (2017). Attentional Parallel RNNs for Generating Punctuation in Transcribed Speech. In: Camelin, N., Estève, Y., Martín-Vide, C. (eds) Statistical Language and Speech Processing. SLSP 2017. Lecture Notes in Computer Science, vol 10583. Springer, Cham. https://doi.org/10.1007/978-3-319-68456-7_11

  • DOI: https://doi.org/10.1007/978-3-319-68456-7_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68455-0

  • Online ISBN: 978-3-319-68456-7

  • eBook Packages: Computer Science (R0)
