Singing Synthesis: with a little help from my attention

Angelini, Orazio; Moinet, Alexis; Yanagisawa, Kayoko; Drugman, Thomas

arXiv:1912.05881 (eess)

[Submitted on 12 Dec 2019 (v1), last revised 6 May 2020 (this version, v2)]

Abstract: We present UTACO, a singing synthesis model based on an attention-based sequence-to-sequence mechanism and a vocoder based on dilated causal convolutions. These two classes of models have significantly affected the field of text-to-speech, but have never been thoroughly applied to the task of singing synthesis. UTACO demonstrates that attention can be successfully applied to the singing synthesis field and improves naturalness over the state of the art. The system requires considerably less explicit modelling of voice features, such as F0 patterns, vibratos, and note and phoneme durations, than previous models in the literature. Despite this, it shows a strong improvement in naturalness with respect to previous neural singing synthesis models. The model does not require any durations or pitch patterns as inputs, and learns to insert vibrato autonomously according to the musical context. However, we observe that, by completely dispensing with any explicit duration modelling, it becomes harder to obtain the fine control of timing needed to exactly match the tempo of a song.

Comments:	Submitted to Interspeech 2020
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:1912.05881 [eess.AS]
	(or arXiv:1912.05881v2 [eess.AS] for this version)
	https://doi.org/10.48550/arXiv.1912.05881

Submission history

From: Orazio Angelini
[v1] Thu, 12 Dec 2019 11:17:30 UTC (255 KB)
[v2] Wed, 6 May 2020 12:12:45 UTC (344 KB)
