Co-Separating Sounds of Visual Objects

Gao, Ruohan; Grauman, Kristen

arXiv:1904.07750 (cs)

[Submitted on 16 Apr 2019 (v1), last revised 20 Aug 2019 (this version, v2)]

Abstract: Learning how objects sound from video is challenging, since they often heavily overlap in a single audio channel. Current methods for visually-guided audio source separation sidestep the issue by training with artificially mixed video clips, but this puts unwieldy restrictions on training data collection and may even prevent learning the properties of "true" mixed sounds. We introduce a co-separation training paradigm that permits learning object-level sounds from unlabeled multi-source videos. Our novel training objective requires that the deep neural network's separated audio for similar-looking objects be consistently identifiable, while simultaneously reproducing accurate video-level audio tracks for each source training pair. Our approach disentangles sounds in realistic test videos, even in cases where an object was not observed individually during training. We obtain state-of-the-art results on visually-guided audio source separation and audio denoising for the MUSIC, AudioSet, and AV-Bench datasets.
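
The two requirements in the abstract (video-level tracks must be reproduced, and separated audio for similar-looking objects must be identifiable) can be illustrated with a toy two-term objective. This is only a sketch under stated assumptions, not the authors' implementation: the function names, the flattened spectrogram masks, the L1 distance, and the cross-entropy term are all illustrative choices that this page does not confirm.

```python
import math

def l1(a, b):
    """Mean absolute difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def co_separation_loss(pred_masks, owner, target_masks):
    """Hypothetical video-level term.

    pred_masks[i]  : predicted mask for detected object i (flattened bins)
    owner[i]       : index of the training video that object i came from
    target_masks[v]: share of the mixed spectrogram that belongs to video v
    The masks of all objects from one video are summed and compared with
    that video's target mask.
    """
    total = 0.0
    for v, target in enumerate(target_masks):
        summed = [0.0] * len(target)
        for mask, o in zip(pred_masks, owner):
            if o == v:
                summed = [s + m for s, m in zip(summed, mask)]
        total += l1(summed, target)
    return total

def consistency_loss(class_probs, labels):
    """Hypothetical object-level term: cross-entropy between a classifier's
    probabilities on each separated sound and the visual label of the
    object that guided the separation."""
    return -sum(math.log(p[y]) for p, y in zip(class_probs, labels)) / len(labels)

# Toy pair of videos mixed into one track (4 spectrogram bins, made-up values).
spec_a = [2.0, 0.0, 1.0, 1.0]          # video 0: two visible objects
spec_b = [0.0, 2.0, 1.0, 3.0]          # video 1: one visible object
mix = [a + b for a, b in zip(spec_a, spec_b)]
targets = [[a / m for a, m in zip(spec_a, mix)],
           [b / m for b, m in zip(spec_b, mix)]]

owner = [0, 0, 1]                      # object -> source video
perfect = [[0.5, 0.0, 0.5, 0.0],       # object 0 (video 0)
           [0.5, 0.0, 0.0, 0.25],      # object 1 (video 0)
           [0.0, 1.0, 0.5, 0.75]]      # object 2 (video 1)
leaky = [perfect[0], perfect[1], [0.5, 1.0, 0.5, 0.75]]

loss_perfect = co_separation_loss(perfect, owner, targets)
loss_leaky = co_separation_loss(leaky, owner, targets)
loss_cls = consistency_loss([[0.5, 0.5]], [0])
```

In this toy setup no single-object ground truth is needed: only the per-video tracks supervise the summed masks, while the classification term pushes each individual mask toward a sound that matches its visual category.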

Comments:	ICCV 2019, Project page: this http URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:1904.07750 [cs.CV]
	(or arXiv:1904.07750v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1904.07750

Submission history

From: Ruohan Gao
[v1] Tue, 16 Apr 2019 15:07:50 UTC (2,655 KB)
[v2] Tue, 20 Aug 2019 21:18:03 UTC (2,754 KB)

