Diversify Your Datasets: Analyzing Generalization via Controlled Variance in Adversarial Datasets

Rozen, Ohad; Shwartz, Vered; Aharoni, Roee; Dagan, Ido

arXiv:1910.09302 (cs)

[Submitted on 21 Oct 2019]

Abstract: Phenomenon-specific "adversarial" datasets have recently been designed to perform targeted stress tests for particular inference types. Recent work (Liu et al., 2019a) proposed that such datasets can be used for training NLI and other types of models, often allowing the model to learn the phenomenon in focus and improve on the challenge dataset, indicating a "blind spot" in the original training data. Yet, although a model can improve through such training, it might still be vulnerable to other challenge datasets that target the same phenomenon but are drawn from a different distribution, for example one with a different level of syntactic complexity. In this work, we extend this method to draw conclusions about a model's ability to learn and generalize a target phenomenon, rather than to "learn" a dataset, by controlling additional aspects of the adversarial datasets. We demonstrate our approach on two inference phenomena, dative alternation and numerical reasoning, elaborating on, and in some cases contradicting, the results of Liu et al. Our methodology enables building better challenge datasets for creating more robust models, and may yield better model understanding and subsequent overarching improvements.
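
The idea of controlling one aspect of an adversarial dataset can be illustrated with a minimal sketch. This is not the authors' code: the field names (`label`, `complexity`), the helper functions, and the toy data are all hypothetical, and the constant predictions stand in for the output of a real model. The sketch trains on examples with one value of the controlled attribute and reports accuracy separately for each value, so that a gap between the seen and held-out groups becomes visible.

```python
from collections import defaultdict


def split_by_attribute(examples, attribute, train_values):
    """Send examples whose controlled attribute is in train_values to the
    training split and all remaining examples to the held-out split."""
    train = [ex for ex in examples if ex[attribute] in train_values]
    held_out = [ex for ex in examples if ex[attribute] not in train_values]
    return train, held_out


def accuracy_by_group(examples, predictions, attribute):
    """Compute accuracy separately for each value of the controlled attribute."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex, pred in zip(examples, predictions):
        group = ex[attribute]
        total[group] += 1
        correct[group] += int(pred == ex["label"])
    return {group: correct[group] / total[group] for group in total}


# Hypothetical toy challenge set; "complexity" is the controlled attribute.
toy = [
    {"label": "entailment", "complexity": "simple"},
    {"label": "neutral", "complexity": "simple"},
    {"label": "entailment", "complexity": "complex"},
    {"label": "neutral", "complexity": "complex"},
]

# Train only on the "simple" examples; keep "complex" ones for evaluation.
train, held_out = split_by_attribute(toy, "complexity", {"simple"})

# Placeholder predictions (always "entailment") instead of a trained model.
predictions = ["entailment"] * len(toy)
scores = accuracy_by_group(toy, predictions, "complexity")
print(scores)
```

In an actual experiment the placeholder predictions would come from a model fine-tuned on `train`, and a large difference between the per-group scores would suggest the model learned the dataset rather than the phenomenon.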

Comments:	CoNLL 2019
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1910.09302 [cs.CL]
	(or arXiv:1910.09302v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1910.09302

Submission history

From: Ohad Rozen
[v1] Mon, 21 Oct 2019 12:34:53 UTC (1,742 KB)
