DOI: 10.1145/2766462.2767788

On the Reusability of Open Test Collections

Published: 09 August 2015

Abstract

Creating test collections for modern search tasks is increasingly challenging due to the growing scale and dynamic nature of content and the need for richer contextualization of the statements of request. To address these issues, the TREC Contextual Suggestion Track explored an open test collection, where participants were allowed to submit any web page as a result for a personalized venue recommendation task. This prompts the question of the reusability of the resulting test collection: How does the open nature affect the pooling process? Can participants reliably evaluate variant runs with the resulting qrels? Can other teams reliably evaluate new runs? In short, does the set of pooled and judged documents effectively produce a post hoc test collection? Our main findings are the following. First, while there is a strongly significant rank correlation, the effect of pooling is notable and results in underestimation of performance, implying that the evaluation of non-pooled systems should be done with great care. Second, we extensively analyze the impact of the open corpus on the fraction of judged documents, explaining how low recall affects reusability, and how personalization and low pooling depth aggravate that problem. Third, we outline a potential solution by deriving a fixed corpus from the open web submissions.
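The first finding, that pooling leads to underestimated performance for systems that did not contribute to the pool, is typically quantified with a leave-one-team-out analysis: remove one team's unique pool contributions from the qrels, re-score every run, and compare the resulting system ranking to the original one with a rank correlation such as Kendall's tau. The sketch below illustrates that general procedure; the data structures (runs, qrels, contributed_by) and the precision@k scorer are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a leave-one-team-out reusability check.
# Assumptions (not from the paper):
#   runs:           {run_name: {topic_id: [doc_id, ...] ranked list}}
#   qrels:          {topic_id: set of relevant doc_ids}
#   contributed_by: {(topic_id, doc_id): set of team names that pooled it}
from scipy.stats import kendalltau


def precision_at_k(ranking, relevant, k=5):
    """Precision@k over a ranked list; unjudged documents count as non-relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def score_run(run, qrels, k=5):
    """Mean precision@k of one run over all topics that have judgments."""
    scores = [precision_at_k(run[topic], qrels[topic], k)
              for topic in qrels if topic in run]
    return sum(scores) / len(scores)


def leave_one_team_out(runs, qrels, contributed_by):
    """For each team, rebuild the qrels without that team's unique pool
    contributions, re-score every run, and report Kendall's tau between
    the system scores under the reduced and the full qrels."""
    names = list(runs)
    full_scores = {name: score_run(runs[name], qrels) for name in names}
    teams = {team for team_set in contributed_by.values() for team in team_set}
    taus = {}
    for team in teams:
        # Keep only relevant documents that some *other* team also pooled.
        reduced = {
            topic: {doc for doc in relevant
                    if contributed_by.get((topic, doc), set()) - {team}}
            for topic, relevant in qrels.items()
        }
        reduced_scores = {name: score_run(runs[name], reduced) for name in names}
        tau, _ = kendalltau([full_scores[n] for n in names],
                            [reduced_scores[n] for n in names])
        taus[team] = tau
    return taus
```

A low tau for some team would indicate that the judgments depend heavily on that team's pool contributions, which is precisely the reusability concern the abstract raises.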





Information

Published In

SIGIR '15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
August 2015
1198 pages
ISBN:9781450336215
DOI:10.1145/2766462
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 August 2015


Author Tags

  1. contextual suggestion
  2. evaluation
  3. test collection reusability

Qualifiers

  • Short-paper


Conference

SIGIR '15

Acceptance Rates

  • SIGIR '15 Paper Acceptance Rate: 70 of 351 submissions, 20%
  • Overall Acceptance Rate: 792 of 3,983 submissions, 20%


Cited By

  • (2023) Formally Modeling Users in Information Retrieval. In: A Behavioral Economics Approach to Interactive Information Retrieval, pp. 23-64. DOI: 10.1007/978-3-031-23229-9_2. Online publication date: 18-Feb-2023.
  • (2022) Toward Cranfield-inspired reusability assessment in interactive information retrieval evaluation. Information Processing & Management, 59(5):103007. DOI: 10.1016/j.ipm.2022.103007. Online publication date: Sep-2022.
  • (2017) On the Reusability of Personalized Test Collections. In: Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization, pp. 185-189. DOI: 10.1145/3099023.3099044. Online publication date: 9-Jul-2017.
  • (2016) Effects of Position and Time Bias on Understanding Onsite Users' Behavior. In: Proceedings of the 2016 ACM on Conference on Human Information Interaction and Retrieval, pp. 277-280. DOI: 10.1145/2854946.2855004. Online publication date: 13-Mar-2016.
  • (2016) Contextual Search and Exploration. In: Information Retrieval, pp. 3-23. DOI: 10.1007/978-3-319-41718-9_1. Online publication date: 26-Jul-2016.
