DOI: 10.1145/2766462.2767788

On the Reusability of Open Test Collections

Published: 09 August 2015

Abstract

Creating test collections for modern search tasks is increasingly challenging due to the growing scale and dynamic nature of content and the need for richer contextualization of the statements of request. To address these issues, the TREC Contextual Suggestion Track explored an open test collection, where participants were allowed to submit any web page as a result for a personalized venue recommendation task. This prompts the question of the reusability of the resulting test collection: How does the open nature affect the pooling process? Can participants reliably evaluate variant runs with the resulting qrels? Can other teams reliably evaluate new runs? In short, does the set of pooled and judged documents effectively produce a post hoc test collection? Our main findings are the following. First, while there is a strongly significant rank correlation, the effect of pooling is notable and results in underestimation of performance, implying that the evaluation of non-pooled systems should be done with great care. Second, we extensively analyze the impact of the open corpus on the fraction of judged documents, explaining how low recall affects reusability, and how personalization and low pooling depth aggravate that problem. Third, we outline a potential solution by deriving a fixed corpus from the open web submissions.
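The first finding, that pooling leads to underestimated performance for systems that did not contribute to the pool, is typically quantified with a leave-one-team-out analysis: remove one team's unique pool contributions from the qrels, re-score every run, and compare the resulting system ranking to the original one with a rank correlation such as Kendall's tau. The sketch below illustrates that general procedure; the data structures (runs, qrels, contributed_by) and the precision@k scorer are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a leave-one-team-out reusability check.
# Assumptions (not from the paper):
#   runs:           {run_name: {topic_id: [doc_id, ...] ranked list}}
#   qrels:          {topic_id: set of relevant doc_ids}
#   contributed_by: {(topic_id, doc_id): set of team names that pooled it}
from scipy.stats import kendalltau


def precision_at_k(ranking, relevant, k=5):
    """Precision@k over a ranked list; unjudged documents count as non-relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def score_run(run, qrels, k=5):
    """Mean precision@k of one run over all topics that have judgments."""
    scores = [precision_at_k(run[topic], qrels[topic], k)
              for topic in qrels if topic in run]
    return sum(scores) / len(scores)


def leave_one_team_out(runs, qrels, contributed_by):
    """For each team, rebuild the qrels without that team's unique pool
    contributions, re-score every run, and report Kendall's tau between
    the system scores under the reduced and the full qrels."""
    names = list(runs)
    full_scores = {name: score_run(runs[name], qrels) for name in names}
    teams = {team for team_set in contributed_by.values() for team in team_set}
    taus = {}
    for team in teams:
        # Keep only relevant documents that some *other* team also pooled.
        reduced = {
            topic: {doc for doc in relevant
                    if contributed_by.get((topic, doc), set()) - {team}}
            for topic, relevant in qrels.items()
        }
        reduced_scores = {name: score_run(runs[name], reduced) for name in names}
        tau, _ = kendalltau([full_scores[n] for n in names],
                            [reduced_scores[n] for n in names])
        taus[team] = tau
    return taus
```

A low tau for some team would indicate that the judgments depend heavily on that team's pool contributions, which is precisely the reusability concern the abstract raises.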





Information

Published In

SIGIR '15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
August 2015
1198 pages
ISBN:9781450336215
DOI:10.1145/2766462
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 August 2015


Author Tags

  1. contextual suggestion
  2. evaluation
  3. test collection reusability

Qualifiers

  • Short-paper


Conference

SIGIR '15

Acceptance Rates

  • SIGIR '15 Paper Acceptance Rate: 70 of 351 submissions, 20%
  • Overall Acceptance Rate: 792 of 3,983 submissions, 20%


Cited By

  • (2023) Formally Modeling Users in Information Retrieval. In: A Behavioral Economics Approach to Interactive Information Retrieval, pp. 23-64. DOI: 10.1007/978-3-031-23229-9_2. Online publication date: 18-Feb-2023.
  • (2022) Toward Cranfield-inspired reusability assessment in interactive information retrieval evaluation. Information Processing & Management, 59(5):103007. DOI: 10.1016/j.ipm.2022.103007. Online publication date: Sep-2022.
  • (2017) On the Reusability of Personalized Test Collections. In: Adjunct Publication of the 25th Conference on User Modeling, Adaptation and Personalization, pp. 185-189. DOI: 10.1145/3099023.3099044. Online publication date: 9-Jul-2017.
  • (2016) Effects of Position and Time Bias on Understanding Onsite Users' Behavior. In: Proceedings of the 2016 ACM on Conference on Human Information Interaction and Retrieval, pp. 277-280. DOI: 10.1145/2854946.2855004. Online publication date: 13-Mar-2016.
  • (2016) Contextual Search and Exploration. In: Information Retrieval, pp. 3-23. DOI: 10.1007/978-3-319-41718-9_1. Online publication date: 26-Jul-2016.
