DOI: 10.1145/3397271.3401096
research-article

Agreement and Disagreement between True and False-Positive Metrics in Recommender Systems Evaluation

Published: 25 July 2020

Abstract

False-positive metrics can capture an important side of recommendation quality, focusing on the impact of suggestions that are disliked by users, as a complement to common metrics that only measure the amount of successful recommendations. In this paper we study the extent to which false-positive metrics agree or disagree with true-positive metrics in the offline evaluation of recommender systems. We discover a surprising degree of systematic disagreement that was occasionally noted, but not explained, by previous authors in the literature. We find an explanation for the discrepancy between the metrics in the effect of popularity biases, which impact false-positive and true-positive metrics in very different ways: instead of rewarding the recommendation of popular items, as true-positive metrics do, false-positive metrics penalize the popular. We determine precise conditions and cases in the general trends, with a formal explanation for our findings, which we confirm and illustrate empirically in experiments with different datasets.

Supplementary Material

MP4 File (3397271.3401096.mp4)
We present some effects of popularity biases in recommender systems evaluation for true and false-positive metrics. First, we discuss a different perspective for evaluation: the point of view of the count of false positives. Second, we describe the computation of precision and anti-precision under incomplete (MNAR) and complete relevance knowledge (MAR). Third, we illustrate the effects of popularity biases using the MovieLens and CM100K datasets. Finally, we summarize the presentation, present some conclusions, and describe the other findings explained further in our paper.
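
As a rough illustration of the two metric families named above, the following sketch computes precision and anti-precision at a cutoff over a ranked list. It is a minimal example under stated assumptions, not the paper's implementation: the function names, the toy data, and the convention that unobserved items count toward neither metric are all assumptions made here for illustration.

```python
from typing import Dict, List


def precision_at_k(ranking: List[str], judgments: Dict[str, bool], k: int) -> float:
    """Fraction of the top-k items the user is known to like (true positives)."""
    top = ranking[:k]
    return sum(1 for item in top if judgments.get(item) is True) / k


def anti_precision_at_k(ranking: List[str], judgments: Dict[str, bool], k: int) -> float:
    """Fraction of the top-k items the user is known to dislike (false positives)."""
    top = ranking[:k]
    return sum(1 for item in top if judgments.get(item) is False) / k


# Hypothetical toy data: True = liked, False = disliked, absent = unobserved.
judgments = {"a": True, "b": False, "c": True, "d": False}
ranking = ["a", "b", "x", "d", "c"]

p = precision_at_k(ranking, judgments, 4)        # only "a" is liked in the top 4
ap = anti_precision_at_k(ranking, judgments, 4)  # "b" and "d" are disliked in the top 4
print(p, ap)
```

With incomplete judgments the two values need not sum to one: here the unobserved item "x" counts toward neither metric, so a system can lower its anti-precision without raising its precision.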

      Published In

      SIGIR '20: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval
      July 2020
      2548 pages
      ISBN:9781450380164
      DOI:10.1145/3397271

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. evaluation
      2. false positives
      3. metrics
      4. non-random missing data
      5. popularity bias
      6. recommender systems

      Qualifiers

      • Research-article

      Funding Sources

      • Australian Technology Network

      Conference

      SIGIR '20

      Acceptance Rates

      Overall Acceptance Rate 792 of 3,983 submissions, 20%

      Cited By

      • (2024) Exploring the Landscape of Recommender Systems Evaluation: Practices and Perspectives. ACM Transactions on Recommender Systems 2:1 (1-31). DOI: 10.1145/3629170. Online publication date: 7-Mar-2024
      • (2024) Negative Feedback for Music Personalization. Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization (195-200). DOI: 10.1145/3627043.3659553. Online publication date: 22-Jun-2024
      • (2024) A survey on popularity bias in recommender systems. User Modeling and User-Adapted Interaction. DOI: 10.1007/s11257-024-09406-0. Online publication date: 1-Jul-2024
      • (2023) Continuous Integration and Delivery Practices for Cyber-Physical Systems: An Interview-Based Study. ACM Transactions on Software Engineering and Methodology 32:3 (1-44). DOI: 10.1145/3571854. Online publication date: 26-Apr-2023
      • (2023) A Critical Study on Data Leakage in Recommender System Offline Evaluation. ACM Transactions on Information Systems 41:3 (1-27). DOI: 10.1145/3569930. Online publication date: 7-Feb-2023
      • (2023) Reconciling the Quality vs Popularity Dichotomy in Online Cultural Markets. ACM Transactions on Information Systems 41:1 (1-34). DOI: 10.1145/3530790. Online publication date: 9-Jan-2023
      • (2023) Simulation of Database Interactions for Early Validation of Digitized Enterprise Processes. Procedia Computer Science 219:C (658-665). DOI: 10.1016/j.procs.2023.01.336. Online publication date: 1-Jan-2023
      • (2022) A Revisiting Study of Appropriate Offline Evaluation for Top-N Recommendation Algorithms. ACM Transactions on Information Systems 41:2 (1-41). DOI: 10.1145/3545796. Online publication date: 21-Dec-2022
      • (2022) Exploiting Negative Preference in Content-based Music Recommendation with Contrastive Learning. Proceedings of the 16th ACM Conference on Recommender Systems (229-236). DOI: 10.1145/3523227.3546768. Online publication date: 12-Sep-2022
      • (2021) Quantification of the Impact of Popularity Bias in Multi-stakeholder and Time-Aware Environments. Advances in Bias and Fairness in Information Retrieval (78-91). DOI: 10.1007/978-3-030-78818-6_8. Online publication date: 25-Jun-2021
