Agreement and Disagreement between True and False-Positive Metrics in Recommender Systems Evaluation

False-positive metrics can capture an important aspect of recommendation quality, namely the impact of suggestions that users dislike. They complement common metrics, which only measure the number of successful recommendations. In this paper we investigate the extent to which false-positive metrics agree or disagree with true-positive metrics in the offline evaluation of recommender systems. We find a surprising degree of systematic disagreement, which previous authors had occasionally noted in the literature but not explained. We trace the discrepancy between the metrics to the effect of popularity biases, which impact false- and true-positive metrics in very different ways: whereas true-positive metrics reward the recommendation of popular items, false-positive metrics penalize it. We determine the precise conditions and cases under which these general trends arise, give a formal explanation for our findings, and confirm and illustrate them empirically in experiments on different datasets.

We present several effects of popularity biases on recommender systems evaluation with true- and false-positive metrics. First, we discuss a different perspective on evaluation: counting false positives rather than true positives. Second, we describe how precision and anti-precision are computed under incomplete (missing not at random, MNAR) and complete (missing at random, MAR) relevance knowledge. Third, we illustrate the effects of popularity biases on the MovieLens and CM100K datasets. Finally, we summarize the presentation, draw conclusions, and outline further findings that are explained in more detail in the full paper.
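To make the two metrics concrete, here is a minimal Python sketch. It assumes, since the text does not define them formally here, that precision@k is the fraction of the top k recommendations the user is known to like and anti-precision@k is the fraction the user is known to dislike. The rating threshold and the function names (`split_by_threshold`, `precision_at_k`, `anti_precision_at_k`) are hypothetical. Unrated items count toward neither metric, which is how incomplete relevance knowledge shows up in this sketch.

```python
from typing import Dict, List, Set, Tuple


def split_by_threshold(ratings: Dict[str, int], threshold: int) -> Tuple[Set[str], Set[str]]:
    """Split one user's observed ratings into liked and disliked item sets.

    Hypothetical rule: ratings >= threshold are 'liked', lower ratings are 'disliked'.
    Items without a rating appear in neither set (their relevance is unknown).
    """
    liked = {item for item, r in ratings.items() if r >= threshold}
    disliked = {item for item, r in ratings.items() if r < threshold}
    return liked, disliked


def precision_at_k(ranking: List[str], liked: Set[str], k: int) -> float:
    """Fraction of the top-k recommendations that are known true positives (higher is better)."""
    return sum(1 for item in ranking[:k] if item in liked) / k


def anti_precision_at_k(ranking: List[str], disliked: Set[str], k: int) -> float:
    """Fraction of the top-k recommendations that are known false positives (lower is better)."""
    return sum(1 for item in ranking[:k] if item in disliked) / k


if __name__ == "__main__":
    # Toy data made up for illustration: ratings on a 1-5 scale; items "e" and "f" are unrated.
    ratings = {"a": 5, "b": 1, "c": 4, "d": 2}
    ranking = ["a", "b", "e", "c", "f"]
    liked, disliked = split_by_threshold(ratings, threshold=4)
    print(precision_at_k(ranking, liked, k=5))          # 0.4
    print(anti_precision_at_k(ranking, disliked, k=5))  # 0.2
```

Under this sketch's definitions, if every recommended item had a rating, the two values would add up to 1. With missing ratings, the gap between their sum and 1 is the share of recommendations whose relevance is unknown.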


      evaluation
      false positives
      metrics
      non-random missing data
      popularity bias
      recommender systems


