Abstract
There is a strong need for industrial recommender systems to output an integrated ranking of items from different categories, such as video and news, to maximize overall user satisfaction. Integrated ranking faces two critical challenges. First, there is no universal metric to evaluate the contribution of each item because of the large discrepancies between item categories. Second, a user's short-term preferences may shift rapidly across diverse items during her interaction with the recommender system. To address these challenges, we propose a reinforcement learning (RL) based framework called RLMixer for the sequential integrated ranking problem. Through its credit assignment mechanism, RLMixer decomposes overall user satisfaction across items of different categories so that their contributions become comparable. To capture the user's short-term preference, RLMixer explicitly learns user interest vectors with a carefully designed contrastive loss. In addition, RLMixer is trained in a fully offline manner, which is convenient for industrial applications. We show that RLMixer significantly outperforms various baselines on both public PRM datasets and industrial datasets collected from a widely used AppStore. We also conduct online A/B tests on millions of users through the AppStore; the results show that RLMixer brings a significant revenue gain of over 4%.
J. Wang and M. Zhao—The first two authors contributed equally to this work.
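As a rough illustration of the contrastive user preference modeling mentioned in the abstract, the minimal sketch below shows one common way to learn user interest vectors with an InfoNCE-style contrastive loss over in-batch negatives. This is an assumption for illustration only, not the paper's exact loss; the function name, tensor shapes, and temperature value are hypothetical.

# Hypothetical sketch (not the paper's exact formulation): learn user interest
# vectors by pulling each vector toward the item the user actually engaged with
# and pushing it away from items engaged by other users in the same batch.
import torch
import torch.nn.functional as F

def contrastive_interest_loss(interest_vecs, positive_item_vecs, temperature=0.1):
    # interest_vecs:      (B, d) user interest vectors produced by the model
    # positive_item_vecs: (B, d) embeddings of the items each user interacted with
    z_u = F.normalize(interest_vecs, dim=-1)
    z_i = F.normalize(positive_item_vecs, dim=-1)
    logits = z_u @ z_i.t() / temperature                    # (B, B) similarity matrix
    labels = torch.arange(z_u.size(0), device=z_u.device)   # diagonal entries are positives
    return F.cross_entropy(logits, labels)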
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, J. et al. (2023). RLMixer: A Reinforcement Learning Approach for Integrated Ranking with Contrastive User Preference Modeling. In: Kashima, H., Ide, T., Peng, WC. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2023. Lecture Notes in Computer Science(), vol 13937. Springer, Cham. https://doi.org/10.1007/978-3-031-33380-4_31
DOI: https://doi.org/10.1007/978-3-031-33380-4_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33379-8
Online ISBN: 978-3-031-33380-4
eBook Packages: Computer Science (R0)