DOI:10.1145/3511808.3557363
research-article
Open access

MetaTrader: An Reinforcement Learning Approach Integrating Diverse Policies for Portfolio Optimization

Published: 17 October 2022 Publication History

Abstract

Portfolio management is a fundamental problem in finance. It involves periodically reallocating assets to maximize expected returns within an appropriate level of risk exposure. Deep reinforcement learning (RL) is considered a promising approach to this problem owing to its strong capability in sequential decision making. However, because financial markets are non-stationary, applying RL techniques to portfolio optimization remains challenging. Extracting trading knowledge from various expert strategies could help agents adapt to changing markets. In this paper, we propose MetaTrader, a novel two-stage RL-based approach for portfolio management, which learns to integrate diverse trading policies to adapt to various market conditions. In the first stage, MetaTrader incorporates an imitation learning objective into the reinforcement learning framework. By imitating different expert demonstrations, MetaTrader acquires a diverse set of trading policies. In the second stage, MetaTrader learns a meta-policy that recognizes the market conditions and selects the most suitable learned policy to follow. We evaluate the proposed approach on three real-world index datasets and compare it to state-of-the-art baselines. The empirical results demonstrate that MetaTrader significantly outperforms those baselines in balancing profits and risks. Furthermore, thorough ablation studies validate the effectiveness of the components in the proposed approach.
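
The two-stage structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the networks, losses, rewards, or market features. Everything below is an assumption made for illustration, including the names `stage_one_loss`, `momentum_policy`, `reversion_policy`, `market_state`, `meta_select`, and `trade_step`, the mean-squared-error imitation term, the weight `lam`, and the tabular meta-policy.

```python
import math


def softmax(xs):
    """Turn arbitrary scores into portfolio weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def stage_one_loss(rl_loss, agent_weights, expert_weights, lam=0.5):
    """Stage 1 (assumed form): RL loss plus a weighted imitation term.

    The imitation term here is a mean squared error between the agent's
    and the expert's portfolio weights; the paper's actual objective may differ.
    """
    mse = sum((a - e) ** 2 for a, e in zip(agent_weights, expert_weights))
    mse /= len(agent_weights)
    return rl_loss + lam * mse


# Hypothetical stand-ins for the diverse policies learned in stage 1.
def momentum_policy(recent_returns):
    """Overweight assets that rose recently."""
    return softmax(recent_returns)


def reversion_policy(recent_returns):
    """Overweight assets that fell recently."""
    return softmax([-r for r in recent_returns])


BASE_POLICIES = [momentum_policy, reversion_policy]


def market_state(recent_returns):
    """Crude market-condition label (an assumption, not the paper's features)."""
    return "up" if sum(recent_returns) >= 0 else "down"


def meta_select(q_table, state):
    """Stage 2: pick the base policy with the highest estimated value."""
    scores = q_table[state]
    return max(range(len(scores)), key=lambda i: scores[i])


def trade_step(q_table, recent_returns, next_returns, lr=0.1):
    """Select a policy, apply it, and update the meta-policy's estimate."""
    state = market_state(recent_returns)
    choice = meta_select(q_table, state)
    weights = BASE_POLICIES[choice](recent_returns)
    reward = sum(w * r for w, r in zip(weights, next_returns))
    q_table[state][choice] += lr * (reward - q_table[state][choice])
    return choice, weights, reward


q_table = {"up": [0.0, 0.0], "down": [0.0, 0.0]}
choice, weights, reward = trade_step(q_table, [0.02, -0.01], [0.01, 0.0])
```

In this sketch the meta-policy is a lookup table over two coarse market states, which keeps the example short; a learned meta-policy over richer market features would replace `market_state` and `meta_select`.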

Supplementary Material

MP4 File (CIKM22-fp0439.mp4)
In this paper, we propose a novel reinforcement learning approach to solve the portfolio optimization problem by integrating diverse trading policies.

      Published In

      CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management
      October 2022
      5274 pages
      ISBN:9781450392365
      DOI:10.1145/3511808
      General Chairs: Mohammad Al Hasan, Li Xiong
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. deep reinforcement learning
      2. imitation learning
      3. meta-policy learning
      4. portfolio management

      Qualifiers

      • Research-article

      Conference

      CIKM '22

      Acceptance Rates

      CIKM '22 Paper Acceptance Rate 621 of 2,257 submissions, 28%;
      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 883
      • Downloads (Last 6 weeks): 90
      Reflects downloads up to 23 Sep 2024

      Citations

      Cited By

      • (2024) MacroHFT: Memory Augmented Context-aware Reinforcement Learning On High Frequency Trading. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 4712-4721. DOI:10.1145/3637528.3672064. Online publication date: 25-Aug-2024
      • (2024) FreQuant: A Reinforcement-Learning based Adaptive Portfolio Optimization with Multi-frequency Decomposition. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1211-1221. DOI:10.1145/3637528.3671668. Online publication date: 25-Aug-2024
      • (2024) Cross-Insight Trader: A Trading Approach Integrating Policies with Diverse Investment Horizons for Portfolio Management. 2024 IEEE 40th International Conference on Data Engineering (ICDE), 4685-4698. DOI:10.1109/ICDE60146.2024.00356. Online publication date: 13-May-2024
      • (2024) Large Language Model for Dynamic Strategy Interchange in Financial Markets. 2024 9th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA), 306-312. DOI:10.1109/ICCCBDA61447.2024.10569928. Online publication date: 25-Apr-2024
      • (2024) Trend-Heuristic Reinforcement Learning Framework for News-Oriented Stock Portfolio Management. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5120-5124. DOI:10.1109/ICASSP48485.2024.10447993. Online publication date: 14-Apr-2024
      • (2024) Curriculum learning empowered reinforcement learning for graph-based portfolio management: Performance optimization and comprehensive analysis. Neural Networks 179, 106537. DOI:10.1016/j.neunet.2024.106537. Online publication date: Nov-2024
      • (2024) An asset subset-constrained minimax optimization framework for online portfolio selection. Expert Systems with Applications 254, 124299. DOI:10.1016/j.eswa.2024.124299. Online publication date: Nov-2024
      • (2024) NGDRL: A Dynamic News Graph-Based Deep Reinforcement Learning Framework for Portfolio Optimization. Database Systems for Advanced Applications, 407-417. DOI:10.1007/978-981-97-5572-1_29. Online publication date: 31-Aug-2024
      • (2023) Reinforcement Learning for Quantitative Trading. ACM Transactions on Intelligent Systems and Technology 14:3, 1-29. DOI:10.1145/3582560. Online publication date: 24-Mar-2023
      • (2023) A Deep Temporal Factor Analysis Method for Large Scale Financial Portfolio Selection. ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 1-5. DOI:10.1109/ICASSP49357.2023.10095847. Online publication date: 4-Jun-2023
