Researchers at Oxford University develop Deep Double Duelling Q-Learning for translating trading signals into state-of-the-art trading strategies

A standard technique in effective quantitative trading is to generate trade signals that have a statistically significant association with future prices, and then take positions designed to profit from the price moves those signals predict. The higher the signal frequency and the strategy's turnover, the more critical execution performance becomes.

A limit order is an instruction to buy or sell a security at a specified price or better. Limit order books (LOBs) are among the most widely used financial market mechanisms and are employed by exchanges worldwide. The specialist managing a limit order book records the limit orders placed for a security and ensures that the highest-priority order executes ahead of other orders in the book, and ahead of orders held or submitted by other traders at an equal or worse price. The introduction of AI has significantly influenced trading on these markets. Although studies have shown that LOB prices are predictable over short horizons, it remains challenging to construct a trading strategy that acts on this predictability quickly enough to convert it into trading profits.

The research team at Oxford University proposed Deep Double Duelling Q-Learning with the APEX (asynchronous prioritized experience replay) architecture in their new paper, "Asynchronous Deep Double Duelling Q-Learning for Trading-Signal Execution in Limit Order Book Markets." The approach uses deep reinforcement learning to translate predictive signals into optimal limit-order trading strategies. Reinforcement learning has previously been applied to a variety of tasks in limit order book markets, including trading, portfolio optimization, market making, and optimal trade execution.
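To make the method's two main ingredients concrete, here is a minimal NumPy sketch of the target computation they combine: the dueling decomposition, which splits Q-values into a state value and per-action advantages, and double Q-learning, in which the online network selects the greedy action while the target network evaluates it. All numeric values are illustrative assumptions, not outputs of the paper's trained networks.

```python
import numpy as np

def dueling_q(value, advantages):
    # Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
    return value + advantages - advantages.mean(axis=-1, keepdims=True)

# Toy per-state outputs of the value/advantage heads of the
# online and target networks (hypothetical numbers)
online_value, online_adv = 1.0, np.array([0.2, -0.1, 0.5])
target_value, target_adv = 0.9, np.array([0.1, 0.0, 0.4])

q_online = dueling_q(online_value, online_adv)
q_target = dueling_q(target_value, target_adv)

# Double Q-learning target: the online network picks the action,
# the target network supplies its value
reward, gamma = 0.05, 0.99
best_action = int(np.argmax(q_online))
td_target = reward + gamma * q_target[best_action]
```

Subtracting the mean advantage makes the value/advantage split identifiable; decoupling action selection from action evaluation is what reduces the overestimation bias of standard Q-learning.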

The team defines a novel action and state space that lets the agent place limit orders at various prices in a LOB trading environment. Beyond the timing and level placement of limit orders, the RL agent learns to use limit orders of single units of stock to manage its inventory as it holds long or short positions of varying size over time.
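A sketch of what such a discrete action and state space might look like, assuming a small set of price levels on either side of the book plus a "hold" action; the field names, tick size, and signal values here are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np
from dataclasses import dataclass

# Hypothetical discrete action space: place a one-unit limit order
# at one of several depths on either side of the book, or do nothing
DEPTHS = [0, 1, 2]  # ticks away from the best quote (assumed)
ACTIONS = [("hold",)] + [(side, d) for side in ("buy", "sell") for d in DEPTHS]

@dataclass
class LOBState:
    best_bid: float
    best_ask: float
    inventory: int   # signed position, in single units of stock
    signal: float    # value of the predictive trading signal

    def observation(self):
        # Flatten the state into the vector a Q-network would consume
        return np.array([self.best_bid, self.best_ask,
                         self.inventory, self.signal])

def order_price(state, action, tick=0.01):
    # Map a discrete (side, depth) action to a concrete limit price
    side, depth = action
    if side == "buy":
        return state.best_bid - depth * tick
    return state.best_ask + depth * tick

state = LOBState(best_bid=99.99, best_ask=100.01, inventory=-3, signal=0.7)
price = order_price(state, ("buy", 2))  # bid two ticks below the best bid
```

Including the signed inventory in the observation is what lets the agent learn to work its position back toward a target using passive one-unit orders rather than crossing the spread.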


More generally, the work demonstrates a real-world application of RL to creating limit-order trading strategies, which are still typically hand-crafted as part of a trading system.

Because high portfolio turnover means transaction costs can have an unacceptably large impact on earnings, it is difficult to integrate high-frequency forecasts into tradeable, profitable strategies. The researchers propose RL as a valuable tool for performing this translation, learning the best execution policy for a particular combination of signal and market. This kind of strategy customization, which has been found to improve performance significantly, removes the need to manually tune execution techniques for different markets and signals. In practical applications, several distinct signals can be combined into a single observation space, so the difficulty of incorporating multiple forecasts into a cohesive trading strategy can be merged directly into the RL problem.
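Combining several signals into one observation space can be as simple as stacking the signal vectors the agent observes at each step; the signal names and values below are purely hypothetical examples.

```python
import numpy as np

# Hypothetical: three distinct predictive signals at different horizons,
# stacked into a single observation vector for one RL agent
momentum_signal = np.array([0.12])
order_flow_signal = np.array([-0.30, 0.05])  # e.g. two book-imbalance features
mean_reversion_signal = np.array([0.08])

observation = np.concatenate([momentum_signal,
                              order_flow_signal,
                              mean_reversion_signal])
```

The agent then learns how to weigh the signals against one another implicitly, through the rewards its actions earn, rather than through a hand-built forecast-combination rule.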

Check out the Paper. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 13k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

Niharika is a Technical Consulting Intern at Marktechpost. She is a third-year undergraduate, currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Kharagpur. She is a highly enthusiastic individual with a keen interest in machine learning, data science, and AI, and an avid reader of the latest developments in these fields.