Repetition and Exploration in Offline Reinforcement Learning-based Recommendations

Abstract

Reinforcement learning for recommendation (RL4Rec) methods have been gaining substantial attention, as they can optimize long-term user engagement. To avoid expensive online interactions with actual users, offline RL4Rec has been proposed to optimize methods based on logged user interactions. Existing evaluation of offline RL4Rec methods depends solely on the overall performance of the resulting recommendations, and may therefore inaccurately reflect true performance. Instead, we conduct a novel study of the evaluation of offline RL4Rec methods from a repetition-and-exploration perspective, in which we separately evaluate and compare performance on recommending relevant repeat items (i.e., items that a user has interacted with before) and exploratory items (i.e., items that the user has not interacted with so far). Our experimental results reveal a significant disparity between the repetition performance and the exploration performance of RL4Rec methods. Furthermore, we find that the optimization of RL4Rec methods is highly sensitive to the degree to which future gains are considered. Overall, our findings on repetition and exploration performance provide valuable insights for the future evaluation and optimization of RL4Rec methods.
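
To make the repetition-and-exploration split concrete, the following is a minimal Python sketch of how recommendations could be evaluated separately for repeat and exploratory items. It is an illustrative assumption, not the paper's actual evaluation code; the function names and the use of hit rate as the metric are hypothetical.

def split_by_repetition(recommended, history):
    # Repeat items already appear in the user's interaction history;
    # exploratory items do not (illustrative definition from the abstract).
    seen = set(history)
    repeats = [item for item in recommended if item in seen]
    explores = [item for item in recommended if item not in seen]
    return repeats, explores

def repetition_exploration_hit_rates(recommendations, histories, ground_truth):
    # Aggregate hits and counts separately over repeat and exploratory
    # recommendations across all users, then return the two hit rates.
    rep_hits = rep_total = exp_hits = exp_total = 0
    for user, recs in recommendations.items():
        relevant = set(ground_truth[user])
        repeats, explores = split_by_repetition(recs, histories[user])
        rep_hits += sum(item in relevant for item in repeats)
        rep_total += len(repeats)
        exp_hits += sum(item in relevant for item in explores)
        exp_total += len(explores)
    rep_rate = rep_hits / rep_total if rep_total else 0.0
    exp_rate = exp_hits / exp_total if exp_total else 0.0
    return rep_rate, exp_rate

# Toy usage: user "u1" has history [1, 2]; items 2 and 5 are relevant next.
recs = {"u1": [2, 3, 5]}
hist = {"u1": [1, 2]}
truth = {"u1": [2, 5]}
print(repetition_exploration_hit_rates(recs, hist, truth))  # (1.0, 0.5)

Reporting the two rates side by side, rather than a single overall score, is what exposes the kind of disparity between repetition and exploration performance that the study describes.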

Publication
In The 4th Workshop on Deep Reinforcement Learning for Information Retrieval at CIKM
Jin Huang
PhD student

My research focuses on trustworthy intelligent information systems, with an emphasis on unbiasedness, fairness, robustness, and explainability.