Recommender systems aim to recommend the most suitable items to users from a
large number of candidates. Their computation cost grows as the number of user
requests and the complexity of services (or models) increases. Under the
limitation of computation resources (CRs), how to make a trade-off between
computation cost and business revenue becomes an essential question. The
existing studies focus on dynamically allocating CRs in queue truncation
scenarios (i.e., allocating the size of candidates), and formulate the CR
allocation problem as an optimization problem with constraints. Some of them
focus on single-phase CR allocation, and others focus on multi-phase CR
allocation but introduce some assumptions about queue truncation scenarios.
However, these assumptions do not hold in other scenarios, such as retrieval
channel selection and prediction model selection. Moreover, existing studies
ignore the state transition process of requests between different phases,
limiting the effectiveness of their approaches.

This paper proposes a Reinforcement Learning (RL) based Multi-Phase
Computation Allocation approach (RL-MPCA), which aims to maximize the total
business revenue under the limitation of CRs. RL-MPCA formulates the CR
allocation problem as a Weakly Coupled MDP problem and solves it with an
RL-based approach. Specifically, RL-MPCA designs a novel deep Q-network to
adapt to various CR allocation scenarios, and calibrates the Q-value by
introducing multiple adaptive Lagrange multipliers (adaptive-λ) to
avoid violating the global CR constraints. Finally, experiments on the offline
simulation environment and online real-world recommender system validate the
effectiveness of our approach.

In this article, the authors address the computation cost of recommender systems and the trade-off between that cost and business revenue. Recommender systems suggest suitable items to users from a large pool of candidates, and their computation cost grows as the number of user requests and the complexity of the underlying services (or models) increase. Under a fixed budget of computation resources, striking a balance between computation cost and business revenue is therefore essential.

Existing studies in this area have mainly focused on dynamically allocating computation resources in queue truncation scenarios, where the quantity being allocated is the size of the candidate queue and the allocation is formulated as an optimization problem with constraints. These studies are limited: they either handle only single-phase allocation, or handle multi-phase allocation under assumptions specific to queue truncation that do not hold in other scenarios, such as retrieval channel selection and prediction model selection. Additionally, they overlook the state transition process of requests between phases, which limits their effectiveness.
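To make the constrained-allocation framing concrete, here is a minimal, hypothetical sketch (not the paper's method) of single-phase allocation via Lagrangian relaxation: each request picks the action maximizing revenue minus λ times cost, and λ is adjusted by dual ascent until total cost meets the budget. The arrays, function name, and learning rate are illustrative assumptions.

```python
import numpy as np

def allocate(revenue, cost, budget, iters=100, lr=0.01):
    """Illustrative Lagrangian-relaxation allocator (not the paper's algorithm).

    revenue[i][a]: expected business value of action a (e.g., a queue size)
                   for request i; cost[i][a]: its computation cost.
    """
    lam = 0.0
    for _ in range(iters):
        # Per-request decomposition: each request independently maximizes
        # the penalized objective revenue - lam * cost.
        actions = np.argmax(revenue - lam * cost, axis=1)
        total_cost = cost[np.arange(len(actions)), actions].sum()
        # Dual ascent on the budget constraint: raise lam when over budget,
        # lower it (clipped at 0) when under budget.
        lam = max(0.0, lam + lr * (total_cost - budget))
    return actions, lam
```

With a generous budget, λ stays at zero and every request simply takes its highest-revenue action; as the budget tightens, λ grows and requests shift toward cheaper actions.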

To address these limitations, the authors propose a Reinforcement Learning (RL) based Multi-Phase Computation Allocation approach (RL-MPCA). RL-MPCA aims to maximize the total business revenue while considering the limitation of computation resources. The authors formulate the computation resource allocation problem as a Weakly Coupled Markov Decision Process (MDP) problem and solve it using an RL-based approach.

Specifically, RL-MPCA introduces a novel deep Q-network that adapts to various computation resource allocation scenarios, and calibrates the Q-value with multiple adaptive Lagrange multipliers (adaptive-λ) so that the global computation resource constraints are not violated.
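The calibration idea can be sketched as follows. This is a hypothetical illustration in the spirit of the description above, not the paper's implementation: the serving policy picks the action maximizing a revenue Q-value penalized by λ times a cost estimate, and λ adapts online to keep realized cost near the budget. The class name, update rule, and learning rate are all assumptions.

```python
import numpy as np

class CalibratedPolicy:
    """Illustrative Q-value calibration with one adaptive Lagrange multiplier."""

    def __init__(self, lr=0.05):
        self.lam = 0.0   # adaptive multiplier; one per global constraint in general
        self.lr = lr

    def act(self, q_revenue, q_cost):
        # q_revenue, q_cost: per-action estimates of business value and
        # computation cost. The calibrated Q-value is revenue - lam * cost.
        return int(np.argmax(q_revenue - self.lam * q_cost))

    def update(self, observed_cost, budget_per_request):
        # Dual-ascent-style adaptation: raise lam when realized cost exceeds
        # the budget, lower it (toward 0) when under budget.
        self.lam = max(0.0, self.lam + self.lr * (observed_cost - budget_per_request))
```

The appeal of this design is that the constraint is enforced at decision time through a single scalar per constraint, rather than by retraining the Q-network whenever the budget changes.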

The effectiveness of RL-MPCA is validated through experiments in both an offline simulation environment and an online real-world recommender system. In both settings, the results show that RL-MPCA is an effective approach for computation resource allocation in recommender systems.

This research is significant because it addresses a practical bottleneck in production recommender systems. By casting computation allocation as a Weakly Coupled MDP and solving it with reinforcement learning, the authors demonstrate that RL can handle constrained optimization problems in this setting.

Overall, this paper makes a valuable contribution to the field of recommender systems by proposing a computation resource allocation approach that adapts to different scenarios, combining ideas from reinforcement learning, constrained optimization, and recommender systems.