Collaborative Research: Disentangling Exploration from Exploitation

DATE: March 1, 2020 to February 28, 2022

Project Outcomes Statement

A key tension in the study of experimentation revolves around the exploration of new possibilities and the exploitation of prior discoveries. Starting from Robbins (1952), a large literature in economics and statistics on experimentation via multi-armed bandits has married these two phenomena: agents experiment by selecting potentially risky options and observing their resulting payoffs. This framework has been used in many applications, ranging from pricing decisions to labor market search. Nonetheless, in many applications, agents' exploration and exploitation need not be intertwined. For example, an investor may evaluate projects she has not invested in, and an employee may explore alternative jobs while working. The project focuses on the consequences of disentangling exploration from exploitation.

The prior literature has relied on an important technical tool known as the Gittins index, which attaches a value (an index) to each project independently of the others; comparing these indices determines which project is optimal to act on. In our setting, we show that such an index does not exist, which poses a challenge in offering general characterizations of optimal policies.

Nevertheless, it is possible to make progress in interesting special cases. Specifically, the first paper focuses on Poisson bandits, which are widely used in applications. An important difference from the standard setting is that in ours the decision maker must eventually learn to exploit the superior project, whereas the standard setting features incomplete learning.

The paper also shows that the opportunity to disentangle exploration from exploitation drastically alters the optimal policy. In particular, the optimal policy displays considerably more persistence than in the standard setting and attains higher payoffs, especially when key parameters are not in extreme ranges.

The project also generalizes settings from the prior literature and obtains novel results that illuminate specific features of those settings.

An application to team experimentation also reveals important differences: in the traditional theory, team experimentation features free riding, whereas in our setting experimentation is efficient.