Can Author Manipulation of AI Referees be Welfare Improving?
This paper examines a new moral hazard in delegated decision-making: authors can embed hidden instructions—known as prompt injections—to bias AI referees in academic peer review. If AI reviews are inexpensive, referees may delegate fully. If AI quality is low, this leads to market collapse. The paper shows that the possibility of manipulation, combined with moderate detection, can paradoxically improve welfare in this low-quality regime. With intermediate detection probabilities, only low-quality authors undertake manipulation, and detection becomes informative about quality, inducing referees to mix between manual and AI reviews. This mixed-strategy equilibrium disciplines referees and generates information, potentially sustaining the market when AI quality is low. However, this benefit vanishes when AI quality is intermediate or high; in these cases, strong detection is optimal as it enables an efficient pure AI market. Thus, prompt injection can play a welfare-enhancing role, but only when AI reviews are easily produced but insufficiently accurate. The results highlight a non-monotonic relationship between enforcement and welfare in the presence of weak AI technologies.
Joshua S. Gans, "Can Author Manipulation of AI Referees be Welfare Improving?," NBER Working Paper 34082 (2025), https://doi.org/10.3386/w34082.