Out of the Black Box: Uncertainty Quantification for LLMs via Conditional Probabilities
Autoregressive LLMs generate text by sampling from estimated probability distributions over the next token, conditional on prior context. We use these probabilities to construct an entropy-based measure of prediction uncertainty, termed inner confidence. In news classification, LLM predictions with higher inner confidence are systematically more accurate. To evaluate the measure's economic relevance, we form long-short portfolios based on LLM predictions. The portfolio based on high-confidence predictions achieves a Sharpe ratio about 20% higher than the unconditional benchmark, while the one based on low-confidence predictions yields no excess returns. In contrast, self-declared confidence exhibits significant decoding biases and provides no comparable performance gains.
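The idea of an entropy-based confidence measure can be sketched as follows. The paper does not reproduce its exact formula in this abstract, so the transform below (negative Shannon entropy over the renormalized probabilities of the candidate answer tokens) is an illustrative assumption, not the authors' precise definition.

```python
import math

def inner_confidence(token_probs):
    """Illustrative entropy-based confidence for one prediction.

    token_probs: the model's probabilities for the candidate answer
    tokens (e.g., the class labels in a classification prompt).
    They are renormalized to sum to 1; lower entropy of the
    resulting distribution is read as higher confidence.
    NOTE: negative entropy is an assumed transform for this sketch.
    """
    total = sum(token_probs)
    probs = [p / total for p in token_probs]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return -entropy  # higher value = more confident

# A prediction with 90% mass on one label is more confident
# than an even 50/50 split.
confident = inner_confidence([0.9, 0.1])
uncertain = inner_confidence([0.5, 0.5])
assert confident > uncertain
```

Under this sketch, predictions could be sorted by `inner_confidence` and only the top bucket used, mirroring the paper's high-confidence portfolio construction.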
Hui Chen, Antoine Didisheim, and Luciano A. Somoza, "Out of the Black Box: Uncertainty Quantification for LLMs via Conditional Probabilities," NBER Working Paper 34965 (2026), https://doi.org/10.3386/w34965.