Algorithms, Judicial Discretion, and Pretrial Decisions

Featured in print Digest

This figure is a line graph titled, Influence of Hearing Unrelated Violent Felony on Pretrial Release Rates. The y-axis is labeled, pretrial release, percentage points. It ranges from negative 20 to positive 20 in increments of 10. The x-axis is labeled, work periods relative to hearing unrelated violent felony case. It ranges from negative 4 to positive 4 in increments of 1. At 0 there is a vertical dotted line labeled, judge hears a case involving an unrelated defendant arrested for a serious violent felony while on pretrial release. There are two lines on the graph: high-skill judges and low-skill judges. The low-skill line begins near 0 and stays around 0 until negative 1 work period. It then declines, bottoming out at about negative 12 percentage points at positive 2 work periods. The high-skill line begins at 0, rises to about positive 5 percentage points, and then falls and levels off near 0 from negative 1 to positive 4 work periods. The note on the figure reads, Shaded areas represent 95% confidence intervals. Pretrial release is defined as release before case disposition. The source line reads, Source: Researchers’ calculations using data from multiple sources on court cases and arraignments.

The relative performance of data-driven algorithms and human decisionmakers, who are often able to override algorithmic recommendations, is an active subject of study in many settings. In a new study of pretrial release decisions by judges, Algorithmic Recommendations and Human Discretion (NBER Working Paper 31747), researchers Victoria Angelova, Will S. Dobbie, and Crystal S. Yang find that a small fraction of judges outperform the algorithm, while most do not.

The researchers analyze data on pretrial decisions made by judges in a US city between October 2016 and March 2020. The city used an algorithm to produce a pretrial misconduct risk score and a release recommendation for each defendant. Judges had access to these scores as well as to a series of reports that detailed the information considered by the algorithm, which included the defendants’ parole/probation status, the number and details of any prior arrests or convictions, and age at first arrest. The judges also had access to other information that was not used in the algorithmic risk score and release recommendation, including demographics, details of the current charges, and aggravating risk factors like mental health status.

When bail judges can overrule the algorithm’s pretrial release recommendations, 90 percent of them underperform the algorithm.

The judges overrode the algorithm’s recommendations for 12 percent of the low-risk defendants for whom the algorithm recommended release and for 54 percent of the high-risk defendants for whom it recommended detention. The researchers examined these overrides to learn whether judges disagreed with the algorithm’s implicit threshold for how much risk is acceptable at release, or whether they assessed individual defendants’ risk differently in light of the information available to them. For each judge, the researchers constructed a counterfactual algorithmic misconduct rate: the misconduct rate that would result if defendants were released according to the algorithm’s recommendations, holding the release rate at that judge’s actual level. They then compared this counterfactual rate with the actual pretrial misconduct rate of the defendants the judge released, which let them measure how judicial discretion affects the accuracy of release decisions.
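The counterfactual comparison can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each defendant in one judge's caseload carries an algorithmic risk score, a record of whether the judge released them, and a misconduct outcome that is known for everyone. In real data, outcomes for detained defendants are never observed, so this toy setup sidesteps a problem any actual study must address. The counterfactual is approximated here by releasing the lowest-scoring defendants up to the judge's own release count; the names `Defendant`, `misconduct_rate`, and `judge_vs_algorithm` are illustrative, not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Defendant:
    risk_score: float        # algorithmic misconduct risk score (higher = riskier)
    misconduct: bool         # hypothetical: outcome assumed known even if detained
    released_by_judge: bool  # the judge's actual decision


def misconduct_rate(released):
    """Share of the given released defendants who committed pretrial misconduct."""
    if not released:
        return 0.0
    return sum(d.misconduct for d in released) / len(released)


def judge_vs_algorithm(caseload):
    """Compare a judge's misconduct rate with an algorithmic counterfactual
    that releases the same number of defendants, picked by lowest risk score.

    Returns (judge_rate, algorithm_rate, gap); a positive gap means the
    judge underperformed the algorithm at the same release rate."""
    judge_released = [d for d in caseload if d.released_by_judge]
    k = len(judge_released)  # hold the release rate fixed at the judge's own level
    algo_released = sorted(caseload, key=lambda d: d.risk_score)[:k]
    judge_rate = misconduct_rate(judge_released)
    algo_rate = misconduct_rate(algo_released)
    return judge_rate, algo_rate, judge_rate - algo_rate


# Toy caseload with made-up values, for illustration only.
caseload = [
    Defendant(risk_score=0.1, misconduct=False, released_by_judge=True),
    Defendant(risk_score=0.2, misconduct=False, released_by_judge=False),
    Defendant(risk_score=0.3, misconduct=True, released_by_judge=True),
    Defendant(risk_score=0.8, misconduct=True, released_by_judge=True),
    Defendant(risk_score=0.9, misconduct=True, released_by_judge=False),
]
judge_rate, algo_rate, gap = judge_vs_algorithm(caseload)
print(f"judge {judge_rate:.2f}, algorithm {algo_rate:.2f}, gap {gap:.2f}")
# prints: judge 0.67, algorithm 0.33, gap 0.33
```

Averaging such gaps across judges, and checking how many are positive, mirrors in spirit the per-judge comparison described above, though the study's actual estimation is more involved than this sketch.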

On average, the judges underperformed the algorithm, with pretrial misconduct rates 2.4 percentage points higher than the algorithmic counterfactual at the same release rate. However, performance varied considerably across judges: approximately 90 percent underperformed the algorithm, while 10 percent outperformed it. The two groups of judges had similar demographics, political leanings, and years of experience, but the lower-performing judges were more likely to have a background in law enforcement. They were also more likely to be swayed by information extraneous to the case at hand. For example, if a lower-performing judge heard a case involving an unrelated defendant who had been arrested for a serious violent felony while on pretrial release, that judge’s release rate for other defendants declined in subsequent work periods. Higher-performing judges did not exhibit this pattern.
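The before-and-after pattern in release rates can be approximated with a bare-bones event-study tabulation. This Python sketch is a simplification, not the study's estimator: it assumes hypothetical input pairs of (work period relative to the salient hearing, whether a later defendant was released) for one group of judges, averages release decisions within each relative period, and reports each period's release rate in percentage points relative to a baseline period, taken here as period -1 by assumption. It has no controls, comparison group, or confidence intervals, and the function name `event_study_profile` is illustrative.

```python
from collections import defaultdict
from statistics import mean


def event_study_profile(observations, baseline=-1):
    """observations: iterable of (relative_period, released) pairs, where
    relative_period is the work period relative to the salient hearing
    (period 0) and released is the judge's decision for a later defendant.

    Returns {relative_period: change in release rate, in percentage points,
    relative to the baseline period}, ordered by period."""
    by_period = defaultdict(list)
    for period, released in observations:
        by_period[period].append(1.0 if released else 0.0)
    if baseline not in by_period:
        raise ValueError("no observations in the baseline period")
    base_rate = mean(by_period[baseline])
    return {
        period: 100 * (mean(decisions) - base_rate)
        for period, decisions in sorted(by_period.items())
    }


# Made-up decisions for illustration only: (relative work period, released?)
toy = [(-1, True), (-1, True), (-1, False), (-1, False),
       (0, True), (0, False), (0, False), (0, False),
       (1, True), (1, True), (1, True), (1, False)]
print(event_study_profile(toy))
# prints: {-1: 0.0, 0: -25.0, 1: 25.0}
```

Running it separately on decisions by higher- and lower-performing judges would yield two profiles comparable in spirit to the two lines in the figure, without the statistical adjustments a real analysis would need.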

In an original survey conducted by the researchers, lower-performing judges appeared to place greater importance on demographic factors, including race, while higher-performing judges focused instead on non-demographic factors not considered by the algorithm, such as substance abuse status and ability to pay bail. Consistent with these stated preferences, lower-performing judges were more likely to assign monetary bail, while higher-performing judges were more likely to attach nonmonetary conditions, such as drug treatment or counseling, to release. The researchers conclude that a skilled human decisionmaker who uses valuable private information not considered by the algorithm can, working together with the algorithm, outperform the algorithm alone.

— Emma Salomon

This research was funded by the Russell Sage Foundation and Harvard University.