Justice in Judgment
LLMs are increasingly used to assist academic peer review, but do they judge papers on merit alone? We investigate whether LLMs replicate well-known human biases by reviewing the same paper under different author profiles across four axes: institutional affiliation, author gender, academic seniority, and publication history. Using a counterfactual design on 252 ICLR 2025 papers reviewed by 9 LLMs, we find that most models systematically assign higher scores to papers from prestigious institutions, senior author profiles, and authors with many top-tier publications. Gender bias is inconsistent in direction across models. These results call for bias-aware evaluation protocols before deploying LLMs in high-stakes scholarly decisions.
We evaluate 9 LLMs across four bias dimensions using ICLR 2025 papers. Each bar is split into: advantaged group scores higher (blue) · tied (gray) · disadvantaged group scores higher (red). The value on the right is the net bias score (blue% − red%).
% of papers receiving a higher LLM score when attributed to a prestigious institution vs. a lower-ranked one (same paper, same author name).
Each cell (RS row, RW column) shows the number of papers where the RS-affiliated author received a strictly higher LLM score. Affiliations sorted by net wins.
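The pairwise counts described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `win_matrix`, the input layout (`{paper_id: {affiliation: score}}`), and the placeholder affiliation labels are all assumptions, and "net wins" is assumed here to mean strict wins minus strict losses summed over all opponents.

```python
from collections import Counter

def win_matrix(scores):
    """Count strict pairwise wins between affiliations.

    scores: {paper_id: {affiliation: llm_score}} (hypothetical layout).
    Returns (order, wins), where wins[(row, col)] is the number of papers
    on which `row` scored strictly higher than `col`, and `order` lists
    affiliations by descending net wins (assumed: wins minus losses).
    """
    wins = Counter()
    affiliations = set()
    for per_paper in scores.values():
        affiliations.update(per_paper)
        for row, row_score in per_paper.items():
            for col, col_score in per_paper.items():
                # Ties are not counted for either side.
                if row != col and row_score > col_score:
                    wins[(row, col)] += 1

    def net(a):
        return sum(wins[(a, b)] - wins[(b, a)] for b in affiliations if b != a)

    # Break ties in net wins alphabetically so the order is deterministic.
    order = sorted(affiliations, key=lambda a: (-net(a), a))
    return order, dict(wins)

# Placeholder data: three papers, three made-up affiliation labels.
example = {
    "p1": {"A": 7, "B": 6, "C": 6},
    "p2": {"A": 6, "B": 6, "C": 5},
    "p3": {"A": 8, "B": 5, "C": 6},
}
order, wins = win_matrix(example)
```

Because only strictly higher scores count, a pair of affiliations that always ties contributes nothing to either cell.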
% of papers receiving a higher LLM score when attributed to a Senior PI (20+ years post-PhD) vs. an undergraduate student.
% of papers receiving a higher LLM score when attributed to an author with 100 top-tier publications vs. 0 publications.
% of papers rated higher under a male vs. female author name. Results are mixed: neither direction dominates across all models.
Note: Gender bias direction varies by model. Blue bars = male-biased; red bars = female-biased.
252 papers from ICLR 2025 (accepted & rejected). Each paper reviewed under multiple author profiles per bias dimension.
Counterfactual design: identical paper content reviewed under different author metadata. LLM score compared directly between conditions for each paper.
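As a rough sketch of how the per-paper comparison and the net bias score (blue% − red%) could be computed for one model and one bias dimension: the function name `net_bias`, the parallel-list input, and the example scores below are illustrative assumptions, not the authors' implementation or data.

```python
from typing import Dict, Sequence

def net_bias(adv_scores: Sequence[float], dis_scores: Sequence[float]) -> Dict[str, float]:
    """Summarize paired counterfactual scores for one model and dimension.

    adv_scores[i] and dis_scores[i] are the scores the same paper received
    under the advantaged and disadvantaged author profile, respectively.
    """
    if len(adv_scores) != len(dis_scores) or len(adv_scores) == 0:
        raise ValueError("score lists must be non-empty and of equal length")

    n = len(adv_scores)
    higher = sum(a > d for a, d in zip(adv_scores, dis_scores))  # blue
    lower = sum(a < d for a, d in zip(adv_scores, dis_scores))   # red
    tied = n - higher - lower                                    # gray

    def pct(k: int) -> float:
        return 100.0 * k / n

    return {
        "advantaged_pct": pct(higher),
        "tied_pct": pct(tied),
        "disadvantaged_pct": pct(lower),
        "net_bias": pct(higher) - pct(lower),
    }

# Made-up scores for four papers, purely to show the arithmetic.
result = net_bias([7, 6, 5, 8], [6, 6, 6, 7])
```

A positive `net_bias` means the advantaged profile won more paired comparisons than it lost; a value near zero can reflect either mostly ties or wins and losses that cancel out, which is why the bars also show the three-way split.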