📄 Survey · Under Review · 2026

Say No Too Often:
Over-Refusals in Foundation Models

Jiaxi Yang Shicheng Liu†‡ Abolfazl Ansari Yuchen Yang Dongwon Lee

The Pennsylvania State University
† Equal contribution  ·  ‡ Intern at Penn State  ·  ★ Corresponding author

Overview

What is over-refusal, why does it matter, and how do we study it?

Refusal mechanisms are essential for safety alignment in foundation models. However, over-refusal — where a model says "No" too often, rejecting even benign queries because its alignment is overly conservative — has recently emerged as an important concern. Unlike jailbreaks (under-refusal), over-refusal arises from excessive safety alignment that suppresses legitimate user requests. In this survey, we present the first comprehensive framework dedicated to over-refusal, covering benchmarks, evaluation metrics, mitigation strategies, open challenges, and real-world applications.
Figure 1: Investigation framework for over-refusal in foundation models. We evaluate models on benchmarks using dedicated metrics to detect over-refusal. If identified, mitigation strategies are applied; unresolved issues motivate future work.
40+ Papers Surveyed · 15+ Benchmarks · 3 Modalities · 9 Eval Metrics · 5 Open Challenges

Contributions

Three main contributions of this survey paper.

1 · First Comprehensive Survey on Over-Refusal

To the best of our knowledge, the first survey dedicated to over-refusal in foundation models, providing a unified framework for understanding and mitigating this problem.

2 · Systematic Taxonomy

A systematic taxonomy of over-refusal benchmarks, evaluation metrics, and mitigation methods across LLMs, VLMs, and audio models — clarifying the current research landscape.

3 · Challenges & Future Directions

Five key open challenges in over-refusal research with promising future directions, highlighting practical applications where mitigating over-refusal is critical.

Taxonomy

We organize the over-refusal literature across three research dimensions.

📋 Benchmarks
Datasets spanning single-turn questions, multi-turn dialogues, long-context, multilingual, and multimodal settings.
Scope: LLMs · VLMs · Audio / T2I · Healthcare
📐 Evaluation Metrics
Metrics for over-refusal (ORR, CR, RS), under-refusal (TRR, ASR), and trade-off measures (MB-Score, NSI, ΔIR).
Scope: Over-Refusal · Under-Refusal · Trade-off
🛠️ Mitigation Methods
Training-based (SFT, DPO, GRPO), inference-time (activation steering, decoding calibration), and explanation-based approaches.
Scope: Training · Inference-Time · Explanation
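To make the metric taxonomy concrete, here is a minimal sketch of the two core rates it contrasts: an over-refusal rate (ORR, fraction of benign prompts the model refuses) and a true refusal rate (TRR, fraction of harmful prompts it refuses). The function names and the binary refusal labels are illustrative assumptions, not definitions from any specific benchmark.

```python
def over_refusal_rate(refused, benign):
    """ORR sketch: fraction of benign prompts that were refused."""
    benign_idx = [i for i, b in enumerate(benign) if b]
    return sum(refused[i] for i in benign_idx) / len(benign_idx)

def true_refusal_rate(refused, benign):
    """TRR sketch: fraction of harmful prompts that were refused."""
    harmful_idx = [i for i, b in enumerate(benign) if not b]
    return sum(refused[i] for i in harmful_idx) / len(harmful_idx)

# Toy run: 4 benign prompts (one wrongly refused), 2 harmful (both refused).
refused = [True, False, False, False, True, True]
benign  = [True, True,  True,  True,  False, False]
print(over_refusal_rate(refused, benign))  # 0.25
print(true_refusal_rate(refused, benign))  # 1.0
```

A well-calibrated model drives ORR toward 0 while keeping TRR near 1; trade-off measures such as MB-Score combine the two sides into a single number.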

Open Challenges

Five key challenges we identify in current over-refusal research.

01 · Human Perception
Evaluations focus on model-centric metrics while overlooking how users actually perceive refusal behavior — the same ORR can yield vastly different user experiences.
02 · Explanation Utility Functions
Attribution methods (SHAP, Integrated Gradients) rely on general-purpose utility functions not designed for refusal, leading to suboptimal identification of refusal triggers.
03 · Domain-Specific Over-Refusal
Safety boundaries differ substantially across domains (healthcare, finance, law). General methods are hard to adapt without domain-specific benchmarks.
04 · Other Modalities
Video-language and embodied AI models remain largely unexplored. Modality-specific benchmarks for systematic evaluation are urgently needed.
05 · Ambiguous Safety Boundaries
The line between benign and harmful is often inherently unclear, complicating both benchmark construction and determining appropriate mitigation degree.

Citation

If you find our work useful, please consider citing our paper.

@article{yang2025sayno,
  title   = {Say No Too Often: Over-Refusals in Foundation Models},
  author  = {Yang, Jiaxi and Liu, Shicheng and Ansari, Abolfazl and Yang, Yuchen and Lee, Dongwon},
  journal = {arXiv preprint},
  year    = {2026},
  note    = {Under review},
  url     = {https://github.com/abbottyanginchina/Awesome-Over-Refusal}
}

We maintain a continuously updated paper list at github.com/abbottyanginchina/Awesome-Over-Refusal. Pull requests welcome!