# false-refusal-rate

Here is 1 public repository matching this topic...

When Better Means Less: Quantifying What Benchmarks Miss Between Model Generations. 2,310 controlled comparisons show that the GPT-5 series lost 6.7x creativity and gained 4.4x false refusals versus chatgpt-4o-latest, a shift invisible to standard benchmarks.

  • Updated Feb 23, 2026
  • Python
