- PUBLICATION Introducing HealthBench
- PUBLICATION OpenAI o3 and o4-mini System Card
- PUBLICATION Our updated Preparedness Framework
- OPENAI BrowseComp: a benchmark for browsing agents
- PUBLICATION PaperBench: Evaluating AI’s Ability to Replicate AI Research
- OPENAI Addendum to GPT-4o System Card: 4o image generation
- OPENAI Detecting misbehavior in frontier reasoning models
Latest Briefs
Fast updates from the latest stories.
- PUBLICATION Measuring the performance of our models on real-world tasks (6 months ago)
- PUBLICATION Detecting and reducing scheming in AI models (7 months ago)
- PUBLICATION Collective alignment: public input on our Model Spec (7 months ago)
- PUBLICATION Accelerating life sciences research (7 months ago)