| Name | Semi-private eval | Public eval |
|--------------------------------------|-------------------|-------------|
| Jeremy Berman | 53.6% | 58.5% |
| Akyürek et al. | 47.5% | 62.8% |
| Ryan Greenblatt | 43% | 42% |
| OpenAI o1-preview (pass@1) | 18% | 21% |
| Anthropic Claude 3.5 Sonnet (pass@1) | 14% | 21% |
| OpenAI GPT-4o (pass@1) | 5% | 9% |
| Google Gemini 1.5 (pass@1) | 4.5% | 8% |

Source: https://arxiv.org/pdf/2412.04604