Arena Leaderboard
Elo rankings from human & LLM judge votes
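The page does not say how the ratings are computed. One common scheme, and a plausible reading of the numbers, is the classic online Elo update. Under that scheme every model starts at 1000 (Gemini 3.1 Pro, with 0 battles, still sits there), and each battle moves the two ratings by equal and opposite amounts. The displayed ratings do average exactly 1000, which fits this reading but does not prove it. The sketch below assumes a K-factor of 32. That value is our assumption, not a published one, though it would match Nanonets OCR3 landing at 984 after its single loss if its opponent was rated about 1000 at the time. `expected_score` and `elo_update` are hypothetical names for illustration.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the logistic Elo model (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0) -> tuple[float, float]:
    """Apply one battle result and return the new (r_a, r_b).

    score_a is 1.0 if A won, 0.5 for a tie, 0.0 if A lost.
    K = 32 is an assumed value; the leaderboard does not publish its K-factor.
    """
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta  # zero-sum: B loses exactly what A gains


# Hypothetical battle between two new models, both starting at 1000: A loses.
print(elo_update(1000.0, 1000.0, 0.0))  # (984.0, 1016.0)
# A tie between equally rated models leaves both unchanged.
print(elo_update(1000.0, 1000.0, 0.5))  # (1000.0, 1000.0)
```

Because each update is zero-sum, the mean rating stays at the starting value no matter how many battles are played.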
| Rank | Model | Company | Elo | Win% | W / L / T | Battles |
|---|---|---|---|---|---|---|
| 1 | GPT-5.2 | OpenAI | 1037 | 66.7% | 4W / 1L / 4T | 9 |
| 2 | Nanonets OCR2+ | Nanonets | 1034 | 75.0% | 3W / 1L / 0T | 4 |
| 3 | GPT-5.4 · Medium Reasoning | OpenAI | 1032 | 64.3% | 4W / 2L / 1T | 7 |
| 4 | Claude Sonnet 4.6 | Anthropic | 1019 | 62.5% | 2W / 1L / 1T | 4 |
| 5 | GPT-5.4 · Low Reasoning | OpenAI | 1013 | 62.5% | 2W / 1L / 1T | 4 |
| 6 | Claude Opus 4.6 | Anthropic | 1009 | 50.0% | 2W / 2L / 7T | 11 |
| 7 | GPT-5 Mini | OpenAI | 1001 | 50.0% | 1W / 1L / 0T | 2 |
| 8 | Gemini 3.1 Pro | Google | 1000 | — | — | 0 |
| 9 | Gemini 2.5 Flash | Google | 1000 | 50.0% | 0W / 0L / 1T | 1 |
| 10 | Claude Sonnet 4.6 · Thinking | Anthropic | 998 | 54.5% | 4W / 3L / 4T | 11 |
| 11 | Claude Opus 4.6 · Low Thinking | Anthropic | 985 | 37.5% | 0W / 1L / 3T | 4 |
| 12 | Gemini 2.5 Flash · Thinking | Google | 984 | 25.0% | 0W / 1L / 1T | 2 |
| 13 | Nanonets OCR3 | Nanonets | 984 | 0.0% | 0W / 1L / 0T | 1 |
| 14 | GPT-4.1 | OpenAI | 983 | 25.0% | 0W / 1L / 1T | 2 |
| 15 | Gemini 3 Flash | Google | 980 | 37.5% | 1W / 3L / 4T | 8 |
| 16 | GPT-5.4 | OpenAI | 973 | 25.0% | 0W / 2L / 2T | 4 |
| 17 | Gemini 2.5 Pro | Google | 968 | 25.0% | 1W / 3L / 0T | 4 |
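The Win% column counts each tie as half a win. For example, GPT-5.2's 4W / 1L / 4T gives (4 + 0.5 × 4) / 9 = 66.7%, and every other row checks out the same way. A minimal Python sketch of that arithmetic follows. `win_rate` and `format_win_pct` are hypothetical helper names, not the leaderboard's own code.

```python
from typing import Optional


def win_rate(wins: int, losses: int, ties: int) -> Optional[float]:
    """Share of battles won, counting each tie as half a win."""
    battles = wins + losses + ties
    if battles == 0:
        return None  # no battles yet: the table shows "—"
    return (wins + 0.5 * ties) / battles


def format_win_pct(wins: int, losses: int, ties: int) -> str:
    """Format win_rate the way the Win% column does (one decimal place)."""
    rate = win_rate(wins, losses, ties)
    return "—" if rate is None else f"{100 * rate:.1f}%"


# (W, L, T) for GPT-5.2, Claude Opus 4.6, Claude Sonnet 4.6 · Thinking, Gemini 3.1 Pro
for w, l, t in [(4, 1, 4), (2, 2, 7), (4, 3, 4), (0, 0, 0)]:
    print(format_win_pct(w, l, t))  # 66.7%, 50.0%, 54.5%, —
```

Counting ties as half a win is why Claude Opus 4.6 shows 50.0% with seven ties in eleven battles, even though it has only two outright wins.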