Arena Leaderboard

Elo rankings from human & LLM judge votes

Rank  Model                           Company     Elo   Win%  W / L / T     Battles
1     GPT-5.2                         openai     1037  66.7%  4W / 1L / 4T  9
2     Nanonets OCR2+                  nanonets   1034  75.0%  3W / 1L / 0T  4
3     GPT-5.4 · Medium Reasoning      openai     1032  64.3%  4W / 2L / 1T  7
4     Claude Sonnet 4.6               anthropic  1019  62.5%  2W / 1L / 1T  4
5     GPT-5.4 · Low Reasoning         openai     1013  62.5%  2W / 1L / 1T  4
6     Claude Opus 4.6                 anthropic  1009  50.0%  2W / 2L / 7T  11
7     GPT-5 Mini                      openai     1001  50.0%  1W / 1L / 0T  2
8     Gemini 3.1 Pro                  google     1000      -  -             0
9     Gemini 2.5 Flash                google     1000  50.0%  0W / 0L / 1T  1
10    Claude Sonnet 4.6 · Thinking    anthropic   998  54.5%  4W / 3L / 4T  11
11    Claude Opus 4.6 · Low Thinking  anthropic   985  37.5%  0W / 1L / 3T  4
12    Gemini 2.5 Flash · Thinking     google      984  25.0%  0W / 1L / 1T  2
13    Nanonets OCR3                   nanonets    984   0.0%  0W / 1L / 0T  1
14    GPT-4.1                         openai      983  25.0%  0W / 1L / 1T  2
15    Gemini 3 Flash                  google      980  37.5%  1W / 3L / 4T  8
16    GPT-5.4                         openai      973  25.0%  0W / 2L / 2T  4
17    Gemini 2.5 Pro                  google      968  25.0%  1W / 3L / 0T  4
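
The Win% figures above are consistent with counting each tie as half a win, i.e. (W + 0.5 × T) / Battles. The leaderboard does not state this rule, so treat it as an inference from the rows. A minimal Python sketch under that assumption (the function name win_pct is hypothetical):

```python
from typing import Optional


def win_pct(wins: int, losses: int, ties: int) -> Optional[float]:
    """Win percentage, assuming each tie counts as half a win.

    This scoring rule is inferred from the leaderboard rows,
    not documented by the leaderboard itself.
    """
    battles = wins + losses + ties
    if battles == 0:
        return None  # a model with no battles has no win rate
    return round(100 * (wins + 0.5 * ties) / battles, 1)


# Record of the top-ranked row: 4W / 1L / 4T over 9 battles
print(win_pct(4, 1, 4))
```

Under this rule a model with many ties, such as one with a 2W / 2L / 7T record, lands at exactly 50.0%, which matches the table.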