OlmOCR Bench

v1.0

7,010 pass/fail tests built by Allen AI. A page image goes in and structured text comes out. Outputs are scored on five axes: whether the model renders math correctly, preserves table layout, picks up headers and captions, suppresses watermarks and page numbers, and maintains reading order. Sources include arXiv papers, old scans, financial filings, and multi-column layouts.

Models Evaluated

23

Dataset Size

1,403 pages · 7,010 tests

Metrics

5

Source

View on GitHub

Overall Score = average of the Math, Table, Present, Absent, Order, and Base scores. The Base score enters the average but has no column in the rankings below.
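As a minimal sketch of the averaging rule above, the following Python snippet computes an overall score from six category scores and, going the other way, backs out the Base score implied by a published row. The function names are hypothetical, an unweighted mean is assumed, and the recovered Base figure is only approximate because every listed score is rounded to one decimal place.

```python
from statistics import mean

# The six categories named in the Overall Score formula.
CATEGORIES = ("math", "table", "present", "absent", "order", "base")


def overall_score(scores: dict[str, float]) -> float:
    """Unweighted mean of the six category scores (assumed weighting)."""
    return mean(scores[name] for name in CATEGORIES)


def implied_base(overall: float, shown: dict[str, float]) -> float:
    """Solve overall = (sum(shown) + base) / 6 for the unlisted Base score."""
    return len(CATEGORIES) * overall - sum(shown.values())


# Row 1 of the rankings (Nanonets OCR2+); Base is not listed there.
row = {"math": 82.5, "table": 86.3, "present": 71.6, "absent": 93.1, "order": 76.1}

base = implied_base(82.2, row)
print(round(base, 1))
print(round(overall_score({**row, "base": base}), 1))
```

Because the inputs are rounded, the implied Base value can be off by a few tenths of a point; it is a consistency check on the table, not a published result.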

Rankings

#   Model                 Organization  Overall  Math  Table  Present  Absent  Order
1   Nanonets OCR2+        Nanonets      82.2     82.5  86.3   71.6     93.1    76.1
2   Datalab Marker        Datalab       82.1     84.2  83.3   75.5     88.6    73.5
3   Qwen3.5-9B            Alibaba       78.1     85.5  86.5   71.7     57.2    73.8
4   Qwen3.5-4B            Alibaba       77.2     86.0  85.0   68.9     55.7    74.5
5   Gemini 3.1 Pro        Google        74.6     82.3  89.8   70.5     43.3    65.0
6   Claude Sonnet 4.6     Anthropic     74.4     87.1  86.0   47.0     41.8    66.6
7   Claude Opus 4.6       Anthropic     73.9     86.1  84.5   49.1     39.9    67.7
8   Qwen3.5-2B            Alibaba       73.7     81.2  80.7   64.5     58.7    67.7
9   Gemini-3-Pro          Google        73.5     80.1  85.2   73.4     30.5    73.8
10  GPT-5.4               OpenAI        73.4     83.1  91.1   66.9     25.2    74.7
11  GPT-5.2               OpenAI        72.2     79.3  86.8   68.9     27.1    72.7
12  Mistral Small 4       Mistral AI    69.6     66.0  83.9   55.6     44.7    72.9
13  Gemini-3-Flash        Google        69.2     79.2  64.5   75.7     27.8    69.0
14  GLM-OCR               Zhipu AI      66.7     68.4  58.9   36.1     89.1    71.9
15  Qwen3.5-0.8B          Alibaba       65.6     72.0  63.7   48.8     68.2    60.6
16  Ministral-8B          Mistral AI    57.8     50.8  79.2   42.0     55.7    71.4
17  GPT-5-Mini            OpenAI        56.7     51.5  70.6   70.2     26.1    74.4
18  Claude Haiku 4.5      Anthropic     56.2     58.7  83.3   25.5     42.0    53.4
19  GPT-4.1               OpenAI        55.5     60.0  59.1   47.3     34.9    59.4
20  Llama-3.2-Vision-11B  Meta          47.2     39.4  60.8   30.5     69.9    51.9
21  Pixtral-12B           Mistral AI    36.8     32.8  46.7   15.4     74.5    25.1
22  GPT-5-Nano            OpenAI        22.8     2.4   55.1   13.0     43.1    47.0
23  Gemma-3-12B-IT        Google        20.6     10.0  34.7   14.6     67.9    7.7

Metrics

Math (higher is better)

Accuracy of mathematical equation rendering and LaTeX output.

Table (higher is better)

Fidelity of table structure and content preservation.

Present (higher is better)

Correct inclusion of visible document elements like headers and captions.

Absent (higher is better)

Correct suppression of artifacts like watermarks and page numbers.

Order (higher is better)

Accuracy of multi-column and complex layout reading order.
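To make the Present, Absent, and Order definitions concrete, here is a hypothetical Python sketch of what pass/fail checks of this kind could look like, assuming each category score is the percentage of its tests that pass. The function names, the whitespace and case normalization, and the sample text are all illustrative assumptions; the benchmark's actual test definitions and matching rules are not described on this page.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores layout noise."""
    return " ".join(text.lower().split())


def check_present(output: str, snippet: str) -> bool:
    """Pass if a visible element (e.g. a caption) appears in the output."""
    return normalize(snippet) in normalize(output)


def check_absent(output: str, snippet: str) -> bool:
    """Pass if an artifact (e.g. a page number) does not appear in the output."""
    return normalize(snippet) not in normalize(output)


def check_order(output: str, first: str, second: str) -> bool:
    """Pass if both snippets occur and `first` starts before `second`."""
    text = normalize(output)
    i = text.find(normalize(first))
    j = text.find(normalize(second))
    return i != -1 and j != -1 and i < j


def pass_rate(results: list[bool]) -> float:
    """Percentage of passing tests; 0.0 for an empty list."""
    return 100.0 * sum(results) / len(results) if results else 0.0


# Made-up OCR output, for illustration only.
output = "Figure 2: Loss curves.\nWe train for 10 epochs.\nResults follow."
results = [
    check_present(output, "Figure 2: Loss curves."),
    check_absent(output, "Page 7 of 12"),
    check_order(output, "We train", "Results follow"),
    check_absent(output, "10 epochs"),  # fails on purpose: the text is present
]
print(pass_rate(results))
```

Plain substring matching is the simplest possible choice here; a real harness would likely need fuzzy matching to tolerate small OCR differences.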