DualEntry Labs

The 2026 Accounting AI Benchmark

Name: Accounting AI Benchmark 2026 Dataset
Creator: DualEntry
Published: 2026-02-20

We tested the accuracy of the top AI models on real accounting work. Compare the results for yourself.

Model comparison

| Model | Overall Accuracy | Model Provider | License Type |
| --- | --- | --- | --- |
| Claude Sonnet 4.6 | 63.4% | Anthropic | Closed |
| OpenAI GPT-4 | 19.8% | OpenAI | Closed |
| OpenAI GPT-4-0613 | 19.8% | OpenAI | Closed |
| OpenAI GPT-5.1 | 57.4% | OpenAI | Closed |
| Qwen3 Coder Next | 57.4% | Alibaba | Open |
| Z.ai GLM-5 | 65.3% | Zhipu AI | Open |
| MiniMax M2.5 | 65.3% | MiniMax | Open |
| Z.ai GLM-4.7 | 56.4% | Zhipu AI | Open |
| OpenAI GPT-5.2 | 76.2% | OpenAI | Closed |
| Nemotron Nano 12B | 32.7% | NVIDIA | Open |
| Claude Haiku 4.5 | 61.4% | Anthropic | Closed |
| Z.ai GLM-4.7 Flash | 56.4% | Zhipu AI | Open |
| Gemini 2.5 Flash-Lite | 27.7% | Google | Closed |

Methodology

The benchmark divides a set of domain-specific questions into categories that represent the core workflows of a general accounting system.

Questions were designed against a provisioned chart of accounts and a minimal context that supplies the information each question needs without overloading the prompt.

Each benchmark runs in an isolated, per-organization environment with no link to any real account in our system. Runs are independent of one another.

Grading is deterministic: each answer is scored by a simple binary pass/fail decision, with no “reasoning” step behind the grade.
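To illustrate what a deterministic, binary grade can look like, here is a minimal sketch in Python. It is not our actual grader: the function name `grade`, the field names, and the idea of comparing an answer field by field against an expected record are assumptions made for the example.

```python
from decimal import Decimal, InvalidOperation


def grade(expected: dict, actual: dict) -> bool:
    """Hypothetical binary grader: True only if every expected field matches.

    Monetary fields are compared as Decimal so that "250" and "250.00"
    count as the same amount. There is no partial credit.
    """
    for field, want in expected.items():
        got = actual.get(field)
        if isinstance(want, Decimal):
            try:
                got = Decimal(str(got))
            except InvalidOperation:
                # A missing or non-numeric amount fails the check.
                return False
        if got != want:
            return False
    return True


# Illustrative data only; the account code and amount are made up.
expected = {"account": "1000", "amount": Decimal("250.00")}
print(grade(expected, {"account": "1000", "amount": "250"}))
print(grade(expected, {"account": "1000", "amount": "250.01"}))
```

Because the check is a plain comparison, the same answer always receives the same grade, which is what makes results comparable across runs and models.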

Each benchmark can be run multiple times, with accuracy and standard deviation computed per category and per difficulty tier.
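The aggregation across repeated runs could be sketched as follows. This is an illustration under stated assumptions, not our production code: the result-record shape (`category`, `tier`, `passed`), the function name `summarize`, and the use of the population standard deviation are all choices made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize(runs, key):
    """Return {group: (mean accuracy, std dev across runs)}.

    runs: list of runs, each a list of dicts with the hypothetical
          keys "category", "tier", and "passed" (bool).
    key:  "category" or "tier", the field to group by.
    """
    per_group = defaultdict(list)  # group -> one accuracy value per run
    for run in runs:
        counts = defaultdict(lambda: [0, 0])  # group -> [passed, total]
        for result in run:
            tally = counts[result[key]]
            tally[0] += int(result["passed"])
            tally[1] += 1
        for group, (passed, total) in counts.items():
            per_group[group].append(passed / total)
    # Population standard deviation is an assumption; a sample
    # standard deviation (statistics.stdev) would also be reasonable.
    return {g: (mean(acc), pstdev(acc)) for g, acc in per_group.items()}


# Two illustrative runs with made-up categories and tiers.
runs = [
    [
        {"category": "invoicing", "tier": "easy", "passed": True},
        {"category": "invoicing", "tier": "hard", "passed": True},
        {"category": "payroll", "tier": "hard", "passed": False},
    ],
    [
        {"category": "invoicing", "tier": "easy", "passed": True},
        {"category": "invoicing", "tier": "hard", "passed": False},
        {"category": "payroll", "tier": "hard", "passed": False},
    ],
]
print(summarize(runs, "category"))
print(summarize(runs, "tier"))
```

Grouping by `"category"` and by `"tier"` uses the same function, so both breakdowns are derived from the same binary grades.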

The whole benchmark is task-oriented rather than trivia-based, which gives us the flexibility to have the agent perform actions in our system, such as delegate_to_record_draft, or use the other tooling expected of the agent.