DualEntry Labs

The 2026 Accounting AI Benchmark

We tested the top AI models for accuracy on real accounting work.
Compare them for yourself.

Model comparison
| Model | Overall Accuracy | Model Provider | License Type |
| --- | --- | --- | --- |
| Claude Sonnet 4.6 | 63.4% | Anthropic | Closed |
| OpenAI GPT-4 | 19.8% | OpenAI | Closed |
| OpenAI GPT-4-0613 | 19.8% | OpenAI | Closed |
| OpenAI GPT-5.1 | 57.4% | OpenAI | Closed |
| Qwen3 Coder Next | 57.4% | Alibaba | Open |
| Z.ai GLM-5 | 65.3% | Zhipu AI | Open |
| MiniMax M2.5 | 65.3% | MiniMax | Open |
| Z.ai GLM-4.7 | 56.4% | Zhipu AI | Open |
| OpenAI GPT-5.2 | 76.2% | OpenAI | Closed |
| Nemotron Nano 12B | 32.7% | NVIDIA | Open |
| Claude Haiku 4.5 | 61.4% | Anthropic | Closed |
| Z.ai GLM-4.7 Flash | 56.4% | Zhipu AI | Open |
| Gemini 2.5 Flash-Lite | 27.7% | Google | Closed |

Methodology

The benchmark divides a set of specific, domain-related questions into categories. These categories represent the core workflow of a general accounting system.

Questions were designed against a provisioned chart of accounts and a minimal context: enough information for each question to be answerable, without overloading the prompt.

Each benchmark runs in an isolated environment per organization, with no link to any real account in our system. Each run is independent of the others.

Grading is deterministic: each answer is scored by a simple binary (correct or incorrect) check, with no “reasoning” involved in the grade.
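As an illustration of what a deterministic, binary grader can look like, here is a minimal Python sketch. The function name `grade_answer` and its matching rules (exact decimal comparison for amounts, case-insensitive comparison for text such as account names) are assumptions made for this example, not the benchmark's actual grader.

```python
from decimal import Decimal, InvalidOperation


def grade_answer(expected: str, actual: str) -> bool:
    """Hypothetical binary check: True if the answer matches, else False.

    Assumed rule: if both values parse as decimal amounts, compare them
    exactly; otherwise compare them as trimmed, case-insensitive text.
    """
    try:
        # Strip thousands separators so "1,250.00" and "1250" compare equal.
        return Decimal(expected.replace(",", "")) == Decimal(actual.replace(",", ""))
    except InvalidOperation:
        # At least one side is not a number: fall back to text comparison.
        return expected.strip().casefold() == actual.strip().casefold()
```

The same inputs always produce the same pass or fail result, which is what makes repeated runs comparable.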

Each benchmark can be run multiple times, with accuracy and standard deviation computed per category and per difficulty tier.
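The aggregation over repeated runs can be sketched as follows. This is a rough Python illustration under stated assumptions: the `(run_id, group, passed)` record shape and the `summarize` function are hypothetical, and the population standard deviation (`statistics.pstdev`) is used here only because the source does not say which variant is applied.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize(results):
    """Return {group: (mean accuracy, std dev across runs)}.

    `results` is an iterable of (run_id, group, passed) tuples, where
    `group` is a category name or a difficulty tier (assumed shape).
    """
    flags = defaultdict(lambda: defaultdict(list))
    for run_id, group, passed in results:
        flags[group][run_id].append(passed)

    summary = {}
    for group, runs in flags.items():
        # One accuracy value per run, then mean and spread across runs.
        accuracies = [sum(f) / len(f) for f in runs.values()]
        summary[group] = (mean(accuracies), pstdev(accuracies))
    return summary


runs = [
    (1, "Accounts Payable", True), (1, "Accounts Payable", False),
    (2, "Accounts Payable", True), (2, "Accounts Payable", True),
]
print(summarize(runs))
```

Passing a difficulty tier instead of a category name as the `group` value gives the per-tier figures with the same code.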

The whole benchmark is task-oriented rather than trivia-based, which gives us the flexibility to perform actions in our system, such as delegate_to_record_draft, or to use the other tools expected of the agent.

| Category | Questions | What it tests |
| --- | --- | --- |
| Transaction Classification | 13 | Mapping bank transactions to the correct chart of accounts |
| Journal Entry Creation | 13 | Creating balanced journal entries with correct accounts and amounts |
| Accounts Payable | 13 | Bills, vendor payments, vendor credits |
| Accounts Receivable | 12 | Invoices, customer payments, credit memos |
| Bank Reconciliation | 12 | Identifying reconciling items and computing adjusted balances |
| Financial Reporting | 13 | Ratios, cash flow, balance sheet analysis |
| Month-End Close | 12 | Accruals, deferrals, depreciation, reversals |
| AI Accounting Knowledge | 13 | Multiple-choice conceptual accounting knowledge |
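
The category counts above add up to 101 questions, so a single question is worth just under one percentage point of overall accuracy. The short Python sketch below does that arithmetic; the helper `accuracy_pct` is hypothetical. The source does not state whether the published scores come from a single run or an average of several, so the mapping from a count of correct answers to a score is only illustrative.

```python
# Question counts per category, as listed in the table above.
QUESTION_COUNTS = {
    "Transaction Classification": 13,
    "Journal Entry Creation": 13,
    "Accounts Payable": 13,
    "Accounts Receivable": 12,
    "Bank Reconciliation": 12,
    "Financial Reporting": 13,
    "Month-End Close": 12,
    "AI Accounting Knowledge": 13,
}

TOTAL_QUESTIONS = sum(QUESTION_COUNTS.values())  # 101


def accuracy_pct(correct: int, total: int = TOTAL_QUESTIONS) -> float:
    """Overall accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)


print(TOTAL_QUESTIONS)   # 101
print(accuracy_pct(64))  # 63.4
```

Under a single-run reading, 64 correct answers out of 101 would give 63.4%, which matches one of the reported scores.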