
Agentic audit testing — what changes when AI does, not just drafts

For two years, AI in audit meant draft assistance. The shift to agentic isn't about smarter prompts. It's about who clicks 'run.'

By Audvera Team · AI in Audit Research · 7 min read

Picture an auditor at a control test. The procedure is written. The evidence is uploaded. The sampling rule is set. The auditor's job, until very recently, was to click through each item, mark it against the procedure, and capture the result.

For most of 2025, AI helped with one half of that. It drafted the procedure. It suggested the sampling approach. It even pre-filled the expected result. But the click-through was still a human task. The auditor read each sample, made the judgment, and recorded it.

That's where the change happens.

In agentic audit testing, the AI doesn't just draft the procedure. It runs it. It walks each sample, applies the procedure to the evidence, records what it found, and presents a workpaper for review. The auditor's role changes from execution to assurance over execution.

Operationally, this is a small shift. Structurally, it is a bigger one than most teams treat it as.


What "agentic" actually means in audit

The word "agentic" has been doing a lot of work in marketing copy lately, mostly imprecisely. In an audit context, it has a narrow technical meaning:

  • The AI carries out a multi-step procedure end to end.
  • It uses the evidence the auditor scoped, not arbitrary data.
  • It produces a structured workpaper, not free-form prose.
  • It logs every action so the chain from procedure → evidence → conclusion is reviewable.
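To make the last point concrete, the reviewable chain from procedure to evidence to conclusion can be sketched as a structured log record. This is a minimal illustration, not any specific product's schema; every field name here is hypothetical:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class TrailEntry:
    """One logged action in the agent's execution trail (hypothetical schema)."""
    sample_id: str        # which sample was examined
    procedure_step: str   # the rule that was applied
    evidence_ref: str     # pointer to the evidence the auditor scoped
    result: str           # "pass", "fail", or "exception"
    note: str = ""        # any judgment call surfaced for the reviewer
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# The agent appends one entry per action, so a reviewer can replay the
# chain procedure -> evidence -> conclusion without re-running the test.
entry = TrailEntry(
    sample_id="INV-0042",
    procedure_step="Match invoice amount to GL posting",
    evidence_ref="evidence/invoices/INV-0042.pdf",
    result="pass",
)
print(entry.sample_id, entry.result)
```

The point of the frozen dataclass is that trail entries are append-only facts: nothing downstream can quietly rewrite what the agent did.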

It doesn't mean the AI decides what to audit. It doesn't mean the AI authorizes anything. It doesn't mean the AI signs off. Those remain auditor responsibilities, and the methodology hasn't moved on that.

What moves is the cost of executing a procedure once the auditor has scoped it.


The defensibility load shifts left

In a non-agentic workflow, an auditor signing a test attests to two things at once: that the procedure was sound, and that they executed it correctly. The signoff bundles both.

When AI executes the procedure, the auditor's signoff still covers both, but the second part is now an attestation about something the AI did. That's not a problem — it's the same situation as reviewing a junior's work — but it requires a different kind of working paper.

The working paper has to make the AI's actions reviewable. It has to show which sample was examined, what evidence was consulted, what rule was applied, what the result was, and where any judgment call was made. If a reviewer can read that paper and not need to re-do the test, the defensibility holds.

If the working paper is just a column of "pass" labels with no chain of work, the auditor's signoff is supporting a conclusion they cannot defend. That isn't agentic audit — that's careless audit with an AI shortcut.

The shift is that more of the defensibility load moves into the AI's workpaper, and less into the auditor's procedure documentation. The total load is the same. Where it lives changes.


What changes operationally

Three things change once an agentic loop is wired into a test step.

Time per sample drops. A test that took an auditor twenty minutes per sample — pulling the source document, applying the procedure, recording the result — takes seconds in agentic mode. The auditor reviews the AI's pass rather than performing it. For high-volume tests with consistent procedures (existence testing, completeness testing, simple recalculations), the leverage is order-of-magnitude.

Sample size becomes a methodology decision, not a capacity decision. Most audit teams sample at thirty because thirty is what they have time for. When per-sample execution cost collapses, the sample size constraint disappears. The auditor can run the procedure across the whole population, or across a stratified slice that actually addresses the risk. The question moves from "what can we fit in the budget" to "what does our methodology require."
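That decision can be sketched in a few lines. The thresholds below are made up for illustration; the shape of the logic is what matters — sample size driven by risk and population, not by hours available:

```python
def methodology_sample_size(population: int, risk: str) -> int:
    """Pick a sample size from methodology, not capacity (illustrative thresholds)."""
    if risk == "high":
        return population                  # test the whole population
    if risk == "medium":
        return max(60, population // 10)   # stratified slice sized to the risk
    return min(30, population)             # low risk: conventional small sample

# With per-sample execution cost near zero, the high-risk branch
# (full-population testing) becomes practical rather than aspirational.
print(methodology_sample_size(5000, "high"))    # 5000
print(methodology_sample_size(5000, "medium"))  # 500
print(methodology_sample_size(5000, "low"))     # 30
```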

Reviewer attention re-centers on judgment. Reviewers used to spend time confirming arithmetic, checking that the procedure was followed, and re-performing samples that looked unusual. In agentic mode, the AI handles the mechanical part. The reviewer's time goes to the procedure design, the scope, the exceptions the AI flagged, and the conclusions the auditor drew. The reviewer's value rises, not falls.


What doesn't change

The methodology doesn't change. Procedures are still designed by auditors. Sampling rules are still set by auditors. Conclusions are still drawn by auditors. The AI is a tool inside a step, not a replacement for the step's owner.

The signoff doesn't change either. Preparer signs, reviewer signs. Independence still holds — the auditor reviewing the AI's work has the same independence relationship to the AI's preparer-side activity as they had to a junior's manual work.

The audit standard doesn't change. PCAOB, IIA, and ISA all already accommodate AI-assisted work; they require disclosure, judgment retention, and a reviewable trail. Agentic execution doesn't run afoul of any of them. It does, however, raise the floor on what "reviewable trail" means in practice — which is mostly a software-quality question, not a standards question.


The two questions to ask any agentic audit tool

If you're evaluating an agentic audit tool — yours or a vendor's — the two questions worth asking are simple.

Can a reviewer reconstruct what the agent did, end-to-end, without trusting the agent?

If the answer requires "trust our model," that's not auditable agentic work. The trail has to stand on its own.

Can the auditor override every AI conclusion, with the override captured in the same trail?

If the AI's output is the ceiling — if the auditor can disagree but the disagreement doesn't show in the workpaper — the tool isn't designed for audit. It's designed for speed at the expense of defensibility, which is the wrong trade for this profession.
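One way to sketch the second property: an override never rewrites the agent's entry — it appends to the same trail, with the auditor's reasoning attached, so the disagreement is visible in the workpaper. Names and structure here are illustrative only:

```python
# A trail holding the agent's original conclusion for one sample.
trail = [
    {"actor": "agent", "sample_id": "INV-0042", "result": "pass"},
]

def record_override(trail, sample_id, new_result, auditor, reason):
    """Append an auditor override to the same trail; never edit the agent's entry."""
    trail.append({
        "actor": auditor,
        "sample_id": sample_id,
        "result": new_result,
        "overrides": "agent",
        "reason": reason,   # the auditor's reasoning is part of the record
    })

record_override(trail, "INV-0042", "exception", "j.smith",
                "Invoice date falls outside the period under test")

# Both entries survive: the agent's conclusion and the auditor's disagreement.
print(len(trail))           # 2
print(trail[-1]["result"])  # exception
```

The design choice is the same as in the trail sketch earlier: append-only records mean the AI's output is a floor the auditor builds on, not a ceiling the auditor is stuck under.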


The interesting thing about agentic audit testing isn't that AI does the work. AI has been doing fragments of audit work for two years. The interesting thing is the redistribution of attention — away from mechanical execution, toward methodology, scope, and exception judgment.

That's where audit value sits anyway. Agentic execution just lets more of the team's time arrive there.

Frequently Asked Questions

What is agentic audit testing?

Agentic audit testing is when an AI agent executes a control or substantive test procedure end-to-end — walking samples, applying procedures to evidence, recording results, and producing a reviewable workpaper. The auditor scopes the test and reviews the agent's work; the agent handles the per-sample execution that was previously manual. It is not autonomous audit and it does not replace auditor judgment.

Does agentic AI change the auditor's signoff requirements?

No. The auditor still signs off as preparer and reviewer. The methodology has not changed. What changes is what the signoff is attesting to — instead of attesting to manually executing each sample, the auditor is attesting to having reviewed the agent's work and concurring with its conclusions. The defensibility burden is the same; it lives in a different artifact.

Can I expand sample sizes when agentic testing is available?

Yes — but sample size should still be a methodology decision, not a capacity decision. Agentic execution removes the time-cost constraint that drove conventional 30-sample defaults, which means the right sample size is now determined by the risk and population characteristics, not by what fits in the budget. Some tests still warrant a small sample; others should expand to the full population.

What if the AI is wrong on a sample?

The auditor reviews the agent's work and can override any conclusion. The override is captured in the same workpaper trail with the auditor's reasoning. The agent is a preparer-side tool; the auditor remains the final judgment. Tools that don't support clean override or that hide the override from the trail are not appropriate for audit work.

Is this the same as 'continuous auditing'?

No. Continuous auditing is about running tests on live production data on an ongoing basis. Agentic audit testing is about how a specific test procedure executes — it can be used in continuous or periodic contexts. The two concepts compose well, but they are different layers of the same problem.


Ready to modernize your audit process?

See how Audvera supports planning through reporting in one platform.