
How to evidence AI agent compliance: 5 expectations from the FCA

May 15, 2026

The Financial Conduct Authority hasn’t written an AI rulebook and it won’t. The regulator has been explicit, repeatedly, that AI in financial services already sits inside the existing framework: Consumer Duty, SMCR, operational resilience. The same rules and the same standards that govern everything else in the firm, applied to something new.

That sounds permissive. It is the opposite.

Generic AI deployments fall apart at the point a second line tries to translate “outcomes-focused supervision” into a stack of documents that survives a regulator’s read. The framework is clear. The evidence is the question. And most firms scoping agentic AI in 2026 still cannot answer it.

This is the gap we have been working through across our AI on Trial: The Burden of Proof campaign. This piece is for the people who actually have to make the case in practice. Compliance leads. Heads of Risk. Second-line teams. The senior managers whose names sit on the SMCR responsibility map and who will be the ones explaining things, in writing, if it goes wrong.

What follows: what the regulator actually expects to see, what most firms are missing, and what a compliant evidence pack looks like for an AI agent deployment in 2026.

The five things the FCA wants on file

The FCA’s published guidance and the Bank of England’s February 2026 AI roundtables point to a consistent set of expectations. An AI agent operating inside a regulated customer journey needs documented evidence across five areas. Not one of them is optional. Skip any one of them and the second line cannot sign off, which means the agent cannot go live, which means the agentic strategy stays on the slide deck.

1. Pre-deployment risk assessment. A documented analysis of how the AI agent affects each of the four Consumer Duty outcomes: products and services, price and value, consumer understanding, consumer support. Identify the risks. Identify the mitigations. Identify what you will monitor and how.

2. Senior manager accountability. Under SMCR, Senior Manager Conduct Rule 1 (SC1) requires senior managers to take reasonable steps to ensure that the business of the firm for which they are responsible is controlled effectively. The FCA has confirmed this applies to AI systems. Delegating a decision to an algorithm does not delegate the liability. The agent is not on the SMCR map. The human who signed off on it is. We worked through this in detail in Count I of the AI on Trial series: the evidence pack has to show who signed off, what they signed off on, and what controls were in place when they did.

3. Real-time monitoring and intervention. Traditional model risk management assumes batch decisions and quarterly reviews. AI agents make customer-facing decisions continuously. The second-line evidence requirement is that the firm can detect when an agent is behaving outside risk appetite, and intervene, before a poor outcome reaches the customer. Post-event analysis alone does not meet the standard. By the time you spot the problem in the rear-view mirror, the harm has happened. Count III of the series covered why reviewing 3% of customer interactions tells you almost nothing about the other 97%, and why that gap becomes structurally impossible to defend at agentic scale.

4. Audit trail at the interaction level. Every conversation an AI agent conducts has to be auditable against regulatory obligations. Not sampled. Audited. The FCA’s emphasis on evidence-based supervision means firms have to be able to demonstrate what the agent did, why it did it, and whether the outcome was good. Count II of the series sets out what “retrievable” actually means in practice. The Bank of England has put it more bluntly: firms’ traditional model risk management approach to validation will not be sustainable as generative AI and agentic systems proliferate, and the human-in-the-loop concept is itself being challenged by the rise of agentic AI. The supervisor used to be a person reviewing a sample. The supervisor now has to be infrastructure.

5. Third-party and operational resilience documentation. Where the AI capability comes from a vendor, contracts have to support FCA oversight expectations: audit rights, access to data and models, incident reporting, exit arrangements. Where the AI supports an important business service, operational resilience rules apply, including impact tolerance testing. The signed MSA is not the evidence. The exercised audit rights are.
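To make expectations 3 and 4 above concrete, here is a minimal sketch of a pre-delivery gate: every draft response is scored, checked against documented thresholds before it reaches the customer, and recorded with its reasoning trace. Everything in it is an assumption for illustration: the class names, the score names (`unsuitable_advice`, `vulnerability_missed`), and the threshold values are hypothetical, and the risk scores are assumed to come from some upstream classifier that is not shown. It is not an FCA-specified design.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Threshold:
    """A documented risk-appetite limit for one score (names are illustrative)."""
    name: str
    max_score: float


@dataclass(frozen=True)
class InteractionRecord:
    """What the agent proposed, why, how it scored, and whether it was released."""
    interaction_id: str
    draft_response: str
    reasoning_trace: str
    scores: dict
    breaches: tuple
    delivered: bool
    recorded_at: str


class PreDeliveryGate:
    """Checks every draft response against thresholds before it is sent."""

    def __init__(self, thresholds):
        self.thresholds = list(thresholds)
        self.audit_log = []         # every interaction, not a sample
        self.intervention_log = []  # only the interactions that were held back

    def review(self, interaction_id, draft_response, reasoning_trace, scores):
        # A missing score counts as a breach, so the gate fails closed.
        breaches = tuple(
            t.name for t in self.thresholds
            if scores.get(t.name, float("inf")) > t.max_score
        )
        record = InteractionRecord(
            interaction_id=interaction_id,
            draft_response=draft_response,
            reasoning_trace=reasoning_trace,
            scores=dict(scores),
            breaches=breaches,
            delivered=not breaches,
            recorded_at=datetime.now(timezone.utc).isoformat(),
        )
        self.audit_log.append(record)
        if breaches:
            self.intervention_log.append(record)  # hand off to a human reviewer
        return record


# Hypothetical thresholds and scores, for illustration only.
gate = PreDeliveryGate([
    Threshold("unsuitable_advice", 0.2),
    Threshold("vulnerability_missed", 0.1),
])
ok = gate.review(
    "int-001", "Here is your current balance.",
    "Balance enquiry; no advice given.",
    {"unsuitable_advice": 0.05, "vulnerability_missed": 0.02},
)
held = gate.review(
    "int-002", "You should transfer your pension.",
    "Customer asked about transfers.",
    {"unsuitable_advice": 0.7, "vulnerability_missed": 0.02},
)
print(ok.delivered, held.delivered, held.breaches)
```

The design choice that matters is the ordering: the check runs before delivery and the record is written for every interaction, so the intervention log is a subset of a complete audit log, not a substitute for one.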

What good evidence looks like

The shorthand most firms use is “we have governance documents.” That is not what the regulator means by evidence. Documents are the input. The evidence is what the documents prove.

Here is the difference, document by document.

Document	What firms typically have	What the FCA expects
Risk assessment	A general AI policy	A Consumer Duty impact assessment per deployment, with outcome-specific mitigations
Sign-off record	Named senior owner	Named owner plus a signed assessment of the specific controls relied on at sign-off
Monitoring	Quarterly performance review	Continuous monitoring with documented thresholds and intervention logs
Audit trail	Sample QA on a percentage of cases	100% interaction review, with reasoning traces preserved
Third-party oversight	Vendor contract	Contract plus due diligence file, audit rights exercised at least annually, incident reporting tested

Most firms are not failing on the middle column. They are failing on the right-hand one. The gap is not about effort. It is about infrastructure. A second line that wants to produce the right-hand column with the resources that currently produce the middle one is going to need a different operating model, not a longer document.
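One way to keep the right-hand column of the table honest is to treat it as a checklist that can fail. The sketch below is illustrative only: the area keys, the `EvidencePack` class, and the document references are hypothetical names, and the artefact descriptions paraphrase the table above.

```python
from dataclasses import dataclass, field

# The five evidence areas, with the artefact the table's right-hand column expects.
REQUIRED_EVIDENCE = {
    "risk_assessment": "Consumer Duty impact assessment for this deployment",
    "sign_off": "Signed assessment of the controls relied on at sign-off",
    "monitoring": "Documented thresholds and intervention logs",
    "audit_trail": "Interaction-level records with reasoning traces",
    "third_party": "Due diligence file, exercised audit rights, tested incident reporting",
}


@dataclass
class EvidencePack:
    """Maps each evidence area to a document reference for one deployment."""
    deployment: str
    artefacts: dict = field(default_factory=dict)

    def gaps(self):
        # An area with no reference, or an empty one, counts as missing.
        return [area for area in REQUIRED_EVIDENCE if not self.artefacts.get(area)]

    def ready_for_sign_off(self):
        return not self.gaps()


# Hypothetical deployment with a general policy and a named owner, but little else.
pack = EvidencePack(
    deployment="pension-guidance-agent",
    artefacts={
        "risk_assessment": "docs/cd-impact-assessment-v2.pdf",
        "sign_off": "docs/smf-sign-off-2026-03.pdf",
        "monitoring": "",
    },
)
for area in pack.gaps():
    print(f"MISSING {area}: {REQUIRED_EVIDENCE[area]}")
print("ready:", pack.ready_for_sign_off())
```

A check like this proves only that a document exists, not that it is any good. It is a floor for the second line's review, not a replacement for it.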

Three places second-line teams are currently exposed

Across recent FCA reviews and the Bank of England’s industry consultation, three gaps come up again and again.

The first is the sampling problem. The FCA’s February 2025 review of ongoing financial advice services, covering the 22 largest advice firms, found suitability reviews were delivered in approximately 83% of cases. In a further 15% of cases clients either declined or did not respond to the offer, leaving fewer than 2% where firms made no effort at all. Those are the headline numbers for human-led advice. A firm sampling 2-3% of files for QA will not find that gap. An AI agent making thousands of decisions a day cannot be assured this way at all. The full argument sits in Count III, but the conclusion is simple: sampling assumes variability you can extrapolate from. An AI agent does not vary. It is consistent. Including when it is consistently wrong.
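The arithmetic behind the sampling problem is short enough to write down. The sketch below assumes a random QA sample and a uniform failure rate; the volume, failure rate, and sample rate are hypothetical inputs chosen for round numbers, not figures from the FCA review.

```python
def failure_split(interactions, failure_rate, sample_rate):
    """Expected failures that fall inside and outside a random QA sample."""
    if not (0.0 <= failure_rate <= 1.0 and 0.0 <= sample_rate <= 1.0):
        raise ValueError("rates must be between 0 and 1")
    failures = interactions * failure_rate
    reviewed = failures * sample_rate
    return reviewed, failures - reviewed


# Hypothetical day: 10,000 agent interactions, 1% poor outcomes, 3% QA sample.
reviewed, unreviewed = failure_split(10_000, 0.01, 0.03)
print(f"reviewed: {reviewed:.0f}, unreviewed: {unreviewed:.0f}")
```

Under those assumptions, QA sees about 3 of the day's 100 poor outcomes and the other 97 reach customers without anyone having looked at them. A sample can tell you that a problem exists. It cannot tell you which customers were affected.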

The second is the explainability gap. When the FCA asks why an agent made a specific recommendation, “the model decided” is not an answer. Firms need reasoning traces preserved at the interaction level. Generic foundation models do not produce these by default. Specialist financial services models, trained with explainable outputs, do. This is the argument we made in Count V, and it is the reason FinLLM was built the way it was.

The third is the senior manager exposure. SMCR responsibilities cannot be delegated to a model. The named senior manager signing off on an agentic deployment is personally accountable for the controls, in writing, in their own name. If the controls cannot be evidenced, the exposure sits with the individual, not the technology, and not the vendor. The model does not take the call from the FCA. Count I sets out what reasonable steps look like in practice for a senior manager whose business area now includes an AI agent.

What to prepare now, regardless of which agent you deploy

The deployment timeline for any given AI agent is uncertain. The evidence requirements are not. Five concrete actions for the next 90 days, whether or not the agent itself is built yet.

1. Map the four Consumer Duty outcomes against your planned AI agent surface. Identify which outcomes the agent affects most directly. Document mitigations for each. Count IV covers why this matters most for vulnerable customers, where guidance quality at scale is the regulator’s most likely first enforcement target.

2. Confirm the SMCR responsibility map. Name the senior manager accountable for the agent. Document the controls they are relying on. File the assessment. Do this before the agent is built, not after.

3. Specify the monitoring requirements. What thresholds trigger intervention. What intervention looks like. Who reviews the intervention log. How often. If these are not on paper, they do not exist.

4. Audit your current oversight infrastructure. Most firms have call QA. Few have anything resembling continuous AI agent monitoring at the interaction level. Identify the gap explicitly and put a number on it. Vague gaps do not get budget.

5. Tighten your third-party documentation. If the AI capability is vendor-provided, exercise audit rights. Test incident reporting. Verify exit arrangements work in practice. Do it now, in a quiet moment, rather than during the supervisory call that prompts the question.

One more thing: the EU AI Act deadline

For UK firms with EU customers or EU exposure, the EU AI Act adds another layer. AI systems used for credit scoring of individuals, and for risk assessment and pricing in life and health insurance, are classified as high-risk under Annex III. That means full technical documentation, automatic logging of every decision, and human oversight provisions. The binding compliance deadline is 2 August 2026, though a proposed Digital Omnibus amendment may push parts of it to late 2027. The proposal has not yet been adopted, which means firms should plan against the August 2026 date and treat any extension as a bonus.

A model with no financial services provenance, no documented methodology, and no audit logging cannot be made compliant by wrapping it in a policy document at the eleventh hour. The provenance argument sits in Count V.
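Automatic logging is only useful as evidence if the log can be shown not to have been edited afterwards. One common way to get there is a hash-chained, append-only log, sketched below. Hash chaining is a design choice of this sketch, not something the article attributes to the Act or to the FCA, and the `DecisionLog` class and payload fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(prev_hash, payload):
    # Canonical JSON so the same payload always hashes to the same value.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class DecisionLog:
    """Append-only log in which each entry commits to the one before it."""

    def __init__(self):
        self.entries = []

    def append(self, payload):
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        entry = {
            "payload": dict(payload),
            "prev_hash": prev_hash,
            "hash": _digest(prev_hash, payload),
        }
        self.entries.append(entry)
        return entry

    def verify(self):
        # Recompute every hash; an edited or reordered entry breaks the chain.
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != _digest(prev_hash, entry["payload"]):
                return False
            prev_hash = entry["hash"]
        return True


log = DecisionLog()
log.append({"interaction_id": "int-001", "decision": "decline", "reason": "affordability"})
log.append({"interaction_id": "int-002", "decision": "approve", "reason": "within policy"})
intact = log.verify()

log.entries[0]["payload"]["decision"] = "approve"  # simulate after-the-fact editing
tampered = log.verify()
print(intact, tampered)
```

In practice the latest hash would also need to be anchored somewhere the log's owner cannot rewrite, such as a write-once store. Without that, anyone who can edit an entry can also recompute the chain.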

The Consumer Duty AI Evidence Checklist

Aveni has built a Consumer Duty AI Evidence Checklist that maps directly to the five areas above. It is the same template Aveni uses with the firms we work with, condensed into a working document second-line teams can use immediately.

Read the Consumer Duty AI Evidence Checklist

Read the full AI on Trial: The Burden of Proof series

Frequently Asked Questions

What evidence does the FCA require for AI agent deployment in financial services? The FCA requires five categories of evidence: a pre-deployment risk assessment against the four Consumer Duty outcomes, named senior manager accountability under SMCR, real-time monitoring with documented intervention thresholds, audit trails at the interaction level (not sampled), and third-party and operational resilience documentation where the AI is vendor-provided.

Does the FCA have specific rules for AI in financial services? No. The FCA has chosen not to publish a bespoke AI rulebook. Its position is that existing rules, including Consumer Duty, SMCR, operational resilience, and the Principles for Businesses, apply to AI systems the same way they apply to any other regulated activity. The regulator expects firms to evidence compliance against the existing framework.

Who is accountable when an AI agent makes a poor decision in a regulated firm? The named senior manager under SMCR remains accountable. Senior Manager Conduct Rule 1 (SC1) requires senior managers to take reasonable steps to ensure that the business of the firm for which they are responsible is controlled effectively. The FCA has confirmed this applies to AI systems. Delegating a decision to an algorithm does not transfer the liability to the algorithm or the vendor.

Is sampling-based QA acceptable for AI agent oversight? No. Sampling 2-3% of cases works for human-led interactions because human variability is bounded. AI agents making thousands of decisions a day require continuous, interaction-level monitoring. The Bank of England’s February 2026 industry roundtables noted that firms’ traditional model risk management approach to validation will not be sustainable as generative AI and agentic systems proliferate, and that the human-in-the-loop concept is itself being challenged by agentic AI.

What is a Consumer Duty AI evidence pack? A Consumer Duty AI evidence pack is the set of documents a firm needs to demonstrate that its AI deployment delivers good outcomes against the four Consumer Duty outcomes: products and services, price and value, consumer understanding, consumer support. It includes pre-deployment impact assessment, ongoing monitoring records, intervention logs, and outcome data.

Does the EU AI Act apply to UK financial services firms? The EU AI Act applies to UK firms that serve EU customers or have EU regulatory exposure, particularly in credit scoring and insurance pricing. The binding compliance deadline for high-risk AI systems is 2 August 2026, though a proposed Digital Omnibus amendment may delay parts of this to late 2027. Requirements include full technical documentation, automatic decision logs, and human oversight provisions.