
How banks should evidence AI oversight when interactions run into the millions

May 21, 2026

Part of the AI on Trial: The Burden of Proof campaign series

AI accountability for banks: what proof looks like under SMCR and Consumer Duty

A UK retail bank deployed AI across call centres, chatbots, fraud triage, and customer service. Tens of thousands of customer interactions a day. AI-assisted decisions on every one. When the FCA asked the bank’s CRO how the firm supervised those decisions and how it evidenced good outcomes for customers, the answer drew on three things: a governance policy, a sample-based QA programme, and a model risk framework written before the bank deployed any generative AI. The policy was sound. The infrastructure was not.

This is the defence brief for banking. Five governance gaps run through the entire AI on Trial series. Three of them apply with particular force to banks operating at scale: unsupervised AI agents, sampling-based monitoring, and AI models with no financial services provenance. This article translates those three charges into the specific regulatory and operational context of UK banking.

Why banks face the sharpest AI accountability test in financial services

Banks operate at a scale that makes AI both essential and unforgiving.

A retail bank may handle several million customer interactions a month across call centres, branch networks, mobile apps, chatbots, and digital channels. Tens of thousands of those interactions a day involve some form of AI assistance: triaging customer queries, generating responses, identifying fraud, flagging vulnerability, drafting summaries, deciding which complaints need human escalation. The benefit is obvious. The accountability problem is too.

When AI assists with millions of decisions a month, three things become true that did not apply to the same firm five years ago. First, the volume of decisions exceeds what any human oversight function can review individually. Second, the consistency of AI decisions means that an error replicates at machine speed across every interaction where the same conditions appear. Third, the regulatory expectations have caught up with the technology faster than most banks have built the infrastructure to meet them.

The FCA has been explicit about where the responsibility sits. The Mills Review is examining what oversight senior managers must have over AI decisions. Consumer Duty requires firms to evidence good outcomes across all customers, not a sample. The Senior Managers and Certification Regime puts personal accountability on named individuals for the decisions their function delivers, including the decisions delivered by AI.

For banks, the practical implication is direct. Every senior manager whose function uses AI now carries personal liability for AI outputs. Every customer interaction now needs to be assessed against Consumer Duty outcome standards. Every AI model in production now needs documented provenance.

This article walks through what proof looks like for each of these three obligations.

For a full breakdown of all five governance gaps, see the complete framework → AI Governance in UK Financial Services: The Accountability Framework

SMCR liability for AI decisions: who is accountable when no human is in the loop

The first question the FCA is likely to ask any bank deploying AI at scale is the most basic one: who is responsible when the AI makes the wrong decision?

Under SMCR, the answer cannot be “the AI.” Senior Managers carry personal regulatory accountability for the decisions their function delivers. The regime was written for a world where humans made the decisions and other humans supervised them. The regulator has confirmed that this principle does not change when AI is doing the work. The Senior Manager remains accountable.

For banks, this creates a specific problem that did not exist when AI was confined to back-office automation. An AI agent handling customer service queries makes decisions that affect customer outcomes. A model triaging fraud cases makes decisions that affect financial crime risk. A chatbot responding to mortgage queries makes decisions that fall under FCA conduct rules. Each of these decisions has a senior manager whose name sits on the org chart above the function that owns the AI. Each of those senior managers is now personally accountable for what the AI does.

The accountability gap that opens up looks like this. A traditional supervisory model assumes the senior manager can see what their function is doing, intervene when something goes wrong, and demonstrate to the regulator that controls are in place. An AI function processing tens of thousands of decisions a day breaks all three assumptions. The senior manager cannot see every decision. They cannot intervene in real time. They cannot demonstrate control unless the supporting infrastructure produces structured evidence of what the AI did and why.

The supporting infrastructure has not kept up. Most banks deploying AI today rely on three things to demonstrate oversight: a governance policy, a sample-based QA programme, and an AI risk register. None of these answer the question the FCA is now asking. The policy describes what should happen, not what did. The sample reviews a fraction of interactions, not all of them. The risk register identifies risks, not the specific actions taken to mitigate them in each interaction.

What proof looks like for SMCR accountability is concrete. The bank can show, for any AI-assisted decision, who the responsible senior manager was, what the AI recommended, what the human did with that recommendation, and what the customer received as a result. The bank can demonstrate that this evidence exists not just in policy but in the operational record, retrievable on demand, for every interaction the AI touched.
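As a rough illustration of that operational record, the sketch below captures the four elements named above as one structured entry per interaction and refuses to store an entry with a blank field. Every name here (DecisionEvidence, EvidenceStore, the field names) is hypothetical, and the in-memory dictionary stands in for whatever durable store a bank would actually use.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DecisionEvidence:
    """One AI-assisted decision, captured when it happened.

    Field names are hypothetical; adapt them to the firm's own records.
    """
    interaction_id: str
    occurred_at: str          # ISO 8601 timestamp
    senior_manager: str       # named individual accountable under SMCR
    ai_recommendation: str    # what the AI proposed
    human_action: str         # what the human did with it, e.g. "accepted"
    customer_outcome: str     # what the customer actually received


def missing_fields(record: DecisionEvidence) -> list[str]:
    """Return the names of any fields left blank."""
    return [name for name, value in asdict(record).items() if not value.strip()]


class EvidenceStore:
    """In-memory stand-in (an assumption) for a durable evidence store."""

    def __init__(self) -> None:
        self._records: dict[str, DecisionEvidence] = {}

    def record(self, evidence: DecisionEvidence) -> None:
        gaps = missing_fields(evidence)
        if gaps:
            raise ValueError(f"incomplete evidence, missing: {gaps}")
        if evidence.interaction_id in self._records:
            raise ValueError("evidence already recorded for this interaction")
        self._records[evidence.interaction_id] = evidence

    def retrieve(self, interaction_id: str) -> DecisionEvidence:
        """Look up the evidence for one interaction on demand."""
        return self._records[interaction_id]
```

The design choice worth noting is that completeness is enforced at write time: an entry with no named senior manager is rejected when the interaction happens, not discovered when the regulator asks.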

For a deeper look at SMCR liability for AI, see Count I: SMCR Compliance and AI Agent Oversight.

Consumer Duty at bank scale: why 3% sampling will not survive a thematic review

The second question is whether the bank can evidence good customer outcomes across its full customer base.

Banks have been sampling for decades. A 2% to 5% QA sample was the industry norm long before Consumer Duty arrived. It worked because the regulator was checking process compliance, not outcome evidence, and because the scale of customer interactions made full coverage operationally impossible. Both of those conditions have changed.

Consumer Duty requires firms to evidence outcomes across all customers, not a sample. The FCA reviewed 180 first-year Consumer Duty board reports and found firms approving summaries based on incomplete monitoring. The regulator was direct: process completion is not the same as outcome evidence. For a bank processing millions of interactions a month, a 2% sample gives the board direct evidence about 2% of its interactions. For the other 98%, the board has an estimate, not evidence.

The structural problem is sharper for banks than for any other sector in financial services. A wealth manager with 5,000 high-net-worth clients can plausibly review a meaningful proportion of advice interactions. A retail bank with millions of customers cannot. Sampling at scale produces statistical estimates, not evidence. The FCA has signalled that statistical estimates will not be enough to demonstrate Consumer Duty compliance going forward.

The agentic AI dimension makes the gap worse. As banks deploy AI agents to handle direct customer interactions, those agents operate around the clock, across thousands of conversations a day. When an agent’s guidance contains an error, the error does not stay in one adviser’s caseload. It spreads across the customer base, continuously, until someone catches it. A 3% sample of a 24/7 AI agent’s output is less likely to find a systemic problem than a 3% sample of a human team’s work, because the AI’s errors cluster around specific triggers rather than spreading evenly across the population. If the trigger is rare, only a handful of affected conversations land in the sample, and a handful can look like isolated noise rather than a pattern.
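The arithmetic behind that point can be sketched in a few lines. The example assumes each interaction is selected for QA independently at a fixed rate, which is a simplification of how real sampling programmes work, and the function names are illustrative.

```python
import math


def detection_probability(sample_rate: float, affected: int) -> float:
    """Chance that at least one affected interaction lands in the QA sample,
    assuming each interaction is sampled independently at sample_rate."""
    return 1.0 - (1.0 - sample_rate) ** affected


def affected_needed(sample_rate: float, confidence: float) -> int:
    """Smallest number of affected interactions for which the sample
    contains at least one of them with the given probability."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - sample_rate))


if __name__ == "__main__":
    for affected in (10, 50, 100, 500):
        p = detection_probability(0.03, affected)
        print(f"{affected:>4} affected -> {p:.0%} chance the 3% sample sees one")
    print("affected interactions needed for 95%:", affected_needed(0.03, 0.95))
```

Under these assumptions, an error would need to reach roughly 99 interactions before a 3% sample has a 95% chance of containing even one instance of it, and one sampled instance still has to be recognised as systemic by whoever reviews it.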

What proof looks like for Consumer Duty at bank scale is full coverage of customer interactions, not a sample of them. Every call, every chat, every digital interaction assessed against the firm’s Consumer Duty framework. Vulnerability indicators caught in real time. Conduct risk patterns identified across the population. Complaints and dissatisfaction signals flagged before they become formal complaints. The evidence the bank produces for the board and the regulator covers the whole customer base, not extrapolated estimates from a fraction of it.
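In outline, full coverage means every interaction gets an assessment record, including the ones where nothing was flagged. The sketch below is a deliberately simplified illustration of that shape: the two keyword checks are toy placeholders invented for this example, not a real Consumer Duty framework and not a description of how any particular product works.

```python
from collections import Counter
from typing import Callable

# A check takes the interaction text and returns True when it should be flagged.
Check = Callable[[str], bool]

# Toy keyword checks, purely illustrative (an assumption, not a real framework).
CHECKS: dict[str, Check] = {
    "vulnerability": lambda text: "bereavement" in text.lower(),
    "dissatisfaction": lambda text: "not happy" in text.lower(),
}


def assess_all(
    interactions: dict[str, str], checks: dict[str, Check]
) -> tuple[dict[str, list[str]], Counter]:
    """Run every check on every interaction.

    Returns the flags raised per interaction (an empty list still records
    that the interaction was assessed) and the total count per flag.
    """
    flags = {
        interaction_id: [name for name, check in checks.items() if check(text)]
        for interaction_id, text in interactions.items()
    }
    totals = Counter(name for names in flags.values() for name in names)
    return flags, totals
```

Because unflagged interactions keep an empty entry rather than disappearing, the output can evidence that the whole population was assessed, which a sample cannot.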

For a deeper look at the sampling problem, see Count III: Why Sampling 3% of Customer Interactions Falls Short of Consumer Duty.

Model provenance: why generic LLMs create regulatory exposure for banks

The third question concerns the AI models themselves.

Most banks deploying AI in customer-facing roles today rely on general-purpose large language models built by major US technology companies. These models were trained on broad internet text, fine-tuned for general use, and deployed across thousands of industries. They were not built for UK financial services. They were not aligned to FCA regulatory requirements. They were not designed with the auditability that a UK regulator now expects.

That mismatch creates regulatory exposure that most banks have not yet quantified. The 2026 Stanford AI Index Report tracked transparency across the leading foundation models and found average disclosure scores fell from 58 in 2024 to 40 in 2025. The companies building the models firms depend on are telling firms less about how those models work, not more. The weakest area of disclosure is upstream: training data, compute resources, and post-deployment impact. For a UK bank, this means the foundational model behind its AI deployment is becoming less explainable at the same time as regulatory expectations for explainability are rising.

The EU AI Act adds a hard deadline. High-risk provisions take effect in August 2026, and several bank AI use cases, notably credit scoring and creditworthiness assessment of individuals, fall under the high-risk classification. AI used solely to detect financial fraud is carved out of that Annex III category, so each use case needs its own classification. The Act requires traceability of training data, documentation of model design choices, and demonstrable risk management across the model lifecycle. A generic LLM with no financial services provenance struggles to meet these requirements out of the box.

For banks, the practical consequence is that the model risk frameworks written before generative AI was deployed do not cover the new exposure. Traditional model risk management assumes the model gives repeatable outputs for a given input, is finite in scope, and is trained on a defined dataset the bank controls. Generative AI inverts each of those assumptions. The same prompt can produce different outputs from one run to the next. The model’s scope expands with each new use case. The training data is opaque and belongs to a third party.

What proof looks like for model provenance is documented training data, documented regulatory alignment, and a model that can explain its outputs in a way an auditor can verify. Banks that deploy domain-specific models built for financial services carry a governance advantage that generic LLMs cannot match. The model risk framework needs to be updated to reflect the new exposure. Procurement processes need to include provenance as a criterion. And the senior managers accountable for AI decisions need to know what the model behind those decisions actually is.
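One way to make provenance a procurement criterion is to record it as a checklist per model and surface the gaps. The sketch below mirrors the three elements named above. The class and field names are hypothetical, and a real assessment would need supporting documents behind each answer rather than a yes or no.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ModelProvenance:
    """Provenance checklist for one model (hypothetical field names)."""
    model_name: str
    training_data_documented: bool
    regulatory_alignment_documented: bool
    outputs_explainable_to_auditor: bool


def provenance_gaps(record: ModelProvenance) -> list[str]:
    """List the provenance criteria this model does not yet satisfy."""
    gaps = []
    for field in fields(record):
        value = getattr(record, field.name)
        # Only the yes/no criteria count; model_name is a label, not a criterion.
        if isinstance(value, bool) and not value:
            gaps.append(field.name)
    return gaps
```

A model with an empty gap list is not thereby compliant. The list simply tells the accountable senior manager which questions about the model still have no documented answer.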

For a deeper look at model provenance, see Count V: Why Financial Services AI Needs Domain-Specific Models.

What the regulatory timeline looks like for banks in 2026

The pressure is concentrated. Several deadlines fall in the same window.

The FCA’s Mills Review will report to the FCA Board in summer 2026, with comprehensive guidance on how SMCR and Consumer Duty apply to AI deployment expected by year-end. The review is examining the specific question of what senior manager oversight looks like when AI is making decisions, which means new guidance will land directly on the accountability questions banks are still working through.

The EU AI Act’s high-risk provisions take effect in August 2026. UK banks with EU operations, EU customers, or EU vendor relationships face direct compliance requirements. The Act’s traceability and risk management standards will also become a de facto benchmark for the FCA, even where it has no formal jurisdiction.

The targeted support regime, relevant to banks delivering AI-driven nudges and recommendations to customers, came into force in April 2026. The FCA has indicated that multi-firm thematic reviews of AI deployment will form part of its 2025 to 2026 supervisory agenda.

For the banks reading this article, the practical question is which of those deadlines arrives first and what the firm’s defence brief looks like when it does. Banks that arrive at a thematic review or a Section 166 with documented AI oversight, 100% interaction monitoring, and provenance evidence for their models will be able to answer the regulator’s questions. Banks that arrive with a governance policy, a 3% sample, and a generic LLM behind their customer interactions will struggle to.

Read how Tier 1 banks should rebuild the Three Lines of Defence for AI agents →

How Aveni helps banks build the defence brief

The case-by-case work of building the infrastructure described above is what Aveni is built for.

Aveni Detect is in production at UK banks today. It analyses every customer interaction across voice, chat, documents, and digital channels. It assesses each interaction against the firm’s compliance framework automatically, flagging conduct risk, vulnerability indicators, suitability concerns, and Consumer Duty outcome markers. For a bank moving from sampling to full coverage, Detect closes the gap between policy and evidence. The board reports draw on data covering the entire customer base. The Consumer Duty obligations are met with structured evidence at scale.

Aveni’s broader platform addresses the wider accountability stack. Agent Assure provides governance infrastructure for AI agent oversight in line with SMCR expectations. FinLLM is the purpose-built language model for UK financial services, designed for the provenance and auditability requirements the EU AI Act and the FCA now expect. Together, these products turn the abstract obligations of the regulatory timeline into operational evidence the bank can produce on demand.

The firms that will answer the FCA’s questions most convincingly are the banks that stopped treating governance as a policy exercise and started treating it as an infrastructure problem. Aveni is the expert witness for the defence. Not the prosecution. We help banks build the case, not fear the courtroom.

Where does your bank stand?

The three governance gaps covered in this article are the ones that apply most directly to UK banks. The other two charges in the AI on Trial series cover audit trail and guidance quality and may also apply depending on your AI deployment.

See how Aveni helps UK banks build the defence brief →

