
Who is going to govern the agents UK banks are about to deploy?

On 15 May 2026, OpenAI launched its personal finance experience in ChatGPT. Pro users in the US can now connect over 12,000 financial institutions through Plaid, including Schwab, Fidelity, Chase, Robinhood, American Express, and Capital One. ChatGPT can read balances, transactions, investments, and liabilities. The Intuit integration coming next will likely extend that further, into action: paying, applying, transacting.

OpenAI says more than 200 million people already ask ChatGPT personal finance questions every month. Until last week, those people received generic answers. Since last week, those who connect their accounts can receive answers grounded in their actual finances.

The strategic read for UK financial services is simple, and almost everyone has read it the same way. The customer relationship that banks, wealth firms, and advisers have spent decades building is starting to shift toward the interface where people already ask their questions. UK banks watching this announcement will not wait. The agentic response is already being scoped at board level.

There is one question that has not yet been answered, and it is the question that will determine whether the agentic response actually lands.

Who governs the agents?

The conversation about AI agents in UK financial services has so far been overwhelmingly about capability. What can the agent do? What journeys can it handle? What model sits behind it? What budget does it require?

That conversation is the easy one. Building an AI agent that can hold a financial conversation, gather information, and surface a suitable outcome is no longer a research problem. The off-the-shelf models can get most firms to roughly 70% quality in a matter of weeks.

The harder conversation is what happens once the agent is live. A regulated firm deploying an AI agent into a customer conversation needs to answer four questions before launch, and to keep answering them every day after:

  • How do we know this agent behaves within our defined risk appetite?
  • How do we intervene when it does not?
  • How do we evidence Consumer Duty obligations across every interaction?
  • How do we apply the same compliance standard to the agent as we apply to a human adviser?

These aren’t technical questions. They’re governance questions. And the firms that get them wrong aren’t going to be told off privately. They’re going to be issued a Final Notice.

The FCA position is consistent

The FCA does not soften its conduct standards when the actor is software. The Consumer Duty does not pause when the interaction is automated. Senior managers do not get to point at a model and say “the agent did it.” The regulator’s position throughout the AI Live Testing programme and the Supercharged Sandbox has been clear: the standard applies to the outcome, and the firm is accountable for the outcome regardless of how it was produced.

Some firms are reading that as a brake on AI deployment. The firms moving fastest are reading it as the spec for how to build.

A second-line risk function that signs off on an AI agent deployment needs three things: evidence that the agent was stress tested against realistic scenarios before launch, evidence that the agent is monitored against risk thresholds in real time, and evidence that every interaction it handles is auditable against Consumer Duty obligations.
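
As an illustration only, the three-part sign-off test above can be written as a simple gate. This is a minimal sketch: `AssuranceEvidence`, `missing_evidence`, and `can_sign_off` are hypothetical names invented for this example, not part of any FCA framework or vendor product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssuranceEvidence:
    """Hypothetical record of the three evidence items the second line needs."""
    stress_tested_pre_launch: bool
    monitored_in_real_time: bool
    interactions_auditable: bool


def missing_evidence(evidence: AssuranceEvidence) -> list[str]:
    """Return the names of the evidence items that are not yet in place."""
    return [name for name, present in vars(evidence).items() if not present]


def can_sign_off(evidence: AssuranceEvidence) -> bool:
    """Sign-off requires all three items; two out of three is not enough."""
    return not missing_evidence(evidence)
```

The point of the all-or-nothing check is that a gap in any one item leaves the deployment as a regulatory bet, which is why the sketch reports what is missing rather than returning a score.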

If those three pieces of evidence aren’t in place, the deployment is a regulatory bet. Some firms will take that bet. Most will not.

What unified assurance looks like

There’s a model emerging for what AI agent governance looks like in a UK regulated context, and it is starting to settle around a small number of shared principles.

One compliance standard across human and AI interactions. The FCA doesn’t differentiate between a poor outcome delivered by an adviser and a poor outcome delivered by an agent. Neither should the oversight infrastructure. A unified assurance layer applies the same evaluation to both, drawing from the same regulatory knowledge base.

Pre-deployment stress testing as a precondition. Before an agent goes live, it’s tested against scenarios drawn from real interaction data and regulatory materials. The output is an evidence pack the second line can sign off on. Without that pack, deployment is speculative.
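
One way to picture the evidence pack is as the recorded result of running the agent through a fixed scenario set. The sketch below is an assumption about shape, not a description of any real tool: `Scenario`, `build_evidence_pack`, and the pack's fields are hypothetical, and the agent is modelled as a plain function from customer message to reply.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Scenario:
    """Hypothetical test case: a customer message plus a compliance check."""
    name: str
    customer_message: str
    is_acceptable: Callable[[str], bool]  # check written by a compliance expert


def build_evidence_pack(agent: Callable[[str], str], scenarios: list[Scenario]) -> dict:
    """Run every scenario through the agent and record the outcome of each."""
    results = []
    for scenario in scenarios:
        reply = agent(scenario.customer_message)
        results.append({
            "scenario": scenario.name,
            "reply": reply,
            "passed": scenario.is_acceptable(reply),
        })
    failures = [r["scenario"] for r in results if not r["passed"]]
    return {"total": len(results), "failures": failures, "results": results}
```

In practice the scenarios would be drawn from real interaction data and regulatory materials, as described above; an empty `failures` list is the kind of result a second line could review before sign-off.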

Real-time intervention, not just post-event analysis. Once the agent is in customer interactions, the governance layer monitors behaviour continuously and escalates to human review when risk thresholds are approached. By the time a poor outcome is detected post-event, the harm has already happened.
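
A rough sketch of what "escalate when thresholds are approached" could mean in code follows. It assumes each turn of a conversation already carries a risk score between 0 and 1 from some upstream model; the scoring itself, the 0.8 threshold, and the 0.1 margin are all illustrative assumptions, as are the function names.

```python
from typing import Iterable, Optional


def should_escalate(risk_score: float, threshold: float = 0.8, margin: float = 0.1) -> bool:
    """Escalate once the score comes within `margin` of the threshold."""
    return risk_score >= threshold - margin


def first_escalation_turn(
    risk_scores: Iterable[float], threshold: float = 0.8, margin: float = 0.1
) -> Optional[int]:
    """Return the index of the first turn that needs human review, if any."""
    for turn, score in enumerate(risk_scores):
        if should_escalate(score, threshold, margin):
            return turn  # stop here: intervene before the threshold is breached
    return None
```

The margin is the design choice that matters: escalating before the threshold is reached is what separates intervention from post-event analysis.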

Specialist models, not just generic ones. General-purpose foundation models get firms to roughly 70% quality. The remaining 30%, the gap between a working demo and a production system that the regulator will accept, requires models trained on financial services data, labelled by compliance experts, and refined through real deployments. The gap is not a model problem. It is a domain problem.

What banks should do this quarter

The firms scoping their agentic response now will define the category. Three actions to focus on:

  1. Define the assurance requirement before the agent. Agree internally what evidence the second line will need to sign off on an agent deployment. If the assurance requirement is defined first, the agent build is bounded by it.
  2. Map the agent surface area pragmatically. Servicing, onboarding, suitability triage, and complaint handling are common starting points. Advice itself is rarely the right first deployment, regardless of where the technology is.
  3. Choose the model strategy deliberately. The 70% / 30% split between general-purpose and specialist models is real. Decide which work belongs in which tier before procurement, not after.

The OpenAI + Plaid announcement is the warning shot. The disruption is what UK financial services firms do next, and whether the second line of defence will let them deploy what they build.

That is the governance gap. Aveni is one of a small number of UK firms working specifically on this problem, and we have spent seven years building toward it. There is more on our approach to AI agent assurance, including how this was validated in the FCA’s Supercharged Sandbox, on the Aveni website.

Frequently Asked Questions

What is AI agent governance in financial services?

AI agent governance is the framework of controls, monitoring, and evidence that allows a regulated financial services firm to deploy an AI agent in customer interactions while meeting its conduct, Consumer Duty, and oversight obligations. It typically combines pre-deployment testing, real-time monitoring, and audit-ready interaction analysis.

Does the FCA permit AI agents in customer interactions?

The FCA does not prohibit AI agents in customer interactions. The regulator’s position is that the same conduct standards, Consumer Duty obligations, and oversight requirements apply regardless of whether the interaction is human-led or AI-led. Firms can deploy AI agents provided they can evidence the controls.

What is the FCA Supercharged Sandbox?

The FCA Supercharged Sandbox is the regulator’s testing programme for advanced AI applications in financial services, allowing firms to test AI products in a controlled regulatory environment before broader deployment.

Why is governing AI agents harder than governing human advisers?

Governing AI agents is not necessarily harder than governing human advisers, but it requires a different evidence model. Human oversight relies heavily on retrospective sampling. AI agents can be monitored continuously and corrected in real time, but that requires infrastructure most firms have not yet built.

What happens if a UK bank deploys an AI agent without an assurance layer?

A UK bank that deploys an AI agent without an assurance layer carries open-ended regulatory exposure. The Consumer Duty applies to every interaction the agent handles. Without evidence of pre-deployment testing, real-time monitoring, and an audit trail, the firm cannot demonstrate that the agent operated within conduct standards, which creates Final Notice risk.
