On 15 May 2026, OpenAI gave ChatGPT your bank account. Pro users in the US can now connect accounts at more than 12,000 institutions through Plaid, among them Schwab, Fidelity, Chase, Robinhood, Amex and Capital One. The chatbot reads your balances, your transactions, your holdings, what you owe. The Intuit integration coming next takes it past reading and into doing: paying, applying, transacting.
OpenAI says 200 million people already bring their money questions to ChatGPT every month. Last week those questions got generic answers. This week they get answers that know exactly what is in your account.
For years, your bank owned the money conversation. You wanted a mortgage, you asked them. You wanted to know if you could afford the holiday, you asked them. That conversation is walking out the door and into ChatGPT, and banks know it. Nobody is waiting to see how this plays out. The plans are already being drawn up in boardrooms.
Which leaves one question, and it decides whether any of that response is worth building: who governs the agents?
Who governs the agents?
So far, the talk about AI agents has been all about what they can do. Which journeys the agent can handle. Which model sits behind it. What it costs to run. How quickly it can go live.
That is the easy part. Building an agent that can hold a financial conversation, ask the right questions, and land on a sensible answer stopped being hard a while ago. Off-the-shelf models will get most firms roughly 70% of the way there in a few weeks.
The hard part starts the moment the agent goes live and starts talking to real customers. A regulated firm has four questions to answer before launch, and to keep answering every day after:
- How do we know the agent is staying inside the risk appetite we set?
- How do we step in when it does not?
- How do we prove we met our Consumer Duty obligations on every single interaction?
- How do we hold the agent to the same standard we would hold a human adviser?
None of that is an engineering problem. It is governance. And the FCA does not deal with governance failures gently. Get it wrong badly enough and the regulator issues a Final Notice.
The FCA position is consistent
The FCA does not relax its rules because the thing on the other end of the conversation is software. Consumer Duty does not switch off when a customer is talking to a bot. And no senior manager gets to shrug and say the agent did it. Right through the AI Live Testing programme and the Supercharged Sandbox, the regulator has said the same thing: it judges the outcome, and the firm owns that outcome no matter what produced it.
You can read that as a handbrake on AI. The firms moving quickest are reading it as instructions.
A second-line risk team signing off an agent needs three things on the table. Proof that the agent was stress tested against real-world scenarios before it went anywhere near a customer. Proof that it is being watched against risk thresholds while it runs. And proof that every interaction it handles can be pulled up and checked against Consumer Duty.
Without those three things, putting the agent live is a gamble with the regulator. A few firms will take it. Most have more sense.
What unified assurance looks like
Here is where we think AI agent governance in a UK regulated context has to start. A few principles, and they hang together.
One standard for humans and agents alike. The FCA does not care whether a bad outcome came from an adviser or a bot, so the oversight should not care either. The same checks, drawn from the same regulatory knowledge, applied to both.
Stress testing before anyone goes live. Before the agent talks to a customer, you run it against scenarios built from real interactions and real regulatory material. What comes out is an evidence pack the second line can actually sign. No pack, no launch.
Catch it in the moment, not in the post-mortem. Once the agent is live, something has to be watching its behaviour as it happens and pulling in a human the moment risk starts creeping up. Wait for the post-event review and the customer has already been harmed.
Models built for the job. General-purpose models get you to 70%. The last 30%, the stretch between a slick demo and something the regulator will actually accept, is a domain problem. It needs models trained on financial services data, labelled by people who know compliance, and sharpened through real deployments, not a bigger model.
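To make the stress-testing and in-the-moment principles above concrete, here is a minimal sketch of how a launch gate and a live escalation rule might fit together. Everything in it is an assumption for illustration: the names `ScenarioResult`, `build_evidence_pack` and `route_turn`, the 98% pass rate and the 0.7 risk threshold are hypothetical, and a real firm would set its own values from its risk appetite.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; a real firm would derive
# these from its own risk appetite and second-line requirements.
LAUNCH_PASS_RATE = 0.98
ESCALATION_THRESHOLD = 0.7


@dataclass
class ScenarioResult:
    scenario_id: str
    passed: bool


def build_evidence_pack(results):
    """Summarise pre-deployment stress-test results for second-line review."""
    total = len(results)
    passed = sum(1 for r in results if r.passed)
    pass_rate = passed / total if total else 0.0
    return {
        "total": total,
        "passed": passed,
        "pass_rate": pass_rate,
        "failed_ids": [r.scenario_id for r in results if not r.passed],
        # "No pack, no launch": an empty test run can never approve a launch.
        "launch_approved": total > 0 and pass_rate >= LAUNCH_PASS_RATE,
    }


def route_turn(risk_score):
    """Decide, per live turn, whether the agent continues or a human takes over."""
    if risk_score >= ESCALATION_THRESHOLD:
        return "escalate_to_human"
    return "continue"


results = [
    ScenarioResult("vulnerable-customer-01", True),
    ScenarioResult("affordability-02", False),
]
pack = build_evidence_pack(results)
print(pack["launch_approved"])  # False: one of two scenarios failed
print(route_turn(0.82))         # escalate_to_human
```

The point of the design is that both decisions are explicit and recorded. The evidence pack lists exactly which scenarios failed, and the escalation rule is a threshold the second line can read and challenge, not a judgement buried inside the model.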
What banks should do this quarter
The firms scoping their response now will set the terms for everyone who follows. Three things to get right.
- Decide what good evidence looks like before you build the agent. Agree internally what the second line will need to see to sign a deployment off. Settle that first and the build has a shape to fit.
- Pick the easy ground first. Servicing, onboarding, suitability triage, complaint handling. These are sensible places to start. Advice itself almost never is, however tempting the technology makes it look.
- Choose your model strategy on purpose. The 70/30 split between general and specialist models is real. Work out which job belongs in which tier before you buy, not after.
The OpenAI and Plaid launch is the warning shot. The real story is what UK financial services does next, and whether the second line of defence will let firms put live what they have built.
So the question lands back with the banks. We have spent seven years at Aveni on the assurance layer that answers it, validated in the FCA’s Supercharged Sandbox. If that is the problem on your desk, let’s chat.
Read the full AI on Trial series
- The Governance Accountability Framework
- Count I: SMCR Compliance and AI Agent Oversight
- Count II: Why AI-Assisted Advice Needs a Retrievable Audit Trail
- Count III: Why Reviewing 3% of Calls Tells You Nothing About the Other 97%
- Count IV: When AI Gets It Wrong: The Risk to Vulnerable Customers
- Count V: Why Financial Services AI Needs Domain-Specific Models
Frequently Asked Questions
What is AI agent governance in financial services?
AI agent governance is the framework of controls, monitoring, and evidence that allows a regulated financial services firm to deploy an AI agent in customer interactions while meeting its conduct, Consumer Duty, and oversight obligations. It typically combines pre-deployment testing, real-time monitoring, and audit-ready interaction analysis.
Does the FCA permit AI agents in customer interactions?
The FCA does not prohibit AI agents in customer interactions. The regulator’s position is that the same conduct standards, Consumer Duty obligations, and oversight requirements apply regardless of whether the interaction is human-led or AI-led. Firms can deploy AI agents provided they can evidence the controls.
What is the FCA Supercharged Sandbox?
The FCA Supercharged Sandbox is the regulator’s testing programme for advanced AI applications in financial services, allowing firms to test AI products in a controlled regulatory environment before broader deployment.
Why is governing AI agents harder than governing human advisers?
Governing AI agents is not necessarily harder than governing human advisers, but it requires a different evidence model. Human oversight relies heavily on retrospective sampling. AI agents can be monitored continuously, and firms can intervene in real time, but that requires infrastructure most firms have not yet built.
What happens if a UK bank deploys an AI agent without an assurance layer?
A UK bank that deploys an AI agent without an assurance layer carries open-ended regulatory exposure. The Consumer Duty applies to every interaction the agent handles. Without evidence of pre-deployment testing, real-time monitoring, and an audit trail, the firm cannot demonstrate that the agent operated within conduct standards, which creates Final Notice risk.