Part of the AI on Trial: The Burden of Proof campaign series
The Defence Brief: Insurance
The customer is on the phone because their husband died last week. They want to know what happens to the joint life policy. They are crying. They mention, in passing, that money is going to be tight. They ask whether the policy still pays out if they cancel the direct debit for a month.
This is a vulnerable customer interaction. Of the FCA’s four drivers (health, life events, resilience, capability), three are in play inside the first ninety seconds. Under Consumer Duty, the firm has to identify the vulnerability, respond to it appropriately, and evidence that it did both. If an AI tool is anywhere in the loop, whether taking the call, drafting the follow-up email, processing the cancellation, or triaging the claim, the firm’s obligation covers what the AI does too.
Now run that scenario across a few million insurance interactions a year. Claims, complaints, renewals, lapse calls, underwriting disclosures, first notification of loss. The vulnerability signals are not scattered through the volume. They are the volume.
The FCA already sees this. Its 2025 review of Consumer Duty board reports flagged claims handling and vulnerability support as the two weakest areas across the industry. The Protect Association’s 2026 sector outlook called claims outcomes “the clearest test of Consumer Duty.” Persistently low claims acceptance rates are now sitting on the regulator’s risk watchlist for boards.
So the question for insurers deploying AI is not whether the regulator is paying attention. It is whether your firm can prove, in writing, that your AI is recognising vulnerability when it sees it, responding the way the FCA expects, and leaving a trail you can hand over on request.
This brief takes three of the five charges from Aveni’s AI on Trial series — Counts III, IV, and V — and works through what they mean for insurance specifically.
The sector-specific problem
Most financial services customers contact their provider when nothing much is wrong. They want to check a balance, update a beneficiary, ask about a fund. Insurance is different. Most customers contact their insurer when something has gone wrong. A burst pipe. A diagnosis. A funeral. A car wrapped around a tree.
The customer journey is structured around bad days. That is the product.
It is also the reason vulnerability concentration in insurance is higher than in any other regulated sector. The FCA’s working definition under FG21/1 covers four drivers: health, life events, resilience, capability. Insurance interactions touch all four with unusual frequency.
- Bereavement claims. Life event, often with resilience stress underneath.
- Critical illness claims. Health and resilience together.
- Home contents claims after a flood or fire. Life event, sometimes capability.
- Premium affordability conversations during a cost-of-living squeeze. Resilience.
- Policy lapse calls where the customer cannot afford to keep cover. Resilience, sometimes health.
- Complaints about a rejected claim. Often capability, sometimes all four.
None of this is exotic. It is the normal work of an insurance contact centre on a normal Tuesday.
Now add the AI layer. Insurers are deploying AI across claims triage, document processing, fraud detection, virtual assistants for first notification of loss, automated underwriting, and outbound customer communications. Each one of those use cases puts AI in the room with customers who are statistically more likely to be vulnerable than not.
The FCA’s vulnerability guidance is clear that vulnerability is not a fixed state. A customer can become vulnerable mid-interaction. The firm has to respond accordingly. The AI has to respond accordingly. And the firm has to evidence that the AI did.
What the FCA actually means by a vulnerable customer
The FCA’s definition, under FG21/1, is a customer who, “due to their personal circumstances, is especially susceptible to harm, particularly when a firm is not acting with appropriate levels of care.”
The four drivers:
- Health. Physical or mental health conditions affecting decision-making or engagement with financial products.
- Life events. Bereavement, divorce, redundancy, becoming a carer, retirement, diagnosis.
- Resilience. Low capacity to absorb financial or emotional shocks.
- Capability. Difficulty with literacy, numeracy, digital skills, or understanding financial concepts.
The regulator’s Financial Lives Survey found that 58% of adults in poor health had experienced issues with financial providers because of their condition. Separate FCA research on retail investments found firms identifying around 1% of customers as vulnerable against an underlying population estimate many times that. The detection gap is enormous, and the FCA has been blunt about it.
For insurers, this means vulnerability identification cannot be an annual training exercise. It cannot be a tick-box at onboarding. It has to be live capability that runs across every customer interaction, including every AI-assisted one.
Charge III applied to insurance: where 3% sampling fails hardest
Charge III argued that sample-based quality assurance is no longer defensible under Consumer Duty. Most QA teams review somewhere between 1% and 3% of customer interactions. Under a regime that expects firms to evidence good outcomes for every customer, that gap is a governance problem.
In insurance, the gap is worse. Here is the maths.
Bereavement claims are a small fraction of total call volume. So are critical illness disclosures during underwriting. So are calls in which a vulnerable customer mentions that they cannot afford the premium. These are exactly the interactions the FCA cares about. They are also the interactions a random 3% sample captures in the smallest numbers.
Random sampling is, by design, blind to outcome severity. It treats a routine policy enquiry and a bereavement claim as equivalent units. If you run a million calls a month and review 30,000, any specific bereavement call has a 3% chance of reaching the review queue, the same as a routine enquiry. The other 97% of those calls are never heard by a reviewer. The interactions where vulnerability handling matters most are missed almost entirely.
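The arithmetic can be made concrete with a short sketch. It uses the figures above (one million calls a month, 30,000 reviewed) plus one assumption that is not in the original: that 5,000 of those calls are bereavement-related. That number is purely illustrative, and the function name is hypothetical.

```python
def sampling_coverage(total_calls: int, reviewed_calls: int, rare_calls: int) -> dict:
    """Expected coverage of a rare call type under uniform random sampling."""
    # Every call has the same chance of selection, whatever its severity.
    sample_rate = reviewed_calls / total_calls
    # Expected number of rare calls that land in the review queue.
    expected_reviewed = rare_calls * reviewed_calls / total_calls
    return {
        "sample_rate": sample_rate,
        "expected_reviewed": expected_reviewed,
        "expected_missed": rare_calls - expected_reviewed,
    }


# Figures from the text: 1,000,000 calls, 30,000 reviewed.
# ASSUMPTION (illustrative only): 5,000 of the calls are bereavement-related.
result = sampling_coverage(1_000_000, 30_000, rare_calls=5_000)

# Same assumption, with the review sample doubled to 60,000 calls.
doubled = sampling_coverage(1_000_000, 60_000, rare_calls=5_000)

print(result)
print(doubled)
```

Under that assumption, a 3% sample is expected to review 150 bereavement calls and miss 4,850. Doubling the sample to 6% raises the reviewed count to 300 and still leaves 4,700 unheard.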
The FCA’s 2025 board reports review said exactly this. The regulator found that many firms could not evidence consistent identification and support of vulnerable customers across the claims journey. The MI was missing. The sampling regimes were not built to surface the cases where vulnerability handling had broken down.
Bigger samples do not fix this. Doubling the sample to 6% still leaves 94% of those calls unreviewed. The fix is monitoring that covers every interaction, with the ability to flag vulnerability indicators automatically and route the cases that need a human reviewer.
Charge IV applied to insurance: when AI gets the moment wrong
Charge IV addressed what happens when AI gives poor guidance: Consumer Duty product and service standards apply to every interaction the AI touches.
In insurance, this charge has a sharper edge.
AI-assisted interactions in claims, underwriting, and customer service are increasingly the first point of contact. A virtual assistant taking a first notification of loss. An AI-drafted email replying to a complaint about a settlement amount. An automated underwriting decision on critical illness cover after a recent diagnosis. Each of these is a moment where the customer may be vulnerable. Each one sets the tone for everything that follows.
Generic AI tools handle these moments badly. There are two reasons.
The first is that vulnerability is rarely declared. Customers do not say “I am vulnerable.” They say things like “I am just trying to get this sorted before I go in for treatment next week,” or “I have been on my own since my wife passed away in March,” or “Money is a bit tight at the moment so I just wanted to check the policy was still active.” These signals are buried mid-sentence in otherwise routine interactions. A model trained on broad internet data does not reliably catch them.
The second is that detection on its own is not enough. The FCA’s vulnerability guidance is unambiguous: identifying a vulnerable customer triggers an obligation, not a compliance tick. The interaction has to change. Pace, language, signposting, escalation. The firm has to evidence that the change happened. Many insurers use the TEXAS model — Thank, Explain, eXplicit consent, Ask, Signpost — as the operational shape of that response. The question for any AI deployment is whether the tool recognises the cues that trigger a TEXAS conversation, and whether it either handles it appropriately or hands off to someone who can.
Here is what going wrong looks like in practice.
An automated claims chatbot receives a message from a customer asking to cancel a life policy. It processes the cancellation request and confirms cover will lapse. What it missed: the same message mentioned the customer had just finished chemotherapy and was “trying to get my affairs in order.” A trained human handler would have caught this in the first three seconds. The chatbot just cancels the policy.
The customer experiences harm. The firm cannot evidence that the interaction met the Consumer Duty standard. The AI did exactly what it was asked. That is the problem.
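One way to picture the missing safeguard is a check that runs before any cancellation is processed. The sketch below is a deliberately naive illustration, not a description of any vendor's product: the cue phrases, the `Decision` type, and the function names are all hypothetical, and plain keyword matching is far cruder than the detection the text argues for. It shows only the shape of the control: detect first, then either process or hand off.

```python
from dataclasses import dataclass, field

# ASSUMPTION: hypothetical cue phrases, for illustration only. Real signals are
# far more varied, and the capability driver is omitted because it rarely shows
# up as a keyword.
CUES = {
    "health": ("chemotherapy", "treatment", "diagnosis"),
    "life_event": ("passed away", "funeral", "affairs in order"),
    "resilience": ("money is tight", "money is a bit tight", "cannot afford"),
}


@dataclass
class Decision:
    action: str  # "process" or "handoff_to_human"
    drivers: list[str] = field(default_factory=list)


def detect_drivers(message: str) -> list[str]:
    """Return the vulnerability drivers whose cue phrases appear in the message."""
    text = message.lower()
    return [d for d, phrases in CUES.items() if any(p in text for p in phrases)]


def handle_cancellation(message: str) -> Decision:
    """Hand off to a human if any cue is present; otherwise process the request."""
    drivers = detect_drivers(message)
    if drivers:
        return Decision("handoff_to_human", drivers)
    return Decision("process")


flagged = handle_cancellation(
    "Please cancel my life policy. I have just finished chemotherapy "
    "and I am trying to get my affairs in order."
)
routine = handle_cancellation("Please cancel my life policy from next month.")

print(flagged)
print(routine)
```

The first message is routed to a human with the health and life event drivers recorded, which also gives the firm something to evidence. The second is processed as normal.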
This is the scene of the crime Charge IV describes. The investigative work starts after the fact. By then the customer is already gone.
Charge V applied to insurance: why generic models miss the signals
Charge V made the case for model provenance. Where was the AI trained, on what data, against what benchmarks, in what jurisdiction.
In insurance, the provenance question lands hard because the language of insurance vulnerability is specific. The vocabulary is specific. The disclosure patterns are specific. The regulatory framing, the policy terminology, and the customer journey shape are all specific.
A general-purpose foundation model trained mostly on public internet data has limited exposure to UK regulated insurance interactions, the FCA’s vulnerability framework, or the operational vocabulary of claims handling. When that model is deployed into insurance use cases, two failures follow. It misses signals it was never trained to recognise. And it produces outputs in a register that does not match how UK regulated insurers are expected to communicate, particularly with vulnerable customers.
The FCA’s expectation, and increasingly the EU AI Act’s expectation for high-risk systems, is that firms can answer four questions about the AI they deploy. Where was the model trained. On what data. With what benchmarks. Within what jurisdiction. For most generic foundation models, the honest answer is some version of “we cannot tell you, and neither can the vendor.”
For insurers, that answer will not hold up.
What the evidence file actually looks like
If a regulator asked tomorrow what your AI is doing across vulnerable customer interactions, six questions need answers.
- What interactions are being touched by AI. Every channel, every use case, every model in production.
- Which of those interactions involve vulnerable customers. Identified by which signals, against what definition, with what supporting documentation.
- How vulnerability indicators are surfaced in real time. Detection mechanism, response protocol, escalation route.
- How the AI’s behaviour during those interactions is monitored. Coverage, sampling logic, exception handling, outcome tracking.
- How adverse outcomes feed back into the system. Complaints data, MI on claims acceptance and decline, vulnerability data linked to redress.
- What the senior manager under SMCR has actually reviewed. Board pack, assurance reports, action log on identified gaps.
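The six questions can be treated as a checklist held per AI use case. The sketch below is one possible shape for that record, offered as an assumption rather than a prescribed format: the `EvidenceFile` class, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class EvidenceFile:
    """One field per question above; None means the firm cannot yet answer it."""
    ai_footprint: Optional[str] = None               # channels, use cases, models
    vulnerable_interactions: Optional[str] = None    # signals, definition, documentation
    real_time_surfacing: Optional[str] = None        # detection, response, escalation
    behaviour_monitoring: Optional[str] = None       # coverage, sampling logic, exceptions
    adverse_outcome_feedback: Optional[str] = None   # complaints, claims MI, redress
    senior_manager_review: Optional[str] = None      # board pack, assurance, action log


def unanswered(evidence: EvidenceFile) -> list[str]:
    """List the questions that have no answer recorded for this use case."""
    return [f.name for f in fields(evidence) if getattr(evidence, f.name) is None]


# ASSUMPTION: a made-up use case with two of the six questions answered.
claims_chatbot = EvidenceFile(
    ai_footprint="First notification of loss virtual assistant, web chat",
    behaviour_monitoring="All chats scored; exceptions routed to QA",
)

gaps = unanswered(claims_chatbot)
print(gaps)
```

Running the same check for every AI use case in production gives a simple view of where the evidence file is incomplete.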
Most firms can answer some of these questions for some of their use cases. Few can answer all six across the full AI footprint. The FCA’s 2025 board reports review said this directly: firms are reporting on Consumer Duty activity, not on Consumer Duty outcomes. The data linking the two is often missing.
Closing those gaps is the work.
Where Aveni fits in
Three Aveni products map onto these charges. One is live, one is in live testing, and one is still on the roadmap.
Aveni Detect runs across customer interactions — voice calls, video meetings, written communications — and assesses every one for Consumer Duty risk, vulnerability indicators, complaints handling, and regulatory signals. Detect is live and deployed across financial services firms, including in insurance. It moves the firm from 3% sampling to complete coverage. It surfaces the calls a reviewer needs to hear, rather than asking reviewers to find them.
For insurers, Detect identifies vulnerability cues across the four FCA drivers, tracks complaints handling against DISP expectations, and connects vulnerability MI to outcomes MI. That is Charge III answered.
Guidance Agents is the real-time coaching layer Aveni is designing for advisers and claims handlers. As an interaction unfolds, the system flags vulnerability indicators, suggests the appropriate response framework, and prompts the handler to follow the firm’s protocol. Guidance Agents is on Aveni’s roadmap and is not yet deployed. It is being built to answer Charge IV directly: AI that helps the handler get the moment right, before it becomes a scene of the crime.
FinLLM is the domain-specific large language model Aveni built for UK financial services. It first launched in 2025 and is in live testing with a tier-one UK bank. Its training data, benchmarks, and hosting jurisdiction are documented. That is Charge V answered.
The combination is built to deliver what the evidence file requires: monitoring that covers every interaction, coaching that improves the interaction in the moment, and a model underneath that can defend its own provenance.
Where your firm stands
The five governance gaps in the AI on Trial series are the questions the FCA, your board, and your senior managers are working through right now. For insurers, three of them converge on the same operational reality. Vulnerable customers are everywhere in insurance, and the AI deployed across the customer journey has to recognise them, respond to them, and evidence both.
The defence is not built after a regulatory request arrives. It is built into the operating model now, while the FCA’s expectations are still being set, while the Mills Review is still consulting, and while firms still have a window to define what good looks like rather than have it defined for them.
Read the full series:
- AI Governance in UK Financial Services: The Accountability Framework
- Count I: SMCR Compliance and AI Agent Oversight
- Count II: AI Advice Without an Audit Trail
- Count III: Why Sampling 3% Falls Short of Consumer Duty
- Count IV: When AI Guidance Goes Wrong at Scale
- Count V: Why Financial Services AI Needs Domain-Specific Models
Sources cited in this article:
- FCA, FG21/1: Guidance for firms on the fair treatment of vulnerable customers (2021)
- FCA, Consumer Duty: Principle 12, cross-cutting rules, four outcomes (2023, updated ongoing)
- FCA, Review of Consumer Duty board reports: 180-firm review (2025)
- FCA, Financial Lives Survey
- Protect Association, 2026 sector outlook on Consumer Duty and claims
- Treasury Committee report on AI in financial services (January 2026)
- FCA Mills Review on AI in financial services (ongoing, recommendations expected end-2026)