
How a pension provider proves good outcomes now when the customer retires in 20 years

June 2, 2026
10 min read

Part of the AI on Trial: The Burden of Proof campaign series

The Defence Brief: Pensions

A UK pension provider deployed AI across its contact centre, retirement guidance tools, vulnerable customer triage, and member communications. The customer base spans three generations. Some members will not draw their pension for forty years. Others are mid-decumulation today. When the FCA asks the provider’s Chief Risk Officer how the firm evidences good outcomes from AI-influenced decisions over that time horizon, the answer needs to reach further forward than the firm’s policy framework currently does. Decisions made today will be assessed against outcomes that may not surface until 2046 or later.

This is the defence brief for pensions. Five governance gaps run through the entire AI on Trial series. Three of them apply with particular force to pension providers: unsupervised AI agents in member-facing roles, sampling-based monitoring across a long-duration product, and AI models with no documented financial services provenance. This article translates those three charges into the specific regulatory and operational context of UK pensions.

Why pensions face the longest accountability tail in financial services

Pensions are one of the few retail financial products where the customer outcome cannot be assessed for decades.

A retail bank knows within weeks whether a fraud decision was correct. A wealth manager knows within a year whether a portfolio recommendation was suitable. A pension provider may not know until the member retires whether the cumulative effect of AI-influenced nudges, default fund allocations, retirement guidance, and consolidation prompts left the member in a better or worse position than they would otherwise have been.

That time gap creates a governance problem that does not exist in other sectors. When AI assists a customer interaction today, the record of what the AI did needs to outlive the staff who deployed it, the systems that ran it, and in some cases the regulatory regime under which it was approved. The evidence base needs to be retrievable in 2030 or 2040 with the same fidelity it had in 2026. Most pension providers have not built infrastructure for that.

The FCA has been explicit about where the pressure is now landing. The Value for Money Framework consultation (CP26/1) closed in March 2026, with first VFM assessments targeted for 2028. The targeted support regime for pensions launched in April 2026, giving providers permission to make limited recommendations to groups of customers under Consumer Duty obligations. Unit-linked pensions and long-term savings sit inside the FCA’s 2025/26 Consumer Duty market study programme, with findings expected in H1 2026.

For pension providers, the practical implication is direct. Every AI-assisted interaction now generates a regulatory record that may be tested years after the interaction took place. Every default fund allocation, every retirement nudge, every vulnerable customer triage decision needs to be assessable against outcome standards that will evolve over the lifetime of the product. Every AI model in production needs documented provenance that an auditor can trace back to its training data.

This article walks through what proof looks like for each of these three obligations.

For a full breakdown of all five governance gaps, see the complete framework → AI Governance in UK Financial Services: The Accountability Framework

SMCR liability for long-horizon AI decisions: who is accountable when the outcome surfaces in 2046

The first question the FCA is likely to ask any pension provider deploying AI is who carries personal liability when an AI-influenced decision turns out to have produced a poor outcome years after the senior manager who deployed it has left the firm.

Under SMCR, the answer cannot be a generic reference to the AI vendor or the model owner. The senior manager whose function deployed the AI is personally accountable for the decisions that function delivers, including the decisions delivered by AI. The regime does not have an expiry date. Liability sits with the senior manager who held the responsibility at the time, evidenced by the regulatory record the firm produces on demand.

For pensions, this creates a specific governance problem. The AI agent answering a member’s question about consolidation in 2026 is making a decision whose downstream impact may not be visible until that member retires. If a thematic review in 2031 asks the firm to evidence the quality of consolidation guidance given five years earlier, the record needs to show what the AI recommended, what the human did with that recommendation, what the member received, and how the firm assessed that outcome against Consumer Duty standards at the time. Most firms cannot produce that record today.

The accountability gap is sharper for pensions than for most other sectors because the supervisory model assumes the senior manager can see the outcome of decisions made by their function. In a product with a 30-year duration, the outcome window does not align with the supervisory window. The senior manager accountable for the AI decision in 2026 may have moved on by the time the outcome is assessed. The infrastructure needs to carry the accountability forward even when the people do not.

What proof looks like for SMCR accountability in pensions is concrete. The firm can show, for any AI-assisted member interaction, who the responsible senior manager was at the time, what the AI did, what the human did with that recommendation, and how the outcome was assessed against the Consumer Duty framework in force. The record is retrievable on demand, durable over the product lifetime, and structured so that an auditor in 2035 can reconstruct what happened in 2026 with the same fidelity the firm had at the time.
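To make the shape of that record concrete, here is a minimal sketch in Python. The schema, field names such as `responsible_smf` and `duty_framework_version`, and the example values are all hypothetical illustrations, not an FCA-prescribed format or any vendor's data model. It assumes each AI-assisted interaction is written once as an immutable record, serialised to canonical JSON, and fingerprinted with SHA-256 so that a reviewer years later can check it has not changed.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AIInteractionRecord:
    """Hypothetical schema: one immutable record per AI-assisted member interaction."""

    interaction_id: str
    timestamp_utc: str           # ISO 8601 string, readable by any future system
    responsible_smf: str         # senior manager accountable at the time of the interaction
    model_id: str                # exact model and version that produced the AI output
    ai_output: str               # what the AI recommended
    human_action: str            # what the human did with that recommendation
    member_communication: str    # what the member actually received
    duty_framework_version: str  # Consumer Duty standard in force at the time
    outcome_assessment: str      # how the outcome was assessed against that standard

    def to_json(self) -> str:
        # Sorted keys give a canonical serialisation, so the digest is reproducible.
        return json.dumps(asdict(self), sort_keys=True)

    def digest(self) -> str:
        # SHA-256 fingerprint of the canonical JSON; stored separately, it lets a
        # later reviewer detect whether the record was altered after it was written.
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()


# Example: write a record in 2026 and reconstruct it from storage later.
record = AIInteractionRecord(
    interaction_id="INT-000123",
    timestamp_utc=datetime(2026, 6, 2, 10, 30, tzinfo=timezone.utc).isoformat(),
    responsible_smf="SMF-example",
    model_id="example-model@2026-05",
    ai_output="Suggested the member review consolidation options",
    human_action="Agent confirmed the suggestion and added a risk warning",
    member_communication="Consolidation factsheet with risk warning",
    duty_framework_version="consumer-duty-2026",
    outcome_assessment="No foreseeable harm identified at the time",
)
stored = record.to_json()
restored = AIInteractionRecord(**json.loads(stored))
print(restored == record, restored.digest() == record.digest())  # True True
```

The design choice that matters is storing the accountable senior manager and the framework version on the record itself, rather than looking them up later: by the time the record is read, both the organisation chart and the rules may have changed.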

For a deeper look at SMCR liability for AI, see Count I: SMCR Compliance and AI Agent Oversight.

Consumer Duty at pension scale: why sampling cannot evidence outcomes that surface over decades

The second question is whether the provider can evidence good customer outcomes across a member base whose results compound over decades.

Pension providers have historically used sample-based QA to monitor adviser quality, complaint handling, and contact centre interactions. The samples typically cover 2% to 5% of interactions. That approach was built for a product cycle measured in months, not decades. It does not survive contact with the Value for Money Framework or the targeted support regime.

Consumer Duty requires firms to evidence outcomes across all customers, not a sample. The FCA reviewed first-year Consumer Duty board reports and found firms approving summaries based on incomplete monitoring. Process completion is not outcome evidence. For a pension provider with several hundred thousand members, a 3% sample tells the board something about a fraction of its customers. It tells the board very little about the long-term trajectory of the other 97%.

The structural problem in pensions is that the outcomes regulators are asking firms to evidence accumulate over time. A nudge towards a default fund in year one looks different by year fifteen. A retirement guidance interaction at age 55 looks different by age 75. The VFM Framework’s four-point RAG rating system requires firms to assess value against a commercial market comparator group across investment performance, costs, and service quality. Service quality is not a snapshot. It is a multi-decade pattern of interactions, each one assessable against the standard in force at the time.

Targeted support sharpens the problem further. From April 2026, pension providers can make recommendations to groups of customers under Consumer Duty obligations. The recommendations are only permitted where consumers are likely to be put in a better position. The firm needs to evidence that judgement across the population being targeted, not a sample. When AI is used to identify the target group, segment it, and deliver the recommendation, the evidence requirement covers every interaction in the cohort.

What proof looks like for Consumer Duty at pension scale is full coverage of member interactions over the product lifetime, not a sample of them. Every contact centre call, digital interaction, and targeted support recommendation is assessed against the Consumer Duty framework. Vulnerability indicators are caught in real time. Outcome markers are tracked across cohorts and across decades. The evidence the provider produces for the board, the Independent Governance Committee (IGC), and the regulator covers the whole member base and the whole product duration.

For a deeper look at the sampling problem, see Count III: Why Sampling 3% of Customer Interactions Falls Short of Consumer Duty.

Model provenance for long-duration products: why generic LLMs cannot evidence decisions decades later

The third question concerns the AI models themselves and whether the firm can document what they were and how they worked at the time a decision was made.

Most pension providers deploying AI in member-facing roles today rely on general-purpose large language models built by major US technology companies. These models were trained on broad internet text, deployed across thousands of industries, and updated by the vendor on a release schedule the provider does not control. They were not built for UK pensions. They were not aligned to FCA regulatory requirements. They were not designed to produce a record that would still be defensible in a thematic review years after the interaction took place.

That mismatch creates regulatory exposure that grows with time. The 2026 Stanford AI Index Report tracked transparency across leading foundation models and found average disclosure scores fell from 58 in 2024 to 40 in 2025. The companies building the models firms depend on are telling firms less about how those models work, not more. For a pension provider with a 30-year product cycle, the foundational model behind today’s member interactions is becoming less explainable at the same time as the regulatory requirement to evidence what it did becomes more durable.

The EU AI Act adds a hard deadline. High-risk provisions take effect in August 2026, and several pension use cases (member triage, vulnerability identification, targeted support segmentation) may fall under the high-risk classification. The Act requires traceability of training data, documentation of model design choices, and demonstrable risk management across the model lifecycle. A generic LLM that has been updated by its vendor twelve times since the original deployment struggles to meet these requirements. The model that made the decision is not the model that exists today.

For pensions, the practical consequence is that model risk frameworks written before generative AI was deployed do not cover the new exposure. A traditional model risk framework assumes the model is deterministic, stable between validations, and trained on a dataset the firm controls. A general-purpose LLM inverts each of those assumptions. Its outputs are probabilistic. Its behaviour shifts with each vendor update. Its training data is opaque and belongs to a third party. None of that survives a regulatory review years after the fact unless the firm has built the documentation infrastructure to preserve it.

What proof looks like for model provenance in pensions is documented training data, documented regulatory alignment, and a model whose state at any point in the product lifecycle can be reconstructed and explained. Providers that deploy domain-specific models built for financial services carry a governance advantage that generic LLMs cannot match. The model risk framework needs to be updated to reflect the duration of the product. Procurement processes need to include provenance and stability as criteria. The senior managers accountable for AI decisions need to know what model was running when a decision was made, and they need to be able to evidence that decades later.
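One way to let a senior manager answer "which model was running when this decision was made?" is an append-only registry of deployments, queried by timestamp. The sketch below is a minimal Python illustration under stated assumptions: the `ModelRegistry` class, its methods, the `provenance_ref` field, and the model identifiers are hypothetical, and a production system would persist this history in durable storage rather than in memory.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ModelDeployment:
    """Hypothetical registry entry: one model version and when it went live."""

    live_from: datetime
    model_id: str
    provenance_ref: str  # pointer to the firm's training-data and alignment documentation


class ModelRegistry:
    """Append-only history of which model version was in production, and from when."""

    def __init__(self) -> None:
        self._deployments: list[ModelDeployment] = []

    def record_deployment(self, deployment: ModelDeployment) -> None:
        # Reject back-dated entries so the recorded history cannot be rewritten.
        if self._deployments and deployment.live_from <= self._deployments[-1].live_from:
            raise ValueError("deployments must be recorded in chronological order")
        self._deployments.append(deployment)

    def model_at(self, when: datetime) -> ModelDeployment:
        # Binary search for the latest deployment that went live at or before `when`.
        starts = [d.live_from for d in self._deployments]
        index = bisect_right(starts, when)
        if index == 0:
            raise LookupError(f"no model recorded as live at {when.isoformat()}")
        return self._deployments[index - 1]


# Example: a vendor update in April changes the model behind member interactions.
registry = ModelRegistry()
registry.record_deployment(
    ModelDeployment(datetime(2026, 1, 5, tzinfo=timezone.utc), "vendor-llm@v1", "provenance/v1")
)
registry.record_deployment(
    ModelDeployment(datetime(2026, 4, 20, tzinfo=timezone.utc), "vendor-llm@v2", "provenance/v2")
)
decision_time = datetime(2026, 3, 14, 9, 0, tzinfo=timezone.utc)
print(registry.model_at(decision_time).model_id)  # vendor-llm@v1
```

Keeping the registry append-only reflects the point above that the model that made a decision is not the model that exists today: the question is answered from the recorded history, not from whatever the vendor currently serves.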

For a deeper look at model provenance, see Count V: Why Financial Services AI Needs Domain-Specific Models.

What the regulatory timeline looks like for pension providers in 2026

The pressure is concentrated. Several deadlines fall in the same window.

The targeted support regime launched in April 2026, giving pension providers new permissions to make recommendations to groups of customers. The permissions come with direct Consumer Duty obligations and a clear regulatory expectation that providers will evidence the basis for each recommendation.

The Value for Money Framework consultation closed in March 2026, with first VFM assessments targeted for 2028. Service quality metrics will sit alongside investment performance and cost in the assessment, and the four-point RAG rating system will produce publicly disclosed results that compare each arrangement against a commercial market comparator group.

The FCA’s Consumer Duty market study on unit-linked pensions and long-term savings is expected to publish findings in H1 2026, focused on transparency of charges across the value chain and how firms assess overall product value.

The EU AI Act’s high-risk provisions take effect in August 2026. UK pension providers with EU customers or EU vendor relationships may face direct compliance requirements, and the Act’s traceability standards are likely to become a de facto benchmark even where formal jurisdiction does not apply.

The FCA’s Mills Review will report to the FCA Board in summer 2026, with comprehensive guidance on how SMCR and Consumer Duty apply to AI deployment expected by year-end.

For pension providers, the practical question is which of those deadlines arrives first and what the firm’s defence brief looks like when it does. Providers that arrive at a thematic review, a VFM assessment, or a Section 166 skilled person review with documented AI oversight, full coverage of member interactions, and provenance evidence for their models will pass. Providers that arrive with a governance policy, a sample-based QA programme, and a generic LLM behind their targeted support will not.

How Aveni helps pension providers build the defence brief

Building the infrastructure described above, case by case, is the work Aveni is designed for.

Aveni Detect is in production at UK financial services firms today. It analyses every customer interaction across voice, chat, and digital channels. It assesses each interaction against the firm’s compliance framework automatically, flagging conduct risk, vulnerability indicators, suitability concerns, and Consumer Duty outcome markers. For a pension provider moving from sampling to full coverage, Detect closes the gap between policy and evidence. The board reports, the IGC reports, and the VFM assessments draw on data covering the entire member base.

Agent Assure provides governance infrastructure for AI agent oversight in line with SMCR expectations. It produces the structured record of what the AI did, what the human did, and what the member received that a senior manager needs to evidence accountability over the product lifetime. Together with FinLLM, the purpose-built language model for UK financial services, these products turn the abstract obligations of the regulatory timeline into operational evidence the provider can produce on demand, today and in twenty years.

The firms that will answer the FCA’s questions most convincingly are the providers that stopped treating governance as a policy exercise and started treating it as an infrastructure problem. Aveni is the expert witness for the defence. Not the prosecution. We help pension providers build the case, not fear the courtroom.

Where does your pension book stand?

The three governance gaps covered in this article are the ones that apply most directly to UK pension providers. The other two charges in the AI on Trial series cover audit trails and guidance quality, and may also apply depending on how you deploy AI.

See how Aveni helps UK pension providers build the defence brief →

This article is part of Aveni’s AI on Trial: The Burden of Proof campaign series.

Read the full series: