The FCA’s Consumer Duty requires firms to take “reasonable steps” to deliver good outcomes for customers. Most firms believe they are meeting that standard. The problem is that the FCA’s interpretation of “reasonable” has shifted significantly since the Duty came into force in July 2023.
During the initial implementation phase, many firms treated Consumer Duty compliance as a documentation exercise. They updated policies, trained staff, reviewed a sample of customer interactions, and reported to the board. That approach was sufficient to meet the first deadline. It doesn’t satisfy the FCA’s current expectations.
The regulator has made clear that evidencing Consumer Duty compliance requires data, not just processes. Firms need to demonstrate good outcomes across all customer interactions, not just the small percentage that lands in a QA sample. And they need to do this consistently, with audit trails that hold up under regulatory scrutiny.
This guide covers the full scope of Consumer Duty evidence requirements. It explains what the FCA means by “reasonable steps” and how that standard has evolved, where the most common evidence gaps sit across financial services firms, and why manual sampling introduces regulatory risk that many firms underestimate. It then sets out what a credible and scalable evidence framework looks like in practice, how to evaluate technology for systematic Consumer Duty monitoring, and how to move from sample-based QA to full-coverage evidence in a phased, realistic timeline.
Whether you lead compliance, run QA operations, or sit on the board of a regulated firm, this guide provides a structured path from where most firms are today to where the FCA expects them to be.
What the FCA Means by “Reasonable Steps”
The Consumer Duty’s cross-cutting rules require firms to act in good faith, avoid foreseeable harm, and enable customers to pursue their financial objectives. Principle 12, the overarching standard, states that a firm must act to deliver good outcomes for retail customers. The FCA’s Finalised Guidance (FG22/5) sets this against the benchmark of what could “reasonably be expected of a prudent firm.”

In practice, “reasonable steps” means firms must demonstrate they have identified, monitored, and acted on evidence of customer outcomes across all four Duty outcomes: products and services, price and value, consumer understanding, and consumer support.
The FCA has been explicit about what this requires. Its published good and poor practice findings state that firms need “the right culture and governance” and must use “data to identify, monitor and confirm they are satisfied that their customers’ outcomes are consistent with the Duty.” Firms that wait for the FCA to intervene, rather than addressing issues proactively, are falling short of this standard.
The critical word is “evidence.” Board reports must include the results of monitoring, evidence of poor outcomes (including whether specific customer groups are affected), and an overview of actions taken. Process completion alone does not satisfy this test.
The FCA’s multi-firm review of insurance firms confirmed this. The review found that firms’ approaches were “overly focused on processes being completed rather than on the outcomes delivered,” and that “few firms were able to provide clear evidence of where the monitoring of outcomes had directly led to the firm taking action.”
The FCA has made it clear that having a QA process in place is necessary, but that a process alone does not constitute evidence of good outcomes. The regulator wants to see what you found, what it means, and what you did about it.
For boards and senior leadership, this creates a direct governance obligation. Consumer Duty reporting that reaches the board must contain specific, data-backed evidence, not summary statistics built on incomplete monitoring.
The Evidence Gap Most Firms Have
For many firms, the evidence gap is structural. Existing QA and compliance processes were designed for a different regulatory environment, one where sampling a small percentage of interactions, recording completion rates, and responding to complaints was considered adequate.

Consumer Duty sets a higher standard. Firms need to evidence good outcomes across the entire customer base, not just the interactions they review. They need data that’s specific to each of the four Duty outcomes, with tolerances and thresholds that are clearly articulated, regularly reviewed, and linked to remedial action.
The most common evidence gaps across financial services firms fall into four areas.
Coverage
Most firms review between 2% and 5% of customer interactions through manual QA. That means 95% to 98% of conversations, meetings, and service interactions go unreviewed. When the FCA asks a firm to demonstrate that customers are consistently receiving good outcomes, a sample covering 2% of interactions provides limited assurance.
The coverage gap is compounded by volume. A 500-adviser network generating 25,000 client interactions per year would need approximately 37,500 hours of assessor time at 90 minutes per case to achieve full coverage manually. That’s the equivalent of roughly 18 full-time compliance staff dedicated solely to QA review. For most firms, that level of resourcing is not feasible, which means the coverage gap persists by default rather than by design.
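The arithmetic behind that example can be checked with a short sketch. The interaction volume and the 90 minutes per case come from the example above; the 2,080 productive hours per full-time assessor per year (40 hours across 52 weeks) is an assumption made here to reproduce the “roughly 18” figure, and the function name is hypothetical.

```python
def manual_review_effort(interactions_per_year, minutes_per_case,
                         fte_hours_per_year=2080):
    """Return (assessor hours, full-time equivalents) for 100% manual review.

    fte_hours_per_year is an assumed figure (40 h x 52 weeks), not an FCA
    or industry standard; adjust it for leave, training and admin time.
    """
    hours = interactions_per_year * minutes_per_case / 60
    ftes = hours / fte_hours_per_year
    return hours, ftes


# Figures from the 500-adviser network example in the text.
hours, ftes = manual_review_effort(25_000, 90)
print(f"{hours:,.0f} assessor hours, about {ftes:.1f} FTEs")
```

Lowering the assumed productive hours per assessor pushes the headcount above 18, so the figure in the text is best read as a floor rather than a forecast.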
The FCA’s position is that firms should be able to confirm outcomes across their full customer base. A firm that reviews 2% of interactions and extrapolates the results is making assumptions about the other 98%. Under Consumer Duty, assumptions are not evidence.
Consistency
Manual QA is inherently subjective. Two assessors reviewing the same interaction will often reach different conclusions about outcome quality, risk severity, and required follow-up. Across a large adviser network, this creates variation in how outcomes are measured and reported, which undermines the reliability of the evidence.
This variation is particularly acute for firms operating across multiple offices, regions, or business lines. A QA team in one office may apply stricter criteria for consumer understanding than a team in another. When this inconsistency feeds into board reporting, the firm’s evidence base becomes unreliable at precisely the moment it needs to be defensible.
For Heads of Operations and QA Managers, consistency is also a training and resourcing challenge. Maintaining calibration across a team of assessors requires regular benchmarking sessions, shared case studies, and documented assessment criteria. Many firms run these exercises, but the inherent subjectivity of manual review limits how far calibration can go.
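One way to put a number on calibration is to have two assessors rate the same cases and measure how often they agree beyond what chance would produce. The sketch below uses Cohen's kappa for this. Kappa is not mentioned in the FCA material or elsewhere in this guide; it is offered here as one common agreement statistic, and the ratings are invented for illustration.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two assessors on the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both assessors must rate the same non-empty set of cases")
    n = len(ratings_a)
    # Share of cases where the two assessors gave the same rating.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each assessor's rating frequencies.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both assessors used a single identical rating throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of the same eight interactions by two assessors.
assessor_1 = ["good", "good", "poor", "good", "poor", "good", "good", "poor"]
assessor_2 = ["good", "poor", "poor", "good", "good", "good", "good", "poor"]
print(f"kappa = {cohens_kappa(assessor_1, assessor_2):.2f}")
```

In this invented example the assessors agree on six of eight cases, yet kappa comes out at roughly 0.47, because a good share of that agreement would be expected by chance. Tracking a figure like this across calibration sessions shows whether benchmarking is actually narrowing the variation.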
Specificity
Many firms repackage existing management information (MI) for Consumer Duty reporting without asking whether it actually demonstrates outcomes. The FCA has flagged this directly, noting that firms should “not be complacent and assume that they can just repackage existing data.” The regulator wants firms to think seriously about what information they need to understand their customers’ outcomes and the issues they may face.
The specificity gap shows up most clearly in board packs. A board report that shows “95% of reviewed interactions rated satisfactory” tells leadership very little about whether customers are receiving good outcomes across each of the four Duty areas. It does not reveal whether consumer understanding was tested, whether vulnerability was identified and acted on, whether price and value was assessed against the target market, or whether products remained suitable over time.
Compliance Officers face the challenge of translating existing MI into outcome-specific reporting without the underlying data to support it. When the data was never collected with Consumer Duty outcomes in mind, retrofitting it to meet the standard produces reports that look complete but lack substance.
Timeliness
Manual sampling typically runs weeks or months behind live interactions. By the time an issue surfaces in a QA review, the harm may already have occurred. The opportunity for early intervention has passed.
The timeliness gap has direct consequences for firms in sectors where customer interactions carry high conduct risk. Protection insurance conversations, equity release advice, and debt management calls all involve scenarios where delayed identification of a suitability concern or missed vulnerability indicator can result in measurable customer harm. For these firms, the gap between when a problem occurs and when it is identified represents both regulatory risk and potential redress exposure.
The FCA expects firms to act proactively. Evidence collected months after the fact demonstrates what went wrong; it does not demonstrate oversight.
Why Manual Sampling Creates Regulatory Risk
Sampling has been the foundation of QA in financial services for decades. A firm selects a percentage of customer interactions (typically 2% to 5%), reviews them against a set of criteria, and reports the findings. On paper, this looks rigorous. In practice, it introduces risks that are difficult to defend under Consumer Duty.
Statistical risk. A 2% sample gives a firm visibility of 2% of what is happening. The remaining 98% is invisible. If a conduct issue, vulnerability indicator, or suitability concern sits within that 98%, the firm has no evidence it existed, no record that it was identified, and no audit trail showing what action was taken. Under Consumer Duty, the FCA expects firms to confirm that customers are receiving good outcomes consistently. A sample that misses the vast majority of interactions makes that confirmation unreliable.
Selection bias. Firms typically sample either randomly or based on basic criteria such as interaction type or adviser. Neither method reliably captures the highest-risk interactions. A complaint that was never formally raised, an affordability concern mentioned mid-conversation, or a vulnerable customer who did not self-identify will not appear in a criteria-based sample, and may not surface in a random one either.
Response time. By the time a manual QA process identifies a pattern (for example, a specific adviser consistently failing to confirm understanding, or a product being recommended without adequate disclosure), multiple customers may already have experienced poor outcomes. The FCA expects firms to act proactively. Retrospective sampling, by its nature, limits a firm’s ability to intervene early.

For QA Managers, these risks are often well understood but difficult to resolve within existing resource constraints. The issue is not awareness; it is capacity. Manual QA teams are typically stretched thin, and increasing the sample rate from 2% to even 10% would require a significant uplift in headcount without fundamentally changing the structural limitations of the approach.
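The statistical risk can be made concrete with a short calculation. The sketch below assumes a simple random sample drawn without replacement and a hypothetical issue affecting 10 of 25,000 interactions; both are illustrative assumptions rather than figures from the FCA or from any firm.

```python
def prob_sample_misses_all(total, sample_size, affected):
    """Probability that a simple random sample (without replacement)
    contains none of the affected interactions.

    Equivalent to C(total - affected, sample_size) / C(total, sample_size).
    """
    if affected > total - sample_size:
        return 0.0  # the sample is too large to avoid every affected case
    p = 1.0
    for j in range(affected):
        p *= (total - sample_size - j) / (total - j)
    return p


TOTAL = 25_000   # annual interactions, as in the earlier example
AFFECTED = 10    # hypothetical number of interactions with the issue

for pct in (2, 10):
    sample_size = TOTAL * pct // 100
    miss = prob_sample_misses_all(TOTAL, sample_size, AFFECTED)
    print(f"{pct}% sample: {miss:.0%} chance of seeing none of the {AFFECTED}")
```

Under these assumptions, a 2% sample has roughly an 82% chance of containing none of the affected interactions, and a 10% sample still misses all of them about 35% of the time. Raising the sample rate narrows the gap without closing it.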
Technology Approaches to Systematic Evidence
Delivering 100% coverage through manual QA would require an impractical number of reviewers. For the 500-adviser network described earlier, it would mean roughly 37,500 hours of assessor time per year, or about 18 full-time compliance staff dedicated solely to QA review.
This is where technology becomes essential. AI-powered monitoring platforms analyse every customer interaction automatically, flagging conduct risk, vulnerability indicators, suitability concerns, and Consumer Duty outcome markers at a fraction of the time and cost of manual review.
Several capabilities matter when evaluating technology for Consumer Duty evidence.
Interaction analysis across channels. The platform should handle voice calls, meeting recordings, written correspondence, and advice documentation. Consumer Duty applies across all customer touchpoints, and evidence needs to reflect that.
Real-time and retrospective monitoring. Retrospective analysis reviews completed interactions and generates evidence for reporting and audit. Real-time monitoring flags issues during or immediately after an interaction, enabling faster intervention. The most effective approaches combine both.
Outcome-specific assessment. The technology should assess interactions against Consumer Duty’s four outcomes directly, with configurable criteria that reflect the firm’s specific products, services, and target market. Generic sentiment analysis does not meet this standard.
Evidence capture and export. Every flagged interaction should generate a traceable evidence record linking the original conversation to the identified risk, the outcome assessment, and any subsequent action. This creates the audit trail the FCA expects when it examines a firm’s Consumer Duty evidence.
Integration with existing workflows. The technology should feed into existing QA processes, compliance dashboards, and board reporting frameworks. Evidence that sits in a standalone system, disconnected from the firm’s governance structure, adds complexity without improving oversight.
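As an illustration of the evidence record described under “Evidence capture and export,” the sketch below links an interaction to the identified risk, the outcome assessment, and the actions taken. Every class, field, and identifier here is hypothetical; it does not represent the data model of Aveni Detect or any other platform.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from enum import Enum


class DutyOutcome(str, Enum):
    """The four Consumer Duty outcomes."""
    PRODUCTS_AND_SERVICES = "products_and_services"
    PRICE_AND_VALUE = "price_and_value"
    CONSUMER_UNDERSTANDING = "consumer_understanding"
    CONSUMER_SUPPORT = "consumer_support"


@dataclass
class EvidenceRecord:
    """Hypothetical audit-trail entry for one flagged interaction."""
    interaction_id: str     # reference back to the original conversation
    outcome: DutyOutcome    # which Duty outcome the finding relates to
    risk_flag: str          # the risk that was identified
    assessment: str         # the outcome assessment reached
    actions: list = field(default_factory=list)  # follow-up actions taken
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def add_action(self, description):
        self.actions.append(description)

    def to_json(self):
        """Export the record for board reporting or regulatory review."""
        return json.dumps(asdict(self), indent=2)


record = EvidenceRecord(
    interaction_id="call-0001",
    outcome=DutyOutcome.CONSUMER_UNDERSTANDING,
    risk_flag="Adviser did not confirm the customer understood the exclusions",
    assessment="Poor outcome: understanding not evidenced",
)
record.add_action("Customer recontacted and exclusions explained")
print(record.to_json())
```

Keeping the action list on the same record as the finding is the design choice that matters: it lets a reviewer trace what was found, what it meant, and what was done about it from a single entry.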

As a reference point, firms using Aveni Detect have achieved 100% interaction coverage (up from typical 2–3% manual sampling), with QA assessment time reduced by 83% and vulnerability detection completed 75% faster. These results are documented in Aveni’s published case studies.
Implementation Roadmap

Phase 1: Evidence Gap Analysis (Weeks 1 to 4)
Start by mapping your current evidence against the FCA’s four Consumer Duty outcomes. For each outcome, document what data you currently collect, how much of your customer base it covers, and where the gaps sit. Pay particular attention to coverage (what percentage of interactions are reviewed), consistency (how standardised your assessment criteria are), and timeliness (how quickly issues surface after they occur).
This analysis gives you a clear picture of where your evidence is strong and where regulatory risk exists.
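A gap analysis of this kind can start as a simple table of figures per outcome. The sketch below is one possible shape for it: the field names, the thresholds (95% coverage, issues surfacing within 7 days), and the sample figures are all assumptions chosen for illustration, and each firm would set its own tolerances.

```python
from dataclasses import dataclass


@dataclass
class OutcomeEvidence:
    """Current evidence position for one Consumer Duty outcome (illustrative)."""
    outcome: str
    interactions_total: int
    interactions_reviewed: int
    criteria_documented: bool       # are assessment criteria standardised?
    median_days_to_surface: float   # lag between an issue and its detection

    @property
    def coverage(self):
        if self.interactions_total == 0:
            return 0.0
        return self.interactions_reviewed / self.interactions_total


def find_gaps(item, min_coverage=0.95, max_days=7.0):
    """List the gap types for one outcome. Thresholds are hypothetical."""
    gaps = []
    if item.coverage < min_coverage:
        gaps.append("coverage")
    if not item.criteria_documented:
        gaps.append("consistency")
    if item.median_days_to_surface > max_days:
        gaps.append("timeliness")
    return gaps


# Invented figures for a firm handling 25,000 interactions a year.
evidence = [
    OutcomeEvidence("products and services", 25_000, 750, True, 45),
    OutcomeEvidence("price and value", 25_000, 500, False, 60),
    OutcomeEvidence("consumer understanding", 25_000, 750, False, 45),
    OutcomeEvidence("consumer support", 25_000, 25_000, True, 2),
]

for item in evidence:
    gaps = find_gaps(item) or ["none"]
    print(f"{item.outcome}: coverage {item.coverage:.0%}, gaps: {', '.join(gaps)}")
```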
Phase 2: Technology Selection (Weeks 4 to 8)
With your evidence gaps documented, evaluate technology options against your specific requirements. Key selection criteria include: channel coverage (voice, written, face-to-face), integration with your existing CRM and compliance platforms, the platform’s ability to assess against Consumer Duty outcomes specifically, and evidence export capabilities for board reporting and regulatory review.
Consider running a proof-of-concept with a small subset of interactions to validate accuracy and assess how the technology handles your firm’s specific language, products, and customer profiles.
Phase 3: Rollout and Adoption (Weeks 8 to 14)
Deploy in stages. Start with a single team, product line, or office to establish baseline metrics and refine assessment criteria before expanding. Incorporate feedback from QA assessors, compliance leads, and frontline staff as you scale. Training should focus on how the technology supports existing QA workflows (not replaces them), and how the evidence it generates feeds directly into Consumer Duty board reporting.
Phase 4: Continuous Improvement (Ongoing)
Consumer Duty compliance is not a one-time project. The FCA expects firms to refine their monitoring, update their tolerances, and act on what the evidence tells them over time. Build a review cycle that assesses the effectiveness of your evidence framework quarterly, incorporates regulatory developments, and tracks improvement in customer outcomes.
Where to Start
The FCA’s expectations around Consumer Duty evidence will continue to increase. The regulator has signalled that multi-firm reviews, sector-specific scrutiny, and enhanced board reporting requirements are all part of its 2025–2026 agenda. Firms that depend on sampling and manual processes face growing regulatory risk with each review cycle.
Building a scalable evidence framework takes planning, the right technology, and a phased approach. But the starting point is straightforward: understand where your current evidence falls short, and begin closing the gaps.
Three Next Steps for Your Firm
1. Assess your evidence gaps. Map your current monitoring against the FCA’s four Consumer Duty outcomes and identify where coverage, consistency, and timeliness fall short.
2. Explore technology-enabled monitoring. See how AI-powered platforms deliver 100% interaction coverage and outcome-specific evidence tailored to your firm’s needs.
3. Build your business case. Understand the cost, timeline, and return on investment for moving from sample-based QA to systematic evidence at scale.
Book a demo to see how Aveni Detect and Aveni Assist deliver Consumer Duty evidence at scale →
SOURCES REFERENCED:
- FCA Finalised Guidance FG22/5 (Consumer Duty, July 2022)
- FCA Consumer Duty: Good Practice and Areas for Improvement
- FCA Insurance Multi-Firm Review of Outcomes Monitoring Under the Consumer Duty
- FCA Consumer Duty Information for Firms (updated December 2025)
- FCA Review of Consumer Duty Requirements (September 2025)
- Aveni Detect published case study data (100% coverage, 83% QA time reduction, 75% faster vulnerability detection)