Most NHS AI projects stall in pilots. Digital scribes have done something different. These are AI tools that listen to doctor-patient conversations and automatically draft clinical notes. They moved from trial to procurement to national policy endorsement in under two years. The UK government’s 10-Year Health Plan, published in July 2025, explicitly named AI scribes as a core enabler, claiming the technology could free capacity equivalent to “adding 2,000 more doctors into general practice.” Nineteen suppliers are now on NHS England’s self-certified registry. Trusts are signing multi-million-pound contracts. This is not a pilot any more.
This article is supported by a live adoption tracker listing every NHS trust, vendor, and published study. The tracker is updated as events occur. View the tracker →
What digital scribes are and why they succeeded
A digital scribe (or ambient voice technology, in NHS England’s preferred terminology) listens to a clinical consultation through a microphone, transcribes the conversation, and uses a large language model to draft structured clinical notes. The clinician reviews the draft, edits where necessary, and approves it. The notes flow into the electronic patient record. The whole process replaces what was previously 5–10 minutes of manual typing or dictation after each appointment.
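In pseudocode terms, the loop most deployments describe is short. The sketch below is illustrative only; the function and field names are ours rather than any vendor's API, and real systems add consent capture, audio handling, and audit logging around each step.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class DraftNote:
    transcript: str        # verbatim speech-to-text output
    structured_note: str   # LLM-drafted clinical note
    approved: bool = False

def ambient_scribe_flow(
    audio: bytes,
    transcribe: Callable[[bytes], str],                   # speech recognition step
    draft: Callable[[str], str],                          # LLM summarisation step
    clinician_review: Callable[[DraftNote], DraftNote],   # human edit/approve step
    file_to_epr: Callable[[str], None],                   # write into the electronic patient record
) -> DraftNote:
    """Illustrative flow: capture -> transcribe -> draft -> clinician review -> EPR."""
    note = DraftNote(transcript=transcribe(audio), structured_note="")
    note.structured_note = draft(note.transcript)
    note = clinician_review(note)          # the clinician remains the approval gate
    if note.approved:
        file_to_epr(note.structured_note)
    return note
```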
The scale of the documentation burden they address is enormous. In 2024–25, NHS England delivered 363.6 million GP appointments, 146 million hospital outpatient appointments, and 27 million A&E attendances. Every one of those encounters requires clinical documentation. Research consistently shows that for every hour of direct patient time, clinicians spend nearly two additional hours on documentation. The NHS Long Term Workforce Plan estimated that AI could automate up to 44% of general practice administrative workload.
The technology arrived at a moment of acute need. Two-thirds of NHS clinical staff report working additional hours solely to manage administrative tasks. Twenty-five per cent of NHS medics report burnout. The documentation burden is not merely an inconvenience; it is a workforce retention problem, a capacity constraint, and a barrier to the kind of consultations patients deserve.
Digital scribes solved a problem clinicians already had, in a way that required no new workflow, no new screens, and no new training. The technology fits around existing practice rather than demanding that practice changes to fit the technology.
Who’s using them
Adoption has moved faster than most NHS technology deployments. The landmark evaluation, led by Great Ormond Street Hospital’s DRIVE innovation unit and funded by NHS England, ran from May 2024 to April 2025 across nine London NHS sites including hospitals, GP practices, mental health services, and ambulance teams. It evaluated over 17,000 patient encounters using TORTUS, a UK-developed AI scribe. When the results were published in September 2025, they triggered a wave of procurement activity.
University Hospitals of Leicester and Northamptonshire became the first NHS trusts to jointly procure ambient voice technology, awarding a four-year contract to Accurx Scribe in March 2026 covering 10,000 clinicians and 2.5 million appointments per year. Modality Partnership, the NHS’s largest GP super-partnership, went live with Heidi Health across 53 sites and 360 GPs. Microsoft launched Dragon Copilot in the UK in September 2025; by February 2026 Satya Nadella was visiting Manchester University NHS Trust to see it in action. Oracle Health launched its Clinical AI Agent in the UK in February 2026 after an NHS pilot, with Barts Health among the first sites. Guy’s and St Thomas’ already had 10,000 clinicians using Nuance’s Dragon Medical One. Oxford University Hospitals piloted four vendors simultaneously and found 87% of users saved time on administrative tasks.
The GOSH evaluation produced numbers that commissioners notice. Documentation time was halved. Direct patient interaction time increased by 23.5%. Overall appointment length fell by 8.2%. At St George’s University Hospital, A&E clinicians saw 13.4% more patients per shift. Ninety-two per cent of patients consented to the AI scribe being used. When scaled nationally to England’s 11,055 A&E clinicians, the trial team projected £176 million in documentation savings and £658 million in unlocked capacity per year.
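We do not know the exact assumptions behind those national projections, but the structure of such a calculation is simple. The sketch below uses placeholder inputs of our own; it is not a reconstruction of the trial team's model and will not reproduce the £176 million or £658 million figures.

```python
# Back-of-envelope structure of a national scaling projection.
# The clinician headcount comes from the trial reporting; every other
# input below is a placeholder assumption of ours.

AE_CLINICIANS = 11_055                    # England A&E clinicians (trial reporting)
MINUTES_SAVED_PER_CONSULT = 3.2           # assumption
CONSULTS_PER_CLINICIAN_PER_YEAR = 3_000   # assumption
COST_PER_CLINICIAN_HOUR = 50.0            # assumption, GBP

hours_saved = AE_CLINICIANS * CONSULTS_PER_CLINICIAN_PER_YEAR * MINUTES_SAVED_PER_CONSULT / 60
print(f"≈ {hours_saved:,.0f} clinician-hours, ≈ £{hours_saved * COST_PER_CLINICIAN_HOUR:,.0f} per year")
```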
The GOSH numbers are striking, but the evidence base is still maturing. The Nuffield Trust published the first phase of its rapid evaluation in February 2026, noting that robust evidence “remains surprisingly limited” relative to the pace of deployment. The largest randomised controlled trial to date, published in NEJM AI, tested two ambient scribes (DAX Copilot and Nabla) at UCLA and found time savings of under 10% for one product and non-significant savings for the other. Both improved clinician wellbeing. Over 50 ambient AI companies now offer services, but rigorous comparative evaluation is only beginning.
What the evidence shows
Across the published literature, three patterns are clear. Time savings are real but variable. Clinician satisfaction improvements are more consistent than efficiency gains. And patient consent rates are high, though the quality of that consent is debated.
The GOSH trial found clinicians saved an average of 3 minutes 13 seconds per consultation. Kaiser Permanente, in the largest deployment studied internationally, reported 16,000 hours saved across 7,260 physicians and 2.6 million encounters over one year. A paediatric study published in JAMIA Open found 2.8 minutes saved per appointment, translating to 1.5 hours per clinician per week. The Providence Health randomised trial found a 2.5-hour weekly reduction in documentation burden and a 30.3% decrease in burnout.
The burnout data is where the evidence is strongest. A study published in JAMA Network Open in August 2025, spanning Mass General Brigham and Emory Healthcare, found a 21.2% absolute reduction in burnout prevalence. Multi-centre quality improvement studies consistently show burnout falling from above 50% to below 40% after 30 days of use. Eighty-eight per cent of Kaiser Permanente physicians said AI scribes had a positive impact on visit interactions.
Accuracy is more complex. Studies report error rates of 1–3% for hallucinations and 3.5% for omissions. In a system processing hundreds of millions of consultations, even a 1% error rate means millions of notes with fabricated or missing details entering medical records. Oxford University Hospitals, which piloted four vendors simultaneously, found that 81.5% of encounters produced accurate records. Nearly one in five needed more than minor edits. The BMA warned in June 2025 of “a substantial degree of risk” and advised GPs to “pause their use” if in any doubt.
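The scale argument is straightforward arithmetic. Assuming, purely for illustration, that scribe-drafted notes covered all of the activity cited earlier:

```python
# Activity figures cited above for 2024-25: GP, hospital outpatient, A&E.
annual_consultations = 363_600_000 + 146_000_000 + 27_000_000  # ≈ 537 million

for label, rate in [("hallucination rate, low", 0.01),
                    ("hallucination rate, high", 0.03),
                    ("omission rate", 0.035)]:
    affected = annual_consultations * rate
    print(f"{label}: {rate:.1%} -> {affected / 1e6:.1f} million notes per year")
```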
The published evidence
Twelve peer-reviewed studies and preprints form the current evidence base for AI clinical documentation tools. The table below lists each study, sorted by citation count. Titles link to the full paper.
The vendor landscape
Seven companies are competing seriously for NHS adoption, up from five at the start of 2026. Oracle Health launched its Clinical AI Agent in the UK in February 2026. X-on Health, already embedded in GP practices, partnered with TORTUS to bring ambient voice to primary care. The market is expected to consolidate from the 50-plus current participants to six or seven within the next few years. Distribution infrastructure, not product quality alone, is proving to be the determining factor.
The structural advantage belongs to companies with existing NHS distribution. Accurx reaches 98% of GP practices. Microsoft’s Nuance division has deep trust-level relationships through Dragon Medical One. Oracle Health holds 25% of the acute trust EPR market and can bundle its AI scribe with existing contracts. These platforms can add ambient AI as a feature upgrade rather than a new procurement. TORTUS, by contrast, has the strongest NHS evidence base but lacked comparable reach; it has addressed that distribution gap by partnering with X-on Health for GP practices. Heidi Health claims over 60% of NHS GPs but has limited hospital presence beyond its Walsall pilot.
NHS England moved to impose order on the market in January 2026, publishing a self-certified supplier registry requiring Class I medical device accreditation and DTAC compliance. Nineteen suppliers are listed. The National Chief Clinical Information Officer wrote to NHS organisations warning them to stop using unregistered AI scribing tools. NHS Shared Business Services launched a £150 million framework agreement for digital dictation, speech recognition, and ambient voice technology in October 2025, providing a formal procurement pathway.
The regulatory picture is more complex than the registry suggests. In April 2025, NHS England issued guidance developed with the MHRA clarifying that ambient scribing products using generative AI for summarisation “would be treated as high functionality and likely would qualify as a medical device.” Every AI scribe on the market performs summarisation. Under the MHRA’s framework, this means they are all software as a medical device (SaMD).
Yet the registry requires only Class I registration — the lowest device classification, which is self-declared by the manufacturer with no independent assessment. The same classification applies to bandages and tongue depressors. Whether a tool that generates clinical records using large language models, with documented hallucination rates, should sit at this level is a question that regulators have not yet resolved. TORTUS has been accepted into the MHRA’s AI Airlock programme, a regulatory sandbox exploring what happens when clinical documentation systems cross the boundary into decision support — and whether that triggers reclassification from Class I to Class IIa, which would require independent notified body assessment.
Internationally, the landscape is similarly unsettled. No AI scribe product holds FDA clearance in the United States. Dragon Medical One, DAX Copilot, and all competitors operate in the US market without formal medical device registration. The EU AI Act introduces a new framework for high-risk AI systems, but its application to clinical documentation tools is still being worked out.
How patients are asked about AI scribes varies. Some trusts use opt-in consent, where the clinician asks permission before activating the tool. Others use opt-out, where the scribe runs by default unless the patient objects. The GOSH evaluation reported 92% patient consent. Oxford reported 99.7%. NHS England guidance says trusts should be “transparent” and give patients “the chance to object” — but does not mandate which model to use.
Where they’re being adopted
Twenty-four NHS trusts and GP networks are now live or piloting AI scribes. London leads, driven by the GOSH DRIVE evaluation, but adoption is spreading across every English region. The chart below shows the count of trusts by region, coloured by vendor.
How the technology works
Despite the speed of adoption, surprisingly little has been published about what is actually happening technically inside these systems. From published deployment reports and trust announcements, three distinct integration patterns have emerged.
Mobile app capture. The most common pattern, used by Oracle Health at Barts and Microsoft Dragon Copilot at Manchester, involves clinicians placing a personal smartphone near the patient and recording via a dedicated app. The audio is processed in real time, with structured clinical notes and GP letters generated before the patient leaves the consultation room. At Barts, 250 clinicians used this approach, with the output flowing directly into the Millennium electronic patient record.
Telephony-embedded capture. In primary care, X-on Health’s Surgery Intellect takes a different approach: the scribe is integrated directly into the GP telephone system via Phonebar and Surgery Connect, requiring no separate microphone or additional device. Notes are filed automatically to EMIS or SystmOne via IM1 assurance. This zero-friction pattern may explain why primary care adoption has been faster than acute in some regions.
EPR-native capture. At Royal Devon, TORTUS launches from within the Epic electronic patient record, with notes pulled directly back into the clinical record. This represents the tightest integration, but brings its own challenges: clinicians report that formatting can degrade when output is transferred into the EPR template structure.
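To make the contrast between the three patterns concrete, they can be summarised as a small data model. The field names below are ours and purely illustrative, not drawn from any vendor's documentation.

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class IntegrationPattern:
    name: str
    capture_channel: Literal["smartphone_app", "gp_telephony", "epr_embedded"]
    extra_device_in_room: bool   # does the clinician need a separate capture device?
    note_filing_route: str       # how the approved note reaches the record

PATTERNS = [
    IntegrationPattern("Mobile app capture", "smartphone_app", True,
                       "app pushes the note into the EPR (e.g. Millennium)"),
    IntegrationPattern("Telephony-embedded capture", "gp_telephony", False,
                       "filed to EMIS or SystmOne via IM1"),
    IntegrationPattern("EPR-native capture", "epr_embedded", False,
                       "written back from inside the EPR (e.g. Epic)"),
]
```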
On safety architecture, approaches vary. X-on’s TORTUS integration uses an automated quality assurance layer called “The Shell”, which verifies every line of output against the source transcript and reportedly removes 93% of hallucinations before clinician review. Most other deployments rely on clinician review as the sole safety gate. No deployment has published data on what happens when clinicians approve notes without thoroughly checking them, a gap the evidence base has not yet addressed.
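We do not know how “The Shell” is actually implemented. As a minimal sketch of the general idea, transcript-grounded checking of each drafted line could look something like this; the crude token-overlap heuristic is purely illustrative, and a production system would use entailment models or span alignment instead.

```python
import re

def tokenise(text: str) -> set[str]:
    """Lower-case word tokens, stripped of punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def flag_unsupported_lines(draft_note: str, transcript: str, min_overlap: float = 0.5) -> list[str]:
    """Return draft lines whose content words are poorly supported by the transcript."""
    transcript_tokens = tokenise(transcript)
    flagged = []
    for line in (l.strip() for l in draft_note.splitlines()):
        line_tokens = tokenise(line)
        if not line_tokens:
            continue
        overlap = len(line_tokens & transcript_tokens) / len(line_tokens)
        if overlap < min_overlap:
            flagged.append(line)   # send back for regeneration or clinician attention
    return flagged
```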
Notably, no published NHS deployment discloses which speech recognition model or large language model powers the system. Audio retention policies vary: some vendors confirm recordings are deleted immediately after processing; others do not disclose their approach. These details matter for trusts conducting data protection impact assessments, but they remain largely opaque.
The evidence gap
Across the 21 deployment reports we analysed, a consistent pattern emerges in the quality of evidence. Of the sources that report quantified outcomes, none are randomised controlled trials. Five use a before-and-after design. The remainder are service evaluations, vendor case studies, or announcements without measured data.
Where outcomes are measured, the most common method is clinician self-report surveys. Only two deployments used validated instruments — Alder Hey’s pilot with Lyrebird Health used the PDQI-9 clinical note quality score, finding AI-generated notes scored 37.1 out of 40 compared with 34.6 for manual notes. Oxford University Hospitals logged every AI output and found that 37.3% required editing for accuracy and 44.4% of users experienced at least one hallucination during the pilot.
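The two Oxford figures measure different things, a per-output edit rate and a per-user hallucination exposure rate, which is easy to see from how each would be computed from logged outputs. The field names below are hypothetical.

```python
def summarise_logged_outputs(outputs: list[dict]) -> tuple[float, float]:
    """outputs: dicts like {"user": str, "needed_edit": bool, "hallucinated": bool}."""
    edit_rate = sum(o["needed_edit"] for o in outputs) / len(outputs)   # share of outputs edited
    exposed_users = {o["user"] for o in outputs if o["hallucinated"]}   # users who saw >= 1 hallucination
    all_users = {o["user"] for o in outputs}
    hallucination_exposure = len(exposed_users) / len(all_users)        # share of users affected
    return edit_rate, hallucination_exposure
```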
The Nuffield Trust’s NIHR-funded rapid evaluation, published in February 2026, reached a blunt conclusion: “robust evidence on what AVT actually delivers remains surprisingly limited.” The evaluation found that while time savings are consistently shown, “most existing evaluations stop at the point of documenting time savings. They rarely assess whether these savings lead to meaningful outcomes.” Does saved time become more patient-facing care? Better continuity? Reduced burnout? Or does it simply get absorbed into an already overloaded system?
Meanwhile, a growing body of academic work is documenting what can go wrong. A study published in BMJ Digital Health assessed AI scribe accuracy in primary care and found hallucinations and factual inaccuracies across multiple products. A framework published in npj Digital Medicine proposed systematic methods for assessing hallucination rates in clinical LLMs. A preprint from UCL researchers reviewed the transparency of regulatory and data governance claims across AI scribe vendors, finding significant gaps between what vendors claim and what can be independently verified.
Strikingly, no published deployment has reported cost-per-consultation data, long-term retention figures beyond 12 months, impact on clinical coding accuracy, or formal patient safety incident data. These gaps are not criticisms of the technology; they reflect the pace at which deployment has outrun evaluation. But they are the questions that commissioners and regulators will increasingly need answered.
What this means for other health technologies
Digital scribes have succeeded where hundreds of NHS technology initiatives have stalled. The reasons are instructive for any company trying to get a health technology adopted.
Three conditions were met simultaneously. First, the technology delivered immediate clinician value. It saved time today, in this consultation, for this doctor. Not theoretically. Not after a two-year implementation. Today. Second, it required no workflow disruption. The tool sits in the background of an existing consultation. No new screens. No new processes. No retraining. Third, the vendors with the strongest adoption had existing distribution infrastructure. Microsoft already had Dragon Medical One in NHS trusts. Accurx already reached 98% of GP practices. Heidi already had relationships with GP federations. The sales motion was “add this feature” rather than “buy this new product.”
Immediate clinician value + zero workflow disruption + existing distribution infrastructure. Most NHS technology innovations have one of these. Digital scribes had all three. That is the pattern worth studying.
Notice what is absent from this formula: a NICE evaluation. No trust waited for a formal health technology assessment before buying an AI scribe. No commissioner asked for a cost-per-QALY estimate. The value of saving three minutes of documentation per consultation, or reducing clinician burnout by 21%, is obvious to anyone running a service. It does not need to be expressed in quality-adjusted life years.
This is the same structural tension we identified in our analysis of NHS market access for diagnostics. NICE’s evaluation framework was designed for pharmaceuticals, where clinical outcomes map neatly onto the QALY. Health technologies create value in workflow efficiency, system capacity, and clinician time. These are real, measurable benefits that sit outside the mortality-and-morbidity frame the formal evaluation demands. Digital scribes prove the point: trusts adopted them because the operational case was self-evident, bypassing the formal route entirely.
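For reference, the calculation that formal route is built around is the incremental cost-effectiveness ratio, comparing a new intervention against current practice:

$$
\text{ICER} = \frac{C_{\text{intervention}} - C_{\text{comparator}}}{\text{QALYs}_{\text{intervention}} - \text{QALYs}_{\text{comparator}}}
$$

Clinician time released and documentation burden avoided do not naturally appear in the denominator, which is the root of the mismatch described above.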
That should give pause to any innovator waiting for NICE to validate their technology before approaching trusts. The technologies that reach routine NHS use are not always the ones that complete formal evaluation first. They are the ones that solve a problem commissioners already know they have, in terms those commissioners already understand.
None of this means the technology is without risk. The evidence base is thinner than the policy enthusiasm suggests. Error rates of 1–3% in a system with 537 million annual consultations mean millions of potentially inaccurate notes. The question of what happens to the saved time (whether clinicians see more patients, have longer consultations, or simply work less) remains unanswered. And the parliamentary silence is notable: a technology that records the most sensitive conversations patients ever have is being deployed across the NHS without a single dedicated parliamentary debate.
How Concordance helps
This is one of the fastest-moving areas in NHS technology. New trust contracts, vendor funding rounds, regulatory shifts, and clinical evidence are emerging weekly. We are tracking it as it happens: which trusts are buying, which vendors are winning, what the evidence actually shows, and where the market is heading. This analysis will be updated as material developments occur.
If you are building a clinical AI product, competing in the ambient voice market, or advising a trust on procurement, we can help you make sense of a landscape that is changing faster than any single organisation can follow. Get in touch.