Study: Ambient AI Scribes Are Good, But Not Yet Ready for Prime Time

A Kaiser Permanente study of ambient AI scribes used to capture doctors’ notes and enter data into the EHR finds that they are improving the doctor-patient experience, but doctors still need to edit their notes

Ambient AI scribes designed to transcribe patient-physician encounters into the EHR may hold promise in reducing clinician workloads, but they aren’t there yet.

That’s the conclusion drawn from a recent study of more than 3,000 clinicians at The Permanente Medical Group (TPMG), based in Northern California, who used the technology in late 2023. The study, appearing online today in NEJM Catalyst Innovations in Care Delivery, finds that the AI tool accurately represented the conversation between doctor and patient, but clinicians still had to do a significant amount of editing.

“Ongoing enhancements of the technology are needed and are focused on direct EHR integration, improved capabilities for incorporating medical interpretation, and enhanced workflow personalization options for individual users,” the study team, composed of eight Kaiser Permanente researchers and executives, concluded. “Despite this technology’s early promise, careful and ongoing attention must be paid to ensure that the technology supports clinicians while also optimizing ambient AI scribe output for accuracy, relevance, and alignment in the physician–patient relationship.”

While automation and AI technology have been around for several years, rapid advances in new forms of the technology have created a stir in several industries, including healthcare. AI and large language model (LLM) tools have the potential not only to handle administrative and back-office processes, but also to reduce workloads and stress for clinicians and staff by taking over time-consuming, computer-driven tasks. Ambient AI scribes, for example, are designed to capture conversations and input data into the EHR, giving clinicians and staff the opportunity to interact with patients more freely instead of typing into a laptop or trying to recall the gist of the conversation later.

While not the first study of its kind, the Kaiser Permanente study is one of the largest to test the technology in a clinical setting. It gives healthcare executives valuable insight into where the technology stands now and what needs to be done to make it more effective.

According to the study, some 6,000 Kaiser Permanente clinicians have been using software-based medical dictation technology for at least two years. In August 2023, TPMG launched a two-week pilot with 47 physicians using an AI scribe; based on positive reactions from the physicians, the organization then secured licenses for 10,000 physicians and staff across several settings.

According to researchers, 3,442 physicians used that tool in the first 10 weeks of implementation for 303,266 encounters, with almost 100 physicians using the tool more than 100 times and one doctor using the tool for 1,210 encounters. Overall, the tool was used more than 19,000 times a week in seven of the 10 weeks studied.

In studying how clinicians and their staff used the technology, the research team identified four aspects of ambient AI scribes that would facilitate effective use:

Facilitate engagement by demonstrating growing and sustained adoption of ambient AI by number of clinicians and percentage of patient encounters across diverse specialties and settings.
Aim for effectiveness by reducing the burden of documentation within and outside of direct patient encounters.
Enhance the physician–patient relationship by increasing the amount of time physicians spend interacting with patients by improving engagement and reducing time spent interacting with a computer.
Maintain documentation quality by developing approaches to assess and safely use ambient AI technology capabilities in transcription and summarization.

And at the end of the study, the team listed four takeaways:

Ambient AI scribes “show early promise” in reducing the burden on clinicians to take notes and spend extra time entering that data into the EHR.
Both clinicians and patients said the technology improved the care experience, and some clinicians called the technology “transformational.”
While a review of AI-generated transcripts resulted in an average score of 48 out of 50 across 10 key factors, that doesn’t mean the tools can replace clinicians. There were inconsistencies, and clinicians still had to review the notes and make corrections “to ensure that they remain aligned with the physician-patient relationship.”
“Given the incredible pace of change, building a dynamic evaluation framework is essential to assess the performance of AI scribes across domains including engagement, effectiveness, quality, and safety.”

The research team also noted that AI technology is evolving quickly.

“The approaches to robustly evaluate the quality and safety of AI technologies, including tools such as large language models, remain incompletely defined,” they said. “The underlying algorithms and relevant regulations are also continuing to evolve rapidly, which will necessitate ongoing benchmarking, evaluation, and monitoring as the technology improves and vendors bring new software to market. Adoption rates and usage patterns are also expected to change as new user groups and application domains are identified and tested.”

With that in mind, the study offered three pieces of advice for other healthcare organizations aiming to evaluate ambient AI scribes:

Find clinical champions to overcome barriers to adoption and create a culture that embraces innovative ideas.
Start with a limited pilot involving a small number of clinicians, then scale up to a regional or larger-scale pilot with “opportunities for clinician and patient feedback that result in ongoing improvement that is tangible to stakeholders.”
Develop monitoring and benchmarking processes “that offer proactive assessment of the tools and their impact on meaningful goals.”

Eric Wicklund is the associate content manager and senior editor for Innovation at HealthLeaders.

KEY TAKEAWAYS

A team of researchers at The Permanente Medical Group evaluated some 300,000 provider-patient encounters in late 2023 that were transcribed by an ambient AI tool.

Both patients and doctors said the technology improved the encounter, and it earned high marks for transcribing the conversation, but doctors still had to go back and make corrections before the data could be entered into the EHR.

The study suggests that ambient AI technology has the potential to reduce stress on providers and ease their administrative workloads, while also making the physician-patient encounter more meaningful, but the technology isn’t there yet.