July 1, 2025

Microsoft AI outperforms doctors in diagnostic accuracy, cost efficiency

Editor's Note

Microsoft’s MAI Diagnostic Orchestrator (MAI-DxO) artificial intelligence (AI) system outperformed physicians in diagnostic accuracy, correctly diagnosing 80% of cases compared with 20% for a panel of human doctors. Wired reported the news on June 30, quoting an official who called the system “a genuine step toward medical superintelligence” and noting that the AI also cut diagnostic costs by 20% by choosing less expensive tests and procedures.

As detailed in the article, the study used 304 case studies from the New England Journal of Medicine to build a test called the Sequential Diagnosis Benchmark. Microsoft’s system mimicked physician decision-making by reviewing symptoms, ordering tests, and narrowing down diagnoses one step at a time. MAI-DxO works by orchestrating several leading AI models (OpenAI’s GPT, Google’s Gemini, Anthropic’s Claude, Meta’s Llama, and xAI’s Grok) in a collaborative format modeled on a “chain-of-debate” among expert agents.

Although Microsoft has not yet committed to commercializing the tool, an unnamed company executive told Wired it might eventually be integrated into Bing to help users understand their symptoms. The executive also said the technology could form the basis for future tools that support or automate clinical decision-making. Microsoft plans to continue testing the system in real-world settings, the article notes.

The project builds on prior research by Microsoft and Google showing that large language models can diagnose disease from medical records. According to Wired, however, this study sets itself apart by more closely replicating how physicians work, progressing step by step through the diagnostic process.

Experts interviewed by Wired said the work is promising but not definitive. David Sontag of MIT, for example, noted that the study’s methodology was rigorous but pointed out that the doctors were asked not to use external tools, which may not reflect typical clinical workflows. He also questioned whether the AI’s cost savings would hold up in practice, where physicians weigh additional factors such as patient tolerance or equipment availability. The system’s true value will need to be validated in clinical trials comparing its outcomes with those of physicians treating real patients.
