VeriHealth — True Health Discernment

Getting health information has never been easier.
Trusting it has never been harder.

Medical misinformation has always been dangerous. Social media gave it reach. AI makes it sound like the truth.

VeriHealth helps people reliably navigate health information.

46×
Misleading but technically accurate content lowered vaccination intentions 46 times more than fact-checked misinformation, because far more people saw it
95% → 35%
On identical medical scenarios, AI scores 95%. People interacting with that AI score only 35%, a 60-point drop, and worse than standard web search
Satisfaction ≠ accuracy
Nearly everyone who uses AI for health reports being satisfied with it. Yet users get the wrong answer two times out of three. People trust the tool precisely when it is failing them.
70%
Of people globally hold at least one misleading health belief, at virtually identical rates across education levels, age groups, and political affiliations
All figures peer-reviewed and source-verified. See Evidence Base →

Our Mission

True Health Discernment

People engage with health information constantly. What most lack is the framework to do it well: to ask what matters, to make sense of what comes back, to know what they don't know. That isn't a personal failure. It's a structural one. That framework was never part of general education. It still isn't.

VeriHealth develops that framework. Not for one source, one topic, or one claim. For all of them.

The VideoBot

A platform that develops the capacity to ask the right health question and evaluate what comes back, from any source. Two components, designed to work together, and each deployable independently.

Animated educational modules

Short-form video that develops health reasoning skills across languages and literacy levels: recognizing misleading framing, formulating precise questions, interpreting probabilistic language.

The Structured Socratic Interface

An AI-powered clinical interpreter that clarifies what the user is asking, surfaces false premises, and renders AI output in language the user can understand and act on safely.

Evidence-based design

Every design decision is grounded in the peer-reviewed literature: animated video for health literacy populations, Socratic dialogue for durable belief change, and structured intake to surface the unknown unknowns that free-text interfaces leave unasked.

Who We Are

Built by a team with the credentials to solve this problem

VeriHealth brings together clinical medicine, behavioral science, health communications, and institutional leadership: the four disciplines required to build a health communication infrastructure that works.

Michael P. Walsh
MBA · President and Co-founder
Catherine McCarthy
Chief Content Officer and Co-founder
Smitha Arekapudi
MD, MBA · Project Director and Co-founder
Everly Macario
ScD · Senior Research Advisor and Co-founder

The Problem

The internet democratized access to health information, but not the capacity to navigate it.

As a result, people struggle to know what to look for and to make sense of what they find.

The mediation that was lost

Before the internet, the healthcare system mediated access to health information. The physician translated clinical knowledge into guidance people could act on. The internet removed much of that mediation. What it left behind was a population navigating a complex information environment alone, where false and misleading content is harder to identify than ever. That gap has never been closed.

Social Media Widened It

Social media made the gap impossible to ignore.

The platforms built to connect people reward engagement, not accuracy. Misleading health content drives engagement. It confirms fears, validates instincts, and travels faster than correction.

"Misleading claims from credible sources can be more damaging than blatant falsehoods."

Van der Linden and Kyrychenko, Science, 2024
Allen et al. · Science · 2024
46×
Misleading but technically accurate health content lowered vaccination intentions 46 times more than fact-checked misinformation, driven by its far greater reach through mainstream channels.
Chandrasekaran et al. · JMIR · 2024
80%
More than four in five U.S. adults regularly encounter false and misleading health content on social media. Over 35% report seeing a lot of it, and an additional 45% report seeing some amount.

AI Arrived as the Solution. It Made the Problem Worse.

AI scores 95% on standardized clinical scenarios. People get the wrong answer roughly two times in three.

The landmark 2026 Oxford trial revealed the mechanism of failure, and it operates across two distinct layers: one that affects the information itself, and one that affects the user's ability to find and make sense of it.

At the model layer, AI health tools fabricate evidence, express false confidence, and absorb misinformation from the environment they inhabit.

At the interface layer, product design degrades users’ capacity to evaluate what they receive, systematically accommodates dangerous false premises, and amplifies existing health misconceptions at scale.

The knowledge exists in the model. The capacity to find it and make sense of it does not exist in the interaction.

KFF Tracking Poll · 2026
32%
Nearly one in three U.S. adults has used AI chatbots for health advice in the past year. Among those who have, 92% report satisfaction. Satisfaction is not safety.
OpenAI · 2026
40M/day
More than 40 million people use ChatGPT for health-related questions every day. Seven in ten of these conversations occur outside normal clinic hours, when no physician is available.

Model-layer failures

0.97%
Disclaimers Remaining
Safety disclaimers in AI health responses dropped from 26.3% in 2022 to 0.97% in 2025, a systematic erosion of the only protection most users received (Omar et al., 2025).
31.7%
Susceptibility to Fabricated Health Claims
Across 3.4 million prompts and 20 LLMs, AI accepted fabricated health data in 31.7% of cases. Fabricated clinical notes were accepted 46.1% of the time. The models generating confident medical answers are the same models absorbing confident medical misinformation (Omar et al., 2026).
~0%
Models That Can Say "I Don't Know"
Across 12 models, nearly all scored 0% on identifying unanswerable medical questions. The best performer achieved 3.7%. A model that is always confident and frequently wrong undermines the user's natural skepticism at the moment it is most needed (Griot et al., 2025).

Interface-layer failures

35%
Condition Identification
Participants using AI chatbots missed the correct medical condition in about two-thirds of cases, identifying it less than 34.5% of the time. Internet search outperformed all AI chatbot groups by a factor of 1.76 (Bean et al., 2026).
11.7×
Triage Derailed by Prior Framing
A single reassuring comment before a question ("my friend said it's nothing serious") raised the odds that ChatGPT Health would dismiss a real emergency by a factor of 11.7. It also over-referred 65% of non-urgent cases (Ramaswamy et al., 2026).
Worse
Than No AI At All
Users with access to AI performed worse than those who used a search tool with no AI. The pull toward dependence is measurable: across 11 leading models, AI affirmed users' choices 49% more often than other people did, even when the user was wrong, and users trusted the agreeable model more and wanted to keep using it (Cheng et al., Science, 2026). The behavior that earns trust is the same behavior that distorts judgment, which makes the Interaction Gap self-reinforcing rather than static.
↓21%
Clinical Skill After AI Exposure
In a study of trained endoscopists, adenoma detection fell from 28.4% to 22.4% after routine AI exposure, a relative decline of about 21%, with AI use an independent predictor after multivariable adjustment. The finding is observational and in clinicians, not lay users, but the mechanism is the same: answer-delivery AI erodes the independent judgment it is meant to support (Budzyn et al., 2025).
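
The relative decline quoted above follows from the two detection rates alone. A minimal Python sketch of that arithmetic, using only the figures reported in the card; the function name `relative_change` is illustrative and not taken from the study:

```python
def relative_change(before: float, after: float) -> float:
    """Return the change from `before` to `after` as a fraction of `before`."""
    if before == 0:
        raise ValueError("relative change is undefined when the starting value is 0")
    return (after - before) / before


# Adenoma detection rates quoted above, in percent.
adr_before, adr_after = 28.4, 22.4

absolute_drop = adr_before - adr_after                   # percentage points
relative_drop = -relative_change(adr_before, adr_after)  # fraction of the baseline

print(f"absolute drop: {absolute_drop:.1f} percentage points")
print(f"relative drop: {relative_drop:.1%}")
```

The distinction matters when reading the card: the absolute drop is about 6 percentage points, while the relative decline of about 21% expresses that drop as a share of the starting rate.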

The Solvable Paradox

The same technology, properly designed, can develop what was never built.

The same technology that deepens the gap through unstructured answer delivery can develop genuine health discernment through carefully designed structured dialogue. The evidence is peer-reviewed. The effect sizes are large.

Costello, Pennycook and Rand · Science · 2024
False beliefs cut by ~20%
In dialogues with 2,190 conspiracy believers, brief personalized AI conversations produced a durable reduction of approximately 20% in belief, persisting for at least two months. The effect was larger than prior interventions encouraging reflective thinking, which yielded reductions of only one to six points on comparable scales.
Xu et al. · Preprint · 2025
2× the CDC brochure
AI dialogue addressing HPV vaccine concerns produced more than twice the increase in vaccination intentions compared to a standard CDC brochure, with no significant moderators across age, race, education, or political party.
Hou et al. · Nature Medicine · 2025
3.85×
A Socratic AI chatbot increased vaccine receipt or appointments by a factor of 3.85 in a cluster-randomized trial of 2,671 parents. A peer-reviewed RCT of the exact mechanism VeriHealth employs.

"The design of the interaction, not the model's capabilities, determines whether AI deepens the gap or develops the discernment to close it."

Unsafe by Design, VeriHealth NFP, 2026

The VideoBot

A two-stage platform for health reasoning and more reliable health information

VeriHealth develops the framework for evaluating any health claim from any source, and applies that framework in real time at the moment it is needed. Two components, designed to work in sequence. Each deployable independently. Together, they are designed to produce outcomes that exceed what either component achieves alone.

Component One

Animated educational modules

Short-form animated video (30 to 120 seconds) developing health reasoning skills and foundational health knowledge. Animation is the right medium: a 2024 systematic review found that animated video generally improved health information recall, with the strongest effects for low health literacy populations. Eye-tracking studies show that stylized animated instructors direct attention toward content rather than the presenter, reducing extraneous cognitive load. Modules are available in English and Spanish, designed to work across literacy levels.

Current modules develop the capacity to

  • Understand how vaccines produce immunity and evaluate the claims you encounter about them
  • Recognize misleading health framing, including correlation presented as causation
  • Distinguish authority from evidence when evaluating health endorsements
  • Ask better questions of AI health tools and recognize incomplete answers

Component Two

The Structured Socratic Interface

The SSI addresses the Interaction Gap identified by Bean et al. (2026): the failure mode in which AI chatbots have correct knowledge but the free-text interface prevents users from accessing it. Users ask imprecise questions, embed false assumptions, and receive confident answers they cannot evaluate. The SSI restructures that interface.

On intake, it clarifies what the user is asking, surfaces embedded assumptions, and formulates a precise clinical query. On output, it renders the response in language the user can act on, with a probabilistic overlay that replaces false certainty with calibrated confidence.

Core functions

  • Structured intake replaces free-text chat with guided clinical dialogue
  • Assumption surfacing identifies false premises before they are accommodated
  • Probabilistic overlay replaces false certainty with calibrated confidence
  • Active redirection away from dangerous questions, not accommodation of them
  • Designed on the Socratic dialogue model whose mechanism is validated in peer-reviewed RCTs

The SSI is designed as a model-agnostic interaction layer, deployable against any medical LLM via API, with or without VeriHealth’s video modules.
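
To make the intake-then-output flow concrete, here is a minimal Python sketch of a model-agnostic interaction layer. It is an illustration under stated assumptions, not VeriHealth's implementation: every name (`Intake`, `surface_assumptions`, `build_query`, `render`, `run_ssi`), the keyword list, and the confidence thresholds are hypothetical, and the confidence value is assumed to come from a separate calibration step that is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical backend type: any function mapping a prompt string to a response string.
ModelFn = Callable[[str], str]


@dataclass
class Intake:
    """Structured intake: guided fields in place of a single free-text box."""
    symptom: str
    duration_days: int
    stated_beliefs: List[str] = field(default_factory=list)


# Illustrative placeholder only; a real system would need a clinically reviewed method.
REASSURANCE_MARKERS = ("nothing serious", "probably fine", "just stress")


def surface_assumptions(intake: Intake) -> List[str]:
    """Return stated beliefs containing a reassurance marker, so they can be questioned."""
    return [
        belief for belief in intake.stated_beliefs
        if any(marker in belief.lower() for marker in REASSURANCE_MARKERS)
    ]


def build_query(intake: Intake, flagged: List[str]) -> str:
    """Formulate the query, labeling flagged beliefs as unverified rather than as facts."""
    lines = [f"Symptom: {intake.symptom}", f"Duration: {intake.duration_days} day(s)"]
    if flagged:
        lines.append("Unverified user assumptions (do not treat as fact): " + "; ".join(flagged))
    return "\n".join(lines)


def render(answer: str, confidence: float) -> str:
    """Attach a confidence label to the answer (thresholds are assumptions)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    label = "high" if confidence >= 0.8 else "moderate" if confidence >= 0.5 else "low"
    return f"{answer}\n[Confidence: {label} ({confidence:.0%})]"


def run_ssi(intake: Intake, model: ModelFn, confidence: float) -> str:
    """Intake -> assumption surfacing -> query -> model call -> overlay."""
    flagged = surface_assumptions(intake)
    return render(model(build_query(intake, flagged)), confidence)


def stub_model(prompt: str) -> str:
    """Stand-in for any medical LLM reached via API; returns a fixed string."""
    return "Chest pain can be an emergency. Seek urgent evaluation."


example = Intake("chest pain", 1, ["My friend said it's nothing serious"])
print(run_ssi(example, stub_model, confidence=0.6))
```

Because the backend is passed in as a plain callable, the same layer can wrap any model without changes, which is the sense of "model-agnostic" used above.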

Evidence Base

Every design decision is grounded in the literature

Animated video for health literacy

Hansen et al. (JMIR, 2024) systematic review of RCTs; Li, Wang and Mayer (British Journal of Educational Psychology, 2024) eye-tracking study; Meppelink et al. (JMIR, 2015) on attitude change in low-literacy populations.

Socratic dialogue for belief change

Costello, Pennycook and Rand (Science, 2024) on durable conspiracy belief reduction; Hou et al. (Nature Medicine, 2025) cluster-RCT on vaccine behavior; Simchon et al. (Current Opinion in Psychology, 2025) meta-analysis of 33 inoculation experiments.

Structured intake for the Interaction Gap

Bean et al. (Nature Medicine, 2026) Oxford interactive trial demonstrating that free-text chat causes users to perform worse than internet search; Sambara et al. (arXiv, 2026) on false premise accommodation in medical AI chatbots.

Why This Approach

Why we develop reasoning, not deliver facts

The evidence behind the design.

The Opening Argument

Think about the last time someone corrected a piece of health misinformation you had believed. You updated that belief. Then you encountered the next false claim. You had no more tools to evaluate it than you did before. The correction gave you an answer. It did not give you the ability to find the right answer on your own.

That is the central failure of the dominant approach to health misinformation. It corrects answers. It does not develop discernment. That is why VeriHealth is designed differently.

The Standard Approach

Correction

When a false health claim spreads, the standard response is to correct it: flag the article, issue an accurate statement, counter with facts. The approach is intuitive. It also has three structural limits.

It is reactive: it addresses claims after they have spread. It does not scale: there are far more false claims than corrective resources. And it builds no capacity to evaluate the next false claim. It is treatment, not inoculation.

VeriHealth's Approach

Reasoning

Build the cognitive framework to evaluate any claim from any source. One intervention produces durable, transferable resistance, persisting for months and extending to claims never covered in the original intervention.

The goal is not to tell people what is true. It is to develop the framework that lets them find it themselves.

What the Evidence Shows

Two bodies of evidence establish that reasoning-based interventions work

TITAN Consortium · EU Horizon Europe · 2022–2025

Across 14 partners and multiple languages, the TITAN consortium demonstrated that Socratic coaching produces more durable resistance to health misinformation than content correction. The core finding: the failure is cognitive, not informational. People lack structured frameworks for evaluating health claims. The solution is not better content. It is a developed capacity to evaluate it.

Costello, Pennycook and Rand · Science · 2024

In a study of 2,190 participants, brief, personalized AI conversations produced a durable reduction of about 20% in false beliefs, persisting for months. The critical feature was engaging each person's specific reasoning rather than delivering a generic correction, the same person-specific principle the SSI applies through Socratic questioning.

Why Both Stages Are Necessary

Why the two stages must work together

The animated modules and the SSI address different cognitive tasks, and both are necessary.

The modules build the declarative foundation of health discernment: what a confidence interval means, what hallucination looks like in practice, how to recognize misleading framing. The SSI develops the applied capacity: testing that foundation against the user's own specific question, surfacing the assumption embedded in it, and adapting in real time to where the reasoning needs to go.

These are cognitively distinct processes that require different pedagogical modes. A structured lesson can efficiently establish conceptual frameworks. It cannot engage individual misconceptions or surface specific reasoning errors in real time. That is what Socratic dialogue does. The two are designed to work together as complementary stages of a single intervention, and using both is our preferred path to durable health discernment.

Video Library

Free animated modules for anyone who seeks health information online

VeriHealth modules develop health reasoning skills, starting with the foundational knowledge that makes those skills work. Available in English and Spanish, produced with a human cultural adaptation review.

All Modules

Institutional
LSI Year 2 Update
VeriHealth year two update for the Leadership in Science Initiative.
English
AI Crisis
The AI Health Crisis
An introduction to the health information crisis for clinicians, funders, and researchers. This module explains why medical misinformation has reached a new level of danger, and why developing health discernment is the right response to it.
English
Measles
Measles Vaccine Safety
Measles is preventable, but misinformation has contributed to its return in communities across the United States. This module explains what measles is, how the vaccine works, and how to evaluate the health claims you encounter about it.
English
Measles
Seguridad de la Vacuna contra el Sarampión
Measles is preventable, but misinformation has contributed to its return in communities across the United States. This module explains what measles is, how the vaccine works, and how to evaluate the health claims you encounter on social media, with your family, and with your doctor.
Spanish
HPV — For Parents
HPV Vaccination for Parents
HPV is one of the most common infections in the world and one of the most preventable. This module helps parents understand what HPV is, why vaccination matters, and how to evaluate the health information they encounter.
English
HPV — For Parents
Vacuna contra el VPH para Padres
HPV is one of the most common infections in the world and one of the most preventable. This module helps parents understand what HPV is, why vaccination matters, and how to evaluate the health information they encounter, including what social media and AI chatbots say.
Spanish
HPV — For Kids
HPV Vaccination for Kids
This module explains HPV vaccination in language designed for children and young adolescents: what the vaccine is, why doctors recommend it, and why accurate health information matters.
English
HPV — For Kids
Vacuna contra el VPH para Niños
This module explains the HPV vaccine in language designed for children and adolescents: what the vaccine is, why doctors recommend it, and why getting accurate health information matters, no matter where it comes from.
Spanish
Health Reasoning
Correlation Is Not Causation
Ice cream does not cause shark attacks, but the same reasoning error that makes that claim seem plausible drives some of the most dangerous health misinformation online. This module develops the skill of telling the difference between correlation and causation.
English
Health Reasoning
Why You Shouldn’t Trust Health Endorsements
Endorsements from doctors and celebrities feel authoritative, but authority is not the same as evidence. This module develops the practice of asking what to verify before trusting any health claim, regardless of who is making it.
English
Health Reasoning
How to Get Better Health Answers from AI
AI health tools give incomplete answers when they receive incomplete questions, and they will not tell you what they missed. This module develops the skill of asking better questions so you get information you can actually act on.
English

White Paper

Unsafe by Design

Consumer Health AI Failure Modes, Their Solvability, and a Two-Stage Educational Response

Michael P. Walsh  ·  May 2026  ·  Version 1.11  ·  60 verified sources

Request a copy →

About This Paper

This paper documents a health information crisis with a specific mechanism: a reasoning gap between what the information environment demands of users and what they have been equipped to provide. The Interaction Gap is not a technology failure. It is the predictable consequence of delivering answers to people who needed scaffolding for reasoning instead. The paper categorizes AI health tool failures across two distinct layers and presents the evidence that the mechanism producing durable health behavior change is not information delivery but structured reasoning dialogue that changes how people evaluate health information, in a transferable way and at the moment that reasoning is needed.

Abstract

The core finding from the 2026 peer-reviewed literature is that AI chatbots score 95% on standardized clinical scenarios yet give study participants the wrong answer roughly two times in three. This paper explains why that gap exists, and what it will take to close it.

Large Language Models (LLMs) have become a primary source of health information for hundreds of millions of people worldwide, yet the evidence base for their clinical safety reveals a set of failures that constitute a genuine public health crisis. This paper synthesizes findings from a landmark randomized trial published in Nature Medicine in February 2026 demonstrating that study participants using AI health tools perform worse than users of standard web search, alongside a broader evidence base comprising more than 40 peer-reviewed studies from 2023 to 2026, supplemented by preprint evidence and foundational literature in educational psychology and multimedia learning.

AI has become an active structural force in how health information is generated, consumed, trusted, and misunderstood. Its failures operate across two layers: at the model layer, where the technology generates plausible but fabricated content and absorbs misinformation from the environment it inhabits; and at the interface layer, where product design actively degrades users’ capacity to evaluate what they receive, systematically accommodates dangerous false premises, and amplifies existing health misconceptions at scale. The same commercial pressures that created these failures have also systematically eroded the safeguards that once warned users of the technology’s limitations.

Critically, this paper argues that the crisis has a specific mechanism: a reasoning gap between what the health information environment demands of its users and what those users have been equipped to provide. People are not failing to find health information. They are evaluating it without the cognitive tools the task requires, and that evaluation consistently leads them to wrong conclusions. The proliferation of AI-powered health tools has widened this gap rather than closing it, delivering diagnostic conclusions to users who lack the reasoning framework to evaluate them reliably. The Bean et al. finding that study participants using current AI health tools perform worse than users of standard web search is a direct measurement of this dynamic.

We further present evidence that the same technology, redesigned around structured dialogue rather than conversational answer-delivery, can produce durable reductions in health misinformation and measurable improvements in health behavior. Drawing on the multimedia learning and prebunking literatures, we describe a two-stage educational architecture (structured animated video instruction followed by personalized Socratic dialogue) that operationalizes this evidence into a coherent public health tool. We outline a research program to evaluate this architecture in clinical contexts and propose a broader research agenda for the field of consumer health AI.

Table of Contents

1. Introduction: AI as the New Infrastructure of Health Information
2. The Failure of Evaluation: Why We Were Fooled
3. The Interaction Gap: When Knowledge Fails in Practice
4. The Hallucination Crisis: Fabrication of Medical Authority
5. AI as an Active Driver of Health Misinformation
6. Safety Failures in High-Stakes Domains
7. Bias, Equity, and the Digital Determinants of Health
8. Consumer Trust and the Epistemological Trap
9. The Disappearance of Safeguards
10. The Liability Vacuum
11. The Solvability Paradox: AI as Both Problem and Solution
12. A Framework for Solvability
13. A Research Agenda
A. Verified Source Registry (60 sources)
B. A Note on Methodology and Source Verification

Evidence Base

The peer-reviewed case for VeriHealth's approach

A curated registry of peer-reviewed literature informing VeriHealth's design. Every source has been independently verified against the primary publication.

Every design decision VeriHealth has made (the two-stage architecture, the animated format, the Socratic dialogue structure) is traceable to a specific finding in the peer-reviewed literature. Each choice reflects the evidence on what develops genuine health discernment rather than what merely delivers information. This table is that record.

Authors and Year · Journal · Finding · Category

Gong et al., 2025
J Med Internet Res 27:e84120
10.2196/84120
Finding: Systematic review of 39 medical LLM benchmarks. Knowledge-based benchmarks: 84%-90% accuracy. Practice-based benchmarks: 45%-69%. Safety assessment accuracy: 40%-50%. Examination scores are insufficient and misleading proxies for clinical readiness.
Category: Interaction

Bean et al., 2026
Nature Medicine 32:609–615
10.1038/s41591-025-04074-y
Finding: RCT of 1,298 participants. LLMs score 94.9% in isolation; study participants achieve under 34.5% condition identification. Internet search outperforms all AI chatbot groups by 1.76 times. AI chatbot users had 36% lower odds of recognizing urgent red-flag symptoms than internet search users (inverse of reported OR 1.57).
Category: Interaction

Ramaswamy et al., 2026
Nature Medicine
10.1038/s41591-026-04297-7
Finding: Stress test of ChatGPT Health: 60 clinician-authored vignettes, 960 total responses across 16 factorial conditions. Prior low-acuity framing increased under-triage probability by OR 11.7 (95% CI: 3.7 to 36.6). Over-triaged 64.8% of non-urgent cases.
Category: Interaction

Sambara et al., 2026
arXiv (MedRedFlag)
arXiv:2601.09853
Finding: Even when frontier models detect dangerous false assumptions, they accommodate them in 60 to 74% of cases. GPT-5 detects 88% of false premises but accommodates 73%.
Category: Interaction

Goh et al., 2024
JAMA Network Open
10.1001/jamanetworkopen.2024.40969
Finding: Physicians randomized to GPT-4 showed no improvement over those using conventional resources (76% vs 74%, p=0.60). GPT-4 alone scored 16 points higher than either physician group.
Category: Interaction

Omar et al., 2026
Lancet Digital Health 8:100949
10.1016/j.landig.2025.100949
Finding: 3.4 million prompts across 20 LLMs. Overall susceptibility to fabricated health claims: 31.7%. Fabricated clinical notes: 46.1% susceptibility.
Category: Hallucination

Linardon et al., 2025
JMIR Mental Health
10.2196/80371
Finding: 19.9% of GPT-4o citations entirely fabricated; 45.4% of real citations contained errors. Fabrication highest for less-studied topics: 28 to 29% for eating disorders vs 6% for depression.
Category: Hallucination

Griot et al., 2025
Nature Communications
10.1038/s41467-024-55628-6
Finding: Across 12 models, nearly all scored 0% on unanswerable question identification. Best performer (GPT-4o) achieved only 3.7%. Confident wrongness is the default.
Category: Hallucination

Omar et al., 2025
npj Digital Medicine
10.1038/s41746-025-01943-1
Finding: Medical disclaimers in AI health outputs dropped from 26.3% in 2022 to 0.97% in 2025. Linear decline (R²=0.944), reduction of 8.1 percentage points per year.
Category: Hallucination

Costello, Pennycook and Rand, 2024
Science 385:eadq1814
10.1126/science.adq1814
Finding: Over 2,190 conspiracy believers. Brief personalized AI conversation produced a durable reduction of approximately 20% in conspiracy beliefs (relative reduction; 16.8 and 12.3 points across two studies), persisting two months. Effect larger than reflective thinking interventions, which yielded only one- to six-point reductions on comparable scales.
Category: Intervention

Hou et al., 2025
Nature Medicine 31:1855–1862
10.1038/s41591-025-03618-6
Finding: Cluster-RCT of 2,671 parents. Socratic AI chatbot increased vaccine receipt or appointments 3.85 times vs usual care. Improved vaccine literacy and health discernment.
Category: Intervention

Simchon et al., 2025
Current Opinion in Psychology
10.1016/j.copsyc.2025.101994
Finding: Meta-analysis of 33 inoculation experiments, 37,025 participants. Inoculation improved discrimination without increasing response bias. Participants became more discerning, not uniformly skeptical.
Category: Intervention

Hansen et al., 2024
JMIR 26:e58306
10.2196/58306
Finding: Systematic review of RCTs. Animated video consistently improved health information recall, with strongest effects for individuals with low health literacy.
Category: Intervention

Omar et al., 2025
International Journal for Equity in Health
10.1186/s12939-025-02419-0
Finding: Systematic review of 24 studies. 91.7% identified biases. Gender bias in 93.7% of studies. Racial or ethnic biases in 90.9%. Bias in medical LLMs is pervasive and systemic.
Category: Equity

Chen et al., 2026
Nature Medicine
10.1038/s41591-026-04229-5
Finding: Review of 4,609 LLM clinical studies. 45.9% from the U.S., 7.6% from the U.K. LLM safety profiles in non-English languages remain largely uncharacterized.
Category: Equity

Allen et al., 2024
Science
10.1126/science.adk3451
Finding: Vaccine-skeptical content from mainstream outlets reduced vaccination intentions 46 times more than flagged misinformation. A single headline reached more than 50 million people.
Category: Landscape

Van der Linden and Kyrychenko, 2024
Science 384:959–960
10.1126/science.adp9117
Finding: The dominant threat is technically true but misleadingly framed content. Demanding unattainable causal proof before acting on misinformation evidence serves inaction, not public health.
Category: Landscape

Montero et al., 2026
KFF Tracking Poll
kff.org, March 25, 2026
Finding: 32% of U.S. adults used AI chatbots for health advice in the past year. 92% report satisfaction. Usage rising fastest among younger adults and those who cannot access or afford a physician.
Category: Landscape

ECRI, 2026
Top 10 Health Technology Hazards
ecri.org
Finding: Misuse of AI chatbots ranked #1 health technology hazard for 2026 by the independent nonpartisan patient safety organization ECRI. Chatbots produce authoritative-sounding responses that are not regulated as medical devices, not validated for clinical use, and programmed to satisfy users rather than provide accurate answers.
Category: Report

Edelman Trust Institute, 2026
Trust Barometer: Trust and Health · N=16,009 · 16 markets
edelman.com, March 2026
Finding: 51% of people globally are confident in their ability to find and evaluate health information, down 10 points in one year. Statistically significant declines in 14 of 16 markets, consistent across age, education, and political affiliation. 70% hold at least one divisive health belief at equal rates across education levels. People with more divisive beliefs are most likely to consult AI for health guidance (61% monthly vs. 19%).
Category: Report
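
The Bean et al. entry above restates a reported odds ratio of 1.57 as "36% lower odds" for chatbot users. A minimal Python sketch of that conversion follows. It assumes the reported ratio compares internet search users to AI chatbot users, as the entry's parenthetical implies; the function names are illustrative.

```python
def invert_odds_ratio(odds_ratio: float) -> float:
    """Flip the direction of comparison: OR(A vs B) -> OR(B vs A)."""
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be positive")
    return 1.0 / odds_ratio


def percent_lower_odds(odds_ratio: float) -> float:
    """Percent by which the odds are lower for the group in the denominator."""
    return (1.0 - invert_odds_ratio(odds_ratio)) * 100.0


reported_or = 1.57  # assumed direction: internet search vs. AI chatbot

print(f"inverted OR: {invert_odds_ratio(reported_or):.3f}")
print(f"lower odds for chatbot users: {percent_lower_odds(reported_or):.0f}%")
```

Note that this is a statement about odds, not probabilities. Translating an odds ratio into a difference in probability requires a baseline rate, which the entry does not give.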

Research Agenda

Six research questions VeriHealth is built to answer

VeriHealth is not only building a platform. It is defining a research program at the intersection of educational psychology, misinformation science, and consumer health AI design.

VeriHealth's platform is designed to be evaluated, not just deployed. The six questions below are open questions for the field. The answers will matter beyond VeriHealth.

1
Interactive safety testing as a standard

Bean et al. (2026) demonstrated that in-silico benchmarks do not predict real-world safety. Any evaluation framework for consumer health AI must include interactive testing with diverse human populations before deployment, analogous to clinical trials for medication. The field needs standardized protocols for that testing. VeriHealth is building them.

Methodology · Safety evaluation
2
The structured dialogue research program

AI can both worsen and improve health reasoning depending on interaction design. The specific question VeriHealth is built to answer: can a purpose-built Interaction Layer, deployable across any underlying medical LLM, reproducibly close the gap Bean et al. documented? Costello et al. provide proof of concept for the mechanism. What is needed now is a systematic program to identify generalizable design principles across health domains.

Interaction design · RCT
3
The two-stage architecture research program

VeriHealth's two-stage design has strong theoretical grounding but lacks direct empirical evaluation in medical misinformation contexts. Key open questions: Does video-first, followed by dialogue, produce better outcomes than either component alone? Does a health reasoning intervention produce transfer to claims not covered in the instructional content? Does population heterogeneity in AI trust levels predict differential response? These questions require randomized trials with interactive human user testing and diverse populations.
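
One way to picture the kind of trial these questions imply is a multi-arm randomized allocation. The sketch below is illustrative only and is not VeriHealth's protocol: the arm names, the inclusion of a control arm, the participant IDs, and the function `block_randomize` are all assumptions introduced for this example. It uses permuted-block randomization so that every arm ends up the same size.

```python
import random
from collections import Counter

# Hypothetical arm labels. The control arm is an assumption, not taken from the text.
ARMS = ["video_only", "dialogue_only", "video_then_dialogue", "control"]


def block_randomize(participant_ids, arms=ARMS, seed=0):
    """Assign participants to arms in shuffled blocks so arm sizes stay balanced."""
    rng = random.Random(seed)  # fixed seed keeps the allocation reproducible
    assignment = {}
    for start in range(0, len(participant_ids), len(arms)):
        block = list(arms)
        rng.shuffle(block)  # each block contains every arm exactly once
        for pid, arm in zip(participant_ids[start:start + len(arms)], block):
            assignment[pid] = arm
    return assignment


# Hypothetical participant IDs, used only to exercise the function.
allocation = block_randomize([f"P{i:03d}" for i in range(40)])
arm_counts = Counter(allocation.values())
```

Comparing the combined arm against each single-component arm would address the first question. The transfer and trust-heterogeneity questions would need additional outcome measures and covariates that this sketch does not model.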

Architecture design · Transfer effects
4
Model immunization and the inoculation frontier

Van der Linden and Kyrychenko (2026) propose extending psychological inoculation to the models themselves: training LLMs to reject misinformation the way humans can be inoculated against it. The question of whether the same inoculation principle can operate at both the human and the model level is an open frontier for the field.

Model safety · Inoculation
5
Longitudinal effects of habitual use of AI health tools

Current studies provide snapshots at single time points. No study has tracked the longitudinal effects of habitual use of AI health tools on a population's health literacy, reasoning capacity, or clinical outcomes. Given the Interaction Gap finding from the Oxford trial, and the emerging preprint evidence that habitual AI use erodes independent reasoning capacity, this is an urgent gap. Additionally, because models update continuously and silently, one-time testing is insufficient. An independent monitoring infrastructure, analogous to pharmacovigilance for medications, should track safety signals from deployed AI health tools on an ongoing basis.
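
To make the monitoring idea concrete, the sketch below shows one minimal form ongoing surveillance could take: re-running a fixed test set against a deployed tool and flagging any run whose accuracy falls well below the first run. Everything in it is an assumption made for illustration, including the `Snapshot` structure, the `flag_regressions` function, the five-point threshold, and the placeholder counts. None of it is data from any study or a description of an existing system.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One scheduled re-run of a fixed test set (hypothetical structure)."""
    label: str
    correct: int
    total: int

    @property
    def accuracy(self) -> float:
        return self.correct / self.total


def flag_regressions(snapshots, max_drop=0.05):
    """Return labels of runs whose accuracy is more than max_drop below the first run."""
    baseline = snapshots[0].accuracy
    return [s.label for s in snapshots[1:] if baseline - s.accuracy > max_drop]


# Placeholder counts, invented purely to exercise the function.
runs = [
    Snapshot("run-1", 90, 100),
    Snapshot("run-2", 88, 100),
    Snapshot("run-3", 79, 100),
]
flagged = flag_regressions(runs)
```

A real surveillance program would also need interactive testing with human users rather than a static test set alone, since the text above argues that static benchmarks do not predict real-world safety.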

Longitudinal · Surveillance
6
The translation problem

A researchable problem in its own right: how do you communicate AI health risk to a public that experiences AI as helpful? What vocabulary works? What framing produces behavioral change rather than dismissal? What visual metaphors convey the Interaction Gap without inducing either panic or complacency? These questions are amenable to the experimental methodology used in the inoculation literature, and answering them is a prerequisite for any effective public health response. This is not a communications afterthought. It is a core scientific question.

Health communication · Behavioral science

The Crisis in Numbers

The numbers behind the crisis, and the intervention

Every figure comes from a peer-reviewed or independently verified source and has been checked against the primary publication. Each has direct implications for how health AI is deployed and regulated. Together they make the case that the crisis is real, the failure is structural, and the intervention evidence is strong.

AI performance: benchmarks vs. real use

Bean et al., Nature Medicine, 2026 · Gong et al., JMIR, 2025
Medical exam performance (in silico): 95%
Practice-based clinical benchmarks: 45–69%
Safety assessment accuracy: 40–50%
Condition identification by people: 35%
Urgent red flag recognition (vs. search): 36% lower odds

ChatGPT Health triage errors

Ramaswamy et al., Nature Medicine, 2026
Anchoring effect on under-triage (odds ratio): 11.7×
Non-urgent cases over-triaged: 65%

AI misinformation susceptibility by corpus type

Omar et al., Lancet Digital Health, 2026 · 3.4M prompts, 20 LLMs
Fabricated hospital discharge notes: 46%
All prompt types (overall rate): 32%
Social media misinformation: 9%

Safety disclaimer erosion, 2022 to 2025

Omar et al., npj Digital Medicine, 2025 · R²=0.944
2022 (baseline): 26%
2023: 18%
2024: 9%
2025: 1%

Structured AI dialogue: intervention efficacy

Costello et al., Science, 2024 · Hou et al., Nature Medicine, 2025
Reduction in false beliefs (Costello et al.): ~20%
Vaccine uptake vs. usual care (Hou et al.): 3.85×

Citation fabrication in medical AI

Linardon et al., JMIR Mental Health, 2025
Citations entirely fabricated: 20%
Real citations containing errors: 45%

Institutional consensus: AI chatbot misuse as patient safety hazard

ECRI Top 10 Health Technology Hazards, 2026
#1
Ranked the most significant health technology hazard of 2026 by ECRI, the independent nonpartisan patient safety organization, ahead of system outages and substandard medical products.
Key finding
Chatbots are programmed to satisfy the user rather than provide accurate answers. They are not regulated as medical devices and not validated for clinical use.
Context
AI chatbot misuse also ranked 5th on ECRI's 2024 hazards list. The trajectory is consistent with the research base VeriHealth tracks.

Public confidence in health information: a crisis in real time

Edelman Trust Barometer Special Report: Trust and Health, 2026 · N=16,009 · 16 markets
51%
Of people globally are confident in their ability to find and evaluate health information. Down 10 points in a single year. Statistically significant declines in 14 of 16 markets.
Who holds divisive health beliefs
70% of people hold at least one misleading health belief. The rate is virtually identical across education levels, age groups, and political leanings. This is not an education problem.
Who turns to AI
People with more divisive health beliefs are more likely to consult AI for health guidance: 61% do so monthly among those with many such beliefs, versus 19% among those with none. The highest-risk users are the most active AI health users.

Our Team

The disciplines required to solve this problem, in one organization

VeriHealth was founded by people who speak the languages this problem requires: medical science, public health, health communication, and the languages of the communities most affected. The team exists to close the gap between what the evidence shows and what a parent can understand at midnight.

Leadership

Michael P. Walsh, MBA
President and Co-founder
Harvard AB, Biochemistry · Harvard Business School MBA · UChicago Leadership & Society Fellow

Founder of Kilkenny Capital Management, where he raised and invested nearly $500 million in biotechnology assets over fifteen years. His synthesis of recent peer-reviewed evidence on consumer health AI failures is the evidentiary foundation of VeriHealth's platform design.

Catherine McCarthy
Chief Content Officer and Co-founder
University of Leicester · UChicago Leadership & Society Fellow

Former BBC Senior Executive, WHO Media Consultant, and CEO of Medical Aid Films, where she built a library of animated films that taught vital knowledge and skills on women's and children's health to disadvantaged communities worldwide. She brings to VeriHealth the operational infrastructure of health communication at scale.

Smitha Arekapudi, MD, MBA, ScM
Project Director and Co-founder
Swarthmore BA · Harvard ScM, NCI Fellow · Vanderbilt MD · Kellogg MBA · Diplomate, American Board of Anesthesiology · Fellow, American Society of Anesthesiologists

A practicing anesthesiologist with graduate training in epidemiology and cancer prevention policy, and leadership roles at the American Medical Association and American Society of Anesthesiologists. She brings to VeriHealth fluency in clinical medicine, public health methodology, and the institutional language of health systems: three languages that rarely coexist in one person.

Everly Macario, ScD, ScM, EdM
Senior Research Advisor and Co-founder
Harvard School of Public Health ScD, ScM · Harvard Graduate School of Education EdM

Bilingual behavioral scientist and Director of Primary Care Research at the American Academy of Pediatrics, where she oversees vaccine hesitancy research through a national pediatric practice network. Co-founder of the MRSA Research Center at the University of Chicago, she brings to VeriHealth both the research infrastructure and the lived understanding of what it costs when families cannot access accurate health information.

Senior Medical Advisor

Kenneth Polonsky, MD
Senior Medical Advisor
MD, University of Witwatersrand · Fellowship in Endocrinology, University of Chicago · National Academy of Medicine

Former President of the University of Chicago Medicine health system, Dean of the Pritzker School of Medicine, and Executive Vice President for Medical Affairs at the University of Chicago. A member of the National Academy of Medicine with more than 250 peer-reviewed publications, he connects VeriHealth to the academic medical center community.

Research Relationships

Oxford Internet Institute, University of Oxford

Michael Walsh conceived VeriHealth's Structured Socratic Interface in early 2025, before the Oxford Internet Institute's Bean et al. study was published. When that study appeared in Nature Medicine in February 2026, it independently confirmed the design principle VeriHealth had already built toward. That convergence is the basis of a developing research relationship with OII investigators Luc Rocher and Adam Mahdi.

TITAN Consortium · EU Horizon Europe

VeriHealth's Socratic design methodology draws directly on the TITAN project (5.7 million euros, 14 partners, 2022 to 2025), which demonstrated across multiple languages and cultural contexts that coaching people to reason about misleading information produces more durable resistance than content correction. The TITAN findings provide the strongest cross-cultural evidence base for the reasoning-first approach VeriHealth employs.

Why This Team

Medical misinformation is not a knowledge problem. It is a structural one: the framework required to evaluate health information well has never been part of general education and has never been equally distributed, and nothing has replaced it. VeriHealth's founding team was assembled specifically around that gap. Every member is fluent in at least one of the languages that closing it requires: medical science, public health, health communication, and the languages of the communities most affected.

The founding insight was simple and serious: people who speak medical language, and can hear misinformation as distortion, carry an obligation that others do not. VeriHealth was founded by people who heard it and decided that hearing it without acting was no longer acceptable.

News and Updates

VeriHealth and the field

Organizational updates from VeriHealth alongside key developments in the research and policy landscape we work within.

From VeriHealth

VeriHealth organizational announcements will appear here. Grant decisions, research milestones, clinical partnerships, and published work.

Medical Misinformation in the News
Jan 2026

ECRI, the independent nonpartisan patient safety organization, ranked AI chatbot misuse first in its annual Top 10 Health Technology Hazards report. The finding: chatbots produce authoritative-sounding responses, yet they are not regulated as medical devices, are not validated for clinical use, and can provide false or misleading information with significant potential for patient harm. The report notes that chatbots are programmed to satisfy the user rather than provide accurate answers.

ECRI · Patient Safety
Mar 2026
Edelman Trust Barometer 2026: Global confidence in health information collapses

The annual Edelman Trust and Health survey of 16,009 people across 16 markets finds that only 51% of people globally are confident in their ability to find and evaluate health information, a decline of 10 points in a single year. The decline is statistically significant in 14 of 16 markets and is consistent across age groups, education levels, and political leanings. Separately, 70% of respondents hold at least one divisive health belief, and those with more divisive beliefs are more likely to turn to AI for health guidance.

Edelman Trust Institute · Global Survey
Apr 2026

BBC Inside Health features Oxford's Adam Mahdi explaining why AI scores 95% in isolation but users get the right answer only 35% of the time. England's Chief Medical Officer warns that chatbot answers are "both confident and wrong." Includes real patient accounts of AI advice gone right, and dangerously wrong.

BBC
Mar 2026

NPR covers both the Bean and Ramaswamy findings. Bean identifies the core problem as a two-way communication breakdown: users don't know what information AI needs, and the responses combine good and poor recommendations in ways that are difficult to distinguish.

NPR
Feb 2026

The New York Times Morning Briefing covers the published Bean et al. finding and its implications for the millions of Americans now turning to AI for health advice.

The New York Times
Feb 2026

Costello et al., the paper demonstrating that structured AI dialogue reduces conspiracy beliefs by about 20% and the intervention evidence at the core of VeriHealth's design, wins the oldest award given by the American Association for the Advancement of Science. The prize last went to a social science paper in 1981.

Cornell University · AAAS
Nov 2025

A deeply reported Times feature on Americans using AI chatbots to compensate for a health system that leaves them without answers, and the risks that misplaced trust creates. References Oxford research and Harvard Medical School findings on AI sycophancy and false premise accommodation.

The New York Times
Mar 2025

The Economist covers Costello et al. and the emerging science of inoculation and critical thinking education as tools against misinformation: the two mechanisms at the core of VeriHealth's platform design.

The Economist
Sep 2024

In a Perspective published in Science alongside Costello et al., Bago and Bonnefon assess the findings and conclude that a scalable intervention to recalibrate misinformed beliefs may be within reach, while raising the question of whether people will voluntarily engage with an AI designed to challenge what they believe.

Science

Get Involved

The infrastructure gap is real. The evidence for the solution exists. The work is now.

VeriHealth is at the stage where the right partnerships, with funders, clinical institutions, and research collaborators, determine whether a proven intervention reaches the populations that need it.

For Funders

We are seeking philanthropic and institutional funding to take a platform with peer-reviewed evidence and a clinical deployment pathway from prototype to population scale. Our funding priorities are listed below.

  • Platform development and multilingual content production
  • Community-based participatory user research
  • Clinical pilot design and IRB protocol
  • Peer-reviewed evaluation and publication
  • Community health worker network dissemination
Discuss funding →

For Clinical Collaborators

We are seeking clinical partners at major academic medical centers and community health institutions serving populations with limited access to trusted clinical relationships, across languages and literacy levels. VeriHealth is based in Chicago and is actively developing relationships with Chicago-area health systems.

  • Maternal and pediatric health settings
  • Community health centers and safety-net hospitals
  • Spanish-language and multilingual content at launch
  • IRB collaboration and study design
  • Community health worker integration
  • Compatible with existing health system AI infrastructure via API
Discuss collaboration →

For Research Collaborators

The research questions VeriHealth is built to answer are open questions for the field. We are seeking partners with expertise in health communication, misinformation science, and AI interaction design to help answer them. The answers will matter beyond VeriHealth.

  • Intervention evaluation and randomized trial design
  • Health communication and misinformation measurement
  • AI interaction and human factors research
  • IRB collaboration and institutional partnership
  • Peer-reviewed publication and field-building
  • Model-agnostic architecture enables cross-model Interaction Layer evaluation
Discuss research collaboration →

Chicago-based, nationally oriented. VeriHealth NFP is headquartered in Chicago, Illinois, with developing relationships across the Chicago academic medical center ecosystem. The platform is designed for national distribution through community health worker networks and trusted medical messengers at the point of care, free in multiple languages at launch.

Ready to talk?

Whether you are a funder, a clinical partner, or a research collaborator, we welcome the conversation.