A new peer-reviewed study titled “The Reliability Gap: How Traditional Search Engines Outperform Artificial Intelligence (AI) Chatbots in Rosacea Public Health Information Quality,” published in Cureus, has just been released.
It explores an important and timely question: Can we truly rely on AI chatbots like ChatGPT and Gemini for accurate health information?
As more people turn to the internet and increasingly to generative AI tools for answers to their medical concerns, the line between convenience and credibility continues to blur.
With just a few clicks, anyone can receive detailed responses from AI platforms, but the reliability of that information remains uncertain.
This study directly compares the quality, credibility, and readability of health-related content across four major platforms: Google, Bing, ChatGPT (GPT-3.5), and Gemini.
And the findings? As you will soon discover, they were both revealing and thought-provoking.
> Interesting study from Cureus (peer reviewed) comparing Google/Bing to ChatGPT/Gemini for researching a health condition -> The Reliability Gap: How Traditional Search Engines Outperform Artificial Intelligence (AI) Chatbots in Rosacea Public Health Information Quality “The goal… pic.twitter.com/kAKvgQ2Zqg
> — Glenn Gabe (@glenngabe) June 23, 2025
What Was the Study All About?
The research was designed to test the accuracy, credibility, and readability of health information found online.
With rosacea as the focus, the study compared four platforms: Google, Bing, ChatGPT (GPT-3.5), and Gemini.
These represent two categories of online information sources: traditional search engines and modern AI chatbots.
Why rosacea? The condition affects millions and is frequently searched by individuals looking for information on symptoms, causes, and treatment, making it a suitable subject for evaluating how well each platform delivers medical information meant for public understanding.
Each of the four platforms was asked the same question on December 4, 2024, essentially a search for the word “rosacea.” From there, researchers gathered and analyzed the top 20 results provided by each, using three well-established evaluation tools to assess the content:
- DISCERN: Evaluates the quality of consumer health information, especially content about treatment options.
- JAMA Benchmark Criteria: Measures the credibility of online content by checking authorship, attribution of sources, currency of information, and disclosure of conflicts of interest.
- Flesch-Kincaid Readability Metrics: Determine how easy or difficult the content is to read, yielding a reading ease score and an estimated U.S. grade level (both formulas are sketched in the code after this list).
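To make those metrics concrete, here is a minimal Python sketch of how the two Flesch-Kincaid formulas are computed from sentence, word, and syllable counts. The syllable counter below is a rough vowel-group heuristic for illustration only; the study would have used an established readability tool, not this exact code.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups; real tools use dictionaries."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1  # treat a trailing 'e' as silent
    return max(n, 1)

def flesch_scores(text: str) -> tuple[float, float]:
    """Return (reading ease, U.S. grade level) using the standard formulas."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / sentences   # average words per sentence
    spw = syllables / len(words)   # average syllables per word
    reading_ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade_level = 0.39 * wps + 11.8 * spw - 15.59
    return reading_ease, grade_level

ease, grade = flesch_scores(
    "Rosacea is a chronic inflammatory skin condition. "
    "It causes facial redness and visible blood vessels."
)
print(f"Reading ease: {ease:.1f}, grade level: {grade:.1f}")
```

Higher reading-ease scores mean easier text (60–70 is roughly plain English); the grade level estimates the years of U.S. schooling needed to follow the text comfortably.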
To ensure objectivity, the content was screened to include only freely accessible, English-language material meant for public use, excluding paywalled content, forums like Reddit, advertisements, and duplicate entries.
Two independent reviewers rated the content, and their agreement was tested using Cohen’s Kappa, which showed moderate to near-perfect inter-rater reliability (values ranging from 0.56 to 1.0).
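For readers curious what that statistic measures, here is a minimal sketch using scikit-learn’s cohen_kappa_score. The two rating lists are hypothetical, invented purely for illustration; the study’s actual rating data is not reproduced here.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ratings from two reviewers on ten search results
# (e.g., a 1-5 DISCERN item) -- invented data for demonstration.
reviewer_a = [4, 3, 5, 2, 4, 3, 3, 5, 2, 4]
reviewer_b = [4, 3, 4, 2, 4, 3, 2, 5, 2, 4]

# Kappa corrects raw percent agreement for the agreement two raters
# would reach by chance: kappa = (p_observed - p_chance) / (1 - p_chance)
kappa = cohen_kappa_score(reviewer_a, reviewer_b)
print(f"Cohen's Kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance level
```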
So, What Did the Study Find?
The numbers speak volumes.
Google stood out as the most reliable source across all key evaluation metrics. It earned the highest DISCERN score (3.33 out of 5) for content quality and also topped the JAMA benchmark with a score of 3.70 out of 4 for credibility.
Bing followed closely behind but did not outperform Google.
On the other hand, ChatGPT showed the lowest performance, scoring just 2.20 for content quality and 2.38 for credibility.
Gemini did not fare much better, showing significant shortcomings as well. Even though these AI platforms offer clean and conversational answers, they simply did not meet the standards of evidence-based health communication.
Let’s break down some of the average scores for clarity:
| Platform | Quality (DISCERN, out of 5) | Credibility (JAMA, out of 4) | Reading Ease Score | Grade Level |
|----------|------------------------------|-------------------------------|--------------------|-------------|
| Google   | 3.33 ± 0.53 | 3.70 ± 0.44 | 44.91 | 11.6 |
| Bing     | 3.13 ± 0.91 | 3.48 ± 0.92 | 33.62 | 13.6 |
| Gemini   | 2.67 ± 0.87 | 3.15 ± 1.15 | 33.55 | 15.3 |
| ChatGPT  | 2.20 ± 1.32 | 2.38 ± 1.44 | 31.26 | 14.0 |
While the readability scores did not vary as drastically as the quality and credibility scores, none of the platforms provided information at the 6th–8th grade reading level recommended by public health organizations such as the American Medical Association (AMA) and the National Institutes of Health (NIH).
This means that most of the content, whether from search engines or AI tools, was written at a high school or even college reading level, which makes it less accessible to the general public.
Proven Results: Data Confirms Google’s Lead in Health Information Quality
To make sure the results were not random, researchers applied ANOVA testing and Bonferroni-corrected t-tests.
These revealed that the differences between platforms, especially between ChatGPT and both Google and Bing, were statistically significant, with p-values less than 0.001.
In short: the performance gap is real, not just a matter of perception.
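To illustrate what that procedure looks like in practice, here is a minimal Python sketch using SciPy: a one-way ANOVA across all four platforms, followed by pairwise t-tests with a Bonferroni correction. The score lists are invented for demonstration, not the study’s data (the study rated 20 results per platform).

```python
from itertools import combinations
from scipy import stats

# Hypothetical DISCERN-style scores per platform, invented to show the method.
scores = {
    "Google":  [3.5, 3.0, 3.8, 3.2, 3.4],
    "Bing":    [3.2, 2.8, 3.6, 3.0, 3.1],
    "Gemini":  [2.9, 2.4, 3.0, 2.5, 2.6],
    "ChatGPT": [2.3, 1.9, 2.6, 2.0, 2.2],
}

# One-way ANOVA: do the platform means differ overall?
f_stat, p_anova = stats.f_oneway(*scores.values())
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.4f}")

# Pairwise t-tests with Bonferroni correction: multiply each raw p-value
# by the number of comparisons to control the family-wise error rate.
pairs = list(combinations(scores, 2))
for a, b in pairs:
    t, p = stats.ttest_ind(scores[a], scores[b])
    print(f"{a} vs {b}: corrected p = {min(p * len(pairs), 1.0):.4f}")
```

The Bonferroni correction is deliberately conservative: when many pairwise comparisons are run at once, it guards against differences that look significant by chance alone.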
Even though readability differences across platforms were not statistically significant, they were still problematic. All platforms failed to meet the expected standard for public health communication and that alone is a concern.
Why Did AI Tools Like ChatGPT and Gemini Fall Behind?
At first impression, AI tools seem helpful. They give instant, conversational answers. But when it comes to real health knowledge, the study found serious problems.
Here is what went wrong:
- Lack of Sources: ChatGPT and Gemini rarely cited proper medical sources. Sometimes they did not cite any sources at all.
- Outdated or Missing Information: Some answers were missing key facts or even referred to old or incorrect data.
- Too Much Jargon: The language was too technical for everyday readers; many responses were written at a college level.
- No Risk Information: These AI tools often didn’t explain the risks or uncertainties of treatments, which are crucial for informed decisions.
- Fake or Incomplete Citations: In some cases, AI tools made up sources or included broken links.
What Does This Mean for You and Your Doctor?
If you are turning to the internet to learn about a condition like rosacea, it is still safest to rely on Google, which directs you to established and authoritative medical sites.
AI chatbots may be helpful for summarizing information or generating simplified explanations but they should not be used as the primary source for making health decisions.
For healthcare providers, the study underscores the need to discuss online information with patients, not just to correct misinformation but to guide them in identifying trustworthy sources.
Physicians are increasingly becoming interpreters of digital information, helping patients distinguish between accurate content and misleading AI-generated summaries.
So, What Needs to Happen Next?
The study authors made several recommendations that could improve how AI is used in health communication:
- AI developers should work on integrating citations, updating data regularly, and making content more readable.
- Search engines should continue prioritizing reliable, science-based sources.
- Healthcare educators and professionals must help the public improve digital health literacy.
- Policymakers may need to step in and create standards for AI-generated health content to protect users from misinformation.
And for the general public, the advice is simple: always double-check health information, especially when it comes from AI platforms that don’t clearly reference their sources.
This study is a wake-up call to approach AI-generated content with healthy skepticism, especially when it concerns your body, your diagnosis, or your treatment plan.