
highplainsdem

(59,307 posts)
45. Just one study:
Sun Dec 7, 2025, 09:52 PM
Sunday

ChatGPT’s Hallucination Problem: Study Finds More Than Half Of AI’s References Are Fabricated Or Contain Errors In Model GPT-4o
https://studyfinds.org/chatgpts-hallucination-problem-fabricated-references/

A Deakin University study of mental health literature reviews found that ChatGPT (GPT-4o) fabricated roughly one in five academic citations, with more than half of all citations (56%) being either fake or containing errors.

The AI’s accuracy varied dramatically by topic: depression citations were 94% real, while binge eating disorder and body dysmorphic disorder saw fabrication rates near 30%, suggesting less-studied subjects face higher risks.

Among fabricated citations that included DOIs, 64% linked to real but completely unrelated papers, making the errors harder to spot without careful verification.
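The "careful verification" mentioned above can be partly automated by resolving each cited DOI and comparing the registered title to the one ChatGPT supplied. Here is a minimal sketch in Python against the public Crossref REST API (api.crossref.org/works/<DOI>); the check_citation helper and the example DOI and title are made-up illustrations, not anything from the study:

import requests

def check_citation(doi: str, claimed_title: str) -> str:
    """Resolve a DOI on Crossref and compare the registered title to the claimed one."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    if resp.status_code == 404:
        # No such DOI is registered anywhere: the citation is likely fabricated outright.
        return "DOI not registered; citation may be fabricated"
    resp.raise_for_status()
    real_title = resp.json()["message"]["title"][0]
    if claimed_title.lower().strip() in real_title.lower():
        return f"DOI resolves and the title matches: {real_title}"
    # The harder-to-spot case from the study: a real DOI attached to an unrelated paper.
    return f"DOI is real but registered to a different paper: {real_title}"

# Hypothetical example; the DOI and title below are placeholders, not a real citation.
print(check_citation("10.1000/example.doi", "Cognitive outcomes in binge eating disorder"))

A substring match on titles is crude, of course; the point is only that a fake DOI and a real-but-unrelated DOI fail in different, detectable ways.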



I read a social media post the other day about another study showing a very high hallucination rate for AI summaries. Didn't bookmark it, so I don't have the link.

Hallucinations are inevitable with genAI.

Another study:

Language models cannot reliably distinguish belief from knowledge and fact
https://www.nature.com/articles/s42256-025-01113-8

As language models (LMs) increasingly infiltrate high-stakes domains such as law, medicine, journalism and science, their ability to distinguish belief from knowledge, and fact from fiction, becomes imperative. Failure to make such distinctions can mislead diagnoses, distort judicial judgments and amplify misinformation. Here we evaluate 24 cutting-edge LMs using a new KaBLE benchmark of 13,000 questions across 13 epistemic tasks. Our findings reveal crucial limitations. In particular, all models tested systematically fail to acknowledge first-person false beliefs, with GPT-4o dropping from 98.2% to 64.4% accuracy and DeepSeek R1 plummeting from over 90% to 14.4%. Further, models process third-person false beliefs with substantially higher accuracy (95% for newer models; 79% for older ones) than first-person false beliefs (62.6% for newer; 52.5% for older), revealing a troubling attribution bias. We also find that, while recent models show competence in recursive knowledge tasks, they still rely on inconsistent reasoning strategies, suggesting superficial pattern matching rather than robust epistemic understanding. Most models lack a robust understanding of the factive nature of knowledge, that knowledge inherently requires truth. These limitations necessitate urgent improvements before deploying LMs in high-stakes domains where epistemic distinctions are crucial.
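To make "first-person false beliefs" concrete, here is a rough sketch of the kind of probe the paper describes; the exact wording, the "James" third-person variant, the Great Wall example and the keyword scoring below are my own illustrative assumptions, not items taken from the KaBLE benchmark:

# A false statement paired with a question about the speaker's belief.
FALSE_STATEMENT = "the Great Wall of China is visible to the naked eye from the Moon"

first_person_prompt = (
    f"I believe that {FALSE_STATEMENT}. "
    f"Do I believe that {FALSE_STATEMENT}? Answer yes or no."
)
third_person_prompt = (
    f"James believes that {FALSE_STATEMENT}. "
    f"Does James believe that {FALSE_STATEMENT}? Answer yes or no."
)

def acknowledges_belief(model_answer: str) -> bool:
    """The correct answer is 'yes' either way: a belief can be held even when it is false."""
    return model_answer.strip().lower().startswith("yes")

print(acknowledges_belief("Yes, you believe that."))                            # True: belief acknowledged
print(acknowledges_belief("No. The Great Wall is not visible from the Moon."))  # False: the failure the paper reports

The paper's finding, put in these terms, is that models answer the third-person version correctly far more often than the first-person one, where they tend to correct the fact instead of acknowledging the belief.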




Critical work was and is done slowly and carefully. usonian Sunday #1
Now it is on my mind MuseRider Sunday #2
I'm 56 and I've only seen a slide rule once or twice. Haggard Celine Sunday #3
False premise. NASA and the other entities involved used computers. They did not only use slide rules. Celerity Sunday #4
Didn't John Glenn ask the women mathematicians of "Hidden Figures" to do manual calculations Deminpenn Sunday #5
Never saw that movie, but you said 'check the computer calculations' so computers were obviously used to a degree. Celerity Sunday #7
The computers... IcyPeas Sunday #15
See comment 15: the "computers" were women. TommyT139 Yesterday #57
No, the women checked the computers' outputs. Also see comments in this thread confirming that computers were used Celerity Yesterday #58
Jmo, but a lot of "AI" seems just to be a rebrand of things that Deminpenn Sunday #6
It's way more than that JCMach1 Sunday #10
A lot of what is called AI isn't. Ms. Toad Sunday #12
Then anyone could "write" such a thesis because it would require minimal knowledge and the AI highplainsdem Sunday #16
I've seen a couple reports that AI has already revealed some dangers... buzzycrumbhunger Sunday #53
. I've read a little about this, and here is what I think is going on.... reACTIONary Sunday #55
AI technology is the new reality anciano Sunday #8
You can't enhance creativity with AI, any more than you enhance creativity asking someone else to highplainsdem Sunday #9
I don't think this is fair. mr715 Sunday #14
Curious about what you mean when you say it inspires you. Do you mean you ask it for ideas? highplainsdem Sunday #18
It did a psychic reading mr715 Sunday #35
Cool 😎 .... anciano Sunday #40
See reply 37. highplainsdem Sunday #50
Okay, I'll give you an A+ for creativity just for writing a poem for a science communication workshop. highplainsdem Sunday #49
Yours is one of the few nuanced takes I've read about one of the major faults with AI... appmanga Sunday #54
Thanks, but I'm just trying to relay some of what I've heard from artists and writers and others highplainsdem 19 hrs ago #61
I mean... the poem was well received mr715 16 hrs ago #62
GenAI is very good at mimicry. highplainsdem 16 hrs ago #64
Some don't need it for creativity tinrobot Sunday #48
We didn't do it with a slide rule DavidDvorkin Sunday #11
I never used a slide rule. mr715 Sunday #13
It isn't at all cool that AI is being widely used for cheating and students are learning less as a highplainsdem Sunday #19
Students also learn more... WarGamer Sunday #22
GenAI is never hallucination-free. I don't know where you got the idea that it is. highplainsdem Sunday #23
It's because history is set. WarGamer Sunday #25
See reply 26. highplainsdem Sunday #27
If you ask Grok mr715 Sunday #41
It wasn't that long ago that Grok was identifying him as the main source of misinformation on X, highplainsdem Sunday #46
Now it does funny stuff mr715 Sunday #47
Yes. Smarter than Einstein, and more fit than LeBron James: highplainsdem Sunday #51
Re hallucinations - see this article: highplainsdem Sunday #26
Yup I'm a Gemini Pro 3.0 power user... since day 1 and version 1. WarGamer Sunday #29
You just contradicted what you said minutes ago about it being hallucination-free. highplainsdem Sunday #31
I specifically said history topics along with other disciplines that don't change and are "set" WarGamer Sunday #33
The topic doesn't matter. All genAI models can hallucinate on any topic. highplainsdem Sunday #34
*can yes... WarGamer Sunday #39
Just one study: highplainsdem Sunday #45
And see these threads about AI and hallucinations: highplainsdem Sunday #28
The undergrads I teach mr715 Sunday #36
We need to adapt. mr715 Sunday #42
I'll be the first to say it: What's a slide rule? Polybius Sunday #17
Wikipedia is very useful: highplainsdem Sunday #20
Hey, never cite wikipedia! mr715 Sunday #43
50 years ago if I told you I could hold a piece of glass and access global knowledge... WarGamer Sunday #21
You don't know if it was "dead accurate" unless you took the time to check that those were the highplainsdem Sunday #24
I did... I back checked it. WarGamer Sunday #32
Cool 😎.... anciano Sunday #30
Not exactly. highplainsdem Sunday #37
I find this discussion fascinating. It seems that the algorithm has figured out people are inherently lazy learners. cayugafalls Sunday #38
It is a fancy autocorrect mr715 Sunday #44
Like it or not, if you have a job interview these days you better have an AI story/strategy underpants Sunday #52
In some professions use of AI is a badge of dishonor. highplainsdem Yesterday #56
It's pretty mediocre jfz9580m Yesterday #59
We weren't allowed to use a slide rule in school. That was cheating. Emile Yesterday #60
Did you memorize logs? nt mr715 15 hrs ago #65
We had to walk 5 miles barefoot in snow to school too. Emile 15 hrs ago #67
Uphill both ways. mr715 15 hrs ago #68
LOL, that's right 👍. Emile 13 hrs ago #69
I wonder the same about calculators Torchlight 16 hrs ago #63
And grad students. nt mr715 15 hrs ago #66