I believe ChatGPT generally gives accurate answers to most questions. Certainly, its answers are more reliably true than those of a random average person. Obviously it cannot yet do advanced programming tasks, but generally it answers questions accurately.
Prove my position wrong.
What can I ask it that will produce factually incorrect answers?
As a side quest, a much easier one, what can I ask it that would cause it to produce extremely biased answers that fail to do justice to the truth of things?


LLMs are probabilistic, not deterministic, so you won’t get the exact same response every time for the exact same prompt.
Nonetheless, ChatGPT is frequently accused by its opponents of giving incorrect or false answers. I use it a lot and don’t find this to be true, so I’m wondering: what should I ask it to show me these inaccuracies?
For me, the most problematic answer was in response to: “What was India like before the British arrived?”
It presented (and still presents) an entirely positive vision of the wonderful utopia that existed before the British arrived.
When you then ask “What about the women being burnt alive on their husbands’ funeral pyres?”, pointing out that it has presented an incredibly biased representation of historical facts and narratives, it will apologise, give (interesting) excuses, and provide a corrective.
Nonetheless, it is an interesting example of bias.
Nobody in this thread has been able to provide me with any examples that produce inaccurate or biased responses.
The top answer, about Israel (I got banned from Reddit for calling Israel genocidal in r/Worldnews), seems like a perfectly reasonable response to me: free of falsehood, balanced, rational.
Your example about India described a biased response at first, which you then corrected. You were only able to get that correction because you already had the knowledge, but people generally rely on LLMs for questions about things they’re not experts in, so they’re unlikely to be able to correct for the bias.
I completely agree.
Also: https://www.psypost.org/ai-chatbots-fail-medical-misinformation-test-returning-inaccurate-and-fabricated-advice/
Also…reading the details…for GPT they used the antiquated 3.5 model from 2022 instead of the latest ones (I mean…wtf?), they evaluate responses in a vague manner with the ambiguous word “problematic”, and some of the criticisms are pedantic and trivial, e.g. “all the chatbots wrote at a ‘difficult’ reading level equivalent to college students, which reduces readability for the general public.”
Boohoo.
I suspect that the current GPT 5.5 model, in thinking mode, would make short work of the questions mentioned in that research article.
Very interesting, thanks.
Although I suppose some of those questions, like “Will women ever beat men in an elite marathon?”, are difficult to answer without someone deeming the answer “problematic”.
However, it seems to give a pretty good answer at the moment:
In a normal elite marathon, the best women are very unlikely to beat the best men outright. The current gap is still large: the men’s marathon world record is listed at 1:59:30 by Sabastian Sawe, while the women’s mixed-race world record is 2:09:56 by Ruth Chepngetich, and the women-only record is 2:15:41 by Tigst Assefa. That puts the best women roughly 8–13% behind the very best men, depending on which women’s category you compare.
So the answer depends on what you mean by “beat men”.
A top woman beating some elite men in the same marathon: yes, absolutely possible. In any given major race, some high-level male runners blow up, get the pacing wrong, have stomach problems, cramp, or simply have a bad day. A woman running 2:10–2:15 can beat many very serious male marathoners.
A woman beating the best male marathoners in the world, in the same race, under normal marathon conditions: probably no, unless something radically changes. The male advantage in oxygen-carrying capacity, haemoglobin, muscle mass, heart size, absolute power output, and running economy remains substantial at the elite marathon distance. Reviews of endurance running generally find that men outperform women by around 7–14% from sprints through the marathon, though the gap can narrow in ultra-endurance events.
The strongest argument for “maybe someday” is that women’s marathon history is younger. Women were excluded from serious distance-running opportunities for a long time, and the Olympic women’s marathon only began in 1984. Better funding, deeper talent pools, improved shoes, pacemaking, nutrition, altitude training, and professional support have helped women’s times fall dramatically. The women’s record has moved from over 2:20 in the late 1990s to around 2:10 now.
But the gap has not been closing towards zero. Men’s performances have also improved, especially with modern super-shoes and optimised pacing. Shoe studies suggest modern racing shoes can improve marathon times by meaningful margins for both men and women, so technology does not obviously eliminate the sex gap.
Where women may beat men more plausibly is beyond the marathon. In very long ultramarathons, especially events lasting many hours or days, the male advantage can shrink and occasionally reverse. Women may have advantages in fatigue resistance, fat metabolism, pacing discipline, and resilience over very long durations, although researchers warn that ultra-distance comparisons are often complicated by smaller female participation and uneven competition depth.
My best estimate: in the elite standard marathon, women will keep getting faster, and a woman may eventually run close to or under 2:05. But beating the best men outright over 26.2 miles looks biologically improbable rather than merely historically delayed. In ultramarathons, the answer is much more open, and in some extreme-distance contexts women already can beat men.
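The percentage gap in that answer can be checked directly from the record times it lists. A minimal Python sketch, taking the three times exactly as quoted (they are not independently verified here) and showing both common ways of expressing the gap, since the answer doesn't say which denominator it used:

```python
def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' time string to total seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def gap_percent(slower: int, faster: int, relative_to: int) -> float:
    """Difference between two times as a percentage of a chosen reference time."""
    return (slower - faster) / relative_to * 100


# Record times exactly as quoted in the answer (not independently checked here).
men = to_seconds("1:59:30")
women_mixed = to_seconds("2:09:56")
women_only = to_seconds("2:15:41")

for label, women in (("mixed-race", women_mixed), ("women-only", women_only)):
    vs_men = gap_percent(women, men, relative_to=men)
    vs_women = gap_percent(women, men, relative_to=women)
    print(f"{label}: {vs_men:.1f}% of the men's time, {vs_women:.1f}% of the women's time")
```

Run as-is, this gives about 8.7% and 13.5% when the gap is measured against the men's time, or about 8.0% and 11.9% against the women's time, so the quoted “roughly 8–13%” holds up, give or take the choice of denominator.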
I’m pretty sure LLMs are deterministic by design.
The fact that one doesn’t give the same output for the same prompt is just a choice by the programmers, who add randomness so it feels more natural.
But you can totally set up some LLMs to be perfectly deterministic.
Got any sources to back up that claim?
A good start is this:
https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/
While it’s hard to get perfect determinism, you can still get very close. But really, I think it’s accurate to say that LLMs are random because they are configured to be.
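The linked post, as I read it, traces most of the leftover nondeterminism to floating-point arithmetic not being associative: a GPU may add the same numbers up in a different order depending on things like batch size, and different orders can round differently. A minimal Python illustration of that underlying arithmetic effect only (it does not model any inference engine):

```python
def sum_left_to_right(values):
    """Add numbers strictly in the given order, like a naive sequential reduction."""
    total = 0.0
    for value in values:
        total += value
    return total


# Same three numbers, two different orders.
print(sum_left_to_right([1e16, 1.0, -1e16]))  # 0.0: the 1.0 is lost to rounding next to 1e16
print(sum_left_to_right([1e16, -1e16, 1.0]))  # 1.0: the big values cancel first

# Even everyday decimals are affected by grouping.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```

In an LLM, a difference this tiny in a score can occasionally flip which token comes out on top, after which the rest of the output diverges.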
It depends on the temperature parameter.
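For reference, in the common formulation the temperature divides the model’s raw next-token scores (logits) before a softmax turns them into probabilities: low values sharpen the distribution towards the top token, high values flatten it, and temperature 0 is usually special-cased as greedy decoding (always take the top token). A minimal Python sketch with three hypothetical tokens and made-up logits, not any particular vendor’s implementation:

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing them by temperature (> 0)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(tokens, logits, temperature, rng):
    """Greedy choice when temperature is 0, otherwise a weighted random draw."""
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return tokens[best]
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


def sample_run(tokens, logits, temperature, seed, n=10):
    """Draw n tokens using one random generator seeded with `seed`."""
    rng = random.Random(seed)
    return [pick_token(tokens, logits, temperature, rng) for _ in range(n)]


# Hypothetical candidate tokens with made-up logits.
tokens = ["the", "a", "this"]
logits = [4.0, 2.0, 1.0]

# Lower temperature concentrates probability on the top-scoring token.
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])

# Temperature 0: the same token every time, whatever the seed.
print(sample_run(tokens, logits, 0, seed=1))

# Temperature 1: varied output, but identical across runs with the same seed.
print(sample_run(tokens, logits, 1.0, seed=42) == sample_run(tokens, logits, 1.0, seed=42))
```

Greedy decoding and fixed seeds remove the sampling randomness in this toy; in a real deployment, the floating-point effects discussed in the linked post can still make outputs differ slightly between runs.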