New Research Shows ChatGPT Can Contradict Itself Under Pressure

ChatGPT might sound confident when it answers questions, but a recent study suggests that confidence doesn’t always match accuracy.

Researchers from Washington State University (via SciTechDaily) found that when the same question was asked multiple times, the answers didn’t always stay the same. In some cases, they flipped back and forth between true and false, even though nothing about the question had changed.

The study examined how well the AI could judge whether scientific claims were supported by real research. The results were mixed: the system often sounded convincing, but it didn’t always get things right, and, more importantly, it wasn’t always consistent.

The same question didn’t always get the same answer.

Researchers tested hundreds of scientific claims and asked the exact same question ten times in a row. You’d expect a stable answer, but that wasn’t always what happened. In some cases, the AI would say a statement was true, then false, then true again, even though nothing had changed. That inconsistency stood out as one of the biggest issues. If the answer depends on when or how often you ask, the system becomes much harder to rely on, especially in situations where accuracy really matters.
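
To make that setup concrete, here’s a rough sketch of the repeat-the-question test using the openai Python client. It isn’t the study’s actual code; the model name, the example claim, and the prompt wording are all placeholders.

```python
# A minimal sketch of the repeated-question test, not the study's code.
# Assumes the openai package is installed and OPENAI_API_KEY is set;
# "gpt-4o" is a placeholder for whichever model version is being probed.
from openai import OpenAI

client = OpenAI()

claim = "Vitamin C supplementation prevents the common cold."  # placeholder claim
prompt = (
    "Is the following claim supported by published research? "
    f"Answer only 'true' or 'false'.\n\n{claim}"
)

answers = []
for _ in range(10):  # the researchers asked each question ten times in a row
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    answers.append(response.choices[0].message.content.strip().lower())

# A stable system would give one answer ten times; a flip-flopping one won't.
print(answers)
print("stable" if len(set(answers)) == 1 else "answers flipped across runs")
```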

The accuracy sounds decent at first, but drops on closer inspection.

At first glance, the results seemed fairly strong. The AI answered correctly around 76.5% of the time in 2024, rising to about 80% in a follow-up test in 2025. That might sound reassuring on the surface, but once researchers adjusted for random guessing, the picture changed. The system’s performance dropped to something much closer to average, roughly a low D grade. In other words, it was better than chance, but not by as much as it first appeared.
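
The article doesn’t spell out how the researchers adjusted for guessing, but a standard correction for a classification task shows why a headline number shrinks: subtract the chance baseline, then rescale by what’s left. Treating the true/false task as a coin flip with a 50% baseline is an assumption here, not the study’s stated method, but the arithmetic would look like this:

```python
# Back-of-the-envelope chance correction; an illustrative assumption,
# not the study's published formula. With a chance baseline c, a common
# corrected score is (observed - c) / (1 - c).
def chance_corrected(observed: float, chance: float) -> float:
    return (observed - chance) / (1 - chance)

for year, score in [(2024, 0.765), (2025, 0.80)]:
    # Two answer options (true/false) means random guessing scores ~50%.
    corrected = chance_corrected(score, 0.5)
    print(year, f"raw {score:.1%} -> {corrected:.0%} after the guessing correction")
```

On that reading, 80% raw accuracy corrects down to about 60%, which is roughly where a low D sits on a typical grading scale.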

It struggled most with spotting false information.

One of the most telling findings was how poorly the AI handled false claims. It only correctly identified false statements a small percentage of the time, which highlights a clear weakness. This is worth noting because recognising what isn’t true is often just as important as recognising what is. If a system struggles with that, it can end up sounding confident while still getting things wrong.

Consistency was only moderate, even with identical prompts.

When the same question was repeated, the AI gave consistent answers in about 73% of cases. That means in more than a quarter of situations, the answer changed. For everyday use, that might not seem like a huge issue. But for anything involving research, business decisions, or important information, that level of variation becomes harder to ignore.
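
How the 73% figure was computed isn’t detailed here, but one straightforward way to score consistency over repeated runs (a hypothetical scoring choice, for illustration only) is to count a claim as consistent only when every repeat agrees:

```python
from collections import Counter

# Hypothetical scoring sketch: a list of ten answers per claim.
runs = {
    "claim_a": ["true"] * 10,  # stable across all ten runs
    "claim_b": ["true", "false", "true", "true", "false",
                "true", "true", "true", "false", "true"],  # flip-flops
}

# A claim counts as "consistent" only if all ten repeats give the same answer.
consistent = sum(1 for answers in runs.values() if len(set(answers)) == 1)
print(f"consistency rate: {consistent / len(runs):.0%}")

# Majority voting is one common way to squeeze out a single answer anyway,
# though it hides the instability rather than fixing it.
for claim, answers in runs.items():
    majority, count = Counter(answers).most_common(1)[0]
    print(f"{claim}: {majority} ({count}/10 runs)")
```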

It can sound convincing without actually understanding.

One of the key takeaways from the study is that the AI’s strength lies in how it communicates, not necessarily in how it reasons. It can produce answers that sound clear and confident, even when the underlying conclusion isn’t solid. That creates a gap between how trustworthy something feels and how accurate it actually is. It’s easy to assume something is correct because it’s well-written, but that isn’t always the case.

The problem isn’t just one version of AI.

The researchers tested different versions of the system across two years and found similar patterns in both. That suggests the issue isn’t tied to one specific model, but is more about how this type of AI works in general. Other studies looking at similar tools have found comparable results, which reinforces the idea that this is a broader limitation rather than a one-off problem.

It’s better at repeating patterns than reasoning things through.

The study suggests that AI systems like this rely heavily on patterns from data rather than genuine understanding. They can recognise familiar structures and generate responses based on them, but that doesn’t always translate into deeper reasoning. That’s why answers can sometimes change. The system isn’t weighing evidence the way a person would; it’s generating what seems like the most fitting response based on patterns it has seen before.

Researchers say it’s still useful, but needs to be used carefully.

The takeaway isn’t that AI tools are useless; far from it. They can still be helpful for generating ideas, summarising information, or speeding up tasks. The issue is relying on them without checking the output. Experts suggest treating AI as a starting point rather than a final answer. That means double-checking important claims and not assuming accuracy just because something sounds confident.

The gap between fluency and accuracy is easy to miss.

One of the biggest risks is how natural the responses feel. When something reads smoothly and makes sense on the surface, it’s easy to trust it without questioning it further. This study highlights how important it is to pause and verify, especially when dealing with complex or factual information. A confident tone doesn’t guarantee a correct answer.

It’s a reminder that AI still has limits.

There’s been a lot of hype around AI tools becoming smarter and more capable, but this research shows there are still clear gaps. They can assist with tasks and provide useful insights, but they don’t fully understand the information they’re working with. For now, the safest approach is to use AI as a helpful tool rather than a final authority. It can support your thinking, but it shouldn’t replace it, especially when accuracy really matters.