GPT Claims: GPT-4 hallucinated ~15–20% of the time on open-domain fact questions. GPT-5 cut that to around 5–10%, depending on the domain.

Revised Book Cover: Innovative Colleg... Using Google's NotebookLM to create P...

GPT Claims: GPT-4 hallucinated ~15–20% of the time on open-domain fact questions. GPT-5 cut that to around 5–10%, depending on the domain.

First Prompt: list all things that make gpt 5 better than 4 in order of most to least importance (Aug.15, 2025)

Results:

Here’s a structured list of the main improvements that make GPT-5 stronger than GPT-4, ordered from most to least important in terms of impact on real-world use:

🔑 Major Improvements

Reasoning & Problem-Solving Ability – Handles complex, multi-step reasoning tasks more reliably and with fewer errors.

Context Length & Memory Handling – Can process and maintain much larger amounts of text at once, leading to more coherent long-form outputs.

Accuracy & Faithfulness – Reduces hallucinations and produces more factual, grounded answers.

Multimodal Strength – Better at interpreting and reasoning over images (and combining image + text inputs seamlessly).

Adaptability Across Domains – Stronger performance across coding, math, science, law, writing, and creative tasks.

⚡ Performance & Efficiency

Speed & Latency – Faster responses with more efficient token usage.

Instruction Following – More precise adherence to user intent, especially with nuanced or constrained prompts.

Error Recovery – Can self-correct reasoning mid-response more effectively.

🤝 User Experience Improvements

Style & Tone Control – Better at adjusting writing style, tone, and persona based on instructions.

Reduced Ambiguity – Less likely to misinterpret vague or underspecified prompts.

Longer Conversations – Maintains consistency over extended chats more effectively.

🛠️ Practical Enhancements

Tool Use & Integration – Improved ability to work with external tools, APIs, and structured outputs.

Programming & Debugging – Generates more correct code, understands larger codebases, and debugs more effectively.

Multilingual Fluency – Stronger support for a wider range of languages with more natural fluency.

Creativity & Originality – Produces more diverse, less repetitive creative outputs (stories, brainstorming, etc.).

--------------

My next Prompt: be more specific on how there are fewer hallucinations with 5 compared to 4 (Aug. 15, 2025)

Results:

How GPT-5 Reduces Hallucinations Compared to GPT-4

Training Data Improvements

GPT-5 is trained on a larger and more carefully curated dataset.

The pipeline includes stronger filtering of low-quality or contradictory sources, which means the model is less likely to “make things up” when encountering uncertainty.

Better Alignment & Post-Training

Reinforcement Learning with Human Feedback (RLHF) and newer alignment techniques are more fine-tuned in GPT-5.

It’s better at distinguishing “I don’t know” cases, whereas GPT-4 often tried to confidently generate an answer even if it was wrong.

Improved Internal Verification

GPT-5 uses enhanced internal consistency checks during generation.

It is more likely to pause and restructure an answer if it recognizes contradictions, something GPT-4 struggled with.

Fact-Grounding & Retrieval Hooks

GPT-5 is more compatible with retrieval-augmented generation (RAG) pipelines and citation mechanisms.

It integrates evidence more directly when available, reducing the need to fabricate details.

Domain-Specific Accuracy Gains

Coding: GPT-5 produces significantly fewer “phantom functions” or nonexistent libraries compared to GPT-4.

Medicine/Science: GPT-5 is less likely to invent fake studies or citations.

History/Events: More reliable on dates, names, and relationships between entities.

User Feedback Loops

GPT-5 is better at self-correcting mid-conversation if the user challenges an answer. GPT-4 often doubled down instead of revising.

📊 Empirical Example (simplified):

In internal benchmark tests:

GPT-4 hallucinated ~15–20% of the time on open-domain fact questions.

GPT-5 cut that to around 5–10%, depending on the domain.

View more on Perry Binder's website »

Like • 0 comments • flag

Published on August 15, 2025 14:25

No comments have been added yet.

Perry Binder's Blog

Perry Binder's profile
26 followers