It has been seven years since the original GPT architecture was developed. At first glance, looking back at GPT-2 (2019) and forward to DeepSeek-V3 and Llama 4 (2024-2025), one might be surprised at how structurally similar these models still are. Comparing LLMs to pin down which ingredients contribute to their good (or not-so-good) performance is notoriously difficult: datasets, training techniques, and hyperparameters vary widely and are often poorly documented. However, I think there is still a lot of value in examining the structural changes in the architectures themselves to see what LLM developers are up to in 2025.