The Big LLM Architecture Comparison

It has been seven years since the original GPT architecture was developed. At first glance, looking back at GPT-2 (2019) and forward to DeepSeek-V3 and Llama 4 (2024-2025), one might be surprised at how structurally similar these models still are. Comparing LLMs to determine the key ingredients that contribute to their good (or not-so-good) performance is notoriously challenging: datasets, training techniques, and hyperparameters vary widely and are often not well documented. However, I think that there is still a lot of value in examining the structural changes of the architectures themselves to see what LLM developers are up to in 2025.
Published on July 18, 2025 23:00


Sebastian Raschka's Blog
