If you need to learn CUDA but don't have experience with parallel computing, CUDA A Developer's Introduction offers a detailed guide to CUDA grounded in parallel fundamentals. It starts by introducing CUDA and bringing you up to speed on GPU parallelism and hardware, then walks through CUDA installation. Chapters on core concepts, including threads, blocks, grids, and memory, address both general parallel-programming issues and CUDA-specific ones. Later, the book demonstrates CUDA in practice: optimizing applications, adapting to new hardware, and solving common problems.
A guide to developing and optimizing programs for NVIDIA GPUs.
This book goes into great depth on how GPUs work and how to write efficient GPU-accelerated software, and it offers a wealth of tips and tricks for getting the job done. It's heavy on theory, explanations, and low-level details, but it also has plenty of code examples to draw from. There are whole chapters that dig into key concepts and technologies, as well as a step-by-step optimization guide in chapter 9 that pulls all those ideas together in a practical context.
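To give a flavor of the core concepts those chapters cover (threads, blocks, grids, memory), here is a minimal vector-add sketch; the names and sizes are my own illustration, not code from the book:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread handles one element; the grid of blocks covers the whole array.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);  // unified memory keeps the example short
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    const int threadsPerBlock = 256;
    const int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
    vecAdd<<<blocksPerGrid, threadsPerBlock>>>(a, b, c, n);
    cudaDeviceSynchronize();  // wait for the kernel before reading results

    printf("c[0] = %f\n", c[0]);  // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The book's value is in everything this sketch glosses over: how to pick block sizes, how memory accesses coalesce, and how to restructure code like this for real performance.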
This book is over 550 pages long, but it doesn't need to be. Many ideas are explained repeatedly. That's helpful if you're having a hard time understanding something or want to jump around a lot, but it can otherwise be a little tedious. There are also a lot of numbers, data tables, charts, and screenshots that don't add much value, in my opinion. Often these data compare the performance of some code across all of the author's test devices, but say little about the performance characteristics of the code itself or what you might expect for your own project and hardware. This is particularly frustrating because the book was written in 2012 and covers outdated hardware extensively. For instance, it spends a lot of time on the differences between compute capability 1.x (CC1) and 2.x (CC2) devices, when the latest GPUs are at CC9.
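If you're translating that older compute-capability advice to modern hardware, the CUDA runtime can report what you're actually running on. This is a minimal sketch using the standard cudaGetDeviceProperties call, not an example from the book:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);           // query device 0
    printf("%s: compute capability %d.%d\n",     // e.g. 9.0 on Hopper-class GPUs
           prop.name, prop.major, prop.minor);
    return 0;
}
```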
Despite these shortcomings, I still highly recommend this book. It's loaded with useful information. The author explains complex topics in precise detail, with helpful metaphors and motivating examples. They have decades of parallel programming experience and have filled this book with all sorts of useful insights, tips, and tricks for every circumstance. There's probably more in this book than you'll ever need, but on the flip side, it's got everything. It's worth at least touching on all these ideas so that if and when they come up later, you'll remember them and can pull out this book for a refresher on the details.
It's interesting to compare this book to Programming in Parallel with CUDA: A Practical Guide. That book starts with the assumption that CUDA is easy, encouraging the reader to just start coding, pointing out gotchas and important performance optimizations along the way. In contrast, this book assumes that CUDA is hard. Unless you learn the low-level details, think long and hard about design, and master all the tricks, you'll write bad code and waste much of what the GPU has to offer. Both of these perspectives are true, in a sense, and which is best for you depends on your needs. If you're working on a project and you want to try out making it faster by using GPUs, read Programming in Parallel with CUDA. If you want to become a CUDA expert and squeeze every ounce of power out of your hardware, this book is the better choice.
Anyone reading this book should probably take it with a grain of salt: it was published in 2012, and some of the implementation details discussed have since become obsolete. Regardless, this is a fantastic book that delves into the paradigm shift a programmer must go through in order to effectively write parallel algorithms instead of serial ones. I highly recommend it.