Modern Computer Vision with PyTorch: A practical roadmap from deep learning fundamentals to advanced applications and Generative AI

The definitive book on computer vision is back, updated with the latest machine learning architectures and including 70+ pages on diffusion models. Purchase of the print or Kindle book includes a free eBook in PDF format.

The second edition of Modern Computer Vision with PyTorch builds on the comprehensive coverage of the first edition and is fully updated to explain the latest multimodal models, CLIP and Stable Diffusion, with practical examples. Whether you're a beginner or looking to progress in your computer vision career, this book guides you through the fundamentals of neural networks (NNs) and PyTorch and shows you how to implement state-of-the-art architectures for real-world examples.

You'll discover best practices for working with images, tweaking hyperparameters, and moving models into production. As you progress, you'll implement multiple use cases of 2D and 3D multi-object detection, segmentation, and human pose detection by learning about the R-CNN family, SSD, YOLO, U-Net architectures, and the Detectron2 platform. You'll enter the world of generative AI with facial generation and manipulation, and discover the impressive capabilities of diffusion models for image creation, in-painting, and out-painting. Finally, you'll move your NN model to production on the AWS Cloud. By the end, you'll be able to confidently leverage modern NN architectures to solve over 30 real-world computer vision (CV) problems.

This book is for beginners to PyTorch and intermediate-level machine learning practitioners who want to master computer vision techniques using deep learning and PyTorch. It's especially useful for those just getting started with neural networks, as it lets them learn from real-world use cases accompanied by notebooks on GitHub. Basic knowledge of the Python programming language and machine learning is all you need to get started with this book.

For more experienced computer vision scientists, this book covers more advanced models from Chapter 8 onward. (N.B. Additional chapters to be confirmed upon publication.)

746 pages, Paperback

Published June 10, 2024

3 people are currently reading
6 people want to read

Ratings & Reviews

Community Reviews

5 stars: 2 (40%)
4 stars: 1 (20%)
3 stars: 1 (20%)
2 stars: 1 (20%)
1 star: 0 (0%)
2 reviews
March 24, 2025
Great progression, starting from the fundamentals of neural networks such as activation functions, loss optimization, and gradient descent. The fundamentals of convolutional neural networks are then described before being applied in practice to object detection, instance segmentation, etc. This is then extended to video, autoencoders, and GANs. The book finishes with reinforcement learning, transformers (combining computer vision and NLP models), and deploying models to production with AWS and Docker.

All algorithms are accompanied by PyTorch code (probably accounting for half of the total pages), which is extremely helpful.