Optimization is an important tool in decision science and in the analysis of the physical systems that arise in engineering. One can trace its roots to the Calculus of Variations and the work of Euler and Lagrange. This natural and reasonable approach to mathematical programming covers numerical methods for finite-dimensional optimization problems. It begins with very simple ideas and progresses through more complicated concepts, concentrating on methods for both unconstrained and constrained optimization.
This is a reference book in the optimization field. The proofs may not be the most formal, but they are clear and well written. What I especially like in this book is the buildup for every concept and proof. It always starts with the general statement of a problem, then a few examples that reiterate some of the ideas used to solve each case. Then it moves on to proving the solution to the general problem, at which point you already have a sense of where things are going. I've rarely seen chapters put together so cleanly. For example, for the proof of KKT (a fundamental theorem) the authors take you through LICQ, the relationship between the tangent cone at a feasible point and the set of linearized feasible directions, and the Farkas Lemma, then the KKT proof itself. I appreciate that they prove something so fundamental. The book has an appendix with some of the necessary math that's outside the book's scope, something I found useful as it makes the book self-contained.
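For readers who haven't met them, the KKT conditions being built up to can be stated, roughly in the book's notation, for the problem of minimizing $f(x)$ subject to $c_i(x) = 0$ for $i \in \mathcal{E}$ and $c_i(x) \ge 0$ for $i \in \mathcal{I}$: if $x^*$ is a local solution at which LICQ holds, then there exist multipliers $\lambda^*$ such that
\[
\nabla f(x^*) - \sum_{i \in \mathcal{E} \cup \mathcal{I}} \lambda_i^* \nabla c_i(x^*) = 0, \qquad
c_i(x^*) = 0 \ (i \in \mathcal{E}), \qquad
c_i(x^*) \ge 0,\ \lambda_i^* \ge 0,\ \lambda_i^* c_i(x^*) = 0 \ (i \in \mathcal{I}).
\]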
Dense, timeless, and comprehensive. The density and mathematical nature of the material mean every page takes much longer to digest than you expect: a slow read, but one fully capable of taking a post-calculus computer science student to being a fully competent optimizer.
Best introduction and survey-reference for nonlinear programming. I like it substantially better than Bertsekas or Boyd/Vandenberghe, though it covers less theory than either.
Amazon currently has this book for around $60. At this price, I consider it an easy buy because of the wealth of information. Particularly good are chapters 12 and 5, which cover the theory of constrained optimization and the conjugate gradient method, respectively. I've spent time on and off trying to understand the Karush-Kuhn-Tucker conditions (essentially the fundamental theorem of constrained optimization) without much success. After reading Chapter 12, I wondered why I had struggled to understand them at all. The writing is so clear and rigorous, and NW provide so many examples, that it's almost impossible not to understand and appreciate the KKT conditions. As for Chapter 5, I had never really studied the conjugate gradient method before, but I had no trouble understanding at least the basics after finishing the chapter's exercises. Given that one of the most-read papers on the method is titled "An Introduction to the Conjugate Gradient Method without the Agonizing Pain", the fact that I felt none implies that NW did a damn fine job with that chapter.
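To give a flavour of what Chapter 5 builds toward, here is a minimal sketch of the linear conjugate gradient method for solving A x = b with A symmetric positive definite. This is my own stripped-down Python illustration, not the book's pseudocode, and the function name and defaults are just choices for the example:

import numpy as np

def conjugate_gradient(A, b, tol=1e-10):
    """Linear CG for A x = b, A symmetric positive definite (illustrative sketch)."""
    x = np.zeros_like(b)
    r = A @ x - b               # residual = gradient of 0.5 x^T A x - b^T x
    p = -r                      # first search direction: steepest descent
    for _ in range(b.size):     # converges in at most n steps in exact arithmetic
        if np.linalg.norm(r) < tol:
            break
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)          # exact minimizer along p
        x = x + alpha * p
        r_new = r + alpha * Ap
        beta = (r_new @ r_new) / (r @ r)    # makes the next direction A-conjugate to p
        p = -r_new + beta * p
        r = r_new
    return x

# Tiny usage example:
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))   # should agree with np.linalg.solve(A, b)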
The book, however, is not without its faults. Some of the chapters, especially the one on Interior-Point Methods for Nonlinear Programming and the one on Derivative-Free Optimization, seem like throwaways. Also, I found a number of errata not yet reflected on the most recent errata sheet from Nocedal's website (although I may send an email about them if I get around to it). Most of these were simple typos, but at least one was in a problem and made its solution impossible, and another was in a fairly important theorem in the appendix.
My biggest disappointment with the book, though, was that most of the exercises were not especially difficult or illuminating. I've attempted 90-95% of the theoretical problems and, thinking back, I can recall only two really great ones: one in which you have to prove the Kantorovich inequality (which, although not posed as a probability problem, you can elegantly solve using expectations), and one, marked as hard in the book, on a minimax lower bound for a derivative-free method. I can't vouch for the programming problems, because my computer has all sorts of issues right now, but once my new one comes in, I'll probably take a look at those problems and provide an update.
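(For anyone curious, the Kantorovich inequality says that for a symmetric positive definite matrix $A$ with smallest and largest eigenvalues $\lambda_1$ and $\lambda_n$, and any nonzero vector $x$,
\[
\frac{(x^T A x)\,(x^T A^{-1} x)}{(x^T x)^2} \le \frac{(\lambda_1 + \lambda_n)^2}{4\,\lambda_1 \lambda_n},
\]
and the expectation trick is to treat the normalized squared components of $x$ in the eigenbasis of $A$ as a probability distribution over the eigenvalues, so the left-hand side becomes $\mathbb{E}[\lambda]\,\mathbb{E}[1/\lambda]$.)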
These caveats aside, I highly recommend this book. I think the standard reference for this field is Bertsekas's "Nonlinear Programming", but if "Numerical Optimization" is not already considered a must-have, it should be.
An excellent text on the theory and algorithms of mathematical optimization, naturally focusing on convex problems. Its treatment is a bit more formal than some other texts I've seen (e.g., the authors delve into the analytic conditions needed to really formally establish the KKT conditions), but at least they give some nice, visual examples of the different conditions.
Not what I would recommend for "intuition building" in this space, but a good book for really getting to the whys behind a lot of the machinery of optimization.
We used this in a course, doing the problems and, as practicals, MATLAB implementations. I read chapters 1, 2, 3, 4, 6, 10, 11, 12, and 16 (IIRC).
The book often requires cross-referencing, but the core intuitions for how the algorithms work, and the proofs of why they do, are good. Examples are generally provided, and while the algorithm descriptions are sometimes a bit confusing, they contain enough detail to serve as pseudocode. Chapter 10's treatment of the Levenberg–Marquardt algorithm was perhaps a bit rushed.
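For concreteness, a bare-bones damped version of the Levenberg–Marquardt idea (solve regularized normal equations, then accept or reject the step based on whether the sum of squares decreased) can be sketched in Python as below; the function names and the simple damping schedule are my own choices, and the book itself develops a more careful trust-region variant:

import numpy as np

def levenberg_marquardt(r_fun, jac_fun, x0, lam=1e-3, max_iter=100, tol=1e-8):
    """Minimal damped Gauss-Newton / Levenberg-Marquardt sketch for
    minimizing 0.5 * ||r(x)||^2 (illustration only)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        res, J = r_fun(x), jac_fun(x)
        g = J.T @ res                                   # gradient of the objective
        if np.linalg.norm(g) < tol:
            break
        # Damped normal equations: (J^T J + lam * I) p = -g
        p = np.linalg.solve(J.T @ J + lam * np.eye(x.size), -g)
        if 0.5 * np.sum(r_fun(x + p) ** 2) < 0.5 * np.sum(res ** 2):
            x, lam = x + p, lam * 0.5                   # accept step, relax damping
        else:
            lam *= 10.0                                 # reject step, increase damping
    return x

# Tiny usage example: fit y = a * exp(b * t) to noise-free data.
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.3 * t)
r_fun = lambda x: x[0] * np.exp(x[1] * t) - y
jac_fun = lambda x: np.column_stack([np.exp(x[1] * t), x[0] * t * np.exp(x[1] * t)])
print(levenberg_marquardt(r_fun, jac_fun, np.array([1.0, 0.0])))   # roughly [2.0, -1.3]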