A very nice book on parallel algorithms for modern shared-memory machines. I already had a couple of books on related topics, but they either cover the specifics of some threading library without saying how you'd go about parallelizing a serial algorithm, or assume that you want to program a distributed-memory system with a message-passing API. So I found The Art of Concurrency quite useful.
The first few chapters cover topics such as theoretical models for parallel machines, how to reason about algorithm correctness, and the basics of some APIs (OpenMP, TBB, pthreads and Windows threads). Later chapters cover building blocks such as parallel sum and prefix scan, then map/reduce, and sorting and graph algorithms.
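To give a flavor of the simplest of those building blocks, here is a minimal sketch of a parallel sum written with C++11 std::thread. It is not taken from the book: the name parallel_sum, the fixed-size chunking and the one-slot-per-thread partial results are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper (not from the book): sums `data` with `num_threads` workers.
long long parallel_sum(const std::vector<int>& data, std::size_t num_threads) {
    assert(num_threads > 0);

    // One result slot per thread, so no two threads write the same element.
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;

    // Ceiling division: every element lands in exactly one chunk.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(t * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        workers.emplace_back([&data, &partial, t, begin, end] {
            // Each thread sums its own slice and stores the result once.
            partial[t] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }

    // Joining makes every partial result visible to the calling thread.
    for (auto& w : workers) w.join();

    // Serial reduction of the per-thread results.
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Calling parallel_sum(v, 4) on a vector holding 1 to 1000 should give 500500. The adjacent slots in partial may suffer from false sharing, but each thread writes its slot only once, so the cost should be small in this sketch.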
The writing is informal, but the book requires some serious commitment (I found it useful to implement some of the algorithms with the C++11 threading facilities). It's not a cookbook either: for instance, there are a few pages on parallelizing bubble sort and showing that the implementation is correct. No one in their right mind would use a parallel bubble sort in production code, but I found the discussion enlightening.
It does not cover topics such as SIMD or CUDA/OpenCL - thankfully, as there's more than enough to take in already.
Unfortunately, the sample code is a bit sloppy (allocating with new[] and deallocating with free? That is undefined behavior in C++. Also, declaring variables where they are used would have saved some space on the page...). And some of the algorithms only made sense after I looked them up on Wikipedia.
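For the record, memory obtained with new[] has to be released with delete[], and the easiest way to get that right is to not write the release by hand at all. Here is a small sketch of what tidier sample code could look like in C++11. The function name sum_first_n is hypothetical and the code is not from the book.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical example (not from the book): fills a buffer with 0..n-1
// and returns the sum of its elements.
long long sum_first_n(std::size_t n) {
    // unique_ptr<int[]> calls delete[] on scope exit, matching the new[] here.
    std::unique_ptr<int[]> buf(new int[n]);

    for (std::size_t i = 0; i < n; ++i) {
        buf[i] = static_cast<int>(i);
    }

    // Variables are declared where they are first needed.
    long long total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        total += buf[i];
    }
    return total;
}
```

A std::vector<int> would do the same job with even less ceremony. The unique_ptr version is shown only because it stays closest to the raw new[] style.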
Still: worth reading.