Software Engineering discussion

Beautiful Code > Distributed Programming with MapReduce


message 1: by Brad (new)

Brad (bradrubin) | 264 comments Mod
True beauty! This is one of my top two favorite chapters so far (Beautiful Concurrency is the other one).

An information retrieval application has many basic building blocks that need to scale and perform well. Before Google's MapReduce framework, each building block had to be made parallel, fault-tolerant, and fast on its own. The beauty of MapReduce, created in 2004, is that it unifies many previously disparate algorithms under one umbrella with a simple programming model.
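
To make the "simple programming model" concrete, here is a minimal single-process sketch in Python of word counting, the usual introductory example. The names `map_words`, `reduce_counts`, and `run_mapreduce` are my own illustrations, not Google's API, and this toy runner only imitates the map, group-by-key, and reduce phases; a real framework would spread those calls across machines and handle failures.

```python
from collections import defaultdict

def map_words(doc_id, text):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1

def reduce_counts(word, counts):
    """Reduce step: combine all values emitted for one key."""
    return word, sum(counts)

def run_mapreduce(inputs, mapper, reducer):
    """Hypothetical single-process stand-in for the framework:
    run the mapper, group intermediate values by key, run the reducer."""
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))

docs = [("d1", "the quick brown fox"), ("d2", "the lazy dog and the fox")]
word_counts = run_mapreduce(docs, map_words, reduce_counts)
print(word_counts)
```

The point is that only the two small functions change from one problem to the next; the runner (and, in a real system, all the parallelism and fault tolerance) stays the same.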

My doctorate is in the area of information retrieval, and I teach a class in it now. In my mind, MapReduce is a little like discovering a unifying theory in physics... everything after the discovery just sort of snaps nicely into place, and it is hard to appreciate what life was like before it. This chapter motivated me to include much more material about MapReduce in my course.

I have two criticisms. First, the chapter is basically a subset of the referenced Google Labs paper from 2004... a LONG time ago in information retrieval time. Second, it should have included a reference to the open source project Hadoop, which has support for MapReduce.

message 2: by Erik (new)

Erik | 165 comments
Yes, this was a great chapter. It felt like a copy/paste from a white paper rather than specifically written for this book. Sounds like you recognized it as a Google Labs paper.

The full C++ code at the end of the chapter is great. I'd love to try it out and experiment with some iterative changes. I doubt I'll actually get around to doing that, but it could be really useful at my work.
