Software Engineering discussion


Beautiful Code > Distributed Programming with MapReduce

message 1: by [deleted user]

Aug 14, 2010 05:20PM

True beauty! This is one of my top two favorite chapters so far (Beautiful Concurrency is the other one).

An information retrieval application has many basic building blocks that need to scale and perform well. Before Google's MapReduce framework, making each of these building blocks exploit parallelism, tolerate faults, and still perform well was work done separately for each one. The beauty of MapReduce, published in 2004, is that it unifies many previously disparate algorithms under one umbrella with a simple programming model: a map function that emits intermediate key/value pairs, and a reduce function that merges all the values that share a key.
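For anyone who hasn't seen the model before, here is a minimal sketch of the classic word-count example in C++. It assumes a single process and in-memory input, and the function names (MapWords, ReduceCounts, WordCount) are hypothetical. This is not the chapter's code or Google's API. The grouping loop stands in for the shuffle that a real framework distributes across machines, and there is no parallelism or fault tolerance here.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout: an in-memory, single-process sketch of the
// map/reduce programming model, not Google's library or the chapter's code.

// Map phase: emit a (word, 1) pair for every whitespace-separated word.
std::vector<std::pair<std::string, int>> MapWords(const std::string& document) {
  std::vector<std::pair<std::string, int>> emitted;
  std::istringstream in(document);
  std::string word;
  while (in >> word) {
    emitted.emplace_back(word, 1);
  }
  return emitted;
}

// Reduce phase: combine all values emitted for a single key.
int ReduceCounts(const std::vector<int>& counts) {
  int total = 0;
  for (int c : counts) total += c;
  return total;
}

// Driver: run map over every input, group intermediate values by key
// (the "shuffle" a real framework performs across machines), then reduce.
std::map<std::string, int> WordCount(const std::vector<std::string>& documents) {
  std::map<std::string, std::vector<int>> grouped;
  for (const auto& doc : documents) {
    for (const auto& [key, value] : MapWords(doc)) {
      grouped[key].push_back(value);
    }
  }
  std::map<std::string, int> result;
  for (const auto& [key, values] : grouped) {
    result[key] = ReduceCounts(values);
  }
  return result;
}
```

For example, WordCount({"the quick fox", "the fox"}) gives a count of 2 for both "the" and "fox" and 1 for "quick". The point is that only MapWords and ReduceCounts are problem-specific; the driver is the reusable part a framework would provide.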

My doctorate is in information retrieval, and I now teach a class in it. In my mind, MapReduce is a little like discovering a unifying theory in physics... everything after the discovery just snaps nicely into place, and it is hard to appreciate what life was like before it. This chapter motivated me to include much more material about MapReduce in my course.

I have two criticisms. First, the chapter is basically a subset of the referenced Google Labs paper from 2004... a LONG time ago in information retrieval time. Second, it should have included a reference to the open source project Hadoop, which provides its own implementation of MapReduce.

message 2: by Erik

Aug 19, 2010 03:31PM

Yes, this was a great chapter, though it felt like a copy/paste from a white paper rather than something written specifically for this book. It sounds like you recognized it as a Google Labs paper.

The full C++ code at the end of the chapter is great. I'd love to try it out and make some iterative changes to it. I doubt I'll actually get around to that, but it could have some great applications at my work.