Consensus? Honestly I don't know. I've never seen map-reduce called a
"pattern", though, so I'm being conservative in my terminology. *I* don't
know that it's a pattern, so I don't want to suggest that it is.
You cannot read an article or book on concurrency, or yourself start
thinking about concurrency, without confronting the idea that many
algorithms lend themselves to data parallelism. Add a few low-level
implementation details like key-value pairs, and that's all map-reduce is.
Is a parallel programming approach as fundamental as "split the data up
into chunks, operate independently on each chunk to get an intermediate
result, then combine the results" worthy of being called a design
pattern? I sure hope not.
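The "split, operate independently, combine" recipe is small enough to show in full. Below is a minimal Java sketch that uses word counting as the task. That choice of example is mine and is not taken from any particular framework or book. Each chunk of a word list is counted on its own into a Map of word to count, which is the key-value intermediate result. The per-chunk Maps are then merged by adding the counts. The names ChunkedWordCount, countChunk and countWords, and the thread pool size of 4, are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ChunkedWordCount {

    // Operate independently on one chunk: count its words, producing an
    // intermediate result as key-value pairs (word -> count).
    static Map<String, Integer> countChunk(List<String> chunk) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : chunk) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    // Split the words into chunks, count each chunk on a worker thread,
    // then combine the per-chunk counts by summing them key by key.
    static Map<String, Integer> countWords(List<String> words, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // Hypothetical pool size; any positive number of threads gives the same result.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<Map<String, Integer>>> partials = new ArrayList<>();
            for (int start = 0; start < words.size(); start += chunkSize) {
                List<String> chunk = words.subList(start, Math.min(start + chunkSize, words.size()));
                partials.add(CompletableFuture.supplyAsync(() -> countChunk(chunk), pool));
            }
            Map<String, Integer> total = new HashMap<>();
            for (CompletableFuture<Map<String, Integer>> partial : partials) {
                // join() waits for that chunk's intermediate result.
                partial.join().forEach((word, n) -> total.merge(word, n, Integer::sum));
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> words = List.of("the", "cat", "sat", "on", "the", "mat", "the", "end");
        // Prints each distinct word with its count; HashMap iteration order is unspecified.
        System.out.println(countWords(words, 3));
    }
}
```

Changing the chunk size or the number of threads changes how the work is divided but not the answer. Nothing here needs a framework; it only needs chunks and a merge.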
Unfortunately it has already happened. I can find at least one reference
book from Microsoft Press on Parallel Programming, to wit, "Design
Patterns for Decomposition and Coordination on Multicore Architectures",
where "data aggregation with map-reduce" is called a design pattern. One
of the authors is Ralph Johnson of GOF fame, which doesn't help (in fact
he's been working for some years on shoehorning design patterns into the
parallel programming space).
I don't think that calling everything that occurs frequently in
programming a "pattern" is useful terminology. For the term "pattern" to
have meaning besides "thing", it has to have some sort of restricted or
qualified domain. And I haven't seen a definition of pattern, besides
what GOF put in their book. So for now, I think it's not, until there's
some sort of consensus about what to call these things.
The "restricted" or "qualified" domain for design patterns (and you're
quite right, a design pattern should be a reusable solution to a common
problem within a given context) was originally behavioral, structural
and creational problems in the object-oriented space. Now they've gone
apeshit and added concurrency and security and business models and user
interfaces and...well, everything.
These days they _do_ call everything that "occurs frequently in
programming" a pattern of one sort or another. So - IMHO - the term
stopped being useful years ago. It used to mean a way of solving a
design problem in OO, drawn from a manageable set of some dozens of
solutions. Now you've got people telling you that data parallelization
or object/thread pools are patterns.
And I do think that map-reduce deserves some sort of special
nomenclature, but whether "pattern" is the best word I don't know.
I'm cool with calling it (map-reduce) "an unavoidably obvious way of
operating on data in chunks and combining the results". Human computers
probably did it a century ago or more. Hell, people solving problems with
paper and pencil have been doing data parallelization for hundreds of
years.
I read one blog by a guy
(http://karticks.wordpress.com/2009/07/29/the-mapreduce-design-pattern-demystified/)
who has this to say:
"Over the last 5-6 years there has been quite a lot of buzz about the
Map/Reduce design pattern (yes, it is a design pattern)..."
and
"The key pattern in the above example is that a huge task is broken down
into smaller tasks, and each small task after it has finished produces
an intermediate result, and these intermediate results are combined to
produce the final result. That is the core of the Map/Reduce design
pattern."
God help us. _That_ is a design pattern? And Kartick (the author) has
summarized it well, incidentally, so there is no confusion here.
You may as well call filtering lists a design pattern.
HINT: when someone protests that something is a design pattern, even
they know (maybe subconsciously) that there is a smell.
What's the difference between a pattern, an algorithm, and a system? I
don't rightly know. Thinking about it now, just off the top of my head, the
GOF design patterns were all fairly small in their scope, and map-reduce
is a rather large system in its scope and use.
Some specific implementations of map-reduce, perhaps the MapReduce
specialization of the general concept, maybe even using the specific
Hadoop implementation, _are_ large-scale. But I could work through a useful
pen-and-pencil map-reduce example on a Java Map that would take a few
minutes and be very small-scale. Or maybe I'd just sum up the values in
an array by summing up parts of the array in turn, and summing the
intermediate results...
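The array case needs even less machinery. Here is a minimal sequential Java sketch that sums each part of the array in turn and then sums the intermediate results. It is exactly the sort of thing you could do with pen and pencil. The class name PartwiseSum, the method sumInParts, and the part size of 4 are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class PartwiseSum {

    // Sum an array by summing parts of it in turn and then summing the
    // intermediate results. Each part is handled independently of the others.
    static long sumInParts(int[] values, int partSize) {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        // "Map" step: one intermediate sum per part of the array.
        List<Long> partialSums = new ArrayList<>();
        for (int start = 0; start < values.length; start += partSize) {
            int end = Math.min(start + partSize, values.length);
            long partial = 0;
            for (int i = start; i < end; i++) {
                partial += values[i];
            }
            partialSums.add(partial);
        }
        // "Reduce" step: combine the intermediate results into the final answer.
        long total = 0;
        for (long partial : partialSums) {
            total += partial;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] values = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3};
        // Parts of size 4: [3,1,4,1]=9, [5,9,2,6]=22, [5,3]=8; 9 + 22 + 8 = 39
        System.out.println(sumInParts(values, 4)); // prints 39
    }
}
```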
Anyway, for now map-reduce is special enough to have its own name:
map-reduce.
And let's not call it MapReduce.
AHS