Skimming through my emails today, I came across this interesting post on the general@hadoop list:
From: MTG dev
Subject: Lightning fast in-memory analytics on HDFS
Date: Mon, 24 Sep 2012 16:31:56 GMT
Because a lot of people here are using HDFS day in and day out the following might be quite interesting for some.
Magna Tempus Group has just rolled out a readily available Spark 0.5 (www.spark-project.org) packaged for Ubuntu distribution. Spark delivers up to 20x faster experience (sic!) using in-memory analytics and a computational model that is different from MapReduce.
You can read the rest here. If you don’t know about Spark, you really should check the Spark project website and see how cool it is. If you are too lazy to dig through the information, here’s a brief summary for you (taken from the original poster’s Magna Tempus Group website). Spark:
- consists of a completely separate codebase optimized for low latency, although it can load data from any Hadoop input source, S3, etc.
- doesn’t have to use Hadoop, actually
- provides a new, highly efficient computational model, with programming interfaces in Scala and Java. We might start working soon on adding a Groovy API to the set
- offers a lazy evaluation that allows a “postponed” execution of operations
- can do in-memory caching of data for later high-performance analytics. Yeah, go shopping for more RAM, gents!
- can be run locally on a multicore system or on a Mesos cluster
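To make the lazy evaluation and in-memory caching points a bit more concrete, here is a toy sketch in plain Python. It is not Spark's API: the `LazyDataset` class, the `load_lines` function and the sample log lines are all made up for illustration. The only thing it tries to show is the idea from the list above: transformations are merely recorded, nothing is read until a result is requested, and a cached dataset is kept in memory for later queries.

```python
# Toy illustration only: LazyDataset is a hypothetical class, not part of Spark.

class LazyDataset:
    def __init__(self, make_iter):
        self._make_iter = make_iter  # zero-argument callable returning a fresh iterator
        self._want_cache = False     # set by cache(); takes effect on the first action
        self._cache = None           # in-memory copy of the records, once materialized

    # Transformations: build a new dataset, but do not touch any data yet.
    def map(self, fn):
        return LazyDataset(lambda: (fn(x) for x in self._records()))

    def filter(self, pred):
        return LazyDataset(lambda: (x for x in self._records() if pred(x)))

    def cache(self):
        self._want_cache = True
        return self

    def _records(self):
        if self._cache is not None:
            return iter(self._cache)              # served from memory
        if self._want_cache:
            self._cache = list(self._make_iter())  # materialize once, keep in RAM
            return iter(self._cache)
        return self._make_iter()

    # Actions: these are what actually trigger the computation.
    def count(self):
        return sum(1 for _ in self._records())

    def collect(self):
        return list(self._records())


source_reads = []  # records every time the (pretend) storage is actually read

def load_lines():
    # Stand-in for reading a file from HDFS, S3, or local disk.
    source_reads.append("read")
    return iter(["INFO start", "ERROR disk full", "INFO ok", "ERROR timeout"])


lines = LazyDataset(load_lines)
errors = lines.filter(lambda line: line.startswith("ERROR")).cache()
# At this point nothing has been read: source_reads is still empty.

n_errors = errors.count()  # first action: reads the source and fills the cache
messages = errors.map(lambda line: line.split(" ", 1)[1]).collect()  # reuses the cache
```

The second query never goes back to `load_lines`; it is answered from the cached list. That, in miniature, is why repeated analytics over the same data can benefit from keeping it in RAM.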
Yawn, some might say. There are Apache Drill and other things that seem to be highly promising and all. Well, not so fast.
To begin with, I am not aware of any productized version of Drill (merged with Open Dremel or vice versa). Perhaps there are some other technologies around that are 20x faster than Hadoop – I just haven’t heard about them, so please feel free to correct me on this.
Also, Spark and some of its components (the Mesos resource planner and such) have been happily adopted by interesting companies such as Twitter.
What is not said outright is that the adoption of new in-memory high-performance analytics for big data by commercial vendors like Magna Tempus Group opens a completely new page in the BigData storybook.
I would “dare” to go as far as to assert that this new development means that Hadoop isn’t the smartest kid on the block anymore – there are other faster and perhaps cleverer fellas moving in.
And I can’t help but wonder if the Spark has lit a fire under the yellow elephant yet?