BigData platform space is getting hotter

Skimming through my emails today, I came across this interesting post on the general@hadoop list:

From: MTG dev
Subject: Lightning fast in-memory analytics on HDFS
Date: Mon, 24 Sep 2012 16:31:56 GMT

Because a lot of people here are using HDFS day in and day out the
following might be quite interesting for some.

Magna Tempus Group has just rolled out a readily available Spark 0.5
(www.spark-project.org) packaged for Ubuntu distribution. Spark delivers up
to 20x faster experience (sic!) using in-memory analytics and a computational
model that is different from MapReduce.

You can read the rest here. If you don’t know about Spark, you should definitely check out the Spark project website and see how cool it is. If you are too lazy to dig through the information, here’s a brief summary for you (taken from the original poster’s Magna Tempus Group website). Spark:

  • consists of a completely separate codebase optimized for low latency, although it can load data from any Hadoop input source, S3, etc.
  • doesn’t have to use Hadoop, actually
  • provides a new, highly efficient computational model, with programming interfaces in Scala and Java. We might start working soon on adding a Groovy API to the set
  • offers lazy evaluation, which allows “postponed” execution of operations
  • can do in-memory caching of data for later high-performance analytics. Yeah, go shopping for more RAM, gents!
  • can be run locally on a multicore system or on a Mesos cluster
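
To make the lazy evaluation and caching points a bit more concrete, here is a minimal sketch in plain Python. It is not Spark's API (Spark itself exposes Scala and Java interfaces); the LazyDataset class, its methods, and load_numbers are hypothetical names invented purely for illustration. The idea it tries to show: transformations are only recorded, the source is read when a result is actually requested, and a cached result is served from memory afterwards.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Spark's API)."""

    def __init__(self, source: Callable[[], Iterable]):
        self._source = source              # invoked only when results are needed
        self._cache_enabled = False
        self._cached: Optional[List] = None

    def map(self, fn: Callable) -> "LazyDataset":
        # Only records the transformation; nothing is computed here.
        return LazyDataset(lambda: (fn(x) for x in self._iterate()))

    def filter(self, pred: Callable) -> "LazyDataset":
        # Same as map: the predicate runs later, when an action asks for data.
        return LazyDataset(lambda: (x for x in self._iterate() if pred(x)))

    def cache(self) -> "LazyDataset":
        # Ask for the results to be kept in memory once they are first computed.
        self._cache_enabled = True
        return self

    def _iterate(self) -> Iterator:
        if self._cached is not None:
            return iter(self._cached)      # served from memory, source untouched
        data = self._source()
        if self._cache_enabled:
            self._cached = list(data)
            return iter(self._cached)
        return iter(data)

    def collect(self) -> List:
        # Action: forces evaluation and returns all elements.
        return list(self._iterate())

    def count(self) -> int:
        # Action: forces evaluation and counts the elements.
        return sum(1 for _ in self._iterate())


source_reads = []                          # one entry per scan of the "input"


def load_numbers() -> Iterable[int]:
    # Hypothetical data source standing in for a file on HDFS, S3, etc.
    source_reads.append(1)
    return range(10)


even_squares = (
    LazyDataset(load_numbers)
    .map(lambda x: x * x)
    .filter(lambda x: x % 2 == 0)
    .cache()
)
reads_before_action = len(source_reads)    # still 0: nothing has been executed
total = even_squares.count()               # first action: reads the source once
values = even_squares.collect()            # second action: answered from the cache
```

In this toy run the source is scanned once in total: count() returns 5 and triggers the read, and collect() then returns [0, 4, 16, 36, 64] from memory. The real thing obviously does this across a cluster, but the shape of the idea is the same.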

Yawn, some might say. There are Apache Drill and other things that seem to be highly promising and all. Well, not so fast.

To begin with, I am not aware of any productized version of Drill (merged with Open Dremel or vice versa). Perhaps there are some other technologies around that are 20x faster than Hadoop – I just haven’t heard about them, so please feel free to correct me on this.

Also, Spark and some of its components (the Mesos resource planner and such) have been happily adopted by interesting companies such as Twitter.

What is not said outright is that the adoption of new in-memory high-performance analytics for big data by commercial vendors like Magna Tempus Group opens a completely new page in the BigData storybook.

I would “dare” to go as far as to assert that this new development means that Hadoop isn’t the smartest kid on the block anymore – there are other, faster and perhaps cleverer, fellas moving in.

And I can’t help but wonder if Spark has lit a fire under the yellow elephant yet?

Author: DrCos

Dao-Clinicist, Groovy mon, Sprechstallmeister / Concerns separator / 道可道 非常道 / Disclaimer: all posts are my personal opinion and aren't of my affiliations
