I came across this post from Platfora which, among other trivialities, says:
Hadoop is irresistible for this reason, but the big question that remains is how to use the data there once you’ve stored it. The challenge is that Hadoop is a very different architecture to traditional data warehouses. It is a batch engine — a lumbering freight train that can process immense amounts of data, but takes a while to get up to speed, so even the simplest question requires minutes of processing.
How lyrical! And then we got a glimpse of The Promised Land lying ahead:
Here at Platfora we are laser focused on this next phase of Hadoop. The result won’t just match the status quo, but exceed it in flexibility and the ability to scale and adapt to changing requirements. Exciting times are ahead – stay tuned.
No, wait – not exactly a promised land: just a promise of one. I wonder if this is an attempt at damage control after yesterday’s announcement of a vendor’s support for the Spark platform, which I was discussing in my last post? 🙂
Skimming through my emails today, I came across this interesting post on the general@hadoop list:
From: MTG dev
Subject: Lightning fast in-memory analytics on HDFS
Date: Mon, 24 Sep 2012 16:31:56 GMT
Because a lot of people here are using HDFS day in and day out the
following might be quite interesting for some.
Magna Tempus Group has just rolled out a readily available Spark 0.5
(www.spark-project.org) packaged for Ubuntu distribution. Spark delivers up
to 20x faster experience (sic!) using in-memory analytics and a computational
model that is different from MapReduce.
You can read the rest here. If you don’t know about Spark, you should definitely check out the Spark project website and see how cool it is. If you are too lazy to dig through the information, here’s a brief summary for you (taken from the original poster’s Magna Tempus Group website):
- consists of a completely separate codebase optimized for low latency, although it can load data from any Hadoop input source, S3, etc.
- doesn’t have to use Hadoop, actually
- provides a new, highly efficient computational model, with programming interfaces in Scala and Java. We might soon start working on adding a Groovy API to the set
- offers lazy evaluation, which allows “postponed” execution of operations
- can do in-memory caching of data for later high-performance analytics. Yeah, go shopping for more RAM, gents!
- can be run locally on a multicore system or on a Mesos cluster
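If the “lazy evaluation” and “in-memory caching” bullets above sound abstract, here is a tiny, language-agnostic sketch of the idea: transformations only build up a recipe, and nothing actually runs until an “action” asks for a result. This is purely illustrative toy code – it is not Spark’s actual API, and the class and method names are made up for the example:

```python
class LazyDataset:
    """Toy illustration of lazy transformations + an eager action."""

    def __init__(self, source):
        self._source = source      # zero-argument function yielding records
        self._cached = None        # filled in by cache()

    def _records(self):
        if self._cached is not None:
            return iter(self._cached)
        return self._source()

    def map(self, fn):
        # Returns a new dataset; fn is NOT applied yet ("postponed" execution).
        return LazyDataset(lambda: (fn(x) for x in self._records()))

    def filter(self, pred):
        # Also lazy: just wraps the pipeline in another generator.
        return LazyDataset(lambda: (x for x in self._records() if pred(x)))

    def cache(self):
        # Materialize once in memory so later actions reuse the result
        # instead of recomputing the whole pipeline.
        self._cached = list(self._records())
        return self

    def collect(self):
        # An "action": this is the point where work actually happens.
        return list(self._records())


data = LazyDataset(lambda: iter(range(5)))
squares = data.map(lambda x: x * x).filter(lambda x: x > 3)
# Nothing has been computed yet; collect() triggers the whole pipeline:
print(squares.collect())   # [4, 9, 16]
```

The “go shopping for more RAM” joke in the list is exactly about `cache()`: once a dataset is pinned in memory, repeated analytic queries over it skip the expensive recomputation (or, in Spark’s case, the disk reads) entirely.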
Yawn, some might say. There are Apache Drill and other things that seem highly promising and all. Well, not so fast.
To begin with, I am not aware of any productized version of Drill (merged with Open Dremel or vice versa). Perhaps there are other technologies around that are 20x faster than Hadoop – I just haven’t heard of them, so please feel free to correct me on this.
Also, Spark and some of its components (the Mesos resource manager and such) have been happily adopted by interesting companies such as Twitter.
What is not said outright is that the adoption of new in-memory high-performance analytics for big data by commercial vendors like Magna Tempus Group opens a completely new page in the BigData storybook.
I would “dare” to go as far as to assert that this new development means that Hadoop isn’t the smartest kid on the block anymore – there are faster and perhaps cleverer fellas moving in.
And I can’t help but wonder: has Spark lit a fire under the yellow elephant yet?
I have been giving this talk about the Apache BigTop project and how it changes the landscape and competition for Hadoop distribution vendors, ISPs and ASVs.
The slides are available here, and I will update this post once the video is published by the good folks at Yahoo!
I just came across this article in Forbes, full of trivialities about the Hadoop platform. And then I came across this picture of a baby elephant who perhaps had read the same article over its morning feast.
Poor baby – it is allergic to the bullshit, apparently.
A very insightful article has just been posted by my good friend Rvs on Apache BigTop’s official blog.
And I think you should just go and read it if you care to understand why stacks are so important and how BigTop eases life for people who are trying to write something more complex than a Tic-Tac-Toe game for a smartphone.
As my former colleague John Kreisa nicely put it in the Hortonworks 1.0 release announcement here (my warmest regards and best wishes to you guys!):
Those who have followed Hortonworks since our initial launch already know that we are absolutely committed to open source and the Apache Software Foundation. You will be glad to know that our commitment remains the same today. We don’t hold anything back. No proprietary code is being developed at Hortonworks.
And indeed. I asked this question about Hortonworks using BigTop to power their platform offering some time ago, and later pretty much repeated it in the form of a comment on Shaun Connolly’s blog. To his credit, my question was answered directly:
As far as BigTop goes, we at Hortonworks are using parts of BigTop for the HDP platform builds, so thanks for the efforts there!
I met the gentleman in person at the recent Hadoop Summit, and we had a short yet nice chat about enterprise stacks and the role open-source technology plays there.
So, it is time to put my initial question to rest as fully answered.
P.S. On a separate note: I left a slightly different comment on Cloudera’s blog. Somehow, the comment doesn’t appear to be visible (at least I don’t see anything but the “2 comments” line), nor has it been answered publicly (again, perhaps it has been, but I don’t see it on the page). In Cloudera’s defense, I have to say that I got an email reply from one of their execs, which I can’t publish since it was a private message.
I think this is awesome, really 😉 Warms my heart and all that!