Apache Ignite vs Apache Spark

Complimentary to my earlier post on Apache Ignite in-memory file-system and caching capabilities I would like to cover the main differentiation points of the Ignite and Spark. I see questions like this coming up repeatedly. It is easier to have them answered, so you don’t need to fish around the Net for the answers.

 – The main different is, of course, that Ignite is an in-memory computing system, e.g. the one that treats RAM as the primary storage facility. Whereas others – Spark included – only use RAM for processing. The former, memory-first approach, is faster because the system can do better indexing, reduce the fetch time, avoid (de)serializations, etc.

 – Ignite’s mapreduce is fully compatible with Hadoop MR APIs which let everyone to simply reuse existing legacy MR code, yet run it with >30x performance improvement. Check this short video demoing an Apache Bigtop in-memory stack, speeding up a legacy MapReduce code

 – Also, unlike Spark’s the streaming in Ignite isn’t quantified by the size of RDD. In other words, you don’t need to form an RDD first before processing it; you can actually do the real streaming. Which means there’s no delays in a stream content processing in case of Ignite

 – Spill-overs are a common issue for in-memory computing systems: after all memory is limited. In Spark where RDDs are immutable, if an RDD got created with its size > 1/2 node’s RAM then a transformation and generation of the consequent RDD’ will likely to fill all the node’s memory. Which will cause the spill-over. Unless the new RDD is created on a different node. Tachyon was essentially an attempt to address it, using old RAMdrive tech. with all its limitations.
Ignite doesn’t have this issue with data spill-overs as its caches can be updated in atomic or transactional manner. However, spill-overs are still possible: the strategies to deal with it are explained here

 – as one of its components Ignite provides the first-class citizen file-system caching layer. Note, I have already addressed the differences between that and Ignite, but for some reason my post got deleted from their user list. I wonder why? 😉

 – Ignite’s uses off-heap memory to avoid GC pauses, etc. and does it highly efficiently

 – Ignite guarantees strong consistency

 – Ignite supports full SQL99 as one of the ways to process the data w/ full support for ACID transactions

– Ignite supports in-memory SQL indexes functionality, which lets to avoid full-scans of data sets, directly leading to very significant performance improvements (also see the first paragraph)

 – with Ignite a Java programmer shouldn’t learn new ropes of Scala. The programming model also encourages the use of Groovy. And I will withhold my professional opinion about the latter in order to keep this post focused and civilized 😉

I can keep on rumbling for a long time, but you might consider reading this and that, where Nikita Ivanov – one of the founders of this project – has a good reflection on other key differences. Also, if you like what you read – consider joining Apache Ignite (incubating) community and start contributing!

Author: DrCos

Dao-Clinicist, Groovy mon, Sprechstallmeister / Concerns separator / 道可道 非常道 / Disclaimer: all posts are my personal opinion and aren't of my affiliations

2 thoughts on “Apache Ignite vs Apache Spark”

  1. Indeed the in-memory computing solution that Ignite offers seems unique through the combination of off-heap memory, guaranteed consistency and SQL99 access among other features.

    I am interested to implement a solution for R's annoying issue of expecting all data to be loaded in memory first.
    See Ryan Rosario's http://www.slideshare.net/bytemining/r-hpc, slide 2 for a glimpse.

    But lost of R packages use C++ for memory management.
    I see GridGain has portable objects (http://www.gridgain.com/content_tooltip/portable-objects-java-net-c/) but wondering what would be the performance tradeoffs compared to a native C++ solution.

    Please provide any references to better learn about these aspects.

Leave a comment