Distributed Thinking: A gentle introduction to distributed processing using Apache Storm and Apache Spark - Part 0

0. Introduction

This post is meant to serve as a starting point for people who use Java or Scala to process large amounts of data and need a quick introduction to doing so, in either Spark or Storm.

It is not meant to be a Spark vs. Storm debate; there are plenty of those out there already. A quick Google search turns up several StackOverflow questions and technical blogs discussing it at length.

It is aimed at readers who are new to the whole concept of distributed data processing and want a head start. It's 2015, and this post is probably five years too late, but it's never too late to get started!

What I plan to talk about:

  1. Distributed Thinking
    1. When to use it?
    2. Where to start?
    3. How to look at data?
  2. Processing Data Streams
  3. Processing Large Data Chunks

