Thanks for posting this. I'm starting to get a feel for when Spark is usable: you need an underlying indexed data store that lets you fetch small subsets of your data into RDDs (or your data has to be tiny to begin with). We've been trying to use Spark on inputs which, while smaller than our cluster's available memory, are probably too big for Spark to handle (> 1TB).
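To illustrate the "fetch small subsets" pattern: with the DataStax spark-cassandra-connector you can push the selection down to Cassandra so only a slice of the table ever lands in the RDD. Rough sketch (keyspace/table/host names here are made up, not from any real deployment):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._ // adds cassandraTable() to SparkContext

val conf = new SparkConf()
  .setAppName("subset-fetch")
  .set("spark.cassandra.connection.host", "127.0.0.1") // hypothetical host
val sc = new SparkContext(conf)

// The .where clause is pushed down to Cassandra, so only the matching
// partition(s) get materialized into the RDD -- not the whole table.
val daySlice = sc.cassandraTable("events_ks", "events") // hypothetical keyspace/table
  .where("day = ?", "2014-09-01")

println(daySlice.count())
```

That only works because Cassandra's partition key acts as the index; without something like that on the storage side you end up scanning everything into memory, which is where we hit the wall.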
These guys look to be doing some nice work integrating Cassandra and Spark: http://blog.tuplejump.com/
They've piggybacked on Cassandra's clustering, using a Java agent to run the Spark masters. Doesn't seem to be a release available yet, though.