alanctgardner2 12y ago

The CAP theorem can apply to any clustered system; it doesn't have to be multi-site. What happens if 6 of your 12 machines die? What if they get cut off from the other 6?
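To make the 6-of-12 question concrete, here is a minimal sketch of a strict-majority quorum check. The rule and the names `has_quorum` and `split_outcome` are assumptions for illustration only; nothing here is taken from InfiniSQL's code or documentation.

```python
def has_quorum(reachable: int, cluster_size: int) -> bool:
    """Strict-majority rule (assumed): a side may keep serving only if it
    can reach more than half of all nodes in the cluster."""
    return reachable > cluster_size // 2


def split_outcome(cluster_size: int, side_a: int) -> tuple:
    """Return (side A has quorum, side B has quorum) after a partition
    that leaves side_a nodes on one side and the rest on the other."""
    side_b = cluster_size - side_a
    return has_quorum(side_a, cluster_size), has_quorum(side_b, cluster_size)


print(split_outcome(12, 6))  # (False, False): an even split leaves no majority
print(split_outcome(13, 7))  # (True, False): an odd-sized cluster always has one
```

Under this rule a 6/6 split stays consistent only by making both halves refuse work, which is the trade-off CAP describes. If both halves kept accepting writes instead, that would be split brain.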

edit: There's a bit of discussion further down about the SQL implementation. That's something I was very curious about as well. The projects linked below spend a lot of time working on supporting full ANSI SQL, and reducing latency by pushing down as many operations as possible. The Overview page doesn't appear to mention how filtering, aggregation, windowing, etc. work in your system.

Also, I noticed on your website that you compare InfiniSQL to Hadoop. How do you feel it compares to Impala (http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-t...) and Shark (https://amplab.cs.berkeley.edu/projects/shark-making-apache-...)?

mtravis 12y ago

Hi, Alan. Regarding CAP, I think that given redundant cluster interconnects, redundant managed power, odd # of cluster managers for quorum, all mean that split brain is just about out of the question, configured properly.

The main reason that I have a FAQ about Hadoop is that I have been asked repeatedly by people, "what's the difference between InfiniSQL & Hadoop?" It seems to be the data project that's on a lot of people's minds. It's a fair question, so I created a FAQ entry on my site.

I don't know what Impala's or Shark's performance is for transaction processing. Show me the numbers, in other words. I don't believe that Hadoop is going to eat the world; its best use cases are probably in the reporting realm. It seems that Impala wants to bridge that gap between reporting and operational/transactional database--but I can't read from their architectural description just how well it would actually perform for transaction processing workloads.

Regarding Shark, it looks to me like they still see it as a parallelized reporting system, and not geared towards OLTP/operational workloads.

I expect that InfiniSQL will be able to handle quite sophisticated analytics workloads, as more capabilities are added, but they will be more for real-time. I don't see it displacing special purpose analytics environments, especially the massive unstructured ones.

Regarding filtering, aggregation, windowing, I haven't documented it yet. The SQL engine is pretty simple at this point--it parses, makes an abstract syntax tree, then executes. If you need more, then the code is there. :-)
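For readers unfamiliar with the parse, syntax tree, execute pipeline, a toy sketch follows. It is not InfiniSQL's engine: the grammar (a single-table SELECT with one optional integer comparison), the node classes `Select` and `Comparison`, and the in-memory `tables` dictionary are all hypothetical and exist only to show the three stages.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Comparison:
    """WHERE node: column <op> integer literal."""
    column: str
    op: str
    value: int


@dataclass
class Select:
    """Root node of the (very small) syntax tree."""
    columns: List[str]
    table: str
    where: Optional[Comparison]


# Hypothetical grammar: SELECT a, b FROM t [WHERE col (=|<|>) 123]
PATTERN = re.compile(
    r"SELECT\s+(?P<cols>[\w\s,]+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<col>\w+)\s*(?P<op>=|<|>)\s*(?P<val>\d+))?\s*$",
    re.IGNORECASE,
)

OPS = {"=": lambda a, b: a == b, "<": lambda a, b: a < b, ">": lambda a, b: a > b}


def parse(sql: str) -> Select:
    """Stages 1 and 2: recognize the statement and build the tree."""
    m = PATTERN.match(sql.strip())
    if m is None:
        raise ValueError(f"unsupported statement: {sql!r}")
    columns = [c.strip() for c in m["cols"].split(",")]
    where = Comparison(m["col"], m["op"], int(m["val"])) if m["col"] else None
    return Select(columns, m["table"], where)


def execute(stmt: Select, tables: dict) -> list:
    """Stage 3: walk the tree, filtering rows and then projecting columns."""
    rows = tables[stmt.table]
    if stmt.where is not None:
        w = stmt.where
        rows = [r for r in rows if OPS[w.op](r[w.column], w.value)]
    return [{c: r[c] for c in stmt.columns} for r in rows]


tables = {"users": [{"name": "ann", "age": 41}, {"name": "bob", "age": 25}]}
print(execute(parse("SELECT name FROM users WHERE age > 30"), tables))
```

A real engine would use a proper tokenizer and grammar, and would add a planning step between the tree and execution. That step is where filtering, aggregation and windowing would be decided.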

jbellis 12y ago

Ah, the myth of the sufficiently redundant network.

http://pl.atyp.us/wordpress/index.php/2010/10/when-partition...

http://kellabyte.com/2013/11/04/the-network-partitions-are-r...

mtravis 12y ago

Redundant components and pathing, properly implemented and managed, are what allow enterprise storage arrays, mainframe clusters, Tandems, and the like, to operate 24x7 for years on end. Their myth seems to work.

vidarh 12y ago

> configured properly.

... and there is where it falls apart. Sooner or later, "someone" is going to do something Incredibly Dumb that is going to take down a lot of nodes.

If you are betting that you can just add enough redundancy that split brain can't happen, I have to question why I should take you seriously with important data.

(It also indicates to me that this is going to be ludicrously expensive to set up, but that's another issue.)

mtravis 12y ago

Redundancy, quorum protocol, proactive testing, rigorous QA.

And actual 24x7 environments with important data are ludicrously expensive. I expect InfiniSQL to be less expensive since it's based on x86_64 and Linux and is open source. But yeah, hardware and environment need to be right.

I'm not sure what I said wrong. I'm not claiming anywhere that split brain (or any failure scenario) can be ruled out entirely in all circumstances, but in practice, split brain is avoided 24x7 for years on end in many different architectures.

It's not magic.

Just like you can't have "enough" storage redundancy. You can have a 100-way mirror of hard drives that will still lose data if you lose 100 disks in less time than somebody replaces one of them.

DanWaterworth 12y ago

Congratulations, you have convinced me never to use your database. You can't ignore CAP, because machine failures, momentary high latency and bona fide network partitions happen.

spof3 12y ago

>>Show me the numbers

https://amplab.cs.berkeley.edu/benchmark/

cmccabe 12y ago

I work at Cloudera (although not on Impala). I work on HDFS.

Impala was about creating a low-latency SQL engine for Hadoop, so that queries could be done interactively by a human at a keyboard. This is something that you don't really get with Hive (despite all the recent hype) because it simply is too slow, and has high startup costs. That's unlikely to change in the future because of the overhead of spinning up JVMs, starting MapReduce jobs, etc.

It seems like you are trying to target the OLTP market. That's a difficult market to crack. A lot of the value of things like Oracle and Microsoft SQL Server is not in the database itself, but in the surrounding software. Performance is nice, but unless you can get orders of magnitude of improvement, it's very difficult to compete.

Will anybody ever bridge OLTP and OLAP? The last people who claimed to be trying to do that were Drawn to Scale, and we all know how that turned out. I think it's better to focus on doing one thing well.

Good luck.

gnaritas 12y ago

> Regarding CAP, I think that given redundant cluster interconnects, redundant managed power, odd # of cluster managers for quorum, all mean that split brain is just about out of the question, configured properly.

Failure is never out of the question; you either plan for it or you suffer from it. CAP applies.
