Accelerated SQL for JSON with AVX512 (Golang)

(github.com)

19 points · wmu · 4y ago · 9 comments

desk_minion 4y ago

This is a killer idea. However, I do not see anything in the README about distributed querying. Is that something you wish to tackle?

Also, any benchmarks comparing this to Apache Arrow or Apache Presto?

pmh91 4y ago

Hi! One of the authors here. We do have support for distributed querying, but it's not implemented in the command-line tool. (It makes for a much more complicated demo if you need multiple machines.) The query planner is happy to use as many machines as you can throw at it.

We don't yet have good comparative benchmarks against Arrow or Presto, although I'm hoping we can get those.

niviksha 4y ago

Sneller head of product here. Arrow is a data exchange format; are you referring to benchmarking against DataFusion or Ballista? As for Presto, we ran early benchmarks against Amazon's Athena (Presto under the covers) on Parquet and will rerun them shortly. One thing to note is that Presto is clunky to use with raw JSON; see https://prestodb.io/docs/current/functions/json.html. While benchmarking against Athena we actually used AWS Glue (Spark under the hood) to transform JSON into Parquet, but that adds both complexity and latency to the overall pipeline, which doesn't show up in query timings alone.

bugboy73 4y ago

If you check out the Kubernetes folder in the repo, you will find the Kubernetes setup for running in a distributed environment (which is also highly available).

wmu (OP) 4y ago

Disclaimer: I'm one of the authors of Sneller core. We have been working on this project for more than a year. It has a neat AVX512-centered architecture and many neat tricks inside.

gpapilion 4y ago

Am I missing the comparison of AVX vs. non-AVX performance?

fwessels 4y ago

Sneller founder here: we do not have any non-AVX code, so we cannot compare directly against that. But generally speaking, our code always works on 16 lanes in parallel per core, which gives a huge speed-up.
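
To illustrate what "16 lanes in parallel" can mean, here is a minimal scalar sketch in Go. It is not Sneller's code. It assumes 32-bit lanes (512 / 32 = 16), and the names `compareGreater` and `countGreater` are hypothetical. The inner loop only models what a single AVX-512 masked compare would produce for one block of 16 values.

```go
package main

import (
	"fmt"
	"math/bits"
)

// lanes is the number of 32-bit values that fit in one 512-bit register.
// Assumption: 32-bit lanes; the thread only states "16 lanes".
const lanes = 512 / 32

// compareGreater is a scalar model of one masked vector compare.
// It looks at up to 16 values and sets bit i of the result when
// vals[i] > threshold. Real SIMD code would do this in one instruction.
func compareGreater(vals []int32, threshold int32) uint16 {
	var mask uint16
	for i := 0; i < len(vals) && i < lanes; i++ {
		if vals[i] > threshold {
			mask |= 1 << uint(i)
		}
	}
	return mask
}

// countGreater walks a column in blocks of 16 values and counts the rows
// that pass the filter by counting the set bits in each block's mask.
func countGreater(column []int32, threshold int32) int {
	total := 0
	for start := 0; start < len(column); start += lanes {
		end := start + lanes
		if end > len(column) {
			end = len(column) // final block may be shorter than 16
		}
		mask := compareGreater(column[start:end], threshold)
		total += bits.OnesCount16(mask)
	}
	return total
}

func main() {
	// Hypothetical column holding the values 0..39.
	column := make([]int32, 40)
	for i := range column {
		column[i] = int32(i)
	}
	// Roughly: SELECT COUNT(*) WHERE value > 29
	fmt.Println(countGreater(column, 29))
}
```

The point of the block-and-mask shape is that one comparison covers 16 rows at once, and the resulting bitmask can be fed into later steps without branching per row.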

37ef_ced3 4y ago

This code is really nice. How will you profit?

fwessels 4y ago

Thank you for the feedback, that is nice to hear. As for the business question, we plan on launching a Sneller Cloud offering. (Sneller founder here)
