It turns out that the dialect doesn't support LATERAL joins with a LIMIT in them. The below query only works if you remove the LIMIT clause.
https://i.stack.imgur.com/rdB1s.png
This makes expressing things like "Fetch all artists where ..., for each artist fetch their first 3 albums where ..., and for each album fetch the top 10 tracks where ..." really difficult.
Does Trino support this out of curiosity?
If that's not peak Amazon, I don't know what is.
It's one of the backends available in Splink, our FOSS record linkage software, and it's revolutionary how it lets users execute large-scale probabilistic record linkage ridiculously cheaply. Not long ago you needed very expensive proprietary software plus a big on-prem cluster, costing hundreds of thousands, to achieve this.
A lot of the magic for me is on the infrastructure side: how they can read/write large datasets from s3 so quickly, so the value isn't just in the SQL engine.
trino> USE memory.default;
USE
trino:default> create table artist (artistid int);
CREATE TABLE
trino:default> create table album (albumid int, artistid int);
CREATE TABLE
trino:default> insert into artist values 1, 2;
INSERT: 2 rows
Query 20220804_182827_00005_n4rat, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0.52 [0 rows, 0B] [0 rows/s, 0B/s]
trino:default> insert into album values (11, 1), (12, 1), (21, 2);
INSERT: 3 rows
Query 20220804_182857_00006_n4rat, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0.18 [0 rows, 0B] [0 rows/s, 0B/s]
trino:default> select * from (select * from artist limit 2) a cross join lateral (select * from album where album.artistid = a.artistid limit 2);
artistid | albumid | artistid
----------+---------+----------
1 | 12 | 1
1 | 11 | 1
2 | 21 | 2
(3 rows)
Query 20220804_182930_00007_n4rat, FINISHED, 1 node
Splits: 41 total, 41 done (100.00%)
0.35 [8 rows, 232B] [22 rows/s, 661B/s]
Note: the original Presto continues to run in production at Meta (formerly Facebook) and Uber, and recently the ByteDance TikTok data platform talked about running 1M queries a day with tens of thousands of cores. Some reasons to stay with Presto:
- Reliability and scalability per the above
- Cutting-edge innovations only in later versions of Presto: multi-level caching (project RaptorX) to boost query performance by 10x+ and table scan improvements (project Aria), to name a few
- Only PrestoDB is hosted by the Linux Foundation, giving community users confidence that future releases will remain open.
https://stackoverflow.com/a/73129836/13485494
But man is it a huge PITA (especially when doing programmatic code generation of the SQL) compared to LATERAL joins
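For the curious, here's a sketch of that window-function workaround, the usual "top N per group" rewrite, run through Python's sqlite3 as a stand-in engine (SQLite ≥ 3.25 for window functions; the artist/album tables mirror the Trino session above, but nothing here is dialect-specific to the screenshot):

```python
import sqlite3

# ROW_NUMBER() workaround for "top N per group", the rewrite that
# replaces a LATERAL join with a LIMIT inside it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (artistid INT);
    CREATE TABLE album  (albumid INT, artistid INT);
    INSERT INTO artist VALUES (1), (2);
    INSERT INTO album  VALUES (11, 1), (12, 1), (21, 2);
""")

# Instead of: ... CROSS JOIN LATERAL (SELECT ... LIMIT 2)
# number the albums per artist, then keep row_num <= 2.
rows = conn.execute("""
    SELECT artistid, albumid FROM (
        SELECT al.artistid, al.albumid,
               ROW_NUMBER() OVER (PARTITION BY al.artistid
                                  ORDER BY al.albumid) AS row_num
        FROM artist ar JOIN album al ON al.artistid = ar.artistid
    ) t
    WHERE row_num <= 2
    ORDER BY artistid, albumid
""").fetchall()
print(rows)  # [(1, 11), (1, 12), (2, 21)]
```

Same result as the LATERAL version, but the per-group LIMIT becomes a filter on the window's row number, which is why generating it programmatically is so much more awkward.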
Someone familiar with the CockroachDB query planner showed me that this kind of window function is exactly what Cockroach turns LATERAL joins into, for instance:
demo@127.0.0.1:26257/movr> explain select * from abc, lateral (select * from xyz where x = a limit 2);
• filter
│ estimated row count: 1
│ filter: row_num <= 2
│
└── • window
│ estimated row count: 2
│
└── • hash join
│ estimated row count: 2
│ equality: (x) = (a)
│
├── • scan
│ estimated row count: 6 (100% of the table; stats collected 2 minutes ago)
│ table: xyz@xyz_pkey
│ spans: FULL SCAN
│
└── • scan
estimated row count: 1 (100% of the table; stats collected 3 minutes ago)
table: abc@abc_pkey
spans: FULL SCAN
- https://github.com/bitsondatadev
- https://www.linkedin.com/in/bitsondatadev/
I recommend the Trino Slack for people not already in it: https://trino.io/slack.html
If you want to get started with Trino, here's a repo I created to do so: https://github.com/bitsondatadev/trino-getting-started
https://trino.io/blog/2022/08/02/leaving-facebook-meta-best-...
BTW, purely out of curiosity, I compared Trino with Presto from an OSS point of view (https://ossinsight.io/analyze/prestodb/presto?vs=trinodb%2Ft...): both communities are still popular, but Trino seems more active than Presto now. I also wonder if the two communities may reunite someday to really boost their impact (compared to the Spark community).
- https://engineering.salesforce.com/how-to-etl-at-petabyte-sc... - https://shopify.engineering/faster-trino-query-execution-inf... - https://trino.io/episodes/33.html - https://www.youtube.com/watch?v=-5mlZGjt6H4
All of them use the Lyft presto-gateway project ("Presto" in the name, but really Trino) to run different clusters for various workloads. They go into detail on how this is achieved.
https://github.com/lyft/presto-gateway
Regarding the Trino/Presto split: I recommend reading this blog to better understand why the two communities aren't merging. TL;DR: Presto is a Facebook-driven project that is mainly concerned with running on Facebook's infrastructure. Trino is a community-driven project that works on running well across all clouds and the common infrastructure used by the Trino community, which is why you see higher velocity there.
https://trino.io/blog/2022/08/02/leaving-facebook-meta-best-... https://trino.io/blog/2020/12/27/announcing-trino.html
Soon we anticipate that Trino will become the common name in the community space but we'll always love the origins of the Trino project being Presto.
For all intents and purposes, both projects are active and lively. It seems that Trino is more focused on federation and building out connectors. Presto is more focused on being the engine for the data lake/lakehouse. Both projects are doing well and solving different problems. There's been a lot of innovative features in the Presto project over the last year that are only in Presto, like Presto-on-Spark, disaggregated coordinator, Project Aria, etc. In fact we just hosted a fantastic user conference a few weeks ago that showcased a lot of that innovation and how companies are using Presto at massive scale today (if interested, check out the sessions: https://www.youtube.com/watch?v=Gi8i7eHqwyw&list=PLJVeO1NMmy...)
Long story short, Presto is alive and well, is not solely backed by 1 company (quite the opposite of Trino/Starburst), and has a lot of tech innovation on the roadmap. We're excited about the future of Presto.
https://trino.io/blog/2022/06/30/trino-summit-call-for-speak...
To enable users to connect to their databases... we have a form that collects database credentials from the user and saves them securely. When the user runs a SQL query, we establish a database connection right away (from our server), execute it, return the results, and keep the connection alive for about 15 minutes.
But with serverless architecture, first query could go to instance 1, so instance 1 will establish a db connection, then the second query could go to instance 2, so instance 2 will establish another one. You could end up with a lot of unnecessary connections.
If you use AWS RDS (for yourself), beside lambda for example, AWS have RDS Proxy to solve this problem.
So I was thinking about using Trino like the RDS Proxy, but for more databases, and for our customers database, not ours. Is that doable with Trino?
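Not a Trino answer, but for context, the usual serverless-side mitigation (and roughly what RDS Proxy automates) is caching connections at module scope so warm invocations of the same instance reuse them. A minimal Python sketch, where sqlite3 and the `cred_id` key are stand-ins for a real driver and real credential storage:

```python
import sqlite3

# Module-level cache: survives between warm invocations of the same
# serverless instance, so each instance opens at most one connection
# per credential set instead of one per request.
_connections = {}

def get_connection(cred_id: str, dsn: str):
    """Reuse an open connection on warm invocations instead of
    opening a new one per request."""
    conn = _connections.get(cred_id)
    if conn is None:
        conn = sqlite3.connect(dsn)
        _connections[cred_id] = conn
    return conn

def run_query(cred_id: str, dsn: str, sql: str):
    return get_connection(cred_id, dsn).execute(sql).fetchall()

# Two "requests" with the same credentials share one connection.
first = get_connection("cust-1", ":memory:")
second = get_connection("cust-1", ":memory:")
print(first is second)  # True
```

This only deduplicates connections within one instance; across many instances you still get one connection each, which is exactly the fan-out problem a proxy in front of the database solves.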
It would be super good if you guys added BigQuery write support. It's really annoying to have to run a Hive cluster in Google Cloud to act as a proxy for this.
But, yes, we do plan to add that eventually after ironing out all the kinks. See https://github.com/trinodb/trino/pull/13094
The other thing I would say is that Trino and Presto are not one-trick ponies or just hive replacements. There's also the ability to query across multiple systems that is, to me, the feature that future proofs a lot of architectures. It inherently frees you up to fiddle with your data in different systems but keep the access to that system in one location.
Apart from implementation details, probably not much different. It is similar to mysql vs postgresql. You are probably okay with either.
I agree but it depends a bit on what purpose you are using them for. If you mainly use the tool to JOIN some data in bulk and then write output somewhere else (i.e. ETL) - either will serve you fine.
If you write complex queries with multiple filters and want to JOIN across multiple datasets - sure Spark can do that as well but it's not as efficient in pushing down computation to the source.
e.g. a query like SELECT c.custkey, sum(totalprice) FROM orders o INNER JOIN customer c ON o.custkey = c.custkey WHERE o.orderstatus = 'O' GROUP BY c.custkey;, when run on Spark, will pull both tables into memory, perform the join plus the filter for orderstatus = 'O', and then compute the sum.
Trino, on the other hand, will push the entire query down into the remote database in this case (in other queries it'll push down some parts of the query), so the source database doesn't need to return gigabytes of data over the network every time the query runs (and the query finishes faster as well).
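The difference can be illustrated with a toy sketch (pure Python, nothing Spark- or Trino-specific: the `Source` class and its `shipped` counter are made up for the illustration) that counts how many rows the remote side has to send with and without pushdown:

```python
# Toy model of a remote data source that counts how many rows it
# ships over the wire to the query engine.
class Source:
    def __init__(self, rows):
        self.rows = rows
        self.shipped = 0

    def scan(self, predicate=None):
        """With a predicate, filter at the source (pushdown);
        without one, ship every row to the engine."""
        out = [r for r in self.rows if predicate is None or predicate(r)]
        self.shipped += len(out)
        return out

orders = Source([("O", 10), ("F", 20), ("O", 30), ("F", 40)])

# No pushdown: pull all 4 rows, filter engine-side.
pulled = [r for r in orders.scan() if r[0] == "O"]

# Pushdown: the source ships only the 2 matching rows.
pushed = orders.scan(lambda r: r[0] == "O")

print(pulled == pushed, orders.shipped)  # True 6  (4 rows + 2 rows)
```

Same answer either way; the pushdown path just moves far less data, which is the whole argument above in miniature.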
Trino tries to push down operations to the remote system when they can be done more efficiently there, e.g. filtering on a column that has an index in the remote RDBMS is faster than pulling all the data and then filtering in Trino. Spark doesn't have strong pushdown and has to pull most of the raw data and then apply processing on top of it.
That's one of the main differences. Spark is a distributed job execution framework first while Trino is a distributed federated query engine first and it shows in their strengths and weaknesses.
If you want to run arbitrary user defined transformations on data then Spark definitely has much more to offer than Trino.
My suggestion is to try both under your own workloads and see the difference. Trino is also used by products like Athena (AWS) and Galaxy (Starburst) so if you want to play around and see how Trino performs without spending too much time on setting up clusters on your own, you can try these great products.
Having said that, I'd like to add that building a performant distributed query engine is just hard. Trino has been in development for ten years and is used by major companies in very demanding environments; those environments are where the technology has been refined into what it is today, and they are proof of its performance and stability.
(edited to add an important disclaimer that I work at Starburst)
Realtime is generally more expensive to run since you process every individual row as it arrives; batch is for when you can tolerate minutes of latency and want to handle a lot of data in chunks.
Trino is also a query engine rather than a database and it connects to many different systems: https://trino.io/docs/current/connector.html
It also happens to connect to Clickhouse and it's very common that people will use Trino to query clickhouse realtime data and join it with data in big query, an object store data lake, or Snowflake: https://trino.io/docs/current/connector/clickhouse.html
can't believe this shit is free as in freedom