ReadySet Core: next-generation SQL caching, freely available

(readyset.io)

106 points | marzoeva | 3y ago | 54 comments

aeyes 3y ago

If you release something new, you should make sure that your documentation contains useful information.

Even the most fundamental information like available configuration options, command-line arguments, deployment information and so on is missing.

Looking at the code, it appears that you need Consul, Zookeeper and Redis to make this fly, and the docs don't mention this anywhere. They (barely) explain how to run the SQL proxy on a local machine, but that's it.

I wonder if the testimonials on your website are just pulled from thin air. I don't think any sane person would even experiment with this anywhere near production environments.

greg-m 3y ago

Hey, PM @ ReadySet - fair points, and thanks for checking us out.

We've been in pretty heavy development and have been heads down on getting ReadySet into your hands as quickly as we could. We'll be doing a major documentation pass soon which will have more info about clustering, etc.

There's also a bit more detail in our development guide - see https://github.com/readysettech/readyset/blob/main/developme...

latchkey 3y ago

From the blog post "Rather than forcing developers to switch to a key-value store"...

> need Consul, Zookeeper and Redis to make this fly

A hard dependency on 3 key value stores?

marzoeva (OP) 3y ago

We need either Consul or Zookeeper (for leader election). No dependency on Redis!

That part of the blog post refers to ReadySet having a SQL interface, rather than a key-value one.
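
For readers wondering what "leader election" through a coordination service boils down to, here is a minimal in-memory sketch of lease-based election. It is an illustration under stated assumptions only: `LeaseStore`, `try_acquire`, the node names and the TTL values are hypothetical, and this is neither ReadySet's code nor the Consul or Zookeeper API.

```python
class LeaseStore:
    """Toy stand-in for a coordination service (hypothetical, in-memory only)."""

    def __init__(self):
        self._holder = None      # node id currently holding the lease
        self._expires_at = 0.0   # time at which the lease lapses

    def try_acquire(self, node_id, now, ttl):
        """Acquire or renew the lease; return True if node_id is now the leader."""
        lease_free = self._holder is None or now >= self._expires_at
        if lease_free or self._holder == node_id:
            self._holder = node_id
            self._expires_at = now + ttl
            return True
        return False

    def leader(self, now):
        """Return the current leader, or None if the lease has expired."""
        return self._holder if now < self._expires_at else None


store = LeaseStore()
first = store.try_acquire("node-a", now=0.0, ttl=5.0)     # node-a wins the election
second = store.try_acquire("node-b", now=1.0, ttl=5.0)    # node-b is rejected
takeover = store.try_acquire("node-b", now=6.0, ttl=5.0)  # lease expired, node-b takes over
```

A real deployment would rely on the coordination service for the atomic "acquire if free" step and for expiring the lease when the leader's session dies; the sketch only shows the rule each node follows.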

cpursley 3y ago

Elixir/Erlang is not the most memory-efficient, but it could be used for something like this without the need for Redis or Consul/Zookeeper.

zasdffaa 3y ago

There are too many questions here. What does it not do? What's the overhead of monitoring the main DB and how's it done - triggers? Does it need schema changes? What about race conditions - can you guarantee none? What's the memory overhead you need for the cache? Can you control what gets cached?

> It can serve millions of reads per second on a single node ...

I'm not a network guy but that seems just astonishing - what is a 'node' here?

> ReadySet incrementally maintains result sets of SQL queries based on writes to the primary database.

So basically you've solved the general materialised view incremental update problem? That's an unsolved problem in general, surely?

Edit: not dissing but trying to see where the limits are.

umanwizard 3y ago

> That's an unsolved problem in general, surely?

It's not. Materialize (my employer) incrementally maintains views too, using tech (Differential Dataflow) that has existed for almost 10 years: https://cs.stanford.edu/~matei/courses/2015/6.S897/readings/... .

ReadySet is based on Noria (Jon Gjengset's Ph.D thesis, explained for non-experts here: https://jon.thesquareplanet.com/noria-in-simpler-terms.pdf).

Taking a research project and making it into a production-ready product is hard work -- congrats to the ReadySet team on the launch, and best of luck!

zasdffaa 3y ago

We may be talking about very different things. From the postgres docs, a sample materialised view <https://www.postgresqltutorial.com/postgresql-views/postgres...> (I made a few tweaks, as marked):

   CREATE MATERIALIZED VIEW rental_by_category
   AS
   SELECT c.name AS category,
     sum(p.amount) AS total_sales
    FROM (((((payment p
      JOIN rental r ON ((p.rental_id = r.rental_id)))
      JOIN inventory i ON ((r.inventory_id = i.inventory_id)))
      JOIN film f ON ((i.film_id <> f.film_id)))  -- tweak
      JOIN film_category fc ON ((f.film_id = fc.film_id)))
      JOIN category c ON ((fc.category_id < c.category_id)))  -- tweak
   GROUP BY c.name
   HAVING sum(p.amount) NOT IN (196, 203, 791)  -- tweak
   ORDER BY sum(p.amount) DESC

It can materialise and efficiently (read: incrementally) maintain the result set of that??

umanwizard 3y ago

I believe the non-equijoin will cause problems for Materialize today (I don’t work on our optimizer team, so I’m not 100% sure and don’t take this as authoritative). We might turn that into a cross join followed by a filter.

I will answer that for sure later today when I’m back at my desk.

If you changed that back to an equals sign, yes, we could incrementally maintain your query.
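
To illustrate why the equals sign matters, here is a small sketch contrasting an equijoin, which can probe an index on the join key, with the "cross join followed by a filter" fallback mentioned above. The function names and the sample rows are made up for illustration; this is not how Materialize or ReadySet implement joins.

```python
from itertools import product


def hash_equijoin(left, right, key):
    """Equijoin: index one side by the join key, probe with the other."""
    index = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [(l, r) for l in left for r in index.get(l[key], [])]


def cross_join_then_filter(left, right, predicate):
    """Non-equijoin fallback: enumerate every pair, keep those passing the filter."""
    return [(l, r) for l, r in product(left, right) if predicate(l, r)]


# Hypothetical sample rows, loosely named after the tables in the query above.
inventory = [{"film_id": 1}, {"film_id": 2}]
film = [{"film_id": 1}, {"film_id": 2}, {"film_id": 3}]

equal = hash_equijoin(inventory, film, "film_id")
not_equal = cross_join_then_filter(
    inventory, film, lambda i, f: i["film_id"] != f["film_id"]
)
```

With an equijoin, a newly inserted row only has to be matched against the rows sharing its key; with `<>` or `<`, it may have to be compared against every row on the other side, which is what makes incremental maintenance expensive.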

trollied 3y ago

Oracle has had MVs that can refresh on update for decades.

zasdffaa 3y ago

So do Postgres and MSSQL, but general views that are incrementally updated - that's another matter. I'd be very surprised (and pleased).

marzoeva (OP) 3y ago

This is indeed our goal - we're most of the way there with SQL-92 and plan to continue to expand our query support over time!

jjice 3y ago

I love the concept. Not needing to have extra code and logic for a caching layer seems very nice. In my experience, I haven't ever been in a situation where I needed super heavy caching, but this seems like it gives it to you "for free". Interested to see if we see more of ReadySet in the future.

cpursley 3y ago

Whoa, I had the same idea not so long ago. Right now I'm using GraphCDN but would much rather cache at the database level. Looks like this could be a drop-in for lots of people already on Postgres & MySQL (meaning no more dog-slow Rails apps).

There was a cool article about intercepting the Postgres connection with Elixir not long ago: https://docs.statetrace.com/blog/build-a-postgres-proxy/

greg-m 3y ago

PM at ReadySet here - that's the idea! We think sub-ms reads while still using SQL are pretty cool :)

If you want to dig in more, hop into our community Slack: https://readysetcommunity.slack.com/

jensneuse 3y ago

As an alternative, you can use WunderGraph (OSS) to compile GraphQL queries to REST endpoints so that you can use Fastly or Cloudflare as a CDN (and the browser cache, obviously). It supports configurable Cache-Control headers per operation and comes with ETags out of the box, so content can be invalidated easily. See https://wundergraph.com/docs/overview/features/caching
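
As a rough picture of how Cache-Control headers and ETags cooperate, here is a minimal sketch of a cacheable GET handler. It shows generic HTTP behaviour, not WunderGraph's implementation; `make_etag`, `respond`, the `max_age` default and the sample bodies are all hypothetical.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from a hash of the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match=None, max_age=60):
    """Return (status, headers, payload) for a cacheable GET response."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": f"public, max-age={max_age}"}
    if if_none_match == etag:
        return 304, headers, b""   # client copy is still valid, send no body
    return 200, headers, body


status, headers, payload = respond(b'{"films": 3}')
revalidated = respond(b'{"films": 3}', if_none_match=headers["ETag"])
changed = respond(b'{"films": 4}', if_none_match=headers["ETag"])
```

Within `max-age` a CDN or browser can answer from its own copy; after that it revalidates with `If-None-Match` and only downloads the body again if the ETag has changed.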

cpursley 3y ago

That's pretty cool, thanks.

steve-chavez 3y ago

How does ReadySet interact with row-level security[1]? For RLS to work you'd need validation at the origin server anyway, right?

[1]: https://www.postgresql.org/docs/current/ddl-rowsecurity.html

zasdffaa 3y ago

Damn, that's a good question! Or security in general. But I wouldn't blame them at all if they didn't do it. Good thought though.

freitasm 3y ago

Interesting. I would love to see this available for MS SQL Server.

I've played with SafePeak (1), which runs on Windows Server. It was later sold to an Israeli company (2), which has since gone out of business; the assets ended up with another company and are now sold as ScaleArc (3).

The original SafePeak is available free but no maintenance or anything, so not really production ready. It works, as tested in a test environment but eight years without support or updates...

(1) http://www.safepeak.org/ (2) https://en.wikipedia.org/wiki/SafePeak (3) https://www.devgraph.com/scalearc/

xtreak29 3y ago

GitHub repo : https://github.com/readysettech/readyset

mdaniel 3y ago

BSL <https://github.com/readysettech/readyset/blob/main/LICENSE> so they're serious about the "freely available" part I guess

agacera 3y ago

This looks really nice. I was reading about this and realized it had similar ideas to this [1] PhD thesis from Jon Gjengset. I checked his Twitter [2] and it was indeed based on his work.

Great that someone is productionalizing this!

Btw, is Jon involved in ReadySet?

[1] Partial State in Dataflow-Based Materialized Views - https://github.com/mit-pdos/noria

[2] https://twitter.com/jonhoo/status/1537474261689872384

marzoeva (OP) 3y ago

Yes, co-founder. You can read our initial announcement below!

https://twitter.com/jonhoo/status/1511401461669720068 https://readyset.io/blog/introducing-readyset

_ben_ 3y ago

PolyScale [1] is a serverless plug-and-play database edge cache. Our goal is for devs to be able to scale reads globally in a few minutes. It’s wire compatible with Postgres, MySQL, MS SQL Server (more coming including no-sql).

It has a global edge network, so there is no infrastructure to deploy, and the cache and its invalidation are AI-managed, so no cache configuration is needed.

[1] https://www.polyscale.ai/

trollied 3y ago

> ReadySet is a lightweight SQL caching engine that precomputes frequently-accessed query results and automatically keeps these results up-to-date over time as the underlying data in your database changes

I don't see the point in using an extra app - you can do this natively in Postgres. Materialized views. https://www.postgresql.org/docs/current/rules-materializedvi...

hbrundage 3y ago

Postgres materialized views have to be manually refreshed on a schedule, and so are always out of date, whereas ReadySet keeps your results up to date automatically as the input changes. For PG materialized views, the compute required is proportional to the size of the input data, and is paid on every refresh, whereas with ReadySet the computation is incremental, so it's proportional to the size of the change in the data over time.

And finally, ReadySet's (Noria's) big innovation is that the result set can be only partial, storing only the elements of the result set (and underlying data flow graph) that are frequently accessed, instead of the whole result set like a materialized view would.
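
The cost difference between a full refresh and incremental maintenance can be sketched with a grouped sum. This is a toy illustration, not Noria's or ReadySet's dataflow engine: `full_recompute`, `IncrementalSum` and the sample rows are hypothetical, and partial (on-demand) state is not modelled here.

```python
from collections import defaultdict


def full_recompute(rows):
    """Refresh-style: rescan every input row, so cost grows with table size."""
    totals = defaultdict(int)
    for category, amount in rows:
        totals[category] += amount
    return dict(totals)


class IncrementalSum:
    """Maintains SUM(amount) GROUP BY category by applying deltas only."""

    def __init__(self):
        self.totals = defaultdict(int)
        self.counts = defaultdict(int)

    def apply(self, category, amount, sign=1):
        """sign=+1 for an insert, -1 for a delete; cost is per changed row."""
        self.totals[category] += sign * amount
        self.counts[category] += sign
        if self.counts[category] == 0:   # last row of the group was removed
            del self.totals[category]
            del self.counts[category]

    def result(self):
        return dict(self.totals)


rows = [("Sports", 5), ("Drama", 3), ("Sports", 2)]
view = IncrementalSum()
for category, amount in rows:
    view.apply(category, amount)

# A delete arrives: the incremental view processes a single delta.
view.apply("Drama", 3, sign=-1)
rows.remove(("Drama", 3))
```

Both approaches end up with the same result set; the difference is that `full_recompute` touches every row again while `apply` touches only the row that changed.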

adwf 3y ago

Except then you need to be paying someone to monitor your queries and develop your views rather than just dropping a container in the middle with this app.

cpursley 3y ago

Are materialized views aware of applied WHERE filters?

tmikaeld 3y ago

> Traditional databases would compute the results of this query from scratch every time it was issued.

Is this really the case that queries can't be cached on traditional databases?

greg-m 3y ago

ReadySet PM here - depends on whether there are writes to the table or not!

For example, MySQL deprecated its query cache, but previously it would only keep a result cached until there was a write to any of the tables the query referenced: https://dev.mysql.com/doc/refman/5.7/en/query-cache-configur...
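
The invalidate-on-any-write behaviour described above can be sketched as follows. This is a simplified model of table-level invalidation, not MySQL's actual implementation; `TableInvalidatingCache`, `run_query` and the query text are hypothetical.

```python
class TableInvalidatingCache:
    """Result cache that drops entries when any referenced table is written."""

    def __init__(self):
        self._results = {}    # query text -> cached result
        self._by_table = {}   # table name -> set of query texts reading it

    def get(self, query, tables, run):
        """Return the cached result, or execute `run()` and remember it."""
        if query not in self._results:
            self._results[query] = run()
            for table in tables:
                self._by_table.setdefault(table, set()).add(query)
        return self._results[query]

    def on_write(self, table):
        """Invalidate every cached query that reads from `table`."""
        for query in self._by_table.pop(table, set()):
            self._results.pop(query, None)


calls = []


def run_query():
    calls.append(1)       # count how often the "database" is actually hit
    return len(calls)


cache = TableInvalidatingCache()
q = "SELECT count(*) FROM payment"
a = cache.get(q, ["payment"], run_query)   # miss: executes the query
b = cache.get(q, ["payment"], run_query)   # hit: served from memory
cache.on_write("payment")                  # any write flushes the entry
c = cache.get(q, ["payment"], run_query)   # miss again after the write
```

On a write-heavy table such a cache is flushed almost constantly, which is the contrast with updating the cached result in place.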

tmikaeld 3y ago

I was just looking this up and it's correct, they don't cache queries (if they do, it's a separate feature), they only manage query planning in ways that make them faster.

Even CockroachDB doesn't do query caching; only query plans are cached. [0] [1]

[0] https://www.cockroachlabs.com/blog/memory-usage-cockroachdb/

[1] https://www.cockroachlabs.com/blog/query-plan-caching-in-coc...

CharlesW 3y ago

It doesn't appear to be the case: https://docs.oracle.com/database/121/TGDBA/tune_result_cache...

d_watt 3y ago

Interesting. What would you say the use case is for this, rather than setting up read replicas? Not having to maintain routing to the replicas on the application side?

_jezell_ 3y ago

When you care about perf a lot more than consistency

marzoeva (OP) 3y ago

Hi! CEO of ReadySet here. You can think of ReadySet as being a cross between a traditional read replica and a custom caching layer (e.g. one you might build on top of Redis). With read replicas, you still rerun queries from scratch every time they're issued, which means you still have to think about things like query optimization. ReadySet caches frequently run queries in memory so you get super-fast query latencies on cache hits. Because of this, you can scale to much higher read throughputs without extra effort. This is especially useful for read-heavy applications (e.g. websites, certain types of dashboards, among others!)

You can read more about how it works here: https://docs.readyset.io/concepts/overview

jinjin2 3y ago

How do you deal with security? In modern databases like MongoDB, permissions are granular down to the field level.

The same query could produce wildly different results depending on the user issuing it, and the caching somehow has to take that into account. Is that something you address?
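
One common way a cache can take the issuing user into account is to make the security scope part of the cache key. The sketch below only illustrates that idea; it says nothing about how ReadySet handles permissions, and `ScopedQueryCache`, `visible_rows` and the sample data are hypothetical.

```python
class ScopedQueryCache:
    """Result cache keyed by (security scope, query) rather than query alone."""

    def __init__(self):
        self._results = {}

    def get(self, scope, query, run):
        """Return the result cached for this scope, computing it on first use."""
        key = (scope, query)
        if key not in self._results:
            self._results[key] = run(scope)
        return self._results[key]


# Hypothetical data with a row-level rule: users only see their own rows.
ROWS = [("alice", "invoice-1"), ("bob", "invoice-2"), ("alice", "invoice-3")]


def visible_rows(user):
    return [doc for owner, doc in ROWS if owner == user]


cache = ScopedQueryCache()
query = "SELECT doc FROM invoices"
alice_view = cache.get("alice", query, visible_rows)
bob_view = cache.get("bob", query, visible_rows)
```

The trade-off is that entries are no longer shared between users, so the hit rate drops as the number of distinct scopes grows.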

anentropic 3y ago

Seems clever!

I'm curious what might be pathological cases, patterns of query watching and updates that give the cache a lot of work to do to keep up

wasd 3y ago

I signed up for the waitlist! I noticed it asked about AWS, Azure, and GCP but we use Heroku. Hopefully, that won't put me too low on the list.

Do you have a sense for when people can try it? Most of our app is reads and we're using Rails + Redis and it's fine and sometimes a pain. Would love to try it.

greg-m 3y ago

Hey, PM @ ReadySet here. Shoot an email to greg@readyset.io and we can see what we can do :)

pmarec 3y ago

Are you looking into spreading the dataflow even further down to the clients? Think realtime subscriptions for complex queries over structured data.

greg-m 3y ago

Yes! We've thought about this in depth and have some ideas but I'd love to chat more. Shoot me an email: greg@readyset.io

pmarec 3y ago

It seems like I need an @readyset.io address to join your Slack. Can you confirm?

greg-m 3y ago

https://join.slack.com/t/readysetcommunity/shared_invite/zt-...

should work!

jensneuse 3y ago

BSL :(

jjice 3y ago

From their README:

> ReadySet is licensed under the BSL 1.1 license, converting to the open-source Apache 2.0 license after 4 years. The ReadySet team is hard at work getting the codebase ready to be hosted on Github.

simonw 3y ago

Has anyone else done this thing with BSL for 4 years that then converts to Apache 2? I've not seen it before.

gst 3y ago

In addition to the other software already listed here Materialize does this too: https://github.com/MaterializeInc/materialize/blob/main/LICE...

js4ever 3y ago

Redpanda is doing the same. TBH I really don't like BSL; I'd prefer an open core model ten times over.

ritesofbryan 3y ago

This is what Cockroach does as well: https://www.cockroachlabs.com/docs/stable/licensing-faqs.htm...
