When I found it (you can thank Theo), I was shocked this isn't what AWS' serverless DB offering already was.
I agree with what the author mentioned in another comment, not dropping performance for non-serverless use cases is a decided win. I deeply appreciate the work being done to enable serverless applications, so thank you for the work and thank you for sharing your findings OP.
I'm also curious about the comparison to the MySQL Classic protocol - it would be interesting to have an "as-close-as-possible" benchmark between Aurora MySQL "Serverless V2" and PlanetScale. Even if it was as naive as "Given $100 of credits, how many reads can you do at what average latency".
https://dev.mysql.com/doc/dev/mysql-server/latest/page_mysql...
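A rough sketch of what that "naive" benchmark could start from: a harness that times N reads and reports the average. The callable is injected, so the same loop could be pointed at either platform; the lambda below is just a stand-in workload so the sketch runs without a database.

```python
import time

def bench(run_query, n=1000):
    """Time n calls to run_query; return (total_seconds, avg_ms).

    run_query is whatever issues a single read against the target
    (hypothetically, an Aurora or PlanetScale client call) -- injected
    so the same harness runs against both.
    """
    start = time.perf_counter()
    for _ in range(n):
        run_query()
    total = time.perf_counter() - start
    return total, total / n * 1000

# Stand-in workload so the sketch is runnable without a database.
total, avg_ms = bench(lambda: sum(range(100)), n=1000)
print(f"avg latency: {avg_ms:.4f} ms over 1000 calls")
```

Dividing $100 by the per-read cost of each platform would then give the "how many reads" half of the comparison.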
Similarly, since support is so low, it didn't make a lot of sense to double down and support it when we could do what works for us.
The developer experience with PlanetScale has been my favorite so far. I use it with a few Next.js apps, and the "scaling" part has been the easiest: I haven't had to think about a burst of traffic b/c PlanetScale handles it without me lifting a finger.
There's a ton of work to optimize TCP including hardware offloads that help push higher throughput. Basically we're talking library + kernel + hardware changes. It might be possible to get some of these into QUIC, but since QUIC is most compelling for WAN traffic, there's probably not much incentive for that.
In our case though, lots of our customers and lots of use cases do communicate over a WAN, and potentially large geographic distances. I think having this as an option is super interesting to see what we can do with it in the future.
I feel like I live under a rock because I just don't get what's so great.
PlanetScale is doing for Vitess what Fastly does for Varnish, if that makes sense. Or maybe, what Datadog does for statsd? It's a hosted platform around an awesome and complex-to-maintain bit of open source software.
1) use persistent connections: let the OS handle them, and tune both the connecting server and the MySQL server to allow them. And never close the connection on the application side. (This could lead to potential deadlocks, but there are ways around it, like closing bad connections to clear thread info on MySQL.)
2) run the whole thing in a transaction, simply begin transaction or autocommit if allowed (same thing)
Doing so, when you are done rendering the content, flush it and send the signal to, say, nginx or Apache that you're done (like PHP's fastcgi_finish_request when working with FPM), and then run your commit. Obviously only usable when you can safely disregard failed inserts.
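Idea 1) above can be sketched as a tiny pool that hands connections out and takes them back without ever closing them from the application side. This is a toy model: `connect` and `is_healthy` are hypothetical hooks standing in for a real MySQL client's connect call and liveness check.

```python
import queue

class PersistentPool:
    """Minimal sketch: keep connections open and reuse them.

    `connect` is a hypothetical factory (e.g. a MySQL client's connect
    call); `is_healthy` lets the caller replace bad connections instead
    of ever closing good ones from the application side.
    """
    def __init__(self, connect, size=4, is_healthy=lambda c: True):
        self._connect = connect
        self._is_healthy = is_healthy
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(connect())

    def acquire(self):
        conn = self._pool.get()
        if not self._is_healthy(conn):
            conn = self._connect()  # replace a dead connection
        return conn

    def release(self, conn):
        self._pool.put(conn)  # returned to the pool, never closed

# Runnable stand-in: "connections" are plain objects.
pool = PersistentPool(connect=lambda: object(), size=2)
c = pool.acquire()
pool.release(c)
```

A real version would also need the deadlock handling mentioned above (detecting and discarding connections whose server-side thread is stuck).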
This is definitely ideal, but one thing you can't entirely control is the server side, or what's in between. Sometimes your connections get interrupted, and it's not possible to maintain a connection forever. That said, yes, this is the ideal thing to do with a connection pooler.
> 2) run the whole thing in a transaction, simply begin transaction or autocommit if allowed (same thing)
This shouldn't really help with latency. Being in a transaction doesn't reduce latency. If we're being pedantic, it would likely increase latency due to having to execute a BEGIN and COMMIT query, which is typically two more round trips, one per query.
I think what you're getting at is something like pipelining, where you can send multiple queries in one request, and get multiple results back. This is technically supported over the mysql protocol, but isn't very widely adopted in practice.
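The round-trip argument in the last two comments can be put in a toy model: BEGIN and COMMIT are themselves queries, so sent one at a time they add two round trips, while pipelining collapses the whole batch into one. The function and its simplifying assumptions (one round trip per non-pipelined query, a pipelined batch counted as a single round trip) are mine, for illustration only.

```python
def round_trips(n_queries, pipelined=False, explicit_txn=False):
    """Toy round-trip count for a batch of queries.

    Assumes one round trip per query when sent sequentially; BEGIN and
    COMMIT count as two extra queries. A pipelined batch is modeled as
    a single round trip for everything.
    """
    n = n_queries + (2 if explicit_txn else 0)  # BEGIN + COMMIT
    return 1 if pipelined else n

# 5 queries in an explicit transaction, one query per round trip:
assert round_trips(5, explicit_txn=True) == 7
# Same 5 queries pipelined in a single batch:
assert round_trips(5, pipelined=True, explicit_txn=True) == 1
```

Which matches the point above: the transaction itself doesn't reduce round trips, pipelining does.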
Why?
If you're not running stuff on other peoples computers you're very much in control.
What do I miss?
> If you're not running stuff on other people's computers you're very much in control.
The other tests are already measuring a warmed-up connection.
There's also a reason why I intentionally coupled "connect + select 1" as the test: I wanted to make it as close of a comparison as possible. If it was simply a "connect", our HTTP API would be even more favorable, since connecting over HTTP doesn't do authentication or anything like that, the way the mysql protocol does.
Which leaves serverless and scripts (your other example from the blog post). Which, let’s be honest, are both edge cases at this point in time. Maybe that’ll change, but today it’s true.
Twenty year SRE here backing up the person you’re dismissing: you’re optimizing an edge case. Literally step one of operationalizing every system in existence is burying your DB behind a pooler. 100ms off a connect call in a script is not useful. The serverless improvement has some potential, but one would be forgiven for asking why you’d use an environment which doesn’t let you speak network protocols you’d like to speak.
So of course HTTP/2 will outperform, that's what it's designed to do.
Now try again, but use one connection per thread, and connect it before you start benchmarking, i.e. use it the way it's meant to be used.
But either way, yes, that's fundamentally a benefit of being able to use HTTP. We can multiplex multiple sessions over one underlying connection.
The whole premise of HTTP/[23] is to do the same thing as you do with N TCP sessions, but paying for the session establishment latency only once instead of N times.
And most applications couldn't care less about that latency, because you only do it once.
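A back-of-envelope version of that amortization argument, with made-up numbers: assume roughly 3 RTTs of connection setup (TCP handshake plus a TLS-1.2-style handshake) and 1 RTT per query. Paid once, setup is noise; paid per query, it dominates.

```python
RTT_MS = 50  # assumed round-trip time, purely illustrative

def total_latency(n_queries, query_rtts=1, setup_rtts=3, reuse=True):
    """Toy model: setup_rtts is paid once if the connection is reused,
    or once per query if every query opens a fresh connection."""
    setups = 1 if reuse else n_queries
    return (setups * setup_rtts + n_queries * query_rtts) * RTT_MS

print(total_latency(100, reuse=True))   # 5150 ms: setup amortized
print(total_latency(100, reuse=False))  # 20000 ms: setup paid 100x
```

Under these assumed numbers, reusing the connection is nearly 4x faster for 100 queries, which is exactly the "pay the establishment latency only once" point.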
Connection pooling doesn't solve all the things we can improve by using HTTP as a base. We can be faster in just data transfer through compression, for example.
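The compression win is easy to demonstrate on the kind of payload a database returns: repetitive rows compress extremely well. A runnable sketch with a fabricated result set (zlib here stands in for whatever content encoding the transport negotiates):

```python
import json
import zlib

# A fake, highly repetitive result set -- the case compression helps most.
rows = [{"id": i, "status": "active", "region": "us-east-1"}
        for i in range(500)]
raw = json.dumps(rows).encode()
compressed = zlib.compress(raw)

print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
assert len(compressed) < len(raw)  # repetitive payloads shrink substantially
```

Less data on the wire means fewer packets and less time in transfer, independent of anything a connection pooler does.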
Using HTTP/3 starts to help tail latency that we can't solve with TCP. Unreliable networks with packet loss suffer greatly with TCP and not as badly with QUIC.
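The TCP-vs-QUIC difference under loss can be expressed as a toy model of head-of-line blocking: over TCP, one lost packet stalls every multiplexed stream behind the retransmission; over QUIC, only the stream that lost the packet waits. This is a deliberate simplification for illustration, not a simulation of either protocol.

```python
def delivery_delay(streams, lost_stream, retransmit_rtts, transport):
    """Toy model of one lost packet across multiplexed streams.

    TCP has a single in-order byte stream, so every stream waits for
    the retransmission; QUIC streams are independent, so only the
    affected stream waits.
    """
    if transport == "tcp":
        return {s: retransmit_rtts for s in streams}
    return {s: (retransmit_rtts if s == lost_stream else 0)
            for s in streams}

tcp = delivery_delay(["a", "b", "c"], "b", retransmit_rtts=1, transport="tcp")
quic = delivery_delay(["a", "b", "c"], "b", retransmit_rtts=1, transport="quic")
assert tcp == {"a": 1, "b": 1, "c": 1}
assert quic == {"a": 0, "b": 1, "c": 0}
```

That independence is why tail latency over lossy WAN links is where QUIC pulls ahead.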