Better JIT for Postgres

(github.com)

155 points | vladich | 2mo ago | 113 comments

eru2mo ago

> However, standard LLVM-based JIT is notoriously slow at compilation. When it takes tens to hundreds of milliseconds, it may be suitable only for very heavy, OLAP-style queries, in some cases.

I don't know anything here, but this seems like a good case for ahead-of-time compilation? Or at least caching your JIT results? I can imagine that much of the time you are getting more or less the same query again and again?

olau2mo ago

Yes.

Some years ago we ported some code from querying out the data and tallying in Python (how many are in each bucket) to using SQL to do that. It didn't speed up the execution. I was surprised by that, but I guess the Postgres interpreter is roughly the same speed as Python, which when you think about it perhaps isn't that surprising.

But Python is truly general purpose while the core query stuff in SQL is really specialized (we were not using stored procedures). So if PyPy can get a 5x speedup, it seems to me that it should be possible to get the same kind of speedup in Postgres. I guess it needs funding and someone as smart as the PyPy people.

eru2mo ago

That's curious. I regularly get speed ups when moving processing from Python to postgres. At least when using indices properly and when the shift reduces the amount of data carried back and forth.

bob10292mo ago

At some level the application needs to participate in the performance conversation too.

https://www.postgresql.org/docs/current/sql-prepare.html

masklinn2mo ago

Postgres’s PREPARE is per-connection so it’s pretty limited, and then connection poolers enter the fray and often can’t track SQL-level prepares.

And then the issue is not dissimilar to Postgres’s planner issues.

hinkley2mo ago

Oracle’s wasn’t, but I haven’t used it in a very long time so that may no longer be true.

The problem though was that it had a single shared pool for all queries and it could only run a query if it was in the pool, which is how our DB machine would max out at 50% CPU and bandwidth. We had made some mistakes in our search code that I told the engineer not to make.

SigmundA2mo ago

Unless you cache query plans like other RDBMSs do; then the client no longer has to manage that manually, and it's not limited to a single connection.

MS SQL still has prepared statements and they really haven't been used in 20 years since it gained the ability to cache plans based on statement text.
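To illustrate what "cache plans based on statement text" could look like, here is a minimal sketch, not how any particular engine implements it. `PlanCache`, `fake_planner` and the whitespace-only normalization are hypothetical names and choices made for this example; the point is only that the cache key is the statement text, so no client-side PREPARE is needed and every connection can share the entry.

```python
from collections import OrderedDict

class PlanCache:
    """Toy LRU plan cache keyed by statement text (hypothetical, not a real DB API)."""

    def __init__(self, capacity=128):
        self.capacity = capacity
        self.plans = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_plan(self, sql, planner):
        # Assumption: collapsing whitespace is enough to treat two texts as equal.
        key = " ".join(sql.split())
        if key in self.plans:
            self.hits += 1
            self.plans.move_to_end(key)  # mark as most recently used
            return self.plans[key]
        self.misses += 1
        plan = planner(sql)  # the expensive step: planning (and possibly JIT)
        self.plans[key] = plan
        if len(self.plans) > self.capacity:
            self.plans.popitem(last=False)  # evict the least recently used plan
        return plan

def fake_planner(sql):
    """Stand-in for a real planner; returns an opaque 'plan' object."""
    return ("plan-for", " ".join(sql.split()))

cache = PlanCache(capacity=2)
first = cache.get_plan("SELECT * FROM t WHERE id = $1", fake_planner)
second = cache.get_plan("SELECT *   FROM t WHERE id = $1", fake_planner)
```

A text-keyed cache like this only pays off when statements are parameterized (`$1` rather than inlined literals); otherwise nearly every statement is a distinct key.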

sourcegrift2mo ago

We have everything optimized, and yet somehow DB queries need to be "interpreted" at runtime. There's no reason for DB queries to not be precompiled.

jpfr2mo ago

The "byte-code" coming from the query planner typically only has a handful of steps in a linear sequence. Joins, filters, and such. But the individual steps can be very costly.

So there is not much to gain from JITing the query plan execution only.

JITing begins to make more sense when the individual query plan steps (join, filter, ...) can themselves be specialized/recompiled/improved/merged by knowing the context of the query plan.
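A rough way to picture the difference between interpreting a plan step and specializing it: the sketch below evaluates a filter expression either by walking a tree for every row or by resolving the tree once into a closure. The tuple-based node layout, `interpret` and `specialize` are made up for this example and do not mirror Postgres's actual expression machinery; a real JIT would emit machine code rather than Python closures.

```python
import operator

# Hypothetical expression tree for the filter `price * qty > 100`.
EXPR = ("gt", ("mul", ("col", "price"), ("col", "qty")), ("const", 100))

OPS = {"gt": operator.gt, "mul": operator.mul, "add": operator.add}

def interpret(node, row):
    """Walk the tree for every row: one dispatch per node per row."""
    kind = node[0]
    if kind == "col":
        return row[node[1]]
    if kind == "const":
        return node[1]
    return OPS[kind](interpret(node[1], row), interpret(node[2], row))

def specialize(node):
    """Resolve the dispatch once and return a closure for this exact tree."""
    kind = node[0]
    if kind == "col":
        name = node[1]
        return lambda row: row[name]
    if kind == "const":
        value = node[1]
        return lambda row: value
    op, left, right = OPS[kind], specialize(node[1]), specialize(node[2])
    return lambda row: op(left(row), right(row))

rows = [
    {"price": 30, "qty": 2},
    {"price": 60, "qty": 2},
    {"price": 5, "qty": 50},
]
compiled = specialize(EXPR)
interpreted_result = [r for r in rows if interpret(EXPR, r)]
specialized_result = [r for r in rows if compiled(r)]
```

Both paths return the same rows; the specialized one just moves the "what kind of node is this?" decisions out of the per-row loop, which is the part that knowledge of the query context makes possible.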

catlifeonmars2mo ago

This is a neat idea. I want to take it further and precompile the entire DBMS binary for a specific schema.

menaerus2mo ago

Someone is already working on it: https://arxiv.org/pdf/2603.02081

catlifeonmars2mo ago

That looks interesting but it seems inefficient to put an LLM directly into the compilation pipeline, not to mention that it introduces nondeterministic behavior.

WJW2mo ago

How will you handle ALTER TABLE queries without downtime?

catlifeonmars2mo ago

That would definitely present a bit of a challenge, but:

- not all databases need migrations (or migrations without downtime)

- alternatively, ship the migrations as part of the binary

Adhoc modifications would still be more difficult but tbh that’s not necessarily a bug

Asm2D2mo ago

Many SQL engines have JIT compilers.

The problems related to PostgreSQL are pretty much all described here. It's very difficult to do low-latency queries if you cannot cache the compiled code and reuse it over and over again. And once your JIT is slow, you need logic to decide whether to interpret or compile.

I think it would be best to start interpreting the query and start compilation in another thread, and once the compilation is finished and the interpreter is still running, stop the interpreter and run the JIT-compiled code. This would give you the best latency, because there would be no waiting for the JIT compiler.
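The scheme described above can be sketched roughly as follows. This is a toy under stated assumptions: `slow_predicate` stands in for the interpreter, `compile_predicate` stands in for a JIT compile (here it just builds a Python function with `compile`/`eval`), and the switch is checked once per row. None of these names come from Postgres or pg_jitter.

```python
import threading

def slow_predicate(row):
    """Stand-in for the interpreter path."""
    return row % 3 == 0

def compile_predicate(result, ready):
    """Stand-in for a JIT compile running off the query thread."""
    code = compile("lambda row: row % 3 == 0", "<jit>", "eval")
    result["fn"] = eval(code)
    ready.set()  # publish only after the compiled function is stored

def run_query(rows):
    result, ready = {}, threading.Event()
    worker = threading.Thread(target=compile_predicate, args=(result, ready))
    worker.start()

    predicate, switched_at, out = slow_predicate, None, []
    for i, row in enumerate(rows):
        # Check between rows; switch once the compiled version is available.
        if switched_at is None and ready.is_set():
            predicate, switched_at = result["fn"], i
        if predicate(row):
            out.append(row)

    worker.join()
    return out, switched_at

matches, switched_at = run_query(range(10_000))
```

The query never waits for the compiler: if compilation finishes late (or never, for a short query), `switched_at` stays `None` and the interpreter simply completes the work. The cost is the extra CPU spent compiling in parallel.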

aengelke2mo ago

> It's very difficult to do low-latency queries if you cannot cache the compiled code

This is not too difficult, it just requires a different execution style. Salesforce's Hyper for example very heavily relies on JIT compilation, as does Umbra [1], which some people regard as one of the fastest databases right now. Umbra doesn't cache any IR or compiled code and still has an extremely low start-up latency; an interpreter exists but is practically never used.

Postgres is very robust and very powerful, but simply not designed for fast execution of queries.

Disclosure: I work in the group that develops Umbra.

[1]: https://umbra-db.com/

Asm2D2mo ago

If I recall the research papers on Umbra correctly, it also uses AsmJit as a JIT backend, which means that theoretically the compilation times would be comparable if you only consider code-emitting overhead.

The problem will always be queries where the compilation is orders of magnitude more expensive than the query itself. I can imagine an indexed lookup of one or a few entries, etc. Accessing indexed entries like these is very well optimized by SQL query engines, and JIT-optimizing it possibly makes no sense.

chrisaycock2mo ago

> I think it would be the best to start interpreting the query and start compilation in another thread

This technique is known as a "tiered JIT". It's how production virtual machines operate for high-level languages like JavaScript.

There can be many tiers, like an interpreter, baseline compiler, optimizing compiler, etc. The runtime switches into the faster tier once it becomes ready.

More info for the interested:

https://ieeexplore.ieee.org/document/10444855

hinkley2mo ago

It’s also common for JITs to sprout a tier and shed a tier over time, as the last and first tiers shift in cost/benefit. If the first tier works better you delay the other tiers. If the last tier gets faster (in run time or code optimization) you engage it sooner, or strip the middle tier entirely and hand half that budget to the last tier.

Asm2D2mo ago

I write JITs so I know, but I always try to write in a way that even non-JIT people can understand :)

vladichOP2mo ago

The idea with parallel compilation is interesting. Worth considering, in some cases. The only problem with it is the same as too much parallelization - you can exhaust your CPU resources much faster. But with some sort of smart scheduling it should work. I'll think about it, thanks!

SigmundA2mo ago

PostgreSQL uses a process-per-connection model and it has no way to serialize a query plan to some form that can be shared between processes, so the time it takes to make the plan, including JIT, is very important.

Most other DBs cache query plans including jitted code, so they are basically precompiled from one request to the next with the same statement.

zaphirplane2mo ago

What do you mean? Because the obvious thing is a shared cache, and if there is one thing the writers of a DB know, it is locking.

SigmundA2mo ago

Sharing executable code between processes is not as easy as sharing data. AFAIK, unless something's changed recently, PG shares nothing about plans between processes and can't even share a cached plan between sessions/connections.

hans_castorp2mo ago

> and it has no way to serialize a query plan to some form that can be shared between processes

https://www.postgresql.org/docs/current/parallel-query.html

"PostgreSQL can devise query plans that can leverage multiple CPUs in order to answer queries faster."

SigmundA2mo ago

Nothing to do with plan caching; that's just talking about executing the parallel operations of a plan. Is that thread-based or process-based in PG?

If it's process-based, then they can send small parts of a plan across processes.

levkk2mo ago

See prepared statements.

array_key_first2mo ago

DB queries do get precompiled and cached if you use prepared statements. This is why you should always use prepared statements if you can.

kbolino2mo ago

It is not always necessary to explicitly use prepared statements, though. For example, the pgx library for Go [1] and the psycopg3 library for Python [2] will automatically manage prepared statements for you.

[1]: https://pkg.go.dev/github.com/jackc/pgx/v5#hdr-Prepared_Stat...

[2]: https://www.psycopg.org/psycopg3/docs/advanced/prepare.html

fabian2k2mo ago

The last time I looked into it my impression was that disabling the JIT in PostgreSQL was the better default choice. I had a massive slowdown in some queries, and that doesn't seem to be an entirely unusual experience. It does not seem worth it to me to add such a large variability to query performance by default. The JIT seemed like something that could be useful if you benchmark the effect on your actual queries, but not as a default for everyone.

pjmlp2mo ago

That is quite strange, given that the big boys' RDBMSs (Oracle, SQL Server, DB2, Informix, ...) have all had JIT capabilities for several decades now.

SigmundA2mo ago

The big boys all cache query plans, so the amount of time it takes to compile is not really a concern.

vladichOP2mo ago

Postgres caches query plans too. The problem is that you can only cache what you can share, and if your planner works well, you can share very little: there can be a lot of unique plans even for the same query.

aengelke2mo ago

That's not generally correct. Compile-time is a concern for several databases.

hinkley2mo ago

I’m always surprised to learn LLVM is so slow given that was one of the original motivations for developing it. I don’t know if that’s down to feature creep or intrinsic complexity being higher than people presumed was the case for GCC.

Tanjreeve2mo ago

It's a compiler backend for programming languages, not a runtime JIT compiler. Especially inside a DBMS, a lot of the assumptions it was built with don't hold. Some people in the DBMS world (mostly at TUM with Umbra/CedarDB) have written their own, and others have tried multi-pass approaches where you have an interpreter first and then a more optimised LLVM pass later.

hinkley2mo ago

It was intended to solve the problem of interactive coding sessions such as with Language Servers, which GCC utterly fails at (because what we think of as modern IDEs did not exist in 1990).

An awful lot of people have tried to use it as a JIT now and had to backpedal. I'm not sure how the one led to the other, but here we are.

the_biot2mo ago

What sort of things are people doing in their SQL queries that make them CPU bound? Admittedly I'm a meat-and-potatoes guy, but I like mine I/O bound.

Really amazed to see not one but several generic JIT frameworks though, no idea that was a thing.

vladichOP2mo ago

Most databases in practice are sub-terabyte and even sub-100GB, and their active dataset is almost fully cached. For most databases I worked with, the cache hit rate is above 95%, and for almost all of them it's above 90%. In that situation, most queries are CPU-bound. It's completely different from typical OLAP in this sense.

martinald2mo ago

Anything jsonb in my experience is quickly CPU bound...

jjice2mo ago

Definitely. If you're doing regular queries with filters on jsonb columns, having the index directly on the JSON paths is really powerful. If I have a jsonb filter in the codebase at all, it probably needs an index, unless I know the result set is already very small.

martinald2mo ago

Yeah, the other problem is I've really struggled to have postgres use multiple threads/cores on one query. Often maxes out one CPU thread while dozens go unused. I constantly have to fight loads of defaults to get this to change and even then I never feel like I can get it working quite right (probably operator error to some extent).

Compare this to ClickHouse, which constantly uses the whole hardware. Obviously it's easier to do that on a columnar database, but it seems that Postgres is actively designed to _not_ saturate multiple cores, which may have been a good assumption in the past but definitely isn't a good one now IMO.

d01002mo ago

I've shaved off 30s of queries by transforming json columns into a string after the first CTE is done with it

wreath2mo ago

I think read queries that are always served from cache are CPU-bound, because serving them also involves locking the buffers etc. and there is no I/O involved.

throwaway1401262mo ago

PostgreSQL is Turing complete, so I guess they do whatever they want?

swaminarayan2mo ago

Have you tested this under high concurrency with lots of short OLTP queries? I’m curious whether the much faster compile time actually moves the point where JIT starts paying off, or if it’s still mostly useful for heavier queries.

vladichOP2mo ago

It's not useful for sub-millisecond queries like point lookups, or other simple ones that process only a few records. The sljit option starts to pay off when you process (not necessarily return) hundreds of records. The more, the better. I'm still thinking about a caching option that would lift this requirement somewhat for cached plans. For non-cached ones it will stay.

masklinn2mo ago

> By default, jit_above_cost parameter is set to a very high number (100'000). This makes sense for LLVM, but doesn't make sense for faster providers. It's recommended to set this parameter value to something from ~200 to low thousands for pg_jitter (depending on what specific backend you use and your specific workloads).
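To make the effect of that threshold concrete, here is a simplified sketch of the decision. It assumes JIT is attempted only when the planner's estimated total cost exceeds `jit_above_cost`, and that a negative value disables it; the plan costs are invented for the example, and the real planner consults further settings that are left out here.

```python
def should_jit(plan_cost, jit_above_cost=100_000):
    """Simplified: JIT only when the estimated plan cost exceeds the threshold.

    A negative threshold is treated as "JIT disabled".
    """
    return jit_above_cost >= 0 and plan_cost > jit_above_cost

# Hypothetical planner cost estimates for four queries.
plan_costs = [50, 800, 5_000, 250_000]

with_default = [c for c in plan_costs if should_jit(c)]
with_lowered = [c for c in plan_costs if should_jit(c, jit_above_cost=500)]
```

With the stock default, only the most expensive of these plans would be compiled; lowering the threshold into the hundreds brings mid-sized queries in, which only makes sense when compilation itself is cheap.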

larodi2mo ago

sadly, no windows version yet AFAICT

vladichOP2mo ago

Added Windows (x86_64 for now) support

vladichOP2mo ago

It will be added soon

asah2mo ago

awesome! I wonder if it's possible to point AI at this problem and synthesize a bespoke compiler (per-architecture?) for postgresql expressions?

kvdveer2mo ago

Two things are holding back current LLM-style AI from being of value here:

* Latency. LLM responses are measured in the order of 1000s of milliseconds, whereas this project targets 10s of milliseconds. That's off by almost two orders of magnitude.

* Determinism. LLMs are inherently non-deterministic. Even with temperature=0, slight variations of the input lead to major changes in output. You really don't want your DB to be non-deterministic, ever.

qeternity2mo ago

> LLMs are inherently non-deterministic.

This isn't true, and certainly not inherently so.

Changes to input leading to changes in output does not violate determinism.

magicalhippo2mo ago

> This isn't true

From what I understand, in practice it often is true[1]:

Matrix multiplication should be “independent” along every element in the batch — neither the other elements in the batch nor how large the batch is should affect the computation results of a specific element in the batch. However, as we can observe empirically, this isn’t true.

In other words, the primary reason nearly all LLM inference endpoints are nondeterministic is that the load (and thus batch-size) nondeterministically varies! This nondeterminism is not unique to GPUs — LLM inference endpoints served from CPUs or TPUs will also have this source of nondeterminism.

[1]: https://thinkingmachines.ai/blog/defeating-nondeterminism-in...

yomismoaqui2mo ago

Quoting:

"But why aren’t LLM inference engines deterministic? One common hypothesis is that some combination of floating-point non-associativity and concurrent execution leads to nondeterminism based on which concurrent core finishes first."

From https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
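The floating-point non-associativity mentioned in the quote is easy to reproduce on a CPU with plain IEEE 754 doubles. The snippet below is only an illustration of that one ingredient (the values are chosen for the example); it says nothing about how any particular inference engine orders its reductions.

```python
# Same three numbers, different grouping.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# Same four numbers, different summation order, as could happen when a
# reduction is split differently depending on batch size or scheduling.
values = [1e16, 1.0, -1e16, 1.0]
sequential = sum(values)
reordered = sum([values[0], values[2], values[1], values[3]])

print(left_to_right == right_to_left)
print(sequential == reordered)
```

Each individual ordering is fully deterministic; the results differ only because the order of operations differs, which is why a change in batching can change the output bits.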

simonask2mo ago

> 1000s of milliseconds

Better known as "seconds"...

olau2mo ago

The suggestion was not to use an LLM to compile the expression, but to use an LLM to build the compiler.
