The order of the columns in the composite index might matter then. We got some nice savings by reordering columns in indexes, changing the join order in queries (our DB wasn't that smart), or adding "useless" filters to the WHERE clause, allowing us to consolidate multiple indexes into one composite one.
[1]: might be using the wrong terminology here; I'm not talking about a partial index[2], but about using only the first N dimensions of an M-dimensional index.
[2]: https://www.postgresql.org/docs/current/indexes-partial.html
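A rough sketch of the consolidation trick, with hypothetical table and column names (the "useless" filter enumerates every possible value of the leading column so the planner can still range-scan the composite index):

```sql
-- Instead of maintaining two indexes:
CREATE INDEX idx_orders_status ON orders (status);
CREATE INDEX idx_orders_region ON orders (region);
-- ...keep a single composite one:
CREATE INDEX idx_orders_status_region ON orders (status, region);

-- A query that only cares about region can add a filter on status
-- that matches all rows, purely to make the composite index usable:
SELECT * FROM orders
WHERE status IN ('new', 'paid', 'shipped')  -- "useless": covers every status
  AND region = 'EU';
```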
IIRC Oracle does support this and calls it a skip scan. If the first column in the index is not very selective, this can be very efficient; if the first column is highly selective, a skip scan becomes very inefficient and the DB might as well scan the whole index. Given that a good choice of index usually puts a fairly selective column first, and that unless storage space is very cramped you are likely better off just defining an extra index without the column you want to skip, a skip scan is helpful less often than one might think. That is one of the reasons DB engine maintainers give for not spending time implementing and maintaining the feature.
There are some use cases where having the option to skip-scan is definitely a significant bonus, but they are rare enough that I understand why most engine maintainers have not implemented it.
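The classic illustration, with hypothetical names, is an index whose leading column has very few distinct values:

```sql
-- Low-selectivity leading column: only a handful of distinct values.
CREATE INDEX idx_emp_gender_salary ON employees (gender, salary);

-- This query omits the leading column entirely:
SELECT * FROM employees WHERE salary > 100000;
-- A skip scan runs one cheap range scan per distinct gender value,
-- avoiding a full index scan. If the leading column instead had
-- thousands of distinct values, that would mean thousands of probes,
-- which degrades toward a full scan of the index.
```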
https://www.postgresql.org/docs/current/indexes-multicolumn....
Also worth noting that for very complex workloads that need to support arbitrary subsets of equality matches over columns, a Bloom filter might work best:
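For example, in PostgreSQL the bloom extension provides a single index that supports equality tests on any subset of the listed columns (with false positives that get rechecked against the heap); table and column names here are hypothetical:

```sql
CREATE EXTENSION IF NOT EXISTS bloom;

-- One bloom index instead of many btrees over column combinations:
CREATE INDEX idx_t_bloom ON t USING bloom (c1, c2, c3, c4);

-- Any combination of equality predicates can use the same index:
SELECT * FROM t WHERE c2 = 5 AND c4 = 7;
```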
Maybe eventually PG will also get index skip scans for btrees, which could allow arbitrary column searches in the btree; but we're not there yet by a large margin.
https://dev.mysql.com/doc/refman/8.0/en/range-optimization.h...
create table if not exists test_table
(
    id      UInt64,
    text1   String,
    text2   String,
    int1000 UInt64,
    int100  UInt64,
    int10   UInt64,
    int10_2 UInt64
)
engine = MergeTree()
order by (id)
;

insert into test_table
with
    repeat('b', 1024) as one_kib,
    repeat('b', 255)  as bytes_255
select
    number as id,
    one_kib,
    bytes_255,
    rand() % 1000 as int1000,
    rand() % 100  as int100,
    rand() % 10   as int10,
    rand() % 10   as int10_2
from numbers(10e6)
;
> select count(*) from test_table where int1000 = 1 and int100 = 1;
┌─count()─┐
│ 9949 │
└─────────┘
1 row in set. Elapsed: 0.034 sec. Processed 10.00 million rows, 160.00 MB (290.93 million rows/s., 4.65 GB/s.)
The same table but with 1B rows instead runs in ~1800 ms:

> select count(*) from test_table where int1000 = 1 and int100 = 1;
┌─count()─┐
│ 999831 │
└─────────┘
1 row in set. Elapsed: 1.804 sec. Processed 1.00 billion rows, 16.00 GB (554.24 million rows/s., 8.87 GB/s.)
[1] Converted the table create and insert logic from here: https://github.com/sirupsen/napkin-math/blob/master/newslett...

That first statement about "no indexing being used" is not quite correct if the query is run exactly as you show in your nice example.
ClickHouse performs what is known as PREWHERE processing, which will effectively use the int1000 and int100 columns as indexes. It scans those columns and knocks out any blocks (technically granules, containing by default 8192 rows) that do not contain values matching the filter conditions. It then performs a scan on the remaining blocks to get the actual counts.
PREWHERE is effective because columns are compressed and scans are fast. If there's any pattern to the filter columns (for example, monotonically increasing counters) or their values have high cardinality, PREWHERE processing will remove a large number of blocks. This will make the rest of the scan far faster.
In your dataset it may not be especially efficient because you use random values, which don't necessarily compress well, and the values will appear in many blocks. It works much better in real datasets where data are more correlated.
EDIT: PREWHERE is much faster in cases where you are doing more complex aggregation on many columns. Counts of course don't need to scan any extra values so it's not helpful in this case.
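ClickHouse moves filters into PREWHERE automatically, but it can also be written explicitly (reusing the table from the example above) to control which column is scanned first:

```sql
SELECT count(*)
FROM test_table
PREWHERE int1000 = 1  -- read only this column first; granules with no
WHERE int100 = 1;     -- matching values are skipped before reading the rest
```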
p.s. Scans are ridiculously fast.
This is really the lesson of Solr: full-scan all the things, aggregate as you go, broadcast disk IO to multiple listeners.
Why do a bunch of 4K random IO when you could full-scan at bus speed? Yeah, you can make the 4K random IO super fast, but that's not where hardware is going, and full scans are also scalable/clusterable/shardable, where an RDBMS caps out at one machine and clustering is kinda ugly.
Good point, I should have mentioned this was basically a worst-case scenario for ClickHouse: the data layout is totally random (same approach as OP used in their benchmark), so it can't utilize any granule pruning, sorting, or skip indexing, yet it still achieves such remarkable speeds.
> It scans those columns and knocks out any blocks (technically granules, containing by default 8192 rows) that do not contain values matching the filter conditions
How is that not just a sequential scan? Of course it preemptively filters away entire blocks that don't contain the data, but indexes typically work differently: they're computed at write time, so that they can be queried really fast.

Is there a detail missing here, e.g. Bloom filters being used, or something else that makes it different from a regular sequential scan?
I would think ClickHouse is tuned for analytics workloads, so it will throw plenty of cores at the problem, and not care much for the overhead. Meanwhile, I believe PostgreSQL is more tuned to transactional workloads, where it will not pay the query parallelism overhead, but optimize for multiple parallel workloads.
https://altinity.com/blog/clickhouse-black-magic-skipping-in...
You can also check out the following webinar, which explains how ClickHouse indexes work in general. Here's a link to the discussion of indexes.
https://youtu.be/1TGGCIr6dMY?t=1933
p.s. The blog article is missing some images that WordPress seems to have lost but you'll still get the idea. (Should be fixed shortly.)
Disclaimer: I work for Altinity
[1] https://altinity.com/blog/
[2] https://www.alibabacloud.com/blog/clickhouse-kernel-analysis...
[3] https://www.alibabacloud.com/blog/clickhouse-analysis-of-the...
For these 64-bit index entries we’d expect to have to scan roughly:
index_row_size · rows = 2 · 64 bit · 10^5 = 16 B · 10^5 ≈ 1.5 MiB
Where do the 10^5 rows come from? With a composite index and a point query, doesn't the database scan just the 100 returned rows?

I forgot to move this around when I updated the article's structure. This is only relevant when doing the index merge. The article has been updated.
Composite index (int64, int64): ~70 MiB in Postgres, ~350 MiB in MySQL
Single index (int64): ~70 MiB in Postgres, ~240 MiB in MySQL
If you assume the majority of an index is index entries of (int64, int64, int64), where the third number is some sort of identifier for the record on the heap, each entry is 24 bytes, so for 10M rows you'd expect ~230 MiB. So Postgres does some compression very well here, and MySQL seems to have a bit more overhead for its indexes.
https://www.postgresql.org/docs/current/btree-implementation...
> 67.4.3. Deduplication
> A duplicate is a leaf page tuple (a tuple that points to a table row) where all indexed key columns have values that match corresponding column values from at least one other leaf page tuple in the same index. Duplicate tuples are quite common in practice. B-Tree indexes can use a special, space-efficient representation for duplicates when an optional technique is enabled: deduplication.
> Deduplication works by periodically merging groups of duplicate tuples together, forming a single posting list tuple for each group. The column key value(s) only appear once in this representation. This is followed by a sorted array of TIDs that point to rows in the table. This significantly reduces the storage size of indexes where each value (or each distinct combination of column values) appears several times on average. The latency of queries can be reduced significantly. Overall query throughput may increase significantly. The overhead of routine index vacuuming may also be reduced significantly.
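Deduplication is on by default for B-tree indexes in PostgreSQL 13+, and it can be toggled per index via the deduplicate_items storage parameter; table and column names here are hypothetical:

```sql
CREATE INDEX idx_t_val ON t (val) WITH (deduplicate_items = on);

-- Compare on-disk sizes with it toggled on vs. off:
SELECT pg_size_pretty(pg_relation_size('idx_t_val'));
```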
Or does it decide to use only one index and filter based on the covered values?
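In PostgreSQL you can see which strategy the planner picked with EXPLAIN; a hypothetical sketch, assuming two single-column indexes on t(a) and t(b):

```sql
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2;
-- An index merge shows up as a BitmapAnd node combining two
-- Bitmap Index Scans. If the planner considers one index selective
-- enough on its own, it instead emits a single Index Scan and applies
-- the other predicate as an ordinary row filter.
```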
(1) Table size on the x-axis, and time on the y-axis for index merge vs composite index
(2) Number of columns on the x-axis, and time on the y-axis for both
(3) Number of final matches on the x-axis, and time on the y-axis for both
But I ran out of time and decided to test with a table size of 10M rows and a ~100-row result set. In my experience, that's a decent representation of what you might be doing with a relational database.
Can PostgreSQL answer a query straight from the index, without touching the rows, if the index contains all the necessary fields?
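It can: PostgreSQL calls this an "Index Only Scan", and it kicks in when every referenced column is in the index and the visibility map shows the pages as all-visible. The INCLUDE clause adds payload columns to the index without making them part of the key; a sketch with hypothetical names:

```sql
CREATE INDEX idx_orders_cust ON orders (customer_id) INCLUDE (total);

-- Both columns are available in the index, so the heap need not be
-- touched for all-visible pages:
SELECT customer_id, total FROM orders WHERE customer_id = 42;
```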