The problem is that only btree indexes support uniqueness at the moment. That's the relevant unsupported feature, not the ability to have constraints (a constraint essentially just requires uniqueness support in the underlying index):
postgres[27716][1]# SELECT amname, pg_indexam_has_property(oid, 'can_unique') FROM pg_am;
┌────────┬─────────────────────────┐
│ amname │ pg_indexam_has_property │
├────────┼─────────────────────────┤
│ btree  │ t                       │
│ hash   │ f                       │
│ gist   │ f                       │
│ gin    │ f                       │
│ spgist │ f                       │
│ brin   │ f                       │
└────────┴─────────────────────────┘
(6 rows)
Edit: the different uses of the word "constrain" (to constrain, and a constraint) seemed too confusing.
Consider e.g. the common implementation where buckets are determined by masking out either the lowest or the highest bits of the hash value. If you mask out the highest bits and your values aren't sequential (pretty common), you end up with a lot of collisions. More extremely, if you mask the low bits and shift, and you only have small values, everything ends up in the first bucket. Therefore what you want is a hash function where a one-bit change at "one side" of the input value is likely to affect most of the remaining bits.
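A toy sketch of that failure mode (this is an illustration, not Postgres's actual bucketing code): with 8 buckets, keeping only the high bits of a 32-bit value sends every small key to bucket 0, while keeping the low bits spreads sequential keys evenly.

```python
NBUCKETS = 8  # power of two, so masking works

def bucket_low_bits(value):
    # keep the lowest bits ("mask out the highest bits")
    return value & (NBUCKETS - 1)

def bucket_high_bits(value):
    # keep the highest bits of a 32-bit value ("mask the low bits and shift")
    return (value >> 29) & (NBUCKETS - 1)

small_values = list(range(100, 108))  # small, sequential keys

# High-bit bucketing puts every small value in bucket 0:
print([bucket_high_bits(v) for v in small_values])  # -> [0, 0, 0, 0, 0, 0, 0, 0]
# Low-bit bucketing spreads sequential values across all buckets:
print(sorted(bucket_low_bits(v) for v in small_values))  # -> [0, 1, 2, 3, 4, 5, 6, 7]
```

A hash function with good "avalanche" behavior makes bucket choice insensitive to which bits of the input actually vary.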
Short version - hash indexes are faster in PG11, but they only apply to "where x = foobar" queries, giving O(1) lookup time. Btree indexes are O(log n).
But hash indexes can't be applied to range clauses, like "where x < 50". You can still use a btree index for those, however.
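The difference can be sketched in a few lines (approximating a btree with a sorted list + binary search): a hash table answers equality probes in O(1) but has no notion of order, while an ordered structure supports both equality and range predicates in O(log n).

```python
import bisect

keys = [87, 3, 50, 12, 99, 41, 7]

hash_index = {k: f"row-{k}" for k in keys}  # stand-in for a hash index
btree_index = sorted(keys)                  # stand-in for a btree index

# Equality ("where k = 50"): both work; the hash index is a single probe.
print(hash_index[50])  # -> row-50

# Range ("where k < 50"): only the ordered index helps; the hash index
# would have to scan every bucket.
cut = bisect.bisect_left(btree_index, 50)
print(btree_index[:cut])  # -> [3, 7, 12, 41]
```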
SO post:
Finally, hash indexes always require that the found row be confirmed in the data table, even for simple existence queries, since the keys themselves aren't stored in the hash table. (This is why hash indexes can't be UNIQUE.) B+trees can often answer such queries without the extra lookup (an "index-only scan"). If your B+tree is so large that its inner nodes spill onto disk (necessitating a 2nd disk seek), chances are the equivalent hash index will as well, which, combined with the consult of the data table, kind of negates the benefit.
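A toy model of that point (an assumed simplification, not the real on-disk format): if the index stores only hash codes and row pointers, two distinct keys can share a code, so every probe must be re-checked against the table, even for pure existence queries.

```python
table = {0: ("alice", 30), 1: ("bob", 25)}  # "heap": row id -> row

def hash4(key):
    return hash(key) % 16  # tiny hash space to force collisions

# Index maps hash code -> candidate row ids (collisions possible).
index = {}
for rowid, (name, _) in table.items():
    index.setdefault(hash4(name), []).append(rowid)

def lookup(name):
    # The index alone can't answer "does 'alice' exist?" -- each
    # candidate row must be confirmed against the heap itself.
    return [rid for rid in index.get(hash4(name), [])
            if table[rid][0] == name]

print(lookup("alice"))  # -> [0]
print(lookup("carol"))  # -> [] (any colliding candidate is filtered out)
```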
A hash index can be a big win for nested loop joins, especially at high concurrency. It is quite common to build a hash table over a subset of the inner table of an equijoin. This is (a) slow to construct and (b) memory-intensive (especially if many of these queries are run concurrently).
With a hash index, a lot of cases that required building a hash table to speed up a query can just use the hash index directly. Furthermore, every concurrent instance of the query can use the same hash index. This is a big win for both performance of a single query (latency) and query scalability.
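The contrast above can be sketched as follows (table and column names are made up for illustration; both "indexes" are plain dicts): without a hash index, every query pays to build its own hash table over the inner relation; with one, all concurrent queries probe the same shared structure.

```python
orders = [(1, "widget"), (2, "gadget"), (3, "widget")]  # (order_id, product)
products = [("widget", 9.99), ("gadget", 19.99)]        # (name, price)

def join_building_hash(outer, inner):
    # Without a hash index: each query constructs this hash table
    # itself -- slow and memory-intensive under concurrency.
    built = {name: price for name, price in inner}
    return [(oid, p, built[p]) for oid, p in outer]

# With a hash index: built once, reused by every concurrent query.
shared_index = {name: price for name, price in products}

def join_with_index(outer):
    return [(oid, p, shared_index[p]) for oid, p in outer]

print(join_with_index(orders))  # -> [(1, 'widget', 9.99), (2, 'gadget', 19.99), (3, 'widget', 9.99)]
```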
So hash indexes really only begin to show benefits for individual lookups when your data is terabyte-scale, and below that can even be harmful for that use case (if you could otherwise benefit from an index-only lookup). But see ankrgyl's comment (sibling to this one) for better reasons to consider hash indexes.
I don't think that's really true - even for btree indexes the heap is accessed to check row visibility. That's necessary because postgres doesn't store visibility information in indexes. There's imo no really big problem making hash indexes support uniqueness - it's "just" work.
Probably to avoid returning the wrong result for statistically-inevitable collisions, right?
The link is from the post http://amitkapila16.blogspot.com/2017/03/hash-indexes-are-fa...
Based on the width of the key, the row count, and the amount of memory available, there may be cases where a disk hit gets replaced with an in-memory cache, which is awesome for spinning-disk-based systems.
> Based on the observed pattern of searches, MySQL builds a hash index using a prefix of the index key. The prefix of the key can be any length, and it may be that only some of the values in the B-tree appear in the hash index. Hash indexes are built on demand for those pages of the index that are often accessed.
> If a table fits almost entirely in main memory, a hash index can speed up queries by enabling direct lookup of any element, turning the index value into a sort of pointer. InnoDB has a mechanism that monitors index searches. If InnoDB notices that queries could benefit from building a hash index, it does so automatically.
I am really excited by some of the PostgreSQL developments lately, in particular parallel query execution. To the best of my knowledge, the only place that currently exists within MySQL (or MariaDB) is MySQL Cluster.
This is what I call trying things "at Indian scale" :D