It does have one surprising limitation: it calculates relevance based on just the current row, rather than being able to take statistics across the whole corpus into account.
Most search engines use TF/IDF or BM25 for relevance calculations, which consider the relative common-ness of terms in comparison to the rest of the corpus. PostgreSQL FTS can't do that as far as I know.
SQLite's built-in FTS CAN do relevance calculations like this! Surprising to see a feature as significant as that show up in SQLite but not in PostgreSQL.
Ahhh. That's what's SQLite's "bm25()" function is for.
Was white listing the SQLite FTS functions a few days ago for DBHub.io (eg so people can use them), but don't really understand (yet) how most of them are actually used. Some day I'll get around to learning about them properly. :)
It’s fantastic having 1 less dependency though!
One caveat- if you migrate any field in the indexed table, you will likely have to drop and recreate all your triggers and stored procs again
I find it dishonest to call it "full text search" whereas it's actually just "English/Indo-European full text search" that uses language-specific features to achieve its goals.
Instead of pretending to have solved the string searching problem by using "language hacks", I'd really like see an open source database that provides easy to use interfaces to suffix trees instead.
The even more infuriating thing is that apparently some databases actually do have suffix tree implementations, but because of assumptions that the data is English/European, other languages work half-assedly on it.
Imagine i18n implications for projects that are based on them. And the users would have no clue how f*cked up things are.
I've never implemented full text search using a dedicated database so maybe 1M is too much to ask for.
[0] https://www.reddit.com/r/programming/comments/12yhhcg/commen...