Some gotchas I experienced (but I might be using the wrong embedding/vector DB: spaCy/FAISS):
- Short user questions might result in a low-signal query vector, e.g. user: "Who is Keanu Reeves?" -> false positives on Wikipedia articles that only contain "Who is"
- Typos and formatting affect the vectorization; a small difference can lead to a miss, e.g. "Who is Keanu Reeves?" -> match, "Who is keanu Reeves?" -> no match, and no match with any other capitalization.
If there's only a single document, a simple keyword search might lead to better results.
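A minimal sketch of that keyword fallback, in pure Python (the stopword list and chunks are just illustrative). Note that lowercasing everything also sidesteps the capitalization misses mentioned above:

```python
import re
from collections import Counter

def keyword_score(query: str, chunk: str) -> float:
    """Fraction of non-stopword query terms that appear in the chunk."""
    stop = {"who", "is", "what", "the", "a", "an", "of", "in", "on"}
    q_terms = [t for t in re.findall(r"\w+", query.lower()) if t not in stop]
    if not q_terms:
        return 0.0
    chunk_terms = Counter(re.findall(r"\w+", chunk.lower()))
    return sum(1 for t in q_terms if chunk_terms[t]) / len(q_terms)

chunks = [
    "Who is the author of this page?",    # shares only stopwords with the query
    "Keanu Reeves is a Canadian actor.",  # shares the actual content words
]
scores = [keyword_score("Who is keanu Reeves?", c) for c in chunks]
```

Here the "Who is" chunk scores 0 and the content-bearing chunk scores 1, the opposite of the false positive the embedding produced.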
In my experience, false positives (retrieving an irrelevant text and generating a completely wrong answer) are a bigger problem than false negatives (not retrieving the text and possibly being unable to answer the question).
Does anybody have experience with Apache Lucene / Solr or Elasticsearch?
I've been working on a RAG system with Solr and quickly hit some of the issues you describe when dealing with real-world messy data and user input. E.g. using all-MiniLM-L6-v2 and cosine similarity, "Can you summarize Immanuel Kant's biography?" matched a chunk containing just the word "Biography" rather than one which started "Immanuel Kant, born in 1724...", and "How high is Ben Nevis?" matched a chunk of text about someone called Benjamin rather than a chunk about mountains containing the words "Ben Nevis" and its height[0]. Switching embedding model has helped, but I'm still not convinced that vector search alone is the silver bullet some claim it is. Still lots more to try though, e.g. hybrid search[1], query expansion[2], knowledge graphs etc.
[0] https://www.michael-lewis.com/posts/vector-search-and-retrie...
[1] https://sease.io/2023/12/hybrid-search-with-apache-solr.html
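One common way to combine the lexical and vector result lists from a hybrid setup like [1] is reciprocal rank fusion. A minimal sketch in pure Python (the doc IDs and rankings are made up to mirror the Kant example above; real systems would feed in actual Solr result lists):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Combine several ranked result lists (best first) into one fused ranking.

    Each doc gets 1/(k + rank) from every list it appears in; k=60 is the
    commonly used constant from the original RRF paper.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists: vector search latched onto a bare "Biography"
# header, while keyword search found the chunk actually about Kant.
vector_hits  = ["biography_header", "kant_bio", "unrelated"]
keyword_hits = ["kant_bio", "unrelated", "biography_header"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

The fused ranking puts the chunk that both searches rank reasonably well ahead of the one only the vector search liked.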
It has the downside that an LLM (rather than just an embedding model) is used in the query path, but it has helped me multiple times in the past to strongly reduce RAG problems like the ones you outlined, where the search likes to latch onto individual words.
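A sketch of what that query-rewriting step can look like. The `llm` callable and the stubbed completion are stand-ins so the example runs; in practice you'd plug in a real chat-model call:

```python
def rewrite_query(user_query: str, llm=None) -> str:
    """Ask an LLM to restate the question as a keyword-rich, self-contained
    search query before it ever hits the retriever."""
    prompt = (
        "Rewrite the following question as a short, keyword-rich search "
        f"query, dropping filler words:\n{user_query}"
    )
    if llm is None:
        # Stand-in completion so this sketch is runnable without a model.
        llm = lambda p: "Immanuel Kant biography life summary"
    return llm(prompt)

rewritten = rewrite_query("Can you summarize Immanuel Kant's biography?")
```

The rewritten query keeps the content words ("Immanuel Kant", "biography") and drops the conversational framing ("Can you...") that embeddings tend to latch onto.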
I have no opinion on your products or your post, but some % of people steer away from companies for such things.
I'll support you, mnd999. I don't work for a graph DB company. We don't use graph DBs, but I'm considering it. Graph DBs are a legitimate source to feed data into your RAG system. Our RAG system currently uses hybrid search: lexical and semantic. We need to expand our sources, too. I would like to see us use LLMs to rephrase our content (we have a lot of code) and index on that. I think we should build a KG on content quality (we have millions of docs) and filter out the things no one likes.
I also think a KG on "learning journeys" would be valuable, but really difficult.
It's important we get through the trough of disillusionment quickly. There's a lot of market education needed to know when they're truly needed.
A month in I realize I'm trying to reinvent a search engine. Kinda wonder if I should have just used something like elasticsearch instead.
About the "bad news" section.
You can do that today by just asking the LLM, using the ReAct pattern. Give it the database schema and a few-shot prompt, and it will happily decide to build a query, read titles, run more queries if the titles aren't relevant enough, fetch the content of the relevant titles, and use those to form an opinion.
This may not seem fast, but there are 7B models that can do it today at 150+ tokens/second.
it's a ReAct loop with search and retrieve actions, where I'm simulating the tool by hand. in prod, you'd pick up the output of the Action, run the callback with the LLM input, get the result, and pass the result back as 'Observation:'. for the sake of this demo, I'm doing exactly that, but manually copy-pasting out of Wikipedia.
works more or less with any backend, and the LLM is smart enough to change direction if a search doesn't produce relevant results (and you can see it in the demo). here the loop is cut short because I was running it manually, but you can see the important bits.
just implement a retrieve and a search function for whatever data source you have, vector or full text, and a couple of regexes to extract actions and the final answer.
pro tip: use an expensive LLM to run the ReAct loop, and a cheaper LLM to summarize article content after retrieval, before putting it in as an observation. ideally you'd run something like "this is a document {document} on this topic: {last_thought}, extract the information relevant to the user question: {question}" through a cheap LLM, so you feed the least amount of tokens into the ReAct loop.
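A minimal sketch of the regex-driven side of such a loop: parse one LLM turn, dispatch to a tool, or stop on a final answer. The `Action: name[arg]` / `Final Answer:` formats and the stub search tool are illustrative assumptions, not a fixed ReAct spec:

```python
import re

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")
FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)

def react_step(llm_output: str, tools: dict):
    """Parse one LLM turn: run the requested tool, or return the final answer."""
    final = FINAL_RE.search(llm_output)
    if final:
        return ("answer", final.group(1).strip())
    action = ACTION_RE.search(llm_output)
    if action:
        name, arg = action.groups()
        return ("observation", tools[name](arg))
    return ("error", "no action or final answer found")

# Stub tool standing in for a real search backend.
tools = {"search": lambda q: f"Top titles for '{q}': ..."}

kind, out = react_step(
    "Thought: I should look this up.\nAction: search[Ben Nevis height]", tools
)
kind2, answer = react_step("Final Answer: Ben Nevis is 1,345 m high.", tools)
```

In production, `out` would go back into the prompt as the `Observation:` line and the loop would continue until the final-answer branch fires.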
I have so far mostly failed in trying to explain 1/ why search matters and 2/ that not all "search" functionality is equal, and that building good search is an art form.
Yeah, it takes an absurd amount of tuning to make search work well. Given how poorly the average search field works in almost anything, it's fair to say this crucial step isn't happening.
I suspect a lot of organizations just don't have workflows that would tolerate someone spending a month tweaking search algorithm parameters. It doesn't look enough like work.
You can use analogies like:
1. Imagine the world before Google. Web search was a pain. <<Search for your company>> would be similarly transformative.
2. Every company has an encyclopedia: the person who knows about past efforts and is consulted whenever people try something new. Search makes that role redundant and saves time.
3. Same with repetitive work, done because employees cannot find where the work was done previously.
search is a feature, and unless you address the central pain point that search solves (in terms of revenue), no one will go for it. When you do, you will end up solving the second problem: leaders never have the issue, but employees do.
https://aclanthology.org/2023.newsum-1.10/
Happy to see that David's excellent work is getting the love that it deserves!
ChatGPT certainly set the tone for the year. Though I will say you haven't heard the last of semantic graphs, semantic paths and some of that work that did happen in late 2022 right before ChatGPT. A bit of a detour? Yes. Perhaps the combination is something that will lead to features even more interesting - time will tell.
Yes, it did. Companies that offer competitive search or recommendation feeds were all using these text models in production.
Using the LLM to mutate the input so it can be used better for search is a path that works very well (ignoring added latency and cost).
I think you can do the same with the data you store… summarize it to a fixed number of tokens, then get an embedding for that to save with the original text.
Test! Different combinations of summarizing LLM and embedding-generation LLM can give different results. But once you decide, you are locked into the summarizer as much as the embedding generator.
Not sure if this is what the parent meant though.
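A minimal sketch of that summarize-then-embed indexing step. The first-sentence "summarizer" and the keyword-count "embedder" are trivial stand-ins so it runs; a real pipeline would call an LLM and an embedding model:

```python
def index_chunk(text, summarize, embed):
    """Store the original text, but search on the embedding of its summary."""
    summary = summarize(text)
    return {"text": text, "summary": summary, "vector": embed(summary)}

# Stand-ins for the sketch; swap in real summarizer and embedding models.
summarize = lambda t: t.split(".")[0] + "."  # crude first-sentence "summary"
embed = lambda t: [t.lower().count(w) for w in ("kant", "mountain", "actor")]

entry = index_chunk(
    "Immanuel Kant, born in 1724, was a philosopher. He wrote three Critiques.",
    summarize,
    embed,
)
```

The point of the pattern: the stored `vector` reflects the summary's focus, while `text` keeps the full original for the generation step.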
But be careful: the output is not guaranteed, which means you have to take care to provide the schema and what you're trying to do within the context window, and then validate the output. There is non-trivial overhead to this.
I have a company finding buyers for commercial real estate. One of the search features is the locations of the buyers (usually family offices etc.; they are always companies, so they have headquarters, preferences on where to buy, etc.). You can then, for example, calculate the distance to those locations.
LLMs are extremely useful in creating these features from unstructured info on the companies. But just throwing an embedding on this and hoping it works doesn't.
However, embeddings work super well in other parts of the search.
Where vector search excels is that it can encode a complex question as a vector and do a good job bringing back the top n results. It's not impossible to do some of this with keyword search (term expansion, stopwords and so forth); vector search just makes it easy.
In the end, yes, this is a better search system, and thinking about this step is a good point. I would go a step further and say it's also worth thinking about the RAG framework. Lots of examples use an OpenAI/LangChain/Chroma stack, but it's also worth evaluating other RAG framework options. There might be frameworks that are easier to integrate and perform better for your use case.
Disclaimer: I am the author of txtai: https://github.com/neuml/txtai
One way of doing it is to embed each message with the added context of the previous messages, up until the topic changes; otherwise, a simple similarity search on the user-prompt embedding would return messages from irrelevant topics, since the context was included from the start.
Then embed the user prompt and perform a similarity search using either the user's query alone, or a hypothetical statement created from the prompt, the so-called HyDE approach: you ask an LLM to generate a hypothetical response given the query, and then use its vector along with the query vector to enhance search quality.
For example, if the user query is "find me who is interested in playing Minecraft on Tuesday", the LLM will generate a response like "I play Minecraft on Tuesdays", and we can search for the vector of the LLM output in the vector DB, which holds all the messages along with their context.
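A minimal sketch of that HyDE flow, averaging the query vector and the hypothetical-answer vector into one search vector. The `llm` stub and the keyword-count `embed` are stand-ins so it runs; a real setup would use a chat model and an embedding model:

```python
def hyde_vector(query, llm, embed):
    """HyDE: embed both the query and an LLM-generated hypothetical answer,
    then average them into a single search vector."""
    hypothetical = llm(f"Write a short chat message that would answer: {query}")
    qv, hv = embed(query), embed(hypothetical)
    return [(a + b) / 2 for a, b in zip(qv, hv)]

# Stand-ins for the sketch.
llm = lambda prompt: "I play Minecraft on Tuesdays"
embed = lambda t: [t.lower().count(w) for w in ("minecraft", "tuesday", "yes")]

vec = hyde_vector(
    "find me who is interested in playing Minecraft on Tuesday", llm, embed
)
```

Because the hypothetical answer is phrased like the stored chat messages, its vector sits closer to them than the question's vector does, which is the whole point of HyDE.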
However, I am not sure how this will work in scenarios where the user has sent a message asking "Will you play Minecraft on Tuesday" and person A has responded with "Yes". How can we have the model find person A? Shall we make a summary of each person based on their conversations with the user?
Also, the whole process might be computationally slow. How do we enhance the speed and performance?
(a noob here who wanted to build a similar solution)
RAG is often helpful and easy to add, but it's fundamentally search - not magic.
I find it helpful to look at the search results before feeding them into the model. Just like the "I'm Feeling Lucky" button on Google doesn't always give the perfect answer, you may have to tweak your search query to improve the result.
I wish I had time to mess with it more. Job and life have taken over. My first goal with AI would be to use it for keyword and phrase extraction, and also for analyzing all the links I pull in hourly to see if there is a larger story I could make visible.
Vector DBs are critical components in retrieval systems. What most applications need are retrieval systems, rather than building blocks of retrieval systems. That doesn't mean the building blocks are not important.
As someone working on a vector DB, I find many users struggling to build their own retrieval systems from building blocks such as an embedding service (OpenAI, Cohere), a logic-orchestration framework (LangChain/LlamaIndex) and vector databases, some even with reranker models. Putting them together is not as easy as it looks; it's fairly challenging systems work, let alone the quality tuning and devops.
The struggle is no surprise to me, as the tech companies who are experts in this (Google, Meta) all have dedicated teams working on retrieval systems alone, making tons of optimizations and developing a whole feedback loop for evaluating and improving quality. Most developers don't get access to such resources.
No one size fits all. I think there should be a service that democratizes AI-powered retrieval: in simple words, the know-how of using embeddings + a vector DB and a bunch of tricks to achieve SOTA retrieval quality.
With this idea I built a Retrieval-as-a-service solution, and here is its demo:
https://github.com/milvus-io/bootcamp/blob/master/bootcamp/R...
Or using it in LlamaIndex:
https://github.com/run-llama/llama_index/blob/main/docs/exam...
Curious to learn your thoughts.
https://thenewstack.io/the-transformative-fusion-of-probabil...
Reranking also provides a significant improvement to response quality.
Another way to improve results for domain specific RAG systems is to use some heuristics to boost results. E.g., penalize results that contain certain negative keywords or boost results with certain patterns.
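A minimal sketch of that kind of heuristic boosting, applied on top of whatever base relevance score the retriever returns. The keyword lists and weights are illustrative, not a recommendation:

```python
def adjust_score(base_score, text, boosts=None, penalties=None):
    """Nudge a retrieval score up or down based on domain keyword heuristics."""
    boosts = boosts or {}
    penalties = penalties or {}
    lowered = text.lower()
    score = base_score
    for word, bonus in boosts.items():
        if word in lowered:
            score += bonus
    for word, malus in penalties.items():
        if word in lowered:
            score -= malus
    return score

# Hypothetical domain rule: bury deprecated docs even if they match well.
s = adjust_score(0.70, "DEPRECATED: old API docs", penalties={"deprecated": 0.3})
```

You'd apply this after retrieval but before picking the top-k chunks to put in the prompt, so a well-matching but known-bad chunk gets demoted.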
For RAG, given the limited context size and potential hallucinations, the best prompt plus the best data will give you the best response.
Prompts can be improved greatly to get the LLM to produce a good response with reduced hallucinations. A lot of techniques can be seen on Twitter and explored to find a good fit.
I improve my prompts using a GPT assistant that significantly improves response quality. https://chat.openai.com/g/g-haH111AXX-prompt-optimizer
I feel that a big part of the solution will simply be in the form of increased speeds. If you can ask the model for a strategy and then let it search/process a few times in a loop, responses will improve vastly.
My current solution is to have an NLP pipeline that does so as tokens are returned. Not quite as precise yet, but it shows promise.
Should be open source sooner rather than later.
https://github.com/langroid/langroid/blob/main/langroid/agen...
Since it usually deals with PDFs and other docs that can be quite big, do they take only the first N tokens? Are abstractive summarisation techniques used?
https://python.langchain.com/docs/modules/data_connection/do...