I built Remembrall, a proxy layer for your OpenAI API calls that gives your chat system long-term memory.
How it works: just add a user id to your OpenAI call. When a user stops actively chatting, Remembrall triggers an "autosave": it uses GPT to save or update important details from the conversation in a vector database. When the user continues the conversation, we query the database for relevant info and prepend it to the system prompt.
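The autosave/retrieve loop above can be sketched in a few lines. This is a toy illustration only: the "vector db" is an in-memory dict with naive keyword matching standing in for embedding similarity, and the GPT extraction step is replaced by storing the raw lines. None of these names are Remembrall's actual internals.

```python
# Toy sketch of the autosave -> retrieve -> prepend flow.
# memory_db, autosave, and build_system_prompt are illustrative names,
# not Remembrall's real API.
from collections import defaultdict

memory_db = defaultdict(list)  # user_id -> list of saved facts


def autosave(user_id, conversation):
    # Real system: GPT extracts/updates important details, which are
    # embedded and upserted into a vector db. Stand-in: store raw lines.
    memory_db[user_id].extend(line for line in conversation if line.strip())


def build_system_prompt(user_id, base_prompt, user_message):
    # Real system: vector-similarity search over saved memories.
    # Stand-in: naive keyword overlap with the incoming message.
    words = user_message.lower().split()
    relevant = [f for f in memory_db[user_id]
                if any(w in f.lower() for w in words)]
    if not relevant:
        return base_prompt
    return "\n".join(["Known about this user:"] + relevant + ["", base_prompt])


autosave("user_1234", ["My dog is named Biscuit.", "I live in Austin."])
prompt = build_system_prompt("user_1234",
                             "You are a helpful assistant.",
                             "tell me about my dog")
print(prompt)
```

The chat endpoint itself never changes; only the system prompt it receives is enriched before the request is forwarded to OpenAI.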
All of this happens in under 100 ms on the edge, and integration takes only two lines of code. You also get observability for free (a log of all LLM requests) in a simple dashboard.
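The "two lines of code" integration presumably means pointing your OpenAI requests at the proxy and attaching the user id. Here is a stdlib-only sketch of that shape; the proxy URL and header name are assumptions for illustration, not Remembrall's documented values.

```python
# Sketch: route a standard chat-completions request through a memory proxy.
# PROXY_BASE and the x-user-id header are hypothetical, not Remembrall's API.
import json
import urllib.request

PROXY_BASE = "https://proxy.example.com/v1"  # line 1: swap the base URL
USER_ID_HEADER = {"x-user-id": "user_1234"}  # line 2: identify the user

payload = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Where did we leave off?"}],
}

req = urllib.request.Request(
    PROXY_BASE + "/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer $OPENAI_API_KEY",  # placeholder key
        **USER_ID_HEADER,
    },
)
# urllib.request.urlopen(req) would send it; the proxy forwards the call to
# OpenAI after prepending this user's saved memories to the system prompt.
print(req.get_header("X-user-id"))
```

Everything else about the request body stays identical to a direct OpenAI call, which is what keeps the integration cost so low.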