We're building a caching solution for LLMs (ChatGPT, Claude). By combining edge computing, prompt compression, vectorization, and other techniques, it can cut your AI bill by up to 10x and significantly lower response times.
Key features:

- Cost efficiency: the cache stores responses to frequent queries, cutting the number of upstream (paid) API calls (a rough sketch of the lookup is below).
- Fast responses: nodes in multiple regions serve data from the location nearest the user, reducing latency.
- Scalability: designed to handle growing load and data volume without degrading performance.
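To make the cost-efficiency point concrete, here is a minimal sketch of how a semantic cache lookup could work; it is illustrative only, not our implementation. The idea: embed the incoming prompt, compare it against embeddings of previously answered prompts, and only call the paid upstream API on a miss. The SemanticCache class, the embed_fn/llm_fn callables, and the 0.92 threshold are all hypothetical.

    import numpy as np

    def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
        # Cosine similarity between two embedding vectors.
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    class SemanticCache:
        def __init__(self, embed_fn, llm_fn, threshold: float = 0.92):
            self.embed_fn = embed_fn    # prompt -> embedding vector (hypothetical)
            self.llm_fn = llm_fn        # prompt -> completion, the paid upstream call
            self.threshold = threshold  # similarity required to count as a hit
            self.entries = []           # list of (embedding, response) pairs

        def query(self, prompt: str) -> str:
            vec = np.asarray(self.embed_fn(prompt))
            # Look for a previously answered prompt that is "close enough".
            for cached_vec, cached_response in self.entries:
                if cosine_sim(vec, cached_vec) >= self.threshold:
                    return cached_response          # hit: no upstream call
            response = self.llm_fn(prompt)          # miss: one paid call
            self.entries.append((vec, response))
            return response

In practice the linear scan would be replaced by a vector index and the entries would need expiry, but the hit/miss logic is the core of the savings.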
The cache operates transparently and requires minimal changes to your existing setup. Think of it as a content delivery network (CDN), but for LLMs: users get the fastest possible access to responses.
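For illustration, here is what "minimal changes" could look like if the cache were exposed as an OpenAI-compatible proxy; the base_url below is a placeholder, not a real endpoint. You keep your existing client code and only repoint it at the cache.

    from openai import OpenAI

    # Point your existing OpenAI-style client at the cache instead of the
    # upstream provider; everything else stays the same.
    client = OpenAI(
        base_url="https://llm-cache.example.com/v1",  # hypothetical cache URL
        api_key="YOUR_API_KEY",
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What is a CDN?"}],
    )
    print(response.choices[0].message.content)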
The project is still in its early stages, but we're excited about its potential and want to gather feedback from the community. We'll share the demo once you subscribe to our mailing list (to keep our spending under control ;))
This could be a game-changer for teams running LLMs in production environments where cost and response time are critical.
We’re eager to hear what you think!
(launching from the Bunny Coworking House in SF - thanks, Henry, Liyen, and Grace, for having us)