Earlier this year, we started developing a RAG-powered app that enables companies to safely query their free-text data.
During our experimentation, however, we realized that because the approach was so new, there were no industry standards for evaluating the accuracy of a RAG system. We built Tonic Validate Metrics (tvalmetrics, for short) to easily calculate the benchmarks we needed to meet while building our own RAG system.
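To give a feel for what a RAG evaluation metric measures, here is a deliberately simple toy example: a token-overlap score between a generated answer and the retrieved context. This is just an illustrative sketch of the general idea, not the tvalmetrics API; the function name and scoring logic below are our own invention for this post.

```python
import string


def retrieval_overlap_score(answer: str, retrieved_context: str) -> float:
    """Toy RAG metric: fraction of answer tokens found in the retrieved context.

    A higher score loosely suggests the answer is grounded in the context.
    (Illustrative only -- not how tvalmetrics scores responses.)
    """
    def tokens(text: str) -> set[str]:
        # Lowercase and strip surrounding punctuation for a rough comparison.
        return {w.strip(string.punctuation) for w in text.lower().split()}

    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & tokens(retrieved_context)) / len(answer_tokens)


score = retrieval_overlap_score(
    "Paris is the capital of France",
    "France's capital city is Paris, located on the Seine",
)
```

Real-world metrics go well beyond token overlap (for example, using an LLM to judge answer similarity or context relevance), but the shape is the same: a numeric score you can track while iterating on your RAG pipeline.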
We’re sharing this Python package in the hope that it will be as useful for you as it has been for us and will become a key part of the toolset you use to build LLM-powered applications. We also made Tonic Validate Metrics open source so that it can thrive and evolve with your contributions!
Please take it for a spin and let us know what you think in the comments.
Docs: https://docs.tonic.ai/validate
Repo: https://github.com/TonicAI/tvalmetrics
Tonic Validate: https://validate.tonic.ai