How does it work? Bettershot aims to detect three things:
Was the question relevant to the data? (e.g. filter out questions like "how's the weather?" if the chatbot's purpose is to answer questions on the history of jeans)
If it was relevant, did the model's response invent new information when answering the question? (i.e. information that was not in the prompt passed in)
Did the model refuse to answer the question? (e.g. "Sorry as an AI language model...")
We do this by using ChatGPT (currently gpt-3.5-turbo-16k) to evaluate each prompt-response pair 5 times and taking the most frequent result (e.g. if it evaluates to 'True' 4 times out of 5, then it's probably a good response). Check out the repo to learn more: https://github.com/ClerkieAI/bettershot
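To make the "evaluate 5 times, keep the most frequent result" idea concrete, here is a minimal Python sketch of the voting step. It is not taken from the Bettershot repo: `majority_vote` and `make_stub_evaluator` are hypothetical names, and the evaluator is a stand-in for whatever call actually asks the model to judge a prompt-response pair.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical signature: takes (prompt, response) and returns one True/False verdict.
Evaluator = Callable[[str, str], bool]


def majority_vote(evaluate: Evaluator, prompt: str, response: str, n_runs: int = 5) -> bool:
    """Run the evaluator n_runs times and return the most frequent verdict."""
    votes = [evaluate(prompt, response) for _ in range(n_runs)]
    # most_common(1) returns a one-element list holding the (verdict, count) pair with the highest count.
    return Counter(votes).most_common(1)[0][0]


def make_stub_evaluator(verdicts: Iterable[bool]) -> Evaluator:
    """Stand-in for a real model call: replays canned verdicts in order."""
    it = iter(verdicts)

    def evaluate(prompt: str, response: str) -> bool:
        return next(it)

    return evaluate


# Usage: four of the five simulated evaluations say True, so the vote is True.
stub = make_stub_evaluator([True, True, False, True, True])
verdict = majority_vote(stub, "When were blue jeans patented?", "In 1873.")
```

With an odd number of runs and a True/False verdict there can never be a tie, which is one reason a value like 5 is convenient here.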