It tries to address the LLM hallucination issue by guiding the model to answer questions only from the given context instead of making things up. If you ask something not covered in an episode, it should say that it doesn't know rather than provide a plausible but potentially incorrect response.
It uses Whisper to transcribe, text-embedding-ada-002 to embed, Pinecone.io to search, and text-davinci-003 to generate the answer.
More examples and explanations here: https://twitter.com/rileytomasek/status/1603854647575384067
Tip - you don’t actually need GPT-3-level embeddings for decent semantic search. Sentence transformers paired with one of their models is good enough.
I like this: https://huggingface.co/sentence-transformers/multi-qa-MiniLM... - since it’s very light.
Also, perhaps I am an idiot, but I just used a Postgres array field to store my embeddings to keep things simple and free.
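That approach works fine at small scale: store the vector in a float8[] column and compute similarity client-side. A sketch of the idea (table/column names and the tiny 3-dim vectors here are made up for illustration; a real MiniLM embedding is 384 floats):

```python
# Naive nearest-neighbor over embeddings kept in a Postgres float8[] column,
# with cosine similarity computed in application code.
#
#   CREATE TABLE chunks (
#       id        serial PRIMARY KEY,
#       body      text,
#       embedding float8[]
#   );
#
# `rows` below stands in for: SELECT body, embedding FROM chunks;
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

rows = [
    ("dopamine and motivation", [0.9, 0.1, 0.0]),
    ("cold exposure protocols", [0.0, 0.2, 0.9]),
]
query_emb = [1.0, 0.0, 0.1]

best = max(rows, key=lambda r: cosine(query_emb, r[1]))
print(best[0])  # → dopamine and motivation
```

This is a full scan per query, so it stops being "snappy" somewhere in the tens of thousands of rows, but for one podcast's worth of paragraphs it's plenty.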
The UX is gorgeous, simple and snappy. I remember a few of Huberman's podcasts so I typed a few questions and the answers were spot on.
I'll be following you and your work, Riley.
The UI is my own design system that I will open source at some point. The app is Remix with a Redis cache to keep things snappy.
On a side note, Huberman Labs bothers me. I was an avid listener to the early episodes. As I have ADHD, some of his explanations of the brain chemistry involved in attention and motivation were fascinating. But in one of the early-ish episodes he said something completely ridiculous about acupuncture (that it works), which makes me think he has no real critical thinking skills.
I hope anyone out there listening to him and thinking about applying any of the approaches he talks about just takes the time to see whether any other sources say they have real-world effects.
To the credit of the author, this tool highlights the exact thing I'm talking about. Try searching for...
"How does acupuncture work?"
"Acupuncture involves taking needles and sometimes electricity and or heat as well and stimulating particular locations on the body. Through these maps of stimulation that have been developed over thousands of years, mostly in Asia, acupuncture can reduce inflammation in the body by stimulating the body in particular ways at particular sites on the body, liberating certain cells and molecules that enhance the function of the immune system and potentially can be used to combat different types of infection."
Another example is his episode with Matthew Walker[1] (author of Why We Sleep), and his other episodes where he gives sleep advice citing Walker as an authority. The problem is that Walker's work is not good[2][3] (riddled with errors at best, fraudulent at worst).
To be clear: I'm sure a lot (if not the majority) of what Huberman says is correct, or at least matches our current scientific understanding. The problem is that some things are incorrect, and the layman has no way of knowing which is which (especially since all the content is presented with the same high-certainty "science-based tools" tone).
[0] https://old.reddit.com/r/andrewhuberman/comments/smnnb0/crit...
[1] https://www.youtube.com/watch?v=gbQFSMayJxk
Prof Huberman is an expert and he reads up on relevant studies before talking about something. Even though he may occasionally make mistakes (who doesn't?), your evidence to the contrary, if any, should be at least as strong as his.
Moreover, accusing an accomplished scientist of lacking critical thinking skills should require substantial evidence.
[1] https://www.nccih.nih.gov/health/acupuncture-what-you-need-t...
[2] https://www.hopkinsmedicine.org/health/wellness-and-preventi...
and there are many other links one can easily find.
I've only done a single search with the tool so far, but it immediately returned the details that I was hoping for, along with context and other relevant mentions of the search terms.
Thank you kindly for making and sharing this.
>> The default mode network and the task networks become de-synchronized in the ADHD brain.
> What are the three parts of the brain that become de-synchronized in the ADHD brain?
>> The default mode network, the task networks, and the dopamine circuits.
From https://youtu.be/hFL6qRIJZ_Y?t=1714:
> An area called the dorsolateral prefrontal cortex ... the posterior cingulate cortex, and ... the lateral parietal lobe ... these are three brain areas that normally are synchronized in their activities ... that's how it is in a typical person. In a person with ADHD ... these brain areas are not playing well with each other.
I wonder if part of the problem might be the use of speech-to-text. Did you consider scraping YouTube's existing transcriptions instead? e.g. with https://github.com/jdepoix/youtube-transcript-api
My guess is that there are more relevant results from the semantic search than I'm including in the context (to reduce costs) and that exact snippet isn't being given to the answering model as context.
would you be open sourcing this soon? totally understand if you want to keep it private, but if you are, there are a few other podcasts i’m interested in running this on for myself, like some parenting ones.
I have some ideas of my own that I would love to implement similarly to this and it would help to know how to get started.
Preprocessing
1. Transcribe the dataset.
2. Chunk the transcription into paragraphs.
3. Store the embedding of each paragraph into a vector database.
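The preprocessing steps can be sketched like this. To keep it self-contained I use a plain list standing in for the vector database (swap in Pinecone, pgvector, etc.) and a stub embed() where a real embedding model would go:

```python
# Preprocessing sketch: chunk a transcript and store (text, embedding) pairs.
# The index list and embed() stub are stand-ins for a real vector DB + model.

def chunk_paragraphs(transcript: str) -> list[str]:
    # Step 2: split the transcription on blank lines into paragraphs
    return [p.strip() for p in transcript.split("\n\n") if p.strip()]

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model (ada-002, MiniLM, ...)
    return [float(len(text)), float(text.count(" "))]

index: list[dict] = []  # stand-in for the vector database

transcript = "Dopamine drives motivation.\n\nSleep consolidates memory."
for para in chunk_paragraphs(transcript):                    # step 2
    index.append({"text": para, "embedding": embed(para)})   # step 3

print(len(index))  # → 2
```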
Querying
1. Convert the user's query into an embedding.
2. Query the vector database for the top N closest embeddings and fetch the paragraphs that correspond to them. To be robust against queries you don't have results for, limit how far away results can be from the user's query.
3. Using those paragraphs, craft a prompt that you will give to an LLM.
4. Do any final filtering on what you got back from the LLM.
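And a sketch of the querying side, using the same toy in-memory index idea: embed the query, take the top N matches above a similarity floor (that's the robustness guard in step 2), then build the prompt. The min_sim value and prompt wording are just illustrative:

```python
# Querying sketch: top-N cosine search with a similarity floor, then a prompt.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_n(query_emb, index, n=3, min_sim=0.5):
    scored = [(cosine(query_emb, d["embedding"]), d["text"]) for d in index]
    scored.sort(reverse=True)
    # Discard weak matches so off-topic queries return nothing (step 2)
    return [text for sim, text in scored[:n] if sim >= min_sim]

def build_prompt(question, paragraphs):
    context = "\n\n".join(paragraphs)
    return ("Answer using only the context below. "
            "If the answer isn't there, say you don't know.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

index = [
    {"text": "Dopamine drives motivation.", "embedding": [1.0, 0.0]},
    {"text": "Sleep consolidates memory.",  "embedding": [0.0, 1.0]},
]
hits = top_n([0.9, 0.1], index, n=1)
print(build_prompt("What does dopamine do?", hits))
```

The "say you don't know" instruction plus the similarity floor is what keeps the model from improvising when the episodes simply don't cover a question.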
The way I built it is documented here: https://www.pinecone.io/learn/openai-whisper/
Afaik it's the same approach as Riley, that is:
- Scrape audio of YouTube videos
- Transcribe to text with OpenAI's Whisper
- Use sentence transformer to create embeddings of text
- Index embeddings (with transcribed text, timestamps, and video URL attached) in Pinecone's vector database
- Wrap up the querying functionality in a nice UI
(this is for the search functionality)
If you want to replicate the Q&A part, I also built something similar and wrote about it (https://youtu.be/coaaSxys5so). It's essentially the same process, but we return text snippets to GPT-3 along with the original question and it generates an answer.
Typically you'd split the text into paragraph-sized chunks to handle this requirement of sentence transformers; with GPT-3 embeddings you naturally have more flexibility there.
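One simple way to get those paragraph-sized chunks (sentence transformers silently truncate long inputs) is to greedily pack sentences up to a word budget. This is a sketch; the 100-word budget and the repeated toy sentence are just for illustration:

```python
# Greedily pack sentences into chunks of at most max_words words each.

def chunk_by_words(sentences: list[str], max_words: int = 100) -> list[str]:
    chunks, current, count = [], [], 0
    for s in sentences:
        words = len(s.split())
        if current and count + words > max_words:
            chunks.append(" ".join(current))  # budget hit: close this chunk
            current, count = [], 0
        current.append(s)
        count += words
    if current:
        chunks.append(" ".join(current))
    return chunks

sents = ["Dopamine drives motivation."] * 60  # 3 words per sentence
chunks = chunk_by_words(sents, max_words=100)
print(len(chunks))  # → 2
```

In a real pipeline you'd also want some overlap between adjacent chunks so a sentence on a chunk boundary isn't stranded without its context.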