I have to imagine listening to raw papers (without even someone like Andrej Karpathy interpreting and presenting them) would be even more difficult. I don't know if there's an easy way to passively consume academic literature at all. If it's important stuff, it will usually be pretty challenging.
Of course everyone will immediately say this is dangerous and it may mislead you by giving wrong explanations, etc., and then others will counter with "it will definitely get better over time" (the best models as products are ~3 years behind the improvements being shown in academic work, for example). However, ultimately this is just a neat product to make, even if it has some bugs. Listening via plain TTS right now means spending about half the time hearing jumbled numbers from tables and recitations of author names. So tackling that alone (which this would do much better) would be valuable.
There will always be ways to misinterpret some academic work, and the path to understanding a work offers plenty of opportunities to do so.
Allowing someone to engage with a work _at all_ by lifting some barriers (for visually impaired people, for example) should be acknowledged as an improvement, not continually discouraged for having some bugs.
> "You are an ArXiv paper audio paraphraser. Your primary goal is to rephrase the original paper content while preserving its overall meaning and structure, but simplifying along the way, and make it easier to understand. In the event that you encounter a mathematical expression, it is essential that you verbalize it in straightforward nonlatex terms, while remaining accurate, and in order to ensure that the reader can grasp the equation's meaning solely through your verbalization. Do not output any long latex expressions, summarize them in words."
The bit about translating LaTeX expressions into human-comprehensible math sentences is interesting and AFAIK should work on something like GPT-4. But that's just a case of technical translation. GPT-4 definitely cannot "rephrase the overall paper... simplifying along the way." GPT-4 can't even summarize corporate reports without screwing up facts and figures - why on earth would you try to use it to summarize new scientific research?
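Wiring that narrow translation step up is trivial, for what it's worth. A rough sketch with the OpenAI Python client (the model name is my assumption, and you'd still want to spot-check outputs against the source):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

VERBALIZE_PROMPT = (
    "Rephrase the following LaTeX expression as a plain-English sentence "
    "a listener could follow without seeing the notation. Do not output "
    "any LaTeX."
)

def verbalize(latex_expr: str) -> str:
    """Turn one LaTeX expression into a spoken-word description."""
    resp = client.chat.completions.create(
        model="gpt-4",  # assumption; any capable chat model would do
        messages=[
            {"role": "system", "content": VERBALIZE_PROMPT},
            {"role": "user", "content": latex_expr},
        ],
    )
    return resp.choices[0].message.content

# e.g. verbalize(r"\hat{y} = \sigma(Wx + b)")
# -> "y-hat is the sigmoid of W times x plus b", or similar
```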
Stuff like this is why I'm so concerned about LLMs: this prompt doesn't work, and using AI for this stuff just automates ignorance. Very frustrating.
[1] I say "honest" because this prompt would probably do ok on stuff coming out of a paper mill - the problem is carefully stated original ideas. GPT tears original ideas to shreds.
Nowadays I often pass the PDF through an LLM to get a personalized version (expanding the jargon or trimming the verbiage) and then read that. It gives me a better return on time spent.
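Concretely, it's roughly this. A sketch, with pypdf and the OpenAI client as the assumed tools; a real paper would overflow the context window, so in practice you'd chunk by section:

```python
from openai import OpenAI
from pypdf import PdfReader

client = OpenAI()

def personalize(pdf_path: str, instruction: str) -> str:
    """Rewrite a paper's text per a personal instruction, e.g.
    'expand all jargon for a non-specialist' or 'cut the verbiage'.

    Sketch only: sends the whole paper at once, which long papers
    won't fit; chunk by section for real use.
    """
    text = "\n".join(
        page.extract_text() or "" for page in PdfReader(pdf_path).pages
    )
    resp = client.chat.completions.create(
        model="gpt-4",  # assumption
        messages=[
            {"role": "system", "content": instruction},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content
```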
https://www.youtube.com/@ArxivPapers
The pipeline seems to do a pretty good job of cleaning up the writing too; some ArXiv papers are a little rough.
(I'm not the project owner)
Haven't found the right way yet. I'm considering: https://github.com/MycroftAI/mimic3
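If mimic3 works out, driving it from a script is simple enough. A sketch that shells out to the mimic3 CLI (the voice name here is an assumption; check what's actually installed):

```python
import subprocess

def speak_to_file(text: str, wav_path: str) -> None:
    """Render text to a WAV file with the mimic3 CLI.

    Assumes mimic3 is installed and on PATH; it writes audio to
    stdout, so we redirect that into a file.
    """
    with open(wav_path, "wb") as out:
        subprocess.run(
            ["mimic3", "--voice", "en_US/vctk_low", text],  # voice is an assumption
            stdout=out,
            check=True,
        )

# speak_to_file("Section 3 introduces the attention mechanism.", "section3.wav")
```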
You can also have it read into an audio file if so desired, which can be listened to later.
[1] https://f-droid.org/en/packages/com.foobnix.pro.pdf.reader/
For example: you are listening to the paper with some text-to-speech model and it stumbles upon a code snippet or table or graph... what should happen next? Should the model skip it, or prompt you to look at the graph or table? Or should you write software that tries to interpret graphs and other non-text content?
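My guess is the cheap answer is good enough for a first version: detect the non-prose block and speak a short cue (plus the caption, if one is easy to find) instead of reading it raw. A sketch over LaTeX source; the cue wording and the environment list are my own assumptions:

```python
import re

# Environments that don't read well aloud, mapped to a spoken cue.
CUES = {
    "table": "There is a table here; see the paper for the numbers.",
    "figure": "There is a figure here; see the paper for the graphic.",
    "lstlisting": "There is a code listing here; see the paper for the code.",
}

def cue_nonprose(latex: str) -> str:
    """Replace each non-prose environment with a short spoken cue,
    keeping its caption when one is present."""
    for env, cue in CUES.items():
        def replace(match: re.Match, cue=cue) -> str:
            caption = re.search(r"\\caption\{(.*?)\}", match.group(0), re.DOTALL)
            if caption:
                return f"{cue} Its caption reads: {caption.group(1)}"
            return cue
        latex = re.sub(
            rf"\\begin\{{{env}\}}.*?\\end\{{{env}\}}",
            replace, latex, flags=re.DOTALL)
    return latex
```

Skipping entirely loses the caption, which is often the one part worth hearing, so the cue-plus-caption middle ground seems like the right default before reaching for anything that "interprets" graphs.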
I really do wish GitHub would prompt its repo owners with "did you forget a license?", but I also wish it would prompt them to add "topics" to enhance discovery, and I guess I'll just keep holding my breath on both.
Edit: looks like they support a few traditional publishers as well.