Like with robovoiced videos on YT reading some scraped content.
"A ship that can navigate space without a computer on board can be constructed in one-fifth the time and at one-tenth the expense of a computer-laden ship. We could build fleets five time, ten times, as great as Deneb could if we could but eliminate the computer."
But this of course is nonsensical with current technology, same as it would be nonsensical to go back to manual agriculture or manual manufacturing - we can achieve so much more with our tools than without them. And the way I see it, as long as we have an incentive to advance the state of the art, people will have an incentive (and curiosity) to learn how we got where we are, so that they could push the envelope.
[0] https://ia803006.us.archive.org/6/items/TheFeelingOfPower/Th...
I wonder if that was inspiration for Wondercraft.
Here's an example segment, demonstrating an extra feature where they can call an expert to weigh in on whatever they are talking about: https://soundcloud.com/bemmu/19animals
Of course if you listen to podcasts because you like the parasocial aspect or the celebrity interviews, then yeah... Not really a point.
There are a few podcasts for which I'd have greater interest if the narration were by someone other than the current host....
There are also services such as the National Library for the Blind (UK) and BARD (US) which provide books, including a large number of audiobooks, for the blind. Automated text-to-speech would make a vastly larger library available, particularly of very recent publications, niche publications, and long-since-out-of-print books. Such services do take requests, but tend to focus on works published within the past five years.
The problem then is filtering/searching that massive catalog and weeding the useless stuff out.
I actually quite often wish I could access a condensed version of a few podcasts in text form. Sometimes there's little nuggets of information dropped by hosts or guests that don't make it onto any other medium.
When I do intentionally listen to podcasts (i.e. as opposed to having to, because that's the only available form of some content), I do so because I enjoy the style of the conversation itself.
I believe (but then again I also want to believe, so make of this what you will) that I'd be holding the AI to only the same standards I hold humans to. It's not like I'm trying to build a relationship with the speaker in either case.
I listen to a ton of podcasts in different niches: Theo Von, all in pod, masters of scale, the daily, some true crime stuff, etc
I found the AI briefing room which is a quick summary done by and read by ai. It’s not as good as a human but I’m completely used to it now.
I am thinking of summarizing the business related podcasts I listen to for myself so I can consume more content in less time.
I wish all podcasts had a shorter ai version
If I get to control it and I can have it draw in enough interesting angles into something, I think it could be fun. I wouldn't replace one of my favorites, but I'd gladly use something that could generate creative new content.
Why not follow bots on YouTube and Spotify?
- LLM-driven back and forth with the paper as context
- Text-to-speech
Pricing for high-quality text-to-speech with Google's Studio voices runs at USD 160.00 per 1M characters. A 10-minute recording at an average 130 WPM is 1,300 words, which at 5 characters per word comes to about 6,500 characters, so we can estimate an audio cost of roughly $1. LLM cost is probably about the same, given the research paper processing and conversation.
So it only costs about $2-3 per 10-minute recording. Wild.
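The back-of-envelope math above can be sketched in a few lines (the $160/1M-characters figure is the Studio-voice price quoted above; check current pricing before relying on it):

```python
# Rough cost estimate for one generated episode, using the numbers above:
# 130 words per minute, ~5 characters per word, $160 per 1M characters.
TTS_PRICE_PER_CHAR = 160.00 / 1_000_000  # USD, Studio-voice tier

def tts_cost(minutes: float, wpm: int = 130, chars_per_word: int = 5) -> float:
    """Estimated TTS cost in USD for a recording of the given length."""
    chars = minutes * wpm * chars_per_word
    return chars * TTS_PRICE_PER_CHAR

audio = tts_cost(10)   # ~6,500 characters -> about $1
total = audio * 2      # assuming LLM processing costs about the same
```

Doubling the audio cost to account for the LLM is of course a guess; it scales with paper length and how many conversational turns you generate.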
[Attention is All You Need - 1:07]
> Voice A: How did the "Attention is All You Need" paper address this sequential processing bottleneck of RNNs?
> Voice B: So, instead of going step-by-step like RNNs, they introduced a model called the Transformer - hence the title.
What title? The paper is entitled "Attention is All You Need".
People are fooling themselves. These are stochastic parrots cosplaying as academics.
"[Transformers .. replaced...] ...the suspects from the time.. recurrent networks, convolution, GRUs".
GRU has no place being mentioned here. It's in effect hallucinated, even if not strictly wrong: a misdirecting piece of information not in the original source.
GRU gives a Ben Kenobi vibe: it died out about when this paper was published.
But it's also kind of misinforming the listener to state this. GRUs are a subtype of recurrent networks. It's a small thing, but no actual professor would mention GRUs here I think. It's not relevant (GRUs are not mentioned in the paper itself) and mentioning RNNs and GRUs is a bit like saying "Yes, uses both Ice and Frozen Water"
So while the conversational style gives me podcast-keep-my-attention vibes, I feel an uncanny-valley fear. Yes, each small weird decision is not going to rock my world. But it's slightly distorting the importance. Yes, a human could list GRUs just the same, and probably most professors would make that mistake, or others.
But it just feels like this is professing to be the next, all-there thing. I don't see how you can do that and launch this while knowing it produces content like that. At least with humans, you can learn from 5 humans and take the overall picture - if only one mentions GRU, you move on. If there's one AI source, or AI sources that all tend to make the same mistake (e.g. continuing to list an inappropriate item to ensure conversational style), that's very different.
I don't like it.
[1] https://www.thisamericanlife.org/803/greetings-people-of-ear...
> These are stochastic parrots cosplaying as academics.
LOL
"The transformer processes the entire sequence all at once by using something called self attention"
There are hacks everywhere but humans lying sometimes have implications (libel/slander) that we can control. Computers are thought of in general society as devoid of bias and "smart" so if they lie people are more likely to listen.
It would be good to lead off with a disclaimer.
In this regard, LLMs are imperfect like ourselves, just to a different extent.
In other words: it's not summarising the paper in a clever way, it is summarising all the discussions that have been made about it.
What I'm thinking of is that I'd input a pdf, and the AI will do a bit of preprocessing leading to the creation of learning outcomes, talking points, visual aids and comprehension questions for me; and then once it's ready, will begin to lecture to me about the topic, allowing me to interrupt it at any point with my questions, after which it'll resume the lecture while adapting to any new context from my interruptions.
Are we there yet?
Sign up and I'll let you in very soon.
And before you know it, there is a story of David Cameron diddling a pig's head in his youth and now our deceased are being brought back to life.
Charlie Brooker was ahead of us all.
I wish Google would make these experiments more well-known!
The reading is very natural overall, though sometimes the emphasis is a bit off. What catches my ear is when Word A in a sentence receives stronger stress than Word B, but the longer context suggests that actually it should be Word B with the greater emphasis. An inexperienced human reader might miss that as well, but a professional narrator who is thinking about the overall meaning would get it right.
I prefer professional human narration when it is available, but the Reader app’s ability to handle nearly any text is wonderful. AI-read narration can have another advantage: clarity of enunciation. Even the most skillful human narrator sometimes slurs a consonant or two; the ElevenLabs voices render speech sounds distinctly while still sounding natural.
1. Take a science book. I used one Einstein loved as a kid, in German. But I can also use Asimov in English. Or anything else. We’ll handle language and outdated information on the LLM level.
2. Extract the core ideas and narrative with an LLM and rewrite it into a conversation, say, between a curious 7 year old girl and her dad. We can take into account what my kids are interested in, what they already know, facts from their own life, comparisons with their surroundings etc. to make it more engaging.
3. Turn it into audio using Text-to-Speech (multiple voices).
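The three steps above can be sketched roughly as below. Only the prompt assembly is concrete; `call_llm` and `synthesize` are hypothetical placeholders for whatever LLM and multi-voice TTS provider you plug in:

```python
def build_prompt(book_excerpt: str, child_age: int, interests: list[str]) -> str:
    """Step 2: ask the LLM to rewrite an excerpt as an engaging dialogue."""
    return (
        f"Rewrite the following passage as a conversation between a curious "
        f"{child_age}-year-old girl and her dad. Translate to English if "
        f"needed, update outdated facts, and weave in comparisons to: "
        f"{', '.join(interests)}.\n\n---\n{book_excerpt}"
    )

def call_llm(prompt: str) -> str:
    # Hypothetical: swap in your LLM client of choice.
    raise NotImplementedError("plug in an LLM client here")

def synthesize(dialogue: str, voices: list[str]) -> bytes:
    # Hypothetical: swap in a multi-voice TTS client.
    raise NotImplementedError("plug in a TTS client here")

def book_to_audio(excerpt: str, age: int, interests: list[str]) -> bytes:
    dialogue = call_llm(build_prompt(excerpt, age, interests))  # step 2
    return synthesize(dialogue, voices=["girl", "dad"])         # step 3
```

Keeping the prompt builder separate makes it easy to tune per-child (age, interests, known facts) without touching the rest of the pipeline.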
Physical ones, I scan. Cutting the spine is easiest. But today you can also just take pics with your phone.
Many retailers also sell EPUB. Which is just HTML.
Obviously, that’s all for private consumption only. (Unless you’re OpenAI I guess. :-P)
I have a project idea already to use arxiv RSS API to fetch interesting papers based on keywords (or some LLM summary) and then pass it to something like illuminate and then you have a listening queue to follow latest in the field. Though there will be some problems with formatting but then you could just open the pdf to see the plots and equations.
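The fetch step of that idea can be sketched against the public arXiv API (which returns an Atom feed); the parsing is split out so it can be tested offline. The keyword-to-queue plumbing and the Illuminate hand-off are left out:

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by arXiv feeds

def parse_entries(atom_xml: str) -> list[dict]:
    """Extract title and abstract from an arXiv Atom feed document."""
    root = ET.fromstring(atom_xml)
    return [
        {
            "title": entry.findtext(f"{ATOM}title", "").strip(),
            "abstract": entry.findtext(f"{ATOM}summary", "").strip(),
        }
        for entry in root.iter(f"{ATOM}entry")
    ]

def fetch_papers(keyword: str, max_results: int = 5) -> list[dict]:
    """Query the arXiv API for papers matching a keyword."""
    query = urllib.parse.urlencode(
        {"search_query": f"all:{keyword}", "max_results": max_results}
    )
    url = f"http://export.arxiv.org/api/query?{query}"
    with urllib.request.urlopen(url) as resp:
        return parse_entries(resp.read().decode())
```

The abstracts returned here would then go to the summarizer/TTS stage; the plots and equations would indeed still need the PDF.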
Please do not replace humanity with a faint imitation of what makes us human, actual spontaneity.
If you produce AI content, don't emulate small talk and quirky side jabs. It's pathetic.
This is just more hot garbage on top of a pile of junk.
I imagine a brighter future where we can choose to turn that off and remove it from search, like the low quality content it is. I would rather read imperfect content from human beings, coming from the source, than perfectly redigested AI clown vomit.
Note: I use AI tools every day. I have nothing against AI generated content, I have everything against AI advancements in human replacement, the "pretend" part. Classifying and returning knowledge is great. But I really dislike the trend of making AI more "human like", to the point of deceiving, such as pretending small talk and perfect human voice synthesis.
OTOH, i think the AI generated stuff should be clearly marked as such so there is no pretending.
But yeah - like electronic instruments, AI will take away the blue collar creative jobs, leaving behind a lot more noise and an even greater economic imbalance.
>all I can feel is sadness and how cringe it is.
Hm, really? I came to the opposite conclusion. I explained this to a friend who can see very little, and usually relies on audio to experience a lot of the world and written content - it is especially hard because a lot of written content isn't available in audio form or isn't talked about.
He was pretty excited about it, and so am I. Maybe it's not the use case for you, and that's fine, but going "this is pathetic, no one is using it, le cringe" is a bit far.
"Illuminate is an experimental technology that uses AI to adapt content to your learning preferences. Illuminate generates audio with two AI-generated voices in conversation, discussing the key points of select papers. Illuminate is currently optimized for published computer science academic papers.
As an experimental product, the generated audio with two AI-generated voices in conversation may not always perfectly capture the nuances of the original research papers. Please be aware that there may be occasional errors or inconsistencies and that we are continually iterating to improve the user experience."
https://cloud.google.com/text-to-speech/docs/voice-types#cha...
Looks like you can generate from Website URLs if you add them as sources to your notebook, as well as Slides, Docs, PDFs etc. Anything NotebookLM supports.
Does anyone know how the summary was generated? (text summarization, I suppose?) Is there a bias towards "podcast-style discussion"? Not that I'm complaining about it - just that I found it helpful.
This only seems like it would be useful for spammers trying to game platforms, which is silly because spam is probably the number one thing bringing down the quality of Google's own products and services.
It also tells us something about humans, because it really does feel more engaging having two voices discussing a subject than simple text-to-speech, even though the information density is lower.
LLMs have "hacked" this channel, and can participate in a 1:1 conversation with a human (via text chat).
With good text <--> speech, machines can participate in a 1:1 oral conversation with a human.
I'm with you: this is hella scary and creepy.
[0] Walter J Ong: "Orality and Literacy".
Limiting choice to frivolous voices is really testing the waters for how people will respond to fully acted voice gen from them; they want that trust from the creative guild first. But for users who run into this rigid stuff it's going to be like fake generated grandma pics in your google recipe modals.
> Illuminate generates audio with two AI-generated voices in conversation, discussing the key points of select papers.
This is a very useful tool. I will star it and wait until Piper supports macOS in the future.
Is this supposed to be a good thing that we want to accelerate (e/acc) towards?
English is particularly bad to read aloud because, like the programming language Fortran, it is based on immutable tokens. If you want tonal variety, you have to understand the content.
Some other languages modify the tokens themselves, so just one word can be pompous, comical, uneducated etc.
I would like to send a text and then get back a podcast dialog between two people.
[1] https://illuminate.google.com/home?pli=1&play=SKUdNc_PPLL8
More of a tech demo than anything else.
What's wild about this is that the voices seem way better than GCP's TTS that I've seen. Any way to get those voices as an API?
Also it's weird that they focus only on AI papers in the demo, and not more interesting social stuff, like environment protection, climate change, etc
If it's just used for generating low quality robo content like we see on TikTok and YouTube then it's not so interesting.
Why would one prefer this AI conversation to the actual source?
Can these be agents and allow the listener to ask questions / interact?
1) it prepares me for the real studying. by being exposed to the gist of the material before actual studying, im very confident that the subsequent real study session would be more effective
2) i can brush up easily on key concepts, if im unable to sit properly, eg while commuting. but even if i were, a math textbook can be too dense for this purpose, and i often just want to refresh my memory on key concepts. and often im tired of _reading_ symbols or words, that’s when id prefer to actually _listen_, in a way, using a muscle that’s not tired
3) if im struggling with something, i can play this 5min chapter explanation multiple times a day throughout the week, while doing stuff, and engaging with it in a casual way. i think this would “soften” the struggle tremendously, and increase the chances of grasping the thing next time i tackle it
also id like a “temperature” knob, that i could tweak for how much in detail i want it to go
In other words: I suspect that the output is heavily derivative from online discussions, and not based on the papers.
Of course, the real proof would be to see the output for entirely new papers.
It shouldn't be surprising that a LLM is able to understand a paper, just upload one to Claude 3.5 Sonnet.
For all the other papers, assuming they were impactful, they must have been referred by others, highlighting what their contribution is, what is controversial, etc.
In other words: the LLM doesn't have to "understand" the paper; it can simply parrot what others have been saying/writing about it.
(For example: a podcast about Google Illuminate could use our brief exchange to discuss the possible merits of this technology.)
Source?
Definitely one of the coolest things I have seen an LLM do.
I've seen YouTubers provide tutorials on auto-creating YouTube videos and podcast episodes on niche scientific subjects, on how to build seemingly-reputable brands with zero ongoing effort. That is all totally novel. Being able to lie or be wrong before is orthogonal to the real issue: scale.
This tech can allow "content creators" to spin hundreds of podcasts with garbage simultaneously, saturating the search space with nonsense. Similar to what is already being done with text everywhere.
What makes one skeptical of conspiracist ideas is access and visibility to more enlightened content. If that access gets disrupted (it already has been), many people will not be able to tell the difference, especially future generations.
Building trust with your users is important, Google.
> "Bad enough it has to talk, does it need fake vocal tics...?" - Gilfoyle
Found it: https://youtu.be/APlmfdbjmUY?si=b4-rgkxeXigU_un_&t=179
I saw they launched NotebookLM Audio Overview today: https://blog.google/technology/ai/notebooklm-audio-overviews...
So what the heck is illuminate and why would they simultaneously launch a competing product?
No matter how great the idea, it's hard to stay excited for more than a few microseconds at the sight of the word "Google". I can already hear the gravediggers' shovels preparing a plot in the Google graveyard, and the sobs of the people who built their lives, workflows, even jobs and businesses around something that will be tossed aside as soon as it stops being someone's pet play-thing at Google.
A strange ambivalent feeling of hope already tarnished with tragedy.