https://phys.org/news/2010-12-air-playstation-3s-supercomput...
I have yet to see any benefit to society from GPT's improvements, but I do see the internet quickly becoming more and more unusable due to the inundation of machine-generated spam on nearly every communications platform.
A) silently dispose of it and hope nobody else ever makes the mistake of creating it.
or
B) keep it in the freezer and hold a press release about how you made it but it's too dangerous to share any details.
The model in question is Microsoft's VALL-E 2, without the clickbait headline.
> Our experiments on the LibriSpeech and VCTK datasets show that VALL-E 2 surpasses previous systems in speech robustness, naturalness, and speaker similarity. It is the first of its kind to reach human parity on these benchmarks. Moreover, VALL-E 2 consistently synthesizes high-quality speech, even for sentences that are traditionally challenging due to their complexity or repetitive phrases
https://www.microsoft.com/en-us/research/project/vall-e-x/va...
> This page is for research demonstration purposes only. Currently, we have no plans to incorporate VALL-E 2 into a product or expand access to the public.
If you go back and look at older cities they almost all have the same pattern: walls and gates.
I figure that now that the Internet is a badlands roamed by robots pretending to be people as they attempt to rob you for their masters, we'll see the formation of cryptographically secured enclaves. Maybe? Who knows?
At this point I'm pretty much going to restrict online communication to encrypted authenticated channels. (Heck, I should sign this comment, eh? If only as a performance?) Hopefully it remains difficult to build an AI that can guess large numbers. ;P
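To make "sign this comment" concrete, here's a minimal sketch of authenticating a message, using Python's standard library. This uses an HMAC tag as a simplified stand-in for a real public-key signature (a real scheme would use something like Ed25519 so anyone can verify without holding the secret key); the key and message here are purely illustrative.

```python
import hashlib
import hmac
import secrets

# Illustrative only: a secret key held by the comment's author.
key = secrets.token_bytes(32)
comment = b"Hopefully it remains difficult to build an AI that can guess large numbers."

# Tag the comment so tampering is detectable by anyone holding the key.
tag = hmac.new(key, comment, hashlib.sha256).hexdigest()

def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

print(verify(key, comment, tag))          # True: genuine comment
print(verify(key, b"tampered text", tag)) # False: forged comment
```

The point isn't this particular primitive; it's that once bots flood every channel, cheap-to-verify proofs of authorship are the walls and gates of the analogy above.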
> so 2024 will be the last human election, and what we mean by that is not that it's just going to be an AI running as president in 2028, but that (although maybe, um, it will be, you know, humans as figureheads) it'll be whoever has the greater compute power that will win
We've already seen AI voices influencing elections in India: https://restofworld.org/2023/ai-voice-modi-singing-politics/
> AI-generated songs, like the ones featuring Prime Minister Narendra Modi, are gaining traction ahead of India’s upcoming elections. [...] Earlier this month, an Instagram video of Modi “singing” a Telugu love song had over 2 million views, while a similar Tamil-language song had more than 2.7 million. A Punjabi song racked up more than 17 million views.
1. “… but if you and your big rich company were to acquihire us you'd get access…” — though as this is MS it probably isn't that!
2. That it only works as well as claimed in specific circumstances, or has significant flaws, so they don't want people looking at it too closely just yet. The wording "in benchmarks used by Microsoft" might point to this.
3. That a competitor is getting close to releasing something similar, or has just done so, and they don't want to look like they were in second place or too far behind.
Social solutions take too long to deploy against the tech, but tech solutions are fallible. To be defeatist about it: there's going to be a golden window of time here where some really nasty scams go unimpeded.
It's almost as if a consumer protection agency should be created and funded to protect consumers.
'Hi, sorry to call you, I'm Cindy and I'm from your insurance. I'm calling regarding your car crash ...'.
Do you think you could give a recording of a minute of someone talking to a talented impressionist and they could impersonate that person to some degree? It doesn’t seem that far fetched to me.
Getting high-quality audio of an arbitrary private citizen via public means isn't that easy, especially for folks like me who don't post video on public social media, use automated call screening, and never say a word until the caller has been vetted.
> "This page is for research demonstration purposes only. Currently, we have no plans to incorporate VALL-E 2 into a product or expand access to the public."
A speech generator can help rob 1000 banks.
Truly irresponsible
I use TTS to listen to articles and stories that don't have access to an audiobook narrator. I've used some of the voices based on MBROLA tech, but those can grate after a while.
The more recent voice models are a lot higher quality and emotive (without the jarring pitch transitions of things like Cepstral) so are better to listen to. However, the recent models can clip/skip text, have prolonged silence, have long/warped/garbled words, etc. that make them harder to use longer term.
It's like we're stuck in some movie that came out in 1994[0], or something. Except, in this version, everything is gonna blow up sooner or later, anyway. Might as well profit from it along the way, right?
Le sigh.
---
I can't even think of non-malicious uses that are anything more than novelties or small conveniences. Meanwhile, the malicious use cases are innumerable.
In a just world building this would be a severe felony, punished with prison and destruction of all of the direct and indirect source material.
On the one hand, I would love this kind of tech to be available for entertainment purposes. An RPG with convincing NPCs that are able to provide a novel experience for every player? Sounds great.
On the other: this is fraught with ethical problems, not to mention an ideal tool for fraud. At worst, it could be used as a weapon for total asymmetric warfare on concepts like media integrity, and as an ideal tool for character assassination, disinformation, propaganda, etc.
I would happily welcome a world where this stuff is nerfed across the board, where video games and porn are just chock full of AI voice-acting artifacts. We'll adjust and accept that as just a part of the experience, as we have with low-fidelity media of the past. But my more cynical side tells me that's not what people in power are concerned about.
Perfecting the tech for widespread use has trade-offs: the need for caller ID, the ease of slander until trust in voice uniqueness recalibrates. All of that is going to change soon anyway, but giving only rich/bad actors the tech at first has its own set of trade-offs. Head-in-the-sand is the irresponsible way.