undefined | Better HN

0 pointsLegend24402y ago0 comments

They don't need that - they already have enough data to generate plausibly human voices that don't sound like anyone in particular.

Voice cloning is a special case, these models are equally good at making new voices.

0 comments

techdragon2y ago

I’ve found it’s not actually as easy to get this stuff to sound different to the specific someone it’s trained on.

gwern2y ago

Don't expect that to last more than a year or two, assuming it's even still a problem for the best voice-generation AIs. Generating high-quality is the hard problem; generating specific high-quality samples is, by comparison, a lot easier.

Remember when Stable Diffusion was released a year ago and one of the big artist copes was "sure, it can generate random images, but it'll never be able to generate the same character repeatedly!" They were already wrong because Textual Inversion and DreamBooth were already published, and soon enough, ported to SD and now people could dump out thousands of images of the same character in the same consistent style etc (and did).

techdragon2y ago

The issue is more that I can’t get the equivalent of a slider control to adjust one or more properties of the voice from the AI in real time… like a vocal fry slider to use an example of something most people are capable of deliberately doing when they want to… but the currently available models are pre-trained to sound like the average/median of one specific person (or character) and while I imagine tools will improve to control and customise the training of the models to customise this vocal output I don’t see a clear path from the current model architectural design to one where I can freely control the stylistic expression aspects of the vocal output without loading in a completely different set of model data trained for that new desired output.

1 more reply

j / k navigate · click thread line to collapse

0 comments

techdragon2y ago

I’ve found it’s not actually as easy to get this stuff to sound different to the specific someone it’s trained on.

gwern2y ago

techdragon2y ago

1 more reply

j / k navigate · click thread line to collapse