Ask HN: My wife might lose the ability to speak in 3 weeks – how to prepare?

855 points | tech4all | 5y ago | 217 comments

My wife will be undergoing significant oral surgery in a few weeks and there is a SMALL chance she may lose the ability to speak. I'd like to prepare, just in case, to have technology to reproduce her voice from keyboard or other input.

My ideal would be an open source "deepfake toolkit" that allows me to provide pre-recorded samples of her speech and then TTS in her voice. Unfortunately most articles and tools I'm finding are anti-deepfake. Any recommendations?

Fallback would be recording her speaking "phonetic pangrams" and then using her pre-recorded phonemes to recreate speech that sounds like her. I feel like the deepfake toolkit is the way to go. Appreciate any recommendations... There must be open source tools for this??

audiohermit5y ago

Hey, speech ML researcher here. Make sure you have different recordings of different contexts. fifteen.ai's best TTS voices use ~90 min of utterances, some separated by emotion. If you're having her read a text, make sure it's engaging--we do a lot of unconscious voicing when reading aloud. Tbh, if she has a non-Anglophone accent, you're going to need more because the training data is biased towards UK/US speakers.

If you want to read up on the basics, check out the SV2TTS paper: https://arxiv.org/pdf/1806.04558.pdf Basically you use a speaker encoding to condition the TTS output. This paper/idea is used all over, even for speech-to-speech translation, with small changes.

There are a few open-source implementations, but most are outdated--the better ones are kept private for business or privacy reasons.

There's a lot of work on non-parallel transfer learning (aka subjects are saying different things) so TTS has progressed rapidly and most public implementations lag a bit behind the research. If you're willing to grok speech processing, I'd start with NeMo for overall simplicity--don't get distracted by Kaldi.

Edit: Important note! Utterances are usually clipped of silence before/after so take that into account when analyzing corpus lengths. The quality of each utterance is much much more important than the length--fifteen.ai's TTS is so good primarily because they got fans of each character to collect the data.
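To make the point about clipped silence concrete, here is a minimal sketch (not taken from any toolkit mentioned in this thread) that measures how long each utterance is once leading and trailing silence is ignored. It assumes 16-bit PCM WAV files and uses a fixed amplitude threshold of 500, which is an arbitrary illustrative value; real pipelines usually use energy-based or model-based voice activity detection instead.

```python
import array
import sys
import wave


def read_mono_16bit(path):
    """Return (samples, sample_rate) from a 16-bit PCM WAV file, first channel only."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples[::channels], rate


def trimmed_seconds(samples, rate, threshold=500):
    """Seconds between the first and last sample louder than the threshold."""
    loud = [i for i, s in enumerate(samples) if abs(s) > threshold]
    if not loud:
        return 0.0
    return (loud[-1] - loud[0] + 1) / rate


def corpus_seconds(paths, threshold=500):
    """Return (raw_seconds, trimmed_seconds) summed over all files."""
    raw = trimmed = 0.0
    for path in paths:
        samples, rate = read_mono_16bit(path)
        raw += len(samples) / rate
        trimmed += trimmed_seconds(samples, rate, threshold)
    return raw, trimmed
```

Comparing the two totals gives a rough idea of how much of a recording session is usable speech rather than pauses around each line.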

grogenaut5y ago

I came here to say this. My brother has a PhD in chemistry and no coding experience. He was able to create a voice model of himself using basic nvidia example generators in a week. My dad lost his voice and it would have been very nice to have a TTS that sounded much closer to him. I personally think it would be worth it to have that database.

But obviously also attend to the human matters as well, eg spend time.

audiohermit5y ago

I work in pathological speech processing/synthesis so I'm unfortunately familiar with your father's position. It really sucks that these people didn't know that archiving their voice would've been useful. I hear snippets that people manage to glean from family videos right after listening to their current voices and it makes me really sad.

On the upside, your father can choose any celebrity he wants to voice him! Tons of celeb data is publicly available (VoxCeleb 1 & 2).

clan5y ago

Are there any simple howtos anywhere that describe the process in as simple terms as possible? Without having to know the cool toolkits du jour.

Something like:

- Download these texts
- Record in WAV at least 48 kHz
- Record each line in a separate file.
- Do 3 takes of each line: flat, happy, despair
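A checklist like the one above can be verified mechanically after a recording session. The following is a rough sketch using only Python's standard `wave` module; the 48 kHz minimum comes from the list, while the function names and the folder layout (one `.wav` file per line) are assumptions made for illustration.

```python
import wave
from pathlib import Path


def check_recording(path, min_rate=48000):
    """Return a list of problems found in one WAV file (empty list means it passed)."""
    problems = []
    try:
        with wave.open(str(path), "rb") as wav:
            if wav.getframerate() < min_rate:
                problems.append(
                    f"sample rate {wav.getframerate()} Hz is below {min_rate} Hz"
                )
            if wav.getnframes() == 0:
                problems.append("file contains no audio frames")
    except (wave.Error, EOFError) as exc:
        problems.append(f"not a readable PCM WAV file: {exc}")
    return problems


def check_folder(folder, min_rate=48000):
    """Map each .wav file name in the folder to its list of problems."""
    return {
        path.name: check_recording(path, min_rate)
        for path in sorted(Path(folder).glob("*.wav"))
    }
```

Running `check_folder("takes")` on a hypothetical `takes` directory would return a dictionary where any non-empty list points at a file worth re-recording.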

Maybe even a minimal set and a full set depending on how much effort you are willing to put in.

A plain description on how to capture a raw base which within reason and technology could be used as a baseline for the most common toolkits.

I have myself looked into this (for fun) but I felt I needed a very good understanding of the toolkits before even starting to feed in data. And for my admittedly unimportant use it seemed a huge investment to create a corpus I was not even confident would work. I ended up taking the low road and used an existing voice.

vervez5y ago

Is Morgan Freeman the most used celebrity?

grogenaut5y ago

Unfortunately dad passed 21 years ago. But the options now are much better. Just projecting my past experiences on the obvious Delta.

dheera5y ago

Which generator works the best, qualitatively? I come from a vision/ML background but haven't played with speech at all, so it's completely new to me, and wondering what the state of the art is.

I've been wanting to create a TTS of myself so I can take phone calls using headphones and type back what I want to say so that I don't have to yell private information out loud in public locations. Would be nice if during non-COVID times I could sit in a train seat and take phone calls completely silently.

audiohermit5y ago

Much of the work in speech synthesis has been about closing the gap in vocoders, which take a generated spectrogram and output a waveform. There's a clear gap between practical online implementations and computational behemoths like WaveNet. As you implied, it's hard to judge quantitatively which result is better, so papers usually rely on listener surveys.

Here's a recent work that has a good comparison of some vocoders: https://wavenode-example.github.io/

Edit: WaveRNN struck a good balance for me in the past but is not shown in the link. Tons of new work coming out though!

tombert5y ago

This sounds pretty cool (your brother making the voice model, not your dad losing the voice)...do you have a link to this example? I would love to play with this.

lunixbochs5y ago

I have an open source web service for rapidly recording lots of text prompts to flac: https://speech.talonvoice.com (right now the live site prompts for single words because I’m trying to build single word training data, but the prompts can be any length)

You can set it up yourself with a bit of Python knowledge from this branch: https://github.com/talonvoice/noise/tree/speech-dataset

There are keyboard shortcuts - up/down/space to move through the list and record quickly.

If you want to use it on arbitrary text prompts, you can modify this function to return each line from a text file: https://github.com/talonvoice/noise/blob/speech-dataset/serv...

If you use this, before recording too much, do some test recordings and make sure they sound ok. Web audio can be unreliable in some browsers.

The uploaded files are named after the short name, so make sure you can correspond the short name with the original text prompts, eg with string_to_shortname().
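One way to keep short names and prompts in correspondence is to write a manifest before recording anything. The sketch below is not the repository's code: `make_short_name` is a hypothetical stand-in and does not reproduce whatever `string_to_shortname()` actually does, and the TSV layout is likewise an assumption.

```python
import csv
import hashlib
import re


def make_short_name(prompt, max_words=4):
    """Hypothetical short name: a few slugged words plus a hash suffix for uniqueness."""
    words = re.findall(r"[a-z0-9]+", prompt.lower())[:max_words]
    digest = hashlib.sha1(prompt.encode("utf-8")).hexdigest()[:8]
    return "_".join(words + [digest])


def load_prompts(path):
    """Return one prompt per non-empty line of a UTF-8 text file."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def write_manifest(prompts, manifest_path):
    """Write a tab-separated table mapping each short name to its full prompt."""
    with open(manifest_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["short_name", "prompt"])
        for prompt in prompts:
            writer.writerow([make_short_name(prompt), prompt])
```

With a manifest like this saved next to the audio, each uploaded file can be matched back to its text later, whichever toolkit ends up consuming the data.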

If you aren’t easily able to do this yourself, I’d be happy to spin up an instance of it for you with text prompts of your choosing.

exikyut5y ago

Somewhat OT question: after taking a quick look at this I stumbled on the eye-tracking video you made using pops to click... and I'm curious, can eye trackers not detect and report blinking?

Also, I noted the VLC demo says it doesn't use DNS! That's awesome...

lunixbochs5y ago

I can detect blinking, yes. However your eyelids have very small muscles that are not meant to be consciously controlled all day and I don’t recommend straining them. Your eyelids twitching from muscle strain is rather uncomfortable (from experience)

The VLC demo was using macos speech recognition. In the beta now I’m shipping my own engine+trained models based on Facebook’s wav2letter, which is going pretty well.

Veedrac5y ago

Also buy a (half) decent mic! They're much cheaper than you might expect.

core-questions5y ago

Seconding this, it's worth a hundred or so for at least a Yeti or something, considering it's not like you'll get another chance to do this.

gknoy5y ago

Also, in a similar vein as testing backups, make sure to test + listen to the recorded audio, if you can.
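Listening is the real test, but a quick automated pass can catch the worst failures first. This is a minimal sketch assuming 16-bit PCM WAV files; the two thresholds (below 1% of full scale counts as silent, above 99% counts as clipped) are illustrative guesses, not established standards.

```python
import array
import sys
import wave


def peak_fraction(path):
    """Peak absolute sample value as a fraction of 16-bit full scale (0.0 to 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    if not samples:
        return 0.0
    return max(abs(s) for s in samples) / 32768


def classify(path, silent_below=0.01, clipped_above=0.99):
    """Label a recording as 'silent', 'clipped', or 'ok' based on its peak level."""
    peak = peak_fraction(path)
    if peak < silent_below:
        return "silent"
    if peak > clipped_above:
        return "clipped"
    return "ok"
```

Anything flagged as silent or clipped is a candidate for re-recording while that is still possible; files labeled ok still deserve a listen.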

ChrisGammell5y ago

I have about 500 hours of high quality, channel isolated (separate from the person I was speaking to) audio. It comes from my podcast that I have done for many years. It's probably closer to 75-100 hours audio of me actually speaking, since I am more the interviewer.

Is that something that would be useful to a researcher in any context? I am intrigued by the idea of having my voice preserved (you know, ego), but also am happy to donate the sound files if they would help researchers in any way for datasets.

If so: chris@theamphour.com

lunixbochs5y ago

Do you have transcripts, even just for some of the episodes? Unsupervised learning is possible but more difficult.

In general, yes, this is probably useful data in some way for speech recognition or TTS.

oh_sigh5y ago

Make sure to get a recording of her true, honest laughter too.

ta15481772315y ago

Depending on your desired level of hedging...

I would say also consider recording a variety of honest utterances of all kinds, situations, and emotions. Anger outbursts, apathetic grunts, sexual even if you so desire (hence throwaway account)... Please don't be offended by this, just thinking of all scenarios for you to decide for yourself...

kevmoo15y ago

Came here to post this. Glad someone else thought of this first!

tech4allOP5y ago

Thanks so much for the resources and the well thought out reply!

daanzu5y ago

I wrote a simple little Python GUI app to record training audio. Given a text file containing prompts, it will choose a random selection and ordering of them, display them to be dictated by the user, and record the dictation audio and metadata to a .wav file and recorder.tsv file respectively. You can select a previous recording to play it back, delete it, and/or re-record it. It comes with a few selections of sentences designed to cover a broad diverse range of English (Arctic, TIMIT). Pretty simple and no-nonsense.

https://github.com/daanzu/speech-training-recorder

Originally intended for recording data for training speech recognition models [0], it should work just as well for recording to be used for speech synthesis.

[0] https://github.com/daanzu/kaldi-active-grammar

lunixbochs5y ago

Did you figure out why half of Shervin’s audio was empty? I would hesitate to recommend this if there’s still a chance half of the data isn’t usable after recording.

kemiller20025y ago

My mom lost her ability to speak, and what you are going to find is that your life and how you interact with everyone will have to change. Human verbal communication is very fast. She will find it difficult to be part of normal conversations. Without lots of help, she will start to fade into the background of conversations, because she can't keep up. You will have to help her be a part of things. It will be a depressing experience for her, and you will have to help her. People will look at her differently like she is mentally handicapped. (I know she won't be, but people will assume that she is even unconsciously). I recommend finding her a therapist if she has to go through this transition.

pmw5y ago

This reminded me of some wonderful writings from Roger Ebert, a brilliant man who also lost his ability to speak.

I cannot find it now, but I believe he wrote about this exact phenomenon: even with the best technology, you cannot communicate as fluently as a conversation demands, so you're relegated to the background.

Here's one of his writings I was able to find: https://www.rogerebert.com/roger-ebert/i-think-im-musing-my-...

metrokoi5y ago

Sometimes we have to be reminded that technology can not solve all of our problems. I find one of the issues with my relationships is that I try too hard to help solve their problems. Of course I have good intentions, but focusing on trying to solve the problem can lead to forgetting to do small things similar to what you said about including her in conversations. Think about what she is experiencing. This is the most important comment I have seen.

bergerjac5y ago

Seems like a great application for Elon's Neuralink.

mhh__5y ago

Who needs therapy when you have technology that doesn't exist yet!

fxtentacle5y ago

Record her reading the texts of a standardized text training corpus.

That way, you can retrain an existing AI to do text to speech with her own voice.

Edit: here's a link to the corpus that I believe Mozilla uses http://www.openslr.org/12/

asveikau5y ago

Is she on board with this? I can imagine a lot of people being severely put off by being asked to record "a corpus of approximately 1000 hours" in advance of what sounds like a stressful surgery.

tech4allOP5y ago

Good concern. We won't be doing any "hundreds of hours" solution. We've been married over 30 years so we're a pretty good team - naturally I wouldn't do it without both of us thinking it was a great idea.

joshribakoff5y ago

Seconding this, also, reproducing her voice with an AI may not be something she is on board with, it could make her feel like you don't accept her with or without a voice. It may also be unhealthy for you, similar to how spending too long on social media can become a dangerous source of dopamine.

It might make sense to consider making a recording that is more meaningful, and focus on giving her emotional support rather than building an AI that could be perceived as a replacement.

netsharc5y ago

It's not like OP is replacing her entirety with Alexa. If I were the wife I'd think "sure, let's 'backup' my voice, having it available in case I lose mine would be useful, so that people can still hear my thoughts in my voice instead of a robot's."...

fxtentacle5y ago

It's 1000 hours because multiple speakers record the same articles.

I believe some speakers only recorded 1-2 hours, which seems doable.

jfkebwjsbx5y ago

They have 500 hours left, so that would be impossible.

audiohermit5y ago

I'll push back on this. The quality of the read speech should be a higher concern than having parallel data. Unless OP's wife is a teacher or actor/voice actor, if LibriSpeech transcripts are boring, it will come out in the speech.

I think OP would ideally want the model to pick up on more natural intonation, instead of monotone dictation. Record everything from now on, as best you can with similar recording context, and hopefully that data will be enough to cover more natural nuances.

aerovistae5y ago

And get a high quality mic to do it with!

cabite5y ago

or better, rent a recording room for the time it takes.

trynewideas5y ago

Mozilla's is licensed CC-BY, which is pretty liberal. In case the Attribution license is a blocker, here's CMU_ARCTIC's, which is built from copyright-free sources and has no licensing restraints: http://festvox.org/cmu_arctic/

josinalvo5y ago

I think this is backwards... This is a corpus to train speech to text, not text to speech, right?

joshribakoff5y ago

It's a corpus designed to capture the full breadth of combinatorial nuances of human speech in a general sense.

reubenmorais5y ago

No, it is not. For one, it's a corpus of read speech, which means it does not capture well the characteristics of conversational human speech – hesitation, disfluencies, different tones and registers, etc. LibriSpeech has a paper explaining the design of the corpus, all you need to read is the first sentence of the abstract to know what it is supposed to capture:

This paper introduces a new corpus of read English speech, suitable for training and evaluating speech recognition systems.

http://www.danielpovey.com/files/2015_icassp_librispeech.pdf

tech4allOP5y ago

Thanks!

Mo35y ago

This right here

olivermarks5y ago

https://youtu.be/0sR1rU3gLzQ

'This AI Clones Your Voice After Listening for 5 Seconds'

joeyspn5y ago

Woah, this is definitely the solution that the OP needs. I did read about WaveNet's text-to-speech a couple of years ago but didn't know it has progressed this far. It's crazy good, mindblowing.

Rotten1945y ago

I would also suggest looking into learning American Sign Language (of course alongside this project). While communicating via keyboard is workable and good for communicating with the wider world, ASL would be much more convenient for communicating between you two -- and a very interesting language to boot. It is a foreign language that's not related to English besides a few loan words, but there are tons of online resources and most universities have classes as well. Plus, you also can experience beautiful Deaf culture, with a rich storytelling and poetic tradition that blends language, gesture, acting, and pantomime in a way that's just impossible to translate to a spoken language.

The downvoted commenter was being a jerk, but I do think learning ASL is an option worth looking into.

krisoft5y ago

I think your answer misses the point of the question. Learning ASL can be done after the surgery if she lost her voice. The question was what can be done now before the surgery. The kind of things which, if it comes to the worst and she loses her voice, cannot be done after.

saltcured5y ago

I wouldn't discount the value of having some rudimentary signs to communicate immediately after surgery. It seems odd to me to focus on some dream of a perfect TTS synthesis if these more basic needs are not addressed first.

If you've ever had a mouth injury that inhibits talking, or been in a foreign environment where your speech is totally useless, it can be very stressful to be unable to communicate. I think the couple should consider learning some of the basics ahead of time, so that communication is possible without typing or any other apparatus.

Considering post-surgery recovery window, I'd want to be able to express very basic things like:

I am comfortable

I am in pain

I am hungry

I am nauseated

I need to urinate/defecate

I want to rest

I love you

When will you return

etc. I might suggest trying to boil down one or two inside-joke kinds of phrases as well, to be able to lift each other's spirits in a private or intimate way.

whatusername5y ago

a pen and paper would suffice for immediate communication needs.

elil175y ago

I strongly agree with this. Trying to type as your main form of communication is exhausting. With a sign language, even if you’re not very good at it, you’re having a face to face conversation and you feel a sense of connection.

Also, if you’re not in America, you can learn your local sign language (e.g. British Sign Language, AusLan)

imglorp5y ago

Agree. Learning for my spouse. It's fun and easy and there's a ton of resources online and maybe at your local university. You can get good enough to have essential conversation in a few hundred signs. Deep and rapid skill takes study and practice, as you would expect.

zapzupnz5y ago

I agree, there is great value in sign languages for people who are unable to speak. (Disclaimer: I am hearing and learnt New Zealand Sign Language)

Obviously, it comes with great effort on both the part of the wife and OP, plus a rethinking of some social interactions and even social groups.

However, no problem is insurmountable with sufficient assistance and support from friends, family, and expert groups. Learning sign language is fun and a great way to meet new friends, hearing and Deaf alike.

It may be a last resort, but it's an option not to be ignored.

quiet_hacker5y ago

I have a progressive neurodegenerative disease and lost most of my ability to speak about 3 years ago. What you are proposing is super cool, but you might be overthinking this. These things (text to speech, etc) are more awkward than practical in real life. Also, make sure your wife is completely on board. Seeing old clips and hearing my voice is actually kind of depressing to me. Here is my actual advice:

Outside of social situations, it honestly hasn't been that big of a deal for me. As a remote developer, my job has remained the same. My managers and co-workers have been super supportive. I send messages during meetings to one person who will read them aloud for me.

With text and social media, I still keep up with friends and family. Most medical appointments, etc, can be made online. SprintIP relay is free for deaf/speech impaired, and it allows the caller to type what they want to say and a representative will relay this to the other party. It works via the web or a mobile app. https://www.sprintrelay.com/sprintiprelay

Banks, brokers, or anything involving personal info (like SS#) usually requires a voice phone call. I have my wife call and explain the situation. I can whisper yes, as they occasionally require me to give permission. Some call center representatives have no idea how to handle this situation, and will just stick to the script saying they have to speak to me the entire time. My wife just thanks them, calls back, and hopes for someone more understanding.

There are awkward encounters where people don't know you can't speak, and will respond by speaking louder and slower. These people will also assume you are not intelligent and be dismissive. This is just one of the things you have to deal with.

I sincerely hope the procedure goes well and your wife doesn't have to deal with this. Just know that even if the worst happens, she can have a normal and productive life!

aspaceman5y ago

> There are awkward encounters where people don't know you can't speak, and will respond by speaking louder and slower. These people will also assume you are not intelligent and be dismissive. This is just one of the things you have to deal with.

It sucks you have to just deal with it.

sesuximo5y ago

That’s terrible about the call centers who need verbal confirmation. Crazy that they didn’t set up an alternative.

civilian5y ago

How do you communicate with your wife?

Did you ever consider learning sign language?

happycry5y ago

We get quite a few requests for this at Resemble (https://resemble.ai). We can get her to record right on our website or you can upload an existing file (along with a video of her consent) on the platform. Feel free to shoot me a message and I'd be happy to help build a voice for her.

cdolan5y ago

I don't know how to send messages but I researched this space a few years ago. Unfortunately a family member of mine had a surgery that resulted in the loss of his speech.

We have a lot of tapes around of his voice, from voice mails to family videos to some things from his work. If you are open to reaching out that would be awesome, I’ll check out the site as well.

Edit: I’ve wanted to make some sort of soundboard + “text to talk” setup for this family member. He often can’t participate in conversations because he writes on a whiteboard, and the speed of chatter moves faster than his writing
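The soundboard plus "text to talk" idea can be sketched as two small lookups: expand typed shortcuts into full phrases to cut down on typing, then play a pre-recorded clip when one exists and fall back to a TTS engine otherwise. Everything named here (the shortcut table, the clip paths, the function names) is hypothetical, and audio playback is left out because it depends on the platform.

```python
def expand_shortcuts(text, shortcuts):
    """Replace whole-word shortcuts (case-insensitive) with their full phrases."""
    return " ".join(shortcuts.get(word.lower(), word) for word in text.split())


def pick_clip(text, clips):
    """Return the clip path for an exact phrase match, or None to signal a TTS fallback."""
    key = " ".join(text.lower().split())  # normalize case and whitespace
    return clips.get(key)


# Hypothetical tables; a real setup would load these from a file.
SHORTCUTS = {"brb": "be right back", "ty": "thank you"}
CLIPS = {"thank you": "clips/thank_you.wav", "i love you": "clips/i_love_you.wav"}


def respond(typed):
    """Return ('clip', path) or ('tts', text) for a line of typed input."""
    text = expand_shortcuts(typed, SHORTCUTS)
    clip = pick_clip(text, CLIPS)
    return ("clip", clip) if clip else ("tts", text)
```

Note that this simple version only matches shortcuts that stand alone as words, so "ty," with trailing punctuation would pass through unchanged.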

happycry5y ago

Feel free to shoot me an email: zohaib[at]resemble.ai

We also have an API that you might find useful for the soundboard project: https://app.resemble.ai/docs

louwhopley5y ago

Wow, this looks like a great service!

Out of interest what are the average response times to generate a clip of one or two sentences from a configured voice?

Imagining the easy text-to-speech solution the OP could build on this resemble API.

happycry5y ago

Thanks! We do have a synchronous real-time API and latency is one of the bigger issues that we're trying to improve on now. At the moment, you can expect speeds that are 10x faster than realtime.

archon8105y ago

Just FYI, your page keeps jumping on mobile as it renders and erases words. Not a good experience if I'm trying to read.

mattlondon5y ago

I don't know if you have kids/grandkids/nieces or nephews (or plan to have those) but it might be nice to record your wife reading some books out loud.

Not only will you have your own personal "audio books" of Harry Potter/The Hobbit/Chronicles of Narnia/Oi Frog/Alice in Wonderland/Roald Dahls etc etc for any kids/grandkids/relatives etc that will hopefully be something treasured in its own right, but you'll also have a large corpus of training data from well-known texts that you can retrain over and over as the tech improves in the future. Might be worth chucking in some other well-known texts to avoid over-fitting on a "kids' story voice" - maybe something plain like inauguration speeches/declaration of independence/magna carta/etc.

Obviously I'd focus on gathering raw material now, and focus on the reconstruction later when you've all recovered mentally and physically from whatever happens. The more data the better when it comes to this sort of thing. There might not be something "simple" right now (e.g. you could probably implement the WaveNet or similar paper yourself today and train it on some GPUs in your spare room, but in a few years there might be a nice WYSIWYG/SaaS thing for it), but with the recordings safely stored you'll obviously be able to use them in the future.

Best of luck to you both.

Zenbit_UX5y ago

I like this idea but the specific examples you give would almost certainly be a terrible idea. A voice trained on Tolkien or old legalese like the Magna Carta would train a model with a lot of thee, thus, therefore and thou art and undertrain it with modern English. His wife would sound like the second coming of Jesus or Shakespeare and less like a normal human being.

mattlondon5y ago

From what I understand, it is not the words themselves (thee etc) but the sounds that make the words - so the "th" and the "ee" are still legit sounds in modern English words. The network would just be synthesising the words you tell it to - it won't be picking the words for you.

I might be wrong though.

kerkeslager5y ago

I don't have any answers to give you, but I want to say that this is a really loving and beautiful thing you're trying to do.

Someone5y ago

Is it? My first thought was “is your ideal also her ideal?”.

We cannot rule out she wants to spend quality time with her partner instead of spending time in a recording studio, so that, if the worst outcome comes, her husband can remind her of what she lost.

kerkeslager5y ago

Presumably the guy is better at guessing what his wife wants than you are, and his wife is an adult who can tell him if he guesses wrong.

fragmede5y ago

Presumably the wife is the best at knowing what she wants.

I'd make no presumption, good or bad, about their relationship dynamic, however.

thaumasiotes5y ago

> his wife is an adult who can tell him if he guesses wrong

She can, but she might not. A lot of that depends on how he presents the idea to her -- it might seem like something that's important to him.

daveFNbuck5y ago

I think the idea here is that she could use her own voice instead of a generic voice with text to speech devices. I doubt he intends to taunt her with it.

DaniloDias5y ago

By all means, let’s have hacker news expand this technical question into an evaluation of this guy’s marriage.

pezo19195y ago

Life is complex bro.

glonq5y ago

That was also my first thought, having seen far too many tech geeks inflict unwanted products and projects onto their poor partners and families.

The sentiment is admirable, but it's a lot of work considering that the probability of a negative outcome is very low.

y-c-o-m-b5y ago

Was just about to post the same. Not only is this an amazing thing to do for her, but this is one of the coolest threads I've seen on HN in a long time

rimliu5y ago

Depends. I'd be horrified to listen to my own voice when I am not speaking. Keep in mind that we sound very different to ourselves.

LeifCarrotson5y ago

From a different perspective, I'd be brokenhearted if I could not speak to my son or my wife in something they could recognize as their father or husband's voice. I know modern TTS systems are a fair sight (voice?) better than Microsoft Sam, but it would be emotionally valuable to me to have a self-trained TTS library.

I'm not sure there's a correlation to other senses, I can't see for my future self or move on his behalf. I suppose there are things I would want to taste or smell if I was going to lose those senses, but those are experiences for me, not things I'd use to communicate with loved ones.

After losing my voice in an accident, I'd be willing to spend many, many hours transcribing my own speech in the handful of scratchy family videos, voicemails, and phone logs of ordinary conversations. If I could spend a couple days prior to the event reading some books, a TTS training corpus, or anniversary/birthday/wedding/etc greetings and congratulations into a microphone and have a personal text-to-speech voice I'd be all over that.

It would be a little weird if someone else used it as their narrator, but that's not OP's goal.

Speaking of recording books and training corpuses, my grandparents (who have their voices) got a special kind of joy from reading children's books that they once read to me and that I once heard as a child to their new grandson. OP, if you and your wife have or might have kids (and she can handle it emotionally), it might be nice to record video/audio of reading children's books to future grandchildren. Even if your future grandchild knows that grandma can't read books out loud, I'd bet Grandma would be happy to silently turn the pages for a toddler on her lap until those digital recordings got worn and scratchy like an old VHS.

Alex39175y ago

> Keep in mind that we sound very different to ourselves.

This is less of a problem with modern high-quality mics than it was, say, with answering machines 30 years ago. Your voice might still sound not exactly the same, but it hopefully shouldn't be unbearably grating either.

corobo5y ago

Turn the bass up a bit

It's because reproduced audio doesn't have the bass the same as you hearing it conducted through your jawbone (though of course this will sound too bassy to everyone else!)

inanutshellus5y ago

I'd love to be able to change my mind, though.

Recording audio and then choosing not to use it later is fine.

Not recording it because I don't want it right now... maybe fine? maybe sad.

covercash5y ago

Other resources you may want to explore are r/mute and r/deaf subreddits. Both also have Discord servers listed in the sidebars.

Having spent a good deal of time in hospitals, a few things I recommend... 10’ phone cable since outlets can sometimes be far from the bed, cheap slippers she can wear to walk around (stepping in a hospital hallway mystery puddle wearing just socks is very unpleasant), comfy clothes that you don’t mind having ruined (T-shirts, underwear, shirts, pajama pants - they can temporarily unhook the IV so she can put a T-shirt on), earplugs, eye mask. If she’s going to be on liquid-only diet, bring your own since hospital food is not great, not terrible. Soylent/Orgain/Ensure if she’s permitted that, otherwise good quality Italian ices are such a nice treat and most hospitals have a patient fridge/freezer you can store them in. Broth, but go to a restaurant or grocery store/farmers market with hot soup bar and fill a container with just the broth from the chicken noodle soup. It’s INFINITELY better than boxed broth.

Hopefully all of your research and preparation will be for nothing, I wish you and your wife a successful surgery!

dawg-5y ago

Speech-language Pathology student here. I would recommend going to see a speech therapist. It will likely be covered by your health insurance. Find an SLP who specializes in AAC (Augmentative and Alternative Communication) who can help your wife communicate if she loses her speech. Your DIY approach could work, but having support from an SLP to help her learn the system, and come up with other options if it doesn't cover all of her communication needs, will go a long way.

stevenbedrick5y ago

Upvoted and agreed 100%, from an AAC researcher. Your best bet is definitely going to be to reach out to an SLP with AAC expertise.

coronadisaster5y ago

Just have her carry a good microphone at all times to record everything she says until that point, to have a maximum amount of samples. If you can't "deepfake" it today, maybe you will be able to do it tomorrow, but at least you will have the data.

woah5y ago

This will probably sound a lot more natural than sitting there reading a training corpus. Given that advances in AI are tending to eliminate manual feature engineering, predetermined training corpuses may soon be a thing of the past anyway.

lostlogin5y ago

“This conversation is being recorded for training and quality assurance purposes.” Should be stated before each new interaction. The legal requirement will vary by jurisdiction but a lawyer can advise on that. And yes, I’m joking.

coronadisaster5y ago

While this can be true, it depends on which state you live in: https://recordinglaw.com/party-two-party-consent-states/ . In Illinois, it is apparently legal for the police to record you without consent but it is illegal for you to record the police...

bluGill5y ago

Agree, voice recorders are cheap. Every sound she makes for these last few days can be edited and used later if needed. Or if not, just throw it away.

markk5y ago

Maybe using a lav mic and the recorder in the pocket would be the most natural

korethr5y ago

Others here are addressing technical solutions, but I don't see anyone here covering non-verbal communication. IMO, that's going to be just as important.

I am going to assume that your wife and you have a healthy relationship with strong communication, in part because you've developed an intuition for her body language and other non-verbal communication methods. In the scenario where she loses her ability to speak, even if she happily and completely takes to whatever technical solution(s) you offer to replace that, I think it's likely she will reflexively lean more heavily on those non-verbal channels, and you're going to need to get better at reading them than you are now.

uberman5y ago

This might get you started:

https://speech.microsoft.com/customvoice

I imagine if MS offers custom voices then the other text to speech providers do as well.

Good luck

tech4allOP5y ago

Thank you - great lead.

thaumasiotes5y ago

Some (decades old) research on this involved a research team creating a video of JFK saying "I never met Forrest Gump". I found a writeup in Google Books: https://books.google.com/books?id=mQtGVQeQplcC&pg=PA208&lpg=...

> We evaluated our Kennedy results qualitatively along the following dimensions: ... naturalness of the composited articulation; ...

Obviously the state of the art will have advanced, but maybe this can point the way toward more current research.

While I tend to agree with everyone else that this can be a great idea, my instinct is to float the idea to your wife first and see how she responds. I can imagine someone taking this negatively.

foepys5y ago

There is a YouTube channel called "Speaking of AI" that makes short fake speeches of some US public figures. The quality is quite good and a bit frightening.

https://www.youtube.com/channel/UCID5qusrF32kSj-oSGq3rJg/vid...

watertom5y ago

If she loses her ability to speak there are many ways to help her out, but nothing can replace the sound of her voice, especially for those important moments.

Just in case, record specific messages for the various people in her life (children, Mom, Dad, siblings, in-laws, friends) that can be used repeatedly. Messages like: "X, I love you", "X, I miss you", "Mommy loves you!", "Give me a hug", a holiday greeting, "Happy Birthday", "I'm so proud of you!", a favorite happy saying, a frustration saying.

You get the idea.

arethuza5y ago

What about recording messages to other people for future events (e.g. graduation of a child, birth of grandchild etc.)?

Recording a message to a yet unborn grandchild is maybe something we could all do!

jasonhn99995y ago

When my dad lost his speech, we had Boogie Board Jot devices all over the house. It made writing short notes and simple dialogs much less tedious.

We also used the Verbally premium iPad app to help give him a voice and make transactions easier.

Wishing you all the best.

fxtentacle5y ago

The paper "Generalization Of Audio Deepfake Detection" gives an overview.

The paper https://arxiv.org/abs/1904.05441 has a list of spoofing methods.

Here's one method as paper https://arxiv.org/pdf/1806.04558.pdf

And here on GitHub https://github.com/CorentinJ/Real-Time-Voice-Cloning

probably_wrong5y ago

For an open-source approach, the MaryTTS project has a guide on how to add new voices to their tool: https://github.com/marytts/marytts/wiki/VoiceImportToolsTuto...

mbreese5y ago

You may want to look up what was done for Roger Ebert. He lost his voice due to surgery, but because of the vast corpus of audio recordings of him, it was possible to create a viable text-to-speech engine.

It’s a bit dated at this point, but I imagine the research has vastly improved since then.

It’s a very good question though. A decade ago this was able to be done for one man. Is it now possible to be done for anyone? Like others, I’d guess the first step is to record everything while you can.

echelon5y ago

I wrote https://trumped.com

You ideally want five hours of clean speech (good microphone, no background noise, high sample rate). It should be spoken clearly, in a single tone or mood. My model sounds awful because the data isn't consistent, and the room tone and microphones are terrible.

If you want different prosody or moods, don't mix them in the same data set.

You can experiment with transfer learning LJSpeech with Nvidia Tacotron2 right now. Glow-tts is also promising.

You'll start to get results with fifteen minutes of sample data, but for high quality you want a lot of audio.

Have your wife read a book and record it. The training chunks will be ~10 seconds apiece, so keep that in mind for how to segment the audio.

Focus on getting lots of good sounding data. Hours. The models will improve, but this may be your only shot of acquiring the data.

Download the LJSpeech dataset and listen to it. See how it sounds, how it's separated. That is a fantastic dataset that has yielded tremendous results, and you can use it for inspiration.
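
A minimal sketch of the segmenting step, assuming the recording is an uncompressed WAV file. The function name `split_wav` and the output naming scheme are made up for illustration. It cuts at fixed 10-second boundaries, which will sometimes land mid-word; for real training data you would probably want to cut at pauses instead and keep a transcript for each chunk.

```python
import wave


def split_wav(path, out_prefix, chunk_seconds=10.0):
    """Split a WAV file into consecutive chunks of about chunk_seconds each.

    Returns the list of chunk file paths that were written.
    """
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * chunk_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break  # reached the end of the recording
            out_path = f"{out_prefix}_{index:04d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Same channels, sample width and rate as the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Usage would be something like `split_wav("chapter1.wav", "chunks/chapter1")`, with the `chunks` directory created beforehand.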

asdfman1235y ago

Here's a simple and practical solution:

Get a decent audio headset, have it record the audio to her phone, and spend hours talking to her about whatever. Preferably in a reasonably quiet environment.

Just spend a lot of time talking. You don't have to talk to her through a headset. Just make sure hers is recording her voice.

It would be easy, painless, and probably good for the relationship too.

nutanc5y ago

At a minimum get the following list of sentences recorded in her voice, http://www.festvox.org/cmu_arctic/cmuarctic.data

Make sure the recordings are of a good quality. This will ensure that you will have a baseline TTS of her voice at the minimum.
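
To turn that list into a reading script, here is a small sketch. It assumes the file uses the Festival-style layout of one prompt per line, `( utterance_id "sentence text" )`; the function name `parse_prompts` is made up for illustration, and lines that do not match are skipped rather than guessed at.

```python
import re

# One prompt per line: ( arctic_a0001 "Sentence text here." )
PROMPT_RE = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')


def parse_prompts(text):
    """Return a list of (utterance_id, sentence) pairs from prompt-file text."""
    prompts = []
    for line in text.splitlines():
        match = PROMPT_RE.match(line.strip())
        if match:
            prompts.append((match.group(1), match.group(2)))
    return prompts
```

Naming each recording after its utterance id (for example `arctic_a0001.wav`) keeps audio and text paired up for whatever toolkit gets used later.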

arslnjmn5y ago

(off topic) Record a few things for her future self. E.g. favourite quotes, frequently used phrases.

zxter5y ago

Good advice! Maybe a few shoutouts to your future children.

bcatanzaro5y ago

Make sure to record with the best microphone you can find and in the quietest room you can find. Makes a huge difference in the resulting TTS.

adrianmonk5y ago

You might look at resources for ALS patients.

Since ALS (aka Lou Gehrig's disease) is a degenerative motor neuron disease, people with ALS can pretty much count on eventually losing the ability to speak. So "voice banking" is apparently pretty common.

anaisbetts5y ago

Not exactly what you're asking for, but I wrote an app for this scenario:

https://play.google.com/store/apps/details?id=org.anaisbetts...

This is a text-to-speech app with a very keen emphasis on Day To Day usage - the UX will put the focus at the right places, help you reply faster, etc. I used it for a full month when I was unable to speak after voice surgery and it made a big difference, other folx have reported the same

da39a3ee5y ago

This is probably a really stupid suggestion but just in case.

Do you and your wife drink alcohol a bit? If so might it be worth having a couple of drinks in a quiet setting with her one evening with microphones running? I'm not suggesting getting wasted! I'm just wondering whether it might help to catch her getting more animated or "natural" in conversation. I was thinking this might help make the resulting synthesized speech capture even more of her personality than reading children's books or subsets of AI corpora etc.

shockron225y ago

I have had good results with this. https://www.resemble.ai/ It is based on this open source work. If you want to run it yourself. https://github.com/CorentinJ/Real-Time-Voice-Cloning

The voice cloning can be done in a matter of minutes (< an hour). It's also very easy to use the website.

Best of luck!

kw95y ago

Strongly suggest reaching out to Dr. Rupal Patel (https://www.linkedin.com/in/rupalvocalid) of Northeastern University (https://coe.northeastern.edu/people/patel-rupal/) and VocaliD (https://vocalid.ai/about-us/). She's a licensed Speech-Language Pathologist (https://web.northeastern.edu/cadlab/publications/RupalPatel_...) and she and her husband, Dr. Deb Roy, did the Human Speechome project (https://en.wikipedia.org/wiki/Human_Speechome_Project). She was also my doctoral advisor and I feel confident saying she would be very interested in talking with you.

benjohnson5y ago

Do you have children? Perhaps - record her reading a few favorite children's books.

jitendrac5y ago

ML will require a lot of samples to get the result you want. I'd say let your wife carry an attached microphone and meet all the people she wishes to talk to at least once. Collect all the audio data, and you can use it later. Make all the available moments memorable for her: for example, if you have a child, record a message from your wife for each of the child's next 10 birthdays.

underdeserver5y ago

Consider investing in a good microphone for recording. A Blue Yeti is ~$200.

DoreenMichele5y ago

Not to discourage you from making voice recordings and all that, but as someone who is handicapped and sometimes has trouble speaking because of it:

1. I spend a lot of time online. It doesn't matter so much there. I do a lot of typing.

2. My oldest son, who had serious output difficulties as a child, is talented at inferring what I need from a gesture and a grunt. This has proven enormously helpful.

3. Consider using her phone as a communication device. It's small and people tend to take their phone everywhere and she can type out what she wants to say.

4. Writing tweets can help a person learn to say things more succinctly. I do freelance writing and figuring out how to say things succinctly is a talent you can develop. (It's something I have to work at -- I'm a "would have written you a shorter letter if I had more time" type of person.) This can help enormously when you face communication barriers.

5. Take some time to deal with the emotional stuff. It matters.

I'm sorry you are facing this. Best of luck.

seesawtron5y ago

Here's a recent work [0] where you can train the model with 10s audio and convert any "text to speech" (all doable in the browser). I tried with Google Colab demo [1] and its performance fluctuates with the training audio sample that you give it so might need some trial and error to get the sweet spot.

Also the model is not saved in the browser with Colab so you might also want to do it locally to save it eventually (if it comes to that).

All the best mate!

[0] Main repo: https://github.com/CorentinJ/Real-Time-Voice-Cloning

[1] Google colab repo to try it out: https://github.com/CorentinJ/Real-Time-Voice-Cloning/blob/ma...

ardenwood5y ago

Hi, I like your idea for your wife. Hope the surgery will succeed without damage to her speaking. I'm from Nvidia and know well the team behind NeMo toolkit. Happy to connect you to the team if that helps. You may send me an email to ardenwood.bruin_at_gmail.com. -- Michael

maps75y ago

That's really good of you. It's amazing to see this community be so helpful.

jameswestgate5y ago

This may also be useful. Free and open source.

https://www.tobiidynavox.com/en-gb/software/web-applications...

totetsu5y ago

The mycroft voice assistant has some tooling they used to create voices.

https://mycroft.ai/blog/mimic-2-is-live/ https://github.com/MycroftAI/mimic2

Festival Speech Synthesis has a tool for recording speech databases, and some tutorials for training festival voices. http://www.cstr.ed.ac.uk/research/projects/speechrecorder/

disabled5y ago

You need to do voice banking. It is imperative that you do so, so that your wife keeps her identity no matter what.

What you need to do is spend the entire next 3 weeks doing voice banking. This will give your wife a text-to-speech voice (SAPI 5 voice, or others, for example). You record phrases that the voice banking service wants you to speak, with a high quality headset (best if wired) in a quiet setting.

The more sentences (samples) you have, the better the voice will be, obviously. But, there are services out there that will update the recordings, as the technology gets better, and that is the way to go, in terms of choosing the "best service".

The voice banking services that people typically use are here: https://www.mndassociation.org/professionals/management-of-m...

I would say that Acapela my-own-voice is currently the best technology. Obviously there are open source technologies, but you do not have the luxury of time to figure all of that out. However, you should do your own voice banking for later post-processing on your own with open source stuff.

There is also a free version of voice banking available, but I would only recommend it as a secondary tool: https://www.modeltalker.org/

This app (iOS and Android) for example, allows you to use your personal voice banked text-to-speech voice, to talk: https://therapy-box.co.uk/predictable

This is another great app that allows you to use your personal voice banked text-to-speech voice: https://www.assistiveware.com/products/proloquo4text

Source: Disabled engineering student, who is extremely interested in assistive technology. I would love to be a rehabilitation engineer.

stevewillows5y ago

It might also be worth recording normal conversations you have around the house as a fallback. You can always cut it up later and feed it into these systems.

Best of luck to the two of you. I really hope you don't ever need this technology.

KhoomeiK5y ago

You might want to try DIY'ing something like this [1] depending on the extensiveness of her surgery. It basically records electrical signals (EMG) emitted by the vocal cords (subvocalizations) and can convert them to text with ML/other signal processing algorithms. Basically a rudimentary version of the transhumanist Brain-Computer Interfaces that would enable telepathy.

[1] https://dam-prod.media.mit.edu/x/2018/03/23/p43-kapur_BRjFwE...

nighthawk4545y ago

This can be trained using only 5 Seconds of reference audio: https://google.github.io/tacotron/publications/speaker_adapt... https://arxiv.org/pdf/1806.04558.pdf

It's been mentioned a bit already, but thought it was worth calling out. This may be one of the lowest-overhead ways to start experimenting, at least in terms of data collection.

abjecton5y ago

Your approach to the situation might determine the quality of life for you and your wife. I can't imagine what it's like to think logically while you're in the middle of such an emotional event.

The_rationalist5y ago

https://dathudeptrai.github.io/TensorflowTTS/ is the state of the art and feels natural enough

ooopsnevermind5y ago

First off I'm sorry you're going through that, it sounds really tough. We sometimes have families use us for this (https://trysaga.com) as a way to collect voice recordings of loved ones, to record and share a large number of memories and stories in their voice and have them saved forever. You can download all the recordings to keep. It's free right now and I'd be happy to help out and make sure it got you what you needed, let me know.

cl0rkster5y ago

Probably not what you were seeking, but I have to imagine it would be similar to long periods I have spent in a non-verbal state. Being allowed to exist and just smile or laugh as a "part" of the conversation around me was like sunlight on a dark day. The range of human emotion and expression often overlaps enormously between people. Sometimes pretending your voice is really the good you hear around you and not the throat mumblings that cause so much conflict is the most beautiful dream.

cl0rkster5y ago

Also... Learn sign language. Some of the most beautiful and overlooked people are non-verbal. I've met several truly speechless people who had families that never learned to sign. It's sad for them.

redsh5y ago

Sorry about this. Record as much voice as you can now (stereo too?), then you’ll have time to find the right solution and improve it as the technology gets better in time

m4635y ago

I went through something similar with a parent years and years ago. I wanted to be able to do things to help with what would eventually be lost.

I have to say I didn't help as much as I thought I could and afterwards I was always wondering if I could have used this technology or that and done more.

So - I think you should recognize that you can only do so much, we're doing the best we can, and in the end we are all winging it.

YAFZ5y ago

You might contact the following company: https://www.acapela-group.com/solutions/acapela-voice-factor...

There's also open source TTS from Mozilla: https://github.com/mozilla/TTS

erogol5y ago

Hope I am not repeating any comments here. My suggestion is that you start recording as soon as possible and as much as possible without worrying about technicalities. You can also use any old voice recordings or videos with relatively good voice quality, if you have them. For now maybe she can read a book aloud in a silent room. After you have the data I can also help if you'd like to create a TTS model.

hvaoc5y ago

This is not open source but this was very good from their demo in terms of your own voice reproduction.

https://www.descript.com/lyrebird-ai

I hope good folks in there will help you, try reaching them.

https://m.youtube.com/watch?v=VnFC-s2nOtI

unstatusthequo5y ago

Love Descript and think it’s a great way to both record and get transcripts.

TriNetra5y ago

I've recently seen these two pieces of software on HN that may be of some help:

deepfake for voice: https://github.com/CorentinJ/Real-Time-Voice-Cloning

Reproducing emotional voices: https://www.sonantic.io/

abinaya_rl5y ago

You are trying to do a beautiful thing. I don't have a knowledge of this subject, but I really wish you good luck on this project.

rajacombinator5y ago

Is this a time sensitive procedure? I think I’m stating the obvious - (maybe not) - but this is not something you should just wing a few weeks before, nor is it something you should try to figure out on your own without thoroughly discussing with your wife. “Surprise honey, I deepfaked your voice!” is not something most people would appreciate.

inspectorG4dget5y ago

Nobody has mentioned VocalID and voice surrogacy [1] yet. This organization might be able to recreate her voice from historic samples for text-to-speech

[1] https://www.ted.com/talks/rupal_patel_synthetic_voices_as_un...

meristem5y ago

All sorts of feels here. I had a positive outcome from exploratory throat surgery that had a chance of obliterating my voice. Prepping the way you are doing is amazing. Please balance it with time well-spent with your wife, being present in the moment. Sounds trite and yet it takes focus to not just concentrate on the possible negative future outcome.

peterwwillis5y ago

Here's a story from the San Francisco Chronicle on saving Stephen Hawking's voice: https://www.sfchronicle.com/bayarea/article/The-Silicon-Vall...

loph5y ago

You might look at what Jamie Dupree has done.

https://www.cnn.com/2018/06/15/health/dystonia-jamie-dupree-...

He uses a text-to-speech system that sounds more-or-less like him.

jimlikeslimes5y ago

This is very much a short term solution if they are unable to talk immediately after surgery, for up to a few days. My wife used a small portable whiteboard and magic marker to write messages on in the same situation. It worked really well. Even with our 2 year old, it helped her to understand something unusual was going on.

offsky5y ago

I’m sorry that the both of you have to deal with this. I’ve read many of the replies here and I’m surprised there isn’t already a self-service website that does this. Pay some money, record some text, and boom here’s your voice. Something like this should exist. Someone should build this.

moooo995y ago

I don't really have anything to add to all the helpful comments under your thread. Do the preparation as much as you can, as long as your wife also wants this.

You said there is a small chance, so I really wish you and your wife the best of luck that she and her voice will be fine after the surgery.

eschaton20235y ago

If she has time, get her to read the most common English words. Then parse the text and play the audio for the known words and use traditional speech synthesis for the outliers. It will not be perfect but you can then possibly train an AI to pronounce the outliers.
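
A rough sketch of the lookup half of that idea, assuming you have a dictionary mapping each recorded word to its audio clip. The names `plan_utterance` and `missing_words` and the clip paths are hypothetical; actual playback and the fallback synthesizer are left out.

```python
import re


def plan_utterance(text, recorded_words):
    """Return (word, clip) pairs; clip is None when no recording exists."""
    plan = []
    for word in re.findall(r"[a-z']+", text.lower()):
        plan.append((word, recorded_words.get(word)))
    return plan


def missing_words(plan):
    """Words that would have to fall back to ordinary speech synthesis."""
    return sorted({word for word, clip in plan if clip is None})
```

For example, with recordings for "i", "love" and "you", the sentence "I love you, Penelope!" yields three clips plus one word ("penelope") for the fallback synthesizer. Word-by-word concatenation tends to sound choppy, so treat this as a baseline rather than a finished voice.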

egwor5y ago

I would also think of various phrases that need a lot of emotion applied, e.g. for sensitive situations like someone's death, or for positive feedback like a wedding or a birthday or a thank you.

Maybe also if she has a favourite book or a favourite quote, get those recorded too.

Back it all up!

mathnode5y ago

If you don’t have any children (yet) you should get her to record herself reading some of her favourite children’s books. At the very least she will be able to read along with them. Children’s books are quite sparse, so a page per-track is easy to do.

jll295y ago

Just let her read a couple of pieces of text and record in high quality (44.1 kHz).

Beyond the technical answer, you may want her to record some nice personal words addressed to your family that you can listen to later.

You don't need to do anything until the worst case materialises.

bb1235y ago

There is https://www.descript.com/lyrebird-ai which is in private beta right now, but looks to serve your needs exactly. Maybe reach out to them?

voicevoice505y ago

For recording training audio:

https://github.com/daanzu/speech-training-recorder

The recorder works with Python 3.6.10. Need to pip install webrtcvad also.

mproud5y ago

Roger Ebert has some articles about his troubles he encountered that may be worth a read.

techbio5y ago

Confident as I may be that OP's intentions are good and pure, a quick CTRL-F on the comment threads finds no references to “abuse” or “ethics”, and I propose that synthesis of voice raises issues for which society has few natural defenses.

diggum5y ago

https://www.modeltalker.org/vrec/ is a project for "voice banking" that might be able to help. It's not perfect yet.

bigmasterofnone5y ago

Good luck with what you are doing and more importantly, I wish your wife good health.

PopeDotNinja5y ago

My first thought was to spend some time together not speaking. See how it goes, so there’s less fear going into it. Maybe take a couples mime class or something! Just making it real and not living in fear is the point.

josinalvo5y ago

IDK about the tech, but I would not worry about it right now. You don't need to play with the tech unless the bad unlikely outcome comes to pass.

The only tip I have is from a bit of amateur sound editing I did: collect many samples, and beware of big phrases: Like, ask her to say the same thing many times. And ... sometimes ... to ... stop ... at ... each ... word. And ... so ... me ... ti ... mes at each syllable.

Otherwise, if you ever need to create a sample that contains a single word/syllable, you can't. It is weird how much sound that contains clearly distinguishable syllables for the human ears still is not separable when you go to edit it.

Also, you might want to check wordlists by frequency to get a menu of common words, and IPA notation, to ensure you cover a good range of sounds.
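
If no ready-made frequency list is at hand, one can be built from any text she would naturally say or write. This is only a sketch: the function name `most_common_words` is made up, and the choice of source text (old emails, chat logs, a favourite book) is an assumption left to you.

```python
import re
from collections import Counter


def most_common_words(text, n=100):
    """Return the n most frequent words in text, most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word, _count in Counter(words).most_common(n)]
```

The resulting list can serve as the menu of words to record one at a time, in addition to full sentences.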

JDEW5y ago

> Otherwise, if you ever need to create a sample that contains a single word/syllable, you can't. It is weird how much sound that contains clearly distinguishable syllables for the human ears still is not separable when you go to edit it.

Don’t know why you’re being downvoted. Thought it was insightful.

techwraith5y ago

I recently learned about a startup that is working on this kind of tech: https://phonetic.ai/

vinniejames5y ago

Take a look at Lyrebird

https://www.descript.com/lyrebird-ai?source=lyrebird

suchoudh5y ago

Please do keep us posted on the final outcome. We all pray for the surgery to go successfully. (Really appreciate your efforts in preparing for the worst case scenario.)

csisnett5y ago

Vocalid.ai has a vocal bank where you can record yourself, and use other people's voices as well. It could be a good choice for her to use her own voice

fenesiistvan5y ago

These are the things I always come back to ycombinator.com for. There are always valuable, intelligent replies here for all kinds of issues you might have.

ponker5y ago

Make sure to not have her read too much. The vocal cords can get inflamed and increase the chance of complications/damage.

lowercased5y ago

what dangers are there of someone 'stealing' your voice to impersonate you later? it seems mostly theoretical right now, but perhaps the more high-profile you are, the bigger the dangers might be, even today? if you had a large body of your voice already recorded (prepped for voice processing systems), is that data high-risk?

diegoperini5y ago

Please let us know the good news if it arrives, preferably with Tell HN or something similar.

Good luck and best wishes! <3

pkinnaird5y ago

get a great microphone and have her read her favorite books. Go for books with lots of dialog and emotional content.

Later, you can extract all the phonemes you want from it and you will retain the emotional expressiveness of her voice.

She should probably sing some songs -- lullabies, rock, etc. Go for emotional diversity.

smolPotat5y ago

There's an app for that! It's called Vocable; it's open source and available on iOS and Android!

glonq5y ago

> I'd like to prepare, just in case, to have technology to reproduce her voice from keyboard or other input.

Is this something that she wants? She's got a lot on her plate (emotionally and logistically) to prepare for this surgery, and maybe doesn't need a big geek project inflicted upon her just because there's a small chance of a bad outcome.

werdnapk5y ago

How small of a chance of her losing her ability to speak are you talking about here?

dragoon75y ago

Learn sign language.

klyrs5y ago

These suggestions are getting downvoted, but my girlfriend needed surgery wherein she wouldn't be able to speak for about a month. I know sign language, and tutored her for about a month leading up to the surgery. It was empowering, and she was able to teach friends, family and coworkers a few basic signs which made a lot of interactions go smoother. This low-tech solution doesn't need batteries or internet connectivity, and can provide a much smoother flow of conversation than typing things out.

chubs5y ago

Acapela.com has a voice banking service

ghoshbishakh5y ago

Please. There is a small chance you said. Everything will be fine. But still carry on your research on the problem since it might help others.

swyx5y ago

even if there is a small chance, the preparation may help lessen the blow of what would still be a tremendous loss.

also it might just help pass the time since OP has 3 weeks.

kangaroozach5y ago

Descript.com has the tech.

Reach out to Andrew Mason.

dazuaz5y ago

Not bad as a niche product idea

evmolesworth5y ago

Does your wife want you to do this?

kangaroozach5y ago

Descript.com Andrew Mason

pezo19195y ago

Did you ask her about that? Make sure she is not freaked out by it.

audiohermit5y ago

There's a few open-source version implementations but mostly outdated--the better ones are either private for business or privacy reasons.

grogenaut5y ago

But obviously also attend to the human matters as well, eg spend time.

audiohermit5y ago

On the upside, your father can choose any celebrity he wants to voice him! Tons of celeb data is publicly available (VoxCeleb 1 & 2).

clan5y ago

Are there any simple howtos anywhere which describes the process in as simple terms as possible? Without knowing the cool toolkits du jour.

Something like:
- Download these texts
- Record in WAV at least 48 kHz
- Record each line in a separate file.
- Do 3 takes of each line: flat, happy, despair

Maybe even a minimal set and a full set depending on how much effort you are willing to put in.

A plain description on how to capture a raw base which within reason and technology could be used as a baseline for the most common toolkits.
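
The recipe above can be sketched as a recording checklist: one entry per line of text and per mood, each with its own file name. The file naming scheme and the function name `build_manifest` are made up for illustration; no particular toolkit is known to require this layout.

```python
import itertools

MOODS = ("flat", "happy", "despair")


def build_manifest(lines, moods=MOODS):
    """Return one entry per (line, mood) take, in recording order."""
    manifest = []
    numbered = enumerate(lines, start=1)
    for (number, text), mood in itertools.product(numbered, moods):
        manifest.append({
            "file": f"line{number:04d}_{mood}.wav",  # one take per file
            "mood": mood,
            "text": text,
        })
    return manifest
```

Saving the manifest next to the audio (as CSV or JSON) keeps every file tied to its exact text and mood, which most training pipelines need in some form.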

vervez5y ago

Is Morgan Freeman the most used celebrity?

grogenaut5y ago

Unfortunately dad passed 21 years ago. But the options now are much better. Just projecting my past experiences on the obvious Delta.

dheera5y ago

Which generator works the best, qualitatively? I come from a vision/ML background but haven't played with speech at all, so it's completely new to me, and wondering what the state of the art is.

audiohermit5y ago

Here's a recent work that has a good comparison of some vocoders: https://wavenode-example.github.io/

Edit: WaveRNN struck a good balance for me in the past but is not shown in the link. Tons of new work coming out though!

tombert5y ago

This sounds pretty cool (your brother making the voice model, not your dad losing the voice)...do you have a link to this example? I would love to play with this.

lunixbochs5y ago

You can set it up yourself with a bit of Python knowledge from this branch: https://github.com/talonvoice/noise/tree/speech-dataset

There are keyboard shortcuts - up/down/space to move through the list and record quickly.

If you want to use it on arbitrary text prompts, you can modify this function to return each line from a text file: https://github.com/talonvoice/noise/blob/speech-dataset/serv...

If you use this, before recording too much, do some test recordings and make sure they sound ok. Web audio can be unreliable in some browsers.

The uploaded files are named after the short name, so make sure you can correspond the short name with the original text prompts, eg with string_to_shortname().

If you aren’t easily able to do this yourself, I’d be happy to spin up an instance of it for you with text prompts of your choosing.

exikyut5y ago

Somewhat OT question: after taking a quick look at this I stumbled on the eye-tracking video you made using pops to click... and I'm curious, can eye trackers not detect and report blinking?

Also, I noted the VLC demo says it doesn't use DNS! That's awesome...

lunixbochs5y ago

The VLC demo was using macos speech recognition. In the beta now I’m shipping my own engine+trained models based on Facebook’s wav2letter, which is going pretty well.

Veedrac5y ago

Also buy a (half) decent mic! They're much cheaper than you might expect.

core-questions5y ago

Seconding this, it's worth a hundred or so for at least a Yeti or something, considering it's not like you'll get another chance to do this.

gknoy5y ago

Also, in a similar vein as testing backups, make sure to test + listen to the recorded audio, if you can.
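
Listening is the real test, but a quick automated pass can catch recordings that came out empty. A minimal sketch, assuming 16-bit PCM WAV files; the function names are made up and the threshold of 50 is an arbitrary guess that you would need to tune against your own room noise.

```python
import array
import math
import sys
import wave


def rms_level(path):
    """Root-mean-square amplitude of a 16-bit PCM WAV file (0.0 = digital silence)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def looks_empty(path, threshold=50.0):
    """True when the file is so quiet that it is probably a failed recording."""
    return rms_level(path) < threshold
```

Running `looks_empty` over every file right after a session flags takes to redo while there is still time.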

ChrisGammell5y ago

If so: chris@theamphour.com

lunixbochs5y ago

Do you have transcripts, even just for some of the episodes? Unsupervised learning is possible but more difficult.

In general, yes, this is probably useful data in some way for speech recognition or TTS.

oh_sigh5y ago

Make sure to get a recording of her true, honest laughter too.

ta15481772315y ago

Depending on your desired level of hedging...

kevmoo15y ago

Came here to post this. Glad someone else thought of this first!

tech4allOP5y ago

Thanks so much for the resources and the well thought out reply!

daanzu5y ago

https://github.com/daanzu/speech-training-recorder

Originally intended for recording data for training speech recognition models [0], it should work just as well for recording to be used for speech synthesis.

[0] https://github.com/daanzu/kaldi-active-grammar

lunixbochs5y ago

Did you figure out why half of Shervin’s audio was empty? I would hesitate to recommend this if there’s still a chance half of the data isn’t usable after recording.

kemiller20025y ago

pmw5y ago

This reminded me of some wonderful writings from Roger Ebert, a brilliant man who also lost his ability to speak.

Here's one of his writings I was able to find: https://www.rogerebert.com/roger-ebert/i-think-im-musing-my-...

metrokoi5y ago

bergerjac5y ago

Seems like a great application for Elon's Neuralink.

mhh__5y ago

Who needs therapy when you have technology that doesn't exist yet!

fxtentacle5y ago

Record her reading the texts of a standardized text training corpus.

That way, you can retrain an existing AI to do text to speech with her own voice.

Edit: here's a link to the corpus that I believe Mozilla uses http://www.openslr.org/12/

asveikau5y ago

Is she on board with this? I can imagine a lot of people being severely put off by being asked to record "a corpus of approximately 1000 hours" in advance of what sounds like a stressful surgery.

tech4allOP5y ago

joshribakoff5y ago

It might make sense to consider making a recording that is more meaningful, and focus on giving her emotional support rather than building an AI that could be perceived as a replacement.

netsharc5y ago

fxtentacle5y ago

It's 1000 hours because multiple speakers record the same articles.

I believe some speakers only recorded 1-2 hours, which seems doable.

jfkebwjsbx5y ago

They have 500 hours left, so that would be impossible.

audiohermit5y ago

aerovistae5y ago

And get a high quality mic to do it with!

cabite5y ago

or better, rent a recording room for the time it takes.

trynewideas5y ago

josinalvo5y ago

I think this is backwards... This is a corpus to train speech to text, not text to speech, right?

joshribakoff5y ago

It's a corpus designed to capture the full breadth of combinatorial nuances of human speech in a general sense.

reubenmorais5y ago

This paper introduces a new corpus of read English speech, suitable for training and evaluating speech recognition systems.

http://www.danielpovey.com/files/2015_icassp_librispeech.pdf

tech4allOP5y ago

Thanks!

Mo35y ago

This right here

olivermarks5y ago

https://youtu.be/0sR1rU3gLzQ

'This AI Clones Your Voice After Listening for 5 Seconds'

joeyspn5y ago

Woah, this is definitely the solution that the OP needs. I did read about WaveNet's text-to-speech a couple of years ago but didn't know it has progressed this far. It's crazy good, mindblowing.

Rotten1945y ago

The downvoted commenter was being a jerk, but I do think learning ASL is an option worth looking into.

krisoft5y ago

saltcured5y ago

Considering post-surgery recovery window, I'd want to be able to express very basic things like:

I am comfortable

I am in pain

I am hungry

I am nauseated

I need to urinate/defecate

I want to rest

I love you

When will you return

etc. I might suggest trying to boil down one or two inside-joke kinds of phrases as well, to be able to lift each other's spirits in a private or intimate way.

whatusername5y ago

a pen and paper would suffice for immediate communication needs.

elil175y ago

Also, if you’re not in America, you can learn your local sign language (e.g. British Sign Language, Auslan).

imglorp5y ago

zapzupnz5y ago

I agree, there is great value in sign languages for people who are unable to speak. (Disclaimer: I am hearing and learnt New Zealand Sign Language)

Obviously, it comes with great effort on both the part of the wife and OP, plus a rethinking of some social interactions and even social groups.

It may be a last resort, but it's an option not to be ignored.

quiet_hacker5y ago

I sincerely hope the procedure goes well and your wife doesn't have to deal with this. Just know that even if the worst happens, she can have a normal and productive life!

aspaceman5y ago

It sucks you have to just deal with it.

sesuximo5y ago

That’s terrible about the call centers who need verbal confirmation. Crazy that they didn’t set up an alternative.

civilian5y ago

How do you communicate with your wife?

Did you ever consider learning sign language?

happycry5y ago

cdolan5y ago

I don't know how to send messages but I researched this space a few years ago. Unfortunately a family member of mine had a surgery result in loss of his speech.

We have a lot of tapes around of his voice, from voice mails to family videos to some things from his work. If you are open to reaching out that would be awesome, I’ll check out the site as well.

happycry5y ago

Feel free to shoot me an email: zohaib[at]resemble.ai

We also have an API that you might find useful for the soundboard project: https://app.resemble.ai/docs

louwhopley5y ago

Wow, this looks like a great service!

Out of interest what are the average response times to generate a clip of one or two sentences from a configured voice?

Imagining the easy text-to-speech solution the OP could build on this resemble API.

happycry5y ago

Thanks! We do have a synchronous real-time API and latency is one of the bigger issues that we're trying to improve on now. At the moment, you can expect speeds that are 10x faster than realtime.

archon8105y ago

Just FYI, your page keeps jumping on mobile as it renders and erases words. Not a good experience if I'm trying to read.

mattlondon5y ago

I don't know if you have kids/grandkids/nieces or nephews (or plan to have those) but it might be nice to record your wife reading some books out loud.

Best of luck to you both.

Zenbit_UX5y ago

mattlondon5y ago

I might be wrong though.

kerkeslager5y ago

I don't have any answers to give you, but I want to say that this is a really loving and beautiful thing you're trying to do.

Someone5y ago

Is it? My first thought was “is your ideal also her ideal?”.

We cannot rule out she wants to spend quality time with her partner instead of spending time in a recording studio, so that, if the worst outcome comes, her husband can remind her of what she lost.

kerkeslager5y ago

Presumably the guy is better at guessing what his wife wants than you are, and his wife is an adult who can tell him if he guesses wrong.

fragmede5y ago

Presumably the wife is the best at knowing what she wants.

I'd make no presumption, good or bad, about their relationship dynamic, however.

thaumasiotes5y ago

> his wife is an adult who can tell him if he guesses wrong

She can, but she might not. A lot of that depends on how he presents the idea to her -- it might seem like something that's important to him.

daveFNbuck5y ago

I think the idea here is that she could use her own voice instead of a generic voice with text to speech devices. I doubt he intends to taunt her with it.

DaniloDias5y ago

By all means, let’s have Hacker News expand this technical question into an evaluation of this guy's marriage.

pezo19195y ago

Life is complex bro.

glonq5y ago

That was also my first thought, having seen far too many tech geeks inflict unwanted products and projects onto their poor partners and families.

The sentiment is admirable, but it's a lot of work considering that the probability of a negative outcome is very low.

y-c-o-m-b5y ago

Was just about to post the same. Not only is this an amazing thing to do for her, but this is one of the coolest threads I've seen on HN in a long time

rimliu5y ago

Depends. I'd be horrified to listen to my own voice when I am not speaking. Keep in mind that we sound very different to ourselves.

LeifCarrotson5y ago

It would be a little weird if someone else used it as their narrator, but that's not OP's goal.

Alex39175y ago

> Keep in mind, what we sound very different to ourselves.

corobo5y ago

Turn the bass up a bit

It's because reproduced audio doesn't have the bass the same as you hearing it conducted through your jawbone (though of course this will sound too bassy to everyone else!)

inanutshellus5y ago

I'd love to be able to change my mind, though.

Recording audio and then choosing not to use it later is fine.

Not recording it because I don't want it right now... maybe fine? maybe sad.

covercash5y ago

Other resources you may want to explore are r/mute and r/deaf subreddits. Both also have Discord servers listed in the sidebars.

Hopefully all of your research and preparation will be for nothing, I wish you and your wife a successful surgery!

dawg-5y ago

stevenbedrick5y ago

Upvoted and agreed 100%, from an AAC researcher. Your best bet is definitely going to be to reach out to an SLP with AAC expertise.

coronadisaster5y ago

woah5y ago

lostlogin5y ago

coronadisaster5y ago

bluGill5y ago

Agree, voice recorders are cheap. Every sound she makes for these last few days can be edited and used later if needed. Or if not, just throw it away.

markk5y ago

Maybe using a lav mic and the recorder in the pocket would be the most natural

korethr5y ago

Others here are addressing technical solutions, but I don't see anyone here covering non-verbal communication. IMO, that's going to be just as important.

uberman5y ago

This might get you started:

https://speech.microsoft.com/customvoice

I imagine if MS offers custom voices then the other text to speech providers do as well.

Good luck

tech4allOP5y ago

Thank you - great lead.

thaumasiotes5y ago

> We evaluated our Kennedy results qualitatively along the following dimensions: ... naturalness of the composited articulation; ...

Obviously the state of the art will have advanced, but maybe this can point the way toward more current research.

While I tend to agree with everyone else that this can be a great idea, my instinct is to float the idea to your wife first and see how she responds. I can imagine someone taking this negatively.

foepys5y ago

There is a YouTube channel called "Speaking of AI" that makes short fake speeches of some US public figures. The quality is quite good and a bit frightening.

https://www.youtube.com/channel/UCID5qusrF32kSj-oSGq3rJg/vid...

watertom5y ago

If she loses her ability to speak there are many ways to help her out, but nothing can replace the sound of her voice, especially for those important moments.

You get the idea.

arethuza5y ago

What about recording messages to other people for future events (e.g. graduation of a child, birth of grandchild etc.)?

Recording a message to a yet unborn grandchild is maybe something we could all do!

jasonhn99995y ago

When my dad lost his speech, we had Boogie Board Jot devices all over the house. It made writing short notes and simple dialogs much less tedious.

We also used the Verbally premium iPad app to help give him a voice and make transactions easier.

Wishing you all the best.

fxtentacle5y ago

The paper "Generalization Of Audio Deepfake Detection" gives an overview.

The paper https://arxiv.org/abs/1904.05441 has a list of spoofing methods.

Here's one method as paper https://arxiv.org/pdf/1806.04558.pdf

And here on GitHub https://github.com/CorentinJ/Real-Time-Voice-Cloning

probably_wrong5y ago

For an open-source approach, the MaryTTS project has a guide on how to add new voices to their tool: https://github.com/marytts/marytts/wiki/VoiceImportToolsTuto...

mbreese5y ago

It’s a bit dated at this point, but I imagine the research has vastly improved since then.

echelon5y ago

I wrote https://trumped.com

If you want different prosody or moods, don't mix them in the same data set.

You can experiment with transfer learning LJSpeech with Nvidia Tacotron2 right now. Glow-tts is also promising.

You'll start to get results with fifteen minutes of sample data, but for high quality you want a lot of audio.

Have your wife read a book and record it. The training chunks will be ~10 seconds apiece, so keep that in mind for how to segment the audio.

Focus on getting lots of good sounding data. Hours. The models will improve, but this may be your only shot of acquiring the data.

Download the LJSpeech dataset and listen to it. See how it sounds, how it's separated. That is a fantastic dataset that has yielded tremendous results, and you can use it for inspiration.
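
The ~10 second chunking mentioned above can be sketched with the standard library alone. This is a naive fixed-length split, offered only as an illustration: a real pipeline would more likely cut at pauses or sentence boundaries so that words are not chopped in half, and each chunk still needs a matching transcript. The output file naming is an assumption.

```python
import wave
from pathlib import Path


def split_wav(src, out_dir, chunk_seconds=10.0):
    """Write consecutive chunks of at most chunk_seconds; return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    with wave.open(str(src), "rb") as wf:
        params = wf.getparams()
        frames_per_chunk = int(wf.getframerate() * chunk_seconds)
        index = 0
        while True:
            frames = wf.readframes(frames_per_chunk)
            if not frames:
                break
            path = out_dir / f"{Path(src).stem}_{index:04d}.wav"
            with wave.open(str(path), "wb") as out:
                out.setparams(params)    # same channels, width, and rate
                out.writeframes(frames)  # header frame count is fixed on close
            paths.append(path)
            index += 1
    return paths
```

Keeping the original long recording untouched and generating chunks from it means the segmentation can be redone later with a better method.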

asdfman1235y ago

Here's a simple and practical solution:

Get a decent audio headset, have it record the audio to her phone, and spend hours talking to her about whatever. Preferably in a reasonably quiet environment.

Just spend a lot of time talking. You don't have to talk to her through a headset. Just make sure hers is recording her voice.

It would be easy, painless, and probably good for the relationship too.

nutanc5y ago

At a minimum get the following list of sentences recorded in her voice, http://www.festvox.org/cmu_arctic/cmuarctic.data

Make sure the recordings are of a good quality. This will ensure that you will have a baseline TTS of her voice at the minimum.
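
To turn that prompt list into something a recording tool can step through, a small parser may help. This sketch assumes each line of the file has the form `( utterance_id "Sentence text." )`; that layout is an assumption to check against the actual file before relying on it.

```python
import re

# Assumed line layout: ( some_id "Some sentence." )
PROMPT_RE = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')


def parse_prompts(lines):
    """Yield (utterance_id, text) for each line matching the assumed format."""
    for line in lines:
        match = PROMPT_RE.match(line.strip())
        if match:
            yield match.group(1), match.group(2)
```

Naming each recording after its utterance id keeps audio and text paired without any extra bookkeeping.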

arslnjmn5y ago

(off topic) Record a few things for her future self. E.g. favourite quotes, frequently used phrases.

zxter5y ago

Good advice! Maybe a few shoutouts to your future children.

bcatanzaro5y ago

Make sure to record with the best microphone you can find and in the quietest room you can find. Makes a huge difference in the resulting TTS.

adrianmonk5y ago

You might look at resources for ALS patients.

anaisbetts5y ago

Not exactly what you're asking for, but I wrote an app for this scenario:

https://play.google.com/store/apps/details?id=org.anaisbetts...

da39a3ee5y ago

This is probably a really stupid suggestion but just in case.

shockron225y ago

I have had good results with this. https://www.resemble.ai/ It is based on this open source work. If you want to run it yourself. https://github.com/CorentinJ/Real-Time-Voice-Cloning

The voice cloning can be done in a matter of minutes. (< an hour) It's also very easy to use the website.

Best of luck!

kw95y ago

benjohnson5y ago

Do you have children? Perhaps - record her reading a few favorite children's books.

jitendrac5y ago

underdeserver5y ago

Consider investing in a good microphone for recording. A Blue Yeti is ~$200.

DoreenMichele5y ago

Not to discourage you from making voice recordings and all that, but as someone who is handicapped and sometimes has trouble speaking because of it:

1. I spend a lot of time online. It doesn't matter so much there. I do a lot of typing.

2. My oldest son, who had serious output difficulties as a child, is talented at inferring what I need from a gesture and a grunt. This has proven enormously helpful.

3. Consider using her phone as a communication device. It's small and people tend to take their phone everywhere and she can type out what she wants to say.

5. Take some time to deal with the emotional stuff. It matters.

I'm sorry you are facing this. Best of luck.

seesawtron5y ago

Also, the model is not saved in the browser with Colab, so you might also want to run it locally to save it eventually (if it comes to that).

All the best mate!

[0] Main repo: https://github.com/CorentinJ/Real-Time-Voice-Cloning [1] Google colab repo to try it out: https://github.com/CorentinJ/Real-Time-Voice-Cloning/blob/ma...

ardenwood5y ago

maps75y ago

That's really good of you. It's amazing to see this community be so helpful.

jameswestgate5y ago

This may also be useful. Free and open source.

https://www.tobiidynavox.com/en-gb/software/web-applications...

totetsu5y ago

The mycroft voice assistant has some tooling they used to create voices.

https://mycroft.ai/blog/mimic-2-is-live/ https://github.com/MycroftAI/mimic2

Festival Speech Synthesis has a tool for recording speech databases, and some tutorials for training festival voices. http://www.cstr.ed.ac.uk/research/projects/speechrecorder/

disabled5y ago

You need to do voice banking. It is imperative that you do so, so that your wife keeps her identity no matter what.

The voice banking services that people typically use are here: https://www.mndassociation.org/professionals/management-of-m...

There is also a free version of voice banking available, but I would only recommend it as a secondary tool: https://www.modeltalker.org/

This app (iOS and Android) for example, allows you to use your personal voice banked text-to-speech voice, to talk: https://therapy-box.co.uk/predictable

This is another great app that allows you to use your personal voice banked text-to-speech voice: https://www.assistiveware.com/products/proloquo4text

Source: Disabled engineering student, who is extremely interested in assistive technology. I would love to be a rehabilitation engineer.

stevewillows5y ago

It might also be worth recording normal conversations you have around the house as a fallback. You can always cut it up later and feed it into these systems.

Best of luck to the two of you. I really hope you don't ever need this technology.

KhoomeiK5y ago

[1] https://dam-prod.media.mit.edu/x/2018/03/23/p43-kapur_BRjFwE...

nighthawk4545y ago

This can be trained using only 5 seconds of reference audio: https://google.github.io/tacotron/publications/speaker_adapt... https://arxiv.org/pdf/1806.04558.pdf

It's been mentioned a bit already, but thought it was worth calling out. This may be one of the lowest-overhead ways to start experimenting, at least in terms of data collection.

abjecton5y ago

The_rationalist5y ago

https://dathudeptrai.github.io/TensorflowTTS/ is the state of the art and feels natural enough

ooopsnevermind5y ago

cl0rkster5y ago

Also... Learn sign language. Some of the most beautiful and overlooked people are non-verbal. I've met several truly speechless people who had families that never learned to sign. It's sad for them.

redsh5y ago

Sorry about this. Record as much voice as you can now (stereo too?), then you’ll have time to find the right solution and improve it as the technology gets better in time

m4635y ago

I went through something similar with a parent years and years ago. I wanted to be able to do things to help with what would eventually be lost.

I have to say I didn't help as much as I thought I could and afterwards I was always wondering if I could have used this technology or that and done more.

So - I think you should recognize that you can only do so much, we're doing the best we can, and in the end we are all winging it.

YAFZ5y ago

You might contact the following company: https://www.acapela-group.com/solutions/acapela-voice-factor...

There's also open source TTS from Mozilla: https://github.com/mozilla/TTS

erogol5y ago

hvaoc5y ago

This is not open source but this was very good from their demo in terms of your own voice reproduction.

https://www.descript.com/lyrebird-ai

I hope good folks in there will help you, try reaching them.

https://m.youtube.com/watch?v=VnFC-s2nOtI

unstatusthequo5y ago

Love Descript and think it’s a great way to both record and get transcripts.

TriNetra5y ago

I've recently seen these two pieces of software on HN that may be of some help:

deepfake for voice: https://github.com/CorentinJ/Real-Time-Voice-Cloning

Reproducing emotional voices: https://www.sonantic.io/

abinaya_rl5y ago

You are trying to do a beautiful thing. I don't have a knowledge of this subject, but I really wish you good luck on this project.

rajacombinator5y ago

inspectorG4dget5y ago

Nobody has mentioned VocalID and voice surrogacy [1] yet. This organization might be able to recreate her voice from historic samples for text-to-speech.

[1] https://www.ted.com/talks/rupal_patel_synthetic_voices_as_un...

meristem5y ago

peterwwillis5y ago

Here's a story from the San Francisco Chronicle on saving Stephen Hawking's voice: https://www.sfchronicle.com/bayarea/article/The-Silicon-Vall...

loph5y ago

You might look at what Jamie Dupree has done.

https://www.cnn.com/2018/06/15/health/dystonia-jamie-dupree-...

He uses a text-to-speech system that sounds more-or-less like him.

jimlikeslimes5y ago

offsky5y ago

moooo995y ago

I don't really have anything to add to all the helpful comments under your thread. Do the preparation as much as you can, as long as your wife also wants this.

You said there is a small chance, so I really wish you and your wife the best of luck that she and her voice will be fine after the surgery.

eschaton20235y ago

egwor5y ago

I would also think of various phrases that need a lot of emotion applied, e.g. for sensitive situations like someone's death, or for positive feedback like a wedding or a birthday or a thank you.

Maybe also if she has a favourite book or a favourite quote, get those recorded too.

Back it all up!

mathnode5y ago

jll295y ago

Just let her read a couple of pieces of text and record in high quality (44.1 kHz).

Beyond the technical answer, you may want her to record some nice personal words addressed to your family that you can listen to later.

You don't need to do anything until the worst case materialises.

bb1235y ago

There is https://www.descript.com/lyrebird-ai which is in private beta right now, but looks to serve your needs exactly. Maybe reach out to them?

voicevoice505y ago

For recording training audio:

https://github.com/daanzu/speech-training-recorder

The recorder works with Python 3.6.10. Need to pip install webrtcvad also.

mproud5y ago

Roger Ebert has some articles about the troubles he encountered that may be worth a read.

techbio5y ago

diggum5y ago

https://www.modeltalker.org/vrec/ is a project for "voice banking" that might be able to help. It's not perfect yet.

bigmasterofnone5y ago

Good luck with what you are doing and more importantly, I wish your wife good health.

PopeDotNinja5y ago

josinalvo5y ago

IDK about the tech, but I would not worry about it right now. You don't need to play with the tech unless the bad unlikely outcome comes to pass.

Also, you might want to check wordlists by frequency to get a menu of common words, and IPA notation, to ensure you cover a good range of sounds.
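
One way to act on the "good range of sounds" idea is a greedy pass over candidate sentences, always taking the one that adds the most sounds not yet covered. The sketch below assumes you already have a word-to-phoneme lookup (for instance built from a pronunciation dictionary such as CMUdict); the function names are made up for this example, and words missing from the lookup are simply ignored.

```python
def phonemes_in(sentence, pronunciations):
    """Set of phonemes in a sentence; words missing from the lookup are skipped."""
    found = set()
    for word in sentence.lower().split():
        found.update(pronunciations.get(word.strip(".,!?;:\"'"), []))
    return found


def pick_prompts(sentences, pronunciations, limit=None):
    """Greedy cover: repeatedly take the sentence adding the most new phonemes."""
    remaining = {s: phonemes_in(s, pronunciations) for s in sentences}
    covered, chosen = set(), []
    while remaining and (limit is None or len(chosen) < limit):
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # nothing left adds a new sound
        covered |= gain
        chosen.append(best)
    return chosen, covered
```

Greedy selection is not guaranteed to find the smallest possible prompt set, but it is simple and tends to front-load the most useful sentences, which matters when recording time is limited.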

JDEW5y ago

Don’t know why you’re being downvoted. Thought it was insightful.

techwraith5y ago

I recently learned about a startup that is working on this kind of tech: https://phonetic.ai/

vinniejames5y ago

Take a look at Lyrebird

https://www.descript.com/lyrebird-ai?source=lyrebird

suchoudh5y ago

Please do keep us posted on the final outcome. We all pray that the surgery is successful. (Really appreciate your efforts in preparing for the worst-case scenario.)

csisnett5y ago

Vocalid.ai has a vocal bank where you can record yourself, and use other people's voices as well. It could be a good choice for her to use her own voice.

fenesiistvan5y ago

These are the things I always come back to ycombinator.com for. There are always valuable, intelligent replies here for all kinds of issues you might have.

ponker5y ago

Make sure to not have her read too much. The vocal cords can get inflamed and increase the chance of complications/damage.

lowercased5y ago

diegoperini5y ago

Please let us know the good news if it arrives, preferably with a Tell HN or something similar.

Good luck and best wishes! <3

pkinnaird5y ago

get a great microphone and have her read her favorite books. Go for books with lots of dialog and emotional content.

Later, you can extract all the phonemes you want from it and you will retain the emotional expressiveness of her voice.

She should probably sing some songs -- lullabies, rock, etc. Go for emotional diversity.

smolPotat5y ago

There's an app for that! It's called Vocable, it's open source and available on iOS and Android!

glonq5y ago

> I'd like to prepare, just in case, to have technology to reproduce her voice from keyboard or other input.

werdnapk5y ago

How small of a chance of her losing her ability to speak are you talking about here?

dragoon75y ago

Learn sign language.

klyrs5y ago

chubs5y ago

Acapela.com has a voice banking service

ghoshbishakh5y ago

Please. There is a small chance you said. Everything will be fine. But still carry on your research on the problem since it might help others.

swyx5y ago

even if there is a small chance, the preparation may help lessen the blow of what would still be a tremendous loss.

also it might just help pass the time since OP has 3 weeks.

kangaroozach5y ago

Descript.com has the tech.

Reach out to Andrew Mason.

dazuaz5y ago

Not bad as a niche product idea.

evmolesworth5y ago

Does your wife want you to do this?

kangaroozach5y ago

Descript.com Andrew Mason

pezo19195y ago

Did you ask her about that? Make sure she is not freaked out by it.
