> How much does maintaining the servers cost?

> It depends on the amount of traffic, but the minimum baseline is around several thousand US dollars every month. This is expected, as inference is very GPU-intensive and a sufficient number of instances need to be spun up to handle thousands of requests coming in every minute. Everything is paid out of pocket.
Wow, impressive commitment for something that's free.
- Spot instances
- Aggressive autoscaling
- Micro batching
Together, these can reduce inference compute spend by huge amounts (90% reductions are not uncommon). ML, especially anything involving realtime inference, is an area where effective platform engineering makes a ridiculous difference, even in the earliest days.
Source: I help maintain open source ML infra for GPU inference and think about compute spend way too much https://github.com/cortexlabs/cortex
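Of the three techniques above, micro batching is probably the least familiar to web engineers. A minimal sketch of the idea (the class and the placeholder "model" below are illustrative, not from any particular library): incoming requests queue up for a few milliseconds so the GPU runs one batched forward pass instead of many single-item passes.

```python
import queue
import threading
import time

def run_model_batch(inputs):
    """Placeholder for a real batched inference call (e.g. one GPU forward pass)."""
    return [x * 2 for x in inputs]

class MicroBatcher:
    """Collects concurrent requests into small batches before hitting the model."""

    def __init__(self, max_batch=8, max_wait_s=0.01):
        self.q = queue.Queue()
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        threading.Thread(target=self._loop, daemon=True).start()

    def infer(self, x):
        # Each caller parks on an Event until the batch containing it is done.
        done = threading.Event()
        slot = {"done": done}
        self.q.put((x, slot))
        done.wait()
        return slot["result"]

    def _loop(self):
        while True:
            # Block for the first request, then wait up to max_wait_s
            # for more requests to fill out the batch.
            item = self.q.get()
            batch = [item]
            deadline = time.monotonic() + self.max_wait_s
            while len(batch) < self.max_batch:
                timeout = deadline - time.monotonic()
                if timeout <= 0:
                    break
                try:
                    batch.append(self.q.get(timeout=timeout))
                except queue.Empty:
                    break
            results = run_model_batch([x for x, _ in batch])
            for (_, slot), result in zip(batch, results):
                slot["result"] = result
                slot["done"].set()
```

The tradeoff is a few milliseconds of added latency per request in exchange for much higher GPU utilization, which is usually a good deal for inference workloads.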
This is not true. A _lot_ of AI applications use algorithms such as logistic regression or random forests and don’t need GPUs - partly, of course, because GPUs are so expensive and these approaches are good enough (or more than good enough) for many applications.
There are a few key reasons why most realtime inference is done in the cloud:
- Scale. Deep learning models especially tend to have poor latency, especially as they grow in size. As a result, you need to scale up replicas to meet demand at a way lower level of traffic than you do for a normal web app. At one point, AI Dungeon needed over 700 servers to support just thousands of concurrent players.
- Cost. Related to the above, GPUs are really expensive to buy. A g4dn.xlarge instance (the most popular AWS EC2 instance for GPU inference) is $0.526/hour on demand. To hit $3,000 per month in spend, you'd need to be running ~8 of them 24/7. Prices vary if you purchase GPUs outright, but you could expect 8 NVIDIA T4s to run around $20,000 at minimum, plus the cost of other components and maintenance. To be clear, that's very conservative--it's unlikely you'll get consistent traffic. What's more likely is that you'll have some periods of very little traffic where you need one or two GPUs, and other high-load periods where you'll need 10+.
- Chip access. A less universal issue, but the cloud gives you much better access to new chips at lower switching costs. If NVIDIA releases a new GPU that's even better for inference, switching to it (once it's available on your cloud) will be a tweak in your YAML. If you ever switch to ASICs like AWS's Inferentia or GCP's TPUs, which in many cases give way better performance and economics than GPUs, you'll also naturally have to be on that provider's cloud.
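The cost arithmetic in the second point can be sanity-checked quickly (rates as quoted above, assuming a 30-day month):

```python
# Rough cloud cost check using the figures from the comment above.
ON_DEMAND_RATE = 0.526        # $/hour for a g4dn.xlarge, on demand
HOURS_PER_MONTH = 24 * 30     # assuming a 30-day month

monthly_per_instance = ON_DEMAND_RATE * HOURS_PER_MONTH   # $378.72/month
instances_for_3k = 3000 / monthly_per_instance            # ~7.9, i.e. ~8 24/7

print(f"${monthly_per_instance:.2f}/month per instance, "
      f"~{instances_for_3k:.1f} instances to reach $3,000/month")
```

So roughly eight always-on g4dn.xlarge instances match the "several thousand dollars a month" baseline quoted at the top of the thread.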
However, there is a lot that can be done to lower the cost of inference even in the cloud. I listed some things in a comment higher up, but basically, there are some assumptions you can make with inference that allow you to optimize pretty hard on instance price and autoscaling behavior.
I have no reason to disbelieve it.
Being able to generate voices for games would enable a lot of interesting indie projects. IMO people should be paying more attention to the market implications of products like this than to the social implications. There are a lot of projects that just aren't really feasible right now that could be if this kind of technology were more polished and generally available for commercial/self-hosted use. And in those cases, you don't even need to do inference; makers will likely be willing to mark up their scripts themselves.
Anyway I digress. Congrats, this is really cool!
People will absolutely suffer harm from this tech, but hey, think about the dollars that could be made! No, we should absolutely be paying more attention to the social implications.
I'm not primarily interested about the dollars, I'm interested in allowing communities to do creative things. I think people are looking at this tech like it's only going to be used for deepfakes, and they're underestimating the extent it's going to be used to create voice-acted game mods, animations, anonymization tools, and other creative/helpful projects.
If you're really worried about this stuff, though, you can take some comfort in the fact that by far the worst examples on the site are of real-world voices. As far as I can see, this technology is currently far more suited to generating new voices or voicing cartoon characters with well-defined patterns/inflections than it is to imitating the president.
If a voice could be copyrighted, or if this was a trademark issue or something, I strongly suspect that this site would not fall under fair use regardless of whether or not it was commercial. But again, IANAL, so I don't feel confident making any kind of strong claim about that either.
Bob: “Hello, John.”
John: “Oh, hello there, Bob.”
Bob: “Yes, hello. It's what I said. Why do you keep repeating what I say, John?”
John: “I didn't repeat you! I merely said hello, you dimwit!”
Bob: “There you go, being condescending again. Fuck you!”
John: “What? You're the one who started it!”
Try it yourself, or write something different. Either way, good fun!
- We have had perfect image manipulation capabilities for quite some time now. We have had written text manipulation capabilities for hundreds of years.
- People will continue to believe what they believe, whether there is deep fake video and audio or not.
A Voice Deepfake Was Used To Scam A CEO Out Of $243,000:
https://www.forbes.com/sites/jessedamiani/2019/09/03/a-voice...
I just found a video on YT with an example of recreating this in Melodyne: https://youtu.be/1oQn66gvwKA
There is definitely a sense of ‘who is that’ coming from their little minds that they are sometimes quite perplexed about. ‘It’s a computer’ is starting to feel like a cop-out answer as these things improve...
https://soundcloud.com/user-860705643/q-pandemic-rant-no-mus...
The obvious way to get around this is to keep this as the showcase and to pay some people to add their voices to the paid version. I imagine this would sell just based on being decent TTS with a wide range of voices, even when people don't know the voices offered.
You can find a couple of minutes of talking from anyone, so the security implications are huge!
Amazing toy! Thanks for the "download" link; I'm creating a collection of GLaDOS phrases now.
Besides that, amazing results. Congratulations.
Not only that, but the creator seems cool and down to earth. Thanks for sharing, this is incredible work.
I can get about 90% of the quality of 15.ai currently. I think I could surpass 15.ai but not without some help.
Here's a sample from a TTS model + vocoder I released for it. I've no wish to deter the motivated, but it'd take a bit of figuring out how to set things up and you'd need to read the docs and code to get oriented :)
https://m.soundcloud.com/user-726556259/sherlock-wavegrad-sa...
Links to the models are here: https://discourse.mozilla.org/t/creating-a-github-page-for-h...
It was originally trained on two novels read by the same narrator on LibriVox (i.e., in the public domain).
Found an answer:
"There's no point in releasing a poorly done model, and to do so for the sake of popularity would be despicable. My goal is to achieve indistinguishability, which I certainly know is possible. Anything short of near-perfection is unacceptable. "
I do plan to compile and publish my findings in the future, but nothing is set in stone yet. I know that the model can be improved even further, and I'd prefer to be as comprehensive as possible.
AI and ML users are massively benefiting from open source but too often refuse to release their data. It's like we're back in the middle ages and alchemy is back in style.
I wonder if this will lead to a resurgence of "moon man" style videos with well-known characters rapping extremely offensive lyrics.