I'm really glad to see work and progress being made in this arena, and I'm hoping to see more people playing around with it.
Some improvement in text comprehension would be great. Right now it doesn't understand many of my queries. Keep up the good work!
My training set is specifically designed around conversational interview and personal questions, but I think a lot of the people who reach the site don't grasp that.
Here are some examples of the input it gets and has no clue what to do with:
- "um changes nice so when you change the things december" matched to: -1: no match
- "nexus 10" matched to: 13: Huh?
- "pictures" matched to: 14: Huh?
- "call" matched to: 17: Cool
- "videos" matched to: 16: Huh?
- "change" matched to: 15: Huh?
And this is a set of inputs where people did get it:
- "hi what's your name" matched to: 23: My name
- "what's your name" matched to: 27: My name
- "what do you do" matched to: 31: What I do
- "why should we hire you" matched to: 38: Why you should hire me
- "what's your favorite food" matched to: 35: Food
- "wendy's see yourself in 5 years" matched to: 28: Goals
Like if you could just build smart NLP around SMS menus, you'd solve the developing world's SMS-as-a-helpdesk frustrations.
Or think of the premium subscription services you could charge for when people can interact on the level of natural language instead of just replying with simple commands.
"for the first time, the developers themselves do not have to be experts in the field, or face the prospect of huge expense to bring in that technical knowledge from elsewhere." - I love that the building blocks of cool experiences are becoming better polished and easier to fit together.
It's a good time to be alive, that's for sure!
However, the cool thing about Wit is that they're constantly updating their suite of NL recognizers. The more you use the service, the better it gets, and it does so without you having to buy a new release of Dragon. :)
That's disappointing since the only problem I ran into with doing home automation via a web application was the speech-to-text, not processing commands once they were in text. A list of regular expressions works quite well for that.
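For the curious, a regex command table like that can be tiny. This is just a sketch of the idea; the patterns, device names, and return strings are made up for illustration:

```javascript
// Ordered list of command patterns; first match wins.
// Each entry maps a regex over the transcribed text to an action.
const commands = [
  { pattern: /turn (on|off) the (\w+) lights?/i,
    run: (m) => `lights:${m[2]}:${m[1].toLowerCase()}` },
  { pattern: /set (?:the )?thermostat to (\d+)/i,
    run: (m) => `thermostat:${m[1]}` },
];

// Try each pattern against the speech-to-text output.
function dispatch(text) {
  for (const { pattern, run } of commands) {
    const m = text.match(pattern);
    if (m) return run(m);
  }
  return null; // no command recognized
}
```

So `dispatch("turn off the kitchen lights")` yields an action while anything unmatched falls through to `null`, which is plenty for a fixed set of home-automation commands.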
The HTML5 Speech Recognition API in Chrome kinda sucks. It does speech to text well, but reliably keeping the API listening for speech at all has been challenging. Even a bunch of code basically checking "has the webkitSpeechRecognition object borked itself yet? recreate it and restart listening" every two seconds doesn't work reliably.
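One pattern that seems less flaky than polling is restarting from the recognizer's own `onend` handler, since in Chrome an error event is followed by `end` anyway. A rough sketch (the injectable factory and all names here are illustrative, not part of Chrome's API; in the browser the factory would be `() => new webkitSpeechRecognition()`):

```javascript
// Keep a recognizer alive by recreating it whenever it stops on its own.
function keepListening(makeRecognizer, onResult) {
  let rec = null;
  const start = () => {
    rec = makeRecognizer();
    rec.continuous = true;      // keep listening across utterances
    rec.onresult = onResult;
    // When the engine gives up (silence timeout, network error, ...),
    // throw the old instance away and start a fresh one.
    rec.onend = () => start();
    rec.start();
  };
  start();
  // Returned function stops for good: detach the restart hook first.
  return () => { rec.onend = null; rec.stop(); };
}
```

No guarantees this dodges every way `webkitSpeechRecognition` can bork itself, but restarting on `onend` at least reacts immediately instead of on a two-second timer.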
I'd love a JavaScript API that can listen to the microphone, determine if anything has been spoken (versus silence or background noise), and when something that may be speech is detected, send it to another API endpoint that converts it to text.
Edit: They do take audio input, woo :) Thanks for the correction. https://wit.ai/docs/api#toc_9
Can't recommend this service enough.