Years ago it got sent to the cloud, but as long as you have an iPhone from the past few years it's on-device.
You cannot use an iPhone as a dictation device without reviewing the transcribed text, which IMO defeats the purpose of dictation.
Meanwhile, I've gotten excellent results on the iPhone from a Whisper->LLM pipeline.
[1] https://github.com/futo-org/android-keyboard/blob/master/LIC...
https://gitlab.futo.org/alex/voiceinput/-/blob/master/LICENS...
> FUTO Source First License 1.0
> You may use or modify the software only for non-commercial purposes
In each and every case I'm familiar with, streaming means "send the whole audio thus far to the inference engine, run inference on it, and send back the transcription".
I have a Flutter library that does the same flow as this (though via ONNX, so I can cover all platforms), and Whisper + Silero is ~identical to the interfaces I used at Google.
If the idea is that streaming means each audio byte is only sent once to the server, there's still an audio buffer accumulated -- it's just on the server.
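To make the "whole audio thus far" flow concrete, here is a minimal sketch. The names `AccumulatingStreamer` and `fake_transcribe` are made up for illustration, and the "model" is a placeholder rather than Whisper or any real engine.

```python
from typing import Callable


class AccumulatingStreamer:
    """Pseudo-streaming: every new chunk triggers inference over all audio so far."""

    def __init__(self, transcribe: Callable[[bytes], str]) -> None:
        self._transcribe = transcribe  # hypothetical stand-in for a real model call
        self._buffer = bytearray()     # the accumulated audio buffer
        self.bytes_inferred = 0        # total bytes pushed through the "model"

    def push(self, chunk: bytes) -> str:
        # Append the new audio, then re-run inference on the whole buffer.
        self._buffer.extend(chunk)
        audio = bytes(self._buffer)
        self.bytes_inferred += len(audio)
        return self._transcribe(audio)


def fake_transcribe(audio: bytes) -> str:
    # Placeholder "model": pretends every 4 bytes of audio is one word.
    return " ".join(["word"] * (len(audio) // 4))


streamer = AccumulatingStreamer(fake_transcribe)
partials = [streamer.push(b"\x00" * 4) for _ in range(3)]
```

Three 4-byte chunks mean the placeholder model sees 4 + 8 + 12 bytes in total, so the work grows with the length of the utterance even though each byte is only received once.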
- Overlap compute with the user speaking: Not having to wait until all the speech has been acquired can massively reduce latency at the end of speech and allow a larger model to be used. This doesn't have to be the whole system; for instance, an encoder can run this way over the audio as it comes in, even if the final step of the system then runs in a non-streaming fashion.
- Produce partial results while the user is speaking: This can be just a UI nice-to-have, but it can also be much deeper, e.g. a system can be activating on words or phrases in the input before the user is finished speaking, which can dramatically change latency.
- Better segmentation: Whisper + Silero is just using VAD to make segments for Whisper, which is not at all the best you can do if you are actually decoding while you go. Looking at the results as you go allows you to make much better and faster segmentation decisions.
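A toy sketch of the first two points, assuming a made-up `OverlappedRecognizer` whose "encoder" is just per-chunk energy and whose "decoder" labels chunks as speech or silence. None of this is a real ASR API; it only shows where the work happens relative to the end of speech.

```python
from typing import List


class OverlappedRecognizer:
    """Toy recognizer: encode each chunk once on arrival, decode at the end."""

    def __init__(self, threshold: float = 0.1) -> None:
        self.threshold = threshold      # hypothetical energy threshold for "speech"
        self.encoded: List[float] = []  # per-chunk features, computed while the user talks

    def feed(self, chunk: List[float]) -> str:
        # Streaming part: the "encoder" touches each chunk exactly once.
        energy = sum(abs(s) for s in chunk) / max(len(chunk), 1)
        self.encoded.append(energy)
        # Partial result, available before the user stops speaking.
        return self._decode()

    def finish(self) -> str:
        # Non-streaming final step: only the cheap decode is left to do here.
        return self._decode()

    def _decode(self) -> str:
        # Stand-in decoder: mark each chunk as speech ("S") or silence ("_").
        return "".join("S" if e >= self.threshold else "_" for e in self.encoded)


rec = OverlappedRecognizer()
partials = [rec.feed(c) for c in ([0.5, -0.4], [0.0, 0.01], [0.3, 0.3])]
final = rec.finish()
```

By the time `finish()` is called, all the encoding has already been done during speech, so the end-of-speech latency is only the final decode.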
- streaming == I talk and the text appears as I talk
- batched == I talk, and after I'm done talking some processing happens and the text gets populated
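Put as code, the two definitions differ only in when the UI gets text, not in how the engine works internally. A rough sketch, where `run_batched`, `run_streaming` and `toy_decode` are hypothetical names and the "decoder" simply reads ASCII bytes back as text:

```python
from typing import Callable, Iterable, List


def run_batched(chunks: Iterable[bytes], decode: Callable[[bytes], str]) -> List[str]:
    """Return every text update the UI would show in the batched flow."""
    audio = b"".join(chunks)  # wait until the user is done talking
    return [decode(audio)]    # one update, after processing


def run_streaming(chunks: Iterable[bytes], decode: Callable[[bytes], str]) -> List[str]:
    """Return every text update the UI would show in the streaming flow."""
    updates: List[str] = []
    audio = b""
    for chunk in chunks:
        audio += chunk
        updates.append(decode(audio))  # text appears while the user talks
    return updates


def toy_decode(audio: bytes) -> str:
    # Placeholder for a real recognizer: the "audio" is already ASCII text.
    return audio.decode("ascii")


chunks = [b"hello ", b"there ", b"world"]
batched_updates = run_batched(chunks, toy_decode)
streaming_updates = run_streaming(chunks, toy_decode)
```

Both flows end on the same final text; the streaming one just produces intermediate updates along the way.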
Is your Flutter library available? And does it run locally? I'm looking for a good Flutter streaming (in the sense above) speech recognition library. Vosk looks good, but it's lacking some configurability, such as selecting the audio source.
https://github.com/Helium314/HeliBoard
https://github.com/openboard-team/openboard
https://github.com/rkkr/simple-keyboard (guessing, since AOSP Keyboard works and this is a fork)
Not open source: https://www.microsoft.com/en-us/swiftkey
Does not have glide/swipe (reserved for symbols), but I just installed it and am giving it a shot: https://github.com/Julow/Unexpected-Keyboard
It does have glide typing, even though I don't use it.
It rather uses long-tap to access multiple symbols, and can be split or pushed to a corner on devices with a big screen.
I'm very interested in using this, but I can't even find a way to try to troubleshoot it. I'm not finding usage instructions, never mind any kind of error messages. It just doesn't do anything.
This is especially interesting to me because the screenshot on the repo is from Vanadium, which strongly suggests to me that it's from a GrapheneOS device itself.
The thing I'm tripping over now is just that I keep pressing the button more than once when I'm done speaking because it's not clear that it registered the first time. If it could even just stay "pressed" or something while it processes the text, I think that would make it clearer. Any third state for the button would do I think.
Looking forward to using this! Thanks!
Ah, it currently uses the Jetpack Compose toggle button but I do suppose it does actually have three states instead of two. I initially wanted to add a loading circle inside the button but wasn't able to without messing up the padding and such.
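For what it's worth, the three states can be sketched independently of the UI toolkit. This is a language-neutral illustration in Python, not Jetpack Compose code and not how Transcribro is implemented; `MicState` and `MicButton` are made-up names.

```python
from enum import Enum, auto


class MicState(Enum):
    IDLE = auto()        # nothing happening, button looks "off"
    RECORDING = auto()   # user is speaking, button looks "on"
    PROCESSING = auto()  # speech finished, transcription still running


class MicButton:
    def __init__(self) -> None:
        self.state = MicState.IDLE

    def press(self) -> bool:
        """Handle a tap; return True if the tap changed the state."""
        if self.state is MicState.IDLE:
            self.state = MicState.RECORDING
            return True
        if self.state is MicState.RECORDING:
            self.state = MicState.PROCESSING
            return True
        # Taps while processing are ignored, so a repeated press is harmless.
        return False

    def transcription_done(self) -> None:
        # Called by the (hypothetical) transcription backend when text is ready.
        if self.state is MicState.PROCESSING:
            self.state = MicState.IDLE


button = MicButton()
first = button.press()   # IDLE -> RECORDING
second = button.press()  # RECORDING -> PROCESSING
extra = button.press()   # ignored while PROCESSING
```

The UI would then only need to render `PROCESSING` differently (a spinner, a dimmed or "held" button) so the first tap visibly registers.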
Hope you enjoy using Transcribro!
I would pay for an app that did this.
This looks like an unaffiliated version: https://apps.apple.com/us/app/live-transcribe/id1471473738
I understand that the author trusts themselves more than F-Droid, but as a user the opposite seems more relevant.
I'm not really sold on the argument... Also, the constant push/hype of GrapheneOS (and the "attitude" of its devs) is mildly annoying...
I see the features listed[0] which seems like a reasonable feature set, but nothing unusual afaict.
If there has been a lot of hype can you tell me what people find compelling about it?