Transcribro: On-device Accurate Speech-to-text

(github.com)

165 points · thebiblelover7 · 1y ago · 60 comments

james2doyle 1y ago

Looks similar to the new FUTO keyboard: https://voiceinput.futo.org/

iamjackg 1y ago

I've been using this for a while (the voice input, not their keyboard) and it's so refreshing to be able to just speak and have the output come out as fully formed, well-punctuated sentences with proper capitalization.

james2doyle 1y ago

I agree. No more "speaking punctuation". Just talk as normal and it comes out fully formed.

leobg 1y ago

Anything like that available for iOS?

crazygringo 1y ago

iOS already has on-device dictation built into the standard keyboard.

Years ago it got sent to the cloud, but as long as you have an iPhone from the past few years it's on-device.

brylie 1y ago

Aiko, mentioned elsewhere, includes a local copy of the OpenAI Whisper model: https://apps.apple.com/app/aiko/id1672085276

b33f 1y ago

Aiko is a free app for iOS and macOS that also uses Whisper for local speech-to-text.

gala8y 1y ago

There is also Sayboard (open-source, multiple languages): https://github.com/ElishaAz/Sayboard

kolme 1y ago

This looks great! I've been wanting to drop the Swipe keyboard ever since I saw sneaky ads on it (like me typing "Google Maps" and getting "Bing Maps" as a "suggestion").

yjftsjthsd-h 1y ago

But open source, which is a pretty big difference

grandma_tea 1y ago

FUTO and Transcribro are open source.

flax 1y ago

The documentation is severely lacking. I wanted to know whether this does streaming or only batch, and to see examples of integrating it with Android apps.

soupslurpr 1y ago

It uses voice activity detection (VAD) and only starts processing once it detects no speech for 3 seconds, so it is batch only. Examples for integrating with Android apps? Like apps that can use it? Pretty much any app that uses Android's SpeechRecognizer class will work, either when you set Transcribro as the user-selected speech recognizer or when the app uses Transcribro explicitly. For example, Google Maps uses the user-selected speech recognizer when it doesn't detect Google's speech services on the system.
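
A minimal sketch of the batch flow described above, assuming a toy energy threshold in place of a real VAD model. The names `is_speech` and `segment_utterances`, the 0.5-second frame length, and the threshold value are illustrative assumptions, not Transcribro's actual code; only the 3-second silence window comes from the comment.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[float]  # one fixed-length chunk of audio samples


def is_speech(frame: Frame, threshold: float = 0.01) -> bool:
    """Toy energy-based detector; a stand-in for a real VAD model."""
    if not frame:
        return False
    energy = sum(s * s for s in frame) / len(frame)
    return energy > threshold


def segment_utterances(
    frames: Iterable[Frame],
    frame_seconds: float = 0.5,    # assumed frame length
    silence_seconds: float = 3.0,  # silence window mentioned in the thread
    detector: Callable[[Frame], bool] = is_speech,
) -> Iterator[List[Frame]]:
    """Yield one utterance each time enough trailing silence has been seen.

    Each yielded utterance would then be handed to the recognizer in a
    single batch call, which is why the result is not streamed.
    """
    needed = max(1, round(silence_seconds / frame_seconds))
    buffered: List[Frame] = []
    silent_run = 0
    heard_speech = False
    for frame in frames:
        if detector(frame):
            heard_speech = True
            silent_run = 0
            buffered.append(frame)
        elif heard_speech:
            silent_run += 1
            buffered.append(frame)
            if silent_run >= needed:
                # Drop the trailing silence before handing off the utterance.
                yield buffered[: len(buffered) - silent_run]
                buffered, silent_run, heard_speech = [], 0, False
    if heard_speech:
        # Input ended before the silence window elapsed; flush what we have.
        yield buffered[: len(buffered) - silent_run]


if __name__ == "__main__":
    loud, quiet = [0.5] * 4, [0.0] * 4
    stream = [quiet, loud, loud] + [quiet] * 6 + [loud, quiet, quiet]
    for utterance in segment_utterances(stream):
        print(len(utterance), "speech frames ready for batch transcription")
```

Leading silence is ignored, and nothing is emitted until the silence window closes or the input ends, which matches the "only batch" behaviour described in the comment.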

pants2 1y ago

Considering it uses Whisper, it's probably not streaming

refulgentis 1y ago

I did some core work on TTS at Google, at several layers, and I've never quite understood what people mean by streaming vs. not.

In each and every case I'm familiar with, streaming means "send the whole audio thus far to the inference engine, inference it, and send back the transcription"

I have a Flutter library that does the same flow as this (though via ONNX, so I can cover all platforms), and Whisper + Silero is ~identical to the interfaces I used at Google.

If the idea is that streaming means each audio byte is only sent once to the server, there's still an audio buffer being accumulated; it's just on the server.
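
A small sketch of the "streaming" flow as described in this comment: every new chunk is appended to a buffer, and the whole buffer so far is run through the model again. `RollingTranscriber` and the fake model are hypothetical names for illustration only; a real setup would plug in an actual speech-to-text call.

```python
from typing import Callable, List

Samples = List[float]


class RollingTranscriber:
    """Re-transcribes all audio received so far on every new chunk."""

    def __init__(self, transcribe: Callable[[Samples], str]) -> None:
        self._transcribe = transcribe  # hypothetical model call
        self._buffer: Samples = []

    def feed(self, chunk: Samples) -> str:
        # The buffer keeps growing whether it lives on the client or the server.
        self._buffer.extend(chunk)
        return self._transcribe(list(self._buffer))

    def reset(self) -> None:
        """Start a new utterance with an empty buffer."""
        self._buffer.clear()


def fake_model(audio: Samples) -> str:
    """Placeholder for a real model; it only reports how much audio it saw."""
    return f"{len(audio)} samples heard"


if __name__ == "__main__":
    session = RollingTranscriber(fake_model)
    print(session.feed([0.1, 0.2]))  # model sees 2 samples
    print(session.feed([0.3]))       # model sees all 3 samples again
```

The caller sees partial results after each chunk, but the model still processes the full accumulated audio each time, which is the point the comment makes.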

yewenjie 1y ago

Seems like Gboard is incompatible with it. Is there a good enough open source alternative to Gboard in 2024 that has smooth glide-typing and a similar layout?

SparkyMcUnicorn 1y ago

Any of these should work.

https://github.com/Helium314/HeliBoard

https://github.com/openboard-team/openboard

https://github.com/rkkr/simple-keyboard (guessing, since AOSP Keyboard works and this is a fork)

Not open source: https://www.microsoft.com/en-us/swiftkey

Does not have glide/swipe typing (swipes are reserved for symbols), but I just installed it and am giving it a shot: https://github.com/Julow/Unexpected-Keyboard

Grimblewald 1y ago

Unexpected Keyboard is unexpectedly awesome. Looks a bit dated, but boy does it have some functionality packed into it.

nine_k 1y ago

My choice is https://github.com/AnySoftKeyboard/AnySoftKeyboard/

It does have glide typing, even though I don't use it.

It rather uses long-tap to access multiple symbols, and can be split or pushed to a corner on devices with a big screen.

smeej 1y ago

Not sure what I'm doing wrong, but I tried installing it on a GrapheneOS device with Play Services installed and nothing happened. When I pushed the mic button, it changed to look pressed for a second, and went back to normal. Nothing happened when I spoke. Tried holding it down while speaking. Still nothing.

I'm very interested in using this, but I can't even find a way to try to troubleshoot it. I'm not finding usage instructions, never mind any kind of error messages. It just doesn't do anything.

This is especially interesting to me because the screenshot on the repo is from Vanadium, which strongly suggests to me that it's from a GrapheneOS device itself.

soupslurpr 1y ago

You're correct, I do use GrapheneOS. Hm, do you have the global microphone toggle off? There's an upstream issue that causes SpeechRecognizer implementations to silently fail when the microphone toggle is off. You may have to force-stop Transcribro after turning it on.

https://github.com/soupslurpr/Transcribro/issues/3

smeej 1y ago

I didn't think I did, but cycling it a couple of times and restarting did fix it! Great guess!

The thing I'm tripping over now is just that I keep pressing the button more than once when I'm done speaking because it's not clear that it registered the first time. If it could even just stay "pressed" or something while it processes the text, I think that would make it clearer. Any third state for the button would do I think.

Looking forward to using this! Thanks!

lawgimenez 1y ago

This is cool, I get to read another Jetpack Compose codebase since I am halfway through migrating our app to Jetpack. So this helps a lot.

tmaly 1y ago

I wish there was something where I could transcribe iPhone voice memos to text.

I would pay for an app that did this.

cee_el123 1y ago

Google has an app called Live Transcribe on Android, but there's no iPhone version.

This looks like an unaffiliated version: https://apps.apple.com/us/app/live-transcribe/id1471473738

hidelooktropic 1y ago

The microphone icon on the keyboard does this.

swyx 1y ago

is there an iPhone version of this? custom keyboard?

crancher 1y ago

Accrescent hype is comically overdone.

free_bip 1y ago

I looked in the GitHub issues and there's a closed issue for F-Droid inclusion. The author states that F-Droid "Doesn't meet their requirements" but doesn't elaborate. I wonder what F-Droid is missing that they need so much?

okso 1y ago

F-Droid only packages open-source software and rebuilds it from source, while installing from Accrescent would move all trust to the developer, even if the license changes to proprietary.

I understand that the author trusts themselves more than F-Droid, but as a user the opposite seems more relevant.

ementally 1y ago

Reason https://www.privacyguides.org/en/android/#f-droid

okso 1y ago

Link: https://github.com/soupslurpr/Transcribro/issues/9

mijoharas 1y ago

I only just saw it from this project.

I see the features listed[0] which seems like a reasonable feature set, but nothing unusual afaict.

If there has been a lot of hype can you tell me what people find compelling about it?

[0] https://accrescent.app/
