Some implementation details, since getting this to work well was not trivial.
My goal was “press hotkey, start talking, see text within ~1–2 seconds” on an M2 MacBook Pro, and support multiple languages.
First attempts (cloud)
– I tried Hugging Face real-time transcription. It worked but latency was all over the place and costs would not scale.
– I tried OpenAI real-time transcription. Latency was much better (I saw ~200 ms responses), but with background noise it would transcribe things that were never said. I may bring it back if I can make it stable.
– I briefly experimented with Gemini for transcribing and formatting multi-language text. Quality was not consistent enough compared to Whisper for multi-language input.
Local experiments
– I used FFmpeg + Whisper CLI in a bunch of ways: batching, buffering, trying to “stream” partial results out of Whisper to make it feel live.
– I also tried a local Llama model to format the raw transcript into an email. On an M2 Pro this took ~2 seconds for short emails and got much slower for long text. It looked nice but the latency was not acceptable for everyday use.
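To make the batching/buffering approach concrete, here is a minimal sketch of the kind of loop I mean: capture a short chunk with FFmpeg, then hand the file to a Whisper CLI. The device index (`:0` via avfoundation), the model path, and the `whisper-cli` flags are assumptions based on whisper.cpp on macOS, not my exact invocation; check them against your own setup.

```python
# Sketch of a chunked capture -> transcribe pipeline. The ffmpeg device
# index and the whisper.cpp flags are assumptions; adjust for your machine
# and whichever Whisper CLI build you use.
import subprocess

def capture_cmd(wav_path: str, seconds: float) -> list[str]:
    # Record `seconds` of mono 16 kHz audio from the default mac mic.
    return [
        "ffmpeg", "-y",
        "-f", "avfoundation", "-i", ":0",   # assumption: audio device 0 is the mic
        "-t", str(seconds),
        "-ar", "16000", "-ac", "1",         # Whisper expects 16 kHz mono
        wav_path,
    ]

def transcribe_cmd(wav_path: str, model_path: str) -> list[str]:
    # whisper.cpp's CLI; "-nt" drops timestamps so stdout is plain text.
    return ["whisper-cli", "-m", model_path, "-f", wav_path, "-nt", "-l", "auto"]

def transcribe_chunk(wav_path: str, model_path: str, seconds: float = 2.0) -> str:
    subprocess.run(capture_cmd(wav_path, seconds), check=True, capture_output=True)
    out = subprocess.run(transcribe_cmd(wav_path, model_path),
                        check=True, capture_output=True, text=True)
    return out.stdout.strip()
```

With a loop over short chunks, perceived latency is roughly chunk length plus inference time, which is why shrinking the chunk was the main lever.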
Where I ended up (for now)
– Current version sticks to FFmpeg + Whisper CLI locally, optimized for short chunks so you usually see text within about 1–2 seconds.
– I dropped the heavy on-device LLM formatting and kept the formatting logic much simpler, so it stays predictable and fast.
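For a sense of what "much simpler" formatting means in practice: rule-based cleanup of the raw transcript instead of an LLM pass. This is a hypothetical sketch of that kind of logic, not the app's actual code — collapse whitespace, capitalize sentence starts, and ensure terminal punctuation.

```python
# Hypothetical sketch of deterministic transcript cleanup (not the app's
# exact code): cheap, predictable formatting in place of an on-device LLM.
import re

def tidy(transcript: str) -> str:
    text = re.sub(r"\s+", " ", transcript).strip()   # collapse whitespace
    if not text:
        return text
    # Capitalize the first letter of the text and of each new sentence.
    text = re.sub(r"(^|[.!?]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), text)
    if text[-1] not in ".!?":                        # ensure terminal punctuation
        text += "."
    return text

print(tidy("hello there.  this is a   test"))  # → "Hello there. This is a test."
```

Rules like these run in microseconds and never surprise you, which is the trade I wanted after the ~2 s Llama pass.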
Next step is to re-introduce “smart” formatting and meeting notes, but only when I can do it without blowing up latency. Happy to dig deeper into any of these if people are curious.