🎙️

golden-voice

Clone your voice, speak anything, pay nothing

v0.1.0 · GitHub · MIT License · macOS

What it does

Record 15 to 30 seconds of yourself talking. golden-voice clones your voice locally using XTTS v2 and gives you a one-command TTS that sounds like you. No cloud, no API keys, no monthly bill.

golden-voice setup my-voice-sample.wav
golden-voice speak "The deployment looks clean. Ship it."
cat summary.md | golden-voice speak --stdin
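
The --stdin form lends itself to batch use. As a minimal sketch, assuming only the speak --stdin invocation shown above, a loop could narrate several files in turn. The narrate_all function and the GOLDEN_VOICE override variable are hypothetical names introduced for illustration; they are not part of the tool.

```shell
#!/bin/sh
# Hypothetical helper: narrate each file given as an argument, in order.
# GOLDEN_VOICE is an assumed override so the command can be swapped out
# (for example, with a stub when testing); it defaults to golden-voice.
narrate_all() {
    gv="${GOLDEN_VOICE:-golden-voice}"
    for file in "$@"; do
        if [ ! -r "$file" ]; then
            echo "skip: $file (not readable)" >&2
            continue
        fi
        echo "narrating: $file"
        # Same invocation as the README example: the text arrives on stdin.
        "$gv" speak --stdin < "$file" || return 1
    done
}
```

For example, narrate_all summary.md notes.md would speak both files one after the other and stop at the first failed synthesis.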

Why it exists

We were paying ElevenLabs $100/month for voice narration in our dev tools. With multiple sessions running, we burned through the quota fast. The quality was good, but the cost didn't scale, and we wanted something we owned.

So we built it. One afternoon, zero budget. Now every developer on the team has their own voice clone running on their laptop.

How it works

You record 15-30 seconds of yourself talking naturally
XTTS v2 (open source, from Coqui) analyzes your voice — pitch, cadence, timbre
Any text you pipe in gets synthesized in your voice, locally, on CPU
Optional sox effects — reverb, chorus, echo for flavor

Generation takes ~15-20 seconds on Apple Silicon. For instant feedback, use --quick, which falls back to the macOS say command with warm-sounding voices.
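
The optional effects step maps naturally onto sox's built-in reverb, chorus, and echo effects. The sketch below is an assumption about how such a step could look, not golden-voice's actual implementation: the effect_args and apply_effect names are hypothetical, and the parameter values are illustrative sox settings rather than the tool's presets.

```shell
#!/bin/sh
# Hypothetical mapping from an effect name to sox effect arguments.
# Values are illustrative; the chorus and echo settings follow examples
# from the sox manual, and reverb uses 50% reverberance.
effect_args() {
    case "$1" in
        reverb) echo "reverb 50" ;;
        # gain-in gain-out delay(ms) decay speed(Hz) depth(ms), triangle modulation
        chorus) echo "chorus 0.7 0.9 55 0.4 0.25 2 -t" ;;
        # gain-in gain-out delay(ms) decay
        echo)   echo "echo 0.8 0.9 1000 0.3" ;;
        *)      echo "unknown effect: $1" >&2; return 1 ;;
    esac
}

# Apply one named effect to a WAV file.
# Usage: apply_effect in.wav out.wav reverb
apply_effect() {
    args=$(effect_args "$3") || return 1
    # Word splitting of $args is intentional: sox expects separate arguments.
    sox "$1" "$2" $args
}
```

Keeping the name-to-arguments mapping in its own function makes it easy to tune a preset without touching the code that calls sox.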

Install

git clone https://github.com/goldenfocus/golden-cloud.git
cd golden-cloud/blocks/golden-voice
bash install.sh

Quality vs sample length

6 seconds: ~60%, recognizable but off
15 seconds: ~75%, clearly you, slight artifacts
30 seconds: ~85%, friends do a double-take
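
To check a recording against these tiers before running setup, something like the following could work. It is only a sketch: expected_quality is a hypothetical helper, and its thresholds are copied from the figures above rather than taken from the tool itself.

```shell
#!/bin/sh
# Hypothetical helper: map a sample duration in whole seconds to the
# rough quality figure listed in this README.
expected_quality() {
    secs="$1"
    if [ "$secs" -ge 30 ]; then
        echo "~85%"
    elif [ "$secs" -ge 15 ]; then
        echo "~75%"
    elif [ "$secs" -ge 6 ]; then
        echo "~60%"
    else
        # The README gives no figure below 6 seconds.
        echo "unrated (under 6 seconds)"
    fi
}
```

Assuming sox is installed, its soxi utility can supply the duration: expected_quality "$(soxi -D my-voice-sample.wav | cut -d. -f1)" truncates the reported seconds to a whole number and prints the matching tier.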

Requirements

macOS (Apple Silicon or Intel)
Python 3.11 (the installer sets it up via Homebrew)
~2GB disk for the XTTS model (downloaded once)
A 15-30 second recording of your voice
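
These requirements can be checked up front. The following is a rough sketch, not part of install.sh: the have and preflight functions are hypothetical, and checking free space under $HOME is an assumption, since the README does not say where the model is stored.

```shell
#!/bin/sh
# Hypothetical pre-install check based on the requirements listed above.

# True if the named command is on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

preflight() {
    status=0
    # The README lists macOS only; uname -s prints "Darwin" there.
    if [ "$(uname -s)" = "Darwin" ]; then
        echo "ok: macOS"
    else
        echo "missing: macOS"; status=1
    fi
    # The installer is said to use brew to provide Python 3.11.
    if have brew; then
        echo "ok: brew"
    else
        echo "missing: brew"; status=1
    fi
    # ~2GB for the XTTS model; 2 GiB = 2097152 KiB. Location is assumed.
    avail_kb=$(df -Pk "$HOME" 2>/dev/null | awk 'NR==2 { print $4 }')
    if [ "${avail_kb:-0}" -ge 2097152 ]; then
        echo "ok: disk space"
    else
        echo "missing: 2GB free disk space"; status=1
    fi
    return "$status"
}
```

Running preflight prints one line per requirement and returns non-zero if anything is missing, so it can gate the install with preflight && bash install.sh.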

A GoldenFocus block

Part of the GoldenFocus ecosystem — tools that fix the world piece by piece. Free, open, built to share.