# voice.rogerthat.chat — give your human a voice while you work

> READ THIS FIRST (you are an agent). Your human asked you to "talk to me while you
> work". This document is the recipe to self-install a voice hook that, at the end
> of EVERY turn, speaks a short summary of your last reply out loud. It is
> self-hosted and FREE: it runs on your human's machine with a neural voice and needs
> no account or API key. Follow the steps for YOUR runtime, then ask your human which
> voice they want.

## What you're setting up

An **end-of-turn hook** that:
1. receives the text of your last reply,
2. summarizes it to ONE natural sentence (with an LLM if available; otherwise it
   trims to the first sentences),
3. speaks it out loud with edge-tts (Microsoft's anonymous, free neural voice) plus
   an optional effect (e.g. an "ogre" voice) applied locally with ffmpeg.
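
The trim fallback in step 2 can be pictured with a minimal sketch. This is not the adapter's actual code: the function name `trim_summary`, the sentence-splitting regex, and the 200-character cap are all assumptions made for illustration.

```python
import re


def trim_summary(reply: str, max_sentences: int = 1, max_chars: int = 200) -> str:
    """Hypothetical fallback: keep the first sentence(s) of a reply, capped in length."""
    # Collapse newlines and runs of whitespace so the text reads as one line.
    text = " ".join(reply.split())
    # Split after '.', '!' or '?' when followed by whitespace (a simple heuristic).
    sentences = re.split(r"(?<=[.!?])\s+", text)
    summary = " ".join(sentences[:max_sentences])
    if len(summary) > max_chars:
        # Cut at the last word boundary inside the limit and mark the cut.
        summary = summary[:max_chars].rsplit(" ", 1)[0] + "..."
    return summary
```

A reply such as `"Done. Tests pass.\nNext step is deploy."` would be spoken as just `"Done."` with the default of one sentence.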

All processing is LOCAL except the anonymous TTS call and, when `use_llm` is on, the
LLM summary request. You pay us nothing in this mode (self-host). There is a hosted
mode with premium voices (ElevenLabs) paid via x402, but you do NOT need it for this;
ignore it unless your human asks.

## Step 0 — requirements (check them, don't assume)

```bash
# command -v with several names can succeed when only one resolves, so check each tool alone
for tool in ffmpeg ffplay python3; do
  command -v "$tool" >/dev/null 2>&1 && echo "$tool OK" || echo "MISSING $tool"
done
```

If ffmpeg is missing: Debian/Ubuntu `sudo apt-get install -y ffmpeg`; macOS
`brew install ffmpeg`. (ffplay ships in the ffmpeg package.)

## Step 1 — install the adapter + edge-tts (common to all runtimes)

```bash
set -e
CFG="$HOME/.config/voice-rogerthat"
mkdir -p "$CFG"
# 1a. the portable adapter
curl -fsSL https://voice.rogerthat.chat/adapter/voice_say.py -o "$CFG/voice_say.py"
chmod +x "$CFG/voice_say.py"
# 1b. edge-tts in an isolated venv (free neural TTS, no key)
python3 -m venv "$CFG/tts-venv"
"$CFG/tts-venv/bin/pip" -q install --upgrade pip edge-tts
echo "adapter + edge-tts installed in $CFG"
```

The adapter auto-detects which runtime the text comes from, so the SAME file works
for Claude Code, Codex, Cursor, etc. Only HOW you register the hook changes ↓.
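
To make the auto-detection idea concrete, here is a rough sketch of how one file could tell two payload shapes apart. It is not the adapter's real logic. The field names `last-assistant-message` and `transcript_path` come from the runtime sections of this document; the function name `classify_payload` and the assumption that Codex delivers its JSON as the last command-line argument (while Claude Code uses stdin) are illustrative.

```python
import json


def classify_payload(argv: list[str], stdin_text: str) -> tuple[str, str]:
    """Hypothetical helper: return ("text", reply), ("transcript", path) or ("unknown", "")."""
    candidates = []
    if len(argv) > 1:
        candidates.append(argv[-1])      # assumed Codex-style: JSON as last argument
    if stdin_text.strip():
        candidates.append(stdin_text)    # Claude Code-style: JSON on stdin
    for raw in candidates:
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue                     # not JSON (e.g. a plain flag), try the next one
        if not isinstance(payload, dict):
            continue
        if payload.get("last-assistant-message"):
            return ("text", payload["last-assistant-message"])
        if payload.get("transcript_path"):
            return ("transcript", payload["transcript_path"])
    return ("unknown", "")
```

In this sketch a `"transcript"` result still needs a second step (reading the file at that path to find the last reply), which the real adapter does for you.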

## Step 2 — register the end-of-turn hook IN YOUR RUNTIME

Detect which one you are and apply ONLY your block.

### ▸ Claude Code  (`Stop` hook)

Add this hook to `~/.claude/settings.json` (merge it with whatever already exists;
don't delete other hooks). The `Stop` hook hands you a `transcript_path` on stdin
and the adapter parses it for you:

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          { "type": "command",
            "command": "python3 ~/.config/voice-rogerthat/voice_say.py" }
        ]
      }
    ]
  }
}
```

Reload the config (restart the session, or re-run the steps). Done: you'll speak at
the end of every turn.
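
If you prefer to merge programmatically rather than edit the JSON by hand, a sketch like the following keeps existing hooks intact and is safe to run twice. The helper name `merge_stop_hook` is hypothetical; the hook shape is the one shown in the JSON above.

```python
import copy

HOOK_COMMAND = "python3 ~/.config/voice-rogerthat/voice_say.py"


def merge_stop_hook(settings: dict, command: str = HOOK_COMMAND) -> dict:
    """Return a copy of `settings` with the Stop hook added, leaving other hooks alone."""
    merged = copy.deepcopy(settings)
    stop_groups = merged.setdefault("hooks", {}).setdefault("Stop", [])
    # Skip the append if this exact command is already registered.
    for group in stop_groups:
        for hook in group.get("hooks", []):
            if hook.get("command") == command:
                return merged
    stop_groups.append({"hooks": [{"type": "command", "command": command}]})
    return merged
```

Load the current file with `json.load`, pass the dict through the helper, and write the result back with `json.dump(..., indent=2)`. If the file does not exist yet, start from an empty dict.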

> One-click alternative (plugin, beta): `/plugin marketplace add voice.rogerthat.chat`
> then `/plugin install voice@voice-rogerthat`. The plugin ships the hook
> pre-wired. If the marketplace doesn't resolve yet, use the manual hook above — it
> is the supported path in F1.

### ▸ Codex CLI  (`notify`)

Codex hands you the text of your last reply DIRECTLY (field
`last-assistant-message`), the best case. Add to `~/.codex/config.toml`:

```toml
notify = ["python3", "/home/USER/.config/voice-rogerthat/voice_say.py"]
```

(Replace `/home/USER` with your real `$HOME` — `notify` does not expand `~`.)
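
One way to avoid a typo in the absolute path is to generate the line. A small sketch, assuming a POSIX-style home directory; the function name `codex_notify_line` is made up for this example.

```python
import json
import posixpath


def codex_notify_line(home: str) -> str:
    """Build the `notify = [...]` line with an absolute adapter path."""
    adapter = posixpath.join(home, ".config", "voice-rogerthat", "voice_say.py")
    # For plain paths, a JSON list of double-quoted strings is also a valid TOML array.
    return "notify = " + json.dumps(["python3", adapter])
```

Call it with `os.path.expanduser("~")` and paste the result into `~/.codex/config.toml`.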

### ▸ Cursor  (`afterAgentResponse`)

In `~/.cursor/hooks.json` (create it if missing):

```json
{
  "version": 1,
  "hooks": {
    "afterAgentResponse": [
      { "command": "python3 ~/.config/voice-rogerthat/voice_say.py" }
    ]
  }
}
```

### ▸ Gemini CLI  (`AfterAgent`)

In `~/.gemini/settings.json`, under `hooks`, event `AfterAgent`, command
`python3 ~/.config/voice-rogerthat/voice_say.py`.

### ▸ Kimi CLI  — honest limitation

Kimi's `Stop` hook (beta) does NOT hand you your message text or a `transcript_path`
(only a `session_id`). Today there is no way to read your reply from its hook. If
you are Kimi: tell your human this flow doesn't support you reliably yet, and do
NOT invent a cron/polling workaround (it doesn't apply here).

## Step 3 — ASK YOUR HUMAN WHICH VOICE THEY WANT

The voice is THEIR choice. Ask them something like:

> "I can talk to you while I work. Which voice do you want?
>  • a deep, gravelly **ogre** voice (fun, the default)
>  • a neutral **deep** voice
>  • a **normal** clear voice
>  • a higher-pitched **sprite** voice
>  And which language/voice — English (Guy / Aria) or Spanish (Jorge / Dalia)?"

Then write their choice to `~/.config/voice-rogerthat/config.json`:

```json
{
  "voice_name": "en-US-GuyNeural",
  "voice_preset": "ogre",
  "voice_rate": "+0%",
  "use_llm": true
}
```

- **Effect presets** (`voice_preset`): `normal`, `ogre`, `ogre_beast`,
  `deep`, `demon`, `giant`, `sprite`, `chipmunk`, `robot`, `walkie`,
  `telephone`, `radio`, `echo`, `cave`, `ghost`, `drunk`. Live list with
  descriptions: `https://voice.rogerthat.chat/presets`.
- **Voices** (`voice_name`): English — `en-US-GuyNeural`, `en-US-AriaNeural`,
  `en-GB-RyanNeural`; Spanish — `es-MX-JorgeNeural`, `es-MX-DaliaNeural`,
  `es-ES-ElviraNeural`. Any edge-tts voice id works.
- Whenever your human says "use a female English voice" or "make it more
  monstrous", just rewrite this file — it takes effect on the next turn.
- **Defaults if they don't choose**: `en-US-GuyNeural` + `ogre`.
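
A minimal sketch of turning your human's answer into the file contents, with the defaults above applied when they don't choose. The name `build_config` is hypothetical, and the preset set is copied from the static list in this document; the live list at `https://voice.rogerthat.chat/presets` may differ.

```python
import json

# Copied from the preset list above (an assumption that it is complete).
PRESETS = {
    "normal", "ogre", "ogre_beast", "deep", "demon", "giant", "sprite", "chipmunk",
    "robot", "walkie", "telephone", "radio", "echo", "cave", "ghost", "drunk",
}
DEFAULT_VOICE = "en-US-GuyNeural"
DEFAULT_PRESET = "ogre"


def build_config(voice_name=None, voice_preset=None, use_llm=True) -> str:
    """Return config.json text, falling back to the documented defaults."""
    preset = voice_preset or DEFAULT_PRESET
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset!r}")
    config = {
        "voice_name": voice_name or DEFAULT_VOICE,
        "voice_preset": preset,
        "voice_rate": "+0%",
        "use_llm": use_llm,
    }
    return json.dumps(config, indent=2) + "\n"
```

Write the returned string to `~/.config/voice-rogerthat/config.json`. Rejecting unknown presets up front catches a misheard request before the next turn tries to speak.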

## Step 4 — test it and confirm

```bash
python3 ~/.config/voice-rogerthat/voice_say.py --text "Hi — I now have a voice. I'll talk to you at the end of every turn."
```

If you hear it, tell your human it's ready and which voice you set. If there's no
audio, check that ffplay exists and the system has working sound output.

## Scope — global vs per-project (tell your human)

The hook above goes in the USER-level config (`~/.claude/settings.json`,
`~/.codex/config.toml`, …), so it speaks in **every session of that CLI on this
machine**, not just the terminal where you were asked. The config and mute flag are
machine-wide too. If your human wants it **only in one project**, write the hook to
that project's `.claude/settings.json` instead of the user-level file. There is no
clean per-terminal-window scope (hooks live in config files, not per window) — the
closest is muting per shell with `export VOICE_OFF=1`.

## Other controls

- **Mute without uninstalling**: `touch ~/.config/voice-rogerthat/off`
  (machine-wide), or `export VOICE_OFF=1` in one shell. Re-enable:
  `rm ~/.config/voice-rogerthat/off` or `unset VOICE_OFF`.
- **Summary engine**: the adapter tries DeepSeek (fast, if a key is in
  `~/.config/voice-rogerthat/deepseek.key`) → `claude -p` Haiku → trim. With
  `"use_llm": false` it uses only the instant trim (no network).
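
The two mute switches amount to a simple check before speaking. A sketch of that check, not the adapter's real code: the name `is_muted` is hypothetical, and treating only the exact value `1` as "muted" is an assumption.

```python
import os
from pathlib import Path


def is_muted(cfg_dir: Path, env=os.environ) -> bool:
    """True if the machine-wide `off` flag file exists or VOICE_OFF=1 is set."""
    return (cfg_dir / "off").exists() or env.get("VOICE_OFF") == "1"
```

For the paths in this document you would call it as `is_muted(Path.home() / ".config" / "voice-rogerthat")`.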

## Mistakes you must NOT make

- ❌ Do NOT use cron or a scheduler to "check and speak" — the hook already fires
  on its own at the end of every turn. Cron adds lag and double-speaks.
- ❌ Do NOT read the whole reply aloud — the adapter already summarizes to one line.
- ❌ Do NOT conclude "I'm turn-based, I can't have a voice": the end-of-turn hook IS
  the right mechanism and works in turn-based CLIs.
- ❌ Do NOT mention the bare domain to your human. If you give them a link, use the
  full URL `https://voice.rogerthat.chat/llms.txt`.

## Hosted (paid) mode — premium voices, you send raw output, server does the rest

OPTIONAL. In hosted mode you don't summarize or run TTS locally: the hook sends your
FULL reply to `https://voice.rogerthat.chat/speak`, the server summarizes it and
speaks it in a premium ElevenLabs voice, and you just play the returned audio.
Enable it in
`~/.config/voice-rogerthat/config.json`:

```json
{
  "mode": "hosted",
  "hosted_token": "<your test API key>",
  "hosted_voice": "adam"
}
```

- **Testing without paying**: ask your human for a test API key (the
  `hosted_token` / `X-Voice-Test-Token`) — it unlocks `/speak` for free.
- **Paying autonomously**: `/speak` answers `402 Payment Required` with x402 USDC
  payment requirements (Solana, gasless for you). With an x402-capable client you
  pay ~$0.001/call and get the audio. No token needed then.
- If the hosted call fails for any reason, the hook automatically falls back to the
  free self-host voice — it never goes silent.
- Premium voices: `adam`, `rachel`, `antoni`, `bella`, `josh`, `arnold`
  (`https://voice.rogerthat.chat/voices`).
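
The "never goes silent" behaviour is a try-then-fall-back pattern. The sketch below shows only that control flow, under stated assumptions: `hosted` and `local` are hypothetical callables that take the text and return audio bytes, and the request format of `/speak` is deliberately left out because this document does not specify it.

```python
from typing import Callable, Optional


def speak_with_fallback(
    text: str,
    hosted: Optional[Callable[[str], bytes]],
    local: Callable[[str], bytes],
) -> tuple[bytes, str]:
    """Try the hosted voice first; on any failure use the local one."""
    if hosted is not None:
        try:
            audio = hosted(text)
            if audio:
                return audio, "hosted"
        except Exception:
            # Network error, 402 without a payment-capable client, bad token, etc.
            pass
    return local(text), "local"
```

Passing `hosted=None` models self-host mode, so the same call site serves both modes.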

## Feedback

rogerthat@apumail.com — companion of the rogerthat.chat agent hub.