JARVIS

voice-input-setup

active

Workspace

[JMN] Personal

Created

2026-03-23

Updated

2026-03-23

Content

# Voice Input for Claude Code Voice-to-text input so you can speak to Claude Code instead of typing. Claude responds in text. ## Components 1. **OpenAI Whisper** — speech-to-text engine (runs locally, no API) 2. **PipeWire (`pw-record`)** — audio recording (already on the system) 3. **`v1`-`v5` scripts** — wrapper commands that record + transcribe in one step 4. **`wl-copy`** — copies transcription to clipboard as backup ## How It Was Set Up ### 1. Install Whisper ```bash sudo pacman -S python-openai-whisper ``` Installs to `/usr/bin/whisper`. Pulls PyTorch, NumPy, and other deps automatically. ### 2. Pre-download the model ```bash python -c "import whisper; whisper.load_model('base')" ``` Downloads the `base` model (~139 MB) to `~/.cache/whisper/`. Without this, first use hangs while downloading. ### 3. Create the voice script Script at `~/.local/bin/v1`: ```bash #!/usr/bin/env bash set -euo pipefail TMPFILE=$(mktemp /tmp/voice-XXXXXX.wav) MODEL="${WHISPER_MODEL:-base}" LANG="${WHISPER_LANG:-en}" BASENAME=$(basename "$0") case "$BASENAME" in v1) DEFAULT_DURATION=10 ;; v2) DEFAULT_DURATION=20 ;; v3) DEFAULT_DURATION=30 ;; v4) DEFAULT_DURATION=40 ;; v5) DEFAULT_DURATION=50 ;; *) DEFAULT_DURATION=10 ;; esac DURATION="${1:-$DEFAULT_DURATION}" cleanup() { rm -f "$TMPFILE" "${TMPFILE%.wav}"*.txt "${TMPFILE%.wav}"*.json 2>/dev/null } trap cleanup EXIT echo "Recording for ${DURATION}s..." >&2 timeout --signal=INT "$DURATION" pw-record --rate 16000 --channels 1 --format s16 "$TMPFILE" 2>/dev/null || true if [[ ! -s "$TMPFILE" ]]; then echo "No audio recorded." >&2 exit 1 fi echo "Transcribing..." >&2 whisper "$TMPFILE" --model "$MODEL" --language "$LANG" --output_format txt --output_dir /tmp --fp16 False 2>/dev/null TXTFILE="${TMPFILE%.wav}.txt" if [[ -f "$TXTFILE" ]]; then TEXT=$(sed '/^$/d' "$TXTFILE" | tr '\n' ' ' | sed 's/ */ /g; s/^ //; s/ $//') echo "$TEXT" echo -n "$TEXT" | wl-copy 2>/dev/null || true else echo "Transcription failed." >&2 exit 1 fi ``` ### 4. Create symlinks for v2-v5 ```bash chmod +x ~/.local/bin/v1 for i in 2 3 4 5; do ln -sf v1 ~/.local/bin/v${i}; done ``` ## Usage In Claude Code, type `! v1` through `! v5`: | Command | Duration | |---------|----------| | `! v1` | 10s | | `! v2` | 20s | | `! v3` | 30s | | `! v4` | 40s | | `! v5` | 50s | Speak, wait for recording + transcription to finish, then submit. ### TBC keyword End your message with "TBC" (to be continued) if you run out of time. Claude will wait for the next voice input before responding. ## Design Decisions - **No `v` (bare) command** — removed to avoid ambiguity; always use numbered variants - **`base` model over `small`** — faster transcription, good enough for conversational input - **`timeout --signal=INT`** — `pw-record` doesn't respond to SIGTERM, needs SIGINT to stop cleanly - **No interactive stop (Ctrl+C / Enter)** — Claude Code intercepts Ctrl+C and doesn't pass stdin for `read`, so fixed durations are the only reliable approach - **Clipboard copy** — backup path in case `!` command output doesn't land cleanly; can paste from clipboard instead