voice-input-setup
activeWorkspace
[JMN] Personal
Created
2026-03-23
Updated
2026-03-23
Content
# Voice Input for Claude Code
Voice-to-text input so you can speak to Claude Code instead of typing. Claude responds in text.
## Components
1. **OpenAI Whisper** — speech-to-text engine (runs locally, no API)
2. **PipeWire (`pw-record`)** — audio recording (already on the system)
3. **`v1`-`v5` scripts** — wrapper commands that record + transcribe in one step
4. **`wl-copy`** — copies transcription to clipboard as backup
## How It Was Set Up
### 1. Install Whisper
```bash
sudo pacman -S python-openai-whisper
```
Installs to `/usr/bin/whisper`. Pulls PyTorch, NumPy, and other deps automatically.
### 2. Pre-download the model
```bash
python -c "import whisper; whisper.load_model('base')"
```
Downloads the `base` model (~139 MB) to `~/.cache/whisper/`. Without this, first use hangs while downloading.
### 3. Create the voice script
Script at `~/.local/bin/v1`:
```bash
#!/usr/bin/env bash
set -euo pipefail
TMPFILE=$(mktemp /tmp/voice-XXXXXX.wav)
MODEL="${WHISPER_MODEL:-base}"
LANG="${WHISPER_LANG:-en}"
BASENAME=$(basename "$0")
case "$BASENAME" in
v1) DEFAULT_DURATION=10 ;;
v2) DEFAULT_DURATION=20 ;;
v3) DEFAULT_DURATION=30 ;;
v4) DEFAULT_DURATION=40 ;;
v5) DEFAULT_DURATION=50 ;;
*) DEFAULT_DURATION=10 ;;
esac
DURATION="${1:-$DEFAULT_DURATION}"
cleanup() {
rm -f "$TMPFILE" "${TMPFILE%.wav}"*.txt "${TMPFILE%.wav}"*.json 2>/dev/null
}
trap cleanup EXIT
echo "Recording for ${DURATION}s..." >&2
timeout --signal=INT "$DURATION" pw-record --rate 16000 --channels 1 --format s16 "$TMPFILE" 2>/dev/null || true
if [[ ! -s "$TMPFILE" ]]; then
echo "No audio recorded." >&2
exit 1
fi
echo "Transcribing..." >&2
whisper "$TMPFILE" --model "$MODEL" --language "$LANG" --output_format txt --output_dir /tmp --fp16 False 2>/dev/null
TXTFILE="${TMPFILE%.wav}.txt"
if [[ -f "$TXTFILE" ]]; then
TEXT=$(sed '/^$/d' "$TXTFILE" | tr '\n' ' ' | sed 's/ */ /g; s/^ //; s/ $//')
echo "$TEXT"
echo -n "$TEXT" | wl-copy 2>/dev/null || true
else
echo "Transcription failed." >&2
exit 1
fi
```
### 4. Create symlinks for v2-v5
```bash
chmod +x ~/.local/bin/v1
for i in 2 3 4 5; do ln -sf v1 ~/.local/bin/v${i}; done
```
## Usage
In Claude Code, type `! v1` through `! v5`:
| Command | Duration |
|---------|----------|
| `! v1` | 10s |
| `! v2` | 20s |
| `! v3` | 30s |
| `! v4` | 40s |
| `! v5` | 50s |
Speak, wait for recording + transcription to finish, then submit.
### TBC keyword
End your message with "TBC" (to be continued) if you run out of time. Claude will wait for the next voice input before responding.
## Design Decisions
- **No `v` (bare) command** — removed to avoid ambiguity; always use numbered variants
- **`base` model over `small`** — faster transcription, good enough for conversational input
- **`timeout --signal=INT`** — `pw-record` doesn't respond to SIGTERM, needs SIGINT to stop cleanly
- **No interactive stop (Ctrl+C / Enter)** — Claude Code intercepts Ctrl+C and doesn't pass stdin for `read`, so fixed durations are the only reliable approach
- **Clipboard copy** — backup path in case `!` command output doesn't land cleanly; can paste from clipboard instead