AGENTS.md — VoicePaste
Project Overview
VoicePaste: phone as microphone via browser → LAN WebSocket → Go server → Doubao ASR → real-time preview on phone → auto-paste to computer's focused app. Single Go binary with embedded frontend.
Tech Stack
- Backend: Go 1.25+, Fiber v3, fasthttp/websocket, CGO required (robotgo + clipboard)
- Frontend: React 19, TypeScript, Zustand, Vite 7, Tailwind CSS v4, Biome 2, bun (package manager + runtime)
- Tooling: Taskfile (not Make), mise (Go + bun + task)
- ASR: Doubao Seed-ASR-2.0 via custom binary WebSocket protocol
Build & Run Commands
# Install mise tools (go, bun, task)
mise install
# Build everything (frontend + Go binary → dist/)
task
# Build frontend only
task build:frontend
# Run (build + execute)
task run
# Dev mode (go run, skips frontend build)
task dev
# Clean all artifacts
task clean
# Tidy Go modules
task tidy
Frontend (run from web/)
bun install # Install deps
bun run build # Vite production build
bun run dev # Vite dev server
bun run lint # Biome check (lint + format)
bun run lint:fix # Biome check --write (auto-fix)
bun run typecheck # tsc --noEmit
Go
go vet ./... # Lint
go build -o dist/voicepaste . # Build (add .exe on Windows)
No test suite exists yet. No go test targets.
Project Structure
main.go # Entry point, embed.FS, TLS init, server startup
internal/
  config/config.go # YAML + env var config, fsnotify hot-reload, atomic global
  server/server.go # Fiber v3 HTTPS server, static files from embed.FS
  server/net.go # LAN IP detection
  tls/tls.go # AnyIP cert download/cache + self-signed fallback
  tls/generate.go # Self-signed cert generation
  ws/protocol.go # JSON message types (start/stop/paste/partial/final/pasted/error)
  ws/handler.go # WS upgrade, token auth, session lifecycle, text accumulation, paste
  asr/protocol.go # Doubao binary protocol codec (4-byte header, gzip)
  asr/client.go # WSS client to Doubao, audio streaming, result forwarding
  paste/paste.go # clipboard.Write + robotgo key simulation (Ctrl+V / Cmd+V)
web/
  index.html # HTML shell with React root
  vite.config.ts # Vite config (React + Tailwind plugins)
  biome.json # Biome config (lint, format, Tailwind class sorting)
  tsconfig.json # TypeScript strict config (React JSX)
  src/
    main.tsx # React entry point
    App.tsx # Root component: composes hooks + layout
    app.css # Tailwind imports, design tokens (@theme), keyframes
    stores/
      app-store.ts # Zustand store: connection, recording, preview, history, toast
    hooks/
      useWebSocket.ts # WS client hook: connect, reconnect, message dispatch
      useRecorder.ts # Audio pipeline hook: WebVoiceProcessor (16kHz Int16 PCM capture)
    components/
      StatusBadge.tsx # Connection status indicator
      PreviewBox.tsx # Real-time transcription preview
      MicButton.tsx # Push-to-talk button with animations
      HistoryList.tsx # Transcription history with re-send
Code Style — Go
Imports
Group in stdlib → external → internal order, separated by blank lines:
import (
	"fmt"
	"log/slog"

	"github.com/gofiber/fiber/v3"

	"github.com/imbytecat/voicepaste/internal/config"
)
Use aliases only to avoid collisions: crypto_tls "crypto/tls", vpTLS "...internal/tls", wsMsg "...internal/ws".
Logging
Use log/slog exclusively. Structured key-value pairs:
slog.Info("message", "key", value)
slog.Error("failed to X", "err", err)
Per-connection loggers via slog.With("remote", addr).
Error Handling
- Always wrap with context: fmt.Errorf("dial doubao: %w", err)
- Return errors up; log at the boundary (main, handler entry)
- Never suppress errors silently. slog.Warn for non-fatal, slog.Error + exit/return for fatal
- Never use as any, @ts-ignore, or empty catch blocks
Naming
- Package names: short, lowercase, single word (asr, ws, paste, config)
- Exported types: PascalCase with doc comments
- Unexported: camelCase
- Constants: PascalCase for exported, camelCase for unexported
- Acronyms stay uppercase: ASR, TLS, WS, URL, IP
Patterns
- sync.Mutex for shared state, chan for goroutine communication
- atomic.Value for hot-reloadable config
- Goroutine cleanup: defer, sync.WaitGroup, closeCh chan struct{}
- Fiber v3 middleware pattern for auth checks before WS upgrade
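Two of the patterns above, atomic.Value for hot-reloadable config and closeCh plus sync.WaitGroup for goroutine cleanup, might look roughly like this. It is a sketch under assumptions: the `Config` struct, `StoreConfig`, `LoadConfig`, and `poller` are hypothetical names, and the real config package may be organized differently.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Config is a hypothetical, trimmed-down config snapshot.
type Config struct{ Token string }

// current always holds a *Config; readers never observe a partial update.
var current atomic.Value

// StoreConfig publishes a new snapshot (for example after a file change).
func StoreConfig(c *Config) { current.Store(c) }

// LoadConfig returns the latest snapshot. StoreConfig must be called first.
func LoadConfig() *Config { return current.Load().(*Config) }

// poller owns one background goroutine that stops when closeCh is closed.
type poller struct {
	closeCh chan struct{}
	wg      sync.WaitGroup
}

func startPoller() *poller {
	p := &poller{closeCh: make(chan struct{})}
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		ticker := time.NewTicker(time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-p.closeCh:
				return
			case <-ticker.C:
				_ = LoadConfig().Token // each tick reads the newest snapshot
			}
		}
	}()
	return p
}

// Stop signals the goroutine and waits until it has exited.
func (p *poller) Stop() {
	close(p.closeCh)
	p.wg.Wait()
}

func main() {
	StoreConfig(&Config{Token: "old"})
	p := startPoller()
	StoreConfig(&Config{Token: "new"}) // hot reload: swap the whole snapshot
	p.Stop()
	fmt.Println(LoadConfig().Token) // prints "new"
}
```

Swapping a whole immutable snapshot keeps readers lock-free, and closing a channel (rather than sending on it) lets any number of goroutines observe the shutdown signal.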
Code Style — TypeScript (Frontend)
Formatting (Biome)
- Indent: tabs
- Quotes: double quotes
- Semicolons: default (enabled)
- Organize imports: enabled via Biome assist
TypeScript Config
- strict: true, noUnusedLocals, noUnusedParameters
- Target: ES2022, module: ESNext, bundler resolution
- DOM + DOM.Iterable libs
- React 19 with functional components and hooks
- Zustand for global state management (connection, recording, preview, history, toast)
- Custom hooks for imperative APIs: useWebSocket, useRecorder
- Zustand getState() in hooks/callbacks to avoid stale closures
- Pointer Events for touch/mouse (not touch + mouse separately)
- @picovoice/web-voice-processor for audio capture (16kHz Int16 PCM, WASM resampling)
- WebVoiceProcessor handles getUserMedia, AudioContext lifecycle, cross-browser compat
- WebSocket: binary for audio frames, JSON text for control messages
- Tailwind CSS v4 with @theme design tokens; minimal custom CSS (keyframes only)
Language & Locale
- UI text: Chinese (中文) — this app is for family members
- Git commits: Chinese, conventional format: feat:, fix:, chore:, refactor:
- Code comments: English
- Communication with user: Chinese (中文)
Key Constraints
- CGO is required (robotgo, clipboard) — no cross-compilation
- Token auth: read from config.yaml; empty = no auth. Never auto-generate tokens
- Frontend is embedded via //go:embed all:web/dist in main.go
- The embed directive cannot use ../ paths; the embedded files must live under the directory of the package that declares the directive
- Build output goes to dist/ (gitignored)
- Frontend ignores (node_modules, dist) in web/.gitignore, not root
- Config file (config.yaml) is gitignored; config.example.yaml is committed
- os.UserCacheDir() for platform-correct cert cache paths
- robotgo paste: KeyDown(modifier) → delay → KeyTap("v") → delay → KeyUp(modifier)
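The paste key sequence in the last item can be sketched without depending on robotgo by hiding the key calls behind a small interface. Everything here is an assumption for illustration: the `keyboard` interface, `pasteModifier`, `sendPaste`, the fake `recorder`, and the key names "cmd" and "ctrl" are hypothetical, and the real paste package may call robotgo differently.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// keyboard abstracts the three key operations used by the sequence
// (hypothetical interface; a real implementation would delegate to robotgo).
type keyboard interface {
	KeyDown(key string)
	KeyTap(key string)
	KeyUp(key string)
}

// pasteModifier picks Cmd on macOS and Ctrl elsewhere.
// The key names are assumptions, not verified robotgo constants.
func pasteModifier(goos string) string {
	if goos == "darwin" {
		return "cmd"
	}
	return "ctrl"
}

// sendPaste holds the modifier, taps "v", and always releases the modifier,
// pausing between steps so the target app registers each event.
func sendPaste(kb keyboard, goos string, delay time.Duration) {
	mod := pasteModifier(goos)
	kb.KeyDown(mod)
	defer kb.KeyUp(mod) // released even if a later step panics
	time.Sleep(delay)
	kb.KeyTap("v")
	time.Sleep(delay)
}

// recorder is a fake keyboard that records calls instead of sending keys.
type recorder struct{ calls []string }

func (r *recorder) KeyDown(k string) { r.calls = append(r.calls, "down:"+k) }
func (r *recorder) KeyTap(k string)  { r.calls = append(r.calls, "tap:"+k) }
func (r *recorder) KeyUp(k string)   { r.calls = append(r.calls, "up:"+k) }

func main() {
	r := &recorder{}
	sendPaste(r, runtime.GOOS, 10*time.Millisecond)
	fmt.Println(r.calls)
}
```

Holding the modifier explicitly, rather than sending a single combined tap, gives control over the delays on either side of the "v" keypress, which is the ordering the constraint describes.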
Hotwords (热词) Feature
Local hotword management for improved ASR accuracy on specific terms (names, technical vocabulary).
Configuration
doubao:
  hotwords:
    - 张三
    - 李四
    - VoicePaste
    - 人工智能
Implementation
- Hotwords stored locally in config.yaml (not tied to cloud provider)
- BuildHotwordsContext() converts string array to Doubao API format: {"hotwords":[{"word":"张三"},{"word":"李四"}]}
- Sent via corpus.context parameter in FullClientRequest
- Hot-reloadable: config changes apply to new connections
- Platform-agnostic design: easy to migrate to other ASR providers
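The conversion described above could be written along these lines. The function name comes from this document, but its signature, the helper types, and the choice to return an empty string for an empty list are assumptions made for the sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hotword and hotwordsContext mirror the JSON shape shown above.
type hotword struct {
	Word string `json:"word"`
}

type hotwordsContext struct {
	Hotwords []hotword `json:"hotwords"`
}

// BuildHotwordsContext converts a plain word list into the JSON context string.
// Returning "" for an empty list is an assumption, so callers can omit the field.
func BuildHotwordsContext(words []string) (string, error) {
	if len(words) == 0 {
		return "", nil
	}
	ctx := hotwordsContext{Hotwords: make([]hotword, 0, len(words))}
	for _, w := range words {
		ctx.Hotwords = append(ctx.Hotwords, hotword{Word: w})
	}
	b, err := json.Marshal(ctx)
	if err != nil {
		return "", fmt.Errorf("marshal hotwords: %w", err)
	}
	return string(b), nil
}

func main() {
	s, err := BuildHotwordsContext([]string{"张三", "李四"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s) // {"hotwords":[{"word":"张三"},{"word":"李四"}]}
}
```

Keeping the config as a plain string list and doing the provider-specific shaping in one function is what makes the design easy to port to another ASR provider.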
Doubao API Details
- Parameter: request.corpus.context (JSON string)
- Limits: 100 tokens (bidirectional streaming), 5000 tokens (streaming input)
- Priority: context hotwords > boosting_table_id (if both present)
- No weight support in context mode (unlike boosting_table_id)
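Since request.corpus.context is a JSON string rather than a nested object, the hotwords JSON ends up encoded twice in the outgoing payload. The sketch below shows only that nesting; the struct names and `buildRequestPayload` are hypothetical, and the real FullClientRequest carries other fields that are omitted here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Only the request.corpus.context path is modeled; other fields are omitted.
type corpus struct {
	Context string `json:"context,omitempty"`
}

type request struct {
	Corpus corpus `json:"corpus"`
}

type fullClientRequest struct {
	Request request `json:"request"`
}

// buildRequestPayload places an already-serialized hotwords JSON document
// into the context field as a string value.
func buildRequestPayload(hotwordsJSON string) ([]byte, error) {
	req := fullClientRequest{
		Request: request{Corpus: corpus{Context: hotwordsJSON}},
	}
	b, err := json.Marshal(req)
	if err != nil {
		return nil, fmt.Errorf("marshal request: %w", err)
	}
	return b, nil
}

func main() {
	b, err := buildRequestPayload(`{"hotwords":[{"word":"VoicePaste"}]}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(b))
}
```

The inner quotes appear escaped in the output, which is the expected result of treating the context as a string field.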