# AGENTS.md: VoicePaste

## Project Overview

VoicePaste: phone as microphone via browser → LAN WebSocket → Go server → Doubao ASR → real-time preview on phone → auto-paste to computer's focused app. Single Go binary with embedded frontend.

## Tech Stack

- **Backend**: Go 1.25+, Fiber v3, fasthttp/websocket, CGO required (robotgo + clipboard)
- **Frontend**: React 19, TypeScript, Zustand, Vite 7, Tailwind CSS v4, Biome 2, bun (package manager + runtime)
- **Tooling**: Taskfile (not Make), mise (Go + bun + task)
- **ASR**: Doubao Seed-ASR-2.0 via custom binary WebSocket protocol

## Build & Run Commands

```bash
# Install mise tools (go, bun, task)
mise install

# Build everything (frontend + Go binary → dist/)
task

# Build frontend only
task build:frontend

# Run (build + execute)
task run

# Dev mode (go run, skips frontend build)
task dev

# Clean all artifacts
task clean

# Tidy Go modules
task tidy
```

### Frontend (run from `web/`)

```bash
bun install          # Install deps
bun run build        # Vite production build
bun run dev          # Vite dev server
bun run lint         # Biome check (lint + format)
bun run lint:fix     # Biome check --write (auto-fix)
bun run typecheck    # tsc --noEmit
```

### Go

```bash
go vet ./...                     # Lint
go build -o dist/voicepaste .    # Build (add .exe on Windows)
```

No test suite exists yet. No `go test` targets.

## Project Structure

```
main.go                      # Entry point, embed.FS, TLS init, server startup
internal/
  config/config.go           # YAML + env var config, fsnotify hot-reload, atomic global
  server/server.go           # Fiber v3 HTTPS server, static files from embed.FS
  server/net.go              # LAN IP detection
  tls/tls.go                 # AnyIP cert download/cache + self-signed fallback
  tls/generate.go            # Self-signed cert generation
  ws/protocol.go             # JSON message types (start/stop/paste/partial/final/pasted/error)
  ws/handler.go              # WS upgrade, token auth, session lifecycle, text accumulation, paste
  asr/protocol.go            # Doubao binary protocol codec (4-byte header, gzip)
  asr/client.go              # WSS client to Doubao, audio streaming, result forwarding
  paste/paste.go             # clipboard.Write + robotgo key simulation (Ctrl+V / Cmd+V)
web/
  index.html                 # HTML shell with React root
  vite.config.ts             # Vite config (React + Tailwind plugins)
  biome.json                 # Biome config (lint, format, Tailwind class sorting)
  tsconfig.json              # TypeScript strict config (React JSX)
  src/
    main.tsx                 # React entry point
    App.tsx                  # Root component: composes hooks + layout
    app.css                  # Tailwind imports, design tokens (@theme), keyframes
    stores/
      app-store.ts           # Zustand store: connection, recording, preview, history, toast
    hooks/
      useWebSocket.ts        # WS client hook: connect, reconnect, message dispatch
      useRecorder.ts         # Audio pipeline hook: WebVoiceProcessor (16kHz Int16 PCM capture)
    components/
      StatusBadge.tsx        # Connection status indicator
      PreviewBox.tsx         # Real-time transcription preview
      MicButton.tsx          # Push-to-talk button with animations
      HistoryList.tsx        # Transcription history with re-send
```

## Code Style: Go

### Imports

Group in stdlib → external → internal order, separated by blank lines:

```go
import (
	"fmt"
	"log/slog"

	"github.com/gofiber/fiber/v3"

	"github.com/imbytecat/voicepaste/internal/config"
)
```

Use aliases only to avoid collisions: `crypto_tls "crypto/tls"`, `vpTLS "...internal/tls"`, `wsMsg "...internal/ws"`.

### Logging

Use `log/slog` exclusively.
Structured key-value pairs:

```go
slog.Info("message", "key", value)
slog.Error("failed to X", "err", err)
```

Per-connection loggers via `slog.With("remote", addr)`.

### Error Handling

- Always wrap with context: `fmt.Errorf("dial doubao: %w", err)`
- Return errors up; log at the boundary (main, handler entry)
- Never suppress errors silently. `slog.Warn` for non-fatal, `slog.Error` + exit/return for fatal

### Naming

- Package names: short, lowercase, single word (`asr`, `ws`, `paste`, `config`)
- Exported types: `PascalCase` with doc comments
- Unexported: `camelCase`
- Constants: `PascalCase` for exported, `camelCase` for unexported
- Acronyms stay uppercase: `ASR`, `TLS`, `WS`, `URL`, `IP`

### Patterns

- `sync.Mutex` for shared state, `chan` for goroutine communication
- `atomic.Value` for hot-reloadable config
- Goroutine cleanup: `defer`, `sync.WaitGroup`, `closeCh chan struct{}`
- Fiber v3 middleware pattern for auth checks before WS upgrade

## Code Style: TypeScript (Frontend)

### Formatting (Biome)

- Indent: tabs
- Quotes: double quotes
- Semicolons: default (enabled)
- Organize imports: enabled via Biome assist

### TypeScript Config

- `strict: true`, `noUnusedLocals`, `noUnusedParameters`
- Target: ES2022, module: ESNext, bundler resolution
- DOM + DOM.Iterable libs
- Never use `as any`, `@ts-ignore`, or empty catch blocks

### Patterns

- React 19 with functional components and hooks
- Zustand for global state management (connection, recording, preview, history, toast)
- Custom hooks for imperative APIs: `useWebSocket`, `useRecorder`
- Zustand `getState()` in hooks/callbacks to avoid stale closures
- Pointer Events for touch/mouse (not touch + mouse separately)
- @picovoice/web-voice-processor for audio capture (16kHz Int16 PCM, WASM resampling)
- WebVoiceProcessor handles getUserMedia, AudioContext lifecycle, cross-browser compat
- WebSocket: binary for audio frames, JSON text for control messages
- Tailwind CSS v4 with `@theme` design tokens; minimal custom CSS
  (keyframes only)

## Language & Locale

- **UI text**: Chinese (中文); this app is for family members
- **Git commits**: Chinese, conventional format: `feat:`, `fix:`, `chore:`, `refactor:`
- **Code comments**: English
- **Communication with user**: Chinese (中文)

## Key Constraints

- CGO is required (robotgo, clipboard); no cross-compilation
- Token auth: read from `config.yaml`; empty = no auth. Never auto-generate tokens
- Frontend is embedded via `//go:embed all:web/dist` in `main.go`
- `embed` directive cannot use `../` paths; the embedded files must sit under the directory of the package that references them
- Build output goes to `dist/` (gitignored)
- Frontend ignores (`node_modules`, `dist`) live in `web/.gitignore`, not the root `.gitignore`
- Config file (`config.yaml`) is gitignored; `config.example.yaml` is committed
- `os.UserCacheDir()` for platform-correct cert cache paths
- robotgo paste: `KeyDown(modifier)` → delay → `KeyTap("v")` → delay → `KeyUp(modifier)`

## Hotwords (热词) Feature

Local hotword management for improved ASR accuracy on specific terms (names, technical vocabulary).

### Configuration

```yaml
doubao:
  hotwords:
    - 张三
    - 李四
    - VoicePaste
    - 人工智能
```

### Implementation

- Hotwords stored locally in `config.yaml` (not tied to cloud provider)
- `BuildHotwordsContext()` converts string array to Doubao API format:
  ```json
  {"hotwords":[{"word":"张三"},{"word":"李四"}]}
  ```
- Sent via `corpus.context` parameter in `FullClientRequest`
- Hot-reloadable: config changes apply to new connections
- Platform-agnostic design: easy to migrate to other ASR providers

### Doubao API Details

- Parameter: `request.corpus.context` (JSON string)
- Limits: 100 tokens (bidirectional streaming, 双向流式), 5000 tokens (streaming input, 流式输入)
- Priority: `context` hotwords > `boosting_table_id` (if both present)
- No weight support in `context` mode (unlike `boosting_table_id`)
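
### Example: building the hotwords context

The conversion described under Implementation can be sketched as follows. Only the function name `BuildHotwordsContext` and the JSON shape come from this document; the signature, the helper types `hotword` and `hotwordsContext`, and the choice to return an empty string for an empty list are assumptions made for illustration, not the project's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hotword is a hypothetical helper type for one entry of the "hotwords" array.
type hotword struct {
	Word string `json:"word"`
}

// hotwordsContext is a hypothetical helper type for the object that is
// serialized into request.corpus.context.
type hotwordsContext struct {
	Hotwords []hotword `json:"hotwords"`
}

// BuildHotwordsContext converts the configured hotwords into the JSON string
// format shown above. Assumption: an empty list yields "" so the caller can
// omit the context field entirely.
func BuildHotwordsContext(words []string) (string, error) {
	if len(words) == 0 {
		return "", nil
	}
	ctx := hotwordsContext{Hotwords: make([]hotword, 0, len(words))}
	for _, w := range words {
		ctx.Hotwords = append(ctx.Hotwords, hotword{Word: w})
	}
	b, err := json.Marshal(ctx)
	if err != nil {
		return "", fmt.Errorf("marshal hotwords context: %w", err)
	}
	return string(b), nil
}

func main() {
	s, err := BuildHotwordsContext([]string{"张三", "李四"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s) // prints {"hotwords":[{"word":"张三"},{"word":"李四"}]}
}
```

This sketch does not enforce the token limits listed above; a real implementation would likely need to truncate or reject lists that exceed the limit for the streaming mode in use.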