# AGENTS.md — VoicePaste

## Project Overview

VoicePaste: phone as microphone via browser → LAN WebSocket → Go server → Doubao ASR → real-time preview on phone → auto-paste into the computer's focused app. Single Go binary with embedded frontend.

## Tech Stack

- **Backend**: Go 1.25+, Fiber v3, fasthttp/websocket, CGO required (robotgo + clipboard)
- **Frontend**: React 19, TypeScript, Zustand, Vite 7, Tailwind CSS v4, Biome 2, bun (package manager + runtime)
- **Tooling**: Taskfile (not Make), mise (Go + bun + task)
- **ASR**: Doubao Seed-ASR-2.0 via custom binary WebSocket protocol

## Build & Run Commands

```bash
# Install mise tools (go, bun, task)
mise install

# Build everything (frontend + Go binary → dist/)
task

# Build frontend only
task build:frontend

# Run (build + execute)
task run

# Dev mode (go run, skips frontend build)
task dev

# Clean all artifacts
task clean

# Tidy Go modules
task tidy
```

### Frontend (run from `web/`)

```bash
bun install          # Install deps
bun run build        # Vite production build
bun run dev          # Vite dev server
bun run lint         # Biome check (lint + format)
bun run lint:fix     # Biome check --write (auto-fix)
bun run typecheck    # tsc --noEmit
```

### Go

```bash
go vet ./...                     # Lint
go build -o dist/voicepaste .    # Build (add .exe on Windows)
```

No test suite exists yet, so there are no `go test` targets.

## Project Structure

```
main.go                     # Entry point, embed.FS, TLS init, server startup
internal/
  config/config.go          # YAML + env var config, fsnotify hot-reload, atomic global
  server/server.go          # Fiber v3 HTTPS server, static files from embed.FS
  server/net.go             # LAN IP detection
  tls/tls.go                # AnyIP cert download/cache + self-signed fallback
  tls/generate.go           # Self-signed cert generation
  ws/protocol.go            # JSON message types (start/stop/paste/partial/final/pasted/error)
  ws/handler.go             # WS upgrade, token auth, session lifecycle, text accumulation, paste
  asr/protocol.go           # Doubao binary protocol codec (4-byte header, gzip)
  asr/client.go             # WSS client to Doubao, audio streaming, result forwarding
  paste/paste.go            # clipboard.Write + robotgo key simulation (Ctrl+V / Cmd+V)
web/
  index.html                # HTML shell with React root
  vite.config.ts            # Vite config (React + Tailwind plugins)
  biome.json                # Biome config (lint, format, Tailwind class sorting)
  tsconfig.json             # TypeScript strict config (React JSX)
  src/
    main.tsx                # React entry point
    App.tsx                 # Root component: composes hooks + layout
    app.css                 # Tailwind imports, design tokens (@theme), keyframes
    stores/
      app-store.ts          # Zustand store: connection, recording, preview, history, toast
    hooks/
      useWebSocket.ts       # WS client hook: connect, reconnect, message dispatch
      useRecorder.ts        # Audio pipeline hook: WebVoiceProcessor (16kHz Int16 PCM capture)
    components/
      StatusBadge.tsx       # Connection status indicator
      PreviewBox.tsx        # Real-time transcription preview
      MicButton.tsx         # Push-to-talk button with animations
      HistoryList.tsx       # Transcription history with re-send
```
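
`asr/protocol.go` is described only as a codec with a 4-byte header and gzip. The sketch below assumes the header layout from Volcengine's public description of the Doubao streaming ASR protocol. Byte 0 packs the version and header-size nibbles, byte 1 the message type and flags, byte 2 the serialization and compression nibbles, and byte 3 is reserved. A 4-byte big-endian payload size and the gzip payload follow. The constant values and the names `encodeFrame`/`decodeFrame` are assumptions to check against the real codec. Sequence numbers and server-response parsing are left out.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// Assumed field values (verify against asr/protocol.go).
const (
	protocolVersion = 0b0001
	headerSize4B    = 0b0001 // header length in 4-byte units

	msgFullClientRequest = 0b0001
	msgAudioOnlyRequest  = 0b0010

	serialNone = 0b0000
	serialJSON = 0b0001

	compressGzip = 0b0001
)

// encodeFrame builds header(4) + payloadSize(4, big-endian) + gzip(payload).
// Message-type flags are 0 (no sequence number) to keep the sketch short.
func encodeFrame(msgType, serialization byte, payload []byte) ([]byte, error) {
	var zipped bytes.Buffer
	zw := gzip.NewWriter(&zipped)
	if _, err := zw.Write(payload); err != nil {
		return nil, fmt.Errorf("gzip payload: %w", err)
	}
	if err := zw.Close(); err != nil {
		return nil, fmt.Errorf("gzip close: %w", err)
	}

	frame := make([]byte, 8, 8+zipped.Len())
	frame[0] = protocolVersion<<4 | headerSize4B
	frame[1] = msgType << 4 // low nibble: message-type-specific flags
	frame[2] = serialization<<4 | compressGzip
	frame[3] = 0 // reserved
	binary.BigEndian.PutUint32(frame[4:8], uint32(zipped.Len()))
	return append(frame, zipped.Bytes()...), nil
}

// decodeFrame reverses encodeFrame and returns the message type and raw payload.
func decodeFrame(frame []byte) (byte, []byte, error) {
	if len(frame) < 8 {
		return 0, nil, errors.New("frame too short")
	}
	headerLen := int(frame[0]&0x0f) * 4
	if headerLen < 4 || len(frame) < headerLen+4 {
		return 0, nil, errors.New("bad header size")
	}
	msgType := frame[1] >> 4
	size := binary.BigEndian.Uint32(frame[headerLen : headerLen+4])
	body := frame[headerLen+4:]
	if uint64(len(body)) != uint64(size) {
		return 0, nil, fmt.Errorf("payload size %d, have %d bytes", size, len(body))
	}
	if frame[2]&0x0f == compressGzip {
		zr, err := gzip.NewReader(bytes.NewReader(body))
		if err != nil {
			return 0, nil, fmt.Errorf("gunzip: %w", err)
		}
		defer zr.Close()
		if body, err = io.ReadAll(zr); err != nil {
			return 0, nil, fmt.Errorf("gunzip: %w", err)
		}
	}
	return msgType, body, nil
}

func main() {
	frame, err := encodeFrame(msgFullClientRequest, serialJSON, []byte(`{"request":{}}`))
	if err != nil {
		panic(err)
	}
	mt, payload, err := decodeFrame(frame)
	if err != nil {
		panic(err)
	}
	// prints: header=11 10 11 00 type=1 payload={"request":{}}
	fmt.Printf("header=% x type=%d payload=%s\n", frame[:4], mt, payload)
}
```

Audio chunks would use `msgAudioOnlyRequest` with `serialNone`. Only the message-type nibble and serialization change.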

## Code Style — Go

### Imports

Group imports in stdlib → external → internal order, separated by blank lines:

```go
import (
	"fmt"
	"log/slog"

	"github.com/gofiber/fiber/v3"

	"github.com/imbytecat/voicepaste/internal/config"
)
```

Use aliases only to avoid collisions: `crypto_tls "crypto/tls"`, `vpTLS "...internal/tls"`, `wsMsg "...internal/ws"`.

### Logging

Use `log/slog` exclusively, with structured key-value pairs:

```go
slog.Info("message", "key", value)
slog.Error("failed to X", "err", err)
```

Create per-connection loggers with `slog.With("remote", addr)`.

### Error Handling

- Always wrap with context: `fmt.Errorf("dial doubao: %w", err)`
- Return errors up; log at the boundary (main, handler entry)
- Never suppress errors silently: `slog.Warn` for non-fatal, `slog.Error` + exit/return for fatal
- The same applies to the frontend: never use `as any`, `@ts-ignore`, or empty `catch` blocks

### Naming

- Package names: short, lowercase, single word (`asr`, `ws`, `paste`, `config`)
- Exported types: `PascalCase` with doc comments
- Unexported: `camelCase`
- Constants: `PascalCase` for exported, `camelCase` for unexported
- Acronyms stay uppercase: `ASR`, `TLS`, `WS`, `URL`, `IP`

### Patterns

- `sync.Mutex` for shared state, `chan` for goroutine communication
- `atomic.Value` for hot-reloadable config
- Goroutine cleanup: `defer`, `sync.WaitGroup`, `closeCh chan struct{}`
- Fiber v3 middleware pattern for auth checks before WS upgrade

## Code Style — TypeScript (Frontend)

### Formatting (Biome)

- Indent: tabs
- Quotes: double quotes
- Semicolons: always (Biome default)
- Organize imports: enabled via Biome assist

### TypeScript Config

- `strict: true`, `noUnusedLocals`, `noUnusedParameters`
- Target: ES2022, module: ESNext, `moduleResolution: "bundler"`
- Libs: DOM + DOM.Iterable

### Patterns

- React 19 with functional components and hooks
- Zustand for global state management (connection, recording, preview, history, toast)
- Custom hooks for imperative APIs: `useWebSocket`, `useRecorder`
- Zustand `getState()` in hooks/callbacks to avoid stale closures
- Pointer Events for touch/mouse (not touch + mouse separately)
- `@picovoice/web-voice-processor` for audio capture (16kHz Int16 PCM, WASM resampling)
- WebVoiceProcessor handles getUserMedia, AudioContext lifecycle, cross-browser compatibility
- WebSocket: binary frames for audio, JSON text frames for control messages
- Tailwind CSS v4 with `@theme` design tokens; minimal custom CSS (keyframes only)
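
On the Go side, the split between binary audio frames and JSON text control frames could be routed as below. This is a stdlib-only sketch. The frame-type constants mirror the RFC 6455 opcodes, which gorilla/fasthttp websocket also use for `TextMessage`/`BinaryMessage`. `Message`, `session`, and `handleFrame` are hypothetical. The assumption that the phone sends only `start`/`stop`/`paste` (with `partial`/`final`/`pasted`/`error` flowing back from the server) is inferred from the message list in `ws/protocol.go`, not confirmed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Frame types; values match the RFC 6455 opcodes and the
// TextMessage (1) / BinaryMessage (2) constants of gorilla/fasthttp websocket.
const (
	textFrame   = 1
	binaryFrame = 2
)

// Message is a hypothetical shape for a JSON control message.
type Message struct {
	Type string `json:"type"`
	Text string `json:"text,omitempty"`
}

// session is hypothetical per-connection state for this sketch only.
type session struct {
	audioBytes int
	controls   []string
}

// handleFrame routes one frame from the phone: binary → audio, text → control.
func (s *session) handleFrame(frameType int, data []byte) error {
	switch frameType {
	case binaryFrame:
		// 16 kHz Int16 PCM; the real handler would forward it to the ASR client.
		s.audioBytes += len(data)
		return nil
	case textFrame:
		var m Message
		if err := json.Unmarshal(data, &m); err != nil {
			return fmt.Errorf("decode control message: %w", err)
		}
		switch m.Type {
		case "start", "stop", "paste": // assumed client → server types
			s.controls = append(s.controls, m.Type)
			return nil
		default:
			return fmt.Errorf("unexpected control message %q", m.Type)
		}
	default:
		return fmt.Errorf("unsupported frame type %d", frameType)
	}
}

func main() {
	s := &session{}
	for _, f := range []struct {
		typ  int
		data []byte
	}{
		{textFrame, []byte(`{"type":"start"}`)},
		{binaryFrame, make([]byte, 640)}, // 20 ms of 16 kHz mono Int16 audio
		{textFrame, []byte(`{"type":"stop"}`)},
	} {
		if err := s.handleFrame(f.typ, f.data); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println(s.controls, s.audioBytes) // [start stop] 640
}
```

Keeping audio binary avoids base64 overhead on every chunk. Using JSON only for the rare control messages keeps them easy to debug.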

## Language & Locale

- **UI text**: Chinese (中文), since the app is used by family members
- **Git commits**: Chinese, conventional format: `feat:`, `fix:`, `chore:`, `refactor:`
- **Code comments**: English
- **Communication with user**: Chinese (中文)

## Key Constraints

- CGO is required (robotgo, clipboard), so cross-compilation is not supported; build natively on each target OS
- Token auth: the token is read from `config.yaml`; an empty token disables auth. Never auto-generate tokens
- Frontend is embedded via `//go:embed all:web/dist` in `main.go`
- `//go:embed` patterns cannot contain `..` (no `../` paths), so the embedded directory must live inside the directory of the package that declares the directive
- Build output goes to `dist/` (gitignored)
- Frontend ignore rules (`node_modules`, `dist`) live in `web/.gitignore`, not the root `.gitignore`
- Config file (`config.yaml`) is gitignored; `config.example.yaml` is committed
- Use `os.UserCacheDir()` for platform-correct cert cache paths
- robotgo paste: `KeyDown(modifier)` → delay → `KeyTap("v")` → delay → `KeyUp(modifier)`
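
The robotgo paste sequence can be sketched without CGO by hiding robotgo behind a small interface. `keyboard`, `simulatePaste`, `pasteModifier`, and `recorder` are hypothetical. The key names `cmd`/`ctrl` and the 50 ms delay are illustrative assumptions, not values taken from `paste/paste.go`. In production, a thin adapter would forward to robotgo after the text has been written to the clipboard.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// keyboard is a hypothetical seam over robotgo's KeyDown/KeyTap/KeyUp.
type keyboard interface {
	KeyDown(key string) error
	KeyTap(key string) error
	KeyUp(key string) error
}

// pasteModifier picks Cmd on macOS and Ctrl elsewhere.
func pasteModifier(goos string) string {
	if goos == "darwin" {
		return "cmd"
	}
	return "ctrl"
}

// simulatePaste runs KeyDown(mod) → delay → KeyTap("v") → delay → KeyUp(mod).
// It assumes the text is already on the clipboard. The modifier is released
// even if the tap fails, so Ctrl/Cmd is not left held down.
func simulatePaste(kb keyboard, goos string, delay time.Duration) error {
	mod := pasteModifier(goos)
	if err := kb.KeyDown(mod); err != nil {
		return fmt.Errorf("press %s: %w", mod, err)
	}
	time.Sleep(delay)
	tapErr := kb.KeyTap("v")
	time.Sleep(delay)
	if err := kb.KeyUp(mod); err != nil {
		return fmt.Errorf("release %s: %w", mod, err)
	}
	if tapErr != nil {
		return fmt.Errorf("tap v: %w", tapErr)
	}
	return nil
}

// recorder is a fake keyboard that records calls instead of pressing keys.
type recorder struct{ calls []string }

func (r *recorder) KeyDown(k string) error {
	r.calls = append(r.calls, "down:"+k)
	return nil
}

func (r *recorder) KeyTap(k string) error {
	r.calls = append(r.calls, "tap:"+k)
	return nil
}

func (r *recorder) KeyUp(k string) error {
	r.calls = append(r.calls, "up:"+k)
	return nil
}

func main() {
	r := &recorder{}
	if err := simulatePaste(r, runtime.GOOS, 50*time.Millisecond); err != nil {
		fmt.Println("paste failed:", err)
		return
	}
	fmt.Println(r.calls)
}
```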

## Hotwords (热词) Feature

Local hotword management for improved ASR accuracy on specific terms (names, technical vocabulary).

### Configuration

```yaml
doubao:
  hotwords:
    - 张三
    - 李四
    - VoicePaste
    - 人工智能
```

### Implementation

- Hotwords stored locally in `config.yaml` (not tied to the cloud provider)
- `BuildHotwordsContext()` converts the string array to the Doubao API format:

  ```json
  {"hotwords":[{"word":"张三"},{"word":"李四"}]}
  ```

- Sent via the `corpus.context` parameter in `FullClientRequest`
- Hot-reloadable: config changes apply to new connections
- Platform-agnostic design: easy to migrate to other ASR providers
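
`BuildHotwordsContext()` is named above, but its signature is not given. This sketch assumes `func BuildHotwordsContext(words []string) (string, error)` and reproduces the JSON shape shown. Trimming and skipping blank entries, and returning `""` for an empty list so the caller can omit `corpus.context`, are design choices of the sketch, not documented behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// hotword and hotwordsContext mirror the {"hotwords":[{"word":...}]} shape.
type hotword struct {
	Word string `json:"word"`
}

type hotwordsContext struct {
	Hotwords []hotword `json:"hotwords"`
}

// BuildHotwordsContext converts the configured hotword list into the JSON
// string sent as corpus.context. The signature is an assumption.
func BuildHotwordsContext(words []string) (string, error) {
	ctx := hotwordsContext{Hotwords: make([]hotword, 0, len(words))}
	for _, w := range words {
		w = strings.TrimSpace(w)
		if w == "" {
			continue // skip blank YAML entries
		}
		ctx.Hotwords = append(ctx.Hotwords, hotword{Word: w})
	}
	if len(ctx.Hotwords) == 0 {
		return "", nil // caller can omit corpus.context entirely
	}
	b, err := json.Marshal(ctx)
	if err != nil {
		return "", fmt.Errorf("marshal hotwords: %w", err)
	}
	return string(b), nil
}

func main() {
	s, err := BuildHotwordsContext([]string{"张三", "李四"})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // {"hotwords":[{"word":"张三"},{"word":"李四"}]}
}
```

`encoding/json` writes non-ASCII text such as 张三 as raw UTF-8 rather than `\u` escapes, so the output matches the example byte for byte.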

### Doubao API Details

- Parameter: `request.corpus.context` (JSON string)
- Limits: 100 tokens in bidirectional streaming mode (双向流式), 5000 tokens in streaming-input mode (流式输入)
- Priority: if both are present, `context` hotwords take precedence over `boosting_table_id`
- No weight support in `context` mode (unlike `boosting_table_id`)
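
`request.corpus.context` is a JSON *string*, so the hotwords JSON is encoded twice inside the request payload. The trimmed structs below are hypothetical and model only the path to `corpus.context`. The real `FullClientRequest` carries other fields that are omitted here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// corpus holds the hotwords JSON as a plain string field.
type corpus struct {
	Context string `json:"context,omitempty"`
}

// request and fullClientRequest are trimmed, hypothetical views of the payload.
type request struct {
	Corpus *corpus `json:"corpus,omitempty"`
}

type fullClientRequest struct {
	Request request `json:"request"`
}

// buildPayload embeds an already-serialized hotwords JSON string; an empty
// string leaves corpus out so no context is sent.
func buildPayload(hotwordsJSON string) ([]byte, error) {
	var req fullClientRequest
	if hotwordsJSON != "" {
		req.Request.Corpus = &corpus{Context: hotwordsJSON}
	}
	return json.Marshal(req)
}

func main() {
	p, err := buildPayload(`{"hotwords":[{"word":"VoicePaste"}]}`)
	if err != nil {
		panic(err)
	}
	// The inner quotes are escaped because context is a string, not an object.
	fmt.Println(string(p))
}
```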