voicepaste/AGENTS.md
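imbytecat 70344bcd98 refactor: migrate frontend to React 19 + Zustand + Tailwind CSS v4
- Split the single-file vanilla TS app (app.ts, 395 lines) into a React component architecture
- Introduce Zustand to manage global state (connection/recording/preview/history/toast)
- Wrap the WebSocket connection and the audio recording pipeline in custom hooks
- Convert all CSS to Tailwind; style.css shrinks from 234 to 114 lines (only tokens + keyframes remain)
- New dependencies: react, react-dom, zustand, @vitejs/plugin-react
- The Go backend embed path web/dist is unchanged, so no backend changes are needed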
2026-03-02 06:36:02 +08:00

# AGENTS.md — VoicePaste
## Project Overview
VoicePaste: phone as microphone via browser → LAN WebSocket → Go server → Doubao ASR → real-time preview on phone → auto-paste to computer's focused app. Single Go binary with embedded frontend.
## Tech Stack
- **Backend**: Go 1.25+, Fiber v3, fasthttp/websocket, CGO required (robotgo + clipboard)
- **Frontend**: React 19, TypeScript, Zustand, Vite 7, Tailwind CSS v4, Biome 2, bun (package manager + runtime)
- **Tooling**: Taskfile (not Make), mise (Go + bun + task)
- **ASR**: Doubao Seed-ASR-2.0 via custom binary WebSocket protocol
## Build & Run Commands
```bash
# Install mise tools (go, bun, task)
mise install
# Build everything (frontend + Go binary → dist/)
task
# Build frontend only
task build:frontend
# Run (build + execute)
task run
# Dev mode (go run, skips frontend build)
task dev
# Clean all artifacts
task clean
# Tidy Go modules
task tidy
```
### Frontend (run from `web/`)
```bash
bun install # Install deps
bun run build # Vite production build
bun run dev # Vite dev server
bun run lint # Biome check (lint + format)
bun run lint:fix # Biome check --write (auto-fix)
bun run typecheck # tsc --noEmit
```
### Go
```bash
go vet ./... # Lint
go build -o dist/voicepaste . # Build (add .exe on Windows)
```
No test suite exists yet. No `go test` targets.
## Project Structure
```
main.go                      # Entry point, embed.FS, TLS init, server startup
internal/
  config/config.go           # YAML + env var config, fsnotify hot-reload, atomic global
  server/server.go           # Fiber v3 HTTPS server, static files from embed.FS
  server/net.go              # LAN IP detection
  tls/tls.go                 # AnyIP cert download/cache + self-signed fallback
  tls/generate.go            # Self-signed cert generation
  ws/protocol.go             # JSON message types (start/stop/paste/partial/final/pasted/error)
  ws/handler.go              # WS upgrade, token auth, session lifecycle, text accumulation, paste
  asr/protocol.go            # Doubao binary protocol codec (4-byte header, gzip)
  asr/client.go              # WSS client to Doubao, audio streaming, result forwarding
  paste/paste.go             # clipboard.Write + robotgo key simulation (Ctrl+V / Cmd+V)
web/
  index.html                 # HTML shell with React root
  vite.config.ts             # Vite config (React + Tailwind plugins)
  biome.json                 # Biome config (lint, format, Tailwind class sorting)
  tsconfig.json              # TypeScript strict config (React JSX)
  src/
    main.tsx                 # React entry point
    App.tsx                  # Root component: composes hooks + layout
    app.css                  # Tailwind imports, design tokens (@theme), keyframes
    stores/
      app-store.ts           # Zustand store: connection, recording, preview, history, toast
    hooks/
      useWebSocket.ts        # WS client hook: connect, reconnect, message dispatch
      useRecorder.ts         # Audio pipeline hook: getUserMedia, AudioWorklet, resample
    components/
      StatusBadge.tsx        # Connection status indicator
      PreviewBox.tsx         # Real-time transcription preview
      MicButton.tsx          # Push-to-talk button with animations
      HistoryList.tsx        # Transcription history with re-send
      Toast.tsx              # Auto-dismiss toast notifications
    lib/
      resample.ts            # Linear interpolation resampler (native rate → 16kHz Int16)
    workers/
      audio-processor.ts     # AudioWorklet: PCM capture, 200ms frame accumulation
```
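To make the `ws/protocol.go` entry concrete, here is a minimal sketch of how the seven control message types could be modeled and validated in Go. Only the type names (`start`, `stop`, `paste`, `partial`, `final`, `pasted`, `error`) come from the listing above; the `Message` envelope, its `type` and `text` fields, and `decodeMessage` are assumptions made for illustration and may not match the real wire format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MsgType names one of the control messages listed for ws/protocol.go.
type MsgType string

const (
	MsgStart   MsgType = "start"
	MsgStop    MsgType = "stop"
	MsgPaste   MsgType = "paste"
	MsgPartial MsgType = "partial"
	MsgFinal   MsgType = "final"
	MsgPasted  MsgType = "pasted"
	MsgError   MsgType = "error"
)

// Message is a hypothetical envelope. The "type" and "text" field names are
// assumptions, not the project's confirmed wire format.
type Message struct {
	Type MsgType `json:"type"`
	Text string  `json:"text,omitempty"`
}

// decodeMessage parses one JSON text frame and rejects unknown message types.
func decodeMessage(raw []byte) (Message, error) {
	var m Message
	if err := json.Unmarshal(raw, &m); err != nil {
		return Message{}, fmt.Errorf("decode message: %w", err)
	}
	switch m.Type {
	case MsgStart, MsgStop, MsgPaste, MsgPartial, MsgFinal, MsgPasted, MsgError:
		return m, nil
	default:
		return Message{}, fmt.Errorf("unknown message type %q", m.Type)
	}
}
```

Rejecting unknown types at decode time keeps the handler's switch exhaustive and surfaces protocol drift between frontend and backend early.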
## Code Style — Go
### Imports
Group in stdlib → external → internal order, separated by blank lines:
```go
import (
	"fmt"
	"log/slog"

	"github.com/gofiber/fiber/v3"

	"github.com/imbytecat/voicepaste/internal/config"
)
```
Use aliases only to avoid collisions: `crypto_tls "crypto/tls"`, `vpTLS "...internal/tls"`, `wsMsg "...internal/ws"`.
### Logging
Use `log/slog` exclusively. Structured key-value pairs:
```go
slog.Info("message", "key", value)
slog.Error("failed to X", "err", err)
```
Per-connection loggers via `slog.With("remote", addr)`.
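A per-connection logger could look roughly like the sketch below. The helper names `newConnLogger` and `renderSample` are hypothetical; the only parts taken from the guideline are `log/slog`, key-value pairs, and `With("remote", addr)`. The sample writes to a buffer and drops the timestamp purely so the output is reproducible.

```go
package main

import (
	"bytes"
	"log/slog"
)

// newConnLogger derives a per-connection logger that adds the peer address
// to every record it emits.
func newConnLogger(base *slog.Logger, remote string) *slog.Logger {
	return base.With("remote", remote)
}

// renderSample logs one record through a per-connection logger and returns
// the text output. The timestamp is dropped only to keep the sample stable.
func renderSample(remote string) string {
	var buf bytes.Buffer
	opts := &slog.HandlerOptions{
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if len(groups) == 0 && a.Key == slog.TimeKey {
				return slog.Attr{} // discard the time attribute
			}
			return a
		},
	}
	logger := newConnLogger(slog.New(slog.NewTextHandler(&buf, opts)), remote)
	logger.Info("session started", "frames", 0)
	return buf.String()
}
```

Every record written through the derived logger carries the `remote` attribute, so call sites inside a session do not need to repeat it.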
### Error Handling
- Always wrap with context: `fmt.Errorf("dial doubao: %w", err)`
- Return errors up; log at the boundary (main, handler entry)
- Never suppress errors silently. `slog.Warn` for non-fatal, `slog.Error` + exit/return for fatal
- The same applies to the TypeScript frontend: never use `as any`, `@ts-ignore`, or empty catch blocks
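The wrap-then-log-at-the-boundary rule can be sketched as follows. `dialDoubao`, `startSession`, `handleStart`, and the `errDialTimeout` sentinel are hypothetical stand-ins, not functions from the codebase; only the `fmt.Errorf("dial doubao: %w", err)` form is taken from the guideline.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
)

// errDialTimeout is a hypothetical sentinel standing in for a low-level failure.
var errDialTimeout = errors.New("dial timeout")

// dialDoubao is a stand-in for the real dial; in this sketch it always fails.
func dialDoubao() error {
	return errDialTimeout
}

// startSession wraps the failure with context and returns it without logging.
func startSession() error {
	if err := dialDoubao(); err != nil {
		return fmt.Errorf("dial doubao: %w", err)
	}
	return nil
}

// isDialTimeout shows that %w keeps the original error inspectable.
func isDialTimeout(err error) bool {
	return errors.Is(err, errDialTimeout)
}

// handleStart is the boundary: it logs the error once and reports the outcome.
func handleStart() bool {
	if err := startSession(); err != nil {
		slog.Error("failed to start session", "err", err)
		return false
	}
	return true
}
```

Logging only in `handleStart` avoids the same failure appearing several times in the log as it travels up the call stack.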
### Naming
- Package names: short, lowercase, single word (`asr`, `ws`, `paste`, `config`)
- Exported types: `PascalCase` with doc comments
- Unexported: `camelCase`
- Constants: `PascalCase` for exported, `camelCase` for unexported
- Acronyms stay uppercase: `ASR`, `TLS`, `WS`, `URL`, `IP`
### Patterns
- `sync.Mutex` for shared state, `chan` for goroutine communication
- `atomic.Value` for hot-reloadable config
- Goroutine cleanup: `defer`, `sync.WaitGroup`, `closeCh chan struct{}`
- Fiber v3 middleware pattern for auth checks before WS upgrade
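The first three patterns can be combined in one small sketch. The `Config` fields, `StoreConfig`/`LoadConfig`, and the `Session` type are hypothetical and exist only to illustrate `atomic.Value` snapshots, a mutex-guarded counter, and shutdown via `closeCh` plus `sync.WaitGroup`; the real types in `internal/config` and `internal/ws` may differ.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// Config is a trimmed, hypothetical stand-in for the real config struct.
type Config struct{ Token string }

var current atomic.Value // only ever holds a *Config

// StoreConfig publishes a new snapshot, for example after a file change.
func StoreConfig(c *Config) { current.Store(c) }

// LoadConfig returns the latest snapshot, or an empty config before any Store.
func LoadConfig() *Config {
	if c, ok := current.Load().(*Config); ok {
		return c
	}
	return &Config{}
}

// Session owns one background goroutine and stops it through closeCh.
type Session struct {
	frames  chan []byte
	closeCh chan struct{}
	wg      sync.WaitGroup
	mu      sync.Mutex
	sent    int // bytes consumed so far, guarded by mu
}

// NewSession starts the consumer goroutine.
func NewSession() *Session {
	s := &Session{frames: make(chan []byte), closeCh: make(chan struct{})}
	s.wg.Add(1)
	go s.loop()
	return s
}

func (s *Session) loop() {
	defer s.wg.Done()
	for {
		select {
		case <-s.closeCh:
			return
		case f := <-s.frames:
			s.mu.Lock()
			s.sent += len(f)
			s.mu.Unlock()
		}
	}
}

// Send hands a frame to the loop, or returns false once the session is closed.
func (s *Session) Send(f []byte) bool {
	select {
	case s.frames <- f:
		return true
	case <-s.closeCh:
		return false
	}
}

// Close signals the loop to stop and waits for it to exit. Call it once.
func (s *Session) Close() {
	close(s.closeCh)
	s.wg.Wait()
}

// Sent returns the number of bytes consumed by the loop.
func (s *Session) Sent() int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.sent
}
```

Because readers always receive a complete `*Config` snapshot, a hot reload never exposes a half-updated struct, and `Close` returning guarantees the goroutine has exited.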
## Code Style — TypeScript (Frontend)
### Formatting (Biome)
- Indent: tabs
- Quotes: double quotes
- Semicolons: default (enabled)
- Organize imports: enabled via Biome assist
### TypeScript Config
- `strict: true`, `noUnusedLocals`, `noUnusedParameters`
- Target: ES2022, module: ESNext, bundler resolution
- DOM + DOM.Iterable libs
### Patterns
- React 19 with functional components and hooks
- Zustand for global state management (connection, recording, preview, history, toast)
- Custom hooks for imperative APIs: `useWebSocket`, `useRecorder`
- Zustand `getState()` in hooks/callbacks to avoid stale closures
- Pointer Events for touch/mouse (not touch + mouse separately)
- AudioWorklet for audio capture (not MediaRecorder)
- `?worker&url` Vite import for AudioWorklet files
- WebSocket: binary for audio frames, JSON text for control messages
- Tailwind CSS v4 with `@theme` design tokens; minimal custom CSS (keyframes only)
## Language & Locale
- **UI text**: Chinese (中文) — this app is for family members
- **Git commits**: Chinese, conventional format: `feat:`, `fix:`, `chore:`, `refactor:`
- **Code comments**: English
- **Communication with user**: Chinese (中文)
## Key Constraints
- CGO is required (robotgo, clipboard) — no cross-compilation
- Token auth: the token is read from `config.yaml`; an empty value disables auth. Never auto-generate tokens
- Frontend is embedded via `//go:embed all:web/dist` in `main.go`
- `embed` directive cannot use `../` paths — must be in the package referencing it
- Build output goes to `dist/` (gitignored)
- Frontend ignores (`node_modules`, `dist`) live in `web/.gitignore`, not in the root `.gitignore`
- Config file (`config.yaml`) is gitignored; `config.example.yaml` is committed
- `os.UserCacheDir()` for platform-correct cert cache paths
- robotgo paste: `KeyDown(modifier)` → delay → `KeyTap("v")` → delay → `KeyUp(modifier)`
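Two of the constraints above lend themselves to a short sketch: the "empty token means no auth" rule and the use of `os.UserCacheDir()` for the cert cache. The function names, the constant-time comparison, and the `voicepaste/certs` subdirectory are assumptions for illustration, not details confirmed by the codebase.

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"os"
	"path/filepath"
)

// tokenAllowed reports whether a client may connect.
// An empty configured token disables auth, as stated in the constraints.
// The constant-time comparison is an assumption, not a documented choice.
func tokenAllowed(configured, presented string) bool {
	if configured == "" {
		return true
	}
	return subtle.ConstantTimeCompare([]byte(configured), []byte(presented)) == 1
}

// certCacheDir resolves a platform-correct cache directory for certificates.
// The "voicepaste/certs" suffix is a hypothetical layout.
func certCacheDir() (string, error) {
	base, err := os.UserCacheDir()
	if err != nil {
		return "", fmt.Errorf("locate user cache dir: %w", err)
	}
	return filepath.Join(base, "voicepaste", "certs"), nil
}
```

Note that the function never generates a token when none is configured; it simply lets the connection through, which matches the "never auto-generate" rule.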
## Hotwords (热词) Feature
Local hotword management for improved ASR accuracy on specific terms (names, technical vocabulary).
### Configuration
```yaml
doubao:
  hotwords:
    - 张三
    - 李四
    - VoicePaste
    - 人工智能
### Implementation
- Hotwords stored locally in `config.yaml` (not tied to cloud provider)
- `BuildHotwordsContext()` converts string array to Doubao API format:
```json
{"hotwords":[{"word":"张三"},{"word":"李四"}]}
```
- Sent via the `request.corpus.context` parameter in `FullClientRequest`
- Hot-reloadable: config changes apply to new connections
- Platform-agnostic design: easy to migrate to other ASR providers
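The conversion described above can be sketched as follows. The function name `BuildHotwordsContext` and the JSON shape come from this section; the signature, the helper types, and the choice to return an empty string for an empty list are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hotword mirrors one entry of the documented JSON shape.
type hotword struct {
	Word string `json:"word"`
}

// hotwordsContext mirrors the documented top-level object.
type hotwordsContext struct {
	Hotwords []hotword `json:"hotwords"`
}

// BuildHotwordsContext converts configured hotwords into the JSON string
// expected in the context parameter. The signature is an assumption, and
// returning "" for an empty list is a design choice made for this sketch.
func BuildHotwordsContext(words []string) (string, error) {
	if len(words) == 0 {
		return "", nil
	}
	ctx := hotwordsContext{Hotwords: make([]hotword, 0, len(words))}
	for _, w := range words {
		ctx.Hotwords = append(ctx.Hotwords, hotword{Word: w})
	}
	b, err := json.Marshal(ctx)
	if err != nil {
		return "", fmt.Errorf("marshal hotwords context: %w", err)
	}
	return string(b), nil
}
```

Keeping the provider-specific JSON shape inside one function is what makes the local hotword list portable: switching ASR providers would only require a different builder.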
### Doubao API Details
- Parameter: `request.corpus.context` (JSON string)
- Limits: 100 tokens (bidirectional streaming mode), 5000 tokens (streaming-input mode)
- Priority: `context` hotwords > `boosting_table_id` (if both present)
- No weight support in `context` mode (unlike `boosting_table_id`)