voicepaste/AGENTS.md
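imbytecat 669bfac722 refactor: replace the hand-written audio capture pipeline with @picovoice/web-voice-processor
- Introduce WebVoiceProcessor to handle getUserMedia, the AudioContext lifecycle, and WASM resampling
- Remove the custom AudioWorklet (audio-processor.ts) and the linear-interpolation resampler (resample.ts)
- Improve audio capture stability: automatically detect the AudioContext suspended/closed states and rebuild it
- More precise error messages: distinguish permission denied, device not found, and device failure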
2026-03-02 07:42:45 +08:00

AGENTS.md — VoicePaste

Project Overview

VoicePaste turns a phone into a microphone for a computer: phone browser → LAN WebSocket → Go server → Doubao ASR → real-time preview on the phone → auto-paste into the computer's focused app. It ships as a single Go binary with the frontend embedded.

Tech Stack

  • Backend: Go 1.25+, Fiber v3, fasthttp/websocket, CGO required (robotgo + clipboard)
  • Frontend: React 19, TypeScript, Zustand, Vite 7, Tailwind CSS v4, Biome 2, bun (package manager + runtime)
  • Tooling: Taskfile (not Make), mise (Go + bun + task)
  • ASR: Doubao Seed-ASR-2.0 via custom binary WebSocket protocol

Build & Run Commands

# Install mise tools (go, bun, task)
mise install

# Build everything (frontend + Go binary → dist/)
task

# Build frontend only
task build:frontend

# Run (build + execute)
task run

# Dev mode (go run, skips frontend build)
task dev

# Clean all artifacts
task clean

# Tidy Go modules
task tidy

Frontend (run from web/)

bun install              # Install deps
bun run build            # Vite production build
bun run dev              # Vite dev server
bun run lint             # Biome check (lint + format)
bun run lint:fix         # Biome check --write (auto-fix)
bun run typecheck        # tsc --noEmit

Go

go vet ./...             # Lint
go build -o dist/voicepaste .  # Build (add .exe on Windows)

No test suite exists yet. No go test targets.

Project Structure

main.go                     # Entry point, embed.FS, TLS init, server startup
internal/
  config/config.go          # YAML + env var config, fsnotify hot-reload, atomic global
  server/server.go          # Fiber v3 HTTPS server, static files from embed.FS
  server/net.go             # LAN IP detection
  tls/tls.go                # AnyIP cert download/cache + self-signed fallback
  tls/generate.go           # Self-signed cert generation
  ws/protocol.go            # JSON message types (start/stop/paste/partial/final/pasted/error)
  ws/handler.go             # WS upgrade, token auth, session lifecycle, text accumulation, paste
  asr/protocol.go           # Doubao binary protocol codec (4-byte header, gzip)
  asr/client.go             # WSS client to Doubao, audio streaming, result forwarding
  paste/paste.go            # clipboard.Write + robotgo key simulation (Ctrl+V / Cmd+V)
web/
  index.html                # HTML shell with React root
  vite.config.ts            # Vite config (React + Tailwind plugins)
  biome.json                # Biome config (lint, format, Tailwind class sorting)
  tsconfig.json             # TypeScript strict config (React JSX)
  src/
    main.tsx                # React entry point
    App.tsx                 # Root component: composes hooks + layout
    app.css                 # Tailwind imports, design tokens (@theme), keyframes
    stores/
      app-store.ts           # Zustand store: connection, recording, preview, history, toast
    hooks/
      useWebSocket.ts        # WS client hook: connect, reconnect, message dispatch
      useRecorder.ts         # Audio pipeline hook: WebVoiceProcessor (16kHz Int16 PCM capture)
    components/
      StatusBadge.tsx         # Connection status indicator
      PreviewBox.tsx          # Real-time transcription preview
      MicButton.tsx           # Push-to-talk button with animations
      HistoryList.tsx         # Transcription history with re-send

Code Style — Go

Imports

Group in stdlib → external → internal order, separated by blank lines:

import (
    "fmt"
    "log/slog"

    "github.com/gofiber/fiber/v3"

    "github.com/imbytecat/voicepaste/internal/config"
)

Use aliases only to avoid collisions: crypto_tls "crypto/tls", vpTLS "...internal/tls", wsMsg "...internal/ws".

Logging

Use log/slog exclusively. Structured key-value pairs:

slog.Info("message", "key", value)
slog.Error("failed to X", "err", err)

Per-connection loggers via slog.With("remote", addr).

Error Handling

  • Always wrap with context: fmt.Errorf("dial doubao: %w", err)
  • Return errors up; log at the boundary (main, handler entry)
  • Never suppress errors silently. slog.Warn for non-fatal, slog.Error + exit/return for fatal
  • The same rule applies on the frontend (TypeScript): never use as any, @ts-ignore, or empty catch blocks

Naming

  • Package names: short, lowercase, single word (asr, ws, paste, config)
  • Exported types: PascalCase with doc comments
  • Unexported: camelCase
  • Constants: PascalCase for exported, camelCase for unexported
  • Acronyms stay uppercase: ASR, TLS, WS, URL, IP

Patterns

  • sync.Mutex for shared state, chan for goroutine communication
  • atomic.Value for hot-reloadable config
  • Goroutine cleanup: defer, sync.WaitGroup, closeCh chan struct{}
  • Fiber v3 middleware pattern for auth checks before WS upgrade
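
A sketch of three of the patterns above working together: an atomic.Value holding a config snapshot, a mutex guarding shared state, and goroutine cleanup via closeCh plus sync.WaitGroup. The Config fields and the collector type are assumptions made for illustration and are not taken from internal/config or internal/ws.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Config is a hypothetical, trimmed-down config snapshot.
type Config struct {
	Token string
}

// current always holds a *Config; readers never block on a reload.
var current atomic.Value

func storeConfig(c *Config) { current.Store(c) }
func loadConfig() *Config   { return current.Load().(*Config) }

// collector is a hypothetical worker that owns exactly one goroutine.
type collector struct {
	mu      sync.Mutex // guards seen
	seen    []string
	jobs    chan string
	closeCh chan struct{}
	wg      sync.WaitGroup
}

func newCollector() *collector {
	c := &collector{jobs: make(chan string), closeCh: make(chan struct{})}
	c.wg.Add(1)
	go c.loop()
	return c
}

func (c *collector) loop() {
	defer c.wg.Done()
	for {
		select {
		case <-c.closeCh:
			return
		case j := <-c.jobs:
			c.mu.Lock()
			c.seen = append(c.seen, j)
			c.mu.Unlock()
		}
	}
}

// Close signals the goroutine to stop and waits until it has exited.
func (c *collector) Close() {
	close(c.closeCh)
	c.wg.Wait()
}

// Seen returns a copy of the collected jobs.
func (c *collector) Seen() []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	return append([]string(nil), c.seen...)
}

func main() {
	storeConfig(&Config{Token: "old"})
	storeConfig(&Config{Token: "new"}) // a reload swaps the whole snapshot
	fmt.Println(loadConfig().Token)

	c := newCollector()
	c.jobs <- "partial"
	c.jobs <- "final"
	c.Close()
	fmt.Println(c.Seen())
}
```

Swapping a whole immutable snapshot means a reader sees either the old config or the new one, never a half-updated struct.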

Code Style — TypeScript (Frontend)

Formatting (Biome)

  • Indent: tabs
  • Quotes: double quotes
  • Semicolons: default (enabled)
  • Organize imports: enabled via Biome assist

TypeScript Config

  • strict: true, noUnusedLocals, noUnusedParameters
  • Target: ES2022, module: ESNext, bundler resolution
  • DOM + DOM.Iterable libs

Patterns

  • React 19 with functional components and hooks
  • Zustand for global state management (connection, recording, preview, history, toast)
  • Custom hooks for imperative APIs: useWebSocket, useRecorder
  • Zustand getState() in hooks/callbacks to avoid stale closures
  • Pointer Events for touch/mouse (not touch + mouse separately)
  • @picovoice/web-voice-processor for audio capture (16kHz Int16 PCM, WASM resampling)
  • WebVoiceProcessor handles getUserMedia, AudioContext lifecycle, cross-browser compat
  • WebSocket: binary for audio frames, JSON text for control messages
  • Tailwind CSS v4 with @theme design tokens; minimal custom CSS (keyframes only)

Language & Locale

  • UI text: Chinese (中文) — this app is for family members
  • Git commits: Chinese, conventional format: feat:, fix:, chore:, refactor:
  • Code comments: English
  • Communication with user: Chinese (中文)

Key Constraints

  • CGO is required (robotgo, clipboard) — no cross-compilation
  • Token auth: read from config.yaml; empty = no auth. Never auto-generate tokens
  • Frontend is embedded via //go:embed all:web/dist in main.go
  • The embed directive cannot use ../ paths, so the embedded directory must sit inside the directory of the package that declares the directive
  • Build output goes to dist/ (gitignored)
  • Frontend ignore rules (node_modules, dist) live in web/.gitignore, not in the root .gitignore
  • Config file (config.yaml) is gitignored; config.example.yaml is committed
  • os.UserCacheDir() for platform-correct cert cache paths
  • robotgo paste: KeyDown(modifier) → delay → KeyTap("v") → delay → KeyUp(modifier)
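
The paste key sequence in the last bullet can be sketched without CGO by putting key simulation behind an interface. The keyboard interface, the recorder fake, and the "cmd"/"ctrl" key names are assumptions for illustration; the real code in paste/paste.go calls robotgo directly and may differ.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// keyboard is a hypothetical seam over the key-simulation library.
type keyboard interface {
	KeyDown(key string) error
	KeyTap(key string) error
	KeyUp(key string) error
}

// pasteModifier picks Cmd on macOS and Ctrl elsewhere (key names assumed).
func pasteModifier(goos string) string {
	if goos == "darwin" {
		return "cmd"
	}
	return "ctrl"
}

// sendPaste performs KeyDown(modifier) → delay → KeyTap("v") → delay → KeyUp(modifier).
func sendPaste(kb keyboard, modifier string, delay time.Duration) (err error) {
	if err = kb.KeyDown(modifier); err != nil {
		return fmt.Errorf("key down %s: %w", modifier, err)
	}
	// Always release the modifier, even if the tap fails, so it never stays stuck.
	defer func() {
		time.Sleep(delay)
		if upErr := kb.KeyUp(modifier); upErr != nil && err == nil {
			err = fmt.Errorf("key up %s: %w", modifier, upErr)
		}
	}()
	time.Sleep(delay)
	if err = kb.KeyTap("v"); err != nil {
		return fmt.Errorf("key tap v: %w", err)
	}
	return nil
}

// recorder is a fake keyboard that records the order of events.
type recorder struct{ events []string }

func (r *recorder) KeyDown(k string) error { r.events = append(r.events, "down "+k); return nil }
func (r *recorder) KeyTap(k string) error  { r.events = append(r.events, "tap "+k); return nil }
func (r *recorder) KeyUp(k string) error   { r.events = append(r.events, "up "+k); return nil }

func main() {
	rec := &recorder{}
	if err := sendPaste(rec, pasteModifier(runtime.GOOS), 10*time.Millisecond); err != nil {
		fmt.Println("paste failed:", err)
		return
	}
	fmt.Println(rec.events)
}
```

Releasing the modifier in a deferred function is a design choice of this sketch: a failed tap should not leave Ctrl or Cmd held down on the user's machine.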

Hotwords (热词) Feature

Local hotword management for improved ASR accuracy on specific terms (names, technical vocabulary).

Configuration

doubao:
  hotwords:
    - 张三
    - 李四
    - VoicePaste
    - 人工智能

Implementation

  • Hotwords stored locally in config.yaml (not tied to cloud provider)
  • BuildHotwordsContext() converts string array to Doubao API format:
    {"hotwords":[{"word":"张三"},{"word":"李四"}]}
    
  • Sent via corpus.context parameter in FullClientRequest
  • Hot-reloadable: config changes apply to new connections
  • Platform-agnostic design: easy to migrate to other ASR providers
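
A minimal sketch of the conversion described above, assuming BuildHotwordsContext takes a []string and returns the JSON as a string. The actual signature, the helper types, and the empty-list behavior are assumptions; only the output format comes from this document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hotword and hotwordsContext mirror the JSON shape shown above.
type hotword struct {
	Word string `json:"word"`
}

type hotwordsContext struct {
	Hotwords []hotword `json:"hotwords"`
}

// BuildHotwordsContext converts a list of terms into the context JSON.
// Returning "" for an empty list is an assumption of this sketch.
func BuildHotwordsContext(words []string) (string, error) {
	if len(words) == 0 {
		return "", nil
	}
	ctx := hotwordsContext{Hotwords: make([]hotword, 0, len(words))}
	for _, w := range words {
		ctx.Hotwords = append(ctx.Hotwords, hotword{Word: w})
	}
	b, err := json.Marshal(ctx)
	if err != nil {
		return "", fmt.Errorf("marshal hotwords context: %w", err)
	}
	return string(b), nil
}

func main() {
	s, err := BuildHotwordsContext([]string{"张三", "李四"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s) // {"hotwords":[{"word":"张三"},{"word":"李四"}]}
}
```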

Doubao API Details

  • Parameter: request.corpus.context (JSON string)
  • Limits: 100 tokens (bidirectional streaming mode), 5000 tokens (streaming input mode)
  • Priority: context hotwords > boosting_table_id (if both present)
  • No weight support in context mode (unlike boosting_table_id)