Compare commits

2 Commits

3 changed files with 207 additions and 11 deletions

AGENTS.md Normal file
@@ -0,0 +1,165 @@
# AGENTS.md — VoicePaste
## Project Overview
VoicePaste: phone as microphone via browser → LAN WebSocket → Go server → Doubao ASR → real-time preview on phone → auto-paste to computer's focused app. Single Go binary with embedded frontend.
## Tech Stack
- **Backend**: Go 1.25+, Fiber v3, fasthttp/websocket, CGO required (robotgo + clipboard)
- **Frontend**: TypeScript, Vite 7, Biome 2, bun (package manager + runtime)
- **Tooling**: Taskfile (not Make), mise (Go + bun + task)
- **ASR**: Doubao Seed-ASR-2.0 via custom binary WebSocket protocol
## Build & Run Commands
```bash
# Install mise tools (go, bun, task)
mise install
# Build everything (frontend + Go binary → dist/)
task
# Build frontend only
task build:frontend
# Run (build + execute)
task run
# Dev mode (go run, skips frontend build)
task dev
# Clean all artifacts
task clean
# Tidy Go modules
task tidy
```
### Frontend (run from `web/`)
```bash
bun install # Install deps
bun run build # Vite production build
bun run dev # Vite dev server
bun run lint # Biome check (lint + format)
bun run lint:fix # Biome check --write (auto-fix)
bun run typecheck # tsc --noEmit
```
### Go
```bash
go vet ./... # Lint
go build -o dist/voicepaste . # Build (add .exe on Windows)
```
No test suite exists yet. No `go test` targets.
## Project Structure
```
main.go                   # Entry point, embed.FS, TLS init, server startup
internal/
  config/config.go        # YAML + env var config, fsnotify hot-reload, atomic global
  server/server.go        # Fiber v3 HTTPS server, static files from embed.FS
  server/net.go           # LAN IP detection
  tls/tls.go              # AnyIP cert download/cache + self-signed fallback
  tls/generate.go         # Self-signed cert generation
  ws/protocol.go          # JSON message types (start/stop/paste/partial/final/pasted/error)
  ws/handler.go           # WS upgrade, token auth, session lifecycle, text accumulation, paste
  asr/protocol.go         # Doubao binary protocol codec (4-byte header, gzip)
  asr/client.go           # WSS client to Doubao, audio streaming, result forwarding
  paste/paste.go          # clipboard.Write + robotgo key simulation (Ctrl+V / Cmd+V)
web/
  app.ts                  # Main app: WS client, audio pipeline, recording, history, UI
  audio-processor.ts      # AudioWorklet: PCM capture, 200ms frame accumulation
  index.html              # Mobile-first UI (all Chinese)
  style.css               # Dark theme
  vite.config.ts          # Vite config
  biome.json              # Biome config
  tsconfig.json           # TypeScript strict config
```
## Code Style — Go
### Imports
Group in stdlib → external → internal order, separated by blank lines:
```go
import (
	"fmt"
	"log/slog"

	"github.com/gofiber/fiber/v3"

	"github.com/imbytecat/voicepaste/internal/config"
)
```
Use aliases only to avoid collisions: `crypto_tls "crypto/tls"`, `vpTLS "...internal/tls"`, `wsMsg "...internal/ws"`.
### Logging
Use `log/slog` exclusively. Structured key-value pairs:
```go
slog.Info("message", "key", value)
slog.Error("failed to X", "err", err)
```
Per-connection loggers via `slog.With("remote", addr)`.
### Error Handling
- Always wrap with context: `fmt.Errorf("dial doubao: %w", err)`
- Return errors up; log at the boundary (main, handler entry)
- Never suppress errors silently. `slog.Warn` for non-fatal, `slog.Error` + exit/return for fatal
- In the TypeScript frontend, never use `as any`, `@ts-ignore`, or empty catch blocks
### Naming
- Package names: short, lowercase, single word (`asr`, `ws`, `paste`, `config`)
- Exported types: `PascalCase` with doc comments
- Unexported: `camelCase`
- Constants: `PascalCase` for exported, `camelCase` for unexported
- Acronyms stay uppercase: `ASR`, `TLS`, `WS`, `URL`, `IP`
### Patterns
- `sync.Mutex` for shared state, `chan` for goroutine communication
- `atomic.Value` for hot-reloadable config
- Goroutine cleanup: `defer`, `sync.WaitGroup`, `closeCh chan struct{}`
- Fiber v3 middleware pattern for auth checks before WS upgrade
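
The `atomic.Value` pattern for hot-reloadable config might look roughly like the sketch below. The `Config` fields and the `Store`/`Load` helpers are assumptions made for illustration, not the actual contents of `internal/config`.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Config is a hypothetical, trimmed-down config struct.
type Config struct {
	Token string
	Port  int
}

// current always holds a *Config; atomic.Value requires one consistent concrete type.
var current atomic.Value

// Store publishes a new snapshot, e.g. from a file-watcher callback.
func Store(c *Config) {
	current.Store(c)
}

// Load returns the latest snapshot, or nil if nothing has been stored yet.
func Load() *Config {
	c, _ := current.Load().(*Config)
	return c
}

func main() {
	Store(&Config{Token: "", Port: 8443})
	fmt.Println(Load().Port)

	// A reload swaps the whole snapshot; readers never see a half-updated struct.
	Store(&Config{Token: "secret", Port: 9443})
	fmt.Println(Load().Port)
}
```

Readers treat each snapshot as immutable; a reload builds a fresh `Config` and stores it, so no mutex is needed on the read path.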
## Code Style — TypeScript (Frontend)
### Formatting (Biome)
- Indent: tabs
- Quotes: double quotes
- Semicolons: default (enabled)
- Organize imports: enabled via Biome assist
### TypeScript Config
- `strict: true`, `noUnusedLocals`, `noUnusedParameters`
- Target: ES2022, module: ESNext, bundler resolution
- DOM + DOM.Iterable libs
### Patterns
- No framework — vanilla TypeScript with direct DOM manipulation
- State object pattern: single `AppState` interface with mutable fields
- Pointer Events for touch/mouse (not touch + mouse separately)
- AudioWorklet for audio capture (not MediaRecorder)
- `?worker&url` Vite import for AudioWorklet files
- WebSocket: binary for audio frames, JSON text for control messages
## Language & Locale
- **UI text**: Chinese (中文) — this app is for family members
- **Git commits**: Chinese, conventional format: `feat:`, `fix:`, `chore:`, `refactor:`
- **Code comments**: English
- **Communication with user**: Chinese (中文)
## Key Constraints
- CGO is required (robotgo, clipboard) — no cross-compilation
- Token auth: read from `config.yaml`; empty = no auth. Never auto-generate tokens
- Frontend is embedded via `//go:embed all:web/dist` in `main.go`
- `//go:embed` patterns cannot contain `..`; embedded files must live in or below the directory of the package that declares the directive
- Build output goes to `dist/` (gitignored)
- Frontend ignores (`node_modules`, `dist`) in `web/.gitignore`, not root
- Config file (`config.yaml`) is gitignored; `config.example.yaml` is committed
- `os.UserCacheDir()` for platform-correct cert cache paths
- robotgo paste: `KeyDown(modifier)` → delay → `KeyTap("v")` → delay → `KeyUp(modifier)`
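
The key sequence in the last bullet can be sketched against a small interface so that it runs without CGO. The `keyboard` interface, the `recorder` fake, the `pasteShortcut` helper, and the key names `"ctrl"` and `"cmd"` are all assumptions for illustration; a real implementation would call robotgo directly and should be checked against its documentation.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// keyboard is a hypothetical abstraction over a key-simulation backend.
type keyboard interface {
	KeyDown(key string)
	KeyTap(key string)
	KeyUp(key string)
}

// recorder is a fake backend that only records the calls it receives.
type recorder struct{ events []string }

func (r *recorder) KeyDown(k string) { r.events = append(r.events, "down:"+k) }
func (r *recorder) KeyTap(k string)  { r.events = append(r.events, "tap:"+k) }
func (r *recorder) KeyUp(k string)   { r.events = append(r.events, "up:"+k) }

// pasteShortcut presses the modifier, taps "v", then releases the modifier,
// pausing between steps so the target app can register each event.
func pasteShortcut(kb keyboard, goos string, delay time.Duration) {
	modifier := "ctrl"
	if goos == "darwin" {
		modifier = "cmd"
	}
	kb.KeyDown(modifier)
	time.Sleep(delay)
	kb.KeyTap("v")
	time.Sleep(delay)
	kb.KeyUp(modifier)
}

func main() {
	rec := &recorder{}
	pasteShortcut(rec, runtime.GOOS, 10*time.Millisecond)
	fmt.Println(rec.events)
}
```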

@@ -74,7 +74,7 @@ func Dial(cfg Config, resultCh chan<- wsMsg.ServerMsg) (*Client, error) {
 			EnableDDC:      true,
 			ShowUtterances: false,
 			ResultType:     "single",
-			EndWindowSize:  400,
+			EndWindowSize:  2000,
 		},
 	}
 	data, err := EncodeFullClientRequest(req)
@@ -132,10 +132,15 @@ func (c *Client) readLoop(resultCh chan<- wsMsg.ServerMsg) {
 			resultCh <- wsMsg.ServerMsg{Type: wsMsg.MsgError, Message: resp.ErrMsg}
 			return
 		}
-		// nostream mode: result comes after last audio packet or >15s
+		// nostream mode: may return intermediate results every ~15s
 		text := resp.Text
 		if text != "" {
-			resultCh <- wsMsg.ServerMsg{Type: wsMsg.MsgFinal, Text: text}
+			if resp.IsLast {
+				resultCh <- wsMsg.ServerMsg{Type: wsMsg.MsgFinal, Text: text}
+			} else {
+				// Intermediate result (>15s audio) — preview only, don't paste
+				resultCh <- wsMsg.ServerMsg{Type: wsMsg.MsgPartial, Text: text}
+			}
 		}
 		if resp.IsLast {
 			return

@@ -62,23 +62,32 @@ func (h *Handler) handleConn(c *websocket.Conn) {
 	defer close(resultCh)
 	// Writer goroutine: single writer to avoid concurrent writes
+	// Accumulates all result texts; paste is triggered by stop, not by ASR final.
 	var wg sync.WaitGroup
+	var accMu sync.Mutex
+	var accText string
 	wg.Add(1)
 	go func() {
 		defer wg.Done()
 		for msg := range resultCh {
+			// Accumulate text from both partial and final results
+			if msg.Type == MsgPartial || msg.Type == MsgFinal {
+				accMu.Lock()
+				accText += msg.Text
+				// Send accumulated preview to phone
+				preview := ServerMsg{Type: MsgPartial, Text: accText}
+				accMu.Unlock()
+				if err := c.WriteMessage(websocket.TextMessage, preview.Bytes()); err != nil {
+					log.Warn("ws write error", "err", err)
+					return
+				}
+				continue
+			}
+			// Forward other messages (error, pasted) as-is
 			if err := c.WriteMessage(websocket.TextMessage, msg.Bytes()); err != nil {
 				log.Warn("ws write error", "err", err)
 				return
 			}
-			// Auto-paste on final result
-			if msg.Type == MsgFinal && msg.Text != "" && h.pasteFunc != nil {
-				if err := h.pasteFunc(msg.Text); err != nil {
-					log.Error("auto-paste failed", "err", err)
-				} else {
-					_ = c.WriteMessage(websocket.TextMessage, ServerMsg{Type: MsgPasted}.Bytes())
-				}
-			}
 		}
 	}()
@@ -119,6 +128,10 @@ func (h *Handler) handleConn(c *websocket.Conn) {
 			if active {
 				continue
 			}
+			// Reset accumulated text for new session
+			accMu.Lock()
+			accText = ""
+			accMu.Unlock()
 			sa, cl, err := h.asrFactory(resultCh)
 			if err != nil {
 				log.Error("asr start failed", "err", err)
@@ -134,12 +147,25 @@ func (h *Handler) handleConn(c *websocket.Conn) {
 			if !active {
 				continue
 			}
+			// Finish ASR session — waits for final result from readLoop
 			if cleanup != nil {
 				cleanup()
 				cleanup = nil
 			}
 			sendAudio = nil
 			active = false
+			// Now paste the accumulated text
+			accMu.Lock()
+			finalText := accText
+			accText = ""
+			accMu.Unlock()
+			if finalText != "" && h.pasteFunc != nil {
+				if err := h.pasteFunc(finalText); err != nil {
+					log.Error("auto-paste failed", "err", err)
+				} else {
+					resultCh <- ServerMsg{Type: MsgPasted}
+				}
+			}
 			log.Info("recording stopped")
 		case MsgPaste:
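
The accumulate-then-paste flow introduced by the handler hunks above can be sketched in isolation as follows. The `accumulator` type and its `Append`, `Take`, and `Reset` methods are hypothetical names; the actual change uses the bare `accMu` and `accText` variables inside `handleConn`.

```go
package main

import (
	"fmt"
	"sync"
)

// accumulator is a hypothetical wrapper around the mutex-guarded text buffer.
type accumulator struct {
	mu   sync.Mutex
	text string
}

// Append adds one ASR result and returns the full preview to send to the phone.
func (a *accumulator) Append(s string) string {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.text += s
	return a.text
}

// Take returns the accumulated text and clears it; called when recording stops.
func (a *accumulator) Take() string {
	a.mu.Lock()
	defer a.mu.Unlock()
	t := a.text
	a.text = ""
	return t
}

// Reset clears the buffer at the start of a new session.
func (a *accumulator) Reset() {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.text = ""
}

func main() {
	var acc accumulator
	acc.Reset()                         // start
	fmt.Println(acc.Append("hello, "))  // intermediate result: preview only
	fmt.Println(acc.Append("world"))    // last result: still preview only
	if final := acc.Take(); final != "" { // stop: paste once
		fmt.Println("paste:", final)
	}
}
```

Pasting only on stop means a long recording that yields several intermediate results is pasted once, as a single string, rather than once per result.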