Compare commits
2 Commits
350e405fac...cead3e42b8

| Author | SHA1 | Date |
|---|---|---|
| | cead3e42b8 | |
| | bfaa792760 | |
165
AGENTS.md
Normal file
@@ -0,0 +1,165 @@
# AGENTS.md — VoicePaste

## Project Overview

VoicePaste: phone as microphone via browser → LAN WebSocket → Go server → Doubao ASR → real-time preview on phone → auto-paste to computer's focused app. Single Go binary with embedded frontend.

## Tech Stack

- **Backend**: Go 1.25+, Fiber v3, fasthttp/websocket, CGO required (robotgo + clipboard)
- **Frontend**: TypeScript, Vite 7, Biome 2, bun (package manager + runtime)
- **Tooling**: Taskfile (not Make), mise (Go + bun + task)
- **ASR**: Doubao Seed-ASR-2.0 via custom binary WebSocket protocol

## Build & Run Commands

```bash
# Install mise tools (go, bun, task)
mise install

# Build everything (frontend + Go binary → dist/)
task

# Build frontend only
task build:frontend

# Run (build + execute)
task run

# Dev mode (go run, skips frontend build)
task dev

# Clean all artifacts
task clean

# Tidy Go modules
task tidy
```

### Frontend (run from `web/`)

```bash
bun install          # Install deps
bun run build        # Vite production build
bun run dev          # Vite dev server
bun run lint         # Biome check (lint + format)
bun run lint:fix     # Biome check --write (auto-fix)
bun run typecheck    # tsc --noEmit
```

### Go

```bash
go vet ./...                     # Lint
go build -o dist/voicepaste .    # Build (add .exe on Windows)
```

No test suite exists yet. No `go test` targets.

## Project Structure

```
main.go                  # Entry point, embed.FS, TLS init, server startup
internal/
  config/config.go       # YAML + env var config, fsnotify hot-reload, atomic global
  server/server.go       # Fiber v3 HTTPS server, static files from embed.FS
  server/net.go          # LAN IP detection
  tls/tls.go             # AnyIP cert download/cache + self-signed fallback
  tls/generate.go        # Self-signed cert generation
  ws/protocol.go         # JSON message types (start/stop/paste/partial/final/pasted/error)
  ws/handler.go          # WS upgrade, token auth, session lifecycle, text accumulation, paste
  asr/protocol.go        # Doubao binary protocol codec (4-byte header, gzip)
  asr/client.go          # WSS client to Doubao, audio streaming, result forwarding
  paste/paste.go         # clipboard.Write + robotgo key simulation (Ctrl+V / Cmd+V)
web/
  app.ts                 # Main app: WS client, audio pipeline, recording, history, UI
  audio-processor.ts     # AudioWorklet: PCM capture, 200ms frame accumulation
  index.html             # Mobile-first UI (all Chinese)
  style.css              # Dark theme
  vite.config.ts         # Vite config
  biome.json             # Biome config
  tsconfig.json          # TypeScript strict config
```

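As a rough illustration of the server-to-phone messages listed for `ws/protocol.go`, the sketch below models a `ServerMsg` with a `Bytes()` helper. The `ServerMsg`, `Type`, `Text`, `Message`, and `Msg*` names mirror identifiers that appear in this change; the JSON tags, the string values of the message types, and the fallback payload are assumptions, not the project's confirmed wire format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MsgType names a server-to-client message kind. The string values are assumed.
type MsgType string

const (
	MsgPartial MsgType = "partial"
	MsgFinal   MsgType = "final"
	MsgPasted  MsgType = "pasted"
	MsgError   MsgType = "error"
)

// ServerMsg is one JSON text frame sent to the phone. The JSON tags are assumptions.
type ServerMsg struct {
	Type    MsgType `json:"type"`
	Text    string  `json:"text,omitempty"`
	Message string  `json:"message,omitempty"`
}

// Bytes encodes the message as JSON, falling back to a fixed error payload.
func (m ServerMsg) Bytes() []byte {
	b, err := json.Marshal(m)
	if err != nil {
		return []byte(`{"type":"error"}`)
	}
	return b
}

func main() {
	fmt.Println(string(ServerMsg{Type: MsgPartial, Text: "hello"}.Bytes()))
	fmt.Println(string(ServerMsg{Type: MsgPasted}.Bytes()))
}
```

Using `omitempty` keeps control messages such as `pasted` small, since they carry no text.
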
## Code Style — Go

### Imports
Group in stdlib → external → internal order, separated by blank lines:
```go
import (
	"fmt"
	"log/slog"

	"github.com/gofiber/fiber/v3"

	"github.com/imbytecat/voicepaste/internal/config"
)
```
Use aliases only to avoid collisions: `crypto_tls "crypto/tls"`, `vpTLS "...internal/tls"`, `wsMsg "...internal/ws"`.

### Logging
Use `log/slog` exclusively. Structured key-value pairs:
```go
slog.Info("message", "key", value)
slog.Error("failed to X", "err", err)
```
Per-connection loggers via `slog.With("remote", addr)`.

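The per-connection logger idea can be sketched as follows. This is a minimal example under stated assumptions: `newConnLogger`, `renderSample`, and `sampleHasRemote` are hypothetical helpers, and the logger writes to an in-memory buffer through a text handler only so the output can be inspected; the project itself derives loggers from the default logger with `slog.With`.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
)

// newConnLogger derives a logger that tags every record with the remote
// address. The helper name is illustrative, not taken from the project.
func newConnLogger(base *slog.Logger, remote string) *slog.Logger {
	return base.With("remote", remote)
}

// renderSample logs two records through a buffer-backed text handler and
// returns the rendered output.
func renderSample(remote string) string {
	var buf bytes.Buffer
	base := slog.New(slog.NewTextHandler(&buf, nil))
	log := newConnLogger(base, remote)
	log.Info("recording started", "session", 1)
	log.Warn("ws write error", "err", "broken pipe")
	return buf.String()
}

// sampleHasRemote reports whether both records carry the remote attribute.
func sampleHasRemote(remote string) bool {
	out := renderSample(remote)
	return strings.Count(out, "remote=") == 2 && strings.Contains(out, remote)
}

func main() {
	fmt.Print(renderSample("192.168.1.23:51234"))
}
```

Attaching the attribute once keeps individual log calls short and makes it harder to forget the connection context.
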
### Error Handling
- Always wrap with context: `fmt.Errorf("dial doubao: %w", err)`
- Return errors up; log at the boundary (main, handler entry)
- Never suppress errors silently. `slog.Warn` for non-fatal, `slog.Error` + exit/return for fatal
- Never use `as any`, `@ts-ignore`, or empty catch blocks (this rule applies to the TypeScript frontend)

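The wrap-and-return rules above might look like this in practice. It is a sketch, not code from the repository: `errDialRefused`, `dialDoubao`, `startSession`, and `wrappedCauseSurvives` are hypothetical stand-ins, and the dial step always fails so the error path is visible.

```go
package main

import (
	"errors"
	"fmt"
	"log/slog"
)

// errDialRefused stands in for a low-level network error (hypothetical).
var errDialRefused = errors.New("connection refused")

// dialDoubao is a stand-in for the real dial step; here it always fails.
func dialDoubao() error {
	return fmt.Errorf("dial doubao: %w", errDialRefused)
}

// startSession adds its own context and returns the error without logging it.
func startSession() error {
	if err := dialDoubao(); err != nil {
		return fmt.Errorf("start asr session: %w", err)
	}
	return nil
}

// wrappedCauseSurvives reports whether the original cause is still reachable
// through the chain of %w wraps.
func wrappedCauseSurvives() bool {
	return errors.Is(startSession(), errDialRefused)
}

func main() {
	// Boundary: the only place where the error is logged.
	if err := startSession(); err != nil {
		slog.Error("failed to start session", "err", err)
	}
}
```

Because each layer wraps with `%w`, callers can still test for the root cause with `errors.Is` while the message reads as a trail of context.
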
### Naming
- Package names: short, lowercase, single word (`asr`, `ws`, `paste`, `config`)
- Exported types: `PascalCase` with doc comments
- Unexported: `camelCase`
- Constants: `PascalCase` for exported, `camelCase` for unexported
- Acronyms stay uppercase: `ASR`, `TLS`, `WS`, `URL`, `IP`

### Patterns
- `sync.Mutex` for shared state, `chan` for goroutine communication
- `atomic.Value` for hot-reloadable config
- Goroutine cleanup: `defer`, `sync.WaitGroup`, `closeCh chan struct{}`
- Fiber v3 middleware pattern for auth checks before WS upgrade

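Two of the patterns above, an `atomic.Value` config snapshot and goroutine cleanup with a `closeCh` plus `sync.WaitGroup`, can be sketched together. Everything named here (`Config`, `Store`, `Load`, `collect`) is hypothetical and only illustrates the shape of the technique, not the project's actual types.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Config is a hypothetical, trimmed-down configuration snapshot.
type Config struct {
	Token string
}

// current holds the active *Config so readers never block during a reload.
var current atomic.Value

// Store publishes a new snapshot, for example from a file-watch callback.
func Store(c *Config) { current.Store(c) }

// Load returns the latest snapshot, or an empty Config before the first Store.
func Load() *Config {
	if c, ok := current.Load().(*Config); ok {
		return c
	}
	return &Config{}
}

// collect runs one worker goroutine that gathers items sent on a channel and
// stops when closeCh is closed; the WaitGroup guarantees it has exited.
func collect(items []string) []string {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		got     []string
		jobs    = make(chan string)
		closeCh = make(chan struct{})
	)
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case <-closeCh:
				return
			case s := <-jobs:
				mu.Lock()
				got = append(got, s)
				mu.Unlock()
			}
		}
	}()
	for _, s := range items {
		jobs <- s // unbuffered: returns once the worker has received s
	}
	close(closeCh)
	wg.Wait()
	mu.Lock()
	defer mu.Unlock()
	return got
}

func main() {
	Store(&Config{Token: "secret"})
	fmt.Println(Load().Token)
	fmt.Println(collect([]string{"a", "b", "c"}))
}
```

Readers call `Load()` on every use instead of caching the pointer, so a reload takes effect on the next read without any locking.
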
## Code Style — TypeScript (Frontend)

### Formatting (Biome)
- Indent: tabs
- Quotes: double quotes
- Semicolons: default (enabled)
- Organize imports: enabled via Biome assist

### TypeScript Config
- `strict: true`, `noUnusedLocals`, `noUnusedParameters`
- Target: ES2022, module: ESNext, bundler resolution
- DOM + DOM.Iterable libs

### Patterns
- No framework — vanilla TypeScript with direct DOM manipulation
- State object pattern: single `AppState` interface with mutable fields
- Pointer Events for touch/mouse (not touch + mouse separately)
- AudioWorklet for audio capture (not MediaRecorder)
- `?worker&url` Vite import for AudioWorklet files
- WebSocket: binary for audio frames, JSON text for control messages

## Language & Locale

- **UI text**: Chinese (中文) — this app is for family members
- **Git commits**: Chinese, conventional format: `feat:`, `fix:`, `chore:`, `refactor:`
- **Code comments**: English
- **Communication with user**: Chinese (中文)

## Key Constraints

- CGO is required (robotgo, clipboard) — no cross-compilation
- Token auth: read from `config.yaml`; empty = no auth. Never auto-generate tokens
- Frontend is embedded via `//go:embed all:web/dist` in `main.go`
- `embed` directive cannot use `../` paths — must be in the package referencing it
- Build output goes to `dist/` (gitignored)
- Frontend ignores (`node_modules`, `dist`) in `web/.gitignore`, not root
- Config file (`config.yaml`) is gitignored; `config.example.yaml` is committed
- `os.UserCacheDir()` for platform-correct cert cache paths
- robotgo paste: `KeyDown(modifier)` → delay → `KeyTap("v")` → delay → `KeyUp(modifier)`

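The paste key sequence in the last constraint can be illustrated without CGO by hiding key simulation behind an interface. This is a sketch under assumptions: the `keyboard` interface, the `recorder` fake, the `"cmd"`/`"ctrl"` key names, and the 50 ms delay are all hypothetical; in the real code the calls would go to robotgo.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// keyboard abstracts key simulation so the ordering can run without CGO.
// It is a hypothetical interface for this sketch, not a robotgo type.
type keyboard interface {
	KeyDown(key string)
	KeyTap(key string)
	KeyUp(key string)
}

// recorder is a fake keyboard that records calls instead of sending keys.
type recorder struct{ calls []string }

func (r *recorder) KeyDown(key string) { r.calls = append(r.calls, "down:"+key) }
func (r *recorder) KeyTap(key string)  { r.calls = append(r.calls, "tap:"+key) }
func (r *recorder) KeyUp(key string)   { r.calls = append(r.calls, "up:"+key) }

// pasteModifier returns the assumed modifier key name for a GOOS value.
func pasteModifier(goos string) string {
	if goos == "darwin" {
		return "cmd"
	}
	return "ctrl"
}

// pasteShortcut holds the modifier, taps "v", then releases the modifier,
// pausing between steps so the focused app can register each event.
func pasteShortcut(kb keyboard, modifier string, delay time.Duration) {
	kb.KeyDown(modifier)
	time.Sleep(delay)
	kb.KeyTap("v")
	time.Sleep(delay)
	kb.KeyUp(modifier)
}

func main() {
	rec := &recorder{}
	pasteShortcut(rec, pasteModifier(runtime.GOOS), 50*time.Millisecond)
	fmt.Println(rec.calls)
}
```

Splitting the shortcut into explicit down, tap, and up steps makes the delays adjustable, which is the point of the constraint as written.
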
@@ -74,7 +74,7 @@ func Dial(cfg Config, resultCh chan<- wsMsg.ServerMsg) (*Client, error) {
 			EnableDDC: true,
 			ShowUtterances: false,
 			ResultType: "single",
-			EndWindowSize: 400,
+			EndWindowSize: 2000,
 		},
 	}
 	data, err := EncodeFullClientRequest(req)
@@ -132,10 +132,15 @@ func (c *Client) readLoop(resultCh chan<- wsMsg.ServerMsg) {
 			resultCh <- wsMsg.ServerMsg{Type: wsMsg.MsgError, Message: resp.ErrMsg}
 			return
 		}
-		// nostream mode: result comes after last audio packet or >15s
+		// nostream mode: may return intermediate results every ~15s
 		text := resp.Text
 		if text != "" {
+			if resp.IsLast {
 				resultCh <- wsMsg.ServerMsg{Type: wsMsg.MsgFinal, Text: text}
+			} else {
+				// Intermediate result (>15s audio) — preview only, don't paste
+				resultCh <- wsMsg.ServerMsg{Type: wsMsg.MsgPartial, Text: text}
+			}
 		}
 		if resp.IsLast {
 			return
@@ -62,22 +62,31 @@ func (h *Handler) handleConn(c *websocket.Conn) {
 	defer close(resultCh)
 
 	// Writer goroutine: single writer to avoid concurrent writes
+	// Accumulates all result texts; paste is triggered by stop, not by ASR final.
 	var wg sync.WaitGroup
+	var accMu sync.Mutex
+	var accText string
 	wg.Add(1)
 	go func() {
 		defer wg.Done()
 		for msg := range resultCh {
-			if err := c.WriteMessage(websocket.TextMessage, msg.Bytes()); err != nil {
+			// Accumulate text from both partial and final results
+			if msg.Type == MsgPartial || msg.Type == MsgFinal {
+				accMu.Lock()
+				accText += msg.Text
+				// Send accumulated preview to phone
+				preview := ServerMsg{Type: MsgPartial, Text: accText}
+				accMu.Unlock()
+				if err := c.WriteMessage(websocket.TextMessage, preview.Bytes()); err != nil {
 					log.Warn("ws write error", "err", err)
 					return
 				}
-			// Auto-paste on final result
-			if msg.Type == MsgFinal && msg.Text != "" && h.pasteFunc != nil {
-				if err := h.pasteFunc(msg.Text); err != nil {
-					log.Error("auto-paste failed", "err", err)
-				} else {
-					_ = c.WriteMessage(websocket.TextMessage, ServerMsg{Type: MsgPasted}.Bytes())
+				continue
 			}
+			// Forward other messages (error, pasted) as-is
+			if err := c.WriteMessage(websocket.TextMessage, msg.Bytes()); err != nil {
+				log.Warn("ws write error", "err", err)
+				return
 			}
 		}
 	}()
@@ -119,6 +128,10 @@ func (h *Handler) handleConn(c *websocket.Conn) {
 			if active {
 				continue
 			}
+			// Reset accumulated text for new session
+			accMu.Lock()
+			accText = ""
+			accMu.Unlock()
 			sa, cl, err := h.asrFactory(resultCh)
 			if err != nil {
 				log.Error("asr start failed", "err", err)
@@ -134,12 +147,25 @@ func (h *Handler) handleConn(c *websocket.Conn) {
 			if !active {
 				continue
 			}
+			// Finish ASR session — waits for final result from readLoop
 			if cleanup != nil {
 				cleanup()
 				cleanup = nil
 			}
 			sendAudio = nil
 			active = false
+			// Now paste the accumulated text
+			accMu.Lock()
+			finalText := accText
+			accText = ""
+			accMu.Unlock()
+			if finalText != "" && h.pasteFunc != nil {
+				if err := h.pasteFunc(finalText); err != nil {
+					log.Error("auto-paste failed", "err", err)
+				} else {
+					resultCh <- ServerMsg{Type: MsgPasted}
+				}
+			}
 			log.Info("recording stopped")
 
 		case MsgPaste:
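The behaviour these hunks introduce (accumulate every partial and final result, preview the running text, and paste only on stop) can be isolated in a small sketch. The `accumulator` type and `stopAndPaste` function are hypothetical names for illustration; the change itself keeps `accText` and `accMu` as local variables inside `handleConn`.

```go
package main

import (
	"fmt"
	"sync"
)

// accumulator collects recognized text for one recording session.
type accumulator struct {
	mu   sync.Mutex
	text string
}

// Add appends a chunk and returns the full text to show as a preview.
func (a *accumulator) Add(chunk string) string {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.text += chunk
	return a.text
}

// Take returns the accumulated text and resets the accumulator.
func (a *accumulator) Take() string {
	a.mu.Lock()
	defer a.mu.Unlock()
	t := a.text
	a.text = ""
	return t
}

// stopAndPaste pastes the accumulated text once, when recording stops.
// It reports whether a paste happened.
func stopAndPaste(a *accumulator, paste func(string) error) (bool, error) {
	text := a.Take()
	if text == "" || paste == nil {
		return false, nil
	}
	if err := paste(text); err != nil {
		return false, fmt.Errorf("auto-paste: %w", err)
	}
	return true, nil
}

func main() {
	acc := &accumulator{}
	fmt.Println(acc.Add("first segment, ")) // intermediate result: preview only
	fmt.Println(acc.Add("second segment"))  // last result: still preview only
	pasted, err := stopAndPaste(acc, func(s string) error {
		fmt.Println("paste:", s)
		return nil
	})
	fmt.Println(pasted, err)
}
```

Taking and clearing the text under one lock means a second stop cannot paste the same text twice.
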