Project: Cross-Platform Local Audio Transcription Service

Goal

Develop a lightweight, local-only system service that transcribes audio (mic/system) in real-time and injects the text via OS-level keyboard emulation.

Architecture

Backend: Rust (Core logic, audio capture, Whisper inference).
Frontend: Tauri (HTML/CSS/JS for UI, communicates with Rust backend via Tauri Commands).
Communication: Secure, high-speed IPC bridge via Tauri's built-in command system (no manual pipes/sockets needed).
Configuration: config.toml for runtime settings (model paths, device selection).

Technology Stack

Language: Rust
Transcription: Whisper (whisper.cpp via whisper-rs or similar bindings).
Audio Capture: cpal (cross-platform).
Input Emulation: enigo (cross-platform).
Frontend Framework: Tauri (v2).
Config Handling: serde + toml.

Key Design Decisions

Offline-First: All transcription is performed locally; no network/API calls.
Efficiency: Lean, background-service focus. Tauri + Rust provides a smaller memory/resource footprint than Electron.
Dynamic Input Selection: Service supports dynamic audio source switching (hot-swapping) via Tauri Command calls from the UI.
OS Specifics: Uses Rust cfg attributes for platform-specific input simulation (X11/Wayland/Win32) without diverging the codebase.

Project Status & Evolution

Dependencies: Switched from whisper-rs to whisper-cpp-plus due to compilation incompatibilities with current whisper.cpp APIs. This change prioritizes long-term maintainability over minor library versioning.
Audio Infrastructure: System requires libasound2-dev on Linux (PipeWire/ALSA).
Design Philosophy:
- Lean & Modular: Keep the backend (Rust) strictly for heavy lifting (audio, VAD, transcription) and the frontend (Tauri) for control/config.
- Dynamic Configuration: Real-time updates via notify (config watcher) allow for sensitivity tuning without restarting the service.
- VAD-First: Implemented energy-based VAD for efficient, human-centric transcription triggering, minimizing CPU load and irrelevant noise.

Next Steps

Re-build: Implement the transition to whisper-cpp-plus in audio.rs and main.rs.
Transcription Logic: Integrate the actual inference engine within the VAD-triggered buffer worker.
Keyboard Emulation: Implement enigo integration to inject the transcribed text.
Tauri Integration: Build the minimal HTML/JS GUI for sensitivity sliders and model selection.

2.6 KiB Raw Permalink Blame History

Project: Cross-Platform Local Audio Transcription Service

Goal

Architecture

Technology Stack

Key Design Decisions

Project Status & Evolution

Next Steps

2.6 KiB

Raw Permalink Blame History