# Project Context: local-stt

## Overview

- **Project Name:** local-stt
- **Primary Goal:** Local, offline speech-to-text (STT) transcription of `.wav` audio files with automated punctuation and sentence-case formatting.
- **Working Directory:** `/home/openclaw/.openclaw/workspace/projects/local-stt`
- **Current Status:** In progress (the core transcription and formatting pipeline is functional).

## Environment & Dependencies

- **Interpreter:** Python 3 (virtual-environment interpreter at `scripts/bin/python3`)
- **Key Libraries:** `sherpa-onnx`, `numpy`, `wave`, `argparse`, `re` (`wave`, `argparse`, and `re` are part of the Python standard library)
- **Audio Requirements:** 16 kHz, mono `.wav` files.

## Models (Local ONNX)

- **STT Transducer (Offline):** `sherpa-onnx-zipformer-en-2023-06-26`
  - *Note:* Switched from the streaming/mobile model to the standard offline fp32 Zipformer for higher accuracy and compatibility with the `OfflineRecognizer` API.
- **Punctuation Model:** `sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12-int8`
  - *Note:* Configured via the `ct_transformer` parameter. Does not require a separate `vocab.txt` file at runtime.

## Core Script: `scripts/transcribe.py`

- **Pipeline:**
  1. Loads the offline transducer and punctuation models.
  2. Reads the input `.wav` via `wave` and normalizes the samples to `float32`.
  3. Decodes the audio stream using `OfflineRecognizer`.
  4. Applies punctuation via `OfflinePunctuation` (instantiated directly, not via `from_config`).
  5. Runs a custom regex-based function (`format_sentence_case`) that lowercases the text, capitalizes sentence starters, and capitalizes the standalone pronoun "I" and its contractions.
  6. Prints the final output to the console and saves a `.txt` file, named after the input file, to the `output/` directory.

## Recent Milestones & Fixes

- Resolved `RuntimeError: No graph was found in the protobuf` by pairing the offline API (`OfflineRecognizer`) with the correct non-streaming Zipformer model.
- Fixed an `AttributeError` during result retrieval by reading `stream.result.text` instead of calling `recognizer.get_result()`.
- Corrected the punctuation configuration by passing the model path to `ct_transformer` and removing the unnecessary `vocab` argument.
- Implemented a post-processing regex function that converts the recognizer's raw all-uppercase output to sentence case.
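Pipeline step 2 (reading the `.wav` via `wave` and normalizing to `float32`) might look roughly like the sketch below. It is a hypothetical reconstruction, not the code in `scripts/transcribe.py`: the helper name `read_wave`, the assumption of 16-bit PCM input, the scaling by 32768, and the hard 16 kHz check are all illustrative choices. The returned `(samples, sample_rate)` pair is the shape of input that a sherpa-onnx stream's `accept_waveform(sample_rate, samples)` takes.

```python
import wave

import numpy as np


def read_wave(source, expected_rate: int = 16000) -> tuple[np.ndarray, int]:
    """Read a 16-bit PCM mono WAV; return float32 samples in [-1.0, 1.0) and the rate.

    `source` may be a filesystem path or a binary file-like object.
    Hypothetical helper: assumes 16-bit PCM, mono, 16 kHz input.
    """
    with wave.open(source, "rb") as wf:
        if wf.getnchannels() != 1:
            raise ValueError(f"expected mono audio, got {wf.getnchannels()} channels")
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        sample_rate = wf.getframerate()
        if sample_rate != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz, got {sample_rate} Hz")
        frames = wf.readframes(wf.getnframes())

    # WAV PCM data is little-endian signed 16-bit; dividing by 2**15 maps it to [-1.0, 1.0).
    samples = np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0
    return samples, sample_rate
```

Rejecting non-16 kHz files up front mirrors the project's stated audio requirement and surfaces bad inputs before any model is loaded.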
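The sentence-case post-processing step (`format_sentence_case`, pipeline step 5) can be sketched as follows. This is a minimal, hypothetical version, not the project's actual function: the pattern names `_SENTENCE_START` and `_PRONOUN_I` are illustrative, and the sketch assumes only ASCII sentence terminators (`.`, `!`, `?`) appear in the punctuated text.

```python
import re

# A lowercase letter at the very start of the text (after optional whitespace),
# or one that follows sentence-ending punctuation plus whitespace.
# Assumption: only ASCII terminators (., !, ?) mark sentence ends.
_SENTENCE_START = re.compile(r"(^\s*|[.!?]\s+)([a-z])")

# The standalone pronoun "i". Word boundaries exclude "it", "is", "hi", etc.,
# while still matching the "i" in contractions such as "i'm", "i've", "i'll".
_PRONOUN_I = re.compile(r"\bi\b")


def format_sentence_case(text: str) -> str:
    """Convert raw all-caps ASR text (already punctuated) to sentence case."""
    text = text.lower()
    # Uppercase the first letter of each sentence, keeping the preceding
    # punctuation/whitespace captured in group 1.
    text = _SENTENCE_START.sub(lambda m: m.group(1) + m.group(2).upper(), text)
    # Capitalize the pronoun "I" and its contractions.
    return _PRONOUN_I.sub("I", text)


if __name__ == "__main__":
    # prints: Hello world. I'm here, and I think it is fine? Yes!
    print(format_sentence_case("HELLO WORLD. I'M HERE, AND I THINK IT IS FINE? YES!"))
```

Lowercasing first and then re-capitalizing keeps the rules simple: each regex only has to find positions that need an uppercase letter, rather than deciding which existing capitals to keep.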