Initial commit
@@ -0,0 +1,11 @@
# Conversation Checkpoint: Local STT Implementation
## Project: local-stt
- **Status:** In progress (development/debugging)
- **Environment:** Dedicated venv in `/home/openclaw/.openclaw/workspace/projects/local-stt/scripts/`.
- **Dependencies:** `sherpa-onnx` (v1.12.34), `numpy`, and `onnx` installed in the venv; `wave` comes from the Python standard library.
- **Models:**
  - STT: `sherpa-onnx-streaming-zipformer-en-2023-06-26-mobile`
  - Punctuation: `sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12-int8`
- **Current Issue:** Running `transcribe.py` raises `RuntimeError: No graph was found in the protobuf` and `RuntimeError: Got invalid dimensions for input`.
- **Verified:** The `.onnx` model files load as valid models with the `onnx` library.
- **Next Steps:** Investigate version-specific loading methods for Zipformer in `sherpa-onnx` (the API changed significantly). Confirm the exact model source version to ensure compatibility with the current Python API.
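One quick check while investigating the loading errors is whether the model family matches the recognizer class being used. The helper below is a rough sketch, not part of `transcribe.py`: the function name is hypothetical, and it assumes that streaming models carry `streaming` in their directory name, as the model listed above does. That naming pattern is a convention observed in these notes, not a documented rule.

```python
def recognizer_api_for(model_dir_name: str) -> str:
    """Guess which sherpa-onnx recognizer class fits a model directory name.

    Assumption: streaming models have "streaming" in their directory name.
    This is a naming heuristic, not an official compatibility check.
    """
    if "streaming" in model_dir_name.lower():
        return "OnlineRecognizer"
    return "OfflineRecognizer"


if __name__ == "__main__":
    for name in (
        "sherpa-onnx-streaming-zipformer-en-2023-06-26-mobile",
        "sherpa-onnx-zipformer-en-2023-06-26",
    ):
        print(name, "->", recognizer_api_for(name))
```

If the class suggested by the name differs from the one the script instantiates, that mismatch is worth ruling out before digging into version differences.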
@@ -0,0 +1,33 @@
# Project Context: local-stt

## Overview
- **Project Name:** local-stt
- **Primary Goal:** Local, offline speech-to-text (STT) transcription of `.wav` audio files with automated punctuation and sentence-case formatting.
- **Working Directory:** `/home/openclaw/.openclaw/workspace/projects/local-stt`
- **Current Status:** In progress (core transcription and formatting pipeline functional).

## Environment & Dependencies
- **Interpreter:** Python 3 (virtual environment; interpreter at `scripts/bin/python3`)
- **Key Libraries:** `sherpa-onnx`, `numpy`; standard-library modules `wave`, `argparse`, `re`
- **Audio Requirements:** 16 kHz, mono `.wav` files.
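The audio requirements can be enforced before any model is loaded. The sketch below is illustrative and not taken from `transcribe.py`: the function name is hypothetical, the 16-bit PCM check is an assumption the notes do not state, and it uses the standard-library `array` module in place of `numpy` so that it runs without third-party packages.

```python
import array
import sys
import wave


def load_wav_16k_mono(path: str) -> array.array:
    """Read a 16 kHz mono 16-bit PCM WAV and return float32 samples in [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            raise ValueError(f"expected 16000 Hz, got {wav.getframerate()} Hz")
        if wav.getnchannels() != 1:
            raise ValueError(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            # Assumption: 16-bit samples; the notes only mention rate and channels.
            raise ValueError("expected 16-bit PCM samples")
        raw = wav.readframes(wav.getnframes())

    pcm = array.array("h")  # signed 16-bit integers
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    # Typecode "f" stores single-precision (float32) values.
    return array.array("f", (sample / 32768.0 for sample in pcm))
```

With `numpy` available, the same normalization is commonly written as `np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0`.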

## Models (Local ONNX)
- **STT Transducer (Offline):** `sherpa-onnx-zipformer-en-2023-06-26`
  - *Note:* Switched from the streaming/mobile model to the standard offline fp32 Zipformer for higher accuracy and compatibility with the `OfflineRecognizer` API.
- **Punctuation Model:** `sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12-int8`
  - *Note:* Configured through the `ct_transformer` parameter. Does not require a separate `vocab.txt` file at runtime.
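The punctuation setup described in the note can be sketched as follows. This is an approximation of how the `sherpa-onnx` Python package is typically used, not a copy of `transcribe.py`: the function name and the `model_path` argument are hypothetical, and the exact ONNX file name inside the model directory is deliberately left to the caller.

```python
def build_punctuator(model_path: str):
    """Create an OfflinePunctuation object from a CT-transformer ONNX file.

    `model_path` is a placeholder for the ONNX file inside the punctuation
    model directory; no vocab file is passed.
    """
    # Imported lazily so this module can be loaded without sherpa-onnx installed.
    import sherpa_onnx

    config = sherpa_onnx.OfflinePunctuationConfig(
        model=sherpa_onnx.OfflinePunctuationModelConfig(ct_transformer=model_path),
    )
    # Instantiated directly from the config object.
    return sherpa_onnx.OfflinePunctuation(config)
```

The returned object would then be applied to the raw transcript with something like `punctuator.add_punctuation(text)`.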

## Core Script: `scripts/transcribe.py`
- **Pipeline:**
  1. Loads the offline transducer and punctuation models.
  2. Reads the input `.wav` via `wave` and normalizes the samples to `float32`.
  3. Decodes the audio stream using `OfflineRecognizer`.
  4. Applies punctuation via `OfflinePunctuation` (instantiated directly, not via `from_config`).
  5. Runs a custom regex-based Python function (`format_sentence_case`) that lowercases the text, capitalizes the first word of each sentence, and capitalizes the standalone pronoun "I" and its contractions.
  6. Prints the final output to the console and saves a `.txt` file named after the input file to the `output/` directory.
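Step 5 of the pipeline could look roughly like the sketch below. Only the function name `format_sentence_case` comes from these notes; the regular expressions are an assumption about one reasonable way to do it, and the actual implementation in `transcribe.py` may differ.

```python
import re


def format_sentence_case(text: str) -> str:
    """Lowercase the text, then capitalize sentence starts and the pronoun "I"."""
    text = text.lower().strip()
    # Capitalize the first letter of the text and any letter that follows
    # sentence-ending punctuation plus whitespace.
    text = re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )
    # Capitalize standalone "i"; the apostrophe is a word boundary, so
    # contractions such as "i'm" and "i'll" are covered as well.
    return re.sub(r"\bi\b", "I", text)


if __name__ == "__main__":
    print(format_sentence_case("HELLO WORLD. I THINK I'M READY! ARE YOU?"))
    # Hello world. I think I'm ready! Are you?
```

A known limitation of this approach is that abbreviations such as "i.e." are also capitalized, and proper nouns stay lowercase.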

## Recent Milestones & Fixes
- Resolved `RuntimeError: No graph was found in the protobuf` by pairing the offline API (`OfflineRecognizer`) with the matching non-streaming Zipformer model.
- Fixed an `AttributeError` during result retrieval by reading `stream.result.text` instead of calling `recognizer.get_result()`.
- Corrected the punctuation configuration by passing the model path to `ct_transformer` and removing the unnecessary `vocab` argument.
- Implemented a post-processing regex function that handles sentence capitalization and formatting of the raw uppercase output.
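The first two fixes can be illustrated with a short sketch of the offline decoding flow. It reflects common usage of the `sherpa-onnx` Python package rather than the exact contents of `transcribe.py`: both function names are hypothetical, and the encoder, decoder, joiner, and tokens paths are placeholders for the files inside the offline Zipformer model directory.

```python
def build_offline_recognizer(encoder: str, decoder: str, joiner: str, tokens: str):
    """Create an OfflineRecognizer for a non-streaming transducer model.

    All four arguments are placeholder paths supplied by the caller.
    """
    # Imported lazily so this module can be loaded without sherpa-onnx installed.
    import sherpa_onnx

    return sherpa_onnx.OfflineRecognizer.from_transducer(
        encoder=encoder,
        decoder=decoder,
        joiner=joiner,
        tokens=tokens,
    )


def transcribe_samples(recognizer, samples, sample_rate: int = 16000) -> str:
    """Decode one utterance and return its text.

    `samples` is expected to be a 1-D float32 sequence (for example a NumPy
    array) with values in [-1, 1].
    """
    stream = recognizer.create_stream()
    stream.accept_waveform(sample_rate, samples)
    recognizer.decode_stream(stream)
    # The result is read from the stream, not from the recognizer.
    return stream.result.text
```

Keeping the decode step in its own function makes it easy to confirm that the text comes from `stream.result.text` after `decode_stream` has run.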