Project Context: local-stt
Overview
- Project Name: local-stt
- Primary Goal: Local, offline Speech-to-Text (STT) transcription of .wav audio files with automated punctuation and sentence-case formatting.
- Working Directory: /home/openclaw/.openclaw/workspace/projects/local-stt
- Current Status: In Progress (core transcription and formatting pipeline functional).
Environment & Dependencies
- Interpreter: Python 3 (virtual environment located at scripts/bin/python3)
- Key Libraries: sherpa-onnx, numpy, wave, argparse, re
- Audio Requirements: 16 kHz, mono .wav files.
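
The audio requirement above can be checked before transcription with the standard wave module. This is a minimal sketch: the helper name check_wav_format and its return value (a list of problem descriptions) are illustrative assumptions and are not taken from scripts/transcribe.py.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_channels=1):
    """Return a list of format problems; an empty list means the file is usable.

    Hypothetical helper: the name and return shape are assumptions for
    illustration only.
    """
    problems = []
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"{channels} channel(s), expected {expected_channels}")
    return problems
```

A caller could refuse to transcribe, or resample first, whenever the returned list is non-empty.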
Models (Local ONNX)
- STT Transducer (Offline): sherpa-onnx-zipformer-en-2023-06-26
  - Note: Shifted from the streaming/mobile model to the standard offline fp32 Zipformer for higher accuracy and compatibility with the OfflineRecognizer API.
- Punctuation Model: sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12-int8
  - Note: Configured using the ct_transformer parameter. Does not require a separate vocab.txt file for execution.
Core Script: scripts/transcribe.py
- Pipeline:
  - Loads the offline transducer and punctuation models.
  - Reads the input .wav via wave and normalizes to float32.
  - Decodes the audio stream using OfflineRecognizer.
  - Applies punctuation via OfflinePunctuation (instantiated directly, not via from_config).
  - Runs a custom Python regex function (format_sentence_case) to convert text to lowercase, capitalize sentence starters, and capitalize the standalone pronoun "I" and its contractions.
  - Prints the final output to the console and saves a .txt file to the output/ directory matching the input filename.
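
The body of format_sentence_case is not reproduced in this document. The following is a minimal sketch of the behavior described in the pipeline, assuming ASCII sentence terminators (., !, ?); the two regex patterns are assumptions and may differ from the ones in scripts/transcribe.py.

```python
import re


def format_sentence_case(text: str) -> str:
    """Lowercase the text, then capitalize sentence starts and the pronoun "I".

    Illustrative sketch only; the real function may use different patterns.
    """
    text = text.lower().strip()
    # Capitalize the first letter of the text and the first letter that
    # follows '.', '!' or '?' plus whitespace.
    text = re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )
    # Capitalize a standalone "i"; the word boundary also matches before an
    # apostrophe, so contractions such as i'm, i'll, i've and i'd are covered.
    text = re.sub(r"\bi\b", "I", text)
    return text


print(format_sentence_case("HELLO WORLD. I THINK I'M READY. ARE YOU?"))
# Hello world. I think I'm ready. Are you?
```

One known limitation of this sketch: abbreviations such as "i.e." would also be capitalized, since the pattern only looks at word boundaries.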
Recent Milestones & Fixes
- Resolved RuntimeError: No graph was found in the protobuf by matching the offline API (OfflineRecognizer) with the correct non-streaming Zipformer model.
- Fixed AttributeError for result retrieval by accessing stream.result.text instead of recognizer.get_result().
- Corrected punctuation configuration by passing the model path to ct_transformer and removing the unnecessary vocab argument.
- Implemented a post-processing regex function to handle sentence capitalization and formatting of the raw uppercase output.
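
The API fixes listed above can be sketched as follows. This is a hedged outline, not the contents of scripts/transcribe.py: the function names build_recognizer, build_punctuation and transcribe, and all path parameters, are hypothetical, and every recognizer option other than the model paths is left at the library default. sherpa_onnx is imported inside the builders so the sketch can be loaded without the package installed.

```python
def build_recognizer(encoder, decoder, joiner, tokens):
    """Create an offline (non-streaming) transducer recognizer.

    The four arguments are paths to files inside the offline Zipformer model
    directory; the caller supplies them (hypothetical parameters).
    """
    import sherpa_onnx  # lazy import: assumption made for this sketch

    return sherpa_onnx.OfflineRecognizer.from_transducer(
        encoder=encoder,
        decoder=decoder,
        joiner=joiner,
        tokens=tokens,
    )


def build_punctuation(model_path):
    """Create the punctuation model; only ct_transformer is set, no vocab."""
    import sherpa_onnx  # lazy import: assumption made for this sketch

    model_config = sherpa_onnx.OfflinePunctuationModelConfig(
        ct_transformer=model_path
    )
    config = sherpa_onnx.OfflinePunctuationConfig(model=model_config)
    # Instantiated directly from the config object.
    return sherpa_onnx.OfflinePunctuation(config)


def transcribe(recognizer, punct, sample_rate, samples):
    """Decode one utterance and add punctuation to the raw text.

    samples is expected to be float32 audio normalized to [-1, 1].
    """
    stream = recognizer.create_stream()
    stream.accept_waveform(sample_rate, samples)
    recognizer.decode_stream(stream)
    raw_text = stream.result.text  # read from the stream, not the recognizer
    return punct.add_punctuation(raw_text)
```

Passing the recognizer and punctuation objects into transcribe keeps model loading separate from decoding, so the models are loaded once and reused for each input file.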