Knowledge/projects/local-stt/local-stt-context-meta.md
2026-05-05 09:40:28 +10:00

Project Context: local-stt

Overview

  • Project Name: local-stt
  • Primary Goal: Local, offline Speech-to-Text (STT) transcription of .wav audio files with automated punctuation and sentence-case formatting.
  • Working Directory: /home/openclaw/.openclaw/workspace/projects/local-stt
  • Current Status: In Progress (Core transcription and formatting pipeline functional).

Environment & Dependencies

  • Interpreter: Python 3 (virtual environment interpreter at scripts/bin/python3)
  • Key Libraries: sherpa-onnx and numpy (third-party); wave, argparse, and re (Python standard library)
  • Audio Requirements: 16 kHz, mono .wav files.
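
The audio requirements above can be checked at load time. The following is a minimal sketch, not the project's actual loader: the function name read_wav_as_float32 is hypothetical, and it assumes 16-bit PCM samples, which the notes do not state explicitly.

```python
import wave

import numpy as np


def read_wav_as_float32(path, expected_rate=16000):
    """Read a mono 16-bit PCM .wav file and return (samples, sample_rate).

    Samples are scaled to float32 in the range [-1.0, 1.0).
    The 16-bit sample width is an assumption, not taken from the notes.
    """
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            raise ValueError(f"expected mono audio, got {wf.getnchannels()} channels")
        if wf.getsampwidth() != 2:
            raise ValueError(f"expected 16-bit samples, got {8 * wf.getsampwidth()}-bit")
        rate = wf.getframerate()
        if rate != expected_rate:
            raise ValueError(f"expected {expected_rate} Hz, got {rate} Hz")
        frames = wf.readframes(wf.getnframes())

    # Interpret the raw bytes as int16 and normalize to float32.
    samples = np.frombuffer(frames, dtype=np.int16).astype(np.float32) / 32768.0
    return samples, rate
```

Rejecting files with the wrong rate or channel count up front gives a clearer error than a degraded transcript; resampling would be an alternative design, but it is outside what the notes describe.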

Models (Local ONNX)

  • STT Transducer (Offline): sherpa-onnx-zipformer-en-2023-06-26
    • Note: Shifted from the streaming/mobile model to the standard offline fp32 Zipformer for higher accuracy and compatibility with the OfflineRecognizer API.
  • Punctuation Model: sherpa-onnx-punct-ct-transformer-zh-en-vocab272727-2024-04-12-int8
    • Note: Configured using the ct_transformer parameter. Does not require a separate vocab.txt file for execution.

Core Script: scripts/transcribe.py

  • Pipeline:
    1. Loads the offline transducer and punctuation models.
    2. Reads input .wav via wave and normalizes to float32.
    3. Decodes the audio stream using OfflineRecognizer.
    4. Applies punctuation via OfflinePunctuation (instantiated directly, not via from_config).
    5. Runs a custom regex-based function (format_sentence_case) that lowercases the text, capitalizes the first letter of each sentence, and capitalizes the standalone pronoun "I" and its contractions (e.g. "I'm", "I'll").
    6. Prints the final output to the console and saves a .txt file to the output/ directory matching the input filename.
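
Steps 5 and 6 can be sketched roughly as follows. This is an illustrative reconstruction, not the code in scripts/transcribe.py: only the name format_sentence_case comes from the notes, the regex patterns are assumptions, and output_path_for is a hypothetical helper. The sketch assumes sentences end with ".", "!" or "?" followed by whitespace.

```python
import re
from pathlib import Path


def format_sentence_case(text: str) -> str:
    """Lowercase the text, then restore sentence-initial capitals and "I"."""
    text = text.strip().lower()

    # Capitalize the first letter of the text and of each new sentence
    # (a letter that follows ".", "!" or "?" plus whitespace).
    text = re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )

    # Capitalize the standalone pronoun "i". The apostrophe counts as a
    # word boundary, so contractions such as "i'm" and "i'll" match too.
    text = re.sub(r"\bi\b", "I", text)
    return text


def output_path_for(wav_path: str, out_dir: str = "output") -> Path:
    """Hypothetical helper: map audio/foo.wav to output/foo.txt."""
    return Path(out_dir) / (Path(wav_path).stem + ".txt")
```

For example, format_sentence_case("HELLO WORLD. I THINK I'M READY. ARE YOU?") returns "Hello world. I think I'm ready. Are you?". A simple pattern like this will also capitalize abbreviations such as "i.e.", so a production version may need extra cases.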

Recent Milestones & Fixes

  • Resolved "RuntimeError: No graph was found in the protobuf" by pairing the offline API (OfflineRecognizer) with the correct non-streaming Zipformer model.
  • Fixed an AttributeError during result retrieval by reading stream.result.text instead of calling recognizer.get_result().
  • Corrected punctuation configuration by passing the model path to ct_transformer and removing the unnecessary vocab argument.
  • Implemented a post-processing regex function to handle sentence capitalization and formatting of the raw uppercase output.
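
The fixes above correspond to a call pattern along these lines. This is a hedged sketch based on the sherpa-onnx Python examples rather than the project's script: the function names build_recognizer, build_punctuator, and transcribe are hypothetical, all model file paths are supplied by the caller, and keyword names should be checked against the installed sherpa-onnx version.

```python
def build_recognizer(encoder, decoder, joiner, tokens):
    """Create an offline (non-streaming) transducer recognizer."""
    import sherpa_onnx  # imported lazily; requires the sherpa-onnx package

    return sherpa_onnx.OfflineRecognizer.from_transducer(
        encoder=encoder,
        decoder=decoder,
        joiner=joiner,
        tokens=tokens,
    )


def build_punctuator(model_path):
    """Instantiate OfflinePunctuation directly, with no vocab argument."""
    import sherpa_onnx

    config = sherpa_onnx.OfflinePunctuationConfig(
        model=sherpa_onnx.OfflinePunctuationModelConfig(
            ct_transformer=model_path,  # path to the punctuation ONNX model
        ),
    )
    return sherpa_onnx.OfflinePunctuation(config)


def transcribe(recognizer, punctuator, samples, sample_rate=16000):
    """Decode float32 samples and return punctuated (still uppercase) text."""
    stream = recognizer.create_stream()
    stream.accept_waveform(sample_rate, samples)
    recognizer.decode_stream(stream)

    # The result lives on the stream, not on the recognizer.
    raw_text = stream.result.text
    return punctuator.add_punctuation(raw_text)
```

Keeping transcribe free of direct sherpa_onnx references means it can be exercised with stand-in objects, which makes the result-retrieval logic easy to check without loading any models.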