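# iOS Push-to-Talk Web App Checkpoint (Apr 8, 2026)

**User Goal:** Web-based push-to-talk (PTT) in Safari on iPhone. Hold a button to record audio, show a **live transcription preview before the Send button is enabled**, then POST the audio and text to a remote server (Node.js or Python Flask are both fine).

**Core Flow:**
1. Hold (`mousedown`/`touchstart`, or `pointerdown`): the MediaRecorder API captures audio from the microphone stream.
2. Live STT: the Web Speech API (`SpeechRecognition`, exposed as `webkitSpeechRecognition` in Safari) runs on the client with `continuous = true` and `interimResults = true` for real-time text, using `lang = 'en-AU'`. Whether recognition runs on-device or through a vendor speech service depends on the device and OS version.
3. Release: stop recording, finalize the text, enable Send, then POST the blob and text to the server.

**Options:**
- **Client-Only STT (Fastest):** Web Speech API. Interim results typically arrive in under a second, and the backend just stores the result. Offline use and privacy are not guaranteed, because the browser may send audio to a remote speech service.
- **Server Streaming:** Send audio chunks over WebSocket to Whisper (OpenAI API or a local model), which returns partial transcripts.
- **Hybrid:** Client-side preview, then server-side re-transcription for accuracy.

**iOS Safari Fit:** MediaRecorder and speech recognition are both available from iOS 14.5, the latter only under the `webkit` prefix. The page must be served over HTTPS, and the microphone permission prompt appears on first use; how long the grant persists depends on Safari settings. Running MediaRecorder and speech recognition against the microphone at the same time should be verified on a real device.

**Backend Recs:**
- **Node/Socket.io:** Straightforward real-time messaging.
- **Flask/FastAPI + SocketIO:** Python stack with direct access to ML libraries such as Whisper.

**Frontend Snippet:**
```js
// PTT button + live text (SpeechRecognition is prefixed in Safari)
const SR = window.SpeechRecognition || window.webkitSpeechRecognition;
const rec = new SR();
rec.continuous = true;      // keep listening for the whole hold
rec.interimResults = true;  // emit partial results for the live preview
rec.lang = 'en-AU';
// Pointer events cover both touch (iPhone) and mouse input.
pttBtn.onpointerdown = () => rec.start();
pttBtn.onpointerup = pttBtn.onpointercancel = () => rec.stop();
rec.onresult = (e) => { /* update div with final + interim */ };
```

**Backend (Node Ex):** Socket.io streams audio chunks to Whisper and emits partial transcripts back to the client.

**Next Steps:** Prototype as a PWA? Stand up a custom Whisper server? Tune for accents and privacy.

**Status:** Explored; ready to code and deploy.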
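
The `rec.onresult` handler in the frontend snippet is left as a placeholder. One way to fill it in is sketched below, assuming the handler receives the standard `event.results` list, where each result carries an `isFinal` flag and its best alternative at index 0. The helper name `splitTranscript` and the `preview` element id are hypothetical, introduced only for illustration.

```js
// Sketch: split a SpeechRecognition result list into final and interim text.
// `results` is expected to look like `event.results`: an array-like of results,
// each with an `isFinal` flag and the best alternative at index 0.
function splitTranscript(results) {
  let finalText = '';
  let interimText = '';
  for (let i = 0; i < results.length; i++) {
    const text = results[i][0].transcript;
    if (results[i].isFinal) {
      finalText += text;    // settled text, safe to send
    } else {
      interimText += text;  // still changing, preview only
    }
  }
  return { finalText: finalText.trim(), interimText: interimText.trim() };
}

// Hypothetical browser wiring (the 'preview' element id is an assumption):
// rec.onresult = (e) => {
//   const { finalText, interimText } = splitTranscript(e.results);
//   document.getElementById('preview').textContent =
//     `${finalText} ${interimText}`.trim();
// };
```

Keeping final and interim text separate lets the UI style the interim part differently (for example, greyed out) and send only `finalText` once the button is released.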
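
The release step (stop, finalize, Send, POST) could be sketched as follows. This is a minimal illustration, not a settled design: the `/api/ptt` endpoint, the `audio` and `text` field names, and both function names are placeholders, and the file extension logic assumes the recorder produces either MP4 or WebM audio.

```js
// Sketch: package the recording and transcript for upload after release.
// The field names and the '/api/ptt' endpoint are placeholders, not a real API.
function buildUpload(audioBlob, transcript) {
  const form = new FormData();
  // The container format depends on the browser, so derive the extension
  // from the blob's own MIME type (assumed here: MP4 or WebM).
  const ext = audioBlob.type.includes('mp4') ? 'm4a' : 'webm';
  form.append('audio', audioBlob, `utterance.${ext}`);
  form.append('text', transcript);
  return form;
}

// POST the multipart form; the browser sets the Content-Type boundary itself.
async function sendUtterance(audioBlob, transcript, url = '/api/ptt') {
  const res = await fetch(url, {
    method: 'POST',
    body: buildUpload(audioBlob, transcript),
  });
  if (!res.ok) throw new Error(`Upload failed: ${res.status}`);
  return res;
}
```

In the client-only option the server would simply store both fields; in the hybrid option it could re-transcribe the `audio` part and compare against `text`.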