
Add OpenVoiceUI Voice Pipeline — full STT → LLM → TTS tutorial#606

Closed
MCERQUA wants to merge 2 commits into Shubhamsaboo:main from MCERQUA:feat/openvoiceui-voice-pipeline

Conversation


@MCERQUA MCERQUA commented Mar 18, 2026

What this adds

A self-contained voice AI agent tutorial in voice_ai_agents/openvoiceui_voice_pipeline/ that demonstrates the complete voice conversation loop:

Speech-to-Text → Language Model → Text-to-Speech

Inspired by the architecture behind OpenVoiceUI, an open-source voice AI platform.

Pipeline

| Step | What happens | API |
|------|--------------|-----|
| 🎤 STT | Browser mic recording transcribed | OpenAI Whisper |
| 🧠 LLM | Transcript sent for AI response | GPT-4o |
| 🔊 TTS | Response synthesized and played back | OpenAI TTS |
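The three rows above can be sketched as a single round-trip helper, assuming the standard OpenAI Python SDK. The function name `run_voice_turn` is illustrative (not from the PR), and the client is passed in as a parameter so the logic can be exercised with a stub:

```python
def run_voice_turn(client, audio_file, history, model="gpt-4o", voice="alloy"):
    """One pass through the STT -> LLM -> TTS loop.

    `client` is an openai.OpenAI-compatible object, `audio_file` is a
    file-like object holding the mic recording, and `history` is the
    running list of chat messages (mutated in place).
    """
    # 1. STT: transcribe the recording with Whisper
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio_file
    ).text

    # 2. LLM: append the user turn and ask the chat model for a reply
    history.append({"role": "user", "content": transcript})
    reply = client.chat.completions.create(
        model=model, messages=history
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})

    # 3. TTS: synthesize the reply into audio bytes for playback
    speech = client.audio.speech.create(model="tts-1", voice=voice, input=reply)
    return transcript, reply, speech.content
```

Injecting the client this way keeps the loop unit-testable without network access; the real app would construct `openai.OpenAI(api_key=...)` from the sidebar input.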

Files

  • voice_pipeline.py — Streamlit app (~180 lines), fully self-contained
  • requirements.txt — 3 dependencies: openai, streamlit, python-dotenv
  • README.md — setup instructions + what you'll learn

What learners take away

  • How to capture audio from a browser with st.audio_input()
  • How to call Whisper for real-time transcription
  • How to maintain multi-turn conversation state in Streamlit
  • How to synthesize TTS and autoplay audio responses

All credentials are entered interactively in the Streamlit sidebar; no .env file is required.

Mike added 2 commits March 18, 2026 15:35
Self-contained Streamlit app demonstrating the complete voice AI loop:
- STT: browser mic recording transcribed via OpenAI Whisper
- LLM: multi-turn conversation with GPT-4o
- TTS: response synthesized and played back via OpenAI TTS

Includes configurable voice, model, and system prompt.
…xpressive TTS

- Add pipeline_agents.py: VoiceAssistant (GPT-4o + WebSearchTool, Pydantic output)
  and TTSDirector (GPT-4o-mini, writes delivery instructions for TTS)
- Refactor voice_pipeline.py: two-agent async pipeline via Runner.run(),
  multi-turn context window (last 6 messages), gpt-4o-mini-tts with instructions
- Update requirements.txt to include openai-agents and pydantic
- Update README with agent architecture diagram and expanded learning outcomes
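The "multi-turn context window (last 6 messages)" from the refactor can be implemented as a small trimming helper. This is a sketch of one reasonable implementation (the name `trim_context` is not from the PR): keep the system prompt, drop everything but the most recent turns so the prompt stays small over a long session:

```python
def trim_context(messages, window=6):
    """Return the system prompt (if present) plus the last `window`
    conversation messages, dropping older turns."""
    # Preserve a leading system message, if the history has one
    system = [m for m in messages[:1] if m["role"] == "system"]
    rest = messages[len(system):]
    # Slicing with a negative index keeps at most `window` recent turns
    return system + rest[-window:]
```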
@awesomekoder
Contributor

Thanks for the clean submission! The code quality is solid and the two-agent pattern is well documented.

However, the STT → text LLM → TTS pipeline is now outdated. OpenAI's Realtime API and Gemini 3.1 Live API both support native voice-to-voice with lower latency, no transcription step, and the model can actually hear tone and emotion.

For a stronger submission, consider building a tutorial using one of these native audio approaches instead.

Would love to see a resubmission using the modern voice architecture.

