Add OpenVoiceUI Voice Pipeline — full STT → LLM → TTS tutorial#606
Closed
MCERQUA wants to merge 2 commits into Shubhamsaboo:main from
Conversation
added 2 commits on March 18, 2026 at 15:35
Self-contained Streamlit app demonstrating the complete voice AI loop:
- STT: browser mic recording transcribed via OpenAI Whisper
- LLM: multi-turn conversation with GPT-4o
- TTS: response synthesized and played back via OpenAI TTS

Includes configurable voice, model, and system prompt.
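The multi-turn LLM step amounts to replaying the conversation so far on every turn. A minimal sketch of that history assembly (pure Python; the function and variable names are illustrative, not taken from the PR):

```python
def build_messages(system_prompt, history, user_text):
    """Assemble the message list sent to the chat model each turn.

    history is a list of {"role": ..., "content": ...} dicts from
    earlier turns; the fresh user transcript goes last.
    """
    return (
        [{"role": "system", "content": system_prompt}]
        + history
        + [{"role": "user", "content": user_text}]
    )

# Example: two earlier turns plus the new transcript.
history = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help?"},
]
messages = build_messages(
    "You are a friendly voice assistant.", history, "What's the weather like?"
)
```

In the real app, `messages` would be passed straight to the GPT-4o chat call, and the reply appended to `history` before the TTS step.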
…xpressive TTS
- Add pipeline_agents.py: VoiceAssistant (GPT-4o + WebSearchTool, Pydantic output) and TTSDirector (GPT-4o-mini, writes delivery instructions for TTS)
- Refactor voice_pipeline.py: two-agent async pipeline via Runner.run(), multi-turn context window (last 6 messages), gpt-4o-mini-tts with instructions
- Update requirements.txt to include openai-agents and pydantic
- Update README with agent architecture diagram and expanded learning outcomes
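The "last 6 messages" context window mentioned in the commit can be kept with a simple slice before each pipeline run. A sketch (the helper name is mine, not from the diff):

```python
def trim_context(history, max_messages=6):
    """Keep only the most recent turns so the prompt stays bounded."""
    return history[-max_messages:]

# Ten turns in, only the last six survive the trim.
history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
window = trim_context(history)
```

Slicing with a negative index also handles the startup case cleanly: when fewer than six messages exist, the whole history is returned unchanged.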
Contributor
Thanks for the clean submission! The code quality is solid and the two-agent pattern is well documented. However, the STT → text LLM → TTS pipeline is now outdated. OpenAI's Realtime API and Gemini 3.1 Live API both support native voice-to-voice with lower latency and no transcription step, and the model can actually hear tone and emotion. For a stronger submission, consider building a tutorial using one of these native audio approaches.
Would love to see a resubmission using the modern voice architecture.
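For context on the reviewer's suggestion: a native voice-to-voice session on OpenAI's Realtime API is configured by sending a session.update event over the websocket. The payload below is an assumption-laden sketch based on the published Realtime docs, not code from this PR; verify field names against the current API reference before relying on them:

```python
# Hypothetical session.update payload for the OpenAI Realtime API.
# No STT or TTS hop: audio goes in and comes out of the model directly.
session_update = {
    "type": "session.update",
    "session": {
        "modalities": ["audio", "text"],           # native audio in/out
        "voice": "alloy",
        "instructions": "You are a friendly voice assistant.",
        "input_audio_format": "pcm16",
        "turn_detection": {"type": "server_vad"},  # server-side voice activity detection
    },
}
```

This dict would be JSON-serialized and sent as the first message after opening the websocket connection.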
What this adds
A self-contained voice AI agent tutorial in voice_ai_agents/openvoiceui_voice_pipeline/ that demonstrates the complete voice conversation loop: Speech-to-Text → Language Model → Text-to-Speech.
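The loop above can be expressed as three pluggable stages. Injecting the stage functions keeps the orchestration testable without an API key; the names here are illustrative, and in the real app each stage would wrap an OpenAI SDK call (Whisper transcription, GPT-4o chat, TTS synthesis):

```python
def run_turn(audio_bytes, transcribe, respond, synthesize, history):
    """One pass through the STT -> LLM -> TTS loop.

    transcribe: audio bytes -> user text        (e.g. Whisper)
    respond:    history     -> reply text       (e.g. GPT-4o)
    synthesize: reply text  -> audio bytes      (e.g. OpenAI TTS)
    """
    user_text = transcribe(audio_bytes)
    history.append({"role": "user", "content": user_text})
    reply = respond(history)
    history.append({"role": "assistant", "content": reply})
    return synthesize(reply)

# Wiring with stand-in stages, just to show the data flow:
history = []
audio_out = run_turn(
    b"fake-wav-bytes",
    transcribe=lambda audio: "hello",
    respond=lambda hist: f"you said: {hist[-1]['content']}",
    synthesize=lambda text: text.encode(),
    history=history,
)
```

Each turn both returns playable audio and grows `history`, which is what makes the conversation multi-turn.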
Inspired by the architecture behind OpenVoiceUI, an open-source voice AI platform.
Pipeline
Files
- voice_pipeline.py — Streamlit app (~180 lines), fully self-contained
- requirements.txt — 3 dependencies: openai, streamlit, python-dotenv
- README.md — setup instructions + what you'll learn

What learners take away
- Browser mic capture with st.audio_input()

All credentials are entered interactively in the Streamlit sidebar. No .env file required.
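Since the key is typed into the sidebar rather than loaded from a file, the app needs only a small resolution rule. A sketch of that rule as a pure function (the helper name is mine; in Streamlit the first argument would come from something like st.sidebar.text_input("OpenAI API key", type="password"), and the environment fallback is an assumption, not something the PR describes):

```python
import os

def resolve_api_key(sidebar_value, env=os.environ):
    """Prefer the key typed in the sidebar; fall back to the environment."""
    key = (sidebar_value or "").strip() or env.get("OPENAI_API_KEY", "")
    return key or None
```

Returning None when nothing is set lets the app show a "please enter your key" message instead of making a doomed API call.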