Voice Lifecycle
Understanding the real-time states of a voice session: Listen, Think, Speak, and Interrupt.
Overview
A voice conversation isn't just audio back-and-forth. It's a state machine. Your UI needs to reflect these states to feel "alive" and responsive.
Client States (Daily.co)
The daily-js library provides the fundamental connection states.
- Connecting: Initial join request. Show a spinner.
- Connected: Joined the room. The local mic is live.
- Left / Error: Disconnected. Reset the UI.
```tsx
import { useMeetingState } from '@daily-co/daily-react';

// Spinner, VoiceSession, and DisconnectedScreen are placeholder components
// for the three UI states above.
const MyComponent = () => {
  // 'joining-meeting' | 'joined-meeting' | 'left-meeting' | 'error' | ...
  const meetingState = useMeetingState();

  if (meetingState === 'joining-meeting') return <Spinner />;      // Connecting: show a spinner
  if (meetingState === 'joined-meeting') return <VoiceSession />;  // Connected: local mic is live
  return <DisconnectedScreen />;                                   // Left / error: reset the UI
};
```
Server Events (App Messages)
HUMA sends custom events via Daily's app-message channel to tell you what the AGENT is doing.
| Event | Source | Description |
| --- | --- | --- |
| `transcript-user-final` | STT | The user finished a sentence. "What cards do you have?" |
| `transcript-agent-text` | LLM | The agent generated text. "I have two 7s." (Before audio plays) |
| `audio-start` | TTS | The agent started speaking audio. Trigger the "Speaking" animation. |
| `audio-end` | TTS | The agent finished speaking. Return to the "Listening" state. |
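If you use TypeScript, it can help to model these messages as a discriminated union. This is only a sketch: the `type` values come from the table above, but the payload fields (e.g. a `text` property on the transcript events) are assumptions, so check them against what your HUMA server actually sends.

```ts
// Sketch of the app-message payloads. Only the `type` discriminator is taken
// from the event table; the `text` fields are assumed for illustration.
type HumaAppMessage =
  | { type: 'transcript-user-final'; text: string }  // finalized user utterance
  | { type: 'transcript-agent-text'; text: string }  // agent's reply text (before audio plays)
  | { type: 'audio-start' }                          // TTS playback began
  | { type: 'audio-end' }                            // TTS playback finished normally
  | { type: 'audio-stop' };                          // playback cut short by an interruption (see below)
```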
```tsx
import { useState } from 'react';
import { useAppMessage } from '@daily-co/daily-react';

// Inside a component rendered under <DailyProvider>:
const [agentState, setAgentState] = useState('idle');

useAppMessage({
  onAppMessage: (e) => {
    switch (e.data.type) {
      case 'transcript-user-final':
        setAgentState('thinking'); // User stopped, agent is thinking
        break;
      case 'audio-start':
        setAgentState('speaking'); // Agent started talking
        break;
      case 'audio-end':
        setAgentState('idle'); // Done talking
        break;
    }
  },
});
```
Handling Interruption
Real conversations involve interruptions. If the user speaks while the agent is talking, the agent should stop immediately.
How it works automatically:
1. VAD Trigger: Deepgram detects user speech while the agent is still outputting audio.
2. Clear Queue: The HUMA server clears the remaining audio buffer.
3. Stop Event: The client receives an `audio-stop` event.
Visual Feedback
When you receive `audio-stop`, immediately cut the "Speaking" animation. This makes the interruption feel snappy and responsive.
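Putting the pieces together, here is a minimal sketch of a status component that extends the earlier handler with the interruption case. The component name `VoiceStatus` is illustrative, and it assumes it renders inside a `DailyProvider`.

```tsx
import { useState } from 'react';
import { useAppMessage } from '@daily-co/daily-react';

// Placeholder component name; must be rendered inside <DailyProvider>.
function VoiceStatus() {
  const [agentState, setAgentState] = useState<'idle' | 'thinking' | 'speaking'>('idle');

  useAppMessage({
    onAppMessage: (e) => {
      switch (e.data?.type) {
        case 'transcript-user-final':
          // Also fires on barge-in: the user has the floor again.
          setAgentState('thinking');
          break;
        case 'audio-start':
          setAgentState('speaking');
          break;
        case 'audio-end':
        case 'audio-stop':
          // audio-stop means the user interrupted: cut the "Speaking"
          // animation immediately instead of waiting for audio-end.
          setAgentState('idle');
          break;
      }
    },
  });

  return <div data-state={agentState}>{agentState}</div>;
}
```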
Example Flow
Here is a full lifecycle trace for a single turn in Go Fish.
"Do you have any Kings?"
STT finalized. State: Thinking
"Nope, Go Fish!" (Text generated)
Audio starts playing. State: Speaking
Audio finished. State: Idle/Listening
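If you prefer to keep these transitions out of your components, the whole lifecycle fits in a small pure reducer. This is a sketch: the `lifecycleReducer` name and the `listening`/`thinking`/`speaking` labels are illustrative and simply mirror the UI states summarized below.

```ts
type AgentState = 'listening' | 'thinking' | 'speaking';

type LifecycleEvent =
  | 'transcript-user-final'
  | 'transcript-agent-text'
  | 'audio-start'
  | 'audio-end'
  | 'audio-stop';

// Given the current UI state and an incoming HUMA event, return the next UI state.
function lifecycleReducer(state: AgentState, event: LifecycleEvent): AgentState {
  switch (event) {
    case 'transcript-user-final':
      return 'thinking';   // user finished a sentence; agent is working on a reply
    case 'audio-start':
      return 'speaking';   // agent audio is playing
    case 'audio-end':
    case 'audio-stop':
      return 'listening';  // finished normally or was interrupted; back to default
    default:
      return state;        // transcript-agent-text doesn't change the visible state
  }
}

// Replaying the Go Fish turn above:
const turn: LifecycleEvent[] = [
  'transcript-user-final', // -> thinking
  'transcript-agent-text', // -> thinking (unchanged)
  'audio-start',           // -> speaking
  'audio-end',             // -> listening
];
console.log(turn.reduce<AgentState>(lifecycleReducer, 'listening')); // 'listening'
```

Keeping the transitions in one pure function also makes them easy to unit-test without a live Daily connection.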
Common Pitfalls
Missing "Thinking" State
Users will think the app is broken during the ~1s of silence between the user finishing speaking and the agent's response. Always show a visual indicator when `transcript-user-final` fires.
Ignoring Interruption
If the UI keeps showing "Speaking" after an interruption, it feels laggy. Listen for `audio-stop` or a new `transcript-user-final` to reset the state.
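One simple way to cover both pitfalls is to derive the label directly from the agent state, for example with a small presentational component (the component and class names here are illustrative, and `agentState` comes from the handler shown earlier):

```tsx
// Placeholder presentational component; agentState comes from the handler above.
function AgentStatusBadge({ agentState }: { agentState: 'idle' | 'thinking' | 'speaking' }) {
  const label = {
    idle: 'Listening',      // default state between turns
    thinking: 'Thinking…',  // fills the ~1s gap after transcript-user-final
    speaking: 'Speaking',   // cleared as soon as audio-end / audio-stop arrives
  }[agentState];

  return <div className={`agent-status agent-status--${agentState}`}>{label}</div>;
}
```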
Summary
Key Events
- `transcript-user-final`: User done speaking
- `transcript-agent-text`: Agent text ready
- `audio-start`: Audio playing
- `audio-end`: Audio finished
UI States
- Listening (Default)
- Thinking (Wait)
- Speaking (Active)
Congratulations!
Integration Guide Complete
You've mastered the full HUMA integration process, from designing personalities to handling real-time voice interrupts. You are now ready to build production-grade Multi-User Agents.