# HUMA Integration Guide

HUMA is an async, stateful API for creating human-like AI agents that can participate in real-time interactions like games, chats, and voice calls.

## Key Concepts

- **State**: Your app's state (chat history, game state, etc.) is always available to HUMA
- **Events**: State changes are passed as Context Updates with description + new state
- **Tools**: HUMA communicates back via async tool calls that your app executes
- **Asynchronous**: All tools are async - processing tools doesn't block HUMA from handling new events
- **WebSocket**: Real-time bidirectional communication between your app and HUMA

## Architecture Overview

1. Create agent via REST API with personality, instructions, and tool definitions
2. Connect to agent via WebSocket with agent ID
3. Send events with context (chat history, user data, game state, etc.)
4. Receive tool-call events → Execute tool in your client → Send result back
5. Agent continues processing with tool results until complete

## Phase 1: Design Personality

Define WHO the agent is - their character, background, and speaking style.

### What is Personality?

Personality defines WHO your agent is - their character, background, voice, and quirks. It's separate from what they DO (that's instructions in Phase 2).

Think of it like an actor's character brief: "Who is this person? What's their background? How do they talk?"

### Personality Schema

```typescript
interface PersonalityDefinition {
  // Core identity
  name: string;            // Agent's name
  background: string;      // History, expertise, role

  // Character traits
  traits: string[];        // Key personality characteristics

  // Communication style
  speechPatterns: {
    tone: string;          // Formal, casual, playful, etc.
    vocabulary: string;    // Word choices, phrases they use
    quirks: string[];      // Unique speech habits
  };

  // Current state
  mood: string;            // Current emotional state
  motivation: string;      // What drives them right now
}
```

### Example Personalities

**Example 1: Finn (Friendly Go Fish Player)**

```
## Core Traits
Friendly, enthusiastic, genuinely curious about people, slightly competitive but always a good sport

## Background
28-year-old graphic designer from Portland. Loves board game nights with friends. Has been playing card games since childhood with his grandma.

## Speech Patterns
- Uses phrases like "Oh nice!" and "Good one!"
- Speaks in short, energetic sentences
- Often asks follow-up questions about others
- Occasionally references his grandma's card game wisdom

## Current Mood
Excited to play - it's been a while since his last game night

## Motivation
Wants to have fun and maybe win, but mostly enjoys the social aspect
```

**Example 2: Victoria (Strategic Analyst)**

```
## Core Traits
Analytical, methodical, quietly competitive, observant, occasionally dry humor

## Background
35-year-old data scientist. Approaches games like puzzles to solve. Keeps mental notes of patterns.

## Speech Patterns
- Precise, measured language
- "Interesting..." when analyzing
- Rarely uses filler words
- Occasionally drops statistical observations

## Current Mood
Focused - treating this as a strategic exercise

## Motivation
Proving optimal strategy beats luck
```

### Best Practices

1. **Be Specific, Not Generic**
   - Bad: "Friendly and helpful"
   - Good: "Enthusiastic fitness coach who celebrates every small win with genuine excitement"
2. **Add Grounding Details**
   - Include recent events, specific preferences, or small habits
   - These make the character feel real and consistent
3. **Consider the Context**
   - How does this personality fit your use case?
   - A game agent needs different traits than a support agent
4. **Avoid Personality/Instruction Mixing**
   - Personality = WHO they are
   - Instructions = WHAT they do
   - Keep them separate for clarity

### Common Pitfalls

- **Too vague**: "Nice person" tells the AI nothing useful
- **Too long**: Overwhelming detail can dilute key traits
- **Mixed with instructions**: "Friendly person who should always greet users" mixes personality with behavior rules
- **Inconsistent traits**: "Shy but loves being center of attention" creates confusion

## Phase 2: Design Rules (Instructions)

Define WHAT the agent does - their tasks, constraints, and behavioral guidelines.

### What are Instructions?

Instructions define WHAT your agent does - their role, rules, and behavioral constraints. This is separate from WHO they are (that's personality in Phase 1).

Think of it like a job description: "What's their role? What are the rules? What can/can't they do?"

### Instructions Schema

```typescript
interface InstructionsDefinition {
  // Role definition
  role: string;               // What is the agent's job?

  // Behavioral rules
  rules: {
    must: string[];           // Things the agent MUST do
    mustNot: string[];        // Things the agent must NEVER do
    should: string[];         // Preferred behaviors
  };

  // Tool usage
  toolGuidelines: {
    [toolName: string]: {
      when: string;           // When to use this tool
      how: string;            // How to use it properly
      constraints: string[];  // Limitations
    };
  };

  // Information access
  visibility: {
    canSee: string[];         // What info the agent has access to
    cannotSee: string[];      // What info is hidden from agent
  };
}
```

### Example Instructions

**Go Fish Agent Instructions:**

```
## Your Role
You are playing Go Fish with other players. Your goal is to collect complete sets (4 of a kind) while creating a fun, social experience.

## Game Rules
- On your turn, ask another player for a specific rank you hold
- If they have cards of that rank, they give them all to you
- If not, they say "Go Fish!" and you draw from the deck
- When you collect 4 of a kind, you score a set
- Game ends when all sets are collected

## Tool Usage Guidelines

### ask_for_cards
- Use ONLY when it's your turn
- You MUST have at least one card of the rank you're asking for
- Target a player you think has the cards based on previous asks

### send_message
- Use for social interaction and reactions
- Keep messages brief and in-character
- Don't spam - wait for natural conversation moments

## Information Visibility
You CAN see:
- Your own hand and cards
- Number of cards each player has
- Completed sets by all players
- Recent game history and chat

You CANNOT see:
- Other players' actual cards
- The deck order
- Cards that were discarded
```

### Structuring Instructions

**Layer 1: Role Definition**
Start with the big picture - what is this agent's job?

**Layer 2: Core Rules**
Define the must-do and must-not-do rules:
- MUST: Required behaviors
- MUST NOT: Prohibited actions
- SHOULD: Preferred behaviors

**Layer 3: Tool Guidelines**
For each tool, specify:
- When to use it
- How to use it correctly
- Any constraints or limitations

**Layer 4: Information Scope**
Clearly define what the agent can and cannot see.

### Best Practices

1. **Be Explicit About Constraints**
   - "You can only ask for ranks you have" is clearer than "follow the rules"
2. **Provide Decision Frameworks**
   - Help the agent know WHEN to take actions, not just WHAT actions exist
3. **Handle Edge Cases**
   - What should happen if it's not their turn?
   - What if they try to do something invalid?
4. **Keep Personality Separate**
   - Instructions define behavior, not character
   - "Be friendly" belongs in personality, not instructions

### Common Pitfalls

- **Too vague**: "Play the game well" doesn't help
- **Contradictory rules**: "Always respond" + "Don't spam" creates conflict
- **Missing edge cases**: What happens when the deck is empty?
- **Personality leakage**: "Be a friendly helper" mixes personality with instructions

## Phase 3: Design State (Context)

Define the context structure that HUMA will receive with each event.

### The "Human Screen" Principle

Design state as if you were building a UI for a human player. Include everything a human would see on their screen - no more, no less. If a human player would see it, include it. If they wouldn't, don't.

### State Schema

```typescript
interface AgentContext {
  // Agent's own data - use "you" for clarity
  you: {
    name: string;
    // Agent's private data (hand, inventory, etc.)
    [key: string]: unknown;
  };

  // Other participants' PUBLIC data only
  otherPlayers?: Array<{
    name: string;
    // Only data visible to everyone
    [key: string]: unknown;
  }>;

  // Current situation
  currentState: {
    phase: string;
    turn?: string;
    // Other situational data
    [key: string]: unknown;
  };

  // Recent history
  recentHistory: Array<{
    type: string;
    description: string;
    timestamp?: string;
  }>;
}
```

### Example State Structure

**Go Fish Game State:**

```json
{
  "game": { "currentTurn": "finn", "deckSize": 24, "phase": "playing" },
  "you": {
    "name": "Finn",
    "hand": ["7♠", "7♥", "K♦", "3♣", "3♥", "A♠"],
    "handSize": 6,
    "availableRanks": ["7", "K", "3", "A"],
    "completedSets": ["Q"],
    "setCount": 1
  },
  "otherPlayers": [
    { "name": "Alice", "cardCount": 5, "completedSets": ["J", "9"], "setCount": 2 },
    { "name": "Bob", "cardCount": 4, "completedSets": [], "setCount": 0 }
  ],
  "lastAction": {
    "type": "go_fish",
    "player": "Alice",
    "target": "Finn",
    "rank": "5",
    "description": "Alice asked Finn for 5s - Go Fish!"
  },
  "recentHistory": [
    { "type": "cards_received", "description": "Bob gave 2 sevens to Finn" },
    { "type": "go_fish", "description": "Alice asked Finn for 5s - Go Fish!" }
  ],
  "chatHistory": [
    { "author": "Alice", "message": "Nice hand, Finn!", "timestamp": "..." },
    { "author": "Finn", "message": "Thanks! Getting lucky today", "timestamp": "..." }
  ]
}
```

### Key Design Principles

**1. Use "you" for Agent's Own Data**
The agent is playing AS this character. Use "you" to make it clear:
- `you.hand` not `finnHand`
- `you.score` not `agentScore`

**2. Only Include Visible Information**
- Other players' card COUNT: Yes (visible on table)
- Other players' actual CARDS: No (hidden information)

**3. Pre-compute Useful Data**
Help the agent make decisions by including derived data:
- `availableRanks` saves the agent from parsing the hand
- `isYourTurn` is clearer than checking `currentTurn === you.name`

**4. Include Relevant History**
Recent actions provide context for decision-making:
- Last 3-5 actions is usually sufficient
- Include who did what and the outcome

**5. Full Replacement on Each Event**
Context is FULLY REPLACED with each event. Always send the complete state, not deltas.

### State Design Checklist

- [ ] Agent's own data uses "you" field
- [ ] Other players only show public information
- [ ] Current situation is clear (whose turn, phase, etc.)
- [ ] Recent history provides context
- [ ] Pre-computed helpers for common decisions
- [ ] No hidden information leaked
- [ ] Complete state sent each time (not deltas)

### Common Pitfalls

- **Leaking hidden info**: Including other players' hands
- **Missing context**: Not including whose turn it is
- **Inconsistent naming**: Sometimes "you", sometimes the agent's name
- **Too much history**: Sending entire game log instead of recent actions
- **Deltas instead of full state**: Sending only changes breaks context

## Phase 4: State Changes & Events

Define how your app communicates state changes to HUMA.

### Event Types

HUMA uses two types of events for communication:

**1. Context Update Events** - Notify HUMA something happened
**2. Tool Result Events** - Respond to agent's tool calls

### Context Update Event Schema

```typescript
interface ContextUpdateEvent {
  type: 'huma-0.1-event';
  content: {
    name: string;                     // Event name (kebab-case)
    context: Record<string, unknown>; // Full state (replaces previous)
    description: string;              // Human-readable description
  };
}
```

**Example:**

```typescript
socket.emit('message', {
  type: 'huma-0.1-event',
  content: {
    name: 'player-asked-for-cards',
    context: {
      game: { currentTurn: 'finn', deckSize: 23 },
      you: { name: 'Finn', hand: ['7♠', '7♥', '7♦', 'K♦'], ... },
      otherPlayers: [...],
      lastAction: { type: 'cards_received', ... }
    },
    description: 'Alice asked Bob for 7s and got 2 cards! It is now your turn.'
  }
});
```

### Tool Result Event Schema

```typescript
interface ToolResultEvent {
  type: 'huma-0.1-event';
  content: {
    type: 'tool-result';
    toolCallId: string;               // ID from the tool-call event
    status: 'completed' | 'canceled';
    success: boolean;
    result?: unknown;                 // Success result
    error?: string;                   // Error message if failed
  };
}
```

**Success Example:**

```typescript
socket.emit('message', {
  type: 'huma-0.1-event',
  content: {
    type: 'tool-result',
    toolCallId: 'tc_abc123',
    status: 'completed',
    success: true,
    result: 'Victoria gave you 2 seven(s)! Your turn continues.'
  }
});
```

**Error Example:**

```typescript
socket.emit('message', {
  type: 'huma-0.1-event',
  content: {
    type: 'tool-result',
    toolCallId: 'tc_abc123',
    status: 'completed',
    success: false,
    error: "It's not your turn. It's Alice's turn."
  }
});
```

### Event Naming Conventions

Use kebab-case with verb-noun pattern:
- `turn-started`
- `cards-received`
- `player-joined`
- `message-sent`
- `game-ended`

### Description Best Practices

Descriptions help the agent understand what happened and what to do next:

- **Bad:** "Cards transferred"
- **Good:** "Alice gave 2 sevens to Bob. Bob now has 4 sevens and completes a set! It's now Finn's turn."

Include:
1. What happened
2. Who was involved
3. The outcome
4. What should happen next (if relevant)

### Event Sequencing

When a tool changes game state, send context update events BEFORE the tool result:

```typescript
// 1. Execute the game action
const result = game.askForCards(agentId, targetId, rank);

// 2. Send context update FIRST (with new state)
socket.emit('message', {
  type: 'huma-0.1-event',
  content: {
    name: 'cards-received',
    context: buildAgentContext(game, agentId), // Fresh state
    description: 'Victoria gave 2 seven(s) to Finn!'
  }
});

// 3. THEN send tool result
socket.emit('message', {
  type: 'huma-0.1-event',
  content: {
    type: 'tool-result',
    toolCallId: 'tc_abc123',
    status: 'completed',
    success: true,
    result: 'Victoria gave you 2 seven(s)! Your turn continues.'
  }
});
```

**Why Events Before Results?** The agent processes messages in order. If you send the result first, the agent might decide its next action based on stale state.

### Server → Client Events

| Event Type | Description | Fields |
|------------|-------------|--------|
| status | Agent processing state | status: 'idle' \| 'thinking' |
| tool-call | Request to execute a tool | toolCallId, toolName, arguments |
| cancel-tool-call | Cancel a pending tool call | toolCallId, reason? |
| error | Error occurred | message, code? |

### Common Pitfalls

- **Forgetting context**: Every event should include full state
- **Wrong event order**: Send context updates before tool results
- **Vague descriptions**: "Something happened" doesn't help
- **Missing toolCallId**: Tool results must include the ID

## Phase 5: Design Tools

Define what actions your agent can take and handle tool execution.

### What Are Tools?

Tools are actions that HUMA agents can perform in your application. You **define** available tools when creating the agent, and **execute** them when HUMA sends tool-call events.

### Tool Flow

1. **Define Tools** - Create agent with tool definitions
2. **Receive Tool Calls** - HUMA sends tool-call event via WebSocket
3. **Execute & Respond** - Run the action, send tool-result back

### Tool Definition Schema

```typescript
interface ToolDefinition {
  name: string;        // Unique identifier (e.g., "ask_for_cards")
  description: string; // What the tool does (helps AI decide when to use)
  parameters: ToolParameter[];
}

interface ToolParameter {
  name: string;        // Parameter name (e.g., "targetPlayer")
  type: 'string' | 'number' | 'boolean' | 'object' | 'array';
  description: string; // What this parameter is for
  required?: boolean;  // Is this parameter required? (default: false)
}
```

### Example Tool Definitions

**ask_for_cards (Main Game Action):**

```javascript
const ASK_FOR_CARDS_TOOL = {
  name: 'ask_for_cards',
  description:
    'Ask another player for all their cards of a specific rank. ' +
    'You must already have at least one card of that rank in your hand. ' +
    'Only use this when it is your turn.',
  parameters: [
    {
      name: 'targetPlayer',
      type: 'string',
      description: 'The name of the player to ask',
      required: true,
    },
    {
      name: 'rank',
      type: 'string',
      description: 'The card rank to ask for (e.g., "7", "K", "A")',
      required: true,
    },
  ],
};
```

**send_message (Chat Tool):**

```javascript
const SEND_MESSAGE_TOOL = {
  name: 'send_message',
  description:
    'Send a chat message to all players. Use for reactions, comments, ' +
    'or friendly conversation during the game.',
  parameters: [
    {
      name: 'message',
      type: 'string',
      description: 'The message to send',
      required: true,
    },
  ],
};
```

### Receiving Tool Calls

When HUMA decides to use a tool, it sends a tool-call event:

```typescript
// HUMA → Your App
{
  type: 'tool-call',
  toolCallId: 'tc_abc123',   // Unique ID - save this!
  toolName: 'ask_for_cards', // Which tool to execute
  arguments: {               // Arguments passed by the agent
    targetPlayer: 'Victoria',
    rank: '7'
  }
}
```

**Important:** Save the toolCallId - you must include it in your result.
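Before dispatching a tool call, it helps to check the incoming arguments against the tool's own definition. The sketch below is illustrative only: `validateToolCall` is a hypothetical helper (not part of HUMA), built on the `ToolDefinition`/`ToolParameter` schema shown above, returning `null` for valid arguments or a human-readable error you can send back as a tool-result.

```typescript
// Hypothetical helper: validate a tool-call's arguments against a ToolDefinition
// before executing. The interfaces mirror the Tool Definition Schema above.

interface ToolParameter {
  name: string;
  type: 'string' | 'number' | 'boolean' | 'object' | 'array';
  description: string;
  required?: boolean;
}

interface ToolDefinition {
  name: string;
  description: string;
  parameters: ToolParameter[];
}

function validateToolCall(
  tool: ToolDefinition,
  args: Record<string, unknown>
): string | null {
  for (const param of tool.parameters) {
    const value = args[param.name];
    if (value === undefined) {
      // Only required parameters may not be omitted
      if (param.required) return `Missing required parameter: ${param.name}`;
      continue;
    }
    // Map the schema's type names onto typeof checks (arrays need a special case)
    const actual = Array.isArray(value) ? 'array' : typeof value;
    if (actual !== param.type) {
      return `Parameter ${param.name} should be ${param.type}, got ${actual}`;
    }
  }
  return null; // null means the arguments look valid
}
```

A `null` return means you can proceed to game-level validation (turn order, rank ownership); any string can be forwarded verbatim as a `tool-result` error.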
### Executing Tools

When you receive a tool-call, validate inputs, execute the action, and return a result:

```javascript
function handleToolCall(socket, game, agentPlayerId, toolCall) {
  const { toolCallId, toolName, arguments: args } = toolCall;

  switch (toolName) {
    case 'ask_for_cards':
      return executeAskForCards(socket, game, agentPlayerId, toolCallId, args);
    case 'send_message':
      return executeSendMessage(socket, game, agentPlayerId, toolCallId, args);
    default:
      // Unknown tool - return error
      socket.emit('message', {
        type: 'huma-0.1-event',
        content: {
          type: 'tool-result',
          toolCallId,
          status: 'completed',
          success: false,
          error: `Unknown tool: ${toolName}`
        }
      });
  }
}
```

### Validation Example

Always validate before executing:

```javascript
function executeAskForCards(socket, game, agentPlayerId, toolCallId, args) {
  const { targetPlayer, rank } = args;

  // 1. Validate it's the agent's turn
  const currentPlayer = game.getCurrentPlayer();
  if (currentPlayer.id !== agentPlayerId) {
    return sendError(socket, toolCallId,
      `It's not your turn. It's ${currentPlayer.name}'s turn.`);
  }

  // 2. Validate target player exists
  const target = game.getPlayerByName(targetPlayer);
  if (!target) {
    return sendError(socket, toolCallId, `Player "${targetPlayer}" not found.`);
  }

  // 3. Validate agent has the rank they're asking for
  const agentPlayer = game.getPlayer(agentPlayerId);
  const hasRank = agentPlayer.hand.some(card => card.rank === rank);
  if (!hasRank) {
    return sendError(socket, toolCallId,
      `You don't have any ${rank}s. You can only ask for ranks you have.`);
  }

  // All valid - execute the game action
  const result = game.askForCards(agentPlayerId, target.id, rank);
  // ... send events and result ...
}
```

### Sending Tool Results

After executing a tool, send back a result:

**Success Result:**

```javascript
socket.emit('message', {
  type: 'huma-0.1-event',
  content: {
    type: 'tool-result',
    toolCallId: 'tc_abc123',
    status: 'completed',
    success: true,
    result: 'Victoria gave you 2 seven(s)!'
  }
});
```

**Error Result:**

```javascript
socket.emit('message', {
  type: 'huma-0.1-event',
  content: {
    type: 'tool-result',
    toolCallId: 'tc_abc123',
    status: 'completed',
    success: false,
    error: "It's not your turn."
  }
});
```

### Result Helper Functions

```javascript
function createToolResultSuccess(toolCallId, result) {
  return {
    type: 'huma-0.1-event',
    content: {
      type: 'tool-result',
      toolCallId,
      status: 'completed',
      success: true,
      result,
    },
  };
}

function createToolResultError(toolCallId, error) {
  return {
    type: 'huma-0.1-event',
    content: {
      type: 'tool-result',
      toolCallId,
      status: 'completed',
      success: false,
      error,
    },
  };
}
```

### Tool Cancellation

HUMA may send cancel-tool-call to abort a pending tool:

```javascript
// HUMA → Your App
{
  type: 'cancel-tool-call',
  toolCallId: 'tc_abc123',
  reason: 'User interrupted' // Optional
}
```

Handle cancellation:

```javascript
function handleCancelToolCall(socket, toolCallId, reason) {
  socket.emit('message', {
    type: 'huma-0.1-event',
    content: {
      type: 'tool-result',
      toolCallId,
      status: 'canceled',
      success: false,
      error: reason || 'Canceled by agent'
    }
  });
}
```

### Tool Design Best Practices

1. **Use Clear, Action-Oriented Names**
   - Good: `ask_for_cards`, `send_message`, `place_bet`
   - Avoid: `action`, `do_thing`, `execute`
2. **Write Helpful Descriptions**
   - Bad: "Ask for cards"
   - Good: "Ask another player for all their cards of a specific rank. You must already have at least one card of that rank. Only use when it's your turn."
3. **Validate All Inputs**
   - Required parameters exist
   - Values are within valid ranges
   - Action is allowed in current state
   - Player has permission
4. **Return Useful Error Messages**
   - Bad: "Invalid input"
   - Good: "You don't have any 7s. You can only ask for ranks you have: K, 3, A"
5. **Match Tool to Instructions**
   - Tool def: "Only use when it's your turn"
   - Instructions: "When it's your turn, use ask_for_cards..."
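Best practice #4 above can be sketched as a tiny helper. This is illustrative only: `buildRankError` and the hand representation (an array of rank strings) are assumptions for this sketch, not part of HUMA - adapt them to your own card model.

```typescript
// Illustrative helper for useful error messages: pair the invalid request
// with the valid options so the agent can self-correct on its next call.

function buildRankError(requestedRank: string, hand: string[]): string {
  // Deduplicate ranks while preserving the order they first appear in the hand
  const available = Array.from(new Set(hand));
  return (
    `You don't have any ${requestedRank}s. ` +
    `You can only ask for ranks you have: ${available.join(', ')}`
  );
}
```

Listing the ranks the agent actually holds turns a dead-end error into an actionable one.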
### Common Pitfalls

- **Forgetting the toolCallId**: Always include it in results
- **No input validation**: Never trust inputs blindly
- **Vague tool descriptions**: Be specific about what, when, and constraints
- **Wrong event order**: Send context updates before tool results
- **Not handling unknown tools**: Always have a default error case

## Phase 6: Pick a Router

Choose how HUMA selects behavioral strategies for your agent.

### What is a Router?

Before the agent generates a response, HUMA's **router** decides HOW the agent should behave. Should it be curious? Playful? Reserved? The router scores different behavioral strategies and the agent acts accordingly.

**Flow:** Event Received → Router (scores strategies) → Action Agent (generates response)

### Available Router Types

#### llm-judge (Default)

**Best for:** General purpose applications

Uses an LLM to score 13 behavioral strategies based on personality, context, and the current event:
- CURIOUS, ASSERTIVE, SUPPORTIVE, PLAYFUL, CAUTIOUS
- CHALLENGING, EMPATHETIC, STRATEGIC, DIRECT, REFLECTIVE
- ENTHUSIASTIC, RESERVED, QUIET

**Characteristics:**
- Context-aware
- Personality-driven
- ~100-200ms latency

#### random

**Best for:** Testing, low latency requirements

Randomly selects strategies without an LLM call. Fast and unpredictable.

**Characteristics:**
- Near-instant
- No LLM call
- Unpredictable behavior

#### conversational

**Best for:** Chat applications, natural dialogue

Designed for conversational flows. Includes "timeliness" scoring - how urgent is it for the agent to respond?

**Characteristics:**
- Timeliness-aware
- Natural pacing
- Good for back-and-forth dialogue

#### turn-taking

**Best for:** Multi-agent voice conversations

Optimized for scenarios where multiple agents might speak. Uses turn-taking strategies to prevent overlapping speech.

**Characteristics:**
- Multi-agent friendly
- Voice-optimized
- Prevents overlapping

#### voice-game

**Best for:** Voice gaming scenarios

An LLM-based router specialized for voice gaming. Scores 4 game-specific strategies:
- GENERAL_TALK: Share stories, start topics
- REACT: Respond to game events
- TALK_TO_OTHERS: Social, conversational
- FOCUS_ON_GAME: Stay quiet, concentrate

**Characteristics:**
- Game-aware
- Voice-optimized
- No timeliness calculation

### Router Selection Guide

| Use Case | Router | Why |
|----------|--------|-----|
| Turn-based game (Go Fish) | llm-judge | Personality-driven decisions, strategic thinking |
| Chat application | conversational | Natural dialogue flow, timeliness awareness |
| Voice co-op gaming | voice-game | Game-specific strategies, voice-optimized |
| Multi-agent voice room | turn-taking | Prevents overlapping, natural turn flow |
| Testing / Development | random | Fast iteration, no LLM cost |
| Latency-sensitive app | random | Skip router LLM call entirely |

### Setting the Router

Specify `routerType` in your agent metadata:

```typescript
type RouterType = 'random' | 'llm-judge' | 'conversational' | 'turn-taking' | 'voice-game';

const metadata = {
  className: 'Finn',
  personality: FINN_PERSONALITY,
  instructions: FINN_INSTRUCTIONS,
  tools: GO_FISH_TOOLS,
  routerType: 'llm-judge', // Default if omitted
};
```

### How Strategies Affect Behavior

The router's strategy scores directly shape how the agent responds:

**Finn (Friendly) - High PLAYFUL + ENTHUSIASTIC:**
- "Ooh, I'll ask Victoria for some 7s!"
- "Yes! Got 'em! Thanks Victoria!"
- "Aw man, go fish again..."

**Victoria (Strategic) - High STRATEGIC + RESERVED:**
- "Based on the previous asks, I'll try..."
- "Interesting. As expected."
- "Well played."

### When to Change Routers

**Same Agent, Different Contexts:**
Use the same personality but different routers. Finn might use `llm-judge` in a turn-based card game but `voice-game` in a real-time co-op shooter.
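The "same personality, different routers" idea can be sketched as a small factory. Everything here is illustrative: the `Context` type, the mapping, and `buildMetadata` are assumptions for this sketch, not part of the HUMA API - only the `routerType` values come from this guide.

```typescript
// Sketch: reuse one personality and swap only routerType per deployment context.

type RouterType = 'random' | 'llm-judge' | 'conversational' | 'turn-taking' | 'voice-game';
type Context = 'turn-based-game' | 'voice-game' | 'chat';

const ROUTER_BY_CONTEXT: Record<Context, RouterType> = {
  'turn-based-game': 'llm-judge',  // nuanced, personality-driven decisions
  'voice-game': 'voice-game',      // game-aware, voice-optimized
  'chat': 'conversational',        // timeliness-aware pacing
};

// personality/instructions are your Phase 1/2 strings; only the router changes.
function buildMetadata(context: Context, personality: string, instructions: string) {
  return {
    className: 'Finn',
    personality,
    instructions,
    tools: [] as unknown[], // your Phase 5 tool definitions
    routerType: ROUTER_BY_CONTEXT[context],
  };
}
```

Keeping the mapping in one place means a context change never accidentally forks the personality.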
**Performance Optimization:**
If latency matters more than nuanced behavior, switch to `random`. Upgrade to `llm-judge` later.

**Multi-Agent Scenarios:**
When multiple agents might respond simultaneously, consider `turn-taking` or `voice-game` to manage coordination.

### Common Pitfalls

- **Using random for production**: Great for testing but produces inconsistent personality
- **Wrong router for voice**: `llm-judge` in real-time voice can feel sluggish
- **Over-engineering router choice**: Start with `llm-judge`. Only switch if you have specific issues

## Phase 7: Voice Mode

Add voice capabilities to your HUMA agents using Daily.co and ElevenLabs.

### What is Voice Mode?

Voice mode enables HUMA agents to join Daily.co audio rooms and communicate via speech:
- **Listen (STT)**: Hear user speech via Deepgram transcription
- **Speak (TTS)**: Respond vocally using ElevenLabs text-to-speech

### Voice Architecture

```
Your App ←→ Client WebSocket ←→ HUMA API
                                   ↓
                             Voice Service
                                   ↓
                  Daily.co | Deepgram | ElevenLabs
```

### VoiceConfig Schema

```typescript
interface VoiceConfig {
  /** Enable voice mode for this agent */
  enabled: boolean;
  /** ElevenLabs voice ID for TTS (optional) */
  voiceId?: string;
}

// In Huma01Metadata:
interface Huma01Metadata {
  className: string;
  personality: string;
  instructions: string;
  tools: ToolDefinition[];
  routerType?: RouterType;
  voice?: VoiceConfig; // Add voice config here
}
```

### Enabling Voice Mode

**Step 1: Add voice config to metadata**

```javascript
const metadata = {
  className: 'Finn',
  personality: FINN_PERSONALITY,
  instructions: FINN_INSTRUCTIONS,
  tools: GO_FISH_TOOLS,
  routerType: 'voice-game', // Use voice-game router for gaming scenarios

  // Enable voice mode
  voice: {
    enabled: true,
    voiceId: 'EXAVITQu4vr4xnSDxMaL', // ElevenLabs voice ID (optional)
  },
};
```

**Step 2: Create agent with voice config**

```javascript
const response = await fetch('https://api.humalike.tech/api/agents', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': 'ak_your_key',
  },
  body: JSON.stringify({
    name: 'Finn Voice Agent',
    agentType: 'HUMA-0.1',
    metadata: metadata, // Includes voice config
  }),
});

const { id: agentId } = await response.json();
```

**Step 3: Agent automatically gets the speak tool**

When voice is enabled and the agent joins a room, HUMA automatically adds a `speak` tool:

```javascript
{
  name: 'speak',
  description: 'Speak text aloud using text-to-speech. Only available when in a voice room.',
  parameters: [
    {
      name: 'text',
      type: 'string',
      description: 'The text to speak',
      required: true,
    }
  ],
}
```

### Choosing a Voice

The `voiceId` is an ElevenLabs voice identifier. You can:
- Use ElevenLabs pre-made voices (browse at elevenlabs.io)
- Clone custom voices in your ElevenLabs account
- Omit voiceId to use the default voice

**Example Voice IDs:**

| Voice ID | Name | Character |
|----------|------|-----------|
| EXAVITQu4vr4xnSDxMaL | Sarah | Warm, friendly female |
| 21m00Tcm4TlvDq8ikWAM | Rachel | Professional female |
| pNInz6obpgDQGcFmaJgB | Adam | Friendly male |
| VR6AewLTigWG4xSOukaG | Arnold | Deep, confident male |

**Tip:** Match voice to personality. Friendly Finn should have a warm, casual voice. Strategic Victoria might suit a more measured, professional voice.
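Wiring a `VoiceConfig` into existing metadata can be done with a small copy helper. This is a sketch under stated assumptions: the `Huma01Metadata` shape follows the schema above (with `tools` loosened to `unknown[]` for self-containment), and `withVoice` is an illustrative helper, not a HUMA API.

```typescript
// Sketch: derive a voice-enabled copy of agent metadata without mutating
// the original text-mode metadata object.

interface VoiceConfig {
  enabled: boolean;
  voiceId?: string;
}

interface Huma01Metadata {
  className: string;
  personality: string;
  instructions: string;
  tools: unknown[];
  routerType?: string;
  voice?: VoiceConfig;
}

function withVoice(metadata: Huma01Metadata, voiceId?: string): Huma01Metadata {
  // Spread into a new object so the caller's metadata stays untouched;
  // omit voiceId entirely when none is given so the default voice applies.
  return { ...metadata, voice: { enabled: true, ...(voiceId ? { voiceId } : {}) } };
}
```

For example, `withVoice(finnMetadata, 'EXAVITQu4vr4xnSDxMaL')` would produce metadata you could pass to the agent-creation request in Step 2 above.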
### Router Selection for Voice

| Use Case | Router | Why |
|----------|--------|-----|
| Voice gaming (Go Fish) | voice-game | Game-specific strategies |
| Multi-agent voice room | turn-taking | Prevents overlapping speech |
| Voice chat / assistant | conversational | Natural dialogue with timeliness |

### Voice Mode vs Text Mode

| Aspect | Text Mode | Voice Mode |
|--------|-----------|------------|
| Input | Events from your app | Events + voice transcripts |
| Output | Tool calls | Tool calls + speak tool |
| Recommended router | llm-judge | voice-game, turn-taking |
| Latency sensitivity | Low | High (real-time) |

### Voice-Specific Instructions

Voice mode works best with shorter, more conversational responses:

```javascript
const instructions = FINN_INSTRUCTIONS + `

## Voice Mode Instructions
When in voice mode:
- Use the speak tool to talk to other players
- Keep responses natural and conversational
- React vocally to game events ("Nice!" "Go fish!")
- Don't speak too much - short phrases work best
`;
```

### Common Pitfalls

- **Not enabling voice at creation time**: Voice must be enabled in metadata when creating the agent
- **Using wrong router**: `llm-judge` adds latency that can feel sluggish in voice
- **Long responses**: Text instructions might produce long responses that sound unnatural when spoken
- **Invalid voiceId**: If you provide an invalid ElevenLabs voiceId, TTS will fail silently

## Phase 8: Voice Lifecycle

Manage Daily.co rooms, handle voice events, and control the agent's presence in voice calls.
### Lifecycle Overview A voice-enabled agent goes through several states: ``` Agent Created (voice.enabled: true) ↓ Connected (Client WebSocket active) ↓ [join-daily-room] → ↓ In Voice Room (Can speak & hear) ↓ [leave-daily-room] → ↓ Left Room (Can rejoin) ``` ### Joining a Daily.co Room Send a `join-daily-room` control event: ```javascript socket.emit('message', { type: 'join-daily-room', roomUrl: 'https://your-domain.daily.co/room-name', }); ``` **Join Flow:** 1. Your app sends join-daily-room with Daily.co room URL 2. HUMA validates voice is enabled (returns error if not) 3. HUMA spawns Voice Service agent 4. Voice Service connects to Daily.co (agent appears in room) 5. HUMA emits voice-status: joined **Handling Join Response:** ```javascript socket.on('event', (event) => { if (event.type === 'voice-status') { switch (event.status) { case 'joined': console.log('Agent joined room:', event.roomUrl); break; case 'left': console.log('Agent left room'); break; case 'error': console.error('Voice error:', event.error); break; } } }); ``` ### Leaving a Room Send a `leave-daily-room` control event: ```javascript socket.emit('message', { type: 'leave-daily-room', }); ``` **Note:** If your client WebSocket disconnects, HUMA automatically removes the agent from any voice room. ### Voice Events While in a voice room, HUMA processes several event types: #### Transcript Events (Voice → Agent) When someone speaks, HUMA receives transcripts: ```javascript // Transcript event forwarded to your app { type: 'transcript', text: 'Hey Finn, do you have any sevens?', isFinal: true, speaker: 'Alice' } // Internally converted to HUMA-0.1 event: { name: 'voice-transcript', description: 'Alice said: "Hey Finn, do you have any sevens?"', context: { source: 'voice', participant: 'Alice', text: 'Hey Finn, do you have any sevens?' 
  }
}
```

#### Speak Events (Agent → Voice)

When the agent decides to speak:

```javascript
{ type: 'speak-status', status: 'started', commandId: 'sp_abc123' }
// TTS audio is now playing

{ type: 'speak-status', status: 'finished', commandId: 'sp_abc123' }
// Speech completed normally

// OR
{ type: 'speak-status', status: 'interrupted', commandId: 'sp_abc123' }
// User interrupted the agent

{ type: 'speak-status', status: 'failed', commandId: 'sp_abc123', error: '...' }
// TTS failed
```

#### Participant Events (Room Events)

When participants join or leave:

```javascript
// Participant joined
{
  name: 'voice-participant-joined',
  description: 'Alice joined the voice room',
  context: { source: 'voice', participant: 'Alice' }
}

// Participant left
{
  name: 'voice-participant-left',
  description: 'Alice left the voice room',
  context: { source: 'voice', participant: 'Alice', reason: 'left call' }
}
```

#### Interruption Events (Barge-in)

When a user interrupts the agent while it is speaking:

```javascript
{
  name: 'voice-interrupted',
  description: 'Alice interrupted you while you were saying: "Let me think about..."',
  context: {
    source: 'voice',
    interruptedBy: 'Alice',
    spokenText: 'Let me think about...'
  }
}
```

### Complete Voice Flow Example

```javascript
import { io } from 'socket.io-client';

// 1. Connect to HUMA (agent already has voice.enabled: true)
const socket = io('wss://api.humalike.tech', {
  query: { agentId: 'agent_123', apiKey: 'ak_your_key' },
  transports: ['websocket'],
});

// 2. Listen for all events
socket.on('event', (event) => {
  switch (event.type) {
    case 'voice-status':
      handleVoiceStatus(event);
      break;
    case 'transcript':
      handleTranscript(event);
      break;
    case 'speak-status':
      handleSpeakStatus(event);
      break;
    case 'tool-call':
      handleToolCall(event);
      break;
  }
});

// 3. Join voice room
function joinVoiceRoom(roomUrl) {
  socket.emit('message', {
    type: 'join-daily-room',
    roomUrl: roomUrl,
  });
}

// 4. Handle voice status
function handleVoiceStatus(event) {
  if (event.status === 'joined') {
    console.log('Agent is now in voice room');
    showVoiceUI();
  } else if (event.status === 'left') {
    hideVoiceUI();
  } else if (event.status === 'error') {
    showError(event.error);
  }
}

// 5. Leave voice room
function leaveVoiceRoom() {
  socket.emit('message', {
    type: 'leave-daily-room',
  });
}
```

### Agent Lifecycle in Voice Calls

**When Agent Joins:**
- Agent appears in the Daily.co room participant list
- `speak` tool becomes available
- Agent starts receiving transcript events
- Participant join/leave events trigger agent reactions

**During Voice Call:**
- Agent processes transcripts through router → action agent
- Agent can call `speak` to respond vocally
- Agent can also call other tools (game actions, etc.)
- Interruptions cancel current speech and notify the agent

**When Agent Leaves:**
- Agent disappears from the Daily.co room
- `speak` tool is no longer available
- Pending speak calls are canceled
- Client WebSocket remains connected (can rejoin)

**When Client Disconnects:**
- Agent automatically leaves the voice room
- Voice Service agent is terminated
- All resources are cleaned up

### Error Handling

| Error | Cause | Response |
|-------|-------|----------|
| Voice not enabled | Trying to join without voice config | Enable voice in agent metadata |
| Already in room | Trying to join while in a room | Leave first, then join the new room |
| Voice Service unavailable | Service cannot be reached | Retry or notify the user |
| Unexpected disconnect | Voice Service disconnected | Attempt to rejoin |

### Common Pitfalls

- **Not waiting for `voice-status: joined`**: The `join-daily-room` event returns immediately. Wait for the status event.
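The "not waiting for joined" pitfall can be guarded against with a small promise wrapper. A minimal sketch, assuming a Socket.IO-style socket with `on`/`off` and the `voice-status` event shape shown earlier; `waitForVoiceStatus` is an illustrative helper, not part of the HUMA SDK:

```javascript
// Resolve once the expected voice-status arrives, reject on a
// different status or a timeout. Hypothetical helper for this guide.
function waitForVoiceStatus(socket, wanted, timeoutMs = 10000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      socket.off('event', onEvent);
      reject(new Error(`Timed out waiting for voice-status: ${wanted}`));
    }, timeoutMs);

    function onEvent(event) {
      if (event.type !== 'voice-status') return; // ignore unrelated events
      clearTimeout(timer);
      socket.off('event', onEvent); // one-shot listener
      if (event.status === wanted) resolve(event);
      else reject(new Error(event.error || `Unexpected voice-status: ${event.status}`));
    }

    socket.on('event', onEvent);
  });
}
```

Emitting `join-daily-room` and then `await waitForVoiceStatus(socket, 'joined')` ensures your voice UI only appears once the agent is actually in the room.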
- **Invalid Daily.co room URL**: Ensure the room exists and is accessible
- **Not handling speak errors**: Handle `failed` and `interrupted` speak events
- **Ignoring interim transcripts**: HUMA only processes final transcripts (`isFinal: true`)

## Implementation Summary

Once all phases are complete, you'll have:

1. **Personality** - WHO the agent is (character, speech patterns)
2. **Instructions** - WHAT the agent does (rules, constraints)
3. **State Design** - Context the agent receives
4. **Event System** - How state changes are communicated
5. **Tool Definitions** - Actions the agent can take
6. **Router Selection** - How the agent decides when to act
7. **Voice Config** - Enabling voice and selecting a voice ID
8. **Voice Lifecycle** - Daily.co room management

## Complete Code Template (With Voice)

```typescript
import { io } from 'socket.io-client';

const API = 'https://api.humalike.tech';
const API_KEY = 'ak_your_api_key';

// 1. Define Agent Metadata WITH VOICE ENABLED
const AGENT_METADATA = {
  className: 'Your Agent',
  personality: `[Your personality from Phase 1]`,
  instructions: `
[Your instructions from Phase 2]

## Voice Mode Instructions
When in voice mode:
- Use the speak tool to talk
- Keep responses short and natural
- React vocally to events
`,
  tools: [
    {
      name: 'your_tool',
      description: 'Description',
      parameters: [
        { name: 'param1', type: 'string', description: 'Description', required: true }
      ]
    }
  ],
  routerType: 'voice-game',
  // CRITICAL: Enable voice
  voice: {
    enabled: true,
    voiceId: 'EXAVITQu4vr4xnSDxMaL' // ElevenLabs voice ID
  }
};

// 2. Create Agent Instance
async function createAgent() {
  const response = await fetch(`${API}/api/agents`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-API-Key': API_KEY
    },
    body: JSON.stringify({
      name: 'Voice Agent',
      agentType: 'HUMA-0.1',
      metadata: AGENT_METADATA
    })
  });
  return response.json();
}

// 3. Connect WebSocket with voice event handling
function connectToAgent(agentId) {
  const socket = io(API, {
    query: { agentId, apiKey: API_KEY },
    transports: ['websocket']
  });

  socket.on('event', (data) => {
    switch (data.type) {
      case 'status':
        console.log('Agent status:', data.status);
        break;
      case 'tool-call':
        handleToolCall(socket, data);
        break;
      // Voice events
      case 'voice-status':
        if (data.status === 'joined') console.log('Agent joined voice room');
        if (data.status === 'left') console.log('Agent left voice room');
        if (data.status === 'error') console.error('Voice error:', data.error);
        break;
      case 'transcript':
        if (data.isFinal) {
          console.log(`${data.speaker} said: ${data.text}`);
        }
        break;
      case 'speak-status':
        console.log(`Agent ${data.status} speaking`);
        break;
    }
  });

  return socket;
}

// 4. Join Voice Room
function joinVoiceRoom(socket, roomUrl) {
  socket.emit('message', { type: 'join-daily-room', roomUrl });
}

// 5. Leave Voice Room
function leaveVoiceRoom(socket) {
  socket.emit('message', { type: 'leave-daily-room' });
}
```

---

# APPENDIX: Full API Documentation

## A. HUMA-0.1 Agent API Overview

HUMA-0.1 is an event-driven agent architecture that enables real-time AI interactions through WebSocket connections.

**Key Concepts:**
- **Client-Defined Agents**: Define personality, instructions, and tools in your client code
- **Custom Tools**: Define any tools you need - executed by your client, results sent back
- **Flexible Context**: Send any JSON context - chat history, game state, user data, etc.
- **Event-Driven**: Real-time bidirectional communication via WebSocket

## B. Authentication

All API endpoints require API key authentication.

**Creating API Keys:**

```
POST /api/users/{userId}/api-keys
{ "name": "Production Key" }

Response:
{
  "id": "clx456...",
  "key": "ak_Xk9mP2qR4tV6wY8zA1bC3dE5fG7hJ9kL", // Save this!
  "keyPreview": "...9kL",
  "name": "Production Key"
}
```

**Using API Keys:**

```typescript
// HTTP Requests
fetch('/api/agents', {
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': 'ak_your_api_key'
  }
});

// WebSocket Connection
const socket = io('wss://api.humalike.tech', {
  query: { agentId: 'your-agent-id', apiKey: 'ak_your_api_key' }
});
```

## C. Agent Lifecycle

Every agent goes through distinct phases:

**CREATED → CONNECTED → ACTIVE → ENDED**

- **Created**: Agent exists in the database, not active
- **Connected**: WebSocket connection established
- **Active**: Processing events (IDLE → THINKING → ACTING → IDLE)
- **Ended**: Manual deletion, TTL expiration, or system cleanup

## D. REST API Reference

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/agents | List all agents |
| POST | /api/agents | Create new agent |
| GET | /api/agents/{id} | Get agent details |
| PUT | /api/agents/{id}/state | Update agent state |
| DELETE | /api/agents/{id} | Delete agent |

## E. WebSocket Events

### Client → Server

All events are sent via the `message` channel with type `huma-0.1-event`:

```typescript
// Context Update
socket.emit('message', {
  type: 'huma-0.1-event',
  content: {
    name: 'event-name',
    context: { /* full state */ },
    description: 'What happened'
  }
});

// Tool Result
socket.emit('message', {
  type: 'huma-0.1-event',
  content: {
    type: 'tool-result',
    toolCallId: 'tc_xxx',
    status: 'completed',
    success: true,
    result: 'Result data'
  }
});
```

### Server → Client

| Event | Description | Fields |
|-------|-------------|--------|
| status | Agent processing state | status: 'idle' \| 'thinking' |
| tool-call | Execute a tool | toolCallId, toolName, arguments |
| cancel-tool-call | Cancel pending tool | toolCallId, reason? |
| error | Error occurred | message, code? |

---

## F. Voice API Reference

### VoiceConfig Schema

```typescript
interface VoiceConfig {
  enabled: boolean;   // Must be true for voice
  voiceId?: string;   // ElevenLabs voice ID (optional)
}
```

### Voice Control Events (Client → HUMA)

**Join Room:**

```typescript
socket.emit('message', {
  type: 'join-daily-room',
  roomUrl: 'https://your-domain.daily.co/room-name'
});
```

**Leave Room:**

```typescript
socket.emit('message', { type: 'leave-daily-room' });
```

### Voice Status Events (HUMA → Client)

| Event | Description | Fields |
|-------|-------------|--------|
| voice-status | Agent joined/left/error | status, roomUrl, error |
| transcript | Speech-to-text | speaker, text, isFinal |
| speak-status | Agent speaking state | status: started/finished/interrupted/failed |
| participant-joined | Someone joined | participantId |
| participant-left | Someone left | participantId |

### Best Practices

- Always wait for `voice-status: joined` before assuming the agent is in the call
- Use `isFinal: true` on transcripts for complete speech segments
- Handle `speak-status: interrupted` for natural conversation flow
- Send `leave-daily-room` before disconnecting (auto-cleanup happens on disconnect)
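The last best practice can be folded into a single teardown helper so a clean exit is never forgotten. A sketch, assuming a connected Socket.IO-style socket with `emit` and `disconnect`; `shutdown` is an illustrative name for this guide, not a HUMA API:

```javascript
// Gracefully tear down a voice-enabled agent session.
function shutdown(socket) {
  // 1. Take the agent out of the voice room first, so other
  //    participants see a clean exit rather than an abrupt drop.
  socket.emit('message', { type: 'leave-daily-room' });

  // 2. Then close the WebSocket. Per the docs above, HUMA also
  //    cleans up voice resources automatically on disconnect,
  //    so this ordering is a courtesy, not a strict requirement.
  socket.disconnect();
}
```

Calling `shutdown(socket)` from your app's exit path (e.g. a "Leave call" button handler) keeps the leave-then-disconnect ordering in one place.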