# HUMA Integration Guide

HUMA is an async, stateful API for creating human-like AI agents that can participate in real-time interactions like games, chats, and voice calls.

## Key Concepts

- **State**: Your app's state (chat history, game state, etc.) is always available to HUMA
- **Events**: State changes are passed as Context Updates with description + new state
- **Tools**: HUMA communicates back via async tool calls that your app executes
- **Asynchronous**: All tools are async - processing tools doesn't block HUMA from handling new events
- **WebSocket**: Real-time bidirectional communication between your app and HUMA

## Architecture Overview

1. Create agent via REST API with personality, instructions, and tool definitions
2. Connect to agent via WebSocket with agent ID
3. Send events with context (chat history, user data, game state, etc.)
4. Receive tool-call events → Execute tool in your client → Send result back
5. Agent continues processing with tool results until complete

## Phase 1: Design Personality

Define WHO the agent is - their character, background, and speaking style.

### What is Personality?

Personality defines WHO your agent is - their character, background, voice, and quirks. It's separate from what they DO (that's instructions in Phase 2).

Think of it like an actor's character brief: "Who is this person? What's their background? How do they talk?"

### Personality Schema

```typescript
interface PersonalityDefinition {
  // Core identity
  name: string;            // Agent's name
  background: string;      // History, expertise, role

  // Character traits
  traits: string[];        // Key personality characteristics

  // Communication style
  speechPatterns: {
    tone: string;          // Formal, casual, playful, etc.
    vocabulary: string;    // Word choices, phrases they use
    quirks: string[];      // Unique speech habits
  };

  // Current state
  mood: string;            // Current emotional state
  motivation: string;      // What drives them right now
}
```

### Example Personalities

**Example 1: Finn (Friendly Go Fish Player)**

```
## Core Traits
Friendly, enthusiastic, genuinely curious about people, slightly competitive but always a good sport

## Background
28-year-old graphic designer from Portland. Loves board game nights with friends. Has been playing card games since childhood with his grandma.

## Speech Patterns
- Uses phrases like "Oh nice!" and "Good one!"
- Speaks in short, energetic sentences
- Often asks follow-up questions about others
- Occasionally references his grandma's card game wisdom

## Current Mood
Excited to play - it's been a while since his last game night

## Motivation
Wants to have fun and maybe win, but mostly enjoys the social aspect
```

**Example 2: Victoria (Strategic Analyst)**

```
## Core Traits
Analytical, methodical, quietly competitive, observant, occasionally dry humor

## Background
35-year-old data scientist. Approaches games like puzzles to solve. Keeps mental notes of patterns.

## Speech Patterns
- Precise, measured language
- "Interesting..." when analyzing
- Rarely uses filler words
- Occasionally drops statistical observations

## Current Mood
Focused - treating this as a strategic exercise

## Motivation
Proving optimal strategy beats luck
```

### Best Practices

1. **Be Specific, Not Generic**
   - Bad: "Friendly and helpful"
   - Good: "Enthusiastic fitness coach who celebrates every small win with genuine excitement"
2. **Add Grounding Details**
   - Include recent events, specific preferences, or small habits
   - These make the character feel real and consistent
3. **Consider the Context**
   - How does this personality fit your use case?
   - A game agent needs different traits than a support agent
4. **Avoid Personality/Instruction Mixing**
   - Personality = WHO they are
   - Instructions = WHAT they do
   - Keep them separate for clarity

### Common Pitfalls

- **Too vague**: "Nice person" tells the AI nothing useful
- **Too long**: Overwhelming detail can dilute key traits
- **Mixed with instructions**: "Friendly person who should always greet users" mixes personality with behavior rules
- **Inconsistent traits**: "Shy but loves being center of attention" creates confusion

## Phase 2: Design Rules (Instructions)

Define WHAT the agent does - their tasks, constraints, and behavioral guidelines.

### What are Instructions?

Instructions define WHAT your agent does - their role, rules, and behavioral constraints. This is separate from WHO they are (that's personality in Phase 1).

Think of it like a job description: "What's their role? What are the rules? What can/can't they do?"

### Instructions Schema

```typescript
interface InstructionsDefinition {
  // Role definition
  role: string;               // What is the agent's job?

  // Behavioral rules
  rules: {
    must: string[];           // Things the agent MUST do
    mustNot: string[];        // Things the agent must NEVER do
    should: string[];         // Preferred behaviors
  };

  // Tool usage
  toolGuidelines: {
    [toolName: string]: {
      when: string;           // When to use this tool
      how: string;            // How to use it properly
      constraints: string[];  // Limitations
    };
  };

  // Information access
  visibility: {
    canSee: string[];         // What info the agent has access to
    cannotSee: string[];      // What info is hidden from agent
  };
}
```

### Example Instructions

**Go Fish Agent Instructions:**

```
## Your Role
You are playing Go Fish with other players. Your goal is to collect complete sets (4 of a kind) while creating a fun, social experience.

## Game Rules
- On your turn, ask another player for a specific rank you hold
- If they have cards of that rank, they give them all to you
- If not, they say "Go Fish!" and you draw from the deck
- When you collect 4 of a kind, you score a set
- Game ends when all sets are collected

## Tool Usage Guidelines

### ask_for_cards
- Use ONLY when it's your turn
- You MUST have at least one card of the rank you're asking for
- Target a player you think has the cards based on previous asks

### send_message
- Use for social interaction and reactions
- Keep messages brief and in-character
- Don't spam - wait for natural conversation moments

## Information Visibility
You CAN see:
- Your own hand and cards
- Number of cards each player has
- Completed sets by all players
- Recent game history and chat

You CANNOT see:
- Other players' actual cards
- The deck order
- Cards that were discarded
```

### Structuring Instructions

**Layer 1: Role Definition**
Start with the big picture - what is this agent's job?

**Layer 2: Core Rules**
Define the must-do and must-not-do rules:
- MUST: Required behaviors
- MUST NOT: Prohibited actions
- SHOULD: Preferred behaviors

**Layer 3: Tool Guidelines**
For each tool, specify:
- When to use it
- How to use it correctly
- Any constraints or limitations

**Layer 4: Information Scope**
Clearly define what the agent can and cannot see.

### Best Practices

1. **Be Explicit About Constraints**
   - "You can only ask for ranks you have" is clearer than "follow the rules"
2. **Provide Decision Frameworks**
   - Help the agent know WHEN to take actions, not just WHAT actions exist
3. **Handle Edge Cases**
   - What should happen if it's not their turn?
   - What if they try to do something invalid?
4. **Keep Personality Separate**
   - Instructions define behavior, not character
   - "Be friendly" belongs in personality, not instructions

### Common Pitfalls

- **Too vague**: "Play the game well" doesn't help
- **Contradictory rules**: "Always respond" + "Don't spam" creates conflict
- **Missing edge cases**: What happens when the deck is empty?
- **Personality leakage**: "Be a friendly helper" mixes personality with instructions

## Phase 3: Design State (Context)

Define the context structure that HUMA will receive with each event.

### The "Human Screen" Principle

Design state as if you were building a UI for a human player. Include everything a human would see on their screen - no more, no less. If a human player would see it, include it. If they wouldn't, don't.

### State Schema

```typescript
interface AgentContext {
  // Agent's own data - use "you" for clarity
  you: {
    name: string;
    // Agent's private data (hand, inventory, etc.)
    [key: string]: unknown;
  };

  // Other participants' PUBLIC data only
  otherPlayers?: Array<{
    name: string;
    // Only data visible to everyone
    [key: string]: unknown;
  }>;

  // Current situation
  currentState: {
    phase: string;
    turn?: string;
    // Other situational data
    [key: string]: unknown;
  };

  // Recent history
  recentHistory: Array<{
    type: string;
    description: string;
    timestamp?: string;
  }>;
}
```

### Example State Structure

**Go Fish Game State:**

```json
{
  "game": { "currentTurn": "finn", "deckSize": 24, "phase": "playing" },
  "you": {
    "name": "Finn",
    "hand": ["7♠", "7♥", "K♦", "3♣", "3♥", "A♠"],
    "handSize": 6,
    "availableRanks": ["7", "K", "3", "A"],
    "completedSets": ["Q"],
    "setCount": 1
  },
  "otherPlayers": [
    { "name": "Alice", "cardCount": 5, "completedSets": ["J", "9"], "setCount": 2 },
    { "name": "Bob", "cardCount": 4, "completedSets": [], "setCount": 0 }
  ],
  "lastAction": {
    "type": "go_fish",
    "player": "Alice",
    "target": "Finn",
    "rank": "5",
    "description": "Alice asked Finn for 5s - Go Fish!"
  },
  "recentHistory": [
    { "type": "cards_received", "description": "Bob gave 2 sevens to Finn" },
    { "type": "go_fish", "description": "Alice asked Finn for 5s - Go Fish!" }
  ],
  "chatHistory": [
    { "author": "Alice", "message": "Nice hand, Finn!", "timestamp": "..." },
    { "author": "Finn", "message": "Thanks! Getting lucky today", "timestamp": "..." }
  ]
}
```

### Key Design Principles

**1. Use "you" for Agent's Own Data**
The agent is playing AS this character. Use "you" to make it clear:
- `you.hand` not `finnHand`
- `you.score` not `agentScore`

**2. Only Include Visible Information**
- Other players' card COUNT: Yes (visible on table)
- Other players' actual CARDS: No (hidden information)

**3. Pre-compute Useful Data**
Help the agent make decisions by including derived data:
- `availableRanks` saves the agent from parsing the hand
- `isYourTurn` is clearer than checking `currentTurn === you.name`

**4. Include Relevant History**
Recent actions provide context for decision-making:
- Last 3-5 actions is usually sufficient
- Include who did what and the outcome

**5. Full Replacement on Each Event**
Context is FULLY REPLACED with each event. Always send the complete state, not deltas.

### State Design Checklist

- [ ] Agent's own data uses "you" field
- [ ] Other players only show public information
- [ ] Current situation is clear (whose turn, phase, etc.)
- [ ] Recent history provides context
- [ ] Pre-computed helpers for common decisions
- [ ] No hidden information leaked
- [ ] Complete state sent each time (not deltas)

### Common Pitfalls

- **Leaking hidden info**: Including other players' hands
- **Missing context**: Not including whose turn it is
- **Inconsistent naming**: Sometimes "you", sometimes the agent's name
- **Too much history**: Sending entire game log instead of recent actions
- **Deltas instead of full state**: Sending only changes breaks context

## Phase 4: State Changes & Events

Define how your app communicates state changes to HUMA.

### Event Types

HUMA uses two types of events for communication:

**1. Context Update Events** - Notify HUMA something happened
**2. Tool Result Events** - Respond to agent's tool calls

### Context Update Event Schema

```typescript
interface ContextUpdateEvent {
  type: 'huma-0.1-event';
  content: {
    name: string;                     // Event name (kebab-case)
    context: Record<string, unknown>; // Full state (replaces previous)
    description: string;              // Human-readable description
  };
}
```

**Example:**

```typescript
socket.emit('message', {
  type: 'huma-0.1-event',
  content: {
    name: 'player-asked-for-cards',
    context: {
      game: { currentTurn: 'finn', deckSize: 23 },
      you: { name: 'Finn', hand: ['7♠', '7♥', '7♦', 'K♦'], ... },
      otherPlayers: [...],
      lastAction: { type: 'cards_received', ... }
    },
    description: 'Alice asked Bob for 7s and got 2 cards! It is now your turn.'
  }
});
```

### Tool Result Event Schema

```typescript
interface ToolResultEvent {
  type: 'huma-0.1-event';
  content: {
    type: 'tool-result';
    toolCallId: string;               // ID from the tool-call event
    status: 'completed' | 'canceled';
    success: boolean;
    result?: unknown;                 // Success result
    error?: string;                   // Error message if failed
  };
}
```

**Success Example:**

```typescript
socket.emit('message', {
  type: 'huma-0.1-event',
  content: {
    type: 'tool-result',
    toolCallId: 'tc_abc123',
    status: 'completed',
    success: true,
    result: 'Victoria gave you 2 seven(s)! Your turn continues.'
  }
});
```

**Error Example:**

```typescript
socket.emit('message', {
  type: 'huma-0.1-event',
  content: {
    type: 'tool-result',
    toolCallId: 'tc_abc123',
    status: 'completed',
    success: false,
    error: "It's not your turn. It's Alice's turn."
  }
});
```

### Event Naming Conventions

Use kebab-case with verb-noun pattern:
- `turn-started`
- `cards-received`
- `player-joined`
- `message-sent`
- `game-ended`

### Description Best Practices

Descriptions help the agent understand what happened and what to do next:

- **Bad:** "Cards transferred"
- **Good:** "Alice gave 2 sevens to Bob. Bob now has 4 sevens and completes a set! It's now Finn's turn."

Include:
1. What happened
2. Who was involved
3. The outcome
4. What should happen next (if relevant)

### Event Sequencing

When a tool changes game state, send context update events BEFORE the tool result:

```typescript
// 1. Execute the game action
const result = game.askForCards(agentId, targetId, rank);

// 2. Send context update FIRST (with new state)
socket.emit('message', {
  type: 'huma-0.1-event',
  content: {
    name: 'cards-received',
    context: buildAgentContext(game, agentId), // Fresh state
    description: 'Victoria gave 2 seven(s) to Finn!'
  }
});

// 3. THEN send tool result
socket.emit('message', {
  type: 'huma-0.1-event',
  content: {
    type: 'tool-result',
    toolCallId: 'tc_abc123',
    status: 'completed',
    success: true,
    result: 'Victoria gave you 2 seven(s)! Your turn continues.'
  }
});
```

**Why Events Before Results?** The agent processes messages in order. If you send the result first, the agent might decide its next action based on stale state.

### Server → Client Events

| Event Type | Description | Fields |
|------------|-------------|--------|
| status | Agent processing state | status: 'idle' \| 'thinking' |
| tool-call | Request to execute a tool | toolCallId, toolName, arguments |
| cancel-tool-call | Cancel a pending tool call | toolCallId, reason? |
| error | Error occurred | message, code? |

### Common Pitfalls

- **Forgetting context**: Every event should include full state
- **Wrong event order**: Send context updates before tool results
- **Vague descriptions**: "Something happened" doesn't help
- **Missing toolCallId**: Tool results must include the ID

## Phase 5: Design Tools

Define what actions your agent can take and handle tool execution.

### What Are Tools?

Tools are actions that HUMA agents can perform in your application. You **define** available tools when creating the agent, and **execute** them when HUMA sends tool-call events.

### Tool Flow

1. **Define Tools** - Create agent with tool definitions
2. **Receive Tool Calls** - HUMA sends tool-call event via WebSocket
3. **Execute & Respond** - Run the action, send tool-result back

### Tool Definition Schema

```typescript
interface ToolDefinition {
  name: string;        // Unique identifier (e.g., "ask_for_cards")
  description: string; // What the tool does (helps AI decide when to use)
  parameters: ToolParameter[];
}

interface ToolParameter {
  name: string;        // Parameter name (e.g., "targetPlayer")
  type: 'string' | 'number' | 'boolean' | 'object' | 'array';
  description: string; // What this parameter is for
  required?: boolean;  // Is this parameter required? (default: false)
}
```

### Example Tool Definitions

**ask_for_cards (Main Game Action):**

```javascript
const ASK_FOR_CARDS_TOOL = {
  name: 'ask_for_cards',
  description:
    'Ask another player for all their cards of a specific rank. ' +
    'You must already have at least one card of that rank in your hand. ' +
    'Only use this when it is your turn.',
  parameters: [
    {
      name: 'targetPlayer',
      type: 'string',
      description: 'The name of the player to ask',
      required: true,
    },
    {
      name: 'rank',
      type: 'string',
      description: 'The card rank to ask for (e.g., "7", "K", "A")',
      required: true,
    },
  ],
};
```

**send_message (Chat Tool):**

```javascript
const SEND_MESSAGE_TOOL = {
  name: 'send_message',
  description:
    'Send a chat message to all players. Use for reactions, comments, ' +
    'or friendly conversation during the game.',
  parameters: [
    {
      name: 'message',
      type: 'string',
      description: 'The message to send',
      required: true,
    },
  ],
};
```

### Receiving Tool Calls

When HUMA decides to use a tool, it sends a tool-call event:

```typescript
// HUMA → Your App
{
  type: 'tool-call',
  toolCallId: 'tc_abc123',   // Unique ID - save this!
  toolName: 'ask_for_cards', // Which tool to execute
  arguments: {               // Arguments passed by the agent
    targetPlayer: 'Victoria',
    rank: '7'
  }
}
```

**Important:** Save the toolCallId - you must include it in your result.
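Before dispatching a tool call, it helps to check the incoming arguments against the tool's own definition. The sketch below is illustrative only: `validateToolCall` is a hypothetical helper (not part of HUMA), built on the `ToolDefinition`/`ToolParameter` schema shown above, returning `null` for valid arguments or a human-readable error you can send back as a tool-result.

```typescript
// Hypothetical helper: validate a tool-call's arguments against a ToolDefinition
// before executing. The interfaces mirror the Tool Definition Schema above.

interface ToolParameter {
  name: string;
  type: 'string' | 'number' | 'boolean' | 'object' | 'array';
  description: string;
  required?: boolean;
}

interface ToolDefinition {
  name: string;
  description: string;
  parameters: ToolParameter[];
}

function validateToolCall(
  tool: ToolDefinition,
  args: Record<string, unknown>
): string | null {
  for (const param of tool.parameters) {
    const value = args[param.name];
    if (value === undefined) {
      // Only required parameters may not be omitted
      if (param.required) return `Missing required parameter: ${param.name}`;
      continue;
    }
    // Map the schema's type names onto typeof checks (arrays need a special case)
    const actual = Array.isArray(value) ? 'array' : typeof value;
    if (actual !== param.type) {
      return `Parameter ${param.name} should be ${param.type}, got ${actual}`;
    }
  }
  return null; // null means the arguments look valid
}
```

A `null` return means you can proceed to game-level validation (turn order, rank ownership); any string can be forwarded verbatim as a `tool-result` error.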
### Executing Tools

When you receive a tool-call, validate inputs, execute the action, and return a result:

```javascript
function handleToolCall(socket, game, agentPlayerId, toolCall) {
  const { toolCallId, toolName, arguments: args } = toolCall;

  switch (toolName) {
    case 'ask_for_cards':
      return executeAskForCards(socket, game, agentPlayerId, toolCallId, args);
    case 'send_message':
      return executeSendMessage(socket, game, agentPlayerId, toolCallId, args);
    default:
      // Unknown tool - return error
      socket.emit('message', {
        type: 'huma-0.1-event',
        content: {
          type: 'tool-result',
          toolCallId,
          status: 'completed',
          success: false,
          error: `Unknown tool: ${toolName}`
        }
      });
  }
}
```

### Validation Example

Always validate before executing:

```javascript
function executeAskForCards(socket, game, agentPlayerId, toolCallId, args) {
  const { targetPlayer, rank } = args;

  // 1. Validate it's the agent's turn
  const currentPlayer = game.getCurrentPlayer();
  if (currentPlayer.id !== agentPlayerId) {
    return sendError(socket, toolCallId,
      `It's not your turn. It's ${currentPlayer.name}'s turn.`);
  }

  // 2. Validate target player exists
  const target = game.getPlayerByName(targetPlayer);
  if (!target) {
    return sendError(socket, toolCallId, `Player "${targetPlayer}" not found.`);
  }

  // 3. Validate agent has the rank they're asking for
  const agentPlayer = game.getPlayer(agentPlayerId);
  const hasRank = agentPlayer.hand.some(card => card.rank === rank);
  if (!hasRank) {
    return sendError(socket, toolCallId,
      `You don't have any ${rank}s. You can only ask for ranks you have.`);
  }

  // All valid - execute the game action
  const result = game.askForCards(agentPlayerId, target.id, rank);
  // ... send events and result ...
}
```

### Sending Tool Results

After executing a tool, send back a result:

**Success Result:**

```javascript
socket.emit('message', {
  type: 'huma-0.1-event',
  content: {
    type: 'tool-result',
    toolCallId: 'tc_abc123',
    status: 'completed',
    success: true,
    result: 'Victoria gave you 2 seven(s)!'
  }
});
```

**Error Result:**

```javascript
socket.emit('message', {
  type: 'huma-0.1-event',
  content: {
    type: 'tool-result',
    toolCallId: 'tc_abc123',
    status: 'completed',
    success: false,
    error: "It's not your turn."
  }
});
```

### Result Helper Functions

```javascript
function createToolResultSuccess(toolCallId, result) {
  return {
    type: 'huma-0.1-event',
    content: {
      type: 'tool-result',
      toolCallId,
      status: 'completed',
      success: true,
      result,
    },
  };
}

function createToolResultError(toolCallId, error) {
  return {
    type: 'huma-0.1-event',
    content: {
      type: 'tool-result',
      toolCallId,
      status: 'completed',
      success: false,
      error,
    },
  };
}
```

### Tool Cancellation

HUMA may send cancel-tool-call to abort a pending tool:

```javascript
// HUMA → Your App
{
  type: 'cancel-tool-call',
  toolCallId: 'tc_abc123',
  reason: 'User interrupted' // Optional
}
```

Handle cancellation:

```javascript
function handleCancelToolCall(socket, toolCallId, reason) {
  socket.emit('message', {
    type: 'huma-0.1-event',
    content: {
      type: 'tool-result',
      toolCallId,
      status: 'canceled',
      success: false,
      error: reason || 'Canceled by agent'
    }
  });
}
```

### Tool Design Best Practices

1. **Use Clear, Action-Oriented Names**
   - Good: `ask_for_cards`, `send_message`, `place_bet`
   - Avoid: `action`, `do_thing`, `execute`
2. **Write Helpful Descriptions**
   - Bad: "Ask for cards"
   - Good: "Ask another player for all their cards of a specific rank. You must already have at least one card of that rank. Only use when it's your turn."
3. **Validate All Inputs**
   - Required parameters exist
   - Values are within valid ranges
   - Action is allowed in current state
   - Player has permission
4. **Return Useful Error Messages**
   - Bad: "Invalid input"
   - Good: "You don't have any 7s. You can only ask for ranks you have: K, 3, A"
5. **Match Tool to Instructions**
   - Tool def: "Only use when it's your turn"
   - Instructions: "When it's your turn, use ask_for_cards..."
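Best practice #4 above can be sketched as a tiny helper. This is illustrative only: `buildRankError` and the hand representation (an array of rank strings) are assumptions for this sketch, not part of HUMA - adapt them to your own card model.

```typescript
// Illustrative helper for useful error messages: pair the invalid request
// with the valid options so the agent can self-correct on its next call.

function buildRankError(requestedRank: string, hand: string[]): string {
  // Deduplicate ranks while preserving the order they first appear in the hand
  const available = Array.from(new Set(hand));
  return (
    `You don't have any ${requestedRank}s. ` +
    `You can only ask for ranks you have: ${available.join(', ')}`
  );
}
```

Listing the ranks the agent actually holds turns a dead-end error into an actionable one.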
### Common Pitfalls

- **Forgetting the toolCallId**: Always include it in results
- **No input validation**: Never trust inputs blindly
- **Vague tool descriptions**: Be specific about what, when, and constraints
- **Wrong event order**: Send context updates before tool results
- **Not handling unknown tools**: Always have a default error case

## Phase 6: Pick a Router

Choose how HUMA selects behavioral strategies for your agent.

### What is a Router?

Before the agent generates a response, HUMA's **router** decides HOW the agent should behave. Should it be curious? Playful? Reserved? The router scores different behavioral strategies and the agent acts accordingly.

**Flow:** Event Received → Router (scores strategies) → Action Agent (generates response)

### Available Router Types

#### llm-judge (Default)

**Best for:** General purpose applications

Uses an LLM to score 13 behavioral strategies based on personality, context, and the current event:
- CURIOUS, ASSERTIVE, SUPPORTIVE, PLAYFUL, CAUTIOUS
- CHALLENGING, EMPATHETIC, STRATEGIC, DIRECT, REFLECTIVE
- ENTHUSIASTIC, RESERVED, QUIET

**Characteristics:**
- Context-aware
- Personality-driven
- ~100-200ms latency

#### random

**Best for:** Testing, low latency requirements

Randomly selects strategies without an LLM call. Fast and unpredictable.

**Characteristics:**
- Near-instant
- No LLM call
- Unpredictable behavior

#### conversational

**Best for:** Chat applications, natural dialogue

Designed for conversational flows. Includes "timeliness" scoring - how urgent is it for the agent to respond?

**Characteristics:**
- Timeliness-aware
- Natural pacing
- Good for back-and-forth dialogue

#### turn-taking

**Best for:** Multi-agent voice conversations

Optimized for scenarios where multiple agents might speak. Uses turn-taking strategies to prevent overlapping speech.

**Characteristics:**
- Multi-agent friendly
- Voice-optimized
- Prevents overlapping

#### voice-game

**Best for:** Voice gaming scenarios

An LLM-based router specialized for voice gaming. Scores 4 game-specific strategies:
- GENERAL_TALK: Share stories, start topics
- REACT: Respond to game events
- TALK_TO_OTHERS: Social, conversational
- FOCUS_ON_GAME: Stay quiet, concentrate

**Characteristics:**
- Game-aware
- Voice-optimized
- No timeliness calculation

### Router Selection Guide

| Use Case | Router | Why |
|----------|--------|-----|
| Turn-based game (Go Fish) | llm-judge | Personality-driven decisions, strategic thinking |
| Chat application | conversational | Natural dialogue flow, timeliness awareness |
| Voice co-op gaming | voice-game | Game-specific strategies, voice-optimized |
| Multi-agent voice room | turn-taking | Prevents overlapping, natural turn flow |
| Testing / Development | random | Fast iteration, no LLM cost |
| Latency-sensitive app | random | Skip router LLM call entirely |

### Setting the Router

Specify `routerType` in your agent metadata:

```typescript
type RouterType = 'random' | 'llm-judge' | 'conversational' | 'turn-taking' | 'voice-game';

const metadata = {
  className: 'Finn',
  personality: FINN_PERSONALITY,
  instructions: FINN_INSTRUCTIONS,
  tools: GO_FISH_TOOLS,
  routerType: 'llm-judge', // Default if omitted
};
```

### How Strategies Affect Behavior

The router's strategy scores directly shape how the agent responds:

**Finn (Friendly) - High PLAYFUL + ENTHUSIASTIC:**
- "Ooh, I'll ask Victoria for some 7s!"
- "Yes! Got 'em! Thanks Victoria!"
- "Aw man, go fish again..."

**Victoria (Strategic) - High STRATEGIC + RESERVED:**
- "Based on the previous asks, I'll try..."
- "Interesting. As expected."
- "Well played."

### When to Change Routers

**Same Agent, Different Contexts:**
Use the same personality but different routers. Finn might use `llm-judge` in a turn-based card game but `voice-game` in a real-time co-op shooter.
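The "same personality, different routers" idea can be sketched as a small factory. Everything here is illustrative: the `Context` type, the mapping, and `buildMetadata` are assumptions for this sketch, not part of the HUMA API - only the `routerType` values come from this guide.

```typescript
// Sketch: reuse one personality and swap only routerType per deployment context.

type RouterType = 'random' | 'llm-judge' | 'conversational' | 'turn-taking' | 'voice-game';
type Context = 'turn-based-game' | 'voice-game' | 'chat';

const ROUTER_BY_CONTEXT: Record<Context, RouterType> = {
  'turn-based-game': 'llm-judge',  // nuanced, personality-driven decisions
  'voice-game': 'voice-game',      // game-aware, voice-optimized
  'chat': 'conversational',        // timeliness-aware pacing
};

// personality/instructions are your Phase 1/2 strings; only the router changes.
function buildMetadata(context: Context, personality: string, instructions: string) {
  return {
    className: 'Finn',
    personality,
    instructions,
    tools: [] as unknown[], // your Phase 5 tool definitions
    routerType: ROUTER_BY_CONTEXT[context],
  };
}
```

Keeping the mapping in one place means a context change never accidentally forks the personality.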
**Performance Optimization:**
If latency matters more than nuanced behavior, switch to `random`. Upgrade to `llm-judge` later.

**Multi-Agent Scenarios:**
When multiple agents might respond simultaneously, consider `turn-taking` or `voice-game` to manage coordination.

### Common Pitfalls

- **Using random for production**: Great for testing but produces inconsistent personality
- **Wrong router for voice**: `llm-judge` in real-time voice can feel sluggish
- **Over-engineering router choice**: Start with `llm-judge`. Only switch if you have specific issues

## Phase 7: Voice Mode

Add voice capabilities to your HUMA agents using Daily.co and ElevenLabs.

### What is Voice Mode?

Voice mode enables HUMA agents to join Daily.co audio rooms and communicate via speech:
- **Listen (STT)**: Hear user speech via Deepgram transcription
- **Speak (TTS)**: Respond vocally using ElevenLabs text-to-speech

### Voice Architecture

```
Your App ←→ Client WebSocket ←→ HUMA API
                                   ↓
                             Voice Service
                                   ↓
                  Daily.co | Deepgram | ElevenLabs
```

### VoiceConfig Schema

```typescript
interface VoiceConfig {
  /** Enable voice mode for this agent */
  enabled: boolean;
  /** ElevenLabs voice ID for TTS (optional) */
  voiceId?: string;
}

// In Huma01Metadata:
interface Huma01Metadata {
  className: string;
  personality: string;
  instructions: string;
  tools: ToolDefinition[];
  routerType?: RouterType;
  voice?: VoiceConfig; // Add voice config here
}
```

### Enabling Voice Mode

**Step 1: Add voice config to metadata**

```javascript
const metadata = {
  className: 'Finn',
  personality: FINN_PERSONALITY,
  instructions: FINN_INSTRUCTIONS,
  tools: GO_FISH_TOOLS,
  routerType: 'voice-game', // Use voice-game router for gaming scenarios

  // Enable voice mode
  voice: {
    enabled: true,
    voiceId: 'EXAVITQu4vr4xnSDxMaL', // ElevenLabs voice ID (optional)
  },
};
```

**Step 2: Create agent with voice config**

```javascript
const response = await fetch('https://api.humalike.tech/api/agents', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': 'ak_your_key',
  },
  body: JSON.stringify({
    name: 'Finn Voice Agent',
    agentType: 'HUMA-0.1',
    metadata: metadata, // Includes voice config
  }),
});

const { id: agentId } = await response.json();
```

**Step 3: Agent automatically gets the speak tool**

When voice is enabled and the agent joins a room, HUMA automatically adds a `speak` tool:

```javascript
{
  name: 'speak',
  description: 'Speak text aloud using text-to-speech. Only available when in a voice room.',
  parameters: [
    {
      name: 'text',
      type: 'string',
      description: 'The text to speak',
      required: true,
    }
  ],
}
```

### Choosing a Voice

The `voiceId` is an ElevenLabs voice identifier. You can:
- Use ElevenLabs pre-made voices (browse at elevenlabs.io)
- Clone custom voices in your ElevenLabs account
- Omit voiceId to use the default voice

**Example Voice IDs:**

| Voice ID | Name | Character |
|----------|------|-----------|
| EXAVITQu4vr4xnSDxMaL | Sarah | Warm, friendly female |
| 21m00Tcm4TlvDq8ikWAM | Rachel | Professional female |
| pNInz6obpgDQGcFmaJgB | Adam | Friendly male |
| VR6AewLTigWG4xSOukaG | Arnold | Deep, confident male |

**Tip:** Match voice to personality. Friendly Finn should have a warm, casual voice. Strategic Victoria might suit a more measured, professional voice.
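Wiring a `VoiceConfig` into existing metadata can be done with a small copy helper. This is a sketch under stated assumptions: the `Huma01Metadata` shape follows the schema above (with `tools` loosened to `unknown[]` for self-containment), and `withVoice` is an illustrative helper, not a HUMA API.

```typescript
// Sketch: derive a voice-enabled copy of agent metadata without mutating
// the original text-mode metadata object.

interface VoiceConfig {
  enabled: boolean;
  voiceId?: string;
}

interface Huma01Metadata {
  className: string;
  personality: string;
  instructions: string;
  tools: unknown[];
  routerType?: string;
  voice?: VoiceConfig;
}

function withVoice(metadata: Huma01Metadata, voiceId?: string): Huma01Metadata {
  // Spread into a new object so the caller's metadata stays untouched;
  // omit voiceId entirely when none is given so the default voice applies.
  return { ...metadata, voice: { enabled: true, ...(voiceId ? { voiceId } : {}) } };
}
```

For example, `withVoice(finnMetadata, 'EXAVITQu4vr4xnSDxMaL')` would produce metadata you could pass to the agent-creation request in Step 2 above.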
### Router Selection for Voice

| Use Case | Router | Why |
|----------|--------|-----|
| Voice gaming (Go Fish) | voice-game | Game-specific strategies |
| Multi-agent voice room | turn-taking | Prevents overlapping speech |
| Voice chat / assistant | conversational | Natural dialogue with timeliness |

### Voice Mode vs Text Mode

| Aspect | Text Mode | Voice Mode |
|--------|-----------|------------|
| Input | Events from your app | Events + voice transcripts |
| Output | Tool calls | Tool calls + speak tool |
| Recommended router | llm-judge | voice-game, turn-taking |
| Latency sensitivity | Low | High (real-time) |

### Voice-Specific Instructions

Voice mode works best with shorter, more conversational responses:

```javascript
const instructions = FINN_INSTRUCTIONS + `

## Voice Mode Instructions
When in voice mode:
- Use the speak tool to talk to other players
- Keep responses natural and conversational
- React vocally to game events ("Nice!" "Go fish!")
- Don't speak too much - short phrases work best
`;
```

### Common Pitfalls

- **Not enabling voice at creation time**: Voice must be enabled in metadata when creating the agent
- **Using wrong router**: `llm-judge` adds latency that can feel sluggish in voice
- **Long responses**: Text instructions might produce long responses that sound unnatural when spoken
- **Invalid voiceId**: If you provide an invalid ElevenLabs voiceId, TTS will fail silently

## Phase 8: Voice Lifecycle

Manage Daily.co rooms, handle voice events, and control the agent's presence in voice calls.
### Lifecycle Overview A voice-enabled agent goes through several states: ``` Agent Created (voice.enabled: true) ↓ Connected (Client WebSocket active) ↓ [join-daily-room] → ↓ In Voice Room (Can speak & hear) ↓ [leave-daily-room] → ↓ Left Room (Can rejoin) ``` ### Joining a Daily.co Room Send a `join-daily-room` control event: ```javascript socket.emit('message', { type: 'join-daily-room', roomUrl: 'https://your-domain.daily.co/room-name', }); ``` **Join Flow:** 1. Your app sends join-daily-room with Daily.co room URL 2. HUMA validates voice is enabled (returns error if not) 3. HUMA spawns Voice Service agent 4. Voice Service connects to Daily.co (agent appears in room) 5. HUMA emits voice-status: joined **Handling Join Response:** ```javascript socket.on('event', (event) => { if (event.type === 'voice-status') { switch (event.status) { case 'joined': console.log('Agent joined room:', event.roomUrl); break; case 'left': console.log('Agent left room'); break; case 'error': console.error('Voice error:', event.error); break; } } }); ``` ### Leaving a Room Send a `leave-daily-room` control event: ```javascript socket.emit('message', { type: 'leave-daily-room', }); ``` **Note:** If your client WebSocket disconnects, HUMA automatically removes the agent from any voice room. ### Voice Events While in a voice room, HUMA processes several event types: #### Transcript Events (Voice → Agent) When someone speaks, HUMA receives transcripts: ```javascript // Transcript event forwarded to your app { type: 'transcript', text: 'Hey Finn, do you have any sevens?', isFinal: true, speaker: 'Alice' } // Internally converted to HUMA-0.1 event: { name: 'voice-transcript', description: 'Alice said: "Hey Finn, do you have any sevens?"', context: { source: 'voice', participant: 'Alice', text: 'Hey Finn, do you have any sevens?' 
  }
}
```

#### Speak Events (Agent → Voice)

When the agent decides to speak:

```javascript
{ type: 'speak-status', status: 'started', commandId: 'sp_abc123' }
// TTS audio is now playing

{ type: 'speak-status', status: 'finished', commandId: 'sp_abc123' }
// Speech completed normally

// OR
{ type: 'speak-status', status: 'interrupted', commandId: 'sp_abc123' }
// User interrupted the agent

{ type: 'speak-status', status: 'failed', commandId: 'sp_abc123', error: '...' }
// TTS failed
```

#### Participant Events (Room Events)

When participants join or leave:

```javascript
// Participant joined
{
  name: 'voice-participant-joined',
  description: 'Alice joined the voice room',
  context: { source: 'voice', participant: 'Alice' }
}

// Participant left
{
  name: 'voice-participant-left',
  description: 'Alice left the voice room',
  context: { source: 'voice', participant: 'Alice', reason: 'left call' }
}
```

#### Interruption Events (Barge-in)

When a user interrupts the agent while it is speaking:

```javascript
{
  name: 'voice-interrupted',
  description: 'Alice interrupted you while you were saying: "Let me think about..."',
  context: {
    source: 'voice',
    interruptedBy: 'Alice',
    spokenText: 'Let me think about...'
  }
}
```

### Complete Voice Flow Example

```javascript
import { io } from 'socket.io-client';

// 1. Connect to HUMA (agent already has voice.enabled: true)
const socket = io('wss://api.humalike.tech', {
  query: { agentId: 'agent_123', apiKey: 'ak_your_key' },
  transports: ['websocket'],
});

// 2. Listen for all events
socket.on('event', (event) => {
  switch (event.type) {
    case 'voice-status':
      handleVoiceStatus(event);
      break;
    case 'transcript':
      handleTranscript(event);
      break;
    case 'speak-status':
      handleSpeakStatus(event);
      break;
    case 'tool-call':
      handleToolCall(event);
      break;
  }
});

// 3. Join voice room
function joinVoiceRoom(roomUrl) {
  socket.emit('message', {
    type: 'join-daily-room',
    roomUrl: roomUrl,
  });
}

// 4. Handle voice status
function handleVoiceStatus(event) {
  if (event.status === 'joined') {
    console.log('Agent is now in voice room');
    showVoiceUI();
  } else if (event.status === 'left') {
    hideVoiceUI();
  } else if (event.status === 'error') {
    showError(event.error);
  }
}

// 5. Leave voice room
function leaveVoiceRoom() {
  socket.emit('message', {
    type: 'leave-daily-room',
  });
}
```

### Agent Lifecycle in Voice Calls

**When Agent Joins:**
- Agent appears in the Daily.co room participant list
- `speak` tool becomes available
- Agent starts receiving transcript events
- Participant join/leave events trigger agent reactions

**During Voice Call:**
- Agent processes transcripts through router → action agent
- Agent can call `speak` to respond vocally
- Agent can also call other tools (game actions, etc.)
- Interruptions cancel current speech and notify the agent

**When Agent Leaves:**
- Agent disappears from the Daily.co room
- `speak` tool is no longer available
- Pending speak calls are canceled
- Client WebSocket remains connected (can rejoin)

**When Client Disconnects:**
- Agent automatically leaves the voice room
- Voice Service agent is terminated
- All resources are cleaned up

### Error Handling

| Error | Cause | Response |
|-------|-------|----------|
| Voice not enabled | Trying to join without voice config | Enable voice in agent metadata |
| Already in room | Trying to join while in a room | Leave first, then join the new room |
| Voice Service unavailable | Service cannot be reached | Retry or notify the user |
| Unexpected disconnect | Voice Service disconnected | Attempt to rejoin |

### Common Pitfalls

- **Not waiting for `voice-status: joined`**: The `join-daily-room` event returns immediately. Wait for the status event.
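The "not waiting for joined" pitfall can be guarded against with a small promise wrapper. A minimal sketch, assuming a Socket.IO-style socket with `on`/`off` and the `voice-status` event shape shown earlier; `waitForVoiceStatus` is an illustrative helper, not part of the HUMA SDK:

```javascript
// Resolve once the expected voice-status arrives, reject on a
// different status or a timeout. Hypothetical helper for this guide.
function waitForVoiceStatus(socket, wanted, timeoutMs = 10000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      socket.off('event', onEvent);
      reject(new Error(`Timed out waiting for voice-status: ${wanted}`));
    }, timeoutMs);

    function onEvent(event) {
      if (event.type !== 'voice-status') return; // ignore unrelated events
      clearTimeout(timer);
      socket.off('event', onEvent); // one-shot listener
      if (event.status === wanted) resolve(event);
      else reject(new Error(event.error || `Unexpected voice-status: ${event.status}`));
    }

    socket.on('event', onEvent);
  });
}
```

Emitting `join-daily-room` and then `await waitForVoiceStatus(socket, 'joined')` ensures your voice UI only appears once the agent is actually in the room.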
- **Invalid Daily.co room URL**: Ensure the room exists and is accessible
- **Not handling speak errors**: Handle `failed` and `interrupted` speak events
- **Ignoring interim transcripts**: HUMA only processes final transcripts (`isFinal: true`)

## Implementation Summary

Once all phases are complete, you'll have:

1. **Personality** - WHO the agent is (character, speech patterns)
2. **Instructions** - WHAT the agent does (rules, constraints)
3. **State Design** - Context the agent receives
4. **Event System** - How state changes are communicated
5. **Tool Definitions** - Actions the agent can take
6. **Router Selection** - How the agent decides when to act
7. **Voice Config** - Enabling voice and selecting a voice ID
8. **Voice Lifecycle** - Daily.co room management

## Complete Code Template (With Voice)

```typescript
import { io } from 'socket.io-client';

const API = 'https://api.humalike.tech';
const API_KEY = 'ak_your_api_key';

// 1. Define Agent Metadata WITH VOICE ENABLED
const AGENT_METADATA = {
  className: 'Your Agent',
  personality: `[Your personality from Phase 1]`,
  instructions: `
[Your instructions from Phase 2]

## Voice Mode Instructions
When in voice mode:
- Use the speak tool to talk
- Keep responses short and natural
- React vocally to events
`,
  tools: [
    {
      name: 'your_tool',
      description: 'Description',
      parameters: [
        { name: 'param1', type: 'string', description: 'Description', required: true }
      ]
    }
  ],
  routerType: 'voice-game',
  // CRITICAL: Enable voice
  voice: {
    enabled: true,
    voiceId: 'EXAVITQu4vr4xnSDxMaL' // ElevenLabs voice ID
  }
};

// 2. Create Agent Instance
async function createAgent() {
  const response = await fetch(`${API}/api/agents`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'X-API-Key': API_KEY
    },
    body: JSON.stringify({
      name: 'Voice Agent',
      agentType: 'HUMA-0.1',
      metadata: AGENT_METADATA
    })
  });
  return response.json();
}

// 3. Connect WebSocket with voice event handling
function connectToAgent(agentId) {
  const socket = io(API, {
    query: { agentId, apiKey: API_KEY },
    transports: ['websocket']
  });

  socket.on('event', (data) => {
    switch (data.type) {
      case 'status':
        console.log('Agent status:', data.status);
        break;
      case 'tool-call':
        handleToolCall(socket, data);
        break;
      // Voice events
      case 'voice-status':
        if (data.status === 'joined') console.log('Agent joined voice room');
        if (data.status === 'left') console.log('Agent left voice room');
        if (data.status === 'error') console.error('Voice error:', data.error);
        break;
      case 'transcript':
        if (data.isFinal) {
          console.log(`${data.speaker} said: ${data.text}`);
        }
        break;
      case 'speak-status':
        console.log(`Agent ${data.status} speaking`);
        break;
    }
  });

  return socket;
}

// 4. Join Voice Room
function joinVoiceRoom(socket, roomUrl) {
  socket.emit('message', { type: 'join-daily-room', roomUrl });
}

// 5. Leave Voice Room
function leaveVoiceRoom(socket) {
  socket.emit('message', { type: 'leave-daily-room' });
}
```

---

# APPENDIX: Full API Documentation

## A. HUMA-0.1 Agent API Overview

HUMA-0.1 is an event-driven agent architecture that enables real-time AI interactions through WebSocket connections.

**Key Concepts:**
- **Client-Defined Agents**: Define personality, instructions, and tools in your client code
- **Custom Tools**: Define any tools you need - executed by your client, results sent back
- **Flexible Context**: Send any JSON context - chat history, game state, user data, etc.
- **Event-Driven**: Real-time bidirectional communication via WebSocket

## B. Authentication

All API endpoints require API key authentication.

**Creating API Keys:**

```
POST /api/users/{userId}/api-keys
{ "name": "Production Key" }

Response:
{
  "id": "clx456...",
  "key": "ak_Xk9mP2qR4tV6wY8zA1bC3dE5fG7hJ9kL", // Save this!
  "keyPreview": "...9kL",
  "name": "Production Key"
}
```

**Using API Keys:**

```typescript
// HTTP Requests
fetch('/api/agents', {
  headers: {
    'Content-Type': 'application/json',
    'X-API-Key': 'ak_your_api_key'
  }
});

// WebSocket Connection
const socket = io('wss://api.humalike.tech', {
  query: { agentId: 'your-agent-id', apiKey: 'ak_your_api_key' }
});
```

## C. Agent Lifecycle

Every agent goes through distinct phases:

**CREATED → CONNECTED → ACTIVE → ENDED**

- **Created**: Agent exists in the database, not active
- **Connected**: WebSocket connection established
- **Active**: Processing events (IDLE → THINKING → ACTING → IDLE)
- **Ended**: Manual deletion, TTL expiration, or system cleanup

## D. REST API Reference

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | /api/agents | List all agents |
| POST | /api/agents | Create new agent |
| GET | /api/agents/{id} | Get agent details |
| PUT | /api/agents/{id}/state | Update agent state |
| DELETE | /api/agents/{id} | Delete agent |

## E. WebSocket Events

### Client → Server

All events are sent via the `message` channel with type `huma-0.1-event`:

```typescript
// Context Update
socket.emit('message', {
  type: 'huma-0.1-event',
  content: {
    name: 'event-name',
    context: { /* full state */ },
    description: 'What happened'
  }
});

// Tool Result
socket.emit('message', {
  type: 'huma-0.1-event',
  content: {
    type: 'tool-result',
    toolCallId: 'tc_xxx',
    status: 'completed',
    success: true,
    result: 'Result data'
  }
});
```

### Server → Client

| Event | Description | Fields |
|-------|-------------|--------|
| status | Agent processing state | status: 'idle' \| 'thinking' |
| tool-call | Execute a tool | toolCallId, toolName, arguments |
| cancel-tool-call | Cancel pending tool | toolCallId, reason? |
| error | Error occurred | message, code? |

---

## F. Voice API Reference

### VoiceConfig Schema

```typescript
interface VoiceConfig {
  enabled: boolean;   // Must be true for voice
  voiceId?: string;   // ElevenLabs voice ID (optional)
}
```

### Voice Control Events (Client → HUMA)

**Join Room:**

```typescript
socket.emit('message', {
  type: 'join-daily-room',
  roomUrl: 'https://your-domain.daily.co/room-name'
});
```

**Leave Room:**

```typescript
socket.emit('message', { type: 'leave-daily-room' });
```

### Voice Status Events (HUMA → Client)

| Event | Description | Fields |
|-------|-------------|--------|
| voice-status | Agent joined/left/error | status, roomUrl, error |
| transcript | Speech-to-text | speaker, text, isFinal |
| speak-status | Agent speaking state | status: started/finished/interrupted/failed |
| participant-joined | Someone joined | participantId |
| participant-left | Someone left | participantId |

### Best Practices

- Always wait for `voice-status: joined` before assuming the agent is in the call
- Use `isFinal: true` on transcripts for complete speech segments
- Handle `speak-status: interrupted` for natural conversation flow
- Send `leave-daily-room` before disconnecting (auto-cleanup happens on disconnect)
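The last best practice can be folded into a single teardown helper so a clean exit is never forgotten. A sketch, assuming a connected Socket.IO-style socket with `emit` and `disconnect`; `shutdown` is an illustrative name for this guide, not a HUMA API:

```javascript
// Gracefully tear down a voice-enabled agent session.
function shutdown(socket) {
  // 1. Take the agent out of the voice room first, so other
  //    participants see a clean exit rather than an abrupt drop.
  socket.emit('message', { type: 'leave-daily-room' });

  // 2. Then close the WebSocket. Per the docs above, HUMA also
  //    cleans up voice resources automatically on disconnect,
  //    so this ordering is a courtesy, not a strict requirement.
  socket.disconnect();
}
```

Calling `shutdown(socket)` from your app's exit path (e.g. a "Leave call" button handler) keeps the leave-then-disconnect ordering in one place.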