Basic

This guide describes how to integrate your own LLM chatbot with a Millis AI voice agent. By connecting your custom LLM, you can power the voice agent with your chatbot’s capabilities, providing a seamless voice interaction experience based on your model’s responses.

Prerequisites

  • Create your Voice Agent on the Playground
  • Set up a WebSocket server on your end.

Set Up Your WebSocket Endpoint:

  • When an outbound or inbound call is initiated with your voice agent, the Millis AI server will establish a connection to your specified WebSocket URL.
  • Your endpoint should be capable of both receiving messages from and sending messages to the Millis AI server; a minimal sample is sketched below.
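Here is a minimal sketch of such an endpoint in Python, using the websockets package. The port, handler structure, and print statements are illustrative, not part of the Millis AI API:

import asyncio
import json

import websockets  # pip install websockets


async def handler(websocket):
    # Dispatch each incoming Millis AI event on its "type" field.
    async for raw in websocket:
        msg = json.loads(raw)
        msg_type = msg.get("type")
        if msg_type == "start_call":
            print("call started:", msg["data"]["session_id"])
        elif msg_type == "stream_request":
            pass  # generate and stream a reply here (see steps 2-3)
        elif msg_type == "interrupt":
            pass  # stop any in-flight generation (see "interrupt" below)


async def main():
    # Serve the endpoint that you will register with your voice agent.
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever


if __name__ == "__main__":
    asyncio.run(main())

Here’s how the interaction flows once the connection is established: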

1. Initiate a call:

The Millis AI server sends a start_call event to tell your server that the conversation has started.

{
  "type": "start_call",
  "data": {
    "stream_id": <millis_stream_id>,
    "agent_id": <millis_agent_id>,
    "call_sid": <twilio_call_sid>, // optional, None if conversation is not via phone call
    "session_id": <millis_session_id>, // unique id of the call session
    "metadata": {your own metadata}, // The metadata that you add in msClient.start function
    "voip": {
      "from": <receiver's phone number>,
      "to": <caller's phone number>
    } // optional, None if conversation is not via phone call
  }
}
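A typical first step is to send a greeting back using this event's stream_id (as noted in step 3, the first message after start_call reuses that id). A sketch, with an illustrative handler name and greeting text:

import json


async def on_start_call(websocket, data):
    # The first response after start_call reuses the start_call stream_id.
    greeting = {
        "type": "stream_response",
        "data": {
            "stream_id": data["stream_id"],
            "content": "Hi! How can I help you today?",
            "end_of_stream": True,
        },
    }
    await websocket.send(json.dumps(greeting))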

2. Listen to the user’s message:

Millis AI transcribes the user’s spoken message and sends it to your server as a stream_request event, along with the full conversation transcript.

{
  "type": "stream_request",
  "data": {
    "stream_id": <stream_id>,
    "transcript": [<chat_history>]
  }
}
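On your server, this event might be handled like the following sketch (the handler name and logging are illustrative):

async def on_stream_request(websocket, data):
    # The transcript carries the full chat history for this turn;
    # the newest user message is the last entry.
    stream_id = data["stream_id"]
    history = data["transcript"]
    print(f"stream {stream_id}: {len(history)} messages in history")
    # Feed `history` to your LLM and stream the reply back with the
    # same stream_id, as shown in step 3.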

3. Generate LLM Responses:

Your LLM processes the transcript and streams back the response. Indicate the end of a message stream with end_of_stream.

{
  "type": "stream_response",
  "data": {
    "stream_id": request['stream_id'],
    "content": "text",
    "flush": true/false, // optional
    "pause": <in ms>, // optional
    "end_of_stream": true/false
  }
}
  • flush: Set this to true to instruct the agent to immediately generate audio based on the current response. If false, the agent will buffer the response and generate audio only when it receives a complete sentence.
  • pause: Set this to a duration in milliseconds to make the agent pause for that long after speaking this response, before speaking the next one.
  • When your LLM generates a response, attach the stream_id from the original request so that we can keep track of which response corresponds to which request.
  • For the first message that your server sends after receiving the start_call event, use the stream_id from the start_call event.
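Putting the rules above together, a helper for streaming a model's text chunks back might look like the sketch below. The name send_llm_response and the trailing empty frame are assumptions, the latter presuming the agent accepts a bare end-of-stream marker; you could instead set end_of_stream on the final content chunk:

import json


async def send_llm_response(websocket, stream_id, chunks):
    # Relay each text chunk as the model produces it; the agent
    # buffers and speaks complete sentences unless flush is set.
    for text in chunks:
        await websocket.send(json.dumps({
            "type": "stream_response",
            "data": {
                "stream_id": stream_id,  # echo the id from the request
                "content": text,
                "end_of_stream": False,
            },
        }))
    # Assumed pattern: a final empty frame marks the end of the turn.
    await websocket.send(json.dumps({
        "type": "stream_response",
        "data": {"stream_id": stream_id, "content": "", "end_of_stream": True},
    }))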

Handle advanced interactions:

Millis AI manages the conversation flow, including interruption detection and end-of-turn signals. You will be notified of these events:

partial_transcript

Description: Sent to provide a running transcript of the conversation. Each message contains either a partial (interim) transcript or a final one.

Message Structure:

{
  "type": "partial_transcript",
  "data": {
    "session_id": "string",
    "transcript": "string",
    "is_final": "boolean"
  }
}

Parameters:

  • session_id: The unique identifier for the session.
  • transcript: The partial or complete transcript text.
  • is_final: Boolean indicating whether the transcript is final.
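Since interim hypotheses can still be revised, one option is to act only on finalized text; a minimal sketch:

def on_partial_transcript(data):
    # Interim hypotheses (is_final == False) may still change;
    # act only on finalized transcripts if you need stable text.
    if data["is_final"]:
        print(f"[{data['session_id']}] final: {data['transcript']}")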

playback_finished

Description: Sent when playback of the agent’s audio stream has finished.

Message Structure:

{
  "type": "playback_finished",
  "data": {
    "session_id": "string",
    "stream_id": "integer"
  }
}

Parameters:

  • session_id: The unique identifier for the session.
  • stream_id: The unique identifier for the stream.
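One possible use of this event is to release per-stream state once the agent has finished speaking; a sketch:

def on_playback_finished(data):
    # The agent has finished speaking the given stream; this is a
    # natural point to clean up any state held for that stream.
    print(f"[{data['session_id']}] playback done for stream {data['stream_id']}")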

interrupt

Description: Sent when the user interrupts the agent’s audio stream.

Message Structure:

{
  "type": "interrupt",
  "stream_id": "integer"
}

Parameters:

  • stream_id: The unique identifier for the stream.
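A common way to handle this is to cancel any in-flight LLM generation for the interrupted stream. A sketch, assuming your server tracks generation tasks per stream_id (the inflight registry below is a hypothetical helper, not part of the Millis AI API):

import asyncio

# Hypothetical registry of in-flight generation tasks, keyed by stream_id.
inflight: dict[int, asyncio.Task] = {}


def on_interrupt(msg):
    # The user has talked over the agent; cancel generation for this
    # stream so no further stream_response frames are sent for it.
    task = inflight.pop(msg["stream_id"], None)
    if task is not None:
        task.cancel()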

Connect your Voice Agent to your Custom LLM:

In your voice agent’s configuration on the Millis AI platform, specify your WebSocket endpoint.