Integrating a Custom LLM with Millis AI Voice Agent
This guide describes how to integrate your own LLM chatbot with a Millis AI voice agent. By connecting your custom LLM, you can power the voice agent with your chatbot’s capabilities, providing a seamless voice interaction experience based on your model’s responses.
Prerequisites
- Create your Voice Agent on the Playground
- Set up a WebSocket server on your end.
Set Up Your WebSocket Endpoint:
- When an outbound or inbound call is initiated with your voice agent, the Millis AI server will establish a connection to your specified WebSocket URL.
- Your endpoint should be capable of both receiving messages from and sending messages to the Millis AI server; a sample server sketch appears at the end of this section.
- Here's how the interaction flows after the connection is established:
1. Initiate a call:
The Millis AI server sends a `start_call` event to tell your server when the conversation starts.
2. Listen to the user's message:
Millis AI streams the user's spoken message, including the full conversation transcript, to your LLM.
3. Generate LLM Responses:
Your LLM processes the transcript and streams back the response. Indicate the end of a message stream with `end_of_stream`. Two optional fields control how the agent speaks the response:
- `flush`: Set this to `true` to instruct the agent to immediately generate audio based on the current response. If `false`, the agent will buffer the response and generate audio only when it receives a complete sentence.
- `pause`: Set this to a number of milliseconds to instruct the agent to pause for that long after saying the response, before saying the next response.
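For illustration, a response streamed in two chunks might look like the following sketch. The `type`/`data` envelope and the `llm_response` type name are assumptions made for this example; `stream_id`, `end_of_stream`, `flush`, and `pause` are the fields described above.

```python
# First chunk: a complete clause, flushed so the agent speaks right away.
# (The "type"/"data" envelope is an assumption for illustration.)
chunk_1 = {
    "type": "llm_response",
    "data": {
        "stream_id": 7,          # from the original request
        "content": "Sure, I can help with that.",
        "end_of_stream": False,  # more chunks are coming
        "flush": True,           # synthesize audio for this text immediately
    },
}

# Final chunk: marks the end of the message and asks for a short pause.
chunk_2 = {
    "type": "llm_response",
    "data": {
        "stream_id": 7,
        "content": "What date works for you?",
        "end_of_stream": True,   # no more chunks for this response
        "pause": 500,            # wait 500 ms before the next response
    },
}
```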
- When your LLM generates a response, attach the `stream_id` from the original request so that we can keep track of which response corresponds to which request.
- For the first message that your server sends after receiving the `start_call` event, use the `stream_id` from the `start_call` event.
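Below is a minimal sketch of such an endpoint, using Python with the `websockets` library. Only the event names and fields documented on this page come from the protocol; the message envelope (a top-level `type` plus a `data` object), the `llm_response` type name, and the assumption that a `stream_id` accompanies each transcript are illustrative guesses, not the official sample code.

```python
import asyncio
import json

import websockets  # pip install websockets


async def send_response(ws, stream_id, text):
    # Stream a response chunk back to Millis AI. The envelope and the
    # "llm_response" type are assumptions; stream_id, end_of_stream,
    # flush, and pause are the fields documented above.
    await ws.send(json.dumps({
        "type": "llm_response",
        "data": {
            "stream_id": stream_id,  # ties this response to its request
            "content": text,
            "end_of_stream": True,   # last chunk of this message
            "flush": True,           # synthesize audio immediately
        },
    }))


def run_llm(transcript):
    # Placeholder: call your own model here and return its reply.
    return "Thanks, let me look into that."


async def handle_millis(ws):
    async for raw in ws:
        message = json.loads(raw)
        data = message.get("data", {})

        if message.get("type") == "start_call":
            # The first message after start_call reuses the event's stream_id.
            await send_response(ws, data["stream_id"], "Hi! How can I help you?")

        elif message.get("type") == "partial_transcript":
            # Respond only once the transcript is final. Assumes a stream_id
            # accompanies the transcript so the response can reference it.
            if data.get("is_final"):
                reply = run_llm(data["transcript"])
                await send_response(ws, data.get("stream_id"), reply)

        elif message.get("type") == "interrupt":
            # The user interrupted playback: cancel any in-flight
            # generation for this stream here.
            pass


async def main():
    # Millis AI connects to this endpoint when a call starts.
    async with websockets.serve(handle_millis, "0.0.0.0", 8080):
        await asyncio.Future()  # run forever


if __name__ == "__main__":
    asyncio.run(main())
```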
Handle advanced interaction:
Millis AI manages the conversation flow, including interruption detection and end-of-turn signals. You will be notified of these events:
`partial_transcript`
Description: Sent to provide a partial transcript of the conversation. The transcript can be either partial or final.
Parameters:
- `session_id`: The unique identifier for the session.
- `transcript`: The partial or complete transcript text.
- `is_final`: Boolean indicating whether the transcript is final.
Message Structure:
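A hypothetical example; only the parameters above are documented, and the `type`/`data` envelope is an assumption:

```python
# Hypothetical partial_transcript message; the envelope is assumed.
partial_transcript_example = {
    "type": "partial_transcript",
    "data": {
        "session_id": "abc123",  # placeholder session id
        "transcript": "I'd like to book a table for two",
        "is_final": False,       # True once the utterance is complete
    },
}
```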
`playback_finished`
Description: Sent when playback of the agent's audio stream has finished.
Parameters:
- `session_id`: The unique identifier for the session.
- `stream_id`: The unique identifier for the stream.
Message Structure:
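Again hypothetical, with the same assumed envelope:

```python
# Hypothetical playback_finished message; the envelope is assumed.
playback_finished_example = {
    "type": "playback_finished",
    "data": {
        "session_id": "abc123",  # placeholder session id
        "stream_id": 7,          # the stream whose audio just finished playing
    },
}
```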
`interrupt`
Description: Sent when the user interrupts the agent's stream.
Parameters:
- `stream_id`: The unique identifier for the stream.
Message Structure:
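A hypothetical example with the same assumed envelope; on receiving it, stop streaming chunks for that `stream_id`:

```python
# Hypothetical interrupt message; the envelope is assumed.
interrupt_example = {
    "type": "interrupt",
    "data": {
        "stream_id": 7,  # stop sending response chunks for this stream
    },
}
```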
Connect your Voice Agent to your Custom LLM:
In your voice agent’s configuration on the Millis AI platform, specify your WebSocket endpoint.