Custom LLM
Integrating a Custom LLM with Millis AI Voice Agent
This guide describes how to integrate your own LLM chatbot with a Millis AI voice agent. By connecting your custom LLM, you can power the voice agent with your chatbot’s capabilities, providing a seamless voice interaction experience based on your model’s responses.
Prerequisites
- Create your Voice Agent on the Playground.
- Set up a WebSocket server on your end.
Set Up Your WebSocket Endpoint:
- When an outbound or inbound call is initiated with your voice agent, the Millis AI server will establish a connection to your specified WebSocket URL.
- Your endpoint should be capable of both receiving messages from and sending messages to the Millis AI server (a minimal server sketch is shown after the walkthrough below).
- Here's how the interaction flows once the connection is established:
1. Initiate a call:
The Millis AI server sends a start_call event to tell your server when the conversation starts.
```json
{
  "type": "start_call",
  "data": {
    "stream_id": <millis_stream_id>,
    "agent_id": <millis_agent_id>,
    "call_sid": <twilio_call_sid>,     // optional, None if the conversation is not via phone call
    "session_id": <millis_session_id>, // unique id of the call session
    "metadata": {your own metadata},   // the metadata you pass to msClient.start
    "voip": {
      "from": <caller's phone number>,
      "to": <receiver's phone number>
    } // optional, None if the conversation is not via phone call
  }
}
```
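As a sketch, a start_call handler might remember the stream_id and send an opening line right away. The stream_response fields used here are described in step 3; the handler name and greeting text are illustrative, not part of the Millis AI API:

```python
import json

# Illustrative start_call handler: remember the stream_id so the first agent
# message can reuse it, then send a greeting.
async def handle_start_call(websocket, message: dict) -> None:
    data = message["data"]
    stream_id = data["stream_id"]           # reuse this for the first response
    metadata = data.get("metadata") or {}   # whatever you passed to msClient.start
    await websocket.send(json.dumps({
        "type": "stream_response",
        "data": {
            "stream_id": stream_id,
            "content": "Hi! How can I help you today?",
            "end_of_stream": True,
        },
    }))
```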
2. Listen to the user's message:
Millis AI streams the user's spoken message, including the full conversation transcript, to your LLM.
```json
{
  "type": "stream_request",
  "data": {
    "stream_id": <stream_id>,
    "transcript": [<chat_history>]
  }
}
```
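The exact shape of each chat-history entry isn't pinned down here. Assuming OpenAI-style {"role": ..., "content": ...} objects (an assumption to verify against the payloads you actually receive), turning the transcript into an LLM prompt could look like this:

```python
# Assumes each transcript entry is an OpenAI-style {"role", "content"} dict --
# verify this against real stream_request payloads before relying on it.
def to_llm_messages(transcript: list[dict]) -> list[dict]:
    system = {"role": "system", "content": "You are a concise voice assistant."}
    return [system] + list(transcript)
```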
3. Generate LLM responses:
Your LLM processes the transcript and streams back its response. Indicate the end of a message stream with end_of_stream.
```json
{
  "type": "stream_response",
  "data": {
    "stream_id": request['stream_id'],
    "content": "text",
    "flush": true/false,        // optional
    "pause": <in ms>,           // optional
    "end_of_stream": true/false
  }
}
```
- flush: Set this to true to instruct the agent to immediately generate audio for the current response. If false, the agent buffers the response and generates audio only once it has received a complete sentence.
- pause: Set this to a number of milliseconds to instruct the agent to pause for that long after saying this response, before saying the next one.
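For example, a short filler line could be flushed immediately and followed by a beat of silence before the next response. This is a hedged sketch; the helper name and values are made up for illustration:

```python
import json

async def send_filler(websocket, stream_id) -> None:
    # Speak a short interjection right away, then pause 500 ms before the
    # next response is played.
    await websocket.send(json.dumps({
        "type": "stream_response",
        "data": {
            "stream_id": stream_id,
            "content": "One moment.",
            "flush": True,     # synthesize now; don't wait for a complete sentence
            "pause": 500,      # in milliseconds
            "end_of_stream": False,
        },
    }))
```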
- When your LLM generates a response, attach the stream_id from the original request so that we can keep track of which response corresponds to which request.
- For the first message that your server sends after receiving the start_call event, use the stream_id from the start_call event.
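Putting the pieces above together, here is the minimal server sketch referenced earlier. It uses the websockets package (pip install websockets); generate_tokens is a placeholder for your own LLM's streaming API, and the greeting text is illustrative:

```python
import asyncio
import json
import websockets

async def generate_tokens(transcript):
    # Placeholder: swap in your LLM's streaming API.
    for token in ["Hello", " from", " your", " custom", " LLM."]:
        yield token

async def send_response(websocket, stream_id, content, end_of_stream):
    await websocket.send(json.dumps({
        "type": "stream_response",
        "data": {"stream_id": stream_id, "content": content,
                 "end_of_stream": end_of_stream},
    }))

async def handler(websocket):
    async for raw in websocket:
        message = json.loads(raw)
        if message["type"] == "start_call":
            # The first message reuses the stream_id from the start_call event.
            await send_response(websocket, message["data"]["stream_id"],
                                "Hi! How can I help?", end_of_stream=True)
        elif message["type"] == "stream_request":
            data = message["data"]
            # Stream tokens as they arrive, then close the stream.
            async for token in generate_tokens(data["transcript"]):
                await send_response(websocket, data["stream_id"], token,
                                    end_of_stream=False)
            await send_response(websocket, data["stream_id"], "",
                                end_of_stream=True)

async def main():
    async with websockets.serve(handler, "0.0.0.0", 8765):
        await asyncio.Future()  # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```

The sketch closes each stream with an empty final frame; carrying end_of_stream on the last content frame should work equally well under the schema above.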
Handle advanced interaction:
Millis AI manages the conversation flow, including interruption detection and end-of-turn signals. You will be notified of these events:
partial_transcript
Description: Sent to provide a transcript of the user's speech as it is recognized. Each update can be either partial or final.
Message Structure:
```json
{
  "type": "partial_transcript",
  "data": {
    "session_id": "string",
    "transcript": "string",
    "is_final": "boolean"
  }
}
```
Parameters:
- session_id: The unique identifier for the session.
- transcript: The partial or complete transcript text.
- is_final: Boolean indicating whether the transcript is final.
playback_finished
Description: Sent when playback of the agent's audio stream has finished.
Message Structure:
```json
{
  "type": "playback_finished",
  "data": {
    "session_id": "string",
    "stream_id": "integer"
  }
}
```
Parameters:
- session_id: The unique identifier for the session.
- stream_id: The unique identifier for the stream.
interrupt
Description: Sent when the user interrupts the agent's stream.
Message Structure:
```json
{
  "type": "interrupt",
  "stream_id": "integer"
}
```
Parameters:
- stream_id: The unique identifier for the interrupted stream.
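As a sketch of how these notifications might be consumed, a server could track in-flight generations and cancel them on interrupt. The active_generations map and handler name below are illustrative bookkeeping, not part of the Millis AI API:

```python
import asyncio

# Illustrative bookkeeping: map stream_id -> asyncio.Task for in-flight LLM
# work. Populate this wherever you launch a generation task.
active_generations: dict[int, asyncio.Task] = {}

async def handle_event(message: dict) -> None:
    if message["type"] == "partial_transcript":
        data = message["data"]
        if data["is_final"]:
            print("user said:", data["transcript"])
    elif message["type"] == "playback_finished":
        print("finished playing stream", message["data"]["stream_id"])
    elif message["type"] == "interrupt":
        # Note: stream_id arrives at the top level for this event.
        task = active_generations.pop(message["stream_id"], None)
        if task is not None:
            task.cancel()  # stop generating a response the user talked over
```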
Connect your Voice Agent to your Custom LLM:
In your voice agent’s configuration on the Millis AI platform, specify your WebSocket endpoint.