Using Millis Platform via WebSocket to build voice agents on desktop and mobile
This tutorial shows how to integrate the Millis AI platform directly via WebSocket to build voice agents for desktop or mobile apps. Your app captures audio natively, sends it to Millis over the WebSocket, and receives voice responses in real time.
Begin by establishing a connection with the Millis AI WebSocket endpoint, for example from JavaScript.
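The connection step could look roughly like the sketch below. The endpoint URL shown is a placeholder, not the real Millis address, and `connectToMillis` is a hypothetical helper name. Setting `binaryType` to `"arraybuffer"` is a standard WebSocket option that makes binary frames arrive as ArrayBuffers, which matches how audio is described later in this tutorial.

```javascript
// Placeholder only: substitute the WebSocket URL given in your Millis account or docs.
const MILLIS_WS_URL = "wss://example.invalid/millis";

/**
 * Opens a WebSocket and asks for binary frames to be delivered as ArrayBuffers.
 * WebSocketImpl is injectable so the same code runs in a browser, with the
 * `ws` package in Node, or against a test double.
 */
function connectToMillis(url, WebSocketImpl = globalThis.WebSocket) {
  if (typeof WebSocketImpl !== "function") {
    throw new Error("No WebSocket implementation available in this environment");
  }
  const socket = new WebSocketImpl(url);
  socket.binaryType = "arraybuffer";
  return socket;
}

// Usage (not executed here because it opens a real connection):
//   const socket = connectToMillis(MILLIS_WS_URL);
//   socket.onopen = () => console.log("connected");
```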
Once connected, send an initiate message to start the interaction.
Millis will respond with the message {"method": "onready"}, indicating readiness.
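The handshake might be wired up as in the following sketch. Only the {"method": "onready"} reply comes from this tutorial. The "initiate" method name, the `data` wrapper, and the contents of `sessionConfig` are assumptions made for illustration, so check the exact payload against the Millis documentation before relying on it.

```javascript
// Builds the initiate message as a JSON string.
// ASSUMPTION: the "initiate" method name and the `data` field are illustrative;
// `sessionConfig` stands in for whatever fields your agent actually requires.
function buildInitiateMessage(sessionConfig) {
  return JSON.stringify({ method: "initiate", data: sessionConfig });
}

// Returns true only for a text frame that parses to {"method": "onready"}.
function isReadyMessage(data) {
  if (typeof data !== "string") return false;
  try {
    return JSON.parse(data).method === "onready";
  } catch {
    return false; // not JSON, so it cannot be the readiness message
  }
}

// Sends the initiate message when the socket opens and calls onReady
// each time a readiness message is received.
function startSession(socket, sessionConfig, onReady) {
  socket.addEventListener("open", () => {
    socket.send(buildInitiateMessage(sessionConfig));
  });
  socket.addEventListener("message", (event) => {
    if (isReadyMessage(event.data)) onReady();
  });
}
```

Waiting for the readiness message before streaming audio avoids sending packets that the server is not yet prepared to accept.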
Capture audio on your device and send it to Millis as binary data. Make sure each packet is a Uint8Array (a byte view over an ArrayBuffer).
Note: Audio packets should be in PCM format, 16000 Hz sample rate, and mono (1 channel).
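Capture APIs such as Web Audio typically hand you floating-point samples in the range -1 to 1, so a conversion step is usually needed before sending. The sketch below assumes that "PCM" here means 16-bit signed little-endian samples, which this tutorial does not state explicitly, and that the input is already 16000 Hz mono. The function names are hypothetical.

```javascript
// Converts float samples in [-1, 1] to 16-bit little-endian PCM bytes.
// ASSUMPTION: 16-bit signed little-endian is the expected PCM encoding.
function floatTo16BitPcm(samples) {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input saturates instead of wrapping around.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, Math.round(scaled), true); // true = little-endian
  }
  return bytes;
}

// Sends one chunk of captured audio as a Uint8Array packet.
function sendAudioChunk(socket, samples) {
  socket.send(floatTo16BitPcm(samples));
}
```

If your capture device runs at a different sample rate (44100 Hz and 48000 Hz are common), resample to 16000 Hz before this step.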
Millis will send audio responses as ArrayBuffers with the same format and sample rate. You need to buffer and play these on your side.
ArrayBuffer data will be the audio packets, while string data indicates normal events that you need to process accordingly.
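One way to separate the two kinds of incoming data is sketched below. It assumes the socket's `binaryType` is `"arraybuffer"`, that string frames contain JSON (as the {"method": ...} messages in this tutorial suggest), and that audio is 16-bit little-endian PCM. `routeMessage`, `pcm16ToFloat32`, and the handler names are hypothetical.

```javascript
// Decodes 16-bit little-endian PCM bytes into floats in [-1, 1) for playback.
// ASSUMPTION: the audio is 16-bit signed little-endian PCM.
function pcm16ToFloat32(buffer) {
  const view = new DataView(buffer);
  const out = new Float32Array(Math.floor(buffer.byteLength / 2));
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 0x8000;
  }
  return out;
}

// Routes one WebSocket frame: binary frames are audio, string frames are events.
function routeMessage(data, handlers) {
  if (data instanceof ArrayBuffer) {
    handlers.onAudio(pcm16ToFloat32(data));
    return;
  }
  if (typeof data === "string") {
    let event;
    try {
      event = JSON.parse(data);
    } catch {
      return; // ignore text frames that are not valid JSON
    }
    handlers.onEvent(event);
  }
}

// Usage: socket.onmessage = (e) => routeMessage(e.data, { onAudio, onEvent });
```

A typical `onAudio` handler would append each decoded chunk to a playback queue so that audio plays continuously even when packets arrive in bursts.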
Send a {"method": "ping"} message every 1000 packets to keep the connection alive.
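The keep-alive rule can be folded into the send path, as in this sketch. It reads "every 1000 packets" as every 1000 audio packets sent by the client, which is an interpretation rather than something stated above. `createAudioSender` is a hypothetical helper.

```javascript
const PING_INTERVAL_PACKETS = 1000;

// Returns a function that sends audio packets and emits a ping message
// after every `pingEvery` packets.
function createAudioSender(socket, pingEvery = PING_INTERVAL_PACKETS) {
  let packetsSincePing = 0;
  return function sendPacket(packet) {
    socket.send(packet);
    packetsSincePing += 1;
    if (packetsSincePing >= pingEvery) {
      socket.send(JSON.stringify({ method: "ping" }));
      packetsSincePing = 0;
    }
  };
}
```

Counting packets rather than using a timer keeps the ping rate tied to the audio stream, so no separate interval has to be started or cleared.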
Millis may send various events to manage the session and interaction. Here is the logic behind each message:
Example:
Simply close the WebSocket connection to stop the conversation.
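Ending the session could look like the sketch below. It uses only the standard WebSocket `readyState` values and `close(code, reason)` call; the reason string and the `stopConversation` name are arbitrary choices for illustration.

```javascript
// Closes the socket if it is still connecting (0) or open (1).
// Returns true when a close was requested, false when there was nothing to do.
function stopConversation(socket) {
  const CONNECTING = 0;
  const OPEN = 1;
  if (socket.readyState === CONNECTING || socket.readyState === OPEN) {
    socket.close(1000, "conversation ended"); // 1000 = normal closure
    return true;
  }
  return false;
}
```

Remember to also stop microphone capture and any pending playback when the conversation ends, since closing the socket does not release those resources.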