# Web Client
Build voice experiences and chatbots for the web. This frontend client brings your Jovo app to websites and web apps.
## Introduction
Jovo Clients help with two tasks:
- Record user input (speech, text, buttons) and send it as a request to the Jovo app (where the Web Platform handles the conversational logic).
- Handle the response from the Jovo app and play/show output to the user.
The Jovo Web Client can be used on websites and web apps. This is the vanilla JavaScript version for custom websites or frameworks and libraries like React. You can also find versions for Vue2 and Vue3.
## Installation
Install the client package:
```sh
$ npm install @jovotech/client-web
```
If you want to use the client in a plain HTML/JS project (find an example HTML file here), you can set it up like this:
```html
<script>
  const client = new window.JovoWebClient.Client('http://localhost:3000/webhook', {
    // Configuration
  });
  // ...
</script>
```
If you are using a library like React, you can initialize it like this:
```typescript
import { Client } from '@jovotech/client-web';

const client = new Client('http://localhost:3000/webhook', {
  // Configuration
});
```
The constructor accepts two parameters:
- `endpointUrl`: For local development of your Jovo app with Express, you can use `http://localhost:3000/webhook`. Learn more in the deployment section.
- Configuration options: Learn more in the configuration section below.
## Configuration
This is the default configuration for the Jovo Web Client:
```js
{
  version: '4.0',
  locale: 'en',
  platform: 'web',
  device: {
    id: '<uuid>',
    capabilities: ['AUDIO', 'SCREEN'],
  },
  input: {
    audioRecorder: { /* ... */ },
    speechRecognizer: { /* ... */ },
  },
  output: {
    speechSynthesizer: { /* ... */ },
    audioPlayer: { /* ... */ },
    reprompts: { /* ... */ },
  },
  store: {
    storageKey: 'JOVO_WEB_CLIENT_DATA',
    shouldPersistSession: true,
    sessionExpirationInSeconds: 1800,
  },
}
```
- `version`: The version of the Jovo Web Platform request and response schemas.
- `locale`: This locale is added to the request to the Jovo app. Default: `en`.
- `platform`: The platform name that is added to the request to the Jovo app. Default: `web`.
- `device`: Information about the device, including `capabilities`. Learn more in the Jovo Device docs.
- `input`: Learn more about the `audioRecorder` and `speechRecognizer` in the user input section.
- `output`: Learn more about the `audioPlayer` and `speechSynthesizer` in the handle Jovo response section.
- `store`: Defines how session data is stored in the browser's local storage.
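For example, you can change individual defaults by passing a partial configuration as the second constructor argument. A minimal sketch, assuming partial options are merged with the defaults shown above:

```typescript
import { Client } from '@jovotech/client-web';

// Sketch: override selected defaults.
// Assumes partial options are merged with the default configuration.
const client = new Client('http://localhost:3000/webhook', {
  locale: 'de',
  store: {
    shouldPersistSession: false,
  },
});
```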
## Record User Input
You can record user input using the following methods:
```js
await client.startRecording();
client.stopRecording(); // Successfully finish the recording
client.abortRecording(); // Cancel the recording
```
You can also pass an input modality. The default is `AUDIO`:
```js
import { RecordingModalityType } from '@jovotech/client-web';

// ...

await client.startRecording({ type: RecordingModalityType.Audio }); // or 'AUDIO'
```
Depending on the configuration and browser support, the recording either uses the `AudioRecorder` or the WebSpeech API `SpeechRecognizer`. Make sure that the client is already initialized (see the Initialize section below).
You can check if the client is currently recording input by using the following helper:
```js
client.isRecordingInput;
```
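For example, a single record button could toggle the recording based on this flag. A minimal sketch, assuming a `recordButton` DOM element exists:

```typescript
// Sketch: toggle recording with a single button.
// recordButton is a hypothetical DOM element.
recordButton.addEventListener('click', async () => {
  if (!client.isInitialized) {
    await client.initialize(); // see the Initialize section below
  }
  if (client.isRecordingInput) {
    client.stopRecording();
  } else {
    await client.startRecording();
  }
});
```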
### Initialize
Some browsers and devices (for example iOS) need a user touch event before they can play or record audio.
For this, the `initialize()` method can be used, which should be called in a click handler, for example:
```js
initializeButton.addEventListener('click', async () => {
  await client.initialize();
});
```
This can be done as part of a launch button or a push to talk button.
You can check if the client is already initialized by using the following helper:
```js
client.isInitialized;
```
### AudioRecorder
The Jovo Web Client implements an `AudioRecorder` that records speech in an audio file and sends it to your Jovo app as `SPEECH` input type.
The default configuration for the `AudioRecorder` (which you can access with `client.audioRecorder`) is:
```js
audioRecorder: {
  enabled: true,
  sampleRate: 16000,
  startDetection: {
    enabled: true,
    timeoutInMs: 3000,
    threshold: 0.2,
  },
  silenceDetection: {
    enabled: true,
    timeoutInMs: 1500,
    threshold: 0.2,
  },
  // https://developer.mozilla.org/en-US/docs/Web/API/MediaStreamConstraints/audio
  audioConstraints: {
    echoCancellation: true,
    noiseSuppression: true,
  },
  // https://developer.mozilla.org/en-US/docs/Web/API/AudioContext
  analyser: {
    bufferSize: 2048,
    maxDecibels: -10,
    minDecibels: -90,
    smoothingTimeConstant: 0.85,
  },
},
```
- `sampleRate`: The audio sample rate of the recording.
- `startDetection`: The start detection determines when in the recording process the user starts speaking.
- `silenceDetection`: The silence detection determines when in the recording process the user stops speaking.
- `audioConstraints`: Learn more in the official documentation by Mozilla.
- `analyser`: Learn more in the official documentation by Mozilla.
You can also use the following helpers to detect browser support and check if the `AudioRecorder` is currently recording:

```js
client.audioRecorder.isInitialized;
client.audioRecorder.isRecording;
client.audioRecorder.startDetectionEnabled;
client.audioRecorder.silenceDetectionEnabled;
```
The `AudioRecorder` also emits events based on the recording status. The table below shows all events of the type `AudioRecorderEvent`:
| Enum key | Enum value | Description |
|---|---|---|
| `Start` | `'start'` | Recording has started. |
| `Processing` | `'processing'` | Recording is in progress. |
| `StartDetected` | `'start-detected'` | Speech was detected in the recording. Related to the `startDetection` configuration. |
| `SilenceDetected` | `'silence-detected'` | Silence was detected in the recording. Related to the `silenceDetection` configuration. |
| `Timeout` | `'timeout'` | Silence exceeded the `silenceDetection.timeoutInMs` configuration. |
| `Abort` | `'abort'` | Recording was cancelled. |
| `Stop` | `'stop'` | Recording was stopped. |
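For example, you can use these events to update your UI while recording. A minimal sketch, assuming the recorder exposes a Node-style `on()` listener API:

```typescript
import { AudioRecorderEvent } from '@jovotech/client-web';

// Sketch: react to recording events, e.g. to animate a microphone icon.
// Assumes client.audioRecorder exposes a Node-style on() emitter API.
client.audioRecorder.on(AudioRecorderEvent.Start, () => {
  console.log('Recording started');
});
client.audioRecorder.on(AudioRecorderEvent.SilenceDetected, () => {
  console.log('Silence detected, the recording will stop soon');
});
```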
### WebSpeech API SpeechRecognizer
The WebSpeech API offers a speech recognition service that makes it easier to turn speech audio into transcribed text right in the browser. This way, you can record speech input and send it to your Jovo app as `TRANSCRIBED_SPEECH` input type.
The default configuration for the `SpeechRecognizer` (which you can access with `client.speechRecognizer`) is:
```js
speechRecognizer: {
  enabled: true,
  startDetection: {
    enabled: true,
    timeoutInMs: 3000,
    threshold: 0.2,
  },
  silenceDetection: {
    enabled: true,
    timeoutInMs: 1500,
    threshold: 0.2,
  },
  // See https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition
  lang: 'en',
  continuous: true,
  interimResults: true,
  maxAlternatives: 1,
  grammars: window.SpeechGrammarList ? new window.SpeechGrammarList() : null,
}
```
- `startDetection`: The start detection determines when in the recording process the user starts speaking.
- `silenceDetection`: The silence detection determines when in the recording process the user stops speaking.
- All other configurations are explained in the official documentation by Mozilla.
You can also use the following helpers to detect browser support and check if the `SpeechRecognizer` is currently recording speech:

```js
client.speechRecognizer.isAvailable;
client.speechRecognizer.isRecording;
client.speechRecognizer.startDetectionEnabled;
client.speechRecognizer.silenceDetectionEnabled;
```
The `SpeechRecognizer` also emits events based on the recording status. The table below shows all events of the type `SpeechRecognizerEvent`:
| Enum key | Enum value | Description |
|---|---|---|
| `Start` | `'start'` | Recording has started. |
| `StartDetected` | `'start-detected'` | Speech was detected in the recording. Related to the `startDetection` configuration. |
| `SpeechRecognized` | `'speech-recognized'` | Speech is currently transcribed. |
| `SilenceDetected` | `'silence-detected'` | Silence was detected in the recording. Related to the `silenceDetection` configuration. |
| `Timeout` | `'timeout'` | Silence exceeded the `silenceDetection.timeoutInMs` configuration. |
| `Abort` | `'abort'` | Recording was cancelled. |
| `Stop` | `'stop'` | Recording was stopped. |
| `End` | `'end'` | Speech recognition has finished. |
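For example, you can show interim transcriptions while the user is speaking. A minimal sketch, assuming the same Node-style `on()` emitter API and that the listener receives the WebSpeech API's `SpeechRecognitionEvent` (an assumption):

```typescript
import { SpeechRecognizerEvent } from '@jovotech/client-web';

// Sketch: log interim transcriptions while the user is speaking.
// Assumes the listener receives the WebSpeech API SpeechRecognitionEvent.
client.speechRecognizer.on(SpeechRecognizerEvent.SpeechRecognized, (event) => {
  const transcript = event.results[event.resultIndex][0].transcript;
  console.log('Interim transcript:', transcript);
});
```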
### Push to Talk
You can implement a push to talk experience by adding event listeners to a button, for example:
```typescript
async function onMouseDown(event: MouseEvent | TouchEvent) {
  if (!client.isInitialized) {
    await client.initialize();
  }
  if (client.isRecordingInput) {
    return;
  }
  if (event instanceof MouseEvent) {
    window.addEventListener('mouseup', onMouseUp);
  } else {
    window.addEventListener('touchend', onMouseUp);
  }
  await client.startRecording();
}

function onMouseUp() {
  // Remove both listeners so neither input modality leaks a handler
  window.removeEventListener('mouseup', onMouseUp);
  window.removeEventListener('touchend', onMouseUp);
  client.stopRecording();
}
```
## Send a Request to Jovo
After successful user input, the Jovo Web Client sends a request to the Jovo app, where the Web Platform handles the conversational logic and then returns a response.
The request is based on different Jovo Input types, depending on the recording type:
- `TEXT` input for text (chat) messages.
- `SPEECH` input for audio recordings with the `AudioRecorder`.
- `TRANSCRIBED_SPEECH` input for text based on audio recordings with the `SpeechRecognizer`.
While the client already does the job for you for `AudioRecorder` and `SpeechRecognizer` input, you can also manually send a request based on Jovo Input to the Jovo app using the `send()` method:
```js
import { InputType } from '@jovotech/client-web';

// ...

const response = await client.send({
  type: InputType.Text, // or 'TEXT'
  text: 'Hello World',
});
```
If you want to make modifications before sending a request, you can also use the `createRequest()` method:
```js
import { InputType } from '@jovotech/client-web';

// ...

const request = client.createRequest({
  type: InputType.Text, // or 'TEXT'
  text: 'Hello World',
});

// ...

const response = await client.send(request);
```
## Handle the Response from Jovo
After sending a request to the Jovo app, the client waits for the app to go through the RIDR Lifecycle and return a Web Platform response.
This response contains an `output` property, which includes output templates that are used by the client to show and play a response to the user. For example, an output template could look like this:
```js
{
  message: 'Do you like pizza?',
  quickReplies: ['yes', 'no'],
}
```
The response can be text-based (e.g. chat bubbles) as well as audio or speech output. For this, the client offers helpful features that make playing audio output easier.
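For example, after calling `send()`, you can read the output templates from the response to render chat bubbles. A minimal sketch, assuming `response.output` is an array of output templates like the one shown above; `addChatBubble` is a hypothetical UI helper:

```typescript
// Sketch: render returned output templates as chat bubbles.
// Assumes response.output is an array of output templates.
const response = await client.send({ type: 'TEXT', text: 'Hello World' });
for (const output of response.output ?? []) {
  if (output.message) {
    addChatBubble(output.message); // hypothetical UI helper
  }
}
```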
### AudioPlayer
The `AudioPlayer` is responsible for playing audio files. Similar to the `AudioRecorder`, it needs to be initialized.
The default configuration for the `AudioPlayer` (which you can access with `client.audioPlayer`) is:
```js
audioPlayer: {
  enabled: true,
},
```
The player has the following features:
```typescript
client.audioPlayer.play(audioSource: string, contentType = 'audio/mpeg');
client.audioPlayer.resume();
client.audioPlayer.pause();
client.audioPlayer.stop();
```
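For example, once the client is initialized, you can play a remote audio file. A minimal sketch; the file URL is a hypothetical placeholder:

```typescript
// Sketch: play a remote MP3 (hypothetical URL) after initialization.
await client.initialize();
client.audioPlayer.play('https://example.com/audio/welcome.mp3');
```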
The `AudioPlayer` also emits events based on its status. The table below shows all events of the type `AudioPlayerEvent`:
| Enum key | Enum value |
|---|---|
| `Play` | `'play'` |
| `Pause` | `'pause'` |
| `Resume` | `'resume'` |
| `Stop` | `'stop'` |
| `End` | `'end'` |
| `Error` | `'error'` |
You can also use the following helpers:
```js
client.audioPlayer.isInitialized;
client.audioPlayer.isPlaying; // or client.isPlayingAudio
client.audioPlayer.canResume;
client.audioPlayer.canPause;
client.audioPlayer.canStop;
client.audioPlayer.volume;
```
### WebSpeech API SpeechSynthesizer
The WebSpeech API offers a speech synthesis service that makes it easier to turn output messages and reprompts into spoken audio right in the browser.
The configuration for the `SpeechSynthesizer` (which you can access with `client.speechSynthesizer`) is:
```typescript
speechSynthesizer: {
  enabled: true,
  language: 'en',
  voice: SpeechSynthesisVoice,
  rate: number,
  pitch: number,
},
```
- `language`: Can also be overridden using the `locale` property in the root of the client configuration.
- `voice`: Learn more in the official documentation by Mozilla.
- `rate`: Learn more in the official documentation by Mozilla.
- `pitch`: Learn more in the official documentation by Mozilla.
The synthesizer has the following features:
```typescript
client.speechSynthesizer.speak(utterance: SpeechSynthesisUtterance | string, forceVolume = true);
client.speechSynthesizer.resume();
client.speechSynthesizer.pause();
client.speechSynthesizer.stop();
```
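For example, you can pass a plain string to `speak()`:

```typescript
// Speak a plain string with the default voice, rate, and pitch.
client.speechSynthesizer.speak('Do you like pizza?');
```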
The `SpeechSynthesizer` also emits events based on its status. The table below shows all events of the type `SpeechSynthesizerEvent`:
| Enum key | Enum value |
|---|---|
| `Play` | `'play'` |
| `Pause` | `'pause'` |
| `Resume` | `'resume'` |
| `Stop` | `'stop'` |
| `End` | `'end'` |
| `Error` | `'error'` |
You can also use the following helpers:
```js
client.speechSynthesizer.isAvailable;
client.speechSynthesizer.isSpeaking; // or client.isPlayingAudio
client.speechSynthesizer.canResume;
client.speechSynthesizer.canPause;
client.speechSynthesizer.canStop;
client.speechSynthesizer.volume;
```
The Web Client also implements an `SSMLProcessor` that processes standard SSML tags like `audio` and `break`.
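For example, an output message could contain these tags. A sketch; the audio URL is a hypothetical placeholder, and whether a wrapping `<speak>` tag is required is an assumption:

```typescript
// Sketch: an output template whose message uses SSML tags.
// The audio URL is a hypothetical placeholder.
const output = {
  message:
    '<speak>Welcome back! <break time="500ms"/><audio src="https://example.com/sounds/chime.mp3"/></speak>',
};
```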
### Reprompts
The Web Client is able to play reprompts if the user doesn't respond to a prompt. This feature is currently only available for Speech Interfaces.
Reprompts are played by the `RepromptProcessor`, which can be configured like this:
```js
reprompts: {
  enabled: true,
  maxAttempts: 1,
  resetSessionOnRepromptLimit: true,
},
```
- `maxAttempts`: Defines how many reprompts should be played before closing the session.
- `resetSessionOnRepromptLimit`: Determines if the current session will be closed after the maximum number of reprompts has been played. Default: `true`.
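For example, to allow two reprompt attempts, you could pass this as part of the client configuration. A sketch, assuming partial options are merged with the defaults:

```typescript
// Sketch: allow up to two reprompts before the session is closed.
const client = new Client('http://localhost:3000/webhook', {
  output: {
    reprompts: {
      maxAttempts: 2,
    },
  },
});
```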
## Deployment
If you want to deploy your web experience to production, you need to do the following:
- Deploy the Jovo app: Learn more about server integrations here.
- Update the `endpointUrl` with your app endpoint (for example, an AWS API Gateway URL).
- Deploy the client.
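For example, after deploying the app behind an AWS API Gateway, the client initialization could look like this. A sketch; the URL is a hypothetical placeholder:

```typescript
// Sketch: point the client at the deployed endpoint (hypothetical URL).
const client = new Client('https://example.execute-api.us-east-1.amazonaws.com/prod/webhook', {
  // Configuration
});
```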