# Web Client
Build voice experiences and chatbots for the web. This frontend client brings your Jovo app to websites and web apps.
## Introduction
Jovo Clients help with two tasks:
- Record user input (speech, text, buttons) and send it as a request to the Jovo app (where the Web Platform handles the conversational logic).
- Handle the response from the Jovo app and play/show output to the user.
The Jovo Web Client can be used on websites and web apps. This is the vanilla JavaScript version for custom websites or frameworks and libraries like React. You can also find versions for Vue2 and Vue3.
## Installation
Install the client package:
```sh
$ npm install @jovotech/client-web
```
If you want to use the client in a plain HTML/JS project (find an example HTML file here), you can set it up like this:
```html
<script>
  const client = new window.JovoWebClient.Client('http://localhost:3000/webhook', {
    // Configuration
  });
  // ...
</script>
```
If you are using a library like React, you can initialize it like this:
```typescript
import { Client } from '@jovotech/client-web';

const client = new Client('http://localhost:3000/webhook', {
  // Configuration
});
```
The constructor accepts two parameters:
- `endpointUrl`: For local development of your Jovo app with Express, you can use `http://localhost:3000/webhook`. Learn more in the deployment section.
- Configuration options: Learn more in the configuration section below.
## Configuration
This is the default configuration for the Jovo Web Client:
```js
{
  version: '4.0',
  locale: 'en',
  platform: 'web',
  device: {
    id: '<uuid>',
    capabilities: ['AUDIO', 'SCREEN'],
  },
  input: {
    audioRecorder: { /* ... */ },
    speechRecognizer: { /* ... */ },
  },
  output: {
    speechSynthesizer: { /* ... */ },
    audioPlayer: { /* ... */ },
    reprompts: { /* ... */ },
  },
  store: {
    storageKey: 'JOVO_WEB_CLIENT_DATA',
    shouldPersistSession: true,
    sessionExpirationInSeconds: 1800,
  },
}
```
- `version`: The version of the Jovo Web Platform request and response schemas.
- `locale`: This locale is added to the request to the Jovo app. Default: `en`.
- `platform`: The platform name that is added to the request to the Jovo app. Default: `web`.
- `device`: Information about the device, including `capabilities`. Learn more in the Jovo Device docs.
- `input`: Learn more about the `audioRecorder` and `speechRecognizer` in the user input section.
- `output`: Learn more about the `audioPlayer` and `speechSynthesizer` in the handle Jovo response section.
- `store`: Defines how session data is stored in the browser's local storage.
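For example, you can change individual defaults by passing a partial configuration as the second constructor argument. A minimal sketch, assuming partial options are merged with the defaults shown above:

```typescript
import { Client } from '@jovotech/client-web';

// Sketch: override selected defaults.
// Assumes partial options are merged with the default configuration.
const client = new Client('http://localhost:3000/webhook', {
  locale: 'de',
  store: {
    shouldPersistSession: false,
  },
});
```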
## Record User Input
You can record user input using the following methods:
```js
await client.startRecording();
client.stopRecording(); // Successfully finish the recording
client.abortRecording(); // Cancel the recording
```
You can also pass an input modality. The default is `AUDIO`:
```js
import { RecordingModalityType } from '@jovotech/client-web';

// ...

await client.startRecording({ type: RecordingModalityType.Audio }); // or 'AUDIO'
```
Depending on the configuration and browser support, the recording either uses the `AudioRecorder` or the WebSpeech API `SpeechRecognizer`. Make sure that the client is already initialized (see the Initialize section below).
You can check if the client is currently recording input by using the following helper:
```js
client.isRecordingInput;
```
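For example, a single record button could toggle the recording based on this flag. A minimal sketch, assuming a `recordButton` DOM element exists:

```typescript
// Sketch: toggle recording with a single button.
// recordButton is a hypothetical DOM element.
recordButton.addEventListener('click', async () => {
  if (!client.isInitialized) {
    await client.initialize(); // see the Initialize section below
  }
  if (client.isRecordingInput) {
    client.stopRecording();
  } else {
    await client.startRecording();
  }
});
```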
### Initialize
Some browsers and devices (for example iOS) need a user touch event before they can play or record audio.
For this, the `initialize()` method can be used, which should be called in a click handler, for example:
```js
initializeButton.addEventListener('click', async () => {
  await client.initialize();
});
```
This can be done as part of a launch button or a push to talk button.
You can check if the client is already initialized by using the following helper:
```js
client.isInitialized;
```
### AudioRecorder
The Jovo Web Client implements an `AudioRecorder` that records speech in an audio file and sends it to your Jovo app as `SPEECH` input type.
The default configuration for the `AudioRecorder` (which you can access with `client.audioRecorder`) is:
```js
audioRecorder: {
  enabled: true,
  sampleRate: 16000,
  startDetection: {
    enabled: true,
    timeoutInMs: 3000,
    threshold: 0.2,
  },
  silenceDetection: {
    enabled: true,
    timeoutInMs: 1500,
    threshold: 0.2,
  },
  // https://developer.mozilla.org/en-US/docs/Web/API/MediaStreamConstraints/audio
  audioConstraints: {
    echoCancellation: true,
    noiseSuppression: true,
  },
  // https://developer.mozilla.org/en-US/docs/Web/API/AudioContext
  analyser: {
    bufferSize: 2048,
    maxDecibels: -10,
    minDecibels: -90,
    smoothingTimeConstant: 0.85,
  },
},
```
- `sampleRate`: The audio sample rate of the recording.
- `startDetection`: The start detection determines when in the recording process the user starts speaking.
- `silenceDetection`: The silence detection determines when in the recording process the user stops speaking.
- `audioConstraints`: Learn more in the official documentation by Mozilla.
- `analyser`: Learn more in the official documentation by Mozilla.
You can also use the following helpers to detect browser support and check if the `AudioRecorder` is currently recording:

```js
client.audioRecorder.isInitialized;
client.audioRecorder.isRecording;
client.audioRecorder.startDetectionEnabled;
client.audioRecorder.silenceDetectionEnabled;
```
The `AudioRecorder` also emits events based on the recording status. The table below shows all events of the type `AudioRecorderEvent`:
| Enum key | Enum value | Description |
|---|---|---|
| `Start` | `'start'` | Recording has started. |
| `Processing` | `'processing'` | Recording is in progress. |
| `StartDetected` | `'start-detected'` | Speech was detected in the recording. Related to the `startDetection` configuration. |
| `SilenceDetected` | `'silence-detected'` | Silence was detected in the recording. Related to the `silenceDetection` configuration. |
| `Timeout` | `'timeout'` | Silence exceeded the `silenceDetection.timeoutInMs` configuration. |
| `Abort` | `'abort'` | Recording was cancelled. |
| `Stop` | `'stop'` | Recording was stopped. |
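For example, you can use these events to update your UI while recording. A minimal sketch, assuming the recorder exposes a Node-style `on()` listener API:

```typescript
import { AudioRecorderEvent } from '@jovotech/client-web';

// Sketch: react to recording events, e.g. to animate a microphone icon.
// Assumes client.audioRecorder exposes a Node-style on() emitter API.
client.audioRecorder.on(AudioRecorderEvent.Start, () => {
  console.log('Recording started');
});
client.audioRecorder.on(AudioRecorderEvent.SilenceDetected, () => {
  console.log('Silence detected, the recording will stop soon');
});
```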
### WebSpeech API SpeechRecognizer
The WebSpeech API offers a speech recognition service that makes it easier to turn speech audio into transcribed text right in the browser. This way, you can record speech input and send it to your Jovo app as `TRANSCRIBED_SPEECH` input type.
The default configuration for the `SpeechRecognizer` (which you can access with `client.speechRecognizer`) is:
```js
speechRecognizer: {
  enabled: true,
  startDetection: {
    enabled: true,
    timeoutInMs: 3000,
    threshold: 0.2,
  },
  silenceDetection: {
    enabled: true,
    timeoutInMs: 1500,
    threshold: 0.2,
  },
  // See https://developer.mozilla.org/en-US/docs/Web/API/SpeechRecognition
  lang: 'en',
  continuous: true,
  interimResults: true,
  maxAlternatives: 1,
  grammars: window.SpeechGrammarList ? new window.SpeechGrammarList() : null,
}
```
- `startDetection`: The start detection determines when in the recording process the user starts speaking.
- `silenceDetection`: The silence detection determines when in the recording process the user stops speaking.
- All other configurations are explained in the official documentation by Mozilla.
You can also use the following helpers to detect browser support and check if the `SpeechRecognizer` is currently recording speech:

```js
client.speechRecognizer.isAvailable;
client.speechRecognizer.isRecording;
client.speechRecognizer.startDetectionEnabled;
client.speechRecognizer.silenceDetectionEnabled;
```
The `SpeechRecognizer` also emits events based on the recording status. The table below shows all events of the type `SpeechRecognizerEvent`:
| Enum key | Enum value | Description |
|---|---|---|
| `Start` | `'start'` | Recording has started. |
| `StartDetected` | `'start-detected'` | Speech was detected in the recording. Related to the `startDetection` configuration. |
| `SpeechRecognized` | `'speech-recognized'` | Speech is currently transcribed. |
| `SilenceDetected` | `'silence-detected'` | Silence was detected in the recording. Related to the `silenceDetection` configuration. |
| `Timeout` | `'timeout'` | Silence exceeded the `silenceDetection.timeoutInMs` configuration. |
| `Abort` | `'abort'` | Recording was cancelled. |
| `Stop` | `'stop'` | Recording was stopped. |
| `End` | `'end'` | Speech recognition has finished. |
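For example, you can show interim transcriptions while the user is speaking. A minimal sketch, assuming the same Node-style `on()` emitter API and that the listener receives the WebSpeech API's `SpeechRecognitionEvent` (an assumption):

```typescript
import { SpeechRecognizerEvent } from '@jovotech/client-web';

// Sketch: log interim transcriptions while the user is speaking.
// Assumes the listener receives the WebSpeech API SpeechRecognitionEvent.
client.speechRecognizer.on(SpeechRecognizerEvent.SpeechRecognized, (event) => {
  const transcript = event.results[event.resultIndex][0].transcript;
  console.log('Interim transcript:', transcript);
});
```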
### Push to Talk
You can implement a push to talk experience by adding event listeners to a button, for example:
```typescript
async function onMouseDown(event: MouseEvent | TouchEvent) {
  if (!client.isInitialized) {
    await client.initialize();
  }
  if (client.isRecordingInput) {
    return;
  }
  if (event instanceof MouseEvent) {
    window.addEventListener('mouseup', onMouseUp);
  } else {
    window.addEventListener('touchend', onMouseUp);
  }
  await client.startRecording();
}

function onMouseUp() {
  // Remove both listeners so neither input modality leaks a handler
  window.removeEventListener('mouseup', onMouseUp);
  window.removeEventListener('touchend', onMouseUp);
  client.stopRecording();
}
```
## Send a Request to Jovo
After successful user input, the Jovo Web Client sends a request to the Jovo app, where the Web Platform handles the conversational logic and then returns a response.
The request is based on different Jovo Input types, depending on the recording type:
- `TEXT` input for text (chat) messages.
- `SPEECH` input for audio recordings with the `AudioRecorder`.
- `TRANSCRIBED_SPEECH` input for text based on audio recordings with the `SpeechRecognizer`.
While the client already does the job for you for `AudioRecorder` and `SpeechRecognizer` input, you can also manually send a request based on Jovo Input to the Jovo app using the `send()` method:
```js
import { InputType } from '@jovotech/client-web';

// ...

const response = await client.send({
  type: InputType.Text, // or 'TEXT'
  text: 'Hello World',
});
```
If you want to make modifications before sending a request, you can also use the `createRequest()` method:
```js
import { InputType } from '@jovotech/client-web';

// ...

const request = client.createRequest({
  type: InputType.Text, // or 'TEXT'
  text: 'Hello World',
});

// ...

const response = await client.send(request);
```
## Handle the Response from Jovo
After sending a request to the Jovo app, the client waits for the app to go through the RIDR Lifecycle and return a Web Platform response.
This response contains an `output` property, which includes output templates that are used by the client to show and play a response to the user. For example, an output template could look like this:
```js
{
  message: 'Do you like pizza?',
  quickReplies: ['yes', 'no'],
}
```
The response can be text-based (e.g. chat bubbles) as well as audio or speech output. For this, the client offers helpful features that make playing audio output easier.
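For example, after calling `send()`, you can read the output templates from the response to render chat bubbles. A minimal sketch, assuming `response.output` is an array of output templates like the one shown above; `addChatBubble` is a hypothetical UI helper:

```typescript
// Sketch: render returned output templates as chat bubbles.
// Assumes response.output is an array of output templates.
const response = await client.send({ type: 'TEXT', text: 'Hello World' });
for (const output of response.output ?? []) {
  if (output.message) {
    addChatBubble(output.message); // hypothetical UI helper
  }
}
```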
### AudioPlayer
The `AudioPlayer` is responsible for playing audio files. Similar to the `AudioRecorder`, it needs to be initialized.
The default configuration for the `AudioPlayer` (which you can access with `client.audioPlayer`) is:
```js
audioPlayer: {
  enabled: true,
},
```
The player has the following features:
```typescript
client.audioPlayer.play(audioSource: string, contentType = 'audio/mpeg');
client.audioPlayer.resume();
client.audioPlayer.pause();
client.audioPlayer.stop();
```
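For example, once the client is initialized, you can play a remote audio file. A minimal sketch; the file URL is a hypothetical placeholder:

```typescript
// Sketch: play a remote MP3 (hypothetical URL) after initialization.
await client.initialize();
client.audioPlayer.play('https://example.com/audio/welcome.mp3');
```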
The `AudioPlayer` also emits events based on its status. The table below shows all events of the type `AudioPlayerEvent`:
| Enum key | Enum value |
|---|---|
| `Play` | `'play'` |
| `Pause` | `'pause'` |
| `Resume` | `'resume'` |
| `Stop` | `'stop'` |
| `End` | `'end'` |
| `Error` | `'error'` |
You can also use the following helpers:
```js
client.audioPlayer.isInitialized;
client.audioPlayer.isPlaying; // or client.isPlayingAudio
client.audioPlayer.canResume;
client.audioPlayer.canPause;
client.audioPlayer.canStop;
client.audioPlayer.volume;
```
### WebSpeech API SpeechSynthesizer
The WebSpeech API offers a speech synthesis service that makes it easier to turn output messages and reprompts into spoken audio right in the browser.
The configuration for the `SpeechSynthesizer` (which you can access with `client.speechSynthesizer`) is:
```typescript
speechSynthesizer: {
  enabled: true,
  language: 'en',
  voice: SpeechSynthesisVoice,
  rate: number,
  pitch: number,
},
```
- `language`: Can also be overridden using the `locale` property in the root of the client configuration.
- `voice`: Learn more in the official documentation by Mozilla.
- `rate`: Learn more in the official documentation by Mozilla.
- `pitch`: Learn more in the official documentation by Mozilla.
The synthesizer has the following features:
```typescript
client.speechSynthesizer.speak(utterance: SpeechSynthesisUtterance | string, forceVolume = true);
client.speechSynthesizer.resume();
client.speechSynthesizer.pause();
client.speechSynthesizer.stop();
```
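For example, you can pass a plain string to `speak()`:

```typescript
// Speak a plain string with the default voice, rate, and pitch.
client.speechSynthesizer.speak('Do you like pizza?');
```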
The `SpeechSynthesizer` also emits events based on its status. The table below shows all events of the type `SpeechSynthesizerEvent`:
| Enum key | Enum value |
|---|---|
| `Play` | `'play'` |
| `Pause` | `'pause'` |
| `Resume` | `'resume'` |
| `Stop` | `'stop'` |
| `End` | `'end'` |
| `Error` | `'error'` |
You can also use the following helpers:
```js
client.speechSynthesizer.isAvailable;
client.speechSynthesizer.isSpeaking; // or client.isPlayingAudio
client.speechSynthesizer.canResume;
client.speechSynthesizer.canPause;
client.speechSynthesizer.canStop;
client.speechSynthesizer.volume;
```
The Web Client also implements an `SSMLProcessor` that processes standard SSML tags like `audio` and `break`.
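For example, an output message could contain these tags. A sketch; the audio URL is a hypothetical placeholder, and whether a wrapping `<speak>` tag is required is an assumption:

```typescript
// Sketch: an output template whose message uses SSML tags.
// The audio URL is a hypothetical placeholder.
const output = {
  message:
    '<speak>Welcome back! <break time="500ms"/><audio src="https://example.com/sounds/chime.mp3"/></speak>',
};
```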
### Reprompts
The Web Client is able to play reprompts if the user doesn't respond to a prompt. This feature is currently only available for Speech Interfaces.
Reprompts are played by the `RepromptProcessor`, which can be configured like this:
```js
reprompts: {
  enabled: true,
  maxAttempts: 1,
  resetSessionOnRepromptLimit: true,
},
```
- `maxAttempts`: Defines how many reprompts should be played before closing the session.
- `resetSessionOnRepromptLimit`: Determines if the current session will be closed after the maximum number of reprompts has been played. Default: `true`.
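For example, to allow two reprompt attempts, you could pass this as part of the client configuration. A sketch, assuming partial options are merged with the defaults:

```typescript
// Sketch: allow up to two reprompts before the session is closed.
const client = new Client('http://localhost:3000/webhook', {
  output: {
    reprompts: {
      maxAttempts: 2,
    },
  },
});
```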
## Deployment
If you want to deploy your web experience to production, you need to do the following:
- Deploy the Jovo app: Learn more about server integrations here.
- Update the `endpointUrl` with your app endpoint (for example, an AWS API Gateway URL).
- Deploy the client.
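For example, after deploying the app behind an AWS API Gateway, the client initialization could look like this. A sketch; the URL is a hypothetical placeholder:

```typescript
// Sketch: point the client at the deployed endpoint (hypothetical URL).
const client = new Client('https://example.execute-api.us-east-1.amazonaws.com/prod/webhook', {
  // Configuration
});
```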