Making the Audio Flow in Your IVR Apps

It may seem like a minor detail, but audio that flows together well in your IVR applications can have a major impact on the customer experience.

Getting good, smooth-sounding audio has more to do with being a good sound engineer than it does with designing IVR apps.

Why Flow Matters

The most common issues come when you combine pre-recorded audio and text-to-speech audio in the same call-flow. In part, this is a result of how we break up audio files for use in IVR apps.

For instance, you may want to ask callers questions to try to gauge their satisfaction with your products or services.

A prompt for something like this may be:

“How was your coffee today? If it was good, press 1. If not, press 2.”

This prompt can be broken into two different files. The first, which we’ll call ‘the question file’, asks about coffee. The second, which we’ll call ‘the action file’, gives feedback instructions. You may want to reuse the same action file with several different question files, so every one of those question files needs to flow smoothly into the action file.

The process of dynamically piecing together audio in this way is called concatenation. Your IVR app arranges (or concatenates) the correct files at the right time based on your call-flow. The app then plays the audio files sequentially.
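To make the idea concrete, here is a minimal sketch of how an app might assemble a prompt from a question file and a shared action file. It is only an illustration: the Prompt class, the file names, and build_playlist are hypothetical names made up for this example, not part of any particular IVR platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prompt:
    """One playable unit: a pre-recorded file or text for a TTS engine."""
    kind: str   # "file" or "tts"
    value: str  # file name, or the text to synthesize


# Hypothetical action file shared by every question.
ACTION_FILE = Prompt("file", "press_1_good_2_not.wav")

# Hypothetical question files, keyed by survey topic.
QUESTION_FILES = {
    "coffee": Prompt("file", "how_was_your_coffee.wav"),
    "service": Prompt("file", "how_was_our_service.wav"),
}


def build_playlist(topic: str) -> list[Prompt]:
    """Concatenate the question file for `topic` with the shared action file."""
    return [QUESTION_FILES[topic], ACTION_FILE]


# The app would then play these items one after another.
playlist = build_playlist("coffee")
print([p.value for p in playlist])
```

Because every question is followed by the same action file, each question recording has to end in a way that leads naturally into it.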

Getting Better Audio Files

Here are a few tips for preparing your pre-recorded audio files to ensure optimal audio quality.

Consistent Tone of Voice: This is primarily the responsibility of the voice talent. It’s a good idea to listen to samples from multiple voice talent options to ensure that you get one you like. It’s also worth considering what voice you plan to use for text-to-speech playback. If you use a female voice for TTS, you will probably want to find female voice talent to minimize jarring audio changes.
Consistent Recording Environment: Voice talent should use the same microphone, recording studio, etc. to capture the audio for your files. This is especially true for long-term projects that may require regular changes in the pre-recorded audio. If audio gets recorded with a different microphone, the end result can be very different and jarring to callers. The audio may be less clear and have issues with normalization and pitch that don’t mesh with the audio you already have. Let’s say you’re in a rush to get a new audio file for your app. If your voice talent doesn’t have time to get to their usual setup and just records something quickly in a quiet room on their mobile phone, it will still sound very different by the time that file gets processed.
Process Audio the Same Way Every Time: You should establish a set process for processing audio files and adhere to it for every audio file that your IVR app uses. This should include a proper noise reduction pass as well as normalization.
Normalization Matters: Related to the previous tip, normalization is critical if you use text-to-speech anywhere in your call-flow. Make sure you match the normalization levels of your pre-recorded audio to the levels of your text-to-speech engine. Failure to do so may result in the TTS audio being much louder or softer than the pre-recorded audio. This can detract from the caller experience as well.
Text-to-Speech: If you end a sentence with a text-to-speech file, make sure you include punctuation for the TTS engine. This affects the inflection of the TTS audio, so questions sound like questions rather than statements.
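As a rough illustration of the normalization tip, the sketch below measures the average (RMS) level of a recording and works out how much gain it needs to match a sample of your TTS output. It rests on a few assumptions: the files are 16-bit PCM WAV, RMS level is an acceptable stand-in for loudness, and the function names are invented for this example. Your audio tools may measure and normalize levels differently.

```python
import math
import sys
import wave
from array import array

FULL_SCALE = 32768  # largest magnitude of a 16-bit sample


def rms_dbfs(samples) -> float:
    """RMS level of 16-bit PCM samples, in dB relative to full scale."""
    if len(samples) == 0:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # pure silence
    return 20 * math.log10(math.sqrt(mean_square) / FULL_SCALE)


def read_wav_samples(path: str) -> array:
    """Read a 16-bit PCM WAV file into an array of signed samples."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())
    samples = array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples


def gain_to_match(recorded_db: float, tts_db: float) -> float:
    """Gain in dB to apply to the recording so it matches the TTS level."""
    return tts_db - recorded_db


# Example with made-up levels: a recording at -20 dBFS and TTS at -16 dBFS
# means the recording should be raised by 4 dB.
print(gain_to_match(-20.0, -16.0))
```

In practice you would call read_wav_samples on one pre-recorded file and one sample rendered by your TTS engine, compare the two rms_dbfs values, and apply the resulting gain in your audio editor as part of your standard processing steps.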

If you have questions about recording audio files for your IVR application, give us a call. We can help. We have decades of experience working with voice talent and IVR applications.