The Combination

  • Posted on March 25, 2021 at 8:26 pm

In the recording studio, the speaker needs to understand how the individual prompts will be combined, because only the speaker can create a harmonious interplay between the audio files through nuance and the linking of emphases. When preparing a recording session, care should therefore go into the recording script. Numerically generated prompt lists often fail to convey how the dialogue is supposed to flow. A recording script that follows the structure of the dialogue, on the other hand, contributes a great deal to the naturalness of a system, because it preserves the continuity of the persona. The recording engineer (or whoever edits the takes) should also understand something about concatenation. When choosing among several takes from the speaker, the audio files that harmonize most closely with each other should be selected. This is not always an easy task, especially when it is ambiguous how the audio files will be combined.

The playback of database content can also be improved considerably. For IVRs that depend on a TTS engine, for example, the prompts can be recorded by the same speaker whose voice the TTS engine uses. The result is a consistent voice for both the data (TTS output) and the dialog prompts (studio speaker), so the user perceives no major break in the sound of the system. Another way to use TTS engines without a serious loss of naturalness is a clever overall concept. There are already examples of this on the market.

In the 11864 price comparison, for example, the user is simply redirected to a wizard, and then the TTS output takes over. Simple and well solved.

Concatenated database output that does without a TTS engine also has room for improvement. When digits are read out, as in telephone numbers, passwords, or PINs, they often sound robotically stitched together. This is partly because many systems record only one, or at most two, intonations per digit. Three intonations sound much more natural: initial, medial, and final.
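To illustrate the three-intonation approach, here is a minimal sketch of how a concatenating IVR might pick a recording for each digit based on its position. It assumes a hypothetical prompt library with one file per digit and intonation, named like `digit_4_initial.wav`, and assumes that spaces or hyphens mark separate digit groups (as in a spoken phone number). The function name `digit_prompts` and the file-naming scheme are illustrative only, not part of any particular IVR platform.

```python
from typing import List


def digit_prompts(number: str) -> List[str]:
    """Map a digit string to prompt files, choosing intonation by position.

    Hypothetical naming scheme: "digit_<d>_<position>.wav", where position is
    "initial", "medial" or "final". Groups separated by spaces or hyphens
    (e.g. "0800 123 45") are treated as separate phrases, so each group gets
    its own initial and final digit.
    """
    prompts: List[str] = []
    for group in number.replace("-", " ").split():
        digits = [c for c in group if c.isdigit()]
        for i, d in enumerate(digits):
            if i == len(digits) - 1:
                pos = "final"    # last digit of a group (also a lone digit): closing intonation
            elif i == 0:
                pos = "initial"  # first digit of a group
            else:
                pos = "medial"   # any digit in between
            prompts.append(f"digit_{d}_{pos}.wav")
    return prompts


if __name__ == "__main__":
    for f in digit_prompts("0800 12"):
        print(f)
```

For `"0800 12"`, the sketch yields `digit_0_initial.wav`, `digit_8_medial.wav`, `digit_0_medial.wav`, `digit_0_final.wav`, `digit_1_initial.wav`, `digit_2_final.wav`. A system with only one intonation per digit would play the same `0` file three times here, which is exactly where the robotic effect comes from.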

