Audio

The audio-related scenarios are divided into three types: Voice Recognition, Voice Synthesis, and Voice Cloning.

  1. Voice Recognition

In the Lab, select the Voice Recognition scenario from the left sidebar. In the right sidebar, choose the language of the text to be generated. The center area displays the Voice Recognition section. Select the desired voice model from the dropdown menu, then either upload the audio file you want to recognize or record audio directly; when the recording finishes, an audio file is generated automatically. Finally, click the blue "Generate" button at the bottom, and the text recognized from the audio appears in the text box.

  2. Voice Synthesis

In the Lab, select the Voice Synthesis scenario from the left sidebar. In the right sidebar, choose whether the generated audio should use a male or female voice, and select the output language. The center area displays the Voice Synthesis section. Select the desired voice model from the dropdown menu, then enter the text you want to synthesize into speech. Finally, click the blue "Generate" button at the bottom, and the entered text is converted into an audio file.

  3. Voice Cloning

In the Lab, select the Voice Cloning scenario from the left sidebar. Voice Cloning is an advanced form of Voice Synthesis: the output audio imitates the voice in the input audio rather than using a standard voice. In the central interaction area, upload an audio file of the voice to be mimicked, or record a new one directly. On the left side, enter the text spoken in the uploaded or recorded audio; this transcript helps the model learn the speaker's voice parameters more accurately. Then enter the text you want the cloned voice to read. Finally, click the blue "Generate" button at the bottom, and an audio file of the cloned voice reading that text appears at the bottom.
