Text to Speech / Speech to Text

NOTE This feature requires a VEGAS Pro 365, VEGAS Pro Edit or VEGAS Pro Post 365 subscription. To use this feature, you must first log in to your VEGAS Hub account (see the VEGAS Hub section for more details).

Text to Speech

The Text to Speech feature in VEGAS Pro converts text into speech and adds it to your project as an audio file. It lets you turn large amounts of text into narration without hiring professional voice actors. The feature uses AI to generate computer voices in a range of languages and voice options, and it can translate your narration into other languages spoken by natural-sounding native voices. Because the feature is cloud based, new voices and features become available without a new software build.

Converting text to speech

  1. Choose Tools | Text to Speech.

  2. In the Text to Speech dialog box, type the text you want to convert to audio in the text field.

  3. Adjust the voice settings:

    • Change Voice: Click the Voices drop-down list and select the desired voice.
    • Adjust Speed: Use the slider to decrease or increase the speech speed.
    • Change Speech Style: For selected voices, you can choose from different voice styles.
    • Adjust Pitch: Use the slider to decrease or increase the pitch.

  4. Click on GENERATE SPEECH. The text will be converted and played back.

    Now you can save the generated speech as an audio file and import it into your project.

Saving Audio File to Project

  • Click on Add to Project Media. The generated audio file will be saved as a .wav file in your project.

    You can access the folder through the Project Media window.

Inserting Audio File into the Project

  • Click on Insert on Timeline. The audio file will be inserted as a new audio event on a new audio track labeled Synthesized Audio at the current cursor position in your timeline and automatically saved in the project.

Translating Text

  1. Enter text in the text field.

  2. Click the Translate Text button.

  3. In the displayed dialog box, select the languages:

    • Text Language: Language of the entered text

    • Translate to: Target language

  4. Click on the Translate button. The text in the text field will be replaced with a translation in the specified language.

Load text from Titles & Text events into Text to Speech

You can load the text from any Titles & Text event on your timeline into the Text to Speech tool to generate an audio file for that text.

  1. Click the event that holds your Titles & Text generated media to select it.

  2. In the Text to Speech dialog box, click the Load text from existing event button. This loads the text from the Titles & Text event into the Text to Speech text input field.

  3. Preview the audio and make any changes you need.

Using SSML input mode

SSML (Speech Synthesis Markup Language) is a markup language designed specifically for controlling the output of text-to-speech (TTS) systems. It allows for detailed instructions to format and style the spoken language, such as emphasizing certain words, controlling pause length, or changing speech speed.

SSML provides a set of tags that can be embedded within the text to indicate how it should be pronounced or delivered. These tags provide control over various aspects of speech synthesis, including prosody, pronunciation, volume, and more.

For more information, see https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup

EXAMPLE

SSML Example

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-ChristopherNeural" effect="eq_car" role="YoungAdultMale">
    Welcome <break strength="medium" /> to text to speech.
    <p>
      <prosody rate="slow">This is a sentence that will be spoken slowly.</prosody>
      <prosody rate="fast">This is a sentence that will be spoken quickly.</prosody>
    </p>
    <p>
      <break time="1s" />A pause of 1 second is inserted here.<break time="1s" />
    </p>
  </voice>
  <voice name="en-US-JennyMultilingualNeural" style="assistant">
    <lang xml:lang="en-US">
      Enjoy using the feature!
    </lang>
    <lang xml:lang="de-DE">
      Viel Spaß beim Benutzen des Features!
    </lang>
  </voice>
</speak>
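Hand-written SSML is easy to break with a missing closing tag or an unescaped & or < in the text, and malformed markup may not be spoken as intended. The following Python sketch, which is not part of VEGAS Pro, uses the standard library's xml.etree.ElementTree to assemble a small, well-formed SSML document that you could then paste into the text field in SSML input mode. The helper name build_ssml and the default 500 ms pause between segments are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(voice, segments, lang="en-US", pause="500ms"):
    """Return an SSML string for one voice (hypothetical helper, not a VEGAS API).

    segments is a list of (rate, text) pairs; each text is wrapped in
    <prosody rate="..."> and consecutive segments are separated by <break>.
    """
    speak = ET.Element("speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": lang})
    voice_el = ET.SubElement(speak, "voice", {"name": voice})
    for i, (rate, text) in enumerate(segments):
        if i > 0:
            # Insert a pause between consecutive segments
            ET.SubElement(voice_el, "break", {"time": pause})
        prosody = ET.SubElement(voice_el, "prosody", {"rate": rate})
        prosody.text = text  # ElementTree escapes &, < and > when serializing
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("en-US-ChristopherNeural",
                     [("slow", "This is spoken slowly."),
                      ("fast", "This is spoken quickly.")]))
```

Building the markup with an XML library rather than string concatenation guarantees matching tags and correct escaping; the voice name still has to be one the service supports.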

Speech to Text

The Speech to Text tool analyzes the audio in timeline events and automatically creates subtitles, either on your project timeline or as text files. You can correct mistakes in the generated text and change the appearance of the subtitles in your project.

Using the Speech to Text tool

  1. Add an audio or video file that contains speech to your timeline.

  2. Click the audio event to select it and choose Tools | Speech to Text.

    This opens the Speech to Text window as a floating window.

    TIP You can dock the window in the Window Docking area or in a floating dock. This lets you keep the window open while you continue with other work.

  3. In the Speech to Text window, click the Language drop-down list and choose the language spoken in the file.

    The drop-down list contains all currently supported languages. The default setting is Auto detect. For most files spoken in English and several other languages, you can leave this setting at Auto detect, and the tool reliably identifies the spoken language. If the tool has trouble identifying the language, select the correct language from the list.

  4. Once you’ve specified your Language setting, click the Analyze button.

    A progress bar shows the progress of the analysis, and the Preview section shows the text as it is generated. When the process completes, the generated text appears in a list in the Preview area. The time code values before each line of text show the exact span of the timeline that each generated subtitle event will cover.

You have several controls that you can use to build your subtitles:

Maximum length in characters Adjust this slider to specify the maximum length of each subtitle. As you adjust this value, the subtitles in the list change accordingly.
Preset The tool generates text events for your timeline with the Titles & Text media generator. Click this drop-down arrow to choose the Titles & Text preset you want to use for your timeline subtitle events. If you've created custom presets for your subtitles, those also appear as options in this list.
Lines Use this control to specify whether you want one- or two-line subtitles. With this set to Single, only one subtitle line appears under each set of time code values. With this set to Double, two lines appear under each set of time code values.
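The tool's exact line-breaking rules are not documented here. As a rough model of how Maximum length in characters and Lines interact, the following Python sketch wraps a transcript at word boundaries with the standard textwrap module and then groups the lines into Single (one-line) or Double (two-line) subtitles. The function name and the word-boundary rule are assumptions for illustration only.

```python
import textwrap


def build_subtitles(text, max_chars, lines_per_subtitle=1):
    """Hypothetical model of the Maximum length and Lines controls.

    Wraps text into lines of at most max_chars characters, breaking only at
    spaces (a single word longer than max_chars is kept whole), then groups
    consecutive lines into subtitles: 1 = Single, 2 = Double.
    """
    if lines_per_subtitle not in (1, 2):
        raise ValueError("lines_per_subtitle must be 1 (Single) or 2 (Double)")
    lines = textwrap.wrap(text, width=max_chars, break_long_words=False)
    return [lines[i:i + lines_per_subtitle]
            for i in range(0, len(lines), lines_per_subtitle)]


if __name__ == "__main__":
    transcript = "the quick brown fox jumps over the lazy dog"
    for subtitle in build_subtitles(transcript, max_chars=15, lines_per_subtitle=2):
        print(" / ".join(subtitle))
```

Lowering max_chars produces more, shorter lines; switching from Single to Double halves the number of subtitles without changing the lines themselves.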

Correcting errors in subtitles

Sometimes the generated text does not accurately reflect what was said in the audio file. Many factors can affect the accuracy of your results including background noise in the file, clarity of the speech being analyzed, speaker accent, and so on.

Click any word in the subtitle list. Notice that your timeline cursor jumps to the time code location that corresponds to the word you clicked on. This makes it easy to quickly play your project so you can hear the actual spoken word. Notice that if you let the project continue to play, a blue highlight moves along through your generated text to show the word that’s currently playing. Stop playback.

Similarly, click anywhere in the waveform of the audio you’ve analyzed. Notice that a blue highlight appears on the generated text that corresponds to the audio you clicked on.
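Both behaviors amount to a lookup between generated words and timeline positions. The following Python sketch shows one plausible way to model it, assuming each generated word carries a start and end time in seconds; VEGAS Pro's internal data structures are not documented here, so the Word class and the cursor_for_word and word_at functions are hypothetical. Clicking a word corresponds to reading its start time, and clicking the waveform corresponds to a binary search for the word spoken at that time.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the timeline
    end: float


def cursor_for_word(word: Word) -> float:
    """Clicking a word: move the timeline cursor to the word's start time."""
    return word.start


def word_at(words: List[Word], t: float) -> Optional[int]:
    """Clicking the waveform: index of the word spoken at time t, or None.

    Assumes words are sorted by start time and do not overlap.
    """
    starts = [w.start for w in words]
    i = bisect_right(starts, t) - 1  # last word that starts at or before t
    if i >= 0 and t < words[i].end:
        return i
    return None  # t is before the first word or in a pause between words


if __name__ == "__main__":
    words = [Word("Welcome", 0.0, 0.6), Word("to", 0.7, 0.8), Word("VEGAS", 0.9, 1.4)]
    print(cursor_for_word(words[2]), word_at(words, 1.0), word_at(words, 0.65))
```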

If you notice an error that you want to correct, double-click the word in the generated text list. This activates the word for editing. Double-click again to select the word and type in the correction.

Press Enter to finalize the change.

Creating subtitles for just a portion of the audio

  1. Select the event that holds the audio.

  2. Create a time selection that covers just the audio you want to analyze. With both an event selection and a time selection, the tool analyzes only the audio that falls within both selections.

  3. Click the Analyze button. When analysis completes, text is generated only for the audio that lies both in the selected event and within the time selection.

TIP You can also select multiple events across your timeline and generate text for each of them simultaneously. Further, if you also create a time selection, the tool analyzes only audio that is both in a selected event and sits within the time selection.
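The selection rule described above is an interval intersection: only audio that is inside a selected event and inside the time selection is analyzed. The following Python sketch illustrates that rule with event and selection ranges expressed as (start, end) pairs in seconds; the function name and this representation are assumptions for illustration, not VEGAS Pro's implementation.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds on the timeline


def regions_to_analyze(events: List[Interval],
                       selection: Optional[Interval]) -> List[Interval]:
    """Return the parts of the selected events that the analysis would cover.

    With no time selection, each selected event is analyzed in full.
    With a time selection, only the overlap of each event and the selection
    is kept; events that do not overlap the selection are skipped.
    """
    if selection is None:
        return list(events)
    sel_start, sel_end = selection
    regions = []
    for start, end in events:
        lo, hi = max(start, sel_start), min(end, sel_end)
        if lo < hi:  # keep only a non-empty overlap
            regions.append((lo, hi))
    return regions


if __name__ == "__main__":
    selected_events = [(0.0, 10.0), (20.0, 30.0), (40.0, 50.0)]
    print(regions_to_analyze(selected_events, (5.0, 25.0)))
```

In this model, the third event produces no text because it lies entirely outside the time selection.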

Delivering your subtitles

  • Export

    Use this option to create a file that holds your subtitles. You can create three different file types:

    • SubRip format files (SRT)

    • DVD Architect Subtitle Script (SUB)

    • Text files (TXT)

  • Generate

    Click this button to create Titles & Text events on a new track in your timeline.

If you’ve generated a new subtitle track, the tool creates a new text event for each of the subtitles in your list. These events are standard Titles & Text events, and you can edit them however you need to. For instance, if the text doesn’t line up perfectly with the spoken audio, you can move the text event to line it up properly. Or you can trim either edge of the event to make it last longer or shorter. You can open the generator and make corrections or adjustments to the text. In short, any edit you would normally make to a text event on your timeline you can make here to perfect your subtitles.
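For the Export option, SubRip (SRT) is a simple plain-text format: each subtitle has a sequence number, a start and end time in the form HH:MM:SS,mmm separated by -->, one or more lines of text, and a blank line. The following Python sketch writes that structure from a list of (start, end, lines) tuples, for example if you want to produce or check SRT files outside VEGAS Pro. The function names are hypothetical, and details of the file that Export writes (line endings, encoding) may differ.

```python
from typing import List, Sequence, Tuple

Subtitle = Tuple[float, float, Sequence[str]]  # (start s, end s, text lines)


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(subtitles: List[Subtitle]) -> str:
    """Build SRT text: number, time range, text lines, then a blank line."""
    blocks = []
    for number, (start, end, lines) in enumerate(subtitles, start=1):
        header = f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
        blocks.append(header + "\n".join(lines) + "\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    subs = [(1.0, 3.5, ["Welcome to text to speech."]),
            (4.0, 6.25, ["This subtitle uses", "two lines."])]
    print(to_srt(subs))
```

A two-line (Double) subtitle simply becomes two text lines under one time range.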