Artificial intelligence can replicate any voice, including the speaker’s emotions and tone, from just a 3-second sample

What you need to know

  • Microsoft recently released an AI tool called VALL-E. This tool can create convincing reproductions of people’s voices.
  • The tool needs only a 3-second recording as a prompt to generate speech.
  • Unlike many other AI models, VALL-E can reproduce the speaker’s emotions.

Microsoft recently released an artificial intelligence tool known as VALL-E that can replicate the human voice (via AITopics). The tool was trained on 60,000 hours of English audio data and uses a 3-second clip of a specific voice to generate content. Unlike many AI tools, VALL-E can recreate the speaker’s emotion and tone, even when generating words the original speaker never said.

A paper from Cornell University used VALL-E to synthesize multiple voices. Some examples of the work are available on GitHub.
