What you need to know
- Microsoft recently released an AI tool called VALL-E. This tool can create convincing reproductions of people’s voices.
- The tool needs only a 3-second recording as a prompt to generate content.
- Unlike many AI models, VALL-E can reproduce the speaker’s emotions.
Microsoft recently released an artificial intelligence tool known as VALL-E that can replicate the human voice (via AITopics). The tool was trained on 60,000 hours of English audio data and uses 3-second clips of a specific voice to generate content. Unlike many AI tools, VALL-E can recreate the speaker’s emotion and tone, even when producing a recording of words the original speaker never said.
A paper from Cornell University used VALL-E to synthesize multiple voices. Some examples of the work are available on GitHub.
The audio samples Microsoft shares are of varying quality. Some of them sound natural, while others are clearly machine-generated and sound robotic. As the technology improves, the generated recordings will likely be more convincing. Furthermore, VALL-E uses only 3 seconds of recording as a prompt. If this technique were used on a larger sample set, it could arguably produce more realistic samples.
At this time, VALL-E is not generally available. That may be a good thing, because AI-generated duplicates of people’s voices could be used in dangerous ways by attackers or others with malicious intent.
Windows Central view: Impressive but scary
VALL-E is undoubtedly impressive, but it raises some ethical concerns. As artificial intelligence becomes more powerful, the voices produced by VALL-E and similar technologies will become more convincing. That opens the door to realistic spam calls that replicate real people’s voices.
Politicians and other public figures could also be impersonated. Given how quickly content spreads on social media and how polarized political debate has become, few people are likely to ask whether a scandalous recording is genuine.
Security concerns also come to mind. My bank uses my voice as a password when I call. There are measures in place to detect recorded audio, and I assume the technology could detect whether VALL-E’s audio is being used. That being said, I still feel uneasy. An escalating arms race between AI-generated content and AI-detection software is entirely possible.
While not a security issue, some have raised the concern that voice actors may lose their jobs to VALL-E or competing technologies. There is no way around it. Once VALL-E can replace voice actors in audiobooks and other content, companies will use it. In fact, Apple recently announced a feature that uses AI to narrate audiobooks.
Like any technology, VALL-E will be used for good, bad, and everything in between. Microsoft has issued an ethics statement regarding the use of VALL-E, but the future of its use is still uncertain. Microsoft president Brad Smith has discussed regulation of AI in the past (via GeekWire). We’ll have to see what measures Microsoft takes to regulate the use of VALL-E.
Original: Microsoft’s VALL-E can imitate any voice with just a 3-second sample
Source: Microsoft Research