Microsoft's Azure AI Speech needs just seconds of audio to spit out a convincing deepfake
Microsoft has upgraded Azure AI Speech so that users can rapidly generate a voice replica with just a few seconds of sampled speech.
The personal voice feature for AI Speech became generally available on May 21, 2024. It was impressive but required some training to get the best out of it. According to Microsoft, the feature has been upgraded to a new zero-shot text-to-speech model named "DragonV2.1Neural" with "more natural-sounding and expressive voices." It will also generate audio in any of the more than 100 supported languages.
Microsoft said the upgrade, compared to the previous model, "brings improvements to the naturalness of speech, offering more realistic and stable prosody while maintaining better pronunciation accuracy."
The system, which was already pretty good, is now even more worryingly accurate. "This capability unlocks a wide range of applications, from customizing chatbot voices to dubbing video content in an actor's original voice across multiple languages, enabling truly immersive and individualized audio experiences," Microsoft said.
It could also be a boon for people with malicious or deceptive goals, and we can imagine audio deepfakes produced with the service becoming ever
https://www.theregister.com/2025/07/31/microsoft_updates_azure_ai_speech/