ByteDance, a Chinese company, has developed a generative AI tool that lets users transform their voice into another person’s in real time. The tool is called “StreamVoice.”
StreamVoice is not currently available to the public. However, it demonstrates the rapid advancement of AI, which makes it possible to efficiently and convincingly mimic the voice and appearance of well-known people, producing what are known as “deepfakes.” With the 2024 election drawing near, people have already used AI this year to mimic President Joe Biden and music sensation Taylor Swift.
StreamVoice was developed by researchers from ByteDance, the company behind the enormously popular social media platform TikTok, and Northwestern Polytechnical University, a Chinese institution well known for its work with the nation’s armed forces. (Northwestern Polytechnical University is not connected to Northwestern University in the US.)
In a recent paper, the researchers reported that StreamVoice can perform “real-time conversion” of a user’s voice into any other voice, provided the user has “one word” of speech from the desired voice. The paper observed that AI voice-conversion technology has so far been practical mainly “offline,” whereas StreamVoice produces its output at the pace of live streaming, with just 124 milliseconds of latency. According to the researchers, “recent language-model advancements” enabled them to develop the tool.
The authors report that their experiments indicate StreamVoice can convert speech in a streaming manner with high speaker similarity for both seen and unseen speakers, while maintaining performance comparable to non-streaming voice-conversion systems.
Citing Meta’s LLaMA language model, which has gained traction since its initial release a year ago, the paper says that StreamVoice was built using “the LLaMA architecture.” It also used open-source code from AudioDec, which Meta describes as a “plug-and-play benchmark for audio codec applications.” According to the paper, the tool was trained primarily on Mandarin speech datasets, along with one multilingual dataset that included English, Finnish, and German.
The researchers did not say how users should operate StreamVoice, although they acknowledged that it carries potential risks of misuse, such as spreading fake information or carrying out phone scams. They recommended that individuals report any unauthorized use of voice converters to the appropriate authorities.
Experts in artificial intelligence have long cautioned that deepfakes will become more common as the technology advances. In a recent robocall, a deepfake of Joe Biden's voice advised voters not to cast a ballot in the New Hampshire primary. The New Hampshire Department of Justice is investigating the fake call in order to head off potential harm during this sensitive election period.