

OpenAI’s Voice Engine – A Tool For Cloning Voices

By Editorial Staff

March 30, 2024


OpenAI is releasing a limited preview of Voice Engine today. The tool extends the company's existing text-to-speech API and has been in development for about two years. It lets users upload a 15-second voice sample to generate a synthetic replica of that voice. No public release date has been announced yet, which gives the company time to respond to how the model is used and misused.

Jeff Harris, a member of the product staff at OpenAI, said in an interview, "We want to make sure that everyone feels good about how it's being deployed." He added that OpenAI understands where the technology is dangerous and has mitigations in place.

Model Training

Harris said that the generative AI model powering Voice Engine has been hiding in plain sight for some time. The same model underpins the voice and "read aloud" features in ChatGPT, OpenAI's AI-powered chatbot, as well as the preset voices in OpenAI's text-to-speech API. Spotify has been using it since early September to dub podcasts by well-known hosts such as Lex Fridman into other languages.

Asked where the model's training data came from, Harris would say only that Voice Engine was trained on a mix of licensed and publicly available data.

Voice Cloning with OpenAI: How and Why?

Generative AI models are trained on an enormous number of examples (in this case, speech recordings), usually sourced from public websites and large datasets around the web. Many generative AI vendors treat training data as a competitive advantage, so they keep both the data and information about it secret. Details about training data can also invite intellectual property lawsuits, which is another reason not to reveal much.

OpenAI already faces claims that it violated intellectual property law by training its AI on photos, artwork, code, articles, and e-books without crediting or paying the creators or owners.

OpenAI has licensing agreements with some content providers, such as Shutterstock and the news publisher Axel Springer, and it lets webmasters block its web crawler from collecting their sites' data for training. It also lets artists "opt out" of having their work used to train its image-generating models, including its newest, DALL-E 3.

No comparable opt-out exists for OpenAI's other products, however. And in a recent statement to the U.K.'s House of Lords, OpenAI said it would be "impossible" to create useful AI models without copyrighted material. The company argued that fair use, the legal doctrine that permits the use of copyrighted works to make a secondary creation as long as it is transformative, shields it where model training is concerned.

Voice Synthesis

Voice Engine is not trained or fine-tuned on user data. This is due in part to the ephemeral way the model, which combines a diffusion process with a transformer, generates speech. "We take a small audio sample and text and generate realistic speech that matches the original speaker," Harris said, adding that the audio sample is discarded once the request is complete.

He explained that the model analyzes the speech sample and the text to be read aloud at the same time. This lets it generate a matching voice without building a separate model for each speaker.

Voice cloning is not new technology. ElevenLabs, Replica Studios, Papercup, Deepdub, and Respeecher are among the companies that have sold voice cloning products for years, and Big Tech companies such as Amazon, Google, and Microsoft have built them too. (Microsoft is also one of OpenAI's largest investors.) Harris claimed that OpenAI's approach produces higher-quality speech overall.

The pricing will also be aggressive. OpenAI removed Voice Engine's pricing from the marketing materials it released today, but according to sources it costs $15 per million characters, or about 162,500 words. That is enough to cover Dickens' "Oliver Twist" with a little room to spare. An "HD" quality option costs twice as much, yet an OpenAI spokesperson said there is no difference between HD and non-HD voices, which leaves the point of the HD tier unclear.

At typical speaking rates, a million characters amounts to roughly 18 hours of audio, so the price works out to a little under $1 per hour. That is cheaper than ElevenLabs, a popular competitor, which charges $11 a month for 100,000 characters.

The trade-off is customization: Voice Engine offers no controls for adjusting the tone, pitch, or cadence of a voice. Instead, Harris says, "Any expressiveness in the 15-second voice sample will carry over to later generations." For example, if you speak in an excited tone in the sample, the synthetic voice will sound consistently excited.
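The pricing comparison above can be checked with a little arithmetic. The sketch below uses only the figures reported in this article ($15 per million characters, about 162,500 words, and ElevenLabs' $11 for 100,000 characters) plus one labeled assumption: a narration pace of about 150 words per minute, which is not an OpenAI figure. It also scales ElevenLabs' monthly plan linearly to a million characters, which is a simplification, since that price is a subscription tier rather than a metered rate.

```python
# Back-of-the-envelope check of the Voice Engine pricing figures.
# Reported figures: $15 per 1M characters (~162,500 words);
# ElevenLabs at $11 for 100,000 characters per month.
# ASSUMPTION: narration at ~150 words per minute (not from OpenAI).

VOICE_ENGINE_USD_PER_1M_CHARS = 15.0
WORDS_PER_1M_CHARS = 162_500
ELEVENLABS_USD = 11.0
ELEVENLABS_CHARS = 100_000
WORDS_PER_MINUTE = 150  # assumed speaking rate


def hours_of_speech(words: int, wpm: float = WORDS_PER_MINUTE) -> float:
    """Convert a word count into hours of narration at the given pace."""
    return words / wpm / 60


def usd_per_1m_chars(price_usd: float, chars: int) -> float:
    """Scale a plan price linearly to a cost per million characters."""
    return price_usd * (1_000_000 / chars)


hours = hours_of_speech(WORDS_PER_1M_CHARS)
voice_engine_per_hour = VOICE_ENGINE_USD_PER_1M_CHARS / hours
hd_per_hour = 2 * voice_engine_per_hour  # "HD" option reportedly costs twice as much
elevenlabs_per_1m = usd_per_1m_chars(ELEVENLABS_USD, ELEVENLABS_CHARS)

print(f"1M characters is about {hours:.1f} hours of speech")
print(f"Voice Engine: ${voice_engine_per_hour:.2f}/hour (HD: ${hd_per_hour:.2f}/hour)")
print(f"ElevenLabs, scaled linearly: ${elevenlabs_per_1m:.0f} per 1M characters")
```

Under these assumptions, a million characters is about 18 hours of speech at roughly $0.83 per hour, while ElevenLabs' plan, scaled the same way, comes to about $110 per million characters. A faster or slower speaking rate shifts the per-hour figure proportionally.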

Voice Talent As A Commodity

According to ZipRecruiter, voice actors earn between $12 and $79 an hour, far more than Voice Engine costs, even at the low end. If OpenAI's tool takes off, it could turn voice work into a commodity. What would that mean for actors?

The talent industry would not be caught off guard; it has been grappling with the looming threat of generative AI for some time. Voice actors are increasingly being asked to sign away the rights to their voices so that clients can use AI to create synthetic versions that could one day replace them. Voice work, especially low-paying entry-level work, is at risk of being replaced by AI-generated speech.

Some AI voice platforms are trying to strike a balance. Last year, Replica Studios signed a somewhat controversial deal with SAG-AFTRA to create and license copies of the voices of members of the media artists' union. The organizations said the deal established fair and ethical terms to ensure performer consent while negotiating how synthetic voices can be used in new works, such as video games.

ElevenLabs, meanwhile, runs a marketplace for synthetic voices where users can create a voice, verify it, and share it publicly. When others use a voice, its creator is paid a set amount for every 1,000 characters generated.

OpenAI will not set up labor union deals or marketplaces anytime soon. For now, it requires only that users obtain "explicit consent" from the people whose voices are cloned, make "clear disclosures" about which voices are AI-generated, and agree not to use the voices of minors, deceased people, or political figures in their generations.

Harris said OpenAI is watching the effect on the voice actor economy very closely and with real interest. He believes this kind of technology will give voice actors many opportunities to extend their reach. But, he added, OpenAI will learn more as people start using and experimenting with the technology.

Deepfakes And Ethical Concerns

Voice cloning tools can be misused in ways beyond putting actors' jobs at risk, and they have been. Users of 4chan, the notorious message board known for conspiracy theories, used ElevenLabs to share hateful messages in the voices of celebrities such as Emma Watson. Vice reporter Joseph Cox wrote about creating a voice clone convincing enough to fool a bank's authentication system.

There are also fears that bad actors will use voice cloning to sway elections. Those fears are not unfounded: in January, a robocall campaign in New Hampshire used a deepfaked President Biden voice to discourage people from voting, prompting the FCC to move to make such calls illegal.

Beyond banning deepfakes in its usage policies, what is OpenAI doing to keep Voice Engine from being abused?

Harris named a few measures. First, only a very small group of developers (around 10) will have access to Voice Engine at the start. OpenAI is prioritizing "low risk" and "socially beneficial" use cases, such as healthcare and accessibility, and is also experimenting with "responsible" synthetic media.

Early Voice Engine adopters include Age of Learning, an ed-tech company that uses it to generate voice-overs from actors it has already cast, and HeyGen, a visual storytelling app that uses Voice Engine for translation. Livox and Lifespan are using Voice Engine to provide voices for people with speech difficulties, and Dimagi is building a Voice Engine-based tool to support health workers in their native languages.

Here are example audio clips provided by OpenAI:

Reference voice: audio link

Generated voice: audio link

The generated voice reads the following text:

“Some of the most amazing habitats on Earth are found in the rainforest. A rainforest is a place with a lot of precipitation and it has many kinds of animals trees and other plants. Tropical rainforests are usually not too far from the equator and are warm all year.”

Second, clones created with Voice Engine are watermarked using a technique OpenAI developed that embeds inaudible identifiers in recordings. (Other companies, such as Resemble AI and Microsoft, use similar watermarks.) Harris did not claim the watermark cannot be broken, but he described it as "tamper-resistant." According to Harris, given an audio clip, it is very easy to check whether it was generated by OpenAI's system and which developer created it. The watermarking tool is not open source; for now, OpenAI keeps it in-house. Harris said the company is interested in making it public, but noted that doing so carries more risk, since wider exposure makes the watermark easier to break.
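OpenAI has not published how its watermark works, so the sketch below is not its method. It illustrates one textbook family of audio watermarking, spread-spectrum marking: a secret key seeds a pseudo-random +1/-1 sequence, a faint copy of that sequence is added to the samples, and a detector holding the same key correlates the audio against it. The function names, the key values, the 0.02 strength, and the detection threshold of 4 are all hypothetical choices for illustration. The strength is set for a clear demo rather than true inaudibility, and this toy makes no attempt at the robustness to compression and editing that a "tamper-resistant" scheme would need.

```python
import math
import random


def chip_sequence(key: int, n: int) -> list[float]:
    """Pseudo-random +1/-1 sequence derived from a secret integer key."""
    rng = random.Random(key)
    return [1.0 if rng.random() < 0.5 else -1.0 for _ in range(n)]


def embed(samples: list[float], key: int, strength: float = 0.02) -> list[float]:
    """Add a low-amplitude keyed sequence on top of the audio samples."""
    chips = chip_sequence(key, len(samples))
    return [s + strength * c for s, c in zip(samples, chips)]


def detection_score(samples: list[float], key: int) -> float:
    """Normalized correlation with the keyed sequence.

    Without the mark (or with the wrong key) the score behaves roughly like
    a standard normal variable; a marked clip scores far higher.
    """
    chips = chip_sequence(key, len(samples))
    corr = sum(s * c for s, c in zip(samples, chips))
    energy = math.sqrt(sum(s * s for s in samples))
    return corr / energy if energy else 0.0


def is_watermarked(samples: list[float], key: int, threshold: float = 4.0) -> bool:
    """Flag the clip if its score clears a hypothetical detection threshold."""
    return detection_score(samples, key) > threshold


# One second of a 440 Hz tone at 48 kHz stands in for a generated clip.
RATE = 48_000
clean = [0.5 * math.sin(2 * math.pi * 440 * t / RATE) for t in range(RATE)]
marked = embed(clean, key=1234)

print(is_watermarked(marked, key=1234))  # right key
print(is_watermarked(marked, key=9999))  # wrong key
print(is_watermarked(clean, key=1234))   # unmarked audio
```

With the matching key, the marked clip should score well above the threshold, while the wrong key or unmarked audio should score near zero. This also illustrates why keeping the key private matters: anyone holding it could more easily estimate and subtract the mark.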

Third, OpenAI plans to give members of its red teaming network access to Voice Engine to probe for malicious uses. The red teaming network is a paid group of experts that helps the company assess the risks of its AI models and devise mitigations.

Some experts argue that AI red teaming is not thorough enough and that it is up to vendors to build tools that protect against the harms their AI might cause. OpenAI is not going that far with Voice Engine. Still, Harris says that releasing the tools safely is the company's "top principle."

Public Release

If the pilot goes well and Voice Engine is well received, OpenAI may make the tool available to its wider developer base. For now, though, the company is not ready to commit.

Harris did, however, offer a glimpse of Voice Engine's roadmap. He said OpenAI is testing a security mechanism that requires users to read randomly generated text as proof that they are present and aware of how their voice is being used. Harris said this could give OpenAI the confidence it needs to make Voice Engine available to more people, or it might only be the beginning.

He said that how far OpenAI moves forward with its voice-matching technology will depend on what it learns from the pilot, the safety issues that surface, and the mitigations it puts in place.
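The random-text check Harris described amounts to a challenge-response test: the speaker is shown a phrase they could not have recorded in advance and must read it aloud. The article does not say how OpenAI verifies the reading, so the sketch below simply assumes the recording is transcribed by some speech recognizer (not shown) and compared word for word with the prompt. The word list and function names are hypothetical. A real check would also need to confirm that the voice in the recording matches the sample being cloned and that the reading was not itself synthesized, which this sketch does not attempt.

```python
import re
import secrets

# Hypothetical vocabulary; a real deployment would draw from a far larger list.
WORDS = [
    "amber", "river", "orbit", "maple", "signal", "harbor",
    "copper", "lantern", "meadow", "violet", "thunder", "pebble",
]


def make_challenge(n_words: int = 5) -> str:
    """Pick an unpredictable phrase the speaker could not have pre-recorded."""
    return " ".join(secrets.choice(WORDS) for _ in range(n_words))


def normalize(text: str) -> list[str]:
    """Lowercase and keep only alphabetic tokens, ignoring punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def passes_challenge(challenge: str, transcript: str) -> bool:
    """True if the transcript of the recording matches the challenge word for word."""
    return normalize(transcript) == normalize(challenge)


challenge = make_challenge()
print(challenge)
# In practice the transcript would come from a speech recognizer (not shown).
print(passes_challenge(challenge, challenge.capitalize() + "."))
print(passes_challenge(challenge, "hello this is definitely me"))
```

Using `secrets` rather than `random` keeps the prompt unpredictable, which is the point of the check: a pre-recorded or previously cloned clip is unlikely to contain the exact phrase.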


Editorial Staff

Editorial Staff at AI Surge is a dedicated team of experts led by Paul Robins, boasting a combined experience of over 7 years in Computer Science, AI, emerging technologies, and online publishing. Our commitment is to bring you authoritative insights into the forefront of artificial intelligence.
