
Deepgram’s Aura Gives Voices To AI Agents 

Deepgram is well known as one of the leading voice recognition startups. Today, the well-funded company launched Aura, its new real-time text-to-speech API. Aura pairs highly realistic voice models with a low-latency API so developers can build real-time conversational AI agents. Backed by LLMs, these agents can then stand in for human customer service representatives in public-facing settings such as call centers.
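To give a sense of what integrating Aura looks like, here is a minimal Python sketch of a single synthesis request. The endpoint URL, the voice model name, and the DEEPGRAM_API_KEY variable are assumptions drawn from Deepgram's public documentation rather than details confirmed in this article; check the official docs for the authoritative request format.

```python
# Minimal sketch: synthesize one line of speech with Deepgram's Aura TTS API.
# Assumptions: endpoint https://api.deepgram.com/v1/speak, voice model
# "aura-asteria-en", and an API key in the DEEPGRAM_API_KEY environment variable.
import os
import requests

API_KEY = os.environ["DEEPGRAM_API_KEY"]  # hypothetical env var holding your key

response = requests.post(
    "https://api.deepgram.com/v1/speak",
    params={"model": "aura-asteria-en"},  # assumed Aura voice identifier
    headers={
        "Authorization": f"Token {API_KEY}",
        "Content-Type": "application/json",
    },
    json={"text": "Thanks for calling. How can I help you today?"},
    timeout=30,
)
response.raise_for_status()

# The API responds with raw audio bytes; save them for playback.
with open("reply.mp3", "wb") as f:
    f.write(response.content)
```

In a real conversational agent, this call would sit at the end of a pipeline: speech-to-text transcribes the caller, an LLM drafts a reply, and the text-to-speech step above turns that reply into audio.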

Scott Stephenson, Deepgram's co-founder and CEO, noted that high-quality voice models have long been available, but using them has been prohibitively expensive and computationally intensive, while low-latency models tend to sound robotic. Aura, he stressed, combines human-like voice models that render extremely quickly (typically in well under half a second) with low cost.

"Everyone is now saying, 'We need real-time speech AI bots that can perceive what is being said, understand it, and generate a response, and then talk back,'" he said. Access to LLMs remains fairly expensive, and for a service like this to be useful to enterprises, he believes it needs to combine low latency, reasonable pricing, and accuracy, which he called "table stakes" for the product.

At $0.015 per 1,000 characters, Deepgram claims Aura is currently the most affordable option among its competitors. That is cheaper than Google's WaveNet voices ($0.016 per 1,000 characters) and Amazon Polly's Neural voices (also $0.016 per 1,000 characters), though not by much; Amazon's most expensive tier, however, costs considerably more.
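To make that difference concrete, here is a quick back-of-the-envelope comparison using the per-1,000-character rates quoted above; the monthly character volume is an illustrative assumption, not a figure from Deepgram.

```python
# Rough cost comparison at the quoted per-1,000-character rates.
RATES_PER_1K_CHARS = {
    "Deepgram Aura": 0.015,
    "Google WaveNet": 0.016,
    "Amazon Polly Neural": 0.016,
}

monthly_characters = 5_000_000  # hypothetical call-center volume

for provider, rate in RATES_PER_1K_CHARS.items():
    cost = monthly_characters / 1_000 * rate
    print(f"{provider}: ${cost:,.2f} per month")
```

At that hypothetical volume, the gap works out to $75 versus $80 per month, which is why the pricing differences only become meaningful at much larger scale.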

Stephenson believes a product like this has to hit an excellent price point while also delivering outstanding latency, speed, and accuracy. "So it's a really hard thing to hit," he said of Deepgram's overall product development strategy. That combination, he explained, has been the company's goal from the start, and it is why Deepgram spent four years building the necessary infrastructure before releasing anything.

Aura currently offers about a dozen voice models, all trained on a dataset Deepgram built in collaboration with voice actors. Like the rest of the company's models, the Aura models were trained in-house.

Having tried it, what truly sets Aura apart is its speed, paired with Deepgram's already impressive speech-to-text model. There are occasional odd pronunciations, but overall the results are convincing. To demonstrate how quickly it generates responses, Deepgram displays both the time it takes the model to start speaking (usually under 0.3 seconds) and the time the LLM needs to finish composing its answer (often just under a second). You can get a feel for Aura with the free trial.
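If you want to reproduce a time-to-first-audio measurement in the spirit of the figures Deepgram reports, a rough sketch follows; it reuses the same assumed endpoint, model name, and API key variable as the earlier example, and measures only the network round trip to the first audio chunk, not the LLM step.

```python
# Rough sketch: time how long the Aura API takes to return its first audio chunk.
# Same assumptions as before: endpoint, "aura-asteria-en" model, DEEPGRAM_API_KEY.
import os
import time
import requests

t0 = time.perf_counter()
with requests.post(
    "https://api.deepgram.com/v1/speak",
    params={"model": "aura-asteria-en"},
    headers={"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}"},
    json={"text": "Your order shipped this morning and should arrive Friday."},
    stream=True,   # stream the response so we can time the first chunk
    timeout=30,
) as response:
    response.raise_for_status()
    first_chunk = next(response.iter_content(chunk_size=4096))
    elapsed = time.perf_counter() - t0
    print(f"Time to first audio chunk: {elapsed:.3f}s ({len(first_chunk)} bytes)")
```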

Editorial Staff
Editorial Staff at AI Surge is a dedicated team of experts led by Paul Robins, boasting a combined experience of over 7 years in Computer Science, AI, emerging technologies, and online publishing. Our commitment is to bring you authoritative insights into the forefront of artificial intelligence.