Monday, November 25, 2024

Stability AI brings new clarity and power to gen AI audio with Stable Audio 2.0




Stability AI is continuing to push forward its vision for generative AI with the release of the Stable Audio 2.0 audio model today.

Stability AI is perhaps best known for its Stable Diffusion text-to-image models, but that's only one of many models the company has been working on. Stable Audio had its initial release in Sept. 2023, introducing the ability for users to generate short audio clips with a simple text prompt. With Stable Audio 2.0, users can generate high-quality audio tracks of up to 3 minutes, double the 90 seconds the initial Stable Audio release allowed.

In addition to text-to-audio, Stable Audio 2.0 also supports audio-to-audio generation, where users upload a sample they want to use as a prompt. Stability AI is making Stable Audio available for limited free use on the Stable Audio website, with API access coming soon so developers can build services on it.

The Stable Audio 2.0 release is the first major model drop from Stability AI since the company's founder and former CEO Emad Mostaque abruptly resigned at the end of March. According to the company, it's still very much business as usual, and the Stable Audio 2.0 update is a testament to that.


Lessons learned from Stable Audio 1.0 informed version 2.0

Stability AI built on its initial experience developing Stable Audio in 2023.

Zach Evans, head of audio research at Stability AI, told VentureBeat that for the initial launch of Stable Audio 1.0, the focus was on releasing a groundbreaking text-to-audio generative model with exceptional audio fidelity and a meaningful output duration.

“Since the initial launch, we have dedicated ourselves to advancing its musicality, extending the output duration, and honing its ability to respond accurately to detailed prompts,” Evans said. “These enhancements are aimed at optimizing the technology for practical, real-world applications.”

Stable Audio 2.0 introduces the ability to produce full musical tracks with coherent musical structure. Using latent diffusion technology, the model can generate compositions up to 3 minutes long with distinct intro, development and outro sections. That is an advance over the prior Stable Audio release, which could only create short loops or fragments rather than full-length songs.

Looking at the machine learning (ML) science behind Stable Audio 2.0, the model still relies on what is known as a latent diffusion model (LDM). Evans explained that since the Stable Audio 1.1 beta update released in December, Stable Audio has had a transformer backbone, making it what he called a “diffusion transformer” model.

“We also increased the amount of data compression we apply to the audio data during training, allowing us to scale the model outputs to 3 minutes and beyond while maintaining reasonable inference times,” Evans said.
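Evans does not give the actual compression ratio, so the short Python sketch that follows is purely illustrative. It assumes a 44.1 kHz sample rate and hypothetical downsampling factors of 1,024 and 2,048 (not published Stable Audio 2.0 settings) to show why compressing audio more heavily lets a transformer handle longer clips. Because standard self-attention cost grows roughly with the square of the sequence length, doubling both the clip length and the compression factor leaves the latent sequence, and the rough attention cost, about where it was.

```python
# Illustrative only: how an autoencoder's temporal downsampling factor changes
# the latent sequence length a diffusion transformer must process.
# ASSUMPTIONS: the sample rate and downsampling factors are hypothetical,
# not documented Stable Audio 2.0 settings.

SAMPLE_RATE_HZ = 44_100  # assumed CD-quality sample rate


def latent_length(duration_s: float, downsample: int, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Number of latent time steps after compressing `duration_s` seconds of audio."""
    num_samples = int(duration_s * sample_rate)
    # Ceiling division: a partial final window still produces one latent frame.
    return -(-num_samples // downsample)


def relative_attention_cost(seq_len: int, baseline_len: int) -> float:
    """Standard self-attention cost scales roughly with sequence length squared."""
    return (seq_len / baseline_len) ** 2


if __name__ == "__main__":
    # Baseline: a 90-second clip at a hypothetical x1024 compression.
    baseline = latent_length(90, downsample=1024)
    for downsample in (1024, 2048):
        n = latent_length(180, downsample)
        print(f"180 s at x{downsample}: {n} latent frames, "
              f"~{relative_attention_cost(n, baseline):.1f}x attention cost vs. 90 s at x1024")
```

With these assumed numbers, a 180-second clip compressed ×2,048 yields the same 3,876 latent frames as a 90-second clip compressed ×1,024, while keeping ×1,024 compression for the longer clip would roughly quadruple the attention cost.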

Transforming audio samples with text prompts

In addition to generating audio from text prompts, Stable Audio 2.0 enables audio-to-audio transformations.

Users can upload audio samples and use natural-language instructions to transform the sounds into new variations. This opens up creative workflows such as iteratively refining and editing audio through textual guidance.
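Stability AI has not described how audio-to-audio generation works internally. One common approach with diffusion models, sometimes called SDEdit-style editing, is to partially noise the latent of the uploaded sample and then let the model denoise it under the new text prompt, with a "strength" setting controlling how much of the original survives. The Python sketch that follows shows only that noising step under an assumed cosine noise schedule. The function name, the `strength` parameter and the latent values are hypothetical, and the denoising model itself is not shown.

```python
import math
import random


def noise_to_strength(latent: list[float], strength: float, rng: random.Random) -> list[float]:
    """Partially noise a latent vector before text-guided denoising (hypothetical helper).

    strength=0.0 returns the input unchanged; strength=1.0 is (almost) pure noise.
    ASSUMPTION: a cosine schedule where alpha**2 + sigma**2 == 1; this is an
    illustrative choice, not a documented Stable Audio 2.0 setting.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    alpha = math.cos(strength * math.pi / 2)  # weight kept on the original signal
    sigma = math.sin(strength * math.pi / 2)  # weight given to Gaussian noise
    return [alpha * x + sigma * rng.gauss(0.0, 1.0) for x in latent]


if __name__ == "__main__":
    # Hypothetical 4-value latent standing in for an encoded audio sample.
    sample_latent = [0.5, -1.2, 0.3, 0.8]
    rng = random.Random(0)
    for s in (0.0, 0.3, 0.7):
        noisy = noise_to_strength(sample_latent, s, rng)
        print(s, [round(v, 3) for v in noisy])
    # A real pipeline would next run the model's denoising steps from `noisy`,
    # conditioned on the new text prompt (not shown here).
```

A low strength keeps the rhythm and timbre of the upload largely intact, while a high strength gives the text prompt more room to reshape the sound.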

Stable Audio 2.0 also significantly expands the range of sound effects and textures that AI generation can produce. Users can prompt the system to generate immersive environments, ambient textures, crowds, cityscapes and more. The model also lets users modify the style and tone of generated or uploaded audio samples.

An ongoing concern across the gen AI landscape is the proper use of source material to train a model.

Stability AI has prioritized intellectual property protections with its new audio model. To address copyright concerns, Stable Audio 2.0 was trained exclusively on licensed data from AudioSparx, with opt-out requests honored. Audio uploads are monitored using content recognition to prevent copyrighted material from being processed.

Protecting copyright is critical to ensuring that Stability AI can commercialize Stable Audio and that organizations can use the technology safely. Stable Audio is currently monetized through subscriptions to the Stable Audio web application and will soon be available through the Stable Audio API.

Stable Audio is not, however, an open model, at least not yet.

“The weights for Stable Audio 2.0 will not be available for download; however, we are working on open audio models to be released later in the year,” Evans said.
