OpenAI’s new AI Model can generate songs similar to Elvis Presley, Katy Perry, and more!

Musical AI is evolving fast. Many independent organizations are coming up with impressive AI solutions that bring machine learning into musical workflows. OpenAI, an independent research organization that aims to develop “friendly AI,” has delivered many impressive AI tools over the last few years. Having already created the language-generating tool GPT, the organization has now added Jukebox.

Jukebox, an AI that generates raw audio of genre-specific songs, might not be the most practical application of AI and machine learning, but the fact that it can create new music from just a genre and lyrics as input is quite astonishing. Jukebox can also rewrite existing music, generate songs based on samples, and even do covers of famous artists. Samples are offered in the voice of Elvis Presley, Katy Perry, Frank Sinatra, and Bruno Mars. The results are nowhere near realistic, but listening to ‘Katy Perry’ or ‘Frank Sinatra’ in different styles shows that Jukebox is capturing some aspects of their musical styles. As OpenAI specified on their blog, the results researchers got were impressive: “there are recognizable chords and melodies and words.” 

But how did OpenAI do it? 

OpenAI’s engineers made use of artificial neural networks (ANNs), which are machine learning algorithms commonly used to identify patterns in images and language. Here, they are used to identify patterns in audio: millions of songs and their metadata are passed through these neural networks, from which new music is created. In other words, the engineers provided the AI with a huge database of songs and then instructed it to create new tracks that follow the same patterns and beats found in that database. 
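Jukebox’s actual models are vastly more complex, but the core idea of learning patterns from a corpus and then generating new sequences that follow them can be sketched with a toy example. Here, a first-order Markov chain stands in for the neural network, and note names stand in for raw audio; the corpus, note names, and the `generate` helper are all illustrative inventions, not anything from Jukebox itself.

```python
import random
from collections import defaultdict

# Toy "song database": each song is a sequence of note names.
corpus = [
    ["C", "E", "G", "E", "C"],
    ["C", "E", "G", "C", "E"],
    ["G", "E", "C", "E", "G"],
]

# Learn first-order transition counts: which note tends to follow which.
transitions = defaultdict(list)
for song in corpus:
    for cur, nxt in zip(song, song[1:]):
        transitions[cur].append(nxt)

def generate(start, length, seed=0):
    """Generate a new sequence that follows the learned transitions."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        out.append(rng.choice(transitions[out[-1]]))
    return out

print(generate("C", 8))
```

Every step in the generated sequence is a transition that occurred somewhere in the corpus, which is the toy analogue of “new tracks that follow the same patterns found in the database.”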

Creating tracks that resemble the provided samples requires a lot of computing power: the AI has to go through intensive training on large amounts of data. According to the OpenAI team, to train the model they created a new dataset of 1.2 million songs, 600,000 of which are in English, paired with their lyrics and metadata including the genre, artist, and year of each song. 
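To make the pairing concrete, one training record in such a dataset might look like the following. The field names and values here are purely illustrative, not OpenAI’s actual schema; the point is that each piece of audio travels with its lyrics and metadata, and that only the English subset can be used for lyric conditioning.

```python
# Hypothetical shape of training records: audio paired with lyrics and metadata.
dataset = [
    {"audio": "song_001.wav", "lyrics": "some lyrics", "language": "en",
     "genre": "pop", "artist": "Artist A", "year": 2001},
    {"audio": "song_002.wav", "lyrics": None, "language": "fr",
     "genre": "chanson", "artist": "Artist B", "year": 1995},
]

# Only records with English lyrics can condition lyric generation,
# mirroring the 600,000-of-1.2-million split described above.
english = [r for r in dataset if r["language"] == "en" and r["lyrics"]]
print(len(english))  # 1
```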

Technical Details of the Training Model – for those of you who are into ML engineering; others can skip, of course 🙂 

▪ The model the AI was trained on had two million parameters, running on more than 250 graphics processing units (GPUs) for three days. 

▪ The sampling sub-model, which adds loops and transitions to tracks, comprised one billion parameters and was trained on about 120 GPUs for several weeks. 

▪ The top level of the hierarchy for the output track has more than five billion parameters and was trained on more than 500 GPUs. 

▪ The lyrics output by Jukebox also went through two weeks of intensive training. 

▪ The model is trained on 32-bit, 44.1 kHz raw audio using a Vector Quantized Variational Autoencoder (VQ-VAE), since generating music directly from raw audio would otherwise take far more time because of the long sequences involved. 
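The key trick of a VQ-VAE is vector quantization: each chunk of encoded audio is snapped to its nearest entry in a learned codebook, compressing long raw-audio sequences into short sequences of discrete tokens that are much cheaper to model. The sketch below shows only that quantization step, with a hand-made codebook and made-up encoder outputs standing in for what a real VQ-VAE would learn.

```python
def quantize(vector, codebook):
    """Return the index of the nearest codebook entry (squared distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist(vector, codebook[i]))

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # learned in a real VQ-VAE
features = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8)]  # stand-in encoder outputs
tokens = [quantize(v, codebook) for v in features]
print(tokens)  # [0, 1, 2]
```

Downstream models then only need to predict these token indices rather than millions of raw samples, which is what makes generating 44.1 kHz audio tractable at all.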

The training model and code are available in the openai/jukebox GitHub repo. 

Limitations of the Training Model

There is a significant gap between music created by the Jukebox neural network and human-created music. Jukebox-created songs show familiar features such as coherence, solos, and traditional instrument patterns, but they lack the choruses and repeated structure found in a song. Sampling of the tracks produces noise that degrades their overall quality. Performance of the model is also not up to par: on average, it takes about 9 hours to fully output one minute of audio, which can be a bottleneck when rendering and delivering audio samples on cloud platforms. Lastly, the model only produces English tracks, since it was only provided with a database of English songs; samples and lyrics in other languages have not yet been trained on the platform. 

Legal and Ethical issues with such AI Models 

Jukebox raises other issues when it comes to delivering a sample from the provided input. First, there is copyright: training an AI on already-recorded music will always require a copy of each track, although this type of training is generally considered ‘fair use.’ The second issue is the output, and this one can have serious consequences. Jukebox produces new tracks from existing metadata, namely the lyrics and genre. What if those lyrics are protected by copyright? What if music ‘in another style of the genre’ presents a different image of the original singer to their audience? 

In many areas of the music community, the Jukebox platform may raise concerns, whether on the basis of copyright infringement or of decreasing the value of human-made music. Alongside those issues come the benefits: music creators will be excited and curious about Jukebox and how they can implement this creative AI tech in their workflow. 

All these opinions and questions are completely natural; they always accompany the latest tech innovations. Is AI good or bad for humans? Well, it all depends. So the best option is to explore and understand what Jukebox technology is really capable of. Understanding the technology will not only help in forming reasoned opinions but also in reducing real issues with the platform. 


Overall, Jukebox represents a step forward in improving the musical quality of generated samples with new lyrics, giving creators more freedom in how they make music. The ability to condition the output on artist, genre, and lyrics is one of Jukebox’s biggest strengths. 

Also, this is not the first music AI tool the San Francisco-based AI laboratory has delivered. OpenAI has been working on generating audio samples conditioned on different kinds of metadata for years. Last year, it released MuseNet, which was trained on a large amount of MIDI data using deep neural networks to compose new tracks with different instruments and genres, from country to pop to rock. 

Looking to Leverage AI in your organization? Reach out to us here

Interested in joining our ML Team? Please check out the open positions here