We’ve seen AI dominate the world of art and the written word, and now it is coming for music, but it has a long way to go.
Opening up Google’s MusicLM, an artificial intelligence (AI) platform for making music, I hoped to hear soaring melodies, carefully modelled after the finest orchestras the world has to offer… I was, somewhat unsurprisingly, disappointed.
Announced back in January, MusicLM was Google’s first attempt at generating music through AI. Now, Google is letting people try it out for themselves, creating their own mini-masterpieces with AI.
In my time using the platform – requesting different genres, songs, styles and instruments – I was left feeling that, unlike image and text generation, AI music still has a long way to go.
And yet, this was exactly what much of the world thought roughly a year ago, when OpenAI first began its viral rise with AI images that, compared to what we see today, were pretty awful.
How Google’s MusicLM works
Google wastes no time in making it clear how early-stage this technology is. It is listed as ‘experimental technology’ for synthetic music only. It can’t generate vocals, and requests naming specific bands or artists won’t be fulfilled.
For now, that limits it to the still-vast world of instrumental music.
Google offers up an array of prompt suggestions, ranging from the simple “high-pitched bongos with ringing tones” through to the ever-so-slightly more detailed “optimistic melody about the arrival of spring, full of joy and hope, tranquil flute in the background, upbeat guitar”.
Music Google’s MusicLM generated from the prompt “Jazz group playing a fast song at a party”
However, no matter how complicated the prompt is, or how much detail it is given, many of the songs Google’s MusicLM creates suffer from a few recurring issues.
Whether the genre is death metal, acoustic pop, or fast-paced folk music, many of these tracks sound like they are playing through a wall, or even underwater.
That, combined with a heavy bass presence on… well, everything, makes for an overwhelming sound.
Every track also seems to take after experimental jazz, following no real rhythm. This isn’t too surprising, considering the AI is essentially predicting the most likely next note after the one before. But the result can often feel like throwing a dart at a keyboard and seeing what sound comes out next.
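To be clear, MusicLM itself is a far more sophisticated neural network, but the ‘one note after another’ idea can be illustrated with a toy sketch in Python. The note names and probabilities below are invented purely for illustration and have nothing to do with Google’s actual model.

```python
import random

# Invented transition table: for each note, how likely each possible next note is.
# This is a toy illustration of "pick the next note", not how MusicLM works internally.
transitions = {
    "C": {"E": 0.5, "G": 0.3, "C": 0.2},
    "E": {"G": 0.6, "C": 0.3, "E": 0.1},
    "G": {"C": 0.7, "E": 0.2, "G": 0.1},
}

def generate_melody(start="C", length=8):
    """Build a melody one note at a time by sampling the next note from the table."""
    melody = [start]
    for _ in range(length - 1):
        options = transitions[melody[-1]]
        next_note = random.choices(list(options), weights=list(options.values()))[0]
        melody.append(next_note)
    return melody

print(" ".join(generate_melody()))
```

Each note is chosen based only on the note before it, which is why a sequence built this way can sound plausible from moment to moment while having no overall structure.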
In some cases, MusicLM seems to misunderstand the prompt completely.
When I requested a ‘German metal song about the coming of spring’, I was presented with drums and some extremely glitchy piano and bass.
German metal song about the coming of spring
In other cases, it understands part of the prompt but completely ignores the rest.
For example, the prompt “calming violin backed by distorted guitar” offered up a piano and violin ballad. There was absolutely no guitar, but there was a rather aggressive drum track.
Calming violin backed by distorted guitar
Elevator music
Where this model thrives is in the more generic sounds. If it could be played in an elevator, or used as copyright-free music, Google has it covered.
This doesn’t seem all that surprising. In a model that is replicating existing work, genres built on repetitive sounds, song structures and chord progressions are going to be easier to recreate.
Folk music about someone going on an adventure
The biggest problems come with sounds that are more complex in nature. Orchestral performances sounded extremely messy, as did experimental jazz and progressive metal.
Equally, genres that are more specific or less common proved a challenge to Google. Once you get into the many sub-genres of electronic music, Google just starts pumping out an identical mish-mash of beeps and boops.
Copyright strikes galore
Like all other generative AI models, MusicLM had to be trained on existing music to work. A dataset of 280,000 hours of music was used in training, aiming to give the model an understanding of a wide range of genres.
However, like previous AI models, this is a bit of a legal minefield. In certain audio clips, vocals can be vaguely heard in the background, and while you aren’t able to request artists or songs, the world of copyright in music is complicated. As Ed Sheeran’s recent court battles show, it can all come down to something as simple as a riff or order of chords.
Pop ukulele song
While the model can only produce instrumental tracks, all of this is relatively easy to manage. However, as soon as lyrics and voices are introduced, especially if they are trained on existing content, Google could be opening itself up to a wealth of copyright concerns.
The future of AI music
Google’s MusicLM has a long way to go, but realistically so did both ChatGPT and the many image generators in their early days. When each of these AI platforms first launched, they were filled with errors and, at times, offered laughable results.
However, these are early days, and even right now it is impressive that the model can generate unique songs… no matter how strange they may be.
Realistically, each result is almost right. The genre is almost always correct, most of the instruments you request are there, and while the structure is a mess, it is a passable mess.
As more users try out the software, Google will gather the feedback and data it needs to improve it.
That, paired with future updates from Google, extended research and more training, could soon see AI topping the charts.