Meta AI has released MusicGen, a neural network that synthesizes music from a text prompt or a given melody, as part of its Audiocraft library. The architecture is a decoder-only autoregressive transformer. All generation goes through a single transformer, without the hierarchical upsampling stages that were common in previous work.
Audiocraft Library Code
The code was released as part of the new Audiocraft library, which is designed for further research in audio generation. The models come in several sizes, from 300M to 3.3B parameters. Running inference locally requires about 16 GB of GPU memory, but the models can also be run in Google Colab.
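As a rough sanity check on the memory figure, the sketch below estimates how much memory the weights alone would occupy at the two model sizes mentioned above. It is a back-of-the-envelope calculation, not a measurement: the helper name weight_memory_gb is hypothetical, the byte widths (2 for 16-bit floats, 4 for 32-bit floats) are assumptions about how the weights are stored, and activations, caches and the audio codec are ignored.

```python
# Back-of-the-envelope estimate of weight memory only.
# Assumption: every parameter is stored at the same precision.

def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Return the memory taken by the weights, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9

# Parameter counts quoted in the text; precisions are illustrative assumptions.
MODEL_SIZES = {"300M": 300e6, "3.3B": 3.3e9}
PRECISIONS = {"16-bit": 2, "32-bit": 4}

if __name__ == "__main__":
    for model_name, num_params in MODEL_SIZES.items():
        for precision_name, width in PRECISIONS.items():
            gb = weight_memory_gb(num_params, width)
            print(f"{model_name} at {precision_name}: about {gb:.1f} GB of weights")
```

Actual usage will be higher than these numbers, since generation also needs room for intermediate activations and the attention cache.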
The accompanying paper tackles conditional music generation, that is, generating music that satisfies specific criteria. To that end, the authors introduce MusicGen, a single language model (LM) that operates over several streams of compressed discrete music representation, or tokens. Unlike previous approaches, MusicGen is a single-stage transformer LM that uses efficient token interleaving patterns, which removes the need to cascade several models (for example, hierarchical or upsampling models). With this approach, MusicGen can produce high-quality output while being conditioned on either textual descriptions or melodic features, which gives better control over the generated results. The authors report extensive empirical evaluation, including both automatic and human studies, showing that the approach outperforms the evaluated baselines on a standard text-to-music benchmark. Ablation studies are used to show the importance of each component of MusicGen.
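The token interleaving mentioned above can be illustrated with a small sketch. One of the patterns discussed in the MusicGen paper is a "delay" pattern, in which each codebook stream is shifted by one extra step relative to the previous one, so a single transformer can predict all streams at each step. The code below is a minimal pure-Python illustration of that idea under stated assumptions, not Audiocraft's implementation: the function names and the PAD placeholder value are hypothetical.

```python
# Minimal sketch of a "delay" interleaving pattern over K token streams.
# Assumption: PAD is a hypothetical placeholder for positions with no token.
PAD = -1

def apply_delay_pattern(codes, pad=PAD):
    """Shift stream k right by k steps.

    codes: list of K token lists, each of length T.
    Returns K lists of length T + K - 1, padded where no token exists.
    """
    num_streams = len(codes)
    num_steps = len(codes[0])
    total = num_steps + num_streams - 1
    delayed = []
    for k, stream in enumerate(codes):
        left = [pad] * k                       # delay of k steps
        right = [pad] * (total - k - num_steps)  # fill up to the common length
        delayed.append(left + list(stream) + right)
    return delayed

def undo_delay_pattern(delayed):
    """Invert apply_delay_pattern, recovering the aligned K x T streams."""
    num_streams = len(delayed)
    num_steps = len(delayed[0]) - num_streams + 1
    return [row[k:k + num_steps] for k, row in enumerate(delayed)]

if __name__ == "__main__":
    codes = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
    for row in apply_delay_pattern(codes):
        print(row)
```

In this layout, column t of the delayed grid holds the token of stream 0 for time t, the token of stream 1 for time t - 1, and so on, which is what lets one model emit all streams in a single pass instead of cascading several models.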
Audiocraft is a library for audio processing and generation with deep learning. It includes the EnCodec audio compressor/tokenizer as well as MusicGen, a music generation LM that can be controlled through both textual and melodic conditioning.
OpenAI has also developed a music-generating AI called Jukebox, which can compose complete songs with lyrics, vocals, and instrumental accompaniment. Jukebox is trained on a massive dataset of audio recordings spanning multiple genres and styles, allowing it to generate highly creative and diverse pieces of music.
While some critics may argue that AI-generated music lacks the emotional depth and creativity of human-created music, there is no denying that this technology has the potential to revolutionize the way we think about the creative process.
It may be time to raise a glass to the composers who write music for movies. As AI continues to advance and become more sophisticated, we can expect to see even more innovative applications of this technology in the world of music and art, even if it never replaces the creativity and emotional resonance of human-created music.