New carefully curated, open-source dataset will let everyone train music models without worries about copyright.
GCX knows what the AI music community needs right now: a diverse, open-source dataset that covers nearly every crucial genre and instrument from around the world. So it is providing one, with 15k tracks complete with stems, chords, and detailed metadata for every track.
Many AI music datasets are at risk of being falsely labeled as Creative Commons or of having been scraped from copyrighted material. GCX’s copyright-free dataset solves this problem, letting everyone train their music models with peace of mind.
“Music AI won’t progress if researchers and developers can’t get good datasets that don’t hide any copyright pitfalls,” explains GCX founder and CEO Alex Bestall. “Our datasets are used by some of the best teams in the world already, but we wanted to open things up to more people, to keep this field moving forward.”
The GCX dataset will be available on all major cloud and dataset services – everywhere AI researchers might need it. The dataset will be released under an MIT license, which will allow limited commercial use for companies or services. These open-source datasets can be used to train AI music models that generate new music, compose in different styles, transcribe music, and recommend personalized playlists. They can also be used to develop new AI music products and services, including new foundation music models, and to educate AI music developers about continuing advances in the use of AI in music.
“Open-source datasets are essential elements in supporting a truly innovative AI ecosystem, where creative small players can compete with larger teams. At a time when so many models and resources are moving from open to closed, we wanted to head the opposite direction, making better music data available to more people,” says Bestall. “We see this as a vital next step in making better musical experiences and products using AI.”
About GCX
Global Copyright Exchange (GCX) by Rightsify provides a comprehensive and compliant dataset licensing framework to developers, music and entertainment companies, and anyone else looking to train generative AI ethically. With more than one hundred years of copyright-cleared music in a wide range of genres, GCX is the only “clean” catalog with the robust metadata to support training of text-to-music and other AI models. For more information, go to www.gcx.co.
