Atlantic Releases Database of AI Music Training Datasets

Musicians Backxwash and Titus Andronicus have already discovered their songs among the 21 million tracks found in newly revealed AI training datasets.

Maya Feldman

June 21, 2026


The Atlantic has published a searchable database, compiled by Alex Reisner, that pulls back the curtain on opaque AI development practices by revealing four massive music training datasets, The Verge reports. Backxwash and Titus Andronicus are among the artists who have already found their songs in it, according to Exclaim!, confirming what many musicians long suspected.

AI models are being built on an unprecedented scale using existing musical works, but transparency and artist compensation have been virtually nonexistent. This glaring omission fuels the current dispute.

These revelations will likely spark a wave of legal challenges and new regulatory frameworks for AI training data, fundamentally reshaping the relationship between creators and AI developers.

The Hidden Scale of AI's Music Library

Alex Reisner's investigation uncovered two giant datasets of songs shared within AI development circles, one with 12 million tracks and another with 9 million, The Verge and MusicTech report. Two smaller datasets each hold over 100,000 songs. These figures expose the industrial-scale appropriation of existing music, which forms an uncompensated foundation for many AI-generated works. The sheer volume, about 21 million tracks across the two largest datasets alone, suggests a systemic disregard for copyright by AI developers, one that makes track-by-track licensing impractical and collective legal action likely.

Artists Discover Their Work in AI Training Data

The Atlantic's findings arm artists with direct tools to identify infringement. This immediate, undeniable evidence shifts the debate from theoretical concerns to concrete legal claims, fueling demands for accountability. Musicians like Backxwash and Titus Andronicus have already confirmed their songs within these datasets, Exclaim! reports.

How Tech Giants Are Fueling the AI Music Boom

Google reportedly downloaded a smaller dataset from the Free Music Archive to train its AI models, MusicTech details. This points to a pervasive practice among AI developers of sourcing music without explicit consent or compensation. Even seemingly open archives are being repurposed by major players for commercial AI training, which blurs ethical lines and extends the problem beyond illicit sharing to the appropriation of publicly available material.

Legal Repercussions and Industry Investigations Begin

APRA AMCOS, Australia's music rights management organisation, will investigate The Atlantic's findings that AI companies allegedly stole mass datasets for training purposes, MusicTech reports. With copyright enforcement bodies now mobilizing, the move signals what may become a global push to establish new legal precedents and compensation models for creators. The era of unchecked AI model training on copyrighted material appears to be nearing its end, and AI companies could soon face significant financial liabilities.

By early 2027, the APRA AMCOS investigation, along with similar actions globally, is expected to clarify the financial liabilities tech companies face for their current training practices.