I definitely was not aware Spotify DRM had been cracked to enable downloading at scale like this.
The thing is, this doesn't even seem particularly useful for average consumers/listeners: Spotify itself is so convenient, and trying to locate individual tracks in massive torrent files of presumably tens of thousands of tracks each sounds horrible.
But this does seem like it will be a godsend for researchers working on things like music classification and generation. The only thing is, you can't really publicly admit exactly what dataset you trained/tested on...?
Definitely wondering if this was in response to desire from AI researchers/companies who wanted this stuff. Or if the major record labels already license their entire catalogs for training purposes cheaply enough, so this really is just solely intended as a preservation effort?
I wouldn’t be so sure. There are already tools to automatically locate and stream pirated TV and movie content on demand. They’re so common that I had non-technical family members bragging at Thanksgiving about how they bought a box at their local Best Buy that has an app which plays any movie or TV show they want on demand without paying anything. They didn’t understand what was happening, but they said it worked great.
> Definitely wondering if this was in response to desire from AI researchers/companies who wanted this stuff.
The Anna’s Archive group is ideologically motivated. They’re definitely not doing this for AI companies.
More serious response: research is explicitly included in fair use protections in US copyright law. News organizations regularly use leaked / stolen copyrighted material in investigative journalism.
Are you aware Anna’s Archive already solved the exact same problem with books?
I can imagine this making it wayyy easier to build something like Lidarr but for individual tracks instead of albums.
It's probably going to make the AI music generation problem worse anyway...
Can you imagine your favorite playlist needing to swap among 10 apps, each requiring a $10/month subscription?
it's an archive to defend against Spotify going away. Remember when Netflix had everything, and then that eroded and now you can only rely on stuff that Netflix produced itself?
the average consumer will flock when Spotify ultimately enshittifies
Didn't Meta already publicly admit they trained their current models on pirated content? They're too big to fail. I look forward to my music Slop.
Largest example: a lot of Russian music is not available on Spotify because of the Russia-Ukraine war and Spotify pulling out of Russia. So they don't have the licenses to a lot of stuff, because that belongs to companies operating within Russia.
What's stopping someone from sticking a microphone next to their speaker?
Slow, but effective.
Do they have DRM at all? Youtube and Pandora don't.
Download the lot to a big NAS and get Claude to write a little frontend with song search and auto playlist recommendations?
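A minimal sketch of the search half of that frontend, assuming an invented `tracks` table and sample rows (the real Anna's Archive metadata dump ships its own schema):

```python
import sqlite3

# Sketch of the "little frontend" idea: song search over a local metadata
# database. Table, columns, and rows here are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracks (title TEXT, artist TEXT)")
conn.executemany(
    "INSERT INTO tracks VALUES (?, ?)",
    [("Paranoid Android", "Radiohead"),
     ("Karma Police", "Radiohead"),
     ("Android Porn", "Kraddy")],
)

def search(query: str):
    # SQLite's LIKE is case-insensitive for ASCII, so this is a cheap
    # case-insensitive substring search on the title.
    rows = conn.execute(
        "SELECT title, artist FROM tracks WHERE title LIKE ? ORDER BY title",
        (f"%{query}%",),
    )
    return rows.fetchall()
```

`search("android")` would return both Android tracks; the auto playlist recommendations would sit on top of queries like this one.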
Curious why not? Assuming you only used the metadata. I think they would be considered raw facts and not copyrightable.
For them, 300TB is just cheap
This is not to defend Spotify (death to it), but to state that opening all of this data for even MORE garbage generation is a step in the wrong direction. The right direction would be to heavily legislate around / regulate companies like Spotify to more fairly compensate the musicians who create the works they train their slop generators with.
As a society, we should do our best to preserve this trove.
Yeah. To me it is not really relevant. I actually wasn't using Spotify, and if I need songs I use yt-dlp with YouTube, but even that is becoming increasingly rare. Today's music just doesn't interest me as much, and I have the songs I listen to regularly. I do, however, also listen to music on YouTube in the background; in fact, that is now my primary use case for YouTube, even surpassing watching movies or anything else. (I do use YouTube for getting some news too, though; it is so sad that Google controls this.)
Additionally there was a lot of discourse about music and a lot of curated discovery mechanisms I sorely miss to this day. An algorithm is no replacement for the amount of time and care people put into the web of similar artists, playlists of recommendations and reviews. Despite it being piracy, music consumption through it felt more purposeful. It's introduced me to some of my all time favourite artists, which I've seen live and own records and merchandise of.
There was quality in the technical sense of the audio files, but also in the organization and sourcing of the material, the QA process of the encoding - down to the specific release the audio file was from.
There was quantity, sure, but that was secondary to the quality. The quantity was just a side-effect of the place being known for quality, making it an attractive arena to participate in.
And it also had all the "weird"/non-standard things you don't find on mainstream streaming-services precisely because that is what independent curators are good at and often driven by.
This Anna's release... While in itself impressive in many ways, it does not compare to the things What.CD represented. It's almost the exact opposite:
- focus on most popular content - niche content (even by mainstream Spotify-standards) is not included
- quality is 160kbps OGG files, which is far from lossless; it's not tightly coupled to a release, and as far as audio grading goes, there's no transparent QA process for the content, nor is it available in audiophile fidelity.
This is definitely Apples vs Oranges.
So there’s some way to go for a comprehensive music archive.
while one can compare in terms of number of tracks, the quality used to be on another level altogether. from the article:
> The quality is the original OGG Vorbis at 160kbit/s.
meanwhile the tracker had 16/24-bit flac rips of vinyl, with decent quality control where the track's metadata was verified for any artifacts. for the given quality, one could rip youtube music (maybe not as easily anymore) and achieve a larger scale in a similar quality level.
now if hypothetically tidal had all the music of the world and was accessible this way, then it would be a comparable resource. insane regardless.
I didn't know German providers do this.
- https://de.wikipedia.org/wiki/Clearingstelle_Urheberrecht_im...
- https://netzpolitik.org/2024/cuii-liste-diese-websites-sperr...
It's a DNS-based block, so overriding your default DNS server is enough to circumvent it. I think DNS over HTTPS also works.
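For example, on Linux the override can be as small as pointing /etc/resolv.conf at a public resolver (1.1.1.1 is Cloudflare, 8.8.8.8 is Google); setups managed by systemd-resolved or NetworkManager need the equivalent setting in their own config instead:

```
# /etc/resolv.conf - replace the ISP's (blocking) resolver with public ones
nameserver 1.1.1.1
nameserver 8.8.8.8
```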
Alternative: https://archive.ph/2025.12.21-050644/https://annas-archive.l...
alextud popcorntime
which should trivially yield http://github.com/alextud/PopcornTimeTV, results in anything but that one particular URL in every search engine: Google, Kagi, DuckDuckGo, Bing. They even find a fork of that particular repo, which in turn links back to it, but refuse to show the result I want. Haven't found any DMCA notices. What is going on?
Read an article that was published just 10 years ago, and witness the bit rot as most external links will 404, gone forever.
I think it's worth questioning the value of preserving -everything-, but it seems like if we can, we should.
HN crowd is, of course, biased in the technocratic sense, but you see - everyone seems to actually rejoice at the move.
The closest to remorse is `linhns` and `locusofself` expressing concern about artists getting hurt (not Spotify itself), but locusofself prefaces with "I hate spotify as a company but..."
(disclaimer: this text is NOT LLM generated, I wrote myself a summary of the summary. here's the Claude thread should anyone care https://claude.ai/share/cfc4ca63-2b9e-47ac-a360-202025d1a134)
There is contemporary lost media being created every day because of how we distribute things now. I think in some cases, the intent of the publisher was to literally destroy every copy of the information. I understand the legal arguments for this, but from a spiritual perspective, this is one of the most offensive things I can imagine. Intentionally destroying all copies of a creative work is simply evil. I don't care how you frame it.
Making media effectively lost is not much different in my mind. Is it available if it's sitting on a tape in an Iron Mountain bunker that no one will ever look at again?
> A while ago, we discovered a way to scrape Spotify at scale.
They won’t and shouldn’t divulge the details, but I imagine that would be a fun read!
https://codeberg.org/raphson/music-server/src/branch/main/sp...
If you like the goal and you have even a few hundred GB available on your server, consider "donating" some of that space to seeding the data (music or books). It's absolutely how we can fight the system, even if just a tiny bit. https://annas-archive.org/torrents
If AA goes down, it's not the end of it all, a new one comes back up and the seeders are still there.
They are based in Russia, and Russia currently doesn't cooperate well with the West.
So it is imaginable that, if some people gave Trump quite some money, Anna's takedown could become part of some deal to lift sanctions after a ceasefire in Ukraine, but.. it does not seem like it. I rather suspect more effort in the West to block access to unwanted sites like this. My ISP in Germany is already blocking it.
It may be only ~30 years since webpages emerged, but there are also many young people who are simply too young to have experienced that. There is always generational change; our generation has the opportunity to store more things.
That's why I divide my music into the kind I want to have forever - which I buy on CDs - and dance music that I can live without one day.
https://www.scribd.com/document/56651812/kreitz-spotify-kth1...
That being said it’s no secret Spotify and other streaming services barely pay even popular artists. Artists make money from live shows and merch. The fact that their music is behind a paywall at all could mean they make less money from some lack of exposure.
I do hope one day self-hosting music with an extremely easy setup with torrenting for sourcing is set up again. What I’m talking about exists to some extent, but it’s not trivial for most people.
I've always found it interesting how streaming services have become the de facto music library of record, yet they can and do remove content at will. When Spotify pulled out of Russia, entire catalogs became inaccessible. Physical media and personal archives suddenly matter again in ways we thought were obsolete.
The copyright discussion is complex, but from a pure preservation standpoint, I'm glad someone is doing this work.

> This is by far the largest music metadata database that is publicly available. For comparison, we have 256 million tracks, while others have 50-150 million. Our data is well-annotated: MusicBrainz has 5 million unique ISRCs, while our database has 186 million.
If they truly are on a mission to protect the world's information from disappearing, they should work with MusicBrainz to get this data into it.
Alternatively, it would be amazing if they built a MusicBrainz-like service around it.
In either case, to make the data truly useful, they'd need to solve the problem of how to match the metadata to a fingerprint used to identify the music tracks, assuming that data is not part of the metadata they collected.
The value that MusicBrainz adds is the community editor who spent a few hours going through YouTube videos and wayback machine social links to figure out that Fog (Wellington, NZ, punk/post-punk) and Fog (Auckland, NZ, Post-Punk) are different bands - even if they share a Spotify profile. The editor that hunted down and listened to 5 compilations that have mixed up a radio edit and an original mix of a track, to find out which is which, and separate them in MB and make notes. [these are made up examples]
That's not to imply that these two projects are 'competing', or that the ISRC figure comparison isn't useful and correct. But community database + scraped data is apples and oranges. And a mixed fruit bowl is wonderful.
How is that a problem?
for each track in collection do extract_fingerprint

Even perceived involvement in music piracy puts a much bigger target on their back from far more aggressive actors (RIAA, major labels)
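The fingerprint loop sketched a few lines up is short in practice. This sketch substitutes a plain content hash for the fingerprint; a real pipeline would use an acoustic fingerprint (e.g. Chromaprint/AcoustID, not shown here) so that different encodes of the same recording still match:

```python
import hashlib
from pathlib import Path

def extract_fingerprint(data: bytes) -> str:
    # Stand-in fingerprint: a plain content hash. Acoustic fingerprinting
    # (Chromaprint/AcoustID) would be the real choice, since it survives
    # re-encoding; that dependency is assumed, not shown.
    return hashlib.sha256(data).hexdigest()

def fingerprint_collection(root: str) -> dict:
    # "for each track in collection do extract_fingerprint"
    return {
        str(path): extract_fingerprint(path.read_bytes())
        for path in sorted(Path(root).rglob("*.ogg"))
    }
```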
The data will be released in different stages on our Torrents page:
[X] Metadata (Dec 2025)
[ ] Music files (releasing in order of popularity)
[ ] Additional file metadata (torrent paths and checksums)
[ ] Album art
[ ] .zstdpatch files (to reconstruct original files before we added embedded metadata)
> We're curious about the peaks at whole minutes (particularly 2:00, 3:00, 4:00). If you know why this is, please let us know!
As a hobby video/audio editor: people will start with their track taking up a preset amount of time and fill it up - even if it means having some dead space at the end.
The other alternative is algorithmically created music.
So you might see a lot of anchoring just like YouTube videos kept stretching to almost exactly ten minutes?
The best metadata I've found, though, is the MySpace Dragon Hoard: https://archive.org/details/myspace_dragon_hoard_2010
That included the artist location, allowing me to tag songs based on their country. I then created playlists such as "NERAS" (Non-English Rock Artist Sample), where the single most popular song for each artist was chosen, but only when the country of origin was not English-speaking and the genre was Rock. I like listening to music while working, but English lyrics distract me because I understand what they're saying.
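That selection logic fits in a few lines. In this sketch the track data, play counts, and the country list are all made-up samples, not the real MySpace metadata:

```python
# Sketch of the "NERAS" selection described above: keep each artist's single
# most popular track, but only for Rock artists from non-English-speaking
# countries. All data below is invented for illustration.
ENGLISH_SPEAKING = {"US", "GB", "AU", "NZ", "CA", "IE"}

tracks = [
    {"artist": "Rammstein", "country": "DE", "genre": "Rock", "title": "Du Hast", "plays": 900},
    {"artist": "Rammstein", "country": "DE", "genre": "Rock", "title": "Sonne", "plays": 700},
    {"artist": "Oasis", "country": "GB", "genre": "Rock", "title": "Wonderwall", "plays": 999},
    {"artist": "Mana", "country": "MX", "genre": "Rock", "title": "Rayando el Sol", "plays": 500},
]

def neras(tracks):
    best = {}
    for t in tracks:
        # Skip non-Rock tracks and artists from English-speaking countries.
        if t["genre"] != "Rock" or t["country"] in ENGLISH_SPEAKING:
            continue
        current = best.get(t["artist"])
        # Keep only the most-played track per artist.
        if current is None or t["plays"] > current["plays"]:
            best[t["artist"]] = t
    return sorted(best.values(), key=lambda t: t["artist"])
```

With the sample data above, Oasis is filtered out (GB) and only the top Rammstein track survives.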
After discovering music via the MySpace archive, I've since purchased 73 songs from 35 artists that I'd never heard of before digging into the data. I tried rebuilding my playlist on Spotify, but got greyed-out tracks, and on YouTube Music, but got "unavailable video" errors. So I still prefer purchasing tracks via the iTunes Music Store, Qobuz, Bandcamp, and 7digital.
Other data sources such as the MP3.com rescue barge, PureVolume archive, and Anna's Spotify archive lack the country-of-origin metadata, so are of less interest to me. It may be possible to use an LLM to guess the language of each track title, but someone else will have to do that.
Meanwhile, if you're interested in the genre-by-country MySpace data, or have questions about the iTunes EPF, feel free to reach out and we can discuss your research.
I would guess that combining these sources, along with info from MusicBrainz, would help quite a bit? Still, I'm rather surprised Spotify doesn't provide more information about artists.
Also sort and classify the articles by binary size, vs page count, plot count, raster image count etc, in order to compress the outliers and detect when a raster image should have been a plot and convert it to vectorized images etc.
How compact can we get the collective human scientific corpus?
They also removed a lot of discovery features - Playlist Radio, for example. They still have some version of it on the backend, but you have to go through some weird mechanisms to trigger it - like playing the last song in a playlist and waiting till it ends (or rewinding), and then you get the playlist radio. But it's also a crippled version of it - it prefers playing the exact same popular songs for some reason.
Then they released this DJ thing, which is laughably bad. No Spotify, I don't want someone talking to me with useless information in between songs. Who thought that was a good idea? Who actually uses that?
There hasn't been a change in Spotify in the last 7 years or so that wasn't negative.
Another extremely annoying effect is, being 40+, they only suggest music for my age. In “New” and “Trending”, I see Muse and Coldplay! I should make myself a fake ID just to discover new music, but that gets creepy very fast.
Magnet link found here: https://annas-archive.li/torrents/spotify
Are magnet links allowed on HN?
If I were to do it today, I could get so much farther with hyperscaler products and this dataset.
Anecdotally, I know a few vocalists that sound great in these keys and use them as a starting point
Increasing or decreasing? IMHO increasing would make more sense, as the most popular music is already mirrored in countless other places. It's the rare stuff that is most in need of preservation.
I wonder how much of the content there is AI-generated. Honestly, even as someone who was initially skeptical, I've found some of it to be rather good --- not knowing that it was AI-generated at first. Now if they could only reverse-engineer the prompt and only store the model, that would be an extremely efficient form of "compression".
I'm a music archivist & preservationist, I've archived and found several formerly lost or on the verge of becoming lost albums, EPs, and Singles, and I've been wondering if the backup of Spotify so far, even with the available info, contain any taken down, region limited, or no longer available songs?
any response is appreciated!
https://developer.spotify.com/documentation/web-api/referenc...
I bet you can whip up a super simple script with an LLM to do this!
But they're not that good. They look for the songs on youtube, and the versions uploaded there are often modified (or just very low quality). And I've had some issues with metadata. I'd say about 5% of my songs had some issues, and 1% were completely off.
Once they release the actual torrents and not just the metadata, I'm assuming that new playlist export tools will soon show up, and they'll use these new torrents as source instead of youtube. They'll be a lot more reliable. I'd wait for that to happen. In fact I may end up re-exporting my old spotify playlist.
I've used ChatGPT to write a whole bunch of playlist logic scripts (e.g. create a playlist that takes tracks from playlists A, B and C, but exclude tracks in playlist D.)
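A sketch of one such script: union of playlists A, B and C minus everything in playlist D, keeping first-seen order. The track IDs are placeholders, and a real script would go through a Spotify client library to fetch and write playlists:

```python
# Playlist set logic: (A ∪ B ∪ C) \ D, preserving first-seen order and
# dropping duplicate tracks.
def combine(a, b, c, d):
    exclude = set(d)
    seen = set()
    result = []
    for track in [*a, *b, *c]:
        if track not in exclude and track not in seen:
            seen.add(track)
            result.append(track)
    return result
```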
In spotify_clean_track_files.sqlite3:
SELECT count(*), sum(filesize_bytes) FROM track_files;
255966403|15970064861274
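The TiB figure can be checked directly from the query result above:

```python
# Sanity-check the query result: convert the summed filesize_bytes to TiB.
total_bytes = 15_970_064_861_274
tib = total_bytes / 2**40  # 1 TiB = 2^40 bytes
print(round(tib, 1))  # → 14.5
```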
That's only 14.5 TiB, nowhere near 300 TiB. What makes up the other 285 TiB of content?

I'm a bit sad that they chose to focus on music rather than audiobooks. Creating an archive of audiobooks seems like it would be more aligned with their mission.
https://open.spotify.com/album/07IyzOA9jJWPZcLDysQwpo?si=KZO...
This is not an issue in my view. I like the fact that I can download 100 MiB ultra-high resolution TIFF files of scans of photographs from the original negative from the Library of Congress and 24-bit/96kHz FLAC files of captures of 78 RPM records from the Internet Archive. In addition to maintaining completeness and quality of information, one of the main goals of preservation is to guard against further degradation and information loss. You should try to preserve the highest quality copies available (because they contain more information) and re-encoding (deliberate degradation) should only be used to create convenient access copies.
Inferior copies, in addition to being less informative, have the potential to misinform. Only the archivist will enjoy space savings. All the readers who might consult your library in the infinite future will bear the cost.
> ...(e.g. lossless FLAC). This inflates the file size...
This is entirely the wrong view. The file size of a raw capture compressed to FLAC should be thought of as the “true” or “correct” size. It is roughly the most efficient (balancing various trade-offs) representation of sampled audio data that we can presently achieve. In preservation we seek to preserve the item or signal itself and not simply what we might perceive thereof. This human-centric perception view is just wrong. There is data in film photographs which cannot be perceived visually yet can be of interest to researchers and be revealed with digital image analysis tools.
As an example of how much information celluloid can contain see: https://vimeo.com/89784677 (context: he is comparing a Blu-ray and a scan of a 35mm print)
That would also be a good fit for [the new delta-encoded posting lists I am working on](https://github.com/meilisearch/meilisearch/pull/5985). Let's see how good it can get. My early benchmarks showed a 50% reduction in disk usage.
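For readers unfamiliar with the technique: delta encoding stores the gaps between sorted doc IDs instead of the IDs themselves, so the numbers stay small and compress well. A toy round-trip (Meilisearch's actual on-disk format is more involved than this):

```python
# Delta-encode a posting list (sorted doc IDs) into gaps, and decode back.
def delta_encode(postings):
    deltas, prev = [], 0
    for doc_id in postings:
        deltas.append(doc_id - prev)  # gap from the previous ID
        prev = doc_id
    return deltas

def delta_decode(deltas):
    ids, total = [], 0
    for d in deltas:
        total += d  # running sum restores the absolute IDs
        ids.append(total)
    return ids
```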
This is literally all you need to back up Spotify.
Error HTTP 451 - Unavailable For Legal Reasons
Jokes aside, I always thought the best way to deal with piracy was to understand the demand side, or convince it not to pirate, rather than going after the supply side.
But, more importantly, I cannot even say "good for you", because I don't actually think it is good for Anna's Archive. I wouldn't touch that thing if I were them. Do we even have any solid alternatives for books, if Anna's Archive gets shut down, by the way? Don't recommend Amazon, please.
Now imagine a dedicated music client that will download and stream (and share, because we are polite) only the needed files :)
a client can selectively list and then stream individual files from a huge torrent. if you've ever watched illegal movies/shows on those random domain websites, you're likely streaming it from a torrent on the backend somewhere.
it wouldn't surprise me if we start to see some docker images pop up in a few days to do exactly this as a sort of "quasi-self-hosted Jellyfin", where a person hosts a thin client on a machine that fetches the data from the torrent and allows the user to "select" their library. A user can just select "Top hits from the 80s" and it'll grab those files from the torrent, then stream or back them up.
I don't really see why it wouldn't, from an end user perspective, be any different than a self hosted jellyfin or plexamp.
Is there any way to search this spotify database without downloading the currently available metadata torrent?
Releasing indie music, like really low-level indie music, for free in the name of "preservation" is so misguided.
Don't do this. You will only end up hurting the artists who rely on paid downloads.
There is a ton of good bands with under 10k or even 1k monthly listeners.
Relying on an external hosted service would never cross my mind, and surely wouldn’t be something I go to on a daily basis.
I envision an army of lawyers and cybersecurity companies being prepared to unleash a scorched-earth campaign that book publishers might want to be part of as well.
In the end it may take down more than just this publication, but most others as well.
Yeah, the original quality is either a 320kbps OGG or lossless. Not 160.
While this is _a_ backup, it's a pretty lossy one.
A distributed ripping project to do that would be a fine thing.
Until we have reasonable copyright terms, Pirate On !
If you could identify a track supposedly by artist X was actually AI slop not created by artist X, you could use that information to skip tracks on (web) music players, for example.
So much interesting but undiscovered music is out there!
Psy-trance... I thought it was the same as any other electronic genres, but do people get high and just start shoveling psy-trance tracks out or something?
Opera I thought was a very strict discipline, needing rigorous somewhat esoteric training in order to produce the right sounds. How could there be so many opera artists?
I mean, I'm sure there's some misclassification, but chamber music is basically a couple people with any sort of music training on classical instruments so that doesn't surprise me nearly as much... I can easily imagine there being _lots_ of those, and you might come up with a different artist name for each unique set of people you collaborate with.
> /audio-features/{id} "Get audio feature information for a single track identified by its unique Spotify ID."
this combined with track metadata can finally allow those motivated enough to create their own personalized shuffle. potentially better than the slop we get nowadays. no generative ai required*.
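A sketch of what such a personalized shuffle could look like: weight each track by how close its audio features sit to a taste profile, then draw without replacement. The feature names mirror Spotify's audio-features fields, but the values, the taste profile, and the weighting scheme are all made up for illustration:

```python
import random

# Invented sample data: per-track audio features and a taste profile.
tracks = {
    "track_a": {"energy": 0.9, "danceability": 0.8},
    "track_b": {"energy": 0.2, "danceability": 0.3},
    "track_c": {"energy": 0.7, "danceability": 0.9},
}
taste = {"energy": 0.8, "danceability": 0.9}

def weight(features):
    # Smaller L1 distance to the taste profile gives a larger weight;
    # the 0.1 floor avoids division by zero on an exact match.
    distance = sum(abs(features[k] - taste[k]) for k in taste)
    return 1.0 / (0.1 + distance)

def personalized_shuffle(tracks, rng=random):
    ids = list(tracks)
    weights = [weight(tracks[t]) for t in ids]
    order = []
    # Repeated weighted choice = weighted draw without replacement.
    while ids:
        i = rng.choices(range(len(ids)), weights=weights)[0]
        order.append(ids.pop(i))
        weights.pop(i)
    return order
```

Tracks closest to the profile tend to surface first, but every track still shows up eventually, unlike a plain most-popular-first sort.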
The point is human connection. Art is a living reflection and record of human experience. Art will persevere- the kinds of folks who prioritize what they like based on popularity were never the supporters artists (contrast with craftspeople trying to make a buck) counted on in the first place. Enjoy your derivative slop - we’ll continue on our imperfect, messy, individual, human artistic lives.