Origins of the youtube-dl project

(rg3.name)

559 points · rg3 · 5y ago · 64 comments

greggman3 5y ago

I don't know if this is related or not, but back when Flickr was popular in 2005 or 2006, my friends and I were uploading pictures of our events there (they're all still there AFAIK). We'd upload them either to our own accounts or, for certain events, to a shared account. One shared account is here: https://www.flickr.com/photos/sourpower/albums

I wanted copies of those pictures and the easiest way to get them was to write a tool to download them rather than have to coordinate with 3 to 15 friends and ask them to copy the images to a CD or USB stick or some other nonsense. Dropbox wasn't a thing and not all my friends were tech heads that would want to set up FTP servers.

Flickr had also come out with an API. APIs for online services seemed kind of new at that point and Flickr was one of the first AFAIK.

So I wrote the app https://blog.greggman.com/blog/flickrdown/ and a few months later it was accused by other users of Flickr of being solely for the purpose of downloading copyrighted images. Not once did I ever use it for such a purpose nor, AFAIK, did any of my friends. None of us had any interest in other people's images on Flickr, only shared images of mutually attended parties, BBQs, picnics, and events.

Those users reported the app to Flickr and the app was banned.

It was banned by the app's id. That meant you could register your own app and then hack in your app's id and still use it. IIRC I continued to use it to download pictures from our events but it always pissed me off they banned it. It also pissed me off because it wasn't accessing anything you couldn't just scrape for. The API made it easy to get a list of URLs, search for albums or people etc but you could easily write a script that just scraped the HTML to find all the same data. Didn't matter, flickr didn't budge.

It further pissed me off that overzealous Flickr members accused me of lying about its purpose. Like many topics today, there is often absolutely nothing you can say that will convince someone else your intentions are not bad.
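The point above, that the same list of image URLs can be recovered by scraping the HTML instead of calling the API, can be illustrated with a minimal sketch. It uses only the Python standard library and assumes the images sit in ordinary `<img src="...">` tags; the names `ImageLinkParser` and `extract_image_urls` are hypothetical and say nothing about how Flickr's pages were actually structured or how the original app worked.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageLinkParser(HTMLParser):
    """Collect the src of every <img> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # Only <img> tags are of interest; everything else is ignored.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Relative paths are resolved against the page URL.
            self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Return all image URLs found in an HTML string, in document order."""
    parser = ImageLinkParser(base_url)
    parser.feed(html)
    return parser.image_urls
```

A scraper like this depends on the page markup, which is the trade-off debated further down the thread: it needs no API key, but it can break whenever the layout changes.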

1vuio0pswjnm7 5y ago

This story clearly illustrates the purpose behind "web APIs". To limit access.

As a user (not a web developer), I personally never saw the practical point of web APIs; I have always just "scraped the HTML". Many times the solutions I write outlive the corresponding "API"; IME, often the non-API method of data retrieval is more robust and reliable than using the so-called API.

YouTube used to have a freely accessible search API. Not anymore. However "scraping" the YT search result pages continues to work fine.

colejohnson66 5y ago

The purpose of APIs is to provide a “uniform” interface. HTML layouts can change. JavaScript could be added to download the images after page load. An API shouldn’t change as often. And if the API is “versioned”, you can usually use the old version (old HTML layout) for a while before you upgrade (compared to your tool breaking as soon as the HTML changes).

dreamcompiler 5y ago

The purpose of an API is like a company mission statement: There's one version written on the wall and then there's the actual version everybody knows is true but they don't say it out loud.

You described the written one above. GP described the actual one.

account42 5y ago

That is the promise of APIs, but there is no guarantee.

mrmonkeyman 5y ago

In theory, yes. In practice, no.

throw0101a 5y ago

> YouTube used to have a freely accessible search API.

Twitter used to have RSS/Atom feeds for each account so you could follow someone without a client, just a regular old news aggregator.

boogies 5y ago

AFAIK they still do, just without any way to find it except by digging into the channel page’s HTML to find the channel_id and then constructing the feed URL from that (“https://www.youtube.com/feeds/videos.xml?channel_id=$channel...) — or (edit) using something like https://github.com/rss-bridge/rss-bridge that presumably does something like that under the hood — so I guess scraping for an undocumented API.
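The two steps described above (find the channel_id in the page HTML, then build the feed URL from it) can be sketched as follows. The feed base URL is the one quoted in the comment; the pattern used to locate the id is an assumption (that the page embeds a `"channelId":"UC..."` string with a 24-character id), and the helper names are hypothetical.

```python
import re
from urllib.parse import urlencode

# Base of the feed URL, as quoted in the comment above.
FEED_BASE = "https://www.youtube.com/feeds/videos.xml"

# Assumption: the channel page embeds the id as "channelId":"UC..." where the
# id is "UC" followed by 22 URL-safe characters. The real markup may differ.
CHANNEL_ID_RE = re.compile(r'"channelId"\s*:\s*"(UC[0-9A-Za-z_-]{22})"')


def find_channel_id(page_html):
    """Return the first channel id found in the page HTML, or None."""
    match = CHANNEL_ID_RE.search(page_html)
    return match.group(1) if match else None


def feed_url(channel_id):
    """Build the Atom feed URL for a given channel id."""
    return FEED_BASE + "?" + urlencode({"channel_id": channel_id})
```

Fetching the page itself is left out on purpose; any HTTP client can supply the `page_html` string.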

matheusmoreira 5y ago

> Like many topics today, there is often absolutely nothing you can say that will convince someone else your intentions are not bad.

Sometimes it's better to not even try. Acting is always the more powerful move. Just do your thing. You wrote an awesome program that downloads stuff, they got offended and banned it. You've already accomplished your goal so you let it go... But if you cared enough you could just write a scraper for the website itself. What are they gonna do about it?

In an ideal world, downloading copyrighted data would be so easy and ubiquitous that the intellectual property laws would be unenforceable. Sure, they would get mad but who cares? There's very little they can do about it.

intricatedetail 5y ago

Funnily enough IP laws do little to protect authors, it's more like a framework to make money for big organisations who pretend they protect authors' interests.

Mizza 5y ago

I have a very similar story!

I maintain a similar project for SoundCloud called SoundScrape: https://github.com/Miserlou/soundscrape which I started for a similar reason, to save my own 'likes' and tracks that I've made and my friends have made.

SoundCloud made this very easy, as they had an API which exposed the endpoint MP3/WAV location in a field. The tool used an API key provided by SoundCloud to fetch the response.

Overnight and without warning, they removed that field from responses, changed the terms of service, banned my application for terms of service violations, and deleted all of my personal music and likes because I had used my own account to create the API key.

I was very angry at the time since all my music got deleted, but these days I'm just sad. Things like this have little by little destroyed all of my enthusiasm for technology.

I want to be a carpenter now.

RonanTheGrey 5y ago

> I was very angry at the time since all my music got deleted, but these days I'm just sad. Things like this have little by little destroyed all of my enthusiasm for technology. I want to be a carpenter now.

When I was a kid, I watched with wonder and delight at all the amazing things we were doing and inventing. We were poor so I didn't get much tech, but I always marveled at it. I saved up for ages to buy one of those personal organizers that were all the rage in the mid 90's. I saw myself as part of a world working together to build a brighter future for all of us, and I think I wasn't alone in that.

It is true that the seeds of many issues we face today were already firmly planted in those days and that it was unrealistic. But there is value in dreams because dreams tell us who we want to be and give us the engine to get there.

Today, I couldn't care less about tech, actively see it as a negative influence on human life, and understand the Amish and Luddites a lot better.

I'm currently planning to build a farm out in the middle of nowhere and then Apple can have its fiefdom, Google can own everyone's data and control what people think, they all can tell people what they're allowed to say as our new overlords and I just won't care at all.

intricatedetail 5y ago

Did they at least provide an API that allowed you to download your own account data? I think they should have such a feature thanks to GDPR.

michaelmior 5y ago

While I can understand your frustration, I also understand the frustration of photographers and other creators trying to make a living who have their work stolen. Their anger was wrongly directed at you, but it is a real problem.

greggman3 5y ago

While I agree with you that photographers are hurt by having their photos copied, I'm not convinced my app was a problem. I personally doubt anyone was using it to bulk download copyrighted photos and somehow use those photos. I could be wrong, but I believe most people who steal photos steal them a few at a time. Basically they're making some article, they need a related photo, they search, grab 2 to 5, and stick them in their article. They'd do this with the browser, not my app.

My app didn't let you search by keyword, only by user. Further it didn't remove watermarks or do anything else. If photographers put their photos on flickr and they care about stealing they usually both watermark and only put relatively low-res versions there and you have to pay them for high res versions.

If flickr provided logs that showed bulk download and further some proof that even with bulk download that it was actually affecting professional photographers and not just a few geeks collecting some pictures they liked then I'd be more inclined to buy into their ban but without that I'm pretty confident the ban had no basis in reality.

michaelmior 5y ago

I wasn't intending to suggest you were contributing to the problem. If others used your tool improperly, that's not your fault. But I can understand why some photographers who have had their content stolen could be upset. As far as Flickr's response in banning your app, it does seem at least misguided. I also wonder if they were concerned about making it too easy for people to move to another platform.

deepspace 5y ago

I would argue that the "stealing" does not happen at the time of download; after all the photos are free to view on Flickr as many times as you want, and your browser needs to 'download' them in order to show them to you. Saving a copy to your own hard drive for offline viewing does not fundamentally change that interaction.

It is only when you re-publish the photos that it becomes theft of intellectual property.

M2Ys4U 5y ago

Either way it's not "theft", as the original owner of the exclusive rights to copy (etc.) the work still has those rights.

After all that's what the "property" in "intellectual property" is - the bundle of exclusive rights. Owners of those rights aren't deprived of them by somebody copying the works over which the rights exist.

0xf8 5y ago

Try convincing the RIAA of your (perfectly valid IMO) argument...

michaelmior 5y ago

I would agree with that, although it could still be the case where a tool which technically does not violate any laws is still used to ultimately facilitate illegal activity. In which case I think it would be reasonable for a platform to ban the use of such a tool.

JoshTriplett 5y ago

It's also off-topic, and it's especially off-topic because it was misdirected. The response to "my legitimate tool was attacked because people thought it was for X" should not be to talk about the problem of X and why it's important. It should be to figure out how we prevent useful tools from being taken down. Amplifying a different problem, the fear of which led to breaking a useful tool, does not help.

Also, if you don't want something downloaded, don't post it on the Internet in the first place. The problem you're talking about isn't that photos get downloaded, it's how those photos are subsequently used.

michaelmior 5y ago

> It should be to figure out how we prevent useful tools from being taken down

I would argue that recognizing the reason that useful tools are taken down, even if you argue that reason is not legitimate, is an important part of figuring out how to stop those tools from being taken down.

> Also, if you don't want something downloaded, don't post it on the Internet in the first place.

I understand the argument although I do wish photographers could be free to post their work without fear of others taking it.

pjc50 5y ago

It's basically over now; since photographers don't have a cartel backing them like film or music, and photos are so easy to copy, everything is basically everywhere.

Newspapers will routinely rip off photos from social media, sometimes in the face of explicit non-permission.

jdndbfbf 5y ago

If I run that script in an infinite loop, downloading the same person's photos, can I make them go bankrupt?

i386 5y ago

this is as stupid as saying "evolution isn't real otherwise I could grow eyes on stalks when I want to see around corners"

nzmsv 5y ago

Back when Udacity just launched with their first course (Thrun's self driving car intro) I wanted to watch the videos on a big screen. I had a somewhat smart TV that could play files off a USB stick but did not have Internet access. Udacity hosted all their videos on YouTube at the time and there was no convenient way to download them. So I spent an afternoon hacking together a Chrome extension that would modify the Udacity website DOM and give me download links for the lectures (it had to be an extension to get around same origin restrictions). Udacity had some weird naming convention for their videos so I had to make some calls to their API as well as YouTube and correlate videos with titles. The YouTube part was "influenced" by youtube-dl. This was much faster than reverse engineering, even though I was writing new JavaScript and not using the original Python. Anyway, trivial stuff.

After watching the videos from my couch for a few days I decided to post a link to my extension on the Udacity message board... and it absolutely blew up! My dinky little extension had thousands of users all over the world seemingly overnight.

But the absolute highlight was getting an email from a student from Iran. Iran had just blocked YouTube because of https://en.wikipedia.org/wiki/Innocence_of_Muslims and there was a whole group of students who could no longer participate in the course. Apparently they had some friends at a US university use my extension to download the videos and reupload them to a VPS they ran. I was blown away - my quest to sit on a couch ended up accidentally helping fight censorship.

I maintained the extension until Udacity added a native video download feature and then took it down. But it was an interesting experience and definitely shaped my perception of fair use laws. They are important. People have way more legitimate uses for information than lawyers can imagine.

andai 5y ago

> my quest to sit on a couch ended up accidentally helping fight censorship

That's brilliant! We can never predict the impact our tools will have on people's lives.

Your idea of partially porting youtube-dl to the browser gives me an idea... would it be feasible to port it fully? I think the biggest hurdle would be ffmpeg, but a few days ago I saw "A pure WebAssembly / JavaScript port of FFmpeg": https://news.ycombinator.com/item?id=24987861

simlevesque 5y ago

I'm doing wild things with WebAssembly right now; I'm sure it is possible and will happen. We already have ytdl-core for JS [1]

[1] https://www.npmjs.com/package/ytdl-core

jchw 5y ago

Thanks for this. It is weird how long it’s been. I totally forgot that Wireshark was once called Ethereal.

I hope that the DMCA takedown issue can be resolved reasonably, but it’s starting to seem more and more like a move off of Github is overdue. Especially in a world where anyone can stand up a Gitea or Gitlab CE instance.

jordigh 5y ago

> I hope that the DMCA takedown issue can be resolved reasonably,

I don't think it can be solved on Github.

"GitHub’s CEO suggested that YouTube-DL won’t be reinstated in its original form. But, the software may be able to return without the rolling cipher circumvention code and the examples of how to download copyrighted material."

https://torrentfreak.com/riaas-youtube-dl-takedown-ticks-of-...

This pretty much makes youtube-dl useless, since the "rolling cipher" is just downloading the same bit of js, inspecting it, and executing it, almost the way a web browser does (AIUI, the difference is that yt-dl inspects the js and picks out the function to run from it instead of just running it all verbatim). This counts as circumvention according to the DMCA, which leaves yt-dl little legal standing in the US.

Also note that the "examples of how to download copyrighted material" in the yt-dl tests were just code for getting the first few bytes of a number of RIAA-sequestered music videos. Small excerpts are usually allowed under Fair Use. The RIAA didn't really look into that detail.

On the plus side, this fork is active and not DMCA'ed, for now. I just turned to it because I needed a fix for Bandcamp that upstream yt-dl doesn't have:

https://github.com/blackjack4494/yt-dlc

squarefoot 5y ago

Would moving the offending piece of code into an external plugin, developed elsewhere, work legally? If yt-dl had a standard generic way to load external plugins to adapt to various sites, that would shift the responsibility to the external plugin, which could then be made available through safer places than GitHub.
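A generic plugin mechanism of the kind suggested above could look roughly like the sketch below. Everything in it is hypothetical: the `PluginRegistry` class and the convention that a plugin file exposes `matches(url)` and `extract(url)` are assumptions made for illustration, not a description of how youtube-dl or its forks are organised, and the sketch says nothing about the legal question.

```python
import importlib.util
from pathlib import Path


class PluginRegistry:
    """Load site-specific plugins from standalone .py files.

    Hypothetical contract: each plugin module defines
      matches(url) -> bool   whether the plugin handles this URL
      extract(url) -> dict   site-specific metadata for the URL
    """

    def __init__(self):
        self._plugins = []

    def load_file(self, path):
        """Import a plugin from a file path and register it."""
        path = Path(path)
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Reject files that do not follow the (assumed) plugin contract.
        for name in ("matches", "extract"):
            if not callable(getattr(module, name, None)):
                raise ValueError(f"{path.name} does not define {name}()")
        self._plugins.append(module)
        return module

    def extract(self, url):
        """Dispatch to the first registered plugin that claims the URL."""
        for plugin in self._plugins:
            if plugin.matches(url):
                return plugin.extract(url)
        raise LookupError(f"no plugin handles {url}")
```

With this shape the core tool ships no site-specific logic at all; it only knows how to load files and dispatch URLs to them.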

crtasm 5y ago

> This pretty much makes youtube-dl useless,

Does everything on Youtube use the rolling cipher? I thought it was only on things like major label music videos.

minusf 5y ago

It hasn't been working for a couple of days, but I have been using mpv+youtube_dl for Bandcamp listening for years.

segfaultbuserr 5y ago

What? youtube_dl supports Bandcamp?! It's great news for me (I knew youtube-dl supports a huge number of video sites, but didn't know about Bandcamp). I'll start using it today. Just like you, I'm not really that interested in actually downloading the music, just in using it with mpv. Thanks for the tip.

woofie11 5y ago

gitlab isn't evil. One of my short life lessons is that, in general, when I do business with evil, sketchy, or nasty organizations or people, I generally come out behind.

If the good guys have an inferior product and charge double, I'll sometimes pick the bad guy. And more often than not, I get burned, costing me tenfold what it would have cost to just go with the high-integrity choice in the first place.

I'm not leaving GitHub over this, but I'm mostly starting new projects on GitLab instead.

speeder 5y ago

I've been using GitLab for a while now.

GitHub made many, many decisions that I found sketchy.

res0nat0r 5y ago

Gitlab enforces DMCA requests. If you're a reputable company wanting to do legit business in the USA you have to follow the process, it really isn't up to Github or Gitlab unless they want to lose safe harbor status.

https://about.gitlab.com/handbook/dmca/

llsf 5y ago

What I had totally forgotten was libflashplayer.so :) It was so flaky. Glad I do not have to worry about this anymore.

nosmokewhereiam 5y ago

My first tech book after Mike Meyers "All in one A+ 4th Ed" was a book written on Wireshark by Laura. I never got too deep as it was taken from the desk I pulled CQ duty at.

mschuster91 5y ago

> I hope that the DMCA takedown issue can be resolved reasonably, but it’s starting to seem more and more like a move off of Github is overdue.

It's a risky move, dabbling with stuff that is targeted under DMCA. Anything hosted in the US is liable for takedowns - including domain names that are under the control of US-based companies. You'll need to deal with acquiring hosting and DDoS protection yourself, plus keeping track of security updates. And to be honest Europe isn't exactly a legal safe haven either, we also have nasty laws (e.g. in Germany the infamous "Störerhaftung") exposing you to liability.

humford 5y ago

Been a longtime fan of YouTube-dl, so much better and faster than any alternative. I remember being in high school in a group project with some theater kids. They needed a ton of free stock footage from Vimeo and they were going through links one-by-one to download each video individually. I just compiled a list of the links, ran youtube-dl, and had it done in 5 minutes when they'd already wasted three hours on it.
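The workflow described above (compile a list of links, then hand the whole list to youtube-dl) maps onto youtube-dl's `--batch-file` option, which reads one URL per line from a file. A minimal sketch, assuming `youtube-dl` is installed and on the PATH; the helper names and the file name in the usage note are hypothetical.

```python
import subprocess
from pathlib import Path


def write_batch_file(urls, path):
    """Write one URL per line, skipping blank entries; return the cleaned list."""
    cleaned = [u.strip() for u in urls if u.strip()]
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


def batch_command(batch_path):
    """Build the youtube-dl invocation that downloads every URL in the file."""
    return ["youtube-dl", "--batch-file", str(batch_path)]


def download_all(batch_path):
    """Run youtube-dl on the batch file; raises CalledProcessError on failure."""
    return subprocess.run(batch_command(batch_path), check=True)
```

Typical use would be `write_batch_file(links, "links.txt")` followed by `download_all("links.txt")`, which is equivalent to running `youtube-dl --batch-file links.txt` in a shell.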

wpietri 5y ago

What a wonderful little retrospective. It's a nice walk down memory lane, and I really appreciate how clear and sensible his thinking was around handing off the project when the time was right.

AntiImperialist 5y ago

Wow, I did not know youtube-dl had existed for so long.

I remember coding my own YouTube downloader because of similar reasons. My internet connection was way too slow to stream videos, even at the lowest quality, so I'd make a list of videos, download them in the background the entire day and then watch it offline at the end of the day. When I finally discovered youtube-dl, I was relieved that I no longer had to keep maintaining my script... and it supported almost every other video website.

Also, I just realized that it can still be downloaded from the official website and updated using the --update argument.

gman83 5y ago

Damn, freshmeat.net... I used to visit that site every day in the early 2000s, and I'd totally forgotten about it.

gspr 5y ago

What a wonderful writeup that touches on both the original author's personal history, little technical tidbits, and so many ethical and societal aspects of computing!

vagrantJin 5y ago

Long live Youtube-dl.

saghul 5y ago

TIL the project was started by a fellow Spaniard! Very nice write up, kudos!

TeeMassive 5y ago

I've had a personal playlist of videos I maintained since 2008, a collection really, with multiple fads I once used to share with the various social circles I had during the last decade at college and university. But now more than half of the videos are gone due to the various ban episodes YouTube had (FU cancel culture, FU Carlos Maza). Fortunately with YTdl I can save at least half of what has been a fairly big portion of my adult life.

Sektor 5y ago

Hey rg3 your link https://yt-dl.org/supportedsites.html is broken. Great tool btw thanks for all your efforts.

rg3 (OP) 5y ago

Thanks, I've reported that to the current maintainers.

kzrdude 5y ago

jwz also had his YouTube downloader script out there, but that was apparently not the start of youtube-dl.
