The Pile was. It’s still available, but no one will touch it, mostly because of Books3.
The difference is that here a few people with lots of resources take on the legal risk. In the piracy example, many people with few resources take on the risk, which works out because no one wants to sue people with no money.
The Pile is still used to train LLMs and it's still very much available on the net. I agree it's a risk to train your models on the dataset until the legal implications are worked out, but it doesn't seem to be stopping people.