Scribd has lots of PDFs of copyrighted books. The Washington Post article mentions several other sites from which it downloaded and scraped PDFs of copyrighted textbooks, etc.
That's interesting to know, but that doesn't by itself imply that it's illegal. For example, Google Books, which has massive amounts of scanned PDFs of copyrighted works, is considered fair use under US copyright law.
It's fair use if the work is "transformative". GPT-4 isn't publishing the content of the books, it's publishing a model derived from the entire corpus. I'm not a lawyer, but I think there's an argument that it is transformative.