> There aren't many datasets exceeding that outside fundamental physics.
Just about every physical-world telemetry or sensing data source of any note will generate petabytes of analytical data in hours to days. On the high end, there are single categories of data source that aggregate to more like an exabyte per day of high-value data.
It is a completely different scale from web data. In many industrial domains, the average small-to-medium-sized company I come across retains tens of petabytes of data, and it has been this way for many years. The prohibitive cost is the only thing keeping them from scaling even more.
The major issue is that the large-scale analytics infrastructure developed for web data is hopelessly inadequate.