The dumb solution for that is to exclude persistent storage from the limit.
The nice solution for that is supporting both "runrate" and "consumption" limits.
Under a runrate limit, spinning up an instance, creating a file, etc. allocates budget for running that resource continuously; the budget is released when the resource is shut down or deleted. Hitting the limit prevents new resources from being allocated, but keeps existing ones alive. This should be used for persistent storage and for instances that handle base load.
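A minimal sketch of that reserve-on-create, release-on-delete behavior (class and method names are illustrative, not any provider's API):

```python
class RunrateLimit:
    """Runrate limit sketch: each resource reserves its continuous
    running cost on creation and frees it on deletion."""

    def __init__(self, monthly_budget: float):
        self.monthly_budget = monthly_budget
        self.reserved = {}  # resource id -> reserved monthly cost

    def allocate(self, resource_id: str, monthly_cost: float) -> bool:
        # Reject *new* allocations once the limit would be exceeded;
        # already-running resources are untouched.
        if sum(self.reserved.values()) + monthly_cost > self.monthly_budget:
            return False
        self.reserved[resource_id] = monthly_cost
        return True

    def release(self, resource_id: str) -> None:
        # Shutting down / deleting a resource frees its budget.
        self.reserved.pop(resource_id, None)


limit = RunrateLimit(monthly_budget=100.0)
assert limit.allocate("db-volume", 60.0)
assert not limit.allocate("big-vm", 50.0)  # would exceed the limit
limit.release("db-volume")
assert limit.allocate("big-vm", 50.0)      # freed budget can be reused
```

The key property is that hitting the limit is only a gate on new allocations; nothing gets torn down.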
Using a consumption limit, the resource is shut down when the limit is hit. If the shut-off is delayed, the cloud service eats the overage, since they control the delay. This should be used for bandwidth, paid api-calls, and auto-scaling instances.
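A consumption limit is just a meter with a hard stop; a hypothetical sketch (the shut-down call is left to the caller):

```python
class ConsumptionLimit:
    """Consumption limit sketch: usage accumulates until the budget is
    spent, at which point the resource should be shut down. Any usage
    recorded after that point is overage the provider eats, since the
    provider controls the shut-off delay."""

    def __init__(self, budget: float):
        self.budget = budget
        self.spent = 0.0

    def record(self, cost: float) -> bool:
        """Record usage; returns False once the limit is hit, signalling
        the caller to shut the resource down."""
        self.spent += cost
        return self.spent <= self.budget


meter = ConsumptionLimit(budget=10.0)
assert meter.record(6.0)       # within budget
assert not meter.record(6.0)   # 12 > 10: shut the resource down
```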
The user should be able to create multiple limits of each kind and assign different services to each of them. Alerts when approaching a limit give the user a chance to raise it, if that's their intention.
For consumption, it might also make sense to have rate limiters, which throttle after a burst budget is exceeded, similar to how compute works on AWS T instances. But those probably only make sense for individual services, not globally (e.g. throttle an instance to 100 Mbit/s after it has exhausted its 5 TB/day bandwidth allocation, or throttle an API to x calls/s).
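That burst-then-throttle behavior is essentially a token bucket, which is one plausible way to implement it (parameters here are made up for illustration):

```python
import time


class TokenBucket:
    """Token-bucket throttle, similar in spirit to CPU credits on AWS
    burstable (T-series) instances: a burst budget refills at a
    baseline rate, and requests are throttled once it is exhausted."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate    # tokens refilled per second (baseline rate)
        self.burst = burst  # maximum accumulated burst budget
        self.tokens = burst
        self.last = time.monotonic()

    def try_consume(self, amount: float = 1.0) -> bool:
        # Refill proportionally to elapsed time, capped at the burst size.
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False  # throttled until the budget refills
```

For example, an API limited to a baseline of 100 calls/s with room for short spikes could use `TokenBucket(rate=100, burst=1000)` and drop (or queue) any call for which `try_consume()` returns False.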