You're always bound by the same latency: your input goes to the server, and the server returns world state for your PC to render. The only extra latency in this scenario is the time required to encode and decode the data, which could be offset if the server renders faster than the client PC.
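To make that trade-off concrete, here's a back-of-envelope model. All numbers are illustrative assumptions, not measurements from any real setup:

```python
# Back-of-envelope input-to-photon latency, in milliseconds.
# The point: the codec overhead of a shared-state setup can be
# cancelled out if the server does its per-frame work faster.

def round_trip_ms(network_rtt, server_work, codec_overhead=0.0):
    """Total latency for one frame: network + server work + codec cost."""
    return network_rtt + server_work + codec_overhead

# Classic client-server game: server simulates, client renders locally.
local = round_trip_ms(network_rtt=30, server_work=5)

# Shared-state setup: same trip, plus encode/decode of the state,
# minus whatever a faster server saves on per-frame work.
shared = round_trip_ms(network_rtt=30, server_work=3, codec_overhead=2)

print(local, shared)  # comparable totals when codec cost ≈ server speedup
```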
As for the economics, as I already said, you can exploit the shared state heavily in games if you rework the way rendering works. Right now games focus on camera-space rendering because they only care about one view output, and view-space effects are cheapest in that scenario.
With shared-state rendering, suddenly you can process the instance state once per frame for hundreds of users, just like you already simulate the game once per instance for hundreds of users.
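A hypothetical cost model shows why this amortization matters. The numbers and function names here are invented purely for illustration:

```python
# Shared-state rendering amortizes the view-independent work
# (lighting, animation, particles) across every user of an instance;
# only the view-dependent pass remains per-user.

def cost_per_user(shared_work_ms, per_view_work_ms, users):
    """GPU time attributable to each user per frame."""
    return shared_work_ms / users + per_view_work_ms

solo = cost_per_user(shared_work_ms=10.0, per_view_work_ms=2.0, users=1)
hundred = cost_per_user(shared_work_ms=10.0, per_view_work_ms=2.0, users=100)

print(solo)     # 12.0 ms of work for a single user
print(hundred)  # 2.1 ms per user once the shared pass is split 100 ways
```

The same logic is why one server instance can already simulate hundreds of players: the expensive part is computed once and its cost divided by the head count.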
You can have specialized hardware setups (i.e. multiple high-end GPUs with >10 GB of RAM) doing dedicated tasks like recomputing lighting/shadows, animation, particle effects, etc. You would need to make these systems asynchronous to reduce lag, so lighting updates might trail animation by a frame or two, but as usual with rendering there are always clever tricks to fool the eye. And once you have gigabytes of RAM at your disposal, very different rendering techniques become viable for shared-state rendering compared to the ones in use today.
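The "lighting lags a frame behind animation" idea is essentially double buffering across subsystems. A minimal toy sketch (my own model, not any engine's API):

```python
# Toy model of decoupled subsystems: the frame assembler always pairs
# the current frame's animation with the PREVIOUS frame's lighting, so
# the lighting pass never blocks frame submission.

class FramePipeline:
    def __init__(self):
        self.last_lighting = "lighting@0"  # seed value for the first frame

    def submit(self, frame):
        animation = f"animation@{frame}"          # computed synchronously
        composed = (animation, self.last_lighting)
        # Kick off lighting for this frame; it is only consumed by the
        # NEXT submit call, so it runs a frame behind, asynchronously.
        self.last_lighting = f"lighting@{frame}"
        return composed

pipe = FramePipeline()
print(pipe.submit(1))  # ('animation@1', 'lighting@0')
print(pipe.submit(2))  # ('animation@2', 'lighting@1')
```

At 60+ fps a one-frame-stale lighting result is usually below the threshold the eye notices, which is the "clever trick" being relied on here.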
Someone already linked a platform that's doing this. I have no doubt this is the future of VR and gaming, at least in part (maybe you won't be streaming video to the client but some geometric 3D world representation with deltas, so the final rendering can be done client-side and you get low-latency rotation for things like VR).
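The delta-streaming part can be sketched very simply. The state shapes and function names below are made up to illustrate the idea, not taken from any real protocol:

```python
# Rough sketch of streaming a geometric world representation with
# deltas: the server sends only the entities that changed; the client
# keeps a local copy of the world and renders it at its own frame rate.

def make_delta(old_state, new_state):
    """Entities that changed or appeared since the last update."""
    return {k: v for k, v in new_state.items() if old_state.get(k) != v}

def apply_delta(state, delta):
    """Merge a server delta into the client's local world copy."""
    state.update(delta)
    return state

server_prev = {"tree": (0, 0), "player": (1, 1)}
server_next = {"tree": (0, 0), "player": (2, 1)}

delta = make_delta(server_prev, server_next)    # only the player moved
client = apply_delta(dict(server_prev), delta)
print(delta)                   # {'player': (2, 1)}
print(client == server_next)   # True
```

Because the client holds the geometry locally, camera rotation never touches the network at all, which is exactly the latency-sensitive motion VR cares about most.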