You can render traditionally all you want with Metal. You just don’t get some features, like camera access or gaze, which does have its downsides but is a long way from what you’re describing. I’ve ported a Metal-based renderer to visionOS for companies already, and engines like Unreal already support it too.
I’m not even sure what DOM you’re talking about. SwiftUI? RealityKit? The former is for UI. The latter is an ECS-like rendering engine. But neither fits what you describe.
Perhaps before being outraged by things, you should be familiar with development on them.