Why not? The accessibility-tree approach can require extra work from developers instead of just working. It's convenient to be able to use an image without writing alt text for it, for example in a group chat.
Or what about a chart, or an assembly or measurement diagram? Can current image recognition reliably reproduce that information?
At the end of the day, the extra work by developers is part of what it means to be a developer. If you’re not doing that work then is the end product really meeting your users’ needs?
Because this isn’t true unless you’re using a non-native framework like Flutter. If you write your apps in HTML or native frameworks, the tree is built automatically. You only have to fiddle with it if you’re doing really custom stuff (which almost no one is).
You can't "just OCR stuff" without losing all the visual meaning in a page. Just as we use borders, padding, and colors to organize information hierarchically, screen readers rely on an information hierarchy so users can navigate around conveniently.
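To make that concrete, here's a minimal sketch (stdlib Python; the `OutlineParser` class and the sample markup are made up for illustration) of the heading outline a screen reader can expose from semantic markup, versus the flat text that OCR of a rendered page would give you:

```python
from html.parser import HTMLParser

class OutlineParser(HTMLParser):
    """Collects the h1-h6 heading hierarchy -- the kind of outline
    screen readers let users jump through."""
    def __init__(self):
        super().__init__()
        self.outline = []    # list of (level, heading text)
        self._level = None   # heading level currently open, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self._level = int(tag[1])
            self._buf = []

    def handle_data(self, data):
        if self._level is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.outline.append((self._level, "".join(self._buf).strip()))
            self._level = None

# Semantic markup: the hierarchy is machine-readable.
semantic = "<h1>Recipes</h1><h2>Starters</h2><p>Soup...</p><h2>Mains</h2>"
p = OutlineParser()
p.feed(semantic)
print(p.outline)   # [(1, 'Recipes'), (2, 'Starters'), (2, 'Mains')]

# Flat OCR-style text has the same words, but the levels -- and with
# them the ability to navigate by structure -- are gone.
flat = "Recipes Starters Soup... Mains"
```

The point isn't the parser itself; it's that the levels only exist because the author marked them up. Once the page is flattened to pixels and OCR'd back to text, that navigation structure would have to be guessed.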
Of course I don't know that it's possible; it could be impossible. My impression is just that there hasn't been much effort toward that approach. And TBH it kind of feels like it'd be much better to have a solution that works with "everything" without that "everything" knowing about it (or at least with very little participation from it).
Also, FWIW, I often use a "simple" web browser like Dillo or Elinks to read articles, since it bypasses all the cruft. The usual suspect for making things unreadable isn't CSS but JavaScript.