You’re misunderstanding the purpose of this JSON/HTML combination, and I get the impression you’re probably not familiar with how screen readers work, either. The JSON is purely transient: it exists only to be projected to the HTML DOM. The JSON has no standard semantics; the idea is that it should be whatever shape makes sense for your use case, and that you then use JavaScript to project it to HTML. Think of it as a server-side templating language that takes a bag of data and decides how to write the HTML, except that the document is the data, plus an extra piece that embeds the template to apply to the data.
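To make the "data plus template" idea concrete, here is a rough sketch. The data shape, the `escapeHtml` helper and the `project` function are all made up for illustration; nothing about them is standard, which is rather the point. For simplicity it builds an HTML string, where real code might just as well create DOM nodes directly.

```javascript
// Hypothetical data: an arbitrary shape chosen to suit this one use case.
const data = {
  title: "Release notes",
  items: [
    { version: "1.1", summary: "Fixed <b> handling" },
    { version: "1.0", summary: "Initial release" },
  ],
};

// Escape text so that data can never be mistaken for markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// The "template": it decides which HTML elements carry the data.
// The heading and list semantics exist only in this output, not in the JSON.
function project(doc) {
  const items = doc.items
    .map(
      (item) =>
        `<li><strong>${escapeHtml(item.version)}</strong>: ${escapeHtml(item.summary)}</li>`
    )
    .join("");
  return `<h1>${escapeHtml(doc.title)}</h1><ul>${items}</ul>`;
}

const html = project(data);
// In a browser you might then write: document.body.innerHTML = html;
```

A screen reader only ever encounters the resulting `<h1>` and `<ul>`; the object they came from is gone by then.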
The web is pretty much best-in-class for accessibility matters. (There are a few isolated cases where native desktop or mobile apps can do better, mostly to do with efficiency.) HTML elements have defined semantics, so that things like headings and links are automatically navigable, and sections, headers, footers and navigation lists become waypoints. Then ARIA attributes can be used to provide any further metadata necessary, such as to mark up a tabs widget to show how to interact with it. And that’s still key—accessibility needs to care about interactions (which tab is open? and did the content available change?), so state matters. Thus, accessibility tools will never care about any format that you are projecting from, like this JSON; they must only care about what is materialised, which is the HTML DOM. (Besides all that, the only sort of “consistent JSON format” that you could have would be basically an encoding of the HTML, which would be verbose and subjectively ugly compared to the HTML serialisation, e.g. ["a", {"href": "/"}, ["Home"]] or {"tagName": "a", "href": "/", "children": ["Home"]} instead of <a href="/">Home</a>, and miss the whole point here that the JSON is representing data rather than what the user sees.)
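As a sketch of why state matters, here is roughly what the tabs case could look like. `renderTabs` and the `tab-N`/`panel-N` id scheme are hypothetical; the roles and `aria-*` attributes are the standard ones for a tabs widget. Labels and content are assumed to be already-safe HTML, to keep it short.

```javascript
// Hypothetical helper: renders a tabs widget for a given selected tab.
// `tabs` is an array of { label, content }, both assumed to be safe HTML.
function renderTabs(tabs, selectedIndex) {
  const tabButtons = tabs
    .map((tab, i) => {
      const selected = i === selectedIndex;
      return (
        `<button role="tab" id="tab-${i}" aria-controls="panel-${i}" ` +
        `aria-selected="${selected}">${tab.label}</button>`
      );
    })
    .join("");

  // Only the selected tab's panel is exposed; the others are hidden.
  const panels = tabs
    .map((tab, i) => {
      const hidden = i === selectedIndex ? "" : " hidden";
      return (
        `<div role="tabpanel" id="panel-${i}" ` +
        `aria-labelledby="tab-${i}"${hidden}>${tab.content}</div>`
      );
    })
    .join("");

  return `<div role="tablist">${tabButtons}</div>${panels}`;
}

const tabs = [
  { label: "Overview", content: "<p>What this is.</p>" },
  { label: "Details", content: "<p>How it works.</p>" },
];

// Same data, two different interaction states, two different documents.
const before = renderTabs(tabs, 0);
const after = renderTabs(tabs, 1);
```

The `tabs` array is identical in both calls; what differs is `aria-selected` and `hidden`, and that difference is exactly what an accessibility tool needs to see. It lives in the materialised HTML, not in the data.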
If you’re not familiar with accessibility stuff, I heartily recommend looking into it. If you can, find a blind person and see if you can watch them using a computer or phone. It’s really fascinating (I’ve never seen anyone be bored by it) and super useful if you ever contribute to making just about anything on a computer. Even people making documents in a word processor can learn things like “use actual headings rather than just making the text bigger and bold, because the semantics are useful”.