In a tree you have branches off branches off branches, and so on.
You can’t orient yourself (you can’t tell where you are) unless you count the branches. Indenting makes that count visible.
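Roughly what I mean, as a small Python sketch. The tree and the `render` function are made up for illustration; every node has the same name on purpose, so depth is the only thing that tells you where you are.

```python
# Hypothetical tree of (name, children) pairs. All nodes are called "branch",
# so the names alone say nothing about position.
tree = ("branch", [
    ("branch", [
        ("branch", []),
    ]),
    ("branch", []),
])

def render(node, depth=0):
    """Return one line per node, indented two spaces per nesting level."""
    name, children = node
    lines = ["  " * depth + name]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines

print("\n".join(render(tree)))
```

Strip the leading spaces from that output and you get four identical lines, with no way to recover which branch hangs off which.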
In the examples in TFA, you can tell your location from the names of the elements. E.g. <td> is enough for you to know you’re probably inside a <tr> inside a <table>.
And that is a more common case than the general tree example.
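A sketch of that idea in Python, assuming a hand-written lookup table. `IMPLIED_ANCESTORS` and `likely_context` are hypothetical names, and the table only records the usual case, which is why I say "probably": it ignores things like an intervening <tbody>.

```python
# Hypothetical lookup: the ancestors a reader would usually assume
# from an element's name alone. This is a guess at the common case, not a rule.
IMPLIED_ANCESTORS = {
    "td": ["table", "tr"],
    "tr": ["table"],
    "li": ["ul"],  # could just as well be "ol"
}

def likely_context(tag):
    """Return the probable path to `tag`, outermost element first."""
    return " > ".join(IMPLIED_ANCESTORS.get(tag, []) + [tag])

print(likely_context("td"))
print(likely_context("div"))
```

A name like <div> implies nothing about its ancestors, so for those elements you are back to counting levels.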
Granted, a method of describing HTML does have to answer the question of how it represents arbitrarily deep nesting. But I like the answers it’s given for the more common case of structures that are not arbitrarily deep.