HTML offers perfectly good semantic elements, but in practice most of the markup in a typical page is there for presentation. HTML also tends to say what a piece of text is (a heading, a list item, a link) rather than what it means. As a result, reading a plain, unrendered HTML document is a rough experience that offers little to a human reader.
The vast majority of data on the internet could be relayed in something as simple and readable as Markdown and/or YAML while still conveying enough useful semantics.