Write HTML Right

(lofi.limo)

263 points · aparks517 · 3y ago · 205 comments

Whilst the spec certainly allows you to omit the closing tags of a whole range of elements, it's not necessarily the wisest of choices to make. In my experience, the parser does actually get slower when you fail to close your tags.

Unscientific stats from a recent project where I noticed it:

+ Document is about 50,000 words in size. About 150 words to a paragraph element, on average.

+ Converting the entire thing to p elements without end tags added an overhead of about 120ms in Firefox on Linux, before initial render.

+ Converting the entire thing to p elements without end tags added an overhead of about 480ms in Chrome on Linux, before initial render.

+ Converting the entire thing to p elements without end tags added an overhead of about 400ms in Firefox on Android, before initial render.

+ Converting the entire thing to p elements without end tags added an overhead of about 560ms in Chrome on Android, before initial render.

+ The time differences appeared to increase linearly as the document grew from 20,000 to 50,000 words.

+ Curiously, Quirks Mode also increased the load times by about 250ms on Firefox and 150ms on Chrome. (Tried it just because I was surprised at the massive overhead of removing/adding the tag endings.)
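A rough back-of-envelope reading of the figures above, as a hypothetical Python sketch. It assumes the document has about 50,000 / 150 ≈ 333 paragraphs and that all of the measured overhead comes from the omitted end tags; neither assumption was measured separately.

```python
# Back-of-envelope: implied overhead per paragraph, from the figures above.
# Assumptions (not measured): ~50,000 words, ~150 words per paragraph, and
# all of the extra time is attributable to the omitted </p> tags.
WORDS = 50_000
WORDS_PER_PARAGRAPH = 150

OVERHEAD_MS = {
    "Firefox on Linux": 120,
    "Chrome on Linux": 480,
    "Firefox on Android": 400,
    "Chrome on Android": 560,
}


def per_paragraph_ms(total_ms: float, words: int = WORDS,
                     words_per_paragraph: int = WORDS_PER_PARAGRAPH) -> float:
    """Spread a total overhead evenly over the estimated paragraph count."""
    paragraphs = words / words_per_paragraph
    return total_ms / paragraphs


if __name__ == "__main__":
    for setup, ms in OVERHEAD_MS.items():
        print(f"{setup:20} ~{per_paragraph_ms(ms):.2f} ms per paragraph")
```

On those assumptions the implied cost ranges from about 0.36 ms to 1.68 ms per paragraph, i.e. per omitted `</p>`.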

The most common place this was going to be opened was Chrome on Android, and a whopping half-second slower to first render is going to be noticeable to the end user. For some prettier mark up.

Whilst you can debate whether that increased latency actually affects the user, a decreased latency will always make people smile more. So including the end tags is a no-brainer. Feel free to write it without them, but you _might_ consider whether it makes sense for your target to have them generated before you serve up the content.

niconii · 3y ago

I can't verify your numbers. As far as I can tell, loading a ~900,000-word document with no other differences than including or excluding </p> has about the same load time, though there's too much variance from load to load for me to give really definitive numbers.

Are you sure you converted it properly? I'd expect those kinds of numbers if your elements were very deeply nested by mistake (e.g. omitting tags where it's not valid to do so), but I don't see why leaving out </p> should be so slow.

Try these two pages:

https://niconii.github.io/lorem-unclosed.html

https://niconii.github.io/lorem-closed.html

shakna · 3y ago

For five runs, on the same hardware with the same load:

+ Unclosed: 4.00s, 3.91s, 3.59s, 4.45s, 3.93s

+ Closed: 3.90s, 2.74s, 3.9s, 2.05s, 3.39s

Though I'd note that the newline you have immediately following the paragraph, even when closing, would probably reduce the backtracking effect. And having no explicit body or head element would probably cause some different rendering patterns as well.
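To put the spread of these five runs in context, here is a small Python sketch (standard-library `statistics` only) computing the mean and sample standard deviation of the figures quoted above. The choice of summary statistics is just one reasonable option, not something from the thread.

```python
from statistics import mean, stdev

# Load times in seconds, as quoted above.
RUNS = {
    "Unclosed": [4.00, 3.91, 3.59, 4.45, 3.93],
    "Closed": [3.90, 2.74, 3.9, 2.05, 3.39],
}

for label, runs in RUNS.items():
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    print(f"{label:8} mean {mean(runs):.3f}s  stdev {stdev(runs):.3f}s  "
          f"range {min(runs):.2f}-{max(runs):.2f}s")
```

The means come out at about 3.98s and 3.20s, but the closed runs alone span 2.05s to 3.90s (a sample standard deviation of roughly 0.8s), so five runs probably can't separate the two cases with much confidence.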

paulirish · 3y ago

I don't know what you're measuring (onload?), but it's not giving you enough precision to draw a conclusion about the performance of the HTML parser. If you profile the page with the DevTools Performance panel, you'll see that just 5% of the CPU cost used to load & render the page is spent parsing the HTML. At that level I'm seeing costs of 22-36ms per load.

And, spoiler alert: after repeated runs I'm not seeing any substantial difference between these test pages. And based on how the HTML parser works, I wouldn't expect it.

(I work on web performance on the Chrome team)

niconii · 3y ago

Were the five unclosed runs before the five closed runs? I could see that making a difference vs. interleaving them, if the hardware needs to "warm up" first.

For me, on Firefox on Linux (I know it's the one with the smallest difference, but I don't have the others on hand, sorry), using the "load" time at the bottom of the Network tab, with cache disabled and refreshing with Ctrl+F5, interleaving the tests:

- Unclosed: 1.38s, 1.49s, 1.45s, 1.52s, 1.48s

- Closed: 1.47s, 1.37s, 1.48s, 1.49s, 1.35s

The one with </p> omitted takes about 0.032s longer on average going by these numbers, but that's about 2 frames of extra latency for a page almost twice the length of The Lord of the Rings.
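The interleaving described above can be sketched in Python as a small timing helper. This is a hypothetical illustration (`interleaved_runs` is a name invented here), assuming the two things being compared can be wrapped as zero-argument callables. It alternates which one goes first on each round so that warm-up or background load affects both sides roughly equally. The last lines check the 0.032s figure against the quoted Firefox numbers.

```python
import time
from statistics import mean
from typing import Callable


def interleaved_runs(a: Callable[[], object], b: Callable[[], object],
                     rounds: int) -> tuple[list[float], list[float]]:
    """Time a and b alternately; returns per-run durations in seconds."""
    times_a: list[float] = []
    times_b: list[float] = []
    for i in range(rounds):
        order = [(a, times_a), (b, times_b)]
        if i % 2:  # swap the order every other round (A B, B A, A B, ...)
            order.reverse()
        for fn, sink in order:
            start = time.perf_counter()
            fn()
            sink.append(time.perf_counter() - start)
    return times_a, times_b


# The interleaved Firefox-on-Linux load times quoted above (seconds).
unclosed = [1.38, 1.49, 1.45, 1.52, 1.48]
closed = [1.47, 1.37, 1.48, 1.49, 1.35]
diff = mean(unclosed) - mean(closed)
print(f"difference {diff:.3f}s, ~{diff * 60:.1f} frames at 60 Hz")
```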

Regarding the page itself, I tried to keep everything else as identical between the two versions as possible, including the DOM, hence why I wrote the </p> immediately before each <p>. As for backtracking, I'm not sure what you mean. The rule for the parser is simply "If the start tag is one from this list, and there's an open <p> element on the stack, close the <p> element before handling the start tag."
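The rule quoted above is simple enough to sketch. The following hypothetical Python toy tree builder (the names `parse`, `Node` and `CLOSES_P` are invented here) implements only this one rule plus the spec's handling of a stray `</p>`. It uses a subset of the spec's list of closing start tags, checks for an open `<p>` anywhere on the stack rather than "in button scope", and ignores attributes, void elements, comments, entities and formatting-element reconstruction, so it is an illustration, not a conforming HTML parser.

```python
import re
from dataclasses import dataclass, field

# Start tags that implicitly close an open <p>: only a subset of the spec's list.
CLOSES_P = {"address", "article", "aside", "blockquote", "div", "dl",
            "fieldset", "footer", "form", "h1", "h2", "h3", "h4", "h5",
            "h6", "header", "hr", "main", "nav", "ol", "p", "pre",
            "section", "table", "ul"}

# Tags (attributes are matched but discarded) or runs of text.
TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|[^<]+")


@dataclass
class Node:
    tag: str            # "#text" for text nodes
    text: str = ""
    children: list["Node"] = field(default_factory=list)

    def to_html(self) -> str:
        if self.tag == "#text":
            return self.text
        inner = "".join(child.to_html() for child in self.children)
        return f"<{self.tag}>{inner}</{self.tag}>"


def parse(src: str) -> Node:
    root = Node("#root")
    stack = [root]  # stack of open elements; stack[-1] is the current node

    def has_open_p() -> bool:
        # Simplification: the spec checks "has a p element in button scope".
        return any(node.tag == "p" for node in stack)

    def close_p() -> None:
        # Pop elements until the innermost open <p> has been popped.
        while stack.pop().tag != "p":
            pass

    for m in TOKEN.finditer(src):
        slash, name = m.group(1), m.group(2)
        if name is None:                    # character data
            stack[-1].children.append(Node("#text", m.group(0)))
        elif not slash:                     # start tag
            name = name.lower()
            if name in CLOSES_P and has_open_p():
                close_p()
            element = Node(name)
            stack[-1].children.append(element)
            stack.append(element)
        else:                               # end tag
            name = name.lower()
            if name == "p" and not has_open_p():
                # Stray </p>: the spec inserts an empty <p> and closes it.
                stack[-1].children.append(Node("p"))
            elif any(node.tag == name for node in stack[1:]):
                while stack.pop().tag != name:
                    pass
            # Any other unmatched end tag is ignored.
    return root


def serialize(root: Node) -> str:
    return "".join(child.to_html() for child in root.children)


if __name__ == "__main__":
    print(serialize(parse("<p>Hello <p>World</p>!</p>")))
    # <p>Hello </p><p>World</p>!<p></p>
    print(serialize(parse("<p><div></div></p>")))
    # <p></p><div></div><p></p>
```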

myfonj · 3y ago

Well, this sounds like a really interesting observation. May I ask where exactly the original closing tags were located and what the stripped source looked like? I can imagine there _might_ be some differences among differently formatted code: e.g. I'd expect

    <p>Content<p>Content[EOF fig1]

to be (slightly) slower, than

    <p>Content</p><p>Content</p>[EOF fig2]

(most likely because of some "backtracking" when hitting `<p[>]`), or

    <p>Content</p>
    <p>Content</p>[EOF fig3]

(with that small, insignificant `\n` text node between paragraph nodes), which should possibly be faster than "the worst scenarios":

    <p>Content
    <p>Content[EOF fig4a]

or even

    <p>
    Content
    <p>
    Content
    [EOF fig4b]

with paragraph text nodes `["Content\n","Content"]` / `["\nContent\n","\nContent\n"]`, where the "\n" must also be preserved in the DOM but, due to white-space collapsing rules, is not present in the render tree (unless overridden by some non-default CSS), yet still with the backtracking that

    <p>Content
    </p>
    <p>Content
    </p>[EOF fig5]

should eliminate (again, similarly to fig2 vs fig1).

(Sorry for wildly biased guesswork, worthless without measurements.)

shakna · 3y ago

It was just paragraphs of text. p, strong, em, and q mingled at most. No figures or images or anything of the like to radically shift DOM computations. That the effect can even be seen is probably due to the scale of the document, as I noted it's a little larger than most things.

All paragraphs had a blank line between them, both with and without the p end tag. The p opening tag was always at the top-left, with no gap between it and the content.

So, for example:

    <p>Cheats open the doorway for casual play. They make it easier for disabled players to enjoy the same things as their peers, and allow people to skip parts of a game that <em>they bought</em> that they find too difficult.</p>

    <p>Unfortunately, cheats are going away, because of extensive online play, and a more corporate approach to developing games that despises anything hidden.</p>

Versus:

    <p>Cheats open the doorway for casual play. They make it easier for disabled players to enjoy the same things as their peers, and allow people to skip parts of a game that <em>they bought</em> that they find too difficult.

    <p>Unfortunately, cheats are going away, because of extensive online play, and a more corporate approach to developing games that despises anything hidden.

(You can also discount CSS from having a major effect. Less than a hundred lines of styles, where most rules are no more complicated than: `p { font-family: sans-serif; }`. No whitespace rules.)

However, if you wanted to look at this in a more scientific way - it should be entirely possible to generate test cases fairly easily, given the simplicity of the text data I saw my results with.

myfonj · 3y ago

Yay, thanks for the info and inspiration; it sure seems like a fun weekend project.

(BTW, your snippet's content sounds interesting and feels relatable; definitely intrigued.)

myfonj · 3y ago

Finally did some synthetic measurements of (hopefully) parse times (not render, CSSOM, or anything like that). The differences seem microscopic but overall aligned with my initial expectations (omitting the closing tag actually shaves a bit of yak's hair), so I suspect that the real overhead you observed is caused by something happening after parsing, where the absence of trailing white-space in DOM nodes (ensured by closing tags) helps in some way. I guess it's something around that white-space or text layout. (Speaking of insignificant white-space, you could probably gain some more microseconds if you stuck paragraphs together (`..</p>\n\n<p>..` -> `..</p><p>..`), but such minification seems like a nuisance.)

Tested only on Windows, in browser consoles.

Numbers:

Firefox (Nightly) (performance.now is clamped to milliseconds)

    total; median; average; snippet
    2279.0; 4.0; 4.558; '<p>_'
    2652.0; 4.0; 5.304; '<p>_</p>'
    2471.0; 4.0; 4.942; '<p>_abcd'
    2387.0; 4.0; 4.774; '<p>_\n'
    3615.0; 5.0; 7.230; '<p>_</p>\n'
    2380.0; 4.0; 4.760; '<p>_abcd\n'
    3093.0; 5.0; 6.186; '<p>_\n</p>\n'
    3107.0; 5.0; 6.214; '<p>_</p>\n\n'
    2317.0; 4.0; 4.634; '<p>_abcd\n\n'
    2344.0; 4.0; 4.688; '<p>_\n\n'

Google Chrome (performance.now is sub-millisecond)

    total; median; average; snippet
    2870.4; 5.2; 5.741; '<p>_'
    2895.2; 5.4; 5.790; '<p>_</p>'
    2684.7; 5.2; 5.369; '<p>_abcd'
    2845.4; 5.2; 5.690; '<p>_\n'
    3836.7; 7.3; 7.673; '<p>_</p>\n'
    2837.8; 5.2; 5.676; '<p>_abcd\n'
    4022.5; 7.4; 8.045; '<p>_\n</p>\n'
    4044.3; 7.3; 8.089; '<p>_</p>\n\n'
    2928.4; 5.2; 5.857; '<p>_abcd\n\n'
    2805.3; 5.2; 5.611; '<p>_\n\n'

Test config

    Snippets per document: 5000
    Rounds: 500
    Wrap: '<!doctype html>(items-paragraphs)'
    Content each item (_): bunch of random digits chunks, something like '1943965927 52 27 5 51664138859173 5161 7226 5 15 2 55679 6553712585'

Code: https://gist.github.com/myfonj/57a6a8fcb1c5686527412543a897c...

(Before realizing I could use a synthetic DOMParser, I made something that measures document load time in an iframe (http://myfonj.github.io/tst/html-parsing-times.html), but it gives quite unconvincing results, although probably closer to the real world. Understandably, a synthetic DOMParser can crunch much more code than a visible iframe.)
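The gist link above is truncated, so the following is not that code. It is a hypothetical Python sketch of the same kind of harness, assuming the test config listed above (snippet templates wrapped in `<!doctype html>`, random digit-chunk content, total/median/average per template). It swaps the browser's DOMParser for Python's standard-library `html.parser.HTMLParser`, which only tokenizes and does not run the HTML tree-construction algorithm, so its numbers say nothing about browser parse times; it only shows the shape of the measurement.

```python
import random
import time
from html.parser import HTMLParser
from statistics import mean, median

# Snippet templates; "{}" stands for the "_" content in the tables above.
TEMPLATES = {
    r"<p>_": "<p>{}",
    r"<p>_</p>": "<p>{}</p>",
    r"<p>_\n": "<p>{}\n",
    r"<p>_</p>\n": "<p>{}</p>\n",
}


def random_content(rng: random.Random) -> str:
    """Random chunks of digits, roughly like the example content above."""
    return " ".join(str(rng.randrange(10 ** rng.randint(1, 14)))
                    for _ in range(rng.randint(8, 14)))


def build_document(template: str, items: int, seed: int = 0) -> str:
    rng = random.Random(seed)
    body = "".join(template.format(random_content(rng)) for _ in range(items))
    return "<!doctype html>" + body


def time_parse(doc: str, rounds: int) -> list[float]:
    """Milliseconds per round to feed the whole document to HTMLParser."""
    samples = []
    for _ in range(rounds):
        parser = HTMLParser()
        start = time.perf_counter()
        parser.feed(doc)
        parser.close()
        samples.append((time.perf_counter() - start) * 1000)
    return samples


def benchmark(items: int, rounds: int) -> None:
    print("total; median; average; snippet")
    for label, template in TEMPLATES.items():
        samples = time_parse(build_document(template, items), rounds)
        print(f"{sum(samples):.1f}; {median(samples):.1f}; "
              f"{mean(samples):.3f}; '{label}'")


if __name__ == "__main__":
    # The config above used 5000 snippets x 500 rounds; smaller values keep this quick.
    benchmark(items=500, rounds=20)
```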

toqy · 3y ago

> For some prettier mark up.

But then if you run it through Prettier it'll add all the closing tags for you :)

throwaway894345 · 3y ago

If you’re running it through a processor, why not just write Markdown and call it a day?

galaxyLogic · 3y ago

Is there a standard definition for the "Markdown" language?

There are several for different versions of HTML, and it is standardized that you can omit some closing tags, and some tags altogether.

The benefit of writing in a standardized language is that later you or anybody can run tools against your sources that check for conformity.

So that is why I prefer HTML. But I would like to hear your opinion: what is the best Markdown dialect currently?

hombre_fatal · 3y ago

Well, one simply formats the source file as you write it. The other requires an infile -> outfile build step that's more complex.

Whether the latter is worth it tends to depend on other things than parse time.

jokoon · 3y ago

Are more strict html parsers/renderers, and aren't they faster?

hombre_fatal · 3y ago

Lenient parsers still benefit from strict input because it lets them avoid lookaround/backtracking.

vbezhenar · 3y ago

What do you mean by lookaround/backtracking? You're inside <p>. You encounter another <p>. You can't nest one <p> inside another <p>, so you close the current <p> and open a new <p>. That's about it. I fail to see where you would need any kind of backtracking.

shakna · 3y ago

> Are more strict html parsers/renderers, and aren't they faster?

Are what more strict? You're missing a subject there.

At a guess, you're referencing the differences between Chrome/Firefox rendering times? And are surprised that Chrome is always slower?

In the same completely unscientific stat taking, I found that Chrome was significantly faster than Firefox at parsing the HTML head element of a document, and that difference was enough for Chrome to pull ahead of Firefox in overall rendering times for smaller pages. (Chrome spent about 30% as much time in the head as Firefox did.)

However, Firefox was faster at parsing the body, and as I had a larger-than-usual body (50k words is not your average webpage), Firefox was overall faster.

chrismorgan · 3y ago

To you and all that have responded: there is no variation in HTML parsing between browsers. All engines are using precisely the same exhaustively-defined algorithm. There is no leniency or strictness. Their performance characteristics may differ outside of parsing, which includes what they do with the result of parsing, but in the parsing itself there should be basically no difference between engines or parsers.

hsbauauvhabzb · 3y ago

That’s interesting, but surely relying on the user agent to ‘fill in the gaps’ is error-prone? Surely transpiling prior to or during render would be more resilient than trusting browser behaviour?

lolinder · 3y ago

If you're in a situation where resilience against odd browser quirks matters, you probably shouldn't be writing HTML like this anyway. This style is fine for writing HTML for a blog. For any kind of application, it would be a nightmare to try to maintain.

Every time the author introduced a shorthand, they had to clarify that it works only in specific situations. The result of those qualifiers is that you will have to have some code written in the more verbose style anyway. Context switching between those styles and having to decide whether the shorthand works in any given case just isn't worth it on a large project that you'll be making changes to over time.

chrismorgan · 3y ago

HTML parsing is exhaustively defined, so there’s not any filling of gaps, but only rules to be aware of. If you don’t know those rules, this may be error-prone, but if you do, it’s not, and things like the start and end tag omissions discussed in the article are quite straightforward rules to learn.

myfonj · 3y ago

Although, as the article correctly points out, omitting the HTML tag is technically fine, there is one rather important argument for its inclusion: it can and should have a LANG attribute:

    <html lang=en-GB>

It's not verbose after all, and IIUC it may be omitted if and only if the document is served with the corresponding information in the `Content-Language:` HTTP header, but nasty (or rather annoying) things may happen if that fails [1], so when it comes to "right HTML", following this advice sounds reasonable.
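A quick way to check for this, sketched in Python with the standard library's `html.parser`. The class and function names are hypothetical, and the fallback is a simplification: the HTML spec also considers a `<meta http-equiv="content-language">` pragma, and only a header naming a single language counts.

```python
from html.parser import HTMLParser
from typing import Optional


class HtmlLangFinder(HTMLParser):
    """Records the lang attribute of the first <html> start tag, if any."""

    def __init__(self) -> None:
        super().__init__()
        self.lang: Optional[str] = None
        self.seen_html = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            for name, value in attrs:
                if name == "lang":
                    # A bare `lang` attribute is treated like lang="".
                    self.lang = value or ""


def document_language(html: str,
                      content_language: Optional[str] = None) -> Optional[str]:
    """Language from <html lang>, else from a Content-Language header value.

    Simplified: ignores the meta pragma and does not validate header values
    that list more than one language.
    """
    finder = HtmlLangFinder()
    finder.feed(html)
    finder.close()
    if finder.lang is not None:
        return finder.lang  # lang="" explicitly means "unknown language"
    return content_language
```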

[1] https://adrianroselli.com/2015/01/on-use-of-lang-attribute.h...

timw4mail · 3y ago

No thanks. With the full markup you can see where things end, not just where they start.

I think this is similar to semicolons in Javascript: with semicolons at the end of each statement there is no ambiguity, but if you do not have semicolons, you have to know about edge cases, like if a line starts with a square bracket or paren.

exyi · 3y ago

You can't disable this "feature", so you still don't know where things end / begin. Some tags can't be nested in <p> while you could expect that they can:

  <p>
     Paragraph with a list won't work as you could think
     <ul> <li> Test </li> </ul>
     Something else
  </p>

Parses to:

  <p>
    Paragraph with a list won't work as you could think
  </p>
  <ul> <li> Test </li> </ul>
  Something else
  <p></p>

Similarly, in JS you are paying the price for optional semicolons even if you decide to use them.

   return
   {
      x: 1
   };

Will still not work even if you use semicolons elsewhere. So I don't see any advantage to actually using semicolons. JS is not worse than Python with its basic inference, and yet in Python people will almost yell at you if you attempt to use a semicolon :)

I'd much prefer these features to be opt-in (yeah, give me XHTML back for generated content). But when I can't disable them, why not embrace them ;)

minitech · 3y ago

> JS is not worse than Python with [its] basic inference

JS semicolon insertion is worse, because it depends on the following line. In Python, an unescaped newline outside of brackets always ends the statement, but in JavaScript, parentheses, brackets, binary operators, and template literals on the following line change that. The Python rule also makes a dangling operator outside of brackets a syntax error, whereas in JavaScript such a dangling operator is a potential source of unintentionally introduced ASI when making changes to code.
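The Python side of this is easy to check with the built-in `compile()`; a small sketch (the helper name `compiles` is made up here):

```python
def compiles(source: str) -> bool:
    """True if the source is syntactically valid Python."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# A newline outside brackets ends the statement, so a dangling operator
# is a syntax error instead of a silent continuation.
print(compiles("x = 1 +\n2"))     # False
# Inside brackets, or after an explicit backslash, the newline is ignored.
print(compiles("x = (1 +\n2)"))   # True
print(compiles("x = 1 + \\\n2"))  # True
# A leading operator on the next line starts a new statement ("+ 2"),
# so x stays 1; in JavaScript, `x = 1\n+ 2` would continue and give 3.
ns: dict = {}
exec("x = 1\n+ 2", ns)
print(ns["x"])  # 1
```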

Ontonator · 3y ago

On the point about semicolons in JavaScript, the logic I’ve heard is that if you consistently use semicolons, you can have a linter warn you if there is an inferred semicolon, so you know if you have made a mistake. If you don’t use semicolons and accidentally produce code with an inferred semicolon that should not be there, then there is no way for any tool to warn you. (Well, no general way; in your example with the return, many linters would warn you about unreachable code.)

epolanski · 3y ago

I never use semicolons and I never have these issues.

Even in the rare cases where I may have had them, like when copy-pasting into the wrong place, they were so rare that I don't think it's worth the additional noise of semicolons.

progval · 3y ago

> give me XHTML

You can still use XHTML; just send "Content-Type: application/xhtml+xml". You can express the same things as an HTML document, but with a saner parser mode.

chrismorgan · 3y ago

> You can express the same things as an HTML document

This is not quite true. There are a number of mutual incompatibilities between the XML and HTML syntaxes at both parse and run time.

At parse time, it’s mostly in the direction of XML syntax making things possible (e.g. nesting paragraphs or links, which the HTML parser prevents), but also in the other direction (e.g. <noscript> has no effect in XML syntax since it’s essentially an HTML parser instruction); you’ve also got case sensitivity which matters for SVG; and there’s the matter of the contents of <script> and <style> elements and their handling of <>&, where the best but still imperfect solution is a crazy mix of XML comments, JavaScript/CSS comments and XML CDATA markers. (See https://www.w3.org/TR/html-polyglot/ for more details of all this kind of stuff.)
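The nesting direction of that parse-time difference can be seen with any XML parser. As a stand-in for a browser's XML parser (an assumption: Python's `xml.etree.ElementTree` is a generic XML parser, not an XHTML user agent), a short sketch:

```python
import xml.etree.ElementTree as ET

# Well-formed XML: the inner <p> stays nested inside the outer one, whereas
# the HTML parser would turn the same markup into sibling paragraphs.
root = ET.fromstring("<div><p>Hello <p>World</p>!</p></div>")
outer = root.find("p")
print(len(outer.findall("p")))  # 1: one <p> nested inside the outer <p>

# Omitting an end tag is not "repaired" in XML; it is a fatal error.
try:
    ET.fromstring("<div><p>One<p>Two</div>")
except ET.ParseError as err:
    print("ParseError:", err)
```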

At run time, behaviour changes in such a way that it will break some JavaScript libraries, due to differences like .tagName being lowercase instead of uppercase, and .innerHTML requiring and producing XML syntax.

epolanski · 3y ago

What is a "saner parser mode"?

iamben · 3y ago

Agree 100%. It's also about a thousand times easier for people with a very basic HTML understanding to parse (if you open something, with pretty much the exception of an image, you gotta close it).

Periodically I have to send code to people who then make some of their own changes inline. God forbid trying to explain "yeah, they don't need to be closed, but that does because it's nested and..." Disaster (/hours of extra support) waiting to happen.

currysausage · 3y ago

You have to know HTML in order to know where things end. Otherwise, you will see nested paragraphs here:

  <p>Hello <p>World</p>!</p>

when it’s actually two consecutive paragraphs, an exclamation mark outside of any paragraph, and a closing p tag without an opening counterpart.

And when you do know HTML, you might as well omit optional tags.

If you think that HTML syntax is crazy, I won’t blame you, and you might consider XHTML instead, but you should be prepared for different woes.

mst · 3y ago

I have a tendency to forget ASI in JS exists when I've only been looking at my own code rather than other people's for a while.

I remain unconvinced it was a wise idea.

lelanthran · 3y ago

What is ASI?

mikewhy · 3y ago

"Automatic semicolon insertion": https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...

robgibbons · 3y ago

This works for blog posts, where the body of the document is one long block of paragraphs, but I suspect this style would quickly become untenable for complex apps. Indentation _is_ information, which is lost here.

clairity · 3y ago

it doesn't work for even slightly complex documents either. there's been a little meme-fad lately around minimalistic html like this, but to claim it's the "right" way to write html is pompous at best.

not closing tags for instance is really asking for future headaches. sure, it works for a simple text list, but not when it gets even a little complicated (add links, images, buttons, etc.). even worse are p tags, where you have to memorize a whole matrix of what it can contain and what breaks out implicitly. with every insertion/deletion, you need to check the list. it's needless mental drag.

niconii · 3y ago

You have to know about what breaks out of <p> tags regardless of whether or not you leave off the end tag, though.

<p><div></div></p> is invalid HTML because <div> ends the paragraph, resulting in an unpaired </p>.

anjbe · 3y ago

And not just because of that. In XHTML‐as‐XML, where <div> does not implicitly end the paragraph, what you posted is still invalid because <p> cannot contain <div>.

aparks517 (OP) · 3y ago

I've been using this style - with some tweaks - for web apps too. I don't think I have it completely figured out yet, but it's promising so far. You can view the source of http://lofi.limo/ to see how it's working out.

jaywalk · 3y ago

I feel like this style just makes it harder to read and understand the HTML. But hey, if it works for you, great.

sph · 3y ago

This is the output of an app/templating system, i.e. not a single HTML page. Have you ever read the HTML of any dynamically generated page? It's unreadable.

egeozcan · 3y ago

> Indentation _is_ information, which is lost here.

Isn't it a "view" of information? Any sufficiently advanced text editor can recreate it with a simple key combination.

robgibbons · 3y ago

Sure, but the author is advocating that you compose HTML this way. It would quickly become a mess of nested elements with zero visual indication of hierarchy.

The DOM is a tree, with nested elements. Losing that information doesn't get you anything but tag soup (which is, oddly, what the author suggests this style is supposed to avoid)

SkeuomorphicBee · 3y ago

First and foremost, the author advocates for organising documents in a much flatter DOM tree. In this style all major page elements sit at the same hierarchical level, so there is no "mess of nested elements"; there is no need for a visual indication of hierarchy if there is no hierarchy to begin with.

I think that is a very compelling format for a text-first web page, like a blog post or news article. Of course it is a coding style not well suited for complex web apps with deep hierarchy.

LeonB · 3y ago

In a tree you have branches off branches off branches etc.

You can’t orient yourself - you can’t tell where you are - unless you count the branches. And indenting makes that visible.

In the examples in TFA, you can tell your location from the names of the elements. E.g. <td> is enough for you to know you’re probably inside a tr inside a table.

And that is the more common case than the general tree example.

But a method of describing html does have to answer the question of how it represents arbitrarily deep nesting. But I like the answers it’s given for the more common case of structures that are not arbitrarily deep.

nerdponx · 3y ago

The problem is that HTML has multiple uses. The author is describing the case of authoring content, with HTML used as a markup language. However a lot of websites and web applications use HTML more like a layout and templating engine for a GUI framework.

falcolas · 3y ago

Only if the formatter is unaware of HTML. If it can't handle automatically closing <p> tags, then it's unaware and is trying to treat HTML like XML.

Or, to put another way, HTML != DOM, even though HTML can be rendered into a DOM.

Something1234 · 3y ago

Sometimes I indent in a way that my text editor doesn't exactly understand to better state where complex expressions begin and end.

jahewson · 3y ago

I don’t think the author means the information-theory kind of information. I could gzip the file without a loss in that kind of information.

jahewson · 3y ago

Incorrect indentation is therefore misinformation.

LeonB · 3y ago

Hi guido!

pwdisswordfish9 · 3y ago

That’s why HTML is not a language for ‘apps’.

Spivak · 3y ago

Except for the fact that native apps also use SGML or XML inspired markup for their layout engines. A tree of heterogeneous objects maps extremely well to how people think about UI.

WillusFredus · 3y ago

I agree that a tree structure can work well for mapping UIs, but HTML does not. It was specifically designed as a textual markup language. Its role has been expanded, but that has been done poorly.

What really needs to happen is a separation of HTML from UI markup elements. HTML will be used solely for textual markup and a new markup language can be used for UIs. This would allow us to return to a proper separation of concerns.

btrettel · 3y ago

Regarding writing "one-sentence-per-line", I've noticed that style before in LaTeX. While I don't use that style, one advantage that I like is the ability to include comments on the sentence level in LaTeX.

So instead of this:

  First sentence. Second sentence. % Comment on first sentence.

I can write:

  First sentence. % Comment on first sentence.
  Second sentence.

(Of course, one could define a new TeX macro that doesn't display anything to add comments anywhere in-line. That's not as readable, though.)

I've also read that one-sentence-per-line works better with diff programs, but I haven't had any problems with the program meld, so this isn't convincing to me. The advantage the linked article mentions in terms of rearranging sentences also is worth considering, though I haven't found the normal way to be that bad so I'm not convinced by that either.

Some other links on this coding/writing style:

https://rhodesmill.org/brandon/2012/one-sentence-per-line/

https://news.ycombinator.com/item?id=4642395

http://www.uvm.edu/pdodds/writings/2015-05-13better-writing-...

pronoiac · 3y ago

I've been working on turning a pretty massive scanned book into a git repo of markdown files, with multiple collaborators. Using sentence-per-line has been useful (compared to line-per-paragraph) because, even with / despite --word-diff , PRs are far more concise, and merge conflicts are more rare. From memory, with paragraph-per-line, I think a series of paragraphs, each changed, even with minor changes, kinda breaks git diff and GitHub diff.
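The diff point can be illustrated with a hypothetical Python sketch using the standard library's `difflib` (not git's diff algorithm, which differs in detail): the same one-word edit produces changed lines a paragraph long in the paragraph-per-line layout, but only a sentence long in the sentence-per-line layout. The naive sentence splitter and the sample text are assumptions for illustration.

```python
import difflib
import re


def sentence_per_line(paragraph: str) -> list[str]:
    # Naive split after ".", "!" or "?" plus whitespace (an assumption for
    # illustration; abbreviations such as "e.g. " would be split wrongly).
    return re.split(r"(?<=[.!?])\s+", paragraph.strip())


def diff_size(old: list[str], new: list[str]) -> tuple[int, int]:
    """(changed lines, characters on those lines) in a unified diff."""
    # Drop the "---"/"+++" file headers; "@@" hunk headers and " " context
    # lines are excluded by the prefix filter below.
    lines = list(difflib.unified_diff(old, new, lineterm=""))[2:]
    changed = [line for line in lines if line.startswith(("+", "-"))]
    return len(changed), sum(len(line) - 1 for line in changed)


paragraphs = [
    "Cheats open the doorway for casual play. They make it easier for "
    "disabled players to enjoy the same things as their peers.",
    "Unfortunately, cheats are going away. Online play is one reason.",
]
# The same one-word edit, applied to the first sentence only.
edited = [paragraphs[0].replace("casual play", "casual players"), paragraphs[1]]

by_paragraph = diff_size(paragraphs, edited)
by_sentence = diff_size(
    [s for p in paragraphs for s in sentence_per_line(p)],
    [s for p in edited for s in sentence_per_line(p)],
)
print("paragraph per line:", by_paragraph)
print("sentence per line: ", by_sentence)
```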

aparks517 (OP) · 3y ago

Oh, wow... I hadn't even thought of the diff angle, but it makes all the sense in the world. I've heard some authors even start each clause on its own line. I'm not sure I'm ready for that yet.

buzzy_hacker · 3y ago

> A few years ago, I found out I'd been tying my shoes wrong for my entire life. I thought laces came undone easily and didn't usually look very good. At least that's how mine were, and I never paid much attention to anyone else's. It took a couple of weeks to re-train my hands but now I have bows in my laces that look good and rarely come undone.

I’m equally interested in this as the HTML. Any clue what the author is referring to?

js2 · 3y ago

Lucky you! [1]

https://www.fieggen.com/shoelace/

Specifically:

https://www.fieggen.com/shoelace/grannyknot.htm

An HN favorite:

https://news.ycombinator.com/from?site=fieggen.com

1. https://xkcd.com/1053/

SpaceNugget · 3y ago

Likely the author was tying granny knots instead of slipped/bowed reef knots

If your first cross is left over right you need to make your second cross right over left, or vice versa. I found an image showing the difference for the un-slipped version, but it's the same with a bow: http://www.tikalon.com/blog/2020/square_granny_knots.png

Granny knots untie themselves and the bow will end up perpendicular to the knot instead of parallel.

aparks517 (OP) · 3y ago

Aw... you didn't read to the end ;)

> The right way to tie your shoes is with a square knot. It's easy to confuse this with the granny knot, which is the wrong way. The square knot is a simple and sound knot with many uses. The granny knot is an unsound knot whose only known uses are to make your shoelaces look crooked and to trip you.

jjice · 3y ago

Possibly the Ian knot https://www.fieggen.com/shoelace/ianknot.htm

You look goofy trying to relearn to tie your shoes, but it really is fast and sturdy.

wccrawford · 3y ago

It was apparently the square knot, but I'm a big fan of the Ian knot. I learned it about a year ago and at the very least, tying my shoes is more fun now. I'm not yet convinced it's better than the old-school method, but it looks impressive when you do it and it's more fun.

MindTwister · 3y ago

What immediately came to mind for me was this (short) Ted talk https://youtu.be/zAFcV7zuUDA

exyi · 3y ago

Not sure what he's referring to; I'm not familiar with the parallel posts. I just do the "rotate around the loop" part twice. I have had an untied shoelace approximately twice in the last 5 years.

kuschku · 3y ago

I appreciate that this blog post itself is written in the exact same style! I really miss being able to read the view-source: version of websites easily, but this blog post does it well :)

Legion · 3y ago

Certainly beats the "you don't need so much JavaScript!" blog posts that load 10 external scripts.

account42 · 3y ago

Or the articles about tracking and the ad industry with consent popups asking for permissions to let their ad "partners" track you.

sivers · 3y ago

Thanks to Aaron for posting this. Such a great reminder.

Anyone interested in this subject, check out a series of three very tiny books called “UPGRADE YOUR HTML” by Jens Oliver Meiert.

They give great step-by-step examples for eliminating optional tags and attributes, reducing HTML to its cleanest simplest valid form. The author is a super-expert in this specific subject, working with Google and W3C on this. His bio here: https://meiert.com/en/biography/

From LeanPub: https://leanpub.com/b/upgrade-your-html-123

From Amazon: https://www.amazon.com/gp/product/B08NP4GXY2/

xaduha · 3y ago

> Such a great reminder.

Reminder of what? To me this reads like satire, even if it wasn't intended as such.

gildas · 3y ago

This is how SingleFile writes HTML by default :). However, it is also the most duplicated issue in the tracker.

isp · 3y ago

Example "issue" (feature) from the tracker: https://github.com/gildas-lormeau/SingleFile/issues/967

(Also: a huge thank you for creating SingleFile. One of my favourite extensions of all time.)

sph · 3y ago

You can link it: https://github.com/gildas-lormeau/SingleFile

Pretty neat extension!

gildas · 3y ago

I was hesitating, thanks!

pwdisswordfish9 · 3y ago

The remarks by the person who opened #967 are beyond frustrating—and it's frustrating to see your responses to them. People putting stuff into the bugtracker that aren't bugs deserve a harsher response. Don't enable "putting stuff into the bugtracker without clearly articulating a defect [in the form of observed behavior versus expected behavior‡]" to be a viable way to interact with a project. Indulging these kinds of persons' requests for support and freeform banter is harmful in the long run. Giving them the answers that they're looking for even though their questions/comments are out of scope is way too forgiving, and it ends up causing problems for other maintainers when these numbskulls inevitably pop up around other projects and expect the same standard of treatment because they take it as a given that their fripperies are kosher.

‡ including sound, solid reasoning for why the former is incorrect and the latter is correct

iostream24 · 3y ago

At this point you are better off making a DSL that compiles to html.

- it will be possible to be consistent with closing tags or not

- you can do other arbitrary things to improve your working experience with it

Ever tried Slang styled templates?

zeven7 · 3y ago

I like this idea. As someone who argued vehemently for XHTML a couple decades ago (even wrote a fair amount of XSLT in those XML-crazed days), who's been wandering between different levels of "how strict should I be?" since that time, this article marks the step of my journey where I feel like I can really embrace the goodness that SGML has to offer for the first time. So thank you. This article has changed me.

sylware3y ago

Regarding tables, there is one trick: border sizes are actually weighted semantic separators, and should be in HTML, not in CSS.

Julesman3y ago

Regarding tables, don't use tables. :)

egypturnash3y ago

…for non-tabular data such as “your pretty design elements that frame and organize the text because it is 1995 and CSS doesn’t exist yet and this is the only tool at your disposal for aligning stuff across the page”. Or because it is 2000 and putting stuff where you want it is a hell of CSS2 floats and box models and eventually you just say “fuck it” and assign table-like behavior to a bunch of divs because Tables For Layout Are Considered Harmful.

If you’ve got stuff that would look good as a table, use a table.

temporallobe3y ago

It’s funny you bring this up because while I have joined the Tables For Layout Are Considered Harmful club, I never really have heard a completely convincing argument on why tables have this bad rap. I think it’s mostly because, semantically, tables don’t make sense for layout, but back in the days before frameworks such as Foundation and Bootstrap (and more recently native CSS3 mechanisms), tables with invisible borders were nearly perfect for layout containers.

temporallobe3y ago

Joking aside, tables are perfectly acceptable and actually the most appropriate markup for tabular data; in addition, accessibility tools know how to read them (IF they are coded correctly, but that goes for any HTML). I use tables where needed, but of course never for layout.

Rygian3y ago

Except, you know, for actual tables. :-)

dasil0033y ago

I like the aesthetic though I'm not sure how sustainable it is beyond basic content documents. On a side note though, I clicked around and big props to Aaron on the lofi.limo project, this is very cool.

aparks517OP3y ago

Thank you for the kind words! I've been working on adapting this style for web apps, but I haven't got it figured out well enough to write an article about. Yet...

I wouldn't mind if we had a bunch more basic content documents on the web.

mekster3y ago

HTML can't be fixed with a small trick like that.

Just use a templating engine like Pug and get rid of most of the annoyances.

It's concise about which part of the text a given tag covers, thanks to forced indentation. Not to mention that you never need to close any tag, and you never write "class=" because classes are written in CSS selector notation, among many other tricks.

https://github.com/pugjs/pug#syntax

Unless the HTML I'm composing will be touched by people like designers who would get scared of new syntax, in which case I'll use Twig or Nunjucks, I'll never write plain HTML for myself.

There's also a very solid implementation in PHP as well.

https://github.com/pug-php/pug

You can either let server side (node.js or PHP) compile that on demand or let your editors compile them as you edit if you're working on a static file.

I really think the language humans write should deviate from the language the runtimes understand to get all the convenience while never breaking how runtimes/crawlers interpret your output. Same goes for Stylus against CSS.

account423y ago

> However, any content which cannot go in a p element (most other block-display elements, for example) implies the end of its content, so we can usually leave off the end tag.

Note, however, that this means the whitespace between paragraphs becomes part of the paragraph. That can be annoying: someone who copies text from your website gets an extra space after each paragraph, which wouldn't happen if you had explicitly closed </p> directly after the text.
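Here is a minimal sketch of that whitespace effect. It assumes a toy model rather than a real browser: `ParagraphCollector` and `paragraph_texts` are hypothetical names, the set of tags that close a `<p>` is a small illustrative subset of the spec's list, and the standard library's `html.parser.HTMLParser` is used only to split the input into tags and text. With an explicit `</p>`, the newline between paragraphs stays outside them. With implied end tags, it becomes the paragraph's last character:

```python
from html.parser import HTMLParser

# Start tags that implicitly close an open <p> (illustrative subset only).
CLOSES_P = {"p", "div", "ul", "ol", "pre", "h1", "h2", "h3"}


class ParagraphCollector(HTMLParser):
    """Toy model: records the text content of each <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # finished paragraph texts
        self.current = None   # text of the currently open <p>, or None

    def _close_p(self):
        if self.current is not None:
            self.paragraphs.append(self.current)
            self.current = None

    def handle_starttag(self, tag, attrs):
        if tag in CLOSES_P:
            self._close_p()  # implied </p>
        if tag == "p":
            self.current = ""

    def handle_endtag(self, tag):
        if tag == "p":
            self._close_p()  # explicit </p>

    def handle_data(self, data):
        if self.current is not None:  # text outside a <p> is ignored
            self.current += data

    def close(self):
        super().close()   # flush any buffered text
        self._close_p()   # end of input also closes an open <p>


def paragraph_texts(html):
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    return collector.paragraphs


print(paragraph_texts("<p>First.</p>\n<p>Second.</p>\n"))  # ['First.', 'Second.']
print(paragraph_texts("<p>First.\n<p>Second.\n"))          # ['First.\n', 'Second.\n']
```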

Also, you should keep the opening <html> tag and specify the language of your document, even for English, since e.g. automatic hyphenation does not work if you don't specify a language.

Otherwise really like this condensed HTML style and have recently converted my personal website to it.

andrew_3y ago

The lack of closing tags is giving me severe anxiety. I know it's valid non-xml syntax but all the hairs on my neck are at attention.

pineconewarrior3y ago

I agree, and unless someone has a better reason than the ones I have seen, (saving tiny amount of bytes, less keystrokes, dx) I am convinced it's a bad idea to omit the end tags.

It causes way more trouble than those benefits are worth

MrVandemar3y ago

I use an aggressively minimal set of (valid) HTML because I prefer to write in HTML rather than Markdown-flavour-x.

Omitting the closing tags where possible is less about saving keystrokes than minimising interruptions to my writing flow.

But I wouldn't advocate it for published documents, just my local scribblings.

nayuki3y ago

To solve your anxiety, may I suggest XHTML? I use it on my website in practice and it works really well.

jacobsenscott3y ago

If you must write html by hand this seems nice. But I would never actually write html by hand anymore. For most web apps you write more tags than text. I love slim because it was designed with that in mind. There is no overhead to writing tags, and just a little for writing text. Which is the right way to go for web apps.

spread_love3y ago

omitting <html> works fine in browsers but breaks a lot of other developer tooling in my experience. It's nice to save 6B I guess, but compared to the behemoth webapp it's wrapping it's not much of an optimization.

JasonFruit3y ago

Why does it matter? A good HTML editor ought to be able to take in HTML, display it and edit it according to the user's preferences, and save it in a size-minimizing way. Why should we have to choose only one way?

movedx3y ago

Author: write HTML right

Me: this green on black background is terrible to read, I'll use reader mode

Chrome: this author did not write their HTML correctly, so there is no reader mode available

How ironic.

exyi3y ago

Firefox's reader mode works just fine. You need a right browser for the right HTML.

... anyway, it bothers me sometimes that I'm not aware of any spec for "reader mode compatibility", did anyone see anything like that?

zzo38computer3y ago

I use (an old version of) Firefox and can select "View > Page Style > No Style" to disable CSS, and this works OK for me (on some web pages it does not work very well, but on this one it works well).

I do not know what criteria are needed for the reader mode in Chrome. (The HTML code looks OK to me?)

frosted-flakes3y ago

I think Reader mode looks for a <main> section. When it's not present it either guesses or doesn't work at all.

moreati3y ago

Is there a tool to convert an existing HTML document into this style? E.g. strip out optional closing tags, without doing full minimisation/whitespace stripping.

DustinBrett3y ago

I've been using https://github.com/terser/html-minifier-terser to get this kind of HTML for my personal site for a while. It passes W3C so I'm happy.

After reading the connected blog post http://perfectionkills.com/experimenting-with-html-minifier/

epolanski3y ago

Slightly off topic but I'd like to point out that paragraphs in HTML are grouping not textual elements. They are like divs or headers, not like span or b.

They are mistakenly and traditionally associated with literature-type paragraphs, but that is not correct. You generally use them in forms to split different groups or inputs, which has nothing to do with paragraphs of a written form and even less with textual paragraphs.

I think there is really a lot of confusion about them in this whole thread.

niconii3y ago

Although there are some other uses for <p>, it is perfectly valid to use <p> tags for textual paragraphs and that has been the main use for <p> for as long as HTML has existed. I'm not sure why you believe otherwise.

Take a look at the source code for http://info.cern.ch/hypertext/WWW/MarkUp/Future.html for instance, which was written by the creator of HTML, Tim Berners-Lee.

You can also look at the source code for any page of the current HTML spec (e.g. https://html.spec.whatwg.org/multipage/introduction.html) where, again, <p> is used for each paragraph in the text.

epolanski3y ago

I didn't say it's not a valid use; I said that it's not its primary use.

Paragraphs relate to grouping content[1], not textual content. There's no logic in paragraphs.

Here I quote the official spec, which gives various examples showing that HTML paragraphs are not logical paragraphs:

> The solution is to realize that a paragraph, in HTML terms, is not a logical concept, but a structural one. In the fantastic example above, there are actually five paragraphs as defined by this specification: one before the list, one for each bullet, and one after the list.

And I'll quote also the definition on MDN:

> The <p> HTML element represents a paragraph. Paragraphs are usually represented in visual media as blocks of text separated from adjacent blocks by blank lines and/or first-line indentation, but HTML paragraphs can be any structural grouping of related content, such as images or form fields.

Failing to realize that paragraphs are grouping rather than logical content leads to frequent misuse of paragraphs, and this comment section is literally filled with bad paragraph examples, which suggests the community is largely ignorant of HTML.

[1]https://html.spec.whatwg.org/multipage/grouping-content.html...

niconii3y ago

In this comment section? Are you talking about stuff like the example I used earlier?

    <p><div></div></p>

Yes, obviously this is bad and nonsensical HTML. Under no circumstances does it make sense to have a div inside a p. In fact, the above doesn't even work, being parsed as

    <p></p><div></div><p></p>

But the intention of this example is not to show good HTML. The point is that many people have only a very basic understanding of HTML syntax, under the impression that

    <foo><bar></bar></foo>

works for any elements, because there's a <foo> and a </foo> so clearly anything inside it must be inside the foo element, right? But this is not the case for all elements. HTML's syntax is more complicated than that. My example was only intended to correct this misconception, not to demonstrate semantically-correct HTML, and that goes for other similar examples made by other people in the comments too.
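Here is a minimal sketch of why the `<foo><bar></bar></foo>` intuition fails for `<p>`. It assumes a toy tree builder: `ToyTreeBuilder` and `parse_body` are hypothetical names, and `html.parser.HTMLParser` serves only as a tokenizer. It models just two rules from the spec's tree-construction stage. A block start tag such as `<div>` closes an open `<p>`, and a `</p>` with no open paragraph produces an empty `<p></p>`. Element scopes, `<tbody>` insertion, and other error recovery are left out:

```python
from html.parser import HTMLParser

# Start tags that close an open <p> (a small illustrative subset of the spec's list).
CLOSES_P = {"p", "div", "ul", "ol", "section"}


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []  # Node objects or text strings

    def serialize(self):
        inner = "".join(c if isinstance(c, str) else c.serialize() for c in self.children)
        return f"<{self.tag}>{inner}</{self.tag}>"


class ToyTreeBuilder(HTMLParser):
    """Models only the implied </p> and the stray-</p> rules; no scopes, no tbody."""

    def __init__(self):
        super().__init__()
        self.root = Node("body")
        self.stack = [self.root]  # open elements, innermost last

    def _p_is_open(self):
        return any(node.tag == "p" for node in self.stack)

    def _close_p(self):
        # Pop open elements up to and including the nearest <p>.
        if self._p_is_open():
            while self.stack.pop().tag != "p":
                pass

    def handle_starttag(self, tag, attrs):
        if tag in CLOSES_P:
            self._close_p()
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if tag == "p":
            if self._p_is_open():
                self._close_p()
            else:
                # </p> with no open paragraph: insert an empty <p></p>.
                self.stack[-1].children.append(Node("p"))
            return
        # Other end tags pop back to the matching open element, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def parse_body(html):
    builder = ToyTreeBuilder()
    builder.feed(html)
    builder.close()
    return "".join(c if isinstance(c, str) else c.serialize() for c in builder.root.children)


print(parse_body("<p><div></div></p>"))  # <p></p><div></div><p></p>
```

For ordinary elements such as the `<foo><bar></bar></foo>` case, the same builder just nests them, which is why the intuition usually seems to hold.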

layer83y ago

> gator

What do they mean here?

aparks517OP3y ago

Less-than or greater-than signs (code points 0x3C and 0x3E in ASCII). A friend put me on to calling them that because they (sort of?) look like alligators with their mouths open.

trevcanhuman3y ago

My math teachers used an alligator analogy to remind us which is the correct symbol when using the greater-than and less-than signs: the 'mouth' of the gator is always eating the greater number.

phabricator3y ago

Ex: https://cdn.themeasuredmom.com/wp-content/uploads/2014/07/fr...

nhooyr3y ago

> It used to be the case that URL parsers would remove newlines and tabs, so we could split long URLs across lines and even format their query parameters nicely with tabs. Unfortunately, this was taken advantage of for data exfiltration via HTML injection and we no longer have this nice thing as URL parsers have been made more strict to prevent this kind of attack.

Does anyone have a source/reference for this?

Kazkans3y ago

Why not just use groff/troff and output to HTML?

Voeid3y ago

Figure 2, showing the "common style", is something I've never used or seen before.

What is the "right" way? Perhaps it is to use style from both of these extreme examples and write code that is easy to read and edit for the person that is working with it.

Or perhaps the right way is to never imply that your way of doing things is the only correct one and pass that off as fact?

JJMcJ3y ago

I too found out I'd been doing my shoelaces wrong. YouTube set me straight.

For HTML, these are good recommendations.

Sometimes, like for technical writing where there are

    various

distinct and important formatting choices, it's just hard work to get it the way you want it even with a WYSIWYG editor.

exodust3y ago

Closing li tags is the right thing to do! I always close the kitchen drawer too after putting the scissors back. But I rarely write HTML as content anyway, it's mostly templates for the CMS, where it's best to close the tags.

recursive3y ago

I too close my kitchen drawers. But not my li tags. Unless I'm using the bastardization known as jsx. The next li closes it automatically, as it's specified to do.

hinkley3y ago

"everybody knows" doesn't scale, because

1) not everybody knows and

2) you're relying on memorization for people to read your code, which means you're smashing the ladder rungs behind you

Software on a team is a performance art. People are either watching you and copying your behavior, or watching you and getting confused.

And if you've ever felt overbooked on a project while other people are idle? It's stuff like that that put you into that situation. And since you're the one who did the 'stuff like that', it's at least partly your fault you're in this situation. Stop being a ball hog, and you'll get fewer bruises.

recursive3y ago

> "everybody knows" doesn't scale

Agreed. That's why I prefer to have things written down. In this case, WHATWG and W3C already did the work for us.

> And if you've ever felt overbooked on a project while other people are idle?

I've seen what you're talking about, but I'm not the one getting overbooked. I'm not generally the one fighting over this stuff. If I get feedback on a PR telling me to add li close tags, I'll probably just do it.

If you're using a technology on a daily basis, it will pay big dividends to spend a little time learning how it actually works.

martin_a3y ago

While I'm no big fan of SEO and all that surrounds it: Will this open-tag-thing here influence how crawlers handle your site and index/rank it?

exyi3y ago

I have no idea what Google does, but expect their parsers to be quite robust. I tried doing some web scraping, and so many pages are not even valid HTML (most often invalid nested tags, like a table inside span, missing closing tags even when required, random unopened closing tags, ...). Not closing <p> and <td> tags is quite common, I have not seen omitted <html> <head> and <body> yet.

aparks517OP3y ago

I don’t expect it to as long as the mark-up is valid. Perhaps someone with more SEO knowledge will stop by to correct me.

martin_a3y ago

You're right. HTML5 does not work with DTDs anymore, so unclosed tags are not a violation of the document schema and therefore probably not "punishable" by search engines.

anjbe3y ago

Implicit end tags as described in the article have been allowed by every HTML DTD not named XHTML.

tiffanyh3y ago

One of the easiest ways to improve SEO is to just properly use existing HTML tags (instead of using a custom DIV for everything).

jokoon3y ago

I'm curious if a more strict html parser would actually be faster.

Browsers are not really fast on my Android, and I wish they were fast.

exyi3y ago

I have yet to see a slow HTML-only website ;) (other than a 10 MB single-file spec or an entire book). Really, I don't think HTML parsing is a huge bottleneck, and these few parser exceptions don't seem that hard to implement: just close a tag when opening one from a predefined list, with no backtracking or anything expensive.
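To illustrate the "no backtracking" point, here is a sketch of a single left-to-right pass that writes out the end tags a parser would imply. The `IMPLIED_CLOSERS` table and `add_implied_end_tags` function are hypothetical simplifications, not the spec's algorithm. The real rules reason about element scopes and also insert elements such as `<tbody>`, which this sketch does not do. It only inspects the innermost open element and never rereads earlier input:

```python
import re

# Hypothetical, simplified rules: opening the key tag implicitly closes the
# innermost open element for as long as it is one of these tags.
IMPLIED_CLOSERS = {
    "p": {"p"},
    "div": {"p"},
    "ul": {"p"},
    "ol": {"p"},
    "li": {"li", "p"},
    "td": {"td", "th", "p"},
    "th": {"td", "th", "p"},
    "tr": {"td", "th", "tr"},
}
# Void elements never get an end tag, so they are not pushed on the stack.
VOID = {"br", "hr", "img", "input", "meta", "link"}

TAG = re.compile(r"<(/?)([a-z][a-z0-9]*)[^>]*>", re.IGNORECASE)


def add_implied_end_tags(html):
    """One left-to-right pass that writes out end tags HTML lets you omit."""
    out, stack, pos = [], [], 0
    for m in TAG.finditer(html):
        out.append(html[pos:m.start()])  # text between tags, unchanged
        pos = m.end()
        is_end, name = m.group(1) == "/", m.group(2).lower()
        if is_end:
            # Close open elements down to the matching one; drop stray end tags.
            if name in stack:
                while True:
                    top = stack.pop()
                    out.append(f"</{top}>")
                    if top == name:
                        break
            continue
        # Implied end tags: only the innermost open element is checked.
        while stack and stack[-1] in IMPLIED_CLOSERS.get(name, set()):
            out.append(f"</{stack.pop()}>")
        out.append(m.group(0))
        if name not in VOID:
            stack.append(name)
    out.append(html[pos:])
    # End of input closes whatever is still open.
    out.extend(f"</{tag}>" for tag in reversed(stack))
    return "".join(out)


print(add_implied_end_tags("<ul><li>one<li>two</ul>"))
# <ul><li>one</li><li>two</li></ul>
```

Checking only the top of the stack is what keeps nested lists correct in this sketch: a new `<li>` inside an inner `<ul>` closes the inner item and stops at the `<ul>`, leaving the outer item open.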

mst3y ago

Depending on what sites you're mostly accessing, it may be worth experimenting with Firefox Mobile plus uBlock Origin plus perhaps one or more of the extra anti-(ad|bloat)ware extensions. Chrome is definitely faster in a straight line but once I've got Firefox configured it's (to me) significantly more pleasant to use (and I like the current UI better than Chrome's though that's -definitely- not a universal opinion, mileage may vary as ever).

_glass3y ago

love the troff reference. I wrote my first CV in troff. mostly because it was available on my linux machine, and working.

iLoveOncall3y ago

In 2022 how often do you actually write text by hand in your HTML files? I find that beside the few buttons here and there (and that's if you don't have i18n), text is always going to be served by a server.

In 2022 we also all use text editors or IDEs that can collapse entire blocks of tags, to improve readability.

I'm not sure I can see a clear benefit here outside of very few edge cases, and I am sure it comes with its lot of disadvantages.

nojs3y ago

Static site generators (Jekyll, Hugo) are one example. Sometimes you can get away with markdown but often you end up marking up pages of text.

pineconewarrior3y ago

Even when you need to write actual HTML, you should still use shorthand tools like Emmet to write your markup faster and with fewer mistakes.

EugeneOZ3y ago

XML is beautiful and clean, and I prefer to write full closing tags.

anjbe3y ago

It’s funny how people’s aesthetic sensibilities can differ. Making use of HTML’s standard features to drop unnecessary elements and closing tags is very much in line with my own idea of “beautiful” and “clean.”

Do you consider any table that doesn’t explicitly declare <tbody> “unclean”? That’s an implicit element in every <table>, according to the spec.

EugeneOZ3y ago

No, tbody is just an element. The power is in tags.

enriquto3y ago

Of course, of course; but here they are talking about HTML (i.e., about HTML5), not about XML.

tannhaeuser3y ago

I've given up trying to educate XML heads that XML is just a proper subset of SGML, just as HTML is originally, and mostly still, an SGML vocabulary. Idk what people are talking about in this thread (it seems to be about everyone's personal preferences and wildly speculative assumptions about backtracking, when in reality both SGML and WHATWG parsing are deterministic); there is exactly one reference to WHATWG at this time.

nayuki3y ago

HTML has a dialect in XML called XHTML. It is obscure but actually works. My website is a living example.

irrational3y ago

This post needs an OCD trigger warning.

ThatIsntOCD3y ago

"OCD" as in "I don't like clutter" or real OCD as in "if I don't clear away the clutter my family will die in a car crash, I know that's illogical, and yet I'm still encumbered with the intrusive thought?"

irrational3y ago

OCD as in “not having closing tags matching open tags is driving me insane”. Maybe OCD isn’t the proper term, but I don’t know of a better one.

ThatIsntOCD3y ago

Respectfully, please try to refrain from using OCD casually. It's not like you're the only one, but it's a debilitating disease.

eatsyourtacos3y ago

Rite HTML Wright

(sorry)

math_dandy3y ago

Keep calm and Prettier on.