CRLF is obsolete and should be abolished (opens in new tab)

tptacek1y ago

This would be more persuasive if HTTP servers didn't already widely accept bare 0ah line termination. What's the first major public web site you can find that doesn't?

5 more replies

Spooky231y ago

What a weird reaction. Microsoft’s use of CRLF is an archaic pain in the ass. Taking a position that it should be deprecated isn’t radical or irresponsible — Microsoft makes gratuitous changes to things all of the time, why not this one?

Hipp is probably one of the better engineering leaders out there. His point of view carries weight because of who he is, but should be evaluated on its merits. If Microsoft got rid of this crap 30 years ago, when it was equally obsolete, we wouldn’t be having this conversation; if nobody does, our grandchildren will.

rtpg1y ago

Took me a second to get what was going on here, but basically the idea is that you middleware might not see `C:D`, but then your application _does_ see `C:D`.

And given your application might assume your middleware does some form of access control (for example, `X-ActualUserForReal` being treated as an internal-only header), you could get around some access control stuff.

Not a bytes-alignment thing but a "header values disagreement" thing.

This is an issue if one part of your stack parses headers differently than another in general though, not limited to newlines.

refulgentis1y ago

I wouldn't be too worried and making personal judgements, he says the same thing you are (though I assume you disagree)

mackal1y ago

> massive SQLite fan, but this is giving me pause about using other software by the same author

Even if I wanted to contribute code to SQLite, I can't. I acknowledge the fact God doesn't exist, so he doesn't want my contributions :P

https://www.postfix.org/smtp-smuggling.html

deanishe1y ago

> but this is giving me pause about using other software by the same author

Go read the article again. I think you'll be pleasantly surprised.

amluto1y ago

> I'm hoping this is satire. Why intentionally introduce potential bugs for the sake of making a point?

It’s worse than satire. Postel’s Law is definitively wrong, at least in the context of network protocols, and delimiters, especially, MUST be precise. See, for example:

Send exactly what the spec requires, and parse exactly as the spec requires. Do not accept garbage. And LF, where CRLF is specified, is garbage.

tptacek1y ago

If two systems agree, independent of any specification someone somewhere else wrote, to accept a bare NL where a CRLF is specified, that is not "garbage". Standards documents are not laws; the horse drags the cart.

[1]: https://github.com/dotnet/aspnetcore/pull/43202

halter731y ago

> I'm hoping this is satire.

Me too. It's one thing to accept single LFs in protocols that expect CRLF, but sending single LFs is a bridge to far in my opinion. I'm really surprised most of the other replies to your comment currently seem to unironically support not complying with well-established protocol specifications under the misguided notion that it will somehow make things "simpler" or "easier" for developers.

I work on Kestrel which is an HTTP server for ASP.NET Core. Kestrel didn't support LF without a CR in HTTP/1.1 request headers until .NET 7 [1]. Thankfully, I'm unaware of any widely used HTTP client that even supports sending HTTP/1.1 requests without CRLF header endings, but we did eventually get reports of custom clients that used only LFs to terminate headers.

I admit that we should have recognized a single LF as a line terminator instead of just CRLF from the beginning like the spec suggests, but people using just LF instead of CRLF in their custom clients certainly did not make things any simpler or easier for me as an HTTP server developer. Initially, we wanted to be as strict as possible when parsing request headers to avoid possible HTTP request smuggling attacks. I don't think allowing LF termination really allows for smuggling, but it is something we had to consider.

I do not support even adding the option to terminate HTTP/1.1 request/response headers with single LFs in HttpClient/Kestrel. That's just asking for problems because it's so uncommon. There are clients and servers out there that will reject headers with single LFs while they all support CRLF. And if HTTP/1.1 is still being used in 2050 (which seems like a safe bet), I guarantee most clients and servers will still use CRLF header endings. Having multiple ways to represent the exact same thing does not make a protocol simpler or easier.

jfengel1y ago

LF only? Huh.

In its original terms for printing terminals, carriage return might be ambiguous. It could means either "just send the print head to column zero" or "print head to 0 and advance the line by one". The latter is what typewriters do for the Return key.

But LF always meant Line Feed, moving the paper but not the print head.

These are of course wildly out of date concepts. But it still strikes me as odd to see a Line Feed as a context reset.

inopinatus1y ago

Not just potential bugs, there'll be definite security failures.

Changing the line endings can invalidate signatures over plaintext content. So an email MTA, for example, could never do so. Nor most proxy implementations. Then there's the high latent potential for request smuggling, command injection, and privilege escalation, via careful crafting of ambiguous header lines or protocol commands that target less robust implementations. With some protocols, it may cause declared content sizes to be incorrect, leading to bizarre hangs, which is to say, another attack surface.

In practice, retiring CRLF can't be safely performed unilaterally or by fiat, we'll need to devise a whole new handshake to affirm that both ends are on the same page re. newline semantics.

mechanicalpulse1y ago

> Why intentionally introduce potential bugs for the sake of making a point?

It seems spiteful, but it strikes me as an interesting illustration of how the robustness principle could be hacked to force change. It’s a descriptivist versus prescriptivist view of standards, which is not how we typically view standards.

jcul1y ago

Not disagreeing with you, but implementation diverges from spec a lot anyway.

I've had to write decoders for things like HTTP, SMTP, SIP (VoIP), and there's so many edge cases and undocumented behavior from different implementations that you have to still support.

I find that it affects text based protocols, a lot more than binary protocols. Like TLS, or RTP, to stick with the examples above, have much less divergence and are much less forgiving to broken (according to spec) implementations.

michaelmior1y ago

That's fair, but I don't see that as an argument for intentionally deviating from the spec.

chasil1y ago

FYI, Sendmail accepts LF without CR, but Exchange doesn't.

isThereClarity1y ago

sendmail 8.18.1 includes patches to correct this behaviour (and options to turn it back on) due to its role in SMTP smuggling, CVE-2023-51765. See https://ftp.sendmail.org/RELEASE_NOTES

  sendmail is now stricter in following the RFCs and rejects
  some invalid input with respect to line endings
  and pipelining:
  ...snip...
  - Accept only CRLF . CRLF as end of an SMTP message
    as required by the RFCs, which can disabled by the
    new srv_features option 'O'.
  - Do not accept a CR or LF except in the combination
    CRLF (as required by the RFCs).  These checks can
    be disabled by the new srv_features options
   'U' and 'G', respectively.  In this case it is
   suggested to use 'u2' and 'g2' instead so the server
   replaces offending bare CR or bare LF with a space.
   It is recommended to only turn these protections off
   for trusted networks due to the potential for abuse.

9dev1y ago

…how very in character for each of them!

javajosh1y ago

>What would be the benefit...

It is interesting that you ignore the benefits the OP describes and instead present a vague and fearful characterization of the costs. Your reaction lies at the heart of cargo-culting, the maintenance of previous decisions out of sheer dread. One can do a cost-benefit analysis and decide what to do, or you can let your emotions decide. I suggest that the world is better off with the former approach. To wit, the OP notes for benefits " The extra CR serves no useful purpose. It is just a needless complication, a vexation to programmers, and a waste of bandwidth." and a mitigation of the costs "You need to search really, really hard to find a device or application that actually interprets U+000a as a true linefeed." You ignore both the benefits assertion and cost mitigating assertion entirely, which is strong evidence for your emotionality.

YZF1y ago

What's your estimate for the cost of changing legacy protocols that use CRLF vs. the work that will be done to support those?

My intuition (not emotion) agrees with the parent that investing in changing legacy code that works, and doesn't see a lot of churn, is likely a lot more expensive than leaving it be and focusing on new protocols that over time end up replacing the old protocols anyways.

OP does not really talk about the benefit, he just opines. How many programmers are vexed when implementing "HTTP, SMTP, CSV, FTP"? I'd argue not many programmers work on implementations of these protocols today. How much traffic is wasted by a few extra characters in these protocols? I'd argue almost nothing. Most of the bits are (binary, compressed) payload anyways. There is no analysis by OP of the cost of not complying with the standard which potentially results in breakage and the difficulty of being able to accurately estimate the breakage/blast radius of that lack of compliance. That just makes software less reliable and less predictable.

michaelmior1y ago

You're right that I didn't mention the supposed benefits in my response. But let's incorporate those benefits into new protocols rather than break existing protocols. I just don't see the benefit in intentionally breaking existing protocols.

LegionMammal9781y ago

The cost is, if people start transitioning to a world where senders only transmit LF in opposition to current standards for protocols like HTTP/1.1 or SMTP (especially aggressively, e.g., by creating popular HTTP libraries without a CRLF option), then it will create the mental and procedural overhead of tracking which receivers accept LF alone vs. which still require CRLF. Switching established protocols is never free, even when there are definite benefits: see the Python 2-to-3 fiasco, caused by newer programs being incompatible with most older libraries.

perching_aix1y ago

> you ignore the benefits the OP describes

Funnily enough, the author doesn't actually describe any tangible benefits. It's all just (in my reading, semi-sarcastic) platonics:

- peace

- simplicity

- the flourishing of humanity

... so instead of "vague and fearful", the author comes on with a "vague and cheerful". Yay? The whole shtick about saving bandwidth, lessening complications, and reducing programmer vexations are only ever implied by the author, and were explicitly considered by the person you were replying to:

> You save a handful of bits at the expense of a lot of potential bugs.

... they just happened to be not super convinced.

Is this the kind of HackerNews comment I'm supposed to feel impressed by? That demonstrates this forum being so much better than others?

cassepipe1y ago

It seems to me the author is not suggesting to update the protocols themselves but rather to stop sending them CR even if the spec requires it. And to patch the corresponding software to it accepts simple newlines.

phkahler1y ago

>> I'm hoping this is satire. Why intentionally introduce potential bugs for the sake of making a point?

It's not satire and it's not just trying to make a point. It's trying to make things simpler. As he says, a lot of software will accept input without the CR already, even if it's supposed to be there. But we should change the standard over time so people in 2050 can stop writing code that's more complicated (by needing to eat CR) or inserts extra characters. And never mind the 2050 part, just do it today.

michaelmior1y ago

Ignoring established protocols doesn't make things simpler. It makes things vastly more complicated.

Let's absolutely fix new protocols (or new versions of existing protocols). But intentionally breaking existing protocols doesn't simplify anything.

nsnshsuejeb1y ago

Yes. We all know how to do this. You know that API version thingy. I agree to drop the carriage return when not needed but do it in future protocols.

Obviously IPv6 shows you need to be patient. Your great grandkids may see a useless carriage return!

Windows doesn't help here.

nedt1y ago

> What would be the benefit

Easy - being able to use a plain text protocol as a human being without having to worry if my terminal sends the right end of line terminator. Using netcat to debug SMTP issues is actually something I do often enough.

tfehring1y ago

At least for CSV, there's a divergence between usage in practice and the spec. The spec requires CRLF, but all of the commonly used tools I've encountered for reading and writing CSVs can read files with CR, LF, or CRLF line endings, and when writing CSVs they'll default to either LF or platform-specific line endings. (Even Excel for Mac doesn't default to CRLF!) I think this divergence is bad and should be fixed.

But IMO the right resolution is to update the spec so that (1) readers MUST accept any of (CR, LF, CRLF), (2) writers MUST use one of (CR, LF, CRLF), and (3) writers SHOULD use LF. Removing compatibility from existing applications to break legacy code would be asinine.

michaelmior1y ago

For CSV, "breaking" changes seem like less of a big deal to me. Partially because there is already so much variation in how CSV is implemented.

Ekaros1y ago

Thinking about it. Using CR alone in protocols actually make infinitely more sense. As that would allow use of LF in records. Which would make many use cases much simpler.

Just think about text protocols like HTTP, how much easier something like cookies would be to parse if you had CR as terminating character. And then each record separated by LF.

mattmerr1y ago

ASCII already has designated bytes for unit, group, and record separators. That aside, a big drawback of using unprintable bytes like these is they're more difficult for humans to read in dumps or type on a keyboard than a newline (provided newline has a strict definition CRLF, LF, etc)

1: https://stackoverflow.com/questions/73086622/is-a-gitattribu...

gpvos1y ago

That is so backwards incompatible that it is never, ever going to fly.

SQLite1y ago

Author here:

My title was imprecise and unclear. I didn't mean that you should raise errors if CRLF is used as a line terminator in (for example) HTTP, only that a bare NL should be allowed as an acceptable line terminator. RFC2616 recommends as much (section 19.3 paragraph 3) but doesn't require it. The text of my proposal does say that CRLF should continue to be accepted, for backwards compatibility, just not required and not generated by default. I failed to make that point clear.

My initial experiments suggested that this idea would work fine and that few people would even notice. Initially, it appeared that when systems only generate NL instead of CRLF, everything would just keep working seamlessly and without problems. But, alas, there are more systems in circulation that are unable to deal with bare NLs than I knew. And I didn't sell my idea very well. So there was breakage and push-back.

I have revised the document accordingly and reverted the various systems that I control to generate CRLFs again. The revolution is over. Our grandchildren will have to continue dealing with CRLFs, it seems. Bummer.

Thanks to everyone who participated in my experiment. I'm sorry it didn't work out.

AndyKelley1y ago

Copying my comment from lobste.rs in case you didn't see it:

I really appreciate this attitude. As programmers, we love to complain and grumble to each other about how the state of things suck, or that things are over complicated, but then too often the response is the software engineering equivalent of “I paid my student loans, so you should have to, too”. A new person joins the project, and WTFs at something, and the traumatized veterans say, “haha oh boy welcome, yeah everything sucks! You’ll get used to it soon.”

I hate that attitude.

We are at the very, very beginning of software protocols that could potentially last for millennia. From that perspective, you would look back at this situation and think of Richard’s blog post as super obvious, the clear voice of reason, and the reaction of everyone here as myopic.

Even if our software protocols for whatever reason don’t last that long, we need to be working on reducing global system complexity. Beauty and elegance aside, there is such a thing as complexity budget which is limited by the laws of information theory, the computer science equivalent of the laws of physics. People like Richard understand this intuitively, and actively work towards reconstructing our world to regain complexity currency so that it can be spent on more productive things.

I would have backed you 100%.

perching_aix1y ago

You remind me to the common wisdom regarding user feedback. That when your users complain about something, you should listen - not to their advice per se, but to their gripes. Because while their gripes may be legitimate, their advice is near guaranteed to be rubbish (they're not developers after all).

Specifically, I'm referring to your new guy example here. The new guy usually very correctly identifies that things suck, what he lacks is perspective. This means that both his priorities will be off, as well as his approaches. Trust the gripe, not the advice.

This is also I think what people in this thread are/were generally about here. Not because Richard would be some new unknown kid on the block mind you, but because our grandchildren having to deal with CRLF is approximately as harrowing as the eventual heat death of the universe, and because instead of standards revisions, he was calling for standards violations.

inopinatus1y ago

The problem with trying to legislate a specific case of Postel's Law is that everyone who might get on board because of it, is already on board because of it, and vice versa.

Ygg21y ago

It also goes a bit more than that, for optimal UTF8 search you want the ASCII separators. It's always easier to search for a single byte than two or more bytes of special pattern.

That said, I do agree we should abolish CRLF. And replace it with LF.

MatthiasPortzel1y ago

They acted on these words, updating their HTTP server to serve just \n.

=> https://sqlite.org/althttpd/info/8d917cb10df3ad28 Send bare \n instead of \r\n for all HTTP reply headers.

While browser aren't effected, this broke compatibility with at least Zig's HTTP client.

=> https://github.com/ziglang/zig/issues/21674 zig fetch does not work with sqlite.org

abhinavk1y ago

It has been reverted.

djha-skin1y ago

> Let's make CRLF one less thing that your grandchildren need to know about or worry about.

The struggle is real, the problem is real. Parents, teach your kids to use .gitattribute files[1]. While you're at it, teach them to hate byte order marks[2].

2: https://blog.djhaskin.com/blog/byte-order-marks-must-diemd/

Kwpolska1y ago

Nope. Git should not mess with line endings, the remote repository not matching the code in your local clone can bite you when you least expect it. On Windows, one should disable the autocrlf misfeature (git config --global core.autocrlf false) and configure their text editor to default to LF.

zulu-inuoe1y ago

3000% agree. I have been bitten endlessly by autocrlf. It is absolutely insane to me that anyone ever considered your having your SOURCE CONTROL tool get/set different content than what's in the repo

layer81y ago

This is impractical in many situations, because tools that process build-source files (for example XML files that control the build, or generated source files) inherently generate CRLF on Windows. These are many, many, many tools, not just one’s text editor.

The correct solution is to use .gitattributes.

djha-skin1y ago

Sure, don't use autocrlf. But some _windows_ tools need crlf, like for powershell or batch build scripts. Defaulting to lf in the editor will not save you.

Don't use `auto`, full marks, but the gitattributes file is indispensable as a safety net when explicit entries are used in it.

I mean, the whole point of the file is not everyone who is working on the project has their editors set to lf. Furthermore, not every tool is okay with line endings that are not CRLF.

When used properly (sure, ideally without auto), the git attributes file as a lifesaver.

[0] https://news.ycombinator.com/item?id=13860682

nsnshsuejeb1y ago

The letters after the dot in my filename don't map 1 to 1 with the file format.

perching_aix1y ago

Well, at least the title is honest. Straight up asking people to break standards out of sheer conviction is a new one for me personally, but it's definitely one of the attitudes of all time, so maybe it's just me being green.

Can we ask for the typical *nix text editors to disobey the POSIX standard of a text file next, so that I don't need to use hex editing to get trailing newlines off the end of files?

rkeene21y ago

People don't seem to mind when Chrome does it [0]. The response "standards aren't a death pact" stands out in particular.

akira25011y ago

Death pact? Jeez. Standards simply prevent people from having to waste time debugging dumb issues that rightfully could have been avoided.

perching_aix1y ago

Might be just my personal impression, but I'm pretty sure Chrome is extremely notorious for abusing its market leader position, including in this way. So gonna have to disagree there, from my view people do mind Chrome and its implementation particularities quite a lot.

bityard1y ago

Why would you want that?

All Unix text processing tools assume that every line in a text file ends in a newline. Otherwise, it's not a text file.

There's no such thing as a "trailing newline," there is only a line-terminating newline.

I've yet to hear a convincing argument why the last line should be an exception to that extremely long-standing and well understood convention.

perching_aix1y ago

> There's no such thing as a "trailing newline," there is only a line-terminating newline.

Is "line-terminating newline" a controlled / established term I'm unfamiliar with or am I right to hold deep contempt against you?

Because "trailing newline", contrary to what you claim, is 100% established terminology (in programming anyways), so I'd most definitely consider it "existing", and I find it actively puzzling that someone wouldn't.

2019841y ago

What's wrong with trailing newlines?

perching_aix1y ago

Other than select software being pissy about it, not much. Just like how there's nothing wrong with CRLF, except for select software being pissy about that too.

norir1y ago

It makes writing parsers more complicated in certain cases because you can't tell without lookahead if a newline character should be treated as a newline or an eof.

bigstrat20031y ago

Yeah, I have no idea what the author is smoking. Deliberately breaking standards is simply not an acceptable solution to the problem, even if it were a serious problem (it's not).

Ekaros1y ago

If there truly is a problem with existing protocols, propose and properly design new one that can replace it. Then if it is technically superior solution it should win in long run.

https://news.ycombinator.com/newsguidelines.html

moomin1y ago

Counterpoint: Unix deciding on a non-standard line ending was always a mistake. It has produced decades of random incompatibility for no particular benefit. CRLF isn’t a convention: it’s two different pieces of the base terminal API. You have no idea how many programs rely on CR and LF working correctly.

fanf21y ago

It is a standard line ending. ANSI X3.4-1968 says:

10 LF (Line Feed). A format effector that advances the active position to the same character position on the next line. (Also applicable to display devices.) Where appropriate, this character may have the meaning “New Line” (NL), a format effector that advances the active position to the first character position on the next line. Use of the NL convention requires agreement between sender and recipient of data.

ASCII 1968 - https://www.rfc-editor.org/info/rfc20

ASCII 1977 - https://nvlpubs.nist.gov/nistpubs/Legacy/FIPS/fipspub1-2-197...

wongarsu1y ago

The first sentence is exactly what LF is in CRLF, and implies the necessity of CR. CR returns the cursor to the first character of the active line, LF moves it one line down without changing the horizontal position.

The second sentence is the UNIX interpretation of LF doing the equivalent of CRLF. But calling it a standard line ending when it's an alternative meaning defined in the standard as "requires agreement between sender and recipient of data" is a bit of a stretch. It's permissible by the standard, but it's not the default as per the standard

eqvinox1y ago

Counter-counterpoint: using 2 bytes to signal one relevant operation creates ambiguity out of thin air. If all you care about is "where does the line end?", having CRLF as a line ending creates edge cases for "there is only CR" and "there is only LF". Are those line endings or not? How do you deal with them? And what's LFCR?

Personally speaking, I've always written my parsers to be permissive and accept either CR¹, LF, or CRLF as line endings. And it always meant keeping a little extra boolean for "previous byte was CR" to ignore the LF to not turn CRLF into 2 line endings.

¹ CR-only was used on some ancient (m68k era?) Macintosh computers I believe.

P.S.: LFCR is 2 line endings in my parsers :D

sebazzz1y ago

Didn’t some version of an Apple operating system have CR as a line ending?

matheusmoreira1y ago

Yeah. It's weird how Unix picked LF given its love of terminals. CRLF is the semantically correct line ending considering terminal semantics. It's present in the terminal subsystem to this day, people just don't notice because they have OPOST output post processing enabled which automatically converts LF into CRLF.

eqvinox1y ago

I'd argue (but have no historical context) that it's a distinction between storage format and presentation interface, and IMHO that makes a lot of sense. A terminal has other operations too, backspaces and deletes being the most basic. Which coincidentally are one hell of a mess across different terminal types between ^H / 0x08 and DEL / 0x7f as well…

(And these distinctions predate UNIX — if I were confronted with an inconsistent mess I'd go for simplicity too, and a 2-byte newline is definitely not simple just by merit of being 2 bytes. I personally wouldn't have cared whether it was CR or LF, but would have cared to make it a single byte.)

bmitc1y ago

I have always felt that somehow Linux and proponents of it default to every decision it made being right and everything else, namely Windows, being wrong. I honestly feel Linux is orders of magnitude more complex. It is much easier, in my experience to make software just work on Windows. (This is not to say Windows doesn't have bad decisions. It has many. All the major OSs are terrible.)

globular-toast1y ago

But, like the article says, LF is not useful. I could always interpret LF as NL and if you send CRs too it won't break anything. If you know I'm interpreting LF that way you can just stop sending the CRs. That's what happened in Unix.

rgmerk1y ago

Of all the stupid and obsolete things in standards we use to interoperate, CRLF is one of the least consequential.

fweimer1y ago

SMTP <https://datatracker.ietf.org/doc/html/rfc2821#section-4.1.1....> is pretty clear that the message termination sequence is CR LF . CR LF, not LF . LF, and disagreements in this spot are known to cause problems (include undesirable message injection). But then enough alternative implementations that recognize LF . LF as well are out there, so maybe the original SMTP rules do not matter anymore.

mvdtnz1y ago

[flagged]

dang1y ago

"Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that.""

ripe1y ago

Ha, ha, ha! I love it. I believe the author is serious, and I think he's on to something.

OP clearly says that most things in fact don't break if you just don't comply with the CRLF requirement in the standard and send only LF. (He calls LF "newline". OK, fine, his reasoning seems legit.) He is not advocating changing the language of the standard.

To all those people complaining that this is a minor matter and the wrong hill to die on, I say this: most programmers today are blindly depending on third-party libraries that are full of these kinds of workarounds for ancient, weird vestigial crud, so they might think this is an inconsequential thing. But if you're from the school of pure, simple code like the SQLite/Fossil/TCL developers, then you're writing the whole stack from scratch, and these things become very, very important.

Let me ask you instead: why do you care if somebody doesn't comply with the standard? The author's suggestion doesn't affect you in any way, since you'll just be using some third-party library and won't even know that anything is different.

Oh bUT thE sTandArDs.

abhinavk1y ago

> (He calls LF "newline". OK, fine, his reasoning seems legit.) He is not advocating changing the language of the standard.

The Unicode standard does call it NL along with LF.

    000A  <control>
      = LINE FEED (LF)
      = new line (NL)
      = end of line (EOL)

Source: https://www.unicode.org/charts/PDF/U0000.pdf

tedunangst1y ago

No mention of what happened the last time we mixed and matched line endings? https://smtpsmuggling.com/

deltaknight1y ago

Doesn’t this show that ignoring CR and only processing LFs is a good idea? If I’m understanding right (probably wrong), this vuln relied on some servers using CRLF only as endings, and others supporting both CRLF and LF.

If every server updated to line-end of LF, thereby supporting both types, this vuln wouldn’t happen?

Of course if there’s is a mixed bag then I guess this is still possible, if your server only supports CRLF. At least in that scenario you have some control over the issue though.

hifromwork1y ago

Yes, if every server/middleware implemented parsing in the same way this kind of vulnerability wouldn't happen. Same goes for HTTP smuggling and other smuggling attacks.

Unfortunately, asking more people to ignore the currently estabilished standards makes the problem worse, not better.

dwattttt1y ago

As I mentioned else-thread: it doesn't matter as much which option is chosen, so long as everyone agrees. If everyone agrees that LF on its own is enough (and we stop sending CR's to make sure it's not part of whatever comes before LF), that's fine. But it's just as fine for everyone to agree that CRLF is right, and reject plain LF.

anonymousiam1y ago

This article seems like it was written to troll people into a flame war. There is no such character as NL, and the article does not at all address that fact that the "ENTER" key on every keyboard sends a CR and not a LF. Things work fine the way they are.

TacticalCoder1y ago

> There is no such character as NL ...

More specifically the Unicode control character U+000a is, in the Unicode standard, named both LF and NL (and that comes from ASCII but in ASCII I think 0x0a was only called LF).

It literally has both names in Unicode: but LINEFEED is written in uppercase while newline is written in lowercase (not kidding you). You can all see for yourself that U+000a has both names (and eol too):

> and the article does not at all address that fact that the "ENTER" key on every keyboard sends a CR and not a LF.

what a key on a keyboard sends doesn't matter though. What matters is what gets written to files / what is sent over the wire.

    ... $  cat > /tmp/anonymousiam<ENTER>
    <ENTER>
    <CTRL-C>

    ... $  hexdump /tmp/anonymousiam
    00000000  000a

When I hit ENTER at my Linux terminal above, it's LINEFEED that gets written to the file. Under Windows I take it the same still gets CRLF written to the file as in the Microsoft OSes of yore (?).

> Things work fine the way they are.

I agree

anonymousiam1y ago

Try the cat example again with your tty in raw mode instead of cooked mode.

(stty raw)

Note that your job control characters will no longer function, so you will need to kill the cat command from a different terminal, then type: stty sane (or stty cooked) to restore your terminal to "normal" operation.

You will then see the 0d hex carriage return characters in the /tmp/anonymousiam file, and no 0a hex linefeed characters present.

eviks1y ago

> There is no such character as NL,

There is, copying from a helpful comment above:

> The Unicode standard does call it NL along with LF.

    000A  <control>
      = LINE FEED (LF)
      = new line (NL)
      = end of line (EOL)

Source: https://www.unicode.org/charts/PDF/U0000.pdf

And things don't work fine, there are many issues with this historical baggage

anonymousiam1y ago

I agree that there are issues. Making radical changes to legacy behavior will cause more.

o11c1y ago

U+0085 is sometimes called NL (it is the standard in EBCDIC), but more often NEL in the ASCII world.

deltaknight1y ago

As an implementation detail, I assume many programs simply ignore the CR character already? Whilst of course many windows programs (and protocols as mentioned) still require CRLF, surely the most efficient way to make something cross-platform if to simply act on the LF part of CRLF, that way it works for both CRLF and LF line ends.

The fact that both CRLF and LF used the same control character in my eyes in a huge bonus for this type of action to actually work. Simply make everything cross platform and start ignoring CR completely. I’m surprised this isn’t mentioned explicitly as a course of action in the article, instead it focuses on making people change their understanding of LF in to NL which is as unnecessary complication that will cause inevitable bikeshedding around this idea.

phkahler1y ago

>> instead it focuses on making people change their understanding of LF in to NL which is as unnecessary complication that will cause inevitable bikeshedding around this idea.

Not really. In order to ignore CR you need to treat LF as NL.

deltaknight1y ago

Fair point, although I’d suggest that many programs already treat LF as NL (e.g. unix text files), so this understanding of the meaning of LF already exists in the world. If you’re writing anything generic/cross-platform, you have to be able to treat LF as NL. So there isn’t really a change to be made here.

zac23or1y ago

> Even if an established protocol (HTTP, SMTP, CSV, FTP) technically requires CRLF as a line ending, do not comply. Send only NL.

Insane. First i think it was a April 1st joke, but is not.

Let's break everything because YES.

Quekid51y ago

Indeed. Very strange to hear a break-the-world suggestion from a person leading a company famous for never breaking the world.

I'm kind of confused by this whole post.

I do understand the desire for simplification (let's ignore the argument of whether this is one), but...

pathartl1y ago

Let's all move to little-endian while we're at it. Don't accept anything else!

sunk1st1y ago

> Nobody ever wants to be in the middle of a line, then move down to the next line and continue writing in the next column from where you left off. No real-world program ever wants to do that.

Is this true?

anamax1y ago

No, it's not true.

It was used for "graphics" on character-only terminals.

numpad01y ago

isn't CR without LF how CLI progress bars work?

cowsandmilk1y ago

He says there are good usages of CR, he only argues for getting rid of LF.

https://en.wikipedia.org/wiki/IBM_Selectric

Ekaros1y ago

Nope.

ericyd1y ago

I'm not trying to be obtuse but I am actually confused how a modern machine correctly interprets CRLF based on the description in this post.

If a modern machine interprets LF as a newline, and the cursor is moved to the left of the current row before the newline is issued, wouldn't that add a newline _before_ the current line, i.e. a newline before the left most character of the current line? Obviously this isn't how it works but I don't understand why not.

chowells1y ago

Line feed is "move the cursor down one line". It's irrelevant what is currently on the line. These are printer/terminal control instructions, not text editing instructions.

ericyd1y ago

Ok, I conflated terminal instruction with text editing instruction. I thought the post made them sound like they behave the same but it sounds like I misunderstood, thank you.

BlueTemplar1y ago

If you are thinking of it being more like pressing "Home" then "Enter", it would seem that "Enter" actually works more like LFCR ?

ericyd1y ago

Yes this is a good description of my confusion

layer81y ago

Unless your editor is in auto-indent mode. ;)

shadowgovt1y ago

Define "abolish."

We could certainly try to write no new software that uses them.

But last I checked, there are terabytes and terabytes of stored data in various formats (to say nothing of living protocols already deployed) and they aren't gonna stop using CRLF any time soon.

eviks1y ago

Is defined in 4 points at the end

justin661y ago

Nice. I think that's the most energized I've seen Richard Hipp on a topic.

zulu-inuoe1y ago

Of all the hills to die on. What an unbelivably silly one. CRLF sucks, suck it up. As many others have noted, there are millions of devices this idea puts in jeopardy for absolutely no reason. We should be reducing the exceptions, not creating them

fortran771y ago

The article had some major gaffes. Teletypes never had a ball. The stationary platen models had type boxes and cylinders, but never balls.

refset1y ago

Not sure whether this changes anything about your critique, but note that the IBM 2741 terminal embedded a Selectric typewriter:

> Selectric-based mechanisms were also widely used as terminals for computers, replacing both Teletypes and older typebar-based output devices. One popular example was the IBM 2741 terminal

wrs1y ago

Well, it says right there, the 2741 replaced Teletypes. It wasn't a Teletype. (Not sure I'd call this a "major gaffe", though!)

srg01y ago

I would also like to point out that English spelling is obsolete and should be abolished (/s). The text of the CRLF abolition proposal itself contains more digraphs, trigraphs, diphthongs, and silent letters than line-ending sequences. The last letter of the word "obsolete" is not necessary. "Should" can be written as only three letters in Shavian "𐑖𐑫𐑛".

According to ChatGPT, the original proposal had:

Number of sentences: 60 Number of diphthongs: 128 (pairs of vowels in the same syllable like "ai", "ea", etc.) Number of digraphs: 225 (pairs of letters representing a single sound, like "th", "ch", etc.) Number of trigraphs: 1 (three-letter combinations representing a single sound, like "sch") Number of silent letters: 15 (common silent letter patterns like "kn", "mb", etc.)

For all intents and purposes, CRLF is just another digraph.

ksp-atlas1y ago

I'm a big fan of English spelling reform and know Shavian and sometimes write in it, but I feel shavian is limited due to how heavily it uses letter rotation. Dyslexics already have trouble with b, d, p and q, having most letters have a rotated form would be challenging

fracus1y ago

I read your article and am now fully indoctrinated to your noble cause. I propose an official chant. "Death to LF!"

WillAdams1y ago

FWIW, I actually find CRLF handy in a database export I work in --- it exports cells with multiple lines by using LF for the linebreaks --- I open it in a text editor, replace all LFs w/ \\ (so as to get a single line for each data record and to cause the linebreaks to happen in LaTeX), and it's ready for further processing.

bmitc1y ago

Does anyone besides poorly designed Unix tools and Git actually get confused by any of this? I configure my editor to just use LF on whatever OS to appease Linux and configure Git to never mess with them. And in dealing with serial protocols, it's never an issue.

Eduard1y ago

> Stop using "linefeed" as the name for the U+000a code point.

stop reinventing terms. it's literally standardized with the name "LF" / "line feed" in Unicode.

lifthrasiir1y ago

Just in case... Unicode doesn't define anything about C0 control characters. Everything you see from the code chart is from ISO/IEC 6429 and only shown there for information. Some parts of Unicode and related standards do assign a special meaning to U+000A, but often also to U+000D for the obvious reason.

wongogue1y ago

He is not reinventing anything. Unicode also defines 0a as LF, NL and EOL. In modern software, 0a is used as NL anyway.

steeeeeve1y ago

Once in a long while, people start looking into things and wanting them to make sense.

Like hey - why don't we start using the field separator and record separator characters when exporting/importing data.

But then you end up realizing that even when you are right, the energy it would take to push a change like that is astounding.

Those who successfully create an RFC and find a way push it through all the way to it becoming a standard are admirable people.

fijiaarone1y ago

Line feed is exactly what you do when you are editing text. But nobody uses it.

CR + LF was meant as an instruction for teletype printers, so it is outdated, and looks like he withdrew the proposal (which couldn’t have ever been serious) after some feedback.

Fossil SCM, btw, was written by the creator of SQLite, so his opinion shouldn’t be discounted as some random nobody.

lynx231y ago

Can OP please tell me how to abolsih CR while in Raw Mode? Did he forget about it, or am I just unimaginative?

samatman1y ago

Right, you don't need to search that hard for a device which interprets 0xA as a line feed, just set your terminal to raw mode, done.

But given the very first sentence:

> CR and NL are both useful control characters.

I'm willing to conclude that he doesn't intend A Blaste Against The Useless Appendage of Carriage Return Upon a New Line, or Line Feed As Some Style It, to apply to emulators of the old devices which make actual use of the distinction.

lynx231y ago

I know that we're technically emulating old devices... But that time is so long gone. I actually never worked on a hardware terminal in my entire career, which is already almost 30 years. I think it is about time to stop calling it emulation, because thats no longer what it is. Its simply the way how text mode applications do I/O. It has become so ubiquitous that ncurses is slowly going out of fashion, because you might as well just use the common ANSI escape sequences, because they're supported everywhere anyways. IOW, raw mode isn't just an emulation required to get a 50 year old peripheral device to work, its necessary for almost everything that sits between an CLI and a GUI.

https://github.com/jftuga/chars

jftuga1y ago

I wrote a command line program to determine/detect the end-of-line format, tabs, bom, and nul characters

Stand-alone binaries are provided for all major platforms.

NelsonMinar1y ago

sqlite is a work of absolute genius. But every once in awhile something comes along to remind us how weird its software background is. Fossil. The build system. The TCL test harness. And now this, a quixotic attempt to break 50+ years of text formatting and network protocols.

Yes CRLF is dumb. No, replacing it is not realistic.

pdonis1y ago

For extra fun, the original Mac OS used CR by itself to mean newline.

eska1y ago

I’ll join your revolution under the condition that I’m allowed to ignore all lines of text that don’t end in a newline :) (POSIX)

truetraveller1y ago

I agree 100%. This is the cause of endless confusion, especially in crossplatform text files. Not to mention parsing programmatically.

theginger1y ago

Ridiculous! We need to develop 1 universal standard that covers everyone's use cases. Yeah!

gfody1y ago

we should leave it for backwards compatibility and adopt U+0085 as the standard next line codepoint. and utf8 libraries could unofficially support every combination of 0A 0D as escape sequences.

WesolyKubeczek1y ago

> Even if an established protocol (HTTP, SMTP, CSV, FTP) technically requires CRLF as a line ending, do not comply. Send only NL.

Now just go pound sand. Seriously. And you owe me 5 minutes of my life wasted on reading the whole thing.

My god, I would have thought all those “simplification” ideas die off once you have 3 years of experience or more. Some people won’t learn.

P. S. Guess even the most brilliant people tend to have dumb ideas sometimes.

elcritch1y ago

Conversely, I'd argue most brilliant people tend to have more dumb ideas than others, usually on oddly specific topics which most people would find inconsequential.

the_gorilla1y ago

It's true. Smart people tend to have a lot of novel ideas, most of which are going to be retarded. Most people just have no ideas.

A4ET8a8uTh01y ago

I feel it necessary to have an obligatory 'Would someone think of banking?' before we 'abolish'(however we eventually arrive at defining it )anything.

I mean it is all cool to have this idea, but real world implications, where half the stuff dangles on a text file, appear to be not considered here.

For clarity's sake, I am not saying don't do it. I am saying: how will that work?

edit: spaces, tabs and one crlf

webprofusion1y ago

Next you'll be telling us to use spaces instead of tab.

midnitewarrior1y ago

If you'd like to break every system, and nearly every protocol, start abolishing arbitrary line endings that have been used for decades.

That will make things better.

nunobrito1y ago

XKCD has graphically replied to this topic: https://xkcd.com/927/

eviks1y ago

It hasn't like it never does. In this case your mistake is that number of standards doesn't change

nuancebydefault1y ago

It is a bit like saying, let's forget about AM radio since FM radio is much better. Oh wait forget FM since DAB exists. Oh wait forget about broadcast protocols since everything is point to point or multicast now.

The reality is that existing protocols CANNOT be changed. Only new versions are released and the old ones (which might rely on CRLF) will never die.

Animats1y ago

Now convince Microsoft. It's really the legacy of DOS that keeps this alive.

nycdotnet1y ago

Even Notepad.exe supports LF only text files now.

dankwizard1y ago

"Call to action" my god guy get a grip youre upset about some unicode

M95D1y ago

But adopting this new standard means we'll have to re-tool entire industries! /s

Ekaros1y ago

I think I can offer most reasonable compromise here. Decide upon on new UTF-8 code point. Have the use mandated and ignore and ban all end-points that do not use this code-point instead of CRLF or just LF alone.

bear86421y ago

> Decide upon on new UTF-8 code point.

Unicode have already done so - (NEL) https://www.compart.com/en/unicode/U+0085

phkahler1y ago

So break everything.

nycdotnet1y ago

Which would need to be encoded in at least two bytes at which point, why not just use CRLF?

kps1y ago

You mean U+2028 LINE SEPARATOR?

Ekaros1y ago

Perfect. So now we just need to start filing bug reports to any tool that does not support it instead of CRLF or LF alone.

bear86421y ago

Oh, yet another option - first thought was U+0085 NEXT LINE as above

whizzter1y ago

https://xkcd.com/927/

I could not possibly disagree with this more strongly or violently.

In short - shutup and deal with it. Is it an extremely mild and barely inconvenient nuisance to deal with different or mixed line endings? Yes. Is this actually a hard or difficult problem? No.

Stop trying to force everyone to break their backs so your life is inconsequentially easier. Deal with it and move on.

Avamander1y ago

Why do we _have to_ keep bringing this legacy baggage with us for the next decades though?

Allowing CRFL-less operation intentionally, especially in new implementations. Abusing protocol tolerance is (just a bit) to switch current ones. Should allow relatively gradual progress towards Less Legacy:tm: with basically no cost.

Not every change is "breaking your back" especially if you should be updating your systems anyways to implement other, larger and more important changes.

Because it’s literally fine and a non-issue. Only whiny Linux babies cry about it. It’s trivial for tools to support. Trivial. Like this is easiest, least harmful baggage in the history tech debt baggage.

There will always be tech debt. Always and forever. Burn cycles on one that matters.

j / k navigate · click thread line to collapse

264 comments

michaelmior1y ago

> various protocols (HTTP, SMTP, CSV) still "require" CRLF at the end of each line

Sure, it makes sense not to require CRLF with any new protocols, but it doesn't seem worth updating legacy things.

> Even if an established protocol (HTTP, SMTP, CSV, FTP) technically requires CRLF as a line ending, do not comply.

I'm hoping this is satire. Why intentionally introduce potential bugs for the sake of making a point?

FiloSottile1y ago

Exactly. Please DO NOT mess with protocols, especially legacy critical protocols based on in-band signaling.

(I am a massive, massive SQLite fan, but this is giving me pause about using other software by the same author, at least when networks are involved.)

pdw1y ago

The situation is different with SMTP, see https://www.postfix.org/smtp-smuggling.html

tptacek1y ago

This would be more persuasive if HTTP servers didn't already widely accept bare 0ah line termination. What's the first major public web site you can find that doesn't?

5 more replies

Spooky231y ago

rtpg1y ago

Took me a second to get what was going on here, but basically the idea is that you middleware might not see `C:D`, but then your application _does_ see `C:D`.

Not a bytes-alignment thing but a "header values disagreement" thing.

This is an issue if one part of your stack parses headers differently than another in general though, not limited to newlines.

refulgentis1y ago

I wouldn't be too worried and making personal judgements, he says the same thing you are (though I assume you disagree)

mackal1y ago

> massive SQLite fan, but this is giving me pause about using other software by the same author

Even if I wanted to contribute code to SQLite, I can't. I acknowledge the fact God doesn't exist, so he doesn't want my contributions :P

https://www.postfix.org/smtp-smuggling.html

deanishe1y ago

> but this is giving me pause about using other software by the same author

Go read the article again. I think you'll be pleasantly surprised.

amluto1y ago

> I'm hoping this is satire. Why intentionally introduce potential bugs for the sake of making a point?

It’s worse than satire. Postel’s Law is definitively wrong, at least in the context of network protocols, and delimiters, especially, MUST be precise. See, for example:

Send exactly what the spec requires, and parse exactly as the spec requires. Do not accept garbage. And LF, where CRLF is specified, is garbage.

tptacek1y ago

[1]: https://github.com/dotnet/aspnetcore/pull/43202

halter731y ago

> I'm hoping this is satire.

jfengel1y ago

LF only? Huh.

But LF always meant Line Feed, moving the paper but not the print head.

These are of course wildly out of date concepts. But it still strikes me as odd to see a Line Feed as a context reset.

inopinatus1y ago

Not just potential bugs, there'll be definite security failures.

In practice, retiring CRLF can't be safely performed unilaterally or by fiat, we'll need to devise a whole new handshake to affirm that both ends are on the same page re. newline semantics.

mechanicalpulse1y ago

> Why intentionally introduce potential bugs for the sake of making a point?

jcul1y ago

Not disagreeing with you, but implementation diverges from spec a lot anyway.

I've had to write decoders for things like HTTP, SMTP, SIP (VoIP), and there's so many edge cases and undocumented behavior from different implementations that you have to still support.

michaelmior1y ago

That's fair, but I don't see that as an argument for intentionally deviating from the spec.

chasil1y ago

FYI, Sendmail accepts LF without CR, but Exchange doesn't.

isThereClarity1y ago

sendmail 8.18.1 includes patches to correct this behaviour (and options to turn it back on) due to its role in SMTP smuggling, CVE-2023-51765. See https://ftp.sendmail.org/RELEASE_NOTES

  sendmail is now stricter in following the RFCs and rejects
  some invalid input with respect to line endings
  and pipelining:
  ...snip...
  - Accept only CRLF . CRLF as end of an SMTP message
    as required by the RFCs, which can disabled by the
    new srv_features option 'O'.
  - Do not accept a CR or LF except in the combination
    CRLF (as required by the RFCs).  These checks can
    be disabled by the new srv_features options
   'U' and 'G', respectively.  In this case it is
   suggested to use 'u2' and 'g2' instead so the server
   replaces offending bare CR or bare LF with a space.
   It is recommended to only turn these protections off
   for trusted networks due to the potential for abuse.

9dev1y ago

…how very in character for each of them!

javajosh1y ago

>What would be the benefit...

YZF1y ago

What's your estimate for the cost of changing legacy protocols that use CRLF vs. the work that will be done to support those?

michaelmior1y ago

LegionMammal9781y ago

perching_aix1y ago

> you ignore the benefits the OP describes

Funnily enough, the author doesn't actually describe any tangible benefits. It's all just (in my reading, semi-sarcastic) platonics:

- peace

- simplicity

- the flourishing of humanity

> You save a handful of bits at the expense of a lot of potential bugs.

... they just happened to be not super convinced.

Is this the kind of HackerNews comment I'm supposed to feel impressed by? That demonstrates this forum being so much better than others?

cassepipe1y ago

phkahler1y ago

>> I'm hoping this is satire. Why intentionally introduce potential bugs for the sake of making a point?

michaelmior1y ago

Ignoring established protocols doesn't make things simpler. It makes things vastly more complicated.

Let's absolutely fix new protocols (or new versions of existing protocols). But intentionally breaking existing protocols doesn't simplify anything.

nsnshsuejeb1y ago

Yes. We all know how to do this. You know that API version thingy. I agree to drop the carriage return when not needed but do it in future protocols.

Obviously IPv6 shows you need to be patient. Your great grandkids may see a useless carriage return!

Windows doesn't help here.

nedt1y ago

> What would be the benefit

tfehring1y ago

michaelmior1y ago

For CSV, "breaking" changes seem like less of a big deal to me. Partially because there is already so much variation in how CSV is implemented.

Ekaros1y ago

Thinking about it. Using CR alone in protocols actually make infinitely more sense. As that would allow use of LF in records. Which would make many use cases much simpler.

Just think about text protocols like HTTP, how much easier something like cookies would be to parse if you had CR as terminating character. And then each record separated by LF.

mattmerr1y ago

1: https://stackoverflow.com/questions/73086622/is-a-gitattribu...

gpvos1y ago

That is so backwards incompatible that it is never, ever going to fly.

SQLite1y ago

Author here:

Thanks to everyone who participated in my experiment. I'm sorry it didn't work out.

AndyKelley1y ago

Copying my comment from lobste.rs in case you didn't see it:

I hate that attitude.

I would have backed you 100%.

perching_aix1y ago

inopinatus1y ago

The problem with trying to legislate a specific case of Postel's Law is that everyone who might get on board because of it, is already on board because of it, and vice versa.

Ygg21y ago

It also goes a bit more than that, for optimal UTF8 search you want the ASCII separators. It's always easier to search for a single byte than two or more bytes of special pattern.

That said, I do agree we should abolish CRLF. And replace it with LF.

MatthiasPortzel1y ago

They acted on these words, updating their HTTP server to serve just \n.

=> https://sqlite.org/althttpd/info/8d917cb10df3ad28 Send bare \n instead of \r\n for all HTTP reply headers.

While browser aren't effected, this broke compatibility with at least Zig's HTTP client.

=> https://github.com/ziglang/zig/issues/21674 zig fetch does not work with sqlite.org

abhinavk1y ago

It has been reverted.

djha-skin1y ago

> Let's make CRLF one less thing that your grandchildren need to know about or worry about.

The struggle is real, the problem is real. Parents, teach your kids to use .gitattribute files[1]. While you're at it, teach them to hate byte order marks[2].

2: https://blog.djhaskin.com/blog/byte-order-marks-must-diemd/

Kwpolska1y ago

zulu-inuoe1y ago

3000% agree. I have been bitten endlessly by autocrlf. It is absolutely insane to me that anyone ever considered your having your SOURCE CONTROL tool get/set different content than what's in the repo

layer81y ago

The correct solution is to use .gitattributes.

djha-skin1y ago

Sure, don't use autocrlf. But some _windows_ tools need crlf, like for powershell or batch build scripts. Defaulting to lf in the editor will not save you.

Don't use `auto`, full marks, but the gitattributes file is indispensable as a safety net when explicit entries are used in it.

I mean, the whole point of the file is not everyone who is working on the project has their editors set to lf. Furthermore, not every tool is okay with line endings that are not CRLF.

When used properly (sure, ideally without auto), the git attributes file as a lifesaver.

[0] https://news.ycombinator.com/item?id=13860682

nsnshsuejeb1y ago

The letters after the dot in my filename don't map 1 to 1 with the file format.

perching_aix1y ago

Can we ask for the typical *nix text editors to disobey the POSIX standard of a text file next, so that I don't need to use hex editing to get trailing newlines off the end of files?

rkeene21y ago

People don't seem to mind when Chrome does it [0]. The response "standards aren't a death pact" stands out in particular.

akira25011y ago

Death pact? Jeez. Standards simply prevent people from having to waste time debugging dumb issues that rightfully could have been avoided.

perching_aix1y ago

bityard1y ago

Why would you want that?

All Unix text processing tools assume that every line in a text file ends in a newline. Otherwise, it's not a text file.

There's no such thing as a "trailing newline," there is only a line-terminating newline.

I've yet to hear a convincing argument why the last line should be an exception to that extremely long-standing and well understood convention.

perching_aix1y ago

> There's no such thing as a "trailing newline," there is only a line-terminating newline.

Is "line-terminating newline" a controlled / established term I'm unfamiliar with or am I right to hold deep contempt against you?

2019841y ago

What's wrong with trailing newlines?

perching_aix1y ago

Other than select software being pissy about it, not much. Just like how there's nothing wrong with CRLF, except for select software being pissy about that too.

norir1y ago

It makes writing parsers more complicated in certain cases because you can't tell without lookahead if a newline character should be treated as a newline or an eof.

bigstrat20031y ago

Yeah, I have no idea what the author is smoking. Deliberately breaking standards is simply not an acceptable solution to the problem, even if it were a serious problem (it's not).

Ekaros1y ago

If there truly is a problem with existing protocols, propose and properly design new one that can replace it. Then if it is technically superior solution it should win in long run.

https://news.ycombinator.com/newsguidelines.html

moomin1y ago

fanf21y ago

It is a standard line ending. ANSI X3.4-1968 says:

ASCII 1968 - https://www.rfc-editor.org/info/rfc20

ASCII 1977 - https://nvlpubs.nist.gov/nistpubs/Legacy/FIPS/fipspub1-2-197...

wongarsu1y ago

eqvinox1y ago

¹ CR-only was used on some ancient (m68k era?) Macintosh computers I believe.

P.S.: LFCR is 2 line endings in my parsers :D

sebazzz1y ago

Didn’t some version of an Apple operating system have CR as a line ending?

matheusmoreira1y ago

eqvinox1y ago

bmitc1y ago

globular-toast1y ago

rgmerk1y ago

Of all the stupid and obsolete things in standards we use to interoperate, CRLF is one of the least consequential.

fweimer1y ago

mvdtnz1y ago

[flagged]

dang1y ago

"Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that.""

ripe1y ago

Ha, ha, ha! I love it. I believe the author is serious, and I think he's on to something.

Oh bUT thE sTandArDs.

abhinavk1y ago

> (He calls LF "newline". OK, fine, his reasoning seems legit.) He is not advocating changing the language of the standard.

The Unicode standard does call it NL along with LF.

    000A  <control>
      = LINE FEED (LF)
      = new line (NL)
      = end of line (EOL)

Source: https://www.unicode.org/charts/PDF/U0000.pdf

tedunangst1y ago

No mention of what happened the last time we mixed and matched line endings? https://smtpsmuggling.com/

deltaknight1y ago

If every server updated to line-end of LF, thereby supporting both types, this vuln wouldn’t happen?

Of course if there’s is a mixed bag then I guess this is still possible, if your server only supports CRLF. At least in that scenario you have some control over the issue though.

hifromwork1y ago

Yes, if every server/middleware implemented parsing in the same way this kind of vulnerability wouldn't happen. Same goes for HTTP smuggling and other smuggling attacks.

Unfortunately, asking more people to ignore the currently estabilished standards makes the problem worse, not better.

dwattttt1y ago

anonymousiam1y ago

TacticalCoder1y ago

> There is no such character as NL ...

More specifically the Unicode control character U+000a is, in the Unicode standard, named both LF and NL (and that comes from ASCII but in ASCII I think 0x0a was only called LF).

> and the article does not at all address that fact that the "ENTER" key on every keyboard sends a CR and not a LF.

what a key on a keyboard sends doesn't matter though. What matters is what gets written to files / what is sent over the wire.

    ... $  cat > /tmp/anonymousiam<ENTER>
    <ENTER>
    <CTRL-C>

    ... $  hexdump /tmp/anonymousiam
    00000000  000a

When I hit ENTER at my Linux terminal above, it's LINEFEED that gets written to the file. Under Windows I take it the same still gets CRLF written to the file as in the Microsoft OSes of yore (?).

> Things work fine the way they are.

I agree

anonymousiam1y ago

Try the cat example again with your tty in raw mode instead of cooked mode.

(stty raw)

You will then see the 0d hex carriage return characters in the /tmp/anonymousiam file, and no 0a hex linefeed characters present.

eviks1y ago

> There is no such character as NL,

There is, copying from a helpful comment above:

> The Unicode standard does call it NL along with LF.

    000A  <control>
      = LINE FEED (LF)
      = new line (NL)
      = end of line (EOL)

Source: https://www.unicode.org/charts/PDF/U0000.pdf

And things don't work fine, there are many issues with this historical baggage

anonymousiam1y ago

I agree that there are issues. Making radical changes to legacy behavior will cause more.

o11c1y ago

U+0085 is sometimes called NL (it is the standard in EBCDIC), but more often NEL in the ASCII world.

deltaknight1y ago

phkahler1y ago

>> instead it focuses on making people change their understanding of LF in to NL which is as unnecessary complication that will cause inevitable bikeshedding around this idea.

Not really. In order to ignore CR you need to treat LF as NL.

deltaknight1y ago

zac23or1y ago

> Even if an established protocol (HTTP, SMTP, CSV, FTP) technically requires CRLF as a line ending, do not comply. Send only NL.

Insane. First i think it was a April 1st joke, but is not.

Let's break everything because YES.

Quekid51y ago

Indeed. Very strange to hear a break-the-world suggestion from a person leading a company famous for never breaking the world.

I'm kind of confused by this whole post.

I do understand the desire for simplification (let's ignore the argument of whether this is one), but...

pathartl1y ago

Let's all move to little-endian while we're at it. Don't accept anything else!

sunk1st1y ago

> Nobody ever wants to be in the middle of a line, then move down to the next line and continue writing in the next column from where you left off. No real-world program ever wants to do that.

Is this true?

anamax1y ago

No, it's not true.

It was used for "graphics" on character-only terminals.

numpad01y ago

isn't CR without LF how CLI progress bars work?

cowsandmilk1y ago

He says there are good usages of CR, he only argues for getting rid of LF.

https://en.wikipedia.org/wiki/IBM_Selectric

Ekaros1y ago

Nope.

ericyd1y ago

I'm not trying to be obtuse but I am actually confused how a modern machine correctly interprets CRLF based on the description in this post.

chowells1y ago

Line feed is "move the cursor down one line". It's irrelevant what is currently on the line. These are printer/terminal control instructions, not text editing instructions.

ericyd1y ago

Ok, I conflated terminal instruction with text editing instruction. I thought the post made them sound like they behave the same but it sounds like I misunderstood, thank you.

BlueTemplar1y ago

If you are thinking of it being more like pressing "Home" then "Enter", it would seem that "Enter" actually works more like LFCR ?

ericyd1y ago

Yes this is a good description of my confusion

layer81y ago

Unless your editor is in auto-indent mode. ;)

shadowgovt1y ago

Define "abolish."

We could certainly try to write no new software that uses them.

But last I checked, there are terabytes and terabytes of stored data in various formats (to say nothing of living protocols already deployed) and they aren't gonna stop using CRLF any time soon.

eviks1y ago

Is defined in 4 points at the end

justin661y ago

Nice. I think that's the most energized I've seen Richard Hipp on a topic.

zulu-inuoe1y ago

fortran771y ago

The article had some major gaffes. Teletypes never had a ball. The stationary platen models had type boxes and cylinders, but never balls.

refset1y ago

Not sure whether this changes anything about your critique, but note that the IBM 2741 terminal embedded a Selectric typewriter:

> Selectric-based mechanisms were also widely used as terminals for computers, replacing both Teletypes and older typebar-based output devices. One popular example was the IBM 2741 terminal

wrs1y ago

Well, it says right there, the 2741 replaced Teletypes. It wasn't a Teletype. (Not sure I'd call this a "major gaffe", though!)

srg01y ago

According to ChatGPT, the original proposal had:

For all intents and purposes, CRLF is just another digraph.

ksp-atlas1y ago

fracus1y ago

I read your article and am now fully indoctrinated to your noble cause. I propose an official chant. "Death to LF!"

WillAdams1y ago

bmitc1y ago

Eduard1y ago

> Stop using "linefeed" as the name for the U+000a code point.

stop reinventing terms. it's literally standardized with the name "LF" / "line feed" in Unicode.

lifthrasiir1y ago

wongogue1y ago

He is not reinventing anything. Unicode also defines 0a as LF, NL and EOL. In modern software, 0a is used as NL anyway.

steeeeeve1y ago

Once in a long while, people start looking into things and wanting them to make sense.

Like hey - why don't we start using the field separator and record separator characters when exporting/importing data.

But then you end up realizing that even when you are right, the energy it would take to push a change like that is astounding.

Those who successfully create an RFC and find a way push it through all the way to it becoming a standard are admirable people.

fijiaarone1y ago

Line feed is exactly what you do when you are editing text. But nobody uses it.

CR + LF was meant as an instruction for teletype printers, so it is outdated, and looks like he withdrew the proposal (which couldn’t have ever been serious) after some feedback.

Fossil SCM, btw, was written by the creator of SQLite, so his opinion shouldn’t be discounted as some random nobody.

lynx231y ago

Can OP please tell me how to abolsih CR while in Raw Mode? Did he forget about it, or am I just unimaginative?

samatman1y ago

Right, you don't need to search that hard for a device which interprets 0xA as a line feed, just set your terminal to raw mode, done.

But given the very first sentence:

> CR and NL are both useful control characters.

lynx231y ago

https://github.com/jftuga/chars

jftuga1y ago

I wrote a command line program to determine/detect the end-of-line format, tabs, bom, and nul characters

Stand-alone binaries are provided for all major platforms.

NelsonMinar1y ago

Yes CRLF is dumb. No, replacing it is not realistic.

pdonis1y ago

For extra fun, the original Mac OS used CR by itself to mean newline.

eska1y ago

I’ll join your revolution under the condition that I’m allowed to ignore all lines of text that don’t end in a newline :) (POSIX)

truetraveller1y ago

I agree 100%. This is the cause of endless confusion, especially in crossplatform text files. Not to mention parsing programmatically.

theginger1y ago

Ridiculous! We need to develop 1 universal standard that covers everyone's use cases. Yeah!

gfody1y ago

we should leave it for backwards compatibility and adopt U+0085 as the standard next line codepoint. and utf8 libraries could unofficially support every combination of 0A 0D as escape sequences.

WesolyKubeczek1y ago

> Even if an established protocol (HTTP, SMTP, CSV, FTP) technically requires CRLF as a line ending, do not comply. Send only NL.

Now just go pound sand. Seriously. And you owe me 5 minutes of my life wasted on reading the whole thing.

My god, I would have thought all those “simplification” ideas die off once you have 3 years of experience or more. Some people won’t learn.

P. S. Guess even the most brilliant people tend to have dumb ideas sometimes.

elcritch1y ago

Conversely, I'd argue most brilliant people tend to have more dumb ideas than others, usually on oddly specific topics which most people would find inconsequential.

the_gorilla1y ago

It's true. Smart people tend to have a lot of novel ideas, most of which are going to be retarded. Most people just have no ideas.

A4ET8a8uTh01y ago

I feel it necessary to have an obligatory 'Would someone think of banking?' before we 'abolish'(however we eventually arrive at defining it )anything.

I mean it is all cool to have this idea, but real world implications, where half the stuff dangles on a text file, appear to be not considered here.

For clarity's sake, I am not saying don't do it. I am saying: how will that work?

edit: spaces, tabs and one crlf

webprofusion1y ago

Next you'll be telling us to use spaces instead of tab.

midnitewarrior1y ago

If you'd like to break every system, and nearly every protocol, start abolishing arbitrary line endings that have been used for decades.

That will make things better.

nunobrito1y ago

XKCD has graphically replied to this topic: https://xkcd.com/927/

eviks1y ago

It hasn't like it never does. In this case your mistake is that number of standards doesn't change

nuancebydefault1y ago

The reality is that existing protocols CANNOT be changed. Only new versions are released and the old ones (which might rely on CRLF) will never die.

Animats1y ago

Now convince Microsoft. It's really the legacy of DOS that keeps this alive.

nycdotnet1y ago

Even Notepad.exe supports LF only text files now.

dankwizard1y ago

"Call to action" my god guy get a grip youre upset about some unicode

M95D1y ago

But adopting this new standard means we'll have to re-tool entire industries! /s

Ekaros1y ago

bear86421y ago

> Decide upon on new UTF-8 code point.

Unicode have already done so - (NEL) https://www.compart.com/en/unicode/U+0085

phkahler1y ago

So break everything.

nycdotnet1y ago

Which would need to be encoded in at least two bytes at which point, why not just use CRLF?

kps1y ago

You mean U+2028 LINE SEPARATOR?

Ekaros1y ago

Perfect. So now we just need to start filing bug reports to any tool that does not support it instead of CRLF or LF alone.

bear86421y ago

Oh, yet another option - first thought was U+0085 NEXT LINE as above

whizzter1y ago

https://xkcd.com/927/

I could not possibly disagree with this more strongly or violently.

In short - shutup and deal with it. Is it an extremely mild and barely inconvenient nuisance to deal with different or mixed line endings? Yes. Is this actually a hard or difficult problem? No.

Stop trying to force everyone to break their backs so your life is inconsequentially easier. Deal with it and move on.

Avamander1y ago

Why do we _have to_ keep bringing this legacy baggage with us for the next decades though?

Not every change is "breaking your back" especially if you should be updating your systems anyways to implement other, larger and more important changes.

There will always be tech debt. Always and forever. Burn cycles on one that matters.