The HTTP crash course nobody asked for

(fasterthanli.me)

902 points · g0xA52A2A · 3y ago · 141 comments

141 comments

Joker_vD3y ago

> HTTP/1.1 is a delightfully simple protocol, if you ignore most of it.

As someone who had to write a couple of proxy servers, I can't express how sadly accurate this is.

chrismorgan3y ago

And this is why I expect HTTP/2 and HTTP/3 to be much more robust in the long term: the implementations are harder to write, and you won’t get anywhere without reading at least some of the spec, whereas HTTP/1 is deceptively simple with therefore a lot of badly incorrect implementations, often with corresponding security problems.

superkuh3y ago

HTTP/3 is written for the use case of large corporations and does not even allow for human persons to use it alone. It requires CA based TLS to set up a connection. So if you want to host a website visitable by a random person you've never communicated with before you have to get continued permission from an incorporated entity running a CA to do so.

This is far more of a security problem than all of the bad HTTP 1.1 implementations put together. It is built in corporate control that cannot be bypassed except by not using HTTP/3. It is extremely important that we not let the mega-corp browsers drop HTTP 1.1 and continue to write our own projects for it.

chrismorgan3y ago

Your complaint is strictly social, and quite irrelevant here.

Look, cleartext internet protocols are on the way out, because their model is fundamentally broken. For security reasons, I will note, and privacy. There, we joust security against security. Cleartext HTTP/1 is strictly a legacy matter, retained only because there’s still too much content stuck on it. But browsers will be more aggressively phasing it out sooner or later, first with the likes of scary address bar “insecure” badges, and probably within a decade by disabling http: by default in a way similar to Firefox’s HTTPS-Only Mode (puts up a network error page with the ability to temporarily enable HTTP for the site), though I doubt it’ll be removed for decades. And HTTP/1 at least over TLS will remain the baseline for decades to come—HTTP/2 could conceivably be dropped at some point, but HTTP/3 is very unlikely to ever become the baseline because it requires more setup effort.

You can still use cleartext HTTP/1 at least for now if you want, but this functionality was rightly more or less removed in HTTP/2, and fully removed in HTTP/3. Pervasive monitoring is an attack (https://www.rfc-editor.org/rfc/rfc7258.html), and HTTP/2 and HTTP/3 are appropriately designed to mitigate it.

Look, be real: the entire web is now built heavily on the CA model. If free issuance of certificates falters, the internet as we know it is in serious trouble. Deal with it. Social factors. This might conceivably happen, and if it does, HTTP/1 will not save you. In fact, cleartext HTTP/1 will be just about the first thing to die (be blocked) in the most likely relevant sequence of events.

7 more replies

insanitybit3y ago

Can you not just have it use a self signed certificate? I don't see why a CA would need to be involved at all, nor can I even imagine how that could be enforced at the protocol level.

This sounds like a red herring to me.

edit: Yeah I've more or less confirmed that self signed certs are perfectly fine in HTTP3. This is a big ball of nothing.

1 more reply

Natsu3y ago

I mean, if it's on the open web you can use Let's Encrypt. If it's on your private network, you can make whatever keys you want with XCA and trust your self-made CA in browsers.

tambre3y ago

Is there anything in the spec that actually requires that? AFAIK it's just that major implementers (browsers) have chosen to enforce TLS.

2 more replies

bmitc3y ago

> whereas HTTP/1 is deceptively simple with therefore a lot of badly incorrect implementations

Doesn't that imply that HTTP/1 is deceptively complex?

ameliaquining3y ago

I think the idea is that HTTP/1 is simple in the hello-world 5th-percentile-complexity case, which deceives people into thinking that it's also simple in the real-world 99.9th-percentile-complexity case, which it's not at all.

1 more reply

chestervonwinch3y ago

I get what you're saying, but robustness through complexity feels like an odd argument nonetheless.

chrismorgan3y ago

Its counterintuitivity is why I like bringing it up. :-)

arjvik3y ago

As someone who has not read the HTTP/1.1 spec, what are some pitfalls that could actually become security issues?

chrismorgan3y ago

The most common proximate cause of security issues in format handling (parsing and emitting) comes from implementations differing in their parsing, or implementations emitting invalid values in a way that will be parsed differently. Probably the most common type of security issue then comes from smuggling values through, bypassing checks or triggering injection. (This is the essence of injection attacks as a broad class.) One of the easiest demonstrations of this in HTTP specifically is called HTTP request smuggling: https://portswigger.net/web-security/request-smuggling. And the solution for that is pretty much: “stop using a text protocol, they’re too hard to use correctly”.
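
The "implementations differing in their parsing" problem can be sketched with a single request that carries both `Content-Length` and `Transfer-Encoding: chunked`. This is a minimal illustration, not any real proxy's logic: the two `leftover_*` functions, the `split_head` helper, and the `example.test` host are all hypothetical, and each parser is deliberately naive.

```python
# One request, two framing headers. Everything here is a hypothetical
# illustration of a parsing disagreement, not code from a real server.
RAW = (
    b"POST / HTTP/1.1\r\n"
    b"Host: example.test\r\n"
    b"Content-Length: 6\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"0\r\n\r\nG"
)


def split_head(raw):
    """Split raw bytes into a lower-cased header dict and whatever follows."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    headers = {}
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    return headers, rest


def leftover_trusting_content_length(raw):
    """Bytes left on the connection if the body is Content-Length bytes long."""
    headers, rest = split_head(raw)
    return rest[int(headers[b"content-length"]):]


def leftover_trusting_chunked(raw):
    """Bytes left on the connection if the body is chunked.

    Only the terminating zero-size chunk is handled, which is enough here.
    """
    _, rest = split_head(raw)
    if not rest.startswith(b"0\r\n\r\n"):
        raise ValueError("this sketch only understands an empty chunked body")
    return rest[5:]


print(leftover_trusting_content_length(RAW))  # nothing left over
print(leftover_trusting_chunked(RAW))         # a stray byte starts the "next request"
```

If a front end uses the first rule and a back end uses the second, the trailing `G` becomes the start of a request the front end never saw. The HTTP/1.1 specification says `Transfer-Encoding` overrides `Content-Length` when both are present and flags such messages as a possible smuggling attempt, so rejecting them outright is a reasonable choice.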

jefftk3y ago

One of the simplest issues is that headers end with a newline. Most code will not generate a header with an embedded newline, so it's common that software doesn't handle this case, and passes the newline through unmodified. This means that if someone is able to set a custom value for part of a header they can often use that to inject their own custom response header. Or even their own custom response body, since that is also set off with newlines.
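
A minimal sketch of that injection, assuming a server that copies a user-supplied value into a response header. The function names `naive_header` and `checked_header` and the `attacker_value` string are made up for illustration.

```python
def naive_header(name, value):
    """Format a header line with no validation at all."""
    return f"{name}: {value}\r\n"


def checked_header(name, value):
    """Format a header line, refusing values that could end the line early."""
    if "\r" in value or "\n" in value:
        raise ValueError("header value contains CR or LF")
    return f"{name}: {value}\r\n"


# Hypothetical attacker-controlled input: the embedded CRLF ends the
# intended header and starts a second one.
attacker_value = "en\r\nSet-Cookie: session=attacker"

print(naive_header("Content-Language", attacker_value))
```

The naive version emits two header lines where the caller expected one. Rejecting CR and LF is the simplest guard; stripping or encoding them is another option, depending on what the value is for.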

pwdisswordfish93y ago

Being text-based. Which leads to people constructing protocol messages by printf and therefore tons of injection bugs.

dbttdft3y ago

I don't think I could implement a correct HTTP 1 agent even if I read the specs.

gumby3y ago

But for back compatibility implementors will still have to support HTTP/1, which will likely take more than 50% of the total effort.

mgaunard3y ago

HTTP/2 makes no sense at all. HTTP/3 is just a fix to HTTP/2 so that it makes some sort of sense.

Both of these are only concerned with reducing the latency of doing lots of requests to the same server in parallel.

Which is only needed by web browsers and nothing else.

SamuelAdams3y ago

I feel like this applies to many technologies. Made me think of the bootstrapping, “I-can-build-that-in-a-weekend” crowd.

The initial problem is usually easy to solve for, it’s all the edge cases and other details that make something complex.

cookiengineer3y ago

> As someone who had to write a couple of proxy servers, I can't express how sadly accurate this is.

Chunked transfer/content encoding problems still give me nightmares...
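
For readers who have not met chunked transfer encoding, here is a rough decoder sketch. It assumes the whole body is already in memory, ignores trailers, and uses a made-up function name (`decode_chunked`); real implementations have to stream and be much stricter.

```python
def decode_chunked(data: bytes) -> bytes:
    """Decode a complete chunked body held in memory (trailers are ignored)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size is hexadecimal; anything after ';' is a chunk extension.
        # Note: int() is more lenient than the spec grammar (it accepts a
        # sign or a 0x prefix), so a strict parser would validate first.
        size = int(data[pos:line_end].split(b";", 1)[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            break  # last chunk
        chunk = data[pos:pos + size]
        if len(chunk) != size or data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        body += chunk
        pos += size + 2
    return bytes(body)


print(decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))
```

Even this small loop has several places where two implementations can disagree (size parsing, extensions, missing CRLF after a chunk), which is part of why the encoding has a reputation for causing trouble.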

Donckele3y ago

“By contrast, I think about Bluetooth a lot. I wish I didn't.”

LOL, yes same here. Can’t wait for Bluetooth’s b̶a̶l̶l̶s̶ baggage to be chopped.

danuker3y ago

How is WiFi so much more reliable than Bluetooth?

I installed a web server on my phone and send files this way much faster (and Android -> Apple works):

https://f-droid.org/en/packages/net.basov.lws.fdroid/

I wish there were a standard for streaming (headphones could connect to your network via WPS, and stream some canonical URL with no configuration needed).

masklinn3y ago

> How is WiFi so much more reliable than Bluetooth?

WiFi uses near 10x the power Bluetooth does when active (and that’s before factoring in BLE which cuts that down in half). WiFi also has access to the much less crowded 5GHz band.

IIRC WiFi is also a much simpler protocol, it’s just a data channel (its aim being to replace LAN cables).

Plus in order to support cheap and specialised devices Bluetooth supports all sorts of profiles and applications. This makes the devices simpler, and means all the configuration can be automated to pairing, but it makes the generic hosts a lot more complicated.

Reventlov3y ago

>IIRC WiFi is also a much simpler protocol, it’s just a data channel (its aim being to replace LAN cables).

I'm not sure what you mean, but Wi-Fi covers the PHY layer and the MAC layers. It's not « only » a data channel. Modern Wi-Fi uses OFDMA, which is arguably more complex than what Bluetooth uses (without even talking about the MAC).

1 more reply

jandrese3y ago

Bluetooth is massively more complicated than WiFi. It has a whole service enumeration/discovery layer baked in that IMHO tried to cram way too much into the spec. What's even more amazing is that some of the hardware vendors at the table during the development went "F that" and added some side channel audio stuff that bypasses most of the stack.

But mostly the problem is that too much of this complexity fell on hardware vendors and they suck at writing software. There are umpteen bajillion different bluetooth stacks out there and they're all buggy in new and exciting ways. Interoperability testing is hugely neglected by most vendors. The times where Bluetooth works well are typically where the same vendor controls both ends of the link, like Airpods on an iPhone.

In 2020 I tried buying some reputable brand Bluetooth headphones for my kids so they could do home-schooling without disturbing each other. It was a total failure. Every time their computer went to sleep the bluetooth stack would become out of sync and attempts to reconnect would result in just "error connecting" messages, requiring you to fully delete the bluetooth device on the Windows side and redo the entire discovery/association/connection from scratch. The bluetooth stack on Windows would crash halfway through the association process about half of the time forcing you to reboot the computer to start over. Absolutely unusable. I tried the same headphones on a Linux host and they worked slightly better, but were still prone to getting out of sync and requiring a full "forget this device" and add it again cycle every few days for no apparent reason.

Reventlov3y ago

> Bluetooth is massively more complicated than WiFi. It has a whole service enumeration/discovery layer baked in that IMHO tried to cram way too much into the spec. What's even more amazing is that some of the hardware vendors at the table during the development went "F that" and added some side channel audio stuff that bypasses most of the stack.

I seriously think you underestimate the complexity in Wi-Fi networks. The 802.11 2020 standard is 4379 pages long. And I'm not even counting the amendments ( https://www.ieee802.org/11/Reports/802.11_Timelines.htm ) that are in development.

rmckayfleming3y ago

Yep, had a similar annoyance using my AirPods with my gaming laptop. The laptop wouldn't reconnect after going to sleep for an extended period of time. I ended up replacing the stock wireless card with an Intel AX210 based one and then it was fine.

4111111111111113y ago

WiFi supposedly needs more power and has higher latency. Not sure how true that remains post WiFi6 though

pletnes3y ago

Range and bandwidth are orders of magnitude larger, and both have direct limitations in terms of energy budget.

leinadho3y ago

The humorous style is very refreshing, if only my networking lecturers had been more witty I might remember more of this

X-Istence3y ago

> This is not the same as HTTP pipelining, which I will not discuss, out of spite.

That's because HTTP pipelining was and is a mistake, and is responsible for a ton of HTTP request smuggling vulnerabilities because the HTTP/1.1 protocol has no framing.

No browser supports it anymore, thankfully.

mgaunard3y ago

Isn't "HTTP pipelining" just normal usage of HTTP/1.1?

Anyone that doesn't support this is broken. My own code definitely does not wait for responses before sending more requests, that's just basic usage of TCP.

X-Istence3y ago

HTTP Pipelining has the client sending multiple requests before receiving a response. It turns it into Request, Request, Request, Response, Response, Response.

The problem is that if Request number 1 leads to an error whereby the connection is closed, those latter two requests are discarded entirely. The client would have to retry request number two and three. If the server has already done work in parallel though, it can't send those last two responses because there is no way to specify that the response is for the second or third request.

The only way a server has to signal that it is in a bad state is to return 400 Bad Request and to close the connection because it can't keep parsing the original requests.

There is no support for HTTP pipelining in current browsers.

What you are thinking about is probably HTTP keep alive, where the same TCP/IP channel is used to send a follow-up request once a response to the original request has been received and processed. That is NOT HTTP pipelining.

deathanatos3y ago

> Isn't "HTTP pipelining" just normal usage of HTTP/1.1?

> Anyone that doesn't support this is broken. My own code definitely does not wait for responses before sending more requests, that's just basic usage of TCP.

Yep.

There is some "support" a server could do, in the form of processing multiple requests in parallel¹, e.g., if it gets two GET requests back to back, it could queue up the second GET's data in memory, or so. The responses still have to be streamed out in the order they came in, of course. Given how complex I imagine such an implementation would be, I'd expect that to be implemented almost never, though; if you're just doing a simple "read request from socket, process request, write response" loop, then like you say, pipelined requests aren't a problem: they're just buffered on the socket or in the read portion's buffers.

¹this seems fraught with peril. I doubt you'd want to parallelize anything that wasn't GET/HEAD for risk of side-effects happening in unexpected orders.
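
The "pipelined requests are just buffered" point can be sketched as a loop that peels complete requests off one receive buffer in order. This assumes bodies are framed only by `Content-Length` (no chunked encoding) and uses a made-up helper name, `split_requests`.

```python
def split_requests(buffer: bytes):
    """Yield (head, body) for every complete request in buffer, in order.

    Assumes bodies are delimited by Content-Length only.
    """
    pos = 0
    while True:
        head_end = buffer.find(b"\r\n\r\n", pos)
        if head_end == -1:
            return  # no complete header block left
        head = buffer[pos:head_end]
        length = 0
        for line in head.split(b"\r\n")[1:]:  # skip the request line
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        body_start = head_end + 4
        if len(buffer) < body_start + length:
            return  # body not fully received yet
        yield head, buffer[body_start:body_start + length]
        pos = body_start + length


# Three requests sent back to back without waiting for any response.
pipelined = (
    b"GET /a HTTP/1.1\r\nHost: x\r\n\r\n"
    b"POST /b HTTP/1.1\r\nHost: x\r\nContent-Length: 2\r\n\r\nhi"
    b"GET /c HTTP/1.1\r\nHost: x\r\n\r\n"
)
for head, body in split_requests(pipelined):
    print(head.split(b"\r\n")[0], body)
```

A server built around this loop answers the requests one at a time in arrival order, which is all pipelining requires of it. Where each request ends depends entirely on parsing the previous one correctly.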

X-Istence3y ago

HTTP pipelining is not normal usage of HTTP/1.1. And it means that if request number 1 fails, requests number 2 and 3 are usually lost, because servers will slam the door shut: given the lack of framing around HTTP, it is too dangerous to keep parsing the incoming requests and risk parsing the incoming text stream wrong.

This is what led to the many request smuggling vulnerabilities: the front-end proxy treats the request differently from the backend proxy and parses the same HTTP text stream differently.

Since there is no framing there is no one valid way to say "this is where a request starts, and this is where a request ends and it is safe to continue parsing past the end of this request for the next request".

Servers are also allowed to close the connection at will. So let's say I pipeline Request 1, 2, and 3.

The server can respond to Request 1 with Connection: close, and now request 2 and 3 are lost.

That's the reason HTTP pipelining is not supported by browsers/most clients.

Curl removed it and there's a blog post about it: https://daniel.haxx.se/blog/2019/04/06/curl-says-bye-bye-to-...

1 more reply

dbttdft3y ago

Not even just TCP, basic usage of message passing and any data flow.

yfiapo3y ago

> We're not done with our request payload yet! We sent:

> Host: neverssl.com

> This is actually a requirement for HTTP/1.1, and was one of its big selling points compared to, uh...

> AhAH! Drew yourself into a corner didn't you.

> ...Gopher? I guess?

I feel like the author must know this.. HTTP/1.0 supported but didn't require the Host header and thus HTTP/1.1 allowed consistent name-based virtual hosting on web servers.

I did appreciate the simple natures of the early protocols, although it is hard to argue against the many improvements in newer protocols. It was so easy to use nc to test SMTP and HTTP in particular.

I did enjoy the article's notes on the protocols however the huge sections of code snippets lost my attention midway.

proto_lambda3y ago

> I feel like the author must know this

The author does know this, it's a reference to a couple paragraphs above:

> [...] and the HTTP protocol version, which is a fixed string which is always set to HTTP/1.1 and nothing else.

> (cool bear) But what ab-

> IT'S SET TO HTTP/1.1 AND NOTHING ELSE.

yfiapo3y ago

Thanks, missed that.

fasterthanlime3y ago

You know how some movie fans will sometimes pretend the sequels to some franchise don't exist? HTTP is the opposite.

1 more reply

I_complete_me3y ago

That was an excellent, well-written, well-thought out, well presented, interesting, humorous, enjoyable read. Coincidentally I recently did a Rust crash course so it all made perfect sense - I am not an IT pro. Anyhows, thanks.

pohuing3y ago

I highly recommend taking a look at the other writeups on fasterthanli.me they're almost all excellent

mihneawalker3y ago

I'd like to ask you what crash course on Rust did you take, as there are quite a few out there, and it would help if someone recommends a certain course.

atfzl3y ago

Try https://fasterthanli.me/articles/a-half-hour-to-learn-rust which is also written by the same author.

I_complete_me3y ago

You Tube Let's Get Rusty - ULTIMATE Rust Lang Tutorial! - Getting Started

phenylene3y ago

Playlist link:

https://youtube.com/playlist?list=PLai5B987bZ9CoVR-QEIN9foz4...

becquerel3y ago

After the string of positive adjectives, I was expecting the second half of your comment to take a sharp turn into cynicism. Thank you for subverting my expectations by not subverting my expectations!

q-base3y ago

I will piggyback on your comment as I totally agree. I am amazed at the amount of work that must go into not just writing the article itself but all the implementations along the way. Really amazing job!

Andys3y ago

I learned HTTP1 pretty well but not much of 2.

Since playing with QUIC, I've lost all interest in learning HTTP/2, it feels like something already outdated that we're collectively going to skip over soon.

fasterthanlime3y ago

I tend to agree with you there, however the thing I'm replacing does HTTP/2, and HTTP/3 is yet another can of worms as far as "production multitenant deployment" goes, so, that's what my life is right now.

As far as learning goes, I do think HTTP/2 is interesting as a step towards understanding HTTP/3 better, because a lot of the concepts are refined: HPACK evolves into QPACK, flow control still exists but is neatly separated into QUIC, I've only taken a cursory look at H3 so far but it seems like a logical progression that I'm excited to dig into deeper, after I've gotten a lot more sleep.

masklinn3y ago

FWIW HTTP/3 very much builds upon / reframes HTTP/2’s semantics, so it might be useful to get a handle on /2, as I’m not sure all the /3 documentation will frame it in /1.1 terms.

1 more reply

pcthrowaway3y ago

HTTP1 is definitely outdated (it was expeditiously replaced by HTTP 1.1), but I'd argue ignoring HTTP/2 might be more like ignoring IPv4 because we have IPv6 now

Joker_vD3y ago

It's pretty much a transport-level protocol, just like QUIC.

Icathian3y ago

Amos' writing style is just so incredibly good. I don't know anyone else doing these very long-form, conversational style articles.

Plus, you know, just an awesome dev who knows his stuff. Huge fan.

mcspiff3y ago

https://xeiaso.net/ is equally great content in a similar style in my opinion. Different area of topics a bit, but I enjoy both very much

Icathian3y ago

Oh, this looks very promising. Thanks for the recommendation!

juped3y ago

If you're using OpenBSD nc already, just use nc -c for TLS.

stevewatson3013y ago

Depending on your version of nc, -c is for sending CRLFs or executing sent data as commands. You might be looking for ncat instead.

Denvercoder93y ago

In OpenBSD nc (as GP mentioned), -c is for a TLS connection: https://man.openbsd.org/nc.1

Aissen3y ago

Reminder, there are many different netcats, here are some of the most common ones:

- netcat-traditional http://www.stearns.org/nc/

- netcat-openbsd : https://github.com/openbsd/src/blob/master/usr.bin/nc/netcat... (also packaged in Debian)

- ncat https://nmap.org/ncat/

- netcat GNU: https://netcat.sourceforge.net/ (quite rare)

To prevent any confusion, I like to recommend socat: http://www.dest-unreach.org/socat/

silon423y ago

My nc has that as -C, no -c option.

photochemsyn3y ago

What a great overall site. Hopping down the links I found the section on files with code examples in JS, Rust and C, plus strace, really the best short explanation I've ever found online.

https://fasterthanli.me/series/reading-files-the-hard-way/pa...

rpigab3y ago

This is awesome, didn't read all of it yet, but I will for sure, I use HTTP way too much and too often to ignore some of these underlying concepts, and when I try to look it up, there's always way too much abstraction and the claims aren't proven to me with a simple example, and this article is full of simple examples. Thanks Amos!

est3y ago

I hope there's a h2 or TLS crash course.

fasterthanlime3y ago

Against my better judgement, the article /does/ go over H2 (although H3 is all the rage right now).

For TLS, I recommend The Illustrated TLS 1.3 Connection (Every byte explained and reproduced): https://tls13.xargs.org/

tehmillhouse3y ago

I'd like to thank you for the time and effort it must take to research, write and edit these articles. The tone you strike with these articles is a delight to read, and I find myself gobbling these things up even for topics about which I (falsely, it usually turns out) consider myself fairly knowledgeable.

keewee73y ago

Thanks for the link! Are there other good crash courses on various protocols and standards? Directly jumping into the dry official specs is just too overwhelming sometimes.

Icathian3y ago

I recently crammed a bunch on DNS for an interview, and I can recommend the cloudflare blogs on that topic as being quite good.

antonvs3y ago

> Where every line ends with \r\n, also known as CRLF, for Carriage Return + Line Feed, that's right, HTTP is based on teletypes, which are just remote typewriters

Does it need to be pointed out that this is complete bullshit?

a13692099933y ago

Well, I've definitely seen a lot of people claim (generally not word-for-word) that using a pointlessly-overlong encoding of newline that exists to cater to the design deficiencies of hardware from the nineteen-sixties is not bullshit, so... maybe? But only for rather mushy values of "need".

kortex3y ago

It's not totally right, but it's not totally wrong, either, kind of like the way the dimensions of the space shuttle booster are directly affected by the size of a pair of Roman war horses' asses.

CRLF was used very heavily and thus got baked into a lot of different places. Namely, it conveniently sidesteps the ambiguity of "some systems use CR, others use LF" by just putting both in, and since they are whitespace, there's not much downside other than the extra byte.

Beyond that, there are many other clear and obvious connections between Hypertext Transfer Protocol and teletype machines. Many early web browsers were expected to be teletype machines [0]. So while it might be a bit of a stretch, I'd say this is far from "complete bullshit".

[0] - http://info.cern.ch/hypertext/WWW/Proposal.html#:~:text=it%2...

antonvs3y ago

> kind of like the way the dimensions of the space shuttle booster are directly affected by the size of a pair of Roman war horses' asses.

I agree the two are similar, but the space shuttle story is also bullshit. See e.g. Snopes: https://www.snopes.com/fact-check/railroad-gauge-chariots/

People are suckers for plausible-sounding and amusing stories, that one's classic bait for people's lack of critical thinking skills.

> CRLF was used very heavily and thus got baked into a lot of different places.

Well, exactly. Which is precisely why it's bullshit to claim that HTTP was "based on teletypes". It was based on technical standards at the time, that originally derived from teletypes, but there was no consideration of teletypes in the development of HTTP that I'm aware of:

> Many early web browsers were expected to be teletype machines [0].

Could you quote a relevant part of your reference? Because I don't see it. Perhaps you're confusing "dumb terminal" with "teletype"? Or confusing the Unix concept of tty, a teletype abstraction, with the electromechanical device known as a teletype - the "remote typewriters" mentioned in the original comment?

By the time that WWW spec was written in 1990, teletypes were decades out of date and not commonly used at all. PCs had existed for over a decade, and video display terminals for mainframes and minicomputers had been around for nearly three decades. No-one was using actual teletypes any more.

> So while it might be a bit of a stretch, I'd say this is far from "complete bullshit".

This conclusion would work if any of your claims had survived scrutiny.

tripa3y ago

Kind of.

Which part of it do you think is wrong?

antonvs3y ago

HTTP is not “based on teletypes”. That’s just nerd hyperbole for a technical choice they don’t like, for irrational reasons.

sireat3y ago

Is HTTP always the same protocol as HTTPS - given the same version - and ignoring the encryption from TLS?

Theoretically yes, but in practice?

I've done my share of nc testing even simpler protocols than HTTP/1.1

For some reason the migration to HTTPS scared me despite the security assurances. I could not see anything useful in wireshark anymore. I now had to trust one more layer of abstraction.

st_goliath3y ago

> Is HTTP always the same protocol as HTTPS - given the same version - and ignoring the encryption from TLS?

> Theoretically yes, but in practice?

Yes, that's the whole point of encapsulation. The protocol is blissfully unaware of encryption and doesn't even have to be. It has no STARTTLS mechanism either.

Your HTTPS traffic consists of a TCP handshake that establishes a TCP connection, a TLS handshake across that TCP connection to exchange keys and establish a TLS session, and the exact same HTTP request/response traffic, inside the encrypted/authenticated TLS session.

The wonderful magic of solving a problem by layering/encapsulating.

> I could not see anything useful in wireshark anymore

Wireshark supports importing private keys for that, see: https://wiki.wireshark.org/TLS
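
The layering described above can be sketched with Python's standard `socket` and `ssl` modules: the request bytes are built once, and the only difference between HTTP and HTTPS is whether the socket is wrapped before writing. The helper names `build_request` and `fetch` are made up for this sketch, and `fetch` needs network access to run.

```python
import socket
import ssl


def build_request(host: str, path: str = "/") -> bytes:
    """Build the same HTTP/1.1 request bytes regardless of transport."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch(host: str, use_tls: bool) -> bytes:
    """Send the request over plain TCP or over TLS and return the raw reply."""
    port = 443 if use_tls else 80
    sock = socket.create_connection((host, port), timeout=10)
    if use_tls:
        # The TLS handshake happens here; server_hostname is what gets
        # sent as SNI and what the certificate is checked against.
        context = ssl.create_default_context()
        sock = context.wrap_socket(sock, server_hostname=host)
    with sock:
        sock.sendall(build_request(host))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break  # server closed the connection
            chunks.append(data)
    return b"".join(chunks)
```

For example, `fetch("neverssl.com", use_tls=False)` and `fetch("example.com", use_tls=True)` would write identical request bytes apart from the host name; everything TLS-specific is confined to the `wrap_socket` call.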

fasterthanlime3y ago

The article covers using Wireshark to decrypt TLS traffic using Pre-Shared Master Secrets!

ok1234563y ago

The encapsulation isn't complete because of SNI.

dochtman3y ago

For 1.1 and 2, the byte stream is the same for TCP vs TLS over TCP. For 3, it uses one stream per request over a QUIC connection which is always encrypted.

Too3y ago

The protocol is the same, but semantics in the applications can differ. Secure cookies only working on https to give one example.

mannyv3y ago

As far as I can tell the Host header is pointless, because if it's SSL/TLS you won't be able to read it and route it. That's what SNI is for. If you aren't using TLS then you don't need it, unless you hit the server as an IP. But then why would you do that?

LukeShu3y ago

It's for one server/IP serving multiple hostnames. For instance, the same physical server at 45.76.26.79 serves both www.lukeshu.com and git.lukeshu.com with the same instance of Nginx. Once Nginx decrypts the request, it needs to know which `server { … }` block to use to generate the reply.

With TLS+SNI, this is redundant to the name from SNI. But we had TLS long before we had SNI, and we had HTTP long before we had TLS, and both of those scenarios need the `Host` header.
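
A small sketch of name-based virtual hosting, assuming the request has already been decrypted (if it was encrypted at all). The host names, the `SITES` table, and `pick_site` are hypothetical stand-ins for what a server does when choosing a `server { … }` block.

```python
# Hypothetical mapping from host name to the site that should answer.
SITES = {
    "www.example.test": "main site",
    "git.example.test": "git frontend",
}


def pick_site(raw_request: bytes, default: str = "www.example.test") -> str:
    """Choose a site from the Host header, falling back to a default.

    IPv6 literals in Host are not handled in this sketch.
    """
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop an optional :port and compare case-insensitively.
            hostname = value.strip().partition(":")[0].lower()
            return SITES.get(hostname, SITES[default])
    return SITES[default]  # e.g. an HTTP/1.0 request without Host


print(pick_site(b"GET / HTTP/1.1\r\nHost: git.example.test\r\n\r\n"))
```

Both names resolve to the same IP address, so without the header (or SNI) the server has nothing to distinguish them by and can only serve the default.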

Too3y ago

Proxies doing TLS termination, with multiple servers behind.

mahdi7d13y ago

I didn't ask but I needed it.

mannyv3y ago

Also, never trust the content length. It's been that way since before http was finalized. Use it as guidance, but don't treat it as canonical.
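
One way to read "guidance, not canonical" is to treat the declared length as a claim to be checked: cap it, read at most that many bytes, and compare with what actually arrived. The sketch below takes that interpretation; `read_body`, `MAX_BODY`, and the file-like `stream` argument are assumptions made for illustration.

```python
import io

MAX_BODY = 1_000_000  # arbitrary cap chosen for this sketch


def read_body(stream, declared_length: int, limit: int = MAX_BODY) -> bytes:
    """Read a body from a file-like stream, verifying the declared length."""
    if declared_length < 0 or declared_length > limit:
        raise ValueError("declared length out of range")
    body = bytearray()
    while len(body) < declared_length:
        data = stream.read(declared_length - len(body))
        if not data:
            break  # peer stopped sending early
        body += data
    if len(body) != declared_length:
        raise ValueError(f"expected {declared_length} bytes, got {len(body)}")
    return bytes(body)


print(read_body(io.BytesIO(b"hello"), 5))
```

The cap keeps a huge declared length from driving a huge allocation or an endless read, and the final comparison catches a peer that sends less than it promised.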

mannyv3y ago

When doing http by hand, it's better to do http/1.0 because that tells the server you (and it) can't do anything exciting.

mustak_im3y ago

Yay! this is going to be a great read for the weekend!

danesparza3y ago

More articles should be written in the style of this article. Thank you for this.

stefs3y ago

most of his articles are written in this style. they're great!

tinglymintyfrsh3y ago

    GET / HTTP/1.0\r\n\r\n

Still works with many websites.

mlindner3y ago

Is there a way to get this guide without the annoying side-commentary?

fasterthanlime3y ago

The RFCs themselves are pretty dry, if that's your thing — https://httpwg.org/ has the freshest ones.

tomcam3y ago

Funny and very helpful. Thank you.

cph1233y ago

For a crash course would the code examples have been better in something like Python rather than Rust?

fasterthanlime3y ago

My whole thing is that I'm teaching Rust /while/ solving interesting, real-world problems (instead of looking at artificial code samples), so, if someone wants to write the equivalent article with Python, they should! I won't.

rk063y ago

Nope, that’s the author’s favourite language. A regular reader would expect rust to be used like in previous articles

tmountain3y ago

This is gold.

j / k navigate · click thread line to collapse

141 comments

Joker_vD3y ago

> HTTP/1.1 is a delightfully simple protocol, if you ignore most of it.

As someone who had to write a couple of proxy servers, I can't express how so sadly accurate it is.

chrismorgan3y ago

superkuh3y ago

chrismorgan3y ago

Your complaint is strictly social, and quite irrelevant here.

7 more replies

insanitybit3y ago

Can you not just have it use a self signed certificate? I don't see why a CA would need to be involved at all, nor can I even imagine how that could be enforced at the protocol level.

This sounds like a red herring to me.

edit: Yeah I've more or less confirmed that self signed certs are perfectly fine in HTTP3. This is a big ball of nothing.

1 more reply

Natsu3y ago

I mean, if it's on the open web you can use Let's Encrypt. If it's on your private network, you can make whatever keys you want with XCA and trust your self-made CA in browsers.

tambre3y ago

Is there anything the spec that actually requires that? AFAIK it's just that major implementators (browsers) have chosen to enforce TLS.

2 more replies

bmitc3y ago

> whereas HTTP/1 is deceptively simple with therefore a lot of badly incorrect implementations

Doesn't that imply that HTTP/1 is deceptively complex?

ameliaquining3y ago

1 more reply

chestervonwinch3y ago

I get what you're saying, but robustness through complexity feels like an odd argument nonetheless.

chrismorgan3y ago

Its counterintuitivity is why I like bringing it up. :-)

arjvik3y ago

As someone who has not read the HTTP/1.1 spec, what are some pitfalls that could actually become security issues?

chrismorgan3y ago

jefftk3y ago

pwdisswordfish93y ago

Being text-based. Which leads to people constructing protocol messages by printf and therefore tons of injection bugs.

dbttdft3y ago

I don't think I could implement a correct HTTP 1 agent even if I read the specs.

gumby3y ago

But for back compatibility implementors will still have to support HTTP/1, which will likely take more than 50% of the total effort.

mgaunard3y ago

HTTP/2 makes no sense at all. HTTP/3 is just a fix to HTTP/2 so that it makes some sort of sense.

Both of these are only concerned with reducing the latency of doing lots of requests to the same server in parallel.

Which is only needed by web browsers and nothing else.

SamuelAdams3y ago

I feel like this applies to many technologies. Made me think of the bootstrapping, “I-can-build-that-in-a-weekend” crowd.

The initial problem is usually easy to solve for, it’s all the edge cases and other details that makes something complex.

cookiengineer3y ago

> As someone who had to write a couple of proxy servers, I can't express how so sadly accurate it is.

Chunked transfer/content encoding problems still give me nightmares...

Donckele3y ago

“By contrast, I think about Bluetooth a lot. I wish I didn't.”

LOL, yes, same here. Can’t wait for Bluetooth’s b̶a̶l̶l̶s̶ baggage to be chopped.

danuker3y ago

How is WiFi so much more reliable than Bluetooth?

I installed a web server on my phone and send files this way much faster (and Android -> Apple works):

https://f-droid.org/en/packages/net.basov.lws.fdroid/

I wish there were a standard for streaming (headphones could connect to your network via WPS, and stream some canonical URL with no configuration needed).

masklinn3y ago

> How is WiFi so much more reliable than Bluetooth?

WiFi uses near 10x the power Bluetooth does when active (and that’s before factoring in BLE which cuts that down in half). WiFi also has access to the much less crowded 5GHz band.

IIRC WiFi is also a much simpler protocol, it’s just a data channel (its aim being to replace LAN cables).

Reventlov3y ago

>IIRC WiFi is also a much simpler protocol, it’s just a data channel (its aim being to replace LAN cables).

4111111111111113y ago

WiFi supposedly needs more power and has higher latency. Not sure how true that remains post WiFi6 though

pletnes3y ago

Range and bandwidth are orders of magnitude larger, and both have direct limitations in terms of energy budget.

leinadho3y ago

The humorous style is very refreshing. If only my networking lecturers had been more witty, I might remember more of this.

X-Istence3y ago

> This is not the same as HTTP pipelining, which I will not discuss, out of spite.

That's because HTTP pipelining was and is a mistake: it is responsible for a ton of HTTP request smuggling vulnerabilities, since HTTP/1.1 has no framing layer and message boundaries have to be inferred from the headers.

No browser supports it anymore, thankfully.

mgaunard3y ago

Isn't "HTTP pipelining" just normal usage of HTTP/1.1?

Anyone that doesn't support this is broken. My own code definitely does not wait for responses before sending more requests, that's just basic usage of TCP.

X-Istence3y ago

HTTP Pipelining has the client sending multiple requests before receiving a response. It turns it into Request, Request, Request, Response, Response, Response.

The only way a server has to signal that it is in a bad state is to return 400 Bad Request and to close the connection because it can't keep parsing the original requests.

There is no support for HTTP pipelining in current browsers.
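A rough Python sketch of what "Request, Request, Request, Response, Response, Response" looks like on the wire. Everything here is illustrative: `pipeline` and `split_responses` are made-up helper names, the responses are canned bytes rather than real traffic, and the splitter assumes every response carries a `Content-Length` (no chunked bodies).

```python
def pipeline(paths, host):
    """Concatenate several GET requests into one byte stream, sent back to back."""
    return b"".join(
        f"GET {p} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii") for p in paths
    )


def split_responses(stream: bytes) -> list:
    """Split a byte stream into (status, body) pairs using Content-Length only."""
    out = []
    while stream:
        head, sep, rest = stream.partition(b"\r\n\r\n")
        if not sep:  # incomplete header block: stop
            break
        lines = head.split(b"\r\n")
        status = int(lines[0].split()[1])
        length = 0
        for line in lines[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        out.append((status, rest[:length]))
        stream = rest[length:]
    return out


requests = pipeline(["/a", "/b", "/c"], "example.test")

# Happy path: three requests, three responses, matched purely by order.
ok_stream = (
    b"HTTP/1.1 200 OK\r\nContent-Length: 1\r\n\r\nA"
    b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nBB"
    b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n"
)

# Unhappy path: the server gives up after the first request and closes.
bad_stream = b"HTTP/1.1 400 Bad Request\r\nConnection: close\r\nContent-Length: 0\r\n\r\n"
```

In the unhappy case the client sent three requests and gets one response back; nothing in the stream says what happened to the other two.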

deathanatos3y ago

> Isn't "HTTP pipelining" just normal usage of HTTP/1.1?

> Anyone that doesn't support this is broken. My own code definitely does not wait for responses before sending more requests, that's just basic usage of TCP.

Yep.

¹this seems fraught with peril. I doubt you'd want to parallelize anything that wasn't GET/HEAD for risk of side-effects happening in unexpected orders.

X-Istence3y ago

This is what led to the many request smuggling vulnerabilities: the front-end proxy treats the request differently from the backend proxy, so the two parse the same HTTP text stream differently.
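As a toy illustration of that disagreement (a sketch, not how any particular proxy works): the two functions below are hypothetical parsers, one trusting `Content-Length` and one trusting `Transfer-Encoding: chunked`, fed the same ambiguous bytes. The chunked parser assumes there are no trailer fields.

```python
AMBIGUOUS = (
    b"POST / HTTP/1.1\r\n"
    b"Host: example.test\r\n"
    b"Content-Length: 6\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"0\r\n"
    b"\r\n"
    b"G"
)


def split_head(stream: bytes):
    """Return (headers, rest) with lower-cased header names."""
    head, _, rest = stream.partition(b"\r\n\r\n")
    headers = {}
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    return headers, rest


def frame_by_content_length(stream: bytes):
    """Parser A: the body is exactly Content-Length bytes."""
    headers, rest = split_head(stream)
    n = int(headers[b"content-length"])
    return rest[:n], rest[n:]


def frame_by_chunked(stream: bytes):
    """Parser B: the body is a series of chunks ending with a zero-size chunk."""
    _, rest = split_head(stream)
    body = b""
    while True:
        size_line, _, rest = rest.partition(b"\r\n")
        size = int(size_line.split(b";")[0], 16)
        if size == 0:
            # Assumes an empty trailer section: skip the final blank line.
            _, _, rest = rest.partition(b"\r\n")
            return body, rest
        body += rest[:size]
        rest = rest[size + 2:]  # skip chunk data and its trailing CRLF


cl_body, cl_leftover = frame_by_content_length(AMBIGUOUS)
te_body, te_leftover = frame_by_chunked(AMBIGUOUS)
```

Parser A sees a 6-byte body and nothing left over; parser B sees an empty body and a stray `G` that it will treat as the start of the next request. The HTTP/1.1 spec says Transfer-Encoding overrides Content-Length when both are present and that such a message ought to be treated as an error, but the bugs come from implementations that don't agree on that.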

Servers are also allowed to close the connection at will. So let's say I pipeline Request 1, 2, and 3.

The server can respond to Request 1 with Connection: close, and now request 2 and 3 are lost.

That's the reason HTTP pipelining is not supported by browsers/most clients.

Curl removed it and there's a blog post about it: https://daniel.haxx.se/blog/2019/04/06/curl-says-bye-bye-to-...

dbttdft3y ago

Not even just TCP, basic usage of message passing and any data flow.

yfiapo3y ago

> We're not done with our request payload yet! We sent:

> Host: neverssl.com

> This is actually a requirement for HTTP/1.1, and was one of its big selling points compared to, uh...

> AhAH! Drew yourself into a corner didn't you.

> ...Gopher? I guess?

I feel like the author must know this: HTTP/1.0 supported but didn't require the Host header, so HTTP/1.1 making it mandatory is what allowed consistent name-based virtual hosting on web servers.

I did enjoy the article's notes on the protocols; however, the huge sections of code snippets lost my attention midway.
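For anyone who hasn't seen name-based virtual hosting spelled out, a small Python sketch of the idea. The site table, `DEFAULT_SITE` and `route` are all hypothetical; the 404 for unknown hosts is one possible policy, not something the spec dictates. The 400 for an HTTP/1.1 request without Host is what the spec requires.

```python
# Hypothetical sites served from a single IP address and port.
SITES = {"a.example": b"welcome to A", "b.example": b"welcome to B"}
DEFAULT_SITE = "a.example"


def route(request: bytes):
    """Return (status, body) for a raw request, choosing the site by Host."""
    head = request.split(b"\r\n\r\n", 1)[0]
    lines = head.split(b"\r\n")
    version = lines[0].rsplit(b" ", 1)[-1]

    host = None
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"host":
            # Simplification: the last Host header wins. A real server
            # should reject a request with more than one.
            host = value.strip().decode("ascii").lower()

    if host is None:
        if version == b"HTTP/1.1":
            return 400, b"Host header required"
        host = DEFAULT_SITE  # HTTP/1.0 client: all we can do is guess

    hostname = host.split(":", 1)[0]  # drop an optional port (ignores IPv6 literals)
    if hostname not in SITES:
        return 404, b"unknown site"
    return 200, SITES[hostname]
```

Without the header, the HTTP/1.0 branch is the best a server can do: every name pointing at that address gets the same default site.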

proto_lambda3y ago

> I feel like the author must know this

The author does know this, it's a reference to a couple paragraphs above:

> [...] and the HTTP protocol version, which is a fixed string which is always set to HTTP/1.1 and nothing else.

> (cool bear) But what ab-

> IT'S SET TO HTTP/1.1 AND NOTHING ELSE.

yfiapo3y ago

Thanks, missed that.

fasterthanlime3y ago

You know how some movie fans will sometimes pretend the sequels to some franchise don't exist? HTTP is the opposite.

pohuing3y ago

I highly recommend taking a look at the other writeups on fasterthanli.me; they're almost all excellent.

mihneawalker3y ago

I'd like to ask which crash course on Rust you took. There are quite a few out there, and a specific recommendation would help.

atfzl3y ago

Try https://fasterthanli.me/articles/a-half-hour-to-learn-rust which is also written by the same author.

I_complete_me3y ago

YouTube: Let's Get Rusty, "ULTIMATE Rust Lang Tutorial! - Getting Started"

phenylene3y ago

Playlist link:

https://youtube.com/playlist?list=PLai5B987bZ9CoVR-QEIN9foz4...

Andys3y ago

I learned HTTP1 pretty well but not much of 2.

Since playing with QUIC, I've lost all interest in learning HTTP/2, it feels like something already outdated that we're collectively going to skip over soon.

masklinn3y ago

FWIW HTTP/3 very much builds upon / reframes HTTP/2’s semantics, so it might be useful to get a handle on /2, as I’m not sure all the /3 documentation will frame it in /1.1 terms.

pcthrowaway3y ago

HTTP1 is definitely outdated (it was expeditiously replaced by HTTP 1.1), but I'd argue ignoring HTTP/2 might be more like ignoring IPv4 because we have IPv6 now

Joker_vD3y ago

It's pretty much a transport-level protocol, just like QUIC.

Icathian3y ago

Amos' writing style is just so incredibly good. I don't know anyone else doing these very long-form, conversational style articles.

Plus, you know, just an awesome dev who knows his stuff. Huge fan.

mcspiff3y ago

https://xeiaso.net/ is equally great content in a similar style in my opinion. Different area of topics a bit, but I enjoy both very much

Icathian3y ago

Oh, this looks very promising. Thanks for the recommendation!

juped3y ago

If you're using OpenBSD nc already, just use nc -c for TLS.

stevewatson3013y ago

Depending on your version of nc, -c is for sending CRLFs or executing sent data as commands. You might be looking for ncat instead.

Denvercoder93y ago

In OpenBSD nc (as GP mentioned), -c is for a TLS connection: https://man.openbsd.org/nc.1

Aissen3y ago

Reminder: there are many different netcats. Here are some of the most common:

- netcat-traditional: http://www.stearns.org/nc/

- netcat-openbsd: https://github.com/openbsd/src/blob/master/usr.bin/nc/netcat... (also packaged in Debian)

- ncat: https://nmap.org/ncat/

- GNU netcat: https://netcat.sourceforge.net/ (quite rare)

To prevent any confusion, I like to recommend socat: http://www.dest-unreach.org/socat/

silon423y ago

My nc has that as -C, no -c option.

photochemsyn3y ago

What a great overall site. Hopping down the links I found the section on files with code examples in JS, Rust and C, plus strace, really the best short explanation I've ever found online.

https://fasterthanli.me/series/reading-files-the-hard-way/pa...

est3y ago

I hope there's a h2 or TLS crash course.

fasterthanlime3y ago

Against my better judgement, the article /does/ go over H2 (although H3 is all the rage right now).

For TLS, I recommend The Illustrated TLS 1.3 Connection (Every byte explained and reproduced): https://tls13.xargs.org/

keewee73y ago

Thanks for the link! Are there other good crash courses on various protocols and standards? Jumping directly into the dry official specs is just too overwhelming sometimes.

Icathian3y ago

I recently crammed a bunch on DNS for an interview, and I can recommend the cloudflare blogs on that topic as being quite good.

antonvs3y ago

> Where every line ends with \r\n, also known as CRLF, for Carriage Return + Line Feed, that's right, HTTP is based on teletypes, which are just remote typewriters

Does it need to be pointed out that this is complete bullshit?

kortex3y ago

[0] - http://info.cern.ch/hypertext/WWW/Proposal.html#:~:text=it%2...

antonvs3y ago

> kind of like the way the dimensions of the space shuttle booster are directly affected by the size of a pair of Roman war horses' asses.

I agree the two are similar, but the space shuttle story is also bullshit. See e.g. Snopes: https://www.snopes.com/fact-check/railroad-gauge-chariots/

People are suckers for plausible-sounding and amusing stories, that one's classic bait for people's lack of critical thinking skills.

> CRLF was used very heavily and thus got baked into a lot of different places.

> Many early web browsers were expected to be teletype machines [0].

> So while it might be a bit of a stretch, I'd say this is far from "complete bullshit".

This conclusion would work if any of your claims had survived scrutiny.

tripa3y ago

Kind of.

Which part of it do you think is wrong?

antonvs3y ago

HTTP is not “based on teletypes”. That’s just nerd hyperbole for a technical choice they don’t like, for irrational reasons.

sireat3y ago

Is HTTP always the same protocol as HTTPS - given the same version - and ignoring the encryption from TLS?

Theoretically yes, but in practice?

I've done my share of nc testing even simpler protocols than HTTP/1.1

For some reason the migration to HTTPS scared me despite the security assurances. I could not see anything useful in wireshark anymore. I now had to trust one more layer of abstraction.

st_goliath3y ago

> Is HTTP always the same protocol as HTTPS - given the same version - and ignoring the encryption from TLS?

> Theoretically yes, but in practice?

Yes, that's the whole point of encapsulation. The protocol is blissfully unaware of encryption and doesn't even have to be. It has no STARTTLS mechanism either.

The wonderful magic of solving a problem by layering/encapsulating.

> I could not see anything useful in wireshark anymore

Wireshark supports importing private keys for that, see: https://wiki.wireshark.org/TLS

fasterthanlime3y ago

The article covers using Wireshark to decrypt TLS traffic with a (pre-)master secret key log file!

ok1234563y ago

The encapsulation isn't complete because of SNI.

dochtman3y ago

For 1.1 and 2, the byte stream is the same for TCP vs TLS over TCP. For 3, it uses one stream per request over a QUIC connection which is always encrypted.

Too3y ago

The protocol is the same, but semantics in the applications can differ. Secure cookies only work over HTTPS, to give one example.

LukeShu3y ago

With TLS+SNI, this is redundant to the name from SNI. But we had TLS long before we had SNI, and we had HTTP long before we had TLS, and both of those scenarios need the `Host` header.

Too3y ago

Proxies doing TLS termination, with multiple servers behind.

mahdi7d13y ago

I didn't ask but I needed it.

mannyv3y ago

Also, never trust the content length. It's been that way since before http was finalized. Use it as guidance, but don't treat it as canonical.
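One way to read "use it as guidance" in code, as a hedged Python sketch: `read_body` and the 1 MiB `MAX_BODY` cap are assumptions for illustration, not anything from the article. The declared length is treated as untrusted text, capped, and then checked against what actually arrives.

```python
import io

MAX_BODY = 1 << 20  # arbitrary 1 MiB cap, chosen for this example


def read_body(stream, declared: str, limit: int = MAX_BODY) -> bytes:
    """Read a body whose length was *claimed* by a Content-Length header."""
    # Only plain ASCII digits: rejects "-1", "+5", "1e3", "" and friends.
    if not (declared.isascii() and declared.isdigit()):
        raise ValueError("malformed Content-Length")
    remaining = int(declared)
    if remaining > limit:
        raise ValueError("declared body too large")

    chunks = []
    while remaining:
        chunk = stream.read(min(remaining, 65536))
        if not chunk:  # peer sent less than it promised
            raise EOFError("body shorter than Content-Length")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# io.BytesIO stands in for a socket file object here.
body = read_body(io.BytesIO(b"hello world"), "5")
```

Anything the peer sends beyond the declared length is left unread in the stream, which is exactly the part a careless parser would misinterpret.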

mannyv3y ago

When doing http by hand, it's better to do http/1.0 because that tells the server you (and it) can't do anything exciting.

mustak_im3y ago

Yay! this is going to be a great read for the weekend!

danesparza3y ago

More articles should be written in the style of this article. Thank you for this.

stefs3y ago

most of his articles are written in this style. they're great!

tinglymintyfrsh3y ago

    GET / HTTP/1.0\r\n\r\n

Still works with many websites.
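If you'd rather do that from Python than from nc, a minimal sketch (`http10_get` is a made-up helper, and it assumes the server closes the connection when the response is done, which is the HTTP/1.0 default):

```python
import socket


def http10_get(sock: socket.socket, path: str = "/"):
    """Send a bare HTTP/1.0 GET on a connected socket and read until EOF."""
    sock.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii"))
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # server closed the connection: response is complete
            break
        chunks.append(data)
    raw = b"".join(chunks)
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("latin-1")
    return status_line, body
```

Usage would be something like `http10_get(socket.create_connection(("neverssl.com", 80)))`, with no promise about what any given site answers. No Host header, no keep-alive, no chunked encoding: the end of the body is simply the end of the connection.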

mlindner3y ago

Is there a way to get this guide without the annoying side-commentary?

fasterthanlime3y ago

The RFCs themselves are pretty dry, if that's your thing — https://httpwg.org/ has the freshest ones.

tomcam3y ago

Funny and very helpful. Thank you.

cph1233y ago

For a crash course would the code examples have been better in something like Python rather than Rust?

rk063y ago

Nope, that’s the author’s favourite language. A regular reader would expect Rust to be used, as in previous articles.

tmountain3y ago

This is gold.
