* I’ve got a network misconfiguration on my local machine;
* My wifi connection to the router is down;
* The cable between my router and ISP is cut;
* My ISP is having large scale issues; or
* The website I’m trying to reach is down.
I’ve been given the vague impression that it has something to do with a non-deterministic path by which requests are routed, but this seems unconvincing. If some link on the path breaks, why doesn’t the last good link send a message backward that says “Your message made it to me, but I tried to send it the next step and it failed there.”
But writing a tool that gives the user a useful description is near impossible: no two setups are the same, it’s not possible to know whether something is intentional, and it can be dangerous to assume one of the common causes and suggest a completely wrong answer to the user.
For example, let’s say you can’t connect to a website because the DNS server isn’t responding and the host isn’t responding. You could tell the user that something is probably misconfigured at your router or your ISP is having some issues.
However, it turns out that the actual reason was that your VPN client updated your local routing tables and DNS server but failed to remove the changes when you quit the client. How is a troubleshooter supposed to know that the settings were changed temporarily rather than being the permanent ones?
Once you try to start to write a troubleshooter that can identify the actual cause, you realize that it’s very difficult due to the complexity and variation. At best you can write something that usually spits out a correct answer but also sometimes suggests something totally wrong and leads people down a completely wrong path.
Your application won't see ICMP messages unless you configure the socket to report them (these are considered "transient" errors). On Linux this is done via the socket option IP_RECVERR.
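A minimal sketch of what that looks like from Python, assuming Linux. The helper names (`enable_icmp_errors`, `parse_extended_err`, `read_icmp_errors`) are made up for illustration, the numeric fallbacks are the values from the Linux headers, and the 16-byte layout is `struct sock_extended_err` from `linux/errqueue.h`. Treat it as a starting point rather than a complete implementation.

```python
import socket
import struct

# Linux-specific constants. Not every Python version exposes them on the
# socket module, so fall back to the numeric values from the Linux headers.
IP_RECVERR = getattr(socket, "IP_RECVERR", 11)
MSG_ERRQUEUE = getattr(socket, "MSG_ERRQUEUE", 0x2000)
SO_EE_ORIGIN_ICMP = 2  # the error was generated by a received ICMP message


def enable_icmp_errors(sock):
    """Ask the kernel to queue ICMP errors for this socket (Linux only)."""
    sock.setsockopt(socket.IPPROTO_IP, IP_RECVERR, 1)


def parse_extended_err(data):
    """Decode the leading 16 bytes of a struct sock_extended_err."""
    err_no, origin, icmp_type, icmp_code, info, _ = struct.unpack(
        "=IBBBxII", data[:16]
    )
    return {
        "errno": err_no,
        "from_icmp": origin == SO_EE_ORIGIN_ICMP,
        "icmp_type": icmp_type,
        "icmp_code": icmp_code,
        "info": info,
    }


def read_icmp_errors(sock):
    """Drain the socket's error queue and return the decoded errors."""
    errors = []
    while True:
        try:
            _, ancdata, _, _ = sock.recvmsg(512, 512, MSG_ERRQUEUE)
        except BlockingIOError:
            break  # the error queue is empty
        for level, kind, payload in ancdata:
            if level == socket.IPPROTO_IP and kind == IP_RECVERR:
                errors.append(parse_extended_err(payload))
    return errors
```

Typical use would be to call `enable_icmp_errors(sock)` right after creating a UDP socket, send as usual, and call `read_icmp_errors(sock)` when a send or receive fails. An ICMP type of 3 is Destination Unreachable, and the code says which flavour (network, host, port, and so on).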
ETA: there's not a ton of value in collecting errors at this layer when you're working at L7. The errors that _do_ get surfaced for destination unreachable (DU) at your layer will be appropriate for the failure handling logic you'll inevitably have already. In this case I think it'd be a timeout, as other layers implement retries in the face of unreachable destinations.
I found these RFCs helpful re: how the TCP layer handles ICMP errors: https://www.rfc-editor.org/rfc/rfc1122#page-103
Section 4.2.3.9:
> Since these Unreachable messages indicate soft error conditions, TCP MUST NOT abort the connection, and it SHOULD make the information available to the application.
> DISCUSSION: TCP could report the soft error condition to the application layer with an upcall to the ERROR_REPORT routine, or it could merely note the message and report it to the application only when and if the TCP connection times out.
This one gets into the nitty-gritty of how the stacks interact in order to study ICMP as a vector for TCP attacks.
It's like ping + traceroute in a live running session with each hop broken down.
It's been quite consistent: I'm often the first to notice a node down on the Xfinity network, and the same mtr run shows that my own network, at least as far as my modem, is good. It also shows when a hop beyond my ISP is adding 100s of ms of latency, which I haven't seen other tools show as well as MTR can.
Won't solve everything, but it might be worth checking in your case, since it breaks the path down per hop and reports latency for each.
If a web browser can't access a URL, it won't tell you exactly why, because there's a chance it diagnoses the reason wrong and most users will be confused by that. I assume most diagnosis tools work the same way. You need to make assumptions about how the OS, hardware, and network are configured to be able to say "the problem is here."
For example, when you access a website, the first thing that needs to be done is check a domain name server (DNS) to get the IP address of the web server. But where does the web browser get the DNS IPs from? You can configure it in the browser. Or in the OS. Or in your router. Or in your modem. And if you don't, it gets them from the DHCP server the router connects to, which could be your ISP's DHCP server (then you get your ISP's default DNS) or it could also be some other router in an organization's network.
If the DNS seems wrong it's easy to tell the IP is wrong but it gets hard to say where that IP came from.
Even SSL could be a problem with the server having the wrong certificates or it could be your computer having the wrong certificates.
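To make the layering concrete, here is a rough sketch of a staged check that reports the first stage to fail: name resolution, then the TCP connection, then the TLS handshake. The function name `diagnose` and the stage labels are made up for illustration, and it only uses Python's standard `socket` and `ssl` modules.

```python
import socket
import ssl


def diagnose(host, port=443, timeout=5.0):
    """Return (stage, detail) for the first stage that fails, or ("ok", "")."""
    # Stage 1: name resolution, using whatever resolver the OS is configured with.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return ("dns", str(exc))

    # Stage 2: TCP connection to the first resolved address.
    address = infos[0][4][:2]
    try:
        sock = socket.create_connection(address, timeout=timeout)
    except OSError as exc:
        return ("tcp", str(exc))

    # Stage 3: TLS handshake, verified against the local trust store.
    context = ssl.create_default_context()
    try:
        with context.wrap_socket(sock, server_hostname=host):
            pass
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        sock.close()
        return ("tls", str(exc))
    return ("ok", "")
```

Note what this can and can't say. A `"dns"` result tells you resolution failed, not whether the resolver address came from the browser, the OS, the router, or DHCP. A `"tls"` result doesn't tell you whether the server's certificate or your local trust store is at fault. It narrows down where, not why.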
Most of the time a lot of software kinda doesn't care about what's happening, just whether it can do what it's told.
For websites you often get more informative errors like 404, 500 or something else.
As someone else mentioned, ICMP addresses certain classes of failures if enabled, but I think the historical reason is more along the lines of the Internet being meant to run over lossy connections. For example, when a certain link is saturated, routers will just start dropping packets. Reporting each dropped packet back to the sender is just not a good idea; it adds load to a system already potentially operating at capacity. TCP assumes packets can get lost and retransmits them. When a link goes down, routing protocols will potentially send those retransmitted packets over a different link/path. I.e. there's no real concept of "connection down" other than the application layer or TCP eventually giving up (which can take a very long time). The kind of ICMP message that will immediately terminate a connection is when the server machine doesn't have anything listening on the destination port.
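To put a number on "a very long time", here is a back-of-the-envelope sketch. It assumes the Linux defaults as I understand them: 15 retransmissions (`tcp_retries2`), a retransmission timeout that starts at 200 ms and doubles each time, capped at 120 seconds. The function name `tcp_give_up_seconds` is made up, and real stacks derive the starting timeout from the measured RTT, so treat the result as an order of magnitude.

```python
def tcp_give_up_seconds(retries=15, rto_min=0.2, rto_max=120.0):
    """Total wait before TCP gives up, with exponential backoff capped at rto_max."""
    total = 0.0
    rto = rto_min
    # One wait before each retransmission, plus a final wait after the last one.
    for _ in range(retries + 1):
        total += min(rto, rto_max)
        rto *= 2
    return total


print(f"{tcp_give_up_seconds():.1f} seconds")
```

With those defaults the total comes out at roughly 924.6 seconds, a bit over 15 minutes of silence on an established connection before the application is told anything.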
https://en.wikipedia.org/wiki/Cyclomatic_complexity
There are so many different paths for an error case to follow.
You can of course debug this by reducing the complexity - for example, by watching one of the links in the chain (say, DNS) and seeing if it is failing - but this is the realm of network engineers who get paid mightily to get through this cyclomatic complexity and work at the relevant layers, all the way down to the atoms in the pipe ..
>If some link on the path breaks, why doesn’t the last good link send a message backward that says “Your message made it to me, but I tried to send it the next step and it failed there.”
In fact, the links all do this, but there is simply no provision in your OS - no fancy GUI, perhaps - that allows you to fully understand this without getting overwhelmed by the cyclomatic complexity. Tools exist, and once you learn to use them to tame the complexity - congrats, you're now worth $300k/yr and can go work in San Francisco .. /s ;)
Examples: https://www.cloudflare.com/learning/dns/what-is-dns/ https://www.cloudflare.com/learning/ssl/transport-layer-secu... https://www.cloudflare.com/learning/performance/what-is-http...
It hasn't been true for, like, ages.
> Everything you’ve learned here is a lie.
> The process we just describe is for the original version of TLS, which is outdated compared to the more modern version of TLS 1.3.
Best part of the article!
That is true for the key exchange part because RSA does not offer forward security. For signatures RSA is still used and probably still the most widely spread type of x509 certs.
I know Safari just upped the requirements to 2048-bit keys for RSA not too long ago (for signatures).
TBF it is titled as mediocre!
My writing isn’t a strength of mine, so I appreciate the criticism. My writing going from “bad” -> “is it AI?” is progress.
I struggled with where to “cutoff” the explanation and public key cryptography seemed like a good boundary and better explained elsewhere, as did various OSI layers.
I probably should have gone over the cert and potentially the full chain of trust, I’ll give you that.
I sure hope not. But I suppose it is titled "Mediocre Engineer".
> $300K/year
… I'll undercut you by $50k/y; where do I apply?
(There are just more and more errors. TLS <1.3 doesn't even work the way it describes, even though it tries to throw newer stuff into 1.3. The DNS section describes a recursive resolver, but the client isn't going to do that. It is probably talking to a stub resolver, too. "Internet Layer". The implication of "brotli" being a widely used algorithm in a ciphersuite/in TLS's compression, "Current version of TLS (>1.3) do not support RSA" …
… this sort of blogspam is why I sometimes wish there was a downvote. The advert isn't obnoxious enough to make me want to flag it. I guess I should write the less mediocre article and make the HN frontpage. If only I made $300K/y, I'd have more time.)