Better HN

0 points · Elixir6419 · 1y ago · 0 comments

It's one of the best tools for troubleshooting packet loss on the internet and on routed networks in general. It gives you far more information than ping or traceroute can.

If you run it in TCP or UDP mode, you can even pin down the physical interface that's erroring in a LAG/LACP bundle, because you can control the 5-tuple precisely and therefore keep the probes on a specific member link.
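
To illustrate why controlling the 5-tuple lets you target one LAG member: the device typically picks the member link for each flow by hashing header fields, so keeping every field fixed keeps every probe on the same physical link, while varying just the source port spreads probes across members. A minimal Python sketch follows. The CRC32-mod-N hash and the names FiveTuple, member_link and ports_per_link are hypothetical stand-ins, since real switches and routers use vendor-specific hash algorithms and configurable field selections.

```python
import zlib
from typing import Dict, Iterable, List, NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int      # 6 = TCP, 17 = UDP
    src_port: int
    dst_port: int


def member_link(flow: FiveTuple, n_links: int) -> int:
    """Pick the LAG member a flow lands on.

    ASSUMPTION: CRC32 over the 5-tuple, modulo the member count. Real
    devices use their own hash functions; this only shows the principle
    that identical 5-tuples always map to the same member.
    """
    key = f"{flow.src_ip}|{flow.dst_ip}|{flow.proto}|{flow.src_port}|{flow.dst_port}"
    return zlib.crc32(key.encode()) % n_links


def ports_per_link(src_ip: str, dst_ip: str, proto: int, dst_port: int,
                   n_links: int, src_ports: Iterable[int]) -> Dict[int, List[int]]:
    """Group candidate source ports by the (hypothetical) member they hash to."""
    buckets: Dict[int, List[int]] = {i: [] for i in range(n_links)}
    for sport in src_ports:
        flow = FiveTuple(src_ip, dst_ip, proto, sport, dst_port)
        buckets[member_link(flow, n_links)].append(sport)
    return buckets


# Documentation-range addresses; a 4-member bundle; 64 candidate source ports.
buckets = ports_per_link("192.0.2.10", "198.51.100.1", 17, 33434, 4, range(40000, 40064))
for link, ports in buckets.items():
    print(link, ports[:3])
```

On real gear you can't compute the vendor's hash yourself, so in practice the equivalent is to run one test per fixed source port and see which ports consistently show loss: those share the faulty member.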

I'm also curious which flags you used for ping and mtr that showed you this discrepancy.


commandersaki · 1y ago

mtr -i 0.1 1.1.1.1 gives 80% loss at my router (OK, not the 100% loss I stated earlier, but I just reran it to experiment) because the router deprioritises TTL-exceeded packets, yet ping -c 1000 -f 192.168.0.1 (my router) yields 0% loss. The per-hop loss indicator is not only incorrect, it wouldn't be useful even if it were accurate: end-to-end loss is what matters, not a phantom per-hop loss that has no effect on end-to-end loss.

Elixir6419 (OP) · 1y ago

Right, so control-plane packet rates are rate limited (to some definition of sane), but they are applied to all applications, traceroutes, pings alike.

An argument could be made for a device configured to show loss on ping but not on mtr, if the rate limits are set so that the ICMP echo reply rate is lower than the TTL-expired rate. Which tool would be wrong then? Would you blame ping for producing misleading results?
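
A quick way to see how the same probe rate can produce heavy loss for one ICMP message type and none for another: model each reply type as its own token-bucket policer on the router's control plane. This is a minimal sketch assuming one simple token bucket per message type and a router that sees one probe every 100 ms. The rates (2 pps for TTL Exceeded, 100 pps for Echo Reply) and bucket sizes are made-up numbers chosen for illustration, not any vendor's defaults, and policer_loss is a hypothetical helper.

```python
from fractions import Fraction


def policer_loss(rate_pps: int, burst: int, interval_ms: int, n_probes: int) -> float:
    """Fraction of probes whose reply a token-bucket policer drops.

    ASSUMPTION: the bucket starts full, refills at rate_pps tokens/s up to
    `burst`, and each reply costs one token. Fractions keep the token
    arithmetic exact so the result isn't skewed by float rounding.
    """
    refill = Fraction(rate_pps * interval_ms, 1000)  # tokens added between probes
    tokens = Fraction(burst)
    dropped = 0
    for i in range(n_probes):
        if i > 0:
            tokens = min(Fraction(burst), tokens + refill)
        if tokens >= 1:
            tokens -= 1      # reply generated
        else:
            dropped += 1     # reply policed away
    return dropped / n_probes


# Same 10 probes/s hitting the router, two differently configured policers.
ttl_exceeded_loss = policer_loss(rate_pps=2, burst=1, interval_ms=100, n_probes=1000)
echo_reply_loss = policer_loss(rate_pps=100, burst=10, interval_ms=100, n_probes=1000)
print(ttl_exceeded_loss, echo_reply_loss)  # 0.8 0.0
```

Swapping the two configurations would flip the result, which is the scenario described above: neither number says anything about forwarding-plane loss through the device.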

To me, the running counters, and the ability to spot obvious rate limiting when the loss doesn't cascade into the later hops, are akin to traceroute's * * * output. It doesn't always mean that packets are blackholed or that connectivity is broken; it just means the tool is producing an artifact due to network configuration or network characteristics. Further investigation is needed to figure out what's going on.
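
The "loss doesn't cascade" rule of thumb can be written down as a small heuristic: if a hop reports loss but a later hop reports materially less, the loss at that hop can't be in the forwarding path, so it is most likely the hop's own control plane policing or deprioritising replies. Below is a minimal Python sketch; suspect_rate_limited_hops, end_to_end_loss and the 1-percentage-point tolerance are hypothetical, not something mtr computes, and the final hop's figure is itself subject to the destination's own ICMP policing.

```python
from typing import List


def suspect_rate_limited_hops(loss_pct: List[float], tolerance: float = 1.0) -> List[int]:
    """Return 1-based hop numbers whose reported loss does not persist downstream.

    Heuristic (an assumption): loss at hop k that some later hop does not
    also show cannot have been dropped in transit, so it points at reply
    generation on hop k rather than forwarding.
    """
    suspects = []
    for k, loss in enumerate(loss_pct):
        downstream = loss_pct[k + 1:]
        if loss > tolerance and downstream and min(downstream) + tolerance < loss:
            suspects.append(k + 1)
    return suspects


def end_to_end_loss(loss_pct: List[float]) -> float:
    """The last hop's figure is the one reflecting end-to-end loss (non-empty input assumed)."""
    return loss_pct[-1]


# Hypothetical per-hop loss column, e.g. copied by hand from an mtr report.
report = [0.0, 80.0, 0.0, 0.0]
print(suspect_rate_limited_hops(report))  # [2]
print(end_to_end_loss(report))            # 0.0
```

Loss that starts at one hop and persists at every hop after it (for example [0, 0, 5, 5, 5]) is not flagged, since that pattern is consistent with genuine forwarding loss.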

MTR imho is giving you much more insight into the network than traceroute or ping separately. It doesn't resolve the usual firewall/rate limiting artifacts, but gives you way more information about paths if you know how to interpret them.

commandersaki · 1y ago

> Right, so control-plane packet rates are rate limited (to some definition of sane), but they are applied to all applications, traceroutes, pings alike.

I'm not sure I understand what you're saying, but in this case the control-plane rate limits differ for generating TTL Exceeded versus Echo Reply messages: one gives 80% loss and the other 0% loss at similar rates. Gripe #1: why are we even testing the control plane in the first place? It's a useless metric with no utility for measuring end-to-end latency/loss.

> An argument could be made for a device configured to show loss on ping but not on mtr, if the rate limits are set so that the ICMP echo reply rate is lower than the TTL-expired rate. Which tool would be wrong then? Would you blame ping for producing misleading results?

Sure, that would be a problem, but any combination can be misleading when the data path yields 0% loss for high rates of end-to-end ICMP. This is why it's not a particularly helpful metric and can be downright misleading (usually not to me, but I've seen plenty of people make incorrect inferences from bunk MTR results because the tool isn't intuitive).

> To me, the running counters, and the ability to spot obvious rate limiting when the loss doesn't cascade into the later hops, are akin to traceroute's * * * output. It doesn't always mean that packets are blackholed or that connectivity is broken; it just means the tool is producing an artifact due to network configuration or network characteristics. Further investigation is needed to figure out what's going on.

Sure, that's great, but it's not particularly helpful to the masses who misunderstand the tool. I worked as a network engineer for a decade receiving bunk MTR reports from people freaking out because they were seeing "packet loss" that didn't exist on the data forwarding plane (you know, the one that actually matters).

> MTR imho is giving you much more insight into the network than traceroute or ping separately. It doesn't resolve the usual firewall/rate limiting artifacts, but gives you way more information about paths if you know how to interpret them.

Time shouldn't be wasted measuring the control path and then investigating to confirm that it is the control path and not the data path. You cannot make these mistakes using traceroute and ping separately, because traceroute has no notion of a per-hop loss indicator and ping doesn't involve intermediate hops (unless an intermediate hop generates an ICMP diagnostic for an echo request).
