I often wear a shirt, handed out by internet monitoring company ThousandEyes, that says this:
And everyone who likes to think they're at all clever about knowing what traceroute is asks me what it means, and why there's serious depth to the joke.
So let’s start with what the average person knows about traceroute. Required watching, “Nextgen Hacker 101”:
As they say in the south, Bless Your Heart.
So today, for the second time in as many days, someone said “Wait, what’s wrong with traceroute?” Last night, it was at a BDSM Party. Today, it was while being helped at a dispensary. These are weird places to be asked this question, but this is the Pacific Northwest.
But as I mentioned to the person selling me edibles: “Well, there’s a whole rant there, and I don’t know if you have time for it, but if you want to put a coin in the box I’ll go on for ten minutes.” Now, it turns out my budtender had an associate's degree in Information Security, and actually wanted to know these things. And I had time…
“What’s wrong with traceroute” is actually a great Network Engineer job interview question. Because there’s a lot.
So, explaining what’s wrong with traceroute starts by explaining how routers work. Every packet your machine sends out carries a field called the “TTL” (time to live): the maximum number of hops it can go through. Each router along the way decrements it by one, and traceroute exploits this. It sends a few packets out with a TTL of 1, and the first router in the chain will [probably] reply with an ICMP Time Exceeded message, which amounts to “sorry, this can’t be delivered”. Then it repeats the process with packets of TTL 2, and so on. It’s a clever hack, to be sure, and it can definitely get you information, but…it can also be misleading.
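If it helps to see the trick in motion, here's a minimal sketch that simulates it in plain Python. Nothing touches the network: the `Hop` class, the `send_probe` helper, and the hop names are all made up for illustration, and the path is only loosely shaped like the trace from my house shown further down.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    name: str
    replies: bool = True  # False models a hop that stays silent


def send_probe(path, ttl):
    """Walk one probe along the path; return who answers, or None for silence."""
    for index, hop in enumerate(path):
        if index == len(path) - 1:
            # The probe reached the destination, which answers for itself.
            return hop.name if hop.replies else None
        ttl -= 1  # every router on the way decrements the TTL
        if ttl == 0:
            # TTL ran out here: the router drops the probe and (maybe) says so.
            return hop.name if hop.replies else None
    return None


def traceroute(path, max_hops=64):
    """Send probes with TTL 1, 2, 3, ... until the destination answers."""
    results = []
    destination = path[-1].name
    for ttl in range(1, max_hops + 1):
        answer = send_probe(path, ttl)
        results.append((ttl, answer or "*"))
        if answer == destination:
            break
    return results


# Hypothetical path: two silent hops, two chatty ones, then the destination.
path = [
    Hop("home-gateway", replies=False),
    Hop("isp-router-a"),
    Hop("isp-router-b"),
    Hop("border-router", replies=False),
    Hop("destination"),
]
trace = traceroute(path)
for ttl, who in trace:
    print(ttl, who)
```

Each probe dies one hop further out than the last, and whoever reports the death gets written down. Hops that say nothing show up as a star.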
So, let’s start with an actual traceroute, from my home (on comcast) to google. I’ve turned on the -a option which causes AS numbers to be displayed:
gushi@Dans-MacBook-Air ~ % traceroute -a www.google.com
traceroute to www.google.com (188.8.131.52), 64 hops max, 52 byte packets
1 * * *
2 [AS7922] 184.108.40.206 (220.127.116.11) 24.128 ms 25.785 ms 16.709 ms
3 [AS7922] 18.104.22.168 (22.214.171.124) 19.048 ms 18.722 ms
[AS7922] 126.96.36.199 (188.8.131.52) 24.929 ms
4 * * *
5 [AS15169] sea09s29-in-f4.1e100.net (184.108.40.206) 24.546 ms
[AS15169] 220.127.116.11 (18.104.22.168) 22.121 ms
[AS15169] sea09s29-in-f4.1e100.net (22.214.171.124) 17.947 ms
gushi@Dans-MacBook-Air ~ %
From this I can tell a few useful things: Google is 5 hops away from me, and the round trip is probably not much more than 25 ms. I’ll refer to this example above in the explanations.
First: It’s inconsistent.
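On Windows, traceroute (tracert) uses ICMP echo packets by default. On Linux and BSD (and most routers), it uses UDP packets. Which means a trace run from one platform isn't directly comparable to one run from another: a firewall along the way may pass one kind of probe and drop the other.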
Second: It’s asymmetric, just like all routing on the internet.
By the nature of traceroute, you see a list of the forward path your packets take, but there’s no indication of the path those packets take back to you. You cannot see the path that the ICMP Time Exceeded message from hop three takes to reach you, nor do you have any guarantee that that’s the path that packets from your final destination would take to reach you. To make traceroute really
Third: It’s frequently blocked.
Part of the joke of the shirt saying “F * * *” traceroute is that on many hops, you will simply get no response (represented by * * *), and worse still, this makes the traceroute slower, because most traceroute commands by default only send one packet at once.
At any rate, even on my simple traceroute above, two hops are configured not to respond for some reason. Now, I’ll note that they’re between ASes, so there could be a whole other Autonomous System in there that we’re not seeing.
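To put a rough number on "slower", here's a back-of-the-envelope sketch. It assumes three probes per hop (which is what the trace above shows), probes sent strictly one after another, and a 5-second wait per unanswered probe. That timeout is an assumption on my part, so check the man page for your own traceroute. The function name is made up.

```python
def seconds_lost_to_silent_hops(silent_hops, probes_per_hop=3, timeout_s=5.0):
    """Time spent waiting on probes that will never be answered.

    Assumes sequential probing: each unanswered probe costs one full timeout.
    """
    return silent_hops * probes_per_hop * timeout_s


# The trace above has two silent hops (1 and 4).
lost = seconds_lost_to_silent_hops(silent_hops=2)
print(f"{lost:.0f} seconds spent staring at stars")
```

Under those assumptions, the two silent hops cost far more wall-clock time than every responding hop put together.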
Fourth: It can be sidestepped.
Traceroute exploits the idea that each host will decrease the TTL by one, or return an ICMP Time Exceeded if that would result in a TTL of zero.
Except that many security appliances, and even generic FreeBSD machines with the IPSTEALTH kernel option turned on, can simply not decrement the TTL, effectively making that system invisible to traceroute. Doing this on a Cisco router is harder, but
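Here's a small simulated sketch of what a non-decrementing hop does to a trace. It's a toy model in plain Python, not anything that talks to a real network, and the `visible_hops` function and the hop names are invented for the example.

```python
def visible_hops(path):
    """Return the hops a TTL-based trace would report.

    path is a list of (name, decrements_ttl) pairs; the last entry is the
    destination. A hop with decrements_ttl=False forwards without touching
    the TTL, so a probe can never expire there.
    """
    destination = path[-1][0]
    seen = []
    for ttl in range(1, len(path) + 1):
        remaining = ttl
        responder = destination  # assume the probe makes it all the way
        for name, decrements_ttl in path[:-1]:
            if not decrements_ttl:
                continue  # stealth hop: packet passes, TTL untouched
            remaining -= 1
            if remaining == 0:
                responder = name  # TTL expired here, this hop answers
                break
        seen.append(responder)
        if responder == destination:
            break
    return seen


# Hypothetical four-device path with a stealthy firewall in the middle.
path = [
    ("router-a", True),
    ("firewall", False),
    ("router-b", True),
    ("server", True),
]
seen = visible_hops(path)
print(seen)
```

Four devices on the path, three lines in the trace, and nothing in the output hints that a hop is missing.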
Fifth: The numbers are inaccurate
Many high-end routers do their packet forwarding in special-purpose ASICs, but generating a reply to something like a host-directed ping, or an ICMP Time Exceeded, has to be done on the software routing stack, so the latency shown in the response is not the latency your forwarded packets actually see. Couple that with my second point above, that routing is asymmetric, and your numbers are inaccurate not only because of the switching plane/routing plane differential, but also because the route that your reply packet is taking may itself be lossy.
The same is true when pinging a generic router — if it even responds to ping, those packets are treated at a lower priority (and often, more rate-limited) than general forwarding. A router that can route 10 million packets per second may not respond to 10 million pings per second.
If this seems implausible, how else would it be possible for a hop in the middle of the trace to have round-trip-times way higher than the final hop?
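A toy model makes the arithmetic obvious. Every number below is made up for illustration (none of it comes from my trace), and the split into "time on the wire" and "time the router's CPU took to get around to replying" is a simplification.

```python
# (label, network round trip in ms, delay before the hop builds its ICMP reply in ms)
# All values are hypothetical.
hops = [
    ("hop 1", 2.0, 0.5),
    ("hop 2, busy core router", 8.0, 40.0),
    ("hop 3, destination", 15.0, 0.3),
]


def reported_rtt_ms(network_rtt_ms, reply_delay_ms):
    """What traceroute prints: wire time plus however long the hop took to answer."""
    return network_rtt_ms + reply_delay_ms


reported = {label: reported_rtt_ms(net, delay) for label, net, delay in hops}
for label, rtt in reported.items():
    print(f"{label}: {rtt:.1f} ms")
```

The middle router forwards traffic just fine in this model; it's only slow at answering. The trace makes it look like the bottleneck anyway.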
Sixth: ISPs don’t want to give away information
You may have noticed above that only one hop in the trace, the final one, returned any reverse DNS (RDNS) info. [1e100 is scientific notation for the number ‘one googol’]. Many ISPs have been phasing this information out of public view, and while the -a option is marginally helpful for showing the AS numbers of the systems, it’s still non-obvious unless you really know the systems involved. Couple this with the fact that some systems don’t respond at all, and the tool is way less useful than it could be.
Seventh: The standard traceroute program is slow
The baseline traceroute built in to Windows, Unix, and macOS runs its probes one at a time, sequentially, and waits for one packet to come back (or time out) before sending the next. Tools like mtr for macOS, Linux, and BSD, or Ping Plotter for Windows/macOS do a much better job of displaying information in a reasonable form, and presenting that data over time.
Additionally, there are third-party services, such as ThousandEyes (the folks who gave me the shirt at an operators’ group meeting), that purport to do a much better job of network visibility than the single-point view you’d see in a traceroute.
That said, traceroute is available everywhere (even on most routers and switches), and it’s something you can walk anyone through running and sending you a screenshot of, even if they don’t understand what they’re looking at.
But that still doesn’t explain why I wear the shirt
It’s comfy, and it starts conversations. Do I need another reason?
Yes. When you have the job I do, “F Traceroute” has an even deeper meaning. I’ll leave that as an exercise for the reader.