David Madore's WebLog: IPv6, MTU woes, and a memo on packet sizes

My parents' Internet Provider (Orange) offers, as yet, no IPv6 connectivity, only IPv4, at least to non-business clients. (Did I mention how much I loathe the whole concept of having business or pro services?) Can you imagine that? IPv4 is soooo 20th-century. I kid, but I tend to think they legally shouldn't be allowed to call it Internet access if it doesn't include the v6 Internet: I'm sure if they had a choice between getting their act together and having to relabel their advertisements, they'd somehow discover the means to make it happen very fast. But I digress. I use IPv6 a lot because I have it at home and at work, and it really makes my life simpler when it comes to connecting to N different computers which are hidden behind a single v4 IP (at my parents' place, N≥6 not counting mobile phones and such, thanks to my father's technology buying sprees).

Up to recently, for my parents' access, I relied on 6to4 (the magic 2002::/16 space which provides every IPv4 address with a septillion v6 addresses): 6to4 is really nice because it works without having to register anything: in one direction, you just encapsulate your v6 packets and send them to the magic v4 anycast address 192.88.99.1 and some good Samaritan sends them to the v6 Internet, and in the other direction, some other good Samaritan encapsulates packets sent to your 2002:xxyy:zztt::/48 space and delivers them to your v4 address. Exactly who these good Samaritans are, and how come they haven't closed everything down because of security concerns, beats me; but 6to4 is remarkably simple to set up, and allows anyone with a fixed IPv4 address not sitting behind a fascist firewall to gain IPv6 connectivity instantly. Neat. (In fact, it can even be useful to work around a stupidly configured firewall: the network administrators at my alma mater, by which I mean the ENS, refuse to learn anything about IPv6, they have stupid firewalls set up in various places, such as on the library Wifi or Ethernet, and these firewalls will let 6to4 happily through, which makes the whole thing a bit inane.)
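To make the 2002:xxyy:zztt::/48 mapping concrete, here is a minimal Python sketch that derives the 6to4 prefix from a v4 address. The function name `six_to_four_prefix` and the documentation address 192.0.2.1 are purely illustrative choices, not anything 6to4 itself prescribes.

```python
import ipaddress

def six_to_four_prefix(ipv4: str) -> ipaddress.IPv6Network:
    """Return the 2002:xxyy:zztt::/48 prefix matching a public IPv4 address."""
    v4 = int(ipaddress.IPv4Address(ipv4))
    # 2002::/16 fills the top 16 bits, the 32-bit v4 address comes next,
    # and the remaining 80 bits are free for subnets and hosts.
    prefix_int = (0x2002 << 112) | (v4 << 80)
    return ipaddress.IPv6Network((prefix_int, 48))

prefix = six_to_four_prefix("192.0.2.1")
print(prefix)                # 2002:c000:201::/48
print(prefix.num_addresses)  # 2**80, roughly 1.2 septillion
```

The 80 free bits are where the "septillion" comes from.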

But for some reason, Orange recently stopped routing to 192.88.99.1. Whether this is intentional is debatable; I don't know. But at any rate, I lost 6to4 because of this. I might have used some other (unicast) encapsulating Samaritan instead, like 194.95.109.156 ≡ 6to4.ipv6.fh-regensburg.de, but that didn't seem very robust or practical, so I decided to get a dedicated (static) tunnel instead.

Fortunately, a tunnel is available free of charge from Hurricane Electric. Actually, I'm pretty amazed at how simple it is: registering an account and setting up a tunnel only takes a few minutes, and they give you a /48 on demand (plus a /64 and a bonus /128). In contrast, the other tunnel provider I know of, SixXS, is really inquisitive, they practically ask for your grandmother's maiden name before you can get a tunnel (and you have to promise a lot of strange things, like keeping the tunnel up 24/7 — sort of bizarre, I don't see why they would care). Anyway, I got one for my parents, and I can really say I recommend Hurricane Electric.

⁂

But there is one thing which is the scourge of tunnels in general, and IPv6 tunnels in particular: MTU woes.

The MTU (Maximum Transmission Unit) is the maximal size of a network packet that can go through an interface. The typical value would be 1500, which is the MTU of a standard Ethernet interface (without the so-called jumbo frames extension). When a packet is too large to go through a network interface in a single piece, there are basically two possibilities: either an ICMP error packet ("datagram too large") is returned to the sender, or the packet is fragmented into smaller pieces. The latter is problematic for various reasons and tends to be avoided. The former is nice, except that all sorts of reasons, each a variant of the stupid network administrator kind, can cause ICMP packets (hence, the error) to be lost.

When a packet must go through a tunnel, things get worse: the tunnel appears as a single network interface, but it might not know what its own MTU is, because that depends on every intervening link on the tunnel's route (which might not even be constant). If the tunnel allows fragmentation, this is not really a problem (the MTU is then pretty arbitrary), but it typically doesn't, because fragmentation requires more work of the tunnel endpoints and causes inefficiency. If the tunnel does not allow fragmentation, it should, ideally, maintain its MTU dynamically by starting with some standard value, adjusting it whenever it receives a "datagram too large" error, and sometimes tentatively increasing it again in case the route has changed to something better. I'm afraid real tunnels rarely do this.
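The ideal behaviour described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class `TunnelMtuEstimator` and its method names are hypothetical, the tunnel is assumed to be 6in4 (20 bytes of outer IPv4 header), and 1280 is used as the floor because that is the minimum MTU IPv6 requires of any link.

```python
IPV6_MIN_MTU = 1280        # every IPv6 link must carry at least this much
SIX_IN_FOUR_OVERHEAD = 20  # outer IPv4 header added by 6in4 encapsulation

class TunnelMtuEstimator:
    """Hypothetical helper tracking a non-fragmenting 6in4 tunnel's usable MTU."""

    def __init__(self, start: int = 1480, ceiling: int = 1480) -> None:
        self.ceiling = ceiling
        self.mtu = start

    def on_outer_too_large(self, outer_next_hop_mtu: int) -> None:
        """Lower the estimate when the v4 path reports a smaller MTU."""
        inner = outer_next_hop_mtu - SIX_IN_FOUR_OVERHEAD
        self.mtu = max(IPV6_MIN_MTU, min(self.mtu, inner))

    def probe_up(self, step: int = 8) -> None:
        """Tentatively raise the estimate in case the route has improved."""
        self.mtu = min(self.ceiling, self.mtu + step)

est = TunnelMtuEstimator()
est.on_outer_too_large(1456)  # the v4 path says it can only carry 1456 bytes
print(est.mtu)                # 1436
est.probe_up()
print(est.mtu)                # 1444
```

A real endpoint would also have to decide how often to probe upward and what to do when the v4 error never arrives, which is exactly the hard part.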

Hurricane Electric's tunnel won't fragment, and apparently autocratically sets its MTU to 1480: incoming packets larger than this receive a "datagram too large" ICMP message from HE's end of the tunnel, and smaller packets try to go through. But they may not succeed, and if they don't, even though they may trigger a "packet too large" error at the v4 layer supporting the tunnel, this won't be translated to the corresponding error at the v6 layer. So the tunnel is a black hole for packets with a size of at most 1480 but larger than the true MTU (which should rather be called MRU, Maximum Receive Unit, in this context, since I am talking about packets which, seen from my end of the tunnel, are inbound).

This is unfortunately a very common problem. Its general consequence is that connections will simply freeze, because they are established without problem, but once an inbound packet is sent that is too large, it will never go through, and typical TCP hosts use no mechanism to dynamically adjust the path MTU (more about this in a minute).

In my case, my parents' Internet connection by ADSL seems to have an MTU of 1456 (measured at the PPP level). I don't know why it's so low: my own home Internet connection, also by ADSL, apparently has 1492, which is normal for PPPoE. (The PPPoE protocol is used to communicate with the ADSL modem by encapsulating PPP data in Ethernet frames; for some reason I can't imagine, the modem does not decapsulate the Ethernet frames but rather sends them directly on the ATM link used by ADSL, so it is more accurately PPPoEoA; and since, for some other reason I also can't imagine, PPP data may not be fragmented across Ethernet frames, the link MTU is limited by the Ethernet MTU, 1500, minus the PPPoE overhead, 8.) The PPPoA protocol would give a better (higher) MTU, but there doesn't seem to be a way to use it while doing the PPP negotiation on the PC rather than on the modem: I already complained about this a while ago.
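Putting these numbers together shows which v6 packet sizes fall into the black hole. The sketch below assumes that 1456 is the largest v4 packet the link carries and that the tunnel is plain 6in4 with a 20-byte outer header; the function name `black_hole_sizes` is made up for the illustration.

```python
SIX_IN_FOUR_OVERHEAD = 20  # outer IPv4 header added by 6in4 encapsulation

def black_hole_sizes(tunnel_mtu: int, outer_path_mtu: int) -> range:
    """IPv6 packet sizes the tunnel accepts but the v4 path silently drops."""
    true_inner_mtu = outer_path_mtu - SIX_IN_FOUR_OVERHEAD
    return range(true_inner_mtu + 1, tunnel_mtu + 1)

sizes = black_hole_sizes(tunnel_mtu=1480, outer_path_mtu=1456)
print(sizes[0], sizes[-1], len(sizes))  # 1437 1480 44
```

Under those assumptions, any inbound v6 packet of 1437 to 1480 bytes vanishes without an error.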

The problem isn't specific to IPv6; the same thing can occur at the IPv4 level. My parents' Internet connection seems to lose packets of size greater than 1456 even at the v4 level (which is all Orange provides, anyway), with no kind of error packet being received. However, of course, Internet providers will do all that is necessary for ordinary IPv4 TCP connections to work: typically they will clamp the TCP maximum segment size to the right value. (This is maddeningly and mind-bogglingly stupid, when you think of it: instead of making sure they send out the right error messages when too large a packet is received, they prefer to mangle TCP connections.) They won't do that at the v6 level, so the subscriber has to do it.

In principle, there's a mechanism, suggested by RFC 4821, which explains how to work around MTU problems by letting the transport layer (typically TCP) discover the right path MTU by itself. Unfortunately, this mechanism (which is enabled under Linux by setting the sysctl net.ipv4.tcp_mtu_probing — despite its name, I'm told this also concerns IPv6) needs to be enabled at both ends of the connection for it to be of real use: and at any rate, since my problem is that my Internet connections won't receive datagrams that are too large, I would need it enabled on hosts that talk to me, and that I can't control. Bad luck.
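For the record, the sysctl takes three values (0, 1 and 2) according to the kernel's ip-sysctl documentation. Here is a small Python sketch to check what a given Linux box is set to; the helper names `describe_mtu_probing` and `read_mtu_probing` are my own, and the code simply returns None on systems where the file doesn't exist.

```python
from pathlib import Path
from typing import Optional

# Meanings as given in the Linux kernel's ip-sysctl documentation.
MODES = {
    0: "disabled",
    1: "disabled by default, enabled when an ICMP black hole is detected",
    2: "always enabled",
}

def describe_mtu_probing(raw: str) -> str:
    """Translate the raw sysctl value into a human-readable description."""
    return MODES.get(int(raw.strip()), "unknown value")

def read_mtu_probing(path: str = "/proc/sys/net/ipv4/tcp_mtu_probing") -> Optional[str]:
    """Describe the current setting, or return None if the sysctl is absent."""
    sysctl = Path(path)
    return describe_mtu_probing(sysctl.read_text()) if sysctl.exists() else None

print(read_mtu_probing())
```

Of course, this only tells you about your own machine, which is not where the setting would help.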

Instead, one must resort to a hack called TCP MSS clamping (as I mentioned above, it seems that many Internet providers do it anyway, at the v4 level), which consists of letting the router alter TCP packets to make sure their maximum segment size (MSS) is low enough. Under Linux, this is done with something like: /sbin/ip6tables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu (this is necessary only on the router, for TCP connections that are being routed through it: for TCP connections that the router itself originates or terminates, the host will know the correct MSS anyway). Unsatisfactory, because it will not save SCTP, but as my father likes to say, you can't make a silk purse out of a sow's ear.
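What the clamping amounts to, arithmetically, is replacing the MSS option in each forwarded SYN by the smaller of the advertised value and what the path can carry. A rough Python model of that arithmetic for IPv6 follows (the function name `clamp_mss` is hypothetical; the real work is done in the kernel, not in anything like this):

```python
IPV6_HEADER = 40  # fixed IPv6 header size
TCP_HEADER = 20   # TCP header size without options

def clamp_mss(advertised_mss: int, path_mtu: int) -> int:
    """MSS to leave in a forwarded SYN: never more than the path can carry."""
    return min(advertised_mss, path_mtu - IPV6_HEADER - TCP_HEADER)

# An Ethernet host advertises 1500 - 60 = 1440; if the tunnel really
# carries 1436 bytes, the router rewrites the option to 1436 - 60.
print(clamp_mss(1440, 1436))  # 1376
# A peer that already advertises less is left alone.
print(clamp_mss(1200, 1436))  # 1200
```

Note that the rule only helps if the router's idea of the path MTU is right, so the tunnel interface's MTU has to be set to the true value first.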

⁂

One of the annoying aspects of this tinkering with packet sizes is that every tool seems to measure packet size differently. The TCP maximum segment size, for example, is not equal to the MTU because of the overhead brought by IP and TCP headers. Every time I have to fight this battle, I must figure out the right constants. So, in an effort to save myself some effort the next time, and perhaps be of use to someone else, here is something I wrote down:

IP packet size = IP header size (20 for IPv4, 40 for IPv6) + IP payload; IP packet size is bounded by interface MTU.

For TCP: IP payload = TCP header size (20) + TCP segment size; TCP segment size = TCP segment payload + TCP options size (typ. 12), but TCP segment size (not payload) is bounded by MSS; so MSS is typically MTU−40 for IPv4, MTU−60 for IPv6.

For ping: ping -s defines ICMP payload size; IP payload size = ICMP payload size + ICMP header size (8); so IP packet size is ICMP payload + 28 for IPv4, + 48 for IPv6.

Ethernet frame: Ethernet header size (14) + IP packet size + Ethernet CRC (4) (but this size is hardly ever useful; 1500 is Ethernet standard MTU).

Tunnel overheads: PPPoE adds 8 bytes (PPPoE headers (6) + PPP headers (2)); 6in4 adds 20 bytes (IPv4 payload of 6in4 = IPv6 total packet size).
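The same memo can be turned into a tiny Python calculator, which saves redoing the subtractions by hand. The function names are invented for the occasion; the constants are the ones listed above.

```python
IP_HEADER = {4: 20, 6: 40}  # IP header size by IP version
TCP_HEADER = 20             # TCP header without options
ICMP_HEADER = 8             # ICMP echo header
PPPOE_OVERHEAD = 8          # PPPoE headers (6) + PPP headers (2)
SIX_IN_FOUR_OVERHEAD = 20   # outer IPv4 header of 6in4

def mss_for_mtu(mtu: int, version: int) -> int:
    """Typical TCP MSS for a given MTU (MTU-40 for IPv4, MTU-60 for IPv6)."""
    return mtu - IP_HEADER[version] - TCP_HEADER

def ping_size_for_mtu(mtu: int, version: int) -> int:
    """Largest 'ping -s' payload that yields an IP packet of exactly mtu bytes."""
    return mtu - IP_HEADER[version] - ICMP_HEADER

def tunneled_mtu(link_mtu: int, *overheads: int) -> int:
    """MTU left after stacking one or more encapsulations on a link."""
    return link_mtu - sum(overheads)

print(mss_for_mtu(1500, 4), mss_for_mtu(1500, 6))  # 1460 1440
print(ping_size_for_mtu(1500, 4))                  # 1472
print(ping_size_for_mtu(1480, 6))                  # 1432
print(tunneled_mtu(1500, PPPOE_OVERHEAD, SIX_IN_FOUR_OVERHEAD))  # 1472
```

So, for instance, a ping -s 1432 over v6 produces a 1480-byte packet, the largest the tunnel claims to accept.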