
2009-06-08 - dnscache fails spectacularly on high latency connections

I use dnscache extensively for name resolution within my network. All this worked fine until recently, when my DSL link failed (along with the telephone line) and I had to fall back to HSDPA for external connectivity.

My HSDPA provider advertises speeds of up to 1.8 Mbit/sec - of course the actual rate can be lower due to network congestion, poor signal strength, low-capacity network links and so on. I don't think I have ever received the full 1800 kbit/sec from my provider; in practice I rarely see more than 300 kbit/sec.

Anyway this outage was particularly troublesome because I was getting only 50 kbit/sec through HSDPA ... slower than a dialup modem. It was slow in another respect too: the packet round-trip time was between 5 and 7 seconds. That's over 200 times the typical DSL RTT of around 25 ms.

I don't know what caused the extremely high latency, but I do know what its effect was: dnscache failed almost completely. It would send out requests to nameservers on the internet and, not receiving any response within a reasonable time, would go on to try the next nameserver, and the next, and so on. The responses came back at some later time ... after dnscache had given up on the request, so dnscache ignored them. The net effect was a storm of DNS packets sent and received, as well as ICMP port-unreachable packets for responses that arrived after dnscache had stopped listening.
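
If you want to watch this happening, a capture along these lines on the external interface (ppp0 here is just a guess at the name of the HSDPA link) shows both the outgoing queries and the late ICMP errors:

tcpdump -n -i ppp0 'udp port 53 or icmp[icmptype] == icmp-unreach'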

Now that the DSL is working again I am testing dnscache (from djbdns 1.05) to pin down the exact nature of this problem and whether it can be fixed. I am using Linux's Traffic Control subsystem (see lartc.org for documentation), and specifically the 'netem' queue discipline module, to simulate various amounts of latency to a virtual host.

I set up the variable latency using this script:

#!/bin/bash
#
#  Setup network delay for testing dnscache on high latency links
#  Outbound traffic on eth0 is delayed by several seconds if
#    - ip dst is 192.168.1.75

set -x

tc qdisc del dev eth0 root 2>/dev/null

# Setup root class and base rates
tc qdisc add dev eth0 root handle 1: htb default 99

tc class add dev eth0 parent 1: classid 1:1 htb rate 100mbit
# all traffic
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 100mbit
# delayed traffic
tc class add dev eth0 parent 1:1 classid 1:11 htb rate 2mbit
# default
tc class add dev eth0 parent 1:1 classid 1:99 htb rate 3500kbit

# fair queueing for the bulk of the traffic
tc qdisc add dev eth0 parent 1:10 handle 10: sfq
# the actual delay: netem holds packets in this class for 5000 ms
tc qdisc add dev eth0 parent 1:11 handle 11: netem delay 5000ms
# fair queueing for the default class
tc qdisc add dev eth0 parent 1:99 handle 99: sfq

# Move selected traffic into 1:11
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip dst 192.168.1.75 flowid 1:11

This script runs on my gateway. It attaches the HTB queue discipline to device eth0, with some classes beneath it to classify outbound packets on that interface. Packets sent to 192.168.1.75 are delayed (by the 'netem' qdisc) by exactly 5000 msec, which is more than enough time to give dnscache a headache.
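
A quick way to confirm the delay is in place is to ping the delayed host from the gateway and check that the round-trip time is now about 5000 ms:

ping -c 3 192.168.1.75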

Next up I hacked dnscache to (a) run as an ordinary user, (b) listen on port 5300, and (c) show me what it is doing. Dnscache retries each outgoing request with escalating timeouts of 1, 3, 11 and 45 seconds. I don't know why djb chose those particular numbers. I started the server and sent it a single request to look up an 'A' record for telstra.net:

dig -p 5300 @127.0.0.1 A telstra.net +tries=1
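
As for starting the hacked server by hand, it looks roughly like this (dnscache takes its settings from environment variables and reads a 128-byte random seed from stdin; the testroot directory mimics the layout dnscache-conf would normally create, and I'm assuming the chroot/setuid calls are what got patched out):

# allow queries from localhost, and give dnscache the root server list
mkdir -p testroot/ip testroot/servers
touch testroot/ip/127.0.0.1
cp /etc/dnsroots.global testroot/servers/@
# IP = listen address, IPSEND = source address for outgoing queries
env IP=127.0.0.1 IPSEND=0.0.0.0 CACHESIZE=1000000 \
    ROOT=$PWD/testroot ./dnscache < /dev/urandom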

This is what dnscache output at first:

$ ./dnscache
starting
query 1 7f000001:e4d2:6114 1 telstra.net.
tx 0 1 telstra.net. . 803f0235 c03a801e c0249411 c6290004 c707532a 80080a5a c0702404 c0cbe60a ca0c1b21 c0e44fc9 c00505f1 c021040c c1000e81
dns_transmit to 128.63.2.53, timeout set to 1
dns_transmit to 192.58.128.30, timeout set to 1
dns_transmit to 192.36.148.17, timeout set to 1
dns_transmit to 198.41.0.4, timeout set to 1
dns_transmit to 199.7.83.42, timeout set to 1
dns_transmit to 128.8.10.90, timeout set to 1
dns_transmit to 192.112.36.4, timeout set to 1
dns_transmit to 192.203.230.10, timeout set to 1
dns_transmit to 202.12.27.33, timeout set to 1
dns_transmit to 192.228.79.201, timeout set to 1
dns_transmit to 192.5.5.241, timeout set to 1
dns_transmit to 192.33.4.12, timeout set to 1
dns_transmit to 193.0.14.129, timeout set to 1

What seems to be happening here is that dnscache is asking the root nameservers ('.') about 'telstra.net'. The hex numbers are the IPv4 addresses of the root nameservers. Dnscache tries them in order (at least, in the same order as they appear on the 'tx' line). There are 13 root nameservers and these requests appear to be issued one per second, so this process has taken 13 seconds so far. With a 5000 msec delay on the interface, 8 of those 13 requests have been replied to, but dnscache apparently stops listening for a response as soon as its timeout expires (1 second here) and it sends the next request.
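
Each of those hex numbers is an IPv4 address written as four bytes; the first one, for example, corresponds to the first server on the 'dns_transmit' lines:

printf '%d.%d.%d.%d\n' 0x80 0x3f 0x02 0x35   # 803f0235 -> 128.63.2.53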

Continuing on:

dns_transmit to 128.63.2.53, timeout set to 3
dns_transmit to 192.58.128.30, timeout set to 3
dns_transmit to 192.36.148.17, timeout set to 3
dns_transmit to 198.41.0.4, timeout set to 3
dns_transmit to 199.7.83.42, timeout set to 3
dns_transmit to 128.8.10.90, timeout set to 3
dns_transmit to 192.112.36.4, timeout set to 3
dns_transmit to 192.203.230.10, timeout set to 3
dns_transmit to 202.12.27.33, timeout set to 3
dns_transmit to 192.228.79.201, timeout set to 3
dns_transmit to 192.5.5.241, timeout set to 3
dns_transmit to 192.33.4.12, timeout set to 3
dns_transmit to 193.0.14.129, timeout set to 3

Dnscache sends to the same set of 13 nameservers, but with a 3 second timeout on each. That takes 39 seconds (for a total of 52 seconds so far, and we still don't know the nameservers for telstra.net). Continuing:

dns_transmit to 128.63.2.53, timeout set to 11
rr 803f0235 172800 1 a.gtld-servers.net. c005061e
rr 803f0235 172800 1 b.gtld-servers.net. c0210e1e
rr 803f0235 172800 1 c.gtld-servers.net. c01a5c1e
rr 803f0235 172800 1 d.gtld-servers.net. c01f501e
rr 803f0235 172800 1 e.gtld-servers.net. c00c5e1e
rr 803f0235 172800 1 f.gtld-servers.net. c023331e
rr 803f0235 172800 1 g.gtld-servers.net. c02a5d1e
rr 803f0235 172800 1 h.gtld-servers.net. c036701e
rr 803f0235 172800 1 i.gtld-servers.net. c02bac1e
rr 803f0235 172800 1 j.gtld-servers.net. c0304f1e
rr 803f0235 172800 1 k.gtld-servers.net. c034b21e
rr 803f0235 172800 1 l.gtld-servers.net. c029a21e
rr 803f0235 172800 1 m.gtld-servers.net. c037531e
rr 803f0235 172800 ns net. a.gtld-servers.net.
rr 803f0235 172800 ns net. b.gtld-servers.net.
rr 803f0235 172800 ns net. c.gtld-servers.net.
rr 803f0235 172800 ns net. d.gtld-servers.net.
rr 803f0235 172800 ns net. e.gtld-servers.net.
rr 803f0235 172800 ns net. f.gtld-servers.net.
rr 803f0235 172800 ns net. g.gtld-servers.net.
rr 803f0235 172800 ns net. h.gtld-servers.net.
rr 803f0235 172800 ns net. i.gtld-servers.net.
rr 803f0235 172800 ns net. j.gtld-servers.net.
rr 803f0235 172800 ns net. k.gtld-servers.net.
rr 803f0235 172800 ns net. l.gtld-servers.net.
rr 803f0235 172800 ns net. m.gtld-servers.net.
rr 803f0235 172800 28 a.gtld-servers.net. 20010503a83e00000000000000020030
stats 1 945 1 0
cached 1 a.gtld-servers.net.
cached 1 b.gtld-servers.net.
cached 1 c.gtld-servers.net.
cached 1 d.gtld-servers.net.
cached 1 e.gtld-servers.net.
cached 1 f.gtld-servers.net.
cached 1 g.gtld-servers.net.
cached 1 h.gtld-servers.net.
cached 1 i.gtld-servers.net.
cached 1 j.gtld-servers.net.
cached 1 k.gtld-servers.net.
cached 1 l.gtld-servers.net.
cached 1 m.gtld-servers.net.

Dnscache has finally increased its timeout to 11 seconds and after another 5 seconds (total time elapsed now 57 seconds) it receives a response. Now it knows the nameservers for the 'net' top-level domain. There are 13 of them, so it's going to take another 57 seconds (13 x 1 second, plus 13 x 3 seconds, plus the 5 second wait on the 11 second pass) before it learns the 4 nameservers for 'telstra.net', and then another 21 seconds (4 x 1, plus 4 x 3, plus the 5 second wait) to learn that there is actually no 'A' record for 'telstra.net'. That's 135 seconds total time. I don't know how long clients typically wait for a response, but it's a lot less than that.

Clearly dnscache should implement two timeouts per request: one after which it sends a second request (to another nameserver) for the same information, and one after which it gives up waiting for a response from the first request. The second timeout should be much longer than the first.

If dnscache were modified to wait up to 10 seconds for a response but try each successive nameserver after 1 second, then it should be possible for dnscache to answer the query within 15 seconds (with a 5 second RTT, each of the three referral steps - root, 'net', then telstra.net's nameservers - gets its first answer about 5 seconds after the first query goes out), which is reasonable in this context. In these calculations I'm assuming that dnscache has no existing cache (other than the list of root nameservers), because that makes dnscache's behaviour predictable and shows us the worst case performance.
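
To make the idea concrete, here is a rough shell illustration of the scheme (this is just dig in a loop, not a dnscache patch; the three addresses are the first few root servers from the log above, and 'wait -n' needs a reasonably recent bash):

#!/bin/bash
# Illustration only: stagger queries one second apart, give each up to
# 10 seconds, and stop as soon as any one of them answers.
for ns in 128.63.2.53 192.58.128.30 192.36.148.17; do
    dig @"$ns" telstra.net NS +time=10 +tries=1 &
    sleep 1
done
wait -n                      # returns when the first query completes
kill $(jobs -p) 2>/dev/null  # abandon the queries still in flight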

The next step for me is to work out whether dnscache can have multiple outstanding requests to nameservers for the same client request. Obviously dnscache can handle multiple concurrent client requests and must query many different nameservers concurrently, but the question is whether it can query multiple nameservers for the same information at more or less the same time, and use the first response that is received.