DNS Failover / High Availability

From ben.goodacre.name/tech


The native Linux resolver with default settings cannot be relied upon for HA / failover. By default it makes two attempts, each with a 5-second timeout, so every query has to wait 10 seconds before the secondary DNS server is queried. Linux does not natively mark a DNS node as 'dead' and send all queries to a secondary node; it tries the primary on every single query.



Tweaking resolv.conf

First, get some statistics on how long queries take on your network; BIND provides these natively through its statistics. Once an appropriate maximum query time has been defined, the time the Linux resolver waits before trying the secondary node can be tuned with the options timeout:n directive, where n is the number of seconds.
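Where BIND statistics are not to hand, lookup times can also be sampled from a client. The following is a minimal sketch, assuming Python 3 on the client and that socket.getaddrinfo goes through the system resolver. The function names, the headroom factor of 2 and the round count are illustrative assumptions, not part of any tool mentioned above.

```python
import math
import socket
import statistics
import time


def time_lookup(hostname, resolve=socket.getaddrinfo):
    """Return the wall-clock seconds that one lookup of hostname takes."""
    start = time.perf_counter()
    try:
        resolve(hostname, None)
    except socket.gaierror:
        pass  # a failed lookup still shows how long the resolver waited
    return time.perf_counter() - start


def suggest_timeout(samples, headroom=2.0):
    """Slowest sample times headroom, rounded up to whole seconds (minimum 1).

    The headroom factor is an arbitrary assumption; tune it to taste.
    """
    return max(1, math.ceil(max(samples) * headroom))


def report(hostnames, rounds=20):
    """Time each hostname `rounds` times and print a candidate timeout."""
    samples = [time_lookup(name) for name in hostnames for _ in range(rounds)]
    print(f"median {statistics.median(samples):.3f}s  max {max(samples):.3f}s")
    print(f"candidate: options timeout:{suggest_timeout(samples)}")
```

Calling report() with a list of names your hosts actually resolve prints the median and slowest lookup and a candidate value for n. A local cache such as nscd may answer repeated lookups, so the figures are only a rough guide.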

Another directive, options rotate, can be used to provide pseudo-load-balancing. It is only pseudo-load-balancing because, in testing, the primary node still received twice as many queries as the secondary. This is still better than the primary receiving 99% of queries, as it does without the directive. Testing with ping will not prove that it is working, as each new process restarts the round-robin.
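Because a fresh process restarts the round-robin, a test needs to issue many lookups from one long-lived process while you watch the query logs on each DNS node. A minimal sketch, assuming Python 3 and that socket.getaddrinfo uses the system resolver; lookup_burst is a hypothetical helper name, and whether rotation state is kept between calls depends on your C library.

```python
import socket


def lookup_burst(hostnames, resolve=socket.getaddrinfo):
    """Resolve every hostname from this one process.

    Returns a tuple (ok, failed) counting successful and failed lookups.
    """
    ok = failed = 0
    for name in hostnames:
        try:
            resolve(name, None)
            ok += 1
        except socket.gaierror:
            failed += 1
    return ok, failed
```

Pass it a list of distinct names so that a local cache does not absorb the queries, then compare the query counts seen on the primary and secondary nodes.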

Example config

search dc2.uk.eu.company.local uk.eu.company.local eu.company.local
options timeout:2 rotate

This gives two attempts (the default), each with a timeout of 2 seconds, so 4 seconds pass before the secondary is queried. A more aggressive config would be options timeout:1 attempts:1 rotate, which can be used as long as there is no recursion in your environment; otherwise, queries that would eventually have returned may fail.
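The arithmetic above can be checked against a given resolv.conf with a short script. This is a simplified sketch in Python 3, not a full resolv.conf parser: it reads only the timeout, attempts and rotate options, uses the defaults quoted in this article (timeout 5, attempts 2), and wait_before_secondary simply reproduces this article's attempts x timeout reckoning. Both function names are hypothetical.

```python
def parse_resolver_options(resolv_conf_text):
    """Return the effective timeout, attempts and rotate settings as a dict."""
    # Defaults as quoted in this article: two attempts of 5 seconds each.
    opts = {"timeout": 5, "attempts": 2, "rotate": False}
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "options":
            continue  # skip search, nameserver, comment and blank lines
        for field in fields[1:]:
            key, _, value = field.partition(":")
            if key in ("timeout", "attempts") and value.isdigit():
                opts[key] = int(value)
            elif key == "rotate":
                opts["rotate"] = True
    return opts


def wait_before_secondary(opts):
    """Seconds before the secondary is tried, by this article's reckoning."""
    return opts["attempts"] * opts["timeout"]
```

Fed the example config, parse_resolver_options returns timeout 2, attempts 2 and rotate enabled, and wait_before_secondary gives 4. The more aggressive options line gives 1, and a file with no options line gives the default 10.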

