Best Practice: Reverse DNS Zones

aaron

The general rule with Xsan, since version 1.1 or so, is this: DNS isn't necessary, but if you have some you had better make sure it is perfect. A few of us are beginning to suspect that DNS is, in fact, required, although in a very obscure way. It will take me several paragraphs to explain why, but let me get to the bottom line first:

Your DNS servers should include a zone for reverse lookups of your metadata (private) IP range. Ask your DNS administrator to create a reverse zone for this range, with SOA and NS records. PTR records aren't needed.

Read more to find out why.

I'm going to start with an overview of DNS. Skip down for the better stuff.

A review of DNS

The Domain Name System (DNS) translates names into IP numbers. Most people (a few of you excepted) find it easier to remember names (like "www.adic.com") than numbers (like "63.81.117.101"). An essential part of every network, a DNS server is expected to quickly and fully resolve names given to it, or at least to quickly return a negative response. This is accomplished via a tremendous distributed database that spreads all over the Internet, mostly hierarchically, from the 13 root name servers, through the servers for each top-level domain, down to individual servers for each organization or smaller unit.

A query to your DNS server goes like this (type "dig +trace www.adic.com" to follow along):

First, the DNS servers need to know the addresses of the root DNS servers. These are stored in a local file on the server.

.                       399215  IN      NS      G.ROOT-SERVERS.NET.
.                       399215  IN      NS      H.ROOT-SERVERS.NET.
.                       399215  IN      NS      I.ROOT-SERVERS.NET.
.                       399215  IN      NS      J.ROOT-SERVERS.NET.
.                       399215  IN      NS      K.ROOT-SERVERS.NET.
.                       399215  IN      NS      L.ROOT-SERVERS.NET.
.                       399215  IN      NS      M.ROOT-SERVERS.NET.
.                       399215  IN      NS      A.ROOT-SERVERS.NET.
.                       399215  IN      NS      B.ROOT-SERVERS.NET.
.                       399215  IN      NS      C.ROOT-SERVERS.NET.
.                       399215  IN      NS      D.ROOT-SERVERS.NET.
.                       399215  IN      NS      E.ROOT-SERVERS.NET.
.                       399215  IN      NS      F.ROOT-SERVERS.NET.
;; Received 388 bytes from 192.168.1.2#53(192.168.1.2) in 15 ms

The first dot (".") represents the DNS root. "NS" means the following name is the authoritative name server responsible for that (root) domain.

Next, the address www.adic.com is broken apart in reverse order. The DNS server looks to see who is responsible for names ending in ".com", by asking a random server from the above list:

com.                    172800  IN      NS      L.GTLD-SERVERS.NET.
com.                    172800  IN      NS      M.GTLD-SERVERS.NET.
com.                    172800  IN      NS      A.GTLD-SERVERS.NET.
com.                    172800  IN      NS      B.GTLD-SERVERS.NET.
com.                    172800  IN      NS      C.GTLD-SERVERS.NET.
com.                    172800  IN      NS      D.GTLD-SERVERS.NET.
com.                    172800  IN      NS      E.GTLD-SERVERS.NET.
com.                    172800  IN      NS      F.GTLD-SERVERS.NET.
com.                    172800  IN      NS      G.GTLD-SERVERS.NET.
com.                    172800  IN      NS      H.GTLD-SERVERS.NET.
com.                    172800  IN      NS      I.GTLD-SERVERS.NET.
com.                    172800  IN      NS      J.GTLD-SERVERS.NET.
com.                    172800  IN      NS      K.GTLD-SERVERS.NET.
;; Received 502 bytes from 192.112.36.4#53(G.ROOT-SERVERS.NET) in 81 ms

Next, we ask to find out the servers responsible for "adic.com":

adic.com.               172800  IN      NS      auth10.ns.wcom.com.
adic.com.               172800  IN      NS      auth20.ns.wcom.com.
adic.com.               172800  IN      NS      ns01hq.adic.com.
;; Received 149 bytes from 192.41.162.30#53(L.GTLD-SERVERS.NET) in 4312 ms

Finally, we ask one of these for the address ("A") record for "www.adic.com":

www.adic.com.           21600   IN      A       63.81.117.101
;; Received 204 bytes from 198.6.100.21#53(auth10.ns.wcom.com) in 204 ms

All this needs to happen with every new name that your Mac encounters. Intelligently, both DNS servers and hosts cache the results, so repeated queries don't go through so much trouble.
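The walk above can be sketched in a few lines of Python. This only illustrates the order in which zones are consulted, not how a real resolver is implemented; the function name delegation_chain is made up for this example.

```python
def delegation_chain(name):
    """List the zones consulted, from the root down to the full name."""
    labels = name.rstrip(".").split(".")
    chain = ["."]  # every lookup starts at the root
    # Walk the labels right to left: com., adic.com., www.adic.com.
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain


if __name__ == "__main__":
    for zone in delegation_chain("www.adic.com"):
        print(zone)
```

Each entry corresponds to one block of the "dig +trace" output shown earlier.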

Reverse DNS?

DNS also handles queries in reverse, from an address to a name. The method for this is either brilliant or an ugly hack, depending on your state of mind.

So say you want to reverse lookup 63.81.117.101. The address is inverted as 101.117.81.63, then suffixed with the reverse domain, in-addr.arpa. (.arpa is a special top-level domain.) So the lookup is done on the unwieldy address 101.117.81.63.in-addr.arpa. (use "dig +trace 101.117.81.63.in-addr.arpa." or, more simply, "dig +trace -x 63.81.117.101".)

Again, the query begins with root servers:

.                       397910  IN      NS      A.ROOT-SERVERS.NET.
.                       397910  IN      NS      B.ROOT-SERVERS.NET.
.                       397910  IN      NS      C.ROOT-SERVERS.NET.
.                       397910  IN      NS      D.ROOT-SERVERS.NET.
.                       397910  IN      NS      E.ROOT-SERVERS.NET.
.                       397910  IN      NS      F.ROOT-SERVERS.NET.
.                       397910  IN      NS      G.ROOT-SERVERS.NET.
.                       397910  IN      NS      H.ROOT-SERVERS.NET.
.                       397910  IN      NS      I.ROOT-SERVERS.NET.
.                       397910  IN      NS      J.ROOT-SERVERS.NET.
.                       397910  IN      NS      K.ROOT-SERVERS.NET.
.                       397910  IN      NS      L.ROOT-SERVERS.NET.
.                       397910  IN      NS      M.ROOT-SERVERS.NET.
;; Received 420 bytes from 192.168.1.2#53(192.168.1.2) in 2 ms

It continues with the herby servers responsible for reverse lookups on 63.x.x.x:

63.in-addr.arpa.        86400   IN      NS      chia.ARIN.NET.
63.in-addr.arpa.        86400   IN      NS      dill.ARIN.NET.
63.in-addr.arpa.        86400   IN      NS      BASIL.ARIN.NET.
63.in-addr.arpa.        86400   IN      NS      henna.ARIN.NET.
63.in-addr.arpa.        86400   IN      NS      indigo.ARIN.NET.
63.in-addr.arpa.        86400   IN      NS      epazote.ARIN.NET.
63.in-addr.arpa.        86400   IN      NS      figwort.ARIN.NET.
;; Received 195 bytes from 198.41.0.4#53(A.ROOT-SERVERS.NET) in 76 ms

Then with the second octet 63.81.x.x:

81.63.in-addr.arpa.     86400   IN      NS      AUTH03.NS.UU.NET.
81.63.in-addr.arpa.     86400   IN      NS      AUTH00.NS.UU.NET.
;; Received 95 bytes from 192.5.6.32#53(chia.ARIN.NET) in 4206 ms

Then 63.81.117.x:

117.81.63.in-addr.arpa. 21600   IN      NS      ns01hq.adic.com.
117.81.63.in-addr.arpa. 21600   IN      NS      auth02.ns.uu.net.
117.81.63.in-addr.arpa. 21600   IN      NS      auth60.ns.uu.net.
;; Received 124 bytes from 198.6.1.83#53(AUTH03.NS.UU.NET) in 74 ms

And finally with the record we want:

101.117.81.63.in-addr.arpa. 21600 IN    PTR     www.adic.com.
;; Received 558 bytes from 63.81.117.5#53(ns01hq.adic.com) in 146 ms

So the servers ns01hq.adic.com, auth02.ns.uu.net, and auth60.ns.uu.net handle reverse lookups for ADIC. Not surprisingly, these are the same servers that handle the forward lookups.
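The name inversion is easy to reproduce. Here is a small Python sketch using the standard ipaddress module; reverse_zones is a hypothetical helper written for this example, and it only handles IPv4 addresses.

```python
import ipaddress


def reverse_name(ip):
    """Return the in-addr.arpa name that a reverse lookup of ip queries."""
    return ipaddress.ip_address(ip).reverse_pointer + "."


def reverse_zones(ip):
    """List the reverse zones passed through, one per octet (IPv4 only)."""
    octets = ip.split(".")
    zones = []
    for n in range(1, len(octets) + 1):
        # Take the first n octets, reverse them, and add the suffix.
        zones.append(".".join(reversed(octets[:n])) + ".in-addr.arpa.")
    return zones


if __name__ == "__main__":
    print(reverse_name("63.81.117.101"))
    for zone in reverse_zones("63.81.117.101"):
        print(zone)
```

The list produced by reverse_zones matches the steps of the trace above: 63.in-addr.arpa., then 81.63.in-addr.arpa., and so on down to the PTR record's name.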

Reverse lookups on private IP ranges

Three IP ranges are reserved for non-routable intranets, and are therefore commonly used for the private metadata network on Xsans:

  • 10.0.0.0 - 10.255.255.255
  • 172.16.0.0 - 172.31.255.255
  • 192.168.0.0 - 192.168.255.255
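To check whether a given metadata address falls in one of these three ranges, a short Python sketch with the standard ipaddress module will do. The function name rfc1918_block is an assumption for this example.

```python
import ipaddress

# The three private ranges listed above, in CIDR notation.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def rfc1918_block(ip):
    """Return the private block containing ip as a string, or None."""
    addr = ipaddress.ip_address(ip)
    for block in RFC1918_BLOCKS:
        if addr in block:
            return str(block)
    return None


if __name__ == "__main__":
    for ip in ("10.1.1.1", "172.31.255.255", "192.168.1.1", "63.81.117.101"):
        print(ip, rfc1918_block(ip))
```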

So who is responsible for reverse lookups on these ranges?

baa:~ aaron$ dig +trace -x 10.1.1.1

; <<>> DiG 9.2.2 <<>> +trace -x 10.1.1.1
10.in-addr.arpa.        86400   IN      NS      BLACKHOLE-1.IANA.ORG.
10.in-addr.arpa.        86400   IN      NS      BLACKHOLE-2.IANA.ORG.
;; Received 99 bytes from 192.112.36.4#53(G.ROOT-SERVERS.NET) in 63 ms

10.in-addr.arpa.        604800  IN      SOA     prisoner.iana.org. hostmaster.root-servers.org. 2002040800 1800 900 604800 604800
;; Received 116 bytes from 192.175.48.6#53(BLACKHOLE-1.IANA.ORG) in 103 ms

What about for all 10.x.x.x addresses?

baa:~ aaron$ dig +short ns -x 10   
blackhole-2.iana.org.
blackhole-1.iana.org.

And the 192.168.x.x range?

baa:~ aaron$ dig +short ns -x 192.168
blackhole-1.iana.org.
blackhole-2.iana.org.

And even 172.16.x.x?

baa:~ aaron$ dig +short ns -x 172.16
blackhole-2.iana.org.
blackhole-1.iana.org.

What's that? Two servers handle the reverse DNS lookups for all possible private network ranges? $30 home routers are installed in just about every home in the U.S., at least, and almost all of them use an address in the 192.168.x.x range. And every time one of them decides to ask, "what's the name of my peer that just sent that request," the query is routed to one of those two servers.

I found an FAQ on the blackhole servers, and this interesting tidbit:

Q5: How busy are the blackhole servers?

A5: While rates vary, the blackhole servers generally answer thousands of queries per second. In the past couple of years the number of queries to the blackhole servers has increased dramatically. It is believed that the large majority of those queries occur because of "leakage" from intranets that are using the RFC 1918 private addresses. This can happen if the private intranet is internally using services that automatically do reverse queries, and the local DNS resolver needs to go outside the intranet to resolve these names. For well-configured intranets, this shouldn't happen. Users of private address space should have their local DNS configured to provide responses to inverse lookups in the private address space.

That last sentence is the one that matters here. Sure enough, trying out some queries yesterday, I got this response (slightly truncated):

aaron-g5:~ aaron$ dig +trace -x 192.168.1.1

; <<>> DiG 9.2.2 <<>> +trace -x 192.168.1.1

168.192.in-addr.arpa.   86400   IN      NS      blackhole-1.iana.org.
168.192.in-addr.arpa.   86400   IN      NS      blackhole-2.iana.org.
;; Received 102 bytes from 192.31.80.32#53(indigo.ARIN.NET) in 26 ms

;; connection timed out; no servers could be reached

Maybe the servers were too busy to handle my request.

What's this all got to do with Xsan?

I'd say close to 100% of the Xsans that use private metadata networks use one of the three ranges listed above. And many, probably close to most, of these SANs never bothered with DNS for the private network. I don't mean you need DNS servers on your private network; these probably wouldn't be used anyway. I mean adding an appropriate reverse zone (ending with ".in-addr.arpa") on your public network's existing DNS servers.

Now I can tell you for sure that the Xsan client and the MDCs are going to attempt reverse DNS lookups on the private network IPs. I don't know why -- maybe for logging, maybe for security, or maybe it is a bug. If the Xsan client gets a valid PTR response, great! If it gets a negative response, great! But if it gets no response, if there is a timeout, or if the PTR is incorrect, your SAN won't start.
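To see which of these outcomes a given machine gets, something like the following Python sketch can help. It is not how Xsan itself performs the lookup (I don't know that); it simply asks the system resolver through socket.gethostbyaddr and gives up after a deadline. The function name classify_reverse_lookup, the resolver parameter, and the two-second default are all assumptions for this example, and the sketch does not check whether a returned PTR name is the correct one.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def classify_reverse_lookup(ip, resolver=socket.gethostbyaddr, timeout=2.0):
    """Return ('name', hostname), ('negative', None) or ('timeout', None)."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(resolver, ip)
    try:
        hostname, _aliases, _addresses = future.result(timeout=timeout)
        return ("name", hostname)
    except FutureTimeout:
        # No answer within the deadline: the case that hurts the SAN.
        return ("timeout", None)
    except (socket.herror, socket.gaierror):
        # The resolver reported a failure. This is treated here as a
        # negative answer, although some resolver errors also land here.
        return ("negative", None)
    finally:
        # Do not block on a lookup that is still hanging.
        pool.shutdown(wait=False)


if __name__ == "__main__":
    print(classify_reverse_lookup("10.1.1.1"))
```

A quick "name" or "negative" result is what you want; a "timeout" suggests the query is leaking out toward servers that aren't answering.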

Put these two facts together, and you come to the uncomfortable but logical conclusion that nearly every Xsan in the world relies on the responsiveness of two obscure servers on the Internet, blackhole-1.iana.org and blackhole-2.iana.org.

The bottom line, once again

Get your SAN out of the blackhole! As the Blackhole FAQ states:

Users of private address space should have their local DNS configured to provide responses to inverse lookups in the private address space.

The person who set up your DNS should be able to do this with no trouble. You won't need actual records for the hosts on the private range, just an NS (nameserver) record and an SOA (Start of Authority) record. The DNS server will then send quick negative responses to any queries, without forwarding requests to the Blackhole IANA servers.
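For a sense of how little is involved, here is a Python sketch that prints an empty reverse zone in the common BIND-style zone file format. Everything specific in it is an assumption for illustration: the server name ns1.company.com, the contact hostmaster.company.com, the serial number, and the timer values. Your DNS administrator will have their own conventions, and the zone still has to be declared in the server's configuration.

```python
import ipaddress


def reverse_zone_name(cidr):
    """Map an IPv4 /8, /16 or /24 network to its in-addr.arpa zone name."""
    net = ipaddress.ip_network(cidr)
    if net.version != 4 or net.prefixlen not in (8, 16, 24):
        raise ValueError("only IPv4 /8, /16 and /24 networks are handled")
    octets = str(net.network_address).split(".")[: net.prefixlen // 8]
    return ".".join(reversed(octets)) + ".in-addr.arpa"


def empty_reverse_zone(cidr, nameserver, contact, serial):
    """Return zone file text holding only an SOA and an NS record."""
    zone = reverse_zone_name(cidr)
    lines = [
        f"$ORIGIN {zone}.",
        "$TTL 86400",
        f"@ IN SOA {nameserver}. {contact}. (",
        f"    {serial} ; serial",
        "    3600 ; refresh (illustrative value)",
        "    900 ; retry (illustrative value)",
        "    604800 ; expire (illustrative value)",
        "    3600 ) ; negative-response TTL (illustrative value)",
        f"@ IN NS {nameserver}.",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical names; substitute your own server and contact.
    print(empty_reverse_zone("10.0.0.0/8", "ns1.company.com",
                             "hostmaster.company.com", 2006091801))
```

Note that 172.16.0.0/12 does not fall on an octet boundary, so the sketch rejects it; covering that range takes sixteen zones, 16.172.in-addr.arpa through 31.172.in-addr.arpa.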

How to add private records to your DNS server

By no means should you set up a new DNS server in an existing network environment without a lot of careful planning. There are standard options used in corporate environments that are not available in OS X Server Admin's GUI. Leave off those options, and you can easily screw the people you are trying to help.

If you are already using Mac OS X Server as a DNS server, then you are the one who needs to add the zone. There's no way in Server Admin to add an empty zone, but if you add forward records for a host or two on your private LAN, Server Admin will create the reverse zone.

I'd recommend adding DNS records for MDCs. You probably already have records that point to the public IP addresses of your MDCs. When adding records for the private IP addresses, make sure the names you use are different than the names that resolve to the public addresses. I recommend something like "mdc1-private.company.com".


[Figure: Private records in Server Admin's DNS module]

Two common client problems: All your SAN MDCs and clients will need the IP address of this server in their Network preference pane, under the primary (public) interface. And in the Network preference pane, never mix DNS servers you control with those you do not control. Every server listed there must return identical information for each query, or else you'll get intermittent incorrect responses.
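The "identical information" rule is easy to check once you have collected each server's answer to the same query (for example with "dig @server" by hand). A Python sketch of the comparison follows; it does no DNS queries itself, and the server addresses and record values in the example are hypothetical.

```python
def group_by_answer(answers):
    """Group servers by the set of records each one returned."""
    groups = {}
    for server, records in answers.items():
        groups.setdefault(frozenset(records), []).append(server)
    return groups


def servers_agree(answers):
    """True when every listed server returned the same set of records."""
    return len(group_by_answer(answers)) <= 1


if __name__ == "__main__":
    # Hypothetical results for a reverse lookup of 10.1.1.1.
    answers = {
        "192.168.1.2": {"mdc1-private.company.com."},
        "192.168.1.3": {"mdc1-private.company.com."},
        "203.0.113.53": set(),  # an outside server with no such record
    }
    print(servers_agree(answers))
    for records, servers in group_by_answer(answers).items():
        print(sorted(records), servers)
```

If servers_agree returns False, the group listing shows which servers are the odd ones out.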

And what's this got to do with iTunes?

Well, nothing. But it's beginning to be clear that the iTunes "issue" isn't so clear. It may have nothing to do with iTunes, too.

So what's happening? Perhaps the blackhole servers were experiencing problems last week. I looked for two days but found no reports like this. Or maybe something in the Mac OS changed to do reverse lookups more frequently.

My personal suspicion is that this issue has been responsible for many of the reports we've heard of in the last couple of weeks. The symptoms certainly sound indicative: unplug the public network (or remove DNS) and the SAN starts. The DNS fix was just the trick for me on a Labor Day 11pm "SAN Down" call. The corporate Internet was down, so the Blackhole servers weren't accessible.

I look forward to the comments, especially the "I tried everything you said but still have the same problem" ones. Best of luck!

Comments

9

Hey aaron, thank you for the great post, gave me a good reason to finally
subscribe to this forum and hopefully start contributing. So as a short
introduction: I work for an Apple reseller doing mostly Audio/Video/Xsan
related things here in the Netherlands.
One of the things that I still had on my 'have to test' list was that serious
hardware failover still gives some setups I'm involved with huge problems
(sort of like this post kind of stuff). Last week at IBC some of the Apple
people told me to run DNS on the private network as well, so thanks for the
long explanation.
One question just to make sure, do I create the DNS records on the public
network, even though they apply to the private? It seems to me from the
article that this is the case? Creating it on the private network would not
make sense in the failover scenario, since DNS would not be completely
available if it was for instance run on the primary MDC...

---
--inside the machine--

aaron

Correct -- your DNS servers should be on the public network. You can run them
on your MDCs if you are brave. And since DNS is so critical to a working SAN,
you should always have more than one DNS server, in a master-slave setup
(www.zytrax.com/books/dns/ch4/).

---
Aaron Freimark
http://www.tekserve.com/vcard/af.vcf

Aaron Freimark
CTO, Tekserve

Nathan

We had to go with the /etc/hosts file solution in our environment, as we
have local DNS but getting anything altered on it is a political nightmare.
While it is not the most elegant of solutions, with ARD 3 it is very easy, and
it has provided us with the best performance we have had since we installed
the SAN nearly two years ago.

Prior to the update we were having slow Xsan Admin performance, and
starting on September 4th our SAN became nearly unusable. Apple
recommended an upgrade to Xsan 1.4 which of course did not solve the issue
and after days of troubleshooting the only fix that worked was the hosts
file...and immediately following reboot all of our systems were working
flawlessly (that was a long week).

Another DNS issue to check: make sure the machine's hostconfig file contains
-Automatic- rather than an FQDN. In early Xsan installations under
10.2 we were instructed to change the hostconfig file...now that causes a
servermgrd communication error as well as constant reverse lookup errors in
the system log even though the hostname is correct.

Not sure what changed on September 4th, as we didn't do any major updates,
but our DNS team may have made a minor change and not notified us, but
the hosts file solution solved everything and now the system is more
responsive than ever.

ibgarrett

Great article, and I've never even thought to do that for the Xsan. I am
wondering however, couldn't I just enter in that information into the hosts file
(assuming it's a small network of xserves/xsan) rather than mess around with
the dns portion of things?

Thanks,

Brian
brian@garrett.net

aaron

With quick testing, it seems you can use /etc/hosts. I'll let someone else post
instructions. (I prefer doing this in DNS, since /etc/hosts is easily forgotten
during upgrades, moves, etc.)

---
Aaron Freimark
http://www.tekserve.com/vcard/af.vcf

Aaron Freimark
CTO, Tekserve

MattG

You _could_ add entries to /etc/hosts - but it's not too elegant a solution, since all of the individual files would need to be modified if the SAN were expanded, etc.

---
Matt Geller
Meta Media™ Creative Technologies
Consulting & Integration | Proactive & Reactive Maintenance
Xsan Integration Specialists

Rupert Watson

Aaron

I have always created XSAN environments where the metadata travels on the 10.0.0.x network and the OD authentication is done over the 192.168.1.x network

MDC2 is the DNS server for this self-contained network and for the OD etc

In this situation there is no entry in the XSAN MDC's Network Sys Pref at all. Are you saying there should be?

What I do see is a propensity on _some_ of the clients to sometimes not "see" the SAN unless the public ethernet is unplugged and the client rebooted. This can be fine for weeks and then occur. Would that be helped by doing as you suggest here?

As I said, the issue does not seem to be the servers, but the clients. They have DNS entries that point to MDC2 and are authenticating against an OD server

aaron

That sounds like it could be the issue above. Read the article very
carefully, and add entries for your private IP addresses into your DNS server.
But note the following first:

  • Changes must be made to the DNS servers listed in the client
    Network preference pane. So MDC2's IP should be in the list.
  • Only the DNS servers listed in the primary ("public" or OD) interface of
    the Network preference pane are ever consulted. DNS servers listed in
    secondary interfaces are ignored.
  • Make sure all DNS servers you list in the Network preference pane return
    exactly the same answers for all queries. Adding corporate or ISP DNS server
    addresses to your locally managed ones doesn't provide more stability, only
    more intermittent problems.
  • You have a second MDC because you expect one to fail eventually. Do
    you have a second DNS server as well?

---
Aaron Freimark
http://www.tekserve.com/vcard/af.vcf

Aaron Freimark
CTO, Tekserve

esm

Hi:

I run local DNS servers and have for many years although I'm certainly no DNS
expert. I use a tool called DNS Enabler from Cutedge Systems (cutedge.com).
It's a $25 tool and makes setting up and configuring DNS easy enough for even
me to do. I run primary DNS on an Intel iMac and secondary on a Mac Mini. I'm
not affiliated with the company, just a very satisfied customer.

Eric