Monday, January 23, 2012

DNS and SOPA

In light of the brouhaha over SOPA and some recent DNS changes we made at Flex, I found myself in the position of explaining DNS to my family and others over the weekend -- and it occurred to me that DNS is one of those aspects of the Internet that very few lay people know about, and even those of us who work in technology every day have only a cursory understanding of it.

The various political perspectives on SOPA have been well covered.  I can't really add much to the debate and this is a technical blog, not a political blog.  With that in mind, I'll stop at pointing out that SOPA has a number of enforcement mechanisms subject to preliminary injunctions (meaning the claims against a web site don't have to be proven in court before the site's taken down), but one of the most alarming is the one related to DNS filtering.  Here's the relevant portion of the bill (H.R. 3261) in its current form:

A service provider shall take technically feasible and reasonable measures designed to prevent access by its subscribers located within the United States to the foreign infringing site (or portion thereof) that is subject to the order, including measures designed to prevent the domain name of the foreign infringing site (or portion thereof) from resolving to that domain name's Internet Protocol address.

This means creating a DNS blacklist.  Rather than expatiate on why this is or isn't a good idea, it might be better to explain how DNS works first and save conclusions for later.

IP Addresses

First off, the servers on the Internet -- and all the client machines that browse it -- are assigned unique numbers called IP addresses.  You've probably seen these before.  They look like this: 66.29.188.245.  They have four numbers separated by dots.  Each number is called an octet, because each number actually represents a byte, which is eight bits -- hence 'octet'.  Since each IP address has four octets and each octet is 8 bits, full IP addresses are 32 bits.  And since each bit can only have two discrete values (0 and 1) and 2 raised to the 32nd power is around 4.3 billion, that means there can only be about 4.3 billion distinct IP addresses.
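To make the arithmetic concrete, here's a minimal Python sketch that packs the four octets into a single 32-bit number.  The helper name dotted_to_int is my own invention for illustration; the standard library's ipaddress module does the same job and is used here only as a cross-check.

```python
import ipaddress

def dotted_to_int(address: str) -> int:
    """Combine the four octets of a dotted IPv4 address into one 32-bit number."""
    value = 0
    for octet in address.split("."):
        value = (value << 8) | int(octet)  # shift left one byte, then add the next octet
    return value

# The standard library agrees with the hand-rolled version.
assert dotted_to_int("66.29.188.245") == int(ipaddress.IPv4Address("66.29.188.245"))

print(dotted_to_int("66.29.188.245"))
print(2 ** 32)  # total number of possible 32-bit addresses
```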

Last time I checked there were roughly 7 billion people in the world, many of whom have devices that need IP addresses: laptops, cell phones, iPads, desktop computers, video game consoles - and increasingly, consumer electronics.  My Blu-Ray player and Roku box, for example, connect to the Internet, which means they also need IP addresses.  Then there's the server side of the equation: millions of servers in concrete bunkers, all of which also need their own unique addresses, as does every router, firewall, wi-fi hotspot and networking appliance in between.  A technique called Network Address Translation can be used to stretch the limits of scarce IP addresses, but the problem remains.

The telephone industry was hit with the same problem not long ago, when cell phones, pagers and fax machines started to tax the quantity of available phone numbers.  The phone companies responded by introducing more area codes.  The Internet world responded by introducing a new kind of IP address called IPv6.  (The old kind is called IPv4.)  IPv6 raises the number of bits from 32 to 128.  The total number of addresses in a 128-bit space is on the order of 10^38, a number so big that most of us have never even heard the word used for it (roughly 340 undecillion).  That should hold us for a while.
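If you want to check those numbers yourself, Python's integers are arbitrary precision, so the comparison is a few lines.  This is just the arithmetic from the paragraph above; the variable names are mine.

```python
ipv4_total = 2 ** 32
ipv6_total = 2 ** 128

print(f"IPv4: {ipv4_total:,}")
print(f"IPv6: {ipv6_total:,}")
print(f"IPv6 space is {ipv6_total // ipv4_total:,} times larger")  # that ratio is 2**96
print(f"IPv6 in scientific notation: {ipv6_total:.2e}")
```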

How all this relates to DNS is that IPv4 addresses are hard enough to remember.  Take a look at an IPv6 address: [2001:db8:85a3:8d3:1319:8a2e:370:7348]. Good luck memorizing that or writing a catchy jingle to drive customers to your web site.  Any marketing strategy that requires the consumer to understand hexadecimal might be doomed to failure.  Just sayin'.
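For the curious, here's a small sketch of how that address breaks down, using the ipaddress module from Python's standard library.  The address is the same example one shown above.

```python
import ipaddress

addr = ipaddress.IPv6Address("2001:db8:85a3:8d3:1319:8a2e:370:7348")

print(addr.exploded)     # every group padded out to four hex digits
print(addr.compressed)   # the short form, as written in the text
print(len(addr.packed))  # size in bytes; 16 bytes is 128 bits
```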


We Need Names


IP addresses govern how network devices are identified and how information moves between them.  But, we give web sites names like google.com because it's much easier to remember those names than IP addresses. 

We used to map these obscure addresses to server names using a text file called hosts or hosts.txt.  This file was (and is) a simple list of IP addresses mapped to the names by which they can be referred.  If you search your computer for this file, you'll find it hidden away somewhere.  Here's the one from my computer:
##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting.  Do not change this entry.
##
127.0.0.1       localhost
192.168.1.101   local-dev-db
255.255.255.255 broadcasthost


Each computer had a copy of this file.  Every computer still does in fact, though it's only used to override DNS these days.
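The format is simple enough that a parser fits in a dozen lines.  Here's a rough Python sketch; parse_hosts is a name I made up, and it assumes the first entry for a given name wins, which is how I'd expect most systems to behave.

```python
def parse_hosts(text: str) -> dict[str, str]:
    """Map each host name in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)  # assumption: the first entry for a name wins
    return table

# The same entries as the file shown above.
SAMPLE = """\
##
# Host Database
##
127.0.0.1       localhost
192.168.1.101   local-dev-db
255.255.255.255 broadcasthost
"""

hosts = parse_hosts(SAMPLE)
print(hosts["local-dev-db"])
```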

Domain Name System

Then along came DNS, or the Domain Name System, which is, in essence, a distributed database of IP addresses and their names.  When you type in a web address, like www.flexrentalsolutions.com, your computer must first determine the IP address to connect to.  It does this using a piece of software called a resolver.  The resolver first checks the hosts file on your computer.  If the domain name isn't found there, as it likely won't be, it refers the request to your DNS server.
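The lookup order described above can be sketched in a few lines of Python.  This is a toy model, not how any real resolver is written: the function names are mine, and the "DNS server" is a stand-in function so the example runs without a network.  (In real Python code you'd normally just call socket.getaddrinfo and let the operating system's resolver do all of this.)

```python
from typing import Callable

def resolve(name: str, hosts: dict[str, str], ask_dns_server: Callable[[str], str]) -> str:
    """Return the IP address for name: hosts file first, DNS server second."""
    if name in hosts:
        return hosts[name]       # local override, no network traffic needed
    return ask_dns_server(name)  # otherwise defer to the configured DNS server

def fake_dns_server(name: str) -> str:
    """Stand-in for a real DNS server, seeded with the address from this post."""
    answers = {"www.flexrentalsolutions.com": "66.29.188.245"}
    return answers[name]

hosts = {"localhost": "127.0.0.1"}
print(resolve("localhost", hosts, fake_dns_server))
print(resolve("www.flexrentalsolutions.com", hosts, fake_dns_server))
```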

You have a DNS server?  Yes, you do.  You have several.  Usually they're set by your ISP or corporate network using a networking protocol called DHCP.  DHCP is part of the handshake that puts your computer on the Internet when you first turn it on or connect to a wi-fi hotspot.  If you want to see what your DNS servers are, you can open a command shell on Windows and type ipconfig /all; on most Linux systems, you can look at the file /etc/resolv.conf.  On Mac OS, you can take a look at the Network applet in System Preferences.  Here's what mine looks like:


As you can see, my ISP, Comcast, has assigned me the DNS servers 75.75.75.75 and 75.75.76.76.  I could change these to any value that I like, but I'm not Kevin Mitnick and SOPA isn't law yet.
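On many Unix-like systems the configured DNS servers end up as nameserver lines in /etc/resolv.conf.  Here's a minimal Python sketch that pulls them out of text in that format.  The function name and the sample contents are mine; the sample simply reuses the two Comcast addresses above rather than reading a real file.

```python
def nameservers(resolv_conf_text: str) -> list[str]:
    """Extract the DNS server addresses from resolv.conf-style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers

# Illustrative contents only, not copied from a real machine.
SAMPLE = """\
# Illustrative contents, using the Comcast servers mentioned above
nameserver 75.75.75.75
nameserver 75.75.76.76
"""

print(nameservers(SAMPLE))
```

To try it against a real machine, pass in the contents of /etc/resolv.conf instead of SAMPLE.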

So, in my case, if I type www.flexrentalsolutions.com into my web browser, my computer will contact 75.75.75.75 or 75.75.76.76 and ask these servers for the correct IP address.  Does this mean that 75.75.75.75 and 75.75.76.76 know the correct IP address?  Maybe, but probably not.  They do know how to get it, however.

Because DNS is a distributed database, none of the information exists in one place.  It's scattered all over the planet in caches and zones across thousands of servers.  Let's walk through a simple request and assume (wrongly) that Comcast's DNS servers cache nothing - not even top level domains.  To begin with, polling all those thousands of DNS servers until you find one with information is not a very practical idea.  There has to be a way to efficiently sift through them for the correct server, and that information is the basis of the domain name system itself.  It's the reason we have .com and .org and .net.  These are called top level domains.
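The trick is that the name itself encodes the search path, read from right to left.  Here's a small Python sketch of that idea.  The function name is mine, and it simply lists every suffix of the name; in practice not every suffix is necessarily served by a separate set of servers.

```python
def suffixes_right_to_left(hostname: str) -> list[str]:
    """List the successively longer suffixes of a name, starting at the root."""
    labels = hostname.rstrip(".").split(".")
    steps = ["."]  # the root, which sits above every top level domain
    for i in range(len(labels) - 1, -1, -1):
        steps.append(".".join(labels[i:]))
    return steps

print(suffixes_right_to_left("www.flexrentalsolutions.com"))
```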

When my www.flexrentalsolutions.com request comes in, if the DNS server has no cached information, it will first have to determine which server has information about .com domains.  It must ask a root server - and there are only 13 of them (13 named server addresses, lettered a through m, each backed in practice by many machines).  The root zone they serve is administered by ICANN under a contract with the Department of Commerce.

The root server will return a set of NS (name server) records: a list of DNS servers that can be consulted for more information about .com domains.  Using a tool called nslookup on my Mac, here's what I got when I queried the root DNS server operated by the US Army Research Lab for www.flexrentalsolutions.com:

QUESTIONS:
    www.flexrentalsolutions.com, type = A, class = IN
    ANSWERS:
    AUTHORITY RECORDS:
    ->  com
    nameserver = f.gtld-servers.net.
    ttl = 172800
    ->  com
    nameserver = b.gtld-servers.net.
    ttl = 172800
    ->  com
    nameserver = i.gtld-servers.net.
    ttl = 172800
    ->  com
    nameserver = d.gtld-servers.net.
    ttl = 172800
    ->  com
    nameserver = e.gtld-servers.net.
    ttl = 172800
    ->  com
    nameserver = m.gtld-servers.net.
    ttl = 172800
    ->  com
    nameserver = c.gtld-servers.net.
    ttl = 172800
    ->  com
    nameserver = k.gtld-servers.net.
    ttl = 172800
    ->  com
    nameserver = g.gtld-servers.net.
    ttl = 172800
    ->  com
    nameserver = h.gtld-servers.net.
    ttl = 172800
    ->  com
    nameserver = l.gtld-servers.net.
    ttl = 172800
    ->  com
    nameserver = a.gtld-servers.net.
    ttl = 172800
    ->  com
    nameserver = j.gtld-servers.net.
    ttl = 172800
    ADDITIONAL RECORDS:
    ->  a.gtld-servers.net
    internet address = 192.5.6.30
    ttl = 172800
    ->  b.gtld-servers.net
    internet address = 192.33.14.30
    ttl = 172800
    ->  c.gtld-servers.net
    internet address = 192.26.92.30
    ttl = 172800
    ->  d.gtld-servers.net
    internet address = 192.31.80.30
    ttl = 172800
    ->  e.gtld-servers.net
    internet address = 192.12.94.30
    ttl = 172800
    ->  f.gtld-servers.net
    internet address = 192.35.51.30
    ttl = 172800
    ->  g.gtld-servers.net
    internet address = 192.42.93.30
    ttl = 172800
    ->  h.gtld-servers.net
    internet address = 192.54.112.30
    ttl = 172800
    ->  i.gtld-servers.net
    internet address = 192.43.172.30
    ttl = 172800
    ->  j.gtld-servers.net
    internet address = 192.48.79.30
    ttl = 172800
    ->  k.gtld-servers.net
    internet address = 192.52.178.30
    ttl = 172800
    ->  l.gtld-servers.net
    internet address = 192.41.162.30
    ttl = 172800
    ->  m.gtld-servers.net
    internet address = 192.55.83.30
    ttl = 172800
    ->  a.gtld-servers.net
    has AAAA address 2001:503:a83e::2:30
    ttl = 172800
If I query the same root server just for com as opposed to the full web address, I get the same response, meaning that the root server just maintains a list of servers that are considered authoritative for the .com top level domain.  So we've now gone from the root server to the top level domain (TLD) servers for .com addresses.  I happen to know that these servers are operated by Verisign.  (For a complete list of operators for all Top Level Domains, check here.)  When you register a .com web site, you're essentially paying Verisign to add your domain to their servers or paying a third party like register.com or godaddy to do it for you.

Let's query one of these TLD servers for www.flexrentalsolutions.com and see what we get:
  QUESTIONS:
    www.flexrentalsolutions.com, type = A, class = IN
    ANSWERS:
    AUTHORITY RECORDS:
    ->  flexrentalsolutions.com
    nameserver = ns-756.awsdns-30.net.
    ttl = 172800
    ->  flexrentalsolutions.com
    nameserver = ns-421.awsdns-52.com.
    ttl = 172800
    ->  flexrentalsolutions.com
    nameserver = ns-1070.awsdns-05.org.
    ttl = 172800
    ->  flexrentalsolutions.com
    nameserver = ns-1830.awsdns-36.co.uk.
    ttl = 172800
    ADDITIONAL RECORDS:
    ->  ns-756.awsdns-30.net
    internet address = 205.251.194.244
    ttl = 172800
    ->  ns-421.awsdns-52.com
    internet address = 205.251.193.165
    ttl = 172800

We're getting warmer.  Verisign's servers can't give us the final answer, but they get us one step closer: they return the DNS servers that are authoritative for our domain.  Let's query one of the servers listed above for SOA records and see what we get:


    QUESTIONS:
    www.flexrentalsolutions.com, type = A, class = IN
    ANSWERS:
    AUTHORITY RECORDS:
    ->  flexrentalsolutions.com
    nameserver = ns-756.awsdns-30.net.
    ttl = 172800
    ->  flexrentalsolutions.com
    nameserver = ns-421.awsdns-52.com.
    ttl = 172800
    ->  flexrentalsolutions.com
    nameserver = ns-1070.awsdns-05.org.
    ttl = 172800
    ->  flexrentalsolutions.com
    nameserver = ns-1830.awsdns-36.co.uk.
    ttl = 172800
    ADDITIONAL RECORDS:
    ->  ns-756.awsdns-30.net
    internet address = 205.251.194.244
    ttl = 172800
    ->  ns-421.awsdns-52.com
    internet address = 205.251.193.165
    ttl = 172800
This tells us a few important details.  First, it tells us that we've reached the authoritative DNS server with official information on the flexrentalsolutions.com domain, including some information related to caching.  More on that later.  Now that we know we've reached the authoritative DNS server for our domain, let's find the IP address for www.flexrentalsolutions.com. 

------------
    QUESTIONS:
    www.flexrentalsolutions.com, type = A, class = IN
    ANSWERS:
    ->  www.flexrentalsolutions.com
    internet address = 66.29.188.245
    ttl = 3600
------------
Name:    www.flexrentalsolutions.com
Address: 66.29.188.245

So, there's our answer: 66.29.188.245.  We made it in four moves, which is much simpler than randomly scanning DNS servers until we find one with information.  In reality, when I query the Comcast DNS server, it does all these steps automatically.  This is called a recursive DNS request.
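Here's the same walk as a toy Python simulation.  Nothing here touches the network: the three "servers" are just dictionaries seeded with names and addresses from the transcripts above, the function names are mine, and the sketch skips the SOA check, so it gets there in three hops rather than four.

```python
# Each simulated server maps a domain suffix to either a referral or an answer.
# Names and addresses come from the transcripts above; the structure is a toy model.
SERVERS = {
    "root": {"com": ("referral", "a.gtld-servers.net")},
    "a.gtld-servers.net": {
        "flexrentalsolutions.com": ("referral", "ns-756.awsdns-30.net"),
    },
    "ns-756.awsdns-30.net": {
        "www.flexrentalsolutions.com": ("answer", "66.29.188.245"),
    },
}

def query(server: str, name: str) -> tuple[str, str]:
    """Return the most specific record the given server holds for name."""
    records = SERVERS[server]
    matches = [s for s in records if name == s or name.endswith("." + s)]
    if not matches:
        raise LookupError(f"{server} knows nothing about {name}")
    return records[max(matches, key=len)]

def resolve_by_referral(name: str, start: str = "root", max_hops: int = 10) -> tuple[str, list[str]]:
    """Follow referrals from server to server until one returns an address."""
    server, path = start, []
    for _ in range(max_hops):
        path.append(server)
        kind, value = query(server, name)
        if kind == "answer":
            return value, path
        server = value  # follow the referral to the next server
    raise LookupError(f"gave up on {name} after {max_hops} hops")

address, path = resolve_by_referral("www.flexrentalsolutions.com")
print(address, "via", " -> ".join(path))
```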

Caching

In practice it's unlikely that a request would go all the way to the root DNS servers, or even the top level domain servers, because DNS servers cache as much information as they can.   They also frequently bulk transfer information for big domains (like top level domains), which renders on demand caching unnecessary.

As you scan the output in the examples above, you'll notice that each record contains something called TTL.  This stands for Time To Live and is the amount of time, in seconds, that this information should be considered valid without needing to recheck it.  If you look at the records returned by the root server, they each have a TTL of 172800, which is 48 hours.  When a DNS server checks this information, it can save a copy and refer to the copy for up to 48 hours before checking the root server for updated information.

By the same token the top level domain server in our example returned NS records for flexrentalsolutions.com with a TTL of 172800, also 48 hours.  What this tells me as the administrator of the flexrentalsolutions.com domain is that if I ever change DNS providers, I'll need to keep the old one up for at least two days before shutting it down.

Then we come to the final result, the actual IP address record we were after in the first place.  It comes with a TTL of 3600 seconds, or one hour.  This only gives non-authoritative DNS servers permission to cache the result for 1 hour.  We use a low value like this to give us the ability to quickly change servers if we need to.
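A cache that honors TTLs is easy to sketch.  The Python below is a minimal illustration, not how any particular DNS server does it: the class name is mine, and the clock is injectable so the example can fast-forward time instead of waiting an hour.

```python
import time

class TtlCache:
    """Cache DNS answers and forget them once their TTL has elapsed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # name -> (address, expiry time)

    def put(self, name: str, address: str, ttl_seconds: int) -> None:
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name: str):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # stale: the caller must ask upstream again
            return None
        return address

# A fake clock lets us jump forward in time.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("www.flexrentalsolutions.com", "66.29.188.245", ttl_seconds=3600)

print(cache.get("www.flexrentalsolutions.com"))  # still fresh
now[0] = 3600.0
print(cache.get("www.flexrentalsolutions.com"))  # expired after one hour

print(172800 / 3600)  # the referral TTL, converted to hours
```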

DNS Tricks

For simple web sites like ours where there's only one server, returning the same IP address with a relatively long TTL like 3600 is fine.  Big companies like Google and Facebook, however, operate many thousands of servers in dozens of locations around the world, and they might need more exotic techniques.

In these cases, big companies might return a different answer for each DNS request, perhaps routing requests from the United States to US data centers and European requests to European data centers.  They could use traffic management algorithms to route traffic to data centers that have more unused capacity.  They could use a simple round robin DNS technique to randomly distribute load across multiple servers and data centers.  For these situations, it's not uncommon to see a TTL of 300 (5 minutes).
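Round robin is the simplest of these to sketch.  In the Python below, each query gets the next address from a pool, along with a short TTL.  Everything here is hypothetical: the class name is mine, the addresses come from the 192.0.2.0/24 range reserved for documentation, and this version rotates in strict order rather than picking at random.

```python
import itertools

class RoundRobinRecord:
    """Hand out a pool of addresses in rotation, one per query."""

    def __init__(self, addresses: list[str], ttl_seconds: int = 300):
        if not addresses:
            raise ValueError("need at least one address")
        self.ttl_seconds = ttl_seconds
        self._cycle = itertools.cycle(addresses)

    def answer(self) -> tuple[str, int]:
        """Return the next address in the rotation and the TTL to cache it for."""
        return next(self._cycle), self.ttl_seconds

# Hypothetical pool drawn from the 192.0.2.0/24 documentation range.
record = RoundRobinRecord(["192.0.2.10", "192.0.2.11", "192.0.2.12"])
for _ in range(4):
    print(record.answer())
```

The short TTL matters here: if downstream servers cached one answer for 48 hours, everyone behind that cache would pile onto the same address.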

As we roll out our High Availability architecture over the next year, you'll see our TTLs drop as well, though in our case it's for rapid failover in the event of a catastrophic data center event.


Hooks For SOPA Enforcement

Since DNS is required for web surfers to find web sites, we can't really live without it.  In order to wipe a .com web site off the internet within 48 hours, all you have to do is convince Verisign to remove it from each of their 13 TLD servers, say, with something like a preliminary injunction from a judge.  That's it.  In two days, unless users are using alternative DNS systems or DNS servers have ignored the TTL setting, your web site is invisible, even if your authoritative domain server is still up.  (Full disclosure: there are other ways a blacklist could and probably would be implemented.)
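One of those other ways is filtering at the service provider's own DNS servers, which is closer to what the bill text quoted earlier describes.  Here's a rough Python sketch of the idea.  It's purely illustrative: the function names are mine, example.org stands in for a blocked site, and the upstream lookup is a dictionary rather than a real DNS query.

```python
from typing import Callable, Optional

def is_blocked(name: str, blacklist: set[str]) -> bool:
    """True if name is a blacklisted domain or a subdomain of one."""
    labels = name.lower().rstrip(".").split(".")
    return any(".".join(labels[i:]) in blacklist for i in range(len(labels)))

def filtered_lookup(
    name: str,
    blacklist: set[str],
    upstream: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Resolve name unless it is blacklisted, in which case return None."""
    if is_blocked(name, blacklist):
        return None  # to the user, the name simply appears not to exist
    return upstream(name)

# Hypothetical data: example.org stands in for a blocked site.
blacklist = {"example.org"}
upstream = {
    "www.flexrentalsolutions.com": "66.29.188.245",
    "www.example.org": "192.0.2.1",
}.get

print(filtered_lookup("www.example.org", blacklist, upstream))
print(filtered_lookup("www.flexrentalsolutions.com", blacklist, upstream))
```

Note that the blocked site's servers are untouched; only the name lookup fails, which is why switching to a different DNS server sidesteps this kind of filter entirely.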

Even more interesting is the fact that 13 root servers control all officially recognized domains.  The same technique could be used to blacklist an entire TLD, effectively shutting down all .com domains within 48 hours.  Since changes to the root zone are ultimately approved by the United States Government (through the Department of Commerce's contract with ICANN), this gives them a convenient Internet kill switch.  If you see the TTLs on the root servers drop from 48 hours to something shorter, you can be reasonably sure they're thinking about doing it.


Alternative DNS Systems

Paranoia about the concentration of power in the hands of the US Government and the companies that have cut deals with ICANN has led to the development of alternative DNS systems, systems that have been developed expressly for the purpose of keeping blacklisted domains accessible.

After all, what keeps the authority of root DNS servers intact is the fact that most Internet service providers stick to the 13 official root servers.  There's nothing to prevent a private party from mirroring the root servers or any of the TLD servers and running their own DNS.  There's nothing to prevent you from finding out about one of those servers and manually changing your system to use it instead of the one suggested by your ISP.  There are also browser plugins that can do the same thing -- and these are much easier for non-technical types to figure out.

And in the wake of the SOPA/PIPA scare, these services and tools are multiplying rapidly, like bacteria on an agar plate.

Chaos Theory

One of the benefits of the current system of DNS and domain name registration is that it's secure (enough).  People who register domains have to identify themselves (sort of) and control of which DNS servers are authoritative for all .com domains rests in the hands of 13 servers run by a big and legally liable company.  Right now it's a pretty open, accessible (meaning it's cheap to register domain names) and secure process.

What happens when government meddling compels hundreds of competing DNS systems to spring up and web users to switch over to them?  And once that happens, what's to prevent a private DNS operator from maliciously changing DNS entries for popular web sites?  Or to prevent a hacker from messing with a poorly protected DNS server?  Maybe you get a different wellsfargo.com from the private DNS server than from the official system.  Sound insecure?  It is.  It also makes registering a domain name much harder - because there are more servers that need to be updated.

For big companies that use DNS tricks for traffic management, it would also make their life much harder, because private DNS providers would not have access to the data used to drive the traffic management algorithms.  In fact, private DNS could be an effective way to intentionally overload and crash big web sites.

This is why companies like Google insist that SOPA will make the Internet less secure, and they're right.  It will create a DNS free-for-all.  The only way to prevent that is to remove the incentives that will push users to a competing system - and find non-DNS ways to execute enforcement actions against black hat web sites.  Otherwise, pirates and those who download their wares will just go around the official DNS system, and break the internet for the rest of us while they're at it.
