cassandra-powered distributed dns

Highly Available DNS and Request Routing Using
Apache Cassandra

A Real-World Introduction to Cassandra's Data Structures + Python's pyCassa module

David Strauss / Founder + CTO / Pantheon Systems

Why another DNS server?

DNS servers either have no replication or require writing to a defined master server.

Exceptions (Active Directory and ApacheDS) require backing DNS with a heavyweight and annoying directory service like LDAP.

Critical DNS services need replication...

...to withstand DDoS attacks

...to maintain uptime when major regional links fail

The zone file formats in use are awful.

Maintaining data persistence and replication should not be the DNS server's problem.

Why Cassandra?

Easy cluster setup and management

Built-in replication and high availability

Multi-master: writers don't need to understand the replication topology.

Data model similarity to DNS

Eventual consistency isn't a problem

Not a perfect match, though:

Write scalability is overkill

High memory requirements

Demo break

Let's set up a basic, three-node
Cassandra cluster...

Creating the data model

Think in terms of a nested dictionary.

Design for eventual consistency.

Columns are the units (atoms) of replication.

Some columns may be replicated before others.

Column names are unique within each row or super column.

When possible, dissect objects into columns, keeping in mind that Cassandra may replicate those columns in any order.

Design for common read/write patterns.

The ability to run arbitrary queries is limited.
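The point about columns replicating in any order can be illustrated with a small, purely in-memory sketch. It assumes a simplified model in which each column carries a write timestamp and the newest write wins per column; the names `merge_rows`, `update_1`, and `update_2` are hypothetical, and real Cassandra adds details (tombstones, tie-breaking) that are left out here.

```python
from typing import Dict, Tuple

# A column is modeled as (value, timestamp); a row maps column name -> column.
Column = Tuple[str, int]
Row = Dict[str, Column]


def merge_rows(local: Row, incoming: Row) -> Row:
    """Merge two replicas of a row, keeping the newest write per column."""
    merged = dict(local)
    for name, (value, timestamp) in incoming.items():
        # An incoming column replaces the local copy only if it is newer.
        if name not in merged or timestamp > merged[name][1]:
            merged[name] = (value, timestamp)
    return merged


# Two updates to the same row, written at different times.
update_1 = {"192.168.0.1": ('{"ttl": 86400}', 1)}
update_2 = {
    "192.168.0.1": ('{"ttl": 300}', 2),
    "192.168.0.2": ('{"ttl": 300}', 2),
}

# The replica ends up with the same row regardless of arrival order.
in_order = merge_rows(merge_rows({}, update_1), update_2)
reversed_order = merge_rows(merge_rows({}, update_2), update_1)
```

Because each column is resolved on its own, a data model that puts one independent fact in each column tolerates out-of-order delivery without extra coordination.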

My initial data model

I started with a normal Column Family:

names (Column Family)

Key: fully qualified domain name (FQDN)

Columns

Name: Record type (A, AAAA, MX, ...)

Value: All data (addresses, TTL, etc.) as JSON

Efficient for lookups for a type or ANY

But: All records of one type must be replaced at once.

Cassandra keeps the latest column written.

Can't rely on reading, modifying, then writing
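A short in-memory sketch of why read-modify-write is unsafe in this first model. It assumes, for illustration only, that the JSON value of the "A" column is a plain list of addresses (the slides do not show the exact layout); `add_address_blob` is a hypothetical helper.

```python
import json


def add_address_blob(stored_json, new_address):
    """Read-modify-write helper for the one-JSON-blob-per-type model."""
    records = json.loads(stored_json)
    records.append(new_address)
    return json.dumps(records)


# Current value of the "A" column for one FQDN.
stored = json.dumps(["192.168.0.1"])

# Two writers read the same starting value before either one writes.
writer_1 = add_address_blob(stored, "192.168.0.2")
writer_2 = add_address_blob(stored, "192.168.0.3")

# Each writer replaces the whole column, so the later write wins outright.
final = json.loads(writer_2)
```

The address added by the first writer is silently lost, because the column is the smallest unit Cassandra reconciles.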

Data Model

Then, I dissected records into sub-columns:

names (Super Column Family)

Key: fully qualified domain name (FQDN)

Super Columns

Name: Record Type (A, AAAA, MX, ...)

Sub-Columns

Name: Data (e.g. IP address)

Value: Metadata as JSON (TTL, preference)

Still efficient for lookups for a type or ANY

Using data as sub-column name results in keeping the latest metadata for any record.
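The effect of using the record data as the sub-column name can be sketched with a plain nested dictionary standing in for one row. `insert_record` is a hypothetical helper, not part of the project code.

```python
import json


def insert_record(row, record_type, data, ttl):
    """Write one sub-column; only that (type, data) entry is touched."""
    row.setdefault(record_type, {})[data] = json.dumps({"ttl": ttl})


row = {}  # stands in for the row keyed by one FQDN

# Two independent writers each add a different A record.
insert_record(row, "A", "192.168.0.2", 86400)
insert_record(row, "A", "192.168.0.3", 86400)

# Re-inserting an existing address only refreshes its metadata.
insert_record(row, "A", "192.168.0.2", 300)

addresses = sorted(row["A"])
ttl_of_first = json.loads(row["A"]["192.168.0.2"])["ttl"]
```

Both addresses survive without either writer reading first, and the repeated insert leaves a single entry carrying the latest TTL.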

Visualizing as a dictionary

    {"test.example.com": {                  # Key
        "A": {                              # Super Column Name
            "192.168.0.1": {"ttl": 86400},  # Sub-Column Name: Sub-Column Value
            "192.168.0.2": {"ttl": 86400},
        },
        "MX": {                             # Super Column Name
            "mail.example.com": {"preference": 10, "ttl": 86400},
        },
    }}

Each sub-column value is stored in Cassandra as a JSON-encoded string.

Structuring the application

cassandranames.py + CassandraNames: DNS-centric Python API wrapping Cassandra

cassandranames-import.py: Shell-based import tool for BIND files

cassandranames-test.py: Python unit test to exercise the persistence

cassandradns.py + CassandraNamesResolver: Twisted-based DNS server using CassandraNames

Want to follow along with code?

Setup directions:
https://wiki.getpantheon.com/display/CONF/Cassandra+DNS+server+setup

Code on GitHub:
https://github.com/pantheon-systems/cassandra-dns/

Demo break

Let's clone the code down to two boxes on our demo cluster and run the test suite...

Schema setup

    import pycassa
    import pycassa.system_manager

    def install_schema(drop_first=False, rf=3):
        keyspace_name = "dns"
        sm = pycassa.system_manager.SystemManager("127.0.0.1:9160")

        # [snip the drop_first implementation]

        # Create the keyspace, then the "names" super column family
        # with UTF-8 keys, column names, and values.
        sm.create_keyspace(keyspace_name, replication_factor=rf)
        sm.create_column_family(
            keyspace_name, "names", super=True,
            key_validation_class=pycassa.system_manager.UTF8_TYPE,
            comparator_type=pycassa.system_manager.UTF8_TYPE,
            default_validation_class=pycassa.system_manager.UTF8_TYPE)

The CassandraNames class

    class CassandraNames:
        def __init__(self):
            self.pool = pycassa.connect("dns")

[rest on upcoming slides]

Adding new records

    def insert(self, fqdn, type, data, ttl=900, preference=None):
        # Connect to the ColumnFamily
        cf = pycassa.ColumnFamily(self.pool, "names")
        # Start the metadata with just a TTL
        metadata = {"ttl": int(ttl)}
        # Add in a preference if requested.
        if preference is not None:
            metadata["preference"] = int(preference)
        # Actually perform the insertion.
        cf.insert(fqdn, {str(type): {data: json.dumps(metadata)}})

Reading records

    def lookup(self, fqdn, type=ANY):
        cf = pycassa.ColumnFamily(self.pool, "names")
        try:
            columns = {}
            if type == ANY:
                # Pull all types of records.
                columns = dict(cf.get(fqdn))
            else:
                # Pull only one type of record.
                columns = {str(type): dict(cf.get(fqdn, super_column=str(type)))}

            # Convert the JSON metadata into valid Python data.
            [snip]
            return decoded_columns
        except pycassa.cassandra.ttypes.NotFoundException:
            # If no records exist for the FQDN or type,
            # fail gracefully.
            pass
        return {}
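The decoding step is snipped on the slide, so the following is only a guess at its shape, not the author's code. It assumes the input is the `{type: {data: json_string}}` structure built above; `decode_columns` is a hypothetical name.

```python
import json


def decode_columns(columns):
    """Turn {type: {data: json_string}} into {type: {data: dict}}."""
    decoded = {}
    for record_type, records in columns.items():
        # Each sub-column value is a JSON string holding the metadata.
        decoded[record_type] = {
            data: json.loads(metadata) for data, metadata in records.items()
        }
    return decoded


raw = {"MX": {"mail.example.com": '{"preference": 10, "ttl": 86400}'}}
decoded = decode_columns(raw)
```

After decoding, callers can read `decoded["MX"]["mail.example.com"]["ttl"]` directly instead of parsing JSON themselves.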

Deleting records

    def remove(self, fqdn, type=ANY, data=None):
        cf = pycassa.ColumnFamily(self.pool, "names")
        if type == ANY:
            # Delete all records for the FQDN.
            cf.remove(fqdn)
        elif data is None:
            # Delete all records of a certain type from the FQDN.
            cf.remove(fqdn, super_column=str(type))
        else:
            # Delete all records for a certain type and data.
            cf.remove(fqdn, super_column=str(type), columns=[data])
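To experiment with the three levels of deletion without a running cluster, a small in-memory stand-in can mimic the `insert`, `get`, and `remove` calls used on these slides. This is a simplified sketch: `FakeSuperColumnFamily` and `NotFound` are hypothetical names, and the class ignores timestamps, consistency levels, and everything else a real column family does.

```python
class NotFound(Exception):
    """Hypothetical stand-in for pycassa's NotFoundException."""


class FakeSuperColumnFamily:
    """In-memory stand-in exposing the insert/get/remove calls used above."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, columns):
        # Merge sub-columns into the row instead of replacing the row.
        row = self.rows.setdefault(key, {})
        for super_column, sub_columns in columns.items():
            row.setdefault(super_column, {}).update(sub_columns)

    def get(self, key, super_column=None):
        row = self.rows.get(key, {})
        result = row if super_column is None else row.get(super_column, {})
        if not result:
            raise NotFound(key)
        return dict(result)

    def remove(self, key, columns=None, super_column=None):
        row = self.rows.get(key, {})
        if super_column is None:
            self.rows.pop(key, None)        # whole FQDN
        elif columns is None:
            row.pop(super_column, None)     # one record type
        else:
            for name in columns:            # individual records
                row.get(super_column, {}).pop(name, None)


cf = FakeSuperColumnFamily()
cf.insert("test.example.com", {"A": {"192.168.0.1": "{}", "192.168.0.2": "{}"}})
cf.insert("test.example.com", {"MX": {"mail.example.com": "{}"}})

# Remove a single A record, then the whole MX type.
cf.remove("test.example.com", super_column="A", columns=["192.168.0.1"])
remaining_a = cf.get("test.example.com", super_column="A")

cf.remove("test.example.com", super_column="MX")
remaining_types = sorted(cf.get("test.example.com"))
```

A stand-in like this is only useful for exercising the calling logic; behavior that depends on replication still needs the real cluster.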

Making it actually serve DNS

    class CassandraNamesResolver(common.ResolverBase):
        implements(interfaces.IResolver)

        def __init__(self):
            self.names = cassandranames.CassandraNames()
            common.ResolverBase.__init__(self)

        def _lookup(self, name, cls, type, timeout):
            log.msg("Type %s records for name: %s" % (type, name))
            all_types = self.names.lookup(name, type)

            results = []
            authority = []
            additional = []

[continued on next slide]

Python's Twisted includes a complete DNS server implementation
with a pluggable resolver base (IResolver and common.ResolverBase).

Making it actually serve DNS

    def _lookup(self, name, cls, type, timeout):
        [function started on previous slide]
        for type, records in all_types.items():
            for data, metadata in records.items():
                if type == A:
                    payload = dns.Record_A(data)
                elif type == MX:
                    payload = dns.Record_MX(metadata["preference"], data)
                elif type == NS:
                    payload = dns.Record_NS(data)
                header = dns.RRHeader(name, type=type, payload=payload,
                                      ttl=metadata["ttl"], auth=True)
                results.append(header)

        return defer.succeed((results, authority, additional))
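In the loop as written, a stored record type other than A, MX, or NS leaves `payload` unset or still holding the value from the previous iteration. One possible way to guard against that is a dispatch table that skips unsupported types. The sketch below uses plain tuples as stand-in payloads and string type names for readability; the builder functions and `build_payloads` are hypothetical, and a real resolver would have them return `dns.Record_A`, `dns.Record_MX`, and `dns.Record_NS` objects instead.

```python
# Hypothetical payload builders; tuples stand in for Twisted record objects.
def build_a(data, metadata):
    return ("A", data)


def build_mx(data, metadata):
    return ("MX", metadata["preference"], data)


def build_ns(data, metadata):
    return ("NS", data)


PAYLOAD_BUILDERS = {"A": build_a, "MX": build_mx, "NS": build_ns}


def build_payloads(all_types):
    """Build payloads for supported types and skip everything else."""
    payloads = []
    for record_type, records in all_types.items():
        builder = PAYLOAD_BUILDERS.get(record_type)
        if builder is None:
            # Unsupported type (e.g. CNAME): skip it rather than reuse a
            # payload left over from an earlier iteration.
            continue
        for data, metadata in records.items():
            payloads.append(builder(data, metadata))
    return payloads


records = {
    "A": {"192.168.0.1": {"ttl": 86400}},
    "CNAME": {"www.example.com": {"ttl": 86400}},
    "MX": {"mail.example.com": {"preference": 10, "ttl": 86400}},
}
payloads = build_payloads(records)
```

Adding support for a new record type then means registering one more builder rather than extending the `if`/`elif` chain.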

Demo break

Let's actually play with the cluster:

Query the records left around by the test suite

Use the Python shell to manage records

Import a BIND zone file on one server

Query the imported records on a different server

Next steps

Properly firewall the cluster

Cassandra needs port 7000 for replication with other cluster servers.

Port 53 needs to be open for DNS requests.

Accelerate DNS by fronting each server
with a djbdns cache

Finish the CNAME implementation
(and other record types)

Consider a non-blocking library, like txCQL

GeoDNS using a Python GeoIP library

Conclusion

Questions?

Questions for later?

I'm David Strauss (@davidstrauss)

Setup directions:
https://wiki.getpantheon.com/display/CONF/Cassandra+DNS+server+setup

Code on GitHub:
https://github.com/pantheon-systems/cassandra-dns/

Pantheon Systems is hiring engineers and developers in the San Francisco Bay Area
