Bringing the Unix Philosophy to Big Data

DESCRIPTION

My presentation from FutureStack 13. Video is here: http://www.youtube.com/watch?v=S0mviKhVmBI

TRANSCRIPT

Page 1: Bringing the Unix Philosophy to Big Data

Bringing the Unix Philosophy to Big Data

SVP, Engineering

[email protected]

Bryan Cantrill

@bcantrill

Page 2: Bringing the Unix Philosophy to Big Data

Unix

• When Unix appeared in the early 1970s, it was not just a new system, but a new way of thinking about systems

• Instead of a sealed monolith, the operating system was a collection of small, easily understood programs

• First Edition Unix (1971) contained many programs that we still use today (ls, rm, cat, mv)

• Its very name conveyed this minimalist aesthetic: Unix is a homophone of “eunuchs” — a castrated Multics. “We were a bit oppressed by the big system mentality. Ken wanted to do something simple.” — Dennis Ritchie

Page 3: Bringing the Unix Philosophy to Big Data

Unix: Let there be light

• In 1969, Doug McIlroy had the idea of connecting different components: “At the same time that Thompson and Ritchie were sketching out a file system, I was sketching out how to do data processing on the blackboard by connecting together cascades of processes.”

• This was the primordial pipe, but it took three years to persuade Thompson to adopt it: “And one day I came up with a syntax for the shell that went along with the piping, and Ken said, ‘I’m going to do it!’”

Page 4: Bringing the Unix Philosophy to Big Data

Unix: ...and there was light

And the next morning we had this orgy of one-liners. — Doug McIlroy
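The kind of one-liner this unlocked is easy to sketch (this particular pipeline is illustrative, not from the talk): counting login sessions per user by composing four small programs over a text stream:

    who | awk '{ print $1 }' | sort | uniq -c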

Page 5: Bringing the Unix Philosophy to Big Data

The Unix philosophy

• The pipe — coupled with the small-system aesthetic — gave rise to the Unix philosophy, as articulated by Doug McIlroy:

• Write programs that do one thing and do it well

• Write programs to work together

• Write programs that handle text streams, because that is a universal interface

• Four decades later, this philosophy remains the single most important revolution in software systems thinking!

Page 6: Bringing the Unix Philosophy to Big Data

Doug McIlroy v. Don Knuth: FIGHT!

• In 1986, Jon Bentley posed the challenge that became the Epic Rap Battle of computer science history: “Read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies.”

• Don Knuth’s solution: an elaborate program in WEB, a Pascal-like literate programming system of his own invention, using a purpose-built algorithm

• Doug McIlroy’s solution shows the power of the Unix philosophy:

tr -cs A-Za-z '\n' | tr A-Z a-z | \
sort | uniq -c | sort -rn | sed ${1}q
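For concreteness, binding ${1} to 10 and feeding a file on standard input yields the ten most frequent words (the input file name is hypothetical):

    tr -cs A-Za-z '\n' < input.txt | tr A-Z a-z | \
      sort | uniq -c | sort -rn | sed 10q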

Page 7: Bringing the Unix Philosophy to Big Data

Big Data: History repeats itself?

• The original Google MapReduce paper (Dean et al., OSDI ’04) poses a problem disturbingly similar to Bentley’s challenge nearly two decades prior: “Count of URL Access Frequency: The map function processes logs of web page requests and outputs ⟨URL, 1⟩. The reduce function adds together all values for the same URL and emits a ⟨URL, total count⟩ pair.”

• But the solutions do not adhere to the Unix philosophy...

• ...and nor do they make use of the substantial Unix foundation for data processing

• e.g., Appendix A of the OSDI ’04 paper has a 71-line word count in C++ — with nary a wc in sight (a Unix-idiom sketch of the paper’s own URL-frequency example follows)
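By way of contrast, here is a sketch of the paper’s “Count of URL Access Frequency” task in the Unix idiom, assuming an access log in which the requested URL is the seventh whitespace-delimited field (both the log format and the file name are assumptions):

    # emit ⟨count, URL⟩ pairs, most requested first
    awk '{ print $7 }' access.log | sort | uniq -c | sort -rn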

Page 8: Bringing the Unix Philosophy to Big Data

Big Data: Challenges

• Must be able to scale storage to allow for “big data” — quantities of data that dwarf a single machine

• Must allow for massively parallel execution

• Must allow for multi-tenancy

• To make use of both the Unix philosophy and its toolset, must be able to virtualize the operating system

Page 9: Bringing the Unix Philosophy to Big Data

Scaling storage

• There are essentially three protocols for scalable storage: block, file and object

• Block (i.e., a SAN) is far too low an abstraction — and notoriously expensive to scale

• File (i.e., NAS) is too permissive an abstraction — it implies a coherent store for arbitrary (partial) writes, trying (and failing) to be both C and A in CAP

• Object (e.g., S3) is similar “enough” to a file-based abstraction, but by not allowing partial writes, allows for proper CAP tradeoffs (the contrast in write semantics is sketched below)
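The difference in write semantics can be made concrete (paths and URLs are hypothetical):

    # file (NAS) semantics: an arbitrary partial write into an existing file
    dd if=patch.bin of=/nas/data.bin bs=1 seek=4096 conv=notrunc

    # object semantics: the only write is a whole-object PUT
    curl -X PUT --data-binary @data.bin https://store.example.com/bucket/data.bin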

Page 10: Bringing the Unix Philosophy to Big Data

Object storage

• Object storage systems do not allow for partial updates

• For both durability and availability, objects are generally erasure encoded across spindles on different nodes

• A different approach is to have a highly reliable local file system that erasure encodes across local spindles — with entire objects duplicated across nodes for availability

• ZFS pioneered both the reliability and efficiency of this model with RAID-Z — and has refined it over the past decade of production use (a sketch of the tooling follows this list)

• ZFS is one of the four foundational technologies in Joyent’s open source SmartOS
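A minimal sketch of that local model with the standard ZFS tooling (pool and device names are hypothetical):

    # erasure encode across four local spindles; any one can fail
    # without data loss
    zpool create tank raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0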

Page 11: Bringing the Unix Philosophy to Big Data

Virtualizing the operating system?

• Historically — since the 1960s — systems have been virtualized at the level of hardware

• Hardware virtualization has its advantages, but it’s heavyweight: operating systems are not designed to share resources like DRAM, CPU, I/O devices, etc.

• One can instead virtualize at the level of the operating system: a single OS kernel that creates lightweight containers — on the metal, but securely partitioned

• Pioneered by BSD’s jails; taken to a logical extreme by zones found in Joyent’s SmartOS (a sketch of the zones tooling follows)
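A minimal sketch using the illumos zones tooling that SmartOS builds on (the zone name and path are hypothetical; SmartOS layers its own management tooling atop these commands):

    # define, install, and boot a lightweight container
    zonecfg -z worker01 "create; set zonepath=/zones/worker01; commit"
    zoneadm -z worker01 install
    zoneadm -z worker01 boot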

Page 12: Bringing the Unix Philosophy to Big Data

Idea: ZFS + Zones?

• Can we combine the efficiency and reliability of ZFS with the abstraction provided by zones to develop an object store that has compute as a first-class citizen?

• ZFS rollback allows for zones to be trashed — simply roll back the zone after compute completes on an object (see the sketch after this list)

• Add a job scheduling system that allows for both map and reduce phases of distributed work

• Would allow for the Unix toolset to be used on arbitrarily large amounts of data — unlocking big data one-liners

• If it perhaps seems obvious now, it wasn’t at the time...
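A sketch of that reset cycle with stock ZFS and zones commands (dataset and zone names are hypothetical):

    # once, at setup: snapshot the zone's pristine dataset
    zfs snapshot zones/worker01@pristine

    # after each tenant computation: halt the zone, discard everything
    # it wrote, and boot it fresh for the next job
    zoneadm -z worker01 halt
    zfs rollback zones/worker01@pristine
    zoneadm -z worker01 boot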

Page 13: Bringing the Unix Philosophy to Big Data

Idea: ZFS + Zones?

Page 14: Bringing the Unix Philosophy to Big Data

Manta: ZFS + Zones!

• On top of ZFS and zones, we have built Manta: a sophisticated distributed system offering internet-facing object storage with in situ compute

• That is, the description of compute can be brought to where objects reside instead of having to backhaul objects to transient compute

• The abstractions made available for computation are anything that can run on the OS...

• ...and as a reminder, the OS — Unix — was built around the notion of ad hoc unstructured data processing, and allows for remarkably terse expressions of computation

Page 15: Bringing the Unix Philosophy to Big Data

Manta: Unix for Big Data

• Manta allows for an arbitrarily scalable variant of McIlroy’s solution to Bentley’s challenge:

mfind -t o /bcantrill/public/v7/usr/man | \
  mjob create -o -m "tr -cs A-Za-z '\n' | \
    tr A-Z a-z | sort | uniq -c" -r \
    "awk '{ x[\$2] += \$1 } END { for (w in x) { print x[w] \" \" w } }' | \
    sort -rn | sed ${1}q"
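To unpack the one-liner: mfind enumerates the input objects (here, the Seventh Edition manual pages); the -m argument to mjob create is the map phase, run in situ against each object; the -r argument is the reduce phase, run over the concatenated map outputs; and -o waits for the job and prints its result.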

• This description is not only terse, it is high-performing: data is left at rest — with the “map” phase doing heavy reduction of the data stream

• As such, Manta — like Unix — is not merely syntactic sugar; it converges compute and data in a new way

Page 16: Bringing the Unix Philosophy to Big Data

Manta: CAP tradeoffs

• Eventual consistency represents the wrong CAP tradeoffs for most; we prefer consistency over availability for writes (but still availability for reads)

• Many more details: http://dtrace.org/blogs/dap/2013/07/03/fault-tolerance-in-manta/

• Celebrity endorsement: (tweet image on the original slide; see the video)

Page 17: Bringing the Unix Philosophy to Big Data

Manta: Other design principles

• Hierarchical storage is an excellent idea (ht: Multics); Manta implements proper directories, delimited with a forward slash

• Manta implements a snapshot/link hybrid dubbed a snaplink; can be used to effect versioning

• Manta has full support for CORS headers

• Manta uses SSH-based HTTP auth for client-side tooling (IETF draft-cavage-http-signatures-00)

• Manta SDKs exist for node.js, Java, Ruby, Python

• “npm install manta” installs the command-line interface (a sketch of typical usage follows)
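A hypothetical session with that CLI (paths and file names are illustrative):

    # store an object, version it with a snaplink, and read it back
    mput -f report.txt /bcantrill/stor/report.txt
    mln /bcantrill/stor/report.txt /bcantrill/stor/report.v1.txt
    mget /bcantrill/stor/report.v1.txt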

Page 18: Bringing the Unix Philosophy to Big Data

Manta and the future of big data

• We believe compute/data convergence to be the future of big data: stores of record must support computation as a first-class, in situ operation

• We believe that Unix is a natural way of expressing this computation — and that the OS is the right level at which to virtualize to support this securely

• We believe that ZFS is the only sane storage substrate for such a system

• Manta will surely not be the only system to represent the confluence of these — but it is the first

• We are actively retooling our software stack in terms of Manta — Manta is changing the way we develop software!

Page 19: Bringing the Unix Philosophy to Big Data

Manta: More information

• Product page:

http://joyent.com/products/manta

• node.js module:

https://github.com/joyent/node-manta

• Manta documentation:

http://apidocs.joyent.com/manta/

• IRC, e-mail, Twitter, etc.:

#manta on freenode, [email protected], @mcavage, @dapsays, @yunongx, @joyent

• Here’s to the orgy of big data one-liners!
