hydroelectric dam - carnegie mellon university
TRANSCRIPT
FAWN: A Fast Array of Wimpy Nodes
David Andersen, Jason Franklin, Michael Kaminsky*,
Amar Phanishayee, Lawrence Tan, Vijay Vasudevan
Carnegie Mellon University        *Intel Labs Pittsburgh
Energy in computing
• Power is a significant burden on computing
• 3-year TCO soon to be dominated by power
Hydroelectric Dam
Can we reduce energy use by a factor of ten?
• Still serve the same workloads
• Avoid increasing capital costs
FAWN: Fast Array of Wimpy Nodes
Improve computational efficiency of data-intensive computing using an array of well-balanced low-power systems.
Understanding Data-Intensive Workloads on FAWN
Iulian Moraru, Lawrence Tan, Vijay Vasudevan (15-849 Low Power Project)
Abstract
In this paper, we explore the use of the Fast Array of Wimpy Nodes (FAWN) architecture for a wide class of data-intensive workloads. While the benefits of FAWN are highest for I/O-bound workloads where the CPU cycles of traditional machines are often wasted, we find that CPU-bound workloads run on FAWN can be up to six times more efficient in work done per Joule of energy than traditional machines.
1 Introduction
Power has become a dominating factor in the cost of provisioning and operating large datacenters. This work focuses on one promising approach to reducing both average and peak power: the Fast Array of Wimpy Nodes (FAWN) architecture [2], which proposes using a large cluster of low-power nodes instead of a cluster of traditional, high-power nodes. FAWN (Figure 1) was originally designed to target mostly I/O-bound workloads, where the additional processing capabilities of high-speed processors were often wasted. While the FAWN architecture has been shown to be significantly more energy efficient than traditional architectures for seek-bound workloads [16], an open question is whether this architecture is well-suited for other data-intensive workloads common in cluster-based computing.
Recent work has shown that the FAWN architecture benefits from fundamental trends in computing and power—running at a lower speed saves energy, while the low-power processors used in FAWN are significantly more efficient in work done per joule [16]. Combined with the inherent parallelism afforded by popular computing frameworks
Figure 1: FAWN architecture
such as Hadoop [1] and Dryad [9], a FAWN system's improvement in both CPU-I/O balance and instruction efficiency allows for an increase in overall energy efficiency for a much larger class of datacenter workloads.
In this work, we present a taxonomy and analysis of some primitive data-intensive workloads and operations to understand when FAWN can perform as well as a traditional cluster architecture and reduce energy consumption for data centers.
To understand where FAWN improves energy efficiency for data-intensive computing, we investigate a wide range of benchmarks common in frameworks such as Hadoop [1], finding that FAWN is between three and ten times more efficient than a traditional machine in performing operations such as distributed grep and sort. For more CPU-bound operations, such as encryption and compression, FAWN architectures are still between three and six times more energy efficient. We believe these two categories of workloads—CPU-bound and I/O-bound—encompass a large enough range of com-
AMD Geode
256MB DRAM
4GB CompactFlash
Goal: reduce peak power
Traditional Datacenter
[Diagram: power, cooling, and distribution incur ~20% energy loss ("good"); ~1000W drawn, 750W reaching the servers (100%). FAWN: ~100W servers.]
Overview
• Background
• FAWN Principles
• FAWN-KV Design
• Evaluation
• More Workloads
• Conclusion
Towards balanced systems
[Chart: disk seek time vs. CPU cycle time, in nanoseconds on a log scale (1E-01 to 1E+08), 1980-2005; the widening gap between the two represents wasted resources.]
Rebalancing Options
[Chart: today's CPUs paired with an array of the fastest disks; slower CPUs with fast storage; slow CPUs with today's disks; DRAM access shown for reference.]
Speed vs. Efficiency
• Fastest processors exhibit superlinear power usage
• Fixed power costs can dominate efficiency for slow processors
• FAWN targets sweet spot in system efficiency when including fixed costs
(Includes 0.1W power overhead)
Targeting the sweet-spot in efficiency
[Chart: today's CPU + array of fastest disks, slower CPU + fast storage, slow CPU + today's disk; the FAWN configuration is more efficient.]
Overview
• Background
• FAWN Principles
• FAWN-KV Design
  • Architecture
  • Constraints
• Evaluation
• More Workloads
• Conclusion
Data-intensive key value
• Critical infrastructure service
• Service level agreements for performance/latency
• Random-access, read-mostly, hard to cache
FAWN-KV: Key Value System
• Energy-efficient cluster key-value store
• Goal: improve Queries/Joule
• Prototype: Alix3c2 nodes with flash storage
• 500MHz CPU, 256MB DRAM, 4GB CompactFlash
Unique Challenges:
• Efficient and fast failover
• Wimpy CPUs, limited DRAM
• Flash poor at small random writes
FAWN-KV architecture
[Diagram: requests and responses flow through a switch to front-ends; each front-end manages back-ends, acts as a gateway, and routes requests to a ring of FAWN back-ends (FAWN-DS) using consistent hashing.]
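The routing step described above—front-ends mapping each key to the back-end that owns its range on the ring—can be sketched with a minimal consistent-hashing ring. This is a hypothetical illustration (one point per node; FAWN-KV actually assigns multiple virtual IDs per physical node):

```python
import hashlib
from bisect import bisect_right

class Ring:
    """Minimal consistent-hashing ring: a key is owned by the first
    node clockwise from the key's hash position."""
    def __init__(self, nodes):
        # Place each node on the ring by hashing its name.
        self.points = sorted((self._h(n), n) for n in nodes)

    @staticmethod
    def _h(x):
        return int.from_bytes(hashlib.sha1(x.encode()).digest()[:8], "big")

    def owner(self, key):
        positions = [p for p, _ in self.points]
        i = bisect_right(positions, self._h(key)) % len(self.points)
        return self.points[i][1]

ring = Ring(["backend-A", "backend-B", "backend-C"])
# Adding a node remaps only the keys in the range it takes over;
# all other key-to-owner assignments are unchanged.
```

This is why node additions and failures require transferring only a single key range rather than reshuffling the whole store.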
FAWN-KV architecture
[Diagram repeated, annotated with design constraints: wimpy nodes have limited resources; FAWN-DS must avoid random writes; FAWN-KV must provide efficient failover while also avoiding random writes.]
From key to value
Figure 2: (a) FAWN-DS appends writes to the end of the Data Log. (b) Split requires a sequential scan of the data region, transferring out-of-range entries to the new store. (c) After the scan is complete, the datastore list is atomically updated to add the new store. Compaction of the original store will clean up out-of-range entries.
3.2 Understanding Flash Storage
Flash provides a non-volatile memory store with several significant benefits over typical magnetic hard disks for random-access, read-intensive workloads—but it also introduces several challenges.
Three characteristics of flash underlie the design of the FAWN-KV system described throughout this section:
1. Fast random reads: (≪ 1 ms), up to 175 times faster than random reads on magnetic disk [35, 40].
2. Efficient I/O: Flash devices consume less than one Watt even under heavy load, whereas mechanical disks can consume over 10 W at load. Flash is over two orders of magnitude more efficient than mechanical disks in terms of queries/Joule.
3. Slow random writes: Small writes on flash are very expensive. Updating a single page requires first erasing an entire erase block (128 KB–256 KB) of pages, and then writing the modified block in its entirety. As a result, updating a single byte of data is as expensive as writing an entire block of pages [37].
Modern devices improve random write performance using write buffering and preemptive block erasure. These techniques improve performance for short bursts of writes, but recent studies show that sustained random writes still perform poorly on these devices [40].
These performance problems motivate log-structured techniques for flash filesystems and data structures [36, 37, 23]. These same considerations inform the design of FAWN's node storage management system, described next.
3.3 The FAWN Data Store
FAWN-DS is a log-structured key-value store. Each store contains values for the key range associated with one virtual ID. It acts to clients like a disk-based hash table that supports Store, Lookup, and Delete.1
FAWN-DS is designed specifically to perform well on flash storage and to operate within the constrained DRAM available on wimpy nodes: all writes to the datastore are sequential, and reads require a single random access. To provide this property, FAWN-DS maintains an in-DRAM hash table (Hash Index) that maps keys to an offset in the append-only Data Log on flash (Figure 2a). This log-structured design is similar to several append-only filesystems [42, 15], which avoid random seeks on magnetic disks for writes.
1We differentiate datastore from database to emphasize that we do not provide a transactional or relational interface.
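The core idea—an append-only log plus an in-memory index mapping each key to its latest offset—can be sketched in a few lines. This is a simplified, hypothetical model (it indexes full keys in a Python dict; the real FAWN-DS stores only a 15-bit key fragment per six-byte bucket):

```python
class LogStore:
    """Sketch of FAWN-DS's design: writes append to a log, the
    in-memory index maps each key to its latest offset, and a read
    does a single random access into the log."""
    def __init__(self):
        self.log = bytearray()   # append-only Data Log (flash in FAWN-DS)
        self.index = {}          # in-DRAM Hash Index: key -> log offset

    def store(self, key: bytes, value: bytes):
        offset = len(self.log)
        self.log += len(key).to_bytes(2, "big") + key
        self.log += len(value).to_bytes(4, "big") + value
        self.index[key] = offset   # the old entry becomes log garbage

    def lookup(self, key: bytes):
        off = self.index.get(key)
        if off is None:
            return None
        klen = int.from_bytes(self.log[off:off + 2], "big")
        off += 2 + klen
        vlen = int.from_bytes(self.log[off:off + 4], "big")
        return bytes(self.log[off + 4:off + 4 + vlen])
```

Note that an update never overwrites in place: it appends a new entry and repoints the index, which is exactly how the design avoids flash's slow small random writes.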
/* KEY = 0x93df7317294b99e3e049, 16 index bits */
INDEX = KEY & 0xffff;              /* = 0xe049 */
KEYFRAG = (KEY >> 16) & 0x7fff;    /* = 0x19e3 */
for i = 0 to NUM_HASHES do
    bucket = hash[i](INDEX);
    if bucket.valid && bucket.keyfrag == KEYFRAG &&
       readKey(bucket.offset) == KEY then
        return bucket;
    end if
    {Check next chain element...}
end for
return NOT_FOUND;
Figure 3: Pseudocode for hash bucket lookup in FAWN-DS.
Mapping a Key to a Value. FAWN-DS uses an in-memory (DRAM) Hash Index to map 160-bit keys to a value stored in the Data Log. It stores only a fragment of the actual key in memory to find a location in the log; it then reads the full key (and the value) from the log and verifies that the key it read was, in fact, the correct key. This design trades a small and configurable chance of requiring two reads from flash (we set it to roughly 1 in 32,768 accesses) for drastically reduced memory requirements (only six bytes of DRAM per key-value pair).
Figure 3 shows the pseudocode that implements this design for Lookup. FAWN-DS extracts two fields from the 160-bit key: the i low-order bits of the key (the index bits) and the next 15 low-order bits (the key fragment). FAWN-DS uses the index bits to select a bucket from the Hash Index, which contains 2^i hash buckets. Each bucket is only six bytes: a 15-bit key fragment, a valid bit, and a 4-byte pointer to the location in the Data Log where the full entry is stored.
Lookup proceeds, then, by locating a bucket using the index bits and comparing the key against the key fragment. If the fragments do not match, FAWN-DS uses hash chaining to continue searching the hash table. Once it finds a matching key fragment, FAWN-DS reads the record off of the flash. If the stored full key in the on-flash record matches the desired lookup key, the operation is complete. Otherwise, FAWN-DS resumes its hash chaining search of the in-memory hash table and searches additional records. With the 15-bit key fragment, only 1 in 32,768 retrievals from the flash will be incorrect and require fetching an additional record.
The constants involved (15 bits of key fragment, 4 bytes of log pointer) target the prototype FAWN nodes described in Section 4.
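The bit extraction in Figure 3 can be checked directly. The following standalone sketch reproduces the figure's constants (the key and masks come from Figure 3; the `index`/`keyfrag` names are ours):

```python
KEY = 0x93df7317294b99e3e049   # example 160-bit key from Figure 3
INDEX_BITS = 16                # i = 16 index bits in the example

index = KEY & ((1 << INDEX_BITS) - 1)     # low-order index bits
keyfrag = (KEY >> INDEX_BITS) & 0x7fff    # next 15 bits

print(hex(index), hex(keyfrag))           # 0xe049 0x19e3, as in the figure

# A 15-bit fragment matches a colliding key with probability 1/2**15,
# i.e. roughly 1 in 32,768 lookups pay for a second flash read.
```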
Log-structured datastore
• Log-structuring avoids small random writes
• Hashtable entries point into the data region; 12 bytes per entry
• KeyFrag != Key: potential collisions! Low probability of multiple Flash reads
• Get: one random read
• Put, Delete: append
[Diagram: Hash Index entries point to values in the append-only datastore.]
On a node addition
Node additions, failures require transfer of key-ranges
Nodes stream data range
[Diagram: node B streams the affected data range to node A.]
• Background operations sequential
• Continue to meet SLA
• Concurrent inserts; minimizes locking
• Compact datastore
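The range transfer sketched in Figure 2(b) is a single sequential scan: entries in the departing range are copied out to the new store, and the old store is compacted later. A simplified, hypothetical helper (the names `split_store`/`in_new_range` are ours):

```python
def split_store(entries, in_new_range):
    """One sequential pass over the log's (key, value) entries,
    collecting those in the new range for the new store. The original
    store keeps everything for now; compaction (Figure 2's step after
    the atomic datastore-list update) removes out-of-range leftovers."""
    return [(k, v) for k, v in entries if in_new_range(k)]

log = [(0x10, b"a"), (0x80, b"b"), (0x20, b"c"), (0x90, b"d")]
transferred = split_store(log, lambda k: k >= 0x80)
```

Because the scan is sequential and the handoff is an atomic list swap, concurrent inserts can proceed with minimal locking, which is the property the slide emphasizes.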
FAWN-KV take-aways
• Log-structured datastore
  • Avoids random writes at all levels
  • Minimizes locking during failover
  • Careful resource use but high performing
• Replication and strong consistency
  • Variant of chain replication
Overview
• Background
• FAWN principles
• FAWN-KV Design
• Evaluation
• More Workloads
• Conclusion
Evaluation roadmap
• Key-value lookup efficiency comparison
• Impact of background operations
  • On performance and latency
• TCO analysis for random read workloads
FAWN-DS lookups
• Our FAWN-based system over 6x more efficient than 2008-era traditional systems
System                 QPS   Watts   QPS/Watt
Alix3c2/Sandisk (CF)  1298    3.75     346
Desktop/Mobi (SSD)    4289   83         51.7
MacbookPro / HD         66   29          2.3
Desktop / HD           171   87          1.96
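The efficiency column above is just throughput divided by power: a Watt is a Joule per second, so QPS/Watt has units of queries per Joule. A quick check against the table's rows:

```python
def queries_per_joule(qps, watts):
    # Watts = Joules/second, so QPS / Watts = queries per Joule.
    return qps / watts

# Rows from the table above.
systems = [
    ("Alix3c2/Sandisk (CF)", 1298, 3.75),
    ("Desktop/Mobi (SSD)",   4289, 83),
    ("MacbookPro / HD",        66, 29),
    ("Desktop / HD",          171, 87),
]
for name, qps, watts in systems:
    print(f"{name:22s} {queries_per_joule(qps, watts):7.1f} queries/Joule")
```

The wimpy node wins not by serving more queries but by spending far fewer Joules per query served.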
Modern system speculation
Traditional: 420,000 IOPS at 250W vs. FAWN: 420,000 IOPS at 100W
Impact of background ops
[Chart: queries per second (0-1600) during Peak, Compact, Split, and Merge, at peak query load and at 30% of peak query load.]
Background operations have:
• Moderate impact at peak load
• Negligible impact at 30% load
Impact on query latency
[Chart: CDF of query latency (us), median and 99.9th percentile:]
Get query latency (max load):   median 891us, 99.9% 26.3ms
During split (low load):        median 863us, 99.9% 491ms
During split (high load):       median 873us, 99.9% 611ms
• Low median latency (includes network)
• High 99.9% latency during maintenance (but still okay)
When to use FAWN?
TCO = Capital Cost + Power Cost ($0.10/kWh)

FAWN (10W each): 2TB disk, 64GB SATA Flash SSD, 2GB DRAM per node (~$250-500 per node)
Traditional (200W): five 2TB disks, 160GB PCI-e Flash SSD, 64GB FBDIMM per node (~$2000-8000 per node)

Architecture with lowest TCO for random access workloads:
• FAWN-based systems can provide lower cost per {GB, QueryRate}
• Ratio of query rate to dataset size determines storage technology
• Graph ignores management, cooling, networking...
System            Cost     W     QPS    Queries/Joule  GB/Watt  TCO/GB  TCO/QPS
Traditionals:
5-2TB HD          $2K    250    1500          6          40      0.26    1.77
160GB PCIe SSD    $8K    220    200K        909           0.72   53      0.04
64GB DRAM         $3K    280    1M          3.5K          0.23   59      0.004
FAWNs:
2TB Disk          $350    20    250          12.5       100      0.20    1.61
32GB SSD          $500    15    35K          2.3K         2.1    16.9    0.015
2GB DRAM          $250    15    100K         6.6K         0.13  134      0.003
Table 4: Traditional and FAWN node statistics
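The TCO columns in Table 4 follow from the slide's definition, TCO = capital cost + power cost at $0.10/kWh over three years. A sketch that approximately reproduces the disk-based rows (assumed: 3 years = 26,280 hours, capacities of 2 TB and 10 TB from the node descriptions):

```python
HOURS_3YR = 3 * 365 * 24   # 26,280 hours

def tco_3yr(capital_usd, watts, usd_per_kwh=0.10):
    """3-year total cost of ownership: capital plus energy cost."""
    return capital_usd + (watts / 1000.0) * HOURS_3YR * usd_per_kwh

fawn_disk = tco_3yr(350, 20)     # FAWN node with one 2TB disk
trad_disk = tco_3yr(2000, 250)   # Traditional node with five 2TB disks

print(f"FAWN+Disk TCO/GB: {fawn_disk / 2000:.2f}, TCO/QPS: {fawn_disk / 250:.2f}")
print(f"Trad+Disk TCO/GB: {trad_disk / 10000:.2f}, TCO/QPS: {trad_disk / 1500:.2f}")
```

Running this gives roughly $0.20/GB and $1.61/QPS for the FAWN disk node versus roughly $0.27/GB and $1.77/QPS for the traditional node, matching Table 4 to within rounding.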
[Figure: log-log plot of dataset size (TB) vs. query rate (millions/sec), divided into regions where Traditional systems, FAWN+Disk, FAWN+SSD, and FAWN+DRAM have the lowest TCO.]
Figure 16: Solution space for lowest 3-year TCO as a function of dataset size and query rate.
second. The dividing lines represent a boundary across which onesystem becomes more favorable than another.
Large Datasets, Low Query Rates: FAWN+Disk has the lowest total cost per GB. While not shown on our graph, a traditional system wins for exabyte-sized workloads if it can be configured with sufficient disks per node (over 50), though packing 50 disks per machine poses reliability challenges.
Small Datasets, High Query Rates: FAWN+DRAM costs the fewest dollars per queries/second, keeping in mind that we do not examine workloads that fit entirely in L2 cache on a traditional node. This somewhat counterintuitive result is similar to that made by the intelligent RAM project, which coupled processors and DRAM to achieve similar benefits [5] by avoiding the memory wall. We assume the FAWN nodes can only accept 2 GB of DRAM per node, so for larger datasets, a traditional DRAM system provides a high query rate and requires fewer nodes to store the same amount of data (64 GB vs 2 GB per node).
Middle Range: FAWN+SSDs provide the best balance of storage capacity, query rate, and total cost. As SSD capacity improves, this combination is likely to continue expanding into the range served by FAWN+Disk; as SSD performance improves, so will it reach into DRAM territory. It is therefore conceivable that FAWN+SSD could become the dominant architecture for a wide range of random-access workloads.
Are traditional systems obsolete? We emphasize that this analysis applies only to small, random access workloads. Sequential-read workloads are similar, but the constants depend strongly on the per-byte processing required. Traditional cluster architectures retain a place for CPU-bound workloads, but we do note that architectures such as IBM's BlueGene successfully apply large numbers of low-power, efficient processors to many supercomputing applications [14]—but they augment their wimpy processors with custom floating point units to do so.
Our definition of "total cost of ownership" also ignores several notable costs: In comparison to traditional architectures, FAWN should reduce power and cooling infrastructure, but may increase network-related hardware and power costs due to the need for more switches. Our current hardware prototype improves work done per volume, thus reducing costs associated with datacenter rack or floor space. Finally, of course, our analysis assumes that cluster software developers can engineer away the human costs of management—an optimistic assumption for all architectures. We similarly discard issues such as ease of programming, though we ourselves selected an x86-based wimpy platform precisely for ease of development.
6. RELATED WORK
FAWN follows in a long tradition of ensuring that systems are balanced in the presence of scaling challenges and of designing systems to cope with the performance challenges imposed by hardware architectures.
System Architectures: JouleSort [44] is a recent energy-efficiency benchmark; its authors developed a SATA disk-based “balanced” system coupled with a low-power (34 W) CPU that significantly outperformed prior systems in terms of records sorted per joule. A major difference from our work is that the sort workload can be handled with large, bulk I/O reads using radix or merge sort. FAWN targets even more seek-intensive workloads for which even the efficient CPUs used for JouleSort are excessive, and for which disk is inadvisable.
More recently, several projects have begun using low-power processors for datacenter workloads to reduce energy consumption [6, 34, 11, 50, 20, 30]. The Gordon [6] hardware architecture argues for pairing an array of flash chips and DRAM with low-power CPUs for low-power data-intensive computing. A primary focus of their work is on developing a Flash Translation Layer suitable for pairing a single CPU with several raw flash chips. Simulations on general system traces indicate that this pairing can provide improved energy efficiency. Our work leverages commodity embedded low-power CPUs and flash storage for cluster key-value applications, enabling good performance on flash regardless of FTL implementation. CEMS [20], AmdahlBlades [50], and Microblades [30] also leverage low-cost, low-power commodity components as a building block for datacenter systems, similarly arguing that this architecture can provide the highest work done per dollar and work done per joule. Microsoft has recently begun exploring the use of a large cluster of low-power systems called Marlowe [34]. This work focuses on taking advantage of the very low-power sleep states provided by this chipset (between 2–4 W) to turn off machines and migrate workloads during idle periods and low utilization, initially targeting the Hotmail service. We believe these advantages would also translate well to FAWN, where a lull in the use of a FAWN cluster would provide the opportunity to significantly reduce average energy consumption in addition to the already-reduced peak energy consumption that FAWN provides. Dell recently designed and has begun shipping VIA Nano-based servers consuming 20–30 W each for large webhosting services [11].
Considerable prior work has examined ways to tackle the “memory wall.” The Intelligent RAM (IRAM) project combined CPUs and memory into a single unit, with a particular focus on energy efficiency [5]. An IRAM-based CPU could use a quarter of the power
Overview
• Background
• FAWN principles
• FAWN-KV Design
• Evaluation
• More Workloads
• Conclusion
Sorting small records
[Bar chart: sort speed (MBps), Hadoop Sort vs. NSort, on FAWN (5W), Traditional + HD (84W), and AtomSSD-FAWN]
[Bar chart: sort efficiency (MBpJ), Hadoop Sort efficiency vs. NSort efficiency, on the same systems]
[Bar chart: encryption speed (MB/s): Traditional (84W) 51.15, FAWN (5W) 3.65]
[Bar chart: encryption efficiency (MB/J): FAWN (5W) 0.73, Traditional (84W) 0.365]
AES encryption/decryption of a 512MB file with a 256-bit key
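The efficiency figures on these slides are just throughput divided by power draw. A quick sanity check, treating the chart values as given (and noting that the implied traditional-node power of roughly 140 W suggests the efficiency figure was measured at full CPU load rather than at the 84 W label):

```python
# Sanity check of the slide's AES numbers: efficiency (MB/J) = speed (MB/s) / power (W).
# All input values are read off the charts above.

fawn_speed, fawn_watts = 3.65, 5      # FAWN node at 5 W
trad_speed, trad_eff = 51.15, 0.365   # traditional node speed and efficiency

fawn_eff = fawn_speed / fawn_watts    # 3.65 / 5 = 0.73 MB/J, matching the chart
trad_watts = trad_speed / trad_eff    # implied power of the traditional node under load

ratio = fawn_eff / trad_eff           # 0.73 / 0.365 = 2.0
print(f"FAWN is {ratio:.1f}x more efficient; traditional draws ~{trad_watts:.0f} W")
```

So for this CPU-bound workload the wimpy node wins on work done per joule by a factor of two, even though the traditional node is roughly 14x faster in absolute throughput.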
FAWN is 2x more efficient for CPU-bound operations!
CPU-bound encryption

Future directions
• We need better metrics!
• Energy-efficiency vs. {Latency, Cost}
• Manageability, reliability, programmability ...
• Real workloads on FAWN
• MapReduce, Web Servers, OLTP?
• FAWN-optimized data processing filters
• Can we make Sawzall/LINQ/Pig efficient on FAWN?
Conclusion
• FAWN architecture reduces energy consumption of cluster computing
• FAWN-KV addresses challenges of wimpy nodes for key-value storage
• Initial results generalize to other workloads
• “Each decimal order of magnitude increase in parallelism requires a major redesign & rewrite of parallel code.” – Kathy Yelick
What about the network?
[Diagram: traditional datacenter tree (Core, Aggregation, Servers) vs. a FAWN cluster (Core, Aggregation, FAWN Switches, Wimpies); adds 1 low-power FAWN switch per “pod”]
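A back-of-the-envelope sketch of the slide's point, with entirely hypothetical wattages (the slides give no switch power figures): amortizing one extra low-power switch across the wimpies in a pod adds only a small per-node power overhead.

```python
# Hypothetical numbers only: per-node power once the pod's extra
# low-power FAWN switch is amortized across the wimpies behind it.

def per_node_watts(wimpy_watts, wimpies_per_pod, switch_watts):
    """Average power per wimpy node including its share of the pod switch."""
    return wimpy_watts + switch_watts / wimpies_per_pod

# e.g. 5 W wimpies, 20 per pod, behind an assumed 10 W low-power switch:
print(per_node_watts(5, 20, 10))   # 5.5 W per node, a 10% overhead
```

The overhead shrinks as pods grow, which is why the extra switch layer need not erase FAWN's energy advantage.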