![Page 1: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/1.jpg)
Data Processing on the fast lane
Gustavo Alonso
Systems Group
Department of Computer Science
ETH Zurich, Switzerland
![Page 2: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/2.jpg)
The team behind the work:
• Rene Müller (now at IBM Almaden)
• Louis Woods (now at Apcera)
• Jens Teubner (now Professor at TU Dortmund)
David Sidler Zsolt Istvan Kaan KaraMuhsen Owaida
![Page 3: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/3.jpg)
Data processing today:AppliancesData Centers (Cloud)
![Page 4: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/4.jpg)
What is a database engine?
• As complex or more complex than an operating system
• Full software stack including• Parsers, Compilers, Optimizers
• Own resource management (memory, storage, network)
• Plugins for application logic
• Infrastructure for distribution, replication, notifications, recovery
• Extract, Transform, and Load infrastructure
• Large legacy, backward compatibility, standards
• Hugely optimized
![Page 5: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/5.jpg)
Databases are blindly fast at what they do well
![Page 6: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/6.jpg)
From Oracle documentation
Databases = think big
ORACLE EXADATA
![Page 7: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/7.jpg)
Database engine trends: Appliances
Oracle:T7, SQL in Hardware, RAPID
SAP:
OLTP+OLAP on main memory
Hana on SGI supercomputerSAP Hana on SGI UV 300HSGI documentation
Nobody ever got fired for using Hadoop on a ClusterA. Rowstron, D. Narayanan, A. Donnely, G. O’Shea, A. Douglas
HotCDP 2012, Bern, Switzerland
![Page 8: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/8.jpg)
SQL on FPGAs
Presentation at HotChips’16 from Baiduhttp://www.nextplatform.com/2016/08/24/baidu-takes-fpga-approach-accelerating-big-sql/
![Page 9: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/9.jpg)
The challenge of hardware acceleration
![Page 10: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/10.jpg)
If it sounds too good to be true ..
![Page 11: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/11.jpg)
Usual unspoken caveats in HW acceleration
• Where is the data to start with?
• Where does the data has to be at the end?
• What happens with irregular workloads?
• What happens with large intermediate states?
• What is the architecture?
• Is the design preventing the system from doing something else?
• Can the accelerator be multithreaded?
• Is the gain big enough to justify the additional complexity?
• Can the gains be characterized?
![Page 12: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/12.jpg)
Do not replace, enhance
Help the CPU to do what it does not do well
![Page 13: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/13.jpg)
Text search in databases
INTEL HARP:This is an experimental system provided by Intel any resultspresented are generated using pre-production hardware and software,and may not reflect the performance of production or future systems.
FCCM’16
![Page 14: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/14.jpg)
100% processing on FPGA
![Page 15: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/15.jpg)
Hybrid Processing CPU/FPGA
![Page 16: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/16.jpg)
Accelerators to come
From Oracle M7 documentation
![Page 17: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/17.jpg)
If the data moves, do it efficiently
Bumps in the wire(s)
![Page 18: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/18.jpg)
(Woods, VLDB’14)
IBEX
![Page 19: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/19.jpg)
A processor on the data path
![Page 20: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/20.jpg)
Storage to come
• Recent example BISCUIT from Samsung (ISCA’16)• User programmable Near-Data Processing for SSDs
From Samsung presentation at ISCA’16http://isca2016.eecs.umich.edu/wp-content/uploads/2016/07/3A-1.pdf
![Page 21: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/21.jpg)
Sounds good?
The goal is to be able to do this at all levels:Smart storageOn the network switch (SDN like)On the network card (smart NIC)On the PCI express bus On the memory bus (active memory)
Every element in the system
(a node, a computer rack, a cluster)
will be a processing component
![Page 22: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/22.jpg)
Disaggregated data center
Near Data Computation
![Page 23: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/23.jpg)
01-Sep-16 23
Consensus in a Box (Istvan et al, NSD’16)
Xilinx VC709 Evaluation Board
SFP+
SFP+
SFP+
SFP+
DRAM (8GB)
FPGA
Networking Atomic
Broadcast
Replicated
key-value store
Reads
Writes
SW Clients /
Other nodes
Other nodes
Other nodes
TCP
Direct
Direct
![Page 24: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/24.jpg)
• Drop-in replacement for memcached with Zookeeper’s replication
• Standard tools for benchmarking (libmemcached)• Simulating 100s of clients
24
The system
X 12
10Gbps Switch
3 FPGA cluster
Clients
Comm. over TCP/IP
Comm. over direct connections + Leader election
+ Recovery
![Page 25: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/25.jpg)
25
Latency of puts in a KVS
Consensus
15-35μs ~10μs
Memaslap(ixgbe)
TCP / 10Gbps Ethernet
~3μsDirect connections
![Page 26: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/26.jpg)
1000
10000
100000
1000000
10000000
1 10 100 1000
Thro
ugp
ut
(co
nse
nsu
s ro
un
ds/
s)
Consensus latency (us)
FPGA (Direct)
FPGA (TCP)
DARE* (Infiniband)
Libpaxos (TCP)
Etcd (TCP)
Zookeeper (TCP)
Specialized solutions
26
The benefit of specialization…
General purpose solutions
[1] Dragojevic et al. FaRM: Fast Remote Memory. In NSDI’14.[2] Poke et al. DARE: High-Performance State Machine Replication on RDMA Networks. In HPDC’15.*=We extrapolated from the 5 node setup for a 3 node setup.
10-100x
![Page 27: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/27.jpg)
This is the end …
Most exciting time to be in research
Many opportunities at all levels and in all areas
FPGAs great tools to:
Explore parallelism
Explore new architectures
Explore Software Defined X/Y/Z
Prototype accelerators
![Page 28: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/28.jpg)
FPGAs: the view from an outsider
![Page 29: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/29.jpg)
Difficulty to program
• FPGAs are no more difficult to program than system software (OS, databases, infrastructure, etc.)
• Only a handful of programmers can do system software, my guess is system programmers are not many more than the people who can program FPGAs
• But FPGAs have no tools to enhance productivity, specially no freely available tools (GCC, instrumentation, libraries, open source tools …)
![Page 30: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/30.jpg)
CS vs EE
• EE = understand parallelism
• CS= understand abstraction
You need both (and these days a lot more: systems, algorithms, machine learning, data center architecture, …)
![Page 31: Data Processing on the fast lane - FPL 2016 -- Data Processing...Data Processing on the fast lane Gustavo Alonso Systems Group Department of Computer Science ETH Zurich, Switzerland](https://reader033.vdocuments.mx/reader033/viewer/2022050409/5f868b908ed46b5bd0652736/html5/thumbnails/31.jpg)
Complete systems
• The proof of something that makes a difference is an end to end argument
• Showing that something is faster when running on an FPGA does not mean it will be faster when hooked into a real system (example: GPUs)