Missing the Forest for the Trees: End-to-End AI Application Performance in Edge Data Centers
Daniel Richins1,2, Dharmisha Doshi2, Matthew Blackmore2, Aswathy Thulaseedharan Nair2, Neha Pathapati2, Ankit Patel2, Brainard Daguman2, Daniel Dobrijalowski2, Ramesh Illikkal2, Kevin Long2, David Zimmerman2, Vijay Janapa Reddi1,3
1The University of Texas at Austin 2Intel 3Harvard University
International Symposium on High Performance Computer Architecture 25 February 2020
Missing the Forest for the Trees: The AI Tax

The forest is the AI tax.
The AI tax includes all the compute and infrastructure in an AI application that is necessary to enable the AI to execute but that isn't AI itself.

[Figure: excitement over time for artificial intelligence, with the AI itself flanked by pre- and post-processing]
Outline
1. AI Tax - A Case Study
   a. Definition
   b. Video Analytics
   c. Analysis
2. AI Acceleration - Anticipating Future Bottlenecks
   a. Emulation Technique
   b. Results
   c. What's Breaking?
3. Optimization - Better Performance at Lower TCO
   a. Fixing the Bottleneck
   b. Edge Data Centers
   c. Two Designs
4. Conclusion
AI Tax
AI Tax - Definition
Supporting compute, storage, network, software infrastructure, etc. together constitute the AI tax.
AI Tax - Video Analytics
Face Recognition Algorithm

Face Recognition is Google's FaceNet as a data center application.

Pipeline: Video Stream → Ingestion → Frame → Face Detection → Face Thumbnail → Feature Extraction → Vector → Classification → Identity → User Application

The AI compute comprises Face Detection, Feature Extraction, and Classification.
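The pipeline above can be sketched as a chain of stage functions. This is an illustrative stand-in, not the authors' FaceNet deployment; every function body here is a stub, and all names are hypothetical.

```python
# Illustrative sketch of the face recognition pipeline.
# Each stage is a stub that tags its input so the data flow is visible.

def ingest(stream):
    # Ingestion: decode the video stream into frames (stub: already frames).
    yield from stream

def detect_faces(frame):
    # AI compute: find face thumbnails in a frame (stub: one face per frame).
    return [f"thumb({frame})"]

def extract_features(thumb):
    # AI compute: embed a face thumbnail into a feature vector (stub).
    return f"vec({thumb})"

def classify(vector):
    # AI compute: map a feature vector to an identity (stub).
    return f"id({vector})"

def recognize(video_stream):
    # Full chain: Frame -> Thumbnail -> Vector -> Identity.
    for frame in ingest(video_stream):
        for thumb in detect_faces(frame):
            yield classify(extract_features(thumb))

print(list(recognize(["frame0", "frame1"])))
# ['id(vec(thumb(frame0)))', 'id(vec(thumb(frame1)))']
```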
Face Recognition Data Center Deployment

For deployment, Ingestion and Face Detection are combined into an Ingest/Detect stage, and Feature Extraction and Classification into an Identification stage. Ingest/Detect instances act as producers and Identification instances as consumers, with brokers mediating between them; identities flow back to the user application.
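The producer/broker/consumer split can be sketched with a thread-safe queue standing in for the broker tier. This is a minimal stand-in: the real deployment uses dedicated broker servers and a messaging system, and the stage logic here is stubbed.

```python
# Minimal producer/broker/consumer sketch of the deployment above.
import queue
import threading

broker = queue.Queue()  # stands in for the broker servers

def ingest_detect(frames):
    # Producer: ingest frames, detect faces, publish thumbnails.
    for frame in frames:
        broker.put(f"thumb({frame})")
    broker.put(None)  # sentinel: stream finished

def identification(results):
    # Consumer: pull thumbnails, run feature extraction + classification.
    while (thumb := broker.get()) is not None:
        results.append(f"id({thumb})")

results = []
producer = threading.Thread(target=ingest_detect, args=(["f0", "f1"],))
consumer = threading.Thread(target=identification, args=(results,))
producer.start(); consumer.start()
producer.join(); consumer.join()
print(results)  # identities returned to the user application
```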
Experimental Setup: Hardware

• 2x Intel Xeon Platinum 8176 (2x 28 cores, 2.10 GHz, 2x 38.5 MB LLC)
• 384 GB DDR4 SDRAM
• 1x Intel SSD P4510 (2.85 GB/s read, 1.10 GB/s write)
• 100 Gbps Ethernet
Experimental Setup: Face Recognition

We allocate one core per container. Hence, a server runs 56 containers. In total there are 840 producers (Ingest/Detect) and 1680 consumers (Identification). Brokers get their own server, which grants them full network and storage bandwidth; there are 3 brokers.
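As a back-of-envelope check (assuming the stated one core per container, hence 56 containers per two-socket server), the container counts imply 15 producer servers and 30 consumer servers:

```python
# Back-of-envelope check of the setup above: one core per container,
# so a 2x 28-core server runs 56 containers.
cores_per_server = 2 * 28
containers_per_server = cores_per_server

producers, consumers = 840, 1680
producer_servers = producers // containers_per_server
consumer_servers = consumers // containers_per_server
print(producer_servers, consumer_servers)  # 15 30
```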
AI Tax - Analysis
Application Progress Event Logging

    while True:
        frame = queue.get()
        start = time.time()
        faces = detect_faces(frame)
        end = time.time()
        producer.send(faces)
        size = sys.getsizeof(faces)
        log = {'start': start, 'end': end, 'size': size}
        logger.info(log)

Logging is designed to raise the level of abstraction. We view application progress from the data center perspective.
Face Detection Latency

[Figure: end-to-end latency breakdown — Ingestion 5.4% (AI tax), Detection 21.3% (AI compute), Brokers 35.9% (AI tax), Identification 37.4% (AI compute)]
Process Breakdowns

[Figure: per-process split — Ingestion: 100% AI tax; Face Detection: 58% AI, 42% AI tax; Identification: 12% AI, 88% AI tax]

Pre- and post-processing are heavily utilized within stages.
AI Tax

[Figure: excitement over time — the excitement tracks the AI, but each AI stage is flanked by pre- and post-processing]
AI Acceleration
AI Acceleration - Emulation Technique
Acceleration Emulation

[Diagram: the deployment's Ingest/Detect and Identification stages, connected by brokers]

Dial an accelerator speed.
Starting from the instrumented loop, we replace the AI computation with a sleep of the measured average duration, and the real output with random bytes of the measured average size:

    while True:
        frame = queue.get()
        start = time.time()
        time.sleep(avg_time)                 # replaces faces = detect_faces(frame)
        end = time.time()
        producer.send(os.urandom(avg_size))  # replaces producer.send(faces)
        size = avg_size
        log = {'start': start, 'end': end, 'size': size}
        logger.info(log)

To dial in an accelerator speedup, the sleep becomes time.sleep(avg_time / speedup).

With faster processing, we feed frames into the system faster to maximize throughput.
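A self-contained version of the emulation loop looks like the following. The names avg_time, avg_size, and speedup follow the slide; the frame source and output sink are stand-ins for the real ingest queue and message producer.

```python
# Self-contained sketch of the acceleration emulation: replace real AI
# work with a sleep scaled by the dialed speedup, and emit dummy payloads
# of the average output size.
import os
import time

def emulated_stage(frames, avg_time, avg_size, speedup, sink):
    logs = []
    for _ in frames:
        start = time.time()
        time.sleep(avg_time / speedup)      # emulated accelerator latency
        end = time.time()
        sink.append(os.urandom(avg_size))   # dummy output of average size
        logs.append({"start": start, "end": end, "size": avg_size})
    return logs

sink = []
logs = emulated_stage(range(3), avg_time=0.02, avg_size=128, speedup=4, sink=sink)
print(len(sink), logs[0]["size"])  # 3 128
```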
AI Acceleration - Results
Accelerated AI: Reduced Latency and Increased Throughput

[Figure: per-stage latency (Ingest/Detect, Broker, Identify; 0-700 ms) and overall throughput (0-70 thousand frames per second) at 1x, 2x, 4x, 6x, and 8x emulated AI speedup]
At 8x speedup, the average latency goes to infinity. The longer the experiment runs, the greater the latency.
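The unbounded latency is classic queueing behavior: once frames arrive faster than a stage can service them, waiting time grows with the length of the run. A toy FIFO model (rates illustrative, not measured) shows the effect:

```python
# Toy single-server FIFO queue: frame i arrives at i * arrival_interval
# and each frame takes service_time to process.
def avg_wait(arrival_interval, service_time, n_frames):
    free_at = 0.0       # time the server next becomes free
    total_wait = 0.0
    for i in range(n_frames):
        arrive = i * arrival_interval
        begin = max(arrive, free_at)     # wait if the server is busy
        free_at = begin + service_time
        total_wait += free_at - arrive   # queueing delay + service time
    return total_wait / n_frames

# Service keeps up: average latency stays bounded at the service time.
print(avg_wait(1.0, 0.9, 10_000))
# Service cannot keep up: backlog grows, so average latency grows with
# the number of frames processed.
print(avg_wait(1.0, 1.1, 10_000))
```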
AI Acceleration - What's Breaking?
Three Big Systems

Compute, Network, Storage. With the AI compute accelerated, is the network or the storage the next bottleneck?
Explaining the Bottleneck

[Figure: broker network utilization (read and write, 0-7%) stays low across 1x-8x speedup, while broker storage utilization climbs toward 70%]
As storage utilization approaches the limits of the devices, it becomes the limiting factor to performance.
Optimization
Optimization - Fixing the Bottleneck
Fixing the Bottleneck

[Figure: latency (0-200 ms) at 8x, 12x, 16x, 24x, and 32x emulated speedup, with additional drives (1-4) and with additional brokers (3, 4, 6, 8)]
Optimization - Edge Data Centers
Advantages of an Edge Data Center

• Smaller corporations are finding edge data centers more economical than the cloud.
• Edge data centers offer lower latency by serving local users.
• Edge data centers can be built to target a specific application domain.

Sources:
https://www.networkworld.com/article/2926448/7-key-criteria-for-defining-edge-data-centers.html
http://blog.cushwake.com/americas/life-on-the-edge-the-new-normal-for-data-centers.html
https://www.vxchnge.com/blog/what-is-an-edge-data-center
Optimization - Two Designs
Edge Data Center Node Allocation

We need to allocate enough brokers to handle 32x speedup. The required ratio is roughly 30 consumers and 15 producers per 8 brokers; at full scale this is 578 consumers, 289 producers, and 157 brokers.
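A quick check (using the slide's numbers) confirms that the full-scale allocation preserves the small-scale ratio:

```python
# Compare the small-scale ratio (30 consumers : 15 producers : 8 brokers)
# against the full-scale node counts.
small = {"consumers": 30, "producers": 15, "brokers": 8}
total = {"consumers": 578, "producers": 289, "brokers": 157}

ratios = {k: total[k] / small[k] for k in small}
print(ratios)  # each role scales up by roughly 19-20x
```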
Targeted Data Center Design: Optimizing for Total Cost of Ownership

Homogeneous: every node is identical (56 cores, 1 drive, 100 GbE); the design requires 160 switches.
Heterogeneous: dedicated Compute and Broker node types; the design requires only 28+14 switches.
Heterogeneous Edge Data Center

[Diagram: Compute Nodes connect at 10 GbE; Broker Nodes connect at 50 GbE]
Heterogeneous Edge Data Center: Networking

[Diagram: 100 Gbps switches joined by 100 Gbps uplinks and 40 Gbps links; Broker Nodes attach at 50 Gbps, Compute Nodes at 10 Gbps]
Comparing Total Cost of Ownership

We assume a three-year amortization of costs.

[Chart: heterogeneous cost relative to the homogeneous design — Compute: 89% vs. 100%; Networking: 23% vs. 100%; Power: 100% vs. 100%; Overall: 84% vs. 100%]

Homogeneous: $12.9 million. Heterogeneous: $10.8 million.

The targeted, heterogeneous data center incurs 16% lower total cost of ownership. By designing the data center to match the needs of the application, we created a better data center at lower cost.
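The headline saving follows directly from the two totals (slide figures, three-year amortization):

```python
# Overall TCO saving of the heterogeneous design, from the slide's totals.
homogeneous_tco = 12.9e6    # dollars over three years
heterogeneous_tco = 10.8e6

saving = 1 - heterogeneous_tco / homogeneous_tco
print(f"{saving:.0%}")  # 16%
```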
Conclusion
Calls to Action

• To fully understand AI applications, we must consider the overhead of the AI tax in end-to-end performance.
• As we accelerate AI, we must consider new bottlenecks that manifest as AI tax.
• We cannot limit our view of AI to microarchitectural considerations. We need data center-level optimizations to address data center-level bottlenecks.
Thank You