Compiling for deeply embedded and heterogeneous signal processing systems Jeronimo Castrillon Cfaed Chair for Compiler Construction (CCC) 5G Summit, Dresden, Germany September 29, 2016


Compiling for deeply embedded and heterogeneous signal processing systems
Jeronimo Castrillon
Cfaed Chair for Compiler Construction (CCC)

5G Summit, Dresden, Germany
September 29, 2016


Multi-Processor/core Systems on Chip

- HW complexity
  - Increasing number of cores
  - Increasing heterogeneity
- Heterogeneity and size

© Prof. J. Castrillon. 5G Summit

[Figure: "PE Count in SoCs". x-axis: Year (2004 to 2014); y-axis: Number of PEs (0 to 16). Series: OMAP Family (OMAP1, OMAP2, OMAP3530, OMAP3640, OMAP4430, OMAP4470, OMAP5430) and Snapdragon Family (S1 QSD8650, S2 MSM8255, S3 MSM8060, S4 MSM8960, S4 APQ8064).] [Castrill14]

[Figure: block diagrams of two example platforms, TI Keystone II and Epiphany IV. Labeled blocks: A15 cores (each with L1) and L2; VLIW DSP (L1, L2); NoC; memory subsystem; packet DMA; network processor; HW queues; communication support; DMAs, semaphores, PMU; peripherals.]

Source: http://www.adapteva.com/docs/e64g401_datasheet.pdf


Multi-Processor/core Systems on Chip

- SW productivity gap
  - How to program these systems?
  - Meet performance/energy requirements

[Figure: TI Keystone II and Epiphany IV block diagrams, with annotations]

- TI Keystone II: Fragmented tools, different runtimes/OS on ARM and DSP, 500+ pages on APIs
- Epiphany IV: Homogeneous! OpenMP support uses 1/3 of program memory!

Source: http://www.adapteva.com/docs/e64g401_datasheet.pdf

5G
- Massive MIMO
- High-perf. codes
- Complex beamform.


Need for software automation tools


Programming flow

[Figure: programming flow]
- Inputs: dataflow application, architecture model, non-functional specification
- Steps: analysis, synthesis, code generation
- Property models (timing, energy, error, …)
- Target platform block diagram: A15 cores (each with L1) and L2; VLIW DSP (L1, L2); NoC; memory subsystem; packet DMA; network processor; HW queues; communication support; DMAs, semaphores, PMU; peripherals

Generated code (excerpt as shown on the slide; some lines are clipped):

    PNargs_ifft_r.ID = 6U;
    PNargs_ifft_r.PNchannel_freq_coef = filtered_coef_right
    PNargs_ifft_r.PNnum_freq_coef = 0U;
    PNargs_ifft_r.PNchannel_time_coef = sink_right
    PNargs_ifft_r.channel = 1;
    sink_left = IPCllmrf_open(3, 1, 1);
    sink_right = IPCllmrf_open(7, 1, 1);
    PNargs_sink.ID = 7U;
    PNargs_sink.PNchannel_in_left = sink_left
    PNargs_sink.PNnum_in_left = 0U;
    PNargs_sink.PNchannel_in_right = sink_right
    PNargs_sink.PNnum_in_right = 0U;
    taskParams.arg0 = (xdc_UArg)&PNargs_src;
    taskParams.priority = 1;
    ti_sysbios_knl_Task_create((ti_sysbios_knl_Task_FuncPtr&taskParams, &eb);
    glob_proc_cnt++;
    hasProcess = 1;
    taskParams.arg0 = (xdc_UArg)&PNargs_fft_l
    taskParams.priority = 1;
    ti_sysbios_knl_Task_create((ti_sysbios_knl_Task_FuncPtrft_Templ, &taskParams, &eb);
    glob_proc_cnt++;
    hasProcess = 1;
    taskParams.arg0 = (xdc_UArg)&PNargs_ifft_r
    taskParams.priority = 1;
    ti_sysbios_knl_Task_create((ti_sysbios_knl_Task_FuncPtrfft_Templ, &taskParams, &eb);
    glob_proc_cnt++;
    hasProcess = 1;
    taskParams.arg0 = (xdc_UArg)&PNargs_sink
    taskParams.priority = 1;

[Castrill14]


Dataflow programming models

- Graph representation of applications
- Implicit repetitive execution of tasks
- Good model for streaming applications
- Good match for signal processing & multi-media applications
- Large body of research on multiple flavors of these models

Flavors: Static / Dynamic

Properties: No race conditions, determinism, strong/weak guarantees
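As an illustration of why static flavors can give strong guarantees, here is a minimal sketch assuming synchronous dataflow (SDF) as the static model, which the slide does not name. With fixed token rates on an edge, the number of firings per iteration can be computed ahead of time from the balance equation q_a * prod = q_b * cons. The function names `gcd_u` and `sdf_edge_repetitions` are hypothetical and only cover a single edge.

```c
/* Greatest common divisor (Euclid); both arguments must be positive. */
unsigned gcd_u(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* For one edge A -> B where A produces `prod` tokens per firing and
 * B consumes `cons` tokens per firing, compute the smallest positive
 * firing counts (q_a, q_b) that satisfy q_a * prod == q_b * cons. */
void sdf_edge_repetitions(unsigned prod, unsigned cons,
                          unsigned *q_a, unsigned *q_b) {
    unsigned g = gcd_u(prod, cons);
    *q_a = cons / g;
    *q_b = prod / g;
}
```

For example, an actor producing 2 tokens per firing feeding one that consumes 3 would fire 3 and 2 times per iteration, leaving the channel balanced. Dynamic models have data-dependent rates, so this kind of compile-time computation is not generally possible for them.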


Language: C for process networks

- FIFO Channels

    typedef struct { int i; double d; } my_struct_t;
    __PNchannel my_struct_t S;
    __PNchannel int A = {1, 2, 3}; /* Initialization */
    __PNchannel short C[2], D[2], F[2], G[2];

- Processes & networks

    __PNkpn AudioAmp __PNin(short A[2]) __PNout(short B[2]) __PNparam(short boost)
    {
        while (1)
            __PNin(A) __PNout(B) {
                for (int i = 0; i < 2; i++)
                    B[i] = A[i]*boost;
            }
    }
    __PNprocess Amp1 = AudioAmp __PNin(C) __PNout(F) __PNparam(3);
    __PNprocess Amp2 = AudioAmp __PNin(D) __PNout(G) __PNparam(10);

[Sheng14]
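To make the channel semantics concrete, the following is a minimal plain-C sketch of what one firing of the AudioAmp body does, assuming a simple single-threaded ring buffer in place of a real FIFO channel. The names `frame_t`, `fifo_t`, `fifo_push`, `fifo_pop`, and `audio_amp_fire` are hypothetical; they are not part of CPN or of the code its compiler generates. A real process network runtime would block on an empty or full channel rather than return a status.

```c
#include <stdbool.h>
#include <stddef.h>

#define FIFO_CAP 8  /* capacity in tokens; arbitrary for this sketch */

/* One token on the channel: a frame of two samples, like short A[2]. */
typedef struct { short s[2]; } frame_t;

/* Hypothetical single-threaded FIFO channel (ring buffer). */
typedef struct {
    frame_t buf[FIFO_CAP];
    size_t head;   /* index of the oldest token */
    size_t count;  /* number of tokens currently stored */
} fifo_t;

/* Append a token; returns false if the channel is full. */
bool fifo_push(fifo_t *f, frame_t t) {
    if (f->count == FIFO_CAP) return false;
    f->buf[(f->head + f->count) % FIFO_CAP] = t;
    f->count++;
    return true;
}

/* Remove the oldest token; returns false if the channel is empty. */
bool fifo_pop(fifo_t *f, frame_t *t) {
    if (f->count == 0) return false;
    *t = f->buf[f->head];
    f->head = (f->head + 1) % FIFO_CAP;
    f->count--;
    return true;
}

/* One firing: consume one frame from `in`, scale both samples by
 * `boost`, produce one frame on `out`. Returns false, consuming
 * nothing, if `in` is empty or `out` is full. */
bool audio_amp_fire(fifo_t *in, fifo_t *out, short boost) {
    frame_t a, b;
    if (in->count == 0 || out->count == FIFO_CAP) return false;
    (void)fifo_pop(in, &a);
    for (int i = 0; i < 2; i++)
        b.s[i] = (short)(a.s[i] * boost);
    (void)fifo_push(out, b);
    return true;
}
```

In this sketch, Amp1 and Amp2 would correspond to calling `audio_amp_fire` on the channel pairs (C, F) and (D, G) with boost values 3 and 10. Because each process only touches its own channels, the two can run in any order or in parallel without changing the output streams.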


Architecture model and constraints

- System model including:
  - Topology, interconnect, memories
  - Computation: uArch model
  - Communication: cost functions
  - MCA standardizing it to SHIM 2.0
- Constraints
  - Time constraints
  - Mapping constraints
  - Platform constraints

[Oden13]

[Figure residue: time-constraint annotations of 1 ms, 3 ms, and 1 ms]
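A communication cost function in such an architecture model can be pictured as follows. This is a minimal sketch assuming a plain affine model (fixed setup cost plus a size-dependent transfer cost); it is not the split-cost model of [Oden13] and not the SHIM format. The type `comm_cost_t` and the function `comm_cost_cycles` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical affine cost model for one communication primitive:
 * a fixed setup cost plus a size-dependent transfer cost. */
typedef struct {
    uint32_t setup_cycles;     /* fixed overhead per transfer */
    uint32_t bytes_per_cycle;  /* sustained throughput, must be > 0 */
} comm_cost_t;

/* Estimated cycles to move `bytes` bytes; a partially filled
 * final cycle is rounded up to a whole cycle. */
uint64_t comm_cost_cycles(const comm_cost_t *m, uint32_t bytes) {
    uint64_t transfer = ((uint64_t)bytes + m->bytes_per_cycle - 1)
                        / m->bytes_per_cycle;
    return m->setup_cycles + transfer;
}
```

A mapping tool could evaluate one such function per channel and per candidate communication primitive (for example shared memory versus DMA) when comparing mappings. The parameter values would come from the platform description or from measurements.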


Analysis and synthesis: Overview

[Figure: analysis and synthesis flow]
- Inputs: non-functional specification, CPN application, architecture model
- Steps: analysis (instrumentation, profiling, tracing); sequential performance estimation; mapping and scheduling; parallel perf. estimation; increase resources
- Intermediate artifacts: mapping configuration, time-annotated traces

[Castrill13]
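The loop formed by mapping, parallel performance estimation, and increasing resources can be sketched as below. This is a toy illustration under strong assumptions, not the algorithm of [Castrill13]: tasks are independent, communication is free, all PEs are identical, and the mapping heuristic is a simple greedy load balancer. The names `estimate_makespan`, `map_with_deadline`, and `MAX_PES` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PES 16  /* upper bound on PEs supported by this sketch */

/* Greedy mapping: assign each task to the currently least-loaded PE,
 * record the choice in `mapping`, and return the resulting makespan
 * (the largest per-PE load). Requires 1 <= n_pes <= MAX_PES. */
unsigned estimate_makespan(const unsigned *task_cost, size_t n_tasks,
                           size_t n_pes, size_t *mapping) {
    unsigned load[MAX_PES] = {0};
    unsigned makespan = 0;
    assert(n_pes >= 1 && n_pes <= MAX_PES);
    for (size_t t = 0; t < n_tasks; t++) {
        size_t best = 0;
        for (size_t p = 1; p < n_pes; p++)
            if (load[p] < load[best]) best = p;
        load[best] += task_cost[t];
        mapping[t] = best;
        if (load[best] > makespan) makespan = load[best];
    }
    return makespan;
}

/* Outer loop: start with one PE and add PEs until the estimated
 * makespan meets the deadline. Returns the PE count used, or 0 if
 * even `max_pes` PEs do not suffice. */
size_t map_with_deadline(const unsigned *task_cost, size_t n_tasks,
                         unsigned deadline, size_t max_pes,
                         size_t *mapping) {
    if (max_pes > MAX_PES) max_pes = MAX_PES;
    for (size_t n_pes = 1; n_pes <= max_pes; n_pes++) {
        if (estimate_makespan(task_cost, n_tasks, n_pes, mapping) <= deadline)
            return n_pes;
    }
    return 0;
}
```

Starting from few resources and growing only when the constraint is missed tends to favor mappings that use as little of the platform as the deadline allows. In a real flow the estimate would come from the time-annotated traces rather than from fixed per-task costs.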


Example 1) From LabVIEW to Tomahawk Platform


Baseband processing application (data dependent execution paths)

[Figure: Tomahawk2 chip block diagram (Tomahawk2_core). Labeled blocks: Duo-PE0 to Duo-PE7 (each with VDSP and RISC), FEC, SD, CM, APP, Router(0,0), Router(0,1), Router(1,0), Router(1,1), hs-serial and parallel links, FPGA-Interface, DDR-SDRAM-Interface, UART-GPIO, AVS Contr., and per-block ADPLL, PM, GT units]

Tomahawk Chip (accelerators for baseband processing)

Automatic code generation

Visit demo booth

[Arnold13]


Example 2) LTE application mapping

Courtesy: Silexica Software Solutions GmbH


Example 2) LTE application execution

[Figure: Schedule]

Courtesy: Silexica Software Solutions GmbH


Example 2) LTE application: Results

Values read from the four bar charts, listed in legend order:

Configuration              Peak power (W)   Average power (W)   Execution time (ms)   Power efficiency (1/(W*second))
Reference (LoadBalancer)   8.62             2.62                327.8                 1.164
Constraint=2s              1.71             1.55                1930                  0.334
Constraint=1s              2.08             1.64                966.5                 0.631
Constraint=0.35s           2.79             1.99                348.2                 1.443

Courtesy: Silexica Software Solutions GmbH
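The power-efficiency values in the four bar charts on this slide appear to follow from the other two metrics as 1 / (average power × execution time). That relationship is inferred from the axis unit 1/(W*second) and from the numbers themselves; the slide does not state it. A small sketch to reproduce the figures, with the hypothetical function name `power_efficiency`:

```c
/* Power efficiency as it appears to be plotted:
 * 1 / (average power in watts * execution time in seconds).
 * The execution time is passed in milliseconds, as on the slide. */
double power_efficiency(double avg_power_w, double exec_time_ms) {
    return 1.0 / (avg_power_w * (exec_time_ms / 1000.0));
}
```

For the reference configuration this gives 1 / (2.62 W × 0.3278 s) ≈ 1.164, and for the 0.35 s constraint 1 / (1.99 W × 0.3482 s) ≈ 1.443, matching the chart. Read this way, the tightest constraint is slightly more efficient than the reference at a much lower peak power, while the relaxed 2 s and 1 s constraints save power but spend it over a longer run.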


Acknowledgements

- Vodafone Chair for Mobile Communications Systems
- Silexica Software Solutions GmbH
- National Instruments
- German Cluster of Excellence: Center for Advancing Electronics Dresden (www.cfaed.tu-dresden.de)


Thanks! Questions?


References

[Castrill14] J. Castrillon and R. Leupers, Programming Heterogeneous MPSoCs: Tool Flows to Close the Software Productivity Gap. Springer, 2014

[Sheng14] W. Sheng, S. Schürmans, M. Odendahl, M. Bertsch, V. Volevach, R. Leupers, and G. Ascheid, “A compiler infrastructure for embedded heterogeneous MPSoCs”, Parallel Comput. 40, 2 (February 2014), 51-68

[Oden13] M. Odendahl, et al., “Split-cost communication model for improved MPSoC application mapping”, in International Symposium on System on Chip, pp. 1-8, 2013

[Castrill13] J. Castrillon, R. Leupers, and G. Ascheid, “MAPS: Mapping concurrent dataflow applications to heterogeneous MPSoCs,” IEEE Transactions on Industrial Informatics, vol. 9, no. 1, pp. 527–545, 2013

[Arnold13] O. Arnold, et al., “Tomahawk - Parallelism and Heterogeneity in Communications Signal Processing MPSoCs”, TECS, 2013
