On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate Center 1 University of Oslo (UiO) 2 Norwegian University of Science and Technology (NTNU)

Page 1: On Workload in an SCA-based System, with Varying Component and Data Packet Sizes Tore Ulversøy 1 Jon Olavsson Neset 2 1 FFI 1 UNIK University Graduate

On Workload in an SCA-based System, with Varying Component and Data Packet Sizes

Tore Ulversøy 1

Jon Olavsson Neset 2

1 FFI
1 UNIK University Graduate Center
1 University of Oslo (UiO)
2 Norwegian University of Science and Technology (NTNU)

Outline

• Background and Problem Definition
• Empirical Analysis
• Analysis using Low-Complexity Analytical Models
• Conclusions

Background

• The base code of one of the waveform applications used in the following originates from a member of the task group below, and the waveform application is also used for other activities in that group

• The Regular Task Group on SDR, established under RTO-IST-080: RTO-IST-080 RTG-038 Software Defined Radio

– Currently, the team consists of experts from government, university and industry from CA, DK, GE, HU, IT, NL, NO, SP, TU, US, and the SDR Forum

– Headed by NL (Chairman: Hans Segers, TNO)

Background: Main Objectives of RTG-038

• Share knowledge & experience of (multi)national SDR/SCA developments

• Report on possibilities of sharing waveforms and waveform components

• Investigations of portability and interoperability:

1. SCA-based implementation of STANAG 4285 waveform

2. Demonstrate portability onto national SDR platforms

3. Demonstrate interoperability between the different implementations

Problem Definition and Problem Background

• SCA defines an environment that allows applications to be built as compositions of SW components (and devices)
• SCA defines a distributed system, with communication through CORBA for CORBA-capable processors
• There is wide freedom as to how small the components are that the application is split into: with many small components, reuse of components becomes easier, but CPU overhead increases

[Figure: Application/Component View showing components C1 to C10 and C_tot, and Physical View showing Processor 1, Processor 2 and Processor 3]

• What are the CPU overhead effects of a fine structure (many components) relative to a coarse one (few components), and how can we predict this overhead?

Analysis Approach:

CPU workload implied by a task or a group of tasks = the fraction of available processor cycles occupied over a time period

Analysis methods, ordered from increasing clarity / simplicity (top) to increasing accuracy (bottom):

• Analytical model
• Concurrency model, e.g. Petri net
• Simulation model
• Measurements on testbed
• Measurements on actual system
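As a minimal sketch of the workload definition on this slide, the workload can be computed as the cycles a task occupies divided by the cycles available in the observation period. The function name and the sample numbers are illustrative assumptions, not taken from the slides.

```python
def cpu_workload_percent(busy_cycles: float, cycle_rate_hz: float, period_s: float) -> float:
    """Fraction of available processor cycles occupied over a period, in percent."""
    available_cycles = cycle_rate_hz * period_s
    if available_cycles <= 0:
        raise ValueError("cycle rate and period must be positive")
    return 100.0 * busy_cycles / available_cycles


# Hypothetical numbers: a task that occupies 186e6 cycles during one second
# on a 1.86 GHz processor uses one tenth of the available cycles.
print(cpu_workload_percent(186e6, 1.86e9, 1.0))
```

Tools such as sar report this kind of fraction directly, split into user and system time, which is how the measured results in the following slides are expressed.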

Empirical Analysis

• Using OSSIE (Open Source SCA Implementation Embedded) [2] from VirginiaTech, which uses omniORB [3]
– Advantages: low user threshold, full source code available, Linux-based
• Profiling and monitoring tools:
– OProfile [4]
– SYSSTAT sar [5]

                    Experiment: Stanag 4285      Experiment: Synthetic waveform
OS                  Linux 2.6.9-34.EL            Linux 2.6.9-34.EL
OSSIE revision      0.6.0                        0.6.2
Processor           Pentium M 1.86GHz            Pentium M 1.86GHz
RAM                 1,5 GByte                    1,5 GByte
Cache specifics     L1i=32kB L1d=32kB L2=2MB     L1i=32kB L1d=32kB L2=2MB

Cache specifics: L1i = first level instruction cache, L1d = first level data cache, L2 = second level cache

Empirical Analysis, Simple Waveform Application

• Stanag 4285, TX part. Base code provided by Telefunken Racoms for RTO-IST-080 RTG-038

• Implemented as three different configurations, all performing the same functional processing work

• Non-SCA ‘c’ version as a reference

[Figure: the three configurations]

• 2 comp.: Stanag4285 TX, Data Sink
• 7 comp.: Data Source, FEC Encoder, Interleaver, Symbol Mapper & Scrambler, Symbol to I/Q & TX Filter, Float to fixed converter, Data Sink
• 11 comp.: Data Source, FEC Encoder, Interleaver, Symbol Mapper & Scrambler, Symbol to I/Q & TX Filter, Float to fixed converter, Data Sink, plus four Forwarder components
• Packet rate regulator

Empirical Analysis, Stanag 4285 TX: Results, User

[Bar chart: User CPU % versus symbol rate (2395, 25600, 64000) for Non-SCA, Single component+sink, 6-components+sink and 10-components+sink]

WL measured by SYSSTAT sar (sar -u 40 5)

Empirical Analysis, Stanag 4285 TX: Results, User+System

[Bar chart: User + System CPU % versus symbol rate (2395, 25600, 64000) for Non-SCA, Single component+sink, 6-components+sink and 10-components+sink]

WL measured by SYSSTAT sar (sar -u 40 5)

Empirical Analysis, Synthetic Application

• A total of 9 FIR filters, each with N taps and operating on packets of size B (N×B multiply/adds per FIR per packet)

• Both N and B can be varied

• 4 different configurations

• ‘c’ version as a reference

[Figure: the four configurations]

• W2: FTOT, SNK
• W3: SRC, F1TO9, SNK
• W5: SRC, F123, F456, F789, SNK
• W11: SRC, F1, F2, F3, F4, F5, F6, F7, F8, F9, SNK
• Packet rate regulator
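To make the N×B count concrete, here is a small sketch of a direct-form FIR filter that counts its multiply/adds, chained nine times as in the synthetic application. The function names, the zero initial filter state and the sample values are assumptions for illustration; the slides do not show the actual filter code.

```python
def fir_filter(samples, taps):
    """Direct-form FIR over one packet.

    Returns (output, multiply_add_count). Samples before the start of the
    packet are treated as zero (an assumption) but still counted, so the
    count is exactly len(taps) * len(samples), i.e. N x B.
    """
    output = []
    count = 0
    for n in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            x = samples[n - k] if n - k >= 0 else 0.0
            acc += tap * x
            count += 1
        output.append(acc)
    return output, count


def run_chain(samples, taps, num_filters=9):
    """Pass one packet through num_filters identical FIR filters in sequence."""
    data = list(samples)
    total = 0
    for _ in range(num_filters):
        data, ops = fir_filter(data, taps)
        total += ops
    return data, total


# One packet with B=2000 samples and N=10 taps, the values used in one of the experiments.
B, N = 2000, 10
_, ops = run_chain([1.0] * B, [0.1] * N)
print(ops == 9 * N * B)
```

The functional work per packet is the same whether the nine filters sit in one component or in nine, which is what lets the configurations W2 to W11 isolate the overhead of the component structure.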

Synthetic Application: WL versus Configuration and N

[Stacked bar chart: CPU Workload versus Configuration, B=2000, PR=40. CPU WL [%], split into System and User, for configurations FUNC, W2, W3, W5 and W11 at N=10 and at N=50]

• Ratio, user: 1,67; ratio, user+syst: 1,94
• Ratio, user: 1,10; ratio, user+syst: 1,17

WL results measured by SYSSTAT sar (sar -u 40 3)

Synthetic Application: WL versus Packet Size

[Line chart: CPU Workload versus Packet Size. CPU WL [%] against packet size (0 to 70000) for FUNC, W2, W3, W5 and W11, each shown as user (U) and user+system (U+S)]

•Packet rate: 10/sec

•N and B selected such that ‘C’ implementation (FUNC) is at 10±0,3% user CPU WL

•WL overhead is seen to increase significantly with B

The Simple Lower Bound Model (SLBM)

• Ideal, unrealistically optimistic model

• Serves as a lower bound

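• t_i = number of cycles per packet

$$ WL_i\% = \frac{PR \cdot \left[ 9\,t_{CL} + t_{SRC} + t_{SNK} + (M-1)\,t_{packet} + (M-1)\,t_{TS} + (M-2)\,t_{TF} \right]}{PCR} \cdot 100 $$

[Timing diagram: components C_N and C_N+1, showing t_SRC (on timer strobe), t_TS, t_packet, t_TF, t_CL, t_SNK and idle time]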

Parameters in the Simple Lower Bound Model

• For simplicity, we measure the parameters in the model with OProfile and/or SYSSTAT sar, using test applications:
– 'c'-prog: for (i=0; i < BLSZ; i++) ......
– CORBA test application: CORBATSTSRC, CORBATSTSINK

                              t_SRC       t_SNK       t_TS(B)    t_TF(B)    t_packet(B)
User,   sar estimate          Assumed 0   Assumed 0   14,5*B     10,6*B     79200+5,5*B
User,   OProfile estimate     650         200         14,6*B     10,6*B     80700+5,5*B
System, sar estimate          Assumed 0   Assumed 0   ≈ 0        ≈ 0        160000+23,6*B
System, OProfile estimate     ≈ 0         ≈ 0         ≈ 0        ≈ 0        150000+23,0*B
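The following is a rough sketch of how the Simple Lower Bound Model could be evaluated with the user-mode OProfile estimates from this slide. It reads the SLBM expression as WL% = PR · [9·t_CL + t_SRC + t_SNK + (M−1)·t_packet + (M−1)·t_TS + (M−2)·t_TF] / PCR · 100, which is an interpretation of the slide. The value of t_CL is not given in the slides; here it is back-computed from the statement that the 'C' implementation sits at about 10 % user workload, which is an assumption, as are all function names. Decimal commas in the table are written as decimal points.

```python
PCR = 1.86e9  # processor cycle rate [cycles/s]: Pentium M 1.86 GHz from the test setup


def t_cl_for_func_percent(func_percent, pr, pcr=PCR):
    """Assumed helper: cycles per FIR per packet such that the nine FIRs
    alone give func_percent workload at packet rate pr."""
    return (func_percent / 100.0) * pcr / (9 * pr)


def slbm_user_workload_percent(b, m, pr, t_cl, pcr=PCR):
    """SLBM user workload [%] for packet size b, m components, packet rate pr."""
    t_src = 650.0                   # source, OProfile user estimate
    t_snk = 200.0                   # sink, OProfile user estimate
    t_ts = 14.6 * b                 # t_TS(B), OProfile user estimate
    t_tf = 10.6 * b                 # t_TF(B), OProfile user estimate
    t_packet = 80700.0 + 5.5 * b    # t_packet(B), OProfile user estimate
    cycles = (9 * t_cl + t_src + t_snk
              + (m - 1) * t_packet + (m - 1) * t_ts + (m - 2) * t_tf)
    return 100.0 * pr * cycles / pcr


# W11 (M=11) at 10 packets/s, with the functional work pinned at 10 %.
pr, m = 10, 11
t_cl = t_cl_for_func_percent(10.0, pr)
for b in (2000, 20000, 60000):
    print(b, round(slbm_user_workload_percent(b, m, pr, t_cl), 2))
```

Because every per-packet parameter except t_SRC and t_SNK grows linearly with B, the modelled overhead above the 10 % functional baseline also grows roughly linearly with packet size.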

Results, SLBM

• The simple model describes the dominating part of the user CPU overhead. Agreement best for small packet sizes

[Line chart: Comparison of W11 Measured Data and Simple Lower Bound Model. WL User [%] versus packet size B (0 to 6 x 10^4); series: Measured, SLBM. M=11, packet rate = 10/sec]

• N and B selected such that 'C' implementation (FUNC) is at 10±0,3% user CPU WL

The Context Switch Model (CSM)

$$ WL_{cs}\% = WL_i\% + \frac{CSR \cdot (t_{CSD} + t_{CSI})}{PCR} \cdot 100 $$

• CSR: Context Switch Rate [switches/second]. Measured ≈ 1300 for the example on the next page

• t_CSD: CS Direct Cost [cycles]. Here: ≈ 5 µsec = 9300 cycles

• t_CSI: CS Indirect Cost [cycles]. Here: using a coarse estimate based on addressed space and memory speed

• PCR: Cycle rate of the processor
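A minimal sketch of the context-switch term of the model, taken here as CSR · (t_CSD + t_CSI) / PCR · 100 added on top of the SLBM workload. Pairing the "≈ 5 µsec = 9300 cycles" note with the direct cost and the "≈ 1300" note with the context switch rate is an assumption about the slide layout, and the indirect cost is set to zero because the slides give no number for it. Function names are hypothetical.

```python
PCR = 1.86e9  # processor cycle rate [cycles/s]: Pentium M 1.86 GHz from the test setup


def microseconds_to_cycles(microseconds, pcr=PCR):
    """Convert a duration to processor cycles at cycle rate pcr."""
    return microseconds * 1e-6 * pcr


def context_switch_workload_percent(csr, t_csd, t_csi, pcr=PCR):
    """Workload [%] added by context switches.

    csr   : context switch rate [switches/second]
    t_csd : direct cost per switch [cycles]
    t_csi : indirect cost per switch [cycles]
    """
    return 100.0 * csr * (t_csd + t_csi) / pcr


t_csd = microseconds_to_cycles(5.0)  # about 9300 cycles at 1.86 GHz
# Direct cost only; the indirect cost is left at zero in this illustration.
print(context_switch_workload_percent(csr=1300, t_csd=t_csd, t_csi=0.0))
```

With a nonzero indirect cost the term grows in proportion, which is why the slide treats the direct-only curve and the curve with coarse indirect estimates as separate cases.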

Results, CSM

• With the CS model, we better explain the measured WL

[Line chart: Comparison of W11 Measured Data and CS Model with Approximate Parameters. WL User [%] versus packet size B (0 to 6 x 10^4); series: Measured; CSM, only t_CSD, coarse estimate; CSM example (coarse parameter estimates). M=11, packet rate = 10/sec]

• N and B selected such that 'C' implementation (FUNC) is at 10±0,3% user CPU WL

Conclusions:

• We have used empirical analysis and simple analytical models to understand the effects of granularity in an SCA-based system with CORBA-capable processors

• When executing the same total functional processing work, we observe that the processor workload increases as the number of components increases

• This overhead increases with data packet size, and becomes more dominant the less functional work there is per packet

• Contributors: data conversions, packet communication through CORBA, direct cost of context switches, indirect cost of context switches

• Hence the scalability and reusability benefits that result from implementing the SDR application with a high number of components must be balanced against the processing efficiency loss that occurs when several components have to run on the same processor

• Two simple models are described that help explain the major effects, and may be used to calculate the overhead

References:

[1] P. J. Fortier and H. E. Michel, Computer Systems Performance Evaluation and Prediction. Amsterdam: Digital Press, 2003.

[2] VirginiaTech, OSSIE development site for software-defined radio, http://ossie.wireless.vt.edu/trac as of Dec. 20 2007

[3] omniORB, http://omniorb.sourceforge.net/ as of Feb. 29 2008

[4] OProfile - A System Profiler for Linux, http://oprofile.sourceforge.net as of Feb. 29 2008

[5] SYSSTAT, http://pagesperso-orange.fr/sebastien.godard/ as of Feb. 29 2008

[6] Chuanpeng Li, Chen Ding, and Kai Shen, "Quantifying The Cost of Context Switch," in ExpCS '07: Proceedings of the 2007 workshop on Experimental computer science, San Diego, CA, 13-14 June 2007.

Questions?