Workload Characterization and Performance Assessment of Yellowstone using XDMoD and Exploratory data analysis (EDA) 1 August 2014 Ying Yang, SUNY, University at Buffalo Mentor: Tom Engel, NCAR Co-Mentors: Shawn Strande, Dave Hart, NCAR


Page 1

Workload Characterization and Performance Assessment of Yellowstone using XDMoD and Exploratory data analysis (EDA)

1 August 2014

Ying Yang, SUNY, University at Buffalo

Mentor: Tom Engel, NCAR

Co-Mentors: Shawn Strande, Dave Hart, NCAR

Page 2

Big Picture

• Background
• XDMoD and Yellowstone Job Data
• Enhancement of XDMoD for Yellowstone
• Additional Analyses of Yellowstone Job Data
• Summary & Future Work


Page 4

Background

• What is XDMoD?

Open XDMoD is an open source tool designed to audit and facilitate the utilization of supercomputers by providing a wide range of metrics on resources, including resource utilization, resource performance, and impact on scholarship and research.

XDMoD stands for “XSEDE Metrics on Demand”; it was developed by the University at Buffalo for NSF's XSEDE program under NSF grant OCI 1025159.

Page 5

Background

• XDMoD Architecture Details


Page 7

XDMoD and Yellowstone Job Data

• XDMoD runs on a dedicated server at NWSC; the software was installed and configured by the SSG group.

• Collaborated with CISL and SUNY at Buffalo developers to test a new shredder for ingesting LSF job-termination accounting records.

• Shredded and ingested all of the LSF accounting data from Yellowstone, Geyser, and Caldera (November 2012 to the present) into Open XDMoD. In total, 7,111,011 job records were shredded and 6,810,231 jobs were ingested.
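"Shredding" here means turning raw accounting lines into structured job records. The slides do not describe the internals of the CISL/University at Buffalo shredder, so the following is only a minimal Python sketch of the idea, not that shredder. It assumes the leading field order of an LSF `lsb.acct` JOB_FINISH record (event type, version, event time, job ID, user ID, options, processor count, submit, begin, termination-deadline and start times, user name, queue) as given in the LSF documentation; verify it against the LSF version in use. The function name `shred_record` and the sample record are hypothetical.

```python
import shlex

# ASSUMPTION: leading field order of an LSF lsb.acct JOB_FINISH record,
# taken from LSF documentation; check it against your LSF version.
# Real records carry many more fields; only these 13 are mapped here.
ASSUMED_FIELDS = [
    "event_type", "version", "event_time", "job_id", "user_id", "options",
    "num_processors", "submit_time", "begin_time", "term_time",
    "start_time", "user_name", "queue",
]
INT_FIELDS = {
    "event_time", "job_id", "user_id", "options", "num_processors",
    "submit_time", "begin_time", "term_time", "start_time",
}


def shred_record(line):
    """Turn one accounting line into a flat dict, or None if not JOB_FINISH."""
    tokens = shlex.split(line)  # handles the double-quoted string fields
    if len(tokens) < len(ASSUMED_FIELDS) or tokens[0] != "JOB_FINISH":
        return None
    record = {}
    for name, raw in zip(ASSUMED_FIELDS, tokens):
        record[name] = int(raw) if name in INT_FIELDS else raw
    # Wall time runs from job start to the logged finish event.
    wall_seconds = max(record["event_time"] - record["start_time"], 0)
    record["wall_hours"] = wall_seconds / 3600.0
    record["core_hours"] = record["num_processors"] * record["wall_hours"]
    return record


# Made-up record: a 32-processor job that ran for exactly one hour.
line = ('"JOB_FINISH" "9.1" 1388538000 12345 501 33554450 32 '
        '1388530000 0 0 1388534400 "alice" "regular"')
print(shred_record(line)["core_hours"])  # 32.0
```

Non-JOB_FINISH events and truncated lines are skipped rather than raising, which mirrors the fact that not every shredded record ends up ingested.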

Page 8

XDMoD and Yellowstone Job Data

Page 9

XDMoD and Yellowstone Job Data

[Figure: diagram labeled "LSF"]

Page 10

XDMoD and Yellowstone Job Data

[Figure: diagram with the labels LSF, Yellowstone Shredded Data, Yellowstone Ingested Data, and SuperMoD REST Service API]

Page 11

XDMoD and Yellowstone Job Data

• XDMoD’s Summary tab

Page 12

XDMoD and Yellowstone Job Data

• XDMoD’s Metric Explorer (CPU time grouped by user)


Page 14

Enhancement of XDMoD for Yellowstone

• Two new metrics (1)

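Job Size: Weighted By Core Hours (Core Count): the average NCAR job size, in cores, weighted by core hours.

Defined as:

$$\text{Job Size}_{\text{weighted}} = \frac{\sum_{i=1}^{n} c_i \, h_i}{\sum_{i=1}^{n} h_i}$$

where $c_i$ is the core count of job $i$, $h_i$ is the number of core hours consumed by job $i$, and the sums run over the $n$ jobs in the selected period.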
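To see why the weighting matters, the Python sketch below computes the weighted metric next to an unweighted per-job mean. It assumes that XDMoD's stock "Average Job Size" treats every job equally, which the slides imply by contrast but do not state. The function names and the two-job data set are hypothetical.

```python
def weighted_job_size(jobs):
    """Core-hour-weighted mean core count.

    jobs: iterable of (core_count, core_hours) pairs, one per job.
    """
    jobs = list(jobs)
    total_core_hours = sum(h for _, h in jobs)
    if total_core_hours == 0:
        raise ValueError("no core hours consumed")
    return sum(c * h for c, h in jobs) / total_core_hours


def average_job_size(jobs):
    """Unweighted mean core count: every job counts equally."""
    jobs = list(jobs)
    return sum(c for c, _ in jobs) / len(jobs)


# Hypothetical data: a 16-core job using 16 core hours and a
# 1,024-core job using 10,240 core hours.
jobs = [(16, 16.0), (1024, 10240.0)]
print(average_job_size(jobs))              # 520.0
print(round(weighted_job_size(jobs), 2))   # 1022.43
```

The unweighted mean sits halfway between the two jobs, while the weighted value stays close to the large job that consumed almost all of the core hours, which is the view of "typical" usage the new metric is meant to give.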

Page 15

Enhancement of XDMoD for Yellowstone

• XDMoD’s Average Job Size

Page 16

Enhancement of XDMoD for Yellowstone

• Sophie’s Job Size Weighted By Core Hours (Core Count)

Page 17

Enhancement of XDMoD for Yellowstone

• Two new metrics (2)

Yellowstone %Scheduled: The percentage of resources scheduled to be used by jobs running on Yellowstone.

Yellowstone Scheduled Utilization: The total CPU hours scheduled to Yellowstone jobs over a given time period, divided by the total CPU hours the system could have provided during that period.
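The ratio above can be sketched in a few lines of Python. The key assumption, not spelled out on the slide, is that nodes are allocated exclusively, so a job's scheduled CPU hours are nodes allocated × cores per node × wall hours regardless of how many cores it actually uses. The function name, parameters, and the 10-node example system are hypothetical; %Scheduled is taken to be the same ratio expressed as a percentage.

```python
def scheduled_utilization(jobs, total_nodes, cores_per_node, period_hours):
    """Fraction of potential CPU hours scheduled to jobs in one period.

    jobs: iterable of (nodes_allocated, wall_hours) pairs, with wall_hours
    already clipped to the period. ASSUMPTION: whole nodes are scheduled
    exclusively, so every core on an allocated node counts as scheduled.
    """
    scheduled = sum(n * cores_per_node * h for n, h in jobs)
    capacity = total_nodes * cores_per_node * period_hours
    return scheduled / capacity


# Hypothetical 10-node, 16-core-per-node system over one 24-hour day.
jobs = [(2, 12), (5, 24)]
u = scheduled_utilization(jobs, total_nodes=10, cores_per_node=16,
                          period_hours=24)
print(u)  # 0.6
```

Here 2,304 of a possible 3,840 CPU hours were scheduled, giving 60 %Scheduled for that day.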

Page 18

Enhancement of XDMoD for Yellowstone

• Yellowstone %Scheduled (by job size)

Many 144-node jobs that use only one core per node are running.
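A quick calculation shows why such jobs stand out in %Scheduled. The sketch assumes 16 cores per Yellowstone node and exclusive node allocation; both are assumptions made for illustration, and the helper name is hypothetical.

```python
def core_hours_used_vs_scheduled(nodes, cores_used_per_node,
                                 cores_per_node, wall_hours):
    """Compare core hours a job uses with core hours it holds.

    ASSUMPTION: nodes are allocated exclusively, so idle cores on an
    allocated node still count as scheduled.
    """
    used = nodes * cores_used_per_node * wall_hours
    scheduled = nodes * cores_per_node * wall_hours
    return used, scheduled, used / scheduled


# A 144-node job using 1 of an assumed 16 cores per node, for one hour.
used, scheduled, fraction = core_hours_used_vs_scheduled(144, 1, 16, 1.0)
print(used, scheduled, fraction)  # 144.0 2304.0 0.0625
```

Under these assumptions each such job holds 16 times the core hours it uses, so only 6.25% of its scheduled capacity does work.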


Page 20

Additional Analyses of Yellowstone Job Data

Exploratory data analysis of the ingested data using R

Question: What is the average job size, and how has it varied over time?

Methods:
• Forecasting Using Exponential Smoothing
• Forecasting Using ARIMA Model
• Multiple Linear Regression
• K-Nearest Neighbor

Experiments and Results

Page 21

Additional Analyses of Yellowstone Job Data

Methods: Exponential Smoothing

a) Simple Exponential Smoothing: an additive model with constant level and no seasonality
b) Holt’s Exponential Smoothing: an additive model with increasing or decreasing trend and no seasonality
c) Holt-Winters Exponential Smoothing: an additive model with increasing or decreasing trend and seasonality
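The first two models can be written as short recurrences. The deck's analysis was done in R; this is an independent Python sketch of the textbook additive forms, with hand-picked smoothing parameters `alpha` (level) and `beta` (trend) and a simple initialisation (first value as level, first difference as trend), all of which are assumptions. R's `HoltWinters()` instead estimates the parameters from the data. Holt-Winters adds a third, seasonal recurrence and is omitted to keep the sketch short.

```python
def simple_exp_smoothing(series, alpha):
    """One-step-ahead SES forecast: level only, no trend or seasonality."""
    level = series[0]  # initialise the level with the first observation
    for y in series[1:]:
        # New level = weighted blend of the latest value and the old level.
        level = alpha * y + (1 - alpha) * level
    return level  # SES forecasts the final level for every future step


def holt_forecast(series, alpha, beta, horizon=1):
    """Holt's linear method: additive level plus trend, no seasonality."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level, trend = series[0], series[1] - series[0]  # simple initialisation
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend  # extrapolate the trend h steps ahead


print(simple_exp_smoothing([0, 10], alpha=0.5))          # 5.0
print(holt_forecast([1, 2, 3, 4, 5], alpha=0.5, beta=0.3))  # 6.0
```

On a perfectly linear series Holt's method extends the line exactly, whereas SES would lag behind it; that difference is why the trend term is worth fitting when job size drifts over time.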

Page 22

Additional Analyses of Yellowstone Job Data

Methods: ARIMA Model

Autoregressive Integrated Moving Average (ARIMA) models include an explicit statistical model for the irregular component of a time series, which allows for non-zero autocorrelations in the irregular component.

Building the model:
Step 1: Difference the time series (R diff() function)
Step 2: Select a candidate ARIMA model (R acf() and pacf() functions)
Step 3: Forecast using the ARIMA model
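Steps 1 and 2 can be illustrated without R. The Python sketch below differences a series and computes the sample autocorrelation function (normalised like R's `acf()`, autocovariance at lag k divided by the lag-0 value) and the partial autocorrelations via the Durbin-Levinson recursion. It does not fit or forecast an ARIMA model (Step 3); in R that is typically done with `arima()` and `predict()`. The function names are illustrative.

```python
def difference(series, d=1):
    """Apply first differences d times, like R's diff(x, differences = d)."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def pacf(x, max_lag):
    """Partial autocorrelations for lags 1 .. max_lag (Durbin-Levinson)."""
    r = acf(x, max_lag)
    phi_prev = []  # phi_prev[j] holds phi_{k-1, j+1}
    result = []
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients for the order-k model.
        phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
               for j in range(k - 1)]
        phi.append(phi_kk)
        phi_prev = phi
        result.append(phi_kk)
    return result


print(difference([1, 3, 6, 10]))       # [2, 3, 4]
print(difference([1, 3, 6, 10], d=2))  # [1, 1]
```

In practice one differences until the series looks stationary, then reads candidate AR and MA orders from where the PACF and ACF of the differenced series cut off.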

Page 23

Additional Analyses of Yellowstone Job Data

Experiments:
• Naive method
• Mean method
• Drift method (week and month)
• Simple Exponential Smoothing (SES)
• Holt’s Exponential Smoothing (HES)
• Holt-Winters Exponential Smoothing (HWES)
• ARIMA Model
• Multiple Linear Regression
• K-Nearest Neighbor

Descriptions:
• Data: data from 2013, 364 days in total. Days 1-308 serve as training data and days 309-364 as testing data.
• Prediction error: the difference between the predicted value and the true value, expressed as a percentage of the true value.
• The naive, mean, and drift methods serve as baselines for comparison.
• The ES methods predict each day using all days before it (e.g., days 1-100 predict day 101, days 1-101 predict day 102, ...).
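The baselines and the expanding-window evaluation described above can be sketched in Python as follows. Two details are assumptions rather than statements from the slides: the error is taken as an absolute percentage so that errors can be averaged, and "drift (week and month)" is read as a drift computed over only the last 7 or about 30 days via the hypothetical `window` parameter. The function names are illustrative.

```python
def naive(history):
    """Forecast tomorrow as today's value."""
    return history[-1]


def mean_method(history):
    """Forecast tomorrow as the mean of all past values."""
    return sum(history) / len(history)


def drift(history, window=None):
    """Extend the average daily change; window keeps only the last N days."""
    h = history if window is None else history[-window:]
    if len(h) < 2:
        return h[-1]
    return h[-1] + (h[-1] - h[0]) / (len(h) - 1)


def percentage_error(predicted, actual):
    """ASSUMPTION: absolute difference as a percentage of the true value."""
    return 100.0 * abs(predicted - actual) / abs(actual)


def rolling_evaluation(series, first_test_index, method):
    """Forecast each test day from all earlier days; return the mean error."""
    errors = []
    for t in range(first_test_index, len(series)):
        prediction = method(series[:t])  # expanding window: days 0 .. t-1
        errors.append(percentage_error(prediction, series[t]))
    return sum(errors) / len(errors)


# With 364 daily values for 2013, day 309 is 0-based index 308:
#   rolling_evaluation(daily, 308, naive)
#   rolling_evaluation(daily, 308, lambda h: drift(h, window=7))
print(rolling_evaluation([10, 12, 14, 16, 18], 3, drift))  # 0.0
```

Any one-step forecaster that takes the history list can be passed as `method`, so the same loop scores the baselines and the smoothing models on identical test days.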

Page 24

Additional Analyses of Yellowstone Job Data

Experiment Results


Page 26

Summary & Future Work

Summary:
• Ingested all Yellowstone accounting data into XDMoD (November 2012 to present)
• Developed two new metrics for Yellowstone and contributed them back to the open source project
• Performed exploratory data analysis using R

Future Work:
• Further enhancement of XDMoD
• Further analysis of Yellowstone data
• Integration of EDA into XDMoD

Page 27

Acknowledgements

HSS and USS:
Tom Engel, HSS (Mentor)
Shawn Strande, HSS (Co-Mentor)
Dave Hart, USS (Co-Mentor)
Davide Del Vento, CSG
Pamela Gillman, DASG
Erich Thanhardt, MSSG
Irfan Elahi, SCSG

IMAGe:
Doug Nychka, IMAGe


Page 29

Ying [email protected]