July 15, 2014

Architecting an autograder for parallel code

Razvan Carbunescu, Aditya Devarakonda, Jay Alameda,

James Demmel, Steven I. Gordon, Susan Mehringer

Talk Outline

• Course that motivated autograder

• Autograder concepts and challenges

• Autograder implementation

• Course results

XSEDE Parallel Computing Course

• Created from UC Berkeley course CS267

• Lectures converted for online use (quizzes added)

• Programming assignments require autograder

• Course offered in 2013 for ‘Certificate of Completion’

• Course offered in 2014 for credit at 18 universities in the US and abroad with local instructors

Universities offering course for credit

Programming assignments

• HW1 - Optimizing Matrix Multiply

• HW2 - Parallel Particle Simulator

• HW3 – Parallel Knapsack

* Bottom picture taken from Wikipedia article on Knapsack

C(i,j) = C(i,j) + A(i,:) * B(:,j)

HW 1 – Optimizing Matrix Multiply

• Naïve code is just 3 loops but also reaches only 3% of arithmetic peak

• Students given naïve and blocked code, must provide ‘efficient’ code

• Students learn about: memory access, caching, SIMD and using libraries
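
As a rough illustration of the two starting points named on this slide, the sketch below writes the update C = C + A * B once as the plain three-loop code and once with the loops tiled into blocks. The function names, the flat row-major storage and the default block size are assumptions made for this example and are not taken from the course code. Python is used only to show the loop structure, not to reach any fraction of peak.

```python
def naive_dgemm(n, A, B, C):
    """C += A * B for n-by-n matrices stored as flat row-major lists."""
    for i in range(n):
        for j in range(n):
            cij = C[i * n + j]
            for k in range(n):
                cij += A[i * n + k] * B[k * n + j]
            C[i * n + j] = cij


def blocked_dgemm(n, A, B, C, block=2):
    """Same update, but visiting the matrices one block-by-block tile at a time.

    The block size is a hypothetical tuning parameter; in compiled code it
    would be chosen so that the tiles being worked on fit in cache.
    """
    for i0 in range(0, n, block):
        for j0 in range(0, n, block):
            for k0 in range(0, n, block):
                # Multiply one tile of A by one tile of B into a tile of C.
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, n)):
                        cij = C[i * n + j]
                        for k in range(k0, min(k0 + block, n)):
                            cij += A[i * n + k] * B[k * n + j]
                        C[i * n + j] = cij
```

Both functions perform the same 2n^3 floating point operations; only the order in which the data is touched changes.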

HW 2 – Parallel Particle Simulation

• Simplified particle simulator

• Introduces OpenMP, MPI and CUDA

• Students given working O(n^2) code and must provide O(n) code

• Students learn about: synchronization, locks and domain decomposition
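
One common way to get from O(n^2) to roughly O(n) work in a short-range particle code is to sort particles into bins the size of the interaction cutoff and only compare neighbouring bins. The sketch below assumes that approach and a 2D domain; the slide does not say which method students were expected to use, and the function names are hypothetical. It counts interacting pairs rather than computing forces so the two versions can be compared directly.

```python
import math


def count_close_pairs_naive(points, cutoff):
    """O(n^2): test every pair of particles against the cutoff."""
    count = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < cutoff:
                count += 1
    return count


def count_close_pairs_binned(points, cutoff):
    """Roughly O(n) at bounded density: only look in the 3x3 block of bins."""
    bins = {}
    for idx, (x, y) in enumerate(points):
        key = (int(x // cutoff), int(y // cutoff))
        bins.setdefault(key, []).append(idx)

    count = 0
    for (bx, by), members in bins.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in bins.get((bx + dx, by + dy), ()):
                    for i in members:
                        # i < j makes sure each pair is counted exactly once.
                        if i < j and math.dist(points[i], points[j]) < cutoff:
                            count += 1
    return count
```

Because the bin width equals the cutoff, two particles closer than the cutoff always land in the same or adjacent bins, so the binned version finds the same pairs as the naive one.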

HW 3 – Parallel Knapsack

• 0-1 Knapsack problem

• Introduces UPC

• Students given inefficient parallel UPC code

• Students learn about: analyzing/minimizing communication, pipeline parallelism
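
For reference, a minimal serial sketch of the standard dynamic-programming solution to the 0-1 knapsack problem is shown below. It is not the course's UPC code, and the function name and argument layout are assumptions for this example. Each item's pass over the table reads only values left by the previous item, which is the kind of dependence a pipelined parallel version has to respect.

```python
def knapsack(weights, values, capacity):
    """Best total value of a subset of items whose total weight fits capacity."""
    # best[c] holds the best value achievable with capacity c using the
    # items processed so far.
    best = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        # Walk capacities downwards so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]
```

For example, `knapsack([10, 20, 30], [60, 100, 120], 50)` picks the second and third items for a total value of 220.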

Talk Outline

• Course that motivated autograder

• Autograder concepts and challenges

• Autograder implementation

• Course results

Autograder Concepts

• Testing Correctness

• Testing Performance

• Feedback / automation

• Resource management

Correctness

• What is the right answer? Does it exist?

Correctness

Problems introduced by parallelism

• Race conditions (non-benign)

• Deadlock / livelock / starvation

• Floating Point and non-determinism

Problems exacerbated by parallelism

• Output size compared to input (gathering, testing)

• Input type and size (precomputed vs random)
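
The floating point item can be made concrete with a small sketch: the same numbers summed in a different order, as happens when work is split across workers, can give a different result. The helper names and the round-robin split are assumptions for this illustration, not part of the autograder.

```python
def ordered_sum(values):
    """Add the values left to right, as a serial loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, num_workers):
    """Give each (simulated) worker every num_workers-th value, then combine."""
    partials = [ordered_sum(values[w::num_workers]) for w in range(num_workers)]
    return ordered_sum(partials)


values = [1e16, 1.0, -1e16, 1.0]
serial = ordered_sum(values)        # the first 1.0 is absorbed by 1e16
parallel = chunked_sum(values, 2)   # the large terms cancel before the 1.0s are added
print(serial, parallel)             # prints 1.0 2.0
```

This is why an exact equality test against a serial reference is usually too strict for parallel floating point code.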

Performance

• What is a ‘fast’ or ‘good’ parallel code?

[Figure panels: strong scaling, weak scaling]

Performance

• Sequential metrics: time, percentage of peak

• Strong scaling and speedup

• Weak scaling

• Input dependent performance

• Overhead of correctness check

• Overhead of I/O operations
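
The scaling metrics listed above are usually computed from wall-clock times as in the sketch below. These are the textbook definitions; the function names are hypothetical, and the slides do not state exactly which formulas the autograder used.

```python
def speedup(t_serial, t_parallel):
    """How many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel


def strong_scaling_efficiency(t_one, t_p, p):
    """Fixed total problem size: ideal time on p workers is t_one / p."""
    return t_one / (p * t_p)


def weak_scaling_efficiency(t_one, t_p):
    """Problem size grows with p: ideal time on p workers stays at t_one."""
    return t_one / t_p
```

A run that takes 8 s on one thread and 2 s on four threads has a speedup of 4 and a strong scaling efficiency of 1.0; a weak scaling run that goes from 2 s to 2.5 s has an efficiency of 0.8.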

Feedback / automation

• Providing fast correctness answer

• Providing performance data

• Submission/grade feedback

• Multiple submission capability

• Need for adaptability

Resource Management

• Allocation time vs scaling tests

• Latency due to utilization

• Student limits on allocation

Talk Outline

• Course that motivated autograder

• Autograder concepts and challenges

• Autograder implementation

• Course results

Autograder implementation

• Split into 2 parts: autograder.cpp and grade.py

Autograder.cpp

• Focuses on correctness and performance

• Given to students at start of assignment

• Parts integrated in assignment starting code

• Used other auxiliary files (job scripts, etc.)

• Instant feedback to student

• Limited scaling information

• Varies heavily from assignment to assignment

HW1 Autograder Implementation

• Floating point round-off meant using error norm instead of equalities for correctness checks

• Performance was determined from percentage of peak floating point rate

• Students required to provide defined interface function square_dgemm with compilation options included as comments
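
A minimal sketch of the two checks described above follows. It assumes an elementwise error bound of the form factor * n * eps * (|A| * |B|) and a peak rate supplied by the caller. The factor of 3, the function names and the row-major layout are assumptions for illustration, since the slides do not give the exact norm or constants used.

```python
import sys


def within_error_bound(n, A, B, C, factor=3.0):
    """Check C against A * B using a round-off bound instead of equality."""
    eps = sys.float_info.epsilon
    for i in range(n):
        for j in range(n):
            reference = 0.0
            bound = 0.0
            for k in range(n):
                reference += A[i * n + k] * B[k * n + j]
                bound += abs(A[i * n + k]) * abs(B[k * n + j])
            # Allow an error proportional to n, machine epsilon and |A||B|.
            if abs(C[i * n + j] - reference) > factor * n * eps * bound:
                return False
    return True


def percent_of_peak(n, seconds, peak_flops_per_second):
    """Matrix multiply does 2 * n^3 flops; report the rate as a % of peak."""
    achieved = 2.0 * n ** 3 / seconds
    return 100.0 * achieved / peak_flops_per_second
```

With this kind of check, a result that differs from the reference only by reordered floating point additions passes, while a genuinely wrong entry fails.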

HW2 Autograder Implementation

• No previous correctness check except visual

• Implemented empirical statistic checks based on the average and minimum interaction distances for particles

• I/O and correctness turned off for performance runs

• Performance determined from the coefficient x of the O(n^x) serial algorithm, from average strong and weak scaling for 1-16 threads for OpenMP and MPI, and from speedup for different problem sizes for CUDA
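
One plausible way to estimate x in O(n^x) from timing runs is a least-squares fit of log(time) against log(n), sketched below. The fitting method and the function name are assumptions; the slides only say that the coefficient was used.

```python
import math


def fit_exponent(sizes, times):
    """Slope of the best-fit line through (log n, log time) points."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den
```

Timings that grow quadratically with the particle count give a slope near 2, while a linear-time submission gives a slope near 1.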

HW3 Autograder Implementation

• Correctness was implemented via value check

• Performance used average strong and weak scaling efficiency for 1-16 threads and 16-256 threads to check the 2 different stages of UPC (shared and distributed)

Grade.py

• Focuses on final runs and calculating grades

• Very easily modifiable

• Relatively few changes between assignments

• Uses a private copy of autograder.cpp for correctness/performance checks

• Not available to students

Talk Outline

• Course that motivated autograder

• Autograder concepts and challenges

• Autograder implementation

• Course results

Course results

• Universities used different grading schemes based on data from autograder

• High drop-off for undergraduate students (CS267 is a graduate course)

• Students worked individually or in groups of 2

• Most universities had HW3 marked as optional to allow for extra time for final projects

Homework results

• ~150 students started (includes audits)

• 75 HW1 submissions (Max: 94, Median: 41)

• 57 HW2 submissions (Max: 97, Median: 30)

• 17 HW3 submissions (Max: 10, Median: 5)

• 2013 had 345 students and 36/23/18 submissions, with 18 ‘Certificates of Completion’

• From universities that finished and communicated data (4 out of 18): 38 students started and 25 finished the course, with 17 A’s, 4 B’s, 2 C’s and the rest auditing