  • CnC for Tuning Hints on OCR

    Nick Vrvilo, Rice University

    The 7th Annual CnC Workshop

    September 8, 2015

  • Acknowledgements

    This work was done as part of my internship with the OCR team, part of Intel Federal, LLC at Jones Farm (Hillsboro, OR).

    Mentors (Intel): Josh Fryman and Romain Cledat

    Habanero Team (Rice): Vivek Sarkar, Kath Knobe, Zoran Budimlić, and Sanjay Chatterjee

  • Objective

    Demonstrate the effectiveness of OCR tuning hints by way of code generation from a higher-level programming model (CnC).

  • Objective

    [Diagram with components: CnC App Code, CnC Graph, CnC-OCR Scaffolding, hints handler, Tunings, OCR.]

  • Open Community Runtime (OCR)*

    OCR project goals:

    Provide effective abstraction for diverse hardware

    Typify future task-based execution models

    Handle large-scale parallelism efficiently

    Maintain a separation of concerns (application/scheduling/resources)

    Open source (encourage collaboration)

    * OCR ==> X-Stack Traleika Glacier projects implementation

  • Outline

    Introduction

    OCR Hints API

    CnC on OCR

    Tuning Hints Implementation and Analysis

  • CnC / OCR Concept Mapping

    Concept                     | OCR construct        | CnC construct
    Task classes (code)         | EDT template         | Step collection
    Task instance               | EDT                  | Step instance
    Data classes                | void* (see note)     | Item collection
    Data instance               | Datablock            | Item instance
    Unique instance identifier  | GUID                 | Tag (step tag / item key)
    Dependence registration     | Event add dependence | Item get
    Dependence satisfaction     | Event satisfy        | Item put

    Note: All DBs have type void* (keeping track of individual DBs' types is the app programmer's responsibility).

  • OCR Hints API: Example

    // Assume we have a template and a datablock
    ocrGuid_t edt;
    ocrEdtCreate(&edt, template, 0, NULL, 1, NULL,
                 EDT_PROP_NONE, NULL_GUID, NULL);

    { // Set an OCR hint
        ocrHint_t stepHints;
        ocrHintInit(&stepHints, OCR_HINT_EDT_T);
        ocrGetHint(edt, &stepHints);
        ocrSetHintValue(&stepHints, OCR_HINT_EDT_PRIORITY, 100);
        ocrSetHint(edt, &stepHints);
    }

    ocrAddDependence(datablock, edt, 0, DB_DEFAULT_MODE);

  • OCR Hints API:

    Pros: Generic, conceptually decoupled, light-weight

    Cons: Verbose, placed in app source code, limited expressiveness

  • Outline

    Introduction

    OCR Hints API

    CnC on OCR

    Tuning Hints Implementation and Analysis

  • CnC-OCR Developer Workflow

    1. Write graph spec

    2. Run translator tool (produces skeleton project)

    3. Flesh out skeleton code

    4. Run program (functionality check); debug as needed

    5. Write tuning spec(s)

    6. Re-run translator tool (updates scaffolding code)

    7. Re-run program (performance check); fine-tune as needed

  • CnC-OCR + Tuning

    [Diagram with components: CnC App Code, CnC Graph, CnC-OCR Scaffolding, hints handler, Tunings, OCR.]

  • Separation of Concerns in CnC

    Graph specification can be written without implementation details

    Each step function implementation is written without knowledge of the external graph (only its own inputs and outputs)

    Tuning specification given in a separate file

    Easy to mix in different tunings for performance testing

    Try combinations of tunings until you find the ideal configuration

  • Outline

    Introduction

    OCR Hints API

    CnC on OCR

    Tuning Hints Implementation and Analysis

  • Tuning Hints Overview

    1. Step / item distribution

    2. Step affinity with input

    3. Step priority

    4. Scheduler throttling

    5. Partial item requests

  • Hint #1: Step / Item Distribution Functions

    What? Declare a function for mapping individual step / item instances from a collection onto the set of OCR policy domains.

    Why? Distributed OCR currently lacks advanced scheduling/placement heuristics, so control of distribution is needed for a reasonable baseline.

  • Smith-Waterman Sequence Alignment

    Each input sequence length ~200k

    Dynamic programming optimization on ~40-billion cell matrix

    Tiles of 177x153 cells

    Total of 1138x1322 tiles
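
    As a quick sanity check on the problem size, the tile dimensions and tile counts above can be multiplied out. The sketch below is only illustrative; the function name `sw_total_cells` is hypothetical and not part of CnC-OCR, and the pairing of tile height with tile-row count is an assumption (the total is the same either way).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of CnC-OCR): total number of cells in a
 * tiled dynamic-programming matrix, given the tile size and tile counts. */
uint64_t sw_total_cells(uint64_t tile_h, uint64_t tile_w,
                        uint64_t tiles_h, uint64_t tiles_w) {
    assert(tile_h > 0 && tile_w > 0);
    /* Matrix height times matrix width, both measured in cells. */
    return (tile_h * tiles_h) * (tile_w * tiles_w);
}
```

    With the figures above, `sw_total_cells(177, 153, 1138, 1322)` gives 40,741,631,316 cells, which matches the ~40-billion estimate; the implied matrix sides are 177 x 1138 = 201,426 and 153 x 1322 = 202,266, both close to the ~200k sequence length.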

  • Smith-Waterman Specification

    Graph Specification

    [ int above[] : i, j ];
    [ int left[] : i, j ];
    [ SeqData *data : () ];

    ( swStep: i, j )
        <- … [ above: i, j ] $when(i > 0),
           [ left: i, j ] $when(j > 0)
        -> [ below @ above: i+1, j ],
           [ right @ left: i, j+1 ],
           ( swStep: i+1, j ) $when(i+1 < #nth);

    Tuning Specification

    [ above ]: {
        distfn: (i / 16) % $RANKS
    };

    [ left ]: {
        distfn: (i / 16) % $RANKS
    };

    ( swStep ): {
        distfn: (i / 16) % $RANKS
    };
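
    The `distfn` expression in the tuning specification can be read as an ordinary integer function. The sketch below is a hypothetical C rendering, assuming `$RANKS` denotes the number of ranks (policy domains) and `i` is the row component of the tag; it is not the code the CnC-OCR translator generates.

```c
#include <assert.h>

/* Rows per block in the row-block distribution; 16 is the constant used
 * in the tuning spec's distfn expression. */
#define ROWS_PER_BLOCK 16

/* Hypothetical C rendering of `distfn: (i / 16) % $RANKS`.
 * `i` is the row tag component; `ranks` stands in for $RANKS. */
int row_block_rank(int i, int ranks) {
    assert(i >= 0 && ranks > 0);
    /* Integer division groups 16 consecutive rows into one block;
     * the modulo deals the blocks out to ranks round-robin. */
    return (i / ROWS_PER_BLOCK) % ranks;
}
```

    With 4 ranks, rows 0 to 15 map to rank 0, rows 16 to 31 to rank 1, and row 64 wraps back around to rank 0. Because the column `j` does not appear in the expression, every tile in a row lands on the same rank.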

  • Smith-Waterman Sequence Alignment

    Each input sequence length ~200k

    Dynamic programming optimization on ~40-billion cell matrix

    Tiles of 177x153 cells

    Total of 1138x1322 tiles

    Default: CnC default distribution

    Row-block: Rows in blocks of 16

    10 runs per configuration

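    [Figure: bar chart of average execution time (seconds) versus node count (1, 2, 4, 8) for three configurations: CnC-OCR Default, CnC-OCR Row-Block, and iCnC Row-Block. The y-axis runs from 0 to 50; two data labels, 115.40 and 141.49, appear on the chart.]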

  • Hint #2: Step Affinity with Input Item

    What? Declare that a step instance be affinitized with one of its input items.

    Why? OCR can use this affinity to improve scheduling heuristics.

    More expressive way to specify tunings like hint #1.

  • Smith-Waterman Specification

    Graph Specification

    [ int above[] : i, j ];
    [ int left[] : i, j ];
    [ SeqData *data : () ];

    ( swStep: i, j )
        <- … [ above: i, j ] $when(i > 0),
           [ left: i, j ] $when(j > 0)
        -> [ below @ above: i+1, j ],
           [ right @ left: i, j+1 ],
           ( swStep: i+1, j ) $when(i+1 < #nth);

    Tuning Specification

    [ above ]: {
        distfn: (i / 16) % $RANKS
    };

    [ left ]: {
        distfn: (i / 16) % $RANKS
    };

    ( swStep ): {
        placeWith: above
    };
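
    One way to read `placeWith: above` is that the step no longer carries its own distribution function and instead inherits the placement of its `above` input item. The sketch below illustrates that reading only; both function names are hypothetical, `ranks` stands in for `$RANKS`, and the actual CnC-OCR scaffolding may express the affinity differently (for example through an OCR hint rather than a computed rank).

```c
#include <assert.h>

/* Hypothetical stand-in for the [ above ] item collection's distfn,
 * `(i / 16) % $RANKS`; the column tag j does not affect placement. */
int above_item_rank(int i, int j, int ranks) {
    assert(i >= 0 && j >= 0 && ranks > 0);
    (void)j;
    return (i / 16) % ranks;
}

/* Illustration of `placeWith: above`: the step (swStep: i, j) is placed
 * wherever its input item [ above: i, j ] lives, so the step needs no
 * distfn of its own. */
int sw_step_rank(int i, int j, int ranks) {
    return above_item_rank(i, j, ranks);
}
```

    Under this reading, changing the `distfn` of `[ above ]` moves the steps along with the items, so the tuning spec states the distribution in one place instead of repeating it for the step collection.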

  • Hint #3: Step Priority Weights

    What? Express a priority weight for a given CnC step, such that steps with heavier weights should execute earlier.

    Why? Search problems: prioritize paths likely to find the answer sooner

    Enable concurrency: prefer tasks with high-demand output (many consumers)

  • N-Queens Puzzle

    Board size: 13x13

    Solutions possible: 73,712

  • N-Queens Specification

    Graph:

    [ u64 solutions[4]: i ];

    ( placeQueen: row, board )
        -> ( placeQueen: row+1, board_prime ),
           [ solutions: ? ];

    Tuning:

    ( placeQueen /* row, board */ ): {
        priority: row
    };
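
    The `placeQueen` recursion in the graph spec can be sketched sequentially. The code below is an illustration under stated assumptions, not the application's step code: the function names are hypothetical, the board is represented as three bitmasks rather than the `board` tag component, and a recursive call stands in for prescribing the `row+1` step instance.

```c
#include <assert.h>
#include <stdint.h>

/* Sequential sketch of the placeQueen recursion (illustrative only; the
 * CnC version spawns a step instance per (row, board) instead of making
 * a recursive call). Bitmasks record attacked columns and diagonals. */
uint64_t place_queen(int n, int row, uint32_t cols,
                     uint32_t diag_l, uint32_t diag_r) {
    assert(n > 0 && n < 32);
    if (row == n) {
        return 1; /* all rows filled: one complete solution */
    }
    uint64_t count = 0;
    uint32_t full = (1u << n) - 1u;
    uint32_t free_cols = full & ~(cols | diag_l | diag_r);
    while (free_cols) {
        uint32_t bit = free_cols & (~free_cols + 1u); /* lowest free column */
        free_cols &= ~bit;
        /* Corresponds to prescribing ( placeQueen: row+1, board_prime ). */
        count += place_queen(n, row + 1, cols | bit,
                             (diag_l | bit) << 1, (diag_r | bit) >> 1);
    }
    return count;
}

/* Count all solutions for an n x n board, starting from an empty board. */
uint64_t count_solutions(int n) {
    return place_queen(n, 0, 0, 0, 0);
}
```

    In the tuning spec, `priority: row` gives deeper rows heavier weights, so partial boards closest to completion would be scheduled first.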

  • Implementation of Step Priority Weights

    Description                        | Default Scheduler | Priority Scheduler | Location
    Base data structure                | deque             | bin-heap           | utils/
    Scheduler interface wrapper        | deque             | bin-heap           | scheduler-object/
    Scheduler (aggregate) root object  | wst               | pr-wsh             | scheduler-object/
    Scheduler heuristic behavior       | hc                | priority           | scheduler-heuristic/
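
    The table lists a binary heap as the base data structure behind the priority scheduler. The sketch below shows a minimal fixed-capacity binary max-heap keyed on a priority weight, to illustrate how heavier weights come out first. It is not OCR's implementation: `task_t`, `task_heap_t`, `heap_push`, `heap_pop`, and `HEAP_CAPACITY` are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_CAPACITY 1024

/* Hypothetical task record: `priority` is the hint weight, `id` a label. */
typedef struct {
    int priority;
    int id;
} task_t;

/* Fixed-capacity binary max-heap; heavier weights pop first. */
typedef struct {
    task_t items[HEAP_CAPACITY];
    size_t size;
} task_heap_t;

static void swap_tasks(task_t *a, task_t *b) {
    task_t tmp = *a;
    *a = *b;
    *b = tmp;
}

void heap_push(task_heap_t *h, task_t t) {
    assert(h->size < HEAP_CAPACITY);
    size_t i = h->size++;
    h->items[i] = t;
    /* Sift up while the new task outweighs its parent. */
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (h->items[parent].priority >= h->items[i].priority) break;
        swap_tasks(&h->items[parent], &h->items[i]);
        i = parent;
    }
}

task_t heap_pop(task_heap_t *h) {
    assert(h->size > 0);
    task_t top = h->items[0];
    h->items[0] = h->items[--h->size];
    /* Sift down to restore the heap property. */
    size_t i = 0;
    for (;;) {
        size_t left = 2 * i + 1, right = left + 1, largest = i;
        if (left < h->size &&
            h->items[left].priority > h->items[largest].priority) largest = left;
        if (right < h->size &&
            h->items[right].priority > h->items[largest].priority) largest = right;
        if (largest == i) break;
        swap_tasks(&h->items[i], &h->items[largest]);
        i = largest;
    }
    return top;
}
```

    Using the row number as the weight pops the deepest rows first; pushing a negated row number instead would favor shallow rows, which is one plausible (assumed) way to express the opposite ordering.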

  • N-Queens Puzzle

    Board size: 13x13

    Solutions possible: 73,712

    Solutions sought: 5,000

    DEQ: Default work-stealing deque

    DFS: Prioritize deep rows

    BFS: Prioritize shallow rows

    50 runs per configuration

    [Figure: bar chart of average execution time (seconds, y-axis from 0 to 4) for the DEQ, DFS, and BFS configurations.]

  • Hint #4: Stoker Step (Scheduler Throttling)

    What? Annotate the work-creating steps (which we call stokers) so that the runtime can differentiate them from non-work-creating steps (which we call quenchers).

    Why? If the scheduler has plenty of work to do, we can throttle by not running any more stoker steps for the time being.

    For work stealing, we can prioritize stoker steps for stealing, which mitigates the need for more stealing in the near term.
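
    The two uses of the stoker annotation can be sketched as a pair of selection policies. This is a minimal illustration, not OCR's scheduler code: the names `pick_t`, `next_step_kind`, and `next_steal_kind` are hypothetical, and `high_water` is an assumed threshold standing in for "plenty of work".

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PICK_NONE, PICK_QUENCHER, PICK_STOKER } pick_t;

/* Hypothetical throttling policy (not OCR's scheduler code): given how
 * many stoker and quencher steps are ready, choose which kind to run.
 * `high_water` is an assumed threshold for "plenty of work". */
pick_t next_step_kind(size_t ready_stokers, size_t ready_quenchers,
                      size_t high_water) {
    assert(high_water > 0);
    if (ready_quenchers >= high_water) {
        return PICK_QUENCHER; /* plenty of work: hold stokers back */
    }
    if (ready_stokers > 0) {
        return PICK_STOKER;   /* running low: create more work */
    }
    return ready_quenchers > 0 ? PICK_QUENCHER : PICK_NONE;
}

/* For work stealing, prefer taking a stoker so the thief generates its
 * own follow-up work and needs to steal less often. */
pick_t next_steal_kind(size_t victim_stokers, size_t victim_quenchers) {
    if (victim_stokers > 0) {
        return PICK_STOKER;
    }
    return victim_quenchers > 0 ? PICK_QUENCHER : PICK_NONE;
}
```

    The local policy and the stealing policy pull in opposite directions on purpose: a busy worker defers stokers to bound the amount of outstanding work, while an idle thief takes a stoker because it wants a source of work.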

  • Task-Bomb (Synthetic Example)

    Root step creates Z=32 stoker steps

    Each stoker creates Y=100 quencher tasks and one stoker task

    Recursion creates X=200 levels

    Since the stoker is always created last, we would expect all of the stokers to run in a depth-first manner when using the standard work-stealing deque scheduler.

    [Figure: task tree. $initialize creates stoker(0,0) through stoker(Z,0); each stoker(i,j) creates quencher(i,j,0) through quencher(i,j,Y) and then stoker(i,j+1).]

  • Task-Bomb CnC Graph Spec

    [ void *done: () ];

    ( stoker: i, j )
        -> ( quencher: i, j, $rangeTo(Y) ),
           ( stoker: i, j+1 ) $when(j < …);

    ( quencher: i, j, k )
        -> [ done: () ] $when(i==0 && j==X && k==Y);

    ( $initialize: () ) -> ( stoker: $range(Z),