A GPU-Accelerated 3D Kinematic Modeling Platform for Behavioral Neuroscience John Long, PhD Buzsáki Laboratory Neuroscience Institute New York University Langone Medical Center 03.17.2015

  • A GPU-Accelerated 3D Kinematic Modeling Platform for Behavioral Neuroscience

    John Long, PhD

    Buzsáki Laboratory

    Neuroscience Institute

    New York University Langone Medical Center

    03.17.2015

  • A little about me…

    György Buzsáki

    Jose Carmena

  • …and my previous work.

    Venkatraman et al. 2009

    Long and Carmena 2013 Long and Carmena 2011

    Koralek et al. 2012

  • Why am I at a GPU conference?

    • As a neuroscientist, I find small-form-factor, massively parallel computing machines intriguing.

    • For ease of interface and visualization, I often program in Matlab or Python, and I suffer agonizing computational bottlenecks, which have led me to GPUs.

    • I’ve had a fair amount of success applying GPU computing to my scientific work.

  • What I have in store for you…

    • An introduction to the work I do in behavioral neuroscience that led me to GPU computing.

    • A detailed description of one of the CUDA programs I have implemented in the context of my research.

    • Throughout, I’ll mention a workflow I’ve found useful for porting CUDA code into Matlab and Python.

  • Who reads the maps in the brain?

    Iurilli et al. 2012; Geisler, Sirota, Zugaro, Robbe, Buzsáki, PNAS 2007

    Hippocampal “place” cells (O’Keefe and Nadel, 1978; O’Keefe and Recce, 1993)

    Sensory receptive fields (Hubel and Wiesel 1959)

  • The State of the Art in Behavioral Neuroscience

    More and more neural data!

  • The behaving rat…

    The State of the Art in Behavioral Neuroscience

  • Advances in Motion Capture

    Corazza et al. 2006

  • Environment Construction

    [Figure: recording environment with positions labeled 1 to 6]

  • Multiple Camera Synchronization

    Lines to cameras

    Line to Amplipex system

  • Image Segmentation

  • Camera Calibration

    Svoboda et al. 2005

    3D to 2D perspective transformation
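The 3D to 2D perspective transformation can be illustrated with a pinhole projection. This is a minimal sketch, not the calibration procedure of Svoboda et al. 2005: the 3x4 camera matrix `P`, its numeric values, and the function name `project_point` are all hypothetical.

```python
def project_point(P, X):
    """Project a 3D point X = (x, y, z) to pixel coordinates with a 3x4 camera matrix P."""
    xh = (X[0], X[1], X[2], 1.0)  # homogeneous coordinates
    u, v, w = (sum(P[r][c] * xh[c] for c in range(4)) for r in range(3))
    if w == 0:
        raise ValueError("point lies in the camera's principal plane")
    return (u / w, v / w)  # perspective divide


# Hypothetical camera: focal length 500 px, principal point (320, 240),
# placed at the origin and looking down +z.
P = [
    [500.0, 0.0, 320.0, 0.0],
    [0.0, 500.0, 240.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

pixel = project_point(P, (0.1, -0.2, 2.0))
```

Calibration amounts to estimating one such matrix per camera so that the same 3D point can be located in every view.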

  • Visual Hull Construction

    Visual Hull Algorithm modified from Forbes et al. 2006
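The general idea behind a visual hull can be sketched as silhouette carving: a voxel survives only if it projects inside the segmented silhouette in every camera. This is a generic illustration under toy assumptions (two orthographic cameras, a 4x4x4 grid), not the modified Forbes et al. 2006 algorithm used in the talk; `carve_visual_hull` and the camera setup are hypothetical.

```python
def carve_visual_hull(voxels, cameras):
    """Keep voxels whose projection lands inside the silhouette of every camera.

    cameras is a list of (project, silhouette) pairs: project maps a 3D voxel
    to a 2D pixel, and silhouette is a set of foreground pixels.
    """
    return [v for v in voxels
            if all(project(v) in silhouette for project, silhouette in cameras)]


# Toy setup: a 4x4x4 voxel grid seen by two orthographic cameras.
grid = [(x, y, z) for x in range(4) for y in range(4) for z in range(4)]

# Camera A looks down the z axis and sees (x, y); camera B looks down x and sees (y, z).
silhouette_a = {(x, y) for x in (1, 2) for y in (1, 2)}
silhouette_b = {(y, z) for y in (1, 2) for z in (0, 1, 2)}
cameras = [
    (lambda v: (v[0], v[1]), silhouette_a),
    (lambda v: (v[1], v[2]), silhouette_b),
]

hull = carve_visual_hull(grid, cameras)
```

With calibrated perspective cameras, the `project` callables would be replaced by the per-camera 3D to 2D transformation.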

  • Kinematic Model Design

  • Kinematic Model Manipulation

    Murray et al. 1994

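  • Kinematic Model Fitting

    Generate Candidate Poses

    Score Each Pose

    Compute Cost Components

    Update Posterior Estimate

    Cost components, for model point $M_i$ with normal $n_i$ and data point $D_j$ with normal $n_j$:

    $d_{ij} = \lVert M_i - D_j \rVert_2$

    $\alpha_{ij} = n_i \cdot n_j$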
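The two cost components, the distance between a model point and a data point and the dot product of their normals, can be computed as in the sketch below. The slides do not say how model points are paired with data points or how the components are combined, so the nearest-neighbour pairing and the weighted sum in `pose_score` are assumptions made only for illustration.

```python
import math


def cost_components(model_pts, model_normals, data_pts, data_normals):
    """For each model point M_i, pick the nearest data point D_j (assumed pairing)
    and return (d_ij, alpha_ij): Euclidean distance and normal dot product."""
    out = []
    for M, n_i in zip(model_pts, model_normals):
        j = min(range(len(data_pts)), key=lambda k: math.dist(M, data_pts[k]))
        d_ij = math.dist(M, data_pts[j])
        alpha_ij = sum(a * b for a, b in zip(n_i, data_normals[j]))
        out.append((d_ij, alpha_ij))
    return out


def pose_score(components, w_dist=1.0, w_normal=1.0):
    """Hypothetical scalar cost: small distances and aligned normals score low."""
    return sum(w_dist * d + w_normal * (1.0 - a) for d, a in components)


model_pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_normals = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
data_pts = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.25)]
data_normals = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]

components = cost_components(model_pts, model_normals, data_pts, data_normals)
score = pose_score(components)
```

Each candidate pose would be scored this way, and the scores would then feed the posterior update.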

  • Open Chain Kinematics

    P1 = t1a * t1b * t1c * t1d * t1e * P1;

    N1 = t1a * t1b * t1c * t1d * t1e * N1;

    N1 = N1-P1;

    P4 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * P4;

    N4 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * N4;

    N4 = N4-P4;

    P5 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * t5a * t5b * t5c * P5;

    N5 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * t5a * t5b * t5c * N5;

    N5 = N5-P5;

    P7 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * t5a * t5b * t5c * t6a * t6b * t6c * t7a * t7b * t7c * P7;

    N7 = t1a * t1b * t1c * t1d * t1e * t4a * t4b * t4c * t5a * t5b * t5c * t6a * t6b * t6c * t7a * t7b * t7c * N7;

    N7 = N7-P7;

    “A mathematical introduction to robotic manipulation” by Murray, Li, and Sastry 1994
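The pattern in the expressions above, where each point is moved by the product of every transform from the root of the chain down to its own segment, can be sketched in plain Python. The helper names, the two-segment chain, and the joint values are hypothetical; the sketch only mirrors the structure of the Matlab lines, including the `N = N - P` step that turns a transformed point into a direction.

```python
import math
from functools import reduce


def matmul(A, B):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def apply(T, p):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    ph = (p[0], p[1], p[2], 1.0)
    return tuple(sum(T[r][c] * ph[c] for c in range(4)) for r in range(3))


def rot_z(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def translate(x, y, z):
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]


def transform_segment(T, P, N):
    """Move P and N with the same chain, then take N - P so the normal
    is expressed as a direction rather than a point."""
    P_new, N_new = apply(T, P), apply(T, N)
    return P_new, tuple(n - p for n, p in zip(N_new, P_new))


# Hypothetical chain: segment 4 hangs off segment 1, so its transform reuses
# segment 1's product instead of recomputing it.
T1 = reduce(matmul, [translate(1, 0, 0), rot_z(math.pi / 2)])
T4 = matmul(T1, translate(0, 2, 0))

P4, N4 = transform_segment(T4, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
```

Because deeper segments share the leading factors of shallower ones, caching the prefix products avoids repeating most of the multiplications.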

  • Open Chain Kinematics: On the GPU

    Exposing Parallelism

    • All 4x4 transformation matrices ti can be computed in parallel.

    • There are many shared computational blocks.

    //Thread parameters
    unsigned int hWID = threadIdx.x / halfWarpSz;
    unsigned int hWoff = threadIdx.x % halfWarpSz;
    unsigned int x = hWoff % DimXY;
    unsigned int y = hWoff / DimXY;

    //MATRIX REDUCTION: across temporary variables over twists
    float sum[2];
    //1st reduction from 16, 4x4 matrices to 8, 4x4 matrices
    if(hWID < 8)
    {
        sum[0] = 0.0f;
        #pragma unroll
        for(int k = 0; k < 4; k++)
        {
            sum[0] += Stwists[4*(2*hWID) + y][k] * Stwists[4*(2*hWID+1) + k][x];
        }
        Transtmp0[4*hWID + y][x] = sum[0];
    }
    __syncthreads();
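The first reduction step in the kernel multiplies adjacent matrix pairs (2i, 2i+1), taking 16 matrices down to 8. The sketch below emulates that pairing sequentially in Python and continues halving until one matrix remains; the later passes are an assumption, since only the first is shown on the slide. The integer test matrices from `shear` are hypothetical and chosen so the comparison is exact.

```python
from functools import reduce


def matmul(A, B):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(A[r][k] * B[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def pairwise_reduce(mats):
    """Multiply adjacent pairs (2i, 2i+1) until one matrix remains.

    Each pass halves the list; on a GPU every product within a pass is
    independent and could run concurrently. Needs a power-of-two count.
    """
    if len(mats) == 0 or len(mats) & (len(mats) - 1):
        raise ValueError("expected a power-of-two number of matrices")
    while len(mats) > 1:
        mats = [matmul(mats[2 * i], mats[2 * i + 1]) for i in range(len(mats) // 2)]
    return mats[0]


def shear(i):
    """Hypothetical integer test matrix (non-commuting, exact arithmetic)."""
    M = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
    M[i % 3][3] = i + 1          # translation-like entry
    M[(i + 1) % 3][i % 3] = 1    # off-diagonal shear entry
    return M


twists = [shear(i) for i in range(16)]
tree = pairwise_reduce(twists)
sequential = reduce(matmul, twists)
```

Matrix multiplication is associative, so the tree-ordered product equals the left-to-right product as long as the pairing preserves order, which is what lets the chain be evaluated in four parallel passes instead of fifteen sequential multiplications.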

  • Open Chain Kinematics: On the GPU: Results

    • 22.5x speedup relative to a single Matlab process

    • 14.6x speedup relative to a parallel Matlab process (6 CPUs)

    • Qualitative speedup allowed for parameter tuning, resulting in an average 50% reduction in per-frame model fit error, i.e. better model fits!

    • Promising approach to open chain kinematic

    CUDA ported into Matlab via Mex

    [Figure: Compute Time Comparison. Per-frame compute time (seconds) vs. frame number. single Matlab: mean = 12.6 sec; parfor Matlab: mean = 8.2 sec; CUDA in Matlab: mean = 0.55 sec]

    [Figure: Model Fit Comparison. Per-frame model fit error (a.u.). Series: errors prior to tuning, errors after tuning]

    CUDA where you need it

    • Qualitative speedups mean more efficient science.

    • Work where you need to and let user-friendly languages like Matlab and Python do the rest.

  • Putting it all together

  • Promising Directions

    Berman et al. 2014

  • Wavelet Analysis

    [Figure: pipeline diagram. Stage labels: Kinematic Modeling, Wavelet Analysis, Parameterize Dynamics, T-SNE, Map Dynamics, Cluster Embedded Dynamics, Label Clusters, Behavioral Classification. Plot panels: 1st, 2nd, and 3rd principal components vs. Time (seconds). Cluster labels: rearing, forward gaze, tight scan]

  • Conclusion

    • Fitting kinematic models to 3D visual hull data is greatly accelerated by GPUs.

    • The framework I’ve presented can be applied to open chain kinematic models in general.

    • In science, Big Data too often means sitting around waiting to find out you need to run your analysis again. GPUs are a game changer.

    • Interfaces like mex (Matlab) and ctypes (Python) allow you to tackle the hard parts and be lazy about the easy parts.

    – You can incrementally deal with bottlenecks of decreasing priority.
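The ctypes side of this workflow can be illustrated by calling a compiled C function from Python. The talk's own wrapper code is not shown, so this sketch uses the system C math library as a stand-in for a compiled CUDA shared library; it assumes a POSIX system, and the choice of `cos` is purely illustrative.

```python
import ctypes
import ctypes.util

# Locate the C math library. On POSIX systems, CDLL(None) falls back to the
# symbols already loaded into the Python process if the lookup returns None.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature `double cos(double)`; without this, ctypes assumes
# int arguments and an int return value.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

result = libm.cos(0.0)
```

A library built from CUDA sources would be loaded the same way, with `argtypes` and `restype` declared for each exported host function, so only the bottleneck needs to leave Python.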

  • Acknowledgements

    György Buzsáki

    Antal Berenyi Andres Grosmark

    The entire Buzsáki lab

    Thank you!
