Transformative Reality poster @ ISMAR 2011



Transformative Reality: Augmented Reality for Visual Prostheses

Wen Lik Dennis Lui, Damien Browne, Lindsay Kleeman, Tom Drummond, Wai Ho Li*
ECSE and Monash Vision Group, Monash University, Australia. *Email: [email protected]

Research funded by the Australian Research Council: Research in Bionic Vision Science and Technology Initiative (SRI 1000006)

Visual prostheses such as retinal and cortical implants apply electrical stimulation to the visual pathway using an electrode array to generate a 2D pattern of phosphenes (visual percepts) similar to a low resolution pattern of dots. The spatial and intensity resolution of the phosphene patterns produced by a visual prosthesis is constrained by biology, technology and safety. Next generation prostheses, such as the cortical implant being developed by Monash Vision Group, are expected to allow patients to perceive bionic vision similar to a 25 x 25 dot pattern of on-off binary phosphenes. What does the world look like in 625 bits? How can we make it more useful?

Visual prostheses have limited resolution

[Figure: Visual scene. Cortical implants electrically stimulate the Primary Visual Cortex (V1); retinal implants electrically stimulate cells in the retina, such as ganglion cells.]

Transformative Reality

The Transformative Reality (TR) concept improves the saliency of visual information provided through low resolution visual displays such as the bionic vision induced by visual prostheses. Transformative Reality works by performing real time transformations of visual and/or non-visual sensor data into multiple user-selectable modes of symbolic representations of the world that are then visually rendered in low resolution.

[Figure: Depth image of visual scene sensed using a range camera; Transformative Reality rendering of Structural Edges.]


Traditional bionic vision

Traditionally, images from a head-worn camera are converted into bionic vision by downsampling and binary thresholding. This simple approach truncates salient information, as can be seen in the example below.

[Figure: Simple downsample. Where did the objects go?]
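To make the baseline concrete, the conversion can be sketched in a few lines of Python/OpenCV; the function name, grid size and threshold value are illustrative assumptions, not details of the original implementation.

```python
# A minimal sketch of the traditional bionic vision baseline:
# downsample, then binary threshold. Assumes a grayscale input frame.
import cv2
import numpy as np

def traditional_bionic_vision(gray, grid=(25, 25), threshold=128):
    """Reduce an image to a 25 x 25 pattern of on/off phosphenes."""
    # Area interpolation averages all pixels covered by each phosphene.
    small = cv2.resize(gray, grid, interpolation=cv2.INTER_AREA)
    # Binary threshold: each phosphene is either lit or unlit (625 bits).
    return (small >= threshold).astype(np.uint8)
```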

For example, TR transforms the tabletop scene above into a rendering of structural edges by depth sensing. Patch-by-patch PCA detects depth discontinuities and crease edges that are then rendered as lit phosphenes. User trials show better object detection and localisation.

[Figure: Patch-by-patch PCA.]
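A minimal sketch of patch-by-patch PCA is given below, assuming the depth map has already been back-projected to an H x W x 3 array of 3D points; the patch size and planarity threshold are illustrative guesses rather than the poster's values.

```python
# Patch-by-patch PCA: flat patches have near-zero out-of-plane
# variance, while depth discontinuities and crease edges do not.
import numpy as np

def structural_edges(points, patch=8, planarity_thresh=0.05):
    """Mark non-planar patches of a 3D point image as lit phosphenes."""
    h, w, _ = points.shape
    out = np.zeros((h // patch, w // patch), dtype=np.uint8)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            p = points[i*patch:(i+1)*patch, j*patch:(j+1)*patch].reshape(-1, 3)
            p = p[np.isfinite(p).all(axis=1)]  # drop invalid depth readings
            if len(p) < 3:
                continue
            evals = np.linalg.eigvalsh(np.cov(p, rowvar=False))  # ascending
            # The smallest eigenvalue measures out-of-plane variance:
            # near zero on flat surfaces, large at creases and depth jumps.
            if evals[0] / (evals.sum() + 1e-12) > planarity_thresh:
                out[i, j] = 1
    return out
```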

The empty ground TR mode is designed for indoor navigation. Depth sensing allows operation in cluttered environments with low visual contrast, factors that trouble traditional bionic vision. Notice how the office chair disappears in traditional bionic vision but remains distinct in this TR mode.

The empty ground TR algorithm operates by generating ground plane estimates from depth images using RANSAC. Gravity is sensed using a 3-axis accelerometer, which allows real time operation by dynamically restricting the ground plane search space. User trials show significantly improved indoor navigation.
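The gravity-constrained search can be sketched as a standard RANSAC loop with a tilt test on each candidate plane; the iteration count, distance threshold and tilt tolerance below are illustrative assumptions, not the poster's values.

```python
# RANSAC ground plane detection with a gravity prior from the 3-axis
# accelerometer: candidate planes whose normals deviate too far from
# the gravity direction are rejected, shrinking the search space.
import numpy as np

def detect_ground_plane(points, gravity, iters=200,
                        dist_thresh=0.03, max_tilt_deg=15.0):
    """Return an inlier mask for the best roughly-horizontal plane
    fitted to an N x 3 point cloud from the depth image."""
    rng = np.random.default_rng(0)
    g = gravity / np.linalg.norm(gravity)
    best = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(n) < 1e-9:
            continue  # degenerate (collinear) sample
        n /= np.linalg.norm(n)
        # Gravity prior: keep only near-horizontal candidate planes.
        tilt = np.degrees(np.arccos(np.clip(abs(n @ g), 0.0, 1.0)))
        if tilt > max_tilt_deg:
            continue
        inliers = np.abs((points - p0) @ n) < dist_thresh
        if inliers.sum() > best.sum():
            best = inliers
    return best  # inliers are rendered as lit phosphenes
```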

[Figure: Empty Ground mode. Simple downsample (traditional bionic vision lacks visual representation of navigational clearance and truncates object locations); RANSAC ground plane detection using depth image and 3-axis accelerometer; depth image showing RANSAC inliers in red; Transformative Reality rendering of Empty Ground.]

[Figure: People detection mode. Simple downsample (traditional bionic vision makes it difficult to detect people within a scene); face detection using the visual camera and body segmentation using the depth camera; depth image showing body segments in red; Transformative Reality rendering of People.]

The people detection TR mode is designed for interactions with people. A colour camera detects frontal human faces using the Viola-Jones algorithm. Detected faces are represented using a symbolic avatar designed for low resolution bionic vision.

A person's body is found by searching below a detected face for a contiguous segment in the depth image, which is then symbolically rendered as a filled region in the TR output.
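A minimal sketch of this mode is shown below, assuming aligned colour and depth frames; OpenCV's bundled Viola-Jones cascade is used for faces, and the contiguous-segment search is stood in for by a depth flood fill seeded below each face, with the tolerance an illustrative guess.

```python
# People detection: Viola-Jones faces in the colour image, plus a
# contiguous depth segment grown below each detected face.
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_people(bgr, depth, depth_tol=0.15):
    """Return (face boxes, body mask) for aligned colour/depth frames."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                          minNeighbors=5)
    body_mask = np.zeros(depth.shape, dtype=np.uint8)
    for (x, y, w, h) in faces:
        # Seed the body search just below the face; assumes valid depth
        # at the seed pixel.
        seed = (x + w // 2, min(y + 2 * h, depth.shape[0] - 1))
        mask = np.zeros((depth.shape[0] + 2, depth.shape[1] + 2), np.uint8)
        # Flood fill grows a contiguous segment of smoothly varying depth.
        cv2.floodFill(depth.astype(np.float32), mask, seed, 0,
                      loDiff=depth_tol, upDiff=depth_tol,
                      flags=cv2.FLOODFILL_MASK_ONLY)
        body_mask |= mask[1:-1, 1:-1]
    return faces, body_mask  # face -> avatar, body mask -> filled region
```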

This TR mode received the most positive feedback during user trials but was also the most difficult to use in practice.

    TR Prototype and User Trials

User trials were conducted using a head mounted display (HMD) augmented with a Microsoft Kinect, which provides sensor data (colour images, depth images and gravity readings) in a portable form factor. TR algorithms were implemented in C++ and run in real time on a standard PC with negligible latency. The user is immersed in a mobile visualisation of bionic vision while being able to physically interact with the environment and walk around freely.
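Putting the modes together, the per-frame structure of such a prototype might look like the sketch below; it reuses the mode functions sketched earlier, and the frame dictionary keys are hypothetical stand-ins for the Kinect driver rather than the actual C++ system.

```python
# Per-frame TR loop: one user-selectable transformation per frame.
def tr_frame(frame, mode):
    """Transform one frame of Kinect data into a low resolution
    symbolic rendering according to the selected TR mode."""
    if mode == "structural_edges":
        return structural_edges(frame["points"])
    if mode == "empty_ground":
        inliers = detect_ground_plane(frame["points"].reshape(-1, 3),
                                      frame["gravity"])
        return inliers  # render inliers as lit phosphenes
    if mode == "people":
        faces, body = detect_people(frame["colour"], frame["depth"])
        return body  # plus a symbolic avatar for each detected face
    raise ValueError(f"unknown TR mode: {mode}")
```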

User trial results suggest that TR provides practical and significant improvements over traditional bionic vision for indoor navigation, object localisation and people detection. There appears to be a learning effect where user performance improves steadily over the first 10 to 15 minutes. Future work includes new TR modes such as gesture recognition as well as improved visualisations of bionic vision based directly on models of cortical stimulation. Psychophysics trials conducted in collaboration with medical researchers and Vision Australia are planned for the near future.

[Figure: Visual prostheses work by injecting electrical signals past damaged areas along the visual pathway.]

[Figure: TR Prototype. System diagram of TR prototype; video of example user trial.]

[Figure: Video of Transformative Reality (TR) showing real time demonstrations of all three TR modes described above: Structural Edges; Empty Ground (render RANSAC inliers as lit phosphenes); People Detection (render face avatar and body segment).]