Computer Science Readings: Reinforcement Learning


Page 1: Computer Science Readings: Reinforcement Learning

Computer Science Readings: Reinforcement Learning

Presentation by: Arif OZGELEN

Page 2: Computer Science Readings: Reinforcement Learning

How do we perform visual search?

Look in the usual places where the item is likely to be.

If the item is small, we tend to move closer to the area we are searching in order to heighten our ability to detect it.

We look for properties of the target object that make it distinguishable from the search space, e.g. color, shape, size, etc.

Page 3: Computer Science Readings: Reinforcement Learning

A Reinforcement Learning Model of Selective Visual Attention (ACM, 2001)

Silviu Minut and Sridhar Mahadevan, Autonomous Agents Lab, Department of Computer Science, Michigan State University.

Page 4: Computer Science Readings: Reinforcement Learning

The Problem of Visual Search

Goal: to find small objects in a large, usually cluttered environment, e.g. a pen on a desk.

It is preferable to use wide field-of-view images, but identifying small objects requires high-resolution images, which results in a very high-dimensional input array.

Page 5: Computer Science Readings: Reinforcement Learning

Nature’s Method: Foveated Vision - I

Fovea: anatomically, the central region of the retina, with a high density of receptive cells. The density of receptive cells decreases exponentially from the fovea towards the periphery.

Page 6: Computer Science Readings: Reinforcement Learning

Nature’s Method: Foveated Vision - II

Saccades: to make up for the loss of information incurred by the decrease in resolution in the periphery, the eyes are re-oriented by rapid ballistic motions (up to 900°/s) called saccades.

Fixations: Periods between saccades during which the eyes remain relatively fixed, to process visual information and to select the next fixation point.

Page 7: Computer Science Readings: Reinforcement Learning

Foveated Vision: Eye Scan Patterns

Page 8: Computer Science Readings: Reinforcement Learning

Using Foveated Vision

Using foveal image processing reduces the dimension of the input data but in turn generates a sequential decision problem:

Choosing the next fixation point requires an efficient gaze control mechanism in order to direct the gaze to the most salient object.

Page 9: Computer Science Readings: Reinforcement Learning

Gaze Control - Salient Features

To solve the problem of gaze control, the next fixation point must be decided based on low-resolution images, i.e. on regions that do not fall on the fovea.

Saliency Map Theory (Koch and Ullman): a task-independent, bottom-up model of visual attention.

Itti and Koch (based on Saliency Map Theory): three types of feature maps (a color map, an edge map, and an intensity map) are fused together to form the saliency map.

Low-resolution images alone are usually not sufficient for this decision problem.

Page 10: Computer Science Readings: Reinforcement Learning

Gaze Control- Control Mechanism Implementation

Implementation of a high-level mechanism is required to control low-level reactive attention. The Tsotsos model proposes selective tuning of visual processing via a hierarchical winner-take-all process.

Information should be integrated from one fixation to the next for a global understanding of the scene.

The model: top-down gaze control, learned via RL, combined with bottom-up reactive saliency-map processing.

Page 11: Computer Science Readings: Reinforcement Learning

Problem Definition and General Approach - I

Given an object and an environment, how do we build a vision agent that learns:

where the object is likely to be found, and

how to direct its gaze to the object?

A set of landmarks {L0, L1, ..., Ln} represents regions in the environment. A policy on this set directs the camera to the region most likely to contain the target object.

Page 12: Computer Science Readings: Reinforcement Learning

Problem Definition and General Approach – II

The approach does not require high-level feature detectors. The policy learned through RL is based on the actual images seen by the camera. Once the direction has been selected, the precise location of the next fixation point is determined by means of visual saliency. The camera takes low-resolution, wide field-of-view images at discrete time intervals; using these, the system tries to recognize the target object with a low-resolution template.

Page 13: Computer Science Readings: Reinforcement Learning

Problem Definition and General Approach – III

Since reliable detection of a small object is difficult at low resolution, the system instead extracts candidate locations for the target object.

Foveated vision is simulated by zooming in and grabbing high-resolution, narrow field-of-view images centered at the candidate locations, which are compared with a high-resolution template of the target.

Page 14: Computer Science Readings: Reinforcement Learning

Target Object and the Environment

Color template of the target object (left). Environment (bottom).

Page 15: Computer Science Readings: Reinforcement Learning

Reinforcement Learning

The agent may or may not know a priori the transition probabilities and the reward. When they are known, dynamic programming techniques can be used to compute an optimal policy.
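For reference (standard MDP material, not from the slide): when the model is known, the optimal value function satisfies the Bellman optimality equation, which dynamic programming methods such as value iteration solve:

```latex
V^*(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma\, V^*(s') \bigr]
```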

Page 16: Computer Science Readings: Reinforcement Learning

Q-Learning

In the visual search problem, the transition probabilities and the reward are not known to the agent, so a model-free Q-learning algorithm is used to find the optimal policies.
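The Q-learning update is Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. A minimal tabular sketch follows; the learning rate, discount, and ε-greedy exploration values are illustrative assumptions, not the paper's settings:

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = list(range(9))  # A0 (most salient point overall) plus A1..A8 (8 directions)

Q = defaultdict(float)  # maps (state, action) -> estimated value

def choose_action(state):
    """Epsilon-greedy action selection over the saccade actions."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One Q-learning backup after observing (s, a, r, s')."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```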

Page 17: Computer Science Readings: Reinforcement Learning

States – Objects in the Environment

Recorded scan patterns show that people fixate from object to object; it is therefore natural to define the states as the objects in the environment.

Paradox: objects must be recognized as worth attending to before they are fixated on. However, an object cannot be recognized prior to fixation, since it is perceived only at low resolution.

Page 18: Computer Science Readings: Reinforcement Learning

States – Clusters of Images

States are defined as clusters of images representing the same region. Each image is represented by color histograms over a reduced number of bins (48 colors for the lab environment). Using histograms introduces perceptual aliasing, since two different images can have identical histograms. To reduce aliasing, the histograms are computed separately for each quadrant of the image; this is expected to suffice because natural environments are sufficiently rich in color (see the sketch below).
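A minimal sketch of the per-quadrant histograms, assuming the image has already been quantized to 48 color-bin indices (the quantization scheme itself is not given on the slide):

```python
import numpy as np

def quadrant_histograms(image, bins=48):
    """Color histograms computed separately per quadrant to reduce
    perceptual aliasing (two different images sharing one global histogram).
    `image` is an (H, W) array of color-bin indices in [0, bins)."""
    h, w = image.shape
    quadrants = [(slice(0, h // 2), slice(0, w // 2)),
                 (slice(0, h // 2), slice(w // 2, w)),
                 (slice(h // 2, h), slice(0, w // 2)),
                 (slice(h // 2, h), slice(w // 2, w))]
    hists = [np.bincount(image[r, c].ravel(), minlength=bins) for r, c in quadrants]
    # Normalize each quadrant histogram to a probability distribution.
    return [q / max(q.sum(), 1) for q in hists]
```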

Page 19: Computer Science Readings: Reinforcement Learning

Kullback Distance - I
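The equations on these two slides did not survive extraction. For reference, the Kullback-Leibler divergence between two color histograms p and q, the dissimilarity measure the slides refer to, is shown below; the symmetrized form is my assumption of how a "distance" would be obtained, since KL divergence itself is asymmetric:

```latex
D(p \| q) = \sum_{i} p_i \log \frac{p_i}{q_i},
\qquad
D_{\mathrm{sym}}(p, q) = \tfrac{1}{2}\bigl( D(p \| q) + D(q \| p) \bigr)
```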

Page 20: Computer Science Readings: Reinforcement Learning

Kullback Distance - II

Page 21: Computer Science Readings: Reinforcement Learning

Actions

Actions are defined as the saccades to the most salient point.

Actions {A1, ..., A8} represent saccades in 8 directions. In addition, A0 represents a saccade to the most salient point in the whole image.

Page 22: Computer Science Readings: Reinforcement Learning

Reward

The agent receives a positive reward for a saccade that brings the object into the field of view, and a negative reward if the object is not in the field of view after a saccade.
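In symbols (the slide does not give the exact magnitudes; ±1 is illustrative):

```latex
r_t =
\begin{cases}
+1 & \text{if the saccade brings the target into the field of view,} \\
-1 & \text{otherwise.}
\end{cases}
```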

Page 23: Computer Science Readings: Reinforcement Learning

Within Fixation Processing

This is the stage during which the eyes fixate on a point and the agent processes visual information and decides where to fixate next. It comprises the computation of two components:

A set of two feature maps implementing low-level visual attention, used to select the next fixation point.

A recognizer, used at low resolution for detection of candidate target objects and at high resolution for recognition of the target.

Page 24: Computer Science Readings: Reinforcement Learning

Histogram Intersection

Histogram intersection is a method for matching two images, I (the search image) and M (the model). Its drawback is that it is difficult to find a threshold separating similar from dissimilar images unless the model is pre-specified.
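Histogram intersection in the standard Swain-Ballard form (the formula itself is not on the slide), where hI and hM are the color histograms of I and M over n bins:

```latex
H(I, M) = \frac{\sum_{j=1}^{n} \min\bigl(h_I(j),\, h_M(j)\bigr)}{\sum_{j=1}^{n} h_M(j)}
```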

Page 25: Computer Science Readings: Reinforcement Learning

Histogram Back-projection

Given two images I and M, histogram back-projection locates M in I. The color histograms hI and hM are computed over the same number of color bins. The operation requires one pass through I: for every pixel (x, y), B(x, y) = R(j), where j is the bin into which I(x, y) falls and R is the ratio histogram derived from hM and hI. Back-projection always finds candidates. A sketch follows below.
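A minimal back-projection sketch in the Swain-Ballard style, assuming a pre-quantized image of bin indices; smoothing of B before the arg-max is omitted:

```python
import numpy as np

def back_project(image_bins, h_I, h_M):
    """Histogram back-projection sketch.
    image_bins: (H, W) array of color-bin indices for the search image I.
    h_I, h_M:   color histograms of I and the model M over the same bins."""
    # Ratio histogram: how much more common each color is in M than in I, capped at 1.
    R = np.minimum(h_M / np.maximum(h_I, 1e-9), 1.0)
    B = R[image_bins]  # B(x, y) = R(j), where pixel (x, y) falls in bin j
    # The peak of B is the best candidate location for M in I.
    return np.unravel_index(np.argmax(B), B.shape)
```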

Page 26: Computer Science Readings: Reinforcement Learning

Histogram Back-Projection Example

Page 27: Computer Science Readings: Reinforcement Learning

Symmetry Operator

To fixate on objects, a symmetry operator is used, since most man-made objects have vertical, horizontal, or radial symmetry. It first computes an edge map and then has each pair (pi, pj) of edge pixels vote for its midpoint according to the paper's equation (9).
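A much-simplified midpoint-voting sketch: the paper's operator (its equation (9), not reproduced on the slide) weights each vote by gradient orientation and magnitude, which is omitted here, and edge-pixel pairs are randomly sampled for tractability:

```python
import numpy as np

def symmetry_map(edges, max_pairs=200_000, rng=None):
    """Crude symmetry map: pairs of edge pixels vote for their midpoint.
    `edges` is a boolean (H, W) edge map (e.g. a thresholded Sobel magnitude)."""
    rng = rng or np.random.default_rng(0)
    ys, xs = np.nonzero(edges)
    n = len(ys)
    votes = np.zeros(edges.shape, dtype=np.int32)
    if n < 2:
        return votes
    # Sample random pairs instead of enumerating all O(n^2) combinations.
    i = rng.integers(0, n, size=max_pairs)
    j = rng.integers(0, n, size=max_pairs)
    mid_y, mid_x = (ys[i] + ys[j]) // 2, (xs[i] + xs[j]) // 2
    np.add.at(votes, (mid_y, mid_x), 1)  # accumulate one vote per midpoint
    return votes
```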

Page 28: Computer Science Readings: Reinforcement Learning

Symmetry Map

Page 29: Computer Science Readings: Reinforcement Learning

Model Description - I

Each low-resolution image is processed by two main modules.

The top module (RL) learns a set of clusters consisting of images with similar color histograms. The clusters represent physical regions and are used as states in the Q-learning method.

The second module consists of low-level visual routines. Its purpose is to compute color and symmetry maps for saliency and to recognize the target object at both low and high resolution.

Page 30: Computer Science Readings: Reinforcement Learning

Model Description - II


Page 31: Computer Science Readings: Reinforcement Learning

Visual Search Agent Model

Page 32: Computer Science Readings: Reinforcement Learning

Algorithm - Initialization

Page 33: Computer Science Readings: Reinforcement Learning

Algorithm – If object found

Page 34: Computer Science Readings: Reinforcement Learning

Algorithm – If object not found
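The three algorithm slides above are images whose content did not survive extraction. Below is a hedged reconstruction of the control loop implied by the surrounding slides; the `env` methods and the found/not-found branching are my naming, not the paper's, and `choose_action`/`update` are the Q-learning helpers sketched earlier:

```python
def visual_search_episode(env, max_fixations=100):
    """One epoch: at most 100 fixations, ending early if the target is found."""
    found = False
    state = env.cluster(env.grab_low_res())      # cluster id (state) of current view
    for _ in range(max_fixations):
        action = choose_action(state)            # epsilon-greedy over A0..A8
        env.saccade(action)                      # saccade to the most salient point in that direction
        low_res = env.grab_low_res()
        candidates = env.back_project(low_res)   # candidate target locations (low-res)
        found = any(env.zoom_and_match(c) for c in candidates)  # high-res template check
        reward = 1.0 if found else -1.0          # illustrative reward magnitudes
        next_state = env.cluster(low_res)
        update(state, action, reward, next_state)  # Q-learning backup
        if found:
            break
        state = next_state
    return found
```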

Page 35: Computer Science Readings: Reinforcement Learning

Results

The agent is trained to learn in which direction to direct its gaze in order to reach the region where the target object is most likely to be found. Training consists of 400 epochs per trial, where an epoch is a sequence of at most 100 fixations.

Every 5th epoch was used for testing, during which the agent simply executed the learned policy. The performance metric was the number of fixations. Within a single trial, the starting point was the same in all test epochs.

Page 36: Computer Science Readings: Reinforcement Learning

Experimental Results - I

Page 37: Computer Science Readings: Reinforcement Learning

Experimental Results - II

Page 38: Computer Science Readings: Reinforcement Learning

Experimental Results - III

Page 39: Computer Science Readings: Reinforcement Learning

Sequence of Fixations

Page 40: Computer Science Readings: Reinforcement Learning

Experimental Results - IV

Page 41: Computer Science Readings: Reinforcement Learning

Experimental Results - V

Page 42: Computer Science Readings: Reinforcement Learning

Experimental Results - VI

Page 43: Computer Science Readings: Reinforcement Learning

Conclusion

A model of selective attention was developed for a visual search task; it combines visual processing with control of attention. Control is achieved by means of RL over a low-level visual mechanism for selecting the next fixation. Color and symmetry are used to select the next fixation, and it is not necessary to combine them into a single saliency map. Information is integrated from saccade to saccade.

Page 44: Computer Science Readings: Reinforcement Learning

Future Work

The goal is to extend this approach to a mobile robot. The problem becomes more challenging because the position, and consequently the appearance, of the object changes with the robot's position; a single template is not sufficient. In this paper it is assumed that the environment is rich in color, so that perceptual aliasing is not an issue. Extension to a mobile robot will inevitably lead to learning in inherently perceptually aliased environments.