Computer Vision Systems for the Blind and Visually Disabled. STATS 19 SEM 2. 263057202. Talk 3. Alan Yuille. UCLA. Dept. Statistics and Psychology. www.stat.ucla/~yuille

Author: brenda-mccormick

Post on 23-Dec-2015

TRANSCRIPT

  • Slide 1
  • Computer Vision Systems for the Blind and Visually Disabled. STATS 19 SEM 2. 263057202. Talk 3. Alan Yuille. UCLA. Dept. Statistics and Psychology. www.stat.ucla/~yuille
  • Slide 2
  • Computer Vision Systems. Digital Camera + Portable Computer + Speech Synthesizer. (I) Input image from camera. (II) Algorithm on PC searches the image to detect and read text. (III) Speech Synthesizer speaks the text.
  • Slide 3
  • LED Reader. LED/LCD displays are very common, but impossible for the Blind to use. Controlled domain. Design system to detect and read the displays.
  • Slide 4
  • LED Reader. Prototype System (1999). Subjects using the LED Reader. Implementation using special purpose hardware being built.
  • Slide 5
  • Blind Volunteer with Camera. Blind volunteers take photographs. Still digital camera, or video camera. Automatic camera settings. Gain control. Dynamic range of the eye is far larger than the range of a camera.
  • Slide 6
  • Gain Control: Digital Cameras. Limitation due to the quality of the input images. Blind users cannot point camera, focus, adjust camera gain, or keep the camera steady. Enormous variation in the intensity in natural images: range 10,000,000; camera range is 100.
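To put the two figures on this slide on a common scale, the ratios can be converted to photographic stops (factors of two). This is only a worked-arithmetic sketch: the numbers are the slide's, while the helper name `stops` is introduced here for illustration.

```python
import math


def stops(contrast_ratio: float) -> float:
    """Number of doublings (photographic stops) spanned by an intensity ratio."""
    return math.log2(contrast_ratio)


scene_ratio = 10_000_000  # intensity range of natural images (from the slide)
camera_ratio = 100        # camera range (from the slide)

scene_stops = stops(scene_ratio)
camera_stops = stops(camera_ratio)

# Factor by which the scene range exceeds what the camera covers at one gain setting.
shortfall = scene_ratio / camera_ratio

print(f"scene: {scene_stops:.1f} stops, camera: {camera_stops:.1f} stops")
print(f"shortfall factor: {shortfall:,.0f}")
```

On these figures the scene spans roughly 23 stops against fewer than 7 for the camera, which is why the gain setting matters so much for image quality.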
  • Slide 7
  • Biologically Inspired Cameras. Ideal: cameras with the ability of the human retina: (I) Large gain control (from 100 to 100,000,000). (II) More than 30 frames/second (to decrease motion blur). Companies are designing cameras with these abilities. (Carver Mead).
  • Slide 8
  • Images taken by the Blind. Top two rows are images taken by blind volunteers. Bottom two rows are images by scientists. Scientists are better at orienting the camera and centering text.
  • Slide 9
  • Experiments with Blind Volunteers. In San Francisco. Experiments showed: 1. Blind volunteers could keep the camera approximately horizontal. 2. They could hold it steady so there is little motion blur. 3. Automatic gain control was usually sufficient to give good quality images.
  • Slide 10
  • Visual Search to Detect Text. The human visual system has mechanisms for directing attention to interesting parts of images. Known as Visual Attention. Visual attention causes eye movements and directs gaze. We need a form of visual attention to detect text. This must be fast. We want to quickly reject non-text areas of the image.
  • Slide 11
  • Strategy I: Twenty Questions. Divide the image up into many small windows. Apply filter tests to each window. If the window fails the test, then eliminate it. If it passes, then proceed to the next test. Apply tests until there are only a few (1-5) windows in the image which pass all tests.
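The elimination procedure on this slide can be sketched as a short cascade. Everything specific here is an assumption made for illustration: the helper names `windows` and `cascade`, the two tests in `TESTS`, their thresholds, and the use of non-overlapping windows. The talk does not say which tests or window layout were used. The sketch also runs every listed test rather than stopping once only 1-5 windows remain.

```python
from statistics import pvariance


def windows(image, size):
    """Yield (row, col, pixels) for non-overlapping size x size windows."""
    for r in range(0, len(image) - size + 1, size):
        for c in range(0, len(image[0]) - size + 1, size):
            pixels = [image[r + i][c + j] for i in range(size) for j in range(size)]
            yield r, c, pixels


# Hypothetical tests with made-up thresholds, ordered cheapest first.
TESTS = [
    lambda px: max(px) - min(px) >= 100,  # enough contrast to contain strokes
    lambda px: pvariance(px) >= 1000,     # enough local variability
]


def cascade(image, size, tests=TESTS):
    """Return the top-left corners of windows that pass every test."""
    survivors = []
    for r, c, px in windows(image, size):
        # all() stops at the first failing test, so rejected windows cost little.
        if all(test(px) for test in tests):
            survivors.append((r, c))
    return survivors


# Toy image: flat grey on the left, high-contrast stripes on the right.
image = [[120] * 4 + [0, 255, 0, 255] for _ in range(4)]
print(cascade(image, size=4))  # [(0, 4)]
```

Putting the cheapest test first means most non-text windows are discarded after a single comparison, which is what makes the search fast.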
  • Slide 12
  • Strategy II: Test Selection. Choose a vocabulary of tests. E.g. average image brightness, local image variability. Use a Machine Learning algorithm AdaBoost to select and combine tests. Requires a training dataset of text and non-text. (Learning with a teacher). AdaBoost combines weak tests into a strong test.
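How AdaBoost selects and combines weak tests can be shown with a minimal sketch of standard discrete AdaBoost over threshold tests. The two features (average brightness, local variability) come from the slide. The function names, the toy training set, and the restriction to single-feature threshold tests are assumptions for illustration, not the system described in the talk.

```python
import math


def weak_test(x, feature, threshold, polarity):
    """Weak test: +1 (text) or -1 (non-text) from one thresholded feature."""
    return polarity if x[feature] >= threshold else -polarity


def train_adaboost(X, y, rounds):
    """Return a list of (weight, weak-test parameters) chosen greedily."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    candidates = [(f, t, p)
                  for f in range(len(X[0]))
                  for t in sorted({x[f] for x in X})
                  for p in (1, -1)]
    model = []
    for _ in range(rounds):
        def weighted_error(params):
            return sum(wi for wi, xi, yi in zip(w, X, y)
                       if weak_test(xi, *params) != yi)

        best = min(candidates, key=weighted_error)
        err = min(max(weighted_error(best), 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)  # more accurate tests get larger weight
        model.append((alpha, best))
        # Increase the weight of misclassified examples, then renormalize.
        w = [wi * math.exp(-alpha * yi * weak_test(xi, *best))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def strong_test(model, x):
    """Weighted vote of the selected weak tests."""
    score = sum(alpha * weak_test(x, *params) for alpha, params in model)
    return 1 if score >= 0 else -1


# Toy data: (average brightness, local variability); +1 = text, -1 = non-text.
X = [(100, 80), (120, 90), (110, 10), (90, 20), (240, 85)]
y = [1, 1, -1, -1, -1]

model = train_adaboost(X, y, rounds=3)
predictions = [strong_test(model, x) for x in X]
print(predictions)  # [1, 1, -1, -1, -1]
```

On this toy set no single threshold test classifies all five examples, since the bright, highly variable non-text example always breaks the best one. The weighted vote of three tests does classify them all, which is the sense in which weak tests are combined into a strong test.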
  • Slide 13
  • AdaBoost Example: Face Detection. AdaBoost was used in Computer Vision to detect faces. Best test: Forehead brighter than eyes.
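A weak test of the "forehead brighter than eyes" kind amounts to comparing the mean brightness of two regions of a window. The sketch below is an assumption-laden illustration: the function names and the layout (top quarter of the window as forehead, second quarter as eyes) are hypothetical and not taken from the talk.

```python
def region_mean(image, top, left, height, width):
    """Mean pixel value over a rectangular region."""
    total = sum(image[r][c]
                for r in range(top, top + height)
                for c in range(left, left + width))
    return total / (height * width)


def forehead_brighter_than_eyes(window):
    """Weak test: is the top band brighter on average than the band below it?

    Assumes (hypothetically) that the forehead lies in the top quarter of
    the window and the eyes in the second quarter.
    """
    height, width = len(window), len(window[0])
    band = height // 4
    forehead = region_mean(window, 0, 0, band, width)
    eyes = region_mean(window, band, 0, band, width)
    return forehead > eyes


face_like = [[200] * 4, [50] * 4, [120] * 4, [120] * 4]
flat = [[120] * 4 for _ in range(4)]
print(forehead_brighter_than_eyes(face_like), forehead_brighter_than_eyes(flat))
```

A test like this is weak on its own, because many non-face windows also pass it, but it is cheap to evaluate and useful as one vote among many.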
  • Slide 14
  • Example Sequence I: Series of tests, selected by AdaBoost.
  • Slide 15
  • Example II.
  • Slide 16
  • Results of AdaBoost. Strong Performance: Very High Detection Rate.
  • Slide 17
  • Failures of AdaBoost. AdaBoost fails to detect some text.
  • Slide 18
  • Next Stage: Binarization. AdaBoost detects regions of text in windows of the image. Apply a binarization algorithm. Label the points within the window as letters/digits or as background. Extend the binarization to areas outside the window to include letters/digits that are just outside the window.
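The slide does not name the binarization algorithm, so the sketch below substitutes a well-known one, Otsu's global threshold, purely for illustration. It chooses the threshold from the pixels inside the detected window, then applies it to the window grown by a margin so that strokes just outside are also labelled. The function names, the `margin` parameter, and the assumption of dark text on a light background are all hypothetical.

```python
def otsu_threshold(pixels):
    """Threshold in 0..255 that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(image, top, left, height, width, margin=1):
    """Label pixels as letter (1) or background (0) in the window plus a margin.

    Assumes dark text on a light background.
    """
    inside = [image[r][c]
              for r in range(top, top + height)
              for c in range(left, left + width)]
    t = otsu_threshold(inside)
    r0, r1 = max(0, top - margin), min(len(image), top + height + margin)
    c0, c1 = max(0, left - margin), min(len(image[0]), left + width + margin)
    return [[1 if image[r][c] <= t else 0 for c in range(c0, c1)]
            for r in range(r0, r1)]


# Two dark strokes (columns 2 and 4); the window covers columns 1-3 only.
image = [[200, 200, 30, 200, 30, 200, 200] for _ in range(3)]
mask = binarize(image, top=0, left=1, height=3, width=3, margin=1)
print(mask[0])  # [0, 0, 1, 0, 1]
```

The stroke in column 4 lies outside the detected window but inside the margin, so it is still labelled as a letter pixel, which is the extension step the slide describes.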
  • Slide 19
  • Results of Binarization.
  • Slide 20
  • Optical Character Recognition (OCR). OCR has been developed for reading text on documents. Black and white images. High resolution. We apply it to the binarized output of AdaBoost. OCR will read the text and reject regions which are not text.
  • Slide 21
  • Text detected by AdaBoost, Binarized, and read by OCR.
  • Slide 22
  • Text detected, but not read. Non-text detected, rejected by OCR. Non-text detected, read by OCR.
  • Slide 23
  • Performance. Can detect text within our dataset (San Francisco) with false negative rate of 2.8%. We can read the detected text correctly at 93.0%. Read detected non-text as text at 1.0%. Prototype System: room for improvement.
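The two text figures on this slide can be combined into a rough end-to-end rate. This assumes that the 93.0% reading rate is measured only over text that was detected, and that both figures refer to the same set of text regions. The slide does not state this, so the result is indicative only, and the variable names are introduced here.

```python
false_negative_rate = 0.028       # text regions missed by detection (from the slide)
read_rate_given_detected = 0.930  # detected text read correctly (from the slide)

detection_rate = 1 - false_negative_rate

# Assumed chain: a text region is read correctly only if it is first detected.
end_to_end_rate = detection_rate * read_rate_given_detected

print(f"detected: {detection_rate:.1%}, detected and read: {end_to_end_rate:.1%}")
```

Under that assumption about 90% of text regions in the dataset would be both detected and read correctly.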
  • Slide 24
  • Summary. It will soon be practical to build Computer Vision systems for text detection and reading that work in unconstrained domains.