UC Berkeley CS294-9 Fall 2000 4- 1
Document Image Analysis
Lecture 4: Image Transformations
Richard J. Fateman, University of California – Berkeley
Henry S. Baird, Xerox Palo Alto Research Center
The course so far…
• Reminder: All course materials are online: http://www-inst.eecs.berkeley.edu/~cs294-9/
• Overview of the DIA Research Field
• Some applications (Postal Addresses, Checks)
• Research Objectives: more systematic modeling, design
• Some basic engineering
Some disclaimers: we are not experts
• contrast w/ computer vision, psychophysical image processing
• contrast w/ Gestalt theory, human reading, psychophysics of reading
Do we attempt to emulate humans by programming? (Ha & Bunke paper)
• Image acquisition
• Image transformation
• Image segmentation
• feature extraction
• No, but we reach for similar goals
Psychophysical questions
• Biological, especially human, vision represents the existence proof of algorithms that solve our problems
• How do brains learn to see and connect to the visual system? (The wiring is not encoded in genes.) Self-organization seems key.
Psychophysical Reading
• How fast can one read?
• What about comprehension? (Typically, above 200 wpm comprehension declines.)
• What do we read? Words, not letters.
• How is reading disability related to processing (e.g. dyslexia)?
• What, if anything, has this to do with DIA?
Computer Vision: different emphasis from DIA
• See for example, David Forsyth’s Computer Vision text
• recognition of objects, scenes, faces, patterns, visual memory; attention; and visual (and cognitive) pleasure
• change, motion, relationship to motor activities.
Computer Vision: relations
• Solving CV would solve DIA
• Solving DIA (more likely in some senses) might serve as a paradigm for CV. At least if we did it in some respectable fashion.
• Actually recent activity in Speech Understanding seems to be relevant to DIA...
Gestalt Theory I
• Fundamentally, the issue is one of understanding invariance:
• How can an object, say a square or a triangle, be recognized regardless of its
• rotation,
• translation
• scale
• contrast
• outline or solid rendering
• texture, motion…
Gestalt Theory II
• Biological vision handles these easily.
• This suggests that invariance is fundamental to our visual representation.
• E.g. in the case of rotation invariance, perhaps we separately perceive/encode:
– structure
– orientation
Gestalt Theory III
• We keep track of objects when we turn our head or walk
• Translation and rotational constancy of the perceived world vs. what is received
• Whatever the computational mechanism, it has to account for these issues (+ and -)
• A A A a a a
• Context: Univ. of Illinois, Chapter III, 3.l4l59
• durnptruck
Examples of post-acquisition image analysis
• Preparation for OCR
• Not symbol- or character-based
• (We acknowledge that this is feedforward, and not optimal, but so it goes.)
What can we do?
• Transform the image by local morphological computation
• Look for more global attributes (e.g. texture and FFT)
• If possible, do transformations on compressed form.
Can we find some tools?
• Finding connected components
• Boundaries
• Morphological transforms
• Thinning or “Skeletonization”
• (gray-scale) contour following
• Edge encoding/ vectorization
• Recursive X-Y cuts
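To make the first tool in this list concrete, here is a minimal sketch of 8-connected component labeling by breadth-first flood fill. It assumes a binary image stored as a list of rows with 1 for ink; the function name `label_components` and that representation are illustrative choices, not taken from the lecture.

```python
from collections import deque

def label_components(image):
    """Label 8-connected components of 1-pixels; returns (labels, count)."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]   # 0 means "not labeled"
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or labels[r][c] != 0:
                continue
            # New component: flood-fill from this seed pixel.
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```

Using 8-connectivity means diagonally touching ink pixels belong to the same component; a 4-connected variant would drop the diagonal offsets.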
e.g. Removing rotation (skew)
• Some excellent methods (e.g. HSB)
• Humans notice skew of even a fraction of a degree; it doesn’t inhibit our reading but it DOES make trouble for OCR.
• Removing skew approximately:
Deskewing / matrix transform
True rotation
Again, at 90 degrees
Side slip
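One hedged reading of these labels: a "side slip" is a shear, and applying a shear along x and then the same kind of shear along y ("again, at 90 degrees") approximates a true rotation when the skew angle is small. The sketch below compares the two 2×2 matrices. The function names and this interpretation of the slide are assumptions, not something the transcript states.

```python
import math

def rotation(theta):
    """True rotation matrix for angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(a, b):
    """2x2 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def two_shear(theta):
    """Horizontal side slip followed by a vertical one."""
    t = math.tan(theta)
    shear_x = [[1.0, -t], [0.0, 1.0]]   # x' = x - t*y
    shear_y = [[1.0, 0.0], [t, 1.0]]    # y' = y + t*x'
    return matmul(shear_y, shear_x)     # shear_x is applied first

def max_abs_diff(a, b):
    """Largest entrywise difference between two 2x2 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(2) for j in range(2))
```

The product works out to [[1, -t], [t, 1 - t²]] with t = tan θ, which differs from the rotation matrix only in terms of order θ², so the approximation is reasonable for the small skews typical of scanned pages and degrades for large angles.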
Remove noise (many models)
• More later (HSB)
• A few for now
– Salt & Pepper
– Too much ink (blurred, touching)
– Too little ink (broken characters)
Removing slant from characters
• Mask out horizontal lines (optional)
• Look for “best slant”
• ABCDEFGHIJKLMN
• ABCDEFGHIJKLMN
Erode, Dilate, Open, Close
• Erosion: remove 1 layer of boundary
• Dilate: add 1 layer of boundary
• Open: E then D
• Close: D then E
• Hit/Miss
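The first four operations can be sketched in a few lines on a binary image held as a list of rows (1 = ink). This is a minimal illustration that assumes a 3×3 square structuring element and treats pixels outside the image as background; both are choices made here, not specified on the slide.

```python
def dilate(image):
    """Add one layer: a pixel becomes 1 if any pixel in its 3x3 window is 1."""
    rows, cols = len(image), len(image[0])
    return [[int(any(image[ny][nx]
                     for ny in range(max(r - 1, 0), min(r + 2, rows))
                     for nx in range(max(c - 1, 0), min(c + 2, cols))))
             for c in range(cols)] for r in range(rows)]

def erode(image):
    """Remove one layer: a pixel stays 1 only if its whole 3x3 window is 1."""
    rows, cols = len(image), len(image[0])

    def full(r, c):
        # Out-of-image neighbors count as background, so border ink erodes.
        return all(0 <= r + dy < rows and 0 <= c + dx < cols
                   and image[r + dy][c + dx]
                   for dy in (-1, 0, 1) for dx in (-1, 0, 1))

    return [[int(full(r, c)) for c in range(cols)] for r in range(rows)]

def opening(image):
    """Erode then dilate: removes specks smaller than the window."""
    return dilate(erode(image))

def closing(image):
    """Dilate then erode: fills holes and gaps smaller than the window."""
    return erode(dilate(image))
```

In noise terms, opening addresses isolated specks of extra ink, while closing addresses small holes and breaks in strokes.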
[Figure: structuring elements (SE)]
Objectives:
• SE1: looks for 3 horizontal dots
• SE2: identify
• SE3 & SE4: identify corners
• SE6: isolate lines 6 units apart.. Or..
Segmentation by recursive X/Y Cuts “top down”
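A minimal sketch of the top-down idea, under assumptions of my own: shrink each region to the bounding box of its ink, look at the row and column projection profiles, split at the widest run of empty rows or columns that is at least `min_gap` wide, and recurse until no such gap remains. The function names, the `min_gap` parameter, and the widest-gap rule are illustrative and not taken from the lecture.

```python
def xy_cut(image, min_gap=2):
    """Return leaf blocks as (top, bottom, left, right), ends exclusive."""
    blocks = []
    _cut(image, 0, len(image), 0, len(image[0]), min_gap, blocks)
    return blocks

def _trim(profile, lo):
    """Span (offset by lo) from first to one past last nonzero entry."""
    inked = [i for i, v in enumerate(profile) if v]
    return (lo + inked[0], lo + inked[-1] + 1) if inked else None

def _widest_gap(profile, lo, min_gap):
    """Widest run of zeros with length >= min_gap, as (start, end) or None."""
    best, start = None, None
    for i, v in enumerate(profile + [1]):        # sentinel closes any run
        if v == 0 and start is None:
            start = i
        elif v != 0 and start is not None:
            width = i - start
            if width >= min_gap and (best is None or width > best[1] - best[0]):
                best = (lo + start, lo + i)
            start = None
    return best

def _cut(image, top, bottom, left, right, min_gap, blocks):
    # Shrink the region to the bounding box of its ink.
    span = _trim([sum(image[r][left:right]) for r in range(top, bottom)], top)
    if span is None:
        return                                   # region holds no ink
    top, bottom = span
    left, right = _trim([sum(image[r][c] for r in range(top, bottom))
                         for c in range(left, right)], left)
    rows = [sum(image[r][left:right]) for r in range(top, bottom)]
    cols = [sum(image[r][c] for r in range(top, bottom))
            for c in range(left, right)]
    row_gap = _widest_gap(rows, top, min_gap)
    col_gap = _widest_gap(cols, left, min_gap)
    if row_gap is None and col_gap is None:
        blocks.append((top, bottom, left, right))   # leaf block
        return
    row_w = row_gap[1] - row_gap[0] if row_gap else 0
    col_w = col_gap[1] - col_gap[0] if col_gap else 0
    if row_w >= col_w:                           # horizontal cut
        _cut(image, top, row_gap[0], left, right, min_gap, blocks)
        _cut(image, row_gap[1], bottom, left, right, min_gap, blocks)
    else:                                        # vertical cut
        _cut(image, top, bottom, left, col_gap[0], min_gap, blocks)
        _cut(image, top, bottom, col_gap[1], right, min_gap, blocks)
```

Because the cuts are straight lines across the whole region, this kind of scheme relies on the page having been deskewed first.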
Segmentation by Smearing
Smear horizontally until letters touch (forming words), more until words touch (forming lines)
Smear vertically until lines touch (forming paragraphs)
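One common way to realize smearing is run-length based: fill any white run that lies between two ink pixels and is no longer than a threshold. The sketch below takes that approach as an assumption; the function names and the `max_gap` parameter are illustrative, and the image is a list of rows with 1 for ink.

```python
def smear_rows(image, max_gap):
    """Fill horizontal white runs of length <= max_gap lying between ink."""
    out = []
    for row in image:
        new = list(row)
        last_ink = None                      # column of the previous ink pixel
        for x, v in enumerate(row):
            if v:
                if last_ink is not None and 0 < x - last_ink - 1 <= max_gap:
                    for k in range(last_ink + 1, x):
                        new[k] = 1
                last_ink = x
        out.append(new)
    return out

def smear_cols(image, max_gap):
    """Vertical smear: transpose, smear the rows, transpose back."""
    transposed = [list(col) for col in zip(*image)]
    return [list(row) for row in zip(*smear_rows(transposed, max_gap))]
```

Raising `max_gap` step by step merges letters into words and then words into lines; the vertical version merges lines into paragraphs, after which the blobs can be picked out as connected components.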
Smearing example
character
word
line
paragraph
Canonicalize elongated objects by thinning
• A A A
• These should all be “the same”
• Not useful for squares, circles
• Perhaps most useful for handwritten data
• Huge literature, far in excess of what it deserves (relative to usefulness)
• Nevertheless…
Skeletonization Requirements
• Connected image regions → connected lines
• Result is minimally 8-connected
• Approximate “medial lines”
• Extraneous spurs should be minimized
• Loss of information makes it not always advisable.
Medial Axis Computation
• For every point P in the object, locate the closest point on the boundary.
• If there are two such points (at the minimum distance) then P is on the Medial Axis
• Alternatively, think of pixels as point sources of a wave front. 2 waves meet at the MA.
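The two-closest-points definition can be tried directly by brute force on a small binary image. In this sketch the nearest background pixels stand in for "the closest point on the boundary", and distances are squared Euclidean so that ties are exact integers; both are assumptions made for illustration, as is the function name `medial_axis`.

```python
def medial_axis(image):
    """Object pixels (r, c) that have two or more nearest background pixels."""
    background = [(r, c) for r, row in enumerate(image)
                  for c, v in enumerate(row) if not v]
    axis = set()
    for r, row in enumerate(image):
        for c, v in enumerate(row):
            if not v:
                continue
            # Squared Euclidean distances are integers, so ties are exact.
            dists = [(r - br) ** 2 + (c - bc) ** 2 for br, bc in background]
            if dists and dists.count(min(dists)) >= 2:
                axis.add((r, c))
    return axis
```

For a 3×7 rectangle of ink this yields the four corner pixels plus the middle run of the center row, the discrete analogue of diagonals running in from the corners to a central line. The cost is quadratic in image size, so this is only a way to inspect the definition, not a practical method.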
Medial Axis Computation
Medial axis and skeletons with 4-distance, 8-distance, Euclidean distance (JR Parker)
The computation is fragile
• The T-shaped object but with one pixel missing
Iterative Morphological Thinning
Hypermedia image processing reference ©
• http://www.cee.hw.ac.uk/hipr/html/thin.html
Other approaches
• Cellular automata more generally
• Geometric computation (Voronoi diagrams)
• Stroke based decomposition/ syntactic generation
• Computation based on compressed version (RLE boundary), skew on CCs