Projected Inter-Active Display for DLSU-Manila Campus Map
by
Arcellana, Anthony A.; Ching, Warren S.; Guevara, Ram Christopher M.; Santos, Marvin S.; So, Jonathan N.
October 2006
Chapter 3 Theoretical Considerations
3.1 Image
3.1.1 Image Representation
A digital image is a representation of a two-dimensional image as a finite
set of digital values called pixels, a term derived from “picture element”. The
image has been discretized both in spatial coordinates and in brightness. Each
pixel of an image corresponds to a part of a physical object in the 3D world,
which is illuminated by light that is partly reflected and partly absorbed. Part
of the reflected light reaches the sensor used to image the scene and is
responsible for the value recorded for that specific pixel. The pixels are stored
in computer memory as a raster image or raster map, a two-dimensional array of
small integers (Petrou et al., 1999). The number of horizontal and vertical
samples in the pixel grid is called the image dimensions, specified as width x
height. These values are often transmitted or stored in a compressed form. The
number of bits, b, needed to store an N x N image with 2^m different grey levels
is:
b = N x N x m
This is why we often try to reduce m and N without significant loss in quality,
because they determine the storage size. Digital images can be created in a
variety of ways with input devices such as digital cameras and scanners.
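For example, a 512 x 512 image with 2^8 = 256 grey levels (m = 8) requires b = 512 x 512 x 8 = 2,097,152 bits, or 262,144 bytes, before compression, which illustrates why reducing N and m matters for storage.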
3.1.2 Binary and Grayscale
There are several kinds of digital images, such as binary, grayscale, and
color images. These digital images can be classified according to the number and
nature of the values a pixel can take. Binary images are images that have been
quantized to two values, usually denoted 0 and 1, but often stored with pixel
values 0 and 255, representing black and white. A grayscale image is an image in
which the value of each pixel is a single sample. Images of this sort are
typically composed of shades of gray, varying from black to white depending on
intensity, though in principle the samples could be displayed as shades of any
color, or even coded with various colors for different intensities. An example of
such an image is shown in Figure 3.1. The original image (leftmost) is a
grayscale image of the letter “a” with intensities from 0 to 255; the center
image is a zoomed-in version that reveals the individual pixels of the letter;
and the rightmost image shows the normalized numerical value of each pixel. For
this example the coding used is that 1 (255) is brightest and 0 (0) is darkest.
Figure 3.1 Grayscale image of the letter “a”: original, zoomed-in view, and normalized pixel values
3.1.3 Color
A color image is a digital image that includes color information for each
pixel, usually stored in memory as a raster map, a two-dimensional array of small
integer triplets, or as three separate raster maps, one for each channel. One of
the most popular colour models is the RGB model. The primaries red, green, and
blue were formalized by the CIE (Commission Internationale de l’Eclairage), which
in 1931 specified the spectral characteristics of red (R), green (G), and blue (B)
to be monochromatic light of wavelengths 700 nm, 546.1 nm, and 435.8 nm
respectively (Morris, 2004). Almost any colour C can be matched by a linear
combination of red, green, and blue:
C = rR + gG + bB
Today there are many RGB standards in use. Some of these are ISO RGB, sRGB,
ROMM RGB, and NTSC RGB (Buckley et al., 1999). These standards are
specifications for specific applications of the RGB color spaces.
Figure 3.2 RGB Colorspace
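As a brief illustration of the linear combination C = rR + gG + bB, the short C++ sketch below stores an RGB pixel as an integer triplet and forms a weighted sum of its channels. The weights shown are the common ITU-R BT.601 luma coefficients, used purely as an example; they are an assumption for illustration and not part of the CIE discussion above.

#include <cstdint>
#include <cstdio>

// An RGB pixel stored as a triplet of small integers (one byte per channel).
struct RgbPixel {
    uint8_t r, g, b;
};

// C = rR + gG + bB: a weighted linear combination of the three channels.
double linearCombination(const RgbPixel& p, double wr, double wg, double wb)
{
    return wr * p.r + wg * p.g + wb * p.b;
}

int main()
{
    RgbPixel p = {200, 120, 40};
    // Example weights (BT.601 luma); any non-negative weights could be used.
    double c = linearCombination(p, 0.299, 0.587, 0.114);
    std::printf("combined value = %.1f\n", c);
    return 0;
}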
3.1.4 Resolution
The term resolution is often used as a pixel count in digital imaging.
Resolution is sometimes identified by the width and height of the image as well
as the total number of pixels in the image. For example, an image that is 2048
pixels wide and 1536 pixels high (2048 x 1536) contains 3,145,728 pixels (or
3.1 megapixels). The resolution of an image expresses how much detail we can see
in it, and depends on N and m. Resolution is also a measurement of sampling
density: the resolution of a bitmap image gives the relationship between pixel
dimensions and physical dimensions. The most often used measurement is ppi,
pixels per inch.
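A small C++ sketch of this relationship between pixel dimensions, pixel count, and physical size is given below; the 300 ppi print density is an arbitrary illustrative value.

#include <cstdio>

int main()
{
    // Pixel dimensions of the example image above and an assumed print density.
    int width = 2048, height = 1536;
    double ppi = 300.0;

    long pixels = static_cast<long>(width) * height;      // 3,145,728 pixels
    double widthInches  = width / ppi;                    // physical width at this ppi
    double heightInches = height / ppi;                   // physical height at this ppi

    std::printf("%ld pixels, %.2f x %.2f inches at %.0f ppi\n",
                pixels, widthInches, heightInches, ppi);
    return 0;
}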
3.1.5 Scaling / Resampling
When we need an image with dimensions different from those of the image we
have, we scale the image. Resampling algorithms try to reconstruct the original
continuous image and sample it on a new grid.
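A minimal C++ sketch of one such algorithm, nearest-neighbour resampling, is shown below. The image is assumed to be a plain row-major array of 8-bit grey values; bilinear or higher-order filters would reconstruct the continuous image more faithfully.

#include <cstdint>
#include <vector>

// Nearest-neighbour resampling: each pixel of the new grid takes the value of
// the closest pixel in the source grid.
std::vector<uint8_t> resizeNearest(const std::vector<uint8_t>& src,
                                   int srcW, int srcH, int dstW, int dstH)
{
    std::vector<uint8_t> dst(static_cast<size_t>(dstW) * dstH);
    for (int y = 0; y < dstH; ++y) {
        int sy = y * srcH / dstH;                  // nearest source row
        for (int x = 0; x < dstW; ++x) {
            int sx = x * srcW / dstW;              // nearest source column
            dst[static_cast<size_t>(y) * dstW + x] =
                src[static_cast<size_t>(sy) * srcW + sx];
        }
    }
    return dst;
}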
3.1.6 Sample depth
Sample depth is the number of bits of the binary representation used to
represent each sample of the image. The spatial continuity of the image is
approximated by the spacing of the samples in the sample grid. The values that
can be represented for each pixel are determined by the sample format chosen.
3.2 Input and Output Devices
3.2.1 PC Camera
A PC camera, popularly known as a web camera or webcam, is a real-time
camera widely used for video conferencing via the Internet. Images acquired from
this device can be uploaded to a web server, making them accessible through the
World Wide Web, instant messaging, or a PC video-calling application. Web
cameras typically include a lens, an image sensor, and some support electronics.
The image sensor can be a CMOS or CCD device, the former being dominant in low-
cost cameras. Typically, consumer webcams offer a resolution in the VGA region
at a rate of around 25 frames per second. Various lenses are also available, the
most common being a plastic lens that can be screwed in and out to manually
control the camera focus. Support electronics read the image from the sensor and
transmit it to the host computer.
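A minimal capture loop for such a camera, written against the OpenCV 1.x C API discussed in Section 3.6, might look like the sketch below. The camera index 0 and the 40 ms wait (roughly 25 frames per second) are assumptions made for illustration.

#include <cv.h>
#include <highgui.h>

int main()
{
    CvCapture* capture = cvCaptureFromCAM(0);        // open the first webcam
    if (!capture) return -1;

    cvNamedWindow("webcam", CV_WINDOW_AUTOSIZE);
    for (;;) {
        IplImage* frame = cvQueryFrame(capture);     // frame is owned by capture
        if (!frame) break;
        cvShowImage("webcam", frame);
        if (cvWaitKey(40) >= 0) break;               // ~25 frames per second
    }
    cvReleaseCapture(&capture);
    cvDestroyWindow("webcam");
    return 0;
}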
3.2.2 Projector
Projectors are classified into two technologies, DLP (Digital Light
Processing) and LCD (Liquid Crystal Display). This refers to the internal
mechanisms that the projector uses to compose the image (Projectorpoint).
3.2.2.1 DLP
Digital Light Processing technology, originally developed by Texas
Instruments, uses an optical semiconductor known as the Digital Micromirror
Device, or DMD chip, to recreate the source material. There are two ways in which
DLP projection creates a color image: with single-chip DLP projectors and with
three-chip projectors. On a single-chip projector, colors are generated by
placing a color wheel between the lamp and the DMD chip. Basically, the color
wheel is divided into four sectors: red, green, blue, and an additional clear
section to boost brightness. The latter is usually omitted since it reduces color
saturation. The DMD chip is synchronized with the rotating color wheel, so when a
certain color section of the color wheel is in front of the lamp, that color is
displayed by the DMD. In a three-chip DLP projector, a prism is used to split the
light from the lamp. Each primary color of light is routed to its own DMD chip,
then recombined and directed out through the lens. Three-chip DLP is referred to
in the market as DLP2.
3.2.2.2 LCD
LCD projectors contain three separate LCD glass panels, one for red,
green, and blue components of the image signal being transferred to the projector.
As the light passes through the LCD panels, individual pixels can be opened to
allow light to pass or closed to block the light. This activity modulates the light
and produces the image that is projected onto the screen (Projectorpoint).
3.2.2.3 Keystone Correction
Keystoning occurs when a projector is aligned non-perpendicularly to a
screen, or when the projection screen has an angled surface. The resulting image
will be trapezoidal rather than rectangular (trapezoidal distortion). To avoid
this trapezoidal distortion, keystone correction is done (Projector People).
Keystone correction is basically changing the shape of the projected image to
compensate for the trapezoidal distortion (Presenters Online).
There are two methods by which keystone correction is done: optical and
digital keystone correction. Optical keystone correction is done by physically
modifying the light path through the lens. The correction is done after the light
has been reflected off the image panels in the projector. Digital keystone
correction adjusts the image proportions by shrinking the image at the edge
furthest away from the screen before the projector generates it (HTRgroup). The
amount of keystone correction varies between projectors. Some projectors offer 13
to 35 degrees of vertical keystone correction, and some even offer both vertical
and horizontal keystone correction (Projector People).
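The C++ sketch below gives a much-simplified picture of digital vertical keystone correction: each row is horizontally compressed by an amount that grows toward the edge assumed to be farthest from the screen (the top row here). A real projector applies a full projective warp; the linear per-row scaling and nearest-neighbour sampling are illustrative simplifications, not a description of any particular product.

#include <cstdint>
#include <vector>

// Pre-shrink each row of a grayscale frame so that, after the trapezoidal
// stretch introduced by the projection geometry, the image appears rectangular.
// maxShrink (0..0.5) is the fraction removed from the farthest row.
std::vector<uint8_t> keystoneCorrect(const std::vector<uint8_t>& src,
                                     int w, int h, double maxShrink)
{
    std::vector<uint8_t> dst(static_cast<size_t>(w) * h, 0);
    for (int y = 0; y < h; ++y) {
        // Assume the top row (y = 0) maps to the edge farthest from the screen.
        double shrink = (h > 1) ? maxShrink * (h - 1 - y) / (h - 1) : 0.0;
        int rowW = static_cast<int>(w * (1.0 - shrink));
        if (rowW < 1) rowW = 1;
        int offset = (w - rowW) / 2;                  // keep each row centred
        for (int x = 0; x < rowW; ++x) {
            int sx = x * w / rowW;                    // nearest-neighbour sample
            dst[static_cast<size_t>(y) * w + offset + x] =
                src[static_cast<size_t>(y) * w + sx];
        }
    }
    return dst;
}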
3.3 Image Processing
Image processing is basically the transformation of images into other images.
The images undergo signal processing techniques that manipulate them to the user's
needs. These techniques either enhance the wanted parts of an image or suppress the
unwanted parts.
3.3.1 Preprocessing Algorithms
Preprocessing algorithms and techniques are used to make the necessary
data reduction and to make the analysis easier. This stage is basically where we
eliminate information that is unwanted for the specific application. Such
techniques include extracting the Region-of-Interest (ROI), performing basic
mathematical operations, enhancing specific features, and reducing data
(Umbaugh, 2005).
3.3.1.1 Defining Region-of-Interest
In image analysis we seldom need the whole image; we usually want
to concentrate on a specified area of the image called the Region-of-
Interest (ROI). Image geometry operations are used to extract the ROI.
Examples of these operations include crop, zoom, and rotate.
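A brief sketch of defining an ROI with the OpenCV 1.x C API (Section 3.6) is shown below; the rectangle coordinates are arbitrary illustrative values.

#include <cv.h>

// Set a rectangular ROI on an image, copy just that region into its own
// image, and then restore the full image.
void extractRoi(IplImage* img)
{
    CvRect roi = cvRect(100, 50, 200, 150);           // x, y, width, height (illustrative)
    cvSetImageROI(img, roi);

    // cvGetSize returns the ROI size while the ROI is set.
    IplImage* cropped = cvCreateImage(cvGetSize(img), img->depth, img->nChannels);
    cvCopy(img, cropped, NULL);

    cvResetImageROI(img);                             // subsequent operations see the whole image
    cvReleaseImage(&cropped);
}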
3.3.1.2 Arithmetic and Logical Operations
Arithmetic and logical operations are applied in the preprocessing
stage to combine images in different ways. These operations include
addition, subtraction, multiplication, division, AND, OR, and NOT.
3.3.1.3 Spatial Filters
Spatial filtering is used for noise reduction and image
enhancement. This is done by applying filter functions or filter operators
in the domain of the image space.
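As an illustration, the C++ sketch below applies a 3 x 3 averaging (mean) mask, one of the simplest noise-reducing spatial filters, directly in the image domain; border pixels are left unchanged for brevity.

#include <cstdint>
#include <vector>

// 3 x 3 mean filter: each interior output pixel is the average of the 3 x 3
// neighbourhood around the corresponding input pixel.
std::vector<uint8_t> meanFilter3x3(const std::vector<uint8_t>& src, int w, int h)
{
    std::vector<uint8_t> dst = src;                   // borders copied unchanged
    for (int y = 1; y < h - 1; ++y) {
        for (int x = 1; x < w - 1; ++x) {
            int sum = 0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    sum += src[static_cast<size_t>(y + dy) * w + (x + dx)];
            dst[static_cast<size_t>(y) * w + x] = static_cast<uint8_t>(sum / 9);
        }
    }
    return dst;
}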
3.3.2 Thresholding
Thresholding is the process of reducing a grayscale (monochrome) image to
two values, and it is the simplest way to do image segmentation. One of the
values denotes an “object pixel” and the other a “background pixel”. A pixel is
marked as an object pixel when its value is greater than the threshold value and
as a background pixel otherwise. Usually, an object pixel is given a value of '1'
while a background pixel is given a value of '0'.
The main issue in thresholding lies in selecting the correct value for
the threshold. There are many ways to acquire the threshold value. The simplest
is to choose the mean or median value; this is effective provided that the object
pixels are brighter than the background, and also brighter than the average. The
next approach is to use a histogram that records the frequency of occurrence of
the image pixel values and to take the valley point as the threshold. The
histogram approach assumes that there is some average value for the background
and object pixels, but that the actual pixel values have some variation around
these average values. A more effective way to acquire the threshold value is by
using iterative methods.
There are two ways to perform the iterative method. The first
method incrementally searches through the histogram for a threshold. Starting
at the lower end of the histogram, the average of the gray values less than the
suggested threshold is computed and labeled L, and likewise the average of the
gray values greater than the suggested threshold is labeled G. The average of L
and G is then computed. If this average is equal to the suggested threshold, it
becomes the threshold; otherwise the suggested threshold is incremented and the
process repeats (Umbaugh, 2005).
The second method refines the threshold iteratively. First an initial
threshold value is suggested; a suitable choice is the average of the
image’s four corner pixels. The next steps are similar to the first method;
the only difference lies in the updating of the suggested threshold: in this
method the updated value is the average of the values of L and G (Umbaugh, 2005).
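A compact C++ sketch of the second iterative method is given below. The image is assumed to be a row-major array of 8-bit grey values, and the fixed starting guess of 128 stands in for the corner-pixel average suggested above.

#include <cstdint>
#include <vector>

// Iteratively refine the threshold: split the pixels at the current guess,
// average the two groups (L and G), and replace the guess with (L + G) / 2
// until it stops changing.
int iterativeThreshold(const std::vector<uint8_t>& img)
{
    int t = 128;                           // starting guess (see text above)
    for (;;) {
        long sumL = 0, sumG = 0, nL = 0, nG = 0;
        for (uint8_t v : img) {
            if (v < t) { sumL += v; ++nL; }
            else       { sumG += v; ++nG; }
        }
        int L = nL ? static_cast<int>(sumL / nL) : 0;
        int G = nG ? static_cast<int>(sumG / nG) : 255;
        int next = (L + G) / 2;            // updated threshold = average of L and G
        if (next == t) return t;           // converged
        t = next;
    }
}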
3.3.3 Edge Detection
Edges are important structures in images and in image processing. Edges
define significant structures in a scene, particularly the outlines of objects and
parts of objects.
Morris (2004) defines an edge as a significant, local change in image
intensity. Edge detectors can be classified into two types of operators: template
matching (TM) and differential gradient (DG). Examples of template matching
operators are the Prewitt, Kirsch, and Robinson operators, and of differential
gradient operators the Roberts and Sobel operators. Both template matching and
differential gradient estimate
local intensity gradients with the help of suitable convolution masks (Davies,
2005).
In the TM approach the local gradient magnitude g is approximated by taking
the maximum of the responses of the different component masks:
g = max(gi : i = 1, ..., n)
where n is the number of masks used, usually 8 to 12.
In the DG approach, the local edge magnitude may be computed
vectorially using the transformation
g = (gx^2 + gy^2)^(1/2)
and the edge orientation is calculated as
θ = tan^-1(gy / gx)
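The C++ sketch below illustrates the DG approach with the Sobel masks: gx and gy are estimated at each interior pixel and then combined into the magnitude and orientation given above. The image is again assumed to be a row-major array of 8-bit grey values.

#include <cmath>
#include <cstdint>
#include <vector>

// Estimate the local gradient with the 3 x 3 Sobel masks and convert it to
// edge magnitude g = sqrt(gx^2 + gy^2) and orientation theta = atan2(gy, gx).
void sobelEdges(const std::vector<uint8_t>& img, int w, int h,
                std::vector<float>& mag, std::vector<float>& theta)
{
    mag.assign(static_cast<size_t>(w) * h, 0.0f);
    theta.assign(static_cast<size_t>(w) * h, 0.0f);
    auto at = [&](int x, int y) { return static_cast<int>(img[static_cast<size_t>(y) * w + x]); };

    for (int y = 1; y < h - 1; ++y) {
        for (int x = 1; x < w - 1; ++x) {
            int gx = -at(x - 1, y - 1) + at(x + 1, y - 1)
                     - 2 * at(x - 1, y) + 2 * at(x + 1, y)
                     - at(x - 1, y + 1) + at(x + 1, y + 1);
            int gy = -at(x - 1, y - 1) - 2 * at(x, y - 1) - at(x + 1, y - 1)
                     + at(x - 1, y + 1) + 2 * at(x, y + 1) + at(x + 1, y + 1);
            mag[static_cast<size_t>(y) * w + x]   = std::sqrt(static_cast<float>(gx * gx + gy * gy));
            theta[static_cast<size_t>(y) * w + x] = std::atan2(static_cast<float>(gy), static_cast<float>(gx));
        }
    }
}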
3.4 Motion Detection
3.4.1 Image Differencing
A common method for detecting moving objects is the use of image
differencing. Image differencing over successive pairs of frames should reveal
the pixels that differ, which should correspond to the moving object. However,
certain considerations complicate the matter. Regions of constant intensity and
edges parallel to the direction of motion give no sign of motion (Davies, 2005).
Image differencing also suffers from noise: it is prone to contain errors due to
subtle changes in illumination. These can be caused by environmental changes and
by the digitization process of the camera, wherein internal noise causes subtle
changes between successive frames.
The documentation of the OpenCV library suggests using the mean of a
number of frames as the reference for the differencing. The mean is calculated as
m(x,y) = S(x,y) / N
and the standard deviation is
σ(x,y) = ( Sq(x,y)/N – (S(x,y)/N)^2 )^(1/2)
where S(x,y) is the sum of the individual pixel intensities at point (x,y),
Sq(x,y) is the sum of the squares of the individual pixel intensities at point
(x,y), and N is the total number of frames.
A pixel I(x,y) of a new frame is regarded as part of the moving object if it
satisfies the condition
| I(x,y) – m(x,y) | > C · σ(x,y)
where C is a certain constant that controls the sensitivity of the differencing.
If C = 3, this is known as the 3 sigma rule (Intel, 2001).
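A C++ sketch of this background model is shown below: per-pixel sums S and sums of squares Sq are accumulated over N reference frames, and a pixel of a new frame is flagged as moving when it deviates from the per-pixel mean by more than C standard deviations. Frames are assumed to be row-major arrays of 8-bit grey values of identical size.

#include <cmath>
#include <cstdint>
#include <vector>

struct BackgroundModel {
    std::vector<double> S, Sq;   // per-pixel sum and sum of squares
    int n = 0;                   // number of accumulated reference frames

    void accumulate(const std::vector<uint8_t>& frame) {
        if (S.empty()) { S.assign(frame.size(), 0.0); Sq.assign(frame.size(), 0.0); }
        for (size_t i = 0; i < frame.size(); ++i) {
            S[i]  += frame[i];
            Sq[i] += static_cast<double>(frame[i]) * frame[i];
        }
        ++n;
    }

    // Returns a binary mask: 1 where the pixel is considered part of a moving object.
    std::vector<uint8_t> detect(const std::vector<uint8_t>& frame, double C = 3.0) const {
        std::vector<uint8_t> mask(frame.size(), 0);
        for (size_t i = 0; i < frame.size(); ++i) {
            double mean  = S[i] / n;
            double sigma = std::sqrt(Sq[i] / n - mean * mean);
            if (std::fabs(frame[i] - mean) > C * sigma) mask[i] = 1;   // C = 3: the 3 sigma rule
        }
        return mask;
    }
};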
3.5 Image Segmentation
The term image segmentation refers to the partitioning of an image into a set of
regions according to a given criterion. Regions may also be defined as groups of pixels
having both a border and a particular shape such as a circle, ellipse, or polygon. Image
segmentation is a very important tool in many image processing and computer vision
problems. Division of the image into regions corresponding to objects of interest is
necessary before any processing can be done at a level higher than that of the pixel. Most
image segmentation algorithms are modifications, extensions, or combinations of two basic
concepts: the measure of homogeneity of regions within themselves and the measure of
contrast with the objects on their borders. Image segmentation techniques can be divided
into three main categories: (1) region growing and shrinking, (2) clustering methods, and
(3) boundary detection (Umbaugh, 2005).
3.5.1 Region Growing Technique
The region growing and shrinking methods use the row and column based
image domain. The seed based region growing is a bottom up segmentation
approach (Yakimovsky, 1976). A seed point within the region of interest is
selected, and the adjacent pixels which satisfy the homogeneity property are added.
This process will output a single connected region in the image. To fully partition
the image into N regions, seed points must be selected in each region and the
region growing process must be repeated N times.
The selection of seed points for region growing is often accomplished by
manually selecting points within the objects of interest. This process of
selecting seed points ensures that the resulting object meets the needs of
the application. An alternative is to automatically scan the image to acquire the
seed points based on some expected properties of the region of interest. Local
intensity maxima are usually used as seed points, since in the majority of images
the objects are brighter than their background.
Once a seed point (x,y) is identified, the neighbors of that point (x+1,y),
(x-1,y), (x,y+1) and (x,y-1) are examined to see which belong in the region.
All pixels whose colour is within a radius Rmax of the mean region colour cr are
part of the region; these points are added to the region and their neighbors are
considered next. As the region grows, the list of adjacent pixels also grows. The
region stops growing when all of the neighboring pixels lie outside the colour
radius Rmax (Sangwine, 1998).
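A grayscale C++ sketch of this procedure is given below; the colour version described above works the same way, with a distance in colour space in place of the absolute intensity difference. The 4-connected neighbourhood and the running update of the region mean are assumptions made for this sketch.

#include <cmath>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Grow a region from (seedX, seedY): neighbours whose intensity lies within
// rMax of the current region mean are added, and the mean is updated.
std::vector<uint8_t> growRegion(const std::vector<uint8_t>& img, int w, int h,
                                int seedX, int seedY, double rMax)
{
    std::vector<uint8_t> region(static_cast<size_t>(w) * h, 0);
    std::queue<std::pair<int, int> > frontier;
    double sum = img[static_cast<size_t>(seedY) * w + seedX];
    long count = 1;
    region[static_cast<size_t>(seedY) * w + seedX] = 1;
    frontier.push(std::make_pair(seedX, seedY));

    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    while (!frontier.empty()) {
        int x = frontier.front().first;
        int y = frontier.front().second;
        frontier.pop();
        for (int k = 0; k < 4; ++k) {
            int nx = x + dx[k], ny = y + dy[k];
            if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
            size_t idx = static_cast<size_t>(ny) * w + nx;
            if (region[idx]) continue;
            double mean = sum / count;
            if (std::fabs(img[idx] - mean) <= rMax) {     // homogeneity test
                region[idx] = 1;
                sum += img[idx];
                ++count;
                frontier.push(std::make_pair(nx, ny));
            }
        }
    }
    return region;
}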
3.5.2 Clustering Techniques
Clustering technique is an image segmentation method wherein individual
elements are placed into groups. These groups are based on some measure of
similarity within the group. The major difference between the clustering
technique and the region growing technique is that domains other than the
row-and-column (x,y) based image space (the spatial domain) may be used as the
primary domain for clustering. Other domains include color spaces, histogram
spaces, or complex feature spaces.
The process starts by looking for clusters in the domain (mathematical
space) of interest. The simplest method is to divide the space of interest into
regions by selecting the center or median along each dimension and splitting it
there. This method is used in the center and median segmentation algorithms. This
method will only be effective if the space we are using and the entire algorithm is
designed intelligently because the center or median split alone may not find good
clusters.
3.5.3 Boundary Detection
Boundary detection is performed by finding the boundaries between
objects, thus indirectly defining the objects. The process starts by marking points
that may be a part of an edge. These points are then merged into line segments,
and the line segments are then merged into object boundaries. Edge detectors are
used to mark points of rapid change, thus indicating the possibility of an edge.
These edge points represent local discontinuities in specific features, such as
brightness, color, or texture.
After the detection of edges, the next step is to threshold the results. One
method is to consider the histogram of the edge detection results and look for the
best valley manually. This threshold method works best with a bimodal
(two-peak) histogram (Umbaugh, 2005).
3.6 OpenCV
Intel developed an open source computer vision library named OpenCV, which is
intended for use, incorporation, and modification by researchers, commercial software
developers, government, and camera vendors, as reflected in its license. The OpenCV
Library is a collection of algorithms and sample code for various computer vision
problems. The library is cross-platform and runs on both Windows and Linux operating
systems. It focuses mainly on real-time image processing, with applications in areas
of Human Computer Interaction (HCI), object identification, face recognition, gesture
recognition, motion tracking, and mobile robotics. The philosophy behind the creation of
the library is to aid commercial uses of computer vision in human-computer
interfaces, robotics, monitoring, biometrics, and security by providing a free and open
infrastructure where the distributed efforts of the vision community can be consolidated
and performance optimized.
3.6.1 Advantages of Using OpenCV Library
The library provides a set of image processing functions as well as
image and pattern analysis functions. The functions are optimized for Intel®
architecture processors, and are particularly effective at taking advantage of
MMX™ technology. The OpenCV Library is a way of establishing an open
source vision community that will make better use of up-to-date opportunities to
apply computer vision in the growing PC environment. The library is open, has a
platform-independent interface, and is supplied with complete C sources.
3.6.2 Relation Between OpenCV and Other Libraries
OpenCV is designed to be used together with the Intel® Image Processing
Library (IPL) and extends the latter's functionality toward image and
pattern analysis. At a lower level it also uses the Intel® Integrated Performance
Primitives (IPP), which provide a cross-platform interface to highly optimized low-
level functions that perform image processing and computer vision operations.
OpenCV can automatically benefit from using IPP on platforms such as IA-32, IA-64,
and StrongARM.
3.6.3 Data Types Supported
To make the OpenCV API simpler and more uniform, a few fundamental data types
and helper data types are introduced. The fundamental data types include array-like
types: “IplImage” (IPL image), “CvMat” (matrix), and growable and mixed-type
collections: “CvSeq”, “CvSet”, “CvGraph” and “CvHistogram” (multi-
dimensional histogram). Helper data types include “CvPoint” (2D point),
“CvSize” (width and height), “CvTermCriteria” (termination criteria for iterative
processes), “CvMoments” (spatial moments), and many others.
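The short C++ sketch below shows a few of these types in use with the OpenCV 1.x C API; the image size, matrix type, and point coordinates are arbitrary illustrative values.

#include <cv.h>

void dataTypeExamples()
{
    CvSize  size  = cvSize(320, 240);                  // width and height
    CvPoint point = cvPoint(10, 20);                   // a 2D point
    (void)point;

    // IplImage: an 8-bit, single-channel image of the given size.
    IplImage* img = cvCreateImage(size, IPL_DEPTH_8U, 1);
    cvZero(img);

    // CvMat: a 3 x 3 matrix of 32-bit floats, set to the identity.
    CvMat* mat = cvCreateMat(3, 3, CV_32FC1);
    cvSetIdentity(mat);

    cvReleaseMat(&mat);
    cvReleaseImage(&img);
}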
3.7 Microsoft Visual C++
Microsoft Visual C++ is an integrated development environment (IDE) product
for the C and C++ programming languages, engineered by Microsoft Corporation. It
contains tools for creating and debugging C++ code, and possesses features such as
syntax highlighting, auto-completion, and debugging functions. Its compile-and-build
system features, precompiled header files, "minimal rebuild" functionality, and
incremental linking significantly shorten the turn-around time to edit, compile, and
link a program (Wikipedia). Visual C++ is included in the Visual Studio suite.
3.7.1 Visual C++ Libraries
This includes the industry-standard Active Template Library (ATL), the
Microsoft Foundation Class (MFC) libraries, and standard libraries such as the
Standard C++ Library, and the C RunTime Library (CRT), which has been
extended to provide security enhanced alternatives to functions known to pose
security issues. A new library, the C++ Support Library, is designed to simplify
programs that target the CLR. (MSDN)
3.8 .Net Windows API
The Microsoft .NET Framework is a software component that can be added to the
Microsoft Windows operating system. It is a development and execution environment that
allows different programming languages and libraries to work together to create
Windows-based applications that are easier to build, manage, and integrate with other
networked systems (MSDN).
The Windows API is designed for use by C/C++ programs and is the most direct
way for software applications to interact with a Windows system (MSDN). An API
(Application Program Interface) is a set of predefined Windows functions used to control
the appearance and behavior of every Windows element, from the look of the desktop
to the memory allocation for new processes. Every action triggers several API
functions telling Windows what has happened (Nair, 2002).
The APIs can be found in the DLLs (Dynamic Link Libraries) in the Windows
system directory. A Dynamic Link Library is Microsoft’s implementation of the shared
library concept in the Microsoft operating systems (Wikipedia). The Win32 APIs can be
split into three: User32.dll, which handles the user interface; Kernel32.dll, which
handles file operations and memory management; and Gdi32.dll, which handles graphics
(Nair, 2002).
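As a small C++ illustration, the sketch below makes a single call into the user-interface portion of the API; MessageBoxA is exported by User32.dll, and a standard Visual C++ Windows project links the corresponding import library by default.

#include <windows.h>

int main()
{
    // One call into User32.dll, the user-interface part of the Win32 API.
    MessageBoxA(NULL, "Hello from the Win32 API", "Win32 example", MB_OK);
    return 0;
}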
References
Answers. (n.d.). Webcam. Retrieved June 03, 2006 from http://www.answers.com/topic/web-cam.
Buckley, R., et al. (1999). Standard RGB color spaces. In the IS&T/SID Seventh Color Imaging Conference: Color Science, Systems and Applications. Scottsdale, Arizona.
Davies, E. (2005). Machine vision: theory, algorithms, practicalities. Elsevier: CA
DLP and LCD Projector Technology Explained. (n.d.). Retrieved June 2, 2006, from http://www.projectorpoint.co.uk/projectorLCDvsDLP.htm.
Home Theater Research Group. (n.d.). Keystone Correction. Retrieved September 24, 2006 from http://htrgroup.com/?tab=projector-docs&section=keystone
Intel (2001). Open source computer vision library reference manual. Retrieved September 22, 2006 from http://developer.intel.com
Kolas, O. (2005). Image processing with gluas: introduction to pixel molding. Retrieved September 24, 2006 from http://pippin.gimp.org/image_processing/chap_dir.html
Microbus (2003). Image, resolution, size and compression. Retrieved September 23, 2006 from http://www.microscope-microscope.org/imaging/image-resolution.htm
MSDN (2006) Microsoft developer network: .network fundamentals. Retrieved September 23, 2006 from http://msdn.microsoft.com/netframework/programming/fundamentals/default.aspx.
Morris, T. (2004) Computer vision and image processing. Palgrave Macmillan: NY
Nair. S. (2002). Working with Win32 API in .NET. Retrieved September 24, 2006 from http://www.c-sharpcorner.com/Code/2002/Nov/win32api.asp
Petrou, M., and Bosdogianni, P. (1999). Image Processing: The Fundamentals. John Wiley & Sons, Ltd: New York
Presenters Online. Fixing a Distorted Image with Keystone Correction. Retrieved September 24, 2006 from http://www.presentersonline.com/technology/projector/keystonecorrection.shtml
Projector People. Projector Keystone Correction. Retrieved September 24, 2006 from http://www.projectorpeople.com/tutorials/keystone-correction.asp
Sangwine, S. (1998). The colour image processing handbook. Chapman and Hall: London
Shapiro, L. and Stockman, G. (2001). Computer Vision. Prentice Hall. Upper Saddle River, New Jersey.
Umbaugh, S. (2005). Computer Imaging: Digital Image Analysis and Processing. CRC Press: Boca Raton, Florida.
Wikipedia (n.d.). .Net framework. Retrieved September 23, 2006 from http://en.wikipedia.org/wiki/.NET_Framework_3.0
Wikipedia. (n.d.). Segmentation. Retrieved September 19, 2006 from http://en.wikipedia.org/wiki/Segmentation_(image_processing).