Team Members: Mohammed Hoque, Troy Tancraitor, Jonathan Lobaugh, Lee Stein, Joseph Mallozi
Pennsylvania State University
Presentation Overview
• Problem Statement
• Architectural design
• Robotic layer
• Processing layers
• Communications layer
• Testing and Results
• Realization of Requirements
The Big Picture
• We built a robot with vision and hearing capabilities.
• Objectives:
  • The robot must be able to detect a specific user from a known set of users, based on audio and video information.
  • The system should facilitate an improved platform for human-computer interaction.
• Limitations:
  • Recognition is limited to the five group members and the project advisors.
High-level Architectural Design
[Diagram. Blocks: Robotic interface, Audio processing, Image processing, Neural network, Communications layer. Groupings: Processing Layer, Robotic Layer.]
Robotic Layer Diagram
[Diagram. Layers: I/O Layer, Controller Layer, Interface Layer. Blocks: Motor #, LCD display, Audio input device #, LED array, Visual input device, Environ. input device #, Motor controller, Sensor controller, Visual controller, System controller, Interface controller, USB 2.0.]
Processing Layer Diagram
[Diagram. Blocks: Interface, Image correction, Audio fingerprinting, Neural networks, Image parsing, USB 2.0, Real-time control system, Image processing, Audio processing.]
Image Layer Diagram
[Diagram. Blocks: Face Detection, Image Extraction, Image processing, Real-time control system, Neural Network.]
[Diagram. Labels: Input Image I(x,y), Template image, Correlation, Output Image O(x,y), Data.]
Facial Vector Extraction Process
• The locations of the distinctive features of the face are identified using template matching, and the resultant position vectors are fed to the neural network.
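The template-matching step can be illustrated with a minimal Python sketch. The slides only name "correlation" between an input image and a template image; the use of normalized cross-correlation, the brute-force search, and the names `match_template` and `feature_vector` are assumptions made for illustration, not the team's actual implementation.

```python
import numpy as np

def match_template(image, template):
    """Slide the template over the image and return the (row, col) of the
    best match together with the full correlation map."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()          # zero-mean template
    t_norm = np.sqrt((t ** 2).sum())
    out = np.zeros((ih - th + 1, iw - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            patch = image[y:y + th, x:x + tw]
            p = patch - patch.mean()        # zero-mean image patch
            denom = np.sqrt((p ** 2).sum()) * t_norm
            # Normalized cross-correlation in [-1, 1]; 0 for flat patches.
            out[y, x] = (p * t).sum() / denom if denom > 0 else 0.0
    best = np.unravel_index(np.argmax(out), out.shape)
    return (int(best[0]), int(best[1])), out

def feature_vector(image, templates):
    """Concatenate the best-match positions of each template (for example
    eye and nose templates) into one position vector."""
    positions = [match_template(image, t)[0] for t in templates]
    return np.array(positions, dtype=float).ravel()
```

A vector built this way from eye and nose templates would play the role of the position vector that is passed on to the neural network.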
Template Matching
Feature Extraction Example
Face Processing Results
Eye and nose profiles
[Figures: normalized eye and nose profiles of Troy, Mohammed, Lee, Jon, and Joe. Each chart plots Image 1 through Image 10. X-axis: data points (1 to about 300). Y-axis: data range (-1 to 1).]
Audio layer
• Finds the normalized highest amplitude of the waveforms (should be less than or equal to 1).
• Looks for amplitudes greater than 80% of the highest amplitude.
• Creates an array of 40 amplitude points, starting at the first one that crosses the 80% boundary.
Audio Layer (cont)
• Pads the secondary array with 984 trailing zeros to get a 1024-point array.
• Performs a Fast Fourier Transform on the 1024 points.
• Sends the first 400 absolute points to the neural network for processing.
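The audio steps listed on these two slides can be sketched in Python with NumPy. This is a rough reading of the bullets, not the team's code: the function name `audio_fingerprint`, its parameter names, and the choice to compare absolute amplitudes against the 80% boundary are assumptions.

```python
import numpy as np

def audio_fingerprint(samples, threshold=0.8, window=40, fft_size=1024, keep=400):
    """Return the first `keep` FFT magnitudes of a short, zero-padded window
    that starts where the signal first crosses the threshold."""
    x = np.asarray(samples, dtype=float)
    peak = np.max(np.abs(x))
    if peak == 0:
        raise ValueError("silent input")
    x = x / peak                              # normalize so the peak is 1
    # Assumption: the 80% test is applied to absolute amplitudes.
    above = np.flatnonzero(np.abs(x) > threshold)
    start = int(above[0])                     # first crossing of the boundary
    segment = x[start:start + window]         # up to 40 amplitude points
    padded = np.zeros(fft_size)               # 40 points + 984 zeros = 1024
    padded[:len(segment)] = segment
    spectrum = np.abs(np.fft.fft(padded))     # "absolute points" of the FFT
    return spectrum[:keep]                    # 400-point vector for the network
```

The defaults mirror the numbers on the slides: 40 samples plus 984 zeros give the 1024-point FFT input, and 400 magnitudes are kept.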
Audio results
Neural network layer
• Feed-forward, back-propagation, 3-layer network
• Neural networks provide a way of allowing the system to generalize the input data and determine the output.
• Two separate networks: one trained for audio data, the other trained for image data.
• Inputs
  • Vectors containing facial feature position information (eyes and nose)
  • Audio vector of 400 known FFT points
• Output
  • Percent similarity of known users based on audio/imaging
• Utilized Matlab’s neural network toolbox to train the system weights.
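For readers unfamiliar with this kind of network, a three-layer feed-forward network trained by back-propagation might look like the Python sketch below. The team trained its weights with Matlab's neural network toolbox, so this is only an illustration: the class name `ThreeLayerNet`, the sigmoid activations, the squared-error loss, and all layer sizes are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ThreeLayerNet:
    """Input layer -> one hidden layer -> output layer (one unit per user)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        # Each output lies in (0, 1); times 100 it reads as a percent score.
        self.h = sigmoid(x @ self.w1 + self.b1)
        self.y = sigmoid(self.h @ self.w2 + self.b2)
        return self.y

    def train_step(self, x, target, lr=0.5):
        """One full-batch gradient step; returns the mean squared error
        measured before the weights are updated."""
        y = self.forward(x)
        err = y - target
        d_out = err * y * (1 - y)                          # output-layer delta
        d_hid = (d_out @ self.w2.T) * self.h * (1 - self.h)  # hidden-layer delta
        self.w2 -= lr * self.h.T @ d_out / len(x)
        self.b2 -= lr * d_out.mean(axis=0)
        self.w1 -= lr * x.T @ d_hid / len(x)
        self.b1 -= lr * d_hid.mean(axis=0)
        return float((err ** 2).mean())
```

Under these assumptions, the audio network would take 400 inputs (the FFT vector) and the image network would take the facial position vector, each with one output per known user.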
Communications Layer
[Diagram: Communication Package. Blocks: USB Class, Command Class, Buffer Class, Robot-RTS. Connections are labeled read/write; buffer operations are labeled Init(), push, and pop; paired P and L markers appear on the links.]
Lock and Key Algorithm
• Set up in the image and audio arrays.
• Developed to prevent data collision.
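The slides do not describe how the lock and key algorithm works internally. As a stand-in, the Python sketch below shows one common way to prevent data collisions on a shared buffer: guarding push and pop with a mutex. The class name `LockedBuffer` and the FIFO ordering are assumptions, and a standard `threading.Lock` is swapped in for whatever mechanism the team actually built.

```python
import threading
from collections import deque

class LockedBuffer:
    """FIFO buffer whose push and pop never interleave across threads."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()   # the "lock"; holding it is the "key"

    def push(self, item):
        with self._lock:                # only one writer or reader at a time
            self._items.append(item)

    def pop(self):
        """Remove and return the oldest item, or None if the buffer is empty."""
        with self._lock:
            return self._items.popleft() if self._items else None

    def __len__(self):
        with self._lock:
            return len(self._items)
```

A producer (for example, a camera or microphone reader) would call `push` while the processing side calls `pop`, and neither can observe a half-written entry.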
Communications Layer (cont.)
• Controls the flow of data between the robotic layer and the processing layer.
• Communicates with the devices.
• The Command class interfaces all of the software packages.
• Interprets and sends commands to the robotic layer.
Real-time control system
• Manages input and output of data to the interface layer.
• Determines the robot’s actions during idle processing time.
• Controls the robot to follow a face once detected.
• Controls programmed responses to the robotic interface: audio, optical, and mechanical.
Realization of Requirements
• Ease of Use
  • Centralized control
  • Minimize connections needed (connections <= 2)
  • Standardized adaptors
• Cost
  • School-funded expenses = $102.00
  • Team-funded expenses = $420.00
  • Previously owned = $200.00
  • Total = $722.00
• Efficiency
  • Performs audio and image processing in less than 0.5 seconds.
Realization of Constraints (cont)
• Economical
  • Total expenses exceeded the initial budget of $250.
• Environmental
  • The audio system is sensitive to background noise.
  • The imaging system works best in ideal lighting conditions. It also works best when the user looks directly at the camera from a reasonable distance (3-4 feet).
• Accuracy
  • The overall accuracy rate of our system is close to 80%, which matches the target we initially set.
Conclusions / Questions
The next evolutionary step would be to expand the interaction between humans and machines to a social interface. This system was created to help bridge the gap between humans and computers.
Someday this distinction may change the meaning of computers in our everyday lives.