team members: mohammed hoque troy tancraitor jonathan lobaugh lee stein joseph mallozi pennsylvania...

Team Members: Mohammed Hoque, Troy Tancraitor, Jonathan Lobaugh, Lee Stein, Joseph Mallozi

Pennsylvania State University

Presentation Overview

• Problem Statement

• Architectural design

• Robotic layer

• Processing layers

• Communications layer

• Testing and Results

• Realization of Requirements

The Big Picture

• We built a robot with vision and hearing capabilities.

• Objectives:
  • The robot must be able to detect a specific user from a known set of users, based on audio and video information.
  • The system should facilitate an improved platform for human-computer interaction.

• Limitations:
  • The recognition is limited solely to the five group members and project advisors.

High-level Architectural Design

[Diagram labels: Robotic Layer (robotic interface); Communications layer; Processing Layer (audio processing, image processing, neural network).]

Robotic Layer Diagram

[Diagram labels. Layers: I/O Layer, Controller Layer, Interface Layer. Components: Motor #, LCD display, Audio input device #, LED array, Visual input device, Environ. input device #, Motor controller, Sensor controller, Visual controller, System controller, Interface controller, USB 2.0.]

Processing Layer Diagram

[Diagram labels: USB 2.0, Interface, Real-time control system, Image processing (image correction, image parsing), Audio processing (audio fingerprinting), Neural networks.]

Image Layer Diagram

[Diagram labels: Face Detection, Image Extraction, Image processing, Real-time control system, Neural Network, Data.]

[Diagram: an input image I(x,y) is correlated with a template image to produce an output image O(x,y).]

Facial Vector Extraction Process

• The locations of the distinctive features of the face are identified using template matching, and the resultant position vectors are fed to the neural network.
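The template-matching step can be illustrated with a minimal sketch. It assumes normalized cross-correlation as the matching score, which the slides do not confirm (they only say "correlation"), and the function names `normalized_cross_correlation` and `best_match` are hypothetical. The image and template are tiny made-up grayscale grids, not project data.

```python
import math


def normalized_cross_correlation(image, template):
    """Return O(x, y): the correlation score of `template` at every offset in `image`."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    output = []
    for y in range(ih - th + 1):
        out_row = []
        for x in range(iw - tw + 1):
            # Patch of the input image I(x, y) under the template at this offset.
            patch = [image[y + j][x + i] for j in range(th) for i in range(tw)]
            p_mean = sum(patch) / len(patch)
            p_dev = [v - p_mean for v in patch]
            p_norm = math.sqrt(sum(d * d for d in p_dev))
            denom = p_norm * t_norm
            # Flat patches have no variance; score them as 0 (assumption).
            score = sum(p * t for p, t in zip(p_dev, t_dev)) / denom if denom else 0.0
            out_row.append(score)
        output.append(out_row)
    return output


def best_match(output):
    """Return the (x, y) offset with the highest correlation score."""
    best = max((score, x, y) for y, row in enumerate(output) for x, score in enumerate(row))
    return best[1], best[2]


# Made-up example: the template pattern is embedded at x=2, y=1.
template = [[9, 1],
            [1, 9]]
image = [[0, 0, 0, 0, 0],
         [0, 0, 9, 1, 0],
         [0, 0, 1, 9, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]
scores = normalized_cross_correlation(image, template)
position = best_match(scores)
```

In a setup like the one described, one such position would be found per feature (eyes, nose), and the positions collected into the vector passed to the neural network.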

Template Matching: Feature Extraction Example

Face Processing Results

[Figures: eye and nose profiles, followed by the normalized eye and nose profiles of Troy, Mohammed, Lee, Jon, and Joe. Each plot overlays Image 1 through Image 10; x-axis: data points (1 to about 300), y-axis: data range (-1 to 1).]

Audio layer

• Finds the normalized highest amplitude of the waveform (should be less than or equal to 1).

• Looks for amplitudes greater than 80% of the highest amplitude.

• Creates an array of 40 amplitude points, starting at the first point that crosses the 80% boundary.
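The three steps above can be sketched as follows. This is one possible reading, not the team's implementation: it assumes the waveform is normalized by its peak absolute value and that the 80% test is applied to absolute amplitudes. The name `extract_window` and the sample values are hypothetical.

```python
def extract_window(samples, threshold_ratio=0.8, window_length=40):
    """Return `window_length` normalized samples starting at the first 80% crossing."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        # Silent input: nothing crosses the boundary (assumption: return zeros).
        return [0.0] * window_length
    # After normalization the highest amplitude is exactly 1.
    normalized = [s / peak for s in samples]
    # Index of the first sample whose amplitude exceeds 80% of the peak.
    start = next(i for i, s in enumerate(normalized) if abs(s) > threshold_ratio)
    window = normalized[start:start + window_length]
    # If the recording ends early, fill the remainder with zeros (assumption).
    window += [0.0] * (window_length - len(window))
    return window


# Made-up waveform: the first crossing is the sample 1.7 (0.85 after normalization).
samples = [0.0, 0.1, 0.5, 1.7, 2.0, 1.0] + [0.2] * 50
window = extract_window(samples)
```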

Audio Layer (cont.)

• Pads the secondary array with 984 trailing zeros to get a 1024-point array.

• Performs a Fast Fourier Transform on the 1024 points.

• Sends the absolute values of the first 400 points to the neural network for processing.
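A minimal sketch of the padding and transform steps, written with the standard library only. The recursive radix-2 FFT here is a textbook stand-in, since the slides do not say which FFT routine was used, and `audio_feature_vector` is a hypothetical name.

```python
import cmath


def fft(values):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result


def audio_feature_vector(window, fft_size=1024, num_features=400):
    """Zero-pad `window` to `fft_size`, transform, and keep the first magnitudes."""
    if len(window) > fft_size:
        raise ValueError("window is longer than the FFT size")
    # A 40-point window receives 984 trailing zeros: 40 + 984 = 1024.
    padded = list(window) + [0.0] * (fft_size - len(window))
    spectrum = fft(padded)
    # "Absolute points" is read here as the magnitude of each complex FFT bin.
    return [abs(c) for c in spectrum[:num_features]]


# Made-up 40-point window: a constant level of 0.5.
features = audio_feature_vector([0.5] * 40)
```

The first entry of `features` is the DC bin, which for this constant window equals the sum of the 40 samples.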

Audio results

Neural network layer

• Three-layer feed-forward network trained with backpropagation.

• Neural networks provide a way of allowing the system to generalize the input data and determine the output.

• Two separate networks: one trained for audio data, the other trained for image data.

• Inputs
  • Vectors containing facial feature position information (eyes and nose)
  • Audio vector of 400 known FFT points

• Output
  • Percent similarity to known users based on audio/imaging

• Utilized Matlab's neural network toolbox to train the system weights.
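To make the network's role concrete, here is a hedged sketch of the forward pass of a three-layer (input, hidden, output) feed-forward network. It is not the Matlab toolbox code: the sigmoid activation, the layer sizes, and every weight below are made-up placeholders, and training is omitted entirely. The real networks would take the facial position vector or the 400-point audio vector as input.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """One fully connected layer: a sigmoid of each weighted sum plus bias."""
    return [sigmoid(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


def percent_similarity(outputs, users):
    """Report each output unit (in 0..1) as a percentage per known user."""
    return {user: round(100.0 * o, 1) for user, o in zip(users, outputs)}


# Placeholder weights for a toy 3-input, 2-hidden, 2-output network.
hidden_w = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_b = [0.0, 0.1]
output_w = [[1.2, -0.7], [-0.9, 1.5]]
output_b = [0.05, -0.05]

outputs = forward([0.2, 0.7, 0.1], hidden_w, hidden_b, output_w, output_b)
similarity = percent_similarity(outputs, ["Troy", "Mohammed"])
```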

Communications Layer

[Diagram labels: USB Class, Command Class, Buffer Class, Communication Package, Robot-RTS; connections marked read/write, push/pop, "P L" / "L P", and Init().]

Lock and Key Algorithm

• Set up in the image and audio arrays.

• Developed to prevent data collision.
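The slides do not describe how the lock and key algorithm works internally, so the following is only an illustration of the general idea of guarding a shared buffer against colliding reads and writes. It swaps in a standard mutex (`threading.Lock`) rather than the team's algorithm, and the class name `LockedBuffer` is hypothetical; only the `push` and `pop` operation names come from the diagram.

```python
import threading
from collections import deque


class LockedBuffer:
    """FIFO buffer whose push and pop never run at the same time."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()

    def push(self, item):
        # Only the thread holding the lock may modify the buffer.
        with self._lock:
            self._items.append(item)

    def pop(self):
        # Returns None when the buffer is empty instead of raising.
        with self._lock:
            return self._items.popleft() if self._items else None


def writer(buffer, start, count):
    for value in range(start, start + count):
        buffer.push(value)


# Four writer threads share one buffer; every pushed value must survive.
buffer = LockedBuffer()
threads = [threading.Thread(target=writer, args=(buffer, i * 1000, 1000))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

received = []
while True:
    item = buffer.pop()
    if item is None:
        break
    received.append(item)
```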

Communications Layer (cont.)

• Controls the flow of data between the robotic layer and the processing layer.

• Communicates with the devices.

• Command class interfaces all of the software packages.

• Interprets and sends commands to the robotic layer.

Real-time control system

• Manages input and output of data to the interface layer.

• Determines the robot's actions during idle processing time.

• Controls the robot to follow a face once detected.

• Controls programmed responses to the robotic interface: audio, optical, and mechanical.
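The slides do not say how face following is controlled. As one purely hypothetical illustration, a proportional rule could turn the robot toward the detected face based on its horizontal offset from the frame center. The function name, the dead zone, the gain, and the 640-pixel frame width are all assumptions.

```python
def steering_command(face_center_x, frame_width, dead_zone=0.1, gain=1.0):
    """Return a turn rate in [-1, 1]; negative turns left, positive turns right."""
    half_width = frame_width / 2
    # Offset of the face from the frame center, scaled to the range -1..1.
    offset = (face_center_x - half_width) / half_width
    if abs(offset) <= dead_zone:
        # The face is close enough to the center: hold still.
        return 0.0
    return max(-1.0, min(1.0, gain * offset))


centered = steering_command(320, 640)    # face in the middle of the frame
right = steering_command(480, 640)       # face right of center
far_left = steering_command(0, 640)      # face at the left edge
```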

Realization of Requirements

• Ease of Use
  • Centralized control
  • Minimize connections needed (connections <= 2)
  • Standardized adaptors

• Cost
  • School-funded expenses = $102.00
  • Team-funded expenses = $420.00
  • Previously owned = $200.00
  • Total = $722.00

• Efficiency
  • Performs audio and image processing in less than 0.5 seconds.

Realization of Constraints (cont.)

• Economical
  • Total expenses exceeded the initial budget of $250.

• Environmental
  • The audio system is sensitive to background noise.
  • The imaging system works best in ideal lighting conditions. It also works best when the user looks directly at the camera from a reasonable distance (3-4 feet).

• Accuracy
  • The overall accuracy rate of our system is close to 80%, which is the target we initially set.

Conclusions / Questions

The next evolutionary step would be to expand the interaction between humans and machines to a social interface. This system was created to help bridge the gap between humans and computers.

Someday this may change what computers mean in our everyday lives.
