vision sensor network tutorial

8/12/2019 Vision Sensor Network Tutorial

1/66

1

Distributed Vision NetworksDistributed Vision Networks

ACIVS ACIVS- -07 07

Hamid Aghajan, Stanford University, USARichard Kleihorst, NXP Research, NetherlandsChen Wu, Stanford University, USA

August 27, 2007

Delft University, Netherlands

Course Website http://wsnl.stanford.edu/acivs07/index.php

OutlineOutline

Introduction Application potentials Data fusion mechanisms Human pose analysis Multi-view gesture

Distributed Vision Networks ACIVS 2007 Tutorial 2

Smart cameras Outlook


2/66

2

Technology CrossTechnology Cross- -RoadsRoads

Sensor NetworksWireless communicationNetworking

Image Sensors Rich information Low power, low cost

Smart

Camera

Networks


Embedded processing Collaboration methods

Architecture?Algorithms?

Applications?

Scene understanding Context awareness

Potential impact on designmethodologies in eachdiscipline

Sensor Networks PerspectiveSensor Networks Perspective

Opportunities for novel applications:

Make com lex inter retation of environment and events

Learn phenomena and behavior, not just measure effect

Incorporate context awareness into the application

Allow network to interact with the environment


Change of paradigm:High-bandwidth sensors (vision)


3/66

3

Vision Processing PerspectiveVision Processing PerspectiveNovel approach to vision processing:

Use the additional available dimension: spaceData fusion across views, time, and feature levels

Design based on effective use of all available information(opportunistic fusion)

Utilize multiple views to:Overcome ambiguities

Achieve robustness Allow for low complexity algorithms


Use communication to exchange descriptions - not raw dataIn-node processing

Change of paradigm:Networked vision sensors


High-bandwidth data

In-node processing

Low-bandwidth communication


Collaborative interpretation


4/66

4


Rich design space utilizing concepts of: Vision processing Signal processing and optimization Wireless communications Networking Sensor networks

Novel smart environment applications:


Context aware User centric

Distributed Vision NetworksDistributed Vision NetworksProcessing at source allows: Image transfer avoidance Descriptive reports ca a e ne wor s

Design opportunities: Processing architectures for real-time in-node processing Algorithms based on opportunistic data fusion Novel smart environment a lications


Balance of in-node and collaborative processing:

Communication cost Latency Processing complexities Levels of data fusion


5/66

5


Vision sensing requires awareness of: Privacy issues

Em lo in-node rocessin Avoid image transfer Applications that provide services not based on monitoring / reporting

Bandwidth issues Transmit processed information not raw data Transmit based on information value for fusion / query-based


Employ separate early vision and interpretive processing

mechanisms Layered processing architecture: Features, objects, relationships,

models, decisions Employ data exchange and collaboration across different layers


Robotics AgentsResponse systemsSmart environments

Feedback( features, parameters,

Distributed Vision Networks( DVN )

Enabling technologies:o

Vision processingo Wireless sensor networko Embedded computingo Signal processing

ArtificialIntelligence

ContextEvent interpretation

Behavior modeling

SmartEnvironments

Assisted livingOccupancy sensing

Augmented reality

, .

Distributed Vision Networks 10

Multimedia Human ComputerInteraction

Scene constructionVirtual realityGaming

Immersive virtual realityNon-restrictive interfaceRobotics

10 ACIVS 2007 Tutorial


6/66

6

OutlineOutline




Examples by: Chen Wu, Chung-Ching Chang, Huang Lee, Joshua Goshporn, Itai Katz,Kevin Gabayan, Arezou Keshavarz, Ali Maleki-Tabar

Wireless Sensor Networks Lab, Stanford University

Application Potentials: View Selection Application Potentials: View Selection

Select best view of person of interest inreal-time tracking

Data exchange between camerasdetermines which one to stream visual CAM 1

DOOR

data

CAM 2

CAM 5


CAM 3CAM 4


7/66

7

Application Potentials: Assisted Living Application Potentials: Assisted Living

Detect accidents at home

CAM 1

DOOR

CAM 2

CAM 5


CAM 3CAM 4

Application Potentials: Multi Application Potentials: Multi- -Finger GestureFinger Gesture

Manipulate virtual world with free hand gesture

Pan Rotate

Zoom out Zoom in



8/66

8

Application Potentials: Face Profiling Application Potentials: Face Profiling

Interpolate and reconstruct face model from a fewsnapshots

X

-100-50

0

X

Y

Z

Y

ZZ

Y

X


Camera 1(Training set)

Camera 3(Test set)

t1

Application Potentials: 3D Model Reconstruction Application Potentials: 3D Model Reconstruction

t1t2

t2

Onlyobservations at t2


Observations at t1 Observations at t2


9/66

9

Application Potentials: Virtual Reality Application Potentials: Virtual Reality

Place people in virtual world CAM 1

CAM 2

DOOR

CAM 3CAM 4

CAM 5


OutlineOutline






10/66

10

Smart Camera NetworksSmart Camera NetworksTimeTime

(Views)

FeatureLevels

(Views)

FeatureLevels

Space (views) Overcome ambiguities, occlusions Enhance estimate robustness

Fusion Dimensions


Time Increase confidence level of estimates Detection of key frames

Feature levels Exchange of features with other nodes across algorithmic layers

Fusion MechanismsFusion MechanismsFeature fusion:

Use of multiple, complementaryfeatures within a camera node

Spatial fusion:Localization, epipolar geometry, ROIand feature matchingValidation of estimates by checkingconsistency, outlier removal3D reconstructionTemporal fusion:

Local interpolation / smoothing

of estimatesExchange of updates viaspatial fusionSpatiotemporal estimatesmoothing and prediction

Model-based fusion:3D human body reconstruction,human gesture analysisFeedback to in-node featureextraction


Decision fusion:Estimates based on soft decisions

Adequate features in ownobservationsCost, latency of communication

Key features and keyframes:

Information assisting othernodes


11/66

11

Layered Spatial CollaborationLayered Spatial Collaboration

Description Layer 4 :Gestures G

Description Layers Decision LayersFinal decision

Case Study: Human Gesture Analysis

Security

Description Layer 2 :Features

Description Layer 3 :Gesture Elements

Decision Layer 2 :collaboration between

cameras


cameras

F1 F2 F3f12f11 f21

f22f31

f32

E1 E2 E3

Feature-based fusion

Soft decision fusion

In-node feature

Gaming

Mutual reasoning:- Joint estimation

Assisted reasoning:- Estimate validation- Key feature exchange


Description Layer 1 :Images

ec s on ayer :within a single camera

R1 R2 R3

Opportunisticdata fusion

Fusion of features within a single camera

Fusion based on collaboration among multiple cameras

extraction SmartPresentation

AccidentDetection

- In-node feature extraction

Fusion MechanismsFusion Mechanisms


Mutual Reasoning

Spatial fusion Spatiotemporal fusion

Model-based Active vision Feedback

Feature fusion Temporal fusion

Self Reasoning

Assisted ReasoningExchange of key features


12/66

12

The Big PictureThe Big Picture


ModelModel--based Fusionbased Fusion Motivation to build a human model:

A concise reference for merging information from camerasOffers flexibility for interpretation in different applications:

Various esture inter retation a lications

Allows recreation of body gesture in virtual domain Viewing angles to body not available from any of the cameras

Allows employment of active vision methods: Focus on what is important Develop more detail in time

Helps address privacy concerns in various applications



13/66


14/66

14

Use of FeedbackUse of Feedback

CAM 1

CAM 2

CAM N

Merge informationProject / decompose toeach image plane again

CAM 1

CAM 2

CAM N

3Ddescription

Gestureextraction

1e

2e

3e

CAM 1

CAM 2

CAM N


Gestures

Feedback Initialize in-node feature extraction Active vision (focus on what is important)

OutlineOutline





15/66

15

RegionRegion- -based Feature Fusionbased Feature Fusion Problems with pixel-based features:

Localized attributes need local thresholds hard to set Comparing color of foreground / background pixels

No information from extended neighborhood considered Knowledge about extent of neighborhood not available

That is the objective in many cases segmentation

Objects often contain correlated attributes in aregion


The idea is to grow regions based on correlatedattributes Achieve segmentation an intermediate step in many

applications

Feature Fusion: Optical Flow and Color Feature Fusion: Optical Flow and Color Use of complementary features

Edge and color Color and motion

Combine pixel-based andregion-based methods



16/66

16

RegionRegion- -based Fusion: Optical Flow and Color based Fusion: Optical Flow and Color


Joint Refinement of Color and MotionJoint Refinement of Color and Motion

Description Layer 3 :Gesture Elements

Description Layer 4 :Gestures


cameras

E1 E2 E3

G

Description Layers Decision Layersimages

coarse estimation ofcolor segmentation

coarse estimationof motion flows

Description Layer 1 :Images

Description Layer 2 :Features

Decision Layer 1 :within a single camera


cameras

R1 R2 R3

F1 F2 F3f12f11 f21

f22f31

f32

better color segmentation better motion flows

refine refine

. . . . . . . . . . . .

Optical flow assisting color segmentation Color segmentation assisting optical flow


without using angles of ellipses after using angles of ellipses

Search for fitted ellipse in motion flow allows for effectivedetection of arms motion vector

Clustering close-by points with similar motionvector allows for better segmentation of the leg


17/66

17

Spatial FusionSpatial Fusion Geometric fusion

Mutual reasoning

Making correspondences Tracking Reconstruction of 3D models Camera network calibration o n es ma on

Joint refinement Decision fusion

Assisted reasoning Estimate validation

Use of epipolar geometry to:

Feature matchingOutlier removalROI mapping between camera views

X


-100-50

0

X

Y

Z

YZZ

Y

X

Mapped to an ellipsoidCamera 1

(Training set)Camera 3(Test set)

Face Orientation Estimation Color and geometry-based method Spatial / temporal validation method


* *

Color and Geometry FusionColor and Geometry FusionFace orientation analysis

In-node feature extraction by fusion of Color and Geometry Apply position constraints for the eyes when thresholding Cb/Cr

CompensatedImage Eye-map

Mean andCovariance

Skin colorellipse model

EyeCandidate

Cb/Cr

Eye-GaussianDistribution x

Gaussian-ChrominanceDistribution

x


Skin mask



18/66

18

Color and Geometry FusionColor and Geometry Fusion

Use geometry of cameras to:

Face orientation analysisFeature matching with epipolar geometry

x1 x2

X

l x1 x2

X

l

100 200 300 100 200 300 100 200 300

F a

l s e

f a c e

c a n

d i d a

t e

Remove false feature candidates

ase ne

Epipolar line for x 1

ase ne

Epipolar line for x 1


0

0

0

0

50

100

150

200

50

100

150

200

0

0

0

0

50

100

150

200

50

100

150

200

F a

l s e e y e

c a n

d i d a

t e s

Epipolar lines forfalse candidates

An Example ofMutual Reasoning


Feature FusionFeature Fusion Level of features for fusion between cameras?

Features are typically dense fields Edge points, motion vectors

Descriptions are exchanged

Valuable features may be exchanged as dense descriptors

Communication cost issues need to be considered

High-level descriptionsCollaborationbetween cameras

Sparse


Key features and key frames allow selective sharing of dense features

Low-level features

High-level features

ow- eve escr p ons

Processing withina single camera

Features (single camera)or descriptions (shared)

Dense



19/66

19

Key FramesKey Frames Frames with high confidence estimates

Node with key frame observation broadcasts derivedinformation

Other nodes use them to refine their local estimates

Distributed Vision Networks 3737 ACIVS 2007 Tutorial

Temporal FusionTemporal Fusion Use key frames to re-initialize local face angle estimate

Use angle estimates close to zero (frontal view)

Aims to limit error propagation in time Use optical flow to locally track angle changes between frames Interpolate between two key frames to limit optical flow error propagation

Cameras initializeface angles

Cameras initializeface angles

Local optical flow is usedto track face anglebetween key frames


Cameras interpolateface angles betweenkey frames usinglocal optical flow

Key frames



20/66

20

Spatial / Temporal ValidationSpatial / Temporal Validation Estimates between key frames are

corrected by: Temporal smoothing (one camera) Temporal

smoothing

Spatialsmoothing

u er remova mu p e cameras

Can this be done more effectively?Spatiotemporal filtering

Key frames


Spatiotemporal FusionSpatiotemporal Fusion

0 5 10 15 20 25 30 35 40-200

0

200

d e g r e e

Backward Pass

Forward Pass

0

50

100

150 True orientation

d e g r e e

RightProfile

ppor un s c crea on o ace pro e


0 10 20 5045 62 70 80 8 5 1 10 120 130-10-20-27-40-50-65-70-80-95-100-105-125-140 30

B C D E F G H I J KA M N O P Q R S T U VL X Y ZW

0 5 10 15 20 25 30 35 40-150

-100

-50

frame

LeftProfile

Query result examples:side profiles

Mapped toellipsoid



21/66

21

Spatiotemporal FusionSpatiotemporal Fusion


ModelModel--based Human Posture Reconstructionbased Human Posture Reconstruction

From model,or via k -means

Refine color models(PerceptuallyOrganized)

Or other morphologicalmethod with constraints

Concise description ofsegments

Segmentation function: Single cameraFeedback


Model fitting function: CollaborativeGoodness of ellipsefits to segments

Projection on imageplanes

E.g. parameters for theupper body (arms)


22/66

22

InIn--Node Feature Fusion for SegmentationNode Feature Fusion for Segmentation


Collaborative Model FittingCollaborative Model FittingFrame 105



23/66

23

Collaborative Model FittingCollaborative Model Fitting


Virtual PlacementVirtual Placement



24/66

24

Virtual PlacementVirtual PlacementCollaborativeFace Analysis

Feature Fusion Ellipse Fitting


In-node processing-

SpatiotemporalFusion

Decision FusionDecision Fusion Smart home care network for fall detection

States are combined as soft decisions to create a report

AccelerometerSignal Classifier

State1 State2 State3

Decision Making Process

State0Trigger Image Analysis

Camera 1 Camera 2 Camera 3

Analysis Analysis Analysis

50 100 150 200 250 3000

50

100

150

200

250 Accelerometer Signal for Falling and Sitting Down

Time (s)

S i g n a

l L e v e l

xaxis

yaxiszaxis

FallSitting Down

50 100 150 200 250 3000

50

100

150

200

250 Accelerometer Signal for Falling and Sitting Down

Time (s)

S i g n a

l L e v e l

xaxis

yaxiszaxis

FallSitting Down


0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.80

20

40

60

80

100

120

140

160

180

A c c e

l e r o m e

t e r

A m p

l i t u

d e

C h a n g e

Duration (s)

Falling versus Sitting Down

Sitting DownFalling

State0=1State0=3

0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.80

20

40

60

80

100

120

140

160

180

A c c e

l e r o m e

t e r

A m p

l i t u

d e

C h a n g e

Duration (s)

Falling versus Sitting Down

Sitting DownFalling

State0=1State0=3

NoReport(Safe)

Report AllUsefulData

(PossibleHazard)

Report(Hazard)

Joint work with: Arezou Keshavarz, Ali Maleki-Tabar


25/66

25

Decision FusionDecision Fusion

W ai t f or

-I ni t i al i z e S e ar

-V al i d a t eM

o d


or e O b s er v a t i on s

c h S p a c ef or 3 DM

o d el

el




26/66

26




Alert level= 0.6598

Confidence= 0

Alert level= 0.8370

Confidence= 0.7389

Alert level= 0.8080

Confidence= 0.7695

combine


Lying down, danger ( 0.6201 )


27/66

27

Event InterpretationEvent Interpretation

Vision Reasoning /Interpretations

Human Model

Kinematics AttributesStates

Interpretationlevels

Behavioranalysis

Instantaneousaction

Low-level

AI reasoning

Posture / attributes

AI

VisionProcessing


features o e parame ers

QueriesContextPersistenceBehavior attributes

Feedback

Event InterpretationEvent Interpretation



28/66

28

OutlineOutline




Why is it difficult to estimate posture?Why is it difficult to estimate posture?

Image cues arenot obvious

Human modelvaries(body part sizes)


High-dimensionaloptimization


29/66

29

Image CuesImage Cues Edge

Templates Chamfer distance

Motion Structure Object boundaries / edges

,

Color skin color adaptively learned color

No single one is robust ! Point/line features v.s. region

features


Human ModelHuman Model

Measured Motion capture devices

Acquisition Init at the beginning, with not too difficult postures

If differs from the subject too much, fitting might be


trapped into ar-away local minima easily


30/66

30

OptimizationOptimization

Top-down approach 3D model > 2D projections of

Validate 2D projections with image observations

Deutscher(particle filtering), Sminchisescu(scaled covariance sampling) etc.


OptimizationOptimization Bottom-up approach

Looking for body part candidates in images Assemble 2D/3D models from body part

can ates Sigal(loose limbed people), Ramanan(profile

pose)



31/66

31

OptimizationOptimization - - Pros and ConsPros and Cons

Top-down approach+ Easy to handle occlusions Time consuming in calculating projections and evaluate

them

Bottom-up approach+ Distribute more computation in images (i.e. body part

candidates, local assemblage)


Difficult to handle occlusions without knowing relative configurations of body parts

Not direct to map from 2D assemblage to the 3D model

Multiple Views?Multiple Views?

Gains More image cues Resolve ambiguity (sometimes helps a lot!)

Challenges Huge redundancy, how to reduce?

Misleading Correspondence? Communication?


Review Score addition (Gavrila etc.) 3D voxels / stereo (Malik etc.) 3D templates from training (Sigal etc.)


32/66

32

MultiMulti--View Camera NetworkView Camera Network

Basic Assumption and Constraint-- Powerful local image processor, limited communication e uce oca n orma on Maximally utilize multi-views:

to compensate for partial observations and reduced descriptions

Ideas Combine bottom-up and top-down approaches

Concise and informative local deduction

Choose best view for different purposes


Optimally combine Reduce redundancy

Challenge: Can we learn adaptively? Model (size, appearance) Behaviors -> prediction & validation

TemporalTemporal- -Spatial FusionSpatial Fusion



33/66


34/66

34

Spatial FusionSpatial FusionCombining top-down and bottom-up approaches


Distributed Communication FlowDistributed Communication Flow

Camera 5 wants to update itsCAM 5

Fusion to update localknowledge of the subject

knowledge of the subject

1Broadcast the request for

collaborationCAM 1

2 The other cameras sendrequested descriptions

3

4Updated knowledgeof the subject is fedback

Vector of model parameters

5


CAM 2

CAM 3

CAM 4

Vector of descriptions from local processing

5

5

5

Up-to-date knowledge of the subjectis used in local processing 1 2 3 4 5


35/66

35

Algorithm AlgorithmFrom model,or via k -means

Refine color models(PerceptuallyOrganized)

Or other morphologicalmethod with constraints

Concise description ofsegments

Segmentation function: Single cameraFeedback


Model fitting function: CollaborativeGoodness of ellipsefits to segments



SegmentationSegmentation In-node function based on:

Feature fusion Feedback from model

Feedback allows for incorporation of spatiotemporal fusion outcome intolocal analysis

Rough estimate of segments provided by: Local initialization

Segmentation function: Single camera


Expectation Maximization (EM) methods use new observation to refinelocal color distributions EM produces markers (collection of high-confidence segment islands) for

watershed Also helps with varying color distributions between cameras

Watershed enforces spatial proximity information to link the segment


36/66

36

EM SegmentationEM Segmentation Initialization:

Not a good idea to arbitrarily specify an initial estimate EM may be trapped to local optima

ays o o a n n a es ma es: K-means

Centers of clusters are taken as the initial estimations for EM Segment parameters from the 3D body model

Assumes appearance doesnt change very quickly


Segmentation function: Single camera

Perceptually Organized EM (POEM)Perceptually Organized EM (POEM) Regular EM method:

A pixel-based method Doesnt use spatial relationship between pixels / segment islands

POEM: Segments are continuous, so consider a pixels neighborhood

Use a measure of expected grouping: 2 21 2

( ) ( )

( , )i j i j x x coord x coord x

iw x x e

=


The neighborhood votes for ( xi in segment l ):

( ) ( ) ( , ), where ( ) ( | ) j

l i l j i j l j j j x

V x x w x x x p y l x = = =


37/66

37

Watershed SegmentationWatershed Segmentation Removing vague pixels is important for watershed,

since wrong seeds/markers would compete withcorrect ones and cause false segments

Segmentation function: Single camera Red : undecided pixels


Assigns labels to undecided ( dark blue ) pixels

Ellipse FittingEllipse Fitting

Motivation: Concise descriptions of segments Each ellipse should represent a

segment with similar shape Not necessarily correspond to

body parts

Goodness of fit measures


Occupancy of the ellipse Coverage of the segment


38/66

38

ModelModel--based Human Posturebased Human Posture ReconstructionReconstruction

1 3

1 3

Ellipse parameters are exchanged betweencameras

Reduced data communication load forcollaboration

2 4

2 4

data and create a virtual skeleton


Goodness of ellipsefits to segments



Skeleton FittingSkeleton Fitting

Frame 1

Frame 28

Frame 70

Frame 81


Frame 105

Frame 148


39/66


40/66

40

OutlineOutline





Aims of This Section Aims of This Section

o n ro uce smar camera arc ec ures To motivate smart wireless cameras from

a power consumption point of view To discuss the research and industrial



41/66

41

Vision SystemsVision Systems

Are systems that analyse images and video They report in events/objects/properties DVD recorders, set-top boxes, smart

cameras


VISIONSYSTEMimage

data

Smart Camera VisionSmart Camera Vision S Systemystem

Source: Wikipdia

A definition

smar camera s an ntegrate mac ne v s on system w c , in addition to image capture circuitry, includes a processor, which can extract information from

images without need for an external processing unit, and interface devices used to make results available to otherdevices.


= + +

Franois Berry, Univ. Blaise Pascal, France


42/66

42

Granularity of IntegrationGranularity of IntegrationSmart camera is vision system

with an intermediate hardware granularity

Vision chipsSmart CameraMachine

Vision System


PROCESSING IN INTEGRATED LEVEL


Smart CamerasSmart Cameras

= Camera + intelligence = The basis for new applications Such as: detection, tracking, scene

analysis


Automotive Surveillance ConsumerMobile Comm .


43/66

43

Challenges in Wireless Smart CamerasChallenges in Wireless Smart Cameras

Performance Power consumption Programmability


Actually, Why a Smart Wireless Camera? Actually, Why a Smart Wireless Camera?


Why not a radio connection to PC that does processing?


44/66

44

Power Consumption of a Wireless VGA CameraPower Consumption of a Wireless VGA Camera

200 mWatt to transmit life video over a short distance.

1 year of daily use means 1750 Wh


2.2 liters of methanol fuel per year.

This takes about 400X the total camera volume

Consumption is in the RadioConsumption is in the Radio P Pathath

RFDADS01101 PA

TRANSMITTER

transmit signal processingenergy per bit

Link energy per bitELET


E T [nJ/bit] E L [nJ/bit]

Bluetooth 150 1GSM (0.2 Watt) 500-1000 2500


45/66

45

PowerPower ConsumptionConsumption forfor RadioRadioPower

1W

802.11a

mains UTP

1m

10mBluetooth

Zigbeebattery

Data RatePicoRadiopower/bit does not scalewith Moores Law100 Auto-


s10k 100k 1M 10M 100M

~ moving pictures~ speech, audio, hifi~sensor

nomous1k100

Raf Roovers

Computational Efficiency is GComputational Efficiency is G rowing (Moore)rowing (Moore)

[GOPSer 10

1010 11

20 Age in Years80

Watt]

ASICs

CPUs

10 510 410 310 210 1

10 810 7

10 6

HumanBrain

SIMD Processor


Feature size [um]

2 1 0.5 0.25 0.13 0.07

Pentium 410 0

SIMD Single Instruction Multiple Data


46/66


47/66

47

Smart Wireless Camera PlatformSmart Wireless Camera Platform


WiCa (Xetal SIMD)50 GOPS @ 600mWatt

Cyclops (AVR RISC)8 MIPS @ 50mWatt?

Architecture Architecture

Smart Camera

Embedded system

Differentiating features:power cost Not end-user

Operates in fixedrun-timeconstraints Real


performanceRuns a few

applications oftenknown at design

time

programmable time)



48/66

48

Smart Camera Designs for ResearchSmart Camera Designs for Research

Technology

Processor approach:DSP,Media Processor,GPU

Programmable Logic approach:CPLD, FPGA



Processor ApproachProcessor Approach GPU and DSP

DSP (Digital Signal Processor) : ,

generally in real-time computing

Features:

Highly parallel accumulator and multiplier : MAC OperationFixed-point arithmetic is often used to speed up arithmeticprocessing.

The specification of DSPs is 4 algorithms:



Infinite Impulse Response (IIR) filtersFinite Impulse Response (FIR) filtersFFTConvolvers


49/66

49

Smart Camera Designs for ResearchSmart Camera Designs for Research

Technology

Programmable Logic approach:CPLD, FPGA

Processor approach:DSP,Media Processor,GPU



Programmable Logic ApproachProgrammable Logic Approach

FPGA (Field Programmable Gate Array):Its an electronic component used to build dedicated digital circuits

+Hardware

DescriptionFPGA

VHDLVerilo

=Customized ICProcessor Specific Glue logicS ecific a lications

u v y ufundamental computing components via configuration information stored inonboard static RAM




50/66

50

Programmable Logic ApproachProgrammable Logic Approach

Increasing speed & density

Increased I/O pin count and bandwidth

FPGA (Field Programmable Gate Array):

Integration of hard IP (e.g. multipliers, processor soft cores,)

150

200

250

300

r a t i o n s

Maximum Floating-PointMultiply-Accumulates

Floating Point Operations (GFLOPS)

Maximum Sustained Single PrecisionFPGA Costs

$150

$200

$250

$300

$350

i l l i o n

G a

t e s



0

50

100

1998 1999 2000 2002 2005

O p e

$0

$50

$100

1998 1999 2000 2001 2002 2003

C o s

t p e r

1 M

Smart Camera for ResearchSmart Camera for Research

A very efficient smart cam could be based onan efficient mix of FPGA, DSP, GPU,!!!




51/66

51

Smart Camera Design forSmart Camera Design for C Consumer onsumer UseUse

Challenges:

Performance Energy consumption Cost


Let us look at an example

Example Event Casting: Face DetectionExample Event Casting: Face Detection



52/66

52

Face Detection Application MappingFace Detection Application Mapping

lowlow intermediateintermediate highhighVideo Data

Image processing :Image pyramid

Application:Draw box, event

Pixel processing:Haar filters

for ever ixel for ever ima e For every event


similar similar different

SIMD10++GOPS

FPGA/DSP100MOPS

CPU1MOPS


Why SIMD for LowWhy SIMD for Low- -Level?Level?

High-performance (need > 10GOPS) g n erna - an w nee > s

A

instruction

B

10GOPS * 3 * 16bits

Bandwidth =


C



53/66

53

Uniprocessor Uniprocessor to SIMD: 1 PEto SIMD: 1 PE

4.6mm 20.6mm 2

Data Memor

1mm 2

0.02mm 2

DSP ControlProgramMemory

Performance 100MOPS

100MHz


Size 5.22mm 2

Performance/area 19MOPS/mm 2

Overhead 26%

Bandwidth 4.8Gb/s

Uniprocessor Uniprocessor to SIMD: 2PEsto SIMD: 2PEs

4.6mm 20.6mm 2

1mm 2

0.02mm2

DSP ControlProgramMemory

Performance 200MOPS

0.02mm2

1 2

100MHz


Size 5.24mm 2

Performance/area 38MOPS/mm 2

Overhead 25%

Bandwidth 9.6Gb/s


54/66

54

Uniprocessor Uniprocessor to SIMD: 100PEsto SIMD: 100PEs

4.6mm 20.6mm 2

1mm 2

0.02mm 2

DSP Control

Performance 10 GOPS

0.02mm 2

1 2 100ProgramMemory

100MHz


Size 8.2 mm 2

Performance/area 1.2 GOPS/mm 2

Overhead 20%

Bandwidth 480 Gb/sec.

Uniprocessor Uniprocessor to SIMDto SIMD

RISC : 1PE50MHz

Xetal-II SIMD :320PE@150MHz

Pentium42.4GHz

Peak 0.05 100 6Performance GOPS GOPS GOPSSize 6.4

mm 244.4 (0.18u)

11.1 (0.09u) mm 2131mm 2

Performance /area 0.18u

0.008GOPS/mm 2

2.25GOPS/mm 2

0.045GOPS/mm 2

Overhead 26% 12% ??%


Bandwidth 2 Gb/S 1.5Tb/S 58 Gb/S

Peak PowerConsumption

1.0Watt

59Watt


55/66

55

Why is SIMD LowWhy is SIMD Low- -Power?Power?

Typical DSP

accesses to memory

PE

A

instruction

B


CC = A + B;C = A > B ? A : B;

Why is SIMD LowWhy is SIMD Low- -Power?Power?

SIMD have multiple PEs in parallel r me c a ways as o e one But: Instruction fetch is shared multiple times.

Data (A,B,C) access is shared inmultiple-word-wide memories

Accessin an 8 times wider memor takes


half the amount of energy per data entity.


56/66

56

SIMD Energy ConsumptionSIMD Energy Consumption

25SIMD Energy Consumption

Basis: Convolution

10

15

20

E n e r y

C o n s u m p

t i o n

[ n J / p i x e

l ]

W=9

W=11

W=13

W Communication Memory access


0 100 200 300 400 500 6000

5

Number of PEs

W=3

W=5

W=7

Parallelism Memory Localization

Without voltage scaling,energy saving levels off

ACIVS 2007 Tutorial

SIMD Power/Energy ScalingSIMD Power/Energy Scaling

0.9

1Scaling Energy Consumption in SIMD Architectures

30 GOPS

Parallelism enables

[Vddmax, f max][40,1]

0.3

0.4

0.5

0.6

0.7

0.8

E n e r g y S c a l i n g F a c t o r

10 GOPS

20 GOPS

scaling Vth limits degree of

scaling f max = 50 MHz

Sets P min N = 640 pixels[200,0.3]


0 100 200 300 400 500 600 7000.1

0.2

Number of PEs

0.1 GOPS

max [Vddmin, f min]


57/66


58/66

58

Smart Wireless Camera PCBSmart Wireless Camera PCB

ZigBee module


Ben Schueler, NXP

Battery module

ACIVS 2007 Tutorial

Picture of WiCa SetupPicture of WiCa Setup


Alexander Danilin, NXP


59/66

59

Which Algorithms Run Easily on WiCa?Which Algorithms Run Easily on WiCa?

Where much of the application is running on the

Where the DSP/CPU is used for limited oroccasional tasks only

Choose appropriate algorithmic basis for sceneanalysis


or examp e: ea ure ase

ACIVS 2007 Tutorial

What Have We Mapped to WiCa?What Have We Mapped to WiCa?

Face detection: soft edge features

Horizontal soft edges


Vertical soft edges

ACIVS 2007 Tutorial


60/66

60


Wireless intruder detection system



Object recognition applications



61/66


62/66

62

ConclusionsConclusions

Low-power smart wireless cameras can bees gne w ron -en s

Real-time applications ZigBee node opens research challenges

for distributed camera networks


ACIVS 2007 Tutorial

OutlineOutline






63/66

63

SummarySummarySmart camera networks:

Enable novel user-centric applications:InterpretiveContext awareUser centric

Processing at source allows:Image transfer avoidanceScalable networksDescri tive re orts


Privacy issues: Awareness of user choicesIn-node processing and image transfer avoidanceModel-based or silhouetted images

SummarySummarySmart camera networks:

Algorithm design is key in efficient use of computingresources

In-node feature extraction and opportunistic fusionUse of key features in the data exchange mechanismModel-based approach provides feedback / initial points for in-nodeprocessing

Balance between in-node and collaborative processing


Communication costLatencyProcessing complexitiesLevels of data fusion


64/66

64

Towards Active VisionTowards Active Vision

Active vision in feature extraction:Use of key features instead of generic features (edges, motion, etc.)Detection of prominent color / texture attributesUse of spatiotemporal fusion results to learn key features

Active vision in modules with processing load:Instead of avoiding methods with high processing cost / latency:

Define what they should look for Perform initialization to restrict searches


Active vision in gesture analysis:Use history of subject and semantic meanings of gestures tofeedback what is important to detect

OutlookOutlook

Applications:Select best view of person of interest in real-time tracking

Manipulate virtual world with free hand / finger gesturesDetect accidental falls at home / elderly careReconstruct face model from a few snapshotsBuild 3D models of objects



65/66

65

OutlookOutlook

Robotics AgentsResponse systemsSmart environments


Distributed Vision Networks

( DVN )

Enabling technologies:o Vision processingo Wireless sensor networko Embedded computingo Signal processing

ArtificialIntelligence

ContextEvent interpretationBehavior modeling

SmartEnvironments

Assisted livingOccupancy sensing

Augmented reality

, .


Multimedia Human ComputerInteraction


Immersive virtual realityNon-restrictive interfaceRobotics

OutlookOutlook

Distributed Vision Networks

( DVN )

Enabling technologies:o Vision processingo Wireless sensor networkso Embedded computing

ContextEvent interpretationBehavior models

Artificial Intelligence(AI)

o Signal processing

Human Computer Immersive virtual realityNon-restrictive interface


decisions, etc. )

QualitativeKnowledge

QuantitativeKnowledge


Robotics

Multimedia

nteract on

AgentsResponse systemsUser interactions


Interactive robotics

SmartEnvironments


66/66

ReferencesReferences H. Aghajan and C. Wu, From Distributed Vision Environment to Human Behavior Interpretations, Behaviour

Monitoring and Interpretation Workshop at the 30th German Conference on Artificial Intelligence, Sept. 2007.

C. Wu and H. Aghajan, Model-based Human Posture Estimation for Gesture Analysis in an OpportunisticFusion Smart Camera Network, Int. Conf. on Advanced Video and Signal based Surveillance (AVSS), Sept.2007.

C. Chang and H. Aghajan, A LQR Spatiotemporal Fusion Technique for Face Profile Collection in Smartamera urve ance, n . on . on vance eo an gna ase urve ance , ep . .

C. Chang and H. Aghajan, Spatiotemporal Fusion Framework for Multi-Camera Face Orientation Analysis, Advanced Concepts for Intelligent Vision Systems (ACIVS), August 2007.

C. Wu and H. Aghajan, Model-based Image Segmentation for Multi-View Human Gesture Analysis, AdvancedConcepts for Intelligent Vision Systems (ACIVS), August 2007.

Hamid Aghajan and Chen Wu, Layered and Collaborative Gesture Analysis in Multi-Camera Networks, Int.Conf. on Acoustics, Speech, and Signal Processing (ICASSP), April 2007.

Chen Wu and Hamid Aghajan, Opportunistic Feature Fusion-based Segmentation for Human Gesture Analysisin Vision Networks, IEEE SPS-DARTS, March 2007.

Chen Wu and Hamid Aghajan, Collaborative Gesture Analysis in Multi-Camera Networks, ACM SenSys


, . .

Chung-Ching Chang and Hamid Aghajan, Collaborative Face Orientation Detection in Wireless Image SensorNetworks, ACM SenSys Workshop on Distributed Smart Cameras (DSC), Oct. 2006.

A. Maleki-Tabar, A. Keshavarz, H. Aghajan, Smart Home Care Network using Sensor Fusion and DistributedVision-Based Reasoning, ACM Multimedia Workshop On Video Surveillance and Sensor Networks (VSSN),Oct. 2006.

A. Keshavarz, A. Maleki-Tabar, H. Aghajan, Distributed Vision-Based Reasoning for Smart Home Care, ACM SenSys Workshop on Distributed Smart Cameras (DSC), Oct. 2006.

http://wsnl.stanford.edu/publications.php

www.ICDSC.org

First ACM / IEEE International Conference on Tutorials:

Smart camera architectures Image sensing techniques for smart cameras Embedded vision programming Fusion of vision and other sensors Distributed vision processing algorithms Distributed appearance modeling

Distributed Smart Cameras (ICDSC-07)September 25-28, 2007

Vienna, Austria

Andrea Cavallaro, Queen Mary Universityof London, UK: Smart Cameras:

Algorithms, Evaluation and Applications

Bjoern Gottfried, U. of Bremen, Germany:

Ambient Intelligence and the Role ofSpatial Reasoning: Smart Environmentswith Smart Cameras

Richard Radke, Rensselaer PolytechnicInstitute, USA: Multiview Geometry forCamera Networks o a ora ve ea ure ex rac on, a a an ec s o n us o n

Architectures and protocols for camera networks Wireless and mobile image senor networks

Position disco er and middle are applications

Wilfried Elmenreich, Vienna Univ., Real-time Sensor Networks for Smart Cameras:Communication, Data Processing and

vision sensor network tutorial

Documents