Speeding Up Robot Control Software Through Seamless Integration With FPGA
Xuan Sang LE (1,2), Luc Fabresse (1), Jannik Laval (3), Jean-Christophe Le Lann (2), Loic Lagadec (2) and Noury Bouraqadi (1)
(1) Mines Douai-DIA, Univ. Lille
(2) ENSTA Bretagne
(3) DISP Laboratory, University Lumière Lyon 2
Contents
1. Context & Motivation
2. Platform for Software/FPGAs Integration
3. Applications
1. Robot Follower Using Camera for Object Tracking
2. Reconfigurable IP-Based Smart Sensor Network
4. Conclusion
Traditional Robotic Computing System
1 Context
[Diagram: sensors (sensing) and actuators (control) connected to a processor that runs the robotic middleware and the robotic application (processing + planning).]
Pros.
• Accessibility
• Simplicity
• Productivity
Cons.
• Optimization opportunities
• Performance
• Energy requirement
FPGAs as Accelerators for Robotic Real-Time Systems
1 Motivation
[Diagram: sensors (sensing) and actuators (control) connected to FPGAs (pre-processing) and to a processor that runs the robotic middleware and the robotic application (post-processing + planning).]
Benefits
• FPGAs
  • Acceleration
  • Hardware flexibility
• Processor
  • Flexible software environment
Challenges
• Requires specific knowledge → loss of productivity
• FPGA/processor interfacing varies from project to project → time-consuming development
Our Approach
A Generic Software/Hardware Platform for Easily Integrating FPGAs into Existing Robotic Systems
[Diagram, two configurations: (1) an FPGA attached through a physical link to the processor (PC) of an existing robotic system running a robotic middleware (ROS); (2) an FPGA and a processor on a single-board computer/SoC, reached through the software network.]
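Scenario
2 Platform for Software/FPGA Integration
[Diagram: starting from legacy VHDL (signal processing, image processing), the platform generates and deploys the FPGA side and generates the software API on the processor side. The software API connects to the robotic middleware either (1) locally or (2) over the network.]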
General Architecture
[Diagram, FPGA side: legacy VHDL (user logic) → VHDL parser → meta-model, which generates the Wishbone wrapper: physical interface, Wishbone master, IRQ manager and a shared bus with one Wishbone slave per user logic (User Logic 1 … User Logic n, each with an irq line).]
[Diagram, processor side: drivers/memory map (device file …), dynamic language VM, robotic API, generated wrapper classes, user application.]
Legend: non-reusable, partially reusable, completely reusable, automatically generated.
General Architecture
• Meta-model
  • Captures a subset of synthesizable VHDL
  • Object-oriented circuit description
  • Circuit model refactoring
  • Implemented in Pharo/Smalltalk
[Diagram: VHDL → parsing + model building → model M (conforming to the meta-model MM) → model refactoring → model M' → exporting → VHDL → synthesis → bitstream; also integration, HW debug, etc.]
FPGA Circuits & Interface
• Automatic interface generation
  • The address/data I/O interface supports the memory-mapping mechanism of the software
  • Wishbone is used for the interface
  • Each user circuit connects to a slave
  • A circuit's registers can be accessed via a virtual address (provided by the software)
  • IRQs support software handlers
  • Automatic addressing of circuit registers
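The automatic register addressing listed above can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the names `Circuit` and `assign_addresses`, the 16-bit word size and the sequential allocation policy are hypothetical and are not taken from the platform's generator.

```python
from dataclasses import dataclass, field


@dataclass
class Circuit:
    """A user circuit with named registers (hypothetical model)."""
    name: str
    registers: list = field(default_factory=list)


def assign_addresses(circuits, word_bytes=2):
    """Give every register of every circuit a unique byte offset.

    Registers are laid out sequentially, one word each, so each
    circuit ends up with its own contiguous window on the bus.
    """
    memory_map = {}
    offset = 0
    for circuit in circuits:
        for reg in circuit.registers:
            memory_map[(circuit.name, reg)] = offset
            offset += word_bytes
    return memory_map


circuits = [
    Circuit("counter", ["input", "output", "start"]),
    Circuit("filter", ["threshold", "result"]),
]
memory_map = assign_addresses(circuits)
for (circuit, reg), off in memory_map.items():
    print(f"{circuit}.{reg}: offset 0x{off:04x}")
```

Software would then add such an offset to the base virtual address of the mapped region to reach a given register.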
Software API
• System driver layer
  • User-space I/O driver
  • User circuits are mapped to a virtual memory region
  • Specific to the physical interface
  • Compliant with the Wishbone interface on the FPGA
• API layer
  • FPGA circuit registers are accessed via virtual memory addresses
  • The Smalltalk VM supports memory mapping at the primitive level
  • Wrapper classes expose circuits as Smalltalk objects
  • IRQ handler support
  • Supports high-level robotic middleware (ROS, REST, etc.)
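To make the idea of wrapper classes over memory-mapped registers concrete, here is a minimal Python sketch. It is not the platform's Smalltalk API: `RegisterWindow`, `CounterWrapper`, the register offsets and the little-endian 16-bit layout are assumptions made for illustration, and a temporary file stands in for the device file that a real user-space I/O driver would expose.

```python
import mmap
import struct
import tempfile


class RegisterWindow:
    """Wraps a memory-mapped region and exposes 16-bit registers."""

    def __init__(self, fileobj, size):
        self._map = mmap.mmap(fileobj.fileno(), size)

    def read16(self, offset):
        return struct.unpack_from("<H", self._map, offset)[0]

    def write16(self, offset, value):
        struct.pack_into("<H", self._map, offset, value & 0xFFFF)


class CounterWrapper:
    """Hypothetical wrapper: one attribute per circuit register."""
    INPUT, OUTPUT = 0, 2  # assumed byte offsets inside the window

    def __init__(self, window):
        self._w = window

    @property
    def input(self):
        return self._w.read16(self.INPUT)

    @input.setter
    def input(self, value):
        self._w.write16(self.INPUT, value)

    @property
    def output(self):
        return self._w.read16(self.OUTPUT)


# A zero-filled temporary file plays the role of the device file.
backing = tempfile.TemporaryFile()
backing.write(bytes(mmap.PAGESIZE))
backing.flush()

window = RegisterWindow(backing, mmap.PAGESIZE)
counter = CounterWrapper(window)
counter.input = 100
print(counter.input)
```

The user application only sees `counter.input` and `counter.output`; addressing and data conversion stay inside the wrapper.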
Controllability and Debugging
• Hardware breakpoint
  • A breakpoint controller is injected automatically
  • Clock gating is used to control the user circuit
  • The breakpoint condition can be set from software
  • Execution is resumable
• Drawback
  • Vendor-specific (Digital Clock Manager)
[Diagram: debug sub-circuit. The breakpoint controller (a Wishbone slave on the shared bus) contains a comparator (op. 1, operator, op. 2) that raises break, a clock counter (clock count) and a clock controller that gates the global clock into the clk of the user logic (start, resume, active); an irq goes to the IRQ manager.]
cnt := HWCounter new.
cnt input: 100.
cnt setBreakpointOn: #output forValue: 50 condition: #=.
cnt start: true.
cnt waitForIRQ: [
    cnt bpActive
        ifTrue: [
            ('Stop at: ', cnt output asString) print.
            ('Steps: ', cnt clockCount asString) print.
            cnt resume: true ]
        ifFalse: [ 'Execution done' print ] ]
    timeOut: 10 milliseconds.
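The behaviour of the breakpoint can be mimicked in plain software. The Python sketch below is a simulation only, assuming a simple up-counter as the user circuit; `GatedCounter` and its methods are hypothetical names, and stopping the loop stands in for gating the clock.

```python
import operator


class GatedCounter:
    """Software model of a counter whose clock can be gated."""

    def __init__(self, limit):
        self.limit = limit
        self.output = 0
        self.clock_count = 0      # cycles actually delivered to the circuit
        self.bp_active = False
        self._breakpoint = None   # (compare function, value)

    def set_breakpoint(self, compare, value):
        self._breakpoint = (compare, value)

    def run(self):
        """Clock the circuit until it finishes or the breakpoint fires.

        Returns True when stopped on the breakpoint (the IRQ case).
        """
        self.bp_active = False
        while self.output < self.limit:
            self.output += 1
            self.clock_count += 1
            if self._breakpoint is not None:
                compare, value = self._breakpoint
                if compare(self.output, value):
                    self.bp_active = True   # the clock is gated here
                    return True
        return False

    def resume(self):
        """Re-enable the clock; the circuit continues from its state."""
        return self.run()


cnt = GatedCounter(limit=100)
cnt.set_breakpoint(operator.eq, 50)
if cnt.run():
    print("Stop at:", cnt.output, "Steps:", cnt.clock_count)
    cnt.resume()
print("Execution done at", cnt.output)
```

Because the circuit state is kept while the clock is stopped, resuming simply continues counting from the value at which the breakpoint fired.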
Impact of the Different Software Layers on the Performance of the Link
• Experiment: read/write to a single-clock block RAM, in 3 scenarios:
  • Ideal, without our platform
  • With the generated interface + our low-level software API
  • With the generated interface + our high-level software API (Smalltalk)
• Device: APF51
  • FPGA: Xilinx Spartan-6 (LX9)
  • Processor: Freescale i.MX515 (Cortex-A8 @ 800 MHz)
  • Physical interface: 16-bit WEIM (Wireless External Interface Module)
Table 1 (10 times)
                      Ideal     Interface + low-level SW     Interface + Smalltalk
Write 20 MB (s)       2.574     7.367                        13.09
Read 20 MB (s)        3.993     6.343                        10.1
Read speed (MB/s)     5.009     3.153                        1.980
Write speed (MB/s)    7.770     2.715                        1.528
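The speed rows follow directly from the transfer times (20 MB divided by the elapsed seconds). A small Python sketch of that arithmetic, using only the times from Table 1; the variable names and the slowdown column are illustrative additions:

```python
SIZE_MB = 20

# Transfer times in seconds, taken from Table 1.
times = {
    "Ideal": {"write": 2.574, "read": 3.993},
    "Interface + low-level SW": {"write": 7.367, "read": 6.343},
    "Interface + Smalltalk": {"write": 13.09, "read": 10.1},
}


def speed_mb_per_s(seconds, size_mb=SIZE_MB):
    """Throughput for a transfer of size_mb megabytes."""
    return size_mb / seconds


speeds = {
    name: {op: round(speed_mb_per_s(t), 3) for op, t in ops.items()}
    for name, ops in times.items()
}

ideal = speeds["Ideal"]
for name, s in speeds.items():
    slowdown = ideal["write"] / s["write"]
    print(f"{name}: write {s['write']} MB/s, read {s['read']} MB/s, "
          f"write slowdown x{slowdown:.2f}")
```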
[Bar chart: read speed and write speed (MB/s) for the three scenarios; values as in Table 1.]
• Significant development time
• Manual, direct management of data (addressing, data conversion)
• All accesses performed via the wrapper classes
Robot Follower Using Camera for Object Tracking
3 Applications
[Diagram, APF51 + OV7670:
(1) Interface + circuit, produced through the meta-model. OV7670 camera (8-bit data, href, vsync, pclk; I2C register data over sioc/siod) → capture logic → 16-bit pixel, ready → HSV filter → pixon, ready → center of mass → x, y, frame_ok. A controller selects the resolution (rez160x120, rez320x240); a 16-bit collector writes (addr., data) to a 32k x 16-bit block RAM frame buffer.
(2) SW API + user application: object detection by color pattern.
(3) ROS network: communicate the detected object position (10 lines of code).]
Robot Follower Using Camera for Object Tracking
• Problem with VGA images
  • The original design works only with QQVGA images (160x120)
• Simulation & debug (VGA)
  • Simulation of the HSV filter unit → 14 cycles/pixel
  • The capture logic unit is hard to simulate → hardware breakpoint
    • Stop the circuit when a line (640 pixels) is captured
    • 2556 cycles / 640 pixels → 4 cycles/pixel
  • The filter takes more time to process a pixel than the time needed to produce a pixel
[Diagram, original design: the capture logic feeds a single HSV filter, followed by the center-of-mass unit.]
[Diagram, optimised design: the capture logic feeds a 4-pixel collector; four HSV filters process pixels 1 to 4 in parallel before the center-of-mass unit. The I2C registers, controller and block RAM frame buffer are unchanged.]
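The choice of four parallel HSV filters matches a simple throughput estimate based on the two measurements above. The Python sketch below assumes that the filter units run fully in parallel and that collecting and dispatching pixels adds no extra cycles; `filters_needed` is a hypothetical helper name.

```python
import math

CYCLES_PER_LINE = 2556          # measured with the hardware breakpoint
PIXELS_PER_LINE = 640           # one VGA line
FILTER_CYCLES_PER_PIXEL = 14    # from the HSV filter simulation

capture_cycles_per_pixel = CYCLES_PER_LINE / PIXELS_PER_LINE


def filters_needed(filter_cycles, capture_cycles):
    """Smallest number of parallel filters that keeps up with capture."""
    return math.ceil(filter_cycles / capture_cycles)


n = filters_needed(FILTER_CYCLES_PER_PIXEL, capture_cycles_per_pixel)
print(f"capture: {capture_cycles_per_pixel:.2f} cycles/pixel")
print(f"parallel filters needed: {n}")
```

With four filters, each group of four pixels arrives in about 16 cycles while a filter needs 14, so the filtering stage no longer falls behind the camera.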
Robot Follower Using Camera for Object Tracking
[Chart: publishing rate of the ROS topic. Frequency (Hz, axis from 0 to 60) per window, with the average.]
Reconfigurable IP-Based Smart Sensor Network
[Diagram, sensor node: an FPGA (gateware, sensor 1 … sensor n) linked through the physical interface to an ARM processor running the driver/memory mapping, IP stack, HTTP server, Smalltalk VM and RESTful API.]
[Diagram, tool flow: legacy VHDL (user logic) → VHDL parser → meta-model, which generates the gateware (Wishbone wrapper with Wishbone master, shared bus and one Wishbone slave per sensor logic) and the software partition (wrapper classes, RESTful API, distributed object API). The user application on the base station uses the software partition and executes on the sensor node over the IP-based network.]
CameraUnitWrapper >> x
    <remote>
    ^ self int32At: 24
CameraUnitWrapper >> y
    <remote>
    ^ self int32At: 28
CameraUnitWrapper >> position
    <remote>
    ^ { self x. self y }
CameraUnitWrapper >> positionDo: aBlock
    <remote: #aBlock>
    500 timesRepeat: [
        aBlock value: self position.
        100 milliseconds wait ]
CameraUnitWrapper >> stream
    self positionDo: [ :p | p print ]
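For readers less familiar with Smalltalk, the local part of this wrapper (reading two 32-bit values and streaming them to a callback) can be sketched in Python. This is an illustration only: `CameraUnit` is a hypothetical stand-in backed by a byte buffer, treating 24 and 28 as byte offsets and using little-endian order are assumptions, and the remote execution and the 100 ms wait of the original are left out.

```python
import struct

X_OFFSET, Y_OFFSET = 24, 28   # offsets used by the wrapper above


class CameraUnit:
    """Hypothetical stand-in for the wrapper, backed by a byte buffer."""

    def __init__(self, region):
        self._region = region

    def int32_at(self, offset):
        # Little-endian signed 32-bit read; the byte order is an assumption.
        return struct.unpack_from("<i", self._region, offset)[0]

    def position(self):
        return (self.int32_at(X_OFFSET), self.int32_at(Y_OFFSET))

    def position_do(self, callback, repeat=3):
        """Call callback with the current position, repeat times."""
        for _ in range(repeat):
            callback(self.position())


# Fake register region holding x = 320 and y = 240.
region = bytearray(32)
struct.pack_into("<ii", region, X_OFFSET, 320, 240)

camera = CameraUnit(region)
samples = []
camera.position_do(samples.append)
print(samples)
```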
• IP-based sensor network:
  • Integration of FPGAs into the nodes
  • Compliant with IoT standards
  • Distributed object programming on sensor nodes using a dynamic language
  • Over-the-air reconfiguration of HW/SW on the nodes
Context
• IP-based middleware, sensor network
• Flexible software environment: ease of reconfiguration
• Hardware flexibility (FPGA): performance/energy
Challenges
• Complex hybrid software/hardware:
  • Development/reconfiguration?
  • Deployment?
Legend: application-specific; generic or automatically generated; physical-interface-specific/partially reusable.
Execution flow: automatic software reconfiguration, automatic gateware reconfiguration.
Reconfigurable IP-Based Smart Sensor Network
Proof of concept of the hybrid system:
• Base station: implemented using Pharo Smalltalk
• Sensor node:
  • ARM: httpd + Smalltalk VM + REST API
  • FPGA + camera: object tracking using a color pattern
• The base station fetches the object position from the node
[Chart: software memory footprint on a node (KB): shared memory and RSS (resident set size) for REST+VM, HTTPD and Module.]
[Chart: resources used on a sensor node (% CPU, % memory) at different stages: idle, streaming, SW reconfiguration, GW reconfiguration.]
[Chart: network load of streaming 500 data samples, over time (s).]
[Chart: network load of the software/gateware reconfiguration, over time (s).]
Conclusion
❖ A platform for interfacing FPGAs with high-level robotic software
❖ Easy integration of legacy circuit designs using a code auto-generation approach
❖ Debugging using hardware breakpoints
❖ Use cases
❖ Future work
  ๏ Improvement of the meta-model
  ๏ Support for more robotic middleware in our software API
FPGA vs Processor
4 Annex
[Bar charts: processing time per frame (ms, axis from 0 to 34) and power (W, axis from 0 to 1.6), FPGA (VGA) vs. processor (QQVGA).]
• Device: APF51
• Scenario 1:
  • The FPGA is used to acquire the image
  • The object detection algorithm is implemented in C (on the processor)
  • Image size: QQVGA
• Scenario 2:
  • The object detection algorithm is implemented on the FPGA
  • The FPGA communicates the object position to the processor
  • Image size: VGA
Validity of the VHDL Parser
• The VHDL parser is tested against several sets of VHDL benchmarks:
  • The ANTLR set: standard VHDL packages
  • IWLS 2005 benchmarks
  • ITC'99 benchmarks
  • Gaisler Research benchmarks