The World Leader in High Performance Signal Processing Solutions
SMP Implementing on Blackfin BF561
Graf Yang (杨明明), Oct 18, 2008
Agenda
• BF561 architecture
• Cache coherency solution
• Interrupt dispatch
• SMP status and applications
• SMP performance
• Limitations
BF561 architecture
BF561 architecture (cont.)
Block diagram
BF561 architecture (cont.)
Memory architecture
• L1 runs at core speed
  • L1 scratchpad SRAM: 4K
  • L1 instruction cache: 16K
  • L1 instruction SRAM: 16K
  • L1 data cache: 32K
  • L1 data SRAM: 32K
• L2 runs at 1/2 core speed
  • Data or instruction SRAM: 128K
  • Shared by CoreA/B
  • Cached (disabled in SMP)
BF561 architecture (cont.)
Comparison with x86

| | BF561 | x86 |
|---|---|---|
| Cache coherency | N/A | Cache coherency protocols |
| Atomic instruction | N/A | Lock# signal, Lock instruction prefix |
| Local interrupt controller | CEC | LAPIC |
| System interrupt controller | SIC (SICA, SICB) | IOAPIC |
| Local timer | Core timer | LAPIC timer/TSC |
| Peripheral timer | General purpose timer | HPET/8254 PIT |
| Inter-processor interrupt | SICB | LAPIC |
BF561 architecture (cont.)
How to boot CoreB
| Cores | Booting method | Starting address | Settings |
|---|---|---|---|
| CoreA | Execute from 16-bit external memory (bypass) | 0x2000 0000 (BANK0) | BMODE[1:0]#=00 |
| CoreA | Boot from 8/16-bit external flash memory | 0xEF00 0000 (BOOTROM) | BMODE[1:0]#=01 |
| CoreA | Boot from SPI serial EEPROM (16-bit addressable) | 0xEF00 0000 (BOOTROM) | BMODE[1:0]#=11 |
| CoreB | Execute from L1 instruction memory | 0xFF60 0000 (L1 I-SRAM) | SICA_SYSCR[5] |
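The CoreB entry can be illustrated with a small host-side sketch. It assumes that SICA_SYSCR[5] holds CoreB while set and that clearing it lets CoreB start executing from L1 instruction SRAM; the slide only names the bit, so the polarity and the reset value used here are assumptions. The register is modelled as a plain variable, and `sica_syscr_sim`, `COREB_HOLD_BIT`, `coreb_is_held` and `coreb_release` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the memory-mapped SICA_SYSCR register (hypothetical variable;
 * on hardware this would be a volatile MMIO access).
 * Assumed reset state: bit 5 set, CoreB held. */
static uint16_t sica_syscr_sim = 0x0020;

/* SICA_SYSCR[5] from the table above; the macro name is illustrative. */
#define COREB_HOLD_BIT (1u << 5)

/* Returns 1 while CoreB is still held, 0 once it has been released. */
static int coreb_is_held(void)
{
    return (sica_syscr_sim & COREB_HOLD_BIT) != 0;
}

/* Assumption: clearing bit 5 lets CoreB begin fetching instructions
 * from 0xFF60 0000 (L1 I-SRAM), which CoreA has filled beforehand. */
static void coreb_release(void)
{
    sica_syscr_sim = (uint16_t)(sica_syscr_sim & ~COREB_HOLD_BIT);
    assert(!coreb_is_held());
}
```

In this model CoreA would first copy the CoreB startup code to L1 I-SRAM and only then call `coreb_release()`.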
Cache coherency solution
• Why cache coherence
  • Jiffies, spin-lock, semaphore, mutex, ...
Cache coherence solution (cont.)
• Cache policy
  • Main memory: write-through
  • Shared on-chip SRAM (L2 SRAM): not cacheable
• Global lock: protects atomic data
  • A special spin lock that stays in shared on-chip SRAM (L2 SRAM)
  • Operating functions: _get_core_lock/_put_core_lock
  • Parameter: address of the atomic data
• Spin lock: based on the global lock
  • Invalidate the entire data cache if the same lock has been acquired by another CPU
• Atomic ops: based on the global lock
  • Protects the atomic operations
• Memory barrier
  • Invalidate the entire data cache
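The scheme above can be sketched as a host-side model. This is not the BF561 implementation: the global lock is modelled with a C11 `atomic_flag` instead of a word in uncached L2 SRAM, the cache invalidation is only counted, and "acquired by another CPU" is interpreted as "last held by a different CPU". All `sim_*` names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS 2

/* Global core lock; on the BF561 this word would live in uncached L2 SRAM. */
static atomic_flag sim_core_lock = ATOMIC_FLAG_INIT;

/* Counts simulated whole-data-cache invalidations per CPU. */
static unsigned long dcache_invalidations[NR_CPUS];

struct sim_spinlock {
    int locked;     /* 0 = free, 1 = held */
    int last_owner; /* CPU that held it most recently, -1 if none */
};

/* Model of _get_core_lock: the parameter is the address of the atomic data. */
static void sim_get_core_lock(volatile void *atomic_data)
{
    (void)atomic_data; /* unused in this model */
    while (atomic_flag_test_and_set_explicit(&sim_core_lock,
                                             memory_order_acquire))
        ; /* spin until the global lock is free */
}

/* Model of _put_core_lock. */
static void sim_put_core_lock(volatile void *atomic_data)
{
    (void)atomic_data;
    atomic_flag_clear_explicit(&sim_core_lock, memory_order_release);
}

/* Returns 1 if the lock was taken, 0 if it is currently held. */
static int sim_spin_trylock(struct sim_spinlock *l, int cpu)
{
    int ok = 0;

    assert(cpu >= 0 && cpu < NR_CPUS);
    sim_get_core_lock(l);
    if (!l->locked) {
        l->locked = 1;
        /* Data guarded by this lock may be stale in our cache if another
         * CPU held the lock last, so drop the whole data cache. */
        if (l->last_owner != -1 && l->last_owner != cpu)
            dcache_invalidations[cpu]++;
        l->last_owner = cpu;
        ok = 1;
    }
    sim_put_core_lock(l);
    return ok;
}

static void sim_spin_unlock(struct sim_spinlock *l)
{
    sim_get_core_lock(l);
    l->locked = 0;
    sim_put_core_lock(l);
}

/* Atomic op built on the global lock: add and return the new value. */
static int sim_atomic_add_return(int *v, int delta)
{
    int result;

    sim_get_core_lock(v);
    *v += delta;
    result = *v;
    sim_put_core_lock(v);
    return result;
}
```

Re-taking a lock on the same CPU costs no invalidation in this model; only a hand-over between cores does, which is where the write-through policy for main memory matters.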
Interrupt dispatch
• Peripheral interrupts trigger both cores
• Two kinds of IRQ handlers
Interrupt dispatch (cont.)
• Time monotonicity problem
  • Using the two core timers makes time non-monotonic
  • Using a gptimer with handle_simple_irq() causes CoreB to get stuck
• Solution
  • Use general purpose timer0 instead of the core timers
  • Use handle_percpu_irq() instead of handle_simple_irq()
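One way to read the handle_simple_irq() problem is sketched below as a simplified model, not kernel code. It assumes that a shared "in progress" flag makes the second core skip an interrupt that the first core is already handling, while a per-CPU flow keeps no shared state. It also assumes CoreA always enters first. All `sim_*` names are hypothetical.

```c
#include <assert.h>

#define NR_CPUS 2

struct sim_irq_desc {
    int in_progress;         /* shared "handler is running" flag */
    unsigned ticks[NR_CPUS]; /* timer ticks actually handled per core */
};

/* Shared-state flow: a core that finds the IRQ already in progress
 * returns without running the handler. Returns 1 if the tick was handled. */
static int sim_handle_simple(struct sim_irq_desc *d, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (d->in_progress)
        return 0;
    d->in_progress = 1;
    d->ticks[cpu]++;
    return 1;
}

/* Marks the end of the handler started by sim_handle_simple(). */
static void sim_finish(struct sim_irq_desc *d)
{
    d->in_progress = 0;
}

/* Per-CPU flow: no shared state, so every core handles its own tick. */
static int sim_handle_percpu(struct sim_irq_desc *d, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    d->ticks[cpu]++;
    return 1;
}

/* Delivers n timer interrupts to both cores, CoreA (0) before CoreB (1),
 * with CoreB arriving while CoreA is still inside the handler. */
static void sim_run_ticks(struct sim_irq_desc *d, int n, int percpu)
{
    for (int i = 0; i < n; i++) {
        if (percpu) {
            sim_handle_percpu(d, 0);
            sim_handle_percpu(d, 1);
        } else {
            sim_handle_simple(d, 0);
            sim_handle_simple(d, 1); /* dropped: CoreA is in progress */
            sim_finish(d);
        }
    }
}
```

Under these assumptions CoreB never sees a timer tick with the shared-state flow, which matches the "stuck" symptom, while the per-CPU flow gives both cores every tick from the single timer0 source.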
Interrupt dispatch (cont.)
• Inter-processor interrupt: SICB_SYSCR
  • Writing 1 to CA_supplement_int0 triggers an interrupt to CoreA
  • Writing 1 to CB_supplement_int0 triggers an interrupt to CoreB
  • The interrupt handler writes 1 to the relevant bit to clear the interrupt request
• Inter-processor interrupt implementation
  • Per-cpu message queue
  • Per-cpu spin lock
  • Per-cpu interrupt
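The per-cpu queue and interrupt can be sketched as a single-threaded host model. The pending-bit layout below is illustrative and is not the real SICB_SYSCR layout. The per-cpu spin lock is only marked by comments because the model has no concurrency. All names (`sim_ipi_pending`, `ipi_queues`, `sim_send_ipi`, `sim_ipi_handler`) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS   2
#define QUEUE_LEN 8 /* power of two so the unsigned indices wrap cleanly */

/* Simulated pending bits: bit n = supplemental interrupt 0 for core n.
 * Illustrative layout only. */
static uint16_t sim_ipi_pending;

/* Per-cpu message queue (ring buffer). */
struct ipi_queue {
    unsigned head, tail;
    int msgs[QUEUE_LEN];
};

static struct ipi_queue ipi_queues[NR_CPUS];

/* Queue a message for the target core and raise its interrupt.
 * Returns 0 on success, -1 if the target queue is full. */
static int sim_send_ipi(int target, int msg)
{
    struct ipi_queue *q;

    assert(target >= 0 && target < NR_CPUS);
    q = &ipi_queues[target];
    /* the target's per-cpu spin lock would be taken here */
    if (q->tail - q->head == QUEUE_LEN)
        return -1;
    q->msgs[q->tail % QUEUE_LEN] = msg;
    q->tail++;
    /* lock released here; then "write 1" to trigger the interrupt */
    sim_ipi_pending |= (uint16_t)(1u << target);
    return 0;
}

/* Handler on the receiving core: acknowledge the interrupt, then drain
 * up to max messages into out. Returns the number of messages copied. */
static int sim_ipi_handler(int cpu, int *out, int max)
{
    struct ipi_queue *q;
    int n = 0;

    assert(cpu >= 0 && cpu < NR_CPUS);
    q = &ipi_queues[cpu];
    /* write-1-to-clear on hardware, modelled by clearing the bit */
    sim_ipi_pending &= (uint16_t)~(1u << cpu);
    /* this core's per-cpu spin lock would be held while draining */
    while (q->head != q->tail && n < max) {
        out[n++] = q->msgs[q->head % QUEUE_LEN];
        q->head++;
    }
    return n;
}
```

Acknowledging before draining means a message queued during the drain raises the interrupt again instead of being lost.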
SMP status and applications
• SMP status
  • 2008R1.5: svn://sources.blackfin.uclinux.org/svn/uclinux-dist/branches/2008R1/bfin_patch/smp_patch/
  • Trunk: svn://sources.blackfin.uclinux.org/svn/uclinux-dist/trunk/bfin_patch/smp_patch/
• Application - Multi-task
• Video encoder/decoder - codec1 on CoreB, codec2 on CoreA
• VoIP - codec on CoreB, network stack on CoreA
SMP performance
• Whetstone test result
  • Test software: Whetstone
  • Test hardware: BF561, core clock 600MHz, system clock 100MHz
  • Test environment 1: UP
  • Test environment 2: SMP

| Command line | UP | SMP |
|---|---|---|
| whetstone | 15s | 15s |
| whetstone ; whetstone | 30s | 30s |
| whetstone & whetstone | 30s | 20s |

• Performance analysis
  • The entire data cache was invalidated 79130 times in the Whetstone test
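The table can be turned into a speedup figure with a short calculation. The helper names `speedup` and `efficiency` are illustrative, and "efficiency" here simply means speedup divided by the number of cores.

```c
#include <assert.h>

/* Speedup of the SMP run relative to the UP run of the same workload. */
static double speedup(double up_seconds, double smp_seconds)
{
    assert(smp_seconds > 0.0);
    return up_seconds / smp_seconds;
}

/* Fraction of the ideal linear speedup reached on the given core count. */
static double efficiency(double achieved_speedup, int cores)
{
    assert(cores > 0);
    return achieved_speedup / cores;
}
```

For the `whetstone & whetstone` row, `speedup(30.0, 20.0)` gives 1.5 and `efficiency(1.5, 2)` gives 0.75. The two cores finish two parallel runs in two thirds of the UP time rather than half, and the repeated data cache invalidations are the overhead the slide points to.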
Limitations
• Store routines to L1 I-SRAM
• Store shared data to L1 D-SRAM
• User multi-threads running on different cores
Questions?
The End Thank you!