New Parallel Queue Processor PQP
Prof. Masahiro SOWA
University of Electro-Communications in Tokyo, JAPAN
Queue Machines
as Next Generation Computer Systems
Over Queue
For mobile, embedded and super computers
Contents
● Introduction of the University of Electro-Communications
● Introduction of Japan
● Queue computer
Japan
Bulgaria
[Map of Japan: Tokyo (UEC, University of Electro-Communications), Kyoto, Hiroshima, Nagoya, Osaka, Nagasaki, Nagano, Okinawa]
Land use: 70% mountains, 15% agriculture, 3% housing
Theory of continental drift
Philippine plate
Eurasian plate
Pacific plate
North American plate
Sapporo
Population: 130,000,000 (Tokyo: 13,000,000)
National university corporation
The University of Electro-Communications
Tokyo, Japan
Since 1918.
University with Large Doctor’s and Master’s Programs
MT.FUJI
Number of Students: 5,516
  Doctor’s Program: 206
  Master’s Program: 974
  Undergraduate: 4,336
Academic Staff: 359
Administrative Staff: 175
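Dr. Metchnikoff, the Smolyan region.
Bulgaria 32 kg
Netherlands 20.8 kg
France 17.7 kg
Denmark 15.1 kg
Germany 12.5 kg
Spain 9.8 kg *
Japan 5.1 kg
United Kingdom 4.6 kg
United States 2.1 kg *
"Bulgaria Pavilion" at the 1970 Osaka Expo; permission to use the country name "Bulgaria" granted in 1973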
Koto oushuu
琴 欧州
Earthquake
Many kinds of natural disasters
Typhoon
October 1951-2005
The biggest typhoons
Typhoon          Died   Injured   Flood    Ship
Muroto (1934)    2702   14994     401157   27594
Makura (1945)    2473   2452      273888
Isewan (1959)    4697   38921     363611   7576
[Timeline axis (years): 0, 400, 1000, 1600, 1800, 2000]
TENNOU period (Emperor)
EDO Shohgunate period
(Tokyo)
1. The separation of church and state
About 50% of the people could read and write.
AC BC 2008
Continuity and few wars are the keywords for understanding Japan.
Novel: GENJI MONOGATARI, written by a woman
BC3000
1. Peace & harmony are the best
2. Integration of religions
Poems
CONGRESS period
(Nobility period)
Rich culture
Japanese Brief History We are here.
Bushido: pure spirit
Warrior
M or C
R
K
C etc
A
A, E
Feudal period
sowa 曽和 そわ ソワ
700
M or CK
2. Prohibition of people’s weapon
Buddhism
Not taken, not colonized.
They built many small schools for pleasure (20,000).
Literacy Rate (1860): Samurai 100%, Men 50%, Women 20%
Edo (Tokyo) 70%, London 30%
Now you can see many products for common people, produced by people.
The cleanest country in the world.
People's prevailing satisfaction
Rich, safe, and perfectly orderly country
Well-plowed farmland
People like gardening
The biggest insult is to send money
Furniture is needless
By Heinrich Schliemann (1865)
The common people, having great economic power, drove large-scale peaceful consumption and peaceful production in the EDO period, even though it was a feudal period.
Ukiyoe (woodblock printing)
Poems (Haiku, Senryu..)
Schools (20,000)
Sushi
Tenpura
Comic
Book
Dance
Sports (Judou, Kendou, Sumou)
Tea ceremony
Flower arrangement
Music
Travel
1200
Parallel queue computers
Why do we recommend queue computers?
Because conventional computers use an inefficient computing method.
Inefficient computing principle of conventional computers
When we buy milk, bread, and apples
■ Conventional computer
Bring the milk to the cashier, pay for the milk, bring the milk back.
Bring the bread to the cashier, pay for the bread, bring the bread back.
Bring the apples to the cashier, pay for the apples, bring the apples back.
We never use this kind of inefficient method.
■ Parallel Queue Computer
・ Bring the milk, bread, and apples to the cashier, pay for everything, then bring them all back.
Almost all electric products are computers connected to a network.
Processor is the most important.
What is required of a computer
● To process large video and photo data, such as HDTV, at high speed
● To process many kinds of programs
● To be high performance (Approaching the limit)
● Small energy consumption (Approaching the limit)
● To decrease the surface temperature of the LSI (Approaching the limit)
● Small is better (Approaching the limit)
A breakthrough is needed!!
The Parallel Queue Processor can do it!!
■ Big parallel processing → Big high performance
  All parallelism of a program can be expressed
  Suitable for video and photo processing (1000 or 2000 times higher speed)
■ Short program size
  Small memory, small cache, small instruction traffic
■ High-speed interrupt handling (short response time)
  Single assignment
■ Independence between hardware and software
  A program can be executed without change when the hardware changes
■ Simple hardware
  No instruction window
  No register renaming logic
  Finding instructions that can run in parallel is easy
■ Small energy consumption
  Simple hardware
Sowa Lab. Original
Three Computation Models
Queue model: FIFO Access Register (Queue)
Stack model: FILO Access Register (Stack)
RAR model: Random Access Register (Register)
[Figure: each model is drawn with a Memory holding Program and Data (cells 0 to n), a Processing Unit, and its Memory for Intermediate result; the RAR model also shows an Address input]
What is the queue computing model?
y=(a+b)/(c-d)
Queue program (2 x 8 = 16 bytes)
ld a
ld b
ld c
ld d
add
sub
div
st y
Queue (FIFO): a b c d a+b c-d (a+b)/(c-d)
[Figure: data-flow graph in which a, b feed +, c, d feed -, and both results feed /]
UEC SOWA Lab.
single assignment
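The queue program for y=(a+b)/(c-d) can be checked with a small simulator. This is a minimal sketch, not the PQP hardware: the function name `run_queue_program`, the dictionary used as memory, and the sample values are assumptions made for illustration. Loads append to the tail of the queue, an arithmetic instruction takes its two operands from the head and appends its result to the tail, and `st` removes the head.

```python
from collections import deque

# Hypothetical helper for illustration; instruction names follow the slide.
ARITHMETIC = {
    "add": lambda left, right: left + right,
    "sub": lambda left, right: left - right,
    "mul": lambda left, right: left * right,
    "div": lambda left, right: left / right,
}


def run_queue_program(program, memory):
    """Run a list of queue instructions against a dict used as memory."""
    queue = deque()
    for instruction in program:
        parts = instruction.split()
        opcode = parts[0]
        if opcode == "ld":
            # Load: read a variable and put it at the queue tail.
            queue.append(memory[parts[1]])
        elif opcode == "st":
            # Store: take the value at the queue head and write it to memory.
            memory[parts[1]] = queue.popleft()
        else:
            # Arithmetic: consume two values from the head, produce at the tail.
            left = queue.popleft()
            right = queue.popleft()
            queue.append(ARITHMETIC[opcode](left, right))
    return memory


program = ["ld a", "ld b", "ld c", "ld d", "add", "sub", "div", "st y"]
memory = run_queue_program(program, {"a": 6, "b": 2, "c": 5, "d": 1})
print(memory["y"])  # (6 + 2) / (5 - 1)
```

No instruction names a register: the operand positions are implied by the FIFO order, which is why each instruction can be short.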
Original queue computing had a lot of drawbacks
It can’t do parallel computing
Program becomes longer
Hardware is complex
Data disappears when it is accessed
One procedure is separated
What is the important point of queue computing
Production order should be equal to the consumption order.
[Figure: graphical queue program over nodes A1 to A10 that loads a, b, c, d from Memory (ld), applies *, / and -, and stores x and y (st) through the Queue; the production order and the consumption order of the data are each numbered 1 to 4]
Instruction hole problem
[Figure: graphical queue program over nodes A1 to A10 (ld, *, /, -, st) computing x and y from a, b, c, d in Memory, illustrating an instruction hole]
Cross arc problem
The order is destroyed by cross arcs and an instruction hole (IH).
[Figure: graphical queue programs over nodes A1 to A10 with crossing arcs and an instruction hole (IH). Production order: 1 2 3 4. Consumption order: 1 2 4 3.]
Production order = consumption order
[Figure: the graphical queue program over nodes A1 to A10 again (ld, *, /, -, st, Memory, Queue), with the production order and the consumption order each numbered 1 to 4]
PC-type: keeps both orders (conventional). This restriction is too hard.
Weaken it:
C-type: keeps the consumption order only
P-type: keeps the production order only
Fig11 Qp Computing model
P-type queue computing
ld a
ld b
ld c
ld d
mul
div
mul
sub -3
st x
st y
(b) Graphical program with IH
[Figure: graph over nodes A1 to A10 with ld, *, /, -, st, Memory, and Queue]
ld a
ld b
ld c
ld d
mul
div
mul
sub -2
st x
st y
(a) Graphical program with arcs crossing
[Figure: graph over nodes A4 to A10 with ld, *, /, -, st, Memory, and Queue; labels QH and offset]
This idea turns all the drawbacks into good points.
It can’t do parallel computing → It allows parallel computing
Program becomes longer → By weakening the order dependencies, shorter program size (1/3-1/2)
Complex hardware → Simple hardware, nowadays
Data disappears when it is used → Reference access
One procedure is separated → It can express all parallelism in a group; big adaptability of parallel computing
All problems have been solved!!
Parallel Queue program execution
y=(a+b)/(c-d)
Queue program: 2 x 8 = 16 bytes, 4 steps
ld a
ld b
ld c
ld d
add
sub
div
st y
Queue (FIFO): a b c d a+b c-d (a+b)/(c-d)
[Figure: data-flow graph in which a, b feed +, c, d feed -, and both results feed /]
Conventional program (2 registers): 4 x 10 = 40 bytes, 8 steps
ld r1,a
ld r2,b
add r2,r1,r2
st r2,t1    (Spill)
ld r1,c
ld r2,d
sub r2,r1,r2
ld r1,t1    (Spill back)
div r2,r1,r2
st r2,y
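The "4 steps" figure for the queue program can be reproduced by counting levels in its data-flow graph. The sketch below assumes that every instruction takes one step and that there are enough functional units to run all ready instructions at once; the helper name `schedule_steps` and the dependency table are written for this illustration. The byte counts simply restate the slide's arithmetic (2 bytes per queue instruction, 4 bytes per conventional instruction).

```python
def schedule_steps(dependencies):
    """Return the earliest step of each instruction given what it depends on."""
    step = {}

    def earliest(name):
        if name not in step:
            # An instruction runs one step after its latest input is ready.
            step[name] = 1 + max(
                (earliest(source) for source in dependencies[name]), default=0
            )
        return step[name]

    for name in dependencies:
        earliest(name)
    return step


# Data-flow graph of y = (a + b) / (c - d), one entry per queue instruction.
dependencies = {
    "ld a": [], "ld b": [], "ld c": [], "ld d": [],
    "add": ["ld a", "ld b"],
    "sub": ["ld c", "ld d"],
    "div": ["add", "sub"],
    "st y": ["div"],
}

steps = schedule_steps(dependencies)
parallel_steps = max(steps.values())

queue_bytes = 2 * len(dependencies)  # 8 instructions of 2 bytes
conventional_bytes = 4 * 10          # 10 instructions of 4 bytes

print(parallel_steps, queue_bytes, conventional_bytes)
```

Under these assumptions the four loads share step 1, `add` and `sub` share step 2, `div` is step 3, and `st y` is step 4.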
Group parallelism
Small Program Size
[Chart: Queue processor vs. Conventional]
Parallelism (more)
Parallelism = High performance
[Chart: Queue processor vs. Conventional]
PQP Architecture
[Figure: block diagram with Instruction Memory (IM), Fetch Unit (FU), Fetch Buffer (FB), Decode Unit (DU), Queue Computing Unit (QCU), Issue Unit (IU), EU, Data Memory (DM), and a Queue with entries 0 to 7; other labels: QH, QT, LQH, QB, MS, DB]
SIMD PQP
[Figure: block diagram with labels QH, QT, LQH, FU, STU, EXU, LDU, IM, DM, CI, MI, JPEG I/O, SIMD Queue Register, and General Queue Register; numeric labels 256 and 44]
Processor for Video and Photo processing
1024 times or 2048 times higher performance is easy to reach
All parallelism of a problem can be extracted.
Flexibility for photo processing
Old software can be executed when hardware is changed.
From breadth-first traversal and no register names in instructions
  All parallelism can be expressed: suitable for large parallel processing → No register renaming logic → High performance
  Group parallelism → Easy to find parallel instructions → No instruction window → Simple hardware → Small energy consumption and high speed
From no register names
  No need for register renaming logic → Simple hardware → Small energy consumption and high speed
  Short instruction length → Small memory, small cache, small instruction traffic → Simple hardware → Small energy consumption and high speed
  Short instruction length → Few cache misses → Simple hardware → Small energy consumption and high speed
  Suitable for SMT (Simultaneous Multithreading)
  Big independence between software and hardware
From single assignment
  Fast interrupt response
Characteristic of Queue Computer
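The breadth-first traversal named in these characteristics can be sketched as a tiny code generator. This is an illustrative assumption about how such a traversal might be written, not the Sowa Lab compiler: the function name `queue_program`, the tuple encoding of the expression tree, and the opcode table are hypothetical. The tree is visited level by level, and instructions are emitted from the deepest level up to the root, left to right within each level.

```python
from collections import deque

# Hypothetical mapping from operator symbols to the slide's instruction names.
OPCODES = {"+": "add", "-": "sub", "*": "mul", "/": "div"}


def queue_program(tree, target):
    """Emit queue instructions for an expression tree and store into target.

    A tree is either a variable name (str) or a tuple (operator, left, right).
    """
    levels = []  # levels[k] holds the nodes at depth k, left to right
    frontier = deque([(tree, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == len(levels):
            levels.append([])
        levels[depth].append(node)
        if isinstance(node, tuple):
            _, left, right = node
            frontier.append((left, depth + 1))
            frontier.append((right, depth + 1))

    program = []
    for level in reversed(levels):  # deepest level first
        for node in level:
            if isinstance(node, tuple):
                program.append(OPCODES[node[0]])
            else:
                program.append(f"ld {node}")
    program.append(f"st {target}")
    return program


expression = ("/", ("+", "a", "b"), ("-", "c", "d"))
program = queue_program(expression, "y")
print(program)
```

Instructions that come from the same level of the tree do not depend on each other, which is one way to read the "group parallelism" item: each group can be issued together without searching an instruction window.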
[Figure: queue access in each design, marked with QH, QT, and Offset: Conventional Parallel Queue Computer, Parallel Queue Computer, New Parallel Queue Computer (P type), and Advanced Parallel Queue Computer; additional labels: Queue Computer, Register Computer]
Conventional computers are one kind of queue computer.
■Collaboration of queue and conventional computers becomes easy.
■Big flexibility
Arquimedes Canedo, Ben Abderazek, Masahiro Sowa, "A New Code Generation Algorithm for 2-offset Producer Order Queue Computation Model", Journal of Computer Languages and Compiler Techniques (2007)
Arquimedes Canedo, Ben Abderazek, Masahiro Sowa, "Queue Register File Optimization Algorithm for QueueCore Processor", 19th International Symposium on Computer Architecture and High Performance Computing: SBAC-PAD 2007, Oct. 24-27 (2007)
Ben A. Abderazek, Tsutomu Yoshinga, Masahiro Sowa, "Mathematical Model for Multiobjective Synthesis of NoC Architectures", IEEE Computer Society Proc. of the 35th International Conference on Parallel Processing ICPP Workshop (2007)
Md. Musfiquzzaman Akanda, Ben Abderazek, Masahiro Sowa, "Dual-Execution Mode Processor Architecture", Journal of Supercomputing,
Teruhisa Yuuki, Arquimedes Canedo, Ben Abderazek, Masahiro Sowa, "Novel Addressing Method for Aggregate Types in Queue Processors", 2007 International Conference on Convergence Information Technology (ICCIT'07) (2007)
Arquimedes Canedo, Ben Abderazek, Masahiro Sowa, "Compiler Framework for an Embedded 32-bit Queue Processor", 2007 International Conference on Convergence Information Technology (ICCIT'07) (2007)
Yuuki Nakanisi, Arquimedes Canedo, Ben Abderazek, Masahiro Sowa, "Optimizing Reaching Definitions Overhead in Queue Processors", 2007 International Conference on Convergence Information Technology (ICCIT'07) (2007)
Arquimedes Canedo, Ben Abderazek, Masahiro Sowa, "New Code Generation Algorithm for Queue Core - An Embedded Processor with High ILP", The International Conference on Parallel and Distributed Computing PDCAT 07 (2007)
Arquimedes Canedo, Ben Abderazek, Masahiro Sowa, "An Efficient Code Generation Algorithm for Code Size Reduction using 1-offset P-Code Queue Computation Model", EUC2007, Taipei, Taiwan (2007)
Research Results
■ Completion of the PQP in Verilog HDL
■ Completion of 2D PQP in Verilog HDL
■ Completion of Parallelized C Compiler: near an actual compiler
Best Paper Award
■ Completion of Hybrid PQP in Verilog HDL: Queue and Stack computing
■ QJAVA: Parallel JAVA
■ SIMD Queue computer
Thank you for your attention!
We invite you to join our research group, QCI (Queue Computer Initiative).
Please e-mail me.
University of Electro-Communications
Tokyo, [email protected]
www.sowa.is.uec.ac.jp
Multi-Processor
Sowa Lab.
4 CPU
ARM: Popular for embedded use
Increases hardware
Increases power consumption
Increases programming difficulty
Large program
A breakthrough is needed!!
Multi-Processor System