processor architectures and program mapping

Page 1: Processor Architectures and Program Mapping

Processor Architectures and Program Mapping

Application domain specific processors (ADSP or ASIP)

5kk10, TU/e

Henk Corporaal, Jef van Meerbergen, Bart Mesman

Page 2: Processor Architectures and Program Mapping

04/19/23 Processor Architectures and Program Mapping H. Corporaal, J. van Meerbergen, and B. Mesman

2

Application domain specific processors (ADSP or ASIP)

[Figure: flexibility versus efficiency. Labels: DSP, programmable CPU, programmable DSP, application domain specific processor, application specific processor.]

Page 3: Processor Architectures and Program Mapping

Application domain specific processors (ADSP or ASIP)

• takes a well defined application domain as a starting point
• exploits characteristics of the domain (computation kernels)
• still programmable within the domain

e.g. MPEG2 coding uses 8*8 DCT transform, DECT, GSM etc.

• performance: clock speed + ILP
• ILP + tuning to domain
• flexible development (new applications)
• cost effective (high volume)

[Figure: application domain versus implementation, drawn once for an ADSP and once for a GP processor.]

Problems:
• specification: manual design
• design time and effort: large effort
=> synthesized cores

Page 4: Processor Architectures and Program Mapping

Part | Description | Clock (MHz) | Size (gates) | ROM (Kbyte) | RAM (Kbyte)

Speech Components
ADPCM | Full duplex ITU-T G.726 compliant and 40 kbit/s speech-compression encoder/decoder. | 4 | 5,100 | 1.3 | 0.128
ADPCM-16 | Full duplex 16 channel ITU-T G.726 compliant 16, 24, 32 and 40 kbit/s speech-compression encoder/decoder. | 32 | 10,200 | 1.3 | 2.048
IW-ASR Speech Recognition | Template-based speaker-dependent, isolated-word automatic speech recognition. | 1.3 | 9,000 | 6 | approx. 1 kbyte/word
G.723.1 | Low bit-rate ITU-T G.723.1 compliant speech-compression at 6.3 kbit/s; can be combined with G.723.1A. | 20 | 24,000 | 22 | 2.3
G.723.1A | Extended version of G.723.1 to reduce bit rate by a silence compression scheme. Uses voice activity detection and comfort-noise generation. Fully compliant with Annex A of speech-compression standard CODEC G.723.1. Yields no additional hardware cost. | 20 | 24,000 | 22 | 2.3
Speech Synthesis | Phrase-concatenated speech synthesis. | Depends on compression requirements

Telecommunications
Echo Cancellation | High-performance echo-cancellation and suppression processor. | 4 | 6,000 | 2.80 | 0.15
DTMF | Full-duplex DTMF transceiver. | 2 | 4,000 | 1.00 | 0.15
Caller-ID | On-hook and off-hook caller line identification. Includes DTMF and V.23. | 3 | 6,000 | 2.10 | 0.15
Reed-Solomon | Full-duplex Reed-Solomon codec. | | 7,000 | 3.75 | 0.15
Viterbi Decoder | Configurable rate, code and constraint-length. Configurable traceback depth. Supports soft & hard decision making. Supports code puncturing. | (depending on throughput) | 5,000 to 9,000 | --- | ---
V.23 modem | ITU-T V.23 compliant 1200 baud FSK modem. | | 6,000 | 0.80 | 0.15

Other
Pink Noise Generator | Low-ripple pink noise filter with filter characteristic of -3 ± 0.08 dB per octave over the bandwidth 20 Hz to 20 kHz. | | 4,000 | 0.10 | 0.10
CCIR 656/601 | Digital video converter: CCIR to raw-video data and vice versa. | | 1,500 | none | none

www.adelantetech.com

Page 5: Processor Architectures and Program Mapping

Outline

• design process
• retargetable code generation (problem statement)
• ADSP/VLIW architectures (Mistral 2 / A|RT designer)
• instructive demo (Adelante)
• application examples
• low power aspects (Mistral 2 / A|RT designer)
• discussion
• conclusion

Page 6: Processor Architectures and Program Mapping

Design process

3 phases
1. exploration
2. hw design (layout) + processing
3. design appl. sw

Fast, accurate and early feedback

[Figure: exploration loop. Inputs: application(s) and a processor model (parameters, instance; e.g. VLIW with shared RFs). Steps: SW (code generation), giving estimations of cycles/alg and occupation, and HW design, giving estimations of nsec/cycle, area and power/instr. Decisions: "OK?" and "more appl.?"; exit: go to phase 2.]

Page 7: Processor Architectures and Program Mapping

Problem statement

A compiler is retargetable if it can generate code for a ‘new’ processor architecture specified in a machine description file.

A guarded register transfer pattern (GRTP) is a register transfer pattern (RTP) together with the control bits of the instruction word that control the RTP, e.g.

a := b + c | instr = xxxx0101

GRTPs contain all inter-RT-conflict information.

Instruction set extraction (ISE) is the process of generating all possible GRTPs for a specific processor.
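The GRTP definition can be made concrete with a small data structure. The following is a minimal Python sketch, not the representation used by any particular tool: the class name `GRTP`, its fields and the method `enabled_by` are hypothetical, and the guard is assumed to be a string of '0', '1' and 'x' (don't care) written most significant bit first, as in the slide's example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GRTP:
    """A register transfer pattern plus the instruction bits that guard it."""
    transfer: str  # the RTP, e.g. "a := b + c"
    guard: str     # control bits, MSB first: '0', '1' or 'x' (don't care)

    def enabled_by(self, instruction_word: str) -> bool:
        """Return True if every specified guard bit equals the instruction bit."""
        if len(instruction_word) != len(self.guard):
            raise ValueError("instruction word and guard differ in width")
        return all(g in ("x", b) for g, b in zip(self.guard, instruction_word))


# The example from the slide: a := b + c | instr = xxxx0101
add = GRTP("a := b + c", "xxxx0101")

if __name__ == "__main__":
    print(add.enabled_by("11000101"))  # guard bits match
    print(add.enabled_by("11000100"))  # last bit differs
```

The upper four bits are don't cares, so any instruction word ending in 0101 triggers the transfer.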

Page 8: Processor Architectures and Program Mapping

Problem statement

[Figure: compilation flow. Boxes: algorithm spec, FE, CDFG, code generation, machine code; processor spec (instance), ISE, GRTP.]

Note: in ch. 4 this is part of the code generator.

Page 9: Processor Architectures and Program Mapping

Example: Simple processor [Leupers]

[Figure: datapath of the simple processor. Blocks: PC with +1 incrementer, IM, RAM, REG, input Inp, output outp. Instruction word I.(20:0) with fields I.(20:13), I.(12:5), I.(4), I.(3:2) and I.(1:0).]

Page 10: Processor Architectures and Program Mapping

Example: Simple processor [Leupers]

Instruction | Instruction bits (bit 20 ... bit 0)
PC := PC + 1 | xxxxxxxxxxxxxxxxxxxxx
REG := Inp | xxxxxxxxxxxxxxxxx011x
REG := IM[PC].(20..13) | xxxxxxxxxxxxxxxxx001x
REG := RAM[IM[PC].(12..5)] | xxxxxxxxxxxxxxxxx1x1x
REG := REG - Inp | xxxxxxxxxxxxxxxxx0101
REG := REG - IM[PC].(20..13) | xxxxxxxxxxxxxxxxx0001
REG := REG - RAM[IM[PC].(12..5)] | xxxxxxxxxxxxxxxxx1x01
REG := REG + Inp | xxxxxxxxxxxxxxxxx0100
REG := REG + IM[PC].(20..13) | xxxxxxxxxxxxxxxxx0000
REG := REG + RAM[IM[PC].(12..5)] | xxxxxxxxxxxxxxxxx1x00
RAM[IM[PC].(12..5)] := REG | xxxxxxxxxxxxxxxx1xxxx
outp := REG | xxxxxxxxxxxxxxxxxxxxx
RAM_NOP | xxxxxxxxxxxxxxxx0xxxx
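Because each GRTP carries its control bits, a code generator can decide whether two register transfers fit in one instruction word by comparing their bit patterns. The Python sketch below illustrates this for a few rows of the simple processor; the function name `merge` and the constant names are hypothetical, and the bit patterns are read from the instruction table of this example (17 don't-care bits followed by bits 3..0, or bit 4 alone for the RAM write).

```python
def merge(p, q):
    """Combine two guard patterns bit by bit.

    Returns the merged pattern, or None if some bit is required to be
    0 by one pattern and 1 by the other (the transfers conflict).
    """
    if len(p) != len(q):
        raise ValueError("patterns differ in width")
    out = []
    for a, b in zip(p, q):
        if a == "x":
            out.append(b)
        elif b == "x" or a == b:
            out.append(a)
        else:
            return None
    return "".join(out)


# Patterns for the 21-bit instruction word, bit 20 first.
REG_ADD_INP = "x" * 17 + "0100"           # REG := REG + Inp
REG_SUB_INP = "x" * 17 + "0101"           # REG := REG - Inp
RAM_STORE = "x" * 16 + "1" + "x" * 4      # RAM[IM[PC].(12..5)] := REG
RAM_NOP = "x" * 16 + "0" + "x" * 4        # RAM_NOP

if __name__ == "__main__":
    print(merge(REG_ADD_INP, RAM_STORE))    # compatible: one merged pattern
    print(merge(REG_ADD_INP, REG_SUB_INP))  # None: bit 0 must be both 0 and 1
    print(merge(RAM_STORE, RAM_NOP))        # None: bit 4 must be both 1 and 0
```

An add into REG and a RAM write use disjoint control bits, so they can share an instruction; an add and a subtract cannot.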

Page 11: Processor Architectures and Program Mapping

ASIP/VLIW architectures

A|RT designer template as an example (= set of rules, a model)

Differences with VLIW processors of ch. 4
1. // (parallel) FUs
   • ASUs = complex appl. spec. FUs (beyond subword //), e.g. biquad, median, DCT etc.
   • larger grain size, more heterogeneous, more pipelines
2. Rfiles
   • many Rfiles (>5 vs. 1 or 2)
   • limited # ports (3 vs. 15)
   • limited size (<16 vs. 128)
3. Issue slots
   • all in parallel vs. 5

Page 12: Processor Architectures and Program Mapping

[Figure: distributed VLIW template. Labels: register files RF1 to RF8, functional units FU1 to FU4, instruction registers IR1 to IR4, instruction memory, control, flags.]

Page 13: Processor Architectures and Program Mapping

ASIP/VLIW architectures

Additional characteristics of the A|RT designer template
• interconnect network: busses + input multiplexers
  - mux control is part of the instruction
  - control can change every clock cycle
  - network can be incomplete
  - busses can be merged
• memories are modeled as FUs
  - separate data in and data out
  - 2 inputs (data in and address) and 1 output
• each FU can generate one or more flags
• instruction format (per issue slot): read address RF 1 | write address RF 1 | read address RF 2 | write address RF 2 | mux 1 | mux 2 | control FU | output drivers

Page 14: Processor Architectures and Program Mapping

ASIP/VLIW architectures: example

[Figure: example datapath with an ALU and a MAC, register files RF1 to RF4, bus1 and bus2, and input multiplexers mux 2 and mux 3. Instruction word fields: mux 2, read RF1, write RF1, read RF2, write RF2, ALU instr., mux 3, read RF4, write RF4, read RF3, write RF3, MAC instr.; bit positions marked 0, 9, 10 and 19.]

Page 15: Processor Architectures and Program Mapping

ASIP/VLIW architectures: example

GRTP | Instruction bits 19 ... 0
RF1 = ALU (RF1, RF2) | x c c c c x x c c c x x x x x x x x x x
RF2 = ALU (RF1, RF2) | x c x c c c c c c c x x x x x x x x x x
RF3 = ALU (RF1, RF2) | x c x c c x x c c c c x x c c x x x x x
RF3 = MAC (RF3, RF4) | x x x x x x x x x x c c c c c c x c c c
RF4 = MAC (RF3, RF4) | x x x x x x x x x x x c c x x c c c c c
RF2 = MAC (RF3, RF4) | c x x x x c c x x x x c c x x c x c c c
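The GRTP table for this example can be used to spot which register transfers compete for instruction bits. The Python sketch below assumes that 'c' marks a bit the GRTP controls and 'x' a don't care, which is my reading of the table; the function names are hypothetical. Since the table does not give the bit values, overlap only signals a possible conflict: two GRTPs that control the same bit clash unless they happen to need the same value.

```python
def controlled_bits(pattern):
    """Return the set of bit positions (19 = leftmost) marked 'c'."""
    bits = pattern.split()
    width = len(bits)
    return {width - 1 - i for i, b in enumerate(bits) if b == "c"}


def shared_bits(p, q):
    """Bit positions that both patterns control."""
    return controlled_bits(p) & controlled_bits(q)


# Rows copied from the example table.
RF1_ALU = "x c c c c x x c c c x x x x x x x x x x"  # RF1 = ALU (RF1, RF2)
RF3_ALU = "x c x c c x x c c c c x x c c x x x x x"  # RF3 = ALU (RF1, RF2)
RF3_MAC = "x x x x x x x x x x c c c c c c x c c c"  # RF3 = MAC (RF3, RF4)

if __name__ == "__main__":
    # Disjoint control bits: the ALU and MAC transfers can issue together.
    print(shared_bits(RF1_ALU, RF3_MAC))
    # Both transfers write RF3 and share control bits: possible conflict.
    print(sorted(shared_bits(RF3_ALU, RF3_MAC)))
```

In this table the ALU result written to RF1 and the MAC result written to RF3 touch disjoint halves of the word, while the two transfers that write RF3 overlap in bits 9, 6 and 5.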

Page 16: Processor Architectures and Program Mapping

ASIP/VLIW architectures: design flow

[Figure: design flow. Boxes: algorithm spec, RTs, datapath synthesis, controller synthesis, estimations (area, power, timing); decision "OK?" (yes/no) with a "change pragmas" feedback step.]

Example RT:
RF1 : x = RF2 : y, RF3 : z | ALU = ADD, Inmux = bus2

Example pragmas:
assign ( a+b, ALU, fu_alu1)
assign ( a+_, ALU, fu_alu2)
assign ( _+_, ALU, fu_alu3)

VLIW makes relatively simple code selection possible.

Page 17: Processor Architectures and Program Mapping

ASIP/VLIW architectures: feedback

• architecture view
• life-time analysis
• resource load
• bus load
• cycle-count

Page 18: Processor Architectures and Program Mapping

Outline

• design process
• retargetable code generation (problem statement)
• ASIP/VLIW architectures (Mistral 2 / A|RT designer)
• instructive demo (Adelante)
• application examples
• low power aspects (Mistral 2 / A|RT designer)
• discussion
• conclusion

Page 19: Processor Architectures and Program Mapping

Application examples: adaptive filter

Minimizes the difference between the filtered input x and the reference signal e.

Many applications are possible
• echo cancelling for TV: e = flyback signal (known without echoes)
• automatic equalization of cables in data transmission
• acoustic echo cancelling

[Figure: adaptive filter whose coefficients c0, c1, ..., c63 are set by a control unit. Signals: x, y, e, r.]

Page 20: Processor Architectures and Program Mapping

Application examples: adaptive filter

[Figure: adaptive filter (control unit, coefficients c0, c1, ..., c63, signals x, y, e, r) with a speaker and a microphone. Labels: speech, speech + noise, noise.]

Page 21: Processor Architectures and Program Mapping

Application examples: adaptive filter

Hearing aid

[Figure: adaptive filter (control unit, coefficients c0, c1, ..., c63, signals x, y, e, r). Labels: noise (e.g. radio), speech + noise, speech.]

Page 22: Processor Architectures and Program Mapping

Application examples: adaptive filter

[Figure: signal-flow graph of the 64-tap adaptive filter. Delay line (Z-1) with taps x[n], x[n-1], ..., x[n-i], ..., x[n-63]; coefficient blocks A0, A1, ..., Ai, ..., An with coefficients c0, c1, ..., ci, ..., c63; products S0[n], S1[n], ..., Si[n], ..., S63[n]; signals e[n], ê[n], r[n] and t[n]; multiplication by mu.]

Page 23: Processor Architectures and Program Mapping

Application examples: adaptive filter

[Figure: coefficient block Ai. A multiplier, an adder and a delay Z-1 relate Ci[n], Ci[n-1], x[n-i] and t[n].]

Page 24: Processor Architectures and Program Mapping

Application examples: adaptive filter

#define mu 0.1
#define WORD num<32,12>

func main ( input, e : WORD) r : WORD =
begin
    sum [ 0 ] = WORD ( 0 )
    x = input
    t = WORD ( r @ 1 * WORD ( mu ) )
    (i : 0 .. 63) ::
    begin
        c [ i ] = c [ i ] @ 1 + WORD ( t * x @ i)
        s [ i ] = WORD ( x @ i * c [ i ] @ 1)
        sum [ i+1 ] = sum [ i ] + s [ i ]
    end
    ehat = sum [ 64 ]
    r = e - ehat
end

[Figure: data-flow graph of the loop body. Operations: two multiplications and two additions, with nodes marked r and w; values: x@i, t, c[i]@1, sum[i], sum[i+1].]
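To make the data flow of the listing easier to follow, here is a floating-point transliteration in Python of the work done for one input sample. It is a sketch under stated assumptions: plain floats replace the fixed-point type `WORD` (`num<32,12>`), the delayed values `x@i`, `c[i]@1` and `r@1` are passed in as arguments, and the function name `filter_sample` and its parameter names are hypothetical.

```python
MU = 0.1  # step size, "#define mu 0.1" in the listing


def filter_sample(x_hist, c_prev, r_prev, e, mu=MU):
    """Process one sample of the adaptive filter.

    x_hist[i] plays the role of x@i (the input delayed by i samples),
    c_prev[i] the role of c[i]@1 (coefficients from the previous sample),
    r_prev the role of r@1, and e is the reference sample.
    Returns (new coefficients, ehat, r).
    """
    if len(x_hist) != len(c_prev):
        raise ValueError("need one input sample per coefficient")
    t = r_prev * mu
    # c[i] = c[i]@1 + t * x@i
    c = [c_prev[i] + t * x_hist[i] for i in range(len(c_prev))]
    # s[i] = x@i * c[i]@1, accumulated into ehat = sum[taps]
    ehat = sum(x_hist[i] * c_prev[i] for i in range(len(c_prev)))
    r = e - ehat
    return c, ehat, r


if __name__ == "__main__":
    # Two taps instead of 64, with values chosen to be exact in binary.
    c, ehat, r = filter_sample([3.0, 4.0], [1.0, 2.0], r_prev=2.0, e=10.0, mu=0.25)
    print(c, ehat, r)
```

Per tap the loop body costs two multiplications and two additions, which is why the implementations that follow differ mainly in whether a multiplier and a second memory are available.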

Page 25: Processor Architectures and Program Mapping

Application examples: adaptive filter, implementation 1

266 clock cycles, 1.1 mm2

[Figure: datapath with RAM, ALU, ROM, MULT and ACU, connected by bus1 and bus2.]

Page 26: Processor Architectures and Program Mapping

Application examples: adaptive filter, implementation 2

2250 clock cycles, 0.7 mm2

[Figure: datapath with RAM, ALU, ROM and ACU, connected by bus1 and bus2.]

Page 27: Processor Architectures and Program Mapping

Application examples: adaptive filter, implementation 3

202 clock cycles, 1.4 mm2

[Figure: datapath with RAM1, ACU1, ALU, MULT, RAM2, ROM and ACU2.]

Page 28: Processor Architectures and Program Mapping

[Figure: clock cycles (axis ticks 1000, 2000) versus area in mm2 (axis ticks 1, 2).]
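The three adaptive-filter implementations trade area against cycle count, and none of them is better than another on both axes. The Python sketch below checks this with a simple Pareto filter; the numbers are the cycle counts and areas quoted on the implementation slides, while the names `DESIGNS`, `dominates` and `pareto_front` are hypothetical.

```python
# (area in mm2, clock cycles) as quoted for the three implementations.
DESIGNS = {
    "implementation 1": (1.1, 266),
    "implementation 2": (0.7, 2250),
    "implementation 3": (1.4, 202),
}


def dominates(a, b):
    """True if design a is no worse than b on every axis and differs from b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def pareto_front(designs):
    """Names of the designs that no other design dominates."""
    return [
        name
        for name, point in designs.items()
        if not any(dominates(other, point) for other in designs.values())
    ]


if __name__ == "__main__":
    print(pareto_front(DESIGNS))
```

Dropping the multiplier (implementation 2) saves area at a large cost in cycles, while adding a second memory and address unit (implementation 3) buys a modest cycle reduction for more area.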

Page 29: Processor Architectures and Program Mapping

Outline

• design process
• retargetable code generation (problem statement)
• ADSP/VLIW architectures (Mistral 2 / A|RT designer)
• instructive demo (Adelante)
• application examples
• low power aspects (Mistral 2 / A|RT designer)
• discussion
• conclusion

Page 30: Processor Architectures and Program Mapping

Low power aspects

• Estimation

EXU | ACTIVITY | AREA | POWER
alu_1 | 20% | 261 | 105
acs_asu_1 | 83% | 2382 | 3816
or_asu_1 | 10% | 611 | 122
romctrl_1 | 16% | 65 | 21
acu_1 | 36% | 294 | 205
ipb_1 | 20% | 107 | 43
opb_1 | 11% | 163 | 35
ctrl | | 1864 | 3597
total | | 5747 | 7944

[Figure: Mistral2 with the implementation independent design database, the architecture and the estimation database; estimates of area, speed and power.]

Page 31: Processor Architectures and Program Mapping

GSM viterbi decoder : default solution

13750 (cycle count)

EXU | ACTIV | AREA | POWER
alu_1 | 96% | 3469 | 46196
romctrl_1 | 48% | 39 | 259
acu_1 | 26% | 327 | 1209
ipb_1 | 5% | 131 | 105
opb_1 | 23% | 1804 | 5801
ctrl | | 9821 | 135035
total | | 15591 | 188605

• controller responsible for 70% of power consumption
  - maximum resource-sharing
  - heavy decision-making: "main" loop with 16 metrics-computations per iteration
• EXU-numbers include registers for local storage

Page 32: Processor Architectures and Program Mapping

GSM viterbi decoder : no loop-folding

14247 (cycle count)

EXU | ACTIV | AREA | POWER
alu_1 | 92% | 3411 | 45073
romctrl_1 | 45% | 39 | 255
acu_1 | 25% | 294 | 1087
ipb_1 | 5% | 107 | 86
opb_1 | 22% | 1661 | 5340
ctrl | | 4919 | 70087
total | | 10431 | 121928

• area down by 33%
• power down by 35%
• next step: reduce # of program-steps with second ALU

Page 33: Processor Architectures and Program Mapping

GSM viterbi decoder : 2 ALUs

9739 (cycle count)

EXU | ACTIV | AREA | POWER
alu_1 | 69% | 1797 | 12248
alu_2 | 65% | 1393 | 8916
romctrl_1 | 67% | 39 | 255
acu_1 | 37% | 294 | 1087
ipb_1 | 8% | 149 | 119
opb_1 | 33% | 2136 | 6871
ctrl | | 8957 | 87235
total | | 14766 | 116731

• cycle count down 30%
• area up 42%
• power down by 5%
• next step: introduce ASU to reduce ALU-load

Page 34: Processor Architectures and Program Mapping

GSM viterbi decoder : 1 x ACS-ASU

1930 (cycle count)

EXU | ACTIV | AREA | POWER
alu_1 | 20% | 261 | 105
acs_asu_1 | 83% | 2382 | 3816
or_asu_1 | 10% | 611 | 122
romctrl_1 | 16% | 65 | 21
acu_1 | 36% | 294 | 205
ipb_1 | 20% | 107 | 43
opb_1 | 11% | 163 | 35
ctrl | | 1864 | 3597
total | | 5747 | 7944

func ACS ( M1, M2, d ) MS, MS8 =
begin
    MS = if ( M1+d > M2-d ) -> ( M1+d) || ( M2-d) fi;
    MS8 = if ( M1- d > M2+d) -> ( M1- d) || ( M2+d) fi;
end;

• cycle count down 5X
• power down 20X !
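The ACS (add-compare-select) function that the ASU implements in hardware is small enough to restate in Python. This is only a behavioural sketch of the listing on this slide, with plain integers in place of the hardware word type; the function name `acs` is hypothetical, the argument and result names follow the listing.

```python
def acs(m1, m2, d):
    """Add-compare-select on two path metrics m1, m2 and a branch metric d.

    Mirrors the listing: each output is the larger of two candidate sums.
    Returns (MS, MS8).
    """
    ms = m1 + d if m1 + d > m2 - d else m2 - d
    ms8 = m1 - d if m1 - d > m2 + d else m2 + d
    return ms, ms8


if __name__ == "__main__":
    print(acs(10, 4, 2))  # first branch selected for both outputs
    print(acs(1, 5, 1))   # second branch selected for both outputs
```

On a plain ALU this takes four additions, two comparisons and two selections per call; folding them into one application-specific unit is what removes most of the ALU load and the associated control.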

Page 35: Processor Architectures and Program Mapping

GSM viterbi decoder : 4 x ACS-ASU

425 (cycle count)

EXU | ACTIV | AREA | POWER
alu_1 | 94% | 243 | 97
acs_asu_1 | 95% | 1041 | 420
acs_asu_2 | 95% | 1041 | 420
acs_asu_3 | 95% | 1041 | 420
acs_asu_4 | 95% | 1041 | 420
split_asu_1 | 47% | 90 | 18
or_asu_1 | 47% | 592 | 118
romctrl_1 | 28% | 48 | 6
acu_1 | 98% | 212 | 85
ipb_1 | 23% | 60 | 6
opb_1 | 50% | 369 | 80
ctrl | | 1306 | 555
total | | 7084 | 2645

• cycle count down another 5X
• area up 23%
• power down another 3X !

Page 36: Processor Architectures and Program Mapping

GSM viterbi example : summary

[Figure: bar chart of power, area and cycles (axis 0 to 20000) for the design points default, loop, 2 ALU, 1 ACS and 4 ACS, annotated "72x !". Also shown: the implementation independent design database and Mistral2.]
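The improvement factors quoted across the Viterbi slides can be recomputed from the "total" rows of the estimation tables. The Python sketch below does so; it assumes that the standalone number on each slide (13750, 14247, 9739, 1930, 425) is that design's cycle count, which is my reading of the slides, and the names `DESIGNS` and `gain` are hypothetical.

```python
# (name, cycles, total area, total power) taken from the five slides.
# Cycle counts are the standalone numbers on each slide (an assumption).
DESIGNS = [
    ("default", 13750, 15591, 188605),
    ("no loop-folding", 14247, 10431, 121928),
    ("2 ALUs", 9739, 14766, 116731),
    ("1 x ACS-ASU", 1930, 5747, 7944),
    ("4 x ACS-ASU", 425, 7084, 2645),
]


def gain(before, after):
    """Improvement factor: how many times smaller 'after' is than 'before'."""
    return before / after


first, last = DESIGNS[0], DESIGNS[-1]
cycle_gain = gain(first[1], last[1])
area_gain = gain(first[2], last[2])
power_gain = gain(first[3], last[3])

if __name__ == "__main__":
    print(f"cycles: {cycle_gain:.1f}x, area: {area_gain:.1f}x, power: {power_gain:.1f}x")
    # Step-by-step power reduction between consecutive design points.
    for (prev_name, _, _, prev_p), (name, _, _, p) in zip(DESIGNS, DESIGNS[1:]):
        print(f"{prev_name} -> {name}: power {gain(prev_p, p):.2f}x lower")
```

With these totals the power drops by a factor of roughly 71 from the default solution to the design with four ACS units, in line with the "72x" annotation on the summary chart.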

Page 37: Processor Architectures and Program Mapping

Discussion: phase 3

Application software development: constraint driven compilation

[Figure: the exploration phase loop (application(s), processor model, SW (code generation), HW design, decisions "OK?" and "more appl.?"), then "freeze processor model", then a second loop of application(s), SW (code generation) and "OK?".]

Page 38: Processor Architectures and Program Mapping

Discussion: problems with VLIWs

code size and instruction bandwidth

• code compaction = reduce code size after scheduling
  - possible compaction ratio?
  - e.g. p0 = 0.9 and p1 = 0.1: information content (entropy) = $-\sum_i p_i \log_2 p_i = 0.47$
  - maximum compression factor 2
• control parallelism during scheduling = switch between different processor models (10% of code = 90% runtime)
• architecture
  - reduce number of control bits for operand addresses
  - e.g. 128 reg (TM) -> 28 bits/issue slot for addresses only
  - => use stacks and fifos
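Both numbers on this slide follow from short calculations, sketched below in Python. The entropy part uses only the probabilities given on the slide. For the address bits, the slide gives 128 registers and 28 bits per issue slot; the assumption that each issue slot carries four register addresses is mine, chosen because 4 x 7 = 28, and the names used are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy in bits per symbol: -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Instruction bits that are 0 with probability 0.9 and 1 with probability 0.1.
h = entropy([0.9, 0.1])
max_compression = 1 / h  # each stored bit carries only h bits of information

# Address bits per issue slot for a 128-entry register file.
registers = 128
bits_per_address = (registers - 1).bit_length()  # bits needed to index 0..127
addresses_per_slot = 4  # assumption: four register addresses per issue slot
address_bits = addresses_per_slot * bits_per_address

if __name__ == "__main__":
    print(f"entropy = {h:.2f} bit, maximum compression factor = {max_compression:.1f}")
    print(f"{bits_per_address} bits per address, {address_bits} bits per issue slot")
```

The entropy of about 0.47 bit per instruction bit bounds the compaction ratio at slightly above 2, which is why the slide also looks at the architecture (stacks and fifos need no explicit operand addresses) rather than at compression alone.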

Page 39: Processor Architectures and Program Mapping

[Figure: VLIW with functional units FU1 to FU4, register files RF1 to RF4, instruction registers IR1 to IR4, instruction memory, control and flags.]

Page 40: Processor Architectures and Program Mapping

Discussion: clustered VLIW architectures

[Figure: clustered VLIW with functional units FU1 to FU4 and register files RF1 to RF4.]

Page 41: Processor Architectures and Program Mapping

Conclusions

• ASIPs provide efficient solutions for well-defined application domains (2 orders of magnitude higher efficiency).

• The methodology is interesting for IP creation.

• The key problem is retargetable compilation.

• A (distributed) VLIW model is a good compromise between HW and SW.

• Although an automatic process can generate a default solution, the process usually is interactive and iterative for efficiency reasons. The key is fast and accurate feedback.

Page 42: Processor Architectures and Program Mapping

Imagine assignment

• For the coming 3 weeks:
  - Install the tools (VisualC package will be sent by mail)
  - Read the beginners’ guide
  - Experiment with the compiler on a few examples
• http://www.ics.ele.tue.nl/~hfatemi/5kk10/
• Further information on Imagine:
  - www.cva.stanford.edu/projects/imagine/