A Low-Power Wave Union TDC Implemented in FPGA
Wu, Jinyuan
Fermilab
Yanchen Shi and Douglas Zhu
Illinois Mathematics and Science Academy
Sept. 2011
Sept. 2011, Wu Jinyuan, Fermilab [email protected]
Performance Degrading in CPU/GPU, ASIC & FPGA
Imperfect designs considerably degrade the performance of ICs, including CPUs/GPUs. ASIC devices are built using older technology and suffer similar design degradation. The FPGA internal structure causes extra performance degradation in addition to design degradation. Design modification in an FPGA is easier, so design degradation can be minimized.
[Figure: performance of CPU/GPU, ASIC, and FPGA. CPU/GPU: degradation due to design, below the theoretical limit of current technology. ASIC: degradation due to design, below the theoretical limit of older technology. FPGA: degradation due to structure plus degradation due to design.]
A carefully designed FPGA may have better performance than a typical ASIC.
Introduction
A 32-channel Wave Union TDC firmware has been implemented in an Altera Cyclone III FPGA device (EP3C25F324C6N, $73.90) and has been tested on a Cyclone III evaluation card.
Low-power design practices have been applied for applications in vacuum.
The time measurement function was tested on 16 channels; the typical delta t RMS resolution between two channels is 25-30 ps.
Power consumption, measured with 32 channels, is ~27 mW/channel.
The Wave Union TDC Implemented in FPGA
TDC Using FPGA Logic Chain Delay
This scheme uses current FPGA technology.
Low-cost chip families can be used (e.g., EP2C8T144C6, $31.68).
Fine TDC precision can be achieved in slow devices (e.g., 20 ps in a 400 MHz chip).
[Figure: logic chain delay TDC with signals IN and CLK.]
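The idea of a logic chain delay TDC can be sketched in a few lines of Python. This is only an illustrative model, not the firmware: the helper names `sample_chain` and `thermometer_to_bin` and the uniform 60 ps cell delay are assumptions. The input step travels along the chain, the registers capture how far it got at the clock edge, and the number of cells passed is the raw bin number.

```python
from itertools import accumulate

def sample_chain(arrival_ps, cell_delays_ps):
    """Return the register pattern captured at the clock edge.

    arrival_ps: how long before the clock edge the input step entered the chain.
    A tap reads 1 if the step has already propagated through that cell.
    """
    reach_times = accumulate(cell_delays_ps)  # time needed to reach each tap
    return [1 if t <= arrival_ps else 0 for t in reach_times]

def thermometer_to_bin(pattern):
    """Count the leading 1s, i.e. the number of cells the step has passed."""
    count = 0
    for bit in pattern:
        if bit == 0:
            break
        count += 1
    return count

cells = [60] * 8                      # hypothetical uniform 60 ps cells
pattern = sample_chain(200, cells)    # step arrived 200 ps before the clock edge
print(pattern, thermometer_to_bin(pattern))
```

In a real device the cell delays are not uniform, which is the subject of the following slides.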
Two Major Issues In a Free Operating FPGA
[Figure: bin width (ps), 0 to 180, versus bin number, 0 to 64.]
1. The widths of the bins differ and vary with supply voltage and temperature.
2. Some bins are ultra-wide due to LAB boundary crossing.
Auto Calibration Using Histogram Method
It provides a bin-by-bin calibration at a certain temperature.
It is a turn-key solution (bin in, ps out).
It is semi-continuous (the LUT is updated automatically every 16K events).
[Figure: bin width (ps) versus bin number and calibrated time (ps), 0 to 2500, versus bin number, 0 to 64. Block diagram labels: In (bin), DNL Histogram, S, LUT, Out (ps), 16K Events.]
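The histogram method can be sketched as follows. This is a software illustration under stated assumptions, not the FPGA implementation: the function name `build_lut`, the choice of storing the bin center in the LUT, and the example counts are all hypothetical. The underlying reasoning is that hits uncorrelated with the clock populate each bin in proportion to its width, so a running sum of the histogram gives the time of each bin.

```python
from itertools import accumulate

def build_lut(hist, clock_period_ps):
    """Convert a hit-count histogram into a bin -> time (ps) lookup table.

    With hits spread uniformly over the clock period, each bin's count is
    proportional to its width. The LUT stores the time at the bin center.
    """
    total = sum(hist)
    widths = [clock_period_ps * n / total for n in hist]
    upper_edges = list(accumulate(widths))       # running sum of the widths
    return [edge - w / 2 for edge, w in zip(upper_edges, widths)]

# Hypothetical 4-bin example: bins 1 and 3 are three times wider than bins 0 and 2.
hist = [1000, 3000, 1000, 3000]
lut = build_lut(hist, 4000)
print(lut)   # bin number in, picoseconds out
```

Rebuilding the table after each new batch of events (16K in the slide) lets the calibration follow slow temperature and voltage drifts.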
Good, However
Auto calibration solves some problems. However, it won't eliminate the ultra-wide bins.
[Figure: bin width (ps), 0 to 180, versus bin number, 0 to 64.]
Wave Union Launcher A
[Figure: Wave Union Launcher A with inputs In and CLK. Control: 1 = Unleash, 0 = Hold.]
A regular TDC records only one transition.
A Wave Union TDC records multiple transitions.
Wave Union Launcher A: 2 Measurements/hit
[Figure: Wave Union Launcher A timing diagram; 1: Unleash.]
Sub-dividing Ultra-wide Bins
Device: EP2C8T144C6
Plain TDC: Max. bin width: 160 ps. Average bin width: 60 ps.
Wave Union TDC A: Max. bin width: 65 ps. Average bin width: 30 ps.
[Figure: timing diagram (1: Unleash) with transitions 1 and 2; bin width (ps), 0 to 180, versus bin number, 0 to 128, for Plain TDC and Wave Union TDC A.]
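Why two measurements per hit sub-divide an ultra-wide bin can be shown with a simplified model. The assumptions here are loud ones: the function names, the bin widths, and the 80 ps spacing between the two transitions are made up for illustration, and the model ignores everything except bin edges. Each transition confines the hit time to one bin; the second transition sees the same edges shifted by its offset, so the pair confines the hit time to the overlap of two bins.

```python
from itertools import accumulate

def edges_from_widths(widths_ps):
    """Cumulative bin edges, starting at 0."""
    return [0] + list(accumulate(widths_ps))

def union_bin_widths(widths_ps, offset_ps):
    """Effective bin widths when each hit is measured twice.

    The second transition is assumed to be measured at t + offset_ps, so its
    bin edges appear shifted by -offset_ps on the hit-time axis. A hit is
    resolved to the interval between adjacent edges of the merged set.
    """
    edges = edges_from_widths(widths_ps)
    span = edges[-1] - offset_ps   # hit times for which both transitions are on the chain
    shifted = [e - offset_ps for e in edges]
    merged = sorted({e for e in edges + shifted if 0 <= e <= span})
    return [b - a for a, b in zip(merged, merged[1:])]

plain = [50, 50, 160, 50, 50, 50]     # hypothetical chain with one ultra-wide bin
combined = union_bin_widths(plain, 80)
print(max(plain), max(combined))
```

In this toy case the widest effective bin shrinks because an edge seen by the second transition falls inside the ultra-wide bin seen by the first.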
Low-power Design Practices
Low-Power Design Practice: Wave Union
Intrinsically, the Wave Union TDC is a low-power scheme.
Multiple measurements are made with one set of delay line, registers, encoder, etc., yielding a finer resolution that would otherwise need several regular TDC blocks to achieve.
Low-Power Design Practice: Clock Speed
The Sampling Register Arrays are clocked at 250 MHz. All other stages are clocked at 62.5 MHz.
When a valid hit is sampled, the Sampling Register Array is disabled so that the registered pattern is stable for 64 ns.
The Data Load/Transfer Registers are enabled to load input every 64 ns, so that a valid hit is guaranteed to be loaded once and only once.
[Figure: block diagram with IN0, Delay Line & Sampling Register Array (CK250, 250 MHz domain), Clock Disable, Data Load/Transfer Register (CK62, Load), Sequencer, Encoder, and Buffer w/ Zero Suppression (62.5 MHz domain).]
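The clock numbers above fit together as simple arithmetic, worked out here as a quick check. The constant and function names are illustrative only; the frequencies and the 64 ns hold time are from the slide.

```python
FAST_CLOCK_MHZ = 250.0   # sampling register arrays
SLOW_CLOCK_MHZ = 62.5    # all other stages
HOLD_NS = 64.0           # time the sampled pattern is held stable

def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz

fast_cycles = HOLD_NS / period_ns(FAST_CLOCK_MHZ)   # fast-clock cycles per hold
slow_cycles = HOLD_NS / period_ns(SLOW_CLOCK_MHZ)   # slow-clock cycles per hold
print(period_ns(FAST_CLOCK_MHZ), period_ns(SLOW_CLOCK_MHZ), fast_cycles, slow_cycles)
```

The hold time spans 16 cycles of the 250 MHz clock and 4 cycles of the 62.5 MHz clock, so only the sampling stage needs to toggle at the fast rate.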
Low-Power Design Practice: Resource Sharing
The Data Load/Transfer Registers are enabled to load input every 64 ns (i.e., 4 clock cycles at 62.5 MHz).
The Data Load/Transfer Registers transfer data from other channels when they are not enabled to load.
Four channels share an Encoder and a Buffer with Zero Suppression.
[Figure: block diagram with inputs IN0 to IN3, Delay Line & Sampling Register Array (CK250), Clock Disable, Data Load/Transfer Register (CK62, Load), Sequencer, Encoder, Buffer w/ Zero Suppression, and Data Merging Register.]
Block Diagram of 16 Channels
The hit time for each of the 16 channel inputs is digitized and encoded.
Data from 4 channels are buffered, and data from 4 groups of 4 channels are merged together.
Raw hit times are converted to fine times by the automatic calibration block.
Data from all 16 channels are buffered and sent out via 4 pairs of LVDS ports at 250 Mbit/s.
[Figure: block diagram: TDC + Encoder, Data Buffer + Concentration, Automatic Calibration, Output Buffer, 8B10B + Serialization.]
Test Results
The Test Hardware
2008
Altera Cyclone II + VME (~$1k)
FPGA: EP2C8T144C6 ($28.80)
16 channel: 25 ps
2 channel: 10 ps
81 mW/channel
Ref: Search “Wave Union TDC”
2011
Altera Cyclone III Starter Kit ($211+$50)
FPGA: EP3C25F324C6N ($73.90)
32 channel: 30 ps (25 ps with linear power supply)
27 mW/channel
www.altera.com
Test Setup
[Figure: test setup showing the NIM to LVDS Converter and the TDC Module.]
Output Raw Data and Typical Delta T Histogram Between Two Channels
RMS of this histogram is 25 ps.
00003CC064A6F064B8C07CA4F07CB4C094A0F094B0C0AC9CF0ACACC0C497F0C4A8C0DC91F0DCA2
Delta T Between NIM Inputs
TDC channels internally ganged together have the smallest standard deviation of time differences.
Typical channel pairs sharing the same fan-out unit have 30 ps RMS.
Timing jitter of the fan-out units adds to the measurement errors.
[Figure: sigma (ps), 0 to 60, versus dt (ps), 10 to 10000, for 8 ns, 6 ns, 4 ns and 2 ns; histograms HistA, HistB and HistC versus dt (ps), 1500 to 2500. Setup diagram: Pulse Gen., two LeCroy 429A NIM Fan-out units, two NIM to LVDS converters, FPGA with eight TDC channels; channel pairs labeled A, B, C.]
Time Measurement Errors Due to Power Supply Noise
Typical RMS resolution is 25-30 ps. Measurements with cleaner power (diamonds) are better than those with noisy power (squares).
[Figure: time measurement errors with a Switching Power Supply and with a Linear Power Supply.]
Specifications
RMS Resolution (Delta T between two channels): 25 to 30 ps
Same-channel re-hit time interval: 64 ns
Temporary buffer capacity: 128 hits/(4 ch)/(16 us)
LVDS output port rate: 250 Mbit/s/port
Output capacity in each LVDS output port: 128 hits/(16 ch)/(16 us)
Number of LVDS output ports: 1, 2, 3, 4/(16 ch)
Power Consumption (Core only): 9.3 mW/channel
Power Consumption (Total): 27 mW/channel
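A few derived figures follow directly from the specifications above. The short script below only does the arithmetic; the variable names are illustrative, and the 32-channel count is taken from the introduction rather than from this table.

```python
CHANNELS = 32              # channel count stated in the introduction
TOTAL_MW_PER_CH = 27.0     # total power per channel
CORE_MW_PER_CH = 9.3       # core-only power per channel
REHIT_NS = 64.0            # same-channel re-hit interval
HITS_PER_WINDOW = 128      # output capacity per LVDS port
WINDOW_US = 16.0           # window length for that capacity

total_power_mw = CHANNELS * TOTAL_MW_PER_CH        # all 32 channels together
core_power_mw = CHANNELS * CORE_MW_PER_CH
max_rehit_rate_mhz = 1000.0 / REHIT_NS             # per channel, back-to-back hits
port_hits_per_us = HITS_PER_WINDOW / WINDOW_US     # sustained hits per port

print(total_power_mw, core_power_mw, max_rehit_rate_mhz, port_hits_per_us)
```

So the 32 channels together draw under 1 W in total, a single channel can accept a new hit at most every 64 ns (about 15.6 MHz), and each output port sustains 8 hits per microsecond shared among 16 channels.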
Other Applications: Single Slope ADC
[Figure: single slope ADC. Plot: V, 0 to 2.5, versus t (ns), 0 to 256, with traces Vc+, Vc-, Vin+/2 and Vin-/2. Schematic labels: FPGA with two TDC inputs, R, RC, R1, 4xR2, VREF+, VREF-, VIN1+, VIN1-, VIN2+, VIN2-.]
If You Want to Try
The FPGA on the Starter Kit is fairly powerful.
More than 16 pairs of LVDS I/O can be accessed via the daughter card.
The FPGA can fit 32 channels, but implementing 16 channels is more practical given the I/O pairs.
TDC data are stored in the RAM on the board and can be read out via USB.
It is a good solution for small experiment systems as well as student labs.
DK-START-3C25N Cyclone III FPGA Starter Kit, $211 (www.altera.com)
THDB-H2G (HSMC to GPIO Daughter Board), $50 (www.altera.com)
The End
Thanks
Timing Uncertainty Confinement
Historical Implementation in ASIC TDC
[Figure: block diagram labels: DLL Clock Chain (taps c0, c1, ...), HIT, Encoder, Coarse Time Counter, Coarse Time Register, Coarse Time Selection Logic.]
HIT is used as the CK input, which creates unnecessary challenges:
Deadtime is unavoidable.
Coarse time recording needs special care.
Two array + encoder sets are needed for the rising edge and the falling edge.
The register array must be reset for the next event.
The encoder must be re-synchronized with the system clock in order to interface with the readout stage.
Unnecessary Challenges = Extra Efforts + Reduced Performance
Unnecessary Challenges
Historically, Gray code counters, double counters, and dual registers + MUX have been used in ASIC TDC coarse time counter schemes.
These are unnecessary if the TDC is designed appropriately. In an FPGA, a plain binary counter is sufficient.
[Figure: coarse time counter schemes, including a Gray Code Counter with the sequence 000, 001, 011, 010, 110, 111, 101, 100. Label: Unnecessary for FPGA TDC.]
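For reference, the 3-bit Gray code sequence shown on the slide can be generated with the standard binary-to-Gray conversion. The function name is illustrative. Adjacent codes differ in exactly one bit, which is why such counters were used when the counter had to be sampled by an asynchronous HIT signal.

```python
def binary_to_gray(n):
    """Standard reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)

sequence = [format(binary_to_gray(i), "03b") for i in range(8)]
print(" ".join(sequence))
# 000 001 011 010 110 111 101 100

# Consecutive codes (including the wrap-around) differ in exactly one bit.
for i in range(8):
    diff = binary_to_gray(i) ^ binary_to_gray((i + 1) % 8)
    assert bin(diff).count("1") == 1
```

When all logic after the sampling register array runs on CLK, as in the scheme that follows, the counter is read synchronously and this property is not needed.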
A Better Implementation
HIT is used as the D input.
Deadtimeless operation is possible.
No special care is needed for coarse time.
Both rising and falling edges are digitized with a single array + encoder set.
No resetting is needed for the register array.
The output is synchronized with the system clock and is ready to interface with the readout stage.
[Figure: block diagram labels: DLL Clock Chain, HIT, Multi-Sampling Register Array, OR + Register, Clock Domain Transfer, two 16-bit Encoders with Registered Outputs, outputs DV, EG, T4..T0, Coarse Time Counter, TC.]
Coarse Time Counter
The timing uncertainty between HIT and CLK is confined to the sampling register array.
All the remaining logic is driven by the CLK signal.
No special care, such as a Gray code counter, is needed for the coarse time counter.
[Figure: block diagram labels: HIT, CLK, Hit Detect Logic, ENA, Coarse Time Counter, Fine Time Encoder, outputs Fine Time, Coarse Time, Data Valid.]
Comparison
Historical Scheme: HIT -> CK; (c0..c31) -> D
Preferable Scheme: HIT -> D; (c0..c31) -> CK

Historical: Deadtime is unavoidable.
Preferable: Deadtimeless operation is possible.

Historical: Coarse time recording needs special care.
Preferable: No special care is needed for coarse time.

Historical: Two array + encoder sets are needed for the rising edge and the falling edge.
Preferable: Both rising and falling edges are digitized with a single array + encoder set.

Historical: The register array must be reset for the next event.
Preferable: No resetting is needed for the register array.

Historical: The encoder must be re-synchronized with the system clock in order to interface with the readout stage.
Preferable: The output is synchronized with the system clock and is ready to interface with the readout stage.
More Measurements
Two measurements are better than one. Let’s try 16 measurements?
Wave Union Launcher B: 16 Measurements/hit
1 hit, 16 measurements at 400 MHz.
[Figure: measurements at VCCINT = 1.20 V and VCCINT = 1.18 V.]
Delay Correction
[Figure: T0 (ps), 0 to 3000, versus m, 0 to 16; TDC (bin), 16 to 64, versus m, 0 to 16.]
Delay Correction Process:
Raw hits TN(m) in bins are first calibrated into TM(m) in picoseconds.
Jumps are compensated for in the FPGA so that TM(m) becomes T0(m), which has the same value for each hit.
Take the average of T0(m) to get better resolution:
$$t_{0av} = \frac{1}{16} \sum_{m=0}^{15} t_0(m)$$
The raw data contains:
U-Type Jumps: [48-63][16-31]
V-Type Jumps: other small jumps.
W-Type Jumps: [16-31][48-63]
The processes are all done in the FPGA.
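The final averaging step can be written out as a short sketch. It is a software illustration only (the slide states that the real processing is done in the FPGA), and the function name and sample values are hypothetical. It assumes the jump compensation has already been applied, so all 16 values estimate the same hit time.

```python
def average_t0(t0_ps):
    """Average the 16 jump-compensated measurements T0(m) of one hit."""
    if len(t0_ps) != 16:
        raise ValueError("expected 16 measurements per hit")
    return sum(t0_ps) / 16

# Hypothetical corrected measurements scattered around a common value (ps).
t0 = [1000 + m for m in range(16)]
print(average_t0(t0))
```

Dividing by 16 is a 4-bit right shift in hardware, which is one reason a power-of-two number of measurements is convenient.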
Test Result
RMS: 10 ps.
BNC adapters are used to add delays in 140 ps steps.
[Figure: setup diagram with NIM Inputs, LeCroy 429A NIM Fan-out, two NIM/LVDS converters, and eight Wave Union TDC B channels; histogram with peaks labeled 0, 1, 2 spaced by 140 ps.]
A Preferable Scheme
[Figure: block diagram labels: DLL Clock Chain (taps c0, c1, c16, c17), HIT, Multi-Sampling Register Array, Clock Domain Transfer, 32-bit Encoder with Registered Outputs, Coarse Time Counter, TC, outputs DV, EG, T4..T0.]
EG: Edge, =1: Rising or =0: Falling.
T4..T0: Time.
DV: Data Valid, =1: Valid edge detected. It is used as the PUSH signal for a FIFO or the Write Enable for other memory buffers.
Minimum setup time between the multi-sampling register array stage and the clock domain transfer stage: 17 clock taps.
Setup time between the clock domain transfer stage and the encoder register: 32 or 16 clock taps.
All outputs including TC are aligned with c0.
Supports both rising and falling edges.
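The meaning of the DV, EG and T4..T0 outputs can be illustrated with a behavioral sketch of an edge encoder. This is an assumption-laden model, not the slide's circuit: the function name, the `previous_level` argument (the HIT level seen at the end of the previous clock period), and the convention that tap index increases with time are all hypothetical.

```python
def encode_edge(samples, previous_level):
    """Return (DV, EG, T) for one clock period of 32 HIT samples.

    DV: 1 if a transition is found, else 0.
    EG: 1 for a rising edge (0 -> 1), 0 for a falling edge (1 -> 0).
    T:  index 0..31 of the first tap that shows the new level.
    Only the first transition in the period is reported.
    """
    for tap, bit in enumerate(samples):
        if bit != previous_level:
            return 1, bit, tap
    return 0, 0, 0

rising = [0] * 12 + [1] * 20     # HIT goes high at tap 12
falling = [1] * 5 + [0] * 27     # HIT goes low at tap 5
idle = [0] * 32                  # no transition in this period

print(encode_edge(rising, 0))
print(encode_edge(falling, 1))
print(encode_edge(idle, 0))
```

Because the same sampled pattern reveals both polarities, a single array + encoder set covers rising and falling edges, and DV can directly gate the write into a FIFO.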
Low-Power Design Practice: Clock Speed
The Sampling Register Arrays are clocked at 250 MHz.
All other stages are clocked at 62.5 MHz.
When a valid hit is sampled, the Sampling Register Array is disabled for 64 ns.
The Data Load/Transfer Registers are enabled to load input every 64 ns, so that a valid hit is guaranteed to be loaded once and only once.
The Data Load/Transfer Registers transfer data from other channels when they are not enabled to load.
Four channels share an Encoder and a Buffer with Zero Suppression.
[Figure: block diagram with inputs IN0 to IN3, Delay Line & Sampling Register Array (CK250), Clock Disable, Data Load/Transfer Register (CK62, Load), Sequencer, Encoder, and Buffer w/ Zero Suppression.]
Test Setup
[Figure: sigma (ps), 0 to 60, versus dt (ps), 10 to 10000, for 8 ns, 6 ns, 4 ns and 2 ns.]
Test Setup
[Figure: histograms HistA, HistB and HistC versus dt (ps), 1500 to 2500.]
Wave Union Launcher B
[Figure: Wave Union Launcher B with inputs In and CLK. Control: 1 = Oscillate, 0 = Hold.]
Cell Delay-Based TDC + Wave Union Launcher
[Figure: Wave Union Launcher with inputs In and CLK.]
The wave union launcher creates multiple logic transitions after receiving an input logic step.
Wave union launchers can be classified into two types:
Finite Step Response (FSR)
Infinite Step Response (ISR)
This is similar to the classification of filters or other linear systems:
Finite Impulse Response (FIR)
Infinite Impulse Response (IIR)
Wave Union?
Photograph: Qi Ji, 2010