
CALIFORNIA STATE UNIVERSITY, NORTHRIDGE

PCIE CONFIGURATION FOR DATA TRANSFER AT RATE OF 2.5-GIGA BYTES PER

SECOND (GBPS): FOR DATA ACQUISITION SYSTEM

A graduate project submitted in partial fulfillment of the requirements

For the degree of Master of Science in

Electrical Engineering

By

Avani Dave

May 2013


The graduate project of Avani Dave is approved:

Dr. Emad El Wakil Date

Dr. Somnath Chattopadhyay Date

Dr. Nagi El Naga, Chair Date

California State University, Northridge


ACKNOWLEDGEMENT

I wish to express my appreciation to those who have served on my graduate project committee.

First, I would like to thank Dr. Nagi El Naga for his valuable advice and guidance throughout my project. His constant support and encouragement have helped me learn a great deal throughout the project. He was instrumental in providing not only all the guidance but also the inspiration that I needed.

I also wish to express special appreciation to Mr. Emil Henry and Mr. Armando Tellez of the Information and Technology Department for their help and support with the driver installations.

Special thanks to Dr. Somnath Chattopadhyay and Dr. Emad El Wakil for their valuable comments on my work.

I also wish to express special appreciation and thanks to the best and brightest engineers of Xilinx technical support, who helped me throughout my project. Their input, comments, and guidance helped immensely in learning the finer details of the subject, grasping new ways of learning concepts, and successfully completing this project.

Finally, the patience and support of my parents, family, and friends have been enormously important to me while I have been engaged in this graduate project. Thanks to all of you.


TABLE OF CONTENTS

SIGNATURE PAGE

ACKNOWLEDGEMENTS

LIST OF FIGURES

ABSTRACT

CHAPTER 1 INTRODUCTION

1.1 Basic concept of data acquisition system

1.2 Objective

1.3 Project Outline

CHAPTER 2 DATA ACQUISITION SYSTEM FOR 32 X 32 PHOTO DETECTOR ARRAY

2.1 Top Level Architecture

2.2 Detailed Design Implementation

2.3 Specifications and Features

CHAPTER 3 INTRODUCTION TO PCIE

3.1 Background of Computer Bus Systems

3.2 Why to choose PCIE

3.3 PCIE Basics

3.3.1 Working Principle

3.3.2 Differential Signaling

3.3.3 Lanes

3.3.4 Transmission Rate

3.4 LogiCore PCIE Block

3.5 Protocol Layer

CHAPTER 4 IMPLEMENTATION OF PCIE

4.1 Hardware Setup

4.2 Driver's Installation

4.3 Software Installation

4.4 Logic Core PCIE Generation

4.5 Programming ML605

CHAPTER 5 TESTING AND VERIFICATION

5.1 PCIE Functional Testing

5.2 Simulation

CHAPTER 6 CONCLUSION AND FUTURE SCOPE

REFERENCES


LIST OF FIGURES

Figure 1.1 Block diagram of Data Acquisition System

Figure 2.1 Top Level Design of Data Acquisition System

Figure 2.2 Detailed Design Implementation of data Acquisition system

Figure 2.3 Pulse Data Transfer

Figure 3.1 Various buses connected to the CPU

Figure 3.2 Comparison of technology and data transfer rate

Figure 3.3.1 PCIE and motherboard socket connector

Figure 3.3.2 A Differential signal pair and subtractor

Figure 3.3.2 B Signal pulse and noise subtraction

Figure 3.3.3 A PCIE x1 four-wire lane configuration

Figure 3.3.3 B PCIE lane connectors with speed grade

Figure 3.4 Functional Block Diagram and interfaces for logiCORE IP

Figure 3.5 Protocol Layers

Figure 4.1 PCIE x8 slot connected to ML605 (Lab server)

Figure 4.2 Driver's installation and com-port opening

Figure 4.3 A Tera Term connections

Figure 4.3 B DIP switch S1 setting (1000)

Figure 4.3 C Compact Flash Card Insertion

Figure 4.3 D Built In System Test menu (BIST)

Figure 4.4.1 A Core Generator-New Project

Figure 4.4.1 B Select Device Parameter

Figure 4.4.1 C Customizing PCIE Core

Figure 4.4.1 D Clock Parameter Setting

Figure 4.4.1 F Base Address Register (BAR) setting

Figure 4.4.1 G Vendor ID Setting

Figure 4.4.1 H Xilinx development Board ML605 selection

Figure 4.4.1 I Reference Clock Frequency Select

Figure 4.4.1 J PCIE core generate

Figure 4.4.1 K Project IP

Figure 4.4.1 L Script to Generate Core

Figure 4.4.1 M Script to Make routed.bit

Figure 4.5 A S1 and S2 switch settings

Figure 4.5 B iMPACT

Figure 4.5 C PROM

Figure 4.5 D Setting PROM Parameters

Figure 4.5 E Setting PROM Parameters

Figure 4.5 F Setting PROM Parameters

Figure 4.5 G Loading .bit file

Figure 4.5 H BPI settings

Figure 4.5 I Generate File

Figure 4.5 J Boundary scans

Figure 4.5 K Initialize chain

Figure 4.5 L Select Device xc6vlx240t

Figure 4.5 M SPI/BPI PROM setting


Figure 4.5 N Programming Flash

Figure 5.1 A PCI TREE

Figure 5.1 B Intel's x8086 PCIE bridge device

Figure 5.1 C Xilinx's ML605 device

Figure 5.1 D Configuration registers

Figure 5.1 E Configuration registers editing

Figure 5.1 F Configuration Register Read

Figure 5.1 G Memory read (BAR)

Figure 5.1 H Memory read (Content)

Figure 5.1 I Memory write

Figure 5.1 J Memory write using file

Figure 5.1 K Memory write using count value

Figure 5.1 L Image Transfer

Figure 5.2 A Design compiling

Figure 5.2 B Detailed Analysis Report

Figure 5.2 C Configuration Register Read

Figure 5.2 D BAR Data Read

Figure 5.2 E Data TLP

Figure 5.2 F Physical Layer Memory Read Packet

Figure 5.2 G Physical Layer Data TLP


ABSTRACT

PCIE CONFIGURATION FOR DATA TRANSFER AT RATE OF 2.5-GIGA BYTES PER

SECOND (GBPS): FOR DATA ACQUISITION SYSTEM

By

Avani Dave

Master of Science in Electrical Engineering

In the modern era, most consumer and industrial electronic devices use high-speed data transfer, which can be achieved with protocols such as I2C, SPI, AGP, PCI, and PCIE. A 32 x 32 photo detector data acquisition system is a group project that needs high-speed data transfer at an approximate rate of 2.5 Giga Bytes per Second (GBPS). This project is part of that group project. It covers the research work of selecting the right protocol and the hardware and software required for such high-speed data transfer, followed by the detailed configuration and implementation of the Xilinx Virtex-6 FPGA Integrated Block for PCI Express v1.3 in an x8 Gen 1 configuration. The PCIE core was also tested and verified using Xilinx ChipScope Pro, ISE simulation, the DMA performance demo application XAPP1052, and the third-party tool PCITREE. The core is generated and simulated in Verilog. Using the XAPP1052 DMA performance demo, the PCIE core achieved 2.3 GTPS for reads and 2.8 GTPS for writes. After the EZDMA module and main memory mapping are completed, this module and the instantiated core can be used at five different levels of the main project for data transfer at a rate of ~2.5 GTPS.


CHAPTER 1

INTRODUCTION

High-speed, controlled data processing and communication are what today's users demand. In simple terms, the process of collecting, processing, controlling, and transferring data together is called a "Data Acquisition System". Speed is essential when transferring data in any electronic device today, and users also want more control over their data at the fastest possible communication speed. Many high-speed data transfer protocols are available, such as Fibre Channel, Peripheral Component Interconnect (PCI), Peripheral Component Interconnect Express (PCIE), Accelerated Graphics Port (AGP), Serial Peripheral Interface (SPI), Universal Serial Bus (USB), wireless, and Ethernet. The user can select the communication protocol based on the application and the required data transfer speed. Selecting the right protocol is one of the essential building blocks of high-speed data transfer, and consequently one of the major requirements of any data acquisition system.

Our group project is a 32 x 32 data acquisition system, which needs high-speed data transfer at an approximate rate of 2.5 Giga Bytes per second (GBPS). By taking part in this group project, we had the chance to learn the technologies available in the market for high-speed data transfer and to make a practical implementation of them work for our project. We also had the opportunity to apply the digital fundamentals, programming knowledge, and problem-solving skills that we gained during our master's coursework in engineering at California State University, Northridge.

1.1 Basic concept of data acquisition system

A data acquisition system performs three main functions. The first is to collect data. The data can be man-made or naturally occurring, such as temperature, humidity, wind speed, or an image, and it is usually analog in nature. Transducers or sensors are used to obtain the data in the form of electrical signals. The second function is to convert the data into an appropriate format. Digital data has better immunity to noise and is easier to store, reproduce, and process at high data transfer speeds, so the analog electrical signal needs to be converted into digital form. This is done by analog-to-digital converters (ADCs) of different resolutions and speed grades. The third function is to control, process, and transfer the data. In most cases, data control and processing are done by personal computers, servers, or controller units. Data transfer is done by various methods or protocols, such as I2C, SPI, USB, Wi-Fi, Ethernet, PCI, and PCI Express (PCIE).

Figure 1.1 Block diagram of Data Acquisition System (transducer, ADC, and data processing on a server)

Figure 1.1 is a simplified block diagram of a data acquisition system. Sensors or transducers convert physical phenomena into electrical signals, i.e. voltage or current, or into some electrical property such as resistance or capacitance. The electrical signal is then converted to digital form by an Analog to Digital Converter (ADC), after which it can be further processed and transferred at high speed. The digitized multi-bit data is stored in memory or passed to a control and processing unit such as a computer.
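To make the conversion step concrete, the following is a minimal sketch of an ideal N-bit ADC. It assumes a unipolar input range from 0 V up to a reference voltage `v_ref` and simple truncation; the function name `ideal_adc` and the 1.0-V reference are hypothetical, and a real converter such as the ADS5281 used later in this project has its own input range, offset, and output coding.

```python
def ideal_adc(voltage: float, bits: int = 12, v_ref: float = 1.0) -> int:
    """Quantize a voltage into an unsigned N-bit code with an ideal, truncating ADC.

    Assumes a unipolar input range [0, v_ref); inputs outside it are clipped.
    """
    levels = 1 << bits                     # 2**bits codes, e.g. 4096 for 12 bits
    code = int(voltage / v_ref * levels)   # int() truncates toward zero
    return max(0, min(code, levels - 1))   # clip to the valid code range


if __name__ == "__main__":
    lsb_uv = 1.0 / (1 << 12) * 1e6         # one code step of a 12-bit, 1.0-V ADC
    print(f"LSB = {lsb_uv:.1f} uV")
    for v in (0.0, 0.25, 0.5, 0.999, 1.2):
        print(f"{v:5.3f} V -> code {ideal_adc(v)}")
```

With 12 bits, the analog range is divided into 4096 codes, which is why higher-resolution converters are preferred where accuracy is critical.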

Depending on the application, various algorithms are used to process and represent the data in a meaningful way. The size and complexity of a data acquisition system vary with the application and cost. For medical, military, or space applications, the accuracy of the system is extremely critical, so these systems use complex signal conditioning and a higher resolution to represent analog data. In other applications, such as consumer appliances or vehicles, cost matters more than accuracy. Beyond these applications, data acquisition systems are used for monitoring weather and geological phenomena, photography, sports, testing, verifying and controlling prototypes, robotics, communication monitoring, and controlling industrial machines.

1.2 Objective

The objective of this graduate project is to design and implement a PCIE interface based on the Xilinx Virtex-6 FPGA that achieves a data transfer rate of 2.5 GBPS on the ML605 evaluation kit, for use in the data acquisition system of our group project. Through this work, we learned about the PCIE IP core and had the opportunity to configure the hardware and software setup for the PCIE core generator IP. Initially, we had to research many details, starting from the hardware and Operating System (OS) selection needed to make PCIE work, through the driver and software installation and the configuration of the IP core's internal registers, to the selection of the lane width. The final step is the testing and verification of the PCIE core module. We are going to use this PCIE core generator module at five different places in the main project for high-speed data transfer.

This project is a group effort in which we can apply the concepts and logic design knowledge learned during our master's coursework to implement the system. The main project is a data acquisition system capable of reading and controlling data from an array of 32 x 32 photodiodes. Their outputs are given to ADCs, which generate 12-bit digital data for each analog sample. The 32 x 32 frame has to be scanned 18 times for each sample pulse, so such a system generates a very large amount of data that needs to be processed at high speed. In addition, there must be enough time between two bursts of 18 frames to transfer the collected data from the on-board memory to a computer and process it. The detailed specifications are discussed in the next chapter.

1.3 Project Outline

This report is organized as follows. Chapter I gives a general description of data acquisition systems, the objective of this project, and the outline of the report. Chapter II discusses the main data acquisition system project and its top-level and detailed design, and lists some of its specifications and features. Chapter III covers the details of PCI and PCIE data transfer. Chapter IV explains in detail the implementation of the PCIE interface on the Xilinx ML605 Virtex-6 FPGA board. Chapter V focuses on testing and transferring data using PCIE. Chapter VI summarizes the results of the proposed PCIE implementation and offers some thoughts on future work.


CHAPTER 2

DATA ACQUISITION SYSTEM FOR 32 X 32 PHOTO DETECTOR ARRAY

The 32 x 32 photo detector array based data acquisition system project started with research on selecting the best architecture, devices and components, and protocols for the design. The main goal is to collect data from the 32 x 32 photo detector array, store it in the FPGA levels for further processing, and then transfer it at ~2 GBPS. This chapter gives an overview of our main group project, the "32 x 32 data acquisition system", by presenting its top-level design, its detailed architecture, and some of its important features and specifications.

2.1 Top Level Architecture

Figure 2.1 shows the top-level architecture of the data acquisition system. The complete data from the 32 x 32 photo detector array is called one frame. The frame is divided into four segments, called segment-0, segment-1, segment-2, and segment-3. Each segment has 4 x 4 = 16 photo detector modules, and each module has 4 x 4 = 16 photo detectors.

Figure 2.1 Top Level Architecture of Data Acquisition System
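The frame hierarchy described above (4 segments, 16 modules per segment, 16 photo detectors per module) can be illustrated with a small index-mapping sketch. It assumes, hypothetically, that each segment is a 16 x 16 quadrant of the frame, that each module is a 4 x 4 block inside its segment, and that segments, modules, and detectors are all numbered row-major; the actual wiring and numbering in the project may differ.

```python
from typing import Tuple

FRAME = 32      # 32 x 32 photo detectors per frame
SEGMENT = 16    # assumed: each segment is a 16 x 16 quadrant (4 x 4 modules)
MODULE = 4      # each module is 4 x 4 photo detectors


def locate(row: int, col: int) -> Tuple[int, int, int]:
    """Map a detector's (row, col) in the frame to (segment, module, detector).

    Hypothetical row-major numbering: segments 0-3, modules 0-15 within a
    segment, detectors 0-15 within a module.
    """
    if not (0 <= row < FRAME and 0 <= col < FRAME):
        raise ValueError("detector coordinates out of range")
    segment = (row // SEGMENT) * (FRAME // SEGMENT) + col // SEGMENT
    r, c = row % SEGMENT, col % SEGMENT              # position inside the segment
    module = (r // MODULE) * (SEGMENT // MODULE) + c // MODULE
    detector = (r % MODULE) * MODULE + c % MODULE    # position inside the module
    return segment, module, detector


if __name__ == "__main__":
    # Every (segment, module, detector) triple should appear exactly once.
    triples = {locate(r, c) for r in range(FRAME) for c in range(FRAME)}
    print(len(triples))   # 1024 = 4 segments x 16 modules x 16 detectors
    print(locate(0, 0), locate(31, 31))
```

The uniqueness check confirms that 4 x 16 x 16 addresses cover all 1024 detectors of one frame exactly once.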


As can be seen from Figure 2.1, each segment has its own FPGA for processing and controlling data. Each photo detector's output is given to a 12-bit analog-to-digital converter. The ADC outputs from the photo detector modules of each segment are given to the BRAMs of that segment's FPGA, shown in Figure 2.1 as FPGA blocks 1, 0, 2 and 4. Then, based on the 4 x 1 multiplexer logic of the level-2 FPGA, one FPGA from level 1 is selected to transfer the data for the selected row, and that data is stored in the level-2 FPGA. Finally, the data is read and transferred over PCIE at ~2.5 GBPS to a server for monitoring.

2.2 Detailed Design Implementation

Figure 2.2 Detailed Design Implementation of data Acquisition system (blocks: 32x32 photo detector array; ADS5281 ADC with SPI and 12-MHz/144-MHz clock generators producing 12-bit digital data at 144 MHz; FPGA-0 controller with BRAM, EZDMA and PCIE; level-2 controller with 4x1 MUX, BRAM and PCIE; 2.5-GBPS PCIE data and clock to a personal server that controls the data)


The detailed design implementation of the project is illustrated in Figure 2.2. First, the data is scanned from the photo detectors and sent via the SPI bus to the ADS5281. Second, the EZDMA control component is configured to use a TLP payload size of 256 bytes at a 250-MHz frequency. It also supports four outstanding requests, one DMA channel, a local address width of 16 bits, a local memory read latency of one clock cycle, and communication link control to the PCIE bus. The third task is to configure the PCIE IP core to perform FPGA-to-FPGA data transfer at a rate of ~2.5 GTPS using the Virtex-6 based ML605 with PCIE x8 Gen1. The same procedure is repeated for all four FPGAs (segments of the photo detector array).
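As a rough sizing check, the sketch below counts how many 256-byte TLP payloads one sample pulse of data needs and estimates the peak throughput of the core's user interface at 250 MHz. The user data-path width is an assumption (64 bits, taken here for the x8 Gen1 configuration; check the generated core's settings), and TLP headers, DLLPs, and flow-control stalls are ignored, so the real payload rate will be somewhat lower. The function names are hypothetical.

```python
import math

TLP_PAYLOAD_BYTES = 256        # EZDMA TLP payload size (from the text)
USER_CLK_HZ = 250_000_000      # 250-MHz user clock (from the text)
DATA_PATH_BITS = 64            # assumed user data-path width for x8 Gen1


def tlps_needed(num_bytes: int, payload: int = TLP_PAYLOAD_BYTES) -> int:
    """Number of memory-write TLPs needed to move num_bytes (last one may be partial)."""
    return math.ceil(num_bytes / payload)


def user_throughput_bytes_per_s(clk_hz: int = USER_CLK_HZ,
                                width_bits: int = DATA_PATH_BITS) -> int:
    """Peak user-interface throughput, ignoring TLP headers and flow-control stalls."""
    return clk_hz * width_bits // 8


if __name__ == "__main__":
    per_fpga_bytes = 18 * 12 * 256 // 8   # 55,296 bits per segment FPGA per pulse
    frame_bytes = 4 * per_fpga_bytes      # all four segments together
    print(per_fpga_bytes, tlps_needed(per_fpga_bytes))
    print(frame_bytes, tlps_needed(frame_bytes))
    print(user_throughput_bytes_per_s() / 1e9, "GB/s peak")
```

Under these assumptions, each segment FPGA needs 27 full TLPs per sample pulse and the whole frame burst needs 108, while the user interface peaks at about 2 GB/s.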

The main function of the second-level FPGA is to receive the data via the PCIE bus from all four FPGAs, namely FPGA-0, FPGA-1, FPGA-2, and FPGA-3, based on the 4 x 1 multiplexer selection logic, so that accurate line information is reproduced. This data is stored in BRAM and can then be transferred at the required clock speed via PCIE for final reproduction on screen.

2.3 Specifications and Features

We have a 32 x 32 array of photo detectors spaced 5 mm apart. An incoming pulse initiates sampling at about 1 kHz. We take 18 samples of each frame, and each frame holds data from the 32 x 32 detectors, captured at 3 MHz. The data must be sampled, digitized, read, and sent out before the next data set begins. The data is merged into a single data stream by a data recorder at ~2.5 GBPS. It takes 1.5 µs to collect the 18 sample frames, as shown in Figure 2.3.

Figure 2.3 Pulse Data Transfer (1.5 µs to take 18 sample frames)


Each module has 4 x 4 = 16 photo detectors, so each segment holds the information of 16 modules x 16 photo detectors = 256 photo detectors, and with four segments, one frame holds the information of 256 x 4 = 1024 photo detectors. Each frame is scanned 18 times, which means a total of 1024 x 18 = 18,432 samples are given to the analog-to-digital converter IC ADS5281. The ADS5281 is a 12-bit octal-channel ADC at 65 MSPS. It converts each analog input sample into a 12-bit digital output, which makes 18,432 x 12 = 221,184 bits of data in total for each set of 18 sample frames. The data must be sampled, digitized, and read out before the next data set begins, and it should be transmitted at ~2.5 GBPS in a single stream.
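The frame-size arithmetic above can be reproduced in a few lines of Python. All constants come from the text; the transmit time assumes that the 2-GBPS rate means 2^31 = 2,147,483,648 bytes per second, which is the reading that reproduces the 12.875-microsecond figure used in this section.

```python
DETECTORS_PER_FRAME = 32 * 32     # 1024 photo detectors
SCANS_PER_PULSE = 18              # each frame is scanned 18 times
BITS_PER_SAMPLE = 12              # ADS5281 resolution
LINK_RATE_BYTES_PER_S = 2 ** 31   # "2 GBPS" taken as 2,147,483,648 bytes per second

bits_per_pulse = DETECTORS_PER_FRAME * SCANS_PER_PULSE * BITS_PER_SAMPLE
bytes_per_pulse = bits_per_pulse // 8
transmit_time_s = bytes_per_pulse / LINK_RATE_BYTES_PER_S

print(f"{bits_per_pulse:,} bits = {bytes_per_pulse:,} bytes per sample pulse")
print(f"transmit time = {transmit_time_s * 1e6:.3f} us")
```

The script reports 221,184 bits (27,648 bytes) per sample pulse and a transmit time of about 12.875 µs.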

In total, each of the 32 x 32 photo detectors generates a 12-bit digital value to complete one frame scan. This is repeated 18 times, as each frame is scanned 18 times, so a total of 221,184 bits (27,648 bytes) is generated per sample pulse. This data is transmitted at 2 GBPS, i.e. 2,147,483,648 (2^31) bytes per second.

Based on the above calculation, it takes 27,648 bytes / 2,147,483,648 bytes per second, or about 12.875 microseconds, to transmit the 18 sample frames of data. From this, we can calculate the time each FPGA has for processing data. With a 1-ms sample pulse period, 1.5 µs to collect 18 frames, and about 12.9 µs to transmit this data over the serial interface at 2 Giga Bytes per second (GBPS), the data acquisition system has about 985.6 µs (0.0009856 s) to process the data generated during each sample pulse:

$$t_{\text{process}} = \underbrace{1\ \text{ms}}_{\text{per sample pulse}} - \underbrace{1.5\ \mu\text{s}}_{\text{collect 18 frames}} - \underbrace{12.9\ \mu\text{s}}_{\text{transmit at 2 GBPS}} = 985.6\ \mu\text{s} = 0.0009856\ \text{s}$$

Lastly, we calculate the number of bits to be processed by each FPGA during this time. The data per FPGA segment is 55,296 bits per sample pulse:

$$18\ \text{samples} \times 12\ \text{bits per sample} \times 256\ \text{photo detectors per FPGA} = 55{,}296\ \text{bits per FPGA per sample pulse}$$
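The timing budget can also be checked numerically. The sketch below reuses the figures from the text (1 ms per sample pulse, 1.5 µs to collect 18 frames, about 12.9 µs to transmit) and shows that roughly 985.6 µs of each 1-ms pulse period remain for processing, together with the 55,296 bits each segment FPGA must handle in that window.

```python
PULSE_PERIOD_S = 1e-3     # ~1-kHz sampling pulse
COLLECT_S = 1.5e-6        # time to collect 18 sample frames
TRANSMIT_S = 12.9e-6      # time to send one pulse's data at ~2 GBPS

processing_s = PULSE_PERIOD_S - COLLECT_S - TRANSMIT_S
bits_per_fpga = 18 * 12 * 256   # samples x bits/sample x photo detectors per FPGA

print(f"processing time per pulse = {processing_s * 1e6:.1f} us")
print(f"data per FPGA per pulse   = {bits_per_fpga:,} bits")
```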


CHAPTER 3

INTRODUCTION TO PCIE

In today's era of communication, a high-speed data transfer system is a must. The key features of high-speed data transfer are the data transfer rate, the method of communication, and the configuration of the protocol, and the data transfer rate depends mainly on the protocol and the method of communication. Today, almost all high-speed data communication is digital, because digital communication is more reliable and less affected by noise. The important decision is the choice of communication protocol. A protocol is essentially a "set of rules"; here, it refers to the set of rules for data transfer. There are many protocols, such as SPI, I2C, PCI, PCIE, and USB. The data transfer rate also depends on how the chosen protocol is configured: for example, PCIE Gen1 transfers at 2.5 GTPS and Gen2 at 5.0 GTPS on an x1 link, and increasing the number of lanes increases the bandwidth proportionally.

Our project requires serial data transfer at a high speed of approximately 2.5 Giga Bytes per Second (GBPS). Which protocol to use can be decided only after understanding the different computer bus systems and the data transfer rates of different protocols. In this chapter, we study the basics of computer bus systems, why PCIE was selected, some basic PCIE concepts (working principle, differential signaling, lanes, and transfer rates), the details and operation of the LogiCORE IP, the layered protocol architecture, and the protocol overhead.

3.1 Background of Computer Bus Systems

Twenty to thirty years ago, the bus and the processor ran at the same speed, so they were able to synchronize. Synchronization was also the reason old computers had only one bus. Today, however, processor speed has increased dramatically with each new generation of computer systems, so to keep up with the high-speed processor, most computer systems now have two or more buses, each handling a specific kind of traffic. Figure 3.1 shows how various buses are connected to the CPU.


Figure 3.1 Various buses connected to the CPU

A classic PC has two main types of buses. The first is the system bus, or local bus, which connects the system memory and the microprocessor (central processing unit). It is the fastest bus and is sometimes called the back-end bus because it is close to the CPU and the system memory. The second type is slower and communicates with hard disks, sound cards, and similar devices. These buses are also called front-end buses, as they are near the interfacing devices. The slower buses connect to the system bus through a bridge, which transfers and integrates data from the other buses onto the system bus. The PCI bus standard is one such front-end bus system.

Other bus standards exist as well. For example, the Universal Serial Bus (USB) is the simplest and cheapest way to connect devices such as cameras, scanners, and printers to a computer. It uses a thin-wire, shared configuration, so many devices can be connected to it simultaneously. Another example is the FireWire bus, which is commonly used to connect video cameras and external hard drives.


3.2 Why to choose PCIE

The data acquisition system project needs high-speed data transfer at approximately 2 GBPS at all levels and between FPGAs. We therefore researched the best available bus standards and protocols and their data transfer rates. Figure 3.2 shows a tabular comparison of technologies and data transfer rates.

Figure 3.2 Comparison of Technology and Data Transfer Rate

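From the above table, PCI Express 1.0 can fulfill the project need: each lane carries 2.5 GT/s in each direction, so widening the link to x8 lanes, as configured in this project, provides about 2 GBPS of data bandwidth per direction after 8b/10b encoding.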
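A short sketch of this selection reasoning: given a required data rate, find the narrowest PCIE Gen1 link that meets it. It assumes 2.5 GT/s per lane and 8b/10b encoding (10 bits on the wire carry one byte of data) and ignores packet and protocol overhead, so it gives an upper bound on usable bandwidth. The ~2-GBPS requirement comes from the text; the function names are hypothetical.

```python
GEN1_GT_PER_S = 2_500_000_000      # transfers (bits) per second per lane, PCIE 1.x
LINK_WIDTHS = (1, 2, 4, 8, 16, 32)


def lane_bytes_per_s(gt_per_s: int = GEN1_GT_PER_S) -> int:
    """Usable bytes/s on one lane after 8b/10b encoding (10 line bits carry 1 byte)."""
    return gt_per_s // 10


def narrowest_link(required_bytes_per_s: int) -> int:
    """Smallest link width whose encoded bandwidth meets the requirement."""
    for width in LINK_WIDTHS:
        if width * lane_bytes_per_s() >= required_bytes_per_s:
            return width
    raise ValueError("requirement exceeds a x32 Gen1 link")


if __name__ == "__main__":
    print(narrowest_link(2_000_000_000))   # ~2 GBPS requirement
    print(narrowest_link(250_000_000))     # what a single x1 lane can carry
```

For a 2-GBPS requirement the search stops at x8, since each Gen1 lane carries 250 MB/s of data after encoding.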


3.3 PCIE Basics

The general-purpose I/O interconnect standard called Peripheral Component Interconnect Express (PCIE) is an enhanced version of the PCI bus standard and is more economical than PCI-X. As the name implies, it is a peripheral device interconnect bus standard. PCIE replaces the parallel bus architecture of older standards, such as PCI and PCI-X, with a scalable, serial, point-to-point interface that uses packet-based transmission.

PCI Express is a high-speed serial connection that operates more like a network than a bus. A PCIE switch controls several point-to-point serial connections, which fan out from the switch directly to the devices where the data needs to go. Each device has its own dedicated connection, so PCIE does not share bandwidth the way a conventional bus does.

3.3.1 Working Principle

When the computer starts up, the PCIE hardware determines which devices are plugged into the motherboard and establishes the links between them, negotiating the width of each link. It then directs the flow of traffic over these links. The devices and connections are identified by system software during enumeration, after which each PCIE device is handled by its driver. Figure 3.3.1 shows the PCIE socket and motherboard connection for PCIE x1 and PCIE x16 add-in card connectors.

Figure 3.3.1 PCIE and motherboard socket connector
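The width negotiation described above can be pictured with a deliberately simplified model: each side advertises the link widths it supports, and the link trains to the widest width both sides share, at the highest generation both support. This is only an illustration assuming ideal lanes; in real hardware the negotiation is carried out by the link training state machine (LTSSM) of the physical layer, which also handles lane reversal, failed lanes, and retraining. The advertised width lists below are assumptions.

```python
from typing import Iterable, Tuple


def negotiate(port_widths: Iterable[int], card_widths: Iterable[int],
              port_max_gen: int, card_max_gen: int) -> Tuple[int, int]:
    """Simplified link negotiation: widest common width, lower of the two max generations."""
    common = set(port_widths) & set(card_widths)
    if not common:
        raise ValueError("no common link width")
    return max(common), min(port_max_gen, card_max_gen)


if __name__ == "__main__":
    ml605 = [1, 2, 4, 8]           # assumed widths supported by the ML605 x8 Gen1 endpoint
    x16_slot = [1, 2, 4, 8, 16]    # hypothetical x16 Gen2 motherboard slot
    width, gen = negotiate(x16_slot, ml605, port_max_gen=2, card_max_gen=1)
    print(f"link trained to x{width} Gen{gen}")
```

In this model an x8 Gen1 card in an x16 Gen2 slot trains to x8 Gen1, which matches the usual behavior of plugging a narrower card into a wider slot.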


3.3.2 Differential Signaling

PCIE uses a differential signaling technique, in which two transmission lines carry one signal. The two lines carry the signal with opposite (positive and negative) polarity, and at the receiver they are subtracted to recover the original signal. This technique is highly effective for noise cancellation, because noise picked up equally on both lines is removed by the subtraction. Figure 3.3.2 A shows the differential signaling technique, and Figure 3.3.2 B shows how the noise is subtracted by the differential technique. In PCIE, a lane consists of two such pairs, one for transmitting and one for receiving.

Figure 3.3.2 A Differential signal pair and subtractor

Figure 3.3.2 B Signal pulse and noise subtraction
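The noise-cancelling effect of differential signaling can be shown numerically. In this idealized sketch, with made-up signal and noise values, the same interference is added to both wires of the pair. Reading only the positive wire (single-ended) flips some bits when the noise is larger than the signal, while the differential receiver computes (D+ - D-) / 2, which cancels the common noise and recovers every bit.

```python
from typing import List, Tuple


def transmit_pair(bits: List[int], amplitude: float = 0.4) -> Tuple[List[float], List[float]]:
    """Drive each bit as +/-amplitude on D+ and the opposite polarity on D-."""
    d_pos = [amplitude if b else -amplitude for b in bits]
    d_neg = [-v for v in d_pos]
    return d_pos, d_neg


def receive_pair(d_pos: List[float], d_neg: List[float]) -> List[float]:
    """Differential receiver: (D+ - D-) / 2 cancels noise common to both wires."""
    return [(p - n) / 2 for p, n in zip(d_pos, d_neg)]


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0]
    noise = [0.30, -0.25, -0.55, 0.45, 0.60]   # same interference on both wires
    d_pos, d_neg = transmit_pair(bits)
    noisy_pos = [v + e for v, e in zip(d_pos, noise)]
    noisy_neg = [v + e for v, e in zip(d_neg, noise)]
    single_ended = [1 if v > 0 else 0 for v in noisy_pos]
    differential = [1 if v > 0 else 0 for v in receive_pair(noisy_pos, noisy_neg)]
    print("sent        ", bits)
    print("single-ended", single_ended)
    print("differential", differential)
```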


3.3.3 Lanes

A PCIE lane consists of two pairs of wires: one differential pair to send packets and another to receive them. Data moves across each pair at one bit per transfer. The smallest PCIE connection has one lane, made up of four wires, as shown in Figure 3.3.3 A.

Figure 3.3.3 A PCIE 4 wire lane configuration for x1

Although PCIE is a serial data transfer protocol, a link can use several separate signal paths, called lanes. Depending on the motherboard and the PCIE socket, link widths are limited to values such as x1, x2, x4, x8, x16, and x32. Figure 3.3.3 B shows the socket types for x1, x4, x8, and x16 lanes, with their speed grades.

Figure 3.3.3 B PCIE Lane connectors with speed grades
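The lane arithmetic can be sketched as below. It counts only the data signal wires (two differential pairs per lane, as in Figure 3.3.3 A) and ignores clock, power, and sideband pins; the width list is the one given in the text, not an exhaustive list from the specification.

```python
WIRES_PER_LANE = 4                   # one transmit pair + one receive pair
LINK_WIDTHS = (1, 2, 4, 8, 16, 32)   # widths listed in this section


def signal_wires(lanes: int) -> int:
    """Data signal wires needed for a link of the given width."""
    if lanes not in LINK_WIDTHS:
        raise ValueError(f"x{lanes} is not one of the listed widths {LINK_WIDTHS}")
    return lanes * WIRES_PER_LANE


for width in (1, 4, 8, 16):
    print(f"x{width}: {signal_wires(width)} wires")
# x1: 4 wires, x4: 16 wires, x8: 32 wires, x16: 64 wires
```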


3.3.4 Transmission rates

A first-generation PCIE lane has a raw data transmission rate of 2.5 Giga transfers per second (GTPS) in each direction. The aggregate raw bandwidth of a link is 2.5 GT/s multiplied by the number of lanes. Second-generation PCI Express devices (version 2.0 or higher) may optionally transmit at 5.0 GT/s per lane, but are backward compatible with the first-generation transmission rate.
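The aggregate-bandwidth calculation can be written out as a short Python sketch. It reproduces the 2.5 GB/s raw figure for the generation-1 x8 link used in this project. It also applies the 8b/10b line coding used by generation-1 and generation-2 links, where every 8 data bits are sent as 10 bits on the wire; this leaves 2.0 GB/s per direction for packet traffic, before any packet overhead.

```python
# Per-lane transfer rate in GT/s, by PCIE generation.
GT_PER_S = {1: 2.5, 2: 5.0}


def raw_gbytes_per_s(gen: int, lanes: int) -> float:
    """Raw rate per direction: GT/s per lane x lanes / 8 bits per byte."""
    return GT_PER_S[gen] * lanes / 8


def encoded_gbytes_per_s(gen: int, lanes: int) -> float:
    """Rate left after 8b/10b coding (8 data bits per 10 line bits)."""
    return raw_gbytes_per_s(gen, lanes) * 8 / 10


print(raw_gbytes_per_s(1, 8), encoded_gbytes_per_s(1, 8))   # 2.5 2.0
print(raw_gbytes_per_s(2, 8), encoded_gbytes_per_s(2, 8))   # 5.0 4.0
```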

3.4 LogiCore PCIE Block

The Xilinx Virtex-6 FPGA has an Integrated Block for PCI Express. Before implementing it with the IP core generator in the next chapter, which is the main part of the project, this section describes the detailed architecture of PCIE and how it functions. Figure 3.4 shows the top-level functional block diagram and interfaces of the LogiCORE IP Virtex-6 FPGA Integrated Block for PCI Express core.

Figure 3.4 Functional Block Diagram and interfaces for logiCORE IP


The LogiCORE IP Virtex-6 core for PCIE internally instantiates the Integrated Block for PCIE (PCIE_2_0), which consists of the Physical, Data Link, and Transaction layers of the layering model in the PCIE Base Specification. When invoked, the PCIE block generates five interfaces, namely:

1) System (SYS) interface

2) PCI Express (PCI_EXP) interface

3) Configuration (CFG) interface

4) Transaction (TRN) interface

5) Physical layer (PL) control and status interface

The system interface consists of the reset and the system clock. The reset signal is asynchronous and active low. The system clock should be 100-MHz, 125-MHz, or 250-MHz. The PCI Express (PCI_EXP) interface consists of the differential signal pairs used to transmit and receive data on multiple lanes. The Transaction (TRN) interface generates and processes the transactions for the lanes. The Physical Layer (PL) interface reports the status of the links and provides control of link transfers.

Data transfer between the modules inside the core is done using information packets. These packets are generated at the Data Link and Transaction layers to convey the necessary information from the transmitter to the receiver. Each of these layers adds the header fields it needs, such as length, plus error-checking information, for reliable communication between the transmitting and receiving components. At the receiver side, each layer that receives a packet processes it, strips its own fields, and passes the rest to the next layer based on the information in its header. As a result, the physical-level signal is converted into a Data Link layer packet and then into a Transaction layer packet.

Data transfer is performed using requests and completions. PCIE transactions have four basic types of requests: message, configuration, I/O, and memory. The interfacing device sends I/O read/write, memory read/write, and configuration read/write requests, and the PCIE device responds by sending a completion (memory writes and messages are posted requests and do not receive a completion). A PCIE device can use Base Address Registers (BARs) to reserve a memory block in the host system's memory map. When the OS assigns an address to the block, the BARs are programmed with that address. For this design, we used BAR0 with a size of 1 Megabyte.
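How a BAR "reserves" a block can be illustrated with the standard PCI sizing procedure: configuration software writes all 1s to the BAR, reads it back, clears the low flag bits, inverts the result, and adds one. The Python sketch below assumes a 32-bit, non-prefetchable memory BAR. The read-back value 0xFFF00000 is what a 1-Megabyte BAR0 like the one in this design would be expected to return, not a value measured in this project.

```python
MEM_BAR_FLAG_BITS = 0xF  # bits 3:0 of a memory BAR: space, type, prefetchable


def memory_bar_size(readback: int) -> int:
    """Size in bytes of a 32-bit memory BAR, given the value read back
    after configuration software wrote 0xFFFFFFFF to it."""
    if readback & 0x1:
        raise ValueError("bit 0 is set: this is an I/O BAR, not a memory BAR")
    address_mask = readback & ~MEM_BAR_FLAG_BITS & 0xFFFFFFFF
    if address_mask == 0:
        raise ValueError("BAR is not implemented")
    return (~address_mask & 0xFFFFFFFF) + 1


# Expected read-back for a 1-Megabyte, 32-bit, non-prefetchable BAR0.
print(hex(memory_bar_size(0xFFF00000)))  # 0x100000 (1 Megabyte)
```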


3.5 Protocol layers

The basic data flow of the PCIE layered structure is shown in Figure 3.5. As the figure shows, PCIE is a bidirectional data transfer protocol. When the user initiates a data transfer, the transaction layer reads the data from the specified memory location and adds a header, which points to the destination location, and error-checking bits. The resulting data packet is passed to the data link layer, which adds sequence information bits and a link CRC (LCRC) to the packet and transfers it to the physical layer. The physical layer sends the data as a serial bit stream, based on the number of lanes selected and the speed grade. At the receiver side, the reverse process takes place.

Figure 3.5 Protocol Layers
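To make the layering in Figure 3.5 concrete, the sketch below counts the bytes each layer adds to one memory-write TLP on a generation-1 or generation-2 link: a 3- or 4-doubleword transaction layer header with an optional ECRC, a sequence number field and LCRC from the data link layer, and STP/END framing symbols from the physical layer. Sizes are counted before 8b/10b encoding, and link traffic such as ACK and flow-control DLLPs is ignored, so the efficiencies are upper bounds.

```python
TLP_HEADER_3DW = 12  # transaction layer header, 32-bit address
TLP_HEADER_4DW = 16  # transaction layer header, 64-bit address
ECRC_BYTES = 4       # optional end-to-end CRC (transaction layer)
SEQ_BYTES = 2        # sequence number field (data link layer)
LCRC_BYTES = 4       # link CRC (data link layer)
FRAMING_BYTES = 2    # STP + END framing symbols (physical layer, Gen 1/Gen 2)


def wire_bytes(payload: int, header: int = TLP_HEADER_3DW, ecrc: bool = False) -> int:
    """Bytes sent for one TLP, before 8b/10b encoding."""
    tlp = header + payload + (ECRC_BYTES if ecrc else 0)
    return FRAMING_BYTES + SEQ_BYTES + tlp + LCRC_BYTES


def efficiency(payload: int, **kwargs) -> float:
    """Fraction of the transmitted bytes that is payload."""
    return payload / wire_bytes(payload, **kwargs)


for size in (64, 128, 256):
    print(size, round(efficiency(size), 3))
# 64 0.762, 128 0.865, 256 0.928
```

Larger payloads amortize the fixed per-packet overhead, which is one reason the TLP size setting in the DMA performance demo of Chapter 5 affects the measured rate.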


CHAPTER 4

IMPLEMENTATION OF PCIE

In this chapter, the hardware and software implementation, the issues encountered, and their solutions are discussed for the PCIE core setup that achieves a data transfer rate of 2.5 Giga transfers per second per lane. The complete implementation is divided into three parts: hardware requirements and setup, driver installation, and software installation together with IP core generation.

4.1 Hardware

First, we need a server or computer system with an x8 PCIE slot. Our project requires a data transmission rate of ~2.5 Giga Bytes per second. For that we need an x8 PCIE configuration, and therefore an x8 slot for the physical connection. Figure 4.1 shows the lab server's PCIE x8 slot connected to the ML-605 board.

Figure 4.1 Lab server's PCIE x8 slot connected to ML-605
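The slot requirement follows from simple arithmetic. This Python sketch picks the narrowest link width from the list in Section 3.3.3 whose raw generation-1 rate (2.5 Gbit/s per lane, per direction) meets a target given in megabytes per second; the function name and parameters are illustrative.

```python
GEN1_LANE_MBIT = 2500                # raw Mbit/s per lane, per direction
LINK_WIDTHS = (1, 2, 4, 8, 16, 32)   # widths listed in Section 3.3.3


def min_link_width(target_mbyte_per_s: int, lane_mbit: int = GEN1_LANE_MBIT) -> int:
    """Narrowest listed link width whose aggregate rate meets the target."""
    target_mbit = target_mbyte_per_s * 8
    for width in LINK_WIDTHS:
        if width * lane_mbit >= target_mbit:
            return width
    raise ValueError("target exceeds the widest listed link")


print(min_link_width(2500))                  # 8
print(min_link_width(2500, lane_mbit=2000))  # 16 (rate after 8b/10b coding)
```

With the default raw lane rate, a 2500 MB/s target gives x8, matching the slot used here. Passing the post-8b/10b lane rate returns x16 instead, so the answer depends on whether the target is stated as a raw or an encoded rate.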


The other hardware requirements are the Xilinx ML-605 evaluation board, a server with Windows XP SP3 installed, updated, and working, and a programming cable for the ML-605 board. Connect a USB Type-A to Mini-B 5-pin cable from your PC to J21 on the ML-605 board.

4.2 Driver

For the Xilinx ML-605 board to be detected by the server, the proper driver must be installed. Install the latest CP210x VCP Win2K/XP/2K3 drivers for the server from www.silabs.com, making sure that the proper version and operating system are selected. In our case, we first had the wrong drivers, and it took us a few days to find the right link.

Follow the driver installation steps in the ML-605 user guide, ug533.pdf. Once the driver is installed, open Device Manager, right-click Silicon Labs CP210x USB to UART Bridge (COM3), go to Properties > Advanced Settings, and make the COM3 port open for communication.

Figure 4.2 Driver installation and com-port Opening


4.3 Software

Once the hardware is set up and the driver is installed, the device can be configured for PCIE use. Before that, make sure that the Xilinx ML-605 is connected and working properly. Use any terminal program to open a serial connection to COM3 of the system; we used Tera Term. Figure 4.3 A shows the Tera Term connection and the port baud rate settings.

Figure 4.3 A Tera-term connections

Set DIP switch S1 on the ML-605 to 1000 (position 4 to position 1) to enable the CompactFlash demo designs, as shown in Figure 4.3 B.

Figure 4.3 B DIP switch S1 setting (1000)


Insert the CompactFlash card into the ML-605 as shown in Figure 4.3 C, and press SW3, the System ACE reset push button on the ML-605.

Figure 4.3 C Compact flash card insertion

If everything is correct, the CompactFlash card loads its built-in system test designs, which can be used to verify board functionality. The terminal then shows a menu from which you can run the demo programs on the ML-605 to check the functionality of different modules. This screen looks like Figure 4.3 D.

Figure 4.3 D Built In System Test menu (BIST)


4.4 Logic Core Generation

To make logic cores easy to implement, Xilinx provides the CORE Generator tool, which generates the code that instantiates a core. I used CORE Generator to instantiate the modules of the PCIE core, including the BRAM core and the Clock Generator core. This section describes the instantiation of these cores and the configuration points that need attention for the complete setup to work.

4.4.1 Xilinx PCIE core

To generate the PCIE core, follow the steps given in xtp044.pdf; here, I focus only on the main steps. Open the Xilinx CORE Generator from the Start menu, create a new project folder, and save everything in this folder. Figure 4.4.1 A shows the new project screen of CORE Generator.

Figure 4.4.1 A New Project screen Core Generator

Next, make sure that you select the right device and speed grade, as shown in Figure 4.4.1 B. We are using the Xilinx ML-605 evaluation kit, which has a Virtex-6 xc6vlx240t device in the ff1156 package with speed grade -1. If any of these parameters does not match, the core will not work.


Figure 4.4.1 B Device Parameter Select

The following screens contain the most important configuration settings for the PCIE core to work properly; from here we can configure the speed grade, number of lanes, memory starting location, and base address register initialization settings.

Figure 4.4.1 C PCIE core customizing

Select Virtex-6 Integrated Block for PCI Express version 1.3, as indicated in Figure 4.4.1 C. Customize the core for 8 lanes. The data transfer rate is 2.5 GTPS for a generation-1 PCIE core and 5.0 GTPS for a generation-2 PCIE core. Select an interface clock frequency of 250-MHz. Figure 4.4.1 D shows the parameter settings.

Figure 4.4.1 D Parameter Setting
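A quick consistency check on the 250-MHz interface clock: the user-side data path should carry at least what the link delivers after 8b/10b coding. The sketch ASSUMES a 64-bit transaction (TRN) data bus, which the text does not state; the core's user guide is the place to confirm the actual width for a given configuration.

```python
ASSUMED_TRN_BUS_BITS = 64  # assumption: bus width is not stated in the text


def interface_gbit_per_s(clock_mhz: int, bus_bits: int) -> float:
    """User-side throughput: clock x data-bus width."""
    return clock_mhz * bus_bits / 1000


def link_payload_gbit_per_s(lanes: int, gt_per_s: float) -> float:
    """Link rate per direction after 8b/10b coding."""
    return lanes * gt_per_s * 8 / 10


user_side = interface_gbit_per_s(250, ASSUMED_TRN_BUS_BITS)
link_side = link_payload_gbit_per_s(8, 2.5)
print(user_side, link_side, user_side >= link_side)  # 16.0 16.0 True
```

Under this assumption the two sides match exactly for a generation-1 x8 link, so a lower interface clock would become the bottleneck.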


Then select Base Address Register 0 (BAR0) for the BRAM starting location, width, and size, as shown in Figure 4.4.1 F. Here, we used a 32-bit width and a 1-Megabyte size. Uncheck all other BARs, such as BAR1 and BAR2. BARs serve two purposes. Initially, they serve as a mechanism for the device to request blocks of address space in the system memory map. After the BIOS determines which address to assign to the device, the BARs are programmed with those addresses, and the device uses this information to perform address decoding.

Figure 4.4.1 F Base Address Register setting
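Address decoding with BAR0 amounts to a range check against the base address the BIOS assigns. In this sketch the base 0xF7E00000 is hypothetical (any 1-Megabyte-aligned address would do); only the 1-Megabyte size comes from the design.

```python
BAR0_SIZE = 1 << 20        # 1-Megabyte BAR0 selected in the core
BAR0_BASE = 0xF7E00000     # hypothetical address assigned by the BIOS


def bar0_offset(addr: int, base: int = BAR0_BASE, size: int = BAR0_SIZE):
    """Return the offset into BAR0 if addr falls inside it, otherwise None."""
    if base % size:
        raise ValueError("a BAR base address is aligned to the BAR size")
    if base <= addr < base + size:
        return addr - base
    return None


print(hex(bar0_offset(0xF7E00010)))  # 0x10
print(bar0_offset(0xF7F00000))       # None: first byte past the window
```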


The next step is to set Vendor ID 10EE, Device ID 6018, Revision ID 00, Subsystem Vendor ID 10EE, and Subsystem ID 0007. Figure 4.4.1 G shows all the settings needed for CORE Generator to generate the PCIE v1.3 core. Leave all other parameters at their default settings and move to page 9 of the customization wizard.

Figure 4.4.1 G Vendor ID
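These IDs end up in the device's standard PCI configuration header, which is what PCITree reads back in Chapter 5. Vendor ID and Device ID share the doubleword at offset 0x00, and Subsystem Vendor ID and Subsystem ID share the doubleword at offset 0x2C. The Revision ID is the low byte at offset 0x08, next to the class code, which is not given here and is therefore left out. The sketch packs the values entered above.

```python
VENDOR_ID = 0x10EE
DEVICE_ID = 0x6018
REVISION_ID = 0x00
SUBSYSTEM_VENDOR_ID = 0x10EE
SUBSYSTEM_ID = 0x0007


def id_dword(low16: int, high16: int) -> int:
    """Pack two 16-bit fields into one 32-bit config doubleword (low field in bits 15:0)."""
    return (high16 << 16) | low16


expected = {
    0x00: id_dword(VENDOR_ID, DEVICE_ID),               # Device ID : Vendor ID
    0x2C: id_dword(SUBSYSTEM_VENDOR_ID, SUBSYSTEM_ID),  # Subsystem ID : Subsystem Vendor ID
}
for offset, value in expected.items():
    print(f"0x{offset:02X}: 0x{value:08X}")
# 0x00: 0x601810EE
# 0x2C: 0x000710EE
print(f"Revision ID (offset 0x08, bits 7:0): 0x{REVISION_ID:02X}")
```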


From here, you can select the hardware board on which the IP core will be loaded. In the product selection menu, select ML-605, as shown in the screenshot in Figure 4.4.1 H.

Figure 4.4.1 H Xilinx development Board selection

The last step is to select the reference clock frequency. The Integrated Block for PCIE lets you select the reference clock frequency from values such as 62.5-MHz, 125-MHz, and 250-MHz. Since we had selected a clock frequency of 250-MHz for the BRAM data, we select the same frequency for the reference clock here, as indicated by Figure 4.4.1 I.

Figure 4.4.1 I Reference Clock Frequency


Finally, generate the core. Once the core is generated, the screen in Figure 4.4.1 J is displayed to indicate that CORE Generator has generated the core.

Figure 4.4.1 J PCIE Core generate

The generated Virtex-6 PCIE v1.3 core appears in the Project IP menu, as shown in Figure 4.4.1 K, and the component can then be instantiated for further use.

Figure 4.4.1 K Project IP


Run the following two scripts from the command prompt to implement the core and create the routed.bit file, which must be programmed onto the ML-605 board to use the PCIE IP core component.

Figure 4.4.1 L Script to Generate Core

The script in Figure 4.4.1 L synthesizes and implements the Virtex-6 PCIE v1.3 IP core design. The result files are generated in a new folder named results.

Figure 4.4.1 M Script to make routed.bit


The script in Figure 4.4.1 M creates routed.bit, the bitstream file used to load the IP core onto the ML-605.

4.5 Programming ML-605

Once the routed.bit file is generated, it needs to be loaded onto the ML-605 board. Follow these steps to program the ML-605 with the routed.bit file.

1) Connect the USB cable from the PC (Type-A end) to the USB JTAG connector on the ML-605 board (Mini-B end).

2) Set switch S1 to 0xxx (x = don't care, position 4 to position 1); this disables the CompactFlash.

3) Set switch S2 to 011001 (1 = on, position 6 to position 1); this selects Slave SelectMAP (positions 5, 4, and 3), platform flash (position 2), and external CCLK (position 1, for PCIE compliance).

Figure 4.5 A shows the S1 and S2 switch settings.

Figure 4.5 A S1-S2 switch settings
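The switch strings above are written from the highest position down to position 1. The small Python helper below (hypothetical, for illustration) decodes them into per-position states, treating x as don't care, and reads out positions 5, 4, and 3 of S2, which the text associates with the Slave SelectMAP selection.

```python
def decode_switch(setting: str) -> dict:
    """Map a DIP-switch string (highest position first, position 1 last)
    to {position: state}: '1' = on, '0' = off, 'x' = don't care (None)."""
    states = {"1": True, "0": False, "x": None}
    n = len(setting)
    result = {}
    for i, ch in enumerate(setting.lower()):
        if ch not in states:
            raise ValueError(f"unexpected character {ch!r} in {setting!r}")
        result[n - i] = states[ch]
    return result


s1 = decode_switch("0xxx")
s2 = decode_switch("011001")
mode_positions = "".join("1" if s2[p] else "0" for p in (5, 4, 3))
print(s1)              # {4: False, 3: None, 2: None, 1: None}
print(mode_positions)  # 110
```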


Now run iMPACT from the results directory generated during implementation. Figure 4.5 B shows iMPACT being launched from the command prompt.

Figure 4.5 B IMPACT

This launches iMPACT to program the ML-605 FPGA. Select Prepare PROM File, as shown in Figure 4.5 C.

Figure 4.5 C PROM


Now set the PROM file parameters as shown in the screenshots in Figure 4.5 D, Figure 4.5 E, and Figure 4.5 F.

Figure 4.5 D Setting PROM Parameters

Figure 4.5 E Setting PROM Parameters

Figure 4.5 F Setting PROM Parameters


Add the routed.bit file to the design, as shown in Figure 4.5 G.

Figure 4.5 G loading routed.bit File

Keep the default values for the multiboot BPI revision and data file assignment, as shown in Figure 4.5 H.

Figure 4.5 H BPI settings


From the Operations menu, select Generate File, as shown in Figure 4.5 I.

Figure 4.5 I Generate file

Select Boundary Scan from the iMPACT Flows menu, as shown in Figure 4.5 J.

Figure 4.5 J Boundary scans


From the File menu, select Initialize Chain, which prompts you to select the device for loading the .bit file. Figure 4.5 K shows Initialize Chain.

Figure 4.5 K Initialize Chain

Now select the xc6vlx240t device and right-click it to add an SPI/BPI flash, as shown in Figure 4.5 L.

Figure 4.5 L Select Device xc6vlx240t


Configure the SPI/BPI PROM settings as shown in Figure 4.5 M.

Figure 4.5 M SPI/BPI PROM setting

Finally, right-click the flash and program it. Figure 4.5 N shows the flash being programmed. Make sure to check the Erase Before Programming option.

Figure 4.5 N Programming flash

Finally, we wrote VHDL code to instantiate the PCIE core and use it in our application. The code and all required software and generated files are provided on the CD attached to the project.


CHAPTER 5

TESTING AND VERIFICATION

The testing and verification of the PCIE core component is done in two ways: PCIE functional testing and simulation. The main goal is to check that the core achieves a data transfer rate of 2.5 GTPS.

5.1 PCIE Functional Testing

Here, I used a third-party tool called PCITree to check that the PCIE core can be configured at the register level and that data can be written to and read from memory through this core.

Download the tool from www.pcitree.com and run the .exe file. Then open PCITree from the Start menu, as shown in Figure 5.1 A.

Figure 5.1 A PCI TREE


This tool shows all the devices connected to the system board through PCI, PCIE, and AGP connectors. Figure 5.1 B shows that the tool detected the Intel x8086 PCIE bridge connected to the system.

Figure 5.1 B Intel’s x8086 PCIE bridge device

Now search for the Xilinx PCIE card using the vendor ID (VID) and device ID (DID) entered during core configuration (10EE and 6018). Figure 5.1 C shows the Xilinx PCIE device.

Figure 5.1 C Xilinx’s ML-605 device


Now it is time to check that we can configure the configuration registers and read from and write to memory. Set the Number of configuration registers radio button to 64 and click Refresh Dump. This lists all the configurable registers in the adjacent window, as shown in Figure 5.1 D.

Figure 5.1 D Configuration Registers


Select any available register to see its location, the value written in it, and the next location it points to. Figure 5.1 E shows register value editing and checking. Here, we selected the register at offset 48. The tool shows its current value in hex format in the edit configuration block, and we can change the value of any configuration register. The register also points to the next configuration register location through its value 6005: the upper byte, 60, means that the next available configuration register is at offset 60.

Figure 5.1 E Configuration registers editing
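The value 6005 read at offset 48 is a standard PCI capability header: the low byte is the capability ID and the next byte is the pointer to the next capability. The sketch below decodes it; ID 0x05 is the MSI capability. If the capability at offset 60 is the PCI Express capability, its Link Capabilities register lies 0x0C bytes further on, at offset 6C, which agrees with the register read in Figure 5.1 F. The name table covers only a few capability IDs.

```python
CAP_NAMES = {0x01: "Power Management", 0x05: "MSI", 0x10: "PCI Express", 0x11: "MSI-X"}
LINK_CAPABILITIES_OFFSET = 0x0C  # within the PCI Express capability structure


def decode_cap_header(value: int):
    """Split a capability header into (capability ID, next-capability pointer)."""
    return value & 0xFF, (value >> 8) & 0xFF


cap_id, next_ptr = decode_cap_header(0x6005)            # value read at offset 48
print(CAP_NAMES.get(cap_id, "unknown"), hex(next_ptr))  # MSI 0x60
print(hex(next_ptr + LINK_CAPABILITIES_OFFSET))         # 0x6c
```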


A few configuration registers give valuable information about the core, such as the PCIE generation and the maximum number of lanes configured and supported, from which we can calculate the data transfer rate. Figure 5.1 F shows these configuration registers. Here, the Link Capability register at offset 6C has the value F481. Its last two hex digits, 81, indicate that the core supports x8 lanes (bits 9:4 equal 8) and a generation-1 link speed (bits 3:0 equal 1). Based on this, the link runs at 2.5 GTPS per lane, which gives a raw rate of 2.5 GBytes/s over 8 lanes (8 × 2.5 Gbit/s ÷ 8 bits per byte).

Figure 5.1 F Configuration Register Read
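The decoding of F481 can be checked in a few lines. In the Link Capabilities register, bits 3:0 hold the maximum link speed (1 = 2.5 GT/s, 2 = 5.0 GT/s) and bits 9:4 hold the maximum link width. The sketch decodes the value shown by PCITree, repeats the raw-rate calculation, and ignores the register's other fields.

```python
MAX_LINK_SPEED_GT = {1: 2.5, 2: 5.0}  # bits 3:0 encodings


def decode_link_capabilities(value: int):
    """Return (max link speed in GT/s, max link width) from Link Capabilities."""
    speed_code = value & 0xF
    width = (value >> 4) & 0x3F
    return MAX_LINK_SPEED_GT[speed_code], width


speed, lanes = decode_link_capabilities(0xF481)
print(speed, lanes, speed * lanes / 8)  # 2.5 8 2.5  (raw GBytes/s)
```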


Next, we check reads and writes to memory through the PCIE core. During core implementation, we selected Base Address Register 0 with a 32-bit width; select it as shown in Figure 5.1 G. Then click Yes, and the tool shows 1 Kilobyte of the memory content.

Figure 5.1 G Memory Read (BAR)


Since no values have been loaded into the BRAM yet, the memory should read as all zeros or blank. Figure 5.1 H shows the memory read values, all blank.

Figure 5.1 H Memory Read (Content)

Figure 5.1 I Memory Write


Since we do not yet have data from an actual image file, we write count values to the BRAM with the tool by selecting all memory locations and selecting the Count radio button, as shown in Figure 5.1 I. If a data file is available, it can be loaded directly with the Load File option, as shown in Figure 5.1 J.

Figure 5.1 J Memory writes using file


To write count values into the memory locations, select all the memory locations, tick Count, click the Write Memory button, and then refresh the view. Figure 5.1 K shows the resulting hex values.

Figure 5.1 K Memory writes using count value


I also checked the link by transferring an image over PCIE and reading it back, using the Xilinx demo image filtering module. Figure 5.1 L shows this reference design, which we programmed onto the ML-605, running the image filter with the identity setting. Both images are identical because no filtering was applied, but this shows that an image can be transferred over the link and reproduced.

Figure 5.1 L Image Transfer


You can also check the performance of the PCIE link by configuring the Xilinx XAPP1052 DMA performance demo IP core logic. This application provides demo DMA read and write files, which you can load onto the board. We used the xapp1052.zip DMA performance demo module downloaded from Xilinx. After setting up the design by following the Xapp1052.pdf document, you can invoke the DMA Performance Demo GUI shown in Figure 5.1 M. The user can set the data type, TLP size, and read or write parameters to check.

Figure 5.1 M DMA Performance Demo GUI


We ran DMA reads and writes to check the actual data transfer rate of the Xilinx Virtex-6 FPGA PCIE v1.7 core. To check read performance, select Read and click Start in the screen of Figure 5.1 M. As shown in Figure 5.1 N, the DMA performance is half of the number shown on screen in Mbps: 47610.61/2 = 23805.30, which is ~2.3 GTPS.

Figure 5.1 N Read Performances

In the same way, to check write performance, select Write and click Start in the screen of Figure 5.1 M. As shown in Figure 5.1 O, the DMA performance is half of the number shown on screen in Mbps: 56109.59/2 = 28054.79, which is ~2.8 GTPS.

Figure 5.1 O Write Performance


5.2 Simulation

Load the design in Xilinx ISE, compile the core, and run the simulation. Before that, check the report for a detailed analysis. The code and all the simulation files are provided on the attached CD. I used Verilog for component generation and simulation, and Xilinx ISE 13.2 to simulate the design and check the configuration registers' contents, as shown in Figure 5.2 A.

Figure 5.2 A Design compiling

The detailed analysis report is shown in Figure 5.2 B. It lists the number of LUTs, FIFOs, and other logic components used to build the design.


Figure 5.2 B Detailed Analysis Report


The simulation waveform is generated, but since it consists entirely of core-generated signals, it is hard to read and interpret the meaning of every signal. Figure 5.2 C shows the configuration registers initialized with their default values.

Figure 5.2 C Configuration Register Read

Figure 5.2 D BAR Data Read


Because we did not generate high-speed data to be read from memory, it is hard to verify the transfer rate with ISE simulation; however, we can check that the registers are initialized with their default values. To see the actual data transfer through the PCIE IP core, we used the ChipScope Pro tool, opened the project's .cfg file from the results folder, and placed traces at two points in the IP core.

Figure 5.2 E Data TLP

Figure 5.2 E shows a data TLP captured by ChipScope at the transaction-level probe point of the IP core: trn_td carries data values while trn_tsrc_rdy is asserted (active low).
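The trn_tsrc_rdy observation suggests a simple rule for reading such traces. This sketch ASSUMES the usual handshake of the Virtex-6 TRN transmit interface, where a data beat moves on a clock edge only when both active-low ready signals, trn_tsrc_rdy_n and trn_tdst_rdy_n, are low. The sample values are made up for illustration and are not ChipScope data.

```python
def transferred_beats(samples):
    """Keep trn_td only on clocks where both active-low ready signals are low."""
    return [td for src_rdy_n, dst_rdy_n, td in samples
            if src_rdy_n == 0 and dst_rdy_n == 0]


# (trn_tsrc_rdy_n, trn_tdst_rdy_n, trn_td) per clock; made-up values.
samples = [
    (1, 0, 0x0000000000000000),  # source not ready: nothing moves
    (0, 0, 0x4000000100000000),  # beat accepted
    (0, 1, 0x0000000012345678),  # destination not ready: beat is held
    (0, 0, 0x0000000012345678),  # held beat accepted
    (1, 1, 0x0000000000000000),  # idle
]
beats = transferred_beats(samples)
print(len(beats))  # 2
```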

Similarly, we observed data reads with ChipScope at two points in the physical layer. Figure 5.2 F shows a physical layer memory read packet, in which the signal phy_rd shows the data read from memory.

Figure 5.2 F Physical Layer Memory Read Packet


At the second physical layer probe point, shown in Figure 5.2 G, a transaction layer header is added to the data in every phy_rd.

Figure 5.2 G Physical Layer Data TLP


CHAPTER 6

CONCLUSION AND FUTURE SCOPE

The Xilinx Virtex-6 FPGA IP core for PCI Express was generated and implemented for a data transfer rate of 2.5 Giga bytes per second. We learned the detailed architecture and configuration of the PCIE IP core. The design is implemented on the ML-605 evaluation kit.

The current configuration runs and gives a data transfer rate of ~2.5 GBPS for PCIE gen 1 with x8 lanes. Using LogiCORE's CORE Generator tool and the Xilinx XAPP-1052 and Xapp-045 application software for the PCIE core, we checked image transfer and memory read and write transactions. To configure and control the PCIE core registers, we used a third-party tool called PCITREE.

Initially, we had issues selecting the right hardware for the ML-605 evaluation kit installation, mainly finding a PCIE x8 port in the server system. Then came the driver installation for Windows XP SP3 and the Xilinx ISE 13.2 configuration; we used ISE 13.2 because the latest release, ISE 14.2, had issues with PCIE core versions 1.3 and 1.6.

In the future scope of this project, the PCIE core module is the main building block for high-speed data transfer in our data acquisition system. We have the ADC and PCIE modules working. Next, we need to do the memory mapping and EZDMA coding. After that, we can use the PCIE core generator module at five different levels in the design, and our project will be ready.


REFERENCES

[1] Xilinx FPGA design modules reference materials, Retrieved on 15 September 2012 from

http://www.xilinx.com/support/documentation/ip_documentation/ (v6_PCIE_ug517.pdf, v6_PCIE_ds715.pdf, xtp044.pdf, xtp025.pdf, ug671.pdf)

[2] Xilinx Virtex 6 IP Block FPGA design modules materials, Retrieved on 25 September 2012

from http://www.xilinx.com/products/ipcenter/V6_PCI_Express_Block.html

[3] PCIE specification reference materials, Retrieved on 15 September 2012 from

http://www.pcisig.com/specifications/pciexpress/

[4] PCIE working and understanding materials, Retrieved on 17 September 2012 from

http://en.wikipedia.org/wiki/PCI_Express/

[5] PCIE working and understanding materials, Retrieved on 15 September 2012 from

http://computer.howstuffworks.com/pci-express.html/

[6] EBook: PCIE reference material, Retrieved on 15 September 2012 from

http://www.mindshare.com/files/ebooks/pci%20express%20system%20architecture.pdf/

[7] Xilinx Virtex 5 IP core PCIE Block design modules materials, Retrieved on 20 October 2012

from http://www.em.avnet.com/en-us/design/drc/Pages/Xilinx-Virtex-5-LXTSXT-PCI-Express-Development-Kit.aspx/


[8] UART and Bridge connectivity driver reference, Retrieved on 15 September 2012 from

http://www.exar.com/connectivity/uart-and-bridging-solutions

[9] Basic Knowledge of data acquisition system By National Instruments, Retrieved on 15

September 2012 from http://www.ni.com/white-paper/3536/en