Interdisciplinary Centre for Bioinformatics

Affymetrix Analysis Data Model (AADM) and data files

Toralf Kirsten

Agenda

• Introduction
• Affy's data files
• AADM – Affy analysis data model
  • Affy's MicroDB
  • Overview AADM
  • Dimensions / Facts
  • Stars
• Matching Affy Suite facts to AADM
• Access to AADM
• Conclusion

Toralf Kirsten, IZBI, 21.05.2002

Introduction

AADM - Affymetrix Analysis Data Model

Standard ???

GATC - Genetic Analysis Technology Consortium
Molecular Dynamics and Affymetrix

AADM Notation      Hierarchy
scheme             chip
biological item    probe set
atom               probe pair
cell               probe

unit – subset of a chip whose cells share some similar characteristics
block – subset of a unit whose cells share similar characteristics

in gene expression studies: block = biological item / block = unit
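The notation above can be pictured as nested records. The following is a minimal Python sketch, assuming one perfect-match (PM) and one mismatch (MM) probe per probe pair. The class and field names are hypothetical and are not AADM table or column names, and the coordinates are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Probe:  # AADM notation: cell
    x: int
    y: int
    kind: str  # "PM" (perfect match) or "MM" (mismatch)


@dataclass
class ProbePair:  # AADM notation: atom
    atom_no: int
    pm: Probe
    mm: Probe


@dataclass
class ProbeSet:  # AADM notation: biological item
    name: str
    pairs: List[ProbePair] = field(default_factory=list)


@dataclass
class Chip:  # AADM notation: scheme
    name: str
    probe_sets: List[ProbeSet] = field(default_factory=list)

    def cell_count(self) -> int:
        """Count the probes (cells) reachable through the hierarchy."""
        return sum(2 * len(ps.pairs) for ps in self.probe_sets)


# Hypothetical coordinates, for illustration only.
pairs = [
    ProbePair(0, Probe(10, 20, "PM"), Probe(10, 21, "MM")),
    ProbePair(1, Probe(11, 20, "PM"), Probe(11, 21, "MM")),
]
chip = Chip("HG_U95Av2", [ProbeSet("31481_s_at", pairs)])
print(chip.cell_count())
```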

Affy's data files I
Back end and data logistics

Data files from experimental process
• define an experiment → *.exp
• process probe array in fluidics station
• scan probe array → *.dat
• compute cell intensities → *.cel
• analyze intensities → *.chp
• generate report → *.rpt

Library data files
• *.cif
• *.psi
• *.cdf

Affy Suite

Affy's MicroDB

MSDE – Microsoft Desktop Engine

SQL-Server 2000 DB with specific access
specific file format

Affy's data files II

Experimental file (*.exp)

Affymetrix GeneChip Experiment Information
Version 1

[Sample Info]
Chip Type HG_U95Av2
Chip Lot 1006279
Operator ??????????
Sample Type
Description
Project
Comments
Solution Type
Solution Lot

[Fluidics]
Protocol EukGE-WS2v3
Wash A1 Recovery Mixes 0
Wash A1 Temperature (C) 25
Number of Wash A1 Cycles 10
Mixes per Wash A1 Cycle 2
Wash B Recovery Mixes 0
Wash B Temperature (C) 50
Number of Wash B Cycles 4
Mixes per Wash B Cycle 15
Stain Temperature (C) 25
First Stain Time (seconds) 600
Wash A2 Recovery Mixes 0
Wash A2 Temperature (C) 25
…

• ASCII file
• created by Affy Suite
• controls the experiment procedure
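Because the experiment file is plain ASCII with bracketed section headers, it can be read with a few lines of code. The sketch below assumes that each key and its value are separated by a tab character, which the slide does not confirm. `parse_exp` is a hypothetical helper name, and the sample text reuses a few values from the listing above.

```python
from typing import Dict, Optional


def parse_exp(text: str) -> Dict[str, Dict[str, str]]:
    """Parse *.exp-style text into {section: {key: value}}.

    Assumption: key and value are separated by a tab character.
    Lines before the first [Section] header are ignored.
    """
    sections: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("[") and stripped.endswith("]"):
            current = sections.setdefault(stripped[1:-1], {})
        elif current is not None:
            key, _, value = line.partition("\t")
            current[key.strip()] = value.strip()
    return sections


sample = "\n".join([
    "Affymetrix GeneChip Experiment Information",
    "Version\t1",
    "",
    "[Sample Info]",
    "Chip Type\tHG_U95Av2",
    "Chip Lot\t1006279",
    "Sample Type",
    "",
    "[Fluidics]",
    "Protocol\tEukGE-WS2v3",
    "Wash B Temperature (C)\t50",
    "Number of Wash B Cycles\t4",
])

exp = parse_exp(sample)
print(exp["Sample Info"]["Chip Type"])
print(exp["Fluidics"]["Wash B Temperature (C)"])
```

Keys without a value, such as "Sample Type" in the listing, end up with an empty string.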

Affy's data files III
Image file (*.dat)

• scan image
• export to *.dat/*.tif
• basis for intensities
• very large file (ca. 40 MB)

Affy's data files IV
Cell intensities (*.cel)

• ASCII file
• Affy Suite creation
• XY coordinates without probe set desc.

[CEL]
Version=3

[HEADER]
Cols=242
Rows=248
TotalX=242
TotalY=248
OffsetX=0
OffsetY=0
GridCornerUL=42 58
GridCornerUR=1385 45
GridCornerLR=1396 1387
GridCornerLL=53 1399
Axis-invertX=0
AxisInvertY=0
swapXY=0
Algorithm=Percentile
AlgorithmParameters=Percentile:75;CellMargin:2;OutlierHigh:1.500;OutlierLow:1.004

[INTENSITY]
NumberCells=60016
CellHeader=X Y MEAN STDV NPIXELS
0 0 3179.5 311.6 12
1 0 46167.0 0.0 9
2 0 3633.3 410.6 12
3 0 46167.0 0.0 9
4 0 2684.5 223.0 12
5 0 3476.0 205.0 9
6 0 46167.0 0.0 12
…
240 247 46167.0 0.0 9
241 247 3457.8 354.8 12

[MASKS]
NumberCells=0
CellHeader=X Y

[OUTLIERS]
NumberCells=5059
CellHeader=X Y
1 0
3 0
6 0
8 0
10 0
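The section layout of the ASCII *.cel file lends itself to a simple line-oriented reader. The following Python sketch covers only what the excerpt shows (the [HEADER], [INTENSITY] and [OUTLIERS] sections of a Version=3 file). `parse_cel_v3` is a hypothetical helper name, and the sample text is a shortened copy of the excerpt.

```python
from typing import Dict, List, Tuple

Cell = Tuple[float, float, int]  # (MEAN, STDV, NPIXELS)


def parse_cel_v3(text: str):
    """Return (header, intensities, outliers) from ASCII *.cel text."""
    header: Dict[str, str] = {}
    intensities: Dict[Tuple[int, int], Cell] = {}
    outliers: List[Tuple[int, int]] = []
    section = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif "=" in line:
            # key=value lines; only the [HEADER] entries are kept here
            if section == "HEADER":
                key, _, value = line.partition("=")
                header[key] = value
        else:
            fields = line.split()
            if section == "INTENSITY" and len(fields) == 5:
                xy = (int(fields[0]), int(fields[1]))
                intensities[xy] = (float(fields[2]), float(fields[3]), int(fields[4]))
            elif section == "OUTLIERS" and len(fields) == 2:
                outliers.append((int(fields[0]), int(fields[1])))
    return header, intensities, outliers


sample = """[CEL]
Version=3

[HEADER]
Cols=242
Rows=248
Algorithm=Percentile

[INTENSITY]
NumberCells=60016
CellHeader=X Y MEAN STDV NPIXELS
0 0 3179.5 311.6 12
1 0 46167.0 0.0 9
2 0 3633.3 410.6 12
3 0 46167.0 0.0 9

[OUTLIERS]
NumberCells=5059
CellHeader=X Y
1 0
3 0
"""

header, intensities, outliers = parse_cel_v3(sample)
print(int(header["Cols"]) * int(header["Rows"]))  # grid size from the header
print(intensities[(1, 0)], (1, 0) in outliers)
```

In the excerpt, Cols times Rows equals the NumberCells value of the [INTENSITY] section, and the cells listed as outliers are the ones with the intensity 46167.0.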

Affy's data files V
Chip file (*.chp)

• very large files (>10 MB)
• specific / proprietary file format
• not readable with other programs
• export to MS Excel / *.txt files

contains measurement

Affy's data files VI
Report file (*.rpt)

• ASCII file
• Affy Suite creation
• Hybridization qualities

Report Type: Expression Report
Date: 01:30PM 01/17/2002
________________________________________
Filename: 159U.chp
Probe Array Type: HG_U95Av2
Algorithm: Expression
Probe Pair Thr: 8
Controls: Antisense
________________________________________
Absolute Thresholds:
Difference (SDT): 38.6(4.00Q)
Ratio (SRT): 1.50

Absolute Decision Matrix: {{3.0,4.0}
Scaled Noise (Q): 9.657
Scale Factor (SF): 5.518
Norm Factor (NF): 1.000
________________________________________
Background:
Avg: 52.68 Std: 0.97 Min: 51.52 Max: 54.77

Corner+
Avg: 65 Count: 32

Corner-
Avg: 7699 Count: 32

Central-
Avg: 7731 Count: 9
________________________________________
Housekeeping Controls:
Probe Set         AD(5')  Call(5')  AD(M')  Call(M')
HUMISGF3A/M97935  19.3    A         172.6   P
HUMRGE/M10098     34.9    A         -23.0   A
HUMGAPDH/M33197   681.6   P         807.7   P
HSAC07/X00351     617.5   P         938.7   P
HUMTFRR/M11507    -10.0   A         3.7     A
M27830            45.5    A         489.9   P
________________________________________
Spike Controls:
Probe Set  AD(3')   Call(3')  AD(all)   AD(3'/5')
BIOB       750.2    P         882.29    1.05
BIOC       2614.4   P         2520.81   1.08
BIODN      9373.5   P         5501.90   5.75
CREX       22098.2  P         20005.11  1.23
DAPX       4984.9   P         3134.85   3.18
LYSX       29800.7  P         21886.76  2.18
PHEX       13933.4  P         10421.32  2.18
THRX       1248.2   P         958.20    1.66
TRPNX      260.4    P         89.42     55.70
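The entry "Difference (SDT): 38.6(4.00Q)" suggests that the difference threshold is stated as a multiple of the scaled noise Q. Assuming that reading (the slide does not spell it out), the report values are consistent with each other:

```latex
\mathrm{SDT} = 4.00 \cdot Q = 4.00 \times 9.657 = 38.628 \approx 38.6
```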

Affy's data files VII
Library files (…/genechip/library/)

probe set information (*.psi)

#Probe Sets: 12625
1 AFFX-MurIL2_at 20
2 AFFX-MurIL10_at 20
3 AFFX-MurIL4_at 20
4 AFFX-MurFAS_at 20
10 AFFX-BioB-5_at 20
11 AFFX-BioB-M_at 20
12 AFFX-BioB-3_at 20
13 AFFX-BioC-5_at 20
…
85 AFFX-HUMGAPDH/M33197_5_st 20
86 AFFX-HUMGAPDH/M33197_M_st 20
87 AFFX-HUMGAPDH/M33197_3_st 20
88 AFFX-HSAC07/X00351_5_st 20
89 AFFX-HSAC07/X00351_M_st 20
96 AFFX-YEL002c/WBP1_at 20
97 AFFX-YEL018w/_at 20
98 AFFX-YEL024w/RIP1_at 20
99 AFFX-YEL021w/URA3_at 20
100 31307_at 16
101 31308_at 16
102 31309_r_at 16
103 31310_at 16
104 31311_at 16
105 31312_at 16
106 31313_at 16
107 31314_at 16
…

cell information file (*.cif)

[Chip]
Rows=640
Cols=640
…

[HP]
XOrigin=-7100
YOrigin=8140
…

[TileTypes]
Type1=Expression

[Chip Servers]
BaseCallProgID=GeneChip.CallGEBaseCall.1
CellAvgProgID=GeneChip.PercentileCellAvg.1
ViewProgID1=GeneChip.GESeqView.1

[CellAverage]
Percentile=75
PercentileDefault=75
PercentileMin=0
PercentileMax=100
RejectFactor=6
RejectFactorDefault=6
RejectFactorMin=1
…

cell data file (*.cdf)

[CDF]
Version=GC3.0

[Chip]
Name=HG_U95Av2
Rows=640
Cols=640
NumberOfUnits=12625
MaxUnit=102119
NumQCUnits=13
ChipReference=

[QC1]
Type=10
NumberCells=300
CellHeader=X Y PROBE PLEN ATOM INDEX MATCH BG
Cell1=167 80 N 20 0 51367 0 0
Cell2=167 81 N 20 0 52007 1 0
Cell3=167 82 N 20 0 52647 0 0
Cell4=167 83 N 20 0 53287 0 0
Cell5=167 84 N 1 0 53927 -1 1
Cell6=168 80 N 20 1 51368 0 0
Cell7=168 81 N 20 1 52008 1 0
Cell8=168 82 N 20 1 52648 0 0
Cell9=168 83 N 20 1 53288 0 0
Cell10=168 84 N 1 1 53928 -1 1
Cell11=169 80 N 20 2 51369 0 0
Cell12=169 81 N 20 2 52009 1 0
Cell13=169 82 N 20 2 52649 0 0
Cell14=169 83 N 20 2 53289 0 0
Cell15=169 84 N 1 2 53929 -1 1
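In the QC cells of the *.cdf excerpt, the INDEX column appears to equal Y · Cols + X, for example 80 · 640 + 167 = 51367. This is an observation from the rows shown, not a documented rule. The Python sketch below splits a few of those cell lines by the CellHeader names and checks the relation. `parse_cells` and `linear_index` are hypothetical helper names.

```python
from typing import Dict, List

CELL_HEADER = "X Y PROBE PLEN ATOM INDEX MATCH BG"
COLS = 640  # Cols value from the [Chip] section of the excerpt

# A few QC cell lines copied from the *.cdf excerpt.
QC_LINES = [
    "Cell1=167 80 N 20 0 51367 0 0",
    "Cell2=167 81 N 20 0 52007 1 0",
    "Cell5=167 84 N 1 0 53927 -1 1",
    "Cell6=168 80 N 20 1 51368 0 0",
]


def parse_cells(lines: List[str], cell_header: str) -> List[Dict[str, str]]:
    """Map each 'CellN=...' line onto the column names of CellHeader."""
    names = cell_header.split()
    cells = []
    for line in lines:
        _, _, value = line.partition("=")
        cells.append(dict(zip(names, value.split())))
    return cells


def linear_index(x: int, y: int, cols: int) -> int:
    """Row-major position of cell (x, y) on a chip with `cols` columns."""
    return y * cols + x


cells = parse_cells(QC_LINES, CELL_HEADER)
for cell in cells:
    x, y = int(cell["X"]), int(cell["Y"])
    print(x, y, cell["INDEX"], linear_index(x, y, COLS))
```

The (X, Y) pair is also what links a *.cdf cell to an intensity row in a *.cel file, since the *.cel file itself carries no probe set description.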

AADM I
Affy's MicroDB

Affy's Data Mining Tool

Third party tools

Affy's MicroDB
• MSDE – Microsoft Desktop Engine
• SQL-Server 2000 DB with specific access
• specific file format
• only one publish db can be opened at a time
• max. 128 experiments in one db !!!

Input files
• expression arrays: *.exp, *.cel, *.chp
• spotted arrays: *.spt

AADM II
Overview AADM (subset)

AADM III
Categories

Chip design tables
• gene chip description (name, number of rows/columns, …)
• spot array description
• unit description
• data equivalent to CDF files (library installation)

Experiment setup tables
• experiment desc. (file name)
• physical chip desc. (relation between experiment and chip design)
• target desc. (concentration, date prepared)

Analysis result tables
• cell intensities
• absolute gene expression
• comparative gene expression

Protocol parameter tables
• target preparation
• experiment setup

AADM IV
Notation of dimensional modeling

Dimensions
• more static character
• descriptions
• e.g. Experiments, Genes

Facts
• measurements (numbers and values)
• e.g. Signal values

Notation
• dimension 1: name, description, …
• dimension 2: name, description, …
• dimension 3: name, description, …
• measurement: fact 1, fact 2, fact …, fact n
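The dimension/fact notation corresponds to a star schema: one fact table whose rows reference the surrounding dimension tables. The following is a minimal sketch using Python's built-in sqlite3 module with the slide's example (Experiments and Genes as dimensions, signal values as facts). All table names, column names and values are hypothetical and are not the actual AADM tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experiment (          -- dimension: descriptive, fairly static
    experiment_id INTEGER PRIMARY KEY,
    name TEXT,
    description TEXT
);
CREATE TABLE gene (                -- dimension
    gene_id INTEGER PRIMARY KEY,
    name TEXT,
    description TEXT
);
CREATE TABLE signal_fact (         -- fact: one measurement per combination
    experiment_id INTEGER REFERENCES experiment(experiment_id),
    gene_id INTEGER REFERENCES gene(gene_id),
    signal REAL,
    PRIMARY KEY (experiment_id, gene_id)
);
""")

# Hypothetical rows, for illustration only.
conn.executemany("INSERT INTO experiment VALUES (?, ?, ?)",
                 [(1, "exp_a", "hypothetical"), (2, "exp_b", "hypothetical")])
conn.executemany("INSERT INTO gene VALUES (?, ?, ?)",
                 [(1, "31307_at", ""), (2, "31308_at", "")])
conn.executemany("INSERT INTO signal_fact VALUES (?, ?, ?)",
                 [(1, 1, 120.5), (1, 2, 33.0), (2, 1, 98.0), (2, 2, 41.5)])

# Resolve the fact rows of one gene against both dimensions.
rows = conn.execute("""
    SELECT e.name, g.name, f.signal
    FROM signal_fact AS f
    JOIN experiment AS e ON e.experiment_id = f.experiment_id
    JOIN gene AS g ON g.gene_id = f.gene_id
    WHERE g.name = ?
    ORDER BY e.name
""", ("31307_at",)).fetchall()
print(rows)
```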

Demodata vs. Real experiment data
database analysis

Table                                          real experiment data   demodata
Cell intensities                               empty                  empty
Background intensities                         empty                  empty
Relative gene expression statistical results   filled                 filled
Relative gene expression results               empty                  filled
Absolute gene expression statistical results   filled                 filled
Absolute gene expression results               empty                  filled

AADM – Dimensions I
Biological item and cell hierarchy

• chip: name {HG_U95Av2, …}
• unit type: name {Expression, HybNegativeQC, …}
• scheme unit: name, direction {{}, {0,1,2}}
• biological item = probe set: item name {31481_s_at, …}
• scheme atom = probe pair: position, tbase, atom_no {13, {a,t,c,g}, {0…68}}
• scheme cell = probe: location (x,y), pbase, feature, … {{0…639},{0…639},{Q,a,t,c,g},{QC,control}}

PM MM PM MM …

AADM – Dimensions II
Experiment and analysis hierarchy

• experiment: name, *.dat file name
• analysis: name, analysis date
• algorithm type: name {CellAverage, ExpressionCallAbs, …}
• analysis algorithm: name {Histogram, Percentile, …}

AADM – Stars I
Absolute gene expression results

Fact: absolute gene expression result
• type
• number positive
• number negative
• number all
• number used
• number in avg
• pm excess
• mm excess
• avg difference intensity

Dimensions
• biological item = probe set {31481_s_at, …}
• analysis = one record for each analysis produced in Microarray Suite (*.chp + *.cel files) {AB_vs_FB_emp, …}
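To make fields such as number positive, number negative and avg difference more concrete, here is one plausible reading, sketched in Python. It assumes that a probe pair counts as positive when PM − MM reaches the difference threshold (SDT) and PM/MM reaches the ratio threshold (SRT), as negative when the same holds with PM and MM swapped, and that the average difference is the mean of PM − MM. These definitions are assumptions for illustration and not the Microarray Suite implementation, which applies further steps that are not reproduced here. The intensities are made up; the thresholds are the SDT and SRT values from the report file excerpt.

```python
from typing import Dict, List, Tuple


def summarize_pairs(pairs: List[Tuple[float, float]],
                    sdt: float, srt: float) -> Dict[str, float]:
    """Summarize (PM, MM) intensity pairs of one probe set.

    Assumed definitions (see the text above):
      positive: PM - MM >= sdt and PM / MM >= srt
      negative: MM - PM >= sdt and MM / PM >= srt
    Intensities are assumed to be strictly positive.
    """
    positive = sum(1 for pm, mm in pairs if pm - mm >= sdt and pm / mm >= srt)
    negative = sum(1 for pm, mm in pairs if mm - pm >= sdt and mm / pm >= srt)
    avg_difference = sum(pm - mm for pm, mm in pairs) / len(pairs)
    return {
        "number_positive": positive,
        "number_negative": negative,
        "number_all": len(pairs),
        "avg_difference": avg_difference,
    }


# Hypothetical (PM, MM) intensities for one probe set.
pairs = [(500.0, 100.0), (120.0, 110.0), (80.0, 200.0), (300.0, 250.0)]
result = summarize_pairs(pairs, sdt=38.6, srt=1.50)
print(result)
```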

AADM – Stars II
Absolute gene expression statistical results

Fact: absolute gene expression stat result
• type
• signal
• detection p value
• pairs
• pairs used

Dimensions
• biological item = probe set {31481_s_at, …}
• analysis = one record for each analysis produced in Microarray Suite (*.chp + *.cel files) {AB_vs_FB_emp, …}

AADM – Stars III
Relative gene expression results (comparative results)

Fact: relative gene expression result
• type
• number increase
• number decrease
• increase ratio
• decrease ratio
• positive delta
• negative delta
• fold change
• significance
• …

Dimensions
• biological item = probe set {31481_s_at, …}
• analysis = one record for each analysis produced in Microarray Suite (*.chp + *.cel files) {AB_vs_FB_emp, …}

AADM – Stars IV
Relative gene expression statistical results (comparative results)

Fact: relative gene expression stat result
• type
• change p value
• signal log ratio
• signal log ratio low
• signal log ratio high
• pairs used

Dimensions
• biological item = probe set {31481_s_at, …}
• analysis = one record for each analysis produced in Microarray Suite (*.chp + *.cel files) {AB_vs_FB_emp, …}
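The statistical star stores a signal log ratio rather than a fold change. Assuming the signal log ratio is a base-2 logarithm, a signed fold change can be derived from it as in the Python sketch below. The sign convention (negative values for decreases) is an assumption here, and `fold_change` is a hypothetical helper, not an AADM field or a Microarray Suite function.

```python
def fold_change(signal_log_ratio: float) -> float:
    """Convert a base-2 signal log ratio into a signed fold change.

    Assumed convention: increases give values >= 1,
    decreases give values <= -1.
    """
    if signal_log_ratio >= 0:
        return 2.0 ** signal_log_ratio
    return -(2.0 ** (-signal_log_ratio))


for slr in (1.0, 0.0, -1.0, 3.0):
    print(slr, fold_change(slr))
```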

AADM – Stars V
Background intensities

Fact: absolute gene expression atom result
• background intensity

Dimensions
• biological item = probe set {31481_s_at, …}
• analysis = one record for each analysis produced in Microarray Suite (*.chp + *.cel files) {AB_vs_FB_emp, …}
• scheme atom = probe pair (atom_no)

no data available

AADM – Stars VI
Cell intensities

Fact: measurement element result
• calculated intensity
• standard deviation
• number pixel used
• original intensity
• mask flag

Dimensions
• biological item = probe set {31481_s_at, …}
• analysis = one record for each analysis produced in Microarray Suite (*.chp + *.cel files) {AB_vs_FB_emp, …}
• scheme cell = probe (x location, y location)

no data available

Access to AADM

1. Create and fill a publish database
- open MicroDB
- select file locations
- create a publish database
- select specific experiment files
- publish experiment files

2. Create an ODBC connection
- file DSN
- database type: SQL Server
- specify the db data file

3. Create an MS Access database

4. Link to the publish database
- menu "File / Get External Data / Link Tables" (German UI: „Datei externe Daten verknüpfen“)
- data type: ODBC
- select your connection

5. Use
- open table / create a view
- login / password
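The steps above go through MS Access. The same ODBC connection can in principle be used from a script instead. The Python sketch below only assembles a DSN-less ODBC connection string from the standard keywords DRIVER, SERVER, DATABASE, UID and PWD. The server, database, user and password values are placeholders, and `build_odbc_connection_string` is a hypothetical helper.

```python
def build_odbc_connection_string(server: str, database: str,
                                 user: str, password: str,
                                 driver: str = "SQL Server") -> str:
    """Assemble a DSN-less ODBC connection string for a SQL Server database."""
    parts = {
        "DRIVER": "{" + driver + "}",
        "SERVER": server,
        "DATABASE": database,
        "UID": user,
        "PWD": password,
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


# Placeholder values; the real ones depend on the local MicroDB installation.
conn_str = build_odbc_connection_string("localhost", "publish_db", "user", "secret")
print(conn_str)
```

With the third-party pyodbc package installed, such a string could be passed to `pyodbc.connect(conn_str)`. Whether MicroDB accepts connections other than the MS Access route described on the slide is not stated in the source.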

Matching Affy Suite fields to AADM

Affy Suite fields AADM fields

Calculations, e.g. inc / dec

e.g. pm_excess, mm_excess

Conclusion

Affy's data files
• *.cel file contains intensities, but without probe set desc.
• proprietary *.chp file

Affymetrix Analysis Data Model
• structures for original and derived data
• structures for absolute and relative (comparison) values

MicroDB
• stores data locally and uses the AADM structures
• stores no intensity data
• can store max. 128 experiments