Interdisciplinary Centre for Bioinformatics
Affymetrix Analysis Data Model (AADM) and data files
Toralf Kirsten
Agenda
• Introduction
• Affy's data files
• AADM – Affy analysis data model
  • Affy's MicroDB
  • Overview AADM
  • Dimensions / Facts
  • Stars
• Matching Affy Suite facts to AADM
• Access to AADM
• Conclusion
IZBI, 21.05.2002
Introduction
AADM - Affymetrix Analysis Data Model
Standard ???
GATC - Genetic Analysis Technology Consortium
Molecular Dynamics and Affymetrix

AADM notation      Hierarchy
scheme             chip
biological item    probe set
atom               probe pair
cell               probe
unit – subset of chip where cells have some similar characteristics
block – subset of a unit where cells have similar characteristics
in gene expression studies block = biological item / block = unit
Affy's data files I
Back end and data logistics

Data files from experimental process (Affy Suite)
define an experiment                       *.exp
process probe array in fluidics station
scan probe array                           *.dat
compute cell intensities                   *.cel
analyze intensities                        *.chp
generate report                            *.rpt

Library data files
*.cif
*.psi
*.cdf

Affy MicroDB
MSDE – Microsoft Desktop Engine
SQL-Server 2000 DB with specific access
specific file format
Affy's data files II
Experimental file (*.exp)
Affymetrix GeneChip Experiment Information
Version 1

[Sample Info]
Chip Type HG_U95Av2
Chip Lot 1006279
Operator ??????????
Sample Type
Description
Project
Comments
Solution Type
Solution Lot

[Fluidics]
Protocol EukGE-WS2v3
Wash A1 Recovery Mixes 0
Wash A1 Temperature (C) 25
Number of Wash A1 Cycles 10
Mixes per Wash A1 Cycle 2
Wash B Recovery Mixes 0
Wash B Temperature (C) 50
Number of Wash B Cycles 4
Mixes per Wash B Cycle 15
Stain Temperature (C) 25
First Stain Time (seconds) 600
Wash A2 Recovery Mixes 0
Wash A2 Temperature (C) 25
…
• ASCII file
• created by Affy Suite
• controls the experiment procedure
Affy's data files III
Image file (*.dat)
• scan image
• export to *.dat / *.tif
• basis for intensities
• very large file (ca. 40 MB)
Affy's data files IV
Cell intensities (*.cel)
• ASCII file
• created by Affy Suite
• XY coordinates without probe set description
[CEL]
Version=3

[HEADER]
Cols=242
Rows=248
TotalX=242
TotalY=248
OffsetX=0
OffsetY=0
GridCornerUL=42 58
GridCornerUR=1385 45
GridCornerLR=1396 1387
GridCornerLL=53 1399
Axis-invertX=0
AxisInvertY=0
swapXY=0
Algorithm=Percentile
AlgorithmParameters=Percentile:75;CellMargin:2;OutlierHigh:1.500;OutlierLow:1.004

[INTENSITY]
NumberCells=60016
CellHeader=X Y MEAN STDV NPIXELS
0 0 3179.5 311.6 12
1 0 46167.0 0.0 9
2 0 3633.3 410.6 12
3 0 46167.0 0.0 9
4 0 2684.5 223.0 12
5 0 3476.0 205.0 9
6 0 46167.0 0.0 12
…
240 247 46167.0 0.0 9
241 247 3457.8 354.8 12

[MASKS]
NumberCells=0
CellHeader=X Y

[OUTLIERS]
NumberCells=5059
CellHeader=X Y
1 0
3 0
6 0
8 0
10 0
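
The *.cel excerpt above is plain sectioned text, so it can be read without Affy Suite. The following is a minimal sketch in Python, assuming the version 3 text layout shown on the slide (section names in square brackets, key=value header lines, whitespace-separated cell rows). The function name parse_cel_v3 and the shortened sample text are illustrative assumptions, not part of any Affymetrix tool.

```python
def parse_cel_v3(text):
    """Parse version 3 *.cel text into (header, intensities, outliers)."""
    header = {}        # key -> raw string value from [HEADER]
    intensities = {}   # (x, y) -> (mean, stdv, npixels)
    outliers = set()   # {(x, y), ...}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if "=" in line:
            # key=value line; only the [HEADER] entries are kept
            key, value = line.split("=", 1)
            if section == "HEADER":
                header[key] = value
            continue
        fields = line.split()
        if section == "INTENSITY" and len(fields) == 5:
            x, y = int(fields[0]), int(fields[1])
            intensities[(x, y)] = (float(fields[2]), float(fields[3]), int(fields[4]))
        elif section == "OUTLIERS" and len(fields) == 2:
            outliers.add((int(fields[0]), int(fields[1])))
    return header, intensities, outliers


# Shortened sample built from rows shown on the slide.
SAMPLE_CEL = """[CEL]
Version=3
[HEADER]
Cols=242
Rows=248
Algorithm=Percentile
[INTENSITY]
NumberCells=60016
CellHeader=X Y MEAN STDV NPIXELS
0 0 3179.5 311.6 12
1 0 46167.0 0.0 9
2 0 3633.3 410.6 12
[OUTLIERS]
NumberCells=5059
CellHeader=X Y
1 0
3 0
"""

header, intensities, outliers = parse_cel_v3(SAMPLE_CEL)
print(header["Cols"], header["Rows"])
print(intensities[(2, 0)])
print(sorted(outliers))
```

Note that the result is keyed by (x, y) only: as the slide states, the *.cel file carries no probe set description, so a library file is needed to relate cells to probe sets.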
Affy's data files V
Chip file (*.chp)
• contains the measurements
• very large files (>10 MB)
• specific / proprietary file format
• not readable with other programs
• export to MS Excel / *.txt files
Affy's data files VI
Report file (*.rpt)

Report Type: Expression Report
Date: 01:30PM 01/17/2002
_____________________________________________________
Filename: 159U.chp
Probe Array Type: HG_U95Av2
Algorithm: Expression
Probe Pair Thr: 8
Controls: Antisense
_____________________________________________________
Absolute Thresholds:
Difference (SDT): 38.6(4.00Q)
Ratio (SRT): 1.50

Absolute Decision Matrix: {{3.0,4.0}
Scaled Noise (Q): 9.657
Scale Factor (SF): 5.518
Norm Factor (NF): 1.000
_____________________________________________________
Background:
Avg: 52.68 Std: 0.97 Min: 51.52 Max: 54.77

Corner+
Avg: 65 Count: 32

Corner-
Avg: 7699 Count: 32

Central-
Avg: 7731 Count: 9
• ASCII file
• created by Affy Suite
• reports hybridization quality
_____________________________________________________
Housekeeping Controls:
Probe Set          AD(5')   Call(5')  AD(M')   Call(M')
HUMISGF3A/M97935   19.3     A         172.6    P
HUMRGE/M10098      34.9     A         -23.0    A
HUMGAPDH/M33197    681.6    P         807.7    P
HSAC07/X00351      617.5    P         938.7    P
HUMTFRR/M11507     -10.0    A         3.7      A
M27830             45.5     A         489.9    P
_____________________________________________________
Spike Controls:
Probe Set   AD(3')    Call(3')  AD(all)    AD(3'/5')
BIOB        750.2     P         882.29     1.05
BIOC        2614.4    P         2520.81    1.08
BIODN       9373.5    P         5501.90    5.75
CREX        22098.2   P         20005.11   1.23
DAPX        4984.9    P         3134.85    3.18
LYSX        29800.7   P         21886.76   2.18
PHEX        13933.4   P         10421.32   2.18
THRX        1248.2    P         958.20     1.66
TRPNX       260.4     P         89.42      55.70
Affy's data files VII
Library files (…/genechip/library/)

probe set information (*.psi)
#Probe Sets: 12625
1 AFFX-MurIL2_at 20
2 AFFX-MurIL10_at 20
3 AFFX-MurIL4_at 20
4 AFFX-MurFAS_at 20
10 AFFX-BioB-5_at 20
11 AFFX-BioB-M_at 20
12 AFFX-BioB-3_at 20
13 AFFX-BioC-5_at 20
…
85 AFFX-HUMGAPDH/M33197_5_st 20
86 AFFX-HUMGAPDH/M33197_M_st 20
87 AFFX-HUMGAPDH/M33197_3_st 20
88 AFFX-HSAC07/X00351_5_st 20
89 AFFX-HSAC07/X00351_M_st 20
96 AFFX-YEL002c/WBP1_at 20
97 AFFX-YEL018w/_at 20
98 AFFX-YEL024w/RIP1_at 20
99 AFFX-YEL021w/URA3_at 20
100 31307_at 16
101 31308_at 16
102 31309_r_at 16
103 31310_at 16
104 31311_at 16
105 31312_at 16
106 31313_at 16
107 31314_at 16
…

cell information file (*.cif)
[Chip]
Rows=640
Cols=640
…
[HP]
XOrigin=-7100
YOrigin=8140
…
[TileTypes]
Type1=Expression
[Chip Servers]
BaseCallProgID=GeneChip.CallGEBaseCall.1
CellAvgProgID=GeneChip.PercentileCellAvg.1
ViewProgID1=GeneChip.GESeqView.1
[CellAverage]
Percentile=75
PercentileDefault=75
PercentileMin=0
PercentileMax=100
RejectFactor=6
RejectFactorDefault=6
RejectFactorMin=1
…

cell data file (*.cdf)
[CDF]
Version=GC3.0

[Chip]
Name=HG_U95Av2
Rows=640
Cols=640
NumberOfUnits=12625
MaxUnit=102119
NumQCUnits=13
ChipReference=

[QC1]
Type=10
NumberCells=300
CellHeader=X Y PROBE PLEN ATOM INDEX MATCH BG
Cell1=167 80 N 20 0 51367 0 0
Cell2=167 81 N 20 0 52007 1 0
Cell3=167 82 N 20 0 52647 0 0
Cell4=167 83 N 20 0 53287 0 0
Cell5=167 84 N 1 0 53927 -1 1
Cell6=168 80 N 20 1 51368 0 0
Cell7=168 81 N 20 1 52008 1 0
Cell8=168 82 N 20 1 52648 0 0
Cell9=168 83 N 20 1 53288 0 0
Cell10=168 84 N 1 1 53928 -1 1
Cell11=169 80 N 20 2 51369 0 0
Cell12=169 81 N 20 2 52009 1 0
Cell13=169 82 N 20 2 52649 0 0
Cell14=169 83 N 20 2 53289 0 0
Cell15=169 84 N 1 2 53929 -1 1
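
The *.cdf excerpt can be read the same way as any sectioned key=value text. The sketch below, in Python, parses such text and checks one regularity that is visible in the QC rows shown on the slide: INDEX appears to equal Y * Cols + X (for example 80 * 640 + 167 = 51367). This is an observation from the excerpt, not a rule stated in the slides, and the helper names parse_sections and cells_of are hypothetical.

```python
def parse_sections(text):
    """Split INI-style text into {section: {key: value}}, keeping key case."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key] = value
    return sections


def cells_of(section):
    """Turn the CellN=... entries of one section into dicts keyed by CellHeader."""
    names = section["CellHeader"].split()
    cells = []
    for key, value in section.items():
        # "Cell1", "Cell2", ... are data rows; "CellHeader" is skipped
        if key.startswith("Cell") and key[4:].isdigit():
            cells.append(dict(zip(names, value.split())))
    return cells


# Shortened sample built from rows shown on the slide.
SAMPLE_CDF = """[CDF]
Version=GC3.0
[Chip]
Name=HG_U95Av2
Rows=640
Cols=640
[QC1]
Type=10
NumberCells=300
CellHeader=X Y PROBE PLEN ATOM INDEX MATCH BG
Cell1=167 80 N 20 0 51367 0 0
Cell2=167 81 N 20 0 52007 1 0
Cell5=167 84 N 1 0 53927 -1 1
Cell6=168 80 N 20 1 51368 0 0
"""

sections = parse_sections(SAMPLE_CDF)
cols = int(sections["Chip"]["Cols"])
cells = cells_of(sections["QC1"])
mismatches = [
    c for c in cells
    if int(c["INDEX"]) != int(c["Y"]) * cols + int(c["X"])
]
print(len(cells), "cells,", len(mismatches), "index mismatches")
```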
AADM I
Affy's MicroDB

Input files
expression arrays: *.exp, *.cel, *.chp
spotted arrays: *.spt

Affy's MicroDB
• MSDE – Microsoft Desktop Engine
• SQL-Server 2000 DB with specific access
• specific file format
• only one publish db can be open at a time
• max. 128 experiments in one db !!!

Clients
Affy's Data Mining Tool
Third party tools
AADM II
Overview AADM (subset)
AADM III
Categories

Chip design tables
• gene chip description (name, number of rows/columns, …)
• spot array description
• unit description
• data equivalent to CDF files (library installation)

Experiment setup tables
• experiment description (file name, …)
• physical chip description (relation between experiment and chip design)
• target description (concentration, date prepared)

Analysis result tables
• cell intensities
• absolute gene expression
• comparative gene expression

Protocol parameter tables
• target preparation
• experiment setup
AADM IV
Notation of dimensional modeling

Dimensions
• more static character
• descriptions
• e.g. experiments, genes

Facts
• measurements (numbers and values)
• e.g. signal values

Notation
dimension 1: name, description, …
dimension 2: name, description, …
dimension 3: name, description, …
measurement (facts): fact 1, fact 2, fact …, fact n
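
The notation above (descriptive dimensions around a table of measured facts) can be sketched as a small star schema. The example below uses Python's built-in sqlite3 module with an in-memory database. The table and column names (gene_dim, experiment_dim, signal_fact) and the inserted values are assumptions chosen to mirror the slide's example of experiments, genes and signal values; they are not the actual AADM table names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- dimensions: mostly static, descriptive
CREATE TABLE gene_dim (
    gene_id     INTEGER PRIMARY KEY,
    name        TEXT,
    description TEXT
);
CREATE TABLE experiment_dim (
    experiment_id INTEGER PRIMARY KEY,
    name          TEXT,
    description   TEXT
);
-- facts: one measurement per combination of dimension keys
CREATE TABLE signal_fact (
    gene_id       INTEGER REFERENCES gene_dim(gene_id),
    experiment_id INTEGER REFERENCES experiment_dim(experiment_id),
    signal        REAL,
    PRIMARY KEY (gene_id, experiment_id)
);
""")

# Made-up illustration rows, not real measurements.
conn.execute("INSERT INTO gene_dim VALUES (1, 'gene_1', 'hypothetical gene')")
conn.execute("INSERT INTO experiment_dim VALUES (1, 'experiment_1', 'hypothetical experiment')")
conn.execute("INSERT INTO signal_fact VALUES (1, 1, 123.4)")

# A typical star query: join the fact table to its dimensions.
rows = conn.execute("""
    SELECT g.name, e.name, f.signal
    FROM signal_fact AS f
    JOIN gene_dim AS g ON g.gene_id = f.gene_id
    JOIN experiment_dim AS e ON e.experiment_id = f.experiment_id
""").fetchall()
print(rows)
```

The fact table holds only keys and numbers; everything descriptive lives in the dimensions, which is why the same dimension can be shared by several stars.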
Demodata vs. real experiment data
database analysis

                                               demodata   real experiment data
Absolute gene expression results               filled     empty
Absolute gene expression statistical results   filled     filled
Relative gene expression results               filled     empty
Relative gene expression statistical results   filled     filled
Background intensities                         empty      empty
Cell intensities                               empty      empty
AADM – Dimensions I
Biological item and cell hierarchy

chip: name {HG_U95Av2, …}
unit type: name {Expression, HybNegativeQC, …}
scheme unit: name, direction {{}, {0,1,2}}
biological item = probe set: item name {31481_s_at, …}
scheme atom = probe pair: position, tbase, atom_no {13, {a,t,c,g}, {0…68}}
scheme cell = probe: location (x,y), pbase, feature, … {{0…639},{0…639},{Q,a,t,c,g},{QC,control}}
probe layout: PM MM PM MM …
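
The hierarchy on this slide (chip, probe set, probe pair, probe) can be mirrored by nested data classes. The sketch below is in Python; the class and field names follow the common terms and the attributes listed on the slide, while the coordinates and bases in the usage example are made up for illustration. It assumes each probe pair consists of one PM and one MM probe, as the PM/MM layout on the slide suggests.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Probe:                 # AADM: scheme cell
    x: int                   # location on the chip
    y: int
    pbase: str


@dataclass
class ProbePair:             # AADM: scheme atom
    atom_no: int
    pm: Probe                # perfect match probe
    mm: Probe                # mismatch probe


@dataclass
class ProbeSet:              # AADM: biological item
    item_name: str
    pairs: List[ProbePair] = field(default_factory=list)


@dataclass
class Chip:
    name: str
    probe_sets: List[ProbeSet] = field(default_factory=list)

    def cell_count(self) -> int:
        """Number of cells covered by probe sets (two per probe pair)."""
        return sum(2 * len(ps.pairs) for ps in self.probe_sets)


# Usage with hypothetical coordinates and bases.
probe_set = ProbeSet("31481_s_at", [
    ProbePair(0, Probe(10, 20, "a"), Probe(10, 21, "t")),
    ProbePair(1, Probe(11, 20, "c"), Probe(11, 21, "g")),
])
chip = Chip("HG_U95Av2", [probe_set])
print(chip.cell_count())
```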
AADM – Dimensions II
Experiment and analysis hierarchy

experiment: name, *.dat file name
analysis: name, analysis date
analysis algorithm: name {Histogram, Percentile, …}
algorithm type: name {CellAverage, ExpressionCallAbs, …}
AADM – Stars I
Absolute gene expression results

Facts: absolute gene expression result
• type
• number positive
• number negative
• number all
• number used
• number in avg
• pm excess
• mm excess
• avg difference intensity

Dimensions
• biological item = probe set {31481_s_at, …}
• analysis = one record for each analysis produced in Microarray Suite (*.chp + *.cel files) {AB_vs_FB_emp, …}
AADM – Stars II
Absolute gene expression statistical results

Facts: absolute gene expression stat result
• type
• signal
• detection p value
• pairs
• pairs used

Dimensions
• biological item = probe set {31481_s_at, …}
• analysis = one record for each analysis produced in Microarray Suite (*.chp + *.cel files) {AB_vs_FB_emp, …}
AADM – Stars III
Relative gene expression results (comparative results)

Facts: relative gene expression result
• type
• number increase
• number decrease
• increase ratio
• decrease ratio
• positive delta
• negative delta
• fold change
• significance
• …

Dimensions
• biological item = probe set {31481_s_at, …}
• analysis = one record for each analysis produced in Microarray Suite (*.chp + *.cel files) {AB_vs_FB_emp, …}
AADM – Stars IV
Relative gene expression statistical results (comparative results)

Facts: relative gene expression stat result
• type
• change p value
• signal log ratio
• signal log ratio low
• signal log ratio high
• pairs used

Dimensions
• biological item = probe set {31481_s_at, …}
• analysis = one record for each analysis produced in Microarray Suite (*.chp + *.cel files) {AB_vs_FB_emp, …}
AADM – Stars V
Background intensities

Facts: absolute gene expression atom result
• background intensity

Dimensions
• biological item = probe set {31481_s_at, …}
• analysis = one record for each analysis produced in Microarray Suite (*.chp + *.cel files) {AB_vs_FB_emp, …}
• scheme atom = probe pair (atom_no)

no data available
AADM – Stars VI
Cell intensities

Facts: measurement element result
• calculated intensity
• standard deviation
• number pixel used
• original intensity
• mask flag

Dimensions
• biological item = probe set {31481_s_at, …}
• analysis = one record for each analysis produced in Microarray Suite (*.chp + *.cel files) {AB_vs_FB_emp, …}
• scheme cell = probe (x location, y location)

no data available
Access to AADM

1. Create and fill a publish database
- open MicroDB
- select file locations
- create a publish database
- select specific experiment files
- publish experiment files

2. Create an ODBC connection
- file DSN
- database type: SQL Server
- specify the db data file

3. Create an MS Access database

4. Link to the publish database
- "File > Get External Data > Link Tables" (German menu: "Datei > Externe Daten > Verknüpfen")
- data type: ODBC
- select your connection

5. Use
- open table / create a view
- login / password
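
The steps above use MS Access as the client, but any ODBC-capable program can use the same file DSN. As a rough sketch only, the Python code below builds a connection string for a file DSN and, if the third-party pyodbc package is installed, lists the tables behind it. The DSN path, user and password are hypothetical placeholders, and whether a given MicroDB installation accepts such connections is an assumption not covered by the slides.

```python
def build_connection_string(dsn_file, user, password):
    """Assemble an ODBC connection string that points at a file DSN."""
    parts = {"FILEDSN": dsn_file, "UID": user, "PWD": password}
    return ";".join(f"{key}={value}" for key, value in parts.items())


def list_tables(connection_string):
    """Return the table names visible through the ODBC connection."""
    import pyodbc  # third-party; imported here so the helper above works without it

    conn = pyodbc.connect(connection_string)
    try:
        cursor = conn.cursor()
        return [row.table_name for row in cursor.tables(tableType="TABLE")]
    finally:
        conn.close()


# Hypothetical DSN file and credentials, for illustration only.
connection_string = build_connection_string("aadm_publish.dsn", "user", "secret")
print(connection_string)
# list_tables(connection_string) would then need a reachable publish database.
```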
Matching Affy Suite fields to AADM
Affy Suite fields AADM fields
Calculations
e.g. inc / dec
e.g. pm_excess, mm_excess
Conclusion
Affy's data files
• *.cel file contains intensities, but without probe set description
• proprietary *.chp file

Affymetrix Analysis Data Model
• structures for original and derived data
• structures for absolute and relative (comparison) values

MicroDB
• stores data locally and uses the AADM structures
• stores no intensity data
• can store max. 128 experiments