SASBDB Small Angle Scattering Biological Data Bank
Erica Valentini, Dmitri Svergun group
Solution Scattering from biological macromolecules EMBO course 2014
Index
1. Introduction:
– What is SAS?
– Do we need a SAS database?
2. SASBDB:
– Features
– Usage
– Quality check
– Missing
3. Conclusions
SAS EMBO Course 2014, 11/2/2014
What is SAS? SAS Experiment
X-ray/Neutron beam
|s| = 4π sinθ/λ
s: scattering vector; 2θ: scattering angle; λ: wavelength; I(s): intensity
[Figure: scattering curve; axis label: Scattering intensity, log I(s)]
ATSAS
Low resolution model
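As a small illustration of the definition |s| = 4π sinθ/λ, the sketch below converts a scattering angle 2θ and a wavelength λ into the modulus of the scattering vector. The function name and the numerical values (2θ = 1°, λ = 0.15 nm) are assumptions chosen for the example and are not taken from the slides.

```python
import math

def scattering_vector(two_theta_deg: float, wavelength: float) -> float:
    """Return |s| = 4*pi*sin(theta)/lambda, where 2*theta is the scattering angle.

    The result has the inverse units of `wavelength` (nm -> nm^-1).
    """
    theta = math.radians(two_theta_deg) / 2.0  # theta is half the scattering angle
    return 4.0 * math.pi * math.sin(theta) / wavelength

# Illustrative values only (assumed): 2*theta = 1 degree, lambda = 0.15 nm
s = scattering_vector(1.0, 0.15)
print(f"s = {s:.4f} nm^-1")
```

Because s combines angle and wavelength, curves recorded at different wavelengths can be compared on the same axis.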
What is SAS? ATSAS Package
Rg
MM
Dmax
Volume
Shape
Rigid body modelling
Missing fragments
Oligomeric mixtures
Flexible systems
Do we need a SAS DB? SA(X)S advantages
Increasing popularity of SAXS:
• Solution: monitor alterations in environmental conditions.
• Broad size range: from a few kDa to GDa.
• New developments in software and hardware: fast experiments (μs or ms), small amount of sample (5-30 μl).
Do we need a SAS DB? SAS database motivations
• Increasing number of publications about SAS and the ATSAS package.
• Increasing amount of data collected with a single experiment.
• Importance of making the data underlying scientific publications available for the community.
Graewert, M.A. and Svergun, D.I. (2013) Impact and progress in small and wide angle X-ray scattering (SAXS and WAXS). Curr. Opin. Struct. Biol., 23, 748–54.
Franke, D., Kikhney, A.G. and Svergun, D.I. (2012) Automated acquisition and analysis of small angle X-ray scattering data. Nucl. Instruments Methods Phys. Res. Sect. A Accel. Spectrometers, Detect. Assoc. Equip., 689, 52–59.
Collins, F.S. and Tabak, L.A. (2014) Policy: NIH plans to enhance reproducibility. Nature, 505, 612–3.
[Figure: number of publications referring to biological SAS per year, 2002 to 2012 (vertical axis 0 to 400); series: ATSAS and bioSAS]
Do we need a SAS DB? wwPDB SAS task force
Trewhella, J., Hendrickson, W.A., Kleywegt, G.J., Sali, A., Sato, M., Schwede, T., Svergun, D.I., Tainer, J.A., Westbrook, J. and Berman, H.M. (2013) Report of the wwPDB Small-Angle Scattering Task Force: Data Requirements for Biomolecular Modeling and the PDB. Structure, 21, 875–881.
“…a global repository is needed that holds standard format X-ray and neutron SAS data that is searchable and freely accessible for download”
Database and small angle scattering experts
SASBDB
Do we need a SAS DB? Existing DB including SAS data
Database | SAS data included | Missing
PDB | 47 models where SAS was used for refinement | Primary data used to calculate the models
DARA | Scattering curves from 20,000 PDB structures | Models and possibility to deposit SAS data
BIOISIS | SAXS data and models | Complete search, cross-references to other databases, quality check on data
pE-DB | Scattering curves and ensemble models from disordered proteins | SAS data and models from “not disordered proteins”
Berman, H., Henrick, K., Nakamura, H. and Markley, J.L. (2007) The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res., 35, D301–3.
dara.embl-hamburg.de
Sokolova, A.V., Volkov, V.V. and Svergun, D.I. (2003) Prototype of a database for rapid protein classification based on solution scattering data. J. Appl. Crystallogr., 36, 865–868.
Hura, G.L., Menon, A.L., Hammel, M., Rambo, R.P., Poole, F.L., Tsutakawa, S.E., Jenney, F.E., Classen, S., Frankel, K.A., Hopkins, R.C., et al. (2009) Robust, high-throughput solution structural analyses by small angle X-ray scattering (SAXS). Nat. Methods, 6, 606–12.
Varadi, M., Kosol, S., Lebrun, P., Valentini, E., Blackledge, M., Dunker, A.K., Felli, I.C., Forman-Kay, J.D., Kriwacki, R.W., Pierattelli, R., et al. (2014) pE-DB: a database of structural ensembles of intrinsically disordered and of unfolded proteins. Nucleic Acids Res., 42, D326–35.
SASBDB features:
1. Entries
2. Cross links
3. Searching
4. Browsing
5. Benchmark
6. Plots
7. Interactivity
8. Availability
1. Entries
www.sasbdb.org
2. Cross links
3. Searching
1. Simple search
2. Advanced search
Browsing unit
4. Browsing
Scattering curve
Model
Kratky plot
Experiment information
Publication
Structural parameters
Unique code, format: SASXXXN
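The unique code format SASXXXN can be checked mechanically. The sketch below is only an illustration: the slide does not spell out what X and N stand for, so it assumes that each X is an uppercase letter or digit and that N is a single digit. The function name and the test string are hypothetical.

```python
import re

# Assumption (not stated on the slide): X = uppercase letter or digit, N = digit.
SASBDB_CODE = re.compile(r"SAS[A-Z0-9]{3}[0-9]")

def is_sasbdb_code(text: str) -> bool:
    """Return True if `text` has the assumed SASXXXN shape."""
    return SASBDB_CODE.fullmatch(text) is not None

# "SASDA52" is a made-up string with the right shape, used only as an example.
print(is_sasbdb_code("SASDA52"))
print(is_sasbdb_code("SASDA5"))
```

If the real rule for X or N differs, only the pattern needs to change.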
Chronological order
Browse according to the selected field
5. Benchmark
• 17 Entries from a set of 14 “standard proteins”
• SAXS and WAXS data
• Extra purification steps
• Benchmark for algorithm testing purposes
• Dissemination
6. Plots
Scattering plot
Guinier region
Kratky plot
P(r) distribution
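Two of the plots listed here are simple transforms of the scattering curve. The sketch below assumes the standard Guinier approximation, ln I(s) = ln I(0) − (Rg²/3)·s², and the standard Kratky representation, s²·I(s) against s. It is not the code used by SASBDB or ATSAS; the function names and the synthetic data are made up for the example.

```python
import math

def guinier_fit(s, intensity):
    """Least-squares line through (s^2, ln I); returns (Rg, I0).

    Assumes the points lie in the Guinier region and that the fitted slope
    is negative (a positive slope would make the square root fail).
    """
    x = [si * si for si in s]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)

def kratky(s, intensity):
    """Return the Kratky ordinate s^2 * I(s) for every point."""
    return [si * si * ii for si, ii in zip(s, intensity)]

# Synthetic, noise-free Guinier curve (assumed values): Rg = 2.0 nm, I(0) = 100
rg_true, i0_true = 2.0, 100.0
s = [0.05 + 0.01 * k for k in range(40)]  # s*Rg stays below about 0.9
intensity = [i0_true * math.exp(-((si * rg_true) ** 2) / 3.0) for si in s]

rg, i0 = guinier_fit(s, intensity)
print(f"Rg = {rg:.3f}, I(0) = {i0:.1f}")
print(f"first Kratky point: {kratky(s, intensity)[0]:.4f}")
```

On real data the fit range matters; a commonly quoted rule of thumb for globular particles is to keep s·Rg below roughly 1.3.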
Radius of gyration
Maximum distance
MWs & Porod volume
7. Interactivity
Fitting 1 Model 1
Fitting 2 Model 2
Fitting 3 Model 1
Model 2
Model 3
Model 4
Experimental details
Molecule details
8. Availability
• Possibility to log in using an ATSAS account
• Submission form
• Users can choose between:
– “on hold”
– “public”
SASBDB Usage
More than 500 users since August 2014.
We are also currently monitoring search items and number of downloads.
SASBDB Usage: use cases
SAS user, SAS novice, article referee
SASBDB Quality check: Difference between Rg (Guinier) and Rg (p(r))
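A minimal sketch of how Rg from p(r) can be obtained and compared with the Guinier value, using the standard relation Rg² = ∫ r² p(r) dr / (2 ∫ p(r) dr). It is not SASBDB's implementation: the function names are made up, the p(r) is the analytical one for a solid sphere of radius R (for which Rg = √(3/5)·R), and the "Guinier" value is a hypothetical number.

```python
import math

def trapezoid(x, y):
    """Trapezoidal integration of y over x."""
    return sum(0.5 * (y[k] + y[k + 1]) * (x[k + 1] - x[k]) for k in range(len(x) - 1))

def rg_from_pr(r, p):
    """Rg from the distance distribution: Rg^2 = int r^2 p dr / (2 int p dr)."""
    numerator = trapezoid(r, [rk * rk * pk for rk, pk in zip(r, p)])
    denominator = 2.0 * trapezoid(r, p)
    return math.sqrt(numerator / denominator)

def relative_difference(a, b):
    """Absolute difference relative to the mean of the two values."""
    return abs(a - b) / (0.5 * (a + b))

# p(r) of a solid sphere with radius R (assumed test case), Dmax = 2R
R = 3.0
d_max = 2.0 * R
r = [d_max * k / 600 for k in range(601)]
p = [rk ** 2 * (1 - 1.5 * (rk / d_max) + 0.5 * (rk / d_max) ** 3) for rk in r]

rg_pr = rg_from_pr(r, p)  # analytical value: sqrt(3/5) * R
rg_guinier = 2.35         # hypothetical Guinier estimate for the comparison
print(f"Rg(p(r)) = {rg_pr:.3f}")
print(f"relative difference = {relative_difference(rg_guinier, rg_pr):.3f}")
```

A large relative difference between the two estimates is a hint that something is wrong with the data or with the chosen ranges.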
SASBDB Quality check: Difference between MW (expected) and MW (experimental)
SASBDB Quality check: Quality of the p(r) distribution
SASBDB Quality check: Quality of the Guinier region
SASBDB Quality check: Quality of the fit
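The slides do not say which statistic is used to judge the fit. One common choice is a reduced χ² between the experimental curve and the scaled model curve, sketched below. The function names and the numbers are assumptions made for the example.

```python
def scale_factor(i_exp, i_mod, sigma):
    """Least-squares scale c that minimises sum(((I_exp - c*I_mod)/sigma)^2)."""
    numerator = sum(e * m / sg ** 2 for e, m, sg in zip(i_exp, i_mod, sigma))
    denominator = sum(m * m / sg ** 2 for m, sg in zip(i_mod, sigma))
    return numerator / denominator

def reduced_chi_square(i_exp, i_mod, sigma):
    """Chi-square per degree of freedom after scaling the model curve."""
    c = scale_factor(i_exp, i_mod, sigma)
    n = len(i_exp)
    total = sum(((e - c * m) / sg) ** 2 for e, m, sg in zip(i_exp, i_mod, sigma))
    return total / (n - 1)  # one fitted parameter: the scale factor

# Made-up curves: the "experimental" data are exactly twice the model
i_mod = [10.0, 8.0, 5.0, 2.0]
i_exp = [20.0, 16.0, 10.0, 4.0]
sigma = [0.5, 0.5, 0.4, 0.3]

print(scale_factor(i_exp, i_mod, sigma))
print(reduced_chi_square(i_exp, i_mod, sigma))
```

With realistic errors, values close to 1 indicate that the model agrees with the data within the noise; much larger values indicate a poor fit or underestimated errors.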
SASBDB Quality check: Quality of the data
SASBDB Quality check:
• Difference between structural parameters
• Quality of the Guinier region
• Quality of the p(r) distribution
• Discrepancy between expected and experimental MW
• Overall quality of the data
• Goodness of fit of the model
Quality score based on the comparison between the selected entry and all the other entries.
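The slides do not describe how this comparison is computed. One simple way to score an entry against all the others is a percentile rank, sketched below for a metric where lower is better (for example a relative Rg difference). The function name and the numbers are hypothetical, and this is not presented as SASBDB's actual scoring scheme.

```python
def percentile_score(value, all_values):
    """Fraction of entries whose metric is at least as large as `value`.

    For a lower-is-better metric, a score near 1 means the entry does
    better than almost every entry in the database.
    """
    no_better = sum(1 for v in all_values if v >= value)
    return no_better / len(all_values)

# Made-up relative discrepancies for five entries, including the selected one
discrepancies = [0.01, 0.02, 0.05, 0.10, 0.30]
print(percentile_score(0.02, discrepancies))
print(percentile_score(0.30, discrepancies))
```

A rank-based score of this kind needs no absolute threshold, but its meaning shifts as the database grows.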
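SASBDB: missing
• Network of SAS databases (Berman et al., 2007)
• Validation/Quality check (Read et al., 2011):
– Pipeline to compare values (Franke et al., 2012)
– Assessment of the angular range (Konarev and Svergun, 2014, submitted)
– Difference between curves (Franke et al., 2014, submitted)
– Validation of models (Tuukkanen and Svergun, 2015, in preparation)
• Standard format: sasCIF (Malfois and Svergun, 2000)
• Submission interface: automatic (Yang et al., 2004)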
Berman, H., Henrick, K., Nakamura, H. and Markley, J.L. (2007) The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res., 35, D301–3.
Read, R.J., Adams, P.D., Arendall, W.B., III, Brunger, A.T., Emsley, P., Joosten, R.P., Kleywegt, G.J., Krissinel, E.B., Lutteke, T., Otwinowski, Z., Perrakis, A., Richardson, J.S., Sheffler, W.H., Smith, J.L., Tickle, I.J., Vriend, G. and Zwart, P.H. (2011) A new generation of crystallographic validation tools for the Protein Data Bank. Structure, 19, 1395–1412.
Franke, D., Kikhney, A.G. and Svergun, D.I. (2012) Automated acquisition and analysis of small angle X-ray scattering data. Nucl. Instruments Methods Phys. Res. Sect. A Accel. Spectrometers, Detect. Assoc. Equip., 689, 52–59.
Konarev, P. and Svergun, D.I. (2014) Submitted.
Franke, D., Jeffries, C.M. and Svergun, D.I. (2014) Submitted.
Tuukkanen, A. and Svergun, D.I. (2015) In preparation.
Malfois, M. and Svergun, D.I. (2000) sasCIF: an extension of core Crystallographic Information File for SAS. J. Appl. Crystallogr., 33, 812–816.
Yang, H., Guranovic, V., Dutta, S., Feng, Z., Berman, H.M. and Westbrook, J.D. (2004) Automated and accurate deposition of structures solved by X-ray diffraction to the Protein Data Bank. Acta Crystallogr. D, 60, 1833–1839.
SASBDB: Conclusions
• With 100 entries and 163 models SASBDB is currently the largest repository of SAS data available.
• Entirely browsable according to different criteria.
• Highly flexible search.
• Embedded JavaScript to display interactive 3D models.
• Set of SAXS and WAXS data from “standard proteins”.
• Cross links to other biological databases.
• Aimed at different types of users.
• Several validation methods under development.
• Development of the standard format: sasCIF.
• Network of interconnected SAS databases.
• Paper about SASBDB in N.A.R. 2015 Database issue.
Thanks for your attention!