Structure Validation Ton Spek, Bijvoet Centre Utrecht University The Netherlands PLATON Course, Utrecht, April 18, 2012


Structure Validation

Ton Spek, Bijvoet Centre, Utrecht University, The Netherlands

PLATON Course, Utrecht, April 18, 2012


Overview

• Why Structure Validation

• Data Archival and Review before CIF

• The CIF Solution for Archival and Review

• The CIF Validation History

• The Role of PLATON in IUCr checkCIF Validation

• Validation Report, ALERT Levels A, B, C, G & Types

• The Importance of FCF Validation

• Two Case Studies


Why Automated Structure Validation?

• The large volume of new and routine structure reports submitted for publication.

• The limited number of experienced and available crystallographic referees for validation.

• Automated validation should save time for authors, referees, journal editors and readers.

• Detection of errors due to the black-box use of crystallography by less experienced analysts.

• Setting some standards of quality and reliability.

• Automated detection of unusual, though not necessarily erroneous, issues that might be interesting or need special attention.

• Sadly: the need to detect fraudulent structure reports.


Structure Validation Before CIF

How it was when I started in crystallography as a student in 1966 at Utrecht University ...


Data Collection around 1966

Nonius AD3 Diffractometer

One data set: weeks!


~1966: Electrologica X8 ALGOL60 ‘Mainframe’ (<1 MHz)

[Photo labels: 16kW, Operator, Input, Output, Plotter, Console]

Multiple hours of computing time per structure


Flexowriter for the creation and editing of programs and data


Data Storage in the Past

Direct Methods ALGOL60 Program AUDICE on Paper Tape


Archival of Model Parameters in a Publication (Acta Cryst.)


Archival of Reflection Data in a Publication (Acta Cryst.)


Early Data Validation

- The Cambridge Structural Database (CSD): data were retyped from the publication and checked for internal consistency. Authors of erroneous papers were contacted to resolve outstanding issues.

- The journal 'Crystal Structure Communications' (1970s, Parma, Prof. Nardelli) also checked the coordinates against the reported distances and angles for internal consistency, after retyping the data from the submitted paper.
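The consistency check described above (do the reported distances actually follow from the reported coordinates?) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used by the CSD or the journal: it computes a distance from fractional coordinates via the cell metric tensor, ignores symmetry operations and lattice translations, and the function names as well as the 3-esd tolerance are hypothetical choices.

```python
import math


def metric_tensor(a, b, c, alpha, beta, gamma):
    """Real-space metric tensor G (Angstrom^2) from cell edges and angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return [
        [a * a, a * b * cg, a * c * cb],
        [a * b * cg, b * b, b * c * ca],
        [a * c * cb, b * c * ca, c * c],
    ]


def distance(cell, xyz1, xyz2):
    """Distance d = sqrt(dx^T G dx) between two atoms in fractional coordinates.

    Simplification: no symmetry operations or lattice translations are applied.
    """
    g = metric_tensor(*cell)
    dx = [p - q for p, q in zip(xyz1, xyz2)]
    d2 = sum(dx[i] * g[i][j] * dx[j] for i in range(3) for j in range(3))
    return math.sqrt(d2)


def is_consistent(cell, xyz1, xyz2, reported, esd, k=3.0):
    """True if a reported distance agrees with the recomputed one within k esd's (hypothetical rule)."""
    return abs(distance(cell, xyz1, xyz2) - reported) <= k * esd
```

For a monoclinic cell with a = b = c = 5 Å and β = 120°, `distance((5, 5, 5, 90, 120, 90), (0, 0, 0), (0.2, 0, 0.2))` evaluates to 1.0 Å (up to rounding), because the off-diagonal metric term subtracts what the two axial terms add.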


Problems Around 1990

• Multiple data storage media (often hardcopy on paper or microfilm only). I had a room full of card decks … and numerous magnetic tapes.

• No standard computer-readable format for archival and data exchange.

• Published data had to be retyped before any follow-up calculations could be done.

• No easy numerical checking or additional calculations by referees etc.

• Multiple typos and inconsistencies were common in the published data (as marked up in the CSD).

• Often incomplete information was reported.


The CIF Solution

• CIF standard proposed for data archival by S.R. Hall, F.H. Allen & I.D. Brown (1991). Acta Cryst. A47, 655-685.

• Simple, flexible and free format.

• First implemented in the XTAL software package (Hall, Stewart et al.).

• Adopted early (1992?) by the author of SHELXL (G.M. Sheldrick), nowadays the most commonly used refinement program.

• Adopted by the IUCr journals (Syd Hall, Section Editor of Acta Cryst. C).


CIF Example File


CIF Constructs

• data_name
  where name is the chosen identifier of the data block

• Data name and value associations, e.g.
  _cell_length_a 16.6392(2)
  _diffrn_radiation_source 'sealed tube'

• Repetition (loop)
  loop_
  _symmetry_equiv_pos_as_xyz
  'x, y, z'
  '-x, y+1/2, -z'
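To make the constructs above concrete, here is a minimal Python sketch that reads the data_ name, tag-value pairs and loop_ tables from a small CIF fragment. It is an illustrative simplification, not a conforming CIF parser: it assumes a single data block, no '#' comments, no semicolon text fields, and a simplified quoting rule in which a quoted value ends at the next matching quote. The names `parse_cif`, `tokenize` and `EXAMPLE` are hypothetical.

```python
import re

# A quoted string ('...' or "...") or a run of non-blank characters.
TOKEN = re.compile(r"'([^']*)'|\"([^\"]*)\"|(\S+)")

EXAMPLE = """data_test
_cell_length_a 16.6392(2)
_diffrn_radiation_source 'sealed tube'
loop_
_symmetry_equiv_pos_as_xyz
'x, y, z'
'-x, y+1/2, -z'
"""


def tokenize(text):
    """Return (value, is_bare) pairs; quoted values can never be tags or keywords."""
    out = []
    for m in TOKEN.finditer(text):
        if m.group(3) is not None:
            out.append((m.group(3), True))
        else:
            out.append((m.group(1) if m.group(1) is not None else m.group(2), False))
    return out


def is_keyword(tok):
    """A bare token that starts a tag, a data block or a loop."""
    value, bare = tok
    return bare and (value.startswith("_") or value.startswith("data_") or value == "loop_")


def parse_cif(text):
    """Parse one simple data block into (block_name, items, loops)."""
    toks = tokenize(text)
    name, items, loops = None, {}, []
    i = 0
    while i < len(toks):
        value, bare = toks[i]
        if bare and value.startswith("data_"):
            name = value[len("data_"):]
            i += 1
        elif bare and value == "loop_":
            i += 1
            tags = []  # the loop header: consecutive tag names
            while i < len(toks) and toks[i][1] and toks[i][0].startswith("_"):
                tags.append(toks[i][0])
                i += 1
            values = []  # the loop body: values until the next keyword
            while i < len(toks) and not is_keyword(toks[i]):
                values.append(toks[i][0])
                i += 1
            if not tags or len(values) % len(tags):
                raise ValueError("loop_ values do not fill complete rows")
            loops.append([dict(zip(tags, values[k:k + len(tags)]))
                          for k in range(0, len(values), len(tags))])
        elif bare and value.startswith("_"):
            if i + 1 >= len(toks) or is_keyword(toks[i + 1]):
                raise ValueError(f"no value for {value}")
            items[value] = toks[i + 1][0]
            i += 2
        else:
            raise ValueError(f"unexpected token {value!r}")
    return name, items, loops
```

The standard uncertainty in `16.6392(2)` is kept as part of the string here; a fuller reader would split it into the value 16.6392 and its s.u. 0.0002.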


Construct for Text

• Text can be included between semicolons, each at the start of a line.

• Used for the Abstract and Comment sections of Acta Cryst. Sections C & E.

• Example:
_publ_section_comment
;
This paper presents the first example of a very important compound.
;
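A semicolon-delimited text field can be extracted from a CIF with a few lines of Python. This sketch relies on the CIF rule that the opening and closing semicolons sit in column 1; it only recognizes a tag that stands alone on its line, and the function name `read_text_field` is hypothetical.

```python
SAMPLE = """_publ_section_comment
;
This paper presents the first example
of a very important compound.
;
"""


def read_text_field(cif_text, tag):
    """Return the semicolon-delimited text value following `tag`, or None if absent."""
    lines = cif_text.splitlines()
    for i, line in enumerate(lines):
        if line.split() != [tag]:
            continue
        # The field must open with ';' in column 1 on the next line ...
        if i + 1 >= len(lines) or not lines[i + 1].startswith(";"):
            raise ValueError(f"{tag} is not followed by a text field")
        body = [lines[i + 1][1:]]  # text after the opening ';' belongs to the value
        for follow in lines[i + 2:]:
            if follow.startswith(";"):  # ... and close with ';' in column 1.
                return "\n".join(body).strip("\n")
            body.append(follow)
        raise ValueError(f"unterminated text field for {tag}")
    return None
```

Line breaks inside the field are preserved, which is why multi-line Abstract and Comment sections survive a round trip through the CIF.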


CIF Completion

• CIF files are created by the refinement program (e.g. SHELXL, Crystals, JANA).

• Missing data can be added with a text editor, enCIFer (from the CCDC), etc.

• The CIF syntax can be checked with a locally installed version of the program enCIFer (freely available: www.ccdc.cam.ac.uk).


Missing Data

PROGRAM enCIFer


Note on Editing the CIF

• The purpose of editing the CIF is to add missing information or to update existing information.

• Unfortunately, some (Acta Cryst.) authors have been found to polish away less pleasing numerical values …, including R-values (e.g. 0.0975 => 0.0475). This of course leaves traces, is now generally detected (also in retrospect) by the validation software, and is obviously not good for the career of the culprit.


The Acta Cryst Validation History

• Structure validation of data supplied in computer-readable CIF format was pioneered in the 1990s by Syd Hall, the section editor of Acta Cryst. C at that time.

• Initially, the numerical checking of papers submitted to Acta Cryst. C in CIF format was done by the IUCr staff in Chester.

• Subsequently, automated checking of the CIF for data consistency, completeness and validity (checkCIF) was introduced (e.g. the RFACR01 ALERT).

• External PLATON facilities to check for missed symmetry and VOIDS were added soon after, at Syd Hall's invitation.

• This was followed by the inclusion of numerous other PLATON-based tests (PLATxxx) of the reported structure (currently more than 400), forming PLATON/checkCIF.

• Chester currently checks submitted papers for duplications.

• Work is underway in Chester to also cross-check the geometrical data reported in the text against those in CIF format.


The Role of PLATON in IUCr/checkCIF Validation

- PLATON includes a collection of structure analysis tools (geometry, absolute structure, twinning etc.)

- A default run of PLATON loops automatically over most of the available (analyze) tools in the program

- 'On-the-fly' ALERT messages are sent to a file. The content of this file is subsequently analyzed against validation criteria detailed in an external text file, 'check.def'. The result is a validation report in the file name.chk.

- The IUCr PLATON/checkCIF combines the Chester ALERTS with PLATON ALERTS into a single report.
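The separation described above, where the analysis tools only emit raw messages and an external criteria file decides the ALERT level, can be illustrated with a small Python sketch. The test names, thresholds and table layout below are invented for illustration; they do not reproduce the format or content of PLATON's check.def.

```python
from collections import Counter

# Hypothetical stand-in for an external criteria file: per test, (threshold, level)
# pairs ordered from most to least severe. Names and numbers are invented.
CRITERIA = {
    "R1_HIGH": [(0.20, "A"), (0.15, "B"), (0.10, "C")],
    "ITEM_MISSING": [(0.5, "G")],
}

LEVEL_ORDER = "ABCG"


def classify(test, value, criteria=CRITERIA):
    """Map one raw 'on-the-fly' message (test name, value) to an ALERT level or None."""
    for threshold, level in criteria.get(test, []):
        if value > threshold:
            return level
    return None


def build_report(messages, criteria=CRITERIA):
    """Turn raw messages into ALERTs sorted by severity, plus a count per level."""
    alerts = []
    for test, value in messages:
        level = classify(test, value, criteria)
        if level is not None:
            alerts.append((level, test, value))
    alerts.sort(key=lambda alert: LEVEL_ORDER.index(alert[0]))
    return alerts, Counter(level for level, _, _ in alerts)
```

Keeping the thresholds in data rather than in code means the criteria can be adjusted without touching the tests themselves, which is presumably one reason to hold them in an external file.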


WHAT ARE THE VALIDATION QUESTIONS ?

Single Crystal Structure Validation addresses three simple but important questions:

1 – Is the reported information complete?

2 – What is the quality of the analysis?

3 – Is the structure correct?


ALERT LEVELS

checkCIF reports its findings as a list of ALERTS:

ALERT A – Could Indicate a Serious Problem – Consider Carefully (Action: Correct, or explain convincingly why it is correct)

ALERT B – Might Indicate a Potentially Serious Problem

ALERT C – Check to Ensure it is O.K. & Not because of an Oversight

ALERT G – General Info. Check that it is not something Unexpected.


ALERT TYPES

1 - CIF Construction/Syntax errors, Missing or Inconsistent Data.

2 - Indicators that the Structure Model may be Wrong or Deficient.

3 - Indicators that the quality of the results may be low.

4 - Cosmetic Improvements, Queries and Suggestions.


Which Key Issues are Addressed

• Missed symmetry (“being Marshed”)

• Wrong chemistry (mis-assigned atom types)

• Too many, too few or misplaced H-atoms

• Missed solvent-accessible voids in the structure

• Missed twinning

• Absolute structure issues

• Data quality and completeness issues


FCF Validation

• In the past, printed Fo/Fc listings were required by most journals, for publication or deposition.

• Deposition and archival of the Fo/Fc reflection file in CIF format (FCF) was made mandatory early on for Acta Cryst. papers.

• FCFs are useful for subsequent analysis of possibly unique data.

• CIF + FCF checking was added to the IUCr PLATON/checkCIF suite in 2010 in response to fraud.

• Major chemical journals now require CIF deposition and validation reports, but not (yet) the deposition of reflection data. (There appears to be strong opposition: too complicated for chemists?)

• The CCDC now accepts FCFs for deposition.
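One simple CIF-versus-FCF consistency check is to recompute the conventional R1 = Σ||Fo| − |Fc|| / Σ|Fo| from the deposited reflections and compare it with the value claimed in the CIF; a polished R-value would not survive this. The Python sketch below assumes the FCF has already been read into (Fo², σ(Fo²), Fc²) tuples on a common scale, as in a SHELXL-style FCF; the 2σ(Fo²) cut-off (equivalent to Fo > 4σ(Fo) under simple error propagation), the tolerance and the function names are assumptions.

```python
import math


def r1(reflections, sigma_cut=2.0):
    """Conventional R1 = sum(||Fo| - |Fc||) / sum(|Fo|) over 'observed' reflections.

    `reflections` holds (Fo2, sigma_Fo2, Fc2) tuples on a common scale (assumption).
    Only reflections with Fo2 > sigma_cut * sigma(Fo2) are used.
    """
    num = den = 0.0
    for fo2, sig, fc2 in reflections:
        if fo2 <= sigma_cut * sig:
            continue
        fo, fc = math.sqrt(fo2), math.sqrt(max(fc2, 0.0))
        num += abs(fo - fc)
        den += fo
    if den == 0.0:
        raise ValueError("no reflections pass the intensity cut-off")
    return num / den


def r1_matches(reported, reflections, tol=0.002):
    """Compare the R1 claimed in the CIF with the value recomputed from the FCF data."""
    return abs(r1(reflections) - reported) <= tol
```

With the three reflections (100, 1, 81), (400, 2, 441) and (1, 1, 4), the last one is rejected by the cut-off and R1 = (1 + 1) / (10 + 20) ≈ 0.067, so a CIF claiming 0.0475 for these data would be flagged.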


Reflection CIF (FCF)


WEB-BASED IUCR STRUCTURE VALIDATION


Validation with standalone PLATON

- Details: www.platonsoft.nl/platon

- Driven by the file check.def with criteria, ALERT messages and advice.

- Use (UNIX/MAC-OSX): platon -u structure.cif

- Result in the files structure.chk and structure.ckf

- Applicable to CIFs (including those generated by the CCDC)

- MS-Windows: run from the toolbar of the Windows version (Louis Farrugia)


Two ALERTS related to the misplaced Hydrogen Atom


ADVICE

- Validation should not be postponed to the publication phase. Most validation issues are best taken care of during the analysis, while the crystals are still available.

- Everything unusual in a structure is potentially suspect, most often incorrect (an artifact), and should be investigated and discussed in great detail and supported by additional independent evidence.

- The CSD can be a very helpful tool when looking for possible precedents (but be careful)


Systematic Fraud

• A massive fraud, involving structures mainly published around 2007 in Acta Cryst. E, was detected in late 2009 (soon 200 retractions!).

• Before 2010, nobody was prepared for serious and systematic fraud in this non-competitive field of routine structures.

• Many deviations from the expected results can often be explained away as errors, inexperience or poor data.

• Several Acta Cryst. retractions before 2010 might in hindsight concern fraudulent structures and not just errors.

• Ongoing testing of our validation software on the archived data for structures published in Acta E often flagged suspect structures that needed more detailed investigation.

• Only by following up a particularly strange structure report with an analysis of all structures published by the authors of that paper did an extensive fraud pattern emerge.

• Among other things, it was found that the same data set had been used to publish a series of invented isomorphous structures.


Bogus Variations (with Hirshfeld ALERTS) on the Published Structure 2-hydroxy-3,5-dinitrobenzoic acid (ZAJGUM)

OH => F

H2O => NH3

OH => NH2

NO2 => COOH


Error and Fraud Detection Tools

• Generalized Hirshfeld Rigid Bond Test.

• CIF versus FCF data checking.

• Scatter plots of the reflection data of the same or related structure(s).

• Look in difference maps for unusual features.

• SHELXL re-refinement using the supplied CIF & FCF data.

• Check in the CSD for related structures.

• Next: two case studies that illustrate the use of the above validation and analysis tools.
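The classic (non-generalized) Hirshfeld rigid-bond test compares the mean-square displacements z² = nᵀUn of two bonded atoms along their bond direction n; for a rigid bond the two values should be nearly equal. A minimal Python sketch, assuming Cartesian positions and Cartesian U tensors (the conversion from CIF U(ij) values is omitted) and using Hirshfeld's original suggested limit of 0.001 Å² for bonds between atoms at least as heavy as carbon. It does not reproduce PLATON's generalized test or its ALERT thresholds; the function names are hypothetical. A mis-assigned atom type (e.g. F instead of OH) tends to distort the refined displacement parameters, which is why bogus variations like those above can show up as rigid-bond ALERTS.

```python
import math


def msd_along(u_cart, n):
    """Mean-square displacement z^2 = n^T U n along unit vector n (Angstrom^2)."""
    return sum(n[i] * u_cart[i][j] * n[j] for i in range(3) for j in range(3))


def hirshfeld_delta(xyz_a, u_a, xyz_b, u_b):
    """Delta = z^2(A) - z^2(B) along the A-B bond.

    Assumes Cartesian coordinates (Angstrom) and Cartesian 3x3 U tensors;
    the conversion from CIF U(ij) values is not shown.
    """
    d = [b - a for a, b in zip(xyz_a, xyz_b)]
    length = math.sqrt(sum(x * x for x in d))
    if length == 0.0:
        raise ValueError("atoms A and B coincide")
    n = [x / length for x in d]
    return msd_along(u_a, n) - msd_along(u_b, n)


def rigid_bond_alert(delta, limit=0.001):
    """Flag |Delta| above Hirshfeld's suggested 0.001 A^2 (not PLATON's criterion)."""
    return abs(delta) > limit
```

Two atoms with identical isotropic U give Delta = 0; making only atom B vibrate more along the bond (U11 = 0.03 instead of 0.02 Å² for a bond along x) gives Delta = -0.01 Å², well beyond the limit.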


Example 1:

Submitted to Acta Cryst. (2011)

Structure I


PLATON Report Part 1


PLATON Report Part 2


RELATED STRUCTURE FROM THE CSD

Structure II


Structure Report for II


Analysis

• Structure (II) has no validation issues.

• C-CH3 distance in (II) is 1.50 Å, as expected.

• ‘C-F’ distance in (I) is 1.50 Å and not the expected 1.35 Å.

• Conclusion: Structure (I) is the CH3 variety and not F.

• Data sets of (I) & (II) are not identical (see next).

• Data set (I) is likely based on the CH3 compound.

• Fraud or error? A DIFABS file error?

• The authors of (I) confirmed an error: they had relied on a proposal by external chemists. The paper was retracted.
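The argument above boils down to comparing one observed distance with the values expected for the two candidate chemistries. In Python, with the expected values taken from the analysis (1.35 Å for C-F, 1.50 Å for C-CH3); the esd of 0.003 Å in the last line is a hypothetical illustration, not the value reported for (I), and the function names are hypothetical.

```python
# Expected bond lengths (Angstrom) as quoted in the analysis of structures (I) and (II).
EXPECTED = {"C-F": 1.35, "C-CH3": 1.50}


def closest_assignment(observed, expected=EXPECTED):
    """Return the candidate bond type whose expected length is nearest the observed one."""
    return min(expected, key=lambda bond: abs(expected[bond] - observed))


def deviation_in_esd(observed, esd, expected_length):
    """How many esd's the observed length lies away from an expected value."""
    return abs(observed - expected_length) / esd


# Hypothetical esd of 0.003 A: a 'C-F' bond of 1.50 A would then be about 50 esd's off.
print(closest_assignment(1.50), round(deviation_in_esd(1.50, 0.003, EXPECTED["C-F"])))
# prints: C-CH3 50
```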


Example 2: Iron(III) Complex


Fe(III) Validation Part 1


Fe(III) Validation Part 2


Fe Structure Re-refined


Conclusion?

• Structure now O.K. after an erratum?

• Search for similar (isomorphous) structures in the CSD.

• Yes, there is an isomorphous Mn complex, published by a different set of authors from a different university.

• Let us compare both structures.


Isomorphous Mn(III) Complex


Mn Structure Validation Part 1


Mn Validation Part 2


Scatter Plots of 2 Data Sets

Two Unrelated Data Sets

Two Identical Data sets


Scatter Plot Fe versus Mn I(obs)

Fe and Mn Data Sets Identical!
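The scatter plot can be complemented by a couple of numbers. A minimal Python sketch, assuming both data sets have already been read into dictionaries mapping (h, k, l) to I(obs) (the reading step is omitted): pair the common reflections, then compute their Pearson correlation coefficient and the fraction of exactly identical intensities. For independently measured data sets, even of isomorphous Fe and Mn complexes, measurement errors and the different metal should produce visible scatter, so a coefficient of essentially 1 together with a high fraction of identical values is a strong warning sign. The function names are hypothetical.

```python
import math


def paired_intensities(set1, set2):
    """I(obs) pairs for the reflections (keyed by (h, k, l)) present in both data sets."""
    common = sorted(set(set1) & set(set2))
    return [set1[hkl] for hkl in common], [set2[hkl] for hkl in common]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two common reflections")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def compare_data_sets(set1, set2):
    """Return (number of common reflections, correlation, fraction of identical intensities)."""
    x, y = paired_intensities(set1, set2)
    corr = pearson(x, y)  # raises for fewer than two common reflections
    identical = sum(1 for a, b in zip(x, y) if a == b) / len(x)
    return len(x), corr, identical
```

A simple rescaling of one data set still gives a correlation of 1 but no identical values, which is why both numbers are returned.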


Concluding Remarks

• The web-based IUCr checkCIF/PLATON tool is managed by Mike Hoyland (IUCr).

• Validation is still a learning process.

• Chemical insight can be very helpful and is often decisive as a validation tool.

• Deposition of structure factors should be a requirement for all journals (the CCDC now accepts them along with the CIF).


Thanks To

• My former co-worker and successor Martin Lutz and many others for taking the time to bring various unresolved issues to my attention with actual data and suggestions.

• Send to [email protected]

A.L. Spek (2003). J. Appl. Cryst. 36, 7-13.

A.L. Spek (2009). Acta Cryst. D65, 148-155.