Software Quality Assurance Program and Rollout at PPPL
Keith [email protected]
NLIT 2018, Nashville, TN, May 21‐24, 2018
Agenda
• PPPL Background, SQA Motivation
• Initial Plan, Assessment, Survey of other labs
• Industry standards, first attempt
• Adjusted methods, inventory, and culture changes
• Applicable controls
• Summary and Lessons Learned
Princeton University Plasma Physics Laboratory
• A DOE National Lab operated by Princeton University
• Focused on Plasma Fusion science and research
• One primary fusion experiment (NSTX‐U)
• Several smaller projects, including 7 US ITER diagnostics
Multiple Projects Fueled Need for SQA
[Diagram labels: Motivating Projects; Catalyst; Corrective Actions; Quality Assurance Program; Root Cause Analysis; NSTX‐U Recovery Project; ITER Acceptance Requirements; ITER Diagnostics]
• NSTX‐U Recovery highlighted opportunity to improve existing SQA
• ITER acceptance of delivered diagnostics requires QA of analysis software such as ATILLA
• DOE Order 414.1D review identified need for stronger SQA controls
National Spherical Torus Experiment ‐ Upgrade
One of the two largest facilities in the global ST research program
17 international facilities for ST research and broader fusion science
• TS3/4, Japan
• UTST, Japan
• TST-2, Japan
• LTX- / CDX-U, USA
• LATE, Japan
• QUEST/CPD, Japan
• MAST-U, UK
• HIST, Japan
• PEGASUS, USA
• NSTX-U, USA
• ST40, UK
• SUNIST, China
• ETE, Brazil
• KTM, Kazakhstan
• Proto Sphera, Italy
• VEST, Korea
• GLOBUS-M2, Russia
*Slide courtesy of J. Menard
NSTX‐U Research is Highly Collaborative
362 data users
40 international
29 graduate students
25 post-doctoral researchers
54 collaborating institutions
32 domestic, 22 international
*Slide courtesy of J. Menard
Major damage to internal components in FY16
Copper coil insulation deteriorated over time
[Figure labels: Section of failed PF1AU; NSTX-U Machine; NSTX-U Centerstack]
*Slide courtesy of J. Menard
Achieving NSTX‐U Performance Goals requires Recovery Project
• FY2017: “Extent of Condition” review, resulting Corrective Action Plan covers entire facility, addresses procedural issues along with a Quality Assurance Program
*Slide courtesy of J. Menard
ITER – The Way to New Energy
• World's largest tokamak, being built in southern France
• Bringing the power of the sun down here to earth
• 7 ITER Members: China, EU, India, Japan, Korea, Russia, USA
• Nothing on this scale has ever been done before
• Achieve a “burning plasma”
• Produce 500 MW of fusion power (10‐fold return on energy!)
• Integrate wide range of current device components and diagnostics
*Slide courtesy of J. Klabacha
US ITER – Seven World Class Diagnostics
*Slide courtesy of J. Klabacha
US ITER – Four integrated Port Plug Packages
[Figure labels: EPP 03; EPP 09; UPP 11; UPP 14]
*Slide courtesy of J. Klabacha
Upper Port 14 and ATILLA
• ATILLA neutronics work not accepted without ATILLA QA, validation
• No PPPL procedure existed to prove validation
• Project delivery halted
– Stop‐gap measure enacted to use US ITER process
– Parallel effort to create new lab process
• Exercise revealed missing verification of ANSYS as well
• Structural analysis in ANSYS also delayed awaiting QA
Must develop SQA policy / procedure to deliver ITER diagnostics!
First attempt, false start
• QA Program Revamp, DOE Order requires Software QA for nuclear safety software
• Nothing safety related, but much important software with no SQA program
• QA Department asked for an initial audit to:
– Determine if controls are currently in place and if they are sufficient
– Check, verify compliance with ITER
• Initial audit revealed virtually no controls, outdated list of “important software”
• Audit cancelled, results clear: Room to improve
– Action plan created
– Staff trained on SQA at Argonne
– Initial SQA Policy drafted and sent for review
SQA Program Evolution Overview
[Timeline diagram labels. Stages: IEEE 730 Full Implementation (Major Effort); Simplified Policy (Separate SW Doc); Unified QAPD Deployment (Culture Shock!). Inputs: DOE O 414.1D; ITER Requirements; BNL, ORNL, et al.; Corrective Action Plan. Tasks: Audit; Inventory; Identification; Categorization; Procedures. Other labels: Outsourced development; In‐house working group.]
• SQA Program had several false starts, needed input from other labs
• QAPD creation is ideal for SQA launch
• Final program perceived as major culture change, learning process
• Follow‐on tasks, major deliverables identified and in progress
[Timeline outcomes: IEEE 730 Full Implementation, Major Effort (X); Simplified Policy, Too Fragmented (X); Merged QAPD Deployment, Culture Shock!]
What is software?
Software includes computer programs, firmware, procedures, operating systems, applications, rules, and documentation. [NITSL‐SQA‐2005]
Notable example: an Excel spreadsheet *is* software
What is software QA?
Activities that define and assess the adequacy of software processes to establish confidence that the processes are appropriate to produce software products of suitable quality for their intended purposes. A key attribute of SQA is objectivity with respect to the project. The SQA function may also be organizationally independent of the project: free from technical, managerial, and financial pressures from the project. [IEEE 730]
What software needs QA?
Software used at PPPL for the design, analysis, control, and operation of research experiments and Laboratory infrastructure [QAPD]
IEEE‐730 SQA
• IEEE 730 describes the necessary parts for an SQA Plan
– Standards, practices, conventions, and metrics
– Software reviews, Tests, Problem reporting, and corrective actions
– Tools, techniques, and methodologies
– Media control, Supplier control, Records collection, maintenance, and retention
– Training
– Risk management
IEEE 730 informed the high level roadmap for the initial policy draft
Expanded IEEE SQA Standards List
IEEE 828 SCM Configuration Management
IEEE 829 STD Test Documentation
IEEE 830 SRS Requirements Specification
IEEE 1012 V&V Verification and Validation
IEEE 1016 SDD Design Description
IEEE 1058 SPM Project Management
IEEE 1063 SUD User Documentation
• Most of these have been superseded by joint IEEE/IEC/ISO efforts
• The newest standards are excessively complex
• “Older” standards do provide good guidance as to what to consider
Expanded IEEE SQA Standards List
SCM Configuration Management
STD Test Documentation
SRS Requirements Specification
V&V Verification and Validation
SDD Design Description
SPM Project Management
SUD User Documentation
• If we hide “IEEE”, the general concepts are useful and applicable
• Most projects have some degree of these areas already…
• …but there’s always room for improvement
Initial Policy Draft: Very IEEE centric
• Classification Impact
– High: Personnel, operational hazards
– Medium: Impacts efficient, effective operations
– Low: Tertiary software not used for operations
– None: Everything else
• Available Controls
– Configuration Management
– Test Documentation
– Requirements Specification
– Verification & Validation
– Design Description
– Project Management
– User Documentation
• Greater Impact
– More controls required
– More standards followed
• Lesser Impact
– Fewer controls required
– Fewer standards followed
• Focus on IEEE standards generally difficult to accept in all but most stringent cases
• Too much reliance on standards expertise across wide range of users
SQA Merged Into Overall QA Program Description
• Explicit references to IEEE Standards removed
• General categories of QA controls (SCM, SRS, etc.) remain
• Classification levels changed:
– A‐1: All controls apply
– A‐2: Most controls apply
– A‐3: No controls apply
– A‐4: Level removed
• Levels based on 7 criteria: Personnel Hazard, Mission Impact, Cost, Risk, Radiological Impact, Safety, Program Impact
• Lab‐wide software inventory started in earnest as precursor to full implementation of official QAPD
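As a rough illustration of how the classification levels could drive the applicable controls, here is a minimal Python sketch. The slides say only that "most" controls apply at A‐2, so the `A2_CONTROLS` subset is a hypothetical choice, and the names `Level` and `required_controls` are invented for this example rather than taken from the QAPD.

```python
from enum import Enum


class Level(Enum):
    A1 = "A-1"
    A2 = "A-2"
    A3 = "A-3"


# The seven general control categories named in the slides.
ALL_CONTROLS = ("SCM", "STD", "SRS", "V&V", "SDD", "SPM", "SUD")

# Hypothetical subset: the source does not say which controls "most" means.
A2_CONTROLS = ("SCM", "STD", "SRS", "V&V", "SUD")


def required_controls(level: Level) -> tuple:
    """Return the control categories that apply at a classification level."""
    if level is Level.A1:
        return ALL_CONTROLS      # A-1: all controls apply
    if level is Level.A2:
        return A2_CONTROLS       # A-2: most controls apply (assumed subset)
    return ()                    # A-3: no controls apply


if __name__ == "__main__":
    for level in Level:
        print(level.value, required_controls(level))
```

Keeping the mapping in one table makes it easy to review and to change when the policy itself changes.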
Software Inventory
• Working group formed to catalog “all” software in use in the lab
• Initial attempts used Win/Mac administrative features to gather list
• First cut included “development environments”
– New guidance: QA the spreadsheet, not Excel
– What about ANSYS vs the ANSYS model? Determination still ambiguous
• Checklist added to determine necessity to categorize
• Individual SMEs assigned to reduce bias
• Categorization criteria for A‐1/2/3 stem from overall QAPD
– Example: Safety hazard – Minor, Considerable, Serious
– Much debate as to strict vs loose definitions of terms
It’s a learning process, and a difficult uphill journey
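One way to make the categorization debate concrete is to write the rule down as code. The sketch below assumes a "worst rating wins" rule across the seven QAPD criteria, using the Minor / Considerable / Serious scale from the safety hazard example. That rule and the function name `categorize` are assumptions for illustration, not the lab's actual procedure.

```python
# The seven criteria listed in the QAPD slide.
CRITERIA = (
    "Personnel Hazard", "Mission Impact", "Cost", "Risk",
    "Radiological Impact", "Safety", "Program Impact",
)

# Rating scale taken from the safety hazard example.
SEVERITY = {"Minor": 0, "Considerable": 1, "Serious": 2}

# Assumed rule: the single worst rating decides the level.
LEVEL_FOR_WORST = {2: "A-1", 1: "A-2", 0: "A-3"}


def categorize(ratings: dict) -> str:
    """Map one rating per criterion to a classification level."""
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"unrated criteria: {', '.join(missing)}")
    worst = max(SEVERITY[ratings[c]] for c in CRITERIA)
    return LEVEL_FOR_WORST[worst]


if __name__ == "__main__":
    ratings = {c: "Minor" for c in CRITERIA}
    ratings["Mission Impact"] = "Considerable"
    print(categorize(ratings))
```

Requiring a rating for every criterion forces the "strict vs loose" discussion to happen per criterion instead of per package.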
SCM: Tracking and controlling changes
• Bare minimum (and a good practice regardless): Use your favorite Version Control System du jour
• CM requirements should include:
– Procedures / Expectations for branches, releases
– Changelog entry standards: required information, format, etc.
– Mechanisms for ensuring software deployments are authorized and accurate
• Additional provisions for managing
– Acquired vs developed software
– Local Windows/Mac vs cluster Linux
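A changelog entry standard is easier to enforce when a script can check it. The following sketch assumes a simple `Key: value` entry format with four required fields. The field names and the ISO date rule are hypothetical examples of "required information, format", not a PPPL standard.

```python
import re

# Hypothetical required fields for a changelog entry.
REQUIRED_FIELDS = ("Date", "Author", "Version", "Summary")

FIELD_RE = re.compile(r"^(?P<key>[A-Za-z]+):\s*(?P<value>.+)$")
ISO_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_entry(text: str) -> dict:
    """Collect 'Key: value' lines from one changelog entry."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match["key"]] = match["value"].strip()
    return fields


def check_entry(text: str) -> list:
    """Return a list of problems; an empty list means the entry passes."""
    fields = parse_entry(text)
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if name not in fields]
    date = fields.get("Date")
    if date is not None and not ISO_DATE_RE.fullmatch(date):
        problems.append("Date must be YYYY-MM-DD")
    return problems


if __name__ == "__main__":
    entry = "Date: 2018-05-21\nVersion: 1.2.0\nSummary: Fix unit conversion"
    print(check_entry(entry))
```

A check like this could run as a pre-commit or review step in whichever version control system a project already uses.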
STD: Repeatable tests runnable by others
• General testing at PPPL thoroughly documented using “Preoperational Test Plans”, or “PTP”
• Used for most engineering systems
• Inconsistently applied to software
• Rarely or never applied to physics / research software
• Very easy to use for acquired software packages
• Consider test‐based engineering
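For software, "repeatable tests runnable by others" can be as small as a self-contained test file. The sketch below uses Python's standard `unittest` module to check a stand-in numerical routine against a known analytic answer. `integrate_trapezoid` is a hypothetical example function, not PPPL code, and this is not the PTP format itself.

```python
import unittest


def integrate_trapezoid(xs, ys):
    """Trapezoid-rule integral of samples ys taken at points xs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matching samples")
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
               for i in range(len(xs) - 1))


class TrapezoidAcceptanceTest(unittest.TestCase):
    def test_linear_function_is_exact(self):
        # The integral of y = 2x over [0, 1] is exactly 1.
        xs = [0.0, 0.25, 0.5, 1.0]
        ys = [2 * x for x in xs]
        self.assertAlmostEqual(integrate_trapezoid(xs, ys), 1.0, places=12)

    def test_mismatched_lengths_rejected(self):
        with self.assertRaises(ValueError):
            integrate_trapezoid([0.0, 1.0], [1.0])


def run_tests() -> bool:
    """Run the suite and report whether every test passed."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TrapezoidAcceptanceTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    print("PASS" if run_tests() else "FAIL")
```

The same pattern works for acquired packages: feed a case with a known answer and record the pass or fail result.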
SRS: Specify “What”, and never “How”
• Largely nonexistent for most existing in‐house software
• Akin to a SOW or RFP for licensed products (ANSYS, ATILLA)
• Why? Peer review highlights issues early before they become costly
• Focus on defined boundary or interface to external system
• Leave internal details to SDD
V&V: Review boundaries between “What” and “How”
• Barry Boehm, c. 1979
– Validate: Are we building the right product?
– Verify: Are we building the product right?
• Software must meet requirements
• Requirements must be appropriate
• Ties to SCM to track V&V during lifecycle
• Regular reviews, regular testing throughout lifecycle
– Unit tests, systems tests
– Continuous Integration
– Regression Database
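A regression database can start as a stored set of reference results that each new software version is compared against. This is a minimal sketch under that assumption. The JSON layout, the result names (`peak_flux`, `total_dose`), and the tolerance are all hypothetical.

```python
import json
import math


def compare_to_baseline(results: dict, baseline: dict,
                        rel_tol: float = 1e-9) -> list:
    """Return one message per baseline value that is missing or changed."""
    failures = []
    for name, expected in baseline.items():
        if name not in results:
            failures.append(f"{name}: missing from current results")
        elif not math.isclose(results[name], expected, rel_tol=rel_tol):
            failures.append(
                f"{name}: expected {expected}, got {results[name]}")
    return failures


if __name__ == "__main__":
    # In practice the baseline would be read from a version-controlled file.
    baseline = json.loads('{"peak_flux": 1.25, "total_dose": 3.5}')
    current = {"peak_flux": 1.25, "total_dose": 3.6}
    for message in compare_to_baseline(current, baseline):
        print(message)
```

Run under continuous integration, a check like this ties V&V evidence to a specific revision in the configuration management system.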
SDD: Describe the “How” that meets the “What”
• IEEE 1016 is essentially “Use UML” (SysML would also work)
• These modeling languages are very foreign at PPPL, and represent a significant learning curve
• Existing design descriptions, if any, are PowerPoint slides
• Vendors generally do not release internal design documentation; instead, describe the local implementation
– Configuration settings, runtime parameters
– Customizations, addons, modules
SPM: The only bug‐free software is never released
• Time / Cost / Quality – Pick two
• SPM nuances over general PM (e.g., compare to manufacturing)
– Agile team spiraling down a waterfall…
• Management‐centric:
– Requirements, analysis
– Changes, impact assessment
– User expectations, deliveries, spirals
– Development collaborations (pairs, etc.)
SUD: Guide for the User, not the developer
• Three high level approaches*
– Tutorial: beginner, take notes during beta
– Thematic: intermediate, knowledge base growing organically
– List or Reference: advanced, doxygen or other API‐centric tools help
• Agile does NOT imply skipping this!
• SQA Plan should define standard format, outline, template
• Vendor‐provided documentation may need site‐specific amplification
*Earle, 2015, ACM SIGDOC
Summary and Lessons Learned
• Several motivating factors necessitated the creation of a comprehensive Software QA program (DOE Order, NSTX‐U, ITER)
• Current state of SQA implementation
– Procedures in draft form, review in progress
– Initial inventory complete, will be signed off by council to improve effectiveness
• Without buy‐in at all levels, SQA will fail
– Culture change is the biggest barrier to success, need vigorous training program
– Tendency to reduce categorization to ease burden of controls
• Different areas / sources have different needs, mechanisms
– In‐house development, outside procurement
– Engineering, Research