TRANSCRIPT
October 22, 2014 Sam Siewert
SE420
Software Quality Assurance
Lecture 9 – Negative Testing, Defect
Tracking and Root-Cause Analysis
http://www.nasa.gov/pdf/65776main_noaa_np_mishap.pdf, http://en.wikipedia.org/wiki/NOAA-19
Reminders
Assignment #4 Due Saturday, 10/25
Remaining Assignments [Top-Down / Bottom-Up]
– #5 – Design, Module Unit Tests and Regression Suite
– #6 – Complete Code, Refine and Run all V&V Tests and Deliver
Track Bugs with Bugzilla - http://prclab.pr.erau.edu/
Import your Project Code into GitHub -
https://github.com/
Sam Siewert 2
Integration and Test
Integrate Software Modules [Units] and Hardware Components into Sub-systems
Test Focus on Interfaces [Function, Message, Shared Memory, Hardware], Protocols, and Interoperability of Modules
Test Types – Goals Today
Positive Tests
– Functional Software Interface Tests
Functions Calling Functions – API
Message Passing – Local Message Queues, Network, Client-Server
Shared Memory – Synchronization, Buffers
– Hardware Interface Tests
Drivers and Device Interfaces
Firmware [ROM Code, Run out of Reset]
Negative Tests – Software Interface Faults
– Hardware Interface Fault Injection
Bug Tracking, Defect Rate, How to Use for Project and SQA Management
Root-Cause-Analysis [RCA] Wrap-Up – JPL Mars Pathfinder Story
Diagnostics [Built-in Self-Test]
Unit Interoperability – Sub-system Resource Testing – Memory, CPU, I/O, Storage, Power
– Protocols – Message Acknowledgement, Command/Response, Background Commands, Peer-to-Peer, etc.
Performance Tests – Profiles and Traces
Outline for Every Integration Test
1. Check Out Specific Source Code Test Configuration – CMVC Tools, Git
– Collection of Modules [Units] Tagged by Revision Control
– OR Current
2. Build and Link Modules (*.o) and Libraries (*.a) into Sub-system to Test
3. Load / Install Sub-system Code onto Test Hardware Platform of Known Configuration
– Record key hardware configuration parameters
– E.g. for I/O HW config – lspci, lsusb
– General config – hwinfo
– Linux OS kernel build config – uname -a
– Memory config – cat /proc/meminfo
– CPU config – cat /proc/cpuinfo
4. Run Integrated Test(s) [with Gcov, Lcov, Gprof]
5. Review of Expected Syslogs, Output to Terminal, for Each Feature
6. Review Performance Profiles
7. Track Bugs, Anomalies, and Disposition as Defects
Bug Open/Close Rates and Readiness
Controversy – Bug Counts, Closure, and Prediction of Phase-Transition Readiness – E.g. Unit Test to I&T to System Test to Acceptance Test to Shipment
– Can Be Inaccurate due to Unsatisfactory Testing or Lack of Criteria
– Guideline for Project Management [Compared to Guessing!]
– Not all Reported Bugs Become Defects [Test Case Errors, Human Error]
http://www.testandverification.com/DVClub/24_Jan_2011/Greg_Smith.pdf
[Figure: bug open/close rate chart – axes: Test Case Coverage [E.g. Code Path Coverage] and Bug Counts [Reported, Not Verified as Defect]]
Root-Cause Analysis
Field Issue - Anomaly, Reported Bug, Data Corruption,
…
– Software Defect?
– Hardware Reliability
– User Error
Reproducibility – Capture Conditions via Logging
– Recreate Scenario in SQA / QA Lab
Trace to Root-Cause – Assert
– Analysis Triggers
– Propose Fixes
– Apply and Regression Test
– Release Maintenance Patch
Case Study – Mars Pathfinder Story
JPL Mission Flow to Mars, Landing on July 4th, 1997
Pathfinder Rolling Resets on Final Approach to Mars
Capture Orbit
VxWorks RTOS Used
Reproduction of Anomaly on the Ground
Root-Cause Analysis
Proposed Fix
Data Driven CPU Loading
Root-Cause on Pathfinder was a Combination of Issues
1. Software Re-use and Unfortunate Default in Pipe
[INVERSION_SAFE, PRIORITY_ORDER, FIFO_DEFAULT]
2. Unbounded Priority Inversion [Interoperability Issue]
3. Increased Loading Due to Meteorological Analysis of
Candidate Landing Sites [Performance and Interoperability]
http://www.cse.uaa.alaska.edu/~ssiewert/archive/IBM-Out-of-print/soc-5.pdf
Note on Data-Driven Algorithms and CPU Loading
Real-Time Algorithms Ideally Have Fixed Computational Demands per Request
– Provide Predictable Response, Enabling Accurate Rate-Monotonic Analysis
– Rate Monotonic Theory Requires Known C, T, D Inputs [CPU Required, Request Rate, Deadline Relative to Request Time]
Computer Vision and Image Processing Depend on Data from Instrument Observation
– Parsing Scene for Linear Segments [Edges]
– Finding Elliptical or Circular Objects [Craters, Holes, etc.]
– Number of Features Found and Processed will Vary!
– Optical Navigation – Making an Impact: AI Group at JPL
Hough Linear Example
Hough Circular Example
Discussion … List of Theories for Root Cause [Good List, From OS, General Engineering Judgement]
Suggestions for Teamwork [Good Approaches – Brainstorm, Gather all Cognizant Engineers into One Room – JPL, Wind River, RAD6000]
Scenario and Anomaly [Rolling Reset on Approach] Reproduction on Ground System
Root Cause – Software Re-Use and Lack of Default to Inversion-Safe MUTEX in POSIX Pipes, Triggered by Increased Meteorological CPU Loading for Landing-Site Analysis
Ground Verification and Uplink to Enable Inversion Safe Option for Hidden MUTEX
Mission Saved and Quite Successful!
Diagnostic Tests
Primarily Hardware Tests, Driven by Software
Could Be an OS Test, E.g. During Boot of System
– CPU
– I/O
– Network
– Memory test
– File system test
– OS Services
Memory Test
– Simple – Walking 1's, Address Bus Test, Pattern Tests, all Read-after-Write to Address
– Advanced – ECC, SoC Drawer Paper
E.g. Linux Boot-up Process for CentOS 6.x
BIST – Built-in Self Tests
SW Driven and Controlled Diagnostics [Firmware] Key to
Hardware Verification
Cooperative Hardware and Firmware Mode
Make Available for Root-Cause Analysis Post-Ship or
During I&T and System Testing
E.g. Dell Laptops – LCD BIST
Disk Drive Test-Unit Ready – sg_turs, T10 TUR
Performance Tests Profiling
– Gprof – Open source tool [similar to Gcov, but for Profiling]
– Vtune – Commercial Tool from Intel
– Logic Analyzer and HP’s SPA (Statistical Performance Analysis)
Tracing – E.g. Timestamps output to syslog
Statistics
– top, htop
– iostat
– memstat
Workloads
– Iometer
– stress
Performance – Sysprof
What is Using CPU on My System, Rather than a Profile of One Application – Sub-system [Service] View
Gprof – Simple: -pg compile option
Run the program, then run gprof on gmon.out to get the analysis
%make
cc -O3 -Wall -pg -msse3 -malign-double -g -c raidtest.c
raidtest.c: In function 'main':
raidtest.c:99: warning: format '%d' expects type 'int', but argument 2 has type 'long
unsigned int'
raidtest.c:68: warning: unused variable 'aveRate'
raidtest.c:68: warning: unused variable 'totalRate'
raidtest.c:66: warning: unused variable 'rc'
raidtest.c:212: warning: control reaches end of non-void function
cc -O3 -Wall -pg -msse3 -malign-double -g -c raidlib.c
cc -O3 -Wall -pg -msse3 -malign-double -g -o raidtest raidtest.o raidlib.o
%./raidtest
Will default to 1000 iterations
Architecture validation:
sizeof(unsigned long long)=8
RAID Operations Performance Test
Test Done in 453 microsecs for 1000 iterations
2207505.518764 RAID ops computed per second
%ls
Makefile gmon.out raidlib.h raidlib64.c raidtest raidtest.o
Makefile64 raidlib.c raidlib.o raidlib64.h raidtest.c raidtest64
%gprof raidtest gmon.out > raidtest_analysis.txt
Gprof Analysis 1 million iterations of RAID test XOR and Rebuild
Flat profile:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls ns/call ns/call name
82.13 1.54 1.54 main
15.47 1.83 0.29 2000001 145.38 145.38 xorLBA
2.67 1.88 0.05 2000001 25.07 25.07 rebuildLBA
% the percentage of the total running time of the
time program used by this function.
cumulative a running sum of the number of seconds accounted
seconds for by this function and those listed above it.
self the number of seconds accounted for by this
seconds function alone. …
calls the number of times this function was invoked, if
this function is profiled, else blank.
self the average number of milliseconds spent in this
ms/call function per call, …
total the average number of milliseconds spent in this
ms/call function and its descendents per call, …
name the name of the function. …
RAID Operations Performance Test
Test Done in 206417 microsecs for 1000000 iterations
4844562.221135 RAID ops computed per second
Call Graph Profile from Gprof
Call graph (explanation follows)
granularity: each sample hit covers 2 byte(s) for 0.53% of 1.88 seconds
index % time self children called name
<spontaneous>
[1] 100.0 1.54 0.34 main [1]
0.29 0.00 2000001/2000001 xorLBA [2]
0.05 0.00 2000001/2000001 rebuildLBA [3]
-----------------------------------------------
0.29 0.00 2000001/2000001 main [1]
[2] 15.4 0.29 0.00 2000001 xorLBA [2]
-----------------------------------------------
0.05 0.00 2000001/2000001 main [1]
[3] 2.7 0.05 0.00 2000001 rebuildLBA [3]
-----------------------------------------------
This table describes the call tree of the program, and was sorted by
the total amount of time spent in each function and its children…
% time This is the percentage of the `total' time that was spent
in this function and its children…
self This is the total amount of time spent in this function.
children This is the total amount of time propagated into this
function by its children.
called This is the number of times the function was called…
Discussion and Q&A
I&T Verifies and Validates Sub-systems Built from Integrated SW Units and HW Components, in a Known Configuration
– Unit Tests Precede
– Integrate and Configure
– Function/Feature Positive Tests
– Negative Testing [Fault Injection]
– Interoperability Testing
– Diagnostics, Root-Cause, and Bug Tracking Critical New Aspects
– Performance Testing [of Integrated and Configured Sub-systems]
– Determine Readiness for Final Integration and Entry to System Testing
– Provides Regression Test Cases for System Test
I&T Precedes System Test, Where Sub-systems Are …
– Fully Integrated
– Configured Similar to Deployment [Perhaps Not Exact – E.g. Spacecraft in Thermal-Vac Testing]
– Stimulated with Tests Replicating Operations