


SECRETS OF SUCCESSFUL OUTSOURCING | SIX SIGMA PART II | POST-BUILD PROFILING

VOLUME 6 • ISSUE 11 • NOV/DEC 2009 • $8.95 • www.stpcollaborative.com

GET FLEXIBLE! Apply Agile Development Techniques to Performance Tests

Mitigate Risk With Data-Driven Scenario Modeling

How Do the Metrics Measure Up? page 14


14 COVER STORY
Automated Testing: How Do Your Metrics Measure Up? Without the proper metrics, all the automation in the world won't guarantee useful software test results. Here's how to ensure measurements that matter. By Thom Garrett

20 Agile Performance Testing
Flexibility and iterative processes are key to keeping performance tests on track. Follow these guidelines. By Alexander Podelko

27 Data-Driven Software Modeling Scenarios
To get a clear picture of how a system will perform, create a realistic framework built on real-world data. By Fiona Charles

34 Six Sigma Part II
How to apply Six Sigma measurement methods to software testing and performance. By Jon Quigley and Kim Pries

contents | VOLUME 6 • ISSUE 11 • NOV/DEC 2009

DEPARTMENTS

6 Editorial
At the member-based STP Collaborative, we can hear you now.

8 Conference News
NASA Colonel Mike Mullane shared a lesson in leadership at the October STP Conference. Check out coverage of this event and more.

9 ST&Pedia
When it comes to code coverage, what exactly are we measuring when we refer to "percentage of the application"? By Matt Heusser & Chris McMahon

11 CEO Views
Relinquishing control is critical to maximize the benefits of outsourced testing, says Qualitest US CEO Yaron Kottler, who shares global insights into the state of software testing and QA.

38 Case Study
Want reliable runtime intelligence from the field about end user feature usage? Post-build injection modeling, which is being integrated into .Net, Java and more, allows just that. By Joel Shore


A Publication of Redwood Collaborative Media

GET MORE ONLINE AT

Software Test & Performance Magazine (ISSN #1548-3460) is published monthly except combined issues in July/August and November/December by Redwood Collaborative Media, 105 Maxess Avenue, Suite 207, Melville, NY 11747. Periodicals postage paid at Huntington, NY and additional mailing offices. The price of a one-year subscription is US $69 for postal addresses within North America; $119 elsewhere. Software Test & Performance Magazine is a registered trademark of Redwood Collaborative Media. All contents copyrighted 2009 Redwood Collaborative Media. All rights reserved. POSTMASTER: Send changes of address to Software Test & Performance Magazine, 105 Maxess Road, Suite 207, Melville, NY 11747. To contact Software Test & Performance Magazine subscriber services, please send an e-mail to [email protected] or call 631-393-6051 ext. 200.

Have you ever tried to automate a test and had it backfire? Tell us your story. Visit our community forum to take part in the discussion. bit.ly/3Hlapt

The Test & QA Report offers commentary on the latest topics in testing. In our current issue, Karen Johnson shares insights into how to motivate test teams, at stpcollaborative.com/knowledge/519-motivating-test-teams

Research, source, comment and rate the latest testing tools, technologies and services, at stpcollaborative.com/resources

Visit STP Contributing Editor Matt Heusser's blog, Testing at the Edge of Chaos, at blogs/stpcollaborative.com/matt

"Transforming the Load Test Process" discusses the shortcomings of traditional load testing and problem diagnostic approaches with today's complex applications. Download at stpcollaborative.com/system/DynaTraceWhitepaper.pdf (Sponsored by Dynatrace)

CONTRIBUTORS

THOM GARRETT has 20 years' experience in planning, development, testing and deployment of complex processing systems for U.S. Navy and commercial applications. He currently works for IDT.

ALEXANDER PODELKO is a Consulting Member of Technical Staff at Oracle, responsible for performance testing and tuning of the Hyperion product family.

FIONA CHARLES has 30 years' experience in software development and integration. She has managed and consulted on testing in retail, banking, financial services, health care and more.

JON QUIGLEY is a test group manager at Volvo Truck and a private product test and development consultant.

At Stoneridge, KIM PRIES is responsible for hardware-in-the-loop software testing and automated test equipment.


THERE'S SOMETHING REFLECTIVE THAT occurs after a big event, whether it's the launch of a new piece of software or, in my case, a conference for the people who test software. Happily, STPCon 2009 was a great success (see page 8), and while we received some constructive criticism, we heard overwhelmingly positive feedback.

Two "deep" thoughts occurred to me in this pensive state. The first is that STP Collaborative is gathering some real momentum. The second is that life takes us down some unexpected paths. One tester at STPCon told me he feels he's rare in that he sought out a career in testing, whereas many tend to fall into the job by circumstance. As someone who planned a career in corporate finance and now finds himself happily serving as CEO of a community-focused media company, I can relate to his peers.

In my case, I credit genetics. My father, Ron Muns, was a social media pioneer before social media technology existed. He founded the Help Desk Institute (now HDI) in 1989—not only before Twitter, but before AOL launched a DOS-based interface. His business succeeded by focusing on its members and working diligently to raise the state of the technical support profession. He would cringe to hear his business called a media company, because to him a membership organization was something different. HDI was an institute—an association in the business of serving its members. And being a member was far better than being a mere subscriber because every member had a voice, and HDI magnified that voice.

The company had a conference and publications just like a media business, but it put members first and let advertisers follow. It communicated with members through snail mail (not much choice then!) and enabled networking among members through local chapter meetings in addition to conferences. Before online community building became the rage, HDI built a social media company offline.

The critical parts of this legacy are those we at STP Collaborative strive to imitate: to be a member-centric, listening organization that helps individual testers do their jobs better, and to advance the craft and business of software testing.

That may sound like PR-speak, but we're putting our money where our mouths are and bringing in several new team members, including three seasoned HDI alumni. We're excited to welcome Peggy Libbey, who will serve as our new president and COO. Peggy is a business professional with nearly 30 years' experience in finance and executive management, including nine years with HDI. Peggy's background managing a large professional community is marked by her ability to build a corporate infrastructure highly focused on customer service.

We're also pleased to welcome Abbie Caracostas, handling professional training and development; Janette Rovansek, focusing on IT, Web and data management services; Joe Altieri, in sales; and last but not least, veteran business-tech journalist Amy Lipton as our new editor. They all join our existing marketing team, including Jen McClure, CMO and director of community development, and Teresa Cantwell, marketing coordinator. The new team spent their first days on the job together at STPCon, where they talked face to face with testers and STP Collaborative advisory board members, including Matt Heusser, James Bach, Scott Barber, BJ Rollison, Ross Collard and Rex Black. I only wish I'd had the same opportunity my first week on the job!

So what does all this mean to you? As our operational team grows, our ability to deliver on our vision for STP Collaborative increases. We're both excited and humbled by the opportunity ahead, and we're confident that with the participation of a diverse member base we can build a community resource that delivers best-in-class content, training, professional networking and knowledge sharing. We hope you'll join us in pursuing this mission, and in making ours a community in which every voice is valued.

Thanks for listening!


publisher’s note

A Legacy Worth Imitating!

Andrew Muns

VOLUME 6 • ISSUE 11 • NOV/DEC 2009

Founder & CEO: Andrew Muns
Chairman & President: Peggy Libbey

105 Maxess Road, Suite 207, Melville, NY 11747
+1-631-393-6051 / +1-631-393-6057 fax
www.stpcollaborative.com

Cover Art by The Design Diva

Redwood Collaborative Media

Editor: Amy Lipton, [email protected]
Contributing Editors: Joel Shore, Matt Heusser, Chris McMahon
Copy Editor: Michele [email protected]
Art Director: LuAnn T. [email protected]
Publisher: Andrew Muns, [email protected]
Media Sales: Joe Altieri, [email protected]
Chief Operating Officer: Peggy Libbey, [email protected]
Chief Marketing Officer: Jennifer McClure, [email protected]
Marketing Coordinator: Teresa Cantwell, [email protected]
Training & Professional Development: Abbie Caracostas, [email protected]
IT Director: Janette Rovansek, [email protected]
Reprints: Lisa [email protected]
Membership/Customer Service: [email protected]
Circulation & List Services: Lisa [email protected]


"THE CHALLENGER WAS NO accident—it was an example of a predictable surprise," said Colonel Mike Mullane, a retired NASA astronaut, in the opening keynote at the Software Test & Performance Conference (STPCon 2009), which kicked off Oct. 19 at the Hyatt Regency in Cambridge, Mass. Mullane, who flew in three space shuttle missions, shared insights into the dangers of the "normalization of deviance" as it relates to the 1986 Challenger disaster, as well as to testing in general.

What is "normalization of deviance"? "When there is no negative repercussion for taking a shortcut or deviating from a standard, you come to think it's OK to do it again," he explained, "and that almost always leads to a 'predictable surprise.'"

How to avoid this type of outcome? Mullane recommended the following:

• Realize you are vulnerable to accepting a normalization of deviance. "We all are. We're human."
• Plan the work and work the plan, and be aware of the context. "When you're testing software, use it as the customers will use it, not as the developers designed it."
• Be a team member, not a "passenger." If you spot a problem, point it out—don't assume others know more than you do.
• Lead others by empowering them to contribute to the team.

Making the Leap
NASA is "just like us" and deals with many of the same challenges, including multiple and conflicting vendors, changing vendors midstream, and changing rules and priorities, said DevelopSense's Michael Bolton, who presented the afternoon keynote, "Testing Lessons Learned From the Astronauts."

Bolton discussed two dominant myths: Scientists are absolutely special people, and scientists are simply sophisticated chefs. NASA is an example of these myths in progress, he said.

Bolton introduced the idea of heuristics, which he defined as a fallible method for solving problems or making decisions. "It's part of the world of being a tester to find inconsistencies," he explained. "Heuristics are valuable in testing because they help accomplish this."

"Testing is about exploration, discovery, investigation and learning," he added. "Testing can be assisted by machines, but it can't be done by machines alone. Humans are crucial."

Speed Geeking, Anyone?
In addition to more than 30 breakout sessions held during the conference, covering topics from agile testing to test automation, performance testing, test management and the latest test tools, technologies and trends, STPCon hosted its first (and very lively) "speed geeking" session, led by ST&P Magazine contributing editor Matt Heusser. Participants included Scott Barber, executive director of the Association for Software Testing; exploratory testing experts James Bach and Jonathan Bach; David Gilbert, president and CEO of Sirius SQA; Justin Hunter, president, Hexawise; and Michael Bolton. Each discussion leader had five minutes to cover one aspect of software testing, followed by two minutes of Q&A.

More than 50 test and QA professionals also took part in two new STP training courses at the weeklong conference: Agile Testing with Bob Galen, director of R&D and Scrum Coach at iContact and principal consultant for RGalen Consulting Group, and Test Automation with Rob Walsh, president, EnvisionWare. Galen covered just-in-time test ideas, exploratory testing, all-pairs testing, and ways to drive collaborative agile testing. Walsh addressed how to make the business case for automated testing and introduced participants to a variety of test automation tools. Open source expert Eric Pugh delivered a half-day session on the automated continuous integration process, using the open-source Hudson system as an example.

New one-day STP workshops included one on test management, led by consultant Rex Black with CA's Bob Carpenter, and another on performance testing, led by Scott Barber, joined by a panel of performance testing experts including James Bach, Dan Bartow, Ross Collard and Dan Downing.

The Value of Exploratory Testing
Brothers James and Jonathan Bach presented the closing keynote, sharing a newly revised model of the structure of exploratory testing. "It's wrong to call exploratory testing unstructured testing," James Bach said. "The structure of the test becomes evident through stories."

Exploratory testing involves learning, test design and execution as activities that run in a parallel process, he explained. It emphasizes "the personal freedom and responsibility of testers to optimize the value of their work."

More Online
Check out blog posts and tweets from the conference at www.stpcollaborative.com/community and photos by ST&P Magazine contributor Joel Shore at http://www.stpcon.com.

STPCon 2009 Takeaways!

Jen McClure is CMO and director of community development for Redwood Collaborative Media.

conference news

Be a team member, not a "passenger," advised NASA's Colonel Mike Mullane.


By Jen McClure


AUTOMATED TESTS ALLOW US to run a test suite and compare it with an entire application to see what percentage of the application is covered by the tests. The question then becomes: What exactly are we measuring when we say "percentage of the application"?

Here are some definitions of various forms of software test coverage, courtesy of Wikipedia:

Function coverage: Number of functions tested / number of functions in the application

Statement coverage: Number of lines of code executed by the tests / total source lines of code

Condition coverage (or predicate coverage): In addition to lines of code, measures statements that might not execute

Decision (or branch) coverage: Every conditional (such as an "if" statement) that has been exercised / total possible number of branches

Entry/exit coverage: Has every possible call and return of the function been executed?

Path coverage: Every combination of conditionals / total number of combinations

...And the list goes on.

To further explain the concept, let's look at code coverage for a simple function—a subroutine that validates a student ID at a local college.

According to the requirements, "student IDs are seven-digit numbers between one million and six million, inclusive."

Pseudocode for this function is:

function validate_studentid(string sid) return TRUEFALSE
BEGIN
    STATIC TRUEFALSE isOk;
    isOk := true;
    if (length(sid) is not 7) then
        isOk := false;
    if (number(sid) <= 1000000 or number(sid) > 6000000) then
        isOk := false;
    return isOk;
END
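If you want to experiment with the coverage levels discussed below, here is a minimal Python sketch of the routine (a hypothetical translation, not from the column). It mirrors the pseudocode's checks, including the "<= 1000000" comparison, but it does not reproduce the STATIC variable and it simply treats non-numeric input as invalid.

# Hypothetical Python translation of the validate_studentid pseudocode above.
def validate_student_id(sid: str) -> bool:
    """Return True if sid looks like a valid student ID per the requirements."""
    is_ok = True
    if len(sid) != 7:
        is_ok = False
    try:
        value = int(sid)              # stands in for number(sid)
    except ValueError:
        return False                  # a choice the pseudocode leaves open
    if value <= 1000000 or value > 6000000:
        is_ok = False
    return is_ok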

Now let's see what it would take to get to 100 percent coverage of this subroutine.

Function coverage: Call the function once with the value "999," return with "That is not a valid student ID," and we have 100 percent function coverage.

Statement coverage: Likewise, an invalid value is likely to hit all the lines of the application. (If we did this, would you feel "covered"?)

Decision coverage: This requires us to execute the "if" statements in both true and not true conditions. We can do this with two tests: the text string "999" along with the text string "1500000," or "one point five million." The first value will execute both conditions; the second hits neither.

Entry/exit coverage: If the function were called from four or five different places, we'd need to test it four or five different times. But since the logic doesn't change, those tests wouldn't tell us anything new.

Path coverage: Of the coverage metrics commonly used, path testing is among the most rigorous. A method containing two binary conditions has a total of four possible execution paths, and all must be tested for full path coverage. Table 1 illustrates a way to accomplish this.
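As an illustration, the four cases in Table 1 could be automated along these lines (a sketch using pytest and the hypothetical validate_student_id translation shown earlier; the module name is invented):

import pytest
from student_id import validate_student_id  # wherever the sketch above lives

@pytest.mark.parametrize(
    "sid, expected",
    [
        ("foo", False),       # length wrong, not a number
        ("7000000", False),   # length ok, value out of range
        ("01000001", False),  # length wrong, value in range
        ("5000000", True),    # length ok, value in range
    ],
)
def test_student_id_paths(sid, expected):
    # The four inputs from Table 1; against the pseudocode they cover all
    # four combinations of the two conditions (full path coverage).
    assert validate_student_id(sid) == expected

# As the column goes on to show, full path coverage still misses problems
# such as the boundary at exactly 1,000,000.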

But which bugs are not covered?

• The software has a bug at the bounds. When this comparison is made:

    number(sid) <= 1000000

it will consider all numbers at or below one million invalid and set isOk to false. Yet one million is a valid student ID according to the requirements.

• There's a bug in the requirements, which read "one million through six million, inclusive," but should have read "…through 6,999,999, inclusive." (Apparently, the analyst writing the requirements had a different definition of "inclusive" from the rest of the world.)

• What if the student enters an ID of the form 123 456 789 or includes commas? These entries might be limited by the GUI form itself, which is usually not counted in code coverage metrics.

Considering Code Coverage!

st&pedia: The encyclopedia for software testing professionals
By Matt Heusser and Chris McMahon

Matt Heusser and Chris McMahon are career software developers, testers and bloggers. Matt also works at Socialtext, where he performs testing and quality assurance for the company's Web-based collaboration software.

TABLE 1: TESTING FOR FULL PATH COVERAGE

Input     | length(sid) is not seven | number(sid)<=1000000 or number(sid)>6000000 | Expected Result
"foo"     | True                     | True                                        | invalid id
7000000   | False                    | True                                        | invalid id
01000001  | True                     | False                                       | invalid id
5000000   | False                    | False                                       | valid id



• Likewise, what if the input window or screen is resized? Will it be repainted? How? Behavior in these situations is often hidden or inherited code that coverage analyzers might not find.

• The subroutine calls number() and length(), which are external libraries. What if those libraries have hidden bugs in areas we don't know about? (Say, a boundary at 5.8 million causes a "sign bit" rollover, or number() throws an exception when words or a decimal point is passed into it?)

• What if the user enters a very long number in an attempt to cause a stack overflow? If we don't test for it, we won't know.

• isOk is a static variable. This means only one variable will exist, even if multiple people use the application at the same time. If we're testing a Web app, maybe simultaneous use can cause users to step on each other's results. What happens if we have simultaneous users in the system? We don't know. We didn't test for it!

Suddenly, path testing doesn't look like a magic potion after all. What might work better is input combinatorics coverage. Unfortunately, though, the number of possible test combinations is usually infinite.

The Place for Coverage Metrics
The total behavior of an application is more than the "new" code written by developers; it can include the compiler, operating system, browser, third-party libraries, and any special user operations and boundary conditions—and yes, it is those special user operations and boundary conditions that get us into trouble.

What coverage metrics tell us is generally how well the developers have tested their code, to make sure it's possible the code can work under certain conditions. A "heat map" that shows untested code can be very helpful, for both the developers and the testers.

Now, when the technical staff crows that test coverage is a high number, consider that the developers have done a good job ensuring the code did as they asked it to do. Often, however, that's a different question from, "Is the software fit for use by a customer?"

When programmer test coverage is high, we've reached the difference between "as programmed" and "fit for use." That's not when we're done. Instead, it's time for testers to step in and do their thing.

Recommended Reading
For more information on this subject, we suggest a paper published in 2004 by Walter Bond and Cem Kaner called "Software Engineering Metrics: What Do They Measure and How Do We Know?" (http://www.kaner.com/pdfs/metrics2004.pdf) and Brian Marick's "How To Misuse Code Coverage," from 1997 (http://www.exampler.com/testing-com/writings/coverage.pdf).


YARON KOTTLER IS IN GOOD company. As someone who started at an entry-level position and worked his way to CEO, he shares an experience with top executives from Best Buy, GE, McDonald's, Morgan Stanley, Nortel and Seagate.

Today, Kottler runs QualiTest US, the North American branch of the QualiTest Group. QualiTest is a global software testing and quality assurance consultancy and service provider focused exclusively on onshore QA and testing. The company employs more than 800 testing and QA professionals in 10 locations throughout the United States, Europe and the Middle East.

ANDREW MUNS: What was the first job you ever had?

YARON KOTTLER: The very first job was working at a bike shop in Tel Aviv, but my first real job—believe it or not—was at QualiTest, where I was hired as a junior testing engineer.

How did you get started in the field of software testing and what was the career path that led you to be CEO of QualiTest US?

Originally, I wanted to be a software developer, and when I joined QualiTest [10 years ago], I thought of software testing as a path toward a career as a developer. Nonetheless, I ended up staying in software testing and moved up through the ranks from senior test engineer to test lead, test manager, business manager, load testing manager, and was eventually sent to Turkey to build out the effort there. I ended up traveling back and forth between there and Israel to build that business.

I became CEO after the company acquired IBase Consulting [in 2006]. This was primarily a strategic acquisition that QualiTest used to launch its business in the US market.

What are the key differentiators of QualiTest's testing services that allow you to compete effectively in the US market?

There are a few things that make us stand out. First, we're completely focused on QA and testing. We don't do development, we don't sell software—we just concentrate on what we do best.

Second is our strength in the onshore market. We believe high-quality onshore teams in many cases outperform their offshore counterparts.

Having a large international practice gives us advantages as well. We have the ability to take the new technologies and practices from Europe or North America and apply them in other markets.

Lastly, we're very good at hiring good people and developing them into world-class testers. We focus more on smarts and potential than resumes, and have a mentorship program and internal training resources that allow people to grow within the organization. Consider my career path!

You interviewed attendees of our conference last year about onshore vs. offshore testing services and found that—contrary to popular belief—onshore testing is often more cost-effective. Have trends since then shown increased interest in onshore vs. offshore testing?

Absolutely. For many reasons, a test team located onshore outperforms and is more efficient than an offshore team. Testers sharing a similar culture, language and time zone end up having a huge cost advantage that may not be apparent from a glance at the hourly rate.

There has been growth in onshore outsourcing, but I believe this is also due to the overall growth in outsourcing and not necessarily only to taking share from the offshore side.

In your opinion, what is the key to successful outsourcing of test services?

I would say the most important things are transparency and having the appropriate test management tools. Also, being able to trust your service provider and feel comfortable that they can deliver. This is key: If you are going to outsource, you have to learn to let go of some control.

Lastly, I'd say internal organization buy-in. If the operations team or the developers disagree with the outsourcing decision, it will be hard for them to work with the external test team.

What challenges arise for distributed teams using agile methodologies? Does this make outsourcing more or less appropriate for agile practitioners?

I might be old-fashioned, but in my opinion agile and distributed teams don't work together. I see a lot of teams who describe their methodology as agile or Scrum, but who in reality have created a hybrid workflow that's adapted to the realities of a distributed team.

As for the main challenges of agile, these aren't usually related to applying the methodology. Barriers to successful implementation usually come from the organizational culture. After all, agile theory is simple to understand. Getting teams to embrace the necessary culture for the method to work is the hard part.

Outsourced Testing Demands Trust and Transparency!

CEO VIEWS

RESUMÉ
JOB: CEO, QualiTest US
EDU: MBA, Hebrew University; BA, Interdisciplinary Center
PRIORS: QA specialist, international business development, business manager, testing engineer, testing team leader, all with QualiTest

Yaron Kottler

By Andrew Muns



How is software testing conducted differently in the US vs. overseas? Are there cultural or regional differences in the definition of "quality"?

Differences in the definition of quality and the importance of quality seem to vary more by organization and industry than by country. Corporate culture varies widely even within a given country, and that's a bigger driver of how much weight is put on shipping a quality product.

There are regional differences in how testing is done, however. US-based organizations are generally much further along in adopting agile and are usually more willing to try new things. The UK is more or less like the US.

In the Netherlands and Germany, testing is much more scripted and there is much more emphasis placed on metrics and measurement.

The use of the words "testing" and "QA" also varies, with many in the US using the term QA to describe what I think is really testing.

Companies overseas tend to have a better understanding of the value of testing and how testing is an activity that saves money when done correctly.

You've spoken a lot about "virtuology." What does this mean and why are testers interested?

The idea is to use virtualization techniques to record a testing session in its entirety. This is analogous to a DVR for a computer where you can rewind, fast forward, pause and go live at any moment. That is basically a time machine, if you think about it. You can now record everything occurring at the time a test is conducted, including network conditions, RAM, ROM and OS conditions as well as any server conditions—not just a series of clicks and keystrokes as is done by GUI automation tools.

This is an amazing tool for debugging and resolving defects that are difficult or impossible to reproduce. A recent study of the defect life cycle in 28 testing projects, conducted by QualiTest, showed that about 9 percent of defects receive the "can't reproduce" status at some point of the defect life cycle. This can be attributed to either inadequate reporting or nondeterministic application behavior.

So virtuology can simplify test result documentation. It can also contribute greatly to the efficiency of the testing process, as testers can now spend more of their time actually testing and less on activities such as preparation, environment setup, running pre-steps and gathering test results. I like to summarize the benefits as the three "E's" of virtuology—exploration, efficiency and evidence.

What are some of the most useful virtuology tools currently available?

In my opinion VMware's products offer the most value for the money. We've rolled it out to clients with good success.

In addition to virtuology, how will the next generation of testers make use of cloud computing technologies?

The benefits provided in environment setup are becoming invaluable to functional and compatibility testing, and I am sure load testing will continue to be an area to benefit, as well as application environments central to this process.

Virtuology tools aren't currently hosted in the cloud, but I hope we'll see them start to take advantage of cloud computing environments. VMware is great, but is still in the early stages of what these tools can become as they mature.

Call me crazy, but my expectation for the future is that the world is going back to the days of the mainframe in which the cloud is replacing your two-tone computer.

How is the role of the tester within the organization changing and what will test teams look like in 10 years?

I hope testing teams will become more of an extension of operations and will work more closely with, and as advocates for, the users rather than report to development or IT as they usually do.

I also think we'll see agile being used more widely and that we'll see agile and Scrum teams mature.

What is the most important skill an individual tester can possess?

Good analytical skills, understanding the big picture and, most importantly, understanding what the end user needs and wants.

Testers must understand the business they're working in, and for that to happen you need a dialog with operations or with end-user proxies.

Of course, testing requires a highly specialized skill set that operational people usually don't possess, so we are talking about a multidisciplinary approach to both testing and operations.

Are there any companies that are doing this well?

I'm not sure, but this is essentially about training—not training on testing methodologies, but training that helps testers understand the business context.

What advice would you give to someone beginning a career in testing?

Try to get a job at a place that will let you experiment with lots of different technologies, tools and methodologies. Most importantly, look for a company that will give you some freedom to grow in the direction that suits you best.

I know this is talking my own book, but working with a consulting firm is a great way to start out. You get a lot of experience on a diverse range of projects. This allows you to find out what you like, and limits the time you spend on projects that aren't a good fit.

"My expectation is that the world is going back to the days of the mainframe in which the cloud is replacing your two-tone computer."



Metrics are an essential gauge of the health, quality, and progress of any automated software testing program. They can be used to evaluate past performance, current status and future trends. Good metrics are objective, measurable, sharply focused, meaningful to the project and simple to understand, providing easily obtainable data. And while many metrics used in conventional software quality engineering tests can be adapted to automated software tests, some metrics—such as percent automatable, automation progress and percent of automated test coverage—are specific to automated testing.

Before we discuss those and other metrics in depth, however, note that not every test should be automated just because it could be automated. Perhaps the most critical decision in the test case requirements-gathering phase of your automation effort is whether it makes sense to automate or not. And that decision itself is typically based on a metric: return on investment. Indeed, the test cases for which you can show that automation will deliver a swift and favorable ROI compared with that of general testing are the cases for which automated testing is justified.

START WITH THE END IN MIND
An integral part of any successful automated testing program is the definition and implementation of specific goals and strategies. During implementation, progress against these goals and strategies must be continuously tracked and measured using various types of automated and manual testing metrics.

By Thom Garrett

"When you can measure what you are speaking about, and can express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind."

—Lord Kelvin, Physicist

Without the Proper Metrics, All the Automation in the World Won't Yield Useful Results

Automated Software Testing: Do Your Metrics Measure Up?

Thom Garrett has 20 years' experience in planning, development, testing and deployment of complex processing systems for U.S. Navy and commercial applications. Specific experience includes rapid introduction and implementation of new technologies for highly sophisticated architectures that support users worldwide. In addition, he has managed and tested all aspects of large-scale complex networks used in 24/7 environments. He currently works for Innovative Defense Technologies (IDT), and previously worked for America Online, Digital System Resources and other companies, supporting system engineering solutions from requirements gathering to production rollout.


Based on the outcome of these metrics, we can assess the software defects that need to be fixed during a test cycle, and adjust schedules or goals accordingly. For example, if we find that a feature still has too many high-priority defects, the ship date may be delayed or the system may go live without that feature.

Success is measured based on the goals we set out to accomplish relative to the expectations of our stakeholders and customers. That's where metrics come in.

As Lord Kelvin implied in his famous quote, if you can measure something, you can quantify it; if you can quantify it, you can explain it; if you can explain it, you have a better chance to improve on it.

We know firsthand that metrics and methods can improve an organization's automated testing process and tracking of its status—our software test teams have used them successfully. But software projects are becoming increasingly complex, thanks to added code for new features, bug fixes and so on. Also, market pressures and corporate belt-tightening mean testers must complete more tasks in less time. This will lead to decreased test coverage and product quality, as well as higher product cost and longer time to deliver.

When implemented properly, however, with the right metrics providing insight into test status, automated software testing can reverse this negative trend. Automation often provides a larger test coverage area and increases overall product quality; it can also reduce test time and delivery cost. This benefit is typically realized over multiple test and project cycles. Automated testing metrics can help assess whether progress, productivity and quality goals are being met.

It serves no purpose to measure for the sake of measuring, of course, so before you determine which automated testing metrics to use, you must set clearly defined goals related directly to the performance of the effort. Here are some metrics-setting fundamentals you may want to consider:

• How much time does it take to run the test plan?
• How is test coverage defined (KLOC, FP, etc.)?
• How much time does it take to do data analysis?
• How long does it take to build a scenario/driver?
• How often do we run the test(s) selected?
• How many permutations of the test(s) selected do we run?
• How many people do we require to run the test(s) selected?
• How much system time/lab time is required to run the test(s) selected?

It is important that the metric you decide on calculate the value of automation, especially if this is the first time automated testing has been used on a particular project. The test team will need to measure the time spent on developing and executing test scripts against the results the scripts produced. For example, the testers could compare the number of hours to develop and execute test procedures with the number of defects documented that probably would not have been revealed during manual testing.

Sometimes it is hard to quantify or measure the automation benefits. For instance, automated testing tools often discover defects that manual tests could not have discovered. During stress testing, for example, 1,000 virtual users execute a specific functionality and the system crashes. It would be very difficult to discover this problem manually—for starters, you'd need 1,000 test engineers!

Automated test tools for data entry or record setup are another way to minimize test time and effort; here, you'd measure the time required to set up the records by hand vs. using an automated tool. Imagine having to manually enter 10,000 accounts to test a system requirement that reads: "The system shall allow the addition of 10,000 new accounts"! An automated test script could easily save many hours of manual data entry by reading account information from a data file through the use of a looping construct, with the data file provided by a data generator.
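As a sketch of that looping construct (the file name, column names and create_account() call are invented for illustration, not taken from the article):

import csv

def load_accounts(path="generated_accounts.csv"):
    # The data generator is assumed to have written one account per row,
    # e.g., columns: account_id, name, balance.
    with open(path, newline="") as fh:
        yield from csv.DictReader(fh)

def add_accounts(client, path="generated_accounts.csv"):
    """Drive the 'add 10,000 accounts' requirement from a data file."""
    created = 0
    for row in load_accounts(path):
        client.create_account(row)   # hypothetical system-under-test API
        created += 1
    return created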

You can also use automated software testing metrics to determine additional test data combinations. Where manual testing might have allowed you to test "x" number of test data combinations, for example, automated testing might let you test "x+y" combinations. Defects uncovered in the "y" combinations might never have been uncovered in manual tests.

Let's move on to some metrics specific to automated testing.

PERCENT AUTOMATABLE
At the beginning of every automated testing project, you're automating existing manual test procedures, starting a new automation effort from scratch, or some combination of the two. Whichever the case, a percent automatable metric can be determined.

[Not every test should be automated just because it could be automated.]



Percent automatable can be defined as this: Of a set of given test cases, how many are automatable? This can be represented by the following equation:

    PA (%) = ATC / TC = (# of test cases automatable / # of total test cases)

    PA  = Percent automatable
    ATC = # of test cases automatable
    TC  = # of total test cases

In evaluating test cases to be developed, what should—and shouldn't—be considered automatable? One could argue that, given enough ingenuity and resources, almost any software test could be automated. So where do you draw the line? An application area still under design and not yet stable might be considered "not automatable," for example.

In such cases we must evaluate whether it makes sense to automate, based on which of the set of automatable test cases would provide the biggest return on investment. Again, when going through the test case development process, determine which tests can and should be automated. Prioritize your automation effort based on the outcome. You can use the metric shown in Figure 1 to summarize, for example, the percent automatable of various projects or of a project's components, and set the automation goal.

Automation progress: Of the percent automatable test cases, how many have been automated at a given time? In other words, how far have you gotten toward reaching your goal of automated testing? The goal is to automate 100% of the "automatable" test cases. It's useful to track this metric during the various stages of automated testing development.

    AP (%) = AA / ATC = (# of actual test cases automated / # of test cases automatable)

    AP  = Automation progress
    AA  = # of actual test cases automated
    ATC = # of test cases automatable

The automation progress metric is typically tracked over time. In this case (see Figure 2), time is measured in weeks.
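As a simple sketch of these two ratios (the counts below are invented for illustration):

def percent_automatable(automatable: int, total: int) -> float:
    """PA (%) = ATC / TC."""
    return 100.0 * automatable / total

def automation_progress(automated: int, automatable: int) -> float:
    """AP (%) = AA / ATC."""
    return 100.0 * automated / automatable

# Example: 120 of 150 planned test cases are judged automatable,
# and 90 of those 120 have been automated so far.
print(percent_automatable(120, 150))   # 80.0
print(automation_progress(90, 120))    # 75.0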


FIG. 1: PERCENT AUTOMATABLE


FIG. 2: AUTOMATION PROGRESS

FIG. 3: PROGRESS OVER TIME


A common metric closely associated with progress of automation, yet not exclusive to automation, is test progress. Test progress can be defined simply as the number of test cases attempted (or completed) over time.

    TP = TC / T = (# of test cases attempted or completed / unit of time)

    TP = Test progress
    TC = # of test cases (either attempted or completed)
    T  = some unit of time (days, weeks, months, etc.)

The purpose of this metric is to track test progress and compare it with the project plan. Test progress over the period of time of a project usually follows an "S" shape, which typically mirrors the testing activity during the project life cycle: little initial testing, followed by increased testing through the various development phases, into quality assurance, prior to release or delivery.

The metric you see in Figure 3 shows progress over time. A more detailed analysis is needed to determine pass/fail, which can be represented in other metrics.

PERCENT OF AUTOMATED TESTING COVERAGE
Another automated software metric is percent of automated testing coverage: What test coverage is the automated testing actually achieving? This metric indicates the completeness of the testing. It doesn't so much measure how much automation is being executed, but how much of the product's functionality is being covered. For example, 2,000 test cases executing the same or similar data paths may take a lot of time and effort to execute, but this does not equate to a large percentage of test coverage. Percent of automatable test coverage does not specify anything about the effectiveness of the testing; it measures only the testing's dimension.

    PTC (%) = AC / C = (automation coverage / total coverage)

    PTC = Percent of automatable testing coverage
    AC  = Automation coverage
    C   = Total coverage (KLOC, FP, etc.)

Size of system is usually counted as lines of code (KLOC) or function points (FP). KLOC is a common method of sizing a system, but FP has also gained acceptance. Some argue that FPs can be used to size software applications more accurately. Function point analysis was developed in an attempt to overcome difficulties associated with KLOC (or just LOC) sizing. Function points measure software size by quantifying the functionality provided to the user based on logical design and functional specifications. (There is a wealth of material available regarding the sizing or coverage of systems. A useful resource is Stephen H. Kan, Metrics and Models in Software Quality Engineering, 2nd ed., Addison-Wesley, 2003.)

The percent automated test coverage metric can be used in conjunction with the standard software testing metric called test coverage.

    TC (%) = TTP / TTR = (total # of test procedures developed / total # of defined test requirements)

    TC  = Percent of testing coverage
    TTP = Total # of test procedures developed
    TTR = Total # of defined test requirements

This metric of test coverage (see Figure 4) divides the total number of test procedures developed by the total number of defined test requirements. It provides the test team with a barometer to gauge the depth of test coverage, which is usually based on the defined acceptance criteria.
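A sketch of both coverage ratios, again with invented numbers:

def test_coverage(procedures_developed: int, requirements_defined: int) -> float:
    """TC (%) = TTP / TTR."""
    return 100.0 * procedures_developed / requirements_defined

def automated_test_coverage(automation_coverage: float, total_coverage: float) -> float:
    """PTC (%) = AC / C, where coverage is sized in KLOC or function points."""
    return 100.0 * automation_coverage / total_coverage

# Example: 180 test procedures exist for 200 defined requirements,
# and automated tests exercise 35 of the system's 50 KLOC.
print(test_coverage(180, 200))           # 90.0
print(automated_test_coverage(35, 50))   # 70.0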

When testing a mission-critical system, such as operational medical systems, the test coverage indicator must be high relative to the depth of test coverage for non-mission-critical systems. The depth of test coverage for a commercial software product that will be used by millions of end users may also be high relative to a government information system with a couple of hundred end users.


FIG. 4: TEST COVERAGE

FIG. 5: COMMON DEFECT DENSITY CURVE



DEFECT DENSITY
Defect density is another well-known metric not specific to automation. It is a measure of the total known defects divided by the size of the software entity being measured. For example, if there is a high defect density in a specific functionality, it is important to conduct a causal analysis. Is this functionality very complex, so the defect density is expected to be high? Is there a problem with the design/implementation of the functionality? Were insufficient or wrong resources assigned to the functionality because an inaccurate risk had been assigned to it? Could it be inferred that the developer responsible for this specific functionality needs more training?

    DD = D / SS = (# of known defects / total size of system)

    DD = Defect density
    D  = # of known defects
    SS = Total size of system

One use of defect density is to map it against software component size. Figure 5 illustrates a typical defect density curve we've experienced, where small and larger components have a higher defect density ratio.

Additionally, when evaluating defect density, the priority of the defect should be considered. For example, one application requirement may have as many as 50 low-priority defects and still pass because the acceptance criteria have been satisfied. Still, another requirement might have only one open defect that prevents the acceptance criteria from being satisfied because it is a high priority. Higher-priority requirements are generally weighted more heavily.

Figure 6 shows one approach to using the defect density metric. Projects can be tracked over time (for example, through stages of the development cycle).

Defect trend analysis is another metric closely related to defect density. Defect trend analysis is calculated as:

    DTA = D / TPE = (# of known defects / # of test procedures executed)

    DTA = Defect trend analysis
    D   = # of known defects
    TPE = # of test procedures executed over time

Defect trend analysis can help determine the trend of defects found. Is the trend improving as the testing phase is winding down, or is the trend worsening? Showing defects the test automation uncovered that manual testing didn't or couldn't have is an additional way to demonstrate ROI. During the testing process, we have found defect trend analysis one of the more useful metrics to show the health of a project. One approach to showing a trend is to plot total number of defects along with number of open software problem reports (see Figure 7).
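A sketch of defect density and defect trend analysis, tracked over invented test cycles:

def defect_density(defects: int, system_size_kloc: float) -> float:
    """DD = D / SS (expressed here as defects per KLOC)."""
    return defects / system_size_kloc

def defect_trend(defects: int, procedures_executed: int) -> float:
    """DTA = D / TPE."""
    return defects / procedures_executed

cycles = [  # (known defects, test procedures executed) per cycle
    (40, 100),
    (55, 250),
    (60, 400),
]
for i, (d, tpe) in enumerate(cycles, start=1):
    print(f"cycle {i}: DTA = {defect_trend(d, tpe):.2f}")
# A falling DTA (0.40 -> 0.22 -> 0.15) suggests the defect trend is improving
# as testing winds down.

print(defect_density(60, 25.0))  # 60 known defects in a 25-KLOC system -> 2.4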

Effective defect tracking analysis can present a clear view of testing status throughout the project. A few more common metrics related to defects are:

    Cost to locate defect = Cost of testing / number of defects located

    Defects detected in testing = Defects detected in testing / total system defects

    Defects detected in production = Defects detected in production / system size


SOFTWARE TEST METRICS: THE ACRONYMS

AA # of actual test cases automated

AC Automation coverage

AP Automation progress

ATC # of test cases automatable

D # of known defects

DA # of acceptance defects found after delivery

DD Defect density

DRE Defect removal efficiency

DT # of defects found during testing

DTA Defect trend analysis

FP Function point

KLOC Lines of code (thousands)

LOC Lines of code

PA Percent automatable

PTC Percent of automatable testing coverage

ROI Return on investment

SPR Software problem report

SS Total size of system to be automated

T Time (some unit of time—days, weeks, months, etc.)

TC # of total test cases

TP Test progress

TPE # of test procedures executed over time

FIG. 6: USING DD TO TRACK OVER TIME



Some of these metrics can be combined and used to enhance quality measurements, as shown in the next section.

IMPACT ON QUALITY
One of the more popular metrics for tracking quality (if defect count is used as a measure of quality) through testing is defect removal efficiency. DRE, while not specific to automation, is very useful in conjunction with automation efforts. It is used to determine the effectiveness of defect removal efforts and is also an indirect measurement of the quality of the product. The value of DRE is calculated as a percentage—the higher the percentage, the higher the positive impact on product quality, because it represents the timely identification and removal of defects at any particular phase.

    DRE (%) = DT / (DT + DA) = (# of defects found during testing / (# of defects found during testing + # of acceptance defects found after delivery))

    DRE = Defect removal efficiency
    DT  = # of defects found during testing
    DA  = # of acceptance defects found after delivery

The highest attainable value of DRE is "1," which equates to "100%." But we have found that, in practice, an efficiency rating of 100% is not likely.

DRE should be measured during the different development phases. If DRE is low during the analysis and design stages, for instance, it may indicate that more time should be spent improving the way formal technical reviews are conducted, and so on.

This calculation can be extended for released products as a measure of the number of defects in the product that were not caught during product development or testing.
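A sketch of the DRE calculation with invented counts:

def defect_removal_efficiency(found_in_testing: int, found_after_delivery: int) -> float:
    """DRE (%) = DT / (DT + DA)."""
    return 100.0 * found_in_testing / (found_in_testing + found_after_delivery)

# Example: 180 defects were found during testing and 20 more surfaced after
# delivery, so DRE is 90%: most defects were removed before release.
print(defect_removal_efficiency(180, 20))  # 90.0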

FIG. 7: DEFECT TREND ANALYSIS

SEE MORE ON SOFTWARE TESTING METRICS at www.stpcollaborative.com

Courtesy of http://www.teknologika.com/blog/SoftwareDevelopmentMetricsDefectTracking.aspx


Agile Performance Testing

Flexibility and Iterative Processes Are Key to Keeping Tests on Track


You run a test and get a lot of information about the system. To be efficient you need to analyze the feedback you get, make modifications and adjust your plans if necessary. Let's say, for example, you plan to run 20 different tests, but after executing the first test you discover a bottleneck (for instance, the number of Web server threads). Unless you eliminate the bottleneck, there's no point running the other 19 tests if they all use the Web server. To identify the bottleneck, you may need to change the test scenario.

Even if the project scope is limited to pre-production performance testing, an agile, iterative approach helps you meet your goals faster and more efficiently, and learn more about the system along the way. After we prepare a test script (or generate workload some other way), we can run single- or multi-user tests, analyze results and sort out errors. The source of errors can vary—you can experience script errors, functional errors and errors caused directly by performance bottlenecks—and it doesn't make sense to add load until you determine the specifics. Even a single script allows you to locate many problems and tune the system at least partially. Running scripts separately also lets you see the amount of resources used by each type of load so you can build a system "model" accordingly (more on that later).

The word "agile" in this article doesn't refer to any specific development process or methodology; performance testing for agile development projects is a separate topic not covered in this paper. Rather, it's used as an application of the agile principles to performance engineering.

WHY THE 'WATERFALL' APPROACH DOESN'T WORK
The "waterfall" approach to software development is a sequential process in which development flows steadily downward (hence the name) through the stages of requirements analysis, design, implementation, testing, integration and maintenance. Performance testing typically includes these steps:

• Prepare the system.
• Develop requested scripts.
• Run scripts in the requested combinations.
• Compare results with the requirements provided.
• Allow some percentage of errors according to the requirements.
• Involve the development team if requirements are missed.

At first glance, the waterfall approach to performance testing appears to be a well-established, mature process. But there are many potential—and serious—problems. For example:

• The waterfall approach assumes that the entire system—or at least all the functional components involved—is ready for the performance test. This means the testing can't be done until very late in the development cycle, at which point even small fixes would be expensive. It's not feasible to perform such full-scope testing early in the development life cycle. Earlier performance testing requires a more agile, explorative process.

• The scripts used to create the system load for performance tests are themselves software. Record/playback load testing tools may give the tester the false impression that creating scripts is quick and easy, but correlation, parameterization, debugging and verification can be extremely challenging. Running a script for a single user that doesn't yield any errors doesn't prove much. I've seen large-scale corporate performance testing where none of the script executions made it through logon (the single sign-on token wasn't correlated), yet performance testing was declared successful and the results were reported to management.

• Running all scripts simultaneously makes it difficult to tune and troubleshoot. It usually becomes a good illustration of the shot-in-the-dark antipattern—"the best efforts of a team attempting to correct a poorly performing application without the benefit of truly understanding why things are as they are" (http://www.kirk.blogcity.com/proposed_antipattern_shot_in_the_dark.htm). Or you need to go back and deconstruct tests to find exactly which part is causing the problems. Moreover, tuning and performance troubleshooting are iterative processes, which are difficult to place inside the "waterfall." And in most cases, you can't do them offline—you need to tune the system and fix the major problems before the results make sense.

• Running a single large test, or even several large tests, provides minimal information about system behavior. It doesn’t allow you to build any kind of model, formal or informal, or to identify any relationship between workload and system behavior. In most cases, the workload used in performance tests is only an educated guess, so you need to know how stable the system would be and how consistent the results would be if real workload varied.

Flexibility and Iterative Processes Are Key to Keeping Tests on Track

By Alexander Podelko

Agile software development involves iterations, open collaboration and process adaptability throughout a project’s life cycle. The same approaches are fully applicable to performance testing projects. So you need a plan, but it has to be malleable. It becomes an iterative process involving tuning and troubleshooting in close cooperation with developers, system administrators, database administrators and other experts.

Alexander Podelko has specialized in performance engineering for 12 years. Currently he is a Consulting Member of Technical Staff at Oracle, responsible for performance testing and tuning of the Hyperion product family.

Using the waterfall approach doesn’t change the nature of performance testing; it just means you’ll probably do a lot of extra work and end up back at the same point, performance tuning and troubleshooting, much later in the cycle. Not to mention that large tests involving multiple use cases are usually a bad point to start performance tuning and troubleshooting, because the symptoms you see may be a cumulative effect of multiple issues.

Using an agile, iterative approach doesn’t mean redefining the software development process; rather, it means finding new opportunities inside existing processes to increase efficiency overall. In fact, most good performance engineers are already doing performance testing in an agile way but just presenting it as “waterfall” to management. In most cases, once you present and get management approval on a waterfall-like plan, you’re free to do whatever’s necessary to test the system properly inside the scheduled time frame and scope. If opportunities exist, performance engineering may be extended further, for example, to early performance checkpoints or even full software performance engineering.

TEST EARLY

Although I’ve never read or heard of anybody arguing against testing early, it rarely happens in practice. Usually there are some project-specific reasons—tight schedules or budgets, for instance—preventing such activities (if somebody thought about them at all).

Dr. Neil Gunther, in his book Guerrilla Capacity Planning (Springer, 2007), describes the reasons management (consciously or unconsciously) resists testing early. While Gunther’s book presents a broader perspective on capacity planning, the methodology discussed, including the guerrilla approach, is highly applicable to performance engineering.

Gunther says there’s a set of unspoken assumptions behind the resistance to performance-related activities. Some that are particularly relevant to performance engineering:

• The schedule is the main measure of success.
• Product production is more important than product performance.
• We build product first and then tune performance.
• Hardware is not expensive; we can just add more of it if necessary.
• There are plenty of commercial tools that can do it.

It may be best to accept that many project schedules don’t allow sufficient time and resources for performance engineering activities and proceed in “guerrilla” fashion: Conduct performance tests that are less resource-intensive, even starting by asking just a few key questions and expanding as time and money permit.

The software performance engineering approach to development of software systems to meet performance requirements has long been advocated by Dr. Connie Smith and Dr. Lloyd Williams (see, for example, their book Performance Solutions, Addison-Wesley, 2001). While their methodology doesn’t focus on testing initiatives, it can’t be successfully implemented without some preliminary testing and data collection to determine both model inputs and parameters, and to validate model results. Whether you’re considering a full-blown performance engineering or guerrilla-style approach, you still need to obtain baseline measurements on which to build your calculations. Early performance testing at any level of detail can be very valuable at this point.

A rarely discussed aspect of early performance testing is unit performance testing. The unit here may be any part of the system—a component, service or device. This is not a standard practice, but it should be. The later in the development cycle, the more costly and difficult it becomes to make changes, so why wait until the entire system is assembled to start performance testing? We don’t wait in functional testing. The predeployment performance test is an analog of system or integration tests, but it’s usually conducted without any “unit testing” of performance.

The main obstacle is that many systems are somewhat monolithic; the parts, or components, don’t make much sense by themselves. But there may be significant advantages to test-driven development. If you can decompose the system into components in such a way that you may test them separately for performance, you’ll only need to fix integration problems when you put the system together. Another problem is that many large corporations use a lot of third-party products in which the system appears as a “black box” that’s not easily understood, making it tougher to test effectively.

During unit testing, variables such as load, security configuration and amount of data can be reviewed to determine their impact on performance. Most test cases are simpler and tests are shorter in unit performance testing. There are typically fewer tests with limited scope—for example, fewer variable combinations than in a full stress or performance test.

We shouldn’t underestimate the power of the single-user performance test: If the system doesn’t perform well for a single user, it certainly won’t perform well for multiple users. Single-user testing is conducted throughout the application development life cycle, during functional testing and user acceptance testing, and gathering performance data can be extremely helpful during these stages. In fact, single-user performance tests may facilitate earlier detection of performance problems and indicate which business functions and application code need to be investigated further.
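As a minimal sketch of gathering such data (LoadRunner C pseudocode in the same style as the examples later in this article; the “CheckBalance” transaction name and URL are hypothetical placeholders, not from any real application), wrapping a single business function in a transaction timer is enough to start collecting response times on every functional or single-user run:

Action()
{
    /* Time one business function for a single user; the transaction
       name and URL are illustrative placeholders. */
    lr_start_transaction("CheckBalance");

    web_url("CheckBalance",
        "URL={URL}/account/balance",
        LAST);

    /* LR_AUTO passes or fails the transaction based on request status */
    lr_end_transaction("CheckBalance", LR_AUTO);

    return 0;
}

Even run once per build, a handful of timers like this produces a trend line for key business functions long before a full multi-user test is possible.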

So while early performance engineering is definitely the best approach (at least for product development) and has long been advocated, it’s still far from commonplace. The main problem here is that the mindset should change from a simplistic “record/playback” performance testing occurring late in the product life cycle to a more robust, true performance engineering approach starting early in the product life cycle. You need to translate “business functions” performed by the end user into component/unit-level usage, end-user requirements into component/unit-level requirements and so on. You need to go from the record/playback approach to using programming skills to generate the workload and create stubs to isolate the component from other parts of the system. You need to go from “black box” performance testing to “gray box.”

If you’re involved from the beginning of the project, a few guerrilla-style actions early on can save you (and the project) a lot of time and resources later. But if you’re called in later, as is so often the case, you’ll still need to do the best performance testing possible before the product goes live. The following sections discuss how to make the most of limited test time.

THE IMPORTANCE OF WORKLOAD GENERATION

The title of Andy Grove’s book Only the Paranoid Survive may relate even better to performance engineers than to executives! It’s hard to imagine a good performance engineer without this trait. When it comes to performance testing, it pays to be concerned about every part of the process, from test design to results reporting.

Be a performance test architect. The sets of issues discussed below require architect-level expertise.

1) Gathering and validating all requirements (workload definition, first and foremost), and projecting them onto the system architecture:

Too many testers consider all information they obtain from the business side (workload descriptions, scenarios, use cases, etc.) as the “holy scripture.” But while businesspeople know the business, they rarely know much about performance engineering. So obtaining requirements is an iterative process, and every requirement submitted should be evaluated and, if possible, validated. Sometimes performance requirements are based on solid data; sometimes they’re just a guess. It’s important to know how reliable they are.

Scrutinize system load carefully as well. Workload is an input to testing, while response times are output. You may decide if response times are acceptable even after the test, but you must define workload beforehand.

The gathered requirements should be projected onto the system architecture because it’s important to know if included test cases add value by testing different sets of functionality or different components of the system. It’s also important to make sure we have test cases for every component (or, if we don’t, to know why).

2) Making sure the system under test is configured properly and the results obtained may be used (or at least projected) for the production system:

Environment and setup considerations can have a dramatic effect. For instance:

• What data is used? Is it real production data, artificially generated data or just a few random records? Does the volume of data match the volume forecasted for production? If not, what’s the difference?

• How are users defined? Do you have an account set with the proper security rights for each virtual user or do you plan to re-use a single administrator ID?

• What are the differences between the production and test environments? If your test system is just a subset of your production system, can you simulate the entire load or just a portion of that load? Is the hardware the same?

It’s essential to get the test environment as close as possible to the production environment, but performance testing workload will never match production workload exactly. In “real life,” the workload changes constantly, including user actions nobody could ever anticipate.

Indeed, performance testing isn’t an exact science. It’s a way to decrease risk, not to eliminate it. Results are only as meaningful as the test and environment you created. Performance testing typically involves limited functional coverage, and no emulation of unexpected events. Both the environment and the data are often scaled down. All of these factors confound the straightforward approach to performance testing, which states that we simply test X users simulating test cases A and B. This way, we leave aside a lot of questions: How many users can the system handle? What happens if we add other test cases? Do ratios of use cases matter? What if some administrative activities happen in parallel? All of these questions and more require some investigation.

Perhaps you even need to investigate the system before you start creating performance test plans. Performance engineers sometimes have system insights nobody else has; for example:

• Internal communication between client and server if recording is used
• Timing of every transaction (which may be detailed to the point of specific requests and sets of parameters if needed)
• Resource consumption by a specific transaction or a set of transactions

This information is additional input—often the original test design is based on incorrect assumptions and must be corrected based on the first results.

Be a script writer. Very few systems today are stateless systems with static content using plain HTML—the kind of systems that lend themselves to a simple “record/playback” approach. In most cases there are many obstacles to creating a proper workload. If it’s the first time you see the system, there’s absolutely no guarantee you can quickly record and play back scripts to create the workload, if at all.

Creating performance testing scripts and other objects is, in essence, a software development project. Sometimes automatic script generation from recording is mistakenly interpreted as the entire process of script creation, but it’s only the beginning. Automatic generation provides ready scripts in very simple cases, but in most nontrivial cases it’s just a first step. You need to correlate and parameterize scripts (i.e., get dynamic variables from the server and use different data for different users).

After the script is created, it should be evaluated for a single user, multiple users and with different data. Don’t assume the system works correctly just because the script was executed without errors. Workload validation is critical: We have to be sure the applied workload is doing what it’s supposed to do and that all errors are caught and logged. This can be done directly, by analyzing server responses or, in cases where that’s impossible, indirectly—for example, by analyzing the application log or database for the existence of particular entries.

Many tools provide some way to verify workload and check errors, but a complete understanding of what exactly is happening is necessary. For example, HP LoadRunner reports only HTTP errors for Web scripts by default (500 “Internal Server Error,” for example). If we rely on the default diagnostics, we might still believe that everything is going well when we get “out of memory” errors instead of the requested reports. To catch such errors, we should add special commands to our script to check the content of HTML pages returned by the server.
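Such a content check might look like the sketch below (the “Report generated” text and the report URL are placeholders for illustration); web_reg_find, the same call used in the consolidation example later in this article, registers a text search against the next server response and can fail the step if the text is missing:

/* Registered before the request, this check fails the step if the
   expected text is absent from the returned page. The text and URL
   are illustrative placeholders. */
web_reg_find("Text=Report generated", "Fail=NotFound", LAST);

web_url("RunReport",
    "URL={URL}/reports/run",
    LAST);

A check like this turns a silently returned error page into an explicit scripting error that shows up in the results instead of being counted as a fast, successful response.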

When a script is parameterized, it’s good to test it with all possible data. For example, if we use different users, a few of them might not be set up properly. If we use different departments, some could be mistyped or contain special symbols that must be properly encoded. These problems are easy to spot early on, when you’re just debugging a particular script. But if you wait until the final, all-script tests, they muddy the entire picture and make it difficult to see the real problems.

My group specializes in performance testing of the Hyperion line of Oracle products, and we’ve found that a few scripting challenges exist for almost every product. Nothing exceptional—they’re usually easily identified and resolved—but time after time we’re called on to save problematic performance testing projects only to discover serious problems with scripts and scenarios that make test results meaningless. Even experienced testers stumble, but many problems could be avoided if more time were spent analyzing the situation.

Consider the following examples, which are typical challenges you can face with modern Web-based applications:

1) Some operations, like financial consolidation, can take a long time. The client starts the operation on the server, then waits for it to finish, as a progress bar shows on screen. When recorded, the script looks like this (in LoadRunner pseudocode):

web_custom_request("XMLDataGrid.asp_7",
    "URL={URL}/Data/XMLDataGrid.asp?Action=EXECUTE&TaskID=1024&RowStart=1&ColStart=2&RowEnd=1&ColEnd=2&SelType=0&Format=JavaScript",
    LAST);
web_custom_request("XMLDataGrid.asp_8",
    "URL={URL}/Data/XMLDataGrid.asp?Action=GETCONSOLSTATUS",
    LAST);
web_custom_request("XMLDataGrid.asp_9",
    "URL={URL}/Data/XMLDataGrid.asp?Action=GETCONSOLSTATUS",
    LAST);
web_custom_request("XMLDataGrid.asp_9",
    "URL={URL}/Data/XMLDataGrid.asp?Action=GETCONSOLSTATUS",
    LAST);

Each request’s activity is defined by the ?Action= part. The number of GETCONSOLSTATUS requests recorded depends on the processing time.

In the example above, the request was recorded three times, which means the consolidation was done by the moment the third GETCONSOLSTATUS request was sent to the server. If you play back this script, it will work this way: The script submits the consolidation in the EXECUTE request and then calls GETCONSOLSTATUS three times. If we have a timer around these requests, the response time will be almost instantaneous, while in reality the consolidation may take many minutes or even hours. If we have several iterations in the script, we’ll submit several consolidations, which continue to work in the background, competing for the same data, while we report subsecond response times.

Consolidation scripts require creation of an explicit loop around GETCONSOLSTATUS to catch the end of the consolidation:

web_custom_request("XMLDataGrid.asp_7",
    "URL={URL}/Data/XMLDataGrid.asp?Action=EXECUTE&TaskID=1024&RowStart=1&ColStart=2&RowEnd=1&ColEnd=2&SelType=0&Format=JavaScript",
    LAST);

do {
    sleep(3000);
    web_reg_find("Text=1", "SaveCount=abc_count", LAST);
    web_custom_request("XMLDataGrid.asp_8",
        "URL={URL}/Data/XMLDataGrid.asp?Action=GETCONSOLSTATUS",
        LAST);
} while (strcmp(lr_eval_string("{abc_count}"), "1") == 0);

Here, the loop simulates the internal logic of the system, sending GETCONSOLSTATUS requests every three seconds until the consolidation is complete. Without such a loop, the script just checks the status and finishes the iteration while the consolidation continues long after that.

2) Web forms are used to enter and save data. For example, one form can be used to submit all income-related data for a department for a month. Such a form would probably be a Web page with two drop-down lists (one for departments and one for months) on the top and a table to enter data underneath them. You choose a department and a month on the top of the form, then enter data for the specified department and month. If you leave the department and month in the script hardcoded as recorded, the script would be formally correct, but the test won’t make sense at all—each virtual user will try to overwrite exactly the same data for the same department and the same month. To make it meaningful, the script should be parameterized to save data in different data intersections. For example, different departments may be used by each user.

To parameterize the script, we need not only department names but also department IDs (which are internal representations not visible to users that should be extracted from the metadata repository). Below is a sample of correct LoadRunner pseudocode (where values between { and } are parameters that may be generated from a file):

web_submit_data("WebFormGenerated.asp",
    "Action=http://hfmtest.us.schp.com/HFM/data/WebFormGenerated.asp?FormName=Tax+QFP&caller=GlobalNav&iscontained=Yes",
    ITEMDATA,
    "Name=SubmitType", "Value=1", ENDITEM,
    "Name=FormPOV", "Value=TaxQFP", ENDITEM,
    "Name=FormPOV", "Value=2007", ENDITEM,
    "Name=FormPOV", "Value=[Year]", ENDITEM,
    "Name=FormPOV", "Value=Periodic", ENDITEM,
    "Name=FormPOV", "Value={department_name}", ENDITEM,
    "Name=FormPOV", "Value=<Entity Currency>", ENDITEM,
    "Name=FormPOV", "Value=NET_INCOME_LEGAL", ENDITEM,
    "Name=FormPOV", "Value=[ICP Top]", ENDITEM,
    "Name=MODVAL_19.2007.50331648.1.{department_id}.14.407.2130706432.4.1.90.0.345", "Value=<1.7e+3>;;", ENDITEM,
    "Name=MODVAL_19.2007.50331648.1.{department_id}.14.409.2130706432.4.1.90.0.345", "Value=<1.7e+2>;;", ENDITEM,
    LAST);

If department name is parameterized but department ID isn’t, the script won’t work properly. You won’t get an error, but the information won’t be saved. This is an example of a situation that never can happen in real life—users working through GUIs would choose the department name from a drop-down box (so it always would be correct) and the matching ID would be found automatically. Incorrect parameterization leads to sending impossible combinations of data to the server with unpredictable results. To validate this information, we would check what’s saved after the test—if you see your data there, you know the script works.

TUNE AND TROUBLESHOOT

Usually, when people talk about performance testing, they don’t separate it from tuning, diagnostics or capacity planning. “Pure” performance testing is possible only in rare cases when the system and all optimal settings are well-known. Some tuning activities are typically necessary at the beginning of testing to be sure the system is tuned properly and the results will be meaningful. In most cases, if a performance problem is found, it should be diagnosed further, up to the point when it’s clear how to handle it. Generally speaking, performance testing, tuning, diagnostics and capacity planning are quite different processes, and excluding any one of them from the test plan (if they’re assumed) will make the test unrealistic from the beginning.

Both performance tuning and troubleshooting are iterative processes where you make the change, run the test, analyze the results and repeat the process based on the findings. The advantage of performance testing is that you apply the same synthetic load, so you can accurately quantify the impact of the change that was made. That makes it much simpler to find problems during performance testing than to wait until they happen in production, when workload is changing all the time. Still, even in the test environment, tuning and performance troubleshooting are quite sophisticated diagnostic processes usually requiring close collaboration between a performance engineer running tests and developers and/or system administrators making changes. In most cases, it’s impossible to predict how many test iterations will be necessary. Sometimes it makes sense to create a shorter, simpler test still exposing the problem under investigation. Running a complex, “real-life” test on each tuning or troubleshooting iteration can make the whole process very long and the problem less evident because of different effects the problem may have on different workloads.

An asynchronous process of fixing defects, often used in functional testing—testers look for bugs and log them into a defect tracking system, then the defects are prioritized and fixed independently by the developers—doesn’t work well for performance testing. First, a reliability or performance problem often blocks further performance testing until the problem is fixed or a workaround is found. Second, usually the full setup, which tends to be very sophisticated, should be used to reproduce the problem. Keeping the full setup for a long time can be expensive or even impossible. Third, debugging performance problems is a sophisticated diagnostic process usually requiring close collaboration between a performance engineer running tests and analyzing the results and a developer profiling and altering code. Special tools may be necessary; many tools, such as debuggers, work fine in a single-user environment but do not work in the multi-user environment, due to huge performance overheads. What’s usually required is the synchronized work of performance engineering and development to fix the problems and complete performance testing.

BUILD A MODEL

Creating a model of the system under test significantly increases the value of performance testing. First, it’s one more way to validate test correctness and help to identify system problems—deviations from expected behavior might signal issues with the system or with the way you create workload. Second, it allows you to answer questions about the sizing and capacity planning of the system.

Most performance testing doesn’t require a formal model created by a sophisticated modeling tool—it may involve just simple observations of the amount of resources used by each system component for the specific workload. For example, workload A creates significant CPU usage on server X while server Y is hardly touched. This means that if you increase workload A, the lack of CPU resources on server X will create a bottleneck. As you run increasingly complex tests, you verify the results you get against your “model”—your understanding of how the system behaves. If they don’t match, you need to figure out what’s wrong.

FIG. 1: THROUGHPUT

Modeling often is associated with queuing theory and other sophisticated mathematical constructs. While queuing theory is a great mechanism to build sophisticated computer system models, it’s not required in simple cases. Most good performance engineers and analysts build their models subconsciously, without even using such words or any formal efforts. While they don’t describe or document their models in any way, they take note of unusual system behavior—i.e., when system behavior doesn’t match the model—and can make some simple predictions (“It looks like we’ll need X additional resources to handle X users,” for example).

The best way to understand the system is to run independent tests for each business function to generate a workload resource usage profile. The load should not be too light (so resource usage will be steady and won’t be distorted by noise) or too heavy (so it won’t be distorted by nonlinear effects).

Considering the working range of processor usage, linear models often can be used instead of queuing models for modern multiprocessor machines (less so for single-processor machines). If there are no bottlenecks, throughput (the number of requests per unit of time), as well as processor usage, should increase proportionally to the workload (for example, the number of users), while response time should grow insignificantly. If we don’t see this, it means there’s a bottleneck somewhere in the system and we need to discover where it is.

For example, let’s look at a simple queuing model I built using TeamQuest’s modeling tool for a specific workload executing on a four-way server. It was simulated at eight different load levels (step 1 through step 8, where step 1 represents a 100-user workload and 200 users are added for each step thereafter, so that step 8 represents 1,500 users). Figures 1 through 3 show throughput, response time and CPU usage from the modeling effort.

An analysis of the queuing model results shows that the linear model accurately matches the queuing model through step 6, where the system CPU usage is 87 percent. Most IT shops don’t want systems loaded more than 70 percent to 80 percent.

That doesn’t mean we need to discard queuing theory and sophisticated modeling tools; we need them when systems are more complex or where more detailed analysis is required. But in the middle of a short-term performance engineering project, it may be better to build a simple, back-of-the-envelope type of model to see if the system behaves as expected.
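As an illustration only (the numbers below are hypothetical and are not taken from the TeamQuest model above), such a back-of-the-envelope linear model can be a few lines of C: measure CPU utilization per user at a light, steady load and extrapolate until you approach the utilization ceiling your shop allows.

#include <stdio.h>

/* Back-of-the-envelope linear CPU projection. Illustrative numbers only:
   a measured 100-user baseline using 6% CPU is extrapolated per load step
   and flagged where it crosses a 75% utilization ceiling. */
int main(void)
{
    const double baseline_users = 100.0;
    const double baseline_cpu   = 6.0;   /* percent CPU measured at light load */
    const double ceiling        = 75.0;  /* typical "don't exceed" target      */
    const double cpu_per_user   = baseline_cpu / baseline_users;

    for (int users = 100; users <= 1500; users += 200) {
        double projected = cpu_per_user * users;
        printf("%4d users -> %5.1f%% CPU%s\n", users, projected,
               projected > ceiling ? "  <-- expect queuing effects here" : "");
    }
    return 0;
}

If measured utilization at the higher steps comes in well above or below the projection, the simple model has done its job: something (a bottleneck, a cache, a workload that doesn’t scale the way you assumed) needs to be investigated.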

Running all scripts simultaneously makes it difficult to build a model. While you still can make some predictions for scaling the overall workload proportionally, it won’t be easy to find out where the problem is if something doesn’t behave as expected. The value of modeling increases drastically when your test environment differs from the production environment. In that case, it’s important to document how the model projects testing results onto the production system.

THE PAYOFF

The ultimate goal of applying agile principles to software performance engineering is to improve efficiency, ensuring better results under tight deadlines and budgets. And indeed, one of the tenets of the “Manifesto for Agile Software Development” (http://agilemanifesto.org/) is that responsiveness to change should take priority over following a plan. With this in mind, we can take performance testing of today’s increasingly complex software to new levels that will pay off not just for testers and engineers but for all stakeholders.

FIG. 3: CPU USAGE


FIG. 2: RESPONSE TIME


By Fiona Charles

Many software test efforts depend on scenarios that represent real sequences of transactions and events. Scenarios are important tools for finding problems that matter to stakeholders in business applications and integrated solutions, giving us tests of functionality from end to end. Often, scenarios are essential for business acceptance, because they encapsulate test ideas in a format that is meaningful for business users and easy for them to understand and review.

Fiona Charles (www.quality-intelligence.com) teaches organizations to match their software testing to their business risks and opportunities. With 30 years’ experience in software development and integration projects, she has managed and consulted on testing for clients in retail, banking, financial services, health care and telecommunications.

User stories, use cases and other business requirements can be good sources of scenario test ideas. But testers know that these are rarely comprehensive or detailed enough to encompass a thorough test without additional analysis. And if we base our test model entirely on the same sources used by the programmers, our test will reflect the assumptions they made when building the system. There is a risk that we will miss bugs that arise from misinterpreted or incomplete requirements or user stories.

One way to mitigate this risk is to build a scenario model whose foundation is a conceptual framework based on the data flows. We can then build scenarios by doing structured analysis of the data. This method helps to ensure adequate coverage and testing rigor, and provides a cross-check for our other test ideas. Because it employs a structure, it also facilitates building scenarios from reusable components.

Below, I’ll share an example of a data-driven scenario test my team and I designed and ran for a client’s point-of-sale project. But before we go any further, let’s define our terms:

Scenario: The American Heritage Dictionary defines “scenario” as “an outline or model of an expected or supposed sequence of events.” In “An Introduction to Scenario Testing,” Cem Kaner extends that definition to testing: “A scenario is a hypothetical story, used to help a person think through a complex problem or system.…A scenario test is a test based on a scenario.” (http://www.kaner.com/pdfs/ScenarioIntroVer4.pdf)

Test model: Every software test is based on a model of some kind, primarily because we can never test everything. We always make choices about what to include in a test and what to leave out. Like a model of anything—an airplane, a housing development—a test model is a simplified reduction of the system or solution we are testing. As software pioneer Jerry Weinberg explains in An Introduction to General Systems Thinking, “Every model is ultimately the expression of one thing…we hope to understand in terms of another that we do understand.”

The test model we employ embodies our strategic choices, and then serves as a conceptual construct within which we make tactical choices. This is true whether or not we are aware that we are employing models. Every test approach is a model—even if we take the “don’t think about test design, just bang out test cases to match the use cases” approach. But if we’re not consciously modeling, we’re probably not doing it very well.

In these definitions, we see the inextricable connection between scenarios and models. For our purposes, then, a test model is any reduction, mapping or symbolic representation of a system, or integrated group of systems, for the purpose of defining or structuring a test approach. A scenario is one kind of model.

We don’t have to use scenarios to model a software test, but we do have to model to use scenarios.

DESIGNING A MODEL FOR A SCENARIO-BASED TEST

It is useful to model at two levels when we are testing with scenarios: the overall execution model for testing the solution and the scenarios that encapsulate the tests.

There are many different ways to model both levels. The retail point-of-sale (POS) system example discussed below uses a hybrid model based on business operations combined with system data as the basis for the overall test execution model, and system data for the scenario model.

Business operations—e.g., day (week, month, mock-year) in the life or model office or business. A business operations model is the easiest to communicate to all the stakeholders, who will immediately understand what we are trying to achieve and why. Handily, it is also the most obvious to construct. But it is not always the most rigorous, and we may miss testing something important if we do not also look at other kinds or levels of models.

System data—categorized within each flow by how it is used in the solution and by how frequently we intend to change it during testing: static, semistatic or dynamic.

Combining different model bases in one test helps testers avoid the kinds of unconscious constraints or biases that can arise when we look at a system in only one way. We could also have based our models on such foundations as:

The entity life cycle of entities important to the system—e.g., product life cycle (or account, in a banking system) or customer experience.

Personae defined by testers for various people who have some interaction with the system, or who depend on it for data to make business decisions. These could include a merchandising clerk, store manager, sales associate or a customer and category manager. (See books by Alan Cooper for an explanation of persona design.)

High-level functional decomposition of the system or organization, to ensure that we have included all major areas, including those that may not have changed but could be impacted by changes elsewhere.

In a point-of-sale application, this might include functional areas within the system or business (e.g., ordering, inventory management, billing), processes in each area (e.g., order capture, provisioning) and functions within each process (e.g., enter, edit, cancel order).

FIG. 1: TRANSACTION BASICS



Already defined user stories or use cases.

Stories created by testers, using their imaginations and business domain knowledge, including stories based on particular metaphors, such as “soap opera tests” (described by Hans Buwalda in “Soap Opera Testing,” Better Software, February 2004).

Although none of these was central in our test model, we used each to some degree in our test modeling for the POS system (except for predefined user stories and use cases, which didn’t exist for this system). Using a range of modeling techniques stimulated our thinking and helped us avoid becoming blinkered by our test model.

MODELING BASED ON DATA

We constructed our primary central model using a conceptual framework based on the system data. This is a good fit for modeling the scenario test of a transactional system.

A data-driven model focuses on the data and data interfaces:

• Inputs, outputs and reference data
• Points of entry to the system or integrated solution
• Frequency of changes to data
• Who or what initiates or changes data (actors)
• Data variations (including actors)

A model based on data represents a systems view that can then be rounded out using models based on business views. As well as providing a base for scenarios, focusing on the data helps us identify the principal components we will need for the overall execution model for the test:

• Setup data
• Entry points
• Verification points
• Events to schedule

It is also easy to structure and analyze.

CONCEPTUAL FRAMEWORK FOR A DATA-DRIVEN MODEL

Here is an example of a conceptual framework.

At the most basic level, we want scenarios to test the outcome of system transactions.

Transactions drive dynamic data—i.e., data we expect to change in the course of system operation (see Figure 1). Transactions represent the principal business functions the system is designed to support. Some examples for a POS system are sales, returns, frequent shopper point redemptions and customer service adjustments.

A transaction is initiated by an actor, which could be human or the system (see Figure 2).

There may be more than one actor involved in a transaction—e.g., a sales associate and a customer.

Many scenarios will have multiple transactions (see Figure 2).

Subsequent transactions can affect the outcome by acting on the dynamic data created by earlier related transactions.

Transactions operate in a context partly determined by test bed data (see Figure 3).

Reference test bed data is static (it doesn’t change in the course of the test)—e.g., user privilege profiles, frequent shopper (loyalty) points awarded per dollar spent, sales tax rates. Deciding which data should be static is a test strategy decision.

The test bed also contains semistatic data, which can change the context and affect the outcomes of transactions.

Semistatic data changes occasionally during testing, as the result of an event. Examples of semistatic data include items currently for sale, prices and promotions.

Determining which data will be semistatic and how frequently it will change is also a test strategy decision.

Events affect transaction outcomes—by changing the system or test bed data context, or by acting on the dynamic data (see Figure 4).

Events can represent:

• periodic or occasional business processes, such as rate changes, price changes, weekly promotions (deals) and month-end aggregations
• system happenings, such as system interface failures
• “external” business exceptions, such as shipments arriving damaged or trucks getting lost

Scenarios also operate within a context of date and time. The date and time a transaction or event occurs can have a significant impact on a scenario outcome (see Figure 4).

FIG. 2: ADD ACTORS, MORE TRANSACTIONS


FIG. 3: FACTORING IN TEST BED DATA


Categorizing the data determines each type’s role in the test and gives us a conceptual framework for the scenario model:

• Reference data sets the context for scenarios and their component transactions.
• A scenario begins with an event or a transaction.
• Transactions have expected results.
• Events operate on transactions and affect their results: A prior event changes a transaction context (e.g., an overnight price change), and a following event changes the scenario result and potentially affects a transaction (e.g., product not found in warehouse).
• Actors influence expected results (e.g., through user privileges, or customer discount or tax status).

We can apply this framework to the design of both levels of test model: the overall execution model (the central idea or metaphor for the test approach) and the scenario model.

TEST DESIGN FOR A POS SYSTEM

The example that follows shows how, together with a small test team, I applied the conceptual framework described above to testing a POS system on a client project. David Wright, a senior tester with whom I have worked on many systems integration test projects, and who has contributed many ideas and practical details to development of this method, was a core member of my team. In addition to experience with end-to-end scenario testing, David and I both have extensive retail system domain knowledge, which was essential to successful testing on this project.

THE BACKSTORY

The client operates a chain of more than 1,000 retail drugstores across Canada. This project was for the company’s secondary line of business, a home health care company, which sells and rents health care aids and equipment to retail customers through 51 stores in different regions of Canada. The plan was to implement the POS system standalone for the home health care company, and then, following successful implementation, to implement it in the main drugstores, fully integrated with an extensive suite of store and corporate systems.

The POS system is a standard product, in production with several major retailers internationally, but it was being heavily customized by the vendor for this client. The client had contracted with an integrator to manage the implementation and mitigate its risks.

FIG. 4: ADD EVENTS, PLUS DATE & TIME

FIG. 5: OVERALL EXECUTION MODEL (figure summary): setup of Central Office static data (user IDs, user profiles, store parameters); periodic events through the test day feed Central Office semistatic data (item management, price management, promotions), which act on dynamic POS transactions (sales, returns, exchanges, rentals, voids, post-voids, suspensions, recalls/retrieves, rainchecks); daily verification of POS register results (POS receipts, till, TLOG, store daily reports, cash balance) and Store Back Office store-wide results for the two stores; weekly verification of Central Office summary results (sales report).


I reported to the integrator with my test team.

The vendor was contractually obligated to do system testing. The integrator had a strictly limited testing budget, and there were several other constraints on my test team’s ability to develop sufficient detailed knowledge of POS to perform an in-depth system test within the project’s timelines. I therefore decided that our strategy should be to perform an end-to-end acceptance test, focusing on financial integrity (i.e., consistency and accuracy end-to-end). Because we expected the client would eventually proceed with the follow-on major integration project in the drugstores, I built the test strategy and the artifact structure to apply to both projects, although an estimated 40 percent of the detailed scenario design would need to be redone for the primary line of business.

EXECUTION MODEL

Having analyzed and categorized the data for the POS system, I designed this overall model for executing the test (see Figure 5).

The POS system had four principal modules (Item and Price Management, Promotions Management, Store Operations, and Central Reporting), operating in three location types. The POS client would operate in each store and Store Back Office. The Central Office modules would be the same for all stores and all would offer the same items, but actual prices and promotions would differ by region and be fed overnight to each store. We decided to run two test stores, covering two regions in different time zones and with different pricing and provincial tax structures. One of our test stores could be switched to a different region if we identified problems requiring more in-depth testing of location variations.

We combined a business operations view with our data-driven overall model to create a test cycle model (see Figure 6).

SCENARIO DESIGN

We decided to build our scenarios from an element we defined as an item transaction. The team began by drilling down into each framework element and defining it from a test point of view: i.e., important attributes for testing and the possible variations for each. Item transactions had some other elements listed (actors, items) with them, giving us a list that looked like this:

Transaction
• Type (sale, return, post-void, rental, rental-purchase…)
• Timeframe (sale/rental + 3 days…)
• Completion status (completed, void, suspended)
• Store
• Register
• User (cashier, manager, supervisor, store super-user)
• Customer (walk-in, preferred, loyalty, status native, employee…)
• Item(s) (multiple attributes, each with its own variations)
• Tender (cash, check, credit card, debit, coupon, loyalty redemption…)
• Loyalty points (y/n)
• Delivery/pickup (cash and carry, home delivery…)
• Promotions in effect (weekly flyer, store clearance…)
• Other discount (damage…)

Using spreadsheets to structure our analysis, we designed a full range of item transactions, working through the variations and combining them to make core test cases. Wherever possible, we built in shortcuts—e.g., Excel lists for variations of most attributes. (Table 1 provides a simplified example of the transaction definitions.)
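Our building blocks lived in spreadsheets, but the same structure can be sketched as a record type. The C fragment below is only an illustration of the attribute list above (field names and enumeration values simply echo the variations named there; it is not part of the project’s artifacts):

/* Illustrative only: an item transaction as a record type mirroring the
   attribute list above. Names and sizes are arbitrary. */
typedef enum { SALE, RETURN_TXN, POST_VOID, RENTAL, RENTAL_PURCHASE } TxnType;
typedef enum { COMPLETED, VOIDED, SUSPENDED } CompletionStatus;

typedef struct {
    TxnType          type;
    int              timeframe_days;    /* e.g., sale/rental + 3 days        */
    CompletionStatus completion;
    int              store_id;
    int              register_id;
    char             user_profile[32];  /* cashier, manager, supervisor...   */
    char             customer[32];      /* walk-in, preferred, loyalty...    */
    int              item_ids[8];       /* one or more items per transaction */
    char             tender[24];        /* cash, check, credit card...       */
    int              loyalty_points;    /* y/n                               */
    char             delivery[24];      /* cash and carry, home delivery     */
    char             promotion[32];     /* weekly flyer, store clearance     */
    char             discount[32];      /* damage, raincheck...              */
} ItemTransaction;

Whether the building block is a struct or a spreadsheet row, the point is the same: scenarios are composed from small, well-defined item transactions rather than written as monolithic scripts.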

We designed events in a similar fashion, including price changes and promotions. We also constructed an item table with all their attributes and variations.

We then designed scenarios, using the item transactions we had defined as the lowest-level building blocks. This allowed us to combine item transactions and also to use them at different points in scenarios. (Figure 7 shows our scenario model.)

The scenario spreadsheets referenced the item transaction and item sheets, so we could make changes easily. Our spreadsheet structures also allowed us to calculate expected results for each test day and test cycle. This was essential for a system with so many date dependencies, where we had to evaluate the cumulative results of each store sales day daily, and at the end of each cycle evaluate the corporate week for all stores. (Table 2 provides a simplified version of the scenario design.)

In addition, we wrote Excel macros to automate the creation of some test artifacts from others. The testers’ worksheets for test execution, for example, were generated automatically.

FIG. 6: TEST CYCLE MODEL (figure summary): one-time setup before all cycles at Central Office (user setup, from Store for Pilot; store parameters; item load from spreadsheet); once-per-cycle events at Central Office (item maintenance, price maintenance, promotions; run reports); daily, 3-4 days per cycle, in each of the two stores: open store, open register and log in, run POS transactions, log out and close register (verify POS receipts and till balance), close store, then verify totaller, cash balance, daily reports and TLOG in the Store Back Office; verify central reports and the sales report at Central Office.

RESULTS OF THE POS TEST

The vendor POS system, and in particular the customizations, turned out to be very poor quality. As a result, a test that had been planned to fit our budget, with four cycles (including a regression test) over four weeks, actually ran through 28 cycles over 14 weeks—and then continued after pilot implementation. My strategy, and our test scenarios, proved effective. We logged 478 bugs, all but 20 of which our client considered important enough to insist on having fixed (see Table 3).

It is important to ask whether this was the most cost-effective way to find those bugs. Probably it was not, because many of them should have been found and fixed by the vendor before we got the system. But we found many other bugs, primarily integration issues, which would probably not have been found by any other kind of test. And—given the total picture of the project—it was critical that the integrator conduct an independent integrated test. This was the most efficient way for our team to do that within the given constraints.

There were several business benefits from our test and test strategy. The most important was buy-in from the client’s finance department. Because our test verified financial integrity across the system, it provided the finance people with evidence of the state of the solution from their point of view.

Our scenario test facilitated acceptance for implementation in the home health care stores and provided information on which to base the decision of whether or not to proceed with integration and implementation in the 1,000-plus drugstores.

Because we tested the end-to-end function of complex data interactions that are business norms for this client—such as sales, returns and rentals of items with date-dependent prices and promotions, and returns when prices have changed and new promotions are in effect—we were able to provide the business with an end-to-end view of system function that crossed modules and business departments. Our varied scenarios provided sufficiently realistic data to verify critical business reports.

Finally, the spreadsheet artifacts we used throughout the test and gave to the client supplied solid evidence of testing performed, in the event it should ever be required for audits.

BIGGEST POS PROJECT BENEFITS

The principal benefit of my strategy from a testing point of view was that my team supplemented the vendor’s testing rather than attempting to duplicate it. The vendor’s testing did not include an integrated view of POS function. Ours did, and that allowed us to focus primarily on system outcomes, and only secondarily on the users’ immediate experiences.

By adopting a building-block approach, we incorporated efficiency in test preparation and execution. This gave us flexibility to reschedule our testing according to the state of the system on any given day. When we encountered bugs, we could work with the vendor and drill down into the components of a scenario, such as item setup, item transaction and promotion setup, to find the problem.

Our robust and structured transaction-scenario artifacts provided the client with a reusable regression test for future releases, upgrades to infrastructure and so on. We were able to layer operability tests on top of our scenario testing, simulating real outage conditions and verifying the outcomes.

WHEN TO CONSIDER SCENARIOS

Scenario testing as I have described it here is not always the best test method. It does not fit well with exploratory testing, and it is not the most effective way to test deep function. It is, however, a very useful method for testing widely across a system. It is better for testing system outcomes than for evaluating a user’s immediate experience.

Scenario testing should, therefore, be considered as part of an overall test strategy that includes different kinds of tests.

TABLE 1: TRANSACTION DEFINITIONS (SIMPLIFIED)

Txn-ID | Store | Type | Timeframe | Completion | Register | User Profile | Customer | Item(s) | Tender | Points | Pickup/Delivery | Promos | Discounts
POS-1 | 1 | sale | n/a | C | 4 | cashier | walk-in | 45270 | Cash-CAD | n | C&C | n/a | n/a
POS-2 | 1 | return | sale+1 | C | 2 | manager | walk-in | 45270 | Cash-CAD | y | n/a | n/a | n/a
POS-3 | 2 | sale | n/a | C | 1 | cashier | employee | 98651, 54945, 21498 | Visa | | HD | sidewalk | raincheck

TABLE 2: SCENARIO DESIGN (SIMPLIFIED)

Scenario ID | Description | Prior Txns | Prior Events (Central Office) | Prior Events (Store Back Office) | Prior Events (Store) | Main Txn | Follow-on Txns | Post-Txn Events
POS-S-S1 | Senior phones in to buy 3 items on Super Senior’s Day, of which 1 is a charge sale item and 1 is a govt funded item; delivery and service charges applied, pre-paid delivery | n/a | PROMO-45, PROMO-7 | Open store | Open Register, Login | POS-S133 | | Logout, Close Register, Close Store, RPT-16
POS-S-S19 | Loyalty customer pays for 5 items, of which 4 are points-eligible; customer changes mind before leaving register and 1 point-eligible and 1 non-eligible are post-voided | n/a | n/a | Open store | Open Register, Login | POS-S17 | POS-PV17 |
POS-S-S65 | Senior phones in to buy 3 items on Super Senior’s Day, of which 1 is a charge sale item and 1 is a govt funded item; delivery and service charges applied, pre-paid delivery | n/a | PROMO-12, PROMO-7, ITEM-96 | Open store | Open Register, Login | POS-S133 | n/a |


Some situations where it could be appropriate include:

• Acceptance tests of business systems—e.g., UAT or vendor acceptance.

• End-to-end systems integration tests of multisystem solutions or enterprise integrated systems.

• Situations where a test team lacks sufficient detailed system knowledge to do under-the-covers or deep function testing and has no way to get it, and there is insufficient time to explore. When there is a combination of inadequate documentation, restricted or zero access to people who wrote the software, and critical time pressures, scenario testing can be the best solution to the testing problem. (All of these constraints applied on the POS project, and we overcame them with scenario tests informed by business domain knowledge.)

CRITICAL SUCCESS FACTORS

The single most important requirement for designing good scenario tests is business domain knowledge, in the form of one or both of the following:

• Testers who have experience in the domain
• Input from, and reviews of scenarios by, business representatives

Where neither of these is available, it is at least possible to resort to industry books, as Cem Kaner suggests in his article.

To use the approach I’ve described here, you need:

• A model with a framework that fits the type of application. A data-driven model works well for transactional systems.

For other types of applications—e.g., a desktop publishing system—you would need to create a different model and framework, such as one based on usage patterns.

• Testers skilled in structured analysis. This is not the no-brainer you would think. Not all testers—not even all good testers—have this skill.

• A building-block approach, so you can design and build varied and complex tests from simple elements. Among other advantages, this allows you to begin testing with the simplest conditions before adding complexity to your scenarios.

It also makes it possible to automate some of the test artifacts and build in calculated expected results. This becomes essential in large-scale systems integration tests, where you have to communicate accurate expected results to multiple teams downstream in the integration.

THE RISKS

If you adopt scenario testing, these are the critical things to watch out for:

• Scenario testing can miss important bugs you could find with under-the-covers or deep function testing. Remember that scenario testing is better for testing end-to-end than it is for testing deeply, and build your strategy accordingly.

• Bugs found in scenario outcomes can be difficult to diagnose. Especially if system quality is suspect, it is essential to begin with simple scenarios that test basic end-to-end function, only proceeding to complex scenarios when you have established that fundamental quality is present.

• In choosing a model, there is always a risk of fixing on one that is too restrictive. Applying two or more model types will help prevent this.

• Equally, there is a risk of choosing a model that is too expansive (and expensive).

FINAL ANALYSIS

Finally, never fall in love with one model type. This applies to your models for a single test as much as it applies to your model choices for different tests.

Every test is different, and every model type can bring benefits that others lack.

FIG. 7: SCENARIO MODEL
The model rolls up from the lowest-level test case to a full test cycle across two stores and the Central Office:

1 Test Cycle = (in-store POS days * 2 stores)
1 Store POS Day = n Customer POS Transactions
1 Customer POS Transaction = n Item Transactions
1 Item Transaction = the lowest-level POS test case

Item details (prices, promotions, etc.) come from an item table in spreadsheets; results roll up from itemized POS receipts to store day totals and Central Office totals.


TABLE 3: BUGS

Severity   Count     %
1              8    2%
2            331   68%
3            116   25%
4             23    5%
Total        478



Six Sigma, Part II: Time to Measure

By Jon Quigley and Kim Pries

I. Define  II. Measure  III. Analyze  IV. Improve  V. Control

In the first article in this five-part series (see "Tracking Quality Variations With Six Sigma Methods" at stpcollaborative.com), we discussed the "Define" phase of the Six Sigma quality management process as it relates to software testing and performance. We explained how clearly defining the problem and explicitly describing what needs to be accomplished allows us to set the scope for project activities so we'll know how and when we've hit the target, that is, how we've made the correction appropriately.

Here, we'll assume we've already defined a problem issue, whether code-specific or process-specific (process-specific issues are often related to software productivity and failure to meet deadlines or budget targets), and we'll delve into the second piece of the Six Sigma puzzle: "Measure." (In upcoming issues, we'll address the next three aspects: "Analyze," "Improve" and "Control.")

Before doing any measuring, you must:
• Identify key quality attributes to be measured
• Devise a measurement plan
• Understand the limitations of the measurement system

We'll break our discussion into two components: measuring the software itself and measuring the software process.

Jon Quigley is the manager of the verification and test group at Volvo Truck. Kim Pries is director of product integrity and reliability for Stoneridge, which designs and manufactures electronic automotive components.

MEASURING THE SOFTWARE
Some of the common metrics used with software are:

• Lines-of-code
• Function points
• Defect counts
• Defect types
• Complexity

Lines-of-code is an objective measure of software size. It's important for developers to understand that higher-level languages like C++ don't translate one-for-one into the executable code the machine will use. This is also the case when we use dynamic languages such as Perl, Python and Ruby. Java is converted into bytecode, which, in turn, is converted into executable code. For these reasons, lines-of-code can be a problematic metric. On the positive side, though, if all development work is being done in one language, the metric is simple to generate and provides a basic level of measurement.

Function points are an alternative to lines-of-code. When using the function point approach, we work with the software specification and attempt to estimate the complexity of the product using a special set of rules. Function points present problems because the estimation process itself is highly subjective. Sometimes an organization will use some well-known numbers and "backfire" the function point value from the lines-of-code. So why not use the lines-of-code in the first place? We might use function points when we're comparing the potential complexity of a product to some standard and we have no existing code.

Defect counts are easy to accumulate. They comprise categorical data, which Six Sigma practitioners have to use to make any meaningful statistical analysis. Luckily, this kind of analysis doesn't have to be that difficult: We can use a Pareto chart to display defect counts by function. (See Figure 1 for an example.)

FIG. 1: DEFECT COUNT BY FUNCTION

Using our count data and the Pareto chart, which is organized by count from left to right, high to low, we can easily see that the "read_analog" function is a leading source of software defects. By measuring our work using this approach, we can focus our attention on functions that show frequent problems. This doesn't mean that we'll ignore the other functions, just that we know we have serious issues with the functions on the left side of the graph.

Using defect types as a metric allows us to check our software to see if we're repeating certain kinds of defects regularly, and it may provide some direction to our testing group during the "Analyze" phase of our Six Sigma project. A few examples of defect types are:

• Boundary issues
• Byte boundaries (with embedded code)
• Array/matrix out-of-bounds issues (with non-embedded code)
• Logic issues
• Odd expressions, especially with "negative" tests (those using "not")
• Unusual case structures
• Syntax checking
• Functioning of compiler
• Data issues
• Illegal inputs
• Incorrect casting to convert from one data type to another

One benefit of measuring the quantity of each defect type is that we can again apply Pareto analysis to the results and use the information during our Six Sigma "Analyze" phase. Also, we can either construct a taxonomy of types or we can use a technical standard such as IEEE 1044-1993. A taxonomy will represent the types in a hierarchical tree format.
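A taxonomy along those lines can start as nothing more than a nested structure that testers tally against. Here is a minimal sketch in Python (the categories simply echo the examples above; IEEE 1044-1993 defines a far more complete classification):

# A tiny hierarchical defect-type taxonomy (illustrative subset only).
taxonomy = {
    "Boundary issues": ["Byte boundaries (embedded)", "Array/matrix out of bounds"],
    "Logic issues": ["Odd expressions in negative tests", "Unusual case structures"],
    "Data issues": ["Illegal inputs", "Incorrect casting between types"],
}

for category, subtypes in taxonomy.items():
    print(category)
    for subtype in subtypes:
        print("  -", subtype)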

Software complexity metrics attempt to represent the intrinsic complexity of the software by making calculations. Probably the most well-known approach is the cyclomatic complexity calculation of Thomas McCabe. McCabe initially used a graph-theoretical approach (a flow graph) to estimate complexity. The formula for such an approach is:

Cyclomatic Complexity = E - N + 2P, where
E = the count of edges of the graph
N = the count of nodes of the graph
P = the count of connected components

Luckily, we don't have to develop the graphs or calculate the formula by ourselves. We can use tools such as JavaNCSS for Java or Saikuro for Ruby. These tools will go through the code and make the calculations based on the number of "if" and "case" statements, as well as other flow-altering code, and present the developer with the value. The cyclomatic complexity is a completely objective value. While some people may question the meaning of cyclomatic complexity, the value generated is, at a minimum, suggestive of areas of code that may merit closer attention.
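Tools like these hide the graph arithmetic, but the counting idea is easy to see. The sketch below (Python, illustrative only, and not how JavaNCSS or Saikuro is actually implemented) approximates the cyclomatic complexity of a single function body by counting decision points and adding one, the standard shortcut for single-entry, single-exit code whose branches are all binary:

import re

# Count decision points (if/elif/for/while/case/catch plus short-circuit
# operators) and add 1 -- a common approximation of cyclomatic complexity
# for a single-entry, single-exit routine with binary branches.
DECISIONS = re.compile(r"\b(if|elif|for|while|case|catch)\b|&&|\|\|")

def approx_cyclomatic_complexity(source: str) -> int:
    return 1 + len(DECISIONS.findall(source))

body = """
if (x > 0 && y > 0) { total += x; }
for (i = 0; i < n; i++) { if (a[i] == 0) zeros++; }
"""
print(approx_cyclomatic_complexity(body))   # 5: two ifs, one for, one &&, plus 1

Real tools are considerably more careful about language grammar, but the objective nature of the count is the same.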

MEASURING THE SOFTWARE PROCESS
We can also use our measurement tools to take a look at the development process itself. For example:

• Project status
• Quality (addressed in the first section of this article)
• Delivery
• Compliance with international standards
• Coding standards
• Factor/response analysis

When looking at a portfolio of software projects and assessing them over time, we can put the status of the software projects into the following categories:

• Ahead of budget/schedule
• On budget/schedule
• Behind budget/schedule
• Suspended (may resume)
• Stopped (candidate for aborted)
• Unstarted
• Completed
• Unknown
• Aborted (completely dead)
• Zombie (similar to unknown and still consuming resources)

Some of these ("unknown," for instance) represent serious problems with the process. Basically, the simplest approach would be to build a table that lists the development projects in the first column and the status of each project in the second.

Project budgets are an obvious metric for the development process and fit into a few of the categories we just discussed. The highest cost frequently comes from the payroll for the developers. During development we might wish to estimate our level of effort so we can put a value on the software engineering we're proposing to do. One way to do this would be to model our efforts using a Rayleigh distribution.

Figure 2 shows how we can measure our actual effort (e.g., hours of labor) against the probability distribution function. Front-loading is desirable in order to avoid the "student" effect (named for the tendency of students to put in all their effort the night before a test).

FIG. 2: FRONTLOADED EFFORT

Another interesting feature of this Rayleigh distribution is that we can also measure the arrival of defects through the duration of the project and compare our results against the model. Usually real data is noisy (it has some elements of randomness), but we can still use this approach to potentially provide a basis for releasing the software product as our defect arrival rate slows down and we move into the right tail of the distribution.
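As a concrete illustration, the Rayleigh probability density is f(t) = (t / sigma^2) * exp(-t^2 / (2 * sigma^2)). The sketch below (Python; the scale parameter and the weekly hours are invented) compares actual effort against that curve, which is the comparison Figure 2 depicts:

import math

# Rayleigh density: f(t) = (t / sigma**2) * exp(-t**2 / (2 * sigma**2)).
# sigma controls where the effort (or defect-arrival) curve peaks.
def rayleigh_pdf(t: float, sigma: float) -> float:
    return (t / sigma**2) * math.exp(-(t**2) / (2 * sigma**2))

sigma = 6.0                                   # assumed peak-effort week
actual_hours = [120, 260, 420, 510, 560, 530, 470, 380, 290, 200, 130, 80]
total_hours = sum(actual_hours)

for week, hours in enumerate(actual_hours, start=1):
    modeled = total_hours * rayleigh_pdf(week, sigma)   # model's expected hours
    print(f"week {week:2d}: actual {hours:4d}  model {modeled:6.1f}")

Where the actual curve and the model diverge badly, either the effort profile or the scale parameter deserves a second look.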

During our software process, we want to measure delivery of functions, modules and other configuration items to see if we're meeting our schedules. Instead of using the states we defined previously, we measure our historical progress against development timelines. Tools such as Microsoft Project allow the project manager to baseline the initial plan and then produce a secondary plot of the actual dates superimposed on the first timeline. We can assess the distribution type, the means, the variance and any other statistical values that would allow us to model the software development project. For example, using a tool like @Risk from Palisade Corp. in concert with Microsoft Project, we can run Monte Carlo simulations of the project. The Monte Carlo approach generates random values from a given probability distribution; that is, we're using the probability distribution we measured to generate a model for each delivery. In so doing, we can provide our software project managers with a method for simulating the delivery times of an entire project or portions of a project.
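You don't need commercial tooling to see the principle at work. The sketch below (Python; the deliverables and their optimistic/most likely/pessimistic estimates are invented) runs a small Monte Carlo simulation of total delivery time, the same idea @Risk applies on top of a Microsoft Project plan:

import random
import statistics

# Monte Carlo simulation of total delivery time for sequential deliverables,
# each estimated as (optimistic, most likely, pessimistic) weeks. All of the
# deliverables and figures below are invented for illustration.
deliverables = {
    "login module": (2, 3, 6),
    "pricing engine": (4, 6, 12),
    "reporting": (1, 2, 4),
}

def simulate_once() -> float:
    return sum(random.triangular(low, high, mode)
               for low, mode, high in deliverables.values())

runs = sorted(simulate_once() for _ in range(10_000))
print(f"mean completion: {statistics.mean(runs):.1f} weeks")
print(f"90th percentile: {runs[int(0.9 * len(runs))]:.1f} weeks")

Substituting the distributions you actually measured for each deliverable turns this toy into the simulation described above.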

Another method of process improvement is to drive the development team to compliance with an international standard. One such standard is ISO/IEC 15504, which is published in nine parts. ISO/IEC 15504 is also known as Software Process Improvement and Capability dEtermination (SPICE), a variation on the maturity models developed by the Software Engineering Institute in the 1980s: the Capability Maturity Model (CMM) and the Capability Maturity Model Integration (CMMI). These approaches specify a stepwise approach to increasing the capability of software organizations. All of these organizations define best practices and publish documents for use by auditors, but no software organization has to wait for a formal audit to establish the practices contained in these standards.

We can also measure compliance with coding standards such as MISRA C. MISRA, the Motor Industry Software Reliability Association, is one group that specified best practices in the C language for embedded systems in the automotive industry. An organization doesn't have to purchase a coding standard; with enough product history, it can develop its own. The standard developers can refer to the software metrics, especially the taxonomy of failure types and the typical functions that become "problem children."

One of the most valued tools in the Six Sigma toolbox is the IPO diagram. (See Figure 3.)

FIG. 3: CORRELATING INPUTS AND OUTPUTS

This diagram offers a model for measuring and correlating factors (inputs) against their effects (outputs). For example, we might use this approach to discern which factors are providing our developers with fewer defects: They could be using static analyzers, dynamic analyzers, debuggers, oscilloscopes, logic analyzers and other tools. We'd then conduct experiments to provide ourselves with a meaningful basis for recommending one tool or family of tools over another. Some readers may recognize this approach as the Design of Experiments, or DoE. The downside to this approach is that we usually need special software to perform the analysis. Some choices of support software are:

• Minitab
• Design Expert
• DOE++
• Nutek-4

For something as complex as regression analysis, we can use a free statistical language called "R" to perform our calculations. R is command-line driven, although a graphical user interface called "Rcmdr" helps ease the shock of using an old-fashioned command line. R even has an interface to Microsoft Excel that allows R to be used as a server to Excel as a client. Designed experiments and regression are powerful tools for analyzing processes.
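For a feel of the factor/response analysis behind the IPO diagram, here is a minimal sketch (Python, standard library only; the hours and defect counts are invented) that fits a least-squares line relating one input factor to one measured output, the kind of calculation you would normally hand to R or a DoE package:

# Factor/response fit: hours of static-analyzer use per week (input factor)
# versus escaped defects in the next release (output). Data is invented.
xs = [0, 2, 4, 6, 8, 10]
ys = [14, 12, 9, 8, 6, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

print(f"defects ~ {intercept:.2f} + ({slope:.2f} * hours)")   # slope is negative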

WHAT'S NEXT?
We can apply the Six Sigma "Measure" approach to software directly, and we can also use it for the software development process. When Six Sigma tools are applied to a process, we usually call it transactional Six Sigma because we're dealing with people. From "Measure," we'll move to "Analyze" in our next article and show you what we do with the factors we've already measured.

Don't get frustrated, thinking you have to use statistics as part of the project. In many cases, using Pareto charts and other basic tools is more than enough to provide direction for future analysis and improvement. And sometimes, simply measuring a previously unknown factor reveals the solution to your problem.

Statement of Ownership, Management, and Circulation for Software Test & Performance as required by PS Form 3526-R. Publication title: Software Test & Performance. Publication number: 023-771. Filing date: September 25, 2009. Issue frequency: Monthly. Number of issues published annually: 12. Annual subscription price: $69.00. Complete mailing address of known office of publication: 105 Maxess Road, Suite 207, Melville, NY 11747. Publisher: Andrew Muns; Editor/Managing Editor: Edward J. Correia; Owner: Redwood Collaborative Media, Inc., all located at 105 Maxess Road, Melville, NY 11747. Shareholders holding 1% or more of the total amount of stock: Ronald Muns, 1640 Little Raven, Denver, CO 80202; Robert Andrew Muns, 20 McKesson Hill Rd, Chappaqua, NY 10514; Katherine Muns, 1106 E. Moyamensing, Philadelphia, PA 19147; Pamela Kay Garrett, 3358 Turnberry Circle, Charlottesville, VA 22911; Erik Leslie, 2485 S. Williams St, Denver, CO 80210; Cole Leslie, 16420 Little Raven, Denver, CO 80202. The tax status has not changed during preceding 12 months. Issue date for circulation data below: October 2009. Extent and nature of circulation: Requested US distribution. The average number of copies of each issue published during the 12 months preceding the filing date include: total number of copies (net press run): 12,267; paid and/or requested outside-county mail subscriptions: 11,496; in-county paid/requested subscriptions: 0; sales through dealers, carriers, and other paid or requested distribution outside USPS: 0; requested copies distributed by other mail classes through the USPS: 0; total paid and/or requested circulation: 11,496; non-requested copies by mail outside-county: 413; in-county non-requested copies by mail: 0; non-requested copies distributed by other mail classes through the USPS: 0; total non-requested distribution: 413; total distribution: 11,909; copies not distributed: 358; for a total of 12,267 copies. The percent of paid and/or requested circulation is 93.7%. The actual number of copies of the October, 2008 issue include: total number of copies (net press run): 5,148; paid and/or requested outside-county mail subscriptions: 4,408; in-county paid/requested subscriptions: 0; sales through dealers, carriers, and other paid or requested distribution outside USPS: 0; requested copies distributed by other mail classes through the USPS: 0; total paid and/or requested circulation: 4,408; non-requested copies by mail outside-county: 326; in-county non-requested copies by mail: 0; non-requested copies distributed by other mail classes through the USPS: 0; total non-requested distribution: 326; total distribution: 4,734; copies not distributed: 414; for a total of 5,148 copies. The percent of paid and/or requested circulation is 85.6%. Publication of the Statement of Ownership for a requestor publication is required and will be printed in the November/December 2009 issue of this publication. I certify that all information furnished on this form is true and complete. Signed: Andrew Muns, President and CEO, September 21, 2009.

Index to Advertisers

Advertiser            URL                           Page
Automated QA          www.MissionOffMercury.com       10
Browsermob            www.browsermob.com/stp          19
Checkpoint            www.checkpointech.com           39
Electric Cloud        www.electric-cloud.com          13
Hewlett-Packard       www.hp.com/go/alm               40
Ranorex               www.ranorex.com                  7
STP Collaborative     www.stpcollaborative.com         2
Seapine               www.seapine.com/stpswift         3



CASE STUDY
Post-Build Injection Pros!

By Joel Shore

THIS SHOULD BE ENGRAVED ON a plaque in every development lab: Modeling application usage in the wild is no longer a black art.

That's certainly good to know, but why was application usage profiling so labyrinthine in the first place? It's self-evident, says PreEmptive Solutions' Sebastian Holst. "A runtime application must serve many masters with markedly diverse profiling requirements, priorities and expectations."

Development organizations view things from a software-centric perspective while IT organizations, of necessity, adopt an operational worldview, according to Holst. As a result, developers' priorities are more granular than management's, and are organized differently from those of their QA peers. "IT auditors have a different charter than the operations managers and security officers with whom they collaborate," Holst says. And, he adds, let's not forget the end user and line-of-business owner who ultimately fund this diverse community of application stakeholders.

Profiling and testing must serve the varied and often mutually exclusive interests of these stakeholders. Performance, accuracy, ease of use, security, bandwidth efficiency and a dozen other features and functions are critical, but only to specific stakeholders in specific ways.

The mix changes depending on the application and the business mission. Holst cites four examples: An automated teller machine manufacturer wants to measure implementation efficiency, monitor usage for refactoring Java and ensure end-of-life policy enforcement; a transportation services ISV strives to improve support response efficiency for its end-user audience of non-tech-savvy trucking industry professionals; an ISV aims to automate its customer-experience improvement process; and a manufacturer of hearing aids plans to improve the process its dealers use to fit and tune devices to users' ears.

Regardless of the metrics development and test teams track for these diverse markets and goals, a common core of questions remains constant, says Holst: What's running, how well does it perform, how does it compare (with a prior version or as a hosted vs. nonhosted model), how does it affect business outcome or ROI, and can it be managed from cradle to grave?

How much more effective and efficient could testing be if you had reliable and accurate runtime intelligence from the field detailing which features were being used, by whom, where, when and how? It's not as far away as it may seem: Post-build injection is emerging as a de facto .NET standard and is or will be available for both .NET and Java. Support for Azure and Silverlight is provided.

Application runtime intelligence is emerging as a great way to measure how enterprises not only use but experience software, Holst says. PreEmptive's Runtime Intelligence and Instrumentation Platform technology enables executables to be injected, post-build, with application instrumentation logic that streams runtime data to one or more cloud-based repositories. The resulting intelligence can be integrated into CRM, ALM and ERP platforms.

How likely is it to become standard in every professional tester's arsenal? For starters, Microsoft feels strongly enough about PreEmptive's Dotfuscator Community Edition to integrate its functionality into Visual Studio 2010.

To realize this ideal, Holst identifies four requirements for the technology: runtime data collection, transport, analytics and reporting; developer instrumentation platform and tools; security and privacy policy management services; and role-based access for both the software supplier and end-user organizations.

Injection makes it easier to understand various end-user hardware configurations, for example. It's virtually impossible for any company to test all the configurations its customers use, but post-build injection lets you break out usage data by nearly any criteria. A company may find it has only one customer still using Windows 2000 and decide to drop support, but the analytics could grab the customer ID, query the CRM system and return the contract value. "If it shows that this one customer constitutes millions of dollars in revenue, the business may continue support," Holst says.

Attempting to model the real-world environment doesn't necessarily yield accurate results because the "squeaky wheels" who speak up may not represent users as a whole, while conversely, satisfied customers tend to stay silent. But through injection of instrumentation, it's now possible to understand the environment and see which features are used most and least often, and which can have significant ramifications on systems design and sup-


Joel Shore is a 20-year industry veteran and has authored numerous books on personal computing. He owns and operates Reference Guide, a technical product reviewing and documentation consultancy in Southborough, Mass.
