error what to look for when debugging sas programs

Upload: nagesh-khandare

Post on 03-Jun-2018

220 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/12/2019 Error What to Look for When Debugging Sas Programs

    1/6

    ERROR: WHAT TO LOOK FOR WHEN DEBUGGING SAS PROGRAMSBrit Minor, Pharmaceutical Research Associates

    ABSTRACTHave you ever had an experienced S ~ programmer glanceover your shoulder for ten seconds and spot an error thatyou've spent hours trying to find? That programmer probablysaw an old friend, an error shelhe's m ~ d e and caught many atime. This paper presents a compendl,- m of common SASerrors, their warning signs, and suggestions for how to fix oravoid them.

    INTRODUCTIONDebugging Is a necessity born of u n d i s c i p ~ i n e d but l l ~too-human practices. If you put enough effort Into the designphase of your project Iwith pen and p a p ~ r or w i ~ c o m p u ~ rused solely as a notepad1. are familia.r With SAS eccentricities,and can type in the designed code Without a lot of spur-ofthe-moment experiments, your programs should run correctiythe first time or require minimal debugging.Like most of us, I spend a lot of time debugging. In thefollowing sections, I'll explain the use of the SAS log as adebugging tool, o ffer a symptom-by-symptom look at ~ m m o nSAS coding errors, and discuss some general debuggingtechniques. Finally, I'll try to convey a few tricks of the ~ o v e r -the-shoulderMguru, in hopes that you can save y?urself timeand potential embarrassment when the next ail-nightdebugging session looms.

    THE SAS lOGThe SAS log is the first place to look for clues as to howyou've gone astray. SAS error messages, warn,iogs, andnotes help guide you, often indirectly; with practice, thesemessages become more enlightening.Notes at the end of each SAS step tell you how manyobservations, variables, and pages of output resulted from thestep. You should compare these to the numbers you hadexpected. Are the number of observations being o ~ t p u tconsistent with what was input? Other notes may Inform youof uninitialized variables, missing valUes, character/numeric.conversions, or ill-formatted values. These notes may providethe first clues that something isn't as you expect.Some notes are cause for alarm. The following two, inparticular, signal potentially grave problems with your data orprogram design: .

    NOTE; Merge has more than one data set with repeatsof BY values.When you merge data sets, you tell SAS how to structurethe merge by using a by statem.ent If xou don't u ~enough by-variables you can wind up With groups Ineach data set that share all by-variables but still havemultiple observations. SAS then d.oes something t h ~ tnearly always not what you want: It match-merges Withinthe groups un gets to the end of the group with fe.werobservations, then it merges the last observatIOn of thiSgroup with the remaining observations of the larger group.If you see this message, stop and reassess your dataand your design. You probably need to ~ d d a by-variable to further distinguish the observahons.

    NOTE: SAS went to a new fine when INPUT statementreached past the end of line.If each record of your data file is one line, you certainlydon't want SAS to try to combine two or more lines when 1 27

    reading a record. This SAS note tells you, implicitly,that SAS is only reading in half your data: everyother line is getting included by mistake with theprevious line. If you haven't made an o ~ t r i g ~ t errorin your Input statement, you can pften fiX thiSproblem by using the missover option on your intilestatement

    I mention these two notes to emphasize that youreally must check the log carefully to be sure your job has runproperly. If SAS has printed a message that you don'tunderstand don't just ignore it. Ask someone about it. SASis probably'telling you that it has detected a dangerousprogramming practice, one that sometimes is okay but usuallyis inadvisable.

    Other notes are cause for concern as well. Causesand cures for the following messages will be discussed in thecompendium which composes the middle section of this paper:NOTE;NOTE:NOTE:NOTE;

    Invalid numeric data xx= yy , at line ..VariabJe xxxx is uninitialized.Missing values were generated as a result ofperforming an operation on missing values.The data set xxxx.yyyy had 0 observations and zzzvariables.Version 6.06 of SAS provides more informativemessages than past versions. One of these I particularlyenjoy:

    NOTE: Previous statement has been deleted.Irs nice to be told that your statement was so nonsensicalthat SAS just ignored it

    If you are faced with checking one or several longSAS logs, you might consider first running a search for thekeywords that appear in SAS' nastiest messages:ERROR:, WARNING:, uninitialized, invalid, repeats,reached, converted, deleted

    If the search finds none of these words in your log, yourprogram has passed one hurdle but is by no meansthoroughly debugged.

    TYPES OF SAS PROGRAMMING ERRORSMost programming errors can be classified asresulting from messy data, syntactical or semantic misuse ofthe language, or logical errors in the program design. Youshould protect your program from data problems, such as outof-range or missing values, since it is likely that no one elsewill. Syntax errors and semantic disagreements are usuallypointed out pretty emphatically, in SAS notes to the log.

    These e r r o ~ most often result from typos, omitted symbols, orpoor choice of variable names.Logical errors, unfortunatefy, are seldom caught bythe compiler. Your code should put messages to the logwarning you of consequences of any impoverished logicyou've committed t? code. Of course, these messages do n.ogood i they are wntten to the log but not l o o k e ~ at. trickis to start them with a Mnasty keyword that you mclude In thelist of search terms given above. At PRA, we signal criticalmessages with "Problem:", coded thusly:put 'Prob' 'Iem:

    This COding prevents the search from finding "Problem" in the

  • 8/12/2019 Error What to Look for When Debugging Sas Programs

    2/6

    put statement. An alternative technique is to write the criticalmessage out to your listing, so that it can't be ignored.PROBLEMS WITH DATA

    SAS programs deal with messy, real-world data. Valuesmay be missing; whole records may be missing; dates enteredincorrectly so that records may be out-of-sequence; fields thatare guaranteed to only have 1s and s in them all too oftenhave a few 35 as well; numbers like 99 may really be codesfor No Data.

    Your program should not be blind to errors that live in' hedata. However, you should not waste time debugging yourcode when the invalid results you are seeing come fromoutside your program. Therefore, get to know your data.

    A COMPENDIUM OF COMMON SASPROGRAMMING ERRORSThe bulk of this paper is a symptom-by-symptomdiscussion of common mistakes that are made repeatedly by

    SAS programmers.Our first class of errors comprises those related to dataand missing values. The programmer who assumes that the

    data will be perfect lives in constant danger of producingmisleading results. Missing values, in particular, can wreakhavoc on merge and flag-setting operations.SymPtom: Merged records have sequences of missing values.

    These probably arose from some non-mergingobservations among the merged data sets. Thiscondition may be okay, or maybe you really meant todelete certain kinds of non-merging records. You canand should check for records that fail to merge by usingthe in= option:data both;merge one (in=inone) two (in=intwo);by some thing;

    if not (inane and intwo) thenput 'Not in both: ' some= thing=;

    This will help you avoid rude surprises.Sym[p 1om: Some wild values among otherwise valid ones.

    When doing complex calculations, be sure put a fewresults, the values that went into them, and values of anyflags that would affect the calculation. You needcomplete information to find obscure errors.Sym[plom: Too many messages being written to log.

    Be careful not to print out so much that the user getsswamped. You want errors to be found, not buried. Auseful trick to limit output is the following:if likely > 5 and toomany < 10thendo 'Un-likely value: ' prot= inv= pt= likely=;

    toomany + 1;drop toomany;end;You might use this trick in cases where you simply wantto know if any such values exist but you don't need tosee them all.Next we will consider syntax errors. These arise fromtypos and otherwise invalid statements. SAS error andwarning messages wl tell you about these, albeit sometimes 1 28

    cryptically. Usually. you just fix the problems. and rerun yourprogram. Some typos can lead to more exotic consequences.however:SympIom: SAS ignores code THAT'S RIGHT THERE ON THEPAGE.

    This can happen if you've misplaced or forgotten acomment terminator, so that SAS treats what followsas part of the comment:r First step: data seat;set seat;

    if cats = 'meow';r Second step: and end of 1st comment'data spooning;merge rumble (in=inrum)seat (In=lnseat);

    if inrum and inseat;

    A highly unrecommended place to forget a commentterminator is at the end of an %include file. In thiscase, code that comes after the call to the %includefile will be treated as part of the %Include file'sunclosed comment. However, you have to put thepieces together in order to see that this ishappening. Again, all the pieces will be in the log;that's what it is for.Symplom: SAS ignores some statements.SympIom: Variable or format labels are mixed up.

    If you forget a quote mark, SAS will search for onewithin the nex t 200 characters. If the quote markreally is missing, SAS prints an error message tothat effect and doggedly mis-pairs quote marksthroughout the rest of your job, producing a c sc deof similar messages. A common cause is a singlequote that you think is part of the string you'vewritten, but SAS thinks is the end of your string:BAD: proc format;

    value $macaroni'popinjay' = 'The eat's pajamas';run;GOOD: ... 'The cat s pajamas';BETTER: ... The cat's pajamas ;Occasionally, SAS may be able to pair up quotemarks that you intended to be in separatestatements. SAS will ignore your intent, assign anice long string, and, in the process, not see someof your intervening code.

    Symptom: SAS mixes up two statements.Symplom: Extra- data sets are announced in the log.Symptom: Your most valuable data set disappears from thesystem.If you leave off a semi-colon accidently, SAS will runtwo statements together before it checks syntax. Theresult probably will be nonsense, but you may stillwind up with a legal SAS statement. How bad canthis be, you ask? Try this:

    data sex drugs rockrollset up.bringing;

    run;if parents = 'Treat me rough, won t give me apuff';

    SAS accepts this code, writes an uninitialized .variable message to the log (for parents ), and tells

  • 8/12/2019 Error What to Look for When Debugging Sas Programs

    3/6

    you that each of the data sets has zero observations andone variable. Whafs the tip-.off that you left off a aiticalsemi-colon and therefore overwrote your irreplaceableinput data set, up.bringing? Irs the message that thedata set setMhad zero observations written to it.SympIlom: SAS fails to translate a macro variable in a string.

    Macro variables need to be enclosed in double quotes orspecial macro quoting functions. Don't embed macrovariables in single quotes, e.g. '&macva( . Period.On occasion, _ yntax errors may indicate amisunderstanding of how SAS operates or how aparticular function, procedure, or statement works. Irs agood idea 10 check the SAS manual evOf J so often to besure you haven't -'earned- something wrong.A related type of error is the semantic error. This resultswhen you mis-appropriate a SAS keyword.

    Symplom: SAS politely refuses to use your data library.t pe is a special IIbname that signals SAS to expect asequential library on tape. If you forget this and specifytape as the IIbname of a disk library, which you mightdo if it is in a directory is called tape, SAS will run yourjob but probably will not give you useful results.Lastly, I'll consider logical errors. If you've written somelegal SAS statements that give wrong results, you have madea logical error. You probably have figured out the right way tosolve your problem, so don't throw out an almost correctprogram too quickly. SAS gives you very little help in findinglogical errors, many of which can be really tricky to finer. Hereare some of my favorites:

    Symptom: fv1essage that variable xxx is uninitialized.This message usually means you've misspelled a variablename, but it can also indicate that you forgot to keep thevariable in an earlier step. -

    Symptom: Unexpected missing values.You may have forgotten a retain statement, s in thefollowing merge:POOR: d t lookedup;merge lookup (in=inlook)codes (keep=COde word);by code;

    run;

    if first.code thencodeword = code word;retain codeword;if inlook;

    Without the retain, codeword would get set to missingeach time an observation was read in from lookup withno observation being read from codes.Symplom: No-longer-correct values are held over fromprevious by-group.Symptom: Values that should be missing get filled in.Symptom: Sums are too big.

    The preceding SAS code segment was incomplete. Whatif a code was read in from lookup that wasn t in

    c o d e s ? ~ Forgetting to reinitialize a retained variable atthe start of a new by-group can cause problems such asthose in the above symptom list retain statementsshould be accompanied by code such as this:GOOD: data lookedup;merge lookup (in=inlook)codes (in=incodes keep=eocle word); 1029

    by code;

    run;

    if firstcode thendo'ii incodes thencodeword = code word;elsecodeword = ' ;end'retain' codeword;if inlook;

    Symptom: Missing d t or observations.Symptom: Flags set improperly.Two reaJ sneaky problems can arise when youcombine subsetting operations with other conditionalprocessing in one step. If you are using theend:eof option on a set or merge statement andhave some code that should execute when eof istrue, it will fail if your 5ubsetting operation throwsaway the last observation too soon. Similarly, if youare doing by-group processing and are expectingSAS to reset some values If first. his or outputsome records If lasUhat it may take you a while torealize that the conditions would have been met hadthe record not been discarded just before. Here's asimple example:BAD: data results;merge acts (keep = id date action)efts (keep = id date effect)end = eot;by id clate;

    run;

    if effect = '1';n + ;if last.id thenoutput;if eof thenput n=;

    This code is supposed to output one observation forevery id that had n effect of 1. However, as writtenit will output records only when both conditions aretrue namely Meffect '1' and Iastid. The putstatement may never be reached. The sameproblem may crop up when using the deletestatement or even when one condition is embeddedin- an Ifthen block:BAD: data results;merge acts (keep:::: id date action)efts (keep :: id date effect)end:: eaf;

    by id clate;

    run;

    if effect = '1' thendo'n + lif l s t ~ i d thenoutput;end;

    if eof thenput n=;Now the eof condition will be reached, but somelines still may not be outputTo solve this problem, you can use the wherestatement (available in SAS version 6) in place ofthe subsetting If statement in this section's first BAD:data step. Another safe solution is to use a

  • 8/12/2019 Error What to Look for When Debugging Sas Programs

    4/6

    separate data step to do the subsetting:GOOD: data results;merge acts (keep id date action)effs (keep : id date effect);by id date;

    if effect :: '1':data results;

    set result end = eof;by id;

    run;

    n + lif last:id thenoutputif eof thenput n=;

    Symptom: Lots of observations reported missing from a testmerge.If you merge just the first 50 observations of two datasets together when testing a program, don't be surprised,or too worried, if you trigger a lot of those non-matchmessages that you've been exhorted to include in yourcode when merging data sets. 50 observations in ademographics data set may apply to 50 patients; 50observations in a clinical lab data set may apply to onlyone patient.

    Symplom: Unexpected missing values.Symptom: Incomplete calculations.When declaring arrays: be sure you've counted light. Ifyou li st eight variables as being part of an array but givethe array {7} as its dimension, SAS will be happy. Thisproblem can arise when you borrow from a old program,one designed for a different data set. Here's twoexamples:BAD: array test {1 } atest btest ctest dtest etast ftestgtest htest itest jtest ktast;misstest = 0;do i=1 to 10;

    if testO} ': . thenmisstest = miss est + 1;end;

    BETTER: array test {*} atest btest ctest dtest etast ftestgtest htest itest jlest ktest;miss est = 0;do i=l to dim(test);if testfi} :: . thenmisstest = misstest + 1;end;In the first example, SAS processes 10 array elementseven though there are 11 listed; Mktest- is left out. In thesecond, SAS counts the number of array elements viathe n in the array statement and the dim function. Ifyou add or subtract test items, you needn't worry aboutcounting them.Of course, if you've browsed through the chapter on SASfunctions, you'll know that the above eight lines of codecould be replaced with one function.nmiss(atest ktest) would count the number of missingitems for you. It pays to read the manual

    Symplom: Too few observations.Symplom: Incomplete array operations.Nested do loops that use the same index can result fromcareless additions to a program. This particular error is apersonal favorite. I've done it more times than I care toadmit. What will the following code do?

    1030

    do i=l to 5;junk. food{ij;do i : l to 5:brew : beer{ij;output;end'end; ,It will output just 5 observations, not the 25 youexpected, because j- is incremented to 5 in thesecond loop.

    Symptom: Too fewltoo many observations.Symptom: Lower or upper limits are wrongly included orexcluded.Boundary values constituta a general problem forprogrammers. If data values for a variable areintegers 0 through 10, 0 and 10 are the boundaryvalues. If you have comparisons such as < or >= inyour code, the boundary values are where you arelikely to make the wrong decision. Similarly, whenprocessing observations, the first or last observationsof the data set or of y group are the boundaries.If your code pretty much works but sometimesdoesn't, or you are getting too few or too manyobservations, check your boundaries carefully.

    Symptom: Incorrect computations.Symptom: Condition fails or succeeds when it shOUldn't.Do you write complicated expressions withoutparentheses? Do you also like high-wire walkingwithout a net? It takes a while to realize that theexpression:

    if a or band c or d;is equivalent to

    if a or (b and c or d;,especially if you thought SAS would see it as

    if (a or b and (c or d):,which it won' Using parentheses liberally will helpcut down on computation errors.

    Symptom: SAS ignores or misinterprets an else clause.SAS may be seeing the code differently than you'vewritten it:BAD: if apples thenif oranges thena :: b;elsea c;SAS will connect the above else clause with the firstIf. not the second, your indentation to the contrary.You need to enclose the If-then-else in a do-endblock:GOOD: if apples thendo'

    i oranges thena = b'else a = c'end; ,

    Symplom: Values change haphazardly after a merge.If the unmerged data sets share a variable that isnot in the by statement, you will write over values inthe first data set with values from the second. Toavoid this. either rename one of the variables (usethe rename= option on the data set name), or, if the

  • 8/12/2019 Error What to Look for When Debugging Sas Programs

    5/6

    data sets are so sorted, add the variable to the bystatement.Symp1om: -Extra- variable.Symp1om: Invalid data message from SAS.Symp1om: Input values or data _null_ output lines areoverwritten.

    If you omit a decimal point in a format on Input or putstatements, you get a column specifier or anothervariable reference:

    put 6. b 4 c 5. d .gefmt;The put statement above will write -a- in 6. formatstarting at column 1, will write -b- in its default formatstarting at column 4 (instead of in 4. format starting atcolumn 7 which is no doubt what you intended), and willwind up writing -d" in its default format, not in -agefmtFinally. SAS will print the message -Varia6Te agefmt isuninitialized" to the log. Had this been an Inputstatement. "b" might have re-read some character datafrom column 4; an "Invalid data" message might havebeen the result.

    Symp1om: Critical messages unseen due to being buried inlong logs.Too many errors or put statements: when the log getsreally long due to lots of messages, a critical messagemay go unnoticed. This kind of problem is what makesdebugging an iterative process: solve one problem so youcan see the next one. Don't spend 100 much timewading through a SAS log thafs more infonnative thanyou ever wanted it to be, just fix one of the majorproblems and rerun.

    Symp1om: Character-to-numeric, numeric-Io-characterconversion messages.Conversion messages may not signal problems. SAS willconvert numeric to character values without losingprecision in the result. SAS also will reliably convertnumbers stored in character variables to the equivalentnumeric value. However, conversion messages may be aclue that, while YOU may think that all of the variablesinvolved in an expression are numeric, SAS considersone or more to be character. You may not want SAS todecide for you how these values should be translated.

    :::) Several common problems m y arise when manipulatingstrings. Here are a few choice ones:Symplom: Truncated string.Symplom: Data set size much larger than expected.

    The length of a SAS character variable is set the firsttime SAS learns about it within a data step. That firsttime should be in a length statement in most cases.Otherwise, SAS may set the length equal to that of thefirst value encountered (e.g., cvar = test;, where test is apre-existing character variable and cvar is a new variable)or to $200 ( ) if certain functions are used (charva, =scan(test,1);}

    Symptom: Concatenations fail, despite best efforts.When concatenating strings, be wary of unseen "trailingblanks. You can spend hours worrying over code like thefollowing:

    length this that theother $8;this '=' 'pell';that '=' 'mell';theother this Il that;What does th oth r equal? It won't be -pellmell, andhere's why. The variable this gets set to a 4-characterstring, but SAS had been told that it was to be length 8,so SAS silently put 4 trailing blanks on the end. The 1031

    concatenate operation worked, yielding "pell mell,then SAS chopped off th oth r to be length 8.,resuhing in theother = -peU-.A safe way to concatenate is by using the trimfunction:

    theather = trim(this) that:Symptom: Unexpected missing values.Symptom: If-then block never entered.

    Another common problem is comparing two stringswhere one is uppercase and the other isn't. SAS iscase-sensitive. If you aren't sure about the case ordon't care, use the upcase function to convert thestring to uppercase:if upcase(theother} :::: 'PELL';

    FUNCTIONAL GROUPS OF CODING ANDLOGICAL ERRORSTo summarize and augment the compendium of whatcan go wrong, I've rearranged the types of errors intofunctional groups rather than symptom groups. This regrouping may help you when doing an over-the-shoulder quick

    scan of errant code; it may help you achieve that alternativemindset that successful debugging usually requires.Pairing Up

    do-end ( )numeric variables with numeric fonnats; characterwith character

    Missing TokensSemi-colonretaIn statementOmitting decimal point from a format.Incomplete EcitsForgetting to keep a newly added variableAdding a by-variable without checking first. or last

    referencesForgetting to change set to merge after adding dataset to a set statement.Adding or removing variables from an array withoutchanging array bounds.Improper Counting or CodingDeclaring arrays with too few slots.Using a format too small for some values.Misspelling a variable name so that it matchesanother var name.Just Plain WrongNesting do-loops with same index variable.Using set when you mean merge, or vice versa.Using tape as a libname for non-tape library.Letting an Include file call itself.Side EffectsOverwriting values during a merge.Not declaring length of character variables.Using length < 8 for floating point numbers.Not realizing that SAS evaluates all clauses of acompound boolean.Unseen data warnings because SAS stopped printingany after first 20.Events Out-of-SequenceNew by-values not known yet when printing aheader. Flags re-initialized 100 soon or too late.

    symput-defined macro vars not available yet.Referencing a data set that doesn't exist yet.

  • 8/12/2019 Error What to Look for When Debugging Sas Programs

    6/6

    Boundary Values and Unexpected ConditionsUsing < or > when you should use =.Flags set to unanticipated missing values.Using wrong by-variable when setting flags.Omitting other= in format or otherwise in select.StringsLength too short to hold all values.Concatenating string to an existing variable without trimfunction.Ignoring upper lower case when comparing.Reading a character variable from end of variable-lengthrecord without first specifying length of thecharacter variable.Dangerous PracticesMerge without a by.Using too few parentheses.Setting flags after deleting observations in the same step.Using first. or last. values or eof after a delete orsubsetting if or within an if-then block.

    SOME GENERAL ADVICE ON DEBUGGINGThe big trick to complex debugging tasks is to seek anew perspective.An insidious problem with trying to debug your own codeis just that: it's YOUR code. Your thought patterns producedit, your fingers typed it in, your eyes are bugging out at itWhat you probably will wind up doing when trying to fix anagging error is to keep pounding away with the samemindset that produced the error. What are your chances offinding that error? Low, as I know from many frustratinghours and days of doing just that.If you suspect you're in that bind, try to break out. Agreat way is to get another person to look at your code -choose someone friendly if you fear you've made a REALLYdumb error If you can, put the problem aside till another day.If you must keep at it. challenge your assumptions:Print cut some observations from all data sets that areinvolved, including ones that you are sure are okayRun simple proc freqs to look for unexpected values.Remind yourself of odd conditions imposed by the dataTry a different SAS method of doing the "same" thing.Make a special test data set, not just the first 50observations of the one you're using. This data setshould include observations and values that will test allimportant combinations and conditions of your code.

    PROGRAMMING DEFENSIVELY

    You can avoid a lot of debugging simply by followingsome oft-repeated guidelines. Write down the program'sdesign before you code. Develop a consistent coding style,one that you and others can read easily. Indent. Use whitespace. Use descriptive variable names. As discussed in anearlier section, assume the data will be messy; design yourprogram so it won't be defeated by the data. Try hard toignore interruptions if you are in the midst of a series oftedious, interfocking changes. And if you are stuck with asection of awkward, questionable code, alert yourself andother readers to that fact, r LOUDLY 1.HOW TO BE YOUR OWN GURU

    The most important key to being effective at debuggingyour programs is to seek and gain a new perspective on yourproblem. Here's a few closing tips on how to gain insight intoSAS programming problems and solve them more qJickJy:1032

    Learn aU you can about the SAS system and how itprocesses data and code.Reassess your programming assumptions. Rethinkthe problem.Check the SAS log carefully. Don't ignore SASnotes.Diagram program and d t flow, even after the codeis written.Apply lessons learned from other programminglanguages .

    .:: Write lots of SAS code.And, if nothing else seems to work .. call Jim Goodnight

    SASS is a registered trademark. of SAS Institute, Inc., Cary,NC.