
0306-4573/84 $3.00 + .10 Pergamon Press Ltd.

CHUNKS: A BASIS FOR COMPLEXITY MEASUREMENT

JOHN S. DAVIS, AIRMICS, 115 O'Keefe Building, Georgia Institute of Technology, Atlanta, GA 30332, U.S.A.

Abstract: The state of the art in psychological complexity measurement is currently at the same stage as weather forecasting was when early Europeans based their predictions on portents of change. Current direct measures of program characteristics such as operator and operand counts and control flow paths are not based on convincing indicators of complexity. This paper provides justification for using chunks as a basis for improved complexity measurement, describes approaches to identifying chunks, and proposes a chunk-based complexity measure.

1. INTRODUCTION

The state of the art in complexity measurement is currently at the same stage as weather forecasting was when early Europeans based their predictions on portents of change: "migration of birds, ... color of the sunset, ..."[1]. Few of the complexity measures presented in the literature are based on a convincing model of the process of human understanding of a program. There is much more emphasis on attempting to predict software phenomena than on explaining the causes of the phenomena.

Most popular complexity measures are based on indicators which have not been shown to be strongly related to the process of human understanding, for example:

Metric      Indicator
v(G)[2]     linearly independent control flow paths
Effort[3]   operators and operands
Knots[4]    control flow path intersections

Control flow paths or knots are good indicators of complexity if “playing machine” is the principal approach of programmers to understanding a program. Operators and operands are good indicators if size is the main cause of complexity. But the results of human factors research support another view, that programmers use chunks to understand a program.
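These indicators are attractive partly because they are mechanical to compute. McCabe's v(G), for instance, is just E - N + 2P over the control flow graph. A minimal sketch (the graph encoding and function name below are this sketch's own, not from [2]):

```python
# Cyclomatic complexity v(G) = E - N + 2P over a control flow graph,
# where E = edges, N = nodes, P = connected components.

def cyclomatic_complexity(edges, num_components=1):
    """edges: (src, dst) pairs of a control flow graph."""
    nodes = {n for edge in edges for n in edge}
    return len(edges) - len(nodes) + 2 * num_components

# A single if-then-else: one decision, so v(G) = 2.
edges = [("entry", "test"), ("test", "then"), ("test", "else"),
         ("then", "join"), ("else", "join"), ("join", "exit")]
```

The point of the table above, however, is that ease of computation says nothing about whether such counts track the reader's actual effort.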

2. CHUNKING

Some of the earliest studies of chunking examined chess players. Experiments have shown that master players can remember more from a quick scan of a chess board than can novices, if the chess board has a meaningful game situation[5]. If the pieces are in a random arrangement, the non-master players do as well as the masters.

Apparently experienced chess players are able to form meaningful patterns into mental chunks. This conclusion follows from the observation that, since the mental capacity of the experts is not greater, they must use some encoding which makes several pieces as easy to remember as one. It is a reasonable assumption that forming these chunks not only facilitates recall of piece positions but also expedites understanding of the game situation represented on the chessboard.

Chase and Simon repeated DeGroot’s chess experiment but also investigated the chunks. They found that subjects often formed chunks based on attack or defense formations, which suggests that they noticed functional aspects of the pieces[6]. Reitman obtained a similar result with go players[7].

Egan and Schwartz showed that expert technicians recall circuit diagrams in chunks based on function[8]. In analyzing a circuit, they tend to look for elements which are conceptually related.

Experiments on recall of chess and electronic circuits provide results which may be relevant to programs. In any case, there have been some experiments on programs with similar results.

Shneiderman found that all subjects could remember a meaningful program better than one whose statements were in scrambled order, but performance improved with experience. He also found high correlation between recall performance and performance on comprehension tests[9]. The conclusion is that experienced programmers employ chunking to understand a program, whereas beginners tend to consider the individual statements of a program[10].

Adelson compared the practices of novice and expert programmers by examining proximities of their responses during free recall of program statements[11]. The basis of this type of experiment is that, given a random list of items, people tend to order their recall of the items using some meaningful structure. Adelson presented lines randomly from three different programs. Experienced subjects recalled statements in groups by program; novices recalled statements grouped by syntactic categories. The groupings of the experts were also less varied (among subjects) than those of the novices. These results are consistent with the hypothesis that the expert programmers identified familiar patterns among the intermixed statements.

3. FAMILIARITY OF CHUNKS

An interesting research issue is the importance of familiarity in the formation of chunks. Can any arbitrary collection of statements from a program become a chunk, or are there just certain patterns which qualify as chunks?

The Raytheon Company found that about half the code in their inventory of COBOL programs consisted of "redundant" segments with potential for standardization[12]. For example, programs which updated a file often had similar versions of the following supporting functions: get-transaction, edit-transaction, soft-transaction, get-master-record, etc. It is likely that Raytheon programmers are familiar with the code segments which perform these common functions, and they may encode these segments as chunks when they notice them in a program.

Recent controlled experiments at Yale have shown that experts recognize a number of common "plans" (chunks) in programs. The experiments resulted in the identification and classification of several types of chunks, including the language independent categories "strategic" and "tactical", and the language dependent "implementation". An example of chunks identified in the experiments is[13]:

Strategic: "read a value, then process it"

Tactical: "counter-controlled running total loop"

Implementation:

    count := 0
    running total := 0
    read (new value)
    while new value <> stopping value do
    begin
        count := count + 1
        running total := running total + new value
        read (new value)
    end

Mayer postulated a classification of chunks familiar to many BASIC programmers[14]. The "mandatory chunk" is a sequence of two or more statements that have to occur together; for example, a FOR statement requires a corresponding NEXT statement. "Non-mandatory chunks" include the "repeat a READ loop". Mayer believes that as programmers gain more experience they learn more, and larger, chunks (which he also calls "super-statements").

Shneiderman believes that programmers use their syntactic knowledge of a programming language to form an internal semantic representation. This representation is based on recognition of familiar sequences of statements (chunks) which perform functions such as interchanging the values of two variables, summing the elements of an array, finding the larger of two variables, etc.[15]. The programming language and the application environment of the programmer determine which chunks a programmer will recognize. Some programming languages have built-in functions to accomplish the aforementioned tasks, but those which do not require a chunk of several statements.

Other support for investigating chunks as the key to comprehension is found in [1, 9, 16-21].

The available evidence and the opinion of many experts strongly suggest that programmers do not understand programs on a character by character basis. Rather they assimilate groups of statements which have a common function. These groups are called chunks. (This is a loose definition, because it is not clear exactly what a common function is and how big a chunk can be.) The mental encoding of chunks probably facilitates both recall and understanding.

4. CHUNKS AS A BASIS FOR COMPLEXITY MEASUREMENT

Any model of program complexity based on chunking should account for the complexity of the chunks themselves and also the complexity of their relationship. (The relationship of chunks may also be called the structure of the program or the inter- connection of the chunks.) This follows from the fact that the chunks work together in some fashion to accomplish the overall function of the program. A chunk complexity measure thus requires identifying chunks, assigning individual chunk complexities, and taking account of chunk interconnections to determine overall program complexity.

Identification of true chunks

The difficulty of identifying true chunks (those programmers actually use) is caused in part by the problem of relating the extrinsic (or teleological) and intrinsic roles of a chunk. Intrinsic properties are those independent of the context or role. For example, the intrinsic description of a capacitor is given by i(t) = C dv(t)/dt. Yet in an electronic circuit, a capacitor may be identified as a bypass capacitor, a coupling capacitor or a tuning capacitor, depending on what it accomplishes[22]. The intrinsic action of the DO loop in FORTRAN is well known. For example, the statement "DO 10 I = 1, 12" cycles the variable I through the values 1-12. This action could in one program represent cycling through codes for the 12 months of the year, but in another program I might serve as the index of a 12-character buffer, and the extrinsic role of the loop could be to fill the buffer.

Controlled experiments on program recall suggest that chunk boundaries are indicated by the frequency of "correct to incorrect" or "incorrect to correct" patterns among statements recalled by subjects[17]. Dunsmore proposes that chunk boundaries may be determined by "valleys" in the profile of the number of live variables which exist at each statement[23].
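Dunsmore's valley proposal lends itself to a mechanical sketch: profile the number of live variables at each statement and treat dips in the profile as candidate chunk boundaries. Liveness is approximated here as "between a variable's first and last reference"; that approximation and all names below are this sketch's own, not from [23]:

```python
# Live-variable profile over straight-line code, plus "valley" detection.
# A variable is counted as live at statement i if i falls between its
# first and last reference (a crude but common approximation).

def live_counts(refs):
    """refs: one set of referenced variable names per statement."""
    first, last = {}, {}
    for i, names in enumerate(refs):
        for v in names:
            first.setdefault(v, i)
            last[v] = i
    return [sum(1 for v in first if first[v] <= i <= last[v])
            for i in range(len(refs))]

def valleys(counts):
    """Indices where the profile dips: lower than the statement before,
    no higher than the one after."""
    return [i for i in range(1, len(counts) - 1)
            if counts[i] < counts[i - 1] and counts[i] <= counts[i + 1]]
```

For two independent two-variable computations in sequence, the profile dips between them, marking the boundary where one chunk's variables die before the next chunk's variables come alive.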

Some progress in formally defining chunks, using plans which resemble flowcharts of program segments, is presented in [22] and [24], but these researchers did not support their work by empirical study. They selected chunks based on their own intuition.

Attempts at automatic recognition of chunks, based on classification by function, have produced recognizers which work for only a very restricted class of programs[24-26].

Approximation of chunks

Identification of chunks by teleological role is beyond the state of the art, but some simplifying assumptions will allow investigation of chunk based complexity measures.


A reasonable approximation of a chunk in COBOL programs is the "performed paragraph" (PP).†

A PP is any segment of code delineated by a start point and end point as described below:

(a) A start point is any paragraph label which is any one of these: the first paragraph label in the PROCEDURE DIVISION, an entry point, the target of a GO TO, or the first paragraph referenced by a PERFORM.

(b) An end point is a statement preceding a paragraph label which is a start point.

The motivation for the above definition of a chunk is that the PP usually exists for one of the following reasons:
- the PP is a logical entity (definitely a chunk to the author of the program)
- the PP is shared (referenced in more than one place).

A desirable result of the above definition is that chunks may be identified objectively.
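The rule above can be applied mechanically once the PROCEDURE DIVISION has been parsed into paragraph labels plus the GO TO targets and PERFORM ranges found in the code. Real COBOL parsing is out of scope here, and all names in this sketch are illustrative:

```python
# Sketch of performed-paragraph (PP) identification from pre-parsed input.

def pp_start_points(paragraph_labels, goto_targets=(), perform_ranges=(),
                    entry_points=()):
    """Start points: the first paragraph, entry points, GO TO targets,
    and the first paragraph of each PERFORM (or PERFORM ... THRU) range."""
    starts = set(entry_points) | set(goto_targets)
    starts.update(first for first, _last in perform_ranges)
    if paragraph_labels:
        starts.add(paragraph_labels[0])
    return starts

def performed_paragraphs(paragraph_labels, starts):
    """Split the ordered paragraph labels into PPs: each PP runs from a
    start point up to (but not including) the next start point."""
    pps, current = [], []
    for label in paragraph_labels:
        if label in starts and current:
            pps.append(current)
            current = []
        current.append(label)
    if current:
        pps.append(current)
    return pps

labels = ["MAIN", "STEP-1", "STEP-2", "EDIT-TRANS", "WRAP-UP"]
starts = pp_start_points(labels,
                         goto_targets={"WRAP-UP"},
                         perform_ranges=[("EDIT-TRANS", "EDIT-TRANS")])
```

Here the three paragraphs MAIN, STEP-1 and STEP-2 fall into one PP, while the PERFORMed paragraph and the GO TO target each start a PP of their own.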

Complexity measurement of’ chunks

There are two general approaches to assessing chunk complexity. One idea is based on the view that programs consist of an assortment of a few different types of chunks, and each type has a standard complexity. The other approach is to apply some metric to each chunk.

The author has conducted an extensive search for data on the cost of constructing or understanding chunks. No useful data has been located. Potential sources which were investigated included:

- error reporting systems (a detailed classification might associate an error history with certain chunks)
- software development cost records
- controlled experiments.

Chunk data is scarce because it is difficult to obtain. Sophisticated classification of chunks seems beyond the state of the art. Certainly applying a metric to each chunk is simpler. It is proposed that a reasonable approximation of chunk complexity is lines of code, one of the simplest measures which takes into account chunk size.

Apparently there have been no previous attempts to quantify chunk interconnection, but many have proposed metrics for interconnection of programs or modules[27-31].

5. THE EQUAL CHUNK MODEL

Interconnection in this model is adapted from Woodfield[30], who took into account the existence of “logical modules” within programs and assumed that modules are reviewed repeatedly by a programmer (once per interconnection) when he or she attempts to understand a program. Reviews after the first one are assumed to require decreasing amounts of time. Woodfield used a formula based on these ideas to demonstrate an improvement over several other complexity measures, but he failed to precisely identify “logical modules”. Rather he assumed that they were numerous in large programs and estimated that they have a constant cost to understand.

In the model proposed here, the chunk and interconnection between chunks are of primary interest. It is assumed that every chunk must be reviewed (read) at least once simply because it is in the program. A further assumption is that each chunk will be reviewed again, once for every other chunk it “affects”. The reason is that, when attempting to understand a certain chunk, a programmer must refer to other chunks which affect it, since the behavior of all chunks cannot be fully understood after only one “pass” (one reading).

A review factor takes into account the assumption that the time to review a given chunk decreases with each repetition. Woodfield had some success by cumulatively applying a review constant of 2/3, and it will be used here. Using this approach, if the first review took 9 minutes then the second would require 6 minutes, the third would require 4 minutes, etc. The formula for program complexity is then:

    Complexity = sum over all chunks i of  C_i x (1 + R + R^2 + ... + R^(f_i))

where C_i = complexity of chunk i (number of lines of code), R = 2/3 (the review constant), and f_i = the fan-in of chunk i. The fan-in accounts for the number of other chunks affected by a particular chunk.

†The concept can be extended to other programming languages. The author has derived a definition of PP for FORTRAN.

"Affected by" is a binary relation on the set of chunks in a program. Formally, A is affected by B, denoted A => B, if either of the following is true:

(1) Chunk A has a control connection to B, denoted A =>c B.
(2) Chunk A has a data connection to B, denoted A =>d B.

A has a control connection to B if A contains a PERFORM or GO TO statement which references B. A has a data connection to B if there is some variable x whose value is changed in B and referenced in A.

The fan-in of chunk B is the number of chunks K such that (K, B) is in the "affected by" relation.
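Taken together, these definitions admit a direct computation. The sketch below assumes chunk sizes and the "affected by" pairs have already been extracted; the data layout and names are this sketch's own, and the geometric-sum form of the review discounting follows the verbal description (one mandatory reading of each chunk, plus one discounted re-reading per chunk it affects):

```python
# Sketch of the equal chunk model.

def fan_in(b, affected_by):
    """Fan-in of chunk b: the number of chunks K with (K, b) in the
    'affected by' relation, i.e. the chunks that b affects."""
    return sum(1 for _k, bb in affected_by if bb == b)

def program_complexity(chunk_loc, affected_by, review_constant=2/3):
    """chunk_loc: dict chunk -> lines of code (its complexity C_i)."""
    total = 0.0
    for chunk, loc in chunk_loc.items():
        f = fan_in(chunk, affected_by)
        # one mandatory reading plus f re-readings, each cheaper than the last
        total += loc * sum(review_constant ** j for j in range(f + 1))
    return total
```

For instance, with two chunks of 9 and 6 lines where A is affected by B, chunk A is read once (9) while B is read once plus one discounted re-reading (6 + 4), giving a program complexity of 19.0.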

The chunk model is not based on the strict "play machine" approach to program understanding. Certain data connections (as defined above) are not supported by a true connection. For example, for some A, B it is possible that A =>d B even though the execution of B never affects the behavior of A.

6. THE FAMILIAR CHUNK MODEL

In all controlled experiments on program complexity known to this author, the programs were either created during the experiment or were presented to subjects at the start of the experiment. One of the most significant differences between this situation and actual practice is that maintenance programmers often work with the same programs over a period of time and thus gain some familiarity which has an impact on their performance in making modifications.

The previously described model, though founded on the chunking idea, does not carry the concept very far. Experienced programmers probably look for familiar patterns when attempting to understand a program. Therefore they will tend to grasp the effect of familiar chunks more easily than that of unfamiliar segments of code. The unfamiliar portions must be investigated more thoroughly, probably with a line by line study.

The chunk model may be extended to account for the difference in effort required to understand familiar and unfamiliar chunks. The approach is to use as the review constant R_i, a function of the familiarity of chunk i.

The more familiar a chunk, the lower the review constant should be. This is consistent with the belief that it takes about the same effort for the first review of any chunk, regardless of its familiarity. But a familiar chunk will not require as much effort for repeated reviews; at some point the programmer will grasp it as an abstraction and think, "I know what that does without having to look at it again".

Familiarity could be subjectively rated by programmers, but an objective method would be more practical. Evidence has been cited that most programming tasks consist of a number of standard subtasks which are repeated in many projects. If one can establish a classification of these commonly occurring subtasks, or chunks, then an alternative, objective, method is available. One can determine the frequency of occurrence of each type of chunk among the programs in a given environment. Following the assumption that familiarity is proportional to frequency, the higher frequency chunks may be assumed to be more familiar and should be assigned a lower review constant.
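Under the frequency assumption, assigning review constants can be as simple as the sketch below. The linear mapping from frequency to R_i and the bounds r_max/r_min are invented for illustration; the text requires only that more frequent (more familiar) chunk types get a lower constant:

```python
# Frequency-based review constants for the familiar chunk model.
from collections import Counter

def review_constants(chunk_type_occurrences, r_max=2/3, r_min=1/3):
    """chunk_type_occurrences: one type label per chunk occurrence seen
    in the environment. Returns {type: review constant R_i}."""
    counts = Counter(chunk_type_occurrences)
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {t: r_max - (c - lo) / span * (r_max - r_min)
            for t, c in counts.items()}
```

A chunk type seen often in the local inventory (say, a "repeat a READ loop") thus receives a review constant near r_min, while a rarely seen type keeps the default 2/3.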

7. APPLICATION OF THE EQUAL CHUNK MODEL

The author has not found any suitable experimental data for COBOL, but the equal chunk model seemed to perform well on FORTRAN programs selected from reports of controlled experiments. The following results are based on data from an experiment designed to test the effect of control flow complexity on program understanding[32]. Two of the programs performed the same function but the second had more complicated control flow. Graduate student subjects were evaluated on the percentage of statements which they could recall correctly.

                                  Program 1    Program 2
Statements recalled correctly        59%          42%
Chunk model complexity               25           31
Cyclomatic complexity                 4            5
Lines of code                        18           19

Listings of the two programs are at Appendix A, with chunk boundaries marked.

Another experiment measured program comprehension by giving an examination on four programs[30]. Subjects were advanced undergraduates or graduates, and all were proficient FORTRAN programmers. The programs solved the same problem but differed in modularization. The modularization approaches were monolithic, super modularization, functional modularization and abstract data type, but listings for only the last two were presented in the report.

                                     Functional        Abstract
                                     modularization    data type
Mean score on comprehension tests          51              77
Chunk model complexity                    223             214
Cyclomatic complexity                      15              18
Lines of code                             139             170

The performance of the chunk measure on this data is more supportive than that for the first experiment, since it ranks the programs correctly, but cyclomatic complexity and lines of code do not.†

8. FURTHER WORK NEEDED

These models do not account for the “bandwidth” of the connections between chunks. All “affected by” relations are counted the same, whether they are established by a single data or control connection or by any number of control and data connections. The reason for this simplification is that there is likely some mental effort economy involved in studying multiple interconnections between two given chunks, but further research on this issue would be helpful.

These chunk models do not take into account the construction by the programmer of a multi-level internal semantic structure to represent the program[9]. Shneiderman points out that it is possible for a programmer to understand the lower level chunks without knowing what the overall program does. Conversely, one may understand the higher level function of a program (say, from reading comments) without grasping the low level details. Mayer argues that new programmers quickly learn small chunks, such as the "repeat a READ loop", and with time become familiar with higher level chunks[14]. Experiments with chess players have also shown that more experienced players tend to form larger chunks[33]. The process of understanding a program probably involves forming a hierarchy of chunks. At the highest level, the meaning of the program is evident. Further research is necessary to determine how the mental structuring can be modelled and measured.

†Woodfield's measures based on the logical module idea, from which the chunk measure is derived, ranked the programs correctly.

The assumptions of the chunk models should be validated by empirical work to resolve these questions:

- are PPs a good approximation of chunks?
- is the review constant a valid idea?
- can chunks be classified by type?
- is chunk frequency related to familiarity?

The author is investigating the performance of chunk based measures using large COBOL programs. Results will be compared with cyclomatic complexity, Halstead's Effort, and lines of code as predictors of the number of error occurrences and the effort to repair an error.

9. CONCLUSION

Since chunks play an important role in programmer cognitive processes, they are a more convincing basis for complexity measurement than the indicators used by existing direct measures. A chunk model of program complexity requires schemes to identify chunks, assess individual chunk complexity, and account for chunk interconnection complexity. Further evaluation of the chunk models presented here should help determine whether chunks are useful in developing a complexity measure of practical value.

REFERENCES

[1] R. A. DEMILLO and R. J. LIPTON, Software Project Forecasting. Yale Report 182/80, 6-1 to 6-20 (1980).
[2] T. J. MCCABE, A complexity measure. IEEE Trans. Software Engng 1976, SE-2, 308-320.
[3] M. H. HALSTEAD, Elements of Software Science. Elsevier North-Holland, New York (1977).
[4] M. WOODWARD, M. HENNELL and D. HEDLEY, A measure of control flow complexity in program text. IEEE Trans. Software Engng 1979, SE-5, 45-50.
[5] A. D. DEGROOT, Thought and Choice in Chess. Mouton, The Hague (1965).
[6] H. A. SIMON and W. G. CHASE, Skill in chess. Am. Scientist 1973, 61, 394-403.
[7] J. REITMAN, Skilled perception in go: deducing memory structures from inter-response times. Cognitive Psychology 1976, 8, 336-356.
[8] D. E. EGAN and B. J. SCHWARTZ, Chunking in recall of symbolic drawings. Memory & Cognition 1979, 7(2), 149-158.
[9] B. SHNEIDERMAN, Measuring computer program quality and comprehension. Int. J. Man-Machine Studies 1976, 9, 465-478.
[10] B. SHNEIDERMAN, Exploratory experiments in programmer behavior. Int. J. Computer and Information Sciences 1976, 5(2), 122-143.
[11] B. ADELSON, Problem solving and the development of abstract categories in programming languages. Memory & Cognition 1981, 9, 422-433.
[12] R. LANERGAN and B. POYNTON, Reusable Code: The Application Development Technique of the Future. Raytheon Co. report (1979).
[13] E. SOLOWAY, What do novices know about programming? Directions in Human-Computer Interactions. Ablex (1982).
[14] R. E. MAYER, A psychology of learning BASIC. Commun. ACM 1979, 22(11), 589-593.
[15] B. SHNEIDERMAN, Software Psychology: Human Factors in Computer and Information Systems. Winthrop (1980).
[16] M. ATWOOD, A. TURNER and H. RAMSEY, An Exploratory Study of the Cognitive Structures Underlying the Comprehension of Software Design Problems. ARI Rept. 392 (1979).
[17] A. NORCIO, Human Memory Processes for Comprehending Computer Programs. Applied Science Department, U.S. Naval Academy (1980).
[18] W. J. TRACZ, Computer programming and the human thought process. Software: Practice and Experience 1979, 9, 127-137.
[19] B. CURTIS, S. SHEPPARD and P. MILLIMAN, Third time charm: stronger predictions of programmer performance by software complexity metrics. Proc. 4th ICSE 1979, 356-360.
[20] R. BROOKS, Towards a theory of the cognitive processes in computer programming. Int. J. Man-Machine Studies 1977, 9, 737-751.

[21] R. E. MAYER, The psychology of how novices learn computer programming. Computing Surveys 1981, 13(1), 121-141.
[22] C. RICH, H. SHROBE, R. WATERS, G. SUSSMAN and C. HEWITT, Programming Viewed as an Engineering Activity. A.I. Memo 459, MIT (1978).
[23] H. E. DUNSMORE, personal communication (1982).
[24] R. WATERS, Automatic analysis of the logical structure of programs. Ph.D. Thesis, MIT/AI/TR-492 (1978).
[25] G. FAUST, Semiautomatic Translation of COBOL into HIBOL. MIT Rept. MIT/LCS/TR-256 (1981).
[26] D. BROTSKY, Program Understanding Through Cliche Recognition. Working Paper 224, AI Lab., MIT (1981).
[27] H. E. DUNSMORE and J. D. GANNON, Entropy and the Complexity of Interprocedural Data Communication. Department of Computer Science, University of Maryland (1978).
[28] S. M. HENRY, D. KAFURA and K. HARRIS, On the relationships among three software metrics. Proc. ACM SIGMETRICS 1981, 81-89.
[29] T. GILB, Software Metrics. Winthrop, Cambridge, Mass. (1977).
[30] S. N. WOODFIELD, Enhanced effort estimation by extending basic programming models to include modularity factors. Ph.D. Thesis, Department of Computer Sciences, Purdue University (1980).
[31] W. P. STEVENS, G. J. MYERS and L. L. CONSTANTINE, Structured design. IBM Syst. J. 1974, 13(2), 115-139.
[32] T. LOVE, An experimental investigation of the effect of program structure on program understanding. SIGPLAN Notices 1977, 12, 105-113.
[33] H. A. SIMON, How big is a chunk? Science 1974, 183, 482-488.

APPENDIX A

Program 1

      SUBROUTINE SHELL (X, N)
      DIMENSION X(N)
      IGAP = N
    5 IF (IGAP .LE. 1) RETURN
      IGAP = IGAP/2
      IMAX = N - IGAP
   10 IEX = 0
      DO 20 I = 1, IMAX
      IPLUSG = I + IGAP
      IF (X(I) .LE. X(IPLUSG)) GO TO 20
      SAVE = X(I)
      X(I) = X(IPLUSG)
      X(IPLUSG) = SAVE
      IEX = IEX + 1
   20 CONTINUE
      IF (IEX .GT. 0) GO TO 10
      GO TO 5
      END

Program 2

      SUBROUTINE SHELL (X, N)
      DIMENSION X(N)
      IGAP = N
    1 IF (IGAP - 1) 50, 50, 5
    5 IGAP = IGAP/2
      IMAX = N - IGAP
    7 IEX = 0
      I = 1
   10 IPLUSG = I + IGAP
      IF (X(I) - X(IPLUSG)) 20, 20, 15
   15 SAVE = X(I)
      X(I) = X(IPLUSG)
      X(IPLUSG) = SAVE
      IEX = IEX + 1
   20 I = I + 1
      IF (I - IMAX) 10, 10, 40
   40 IF (IEX) 1, 1, 7
   50 RETURN
      END

Note. Horizontal lines represent chunk boundaries.

APPENDIX B

Determination of performed paragraphs in FORTRAN

(1) A start point is any statement such that: (a) it is the target of a GO TO; (b) it is the first statement in the program; (c) it is an ENTRY statement; (d) it is a DO statement; (e) it immediately follows the last statement of a DO loop.

(2) An end point is any statement which immediately precedes a start point.

(3) A performed paragraph is delineated by start and end points as described above.

(4) All non-executable statements except SUBROUTINE and FUNCTION are disregarded when identifying start/end points and counting lines of code. Examples are DIMENSION, REAL, DATA, FORMAT.

(5) Data dependency calculation is the same as for COBOL, with the following exceptions: (a) index variables in DO loops are considered local; (b) subprograms are expanded like macros prior to computing data dependencies. In other words, a separate copy of the subroutine is generated for each different call.