Concept, Measurement, and Data in Migration Analysis
CONCEPT, MEASUREMENT, AND DATA IN MIGRATION ANALYSIS
WILLIAM HAENSZEL*
SUMMARY
Two methods of computing migration rates are discussed: one relates moves to the population at risk in the place of origin, and the other uses as a denominator the cross-product of the populations in the places of origin and destination. It is concluded that the second implicitly assumes that moves originate and terminate as a random population variable.
Some difficulties with this particular model are pointed out. The author suggests that other analytical approaches to migration data be sought and, in this connection, refers to the literature on the mathematical theory of epidemics.
The literature on migration presents an ambivalent position with respect to the computation and presentation of migration rates. One frequently used method relates the number of moves (net or gross) between places of origin and destination to the product of the respective populations:¹
$$\text{Rate} = \frac{M_{ij} \pm M_{ji}}{P_i P_j} \qquad (1)$$
or, more generally,
$$\frac{\sum_j (M_{ij} \pm M_{ji})}{P_i \sum_j P_j}$$
when several places of destination are involved, where $M$ represents the number of moves, the direction being indicated by the order of subscripts, and $P_i$ and $P_j$ the populations at origin and destination. The units of measurement for this rate can be expressed as $\text{time}/\text{population}^2$; the time dimension is introduced by specification of the interval in which the count of moves was made.
A second approach follows conventional vital statistics practice, in which the rate estimates the probability of the event in question. The migration rate in this form is based on the familiar triad of event, population at risk (population at place of origin), and period of observation. The corresponding units of measurement are $\text{time}/\text{population}$.²
$$\text{Rate} = \Pr(M_{ij}) = \frac{M_{ij}}{P_i} \qquad (2)$$
* National Cancer Institute, Bethesda, Maryland.
1 T. R. Anderson, "Intermetropolitan Migration: A Comparison of the Hypotheses of Zipf and Stouffer," American Sociological Review, XX (1955), 287-91; J. Q. Stewart, "The Gravitation or Geographic Drawing Power of a College," Bulletin of the American Association of University Professors, XXVII (1941), 70-75; H. ter Heide, "Migration Models and Their Significance for Population Forecasts," Milbank Memorial Fund Quarterly, XLI (1963), 56-76; G. K. Zipf, "The P1P2/D Hypothesis: On the Intercity Movement of Persons," American Sociological Review, XI (1946), 677-86.
2 B. MacMahon, T. F. Pugh, and J. Ipsen, Epidemiologic Methods (Boston: Little, Brown & Co., 1960); M. Spiegelman, Introduction to Demography (Chicago: Society of Actuaries, 1955).
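The contrast between the two rate forms can be made concrete with a small numerical sketch. All counts below are hypothetical, and the function names are illustrative only; the first function divides moves by the product of the two populations, the second by the population at risk at the place of origin.

```python
# Hypothetical counts for two places i and j over one observation period.
P_i = 50_000    # population at origin i (assumed)
P_j = 200_000   # population at destination j (assumed)
M_ij = 1_000    # moves from i to j (assumed)
M_ji = 600      # moves from j to i (assumed)


def cross_product_rate(moves, p_origin, p_dest):
    """Moves divided by the product of both populations (cross-product form)."""
    return moves / (p_origin * p_dest)


def at_risk_rate(moves, p_origin):
    """Moves divided by the population at risk at the place of origin."""
    return moves / p_origin


gross_rate = cross_product_rate(M_ij + M_ji, P_i, P_j)  # gross migration
net_rate = cross_product_rate(M_ij - M_ji, P_i, P_j)    # net migration
prob_i_to_j = at_risk_rate(M_ij, P_i)                   # estimated Pr(move i -> j)

print(gross_rate, net_rate, prob_i_to_j)
```

Only the last quantity can be read as an estimated probability for a person living in i; the first two have no single population at risk behind them.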
Eldridge³ has stressed the need to calculate migration rates that relate the count of moves to a population exposed to risk. Thomlinson⁴ subscribed to this latter usage in his discussion of migration rates, although he qualified his remarks by stating that "two base populations are necessary when using a gravitational approach or when measuring the stream of movement, i.e., when emphasis is on the move rather than on an area." Price⁵ also accepts the position that rate of migration should be expressed as a probability statement; the purpose of his proposed mathematical model would be to estimate by multivariate techniques the contributions of various components to the force of migration.
Their structure and substantive applications clearly show the two rates to have different properties, and it seems surprising that no one has discussed the rationale underlying the two divergent approaches. The views presented here have been shaped by work in vital statistics and epidemiology, and this predisposes me to favor equation (2) and to regard equation (1) with reserve. I am under no illusion that my comments will gain acceptance from all investigators, for the purpose is to provoke discussion which may cast some light on the issues.
We may begin by noting that the label "migration" has been applied to two related, but different, universes of discourse: a population of "moves" and a population of "people who move." A universe of "moves" can be generated by simultaneous classification of individuals by initial and subsequent place of residence, and the data provide useful descriptions of population redistribution. Such results, however, do not lend themselves to probability statements. Probabilities can be computed only for denumerable populations at risk, whether they be people, telephone poles, or transistors. Derivative data obtained by classification procedures do not necessarily lead to denumerable populations, and this would not appear to be a property of the data normally available on "moves."
3 H. T. Eldridge, "Primary, Secondary, and Return Migration in the United States," Demography, II (1965), 444-55.
4 R. Thomlinson, "The Determination of a Base Population for Computing Migration Rates," Milbank Memorial Fund Quarterly, XL (1962), 356-66.
5 D. O. Price, "A Mathematical Model of Migration Suitable for Simulation on an Electronic Computer: A Progress Report," in Proceedings of the International Population Conference (1959), pp. 665-73.
If migration data are to be reported in rate form as probability estimates, the sole option is to report on persons making prescribed moves. The unique relationship between population at risk and direction of move permits consideration of unidirectional moves only, outward from $P_i$. Within this framework, the probability of out-migration within a fixed time period from a defined population at risk at place ($P_i$), expressed in equation (2), has as its complement the probability of not moving:
$$\Pr(M_{i\cdot}) + \Pr(\overline{M}_{i\cdot}) = 1. \qquad (3)$$
Furthermore, the probabilities of moving from $i$ to $j, k, l, \ldots$ are additive, since each comprises a subset of admissible moves:
$$\Pr(M_{i\cdot}) = \Pr(M_{ij}) + \Pr(M_{ik}) + \Pr(M_{il}) + \cdots \qquad (4)$$
One may, of course, calculate a pooled experience for two or more areas by summation of events and population at risk in the usual manner, taking care to define $M_{ij}$ and $M_{ji}$ as included or excluded from the count of events depending on the study objective:
$$\Pr(M) = \frac{M_{i\cdot} + M_{j\cdot}}{P_i + P_j} \qquad (5)$$
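The complement, additivity, and pooling properties just described can be checked on a toy example. The populations, destinations, and counts are hypothetical, and the variable names are illustrative; the sketch assumes each person makes at most one recorded move in the period.

```python
# Hypothetical out-migration counts by destination for two source areas.
P_i = 10_000
P_j = 30_000
moves_from_i = {"j": 300, "k": 150, "l": 50}
moves_from_j = {"i": 200, "k": 400, "l": 100}

# Destination-specific probabilities for residents of i (moves / population at risk).
prob = {dest: m / P_i for dest, m in moves_from_i.items()}

# Additivity: the probability of any out-move is the sum over destinations.
prob_any_move = sum(prob.values())

# Complement: the probability of not moving.
prob_stay = 1 - prob_any_move

# Pooled experience for areas i and j, counting every out-move as an event.
pooled_all = (sum(moves_from_i.values()) + sum(moves_from_j.values())) / (P_i + P_j)

# Pooled experience excluding exchanges between i and j, so that only
# moves leaving the combined area are counted as events.
external = (sum(m for d, m in moves_from_i.items() if d != "j")
            + sum(m for d, m in moves_from_j.items() if d != "i"))
pooled_external = external / (P_i + P_j)

print(prob_any_move, prob_stay, pooled_all, pooled_external)
```

Which pooled figure is appropriate depends on the study objective, as the text notes; the arithmetic is the same in either case.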
Appropriate definition and estimation of the base population at risk for computation of migration rates of this type have been discussed by Thomlinson.⁶ No comment is required here other than to note that the distinction drawn between the number at risk at the beginning of an observation period and the average number at risk over a time interval has received meticulous attention in the actuarial literature with reference to measurements of force of mortality.
6 Thomlinson, op. cit.
If one is concerned with moves rather than with a population at risk of migration, the vital statistics approach to rate construction offers no escape from the one-way traffic limitation just noted. The desire to handle data on two-way traffic and to admit the concept of net migration undoubtedly motivated the search for other measures. The cross-product, $P_iP_j$, was an obvious candidate for denominator of a migration rate, given its symmetry vis-a-vis $M_{ij}$ and $M_{ji}$ and invariant relationship with direction of move. An important property of $P_iP_j$ has been stated by ter Heide, who introduces the notion of the universe of messages (moves).⁷ If moves are assumed to originate and terminate as a random population variable, the distribution of moves originating within $i$ and terminating within $j$, or vice versa, will vary in direct proportion to $P_iP_j$.
The analytical implications of $P_i$ and $P_iP_j$ for measures of migration can be considered in the context of how observations on migration are collected. The three methods in general use can be catalogued as follows:
1. In an area of origin, count and trace out-migrants.
2. In an area of destination, count and classify in-migrants by place of origin.
3. For a population characterized by census or register data, distribute individuals with respect to residence as of two dates.
1. In observations on a source population, the distinction between moves and the person who moves has little operational significance, and the two will often be identical. The moves occur in one direction, and study objectives and definitions will determine the treatment accorded to persons who move away but subsequently return. The base population and subgroups categorized by age, sex, and other attributes are fixed, so that the proportionate distribution of out-migrants with respect to destination remains unchanged, whether the data are shown as absolute numbers ($M_{ij}, M_{ik}, \ldots$) or as rates via division by $P_i$. As stated earlier, $M_{ij}/P_i$ describes the probability of an individual moving from $i$ to $j$ within a stipulated time interval. The observations can be incorporated in new measures by introducing other characteristics linked with the move, such as distance and population size of the area of destination.
7 ter Heide, op. cit.
The transformed rates are useful for tests of study hypotheses (migration varies inversely with distance of migration, number of migrants attracted to a given destination is directly proportional to population at terminus, and so forth). However, they no longer represent descriptive estimators of population parameters, because the latter variables are manifestations of the event (migration) and classification of individuals becomes possible only after, and not before, the fact.
2. The investigator using place of destination as the vantage point also will not find the distinction between a "move" and "person moving" to be important in practice. For any destination, the distribution of in-migrants by place of origin can be stated in absolute numbers, as a percent of total in-migrants, or as a ratio, $M_{ij}/P_j$, without disturbing the internal relationships; only the form, not the substance, of the data is changed. While the formal arithmetic for calculating $M_{ij}/P_j$ and $M_{ij}/P_i$ is the same, they have a different logical content. The latter, being linked with a population at risk, has been stated to estimate the probability that an individual from $i$ will move to $j$; the former constitutes a relative frequency statement, which must be handled with caution and whose range of permissible inferences is restricted. The difficulties can be illustrated by a parallel problem which often arises on review of percentage distributions by disease in autopsy and hospital-admission series. Does the high (low) frequency of a given disease in a series arise from a high (low) risk for this disease in the underlying population, or does it reflect in part the operation of low (high) risks from other diseases?
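The caution about relative frequency statements can be illustrated with a short sketch. The two origins, their populations, and the in-migrant counts are hypothetical. The share of in-migrants coming from i rises sharply in the second period even though the risk of moving from i to j, measured against the population at risk in i, has not changed at all.

```python
def origin_shares(in_migrants):
    """Percentage-style distribution of in-migrants by place of origin."""
    total = sum(in_migrants.values())
    return {origin: count / total for origin, count in in_migrants.items()}


# Hypothetical populations at risk in two places of origin.
populations = {"i": 10_000, "k": 40_000}

# In-migrants to destination j in two periods; only the flow from k changes.
period_1 = {"i": 200, "k": 800}
period_2 = {"i": 200, "k": 200}

share_1 = origin_shares(period_1)["i"]
share_2 = origin_shares(period_2)["i"]

risk_1 = period_1["i"] / populations["i"]
risk_2 = period_2["i"] / populations["i"]

print(f"share of in-migrants from i: {share_1:.2f} -> {share_2:.2f}")
print(f"risk of moving i -> j:       {risk_1:.3f} -> {risk_2:.3f}")
```

This is the autopsy-series problem in migration terms: the relative frequency for one origin reflects what happened to the other origins as much as its own risk.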
Relative frequency ratios have descriptive properties in the sense that the results are derived from observations on individuals. When they are manipulated by adjustment for distance of move, population concentrations, and so forth, we again leave the realm of factual description to engage in tests of consistency with postulated models.
3. Given a population sample characterized by residence at two points in time, the primary frame of reference for analysis could be either place of origin (first residence) or place of destination; the remarks in items 1 and 2 would then hold without change. Or, alternatively, the information on origin and destination considered jointly can define a universe of "moves." Moves between $i$ and $j$, and vice versa, can be summed and represented by a single figure (gross migration). Since moves have direction, their addition as vector quantities would cancel out moves in opposite directions (net migration). Counts of both gross and net migration describe population redistribution, although neither retains all the information contained in the separate figures for $M_{ij}$ and $M_{ji}$. Difficulties arise when the absolute numbers are converted to rates, since the simple additive properties of the absolute numbers no longer hold without restriction. In the probability approach to rate construction, we are confronted by two populations at risk to two different events. $M_{ij}$ and $M_{ji}$ are generated, respectively, by $P_i$ and $P_j$, and there is no obvious way in which the specific information contained in $M_{ij}/P_i$ and $M_{ji}/P_j$ can be combined into one summary figure. The step of rate computation has related each individual stream of migration to its source in a manner analogous to describing the flow of a river in relation to its watershed characteristics.
The device of relating migration to $P_iP_j$ was introduced to define a rate combining information on $M_{ij}$ and $M_{ji}$. This line of attack, however, required the assumption (implied by the universe of messages described by ter Heide) that volume of migration is directly proportional to the populations in the areas of origin and destination. For this a price has been paid, one not always recognized by the proponents. The ratio $(M_{ij} \pm M_{ji})/P_iP_j$ is not a descriptive estimator determined solely by the data, since an analytical model has been incorporated at the outset. Rather, it constitutes a test of the hypothesis that migration is a random variable proportionate to population. As Tolley has remarked, "The resulting ratios should not differ significantly from the overall migration rate in the universe under study, if the null hypothesis is true."⁸
The use of $P_iP_j$ is essentially equivalent to the computation of expected numbers of moves for cells in a contingency table and closely resembles the familiar $\chi^2$-test for independence of row and column effects, in which expected numbers are calculated as $(A)(B)/N$, where $A$, $B$, and $N$ are the observed values for the corresponding column, row, and total table.
This may not be immediately obvious, but the point can be elaborated as follows. Table 1 is a schematic representation of a population distributed by place of residence at two points in time ($t_1$ and $t_2$). While not essential for the discussion that follows, it may be noted that such a table would conceal information on intermediate moves (return moves from $j$ to $i$ canceling out moves from $i$ to $j$) and thus could report on net migration only. Also, the table would normally cover only individuals surviving to $t_2$; allowance for migration associated with terminal illness could be introduced by substituting residence at time of death for residence at $t_2$ and, in principle, population registers could produce such information. One would not often be misled by assuming the migration experience of survivors to approximate that of the initial cohorts of like age and sex.
8 G. S. Tolley, "Migration Research in Relation to Agricultural Policy," in The Farmer and Migration in the United States (API Series No. 3 [Raleigh: North Carolina State College of Agriculture and Engineering, 1961]), pp. 14-23.
Given these qualifications, the stable (non-migrant) population is contained in the diagonal cells of the table, and a count of all moves is obtained by summation of observed numbers over the remaining cells. The number of moves can be related to the total population ($\sum P_i = \sum P_j$) to compute an over-all migration rate for the universe under study, consistent with vital statistics practice embodied in equation (2). The model "volume of migration is directly proportional to the populations in the areas of origin and destination" determines an expected number of moves corresponding to each observed number ($M_{ij}$), calculated as follows:
E(Mi j ) = ~Mijir
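The contingency-table reading of the proportionality model can be sketched numerically. The 3 x 3 residence table below is hypothetical. The expected-move formula used here, total moves allocated over off-diagonal cells in proportion to the product of origin and destination populations, is an assumption adopted for illustration and is only one possible reading of the model; it should not be taken as the author's exact expression.

```python
# Hypothetical residence table: rows = residence at t1, columns = residence at t2.
table = [
    [900,   60,   40],
    [ 30, 1900,   70],
    [ 20,   80, 2900],
]
n = len(table)

pop_t1 = [sum(row) for row in table]                             # P_i (row totals)
pop_t2 = [sum(table[i][j] for i in range(n)) for j in range(n)]  # P_j (column totals)
total = sum(pop_t1)

# Non-migrants sit on the diagonal; every other cell is a move.
stayers = sum(table[i][i] for i in range(n))
moves = total - stayers
overall_rate = moves / total

# ASSUMPTION: expected moves in cell (i, j) are proportional to P_i * P_j,
# scaled so that the expected counts add up to the observed number of moves.
weights = {(i, j): pop_t1[i] * pop_t2[j]
           for i in range(n) for j in range(n) if i != j}
weight_sum = sum(weights.values())
expected = {cell: moves * w / weight_sum for cell, w in weights.items()}

for (i, j), e in sorted(expected.items()):
    print(f"{i}->{j}: observed {table[i][j]:4d}, expected {e:7.1f}")
```

Comparing observed with expected counts cell by cell is then a test of the model, not a description of the data, which is the distinction the text draws.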
average, can be tested empirically. Failure of observed data to support this thesis would gravely compromise the case for retention and use of $P_iP_j$-type rates, since this would imply, among other things, that moves do not vary in a predictable, uniform manner with population size of place of destination. Under these circumstances, the question to be posed would be, "Why should migration rates be adjusted inversely to population size at place of destination?"
The measure of velocity or rate of flow of the migration stream proposed by Bogue¹⁰ retains the $P_iP_j$ concept, as can be seen from a trivial rearrangement of his formula.
The new element introduced is $P_t$, the total population in the universe under investigation. This adjustment transforms the absolute size of all eligible places of destination into relative terms and thus facilitates direct comparison of results from investigations carried out in study universes of different size. In common with all ratios invoking the $P_iP_j$ concept, velocity retains the assumption that migration is proportionate to population in areas of destination and thus can be characterized as a test of a specific hypothesis rather than as a descriptive estimator for a set of data. Its use in the manner proposed by Bogue, as a dependent variable amenable to multivariate analysis, cannot be recommended without reservations, since the values of the dependent variable are influenced by both observation and model. Before proceeding, one must ask what connection this implicit model may have with the questions posed for investigation by multivariate analysis.
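The rearrangement can be verified with a brief sketch. Bogue's velocity is commonly written as the out-migration rate from the origin divided by the destination's share of the total population, usually multiplied by a constant such as 100. That form is taken here as an assumption, the constant is omitted, and all counts are hypothetical.

```python
import math

# Hypothetical inputs.
M_ij = 1_000      # moves from i to j
P_i = 50_000      # population at origin
P_j = 200_000     # population at destination
P_t = 1_000_000   # total population of the study universe


def velocity(moves, p_origin, p_dest, p_total):
    """Out-migration rate divided by the destination's share of total population.

    ASSUMPTION: the usual constant multiplier (e.g. 100) is left out.
    """
    return (moves / p_origin) / (p_dest / p_total)


def velocity_rearranged(moves, p_origin, p_dest, p_total):
    """The same quantity written as a cross-product rate scaled by P_t."""
    return moves * p_total / (p_origin * p_dest)


v1 = velocity(M_ij, P_i, P_j, P_t)
v2 = velocity_rearranged(M_ij, P_i, P_j, P_t)
print(v1, v2, math.isclose(v1, v2))
```

Written the second way, the measure is the cross-product rate multiplied by the total population, so it carries the same proportionality assumption.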
An assessment of rates of the $P_iP_j$ variety should consider their potential for extension to more complex observational situations made possible by sophisticated study designs and by computer capabilities for data processing. The outlook in this regard would not appear promising. The $P_iP_j$ concept was developed to handle the comparison of populations distributed as of two points in time, the source from which most of the data on migration have been assembled. What happens when information for three or more points in time becomes available? On the assumption that successive moves are random variables proportionate to population, the three-dimensional analogue of $P_iP_j$ would be $P_iP_jP_k$. This model is unattractive since the underlying hypothesis seems so far removed from the facts. Work with residence histories has demonstrated that the antecedent history of moves between $t_1$ and $t_2$ will be correlated with later events, so that it would be unwise to ignore information from the interval $t_1$ to $t_2$ in analyzing data for the succeeding interval.¹¹ Eldridge,¹² in her recent analysis of data based on status as of three points in time (date of birth, 1955, and 1960), found it necessary and desirable to analyze the moves between 1955 and 1960 with control for previous history of migration. Moreover, presentation of rates based on $P_iP_jP_k$ would not have been helpful in a discussion of what she has termed "primary," "secondary," and "return" migration and which she elected to relate to the several populations at risk (the approach of eq. [2]). These considerations make it unnecessary to dwell on other complications inherent in $P_iP_jP_k$-type rates, including the need for definitions to handle data on moves between $t_1$ and $t_2$ accompanied by no change between $t_2$ and $t_3$, and vice versa.
10 D. J. Bogue, "Internal Migration," in The Study of Population: An Inventory and Appraisal, ed. P. M. Hauser and O. D. Duncan (Chicago: University of Chicago Press, 1959).
A related approach to data for three or more points in time would be consideration of each interval separately. If one concedes for the moment that ratios in the form $(M_{ij} \pm M_{ji})/P_iP_j$ can be defined in a manner which permits extension and application of the multiplication rule for combining two probabilities, the dimensional units for the results would be $\text{time}^2/\text{population}^4$, the denominator taking the form of $P_iP_j^2P_k$. The computational problems might be overcome, but a formidable question of interpretation of values with meaning only in relation to a specific model would remain. Their interpretation would become so specialized that the effort is best abandoned if an alternative is at hand.
11 K. E. Taeuber, L. Chiazze, and W. Haenszel, Migration in the United States: An Analysis of Residence Histories (Public Health Reports Monograph [in press]).
12 Eldridge, op. cit.
An obvious solution would be to divorce estimation of population parameters from tests of hypotheses. For three or more points in time, the probability of a defined change in residence status over any combination of intervals can be estimated by multiplication of the interval-specific probabilities, if each has been calculated in accordance with equation (2). A table of rates specific for time and direction of move will contain all the information on migration in the sense that, given the population distribution at $t_1$, one could reconstruct from the probability matrix the population distributions at all subsequent dates (given survival data input).
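The reconstruction described here amounts to applying interval-specific probability tables in sequence. The following is a minimal sketch with two places and two intervals; the probabilities and populations are hypothetical, deaths are ignored, and the second-interval probabilities are assumed to depend only on residence at the start of that interval, a simplification that earlier migration history may well violate.

```python
def project(population, transition):
    """Redistribute a population vector using origin-specific move probabilities.

    transition[i][j] is the probability that a resident of i at the start of
    the interval lives in j at its end (each row sums to 1).
    """
    n = len(population)
    return [sum(population[i] * transition[i][j] for i in range(n))
            for j in range(n)]


# Hypothetical interval-specific probabilities (rows: origin, columns: destination).
T_12 = [[0.90, 0.10],
        [0.20, 0.80]]
T_23 = [[0.95, 0.05],
        [0.10, 0.90]]

pop_t1 = [1000.0, 2000.0]
pop_t2 = project(pop_t1, T_12)
pop_t3 = project(pop_t2, T_23)

# Probability of a defined residence history, here leaving place 0 in the
# first interval and returning in the second (multiplication rule).
p_leave_and_return = T_12[0][1] * T_23[1][0]

print(pop_t2, pop_t3, p_leave_and_return)
```

Where history matters, the second table would be replaced by probabilities specific to each earlier path, at the cost of a larger set of rates.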
The computation and presentation of transitional probabilities in great detail for small subdivisions of the United States would be impractical even with large computers. Analytical work must be imaginative and creative, and a major problem for any investigator would be to identify the major components necessary to describe and understand the forces at work. This would require discrimination in definition of moves and in selection of key combinations of transitional probabilities. Study hypotheses can be an important tool in shaping analytical decisions, but one must also remain attentive to what the data have to say. Within this descriptive framework, the option of checking facts against model predictions at any step in the process is retained.
While no one can foresee the precise format that presentation of results of large-scale, longitudinal studies of migration might take in the future, the use of multiple decrement tables, a device well known to actuaries, should be explored. Multiple decrement tables are well suited for reporting on dependent probabilities,¹³ and migration represents a classical competitive risk situation in that the probability of moving from $i$ to $j$ seems conditioned by and dependent on alternatives available. The probability of a specific move ($M_{ij}/P_i$) would change if the imposition or removal of barriers (immigration restrictions, etc.) added or eliminated potential destinations and would remain independent and unchanged only if removal of alternate destinations led to decisions not to move or if addition of new destinations attracted individuals who would not have migrated otherwise. The concept of "intervening opportunities" formulated by Stouffer¹⁴ can be viewed as one attempt to measure and evaluate the interplay of competitive risks.
OTHER APPROACHES TO MEASURE MIGRATION
To this point, the comments have dealt on rather narrow, technical grounds with the relative merits of the $P_i$- and $P_iP_j$-type rates. If the Gordian knot were cut and the model "moves are distributed as a random population variable," which underlies the $P_iP_j$ concept, abandoned, other analytical options, such as those displayed in the literature on communicable diseases and the mathematical theory of epidemics, would become available. In a sense, migration can be thought of as a contagious process (a psychic infection), since the departure of an individual from a community, if he retains links with those remaining behind, would influence the probability of subsequent departures. The connection between migration and epidemics is not too farfetched, and we may note that this idea occurred in 1911 to Brownlee in his paper "The Mathematical Theory of Random Migration and Epidemic Distribution."¹⁵
13 J. L. Anderson and J. B. Dow, Actuarial Statistics (Cambridge: Cambridge University Press, 1952), Vol. II.
14 S. A. Stouffer, "Intervening Opportunities: A Theory Relating to Mobility and Distance," American Sociological Review, V (1940), 845-67.
The course of communicable disease epidemics within a community can be charted by noting the time intervals between cases and geographical spread over time. Another tool is the "secondary attack" rate, which measures the risk of subsequent cases developing among individuals intimately exposed to a known case (members of the same household, same classroom). Stochastic and deterministic models have been applied to such data to correlate theory and observation. These methods employ what are known as "contagious distributions"; the null hypothesis of independence is discarded and attention directed instead to the estimation of conditional probabilities, which are introduced as parameters in mathematical models to test the observed configurations and clusters against theoretical predictions.¹⁶
Adaptation of some of these ideas might prove to be a fruitful exercise, although one must guard against a rigid mechanical translation of techniques and strive to develop concepts meaningful for migration data. Attention might be given to what constitutes a suitable index of familial aggregation of migration (Is the nuclear or extended family an appropriate study unit?). Inquiry into the possible presence and spacing of waves of secondary migration might also provide insights into the dynamics of migration. These studies would require observations on the population at risk to migration in the area of origin and could not be implemented solely by data collected at the point of destination.
15 J. Brownlee, "The Mathematical Theory of Random Migration and Epidemic Distribution," Proceedings of the Royal Society of Edinburgh, XXXI (1911), 262-88.
16 N. T. J. Bailey, The Mathematical Theory of Epidemics (New York: Hafner Publishing Co., 1957).
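One way the secondary attack rate might be carried over to migration data is sketched below. The household records, the follow-up window implied by them, and the comparison rate are all hypothetical, and the household is used as the study unit only for illustration; whether it is the appropriate unit is exactly the open question raised in the text.

```python
# Hypothetical households in the area of origin from which one member has
# already migrated. "exposed" counts remaining members at risk; "later_movers"
# counts those who migrated within the follow-up window.
households = [
    {"exposed": 4, "later_movers": 1},
    {"exposed": 3, "later_movers": 0},
    {"exposed": 5, "later_movers": 2},
    {"exposed": 2, "later_movers": 0},
]

exposed_total = sum(h["exposed"] for h in households)
later_total = sum(h["later_movers"] for h in households)

# Secondary "attack" rate: risk of migrating among persons intimately
# linked to an earlier migrant.
secondary_rate = later_total / exposed_total

# Hypothetical comparison: persons in households with no earlier migrant.
unexposed_total = 1_000
unexposed_movers = 20
baseline_rate = unexposed_movers / unexposed_total

relative_risk = secondary_rate / baseline_rate
print(secondary_rate, baseline_rate, relative_risk)
```

Both rates rest on populations at risk observed at the place of origin, which is why data gathered only at the destination could not supply them.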
Reliance on rates of the $P_iP_j$ variety may not be an isolated event but rather symptomatic of a more general outlook and frame of reference shared by many students of migration. As one primarily concerned with vital statistics and epidemiology and as a recent intruder in the more specialized domain of migration demography, I have been impressed by the differences in history and tradition embodied in the respective literatures.
Work on migration and population redistribution appears to have been strongly influenced by the early successes of Ravenstein in formulating "laws of migration."¹⁷ Subsequent papers have placed a premium on the development and testing of new hypotheses rather than on descriptions of facts and their collation. In this climate, the use of those rates which introduced assumptions keyed to a particular model and represented implicit tests of hypotheses could become the prevailing practice. The fact that most data on migration and population distribution have been obtained from secondary sources, censuses, and population registers not under the direct control of the investigator may have been a contributing factor. One wonders whether ingenuity in the construction of theories has not outrun the capacity for collection of relevant observations.
This is in contrast to the history of vital statistics. While Graunt,¹⁸ more than two centuries before Ravenstein, had made several important generalizations from the study of "bills of mortality" in London, his successors continued to concentrate on descriptions of the forces of mortality and natality by means of rates based on populations at risk. While generalizations concerning the nature of the curves for age-specific mortality and other relationships among age- and disease-specific rates were later deduced, there was no strong tendency to develop theories solely within the framework of data provided by birth and death registration. A lively concern with descriptive data was natural for investigators who were in the main responsible for registration of vital statistics or who needed the data to direct or plan public health programs.
17 E. G. Ravenstein, "The Laws of Migration," Journal of the Royal Statistical Society, XLVIII (1885), 167-235; E. G. Ravenstein, "The Laws of Migration," Journal of the Royal Statistical Society, LII (1889), 241-305.
18 J. Graunt, Natural and Political Observations Made upon the Bills of Mortality, 1662, ed. W. F. Willcox (Baltimore: The Johns Hopkins Press, 1939).
However, this cannot be the complete answer. Inquiries into the epidemiology of specific diseases called for correlation and synthesis of vital statistics data with information from many other sources: clinical observations, autopsy findings, animal experiments, and so forth. These requirements fostered a tradition of "shoe-leather" epidemiology (the practice of observing and recording at first hand) and gave vital statisticians and epidemiologists an intimate knowledge and command of their data sources. These specialists have traditionally paid careful attention to problems of nosology and classification, and to assessments of the accuracy and completeness of data reported, including the nature of diagnostic evidence underlying medical certifications of death.
While the importance of models as a tool for building a systematic, coherent body of knowledge in any discipline is not in question, repeated calls by Kirk and others¹⁹ for new theoretical insights in migration studies may have been overdone. Should the emphasis in migration now be placed on the design of studies to collect data not available from census and other administrative sources and to exploit new opportunities that are now arising as by-products of human population study centers and long-term follow-up of cohorts in order to investigate the role of factors in chronic diseases?²⁰
19 D. Kirk, "Some Reflections on American Demography in the Nineteen Sixties" (President's address to the American Population Association, May 1960).
20 W. Haenszel and R. W. Miller, Role of Human Population Study Centers in Studies of Cancer Etiology (Public Health Reports 77 ), pp. 713-18.