
ABSTRACT REASONING TEST (ART)

Technical Manual

© 2006 Psychometrics Ltd

CONTENTS

1 Theoretical Overview
1.1 The Role of Intelligence Tests in Personnel Selection and Assessment
1.2 The Origins of Intelligence Tests
1.3 The Concepts of Fluid and Crystallised Intelligence

2 The Abstract Reasoning Test (ART)
2.1 Item Format
2.2 Test Construction

3 The Psychometric Properties of the ART
3.1 Standardisation
3.2 Reliability
3.2.1 Reliability: Temporal Stability
3.2.2 Reliability: Internal Consistency
3.3 Validity
3.3.1 Validity: Construct Validity
3.3.2 Validity: Criterion Validity
3.4 ART: Standardisation and Bias
3.4.1 Standardisation Sample
3.4.2 Gender and Age Effects
3.5 ART: Reliability
3.5.1 Reliability: Internal Consistency
3.5.2 Reliability: Test-retest
3.6 ART: Validity
3.6.1 The Relationship Between the ART and the RAPM
3.6.2 The Relationship Between the ART and the GRT1 Numerical Reasoning Test
3.6.3 The Relationship Between the ART and the CRTB
3.6.4 The Relationship Between the ART and Learning Style
3.6.5 The Relationship Between the ART and Intellectance
3.6.6 Mean Differences in ART Scores by Occupational Group

Appendix – Administration Instructions

References


1 THEORETICAL OVERVIEW

1.1 THE ROLE OF INTELLIGENCE TESTS IN PERSONNEL SELECTION AND ASSESSMENT

While much useful information can be gained from the standard job interview, the interview nonetheless suffers from a number of serious weaknesses. Perhaps the most important of these is that the interview has been shown to be a very unreliable way to judge a person's aptitudes and abilities. This is because it is an unstandardised assessment procedure that does not directly assess mental ability, but rather assesses a person's social skills and reported past achievements and performance.

Clearly, the interview provides a useful opportunity to probe each applicant in depth about their work experience, and to explore how they present themselves in a formal social setting. Moreover, behavioural interviews can be used to assess an applicant's ability to 'think on their feet' and explain the reasoning behind their decision-making processes. Assessment centres can provide further useful information on an applicant by assessing their performance on a variety of work-based simulation exercises. However, interviews and assessment centre exercises do not provide a reliable, standardised way to assess an applicant's ability to solve novel, complex problems that require the use of logic and reasoning ability.

Intelligence tests, on the other hand, do just this, providing a reliable, standardised way to assess an applicant's ability to use logic to solve complex problems. As such, tests of general mental ability are likely to play a significant role in the selection process. Schmidt & Hunter (1998), in their seminal review of the research literature, note that over 85 years of research has clearly demonstrated that general mental ability (intelligence) is the single best predictor of job performance.

From the perspective of assessing a respondent's intelligence, the unstandardised, idiosyncratic nature of interviews makes it impossible to directly compare one applicant's ability with another's. Not only do interviews fail to provide an objective baseline against which to contrast interviewees' differing performances but, moreover, different interviewers typically come to radically different conclusions about the same applicant. Not only do applicants respond differently to different interviewers asking ostensibly the same questions, but what applicants say is often interpreted quite differently by different interviewers. In such cases we have to ask which interviewer has formed the 'correct' impression of the candidate, and to what extent any given interviewer's evaluation of the candidate reflects the interviewer's preconceptions and prejudices rather than the candidate's performance.

There are similar limitations on the range and usefulness of the information that can be gained from application forms or CVs. Whilst work experience and qualifications may be prerequisites for certain occupations, in and of themselves they do not predict whether a candidate is likely to perform well or badly in a new position. Moreover, a person's educational and occupational achievements are likely to be limited by the opportunities they have had, and as such may not reflect their true potential. Intelligence tests enable us to avoid many of these problems, not only by providing an objective measure of a person's ability, but also by assessing the person's potential, and not just their achievements to date.

1.2 THE ORIGINS OF INTELLIGENCE TESTS

The assessment of mental ability, or intelligence, is one of the oldest areas of research interest in psychology. Gould (1981) has traced attempts to scientifically measure mental acuity, or ability, to the work of Galton in the late 1800s. Prior to Galton's (1869) pioneering research, the assessment of mental ability had focussed on phrenologists' attempts to assess intelligence by measuring the size of people's heads!

Reasoning tests, in their present-day form, were first developed by Binet (1910), a French educationalist who published the first test of mental ability in 1905. Binet was concerned with assessing the intellectual development of children, and to this end he invented the concept of mental age. Questions assessing academic ability were graded in order of difficulty, according to the average age at which children could successfully answer each question. From the child's performance on this test it was possible to derive the child's mental age. If, for example, a child performed at the level of the average 10 year old on Binet's test, then that child was classified as having a mental age of 10, regardless of the child's chronological age.

The concept of the Intelligence Quotient (IQ) was developed by Stern (1912), from Binet's notion of mental age. Stern defined IQ as mental age divided by chronological age, multiplied by 100. Previous to Stern's work, chronological age had been subtracted from mental age to provide a measure of mental alertness. Stern, on the other hand, showed that it was more appropriate to take the ratio of these two constructs, to provide a measure of the child's intellectual development that was independent of the child's age. He further proposed that this ratio should be multiplied by 100 for ease of interpretation, thus avoiding cumbersome decimals.

Binet's early tests were subsequently revised by Terman et al. (1917) to produce the famous Stanford-Binet IQ test. IQ tests were first used for selection by the American military during the First World War, when Yerkes (1921) tested 1.75 million soldiers with the Army Alpha and Army Beta tests. Thus by the end of the war, the assessment of general mental ability had not only firmly established its place within the discipline of academic psychology, but had also demonstrated its utility for aiding the selection process.
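Stern's ratio definition of IQ, described above, can be written as a simple formula (the worked figures below are illustrative, not drawn from the manual):

```latex
\mathrm{IQ} = \frac{\text{mental age}}{\text{chronological age}} \times 100
```

For example, a child with a mental age of 12 and a chronological age of 10 would obtain a ratio IQ of (12 / 10) × 100 = 120, while a child whose mental and chronological ages are equal obtains an IQ of exactly 100.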

1.3 THE CONCEPTS OF FLUID AND CRYSTALLISED INTELLIGENCE

The idea of general mental ability, or general intelligence, was first conceptualised by Spearman in 1904. He reflected on the popular notion that some people are more academically able than others, noting that people who tend to perform well in one intellectual domain (e.g. science) also tend to perform well in other domains (e.g. languages, mathematics, etc.). He concluded that an underlying factor, termed general intelligence or 'g', accounted for this tendency for people to perform well across a range of areas, while differences in a person's specific abilities or aptitudes accounted for their tendency to perform marginally better in one area than in another (e.g. to be marginally better at French than they are at Geography).

Spearman, in his 1904 paper, outlined the theoretical framework underpinning factor analysis: the statistical procedure that is used to identify the shared factor ('g') that accounts for a person's tendency to perform well (or badly) across a range of different tasks. Subsequent developments in the mathematics underpinning factor analysis, combined with advances in computing, meant that after the Second World War psychologists were able to begin exploring the structure of human mental abilities using these new statistical procedures.

Because Raymond B. Cattell (1967) is most famous for his work on personality, and in particular the development of the 16PF, his pioneering work on the structure of human intelligence has often been overlooked. Through an extensive research programme, Cattell and his colleagues identified that 'g' (general intelligence) could be decomposed into two highly correlated subtypes of mental ability, which he termed fluid and crystallised intelligence.

Fluid intelligence is reasoning ability in its most abstract and purest form. It is the ability to analyse novel problems, identify the patterns and relationships that underpin these problems, and extrapolate from these using logic. This ability is central to all logical problem solving and is crucial for solving scientific, technical and mathematical problems. Fluid intelligence tends to be relatively independent of a person's educational experience and has been shown to be strongly determined by genetic factors. Being the 'purest' form of intelligence, or 'innate mental ability', tests of fluid intelligence are often described as culture fair.

Crystallised intelligence, on the other hand, consists of fluid ability as it is evidenced in culturally valued activities. High levels of crystallised intelligence are evidenced in a person's good level of general knowledge, their extensive vocabulary, and their ability to reason using words and numbers. In short, crystallised intelligence is the product of cultural and educational experience in interaction with fluid intelligence. As such it is assessed by traditional tests of verbal and numerical reasoning ability, including critical reasoning tests.

2 THE ABSTRACT REASONING TEST (ART)

2.1 ITEM FORMAT

Crystallised and fluid intelligence are most readily evidenced, respectively, in tests of verbal/numerical and abstract reasoning ability (see Heim et al. 1974, for examples of such tests). Probably the most famous tests of fluid and crystallised intelligence that have been developed on British samples are the Raven's Progressive Matrices (RPM) – of which there are three versions (the Coloured Progressive Matrices, the Standard Progressive Matrices and the Advanced Progressive Matrices) – and the Mill Hill Vocabulary Scales (MHVS) – of which there are ten versions.

Being measures of crystallised intelligence, tests assessing verbal/numerical reasoning ability (including critical reasoning tests) are strongly influenced by the respondent's educational background. These tests require respondents to draw on their knowledge of words and their meanings, and their knowledge of mathematical operations, to perceive the logical relationships underpinning the tests' items. Tests of abstract reasoning ability, on the other hand, assess the ability to solve abstract, logical problems which require no prior knowledge or educational experience. As such, these tests are the least affected by the respondent's educational experience, assessing what is often considered to be the purest form of 'intelligence' – or 'innate reasoning ability' – namely fluid intelligence.

To this end, the ART presents respondents with a three-by-three matrix consisting of eight cells containing geometric patterns, the ninth and final cell in the matrix being left blank. Respondents have to deduce the logical rules that govern how the sequence of geometric designs progresses (both horizontally and vertically) across the cells of the matrix, and extrapolate from these rules the next design in the sequence. Requiring little use of language, and no general knowledge, matrix reasoning tests – in the form of the ART or RPM – provide the purest measure of fluid intelligence.

2.2 TEST CONSTRUCTION

Research has clearly demonstrated that in order to accurately assess reasoning ability it is necessary to use tests which have been specifically designed to measure the ability being assessed in the population on which the test is intended to be used. This ensures that the test is appropriate for the particular group being assessed. For example, a test designed for those of average ability will not accurately distinguish between people of high ability, as all the respondents' scores will cluster towards the top end of the scale. Similarly, a test designed for people of high ability will be of little practical use if given to people of average ability. Not only will the test fail to discriminate between respondents, with all their scores clustering towards the bottom of the scale, but as the questions will mostly be too difficult for the respondents to answer, they are likely to lose motivation, thereby further reducing their scores.

The ART was specifically designed to assess the fluid intelligence of technicians, people in scientific, engineering, financial and professional roles, as well as senior managers and those who have to take strategic business decisions. As such, the test items were developed for respondents of average, or above average, ability.

The initial item pool was trialled on students enrolled on technical training courses and undergraduate programmes, as well as on a sub-sample of respondents in full-time employment in a range of professional, managerial and technical occupations. Following extensive trialling, a subset of items that had high levels of internal consistency (corrected item-whole correlations of .4 or greater), and of graded difficulty, were selected for inclusion in the ART.
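The corrected item-whole correlation used as the selection criterion above is simply the Pearson correlation between scores on one item and the total score on the remaining items (the item itself being excluded from the total, hence "corrected"). A minimal sketch, using a small made-up score matrix for illustration (the function names and data are hypothetical, not taken from the ART trials):

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ssx * ssy)

def corrected_item_whole(scores, item):
    """Correlate one item with the total of all *other* items."""
    item_scores = [row[item] for row in scores]
    rest_totals = [sum(row) - row[item] for row in scores]
    return pearson_r(item_scores, rest_totals)

# Hypothetical data: 5 respondents x 3 items (illustrative only).
scores = [[1, 1, 1], [2, 2, 1], [3, 2, 2], [4, 3, 3], [5, 4, 3]]
r = corrected_item_whole(scores, 0)  # item 0 vs the sum of items 1 and 2
```

An item would pass the criterion described above if its corrected item-whole correlation is .4 or greater.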

The test consists of 35 matrix reasoning items, ordered by item difficulty, with the respondent having 30 minutes to complete the test.

3 THE PSYCHOMETRIC PROPERTIES OF THE ART

3.1 STANDARDISATION

Normative data allows us to compare an individual's score on a standardised scale against the typical score obtained from a clearly defined group of respondents (e.g. graduates, the general population, etc.).

To enable any respondent's score on the ART to be meaningfully interpreted, the test was standardised against a population similar to that on which it has been designed to be used (e.g. people in technical, managerial, professional and scientific roles). Such standardisation ensures that the scores obtained from the ART can be interpreted by relating them to a relevant distribution of scores.

3.2 RELIABILITY

The reliability of a test assesses the extent to which the variation in test scores is due to true differences between people on the characteristic being measured – in this case fluid intelligence – or to random measurement error.
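In classical test theory (an assumption-laden framework the manual draws on implicitly rather than stating), this definition is often written as the proportion of observed-score variance attributable to true-score variance:

```latex
r_{xx} = \frac{\sigma^2_{\text{true}}}{\sigma^2_{\text{true}} + \sigma^2_{\text{error}}}
```

A perfectly reliable test would have zero error variance and hence a reliability of 1; a test whose scores were pure measurement error would have a reliability of 0.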

Reliability is generally assessed using one of two different methods: one assesses the stability of the test's scores over time, the other assesses the internal consistency, or homogeneity, of the test's items.

3.2.1 Reliability: Temporal Stability

Also known as test-retest reliability, this method for assessing a test's reliability involves determining the extent to which a group of people obtain similar scores on the test when it is administered at two points in time. In the case of reasoning tests, where the ability being assessed does not change substantially over time (unlike personality), the two occasions when the test is administered may be many months apart. If the test were perfectly reliable – that is to say, test scores were not influenced by any random error – then respondents would obtain the same score on each occasion, as their level of intelligence would not have changed between the two points in time when they completed the test. In this way, the extent to which respondents' scores are unstable over time can be used to estimate the test's reliability.

Stability coefficients provide an important indicator of a test's likely usefulness. If these coefficients are low, then this suggests that the test is not a reliable measure and is therefore of little practical use for assessment and selection purposes.

3.2.2 Reliability: Internal Consistency

Also known as item homogeneity, this method for assessing a test's reliability involves determining the extent to which people who score well on one item also score well on the other test items. If each of the test's items were a perfect measure of intelligence – that is to say, the score the person obtained on the items was not influenced by any random error – then the only factor that would determine whether a person was able to answer each item correctly would be the item's difficulty. As a result, each person would be expected to answer all the easier test items correctly, up until the point at which the items became too difficult for them to answer. In this way, the extent to which respondents' scores on each item are correlated with their scores on the other test items can be used to estimate the test's reliability.

The most commonly used internal consistency measure of reliability is Cronbach's (1960) alpha coefficient. If the items on a scale have high inter-correlations with each other, then the test is said to have a high level of internal consistency (reliability) and the alpha coefficient will be high. Thus a high alpha coefficient indicates that the test's items are all measuring the same thing, and are not greatly influenced by random measurement error. A low alpha coefficient, on the other hand, suggests either that the scale's items are measuring different attributes, or that the test's scores are affected by significant random error. If the alpha coefficient is low, this indicates that the test is not a reliable measure, and is therefore of little practical use for assessment and selection purposes.
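Coefficient alpha can be computed directly from the item variances and the variance of respondents' total scores. A minimal sketch, using a small made-up score matrix (illustrative only, not real ART data):

```python
from statistics import variance  # sample variance (n - 1 denominator)

def cronbach_alpha(scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).

    `scores` is a list of rows, one per respondent, each row holding
    that respondent's score on every item."""
    k = len(scores[0])                                  # number of items
    items = list(zip(*scores))                          # one column per item
    item_var_sum = sum(variance(col) for col in items)  # sum of item variances
    total_var = variance([sum(row) for row in scores])  # variance of total scores
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical data: 5 respondents x 3 items.
scores = [[1, 1, 1], [2, 2, 1], [3, 2, 2], [4, 3, 3], [5, 4, 3]]
alpha = cronbach_alpha(scores)
```

When the items co-vary strongly, the variance of the totals is much larger than the sum of the item variances, and alpha approaches 1; uncorrelated items drive it towards 0.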

3.3 VALIDITY

The fact that a test is reliable only means that the test is consistently measuring a construct; it does not indicate what construct the test is consistently measuring. The concept of validity addresses this issue. As Kline (1993) notes, 'a test is said to be valid if it measures what it claims to measure'.

An important point to note is that a test's reliability sets an upper bound for its validity. That is to say, a test cannot be more valid than it is reliable, because if it is not consistently measuring a construct it cannot be consistently measuring the construct it was developed to assess. Therefore, when evaluating the psychometric properties of a test, its reliability is usually assessed before addressing the question of its validity.

There are two principal ways in which a test can be said to be valid.

3.3.1 Validity: Construct Validity

Construct validity assesses whether the characteristic which a test is measuring is psychologically meaningful and consistent with how that construct is defined. Typically, the construct validity of a test is assessed by demonstrating that the test's results correlate with other major tests which measure similar constructs, and do not correlate with tests that measure different constructs. (This is sometimes referred to as a test's convergent and discriminant validity.) Thus demonstrating that a test which measures fluid intelligence is more strongly correlated with an alternative measure of fluid intelligence than it is with a measure of crystallised intelligence would be evidence of the measure's construct validity.

3.3.2 Validity: Criterion Validity

This method for assessing the validity of a test involves demonstrating that the test meaningfully predicts some real-world criterion. For example, a valid test of fluid intelligence would be expected to predict academic performance, particularly in science and mathematics.

Moreover, there are two types of criterion validity: predictive validity and concurrent validity. Predictive validity assesses whether a test is capable of predicting an agreed criterion which will become available at some future time (e.g. can a test of fluid intelligence predict future GCSE maths results?). Concurrent validity assesses whether the scores on a test can be used to predict a criterion which is available at the time the test is completed (e.g. can a test of fluid intelligence predict a scientist's current publication record?).

3.4 ART: STANDARDISATION AND BIAS

3.4.1 Standardisation Sample

The ART was standardised on a sample of 651 adults of working age, drawn from a variety of managerial, professional and graduate occupations. The mean age of the standardisation sample was 30.4 years, with 32% of the sample being women. 24% of the sample identified themselves as being of non-white (European) ethnic origin. Of these respondents, 22% identified themselves as being of Black African ethnic origin, 12% of Indian origin, 9% of Black Caribbean origin and 8% of Pakistani origin.

3.4.2 Gender and Age Effects

Gender differences on the ART were examined by comparing the scores that a group of male and a group of female graduate-level bankers obtained on the ART. These data are presented in Table 1 and clearly indicate no significant sex difference in ART scores, suggesting that this test is not sex biased.

3.5 ART: RELIABILITY

3.5.1 Reliability: Internal Consistency

Table 2 presents alpha coefficients for the ART on a number of different samples. Inspection of this table indicates that all these coefficients are above .8, indicating that the ART has good levels of internal consistency reliability.

3.5.2 Reliability: Test-retest

As noted above, test-retest reliability estimates the test's reliability by assessing the temporal stability of the test's scores. As such, test-retest reliability provides an alternative measure of reliability to internal consistency estimates of reliability, such as the alpha coefficient. Theoretically, test-retest and internal consistency estimates of a test's reliability should be equivalent to each other. The only difference between the test-retest reliability statistic and the alpha coefficient is that the latter provides an estimate of the lower bound of the test's reliability and, as such, the alpha coefficient for any given test is usually lower than the test-retest reliability statistic.

Males          Females        t-value   df    p
21.31 (4.35)   21.15 (4.56)   0.22      207   0.82

Table 1. Mean and standard deviation (in parentheses) of ART scores for male (n=146) and female (n=63) bankers, and the associated t statistic
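The t statistic in Table 1 can be approximately reproduced from the reported summary statistics. The sketch below assumes a pooled-variance independent-samples t-test (the manual does not state which variant was used); small discrepancies from the published value arise because the reported means and standard deviations are rounded:

```python
from math import sqrt

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t statistic from summary statistics,
    assuming equal population variances (pooled estimate)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

# Summary values from Table 1: male and female bankers.
t, df = pooled_t(21.31, 4.35, 146, 21.15, 4.56, 63)
```

The recomputed value (t ≈ 0.24 on 207 degrees of freedom) agrees with the reported t = 0.22, p = .82 to within rounding, confirming that the sex difference is nowhere near significant.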

Sample                   Sample size   Alpha coefficient
Graduate-level bankers   n=209         .81
Retail Managers          n=105         .81
Undergraduates           n=176         .85

Table 2. Alpha coefficients for different samples on the ART

3.6 ART: VALIDITY

3.6.1 The Relationship Between the ART and the RAPM

The Raven's Advanced Progressive Matrices (RAPM) is one of the most well respected, and best validated, measures of fluid intelligence. A sample of 32 undergraduates completed the ART and the RAPM for research purposes. The correlation between these two measures was substantial and highly significant (r=.77, p<.001). This large correlation therefore provides strong support for the construct validity of the ART, indicating that these two measures are assessing related constructs.

3.6.2 The Relationship Between the ART and the GRT1 Numerical Reasoning Test

The GRT1 numerical reasoning test is a measure of crystallised intelligence that has been specifically designed for graduate populations. 209 graduate-level bankers completed this test along with the ART. These two tests were found to be significantly correlated (r=.49, p<.001) with each other, providing good support for the construct validity of the ART.

3.6.3 The Relationship Between the ART and the CRTB

The Critical Reasoning Test Battery (CRTB) assesses verbal and numerical critical reasoning: the ability to make logical inferences on the basis of, respectively, written text or numerical data presented in tables and graphs. Both verbal and numerical critical reasoning ability are different types of crystallised intelligence, and would therefore be expected to be modestly correlated with the ART, which measures fluid intelligence.

A sample of 272 applicants for managerial positions with a large multinational company completed the CRTB and the ART as part of a selection process. The ART was found to correlate significantly with both the Verbal (r=.32, p<.001) and Numerical (r=.38, p<.001) Critical Reasoning tests, providing support for the construct validity of the ART.

3.6.4 The Relationship Between the ART and Learning Style

A sample of 143 undergraduates completed the ART and the Learning Styles Inventory (LSI) – a measure of learning style. As would be predicted, fluid intelligence was found to be modestly correlated with a more abstract rather than a more concrete learning style (r=.29, p<.001), and with a more holistic rather than a more serial learning style (r=.23, p<.001) – that is, one focussing on the 'big picture' rather than fine details – providing further support for the construct validity of the ART.

3.6.5 The Relationship Between the ART and Intellectance

Intellectance is a meta-cognitive variable that assesses a person's perception of their general mental ability. While it is a personality factor, rather than an ability factor, intellectance has nonetheless consistently been found to correlate with objective assessments of mental ability. As such, it would be expected to be modestly correlated with fluid intelligence. A sample of 132 applicants for managerial and professional posts completed the 15FQ+ and the ART as part of an assessment process. The ART was found to be correlated (r=.32, p<.001) with the 15FQ+ factor Intellectance (β), providing further support for the construct validity of the ART.

3.6.6 Mean Differences in ART Scores by Occupational Group

Table 3 presents the mean scores (and the associated 95% confidence intervals) on the ART obtained by three occupational groups of differing ability levels: retail managers, bankers and graduate engineers. Given the nature of their work, engineers would be expected to have the highest level of fluid intelligence, followed by bankers, then retail managers. Inspection of Table 3 indicates that these means conform to this pattern, with the 95% confidence intervals indicating that these differences are unlikely to be due to chance effects. These data therefore provide further support for the construct validity of the ART.

Occupational Group    Sample Size   Mean and 95% Confidence Interval
Graduate Engineers    n=40          24.03 ±3.3
Graduate Bankers      n=209         21.20 ±1.2
Retail Managers       n=105         15.2 ±2.2

Table 3. Mean ART scores for each occupational group and associated 95% confidence intervals

APPENDIX – ADMINISTRATION INSTRUCTIONS

BEFORE STARTING THE QUESTIONNAIRE

Put candidates at their ease by giving information about: yourself; the purpose of the test; the timetable for the day; whether or not the questionnaire is being completed as part of a wider assessment programme; how the results will be used; and who will have access to them. Ensure that you and the other administrators have asked for all mobile phones to be switched off, etc.

The instructions below should be read out verbatim, and the same script should be followed each time the ART is administered to one or more candidates. Instructions for the administrator are printed in ordinary type. Instructions designed to be read aloud to candidates have lines marked above and below them, are in bold, and are enclosed by speech marks.

If this is the first or only questionnaire being administered, give an introduction as per, or similar to, the following example:

“From now on, please do not talk amongst yourselves, but ask me if anything is not clear. Please ensure that any mobile telephones, pagers or other potential distractions are switched off completely. We shall be doing the Abstract Reasoning Test, which is timed for 30 minutes. During the test I shall be checking to make sure you are not making any accidental mistakes when filling in the answer sheet. I will not be checking your responses to see if you are answering correctly or not.”

WARNING: It is most important that answer sheets do not go astray. They should be counted out at the beginning of the test and counted in again at the end.

DISTRIBUTE THE ANSWER SHEETS

Then ask:

“Has everyone got two sharp pencils, an eraser, some rough paper and an answer sheet?”

Rectify any omissions, then say:

“Print your last name, first name clearly on the lines provided. Indicate your preferred title by checking the title box, then note your gender and age. Please insert today's date, which is [ ], in the Date of Testing box.”

If biographical information is required, ask respondents to complete the biodata section. If answer sheets are to be scanned, explain and demonstrate how the ovals are to be completed, emphasising the importance of fully blackening the oval.

Walk around the room to check that the instructions are being followed.

WARNING: It is vitally important that test booklets do not go astray. They should be counted out at the beginning of the session and counted in again at the end.

DISTRIBUTE THE BOOKLETS WITH THE INSTRUCTION:

“Please do not open the booklet until instructed to do so.”

Remembering to read slowly and clearly, go to the front of the group and say:

“Please open the booklet at Page 2 and follow the instructions for this test as I read them aloud.”

Pause to allow booklets to be opened.

“This is a test of your ability to perceive the relationships between abstract shapes and figures. The test consists of 35 questions and you will have 30 minutes in which to attempt them.

Before you start the timed test, there are two example questions on Page 3 to show you how to complete the test. For each question you will be presented with a 3 by 3 array of boxes, the last of which is empty. Your task is to select which of the six possible answers presented below fits vertically and horizontally to complete the sequence. One and only one answer is correct in each case.”


Check for understanding of the instructions so far, then say:

“You now have a chance to complete the example questions in order to make sure that you understand the test.

Please attempt the example questions now.”

While the candidates are doing the examples, walk around the room to check that everyone is clear about how to fill in the answer sheet. Make sure that nobody looks through the actual test items after completing the example questions. When everyone has finished the example questions (allow a maximum of one and a half minutes), give the answers as follows:

“The answer to Example 1 is number 3 and the answer to Example 2 is also number 3, as only these answers complete the sequence.”

Then say:

“Before you begin the timed test, please note the following points:

• Attempt each question in turn. If you are unsure of an answer, mark your best choice but avoid wild guessing.

• If you wish to change an answer, simply erase it fully before marking your new choice of answer.

• Time is short, so do not worry if you do not finish the test in the available time.

• Work quickly, but not so quickly as to make careless mistakes.

• You have 35 questions and 30 minutes in which to complete them.

If you reach the End of Test before time is called, you may review your answers if you wish.”

Then say very clearly:

“Is everybody clear about how to do this test?”

Deal with any questions appropriately, then, starting a stop-watch or setting a count-down timer on the word BEGIN, say:

“Please turn over the page and begin.”

Answer only questions relating to procedure at this stage, but enter in the Administrator's Test Record any other problems which occur. Walk around the room at appropriate intervals to check for potential problems.

At the end of the 30 minutes say clearly:

“STOP NOW please, and close your booklet.”

You should intervene if candidates continue beyond this point.

COLLECT ANSWER SHEETS AND TEST BOOKLETS, ENSURING THAT ALL MATERIALS ARE RETURNED (COUNT BOOKLETS AND ANSWER SHEETS)

Then say:

“Thank you for completing the Abstract Reasoning Test.”

REFERENCES

Binet, A. (1910). Les idées modernes sur les enfants. Paris: E. Flammarion.

Cattell, R. B. (1967). The theory of fluid and crystallised intelligence. British Journal of Educational Psychology, 37, 209-224.

Cronbach, L.J. (1960). Essentials of Psychological Testing (2nd Edition). New York: Harper.

Galton, F. (1869). Hereditary Genius. London: MacMillan.

Gould, S.J. (1981). The Mismeasure of Man. Harmondsworth, Middlesex: Pelican.

Heim, A.H., Watt, K.P. and Simmonds, V. (1974). AH2/AH3 Group Tests of General Reasoning: Manual. Windsor: NFER Nelson.

Kline, P. (1993). Personality: The Psychometric View. London: Routledge.

Schmidt, F.L., & Hunter, J.E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262-274.

Spearman, C. (1904). 'General intelligence', objectively determined and measured. American Journal of Psychology, 15, 210-292.

Stern, W. (1912). Psychologische Methoden der Intelligenz-Prüfung. Leipzig, Germany: Barth.

Terman, L.M. et al. (1917). The Stanford Revision of the Binet-Simon Scale for Measuring Intelligence. Baltimore: Warwick and York.

Yerkes, R.M. (1921). Psychological examining in the United States Army. Memoirs of the National Academy of Sciences, 15.