
Running Head: ASSESSMENT EVALUATION 1

Evaluation of Assessments Commonly Used for Identification of the Gifted

Nicolette Edenburn

Oklahoma State University

Introduction

Tulsa Public Schools has a large and diverse urban population. It has over 40,000

students spread out across 80 schools, and comprises many ethnicities: 28.1% of the student

population is Caucasian, 27.9% is Hispanic, 27.8% is African American, 6.94% is American

Indian, and 1.33% is Asian (Tulsa Public Schools, Fast Facts, 2014). Due to its size and

diversity, it is essential that the district have a detailed and comprehensive gifted education

plan, which it does. The introduction of the plan includes the district’s goal for gifted education,

which is closely tied to the state’s definition of gifted:

An important goal of Tulsa Public Schools is to identify and provide appropriate

educational experiences for students who give evidence of high performance

capability in areas such as intellectual ability, creative, artistic, or leadership

capacity, or in specific academic areas, and who require opportunities or

experiences not ordinarily provided by the school criteria in order to fully develop

such capabilities (Tulsa Public Schools, Gifted and Talented Education Plan,

2012).

Section four of the plan delineates that the school district “shall identify and serve

students who score at or above the 97th percentile rank on a national standardized test of

intellectual ability” and the standard error of measure shall be included (Gifted and Talented

Education Plan, 2012, p. 6). The plan goes on in section eight to explore nuances in

identification. For low-income, rural, African American, Hispanic, American Indian, and other

ethnic populations, section eight calls for nonverbal tests of intelligence. It also suggests

tests that cover spatial, perceptual, and mechanical reasoning skills. In addition, the

district calls for “informal measures” (p. 23) that are approved by the district to guide


identification. Several of the subgroup sections also call for a representative proportion of that

population to be identified. Female students are allowed to use test scores from previous years.

For physically impaired, visually impaired, hearing impaired, and learning disabled students, the

plan also calls for nonverbal tests or tests that can be modified for physical or visual impairments,

and it allows informal measures to be used. The learning disabled subcategory states that

creativity measures can also be implemented. For learning disabled and underachieving students,

the plan encourages early identification. Several subgroup sections state that parent and

teacher input is encouraged. Section eight closes with this statement: “Each site will strive for,

but not force identification of a representative proportion of the population and allow potential

for performance to qualify a child for screening” (Gifted and Talented Education Plan, 2012, p. 25). In order to

show whether or not the district follows these guidelines, the tests used for identification will be

analyzed, as well as other identification measures.

Nonverbal Assessments

Tulsa Public Schools uses the Naglieri Nonverbal Ability Test, Second Edition (NNAT2),

a nonverbal test of general ability. According to Johnsen (2011), the test requires an intermediate

level of competency to administer, and at least one graduate-level class is suggested prior to

administration. Johnsen lists an internal consistency reliability of 0.83-0.92, test-retest reliability of

0.70-0.78, and interscorer reliability of 0.99, all of which are at or above the acceptable standards

listed in that guide. Content validity is reported, along with a criterion-related validity of

0.51-0.74, a moderate to strong correlation for predicting performance. The product

overview guide for the NNAT2 (2010) states that 57,000 students ages 5 to 17 were sampled

when norming the seven levels of the test, although Johnsen states over 63,000 were used for

standardization. The test has a paper-pencil form, but it is commonly taken online. The test takes


thirty minutes and lists a myriad of educational professionals who may be able to administer the

test. The standard error of measure is different based on age, but in most cases it is 6. Some of

the things the product overview emphasizes are the following: “features culturally neutral items

without reliance on verbal ability,” “expedited administration and instruction,” “identifying

gifted and talented in specialized peer groups,” and “a total assessment solution” (p. 2).

This instrument is a good tool to use for gifted identification, as the district plan states

nonverbal measurements should be used for almost every ethnic and underrepresented group of

students in gifted education. While Johnsen recommends a graduate course, the product

overview shows that almost any educator who could be in charge of testing students would be

able to do so. The reliability and validity scores are acceptable and strong in some cases. This

measurement only takes thirty minutes to administer, so the only issue would be whether or not it

should be the sole identification instrument used to qualify a student. A quality test environment

would be necessary, since those thirty minutes determine a student’s general ability score. However, it could be

used quickly in identifying K-12 students for the gifted program.

There is another nonverbal assessment called the Test of Nonverbal Intelligence Fourth

Edition (TONI-4). It also has the intermediate level of administration according to Johnsen. It

had a much smaller reported sample size of 2,272. Its internal consistency reliability is 0.93-0.97,

considerably higher than the NNAT2, and its test-retest reliability is 0.76-0.92, also much higher than

the NNAT2. Its interscorer reliability is the same at 0.99. The criterion validity is 0.70-0.77,

much higher than the NNAT2 as well. While the NNAT2 has seven levels based on age or grade,

the TONI-4 is one test but evaluated differently based on the number of questions a student is

able to complete. It consists of 60 questions, with students aged ten and over starting at question 20. The

administrator stops the test when fewer than 3 of 5 consecutive questions are answered

correctly and then calculates the score. The standard error of measure for all ages is 3. The

scoring guide also has a place where environmental and examinee notes can be given.
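
The starting point and stopping rule described above can be sketched in code. This is a minimal illustration only. It assumes that the rule refers to the five most recently administered items, and the function name and the simple count of correct answers are hypothetical; the sketch does not reproduce the published TONI-4 scoring procedure or its norm tables.

```python
def apply_stopping_rule(answers, age, window=5, min_correct=3):
    """Walk through item results and stop at the ceiling.

    answers: list of booleans for items 1..60 (True = answered correctly).
    Returns (last_item_administered, correct_among_administered).
    """
    # Per the description above, examinees aged ten and over begin at item 20.
    start_item = 20 if age >= 10 else 1
    administered = []
    for item in range(start_item, len(answers) + 1):
        administered.append(answers[item - 1])
        recent = administered[-window:]
        # Stop once a full window of consecutive items contains fewer than
        # min_correct correct responses.
        if len(recent) == window and sum(recent) < min_correct:
            return item, sum(administered)
    # The examinee reached the final item without hitting the ceiling.
    return len(answers), sum(administered)


# Hypothetical examinee: correct on items 1-30, incorrect afterwards.
results = [True] * 30 + [False] * 30
print(apply_stopping_rule(results, age=12))
print(apply_stopping_rule(results, age=8))
```

Because the administrator has to track the starting item and the running window by hand, a sketch like this shows why training on administration matters for this instrument.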

While the NNAT2 is a valid instrument, the TONI-4 has better validity and reliability.

However, this is based on a much smaller sample size. Although its items are nonverbal picture puzzles, the

TONI-4 relies more on the administrator to know where and when to start and end the

test, so training on administration and scoring would be necessary. It makes sense that Tulsa

Public Schools would integrate this assessment as well. Since nonverbal assessments are so

integral within the TPS identification section, it would make sense to have more than one

assessment readily available for use. The NNAT2 is administrator-friendly, but the TONI-4

would be a valid tool to use if multiple assessments are necessary for identification.

OLSAT and CogAT

Tulsa Public Schools also uses the Otis-Lennon School Ability Test, Eighth Edition

(OLSAT) when identifying intellectually gifted students. It integrates verbal and nonverbal

questions and has multiple sections. As of now, Tulsa gives paper-pencil tests to Kindergarten

through 2nd grades and online versions to 3rd grade and up which require less administrator

guidance. The test was normed on a sample of 445,500, according to Johnsen (2011), and its internal

consistency reliability is 0.80-0.90 for its subtests and 0.90-0.94 for its composite. Its test-retest

and interscorer reliability were not recorded. It had criterion validity of 0.46-0.78 for its subtests

and 0.45-0.77 for its composite. It has the minimum requirement for administration, meaning

anyone able to administer a test and read the testing manual is qualified to administer, since the

test is scored by the company or someone qualified within the district. There are seven levels to

the test determined by age ranges in Kindergarten-12th grades. The test was first published in

1918, according to its technical manual (2003). The test is highly research-based, and the


technical manual states that efforts were made to eliminate bias and questions were eliminated if

statistically proven to be biased based on cultural or gender group tests. A standard error of

measure of six is added to all tests.
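
How a standard error of measure interacts with the district’s 97th-percentile criterion can be illustrated with a short sketch. It assumes, hypothetically, that scores follow a normal distribution with a mean of 100 and a standard deviation of 16, and that the standard error is simply added to the obtained score before the percentile is checked. Actual percentile ranks come from each publisher’s norm tables, so the function name and its default values are illustrative only.

```python
from statistics import NormalDist


def meets_cutoff(score, sem, cutoff_percentile=97.0, mean=100.0, sd=16.0):
    """Return True if score + SEM reaches the cutoff percentile.

    mean and sd are assumed scale parameters, not values from a manual.
    """
    adjusted = score + sem
    percentile = NormalDist(mean, sd).cdf(adjusted) * 100
    return percentile >= cutoff_percentile


# Hypothetical students with obtained scores of 125 and 120 and an SEM of 6.
print(meets_cutoff(125, sem=6))
print(meets_cutoff(120, sem=6))
# The same obtained score of 125 without any SEM adjustment.
print(meets_cutoff(125, sem=0))
```

Under these assumptions, a larger standard error widens the band of obtained scores that can qualify, which is one reason the size of the standard error matters when comparing instruments.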

While the OLSAT claims that it has done tests for bias, many of its questions are biased

towards children who have had more worldly experiences. For example, “Johnny is going

camping and is looking forward to catching his lunch. What is he going to do?” Children who have not

been exposed to fishing, or to that phrase for it, may pick the picnic basket. The test also has not had

an update since 2003, so many questions are older than the students taking the test. In the

technological world our students live in, many of the questions are outdated. This test is good for

giving a more comprehensive look at a child, since you get a verbal and a nonverbal score. I

would use this as a tool in my district, but I would not use this test much with children of poverty

or with students who rely heavily on ELL instruction.

In my district, the Cognitive Abilities Test (CogAT) is also used. The CogAT consists of

three tests, with three subtests per test. Students receive a verbal, nonverbal, and a quantitative

score. Therefore, it is an even more comprehensive exam than the OLSAT. It also has the

minimum level of administration requirements, so anyone who can read the administration guide

could give the test. Johnsen (2011) states that it was standardized with a sample group of

180,538 students. There are 11 levels of the test for Kindergarten-grade 12. Its subtest internal

consistency reliability is 0.856-0.963, and the composite is 0.919-0.982. Its test-retest reliability

composite score is 0.69-0.87. Interscorer reliability was not recorded, and its criterion validity is

0.24-0.88. Validity for both the OLSAT and the CogAT varies because, by their nature, the

questions may or may not relate to students’ grades or performance in math or reading.

A standard error of only 3 is added to a student’s composite score, no matter the


age. Its administration guide is from 2012, so the questions are a little less outdated than the

OLSAT. The test comes in Spanish and English, which is helpful for English Language

Learners, but only if they are Spanish speakers. The test is also long, at three hours, so it needs to be

given in three different sessions. It is also an expensive test if the district chooses to use

machine-scored booklets and purchase individualized results.

Because of the individualized reporting and the teaching suggestions provided for each child’s

combination of stanines, this would be a good choice of test for a district to use, especially if

it found ways to save money (for instance, by creating its own home reports). This would only be

recommended if the district chose certain grade level(s) in which all students take the test, since it is so

time-consuming. It would also give the teacher a better cognitive picture of the students. Since it is so

comprehensive, should a child be extremely high in one subtest and low in another, that would

be a signal the special education department could use to pursue testing for learning disabilities.

The CogAT and OLSAT would both be good comprehensive tests for a district to use,

but more emphasis should probably be placed on the CogAT. The CogAT would best be used if

testing an entire grade level. Since early identification is important, a district would need to

decide which grade level would be best to test all at once while also being cognizant of the time it

takes to complete the test. The OLSAT would be good to use when a student shows potential but it is

not a year in which the district is blanket testing with the CogAT.

Wechsler Intelligence Scale for Children (4th ed.)

The Wechsler Intelligence Scale for Children (4th ed.) (WISC-IV) is a full IQ test given

by a highly trained school psychologist or outside doctor/psychometrist trained in giving and

interpreting the tests. This is a well-respected but expensive test, so it is not used often for gifted

identification in the public schools. Parents may have this test done on their own at their own


cost. There is a battery of ten core subtests, and then there are five supplemental subtests that can

be given. Students receive a verbal comprehension index, perceptual reasoning index, working

memory index, processing speed index, and a full scale IQ. This test can also be used to check

for learning disabilities and mental retardation, which is how it is normally used in the public

schools. The composite scores are determined by the ten core subtests, and discrepancies are

analyzed by the administrator through the supplemental subtests. Other information is also

gleaned from the supplemental tests. While only 2,200 students were sampled, many factors

were considered to avoid bias. Johnsen (2011) says the test has 0.65-0.92 internal consistency

reliability in its subtests and 0.96-0.97 for its composite. It has 0.76-0.92 test-retest reliability on

its subtests, and 0.93 for the composite. It also has 0.95-0.98 for its interscorer reliability,

probably due to the extensive training required of examiners. Its reliability is quite high. Its criterion

validity is varied, though, with a range of 0.10-0.80. The ten core subtests include the following:

block design, similarities, digit span, picture concepts, coding, vocabulary, letter-number

sequencing, matrix reasoning, comprehension, and symbol search. The five supplemental

subtests include the following: picture completion, cancellation, information, arithmetic, and

word reasoning.

Since this is a lengthy, expensive test, and since few teachers are trained to give it, this is

probably not a realistic measure of gifted identification. If teachers could work with special

education teachers, though, perhaps those with relatively high IQ scores could be screened for

gifted services using a different academic ability test. Since twice exceptional students often mask their

disabilities and their giftedness, a WISC-IV score could lead us towards discovering both.


Screening Assessment for Gifted Elementary and Middle School Students (2nd ed.)

The Screening Assessment for Gifted Elementary and Middle School Students (2nd ed.)

(SAGES-2) is a different kind of test. Like the CogAT, it integrates three subtests:

Mathematics/Science, Language Arts/Social Studies, and Reasoning. One version is given to

kindergarten through third graders and a different version to fourth through eighth graders, and

scores are based on how many items students answered correctly at their age level. Quotients are given

for normal samples, and then different quotients are given for gifted samples. Based on their

profile of three scores, a gifted coordinator would determine if they qualify for gifted services.

According to Johnsen (2011), the tests, published in 2001, were standardized on a sample of

5,313 students. In Kindergarten-3rd grade, it has internal consistency reliability for normal

students of 0.77-0.93 and 0.88-0.94 for its gifted sample. It has an internal consistency reliability

of 0.88-0.96 for its normal fourth-eighth grade sample and 0.82-0.93 for the gifted sample. There

is no composite score. It has test-retest reliability of 0.95-0.97 for Kindergarten-third grade, and

fourth-eighth grade has 0.78-0.92 test-retest reliability. K-3 interscorer reliability is 0.92-0.99,

and 4-8 interscorer reliability is 0.91-0.97. All of these are moderate to strong correlations. Its

criterion validity ranges from not significant (NS) to 0.89. Administrators must have a moderate level of familiarity with the

test in order to give it, so it should only be given by gifted coordinators or whoever is in charge

of gifted identification.

A strength of this test is that its questions are built on national standards. However, many

of the standards used are cited from the late 1980s to early 1990s, so it is a little outdated

compared with new standards. The teacher is allowed to read each question and repeat it once,

which helps the younger students, but it would not be helpful for those who benefit the most

from nonverbal tests. Should a third edition of this test come out, it might be worth looking into.


It could be a way to identify high-achieving classroom students who have trouble with qualifying tests

that don’t necessarily show their strength. This also makes sense, because it has a section where

other tests given can be logged and given a SAGES-2 equivalent score, much like Tulsa’s

multicriteria G3 form. It also would be a good screening tool if necessary.

Torrance Tests of Creative Thinking

All of the tests up to now have been measures of intelligence or academic ability.

However, the definition of giftedness also includes creativity, leadership, and the arts. The

Torrance Test of Creative Thinking is one of the few tests that can be given in those categories.

Johnsen (2011) shows it was standardized with a sample of 94,796, with most of those people

taking the figural test. It has an internal consistency reliability of 0.89-0.94 on both the verbal

and figural subtests, and it has an interscorer reliability of 0.95-0.99. Only people trained in

giving this test should administer it, which could be difficult for a district, especially a small one.

Its criterion validity is 0.04-0.70, which is quite a spread. However, when measuring creativity,

one would want a certain level of spread in the subjects taking it. The figural response booklet

(2006) has three activities in it where students use the figures provided and create from those,

often giving titles to their work. It appears the activities have limits on the time provided. Part of

what makes this difficult to administer is that the administrator must know how to score the

responses. The seven verbal activities include writing as many questions as possible about a

picture, guessing causes about a picture, guessing consequences about the same picture, giving

product improvement ideas for a stuffed animal, giving unusual uses for tin cans or cardboard

boxes, and giving ideas about what would happen if a supposedly improbable event took place.

Fluency, flexibility, and originality are assessed in each of these timed measures. High scores in

each would lead to a possible identification of creativity.


Observation Inventories

There are three observation scales or inventories that the author examined, which could

be used either to screen possible candidates for testing or to add points to a child’s test scores if

they are in a minority group and could have experienced bias with their testing. The Gifted and

Talented Evaluation Scales (GATES) summary and response form is relatively inexpensive.

Someone familiar with the students answers ten questions on a scale of 1-9 for the following five

categories: intellectual ability, academic skills, creativity, leadership, and artistic talent. A score

of 1-3 is below, 4-6 is average, and 7-9 is above. Each category is then totaled. Scores of 90-110

show probability of giftedness, 111-120 show high probability, and 121 and up show extreme

probability. Since the categories related to the definition of giftedness are separated, this would

help a gifted coordinator who was unfamiliar with the student know which test or type of

identification to pursue. Teachers filling the form out would need training on how to

appropriately fill the document out to avoid differences in scoring.
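
The rating labels and score bands described above can be expressed as a small lookup, which may help raters apply them consistently. This is a sketch under stated assumptions: the function names are hypothetical, the middle band is assumed to run from 111 through 120 so that the bands are contiguous, and the label returned for scores below 90 is a placeholder that does not come from the GATES form.

```python
def describe_rating(rating):
    """Label a single 1-9 item rating: 1-3 below, 4-6 average, 7-9 above."""
    if not 1 <= rating <= 9:
        raise ValueError("rating must be between 1 and 9")
    if rating <= 3:
        return "below"
    if rating <= 6:
        return "average"
    return "above"


def describe_category_score(score):
    """Map a category score to the probability band described in the text."""
    if score >= 121:
        return "extreme probability"
    if score >= 111:  # assumed upper bound of 120 for this band
        return "high probability"
    if score >= 90:
        return "probability"
    return "not indicated"  # placeholder label, not taken from the form


# Hypothetical category scores for one student.
scores = {"intellectual ability": 124, "creativity": 112, "leadership": 85}
for category, score in scores.items():
    print(category, "->", describe_category_score(score))
```

Keeping the five categories separate, as in the dictionary above, mirrors the way the form lets a coordinator see which area of giftedness to pursue.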

The Kingore Observation Inventory (2nd ed.) is given for the entire class within one form

to avoid having to fill out individual forms. Its goal is to recognize which students in the

classroom need more differentiation. Teachers would need training on how to

make these observations, and they also need to create an anecdotal journal to accompany their

observations. The categories of giftedness examined include the following: advanced language,

analytical thinking, meaning motivation, perspective, sense of humor, sensitivity, and accelerated

learning. There is also a list of negative characteristics of gifted students and which of those

aforementioned categories they tie in with. The teacher looks for these attributes and writes

anecdotes on a note card for a child in their folder. Those who consistently show the gifted

characteristics might be tested for gifted identification. Forcing teachers to continuously observe


for gifted characteristics should increase awareness and mindfulness of the gifted and their

needs. It also forces them to look at all students so that subgroups that tend to be identified less

often than others are not overlooked. If a school and teachers are on board with this extra

work, this could be a helpful measure to use for screening of the gifted.

Finally, the Slocumb-Payne Teacher Perception Inventory is used when identifying

students from backgrounds that are often overlooked in gifted identification. A teacher fills out

the form and circles 1-4, with 1 being seldom or never and 4 being almost always, for a set of

questions. The questions/observations show opposite scenarios on each side of the page in order

to make sure responses are consistent and thought-out. This allows gifted coordinators to screen

for gifted students who may be falling through the cracks due to their backgrounds. Slocumb and

Payne (2000) also have an environmental opportunities profile. Someone who has a relationship

with the caregiver of a potentially gifted student from a diverse background would interview that

person. The questions are highly personal and specific, so confidentiality would be necessary and

a relationship would need to be established so the caregiver knew there was no judgment. A

student receives 1-3 points, with more points given to scenarios where the environment of the

child is more diverse or challenged than other gifted counterparts. This could be used by a

district if a student from a diverse background came close to qualifying. The district could decide

which totals of points could add to their identification test scores. Since giftedness is a

combination of nature and nurture, points added due to financial or familial hardships could get a

student who is naturally gifted into a program where their academic needs could be nurtured and

improved even more.


Summary of Findings

Based on the advanced site plan of Tulsa Public Schools, it is the opinion of the

author that the district is on the right track. Since so many subgroups are listed as needing

nonverbal test options, it seems that more nonverbal tests should be offered. The TONI-4 could

be an option, but training would need to be done to ensure the administrators are giving it

correctly. Those not needing nonverbal options have an OLSAT and CogAT option, but there

aren’t two solely nonverbal options given. OLSAT testing should be examined due to the

outdated nature of some of the questions when compared with today’s generation. However,

the CogAT needs to be available beyond just 2nd grade if OLSAT testing becomes less prevalent.

Coordinators should be working more closely with special education teachers to find out scores

on the WISC-IV tests that are administered in order to try and catch twice exceptional students.

Johnsen (2011) recommends multiple assessments, so various assessments should be accessible

within cost limitations. There also need to be more coordinators at Tulsa Public Schools able to

administer the Torrance Tests for creativity. Coordinators should also be made aware of what is on the

test so they can consider which students might do well on it. The district should also look

into whether there are any tests for arts or leadership skills, or whether students in those areas will continue to be

nominated on a subjective basis and identified only through observation and portfolios.

Finally, the Slocumb-Payne and GATES observation scales would be good measures to

include as screening measures. The GATES scales would help locate more students who could

be identified in areas that are non-academic. The Slocumb-Payne would help teachers screen for

potentially gifted students among subgroups that may not be identified as much as others, and it could also

be used to help gain a more representative sample of some subgroups, as mentioned in the TPS

district plan for the gifted. More of these tools could be implemented as long as coordinators


know when and how to use them. They may need to be rolled out in waves and not all at once,

especially if something like the Kingore Observation Inventory were placed upon teachers.

Tools like that may be beneficial, but they add work, so everyone needs to be on board and

see the benefits before the extra workload is placed on them. Once again, the more variety a

district’s test inventory has, provided the instruments have statistical strength behind them and reflect

current trends, the better and more successful the identification process could be.


References

Brown, L., Sherbenou, R.J., & Johnsen, S.K. (2010). Test of nonverbal intelligence: Answer and

record form. (4th ed.). Austin, TX: PRO-ED.

Gilliam, J.E., Carpenter, B.O., & Christensen, J.R. (1995). Gifted and talented evaluation scales:

Summary/response form. Waco, TX: Prufrock Press.

Johnsen, S. K. (Ed.). (2011). Identifying gifted students: A practical guide. (2nd ed.). Waco, TX:

Prufrock Press.

Kingore, B. (2001). The Kingore observation inventory. (2nd ed.). Austin, TX: Professional

Associates Publishing.

Lohman, D.F. (2012). Cognitive abilities test: Directions for administration. CogAT. Rolling

Meadows, IL: Riverside.

Naglieri, J.A. (2010). Product Overview. Naglieri nonverbal ability test. (2nd ed.). Pearson.

Otis, A.S., & Lennon, R.T. (2003). OLSAT: Otis-Lennon school ability test. (8th ed.). Technical

manual. Pearson.

PRO-ED, Inc. (2001). Screening assessment for gifted elementary and middle school students.

(2nd ed.). SAGES-2: K-3. Austin, TX: PRO-ED, Inc.

Slocumb, P.D., & Payne, R.K. (2000). Slocumb-Payne teacher perception inventory. Removing

the mask: Giftedness in poverty. Aha! Process, Inc.

Tulsa Public Schools. (2012). Gifted and talented education plan. Gifted and Talented Education:

Tulsa Public Schools 2013-2014.

Tulsa Public Schools. (2014). Fast facts. About the district. Retrieved March 31, 2014, from

http://www.tulsaschools.org/4_About_District/fast_facts_main.asp.


Freeman, J. (2004). Cultural influences on gifted gender achievement. High Ability Studies,

15(1), 7-23. doi:10.1080/1359813042000225311

Kerr, B.A., Vuyk, M.A., & Rex, C. Gendered practices in the education of gifted girls and boys.

Psychology in the Schools, 49(7), 647-655. doi:10.1002/pits.21627

Malin, J., & Makel, M.C. Gender differences in gifted students’ advice on solving the world’s

problems. Journal for the Education of the Gifted, 35(2), 175-187.

doi:10.1177/0162353212440617

Pepperell, J.L., & Rubel, D.J. (2009). The experience of gifted girls transitioning from

elementary school to sixth and seventh grade: A grounded theory. The Qualitative

Report, 14(2), 341-360. http://www.nova.edu/ssss/QR/QR14-2/pepperell-rubel.pdf

Peterson, J. (2013). Gender differences in identification of gifted youth and in gifted program

participation: A meta-analysis. Contemporary Educational Psychology, 38(2013), 342-

348. http://dx.doi.org/10.1016/j.cedpsych.2013.07.002

Preckel, F., Goetz, T., Pekrun, R., & Kleine, M. (2008). Gender differences in gifted and

average-ability students: Comparing girls' and boys' achievement, self-concept, interest,

and motivation in mathematics. The Gifted Child Quarterly, 52(2), 146-159. Retrieved

from http://search.proquest.com/docview/212066691?accountid=4117

Reis, S.M. Gifted girls, twenty-five years later: Hopes realized and new challenges found.

Roeper Review, 25(4): 154-157. Retrieved from

http://search.proquest.com/docview/206709056?accountid=4117

Sarouphim, K.M. (2011). Gifted and non-gifted Lebanese adolescents: Gender differences in

self-concept, self-esteem and depression. International Education, 41(1), 26-41, 100.

Retrieved from http://search.proquest.com/docview/911991596?accountid=4117




Torrance, E.P. (2006). Thinking creatively with pictures: Figural response booklet B. Torrance

tests of creative thinking. Bensenville, IL: Scholastic Testing Service, Inc.

Torrance, E.P. (2006). Torrance tests of creative thinking: Norms-technical manual, verbal

forms A and B. Bensenville, IL: Scholastic Testing Service, Inc.

Wechsler, D. (2003). WISC-IV: Wechsler intelligence scale for children. (4th ed.). Technical and

interpretive manual. San Antonio, TX: The Psychological Corporation.

Willard-Holt, C. (2008). “You could be doing brain surgery”: Gifted girls becoming teachers.

Gifted Child Quarterly, 52(4), 313-325. doi:10.1177/0016986208321807
