Running Head: ASSESSMENT EVALUATION 1
Evaluation of Assessments Commonly Used for Identification of the Gifted
Nicolette Edenburn
Oklahoma State University
Introduction
Tulsa Public Schools serves a large and diverse urban population. It has over 40,000
students spread across 80 schools and comprises many ethnicities: 28.1% of the student
population is Caucasian, 27.9% is Hispanic, 27.8% is African American, 6.94% is American
Indian, and 1.33% is Asian (Tulsa Public Schools, Fast Facts, 2014). Given its size and
diversity, the district needs a detailed and comprehensive gifted education
plan, and it has one. The introduction of the plan includes the district’s goal for gifted education,
which is closely tied to the state’s definition of gifted:
An important goal of Tulsa Public Schools is to identify and provide appropriate
educational experiences for students who give evidence of high performance
capability in areas such as intellectual ability, creative, artistic, or leadership
capacity, or in specific academic areas, and who require opportunities or
experiences not ordinarily provided by the school criteria in order to fully develop
such capabilities (Tulsa Public Schools, Gifted and Talented Education Plan,
2012).
Section four of the plan delineates that the school district “shall identify and serve
students who score at or above the 97th percentile rank on a national standardized test of
intellectual ability” and the standard error of measure shall be included (Gifted and Talented
Education Plan, 2012, p. 6). Section eight of the plan goes on to explore nuances in
identification. For low-income, rural, African American, Hispanic, American Indian, and other
ethnic populations, section eight calls for nonverbal tests of intelligence.
It also suggests tests that cover spatial, perceptual, and mechanical reasoning skills. The
district also calls for “informal measures” (p. 23) that are approved by the district to guide
identification. Several of the subgroup sections also call for a representative group of that
population to be identified. Female students are allowed to use test scores from previous years.
For physically impaired, visually impaired, hearing impaired, and learning disabled students, the plan
also calls for nonverbal tests or tests that can be modified for physical or visual impairments, and it
allows informal measures to be used. The learning disabled subcategory states that
creativity measures can also be implemented. The learning disabled and underachieving
subcategories also encourage early identification. Several subgroups state that parent and
teacher input is encouraged. Section eight closes with this statement: “Each site will strive for,
but not force identification of a representative proportion of the population and allow potential
for performance to qualify a child for screening” (G&T Education Plan, 2012, p. 25). In order to
show whether or not the district follows these guidelines, the tests used for identification will be
analyzed, as well as other identification measures.
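The 97th-percentile rule from section four can be illustrated with a short sketch. It assumes, hypothetically, that scores are reported on a normally distributed standard-score scale with a mean of 100 and a standard deviation of 15 (individual tests may use a different scale), and that "including" the standard error of measure means adding it to the obtained score, as later sections of this paper describe. The function names are illustrative and do not come from the district plan.

```python
from statistics import NormalDist


def percentile_cutoff(percentile: float = 0.97,
                      mean: float = 100.0,
                      sd: float = 15.0) -> float:
    """Standard score at the given percentile rank.

    Assumption: scores follow a normal distribution with the given
    mean and standard deviation (not stated in the district plan).
    """
    return NormalDist(mean, sd).inv_cdf(percentile)


def qualifies(score: float, sem: float,
              percentile: float = 0.97,
              mean: float = 100.0,
              sd: float = 15.0) -> bool:
    """True if the score plus its standard error of measure reaches the cutoff."""
    return score + sem >= percentile_cutoff(percentile, mean, sd)


# Under these assumptions the cutoff is a standard score of about 128.2.
print(round(percentile_cutoff(), 1))
print(qualifies(123, sem=6))  # 123 + 6 = 129, which reaches the cutoff
print(qualifies(123, sem=3))  # 123 + 3 = 126, which does not
```

Under these assumptions, the same obtained score of 123 qualifies with a standard error of 6 but not with a standard error of 3, which is why the size of the standard error matters when instruments are compared.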
Nonverbal Assessments
Tulsa Public Schools uses the Naglieri Nonverbal Ability Test, Second Edition (NNAT2),
a nonverbal test of general ability. According to Johnsen (2011), administering the test requires an
intermediate level of competency, and at least one graduate-level class is suggested prior to
administration. The guide lists an internal consistency reliability of 0.83-0.92, test-retest reliability of
0.70-0.78, and interscorer reliability of 0.99, which are all at or above the acceptable standards
Johnsen gives. Content validity is reported, with a criterion-related validity of
0.51-0.74, which is a moderate to strong correlation with predicted performance. The product
overview guide for the NNAT2 (2010) states that 57,000 students ages 5 to 17 were sampled
when norming the seven levels of the test, although Johnsen states over 63,000 were used for
standardization. The test has a paper-pencil form, but it is commonly taken online. The test takes
thirty minutes, and the product overview lists a myriad of educational professionals who may
administer the test. The standard error of measure varies with age, but in most cases it is 6. Some of
the things the product overview emphasizes are the following: “features culturally neutral items
without reliance on verbal ability,” “expedited administration and instruction,” “identifying
gifted and talented in specialized peer groups,” and “a total assessment solution” (p. 2).
This instrument is a good tool to use for gifted identification, as the district plan states
nonverbal measurements should be used for almost every ethnic and underrepresented group of
students in gifted education. While Johnsen recommends a graduate course, the product
overview indicates that almost any educator who could be in charge of testing students would be
able to administer it. The reliability and validity coefficients are acceptable and, in some cases, strong. This
measure takes only thirty minutes to administer, so the main question is whether it
should be the sole identification instrument used to qualify a student. A quality test environment
would be necessary, since those thirty minutes determine a student’s measured general ability. However, it could be
used to identify K-12 students for the gifted program quickly.
There is another nonverbal assessment called the Test of Nonverbal Intelligence, Fourth
Edition (TONI-4). It also has the intermediate level of administration according to Johnsen. It
had a much smaller norming sample, reported at 2,272. Its internal consistency reliability is 0.93-0.97,
considerably higher than the NNAT2, and its test-retest reliability is 0.76-0.92, also higher than
the NNAT2. Its interscorer reliability is the same at 0.99. The criterion validity is 0.70-0.77,
higher than the NNAT2 as well. While the NNAT2 has seven levels based on age or grade,
the TONI-4 is a single test that is scored according to the number of questions a student is
able to complete. It consists of 60 questions, with examinees aged ten and over starting at question 20. The
administrator stops the test when fewer than 3 of 5 consecutive questions are answered
correctly and then calculates the score. The standard error of measure for all ages is 3. The
scoring guide also has a place where environmental and examinee notes can be given.
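The start and stop rules described above can be sketched in a few lines. This is only an illustration of the rule as this paper summarizes it, not the publisher's scoring procedure: the function name is hypothetical, the responses are supplied as a simple list of right and wrong answers, and the sketch reports where testing stops and how many administered items were correct rather than a TONI-4 score.

```python
from typing import Sequence, Tuple


def administer(responses: Sequence[bool], age: int) -> Tuple[int, int]:
    """Apply the start and stop rules to a list of item responses.

    responses[i] is True when item i + 1 would be answered correctly.
    Returns (number of the last item administered, correct answers
    among the administered items). Illustrative only; this is not
    the official raw-score calculation.
    """
    start = 20 if age >= 10 else 1   # examinees ten and over begin at item 20
    recent = []                      # results of the last five administered items
    correct = 0
    last_item = start - 1
    for item in range(start, len(responses) + 1):
        answer = responses[item - 1]
        correct += int(answer)
        recent = (recent + [answer])[-5:]
        last_item = item
        # Stop once fewer than 3 of the last 5 consecutive items are correct.
        if len(recent) == 5 and sum(recent) < 3:
            break
    return last_item, correct


# A hypothetical examinee who answers items 1-30 correctly and misses the rest.
pattern = [True] * 30 + [False] * 30
print(administer(pattern, age=12))  # (33, 11): starts at item 20, stops at item 33
print(administer(pattern, age=8))   # (33, 30): starts at item 1, stops at item 33
```

In both cases testing stops at item 33, the first point at which only two of the five most recent items are correct.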
While the NNAT2 is a valid instrument, the TONI-4 has better validity and reliability.
However, this is based on a much smaller sample size. Although it also consists of nonverbal picture puzzles, the
TONI-4 relies more on the administrator to know where to start and when to end the
test, so training on administration and scoring would be necessary. It makes sense that Tulsa
Public Schools would integrate this assessment as well. Since nonverbal assessments are so
integral to the TPS identification section, it would make sense to have more than one
assessment readily available for use. The NNAT2 is administrator-friendly, but the TONI-4
would be a valid tool to use if multiple assessments are necessary for identification.
OLSAT and CogAT
Tulsa Public Schools also uses the Otis-Lennon School Ability Test, Eighth Edition
(OLSAT) when identifying intellectually gifted students. It integrates verbal and nonverbal
questions and has multiple sections. As of now, Tulsa gives paper-pencil tests to Kindergarten
through 2nd grades and online versions to 3rd grade and up which require less administrator
guidance. The test was normed on a sample of 445,500 students according to Johnsen (2011), and its internal
consistency reliability is 0.80-0.90 for its subtests and 0.90-0.94 for its composite. Its test-retest
and interscorer reliability were not reported. It has criterion validity of 0.46-0.78 for its subtests
and 0.45-0.77 for its composite. It has the minimum requirement for administration, meaning
anyone able to administer a test and read the testing manual is qualified to administer, since the
test is scored by the company or someone qualified within the district. There are seven levels to
the test determined by age ranges in Kindergarten-12th grades. The test was first published in
1918, according to its technical manual (2003). The test is highly research-based, and the
technical manual states that efforts were made to eliminate bias and questions were eliminated if
statistically proven to be biased based on cultural or gender group tests. A standard error of
measure of six is added to all tests.
While the OLSAT claims that it has done tests for bias, many of its questions are biased
towards children who have had more worldly experiences. For example, “Johnny is going
camping and is looking forward to catching his lunch. What is he going to do?” If children have not
been exposed to fishing and that phrase for it, they pick the picnic basket. The test also has not been
updated since 2003, so many questions are older than the students taking the test. In the
technological world our students live in, many of the questions are outdated. This test is good for
giving a more comprehensive look at a child, since you get a verbal and a nonverbal score. I
would use this as a tool in my district, but I would not use this test much with children of poverty
or with students who rely heavily on ELL instruction.
In my district, the Cognitive Abilities Test (CogAT) is also used. The CogAT consists of
three tests, with three subtests per test. Students receive a verbal, nonverbal, and a quantitative
score. Therefore, it is an even more comprehensive exam than the OLSAT. It also has the
minimum level of administration requirements, so anyone who can read the administration guide
could give the test. Johnsen (2011) states that it was standardized with a sample group of
180,538 students. There are 11 levels of the test for Kindergarten-grade 12. Its subtest internal
consistency reliability is 0.856-0.963, and the composite is 0.919-0.982. Its test-retest reliability
for the composite score is 0.69-0.87. Interscorer reliability was not reported, and its criterion validity is
0.24-0.88. Validity for both the OLSAT and CogAT varies because the questions,
by their nature, may or may not relate to students’ grades or performance in math or reading.
Only a standard error of 3 is added to a student’s composite score, no matter the
age. Its administration guide is from 2012, so the questions are less outdated than those of the
OLSAT. The test comes in Spanish and English, which is helpful for English Language
Learners, but only if they are Spanish speakers. The test is also long, at three hours, so it needs to be
given in three different sessions. It is also an expensive test if the district chooses to use
machine-scored booklets and purchase individualized results.
Due to the individualized reporting and the teaching suggestions provided for each
combination of stanines, this would be a good choice of test for a district to use, especially if
it found ways to save money (for instance, by creating its own home reports). This would only be
recommended if the district chose certain grade level(s) to take the test as a whole, since it is so
time-consuming. It would also give the teacher a better cognitive picture of the students. Since it is so
comprehensive, should a child be extremely high in one subtest and low in another, that would
be a signal the special education department could use to pursue testing for learning disabilities.
The CogAT and OLSAT would both be good comprehensive tests for a district to use,
but more emphasis should probably be placed on the CogAT. The CogAT would best be used when
testing an entire grade level. Since early identification is important, a district would need to
decide which grade level would be best to test all at once while also being cognizant of the time it
takes to complete the test. The OLSAT would be good to use when a student shows potential but is
not in a grade the district is blanket testing with the CogAT.
Wechsler Intelligence Scale for Children (4th ed.)
The Wechsler Intelligence Scale for Children (4th ed.) (WISC-IV) is a full IQ test given
by a highly trained school psychologist or outside doctor/psychometrist trained in giving and
interpreting the tests. This is a well-respected but expensive test, so it is not used often for gifted
identification in the public schools. Parents may have this test done on their own at their own
cost. There is a battery of ten core subtests, and then there are five supplemental subtests that can
be given. Students receive a verbal comprehension index, perceptual reasoning index, working
memory index, processing speed index, and a full scale IQ. This test can also be used to check
for learning disabilities and mental retardation, which is how it is normally used in the public
schools. The composite scores are determined by the ten core subtests, and discrepancies are
analyzed by the administrator through the supplemental subtests. Other information is also
gleaned from the supplemental tests. While only 2,200 students were sampled, many factors
were considered to avoid bias. Johnsen (2011) reports that the test has 0.65-0.92 internal consistency
reliability for its subtests and 0.96-0.97 for its composite. It has 0.76-0.92 test-retest reliability for
its subtests and 0.93 for the composite. It also has 0.95-0.98 interscorer reliability,
probably because of the extensive training required of examiners. Its reliability is quite high. Its criterion
validity is varied, though, with a range of 0.10-0.80. The ten core subtests include the following:
block design, similarities, digit span, picture concepts, coding, vocabulary, letter-number
sequencing, matrix reasoning, comprehension, and symbol search. The five supplemental
subtests include the following: picture completion, cancellation, information, arithmetic, and
word reasoning.
Since this is a lengthy, expensive test, and since few teachers are trained to give it, this is
probably not a realistic measure for gifted identification. If teachers could work with special
education teachers, though, perhaps students with relatively high IQ scores could be screened for
gifted services using a different academic ability test. Since twice exceptional students often mask their
disabilities and their giftedness, a WISC-IV score could lead us towards discovering both.
Screening Assessment for Gifted Elementary and Middle School Students (2nd ed.)
The Screening Assessment for Gifted Elementary and Middle School Students (2nd ed.)
(SAGES-2) is a different kind of test. Like the CogAT, it integrates three subtests:
Mathematics/Science, Language Arts/Social Studies, and Reasoning. One test is given to any
Kindergartener-third grader and a different version is given to any fourth-eighth grader, and
scores are given based on how many they received correct at their age level. Quotients are given
for normal samples, and then different quotients are given for gifted samples. Based on their
profile of three scores, a gifted coordinator would determine if they qualify for gifted services.
According to Johnsen (2011), 5,313 students were sampled to standardize these tests,
published in 2001. In Kindergarten-3rd grade, it has internal consistency reliability for normal
students of 0.77-0.93 and 0.88-0.94 for its gifted sample. It has an internal consistency reliability
of 0.88-0.96 for its normal fourth-eighth grade sample and 0.82-0.93 for the gifted sample. There
is no composite score. It has test-retest reliability of 0.95-0.97 for Kindergarten-third grade, and
fourth-eighth grade has 0.78-0.92 test-retest reliability. K-3 interscorer reliability is 0.92-0.99,
and 4-8 interscorer reliability is 0.91-0.97. All of these are moderate to strong correlations. Its
criterion validity ranges from not significant (NS) to 0.89. Administrators must have a moderate level of familiarity with the
test in order to give it, so it should only be given by gifted coordinators or whoever is in charge
of gifted identification.
This test is appealing because its questions are based on national standards. However, many
of the standards used are cited from the late 1980s to early 1990s, so it is a little outdated
compared with new standards. The teacher is allowed to read each question and repeat it once,
which helps the younger students, but it would not be helpful for those who benefit the most
from nonverbal tests. Should a third edition of this test come out, it might be worth looking into.
It could be a way to identify high-achieving classroom students who have trouble with qualifying tests
that don’t necessarily show their strengths. This also makes sense, because it has a section where
other tests given can be logged and given a SAGES-2 equivalent score, much like Tulsa’s
multicriteria G3 form. It also would be a good screening tool if necessary.
Torrance Tests of Creative Thinking
All of the tests up to now have been measures of intelligence or academic ability.
However, the definition of giftedness also includes creativity, leadership, and the arts. The
Torrance Tests of Creative Thinking are among the few tests that can be given in those categories.
Johnsen (2011) shows they were standardized with a sample of 94,796, with most of those people
taking the figural test. The instrument has an internal consistency reliability of 0.89-0.94 on both the verbal
and figural subtests, and it has an interscorer reliability of 0.95-0.99. Only people trained in
giving this test should administer it, which could be difficult for a district, especially a small one.
Its criterion validity is 0.04-0.70, which is quite a spread. However, when measuring creativity,
one would want a certain level of spread in the subjects taking it. The figural response booklet
(2006) has three activities in it where students use the figures provided and create from those,
often giving titles to their work. It appears the activities have limits on the time provided. Part of
what makes this difficult to administer is that the administrator must know how to score the
responses. The seven verbal activities include writing as many questions as possible about a
picture, guessing causes about a picture, guessing consequences about the same picture, giving
product improvement ideas to a stuffed animal, giving unusual uses of tin cans or cardboard
boxes, and giving ideas to what would happen if a supposed improbable event took place.
Fluency, flexibility, and originality are assessed in each of these timed measures. High scores in
each would lead to a possible identification of creativity.
Observation Inventories
There are three observation scales or inventories that the author examined, which could
be used either to screen possible candidates for testing or to add points to a child’s test scores if
they are in a minority group and could have experienced bias with their testing. The Gifted and
Talented Evaluation Scales (GATES) summary and response form is relatively inexpensive.
Someone familiar with the students answers ten questions on a scale of 1-9 for the following five
categories: intellectual ability, academic skills, creativity, leadership, and artistic talent. A rating
of 1-3 is below average, 4-6 is average, and 7-9 is above average. Each category is then totaled. Scores of 90-110
show probability of giftedness, 111-120 show high probability, and 121 and up show extreme
probability. Since the categories related to the definition of giftedness are separated, this would
help a gifted coordinator who was unfamiliar with the student know which test or type of
identification to pursue. Teachers completing the form would need training on how to
fill it out appropriately to avoid differences in scoring.
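The GATES rating and interpretation bands described above can be sketched as follows. The paper does not explain how a category total is converted into the score ranges it lists (ten ratings of at most 9 sum to at most 90), so the sketch assumes the band lookup receives an already-converted score. It also assumes that any score from 111 up to 120 falls in the high-probability band. All function names are hypothetical.

```python
from typing import Sequence


def rating_band(rating: int) -> str:
    """Describe a single 1-9 rating using the bands given in the text."""
    if not 1 <= rating <= 9:
        raise ValueError("ratings must be between 1 and 9")
    if rating <= 3:
        return "below average"
    if rating <= 6:
        return "average"
    return "above average"


def category_total(ratings: Sequence[int]) -> int:
    """Sum the ten ratings for one category (for example, leadership)."""
    if len(ratings) != 10:
        raise ValueError("each category has ten items")
    for rating in ratings:
        rating_band(rating)  # validates the 1-9 range
    return sum(ratings)


def probability_band(score: int) -> str:
    """Map a converted score onto the probability bands listed in the text.

    Assumption: the score has already been converted from the category
    total; the conversion itself is not described in this paper.
    """
    if score >= 121:
        return "extreme probability"
    if score >= 111:
        return "high probability"
    if score >= 90:
        return "probable"
    return "below the listed ranges"


print(category_total([7, 8, 9, 7, 6, 8, 9, 7, 8, 7]))  # 76
print(probability_band(115))                           # high probability
```

Keeping the five categories separate, as the form does, means that each category receives its own total and its own band.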
The Kingore Observation Inventory (2nd ed.) is completed for the entire class on one form
to avoid having to fill out individual forms. Its goal is to recognize which students in the
classroom need more differentiation. Teachers would need training on how to
make these observations, and they also need to create an anecdotal journal to accompany their
observations. The categories of giftedness examined include the following: advanced language,
analytical thinking, meaning motivation, perspective, sense of humor, sensitivity, and accelerated
learning. There is also a list of negative characteristics of gifted students and the
aforementioned categories with which they are associated. The teacher looks for these attributes and writes
anecdotes on a note card kept in a folder for each child. Those who consistently show the gifted
characteristics might be tested for gifted identification. Requiring teachers to continuously observe
for gifted characteristics should increase awareness and mindfulness of the gifted and their
needs. It also forces them to look at all students so subgroups that tend to not be identified as
much as others may not be overlooked. If a school and teachers are on board with this extra
work, this could be a helpful measure to use for screening of the gifted.
Finally, the Slocumb-Payne Teacher Perception Inventory is used when identifying
students from backgrounds that are often overlooked in gifted identification. A teacher fills out
the form and circles 1-4, with 1 being seldom or never and 4 being almost always, for a set of
questions. The questions/observations show opposite scenarios on each side of the page in order
to make sure responses are consistent and thought-out. This allows gifted coordinators to screen
for gifted students who may be falling through the cracks due to their backgrounds. Slocumb and
Payne (2000) also have an environmental opportunities profile. Someone who has a relationship
with the caregiver of a potentially gifted student from a diverse background would interview that
person. The questions are highly personal and specific, so confidentiality would be necessary and
a relationship would need to be established so the caregiver knew there was no judgment. A
student receives 1-3 points, with more points given to scenarios where the environment of the
child is more diverse or challenging than that of other gifted counterparts. This could be used by a
district if a student from a diverse background came close to qualifying. The district could decide
which point totals would add to a student’s identification test scores. Since giftedness is a
combination of nature and nurture, points added due to financial or familial hardships could get a
student who is naturally gifted into a program where their academic needs could be nurtured and
improved even more.
Summary of Findings
Based on the advanced site plan of Tulsa Public Schools, it is the opinion of the
author that the district is on the right track. Since so many subgroups are listed as needing
nonverbal test options, it seems that more nonverbal tests should be offered. The TONI-4 could
be an option, but training would need to be done to ensure the administrators are giving it
correctly. Those not needing nonverbal options have an OLSAT and CogAT option, but there
aren’t two solely nonverbal options given. OLSAT testing should be examined due to the
outdated nature of some of the questions when compared with today’s generation. However, the
CogAT needs to be available beyond 2nd grade if OLSAT testing becomes less prevalent.
Coordinators should be working more closely with special education teachers to find out scores
on the WISC-IV tests that are administered in order to try and catch twice exceptional students.
Johnsen (2011) recommends multiple assessments, so various assessments should be accessible
within cost limitations. There also need to be more coordinators at Tulsa Public Schools able to
administer the Torrance Tests for creativity. Coordinators should also be made aware of what is on the
tests so they can consider which students might do well on them. The district should also look
into whether there are any tests for arts or leadership skills, or whether students in those areas will
continue to be nominated on a subjective basis and identified only through observation and portfolios.
Finally, the Slocumb-Payne and GATES observation scales would be good measures to
include as screening measures. The GATES scales would help locate more students who could
be identified in areas that are non-academic. The Slocumb-Payne would help teachers screen for
potentially gifted students among subgroups that may not be identified as much as others, and it could also
be used to help gain a more representative sample of some subgroups, as mentioned in the TPS
district plan for the gifted. More of these tools could be implemented as long as coordinators
know when and how to use them. The tools may need to be rolled out in waves and not all at once,
especially if something like the Kingore Observation Inventory were required of teachers.
Such tools add more work and may be beneficial, but everyone needs to be on board and
see the benefits before the extra workload is placed on them. Once again, the more variety that has
statistical strength behind it and meets current trends within a district’s test inventory, the better
and more successful the identification process could be.
References
Brown, L., Sherbenou, R.J., & Johnsen, S.K. (2010). Test of nonverbal intelligence: Answer and
record form. (4th ed.). Austin, TX: PRO-ED.
Gilliam, J.E., Carpenter, B.O., & Christensen, J.R. (1995). Gifted and talented evaluation scales:
Summary/response form. Waco, TX: Prufrock Press.
Johnsen, S. K. (Ed.). (2011). Identifying gifted students: A practical guide. (2nd ed.). Waco, TX:
Prufrock Press.
Kingore, B. (2001). The Kingore observation inventory. (2nd ed.). Austin, TX: Professional
Associates Publishing.
Lohman, D.F. (2012). Cognitive abilities test: Directions for administration. CogAT. Rolling
Meadows, IL: Riverside.
Naglieri, J.A. (2010). Product Overview. Naglieri nonverbal ability test. (2nd ed.). Pearson.
Otis, A.S., & Lennon, R.T. (2003). OLSAT: Otis-Lennon school ability test. (8th ed.). Technical
manual. Pearson.
PRO-ED, Inc. (2001). Screening assessment for gifted elementary and middle school students.
(2nd ed.). SAGES-2: K-3. Austin, TX: PRO-ED, Inc.
Slocumb, P.D., & Payne, R.K. (2000). Slocumb-Payne teacher perception inventory. Removing
the mask: Giftedness in poverty. Aha! Process, Inc.
Tulsa Public Schools. (2012). Gifted and talented education plan. Gifted and Talented Education:
Tulsa Public Schools 2013-2014.
Tulsa Public Schools. (2014). Fast facts. About the district. Retrieved March 31, 2014, from
http://www.tulsaschools.org/4_About_District/fast_facts_main.asp.
Freeman, J. (2004). Cultural influences on gifted gender achievement. High Ability Studies,
15(1), 7-23. doi:10.1080/1359813042000225311
Kerr, B.A., Vuyk, M.A., & Rex, C. Gendered practices in the education of gifted girls and boys.
Psychology in the Schools, 49(7), 647-655. doi:10.1002/pits.21627
Malin, J., & Makel, M.C. Gender differences in gifted students’ advice on solving the world’s
problems. Journal for the Education of the Gifted, 35(2), 175-187.
doi:10.1177/0162353212440617
Pepperell, J.L., & Rubel, D.J. (2009). The experience of gifted girls transitioning from
elementary school to sixth and seventh grade: A grounded theory. The Qualitative
Report, 14(2), 341-360. http://www.nova.edu/ssss/QR/QR14-2/pepperell-rubel.pdf
Peterson, J. (2013). Gender differences in identification of gifted youth and in gifted program
participation: A meta-analysis. Contemporary Educational Psychology, 38(2013), 342-
348. http://dx.doi.org/10.1016/j.cedpsych.2013.07.002
Preckel, F., Goetz, T., Pekrun, R., & Kleine, M. (2008). Gender differences in gifted and
average-ability students: Comparing girls' and boys' achievement, self-concept, interest,
and motivation in mathematics. The Gifted Child Quarterly, 52(2), 146-159. Retrieved
from http://search.proquest.com/docview/212066691?accountid=4117
Reis, S.M. Gifted girls, twenty-five years later: Hopes realized and new challenges found.
Roeper Review, 25(4): 154-157. Retrieved from
http://search.proquest.com/docview/206709056?accountid=4117
Sarouphim, K.M. (2011). Gifted and non-gifted Lebanese adolescents: Gender differences in
self-concept, self-esteem and depression. International Education, 41(1), 26-41, 100.
Retrieved from http://search.proquest.com/docview/911991596?accountid=4117
Torrance, E.P. (2006). Thinking creatively with pictures: Figural response booklet B. Torrance
tests of creative thinking. Bensenville, IL: Scholastic Testing Service, Inc.
Torrance, E.P. (2006). Torrance tests of creative thinking: Norms-technical manual, verbal
forms A and B. Bensenville, IL: Scholastic Testing Service, Inc.
Wechsler, D. (2003). WISC-IV: Wechsler intelligence scale for children. (4th ed.). Technical and
interpretive manual. San Antonio, TX: The Psychological Corporation.
Willard-Holt, C. (2008). “You could be doing brain surgery”: Gifted girls becoming teachers.
Gifted Child Quarterly, 52(4), 313-325. doi:10.1177/0016986208321807