
Discovery Education Assessment K-HS Benchmark Assessments in ELA and Mathematics

What validation data or reports are available? Are technical specifications available for the assessments? Please see the following Technical Manual for validation studies and psychometric information pertaining to Discovery Education’s interim assessments. Additional reports and technical evidence are available upon request.

COMMON CORE INTERIM BENCHMARK

TECHNICAL MANUAL

APRIL 2013


Discovery Education Assessment

Common Core Interim Benchmark Technical Manual

Table of Contents

I. Introduction: Discovery Education Assessment
II. Assessment Standards and Content Validity
    A. Common Core Standards
    B. Test Development and Review
    C. Web Alignment
III. Test Administration Online
IV. Test and Item Scores and Test Reports
    A. Test and Item Scores
    B. Interim Assessment Reports
    C. Interim Assessment Reports Examples
V. Reliability, Proficiency Levels, Validity and Growth
    A. Test Reliability
    B. Proficiency Levels
    C. Validity
    D. Vertical Scale Averages and Growth
Appendices
    A. Test and Question Statistics, Reliability and Scale Scores
    B. Web Alignment Study


I. Introduction: Discovery Education Assessment

Discovery Education has focused on the use of formative assessments to improve K-12 student learning

and performance. Bridging the gap between university research and classroom practice, Discovery

Education Assessment offers effective and user-friendly assessment products that provide classroom

teachers and students with the feedback needed to strategically adapt their teaching and learning activities

throughout the school year.

Discovery Education Assessment has pioneered a unique approach to formative assessments using a

scientifically research-based continuous improvement model that maps diagnostic assessments to each

state’s high stakes summative tests. Discovery Education Assessment’s Predictive Test-Specific Interim

Assessments are aligned to the content assessed by each summative assessment allowing teachers to track

student progress toward the standards and objectives used for accountability purposes.

Furthermore, Discovery Education Assessment subscribes to the Standards for Educational and

Psychological Testing articulated by the consortium of the American Educational Research Association,

the American Psychological Association, and the National Council on Measurement in Education.

This technical manual presents information about the Common Core Interim Assessments used during

2011-2012 and 2012-2013 school years.

Discovery Education Assessment across the United States

In 2012-2013, K-12 students in the United States took over 3.3 million Common Core interim benchmark

assessments created by Discovery Education Assessment. These students were found in twenty-one

different states across the country. As more states transition to the Common Core State Standards, more

students are being assessed with Discovery Education Assessment’s Common Core Interim Benchmark

Assessments.


II. Assessment Standards and Content Validity

Content validity evidence shows that test content is appropriate for the particular constructs that are

being measured. Content validity is measured by agreement among subject matter experts about test

material and alignment to state standards, by highly reliable training procedures for item writers, by

thorough reviews of test material for accuracy and lack of bias, and by examination of depth of

knowledge of test questions.

To ensure content validity of all tests, Discovery Education Assessment carefully aligns the content of its assessments to a given state’s content standards and the content sampled by the respective high-stakes test. Discovery Education Assessment employs one of the leading alignment research methodologies, the Webb Alignment Tool (WAT), which has consistently supported the alignment of our tests to state-specific content standards both in breadth (i.e., number of standards and objectives sampled) and depth (i.e., cognitive complexity of standards and objectives). All Discovery Education Assessment tests are thus state specific and feature reporting categories that match those of a given state’s large-scale assessment used for accountability purposes.

Common Core Standards

The Common Core State Standards Initiative is a state-led effort to establish a shared set of educational

standards in English language arts and mathematics for grades K-12. The standards are adopted

voluntarily by states and are designed to prepare students to be ready to enter college or join the

workforce upon graduation. These standards will not be tested until the 2014-2015 school year.

Since the implementation of the Common Core standards, Discovery Education Assessment has been working

with adopting states to help them transition to these new standards from their old state standards. Below are

the Common Core State Standards that can be found on the interim benchmark assessments in Reading

and Mathematics, grades 3-8, and Algebra 1 & 2 in High School.

Common Core English Language Arts Reporting Categories

- Reading: Literature
- Reading: Informational Text
- Reading: Foundational Skills
- English Language Arts Standards: Writing
- English Language Arts Standards: Language
- English Language Arts Standards: Listening & Speaking

Common Core Algebra 1 & 2 Reporting Categories

- Number & Quantity
- Algebra
- Functions
- Statistics & Probability


Common Core Mathematics Reporting Categories

- Operations & Algebraic Thinking
- Number & Operations in Base Ten
- Number & Operations - Fractions
- Measurement & Data
- Ratios & Proportional Relationships
- The Number System
- Expressions & Equations
- Geometry
- Statistics & Probability

Test Development and Review

Alignment

For the last nine years, Discovery Education has led in assuring educators that its items are aligned

specifically to each state’s curriculum or district pacing guide. Every change in a state’s curriculum was

carefully identified and items changed to meet the current year’s statewide assessment.

Alignment is done by trained test developers who have subject matter expertise, teaching experience in

the grade/subject, and the assessment expertise to produce appropriate items. Each individual has

certification in the grade/subject and has at least three years of teaching experience in the area. The

alignment is managed within our software and mapped one item at a time. This is not a software

alignment but rather a teacher who compares each item to the state standard to determine if it is

aligned.

We begin by matching our tests to the Common Core State Standards that are assessed. We agree that for

teachers to have confidence in the results, they must be assured that an assessment is aligned to their

standards. The test has to exhibit content validity, which is demonstrated when test items represent the

subject area, such as math or reading. In other words, a math assessment must have items that match or

align to the Common Core Standards and benchmarks defined by a state’s curriculum and high-stakes

test. Difficulty levels are based on actual prior student performance and provide teachers with a crucial

comparison of how a current class or individual student compares to what is generally expected of student

performance on these items.

Discovery Education Assessment pioneered a unique approach to benchmark assessments using a

scientifically research-based continuous improvement model that maps diagnostic assessments to each

state’s high stakes test. Discovery Education’s Predictive Test-Specific Benchmark tests are aligned to

the content assessed by each state test allowing teachers to track student progress toward the standards

and objectives used for accountability purposes. This same predictive approach is being applied to state

tests as they transition to Common Core assessments.


Items

Discovery Education employs only certified, experienced teachers with content majors and master’s

degrees to align, create, and develop items; the content review, copy editing, and quality control

departments are also staffed by competent, qualified teachers with graduate degrees. Discovery Education

intentionally employs teachers with familiarity in varied subjects, age groups, and ability levels, which

gives the item development teams an impressive range of expertise. They work with psychometric staff to

review and systematically match items to Common Core standards. All items have appropriate

psychometric properties from field testing that permit accurate, valid, and reliable predictive tests. Our

commitment to rely on competent, seasoned educators throughout the entire development process assures

that items and tests are accurate, appropriate, and accessible.

The Discovery Education content team begins the test development process with the state standards, test

blueprints and test specifications. We begin by matching our items to the Common Core standards that

are assessed.

No software tool can sufficiently match items to state standards. Curriculum experts must do this job one

item at a time. Discovery Education Assessment’s software facilitates the curriculum expert’s job of

aligning each item to a Common Core standard. This alignment is repeated every year, using prior year

student performance statistics to assure continuous alignment, reliability, and validity. All Common Core

standards are loaded into the Discovery Education Assessment tool, which allows our curriculum experts

to build state-specific tests. Item notes and field test data are available at the time of item selection for

each grade and subject test.

Bias Statistics

All Discovery Education assessments incorporate a systematic, official statistical bias analysis, using

Rasch analysis on gender, ethnicity, and differing abilities. Discovery Education Assessment is

committed to assuring students, teachers, and administrators that we are sensitive to and cognizant of the

need for assessments to be bias-free. While it is important to know that a test measures what it is

purported to measure, it is just as important to know what a test does not measure. Discovery Education

assessments are designed and reviewed to guard against culture or gender bias and to address issues of

disability.

Types of Items

Discovery Education assessments feature multiple choice questions that measure the maximum range of cognitive skills in the content areas. Using multiple choice questions reduces cost and test-taking time and provides immediate results with diagnostic and predictive capabilities. However, we generally include constructed-response, open-response, and performance tasks with the assessments that can be manually scored and used by teachers in the classroom for formative purposes. In addition, Discovery Education services provide access to performance tasks and tools for locally created items.


Refreshing Item Pool

Discovery Education Assessment routinely works with schools to field test new benchmark items to

replenish our pool of available items. This approach assures that we continuously have field tested items

available. This improvement process also helps our test coordinators receive specific feedback on

every item, thereby further increasing the reliability of each item included.

Quality Control

Quality control is a crucial aspect of Discovery Education Assessment’s approach to item construction.

The quality control process ensures that every component of a Discovery product is consistent and

accurate within and across printed or online versions of tests. The quality control department

determines that the online and printed versions of tests match exactly, that reports are accessible and

complete, and that scores on Discovery Education tests are correlated to the proficiency or mastery

specifications provided by the state. A key role of the quality control department involves testing the

functionality of the online interface and examining reports for accuracy.

Copy Editing & Proofreading

A vital part of the development team is the finalization team, responsible for ensuring the typographical

accuracy of all assessments. The copy editing personnel proofread the assessments after they are released

from the content review team. The editors correct any typographical or mechanical errors that appear in

the test, and they also look for errors in the layout and placement of graphics, instructions, page numbers,

or margins. Copy editors also perform a final examination of formatting to ensure that each test is

formatted to match the state’s high stakes tests’ formatting. When the assessment is proofread and

necessary changes are made, the test is converted into PDF format, then proofread again. If any errors still

exist, they are corrected before a copy of the final test is posted for printing. The proof from the printer is

then edited one final time. These multiple instances of thorough proofreading enable Discovery Education

to produce tests that are not only exemplary in content but also accurate in grammar and mechanics.

Overview of Benchmark Item Review Steps

- Items are reviewed multiple times by experienced teachers, a psychometrician, a grammar expert, and the Director of Testing for spelling errors, errors in usage, and awkward phrasing.
- All items are reviewed by at least 2 reviewers to confirm a single correct answer and appropriate distracters.
- All items are reviewed for grade appropriateness in content and readability, using the Flesch-Kincaid scale.
- Items have field test and actual administration data to support reliability of grade level in terms of difficulty and content validity.
- Item Depth of Knowledge (DOK) is reviewed and displayed for educators on reports.


- Where items are expected to include vocabulary “above grade level,” items are reviewed to assure that context clues are on grade level.
- In math items, required computations are reviewed to be appropriate for grade level and appropriate to the time-constraints of the formative environment.
- Items that require critical thinking skills are measured in terms of steps required and the difficulty levels to provide an appropriate mix across the skill area.
- Many items require graphics to support the thinking skill measurement. These items are reviewed to assure that the graphics print and appear on the web with clarity, appropriate level of detail, and appropriate grade level to measure the skill.
- Graphic items are kept simple enough to assure web display within 2-3 seconds but complex enough to measure the skill or accomplishment being tested.
- Item answers are varied to distribute them randomly across answer options A, B, C, and D so that no discernible pattern is possible for correct answers.
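The readability review mentioned in the steps above can be illustrated with the Flesch-Kincaid grade-level formula. The sketch below assumes the standard published formula and a deliberately naive vowel-group syllable counter. The manual does not describe the tool Discovery Education actually uses, so the function names, the sample text, and the counting heuristic are illustrative assumptions only.

```python
import re


def count_syllables(word: str) -> int:
    """Naive estimate (assumption): count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Grade level = 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


# Hypothetical item stem, not taken from any Discovery Education test.
item_stem = "The cat sat on the mat. The dog ran to the park."
grade = flesch_kincaid_grade(item_stem)
```

A production review would likely use a dictionary-based syllable count, since the vowel-group heuristic miscounts words with silent vowels.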

Web Alignment

Discovery Education contracted with an independent research team, Test Prep, led by Dr. Michael K.

Smith in April 2012 to conduct a Web alignment study of the Discovery Education interim benchmark

assessments with the Common Core State Standards. This process is repeated after any revisions are

made to the interim benchmark assessments.

The WAT (Web Alignment Tool) version 2 was used to measure categorical concurrence, depth-of-

knowledge consistency, range-of-knowledge correspondence and balance of representation. A summary

of the results is below. Results of this study are being used to make appropriate revisions to the 2012-

2013 interim benchmarks. A plan of action has already been set in place to increase depth-of-knowledge

consistency in the reading assessments and the categorical concurrence in the mathematics assessments.

For more details on this alignment study, please see Appendix B for the document Web Alignment Study

of Discovery Education Assessment Benchmarks with Common Core Standards.


Mathematics Tests Alignment Summary

                                     YES          WEAK           NO        TOTAL
                                    #     %      #     %      #     %        #
Categorical Concurrence            59   63%                  34   37%       93
Depth-of-Knowledge Consistency     93  100%      0    0%      0    0%       93
Range of Knowledge                 90   97%      3    3%      0    0%       93
Balance of Representation          91   98%      2    2%      0    0%       93

Reading Tests Alignment Summary

                                     YES          WEAK           NO        TOTAL
                                    #     %      #     %      #     %        #
Categorical Concurrence            58   82%                  13   18%       71
Depth-of-Knowledge Consistency     57   80%      8   11%      6    8%       71
Range of Knowledge                 62   87%      9   13%      0    0%       71
Balance of Representation          70   99%      1    1%      0    0%       71
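The percentages in the two summary tables appear to be each count divided by the row total, rounded to a whole percent. The sketch below recomputes one row under that assumption; the function name `percent_row` is hypothetical and is not part of the WAT.

```python
from typing import Dict


def percent_row(counts: Dict[str, int]) -> Dict[str, int]:
    """Convert YES/WEAK/NO counts into whole-number percentages of the row total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("row has no comparisons")
    return {label: round(100 * n / total) for label, n in counts.items()}


# Counts taken from the Reading "Depth-of-Knowledge Consistency" row (total 71).
reading_dok = {"YES": 57, "WEAK": 8, "NO": 6}
reading_dok_pct = percent_row(reading_dok)
```

Because each cell is rounded independently, a row's percentages can sum to 99 or 101 rather than exactly 100.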


III. Test Administration Online

To administer interim benchmark tests online, an administrator must first import students to the Discovery

Education website (www.discoveryeducation.com). Once students are imported and populated into grade

pool and teacher classes, testing can begin at any time after the recommended window opens. For steps

on importing students, populating classes, and administering interim benchmarks online, please refer to

the Assessment User Guide. This document is located at: http://assessment.discoveryeducation.com/start

and on the help site at discoveryeducation.com.



IV. Test and Item Scores and Test Reports

Test and Item Scores

Discovery Education Assessment reports the following item and test scores on its Interim Assessments.

Student Level Scores:

Test Percent Correct: The percent correct on an interim benchmark

Test Number Correct: The number correct on an interim benchmark

Reporting Category Percent Correct: The percent correct on a particular reporting category

Reporting Category Number Correct: The number correct on a particular reporting category

Vertical Scale Score: A scale score on a 1000 to 2000 scale

State Percentile: The percent of students that score lower than a particular scale

score. The state percentile is based on all students in a particular

state that completed an interim assessment

National Percentile: The percent of students that score lower than a particular scale

score. The national percentile is based on a stratified random

national sampling of students who completed an interim

assessment

Proficiency Prediction: A prediction of the level of student performance

Class, Grade, School, and District Level Scores:

Item Percent Correct: The percent of students in a class, grade, school or district that

answer an item correctly

Item Percent Incorrect: The percent of students in a class, grade, school, or district that

answer an item incorrectly

Test Mean Number Correct: The arithmetic mean (average) of the number correct on an

interim assessment for a class, grade, school, or district

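Test Mean Percent Correct: The arithmetic mean (average) of the percent correct on an

interim assessment for a class, grade, school, or district

Vertical Scale Mean: The arithmetic mean (average) of the vertical scale score on an

interim assessment for a class, grade, school, or district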


Proficiency Level Number: The number of students in each proficiency level on an interim

assessment for a class, grade, school, or district

Proficiency Percent: The percent of students in each proficiency level on an interim

assessment for a class, grade, school, or district

Median State Percentile: The middle value of all state percentiles for a school or district
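The percentile and median definitions above can be sketched in a few lines. This is a minimal illustration, assuming the percentile is the share of reference scores strictly below a given scale score, as the definitions state. The score lists are made up, and the manual does not describe how ties or the reference sample are handled in practice.

```python
from statistics import median
from typing import List


def percentile_rank(score: float, reference_scores: List[float]) -> float:
    """Percent of reference scores that are strictly lower than `score`."""
    if not reference_scores:
        raise ValueError("reference_scores must not be empty")
    lower = sum(1 for s in reference_scores if s < score)
    return 100.0 * lower / len(reference_scores)


# Hypothetical vertical scale scores on the 1000 to 2000 scale.
state_scores = [1400, 1450, 1500, 1500, 1550, 1600, 1650, 1700]
school_scores = [1450, 1500, 1650]

# State percentile for each student in the school, then the school's median.
school_percentiles = [percentile_rank(s, state_scores) for s in school_scores]
median_state_percentile = median(school_percentiles)
```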

Interim Assessment Reports

Discovery Education produces multiple standard report formats to coincide with each interim assessment,

as well as dynamic district reports. Interactive reports are linked to recommended digital remediation

selected from Discovery Education’s award-winning streaming. Discovery Education recognizes the

tremendous value in prompt, easy to read reports that allow all stakeholders, including students, parents,

teachers, administrators and district staff, to instantly determine how learning is progressing. While the

software and standard reports are not customizable, all reports allow easy export of data to CSV or

Excel file formats for flexible reporting or import into other reporting tools.

Discovery Education Assessment reports the following:

For Teachers:

- Proficiency predictions by subject point to which students are at risk.
- Predictions of proficiency levels within each Common Core standard define specifically what to focus on in remediation.
- Performance Indicator results viewed with the test items to define detailed gaps in student thinking processes.
- Growth of student performance across time, comparable across grade, school, and district.
- Digital instructional resources, targeted by Performance Predictor, that teachers can assign.

For Administrators:

- Summarized grade, school, and district learning status for each subject and standard by proficiency level.
- Growth of student performance across time, comparable across grade, school, and district.
- Identification of all student results by Common Core standards to examine possibilities for professional development and need for new texts and resources.
- Guided priority for instruction of demographic subgroups and examination of results of special programs.
- For community leaders, summarized status of each school for comparison purposes, with information during the year similar to state assessment reports.


For Students and Parents:

- Predictions of student proficiency by subject
- Growth of individual performance across time and compared to school and district performance
- Detail of skills and subskills mastered or in need of remediation
- Individual student responses by item
- Access to engaging digital content aligned to targeted skills and objectives

Interim Assessment Example Reports

This section provides annotated examples of each of the following Interim Assessment Reports:

- Class and Grade Skill Summary Report
- Student Skill Report
- Student Sub-Skill Report A and B
- Item Summary Report
- Answers Report
- Individual Student Report
- Drill Down Report
- Comparison Report
- School Comparison Report
- Scale Comparison Report
- Subgroup/Disaggregated Reports
- Comparative Growth Reports


School Reports

Class and Grade Skill Summary Report

The Class Summary Report identifies performance by skill for an entire class or grade. Using the red, yellow, and green stop light approach (Common Core also has blue), proficiency is shown for each standard within reading, math, and science. In this example, the bar chart displays the percent of students at Level 1 (red), Level 2 (yellow), Level 3 (green), and Level 4 (blue) by each of the reporting categories for Reading. The actual percentages for each proficiency level for each reporting category are given in the table below the bar chart.

Student Skill Report

The student skill report uses the same color-coded approach to plot individual student performance and

proficiency by skill. In this example, the level of performance on each skill for each student is presented.

Furthermore, in the two far left columns, the student overall proficiency level is presented together with

the number correct on the interim assessment.


Student Sub-skill Reports

This report displays performance on every Common Core sub-skill measured. Pale green means it is

correct, and the letter gives the student’s incorrect answer. This feature helps the teacher identify why the

student selected the wrong answer. Little teacher time is required to go through the whole set of reports.

Item Summary Report

The item summary report presents information on every question in an interim assessment. The summary is tallied over a class or grade. For each question, the following information is provided: the correct answer; the number and percent correct; the number and percent incorrect; the Common Core reporting category (skill) and reporting subcategory (sub-skill); and the level of difficulty of the question (easy, medium, or hard). This report is also available interactively. This interactive feature allows a user to sort by any of the information above and to search for resources tied to the skill or sub-skill questioned, such as streaming videos and quizzes.


Answer Report

The answer report provides each student’s specific response (ABCD) to each question on an interim

assessment. In the top row, the correct answers on the assessment are provided. Then, each student’s

individual choices (ABCD) on each question are listed. Summary information is provided on the right-

hand side: number correct, percent correct, state percentile rank, and vertical scale score. The student’s

overall subject proficiency is indicated by the highlighted color.


Individual Student Report

The individual student report summarizes scores on all interim assessments for an individual student. The

Overall Subject Summary (box to the right) displays the Number Correct, Total Questions in Test,

Percent Correct, Scale Score, and State Percentile for Test A, Test B, and Test C for this student. This

student has started at Level 2 (yellow) on Test A and has stayed consistently at that level on Test B and C.

A national percentile, based on Test B, is also included. The Growth chart on the left graphs the student’s

scores on the three assessments along with the school and district averages. Furthermore, the solid gray

line represents an End-of-Year Target score; to reach Level 3 at the end of the school year, a student

would need a scale score at or above this value. The Performance by Standard Summary table displays the

Proficiency level of this student by each of the five Mathematics standards. Finally, the student’s answers

to all questions are provided in the last table.


Drill-Down Report

District Administrators can compare schools. Both District and School Administrators can use a series of

drill down reports by grade, teacher, or student, and sort by proficiency prediction. They can also get

Microsoft Excel extractions of data, view comparisons across NCLB sub-group populations, and track

progress of all classes and schools.


Comparison Report

This report compares students across multiple testing periods and monitors their progress during the year. It shows where teachers have recently concentrated instruction and where students have not retained learning from earlier instruction.


Comparative Growth Report

This report helps teachers compare students with each other at the district or school level. This report will

be available when at least two benchmark tests have been completed. There are three sections to this

report.

Test 1 and Test 2 Regression

Each student is represented by a sphere. A larger sphere indicates multiple students with the same score on both tests. In the teacher version, you can mouse over the sphere to display the student name.

The horizontal axis is the first test and the vertical axis is the second test. The vertical scores shown on each axis are defined by the range of each proficiency level on that particular test. As a comparison, the grid boxes with color show all of the students who scored within the same proficiency level on both tests.

The heavy line across the chart shows the regression line based on the students who took these particular tests for either the school or the district. A regression line makes predictions of scores based on the scores on a prior test. The dotted lines represent the error bands of the regression line. These are the lower and upper bound values of the standard error of estimate. For example, look at the chart above. If a student scores 1631 on test 1, we expect them to score approximately between 1611 and 1693 on test 2. For a student scoring 1499 on test 1, we expect a score approximately between 1479 and 1561 on the second test.

You can individually view the school line in red or the district line in blue on this section of the report.

You also have the option of viewing them together at the same time; just select how you would like to

view in the “Select Report Criteria” section. If students are outside of the dotted lines, they have scored

higher or lower than the expected range on the second test based on how they scored on the first test.

Students shown as small red spheres scored below their expected vertical score range on the

second test. The ones shown as small green spheres scored above their expected vertical score range on

the second test. All of the gray spheres represent students with a vertical score in their expected range.

If both lines are shown, students will be compared to the district line.
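The regression line, error bands, and sphere colors described in this section can be sketched as follows. This is a minimal illustration, assuming an ordinary least-squares line and the usual standard error of estimate with n - 2 degrees of freedom; the manual does not state the exact formulas used, and the scores below are made up.

```python
from math import sqrt
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("test 1 scores must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def standard_error_of_estimate(x, y, slope, intercept) -> float:
    """SEE = sqrt(sum of squared residuals / (n - 2)); the n - 2 divisor is an assumption."""
    if len(x) < 3:
        raise ValueError("need at least three students")
    squared = [(yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)]
    return sqrt(sum(squared) / (len(x) - 2))


def classify(x: List[float], y: List[float]) -> List[str]:
    """Label each student green (above band), red (below band), or gray (within band)."""
    slope, intercept = fit_line(x, y)
    see = standard_error_of_estimate(x, y, slope, intercept)
    labels = []
    for xi, yi in zip(x, y):
        expected = intercept + slope * xi
        if yi > expected + see:
            labels.append("green")
        elif yi < expected - see:
            labels.append("red")
        else:
            labels.append("gray")
    return labels


# Hypothetical vertical scale scores for five students.
test1 = [1400, 1500, 1600, 1700, 1800]
test2 = [1420, 1520, 1700, 1720, 1820]
slope, intercept = fit_line(test1, test2)
labels = classify(test1, test2)
```

In this made-up data the third student scores well above the predicted value and is the only one labeled green.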


Test 1 to Test 2 Distribution of Change

This bar chart allows you to see the distribution of scores from the regression line. The higher above or below the 0 line, the farther away that score is from the regression line. This distance is calculated as the actual score on test 2 minus the expected score on test 2.

This chart also displays the standard error of the estimate (SEE). The standard error of estimate is a measure of the average distance from the regression line, or the accuracy of the predictions. Notice that those scores falling above or below the SEE lines are the ones displayed as red or green. These are the scores identified as above or below average.

Ranked Order of Change

All of the students’ names are listed and organized in three groups: Below Average Students, Above Average Students and Average Students. This section displays the scores on the first and second test, color coded to display which proficiency level their vertical score is associated with. Students appear in ranked order according to their residual scores, that is, the difference between their actual test 2 scores and their expected test 2 scores.
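The grouping and ranking described above can be sketched as follows. This is a minimal illustration under stated assumptions: the SEE value and expected scores are supplied as inputs, students are sorted from lowest to highest residual (the manual does not say which direction the report uses), and all names and scores are made up.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StudentResult:
    name: str
    actual_test2: float
    expected_test2: float

    @property
    def residual(self) -> float:
        """Actual test 2 score minus expected test 2 score."""
        return self.actual_test2 - self.expected_test2


def group_and_rank(results: List[StudentResult], see: float) -> Dict[str, List[str]]:
    """Place each student in a group by residual versus SEE, ranked by residual."""
    groups: Dict[str, List[str]] = {"Below Average": [], "Above Average": [], "Average": []}
    for result in sorted(results, key=lambda r: r.residual):
        if result.residual < -see:
            groups["Below Average"].append(result.name)
        elif result.residual > see:
            groups["Above Average"].append(result.name)
        else:
            groups["Average"].append(result.name)
    return groups


# Hypothetical students and a hypothetical SEE of 41.3 scale-score points.
results = [
    StudentResult("Student A", 1700, 1636),
    StudentResult("Student B", 1520, 1536),
    StudentResult("Student C", 1560, 1610),
    StudentResult("Student D", 1400, 1500),
]
ranked = group_and_rank(results, see=41.3)
```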


District Reports

School Comparison Report

This report summarizes district performance by proficiency level on a specific interim assessment together with performance on the reporting categories (skills) that comprise that assessment. This report is broken down by school to show comparisons across the district. This graph presents an example of a Grade 3 Reading interim assessment.

Scale Comparison Report

The scale comparison report is a table that displays the average scale scores for each of the tests and the

average change for the whole district and each individual school. It is broken down by grades and

subjects.


Detail Export

The detail export is a .csv or .xls report with student level data. This report has all the different data

points for each test the student has taken including overall subject and skill proficiency levels, number &

percent correct, state & national percentiles and vertical scores.


V. Reliability, Proficiency Predictions, Validity and Growth

Test Reliability

Test reliability provides evidence that test questions are consistently measuring a given construct, such

as mathematics ability or reading comprehension. Furthermore, high test reliability indicates that the

measurement error for a test is low. Reliabilities are calculated using Cronbach’s alpha.
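For readers unfamiliar with Cronbach's alpha, the sketch below computes it from a small matrix of scored item responses using the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). The response data are made up and the function name is hypothetical; the manual does not describe the software used for the reported coefficients.

```python
from statistics import variance
from typing import List


def cronbach_alpha(item_scores: List[List[int]]) -> float:
    """item_scores holds one row per student and one column per item (1 = correct, 0 = incorrect)."""
    n_students = len(item_scores)
    n_items = len(item_scores[0]) if item_scores else 0
    if n_students < 2 or n_items < 2:
        raise ValueError("need at least two students and two items")
    # Sample variance of each item column and of the students' total scores.
    item_variances = [variance([row[j] for row in item_scores]) for j in range(n_items)]
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: five students, four items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(responses)
```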

The following tables present test reliabilities and sample sizes for Discovery Education Assessments for

three time periods, Fall (Test A), Winter (Test B), and early Spring (Test C), in the subject areas of

Reading and Mathematics for 2012-2013.

The median reading reliability was .85, with a median sample size of 52,628. The median mathematics

reliability for the three time periods was .79, with a median sample size of 49,014.

2012-2013 Common Core Sample & Reliability Coefficients: Reading

              Test A               Test B               Test C
              N       Reliability  N       Reliability  N       Reliability
Kindergarten  44,925  0.71         44,679  0.66         35,555  0.75
Grade 1       83,140  0.70         73,166  0.77         64,948  0.81
Grade 2       77,548  0.84         77,615  0.85         68,531  0.84
Grade 3       55,568  0.87         52,628  0.85         44,207  0.85
Grade 4       59,278  0.85         54,144  0.83         45,864  0.85
Grade 5       59,212  0.87         54,429  0.84         44,982  0.84
Grade 6       58,362  0.88         53,771  0.83         45,156  0.85
Grade 7       56,219  0.84         52,001  0.84         42,548  0.88
Grade 8       55,617  0.86         50,863  0.88         41,699  0.88
English 1     16,159  0.83         11,705  0.86
English 2     11,587  0.88         7,855   0.85


2012-2013 Common Core Sample & Reliability Coefficients: Math

              Test A               Test B               Test C
              N       Reliability  N       Reliability  N       Reliability
Kindergarten  43,966  0.68         45,279  0.79         34,915  0.82
Grade 1       82,368  0.69         73,745  0.70         64,266  0.79
Grade 2       77,178  0.73         78,542  0.82         67,712  0.83
Grade 3       55,648  0.79         53,792  0.81         44,294  0.83
Grade 4       59,500  0.78         54,446  0.78         45,500  0.81
Grade 5       59,464  0.83         55,300  0.79         45,134  0.83
Grade 6       49,014  0.78         54,176  0.77         44,546  0.81
Grade 7       56,724  0.77         52,205  0.80         42,573  0.80
Grade 8       54,050  0.77         48,688  0.80         38,717  0.80
Algebra 1     11,525  0.72         8,848   0.77
Algebra 2     12,026  0.71         8,925   0.66
Geometry      10,529  0.50         7,148   0.63


[Figure: Discovery Education Assessment: 20/30/30/20 Model. Percent of students (vertical axis, 0% to 30%) at Level 1, Level 2, Level 3, and Level 4.]

Proficiency Predictions

Discovery Education Assessment provides each student with a prediction of proficiency status on each of

their benchmark assessments. These interim benchmarks are designed to measure skills tested by the

future PARCC and SBAC summative assessments but in a more time-limited format. The results of these

benchmarks are intended to provide teachers, administrators, and students with reliable and valid

predictors of student performance. Discovery Education Assessment uses the Continue-to-Learn model

for providing proficiency predictions. The Continue-to-Learn model differs from an Absolute model in
that it does not predict how the student would perform if the state test were taken on the same day as the
benchmark. Instead, it predicts performance assuming the student continues to learn throughout the rest of
the year, up to the time of the state test. With this model, the distribution of proficiency levels stays fairly stable throughout

the year. The Continue-to-Learn model helps teachers identify at the beginning of the year those students

who need the most instruction and in what area that instruction is needed. With an Absolute model, a

large percentage of students would score as Not Proficient in the beginning of the year because many of

the skills have not yet been taught.

Discovery assessments feature four performance levels. There are no state proficiency levels to predict, so
we must rely on Discovery-created, criterion-referenced cut scores. A large representative sample of scores
on the Common Core assessments is separated into four performance groups or levels: twenty
percent of scores are assigned to Level 1, thirty percent to Level 2, thirty percent to Level 3, and
twenty percent to Level 4.
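The 20/30/30/20 split can be sketched as below. This is only an illustration under stated assumptions: the function names are hypothetical, the sample is synthetic, and the way cut scores are read off the sorted sample (including how ties are handled) is a guess rather than Discovery Education's documented procedure.

```python
from collections import Counter


def percentile_cuts(sample_scores, shares=(20, 30, 30, 20)):
    """Return the lowest score of Levels 2, 3, and 4 for the given shares.

    `shares` are whole-number percentages for Levels 1 through 4.
    """
    ordered = sorted(sample_scores)
    n = len(ordered)
    cuts = []
    cumulative = 0
    for share in shares[:-1]:
        cumulative += share
        # Position in the sorted sample where the next level begins.
        index = min(n - 1, cumulative * n // 100)
        cuts.append(ordered[index])
    return cuts


def assign_level(score, cuts):
    """Map a score to Level 1-4: one level higher for each cut it reaches."""
    return 1 + sum(score >= cut for cut in cuts)


# Synthetic sample: the scores 1 through 100, one student each.
sample = list(range(1, 101))
cuts = percentile_cuts(sample)
level_counts = Counter(assign_level(s, cuts) for s in sample)
```

With this synthetic sample the four levels hold 20, 30, 30, and 20 students. Real score distributions contain ties, so observed shares will only approximate the target percentages.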

Blue (Level 4) indicates a performance

level achieved by the top twenty percent of

students on the Common Core assessment.

Students at this level may be ready to learn more

advanced standards or to broaden their

knowledge of the grade level standards.

Green (Level 3) indicates a good

performance level, at or slightly above grade

level.

Yellow (Level 2) is tied to a performance

level at or slightly below grade level.

Red (Level 1) suggests a poor

performance level. The lowest fifth of all test

scores are at this level. The student with “Red”

scores may need significant support to achieve

the grade level standards specified by the

Kentucky Department of Education. Some

schools will consider these students for additional

assessments and Tier 2 or Tier 3 instructional

strategies under an RTI model.


Validity

Area Under a ROC Curve

Area under a ROC curve (AUC) is a measure of the discrimination of a test, that is, the ability of an assessment to
correctly classify students as at-risk or not at-risk. AUC values above .90 are excellent, values between .80 and .90 are
good, and values between .70 and .80 are fair. During the Fall of 2012, Discovery Education conducted a ROC
analysis on the Common Core interim assessments from the 2011-2012 school year. AUC values ranged
from .77 to .92, with a median value of .83.
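For readers unfamiliar with the statistic, the sketch below computes an AUC in its pairwise form and labels it with the bands quoted above. The scale scores are hypothetical, the handling of values that fall exactly on a band boundary is an assumption, and nothing here is claimed to be the procedure used in the ROC analysis itself.

```python
def auc(at_risk_scores, not_at_risk_scores):
    """Probability that a not-at-risk student outscores an at-risk student.

    Ties count as half. This pairwise form equals the area under the ROC
    curve when lower benchmark scores are used to flag students as at-risk.
    """
    wins = 0.0
    for low in at_risk_scores:
        for high in not_at_risk_scores:
            if high > low:
                wins += 1.0
            elif high == low:
                wins += 0.5
    return wins / (len(at_risk_scores) * len(not_at_risk_scores))


def describe_auc(value):
    """Label an AUC using the bands quoted in the text (boundaries assumed)."""
    if value > 0.90:
        return "excellent"
    if value >= 0.80:
        return "good"
    if value >= 0.70:
        return "fair"
    return "below fair"


# Hypothetical benchmark scale scores for two groups of students.
at_risk = [1380, 1400, 1420, 1450]
not_at_risk = [1410, 1440, 1460, 1480]
value = auc(at_risk, not_at_risk)
label = describe_auc(value)
```

In this toy example 13 of the 16 student pairs are ordered correctly, so the AUC is 0.8125, which falls in the "good" band.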

2011-2012 Area Under the Curve: Common Core

Grade    Area  Std. Error  Area  Std. Error  Area  Std. Error
Reading
3        0.81  0.03        0.80  0.03        0.83  0.02
4        0.80  0.03        0.84  0.02        0.82  0.03
5        0.82  0.03        0.81  0.03        0.80  0.03
6        0.84  0.02        0.85  0.02        0.85  0.02
7        0.85  0.02        0.83  0.03        0.83  0.03
8        0.87  0.02        0.86  0.02        0.85  0.03
Math
3        0.77  0.03        0.86  0.02        0.88  0.02
4        0.82  0.03        0.80  0.03        0.77  0.03
5        0.78  0.03        0.83  0.03        0.84  0.03
6        0.82  0.03        0.80  0.03        0.82  0.03
7        0.92  0.02        0.90  0.02        0.91  0.02
8        0.89  0.02        0.89  0.02        0.90  0.02

Predictive and Concurrent Validity

Predictive and concurrent validity are criterion-related validity methods that measure the correlation
between a test and a previously validated assessment. In the fall of 2012, Discovery Education partnered with a
district from the Commonwealth of Kentucky, a state that implemented a fully aligned Common Core
summative assessment in the spring of 2012. The school district of 12 schools provided its student-level
summative assessment data to Discovery Education for analysis. Correlations were calculated between state
summative scale scores and scale scores from the Discovery 2011-2012 fall and winter (predictive)
assessments and the 2011-2012 spring (concurrent) assessments. Reading predictive
validities ranged from .59 to .72, while concurrent validities ranged from .62 to .71; all were significant (p <
.01). Math predictive validities ranged from .56 to .72, while concurrent validities ranged from .49 to .76; all
were significant (p < .01).
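The validity coefficients above are correlations between paired scale scores. A minimal sketch of a Pearson correlation follows, assuming that this is the coefficient used (the manual does not name it). The paired scores are hypothetical, and the significance test is not shown.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(spread_x * spread_y)


# Hypothetical paired scores for five students: benchmark scale score and
# state summative scale score.
benchmark = [1380, 1410, 1440, 1470, 1500]
summative = [190, 205, 200, 220, 230]
r = pearson_r(benchmark, summative)
```

The same function applies to a predictive design (fall or winter benchmark against the spring state test) and a concurrent design (spring benchmark against the spring state test); only the timing of the benchmark scores differs.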


Predictive and Concurrent Validity Statistics

Grade               Predictive (A)  Predictive (B)  Concurrent (C)
Reading
3      Correlation  0.59            0.60            0.68
       N            325             326             323
4      Correlation  0.66            0.69            0.67
       N            321             319             298
5      Correlation  0.66            0.64            0.62
       N            284             283             283
6      Correlation  0.72            0.69            0.71
       N            320             318             321
7      Correlation  0.64            0.62            0.62
       N            307             306             302
8      Correlation  0.72            0.71            0.69
       N            274             269             277
Math
3      Correlation  0.58            0.68            0.76
       N            326             322             323
4      Correlation  0.71            0.63            0.49
       N            319             320             313
5      Correlation  0.56            0.62            0.68
       N            284             283             273
6      Correlation  0.65            0.63            0.66
       N            325             314             321
7      Correlation  0.72            0.67            0.65
       N            307             310             306
8      Correlation  0.71            0.70            0.73
       N            276             276             268

All correlations significant at the 0.01 level (2-tailed).

Proficiency Prediction Score

The Proficiency Prediction Score is used to determine the accuracy of predicted proficiency status. Under

the NCLB legislation, it is important that states and school districts help students progress from a “Not

Proficient” status to one of “Proficient”. The Proficiency Prediction Score is based on the percentage of

correct proficiency classifications (Not Proficient/Proficient). If a state uses two or more classifications

for “Proficient” (such as “Proficient” and “Advanced”), the percentage of students in these two or more

categories would be added together. Also, if a state uses two or more categories for “Not Proficient” (such

as “Below Basic” and “Basic”), the percentage of students in these two or more categories would be

added together. To see how to use this score, let’s assume a school district had the following data based

on its annual state test and a Discovery Education Assessment Spring benchmark assessment. Let’s use

data from a Grade 4 Mathematics Test as an example:


Predicted Percent Proficient or higher = 70%

Actual Percent Proficient or higher on the State Test = 80%

The error rate for these predictions is as follows:

Error Rate = |Actual Percent Proficient - Predicted Percent Proficient|

Error Rate = |80% - 70%| = 10%

In this example, Discovery Education Assessment underpredicted the percent of students proficient by
10 percentage points. The absolute value (the symbols | |) is used so that the error rate is also positive when Discovery
Education Assessment overpredicts the percent of students proficient and the difference is negative (e.g.,
Actual - Predicted = 70% - 80% = -10%; absolute value is 10%).

The Proficiency Prediction Score is calculated as follows:

Proficiency Prediction Score = 100% - Error Rate

In this example, the score is as follows:

Proficiency Prediction Score = 100% - 10% = 90%

A higher Proficiency Prediction Score indicates a larger number or percentage of correct proficiency

predictions. In this example, Discovery Education Assessment had a score of 90%. Discovery Education

Assessment uses information from these scores to improve its benchmark assessments every year.
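The calculation described in this section can be written out as a short sketch. The function names are hypothetical; the formulas are the ones given in the text, with percentages expressed as numbers from 0 to 100.

```python
def error_rate(actual_pct, predicted_pct):
    """Absolute difference between actual and predicted percent proficient."""
    return abs(actual_pct - predicted_pct)


def proficiency_prediction_score(actual_pct, predicted_pct):
    """Proficiency Prediction Score = 100% - Error Rate."""
    return 100.0 - error_rate(actual_pct, predicted_pct)


# The Grade 4 Mathematics example from the text: predicted 70%, actual 80%.
score = proficiency_prediction_score(actual_pct=80.0, predicted_pct=70.0)

# Overprediction by the same amount gives the same score.
score_over = proficiency_prediction_score(actual_pct=70.0, predicted_pct=80.0)
```

Both calls return 90.0, matching the worked example. Because the score compares group percentages, it does not show whether the same individual students were classified correctly.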

Discovery Education Assessment Proficiency Predictions vs. NM SBA Proficiency Levels

In the 2010-2011 school year, districts in New Mexico used the Discovery Education Common Core

interim benchmark assessments to predict performance on the NM Standards Based Assessments (SBA).

The following tables display the DEA percent Proficient on the Common Core interim benchmark

assessments, percent Proficient on the New Mexico SBA, the difference between the two and the

proficiency prediction score. The median reading proficiency prediction score was 98.1 while the median

math proficiency prediction score was 95.8.

Validity: New Mexico Reading Proficiency Prediction Scores from 2010-2011

          DEA CC  NM SBA  Difference  Prediction Score
Grade 3   56.5    57.3    0.8         99.2
Grade 4   51.8    51.4    0.4         99.6
Grade 5   58.0    59.0    1.0         99.0
Grade 6   54.9    39.6    15.3        84.7
Grade 7   40.6    49.6    9.0         91.0
Grade 8   57.5    60.4    2.9         97.1
Median                    2.0         98.1


Validity: New Mexico Math Proficiency Prediction Scores from 2010-2011

          DEA CC  NM SBA  Difference  Prediction Score
Grade 3   52.8    58.2    5.4         94.6
Grade 4   49.1    45.3    3.8         96.2
Grade 5   44.4    45.3    0.9         99.1
Grade 6   54.6    34.6    20.0        80.0
Grade 7   38.3    33.6    4.7         95.3
Grade 8   39.4    39.3    0.1         99.9
Median                    4.3         95.8
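As a cross-check, the prediction scores and medians in the two New Mexico tables can be recomputed from the DEA CC and NM SBA columns. This sketch copies the percentages from the tables. The variable and function names are hypothetical, and rounding each score to one decimal place is an assumption made to match the tables' precision.

```python
from statistics import median

# (DEA CC percent proficient, NM SBA percent proficient) for Grades 3-8,
# copied from the Reading and Math tables.
reading_pairs = [(56.5, 57.3), (51.8, 51.4), (58.0, 59.0),
                 (54.9, 39.6), (40.6, 49.6), (57.5, 60.4)]
math_pairs = [(52.8, 58.2), (49.1, 45.3), (44.4, 45.3),
              (54.6, 34.6), (38.3, 33.6), (39.4, 39.3)]


def prediction_scores(pairs):
    """100 minus the absolute difference, rounded to one decimal place."""
    return [round(100.0 - abs(dea - sba), 1) for dea, sba in pairs]


reading_scores = prediction_scores(reading_pairs)
math_scores = prediction_scores(math_pairs)
reading_median = median(reading_scores)
math_median = median(math_scores)
```

The unrounded medians are about 98.05 for reading and 95.75 for math, which is consistent with the reported 98.1 and 95.8. Grade 6 stands out in both subjects, with differences of 15.3 and 20.0 percentage points.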

[Figure: New Mexico Reading % Proficient: DEA Common Core & SBA. Percent proficient (vertical axis, 0 to 70) for DEA CC and NM SBA, Grades 3 through 8.]

[Figure: New Mexico Math % Proficient: DEA Common Core & SBA. Percent proficient (vertical axis, 0 to 70) for DEA CC and NM SBA, Grades 3 through 8.]


Vertical Scale Averages and Growth

Growth models depend on a highly rigorous and valid vertical scale to measure student performance

over time. Discovery Education Assessment vertical scales are constructed using Rasch measurement

models with state-of-the-art psychometric techniques.

The accurate measurement of student achievement over time is becoming increasingly important to
parents, teachers, and school administrators. Student “growth” within a grade and across grades has
also been sanctioned by the U.S. Department of Education as a reliable way to measure student
proficiency in Reading and Mathematics and to satisfy the requirements of Adequate Yearly Progress
(AYP) under the No Child Left Behind Act. Accurate measurement and recording of individual student
achievement can also help with issues of student mobility: as students move within a district or state,
records of individual student achievement can help new schools attend to the needs of this mobile
population.

The assessment of student achievement over time is even more important with the use of benchmark
tests. Discovery Education Assessment Benchmark tests provide a snapshot of student progress toward
state standards at up to four points during the school year. These benchmark tests are scientifically linked,
so that the reporting of student proficiency levels is both reliable and valid.

Discovery Education Assessment added a scientifically based, vertically scaled growth score to its
family of benchmark tests in 2007-08. These growth scores are based on the Rasch measurement model, a

state-of-the-art psychometric technique for scaling ability (e.g., Wright & Stone, 1979; Wright & Masters,

1982; Linacre 1999; Smith & Smith, 2004; Wilson, 2005). To accomplish vertical scaling, common items

are embedded across assessments to enable the psychometric linking of tests at different points in time.

For example, a Grade 3 mathematics benchmark test administered mid-year might contain below grade

level and above grade level items. Performance on these off grade level items provides an accurate

measurement of how much growth occurs across grades. Furthermore, benchmark tests within a grade are

also linked with common items, once again to assess change at different points in time within a grade.

Discovery Education Assessment uses established psychometric procedures to build calibrated item
banks and linked tests (e.g., Ingebo, 1997; Kolen & Brennan, 2004).
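To make the common-item idea concrete, the sketch below shows the Rasch response probability and a simple mean-shift linking constant computed from items shared by two test forms. The item names and difficulty values are hypothetical. Mean-shift linking is one widely used approach and is shown only as an illustration; the manual does not state which linking procedure Discovery Education Assessment applies.

```python
from math import exp


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch model (in logits)."""
    return 1.0 / (1.0 + exp(-(ability - difficulty)))


def linking_constant(base_difficulties, new_difficulties):
    """Mean difference in common-item difficulty between two calibrations.

    Both arguments map the same common-item names to Rasch difficulties.
    Adding the result to the new form's estimates places them on the base
    form's scale.
    """
    differences = [
        base_difficulties[item] - new_difficulties[item]
        for item in base_difficulties
    ]
    return sum(differences) / len(differences)


# Hypothetical common items calibrated separately on two test forms.
base_form = {"item_1": -0.50, "item_2": 0.25, "item_3": 1.00}
new_form = {"item_1": -0.90, "item_2": -0.10, "item_3": 0.55}
shift = linking_constant(base_form, new_form)  # about 0.40 logits

# A student whose ability equals an item's difficulty has a 50% chance.
p_even = rasch_probability(ability=0.0, difficulty=0.0)
```

Once both forms share a scale, a difference between two ability estimates can be read as growth rather than as an artifact of one form being harder than the other.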

Isn’t student growth similar across grades? Don’t students change as much from Grade 3 to Grade 4 as
they do from Grade 7 to Grade 8? Previous research on the use of vertical scales has demonstrated that
student growth is not linear; that is, growth in student achievement differs from grade to grade (see
Young 2006). For instance, the National Math Scale Score Averages figure shows preliminary Discovery Education
Assessment vertically scaled growth results. This graph shows growth from Kindergarten through Grade 10 in
Mathematics as measured by Discovery Education Assessment’s Spring benchmark tests. Typically,
students have larger gains in mathematics achievement in the elementary grades, with growth slowing somewhat
in middle and high school, a pattern also reported by other major testing companies.


[Figure: Discovery Within Year Growth for 4th Grade Math. Average scale score (vertical axis, 1400 to 1520) for Math at Test P, Test A, Test B, and Test C.]

Student growth can now be accurately measured at four points in time in each grade level. Discovery

Education Assessment benchmark tests are administered up to four times yearly: Early Fall, Late Fall,

Winter, and Spring. For each time period, we report scale scores and accompanying statistics. Most

testing companies only allow the measurement of student growth at two points in time: Fall and Spring.

Discovery Education Assessment benchmark tests provide normative information to assess student
growth multiple times each year. The Discovery Within Year Growth figure illustrates this growth for Grade 4 Mathematics
using our benchmark assessments.

[Figure: National Math Scale Score Averages. Average student scale score (vertical axis, 1200 to 1700) by grade, Kindergarten through Grade 10.]


Common Core 2012-2013 Average Vertical Scale Scores

Reading
              Test A                   Test B                   Test C
              N       Score  St. Dev.  N       Score  St. Dev.  N       Score  St. Dev.
Kindergarten  44,925  1218   61.98     44,679  1247   55.55     35,555  1250   66.77
Grade 1       83,140  1253   59.78     73,166  1290   70.64     64,948  1346   76.62
Grade 2       77,548  1349   78.53     77,615  1412   80.59     68,531  1406   78.58
Grade 3       55,568  1385   82.23     52,628  1426   74.22     44,207  1451   79.56
Grade 4       59,278  1453   77.14     54,144  1485   71.82     45,864  1529   80.70
Grade 5       59,212  1507   82.33     54,429  1516   76.67     44,982  1547   74.77
Grade 6       58,362  1529   90.53     53,771  1593   73.66     45,156  1586   77.34
Grade 7       56,219  1556   75.23     52,001  1600   77.06     42,548  1590   92.07
Grade 8       55,617  1599   82.42     50,863  1611   89.74     45,030  1630   94.05
English 1     16,159  1607   73.79     11,705  1662   83.62
English 2     11,587  1644   94.18     7,855   1635   82.44

Math
              Test A                   Test B                   Test C
              N       Score  St. Dev.  N       Score  St. Dev.  N       Score  St. Dev.
Kindergarten  43,966  1144   61.40     45,279  1213   77.85     34,915  1223   87.49
Grade 1       82,368  1221   62.22     73,745  1301   60.85     64,266  1293   77.93
Grade 2       77,178  1297   61.35     78,542  1342   78.70     67,712  1401   87.35
Grade 3       55,648  1354   68.15     53,792  1404   72.56     44,294  1463   78.59
Grade 4       59,500  1443   62.77     54,446  1465   62.99     45,500  1520   72.04
Grade 5       59,464  1508   71.69     55,300  1541   66.92     45,134  1566   73.94
Grade 6       58,190  1544   65.97     54,176  1556   64.63     44,546  1580   72.40
Grade 7       56,724  1577   62.38     52,205  1594   69.13     42,573  1618   69.06
Grade 8       54,050  1577   61.76     48,550  1631   68.66     38,717  1637   68.07
Algebra 1     11,525  1618   57.72     8,848   1617   58.98
Algebra 2     12,026  1644   60.48     8,925   1634   54.06
Geometry      10,529  1641   42.81     7,148   1636   47.52

Score = Average Scale Score.


[Figure: Common Core 2012-13 Reading Avg Scale Scores. Average scale score (vertical axis, 1200 to 1800) for Tests A, B, and C in Grades K through 8 and for Tests A and B in English 1 (E1) and English 2 (E2).]

[Figure: Common Core 2012-13 Math Avg Scale Scores. Average scale score (vertical axis, 1100 to 1700) for Tests A, B, and C in Grades K through 8 and for Tests A and B in the three high school courses (axis labels A1, A1, GE).]


Appendix A: Test and Question Statistics, Reliability, and Scale Scores

The following section reports test and question statistics, reliability, and percentiles for the benchmark

tests, for grades 3-8, Reading and Mathematics. These benchmark tests were administered during the fall

of 2011-2012. Benchmark tests are revised each year based on test and question statistics.

Number of Students: Number of students used for calculation of test statistics.

Number of Items: Number of items in each benchmark test (including common items used

for scaling purposes).

Mean: Test mean in terms of number correct.

Standard Deviation: Test standard deviation.

Reliability: Cronbach’s alpha.

SEM: Standard Error of Measurement (SEM) for the test.

Scale Score: Discovery Education Assessment Scale Score for each number correct
(Scale scores are vertically scaled using Rasch measurement. Scale scores
from grades K-12 range from 1000 to 2000).

Level: The DEA proficiency level (Level 1 to Level 4) assigned to the student
based on the number of items correct on the assessment.

Question P-values: The proportion correct for each item.

Biserial: Item discrimination using biserial correlation.

Rasch Item Difficulty: Rasch item difficulty parameter calculated using WINSTEPS.

DIF Gender: Rasch item difficulty difference (Male vs. Female).

DIF Ethnicity: Rasch item difficulty difference (White vs. Black).

DIF Size

Negligible: 0 logits to .42 logits (absolute value).

Moderate: .43 logits to .63 logits (absolute value).

Large: .64 logits and up (absolute value).

(see p.1070 “An Adjustment for Sample Size in DIF Analysis”, Rasch Measurement Transactions, 20:3,

Winter 2006)
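The DIF size bands listed above can be applied with a small helper like the one below. The function name is hypothetical, and rounding to two decimal places before comparing is an assumption made because the bands are stated to two decimals.

```python
def classify_dif(logit_difference):
    """Classify a Rasch item difficulty difference using the DIF size bands."""
    size = round(abs(logit_difference), 2)
    if size >= 0.64:
        return "Large"
    if size >= 0.43:
        return "Moderate"
    return "Negligible"


# Example values in logits; the sign is ignored.
labels = [classify_dif(value) for value in (0.16, 0.43, -0.70)]
```

The three example values are classified as Negligible, Moderate, and Large, respectively.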

Technical Data

Common Core Fall 2011-2012 Reading Grade 3

Test Statistics

Number of Students 13,163

Number of Items 34

Average Number Correct 19.48

Std. Deviation 6.74

Avg. Scale Score 1401

Reliability 0.86

Std. Error of Measurement 2.52
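The reported standard error of measurement is consistent with the classical test theory relation SEM = SD x sqrt(1 - reliability). The sketch below applies that relation to the Grade 3 Reading statistics above. Whether the manual computed SEM in exactly this way is an assumption, and the function name is hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(std_dev, reliability):
    """Classical test theory SEM, in the same units as the standard deviation."""
    return std_dev * sqrt(1.0 - reliability)


# Grade 3 Reading, Fall 2011-2012: Std. Deviation 6.74, Reliability 0.86.
sem = standard_error_of_measurement(6.74, 0.86)
```

The result rounds to 2.52 number-correct points, matching the value in the table.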

Question Statistics                                     Scale Scores & Percentiles

Item  P-Value  Biserial  Rasch Item  DIF     DIF        No.      Scale  Level
No.                      Difficulty  Gender  Ethnicity  Correct  Score
1     0.49     0.36       0.43       0.02    0.16       0        1000   Level 1
2     0.46     0.31       0.59       0.04    0.21       1        1086   Level 1
3     0.78     0.48      -1.12       0.13    0.23       2        1142   Level 1
4     0.78     0.46      -1.13       0.07    0.19       3        1177   Level 1
5     0.80     0.43      -1.30       0.18    0.03       4        1203   Level 1
6     0.71     0.52      -0.72       0.14    0.13       5        1223   Level 1
7     0.48     0.43       0.49       0.04    0.43       6        1241   Level 1
8     0.76     0.51      -0.98       0.19    0.25       7        1257   Level 1
9     0.65     0.35      -0.39       0.11    0.07       8        1271   Level 1
10    0.50     0.39       0.40       0.06    0.35       9        1285   Level 1
11    0.77     0.50      -1.05       0.08    0.21       10       1297   Level 1
12    0.74     0.50      -0.89       0.16    0.19       11       1309   Level 1
13    0.61     0.47      -0.18       0.20    0.08       12       1320   Level 1
14    0.48     0.46       0.48       0.05    0.06       13       1331   Level 2
15    0.65     0.49      -0.36       0.07    0.25       14       1341   Level 2
16    0.54     0.41       0.18       0.22    0.05       15       1352   Level 2
17    0.68     0.53      -0.53       0.04    0.02       16       1362   Level 2
18    0.53     0.46       0.23       0.03    0.05       17       1372   Level 2
19    0.49     0.39       0.46       0.17    0.00       18       1382   Level 2
20    0.55     0.44       0.15       0.05    0.14       19       1392   Level 2
21    0.36     0.20       1.11       0.15    0.66       20       1402   Level 3
22    0.41     0.25       0.87       0.02    0.32       21       1412   Level 3
23    0.35     0.30       1.18       0.07    0.19       22       1423   Level 3
24    0.56     0.47       0.10       0.03    0.03       23       1434   Level 3
25    0.86     0.36      -1.76       0.04    0.08       24       1445   Level 3
26    0.47     0.42       0.54       0.01    0.06       25       1458   Level 3
27    0.50     0.44       0.38       0.13    0.05       26       1470   Level 4
28    0.63     0.47      -0.27       0.11    0.14       27       1484   Level 4
29    0.33     0.25       1.25       0.30    0.21       28       1500   Level 4
30    0.61     0.51      -0.16       0.16    0.06       29       1517   Level 4
31    0.63     0.50      -0.24       0.01    0.15       30       1537   Level 4
32    0.64     0.48      -0.32       0.16    0.11       31       1562   Level 4
33    0.39     0.39       0.95       0.08    0.19       32       1596   Level 4
34    0.27     0.17       1.63       0.11    0.70       33       1652   Level 4
                                                        34       1744   Level 4

Technical Data

Common Core Fall 2011-2012

Math Grade 3

Test Statistics


Number of Items 32


Std. Deviation 5.94


Reliability 0.82


Question Statistics

Item  P-Value  Biserial  Rasch Item  DIF     DIF        No.      Scale  Level
No.                      Difficulty  Gender  Ethnicity  Correct  Score
1     0.73     0.40      -1.29       0.02    0.13       0        1000   Level 1
2     0.51     0.40      -0.18       0.33    0.17       1        1092   Level 1
3     0.52     0.38      -0.24       0.00    0.05       2        1148   Level 1
4     0.23     0.17       1.35       0.11    0.31       3        1182   Level 1
5     0.52     0.36      -0.23       0.08    0.13       4        1208   Level 1
6     0.30     0.38       0.90       0.27    0.18       5        1229   Level 1
7     0.68     0.30      -0.99       0.17    0.23       6        1247   Level 1
8     0.69     0.41      -1.07       0.12    0.03       7        1263   Level 1
9     0.58     0.34      -0.51       0.16    0.08       8        1277   Level 1
10    0.51     0.42      -0.15       0.03    0.01       9        1291   Level 1
11    0.73     0.29      -1.30       0.07    0.04       10       1303   Level 2
12    0.64     0.51      -0.79       0.15    0.52       11       1316   Level 2
13    0.55     0.45      -0.38       0.16    0.09       12       1327   Level 2
14    0.48     0.39      -0.03       0.23    0.15       13       1338   Level 2
15    0.25     0.38       1.20       0.13    0.03       14       1349   Level 2
16    0.56     0.49      -0.42       0.03    0.39       15       1360   Level 3
17    0.53     0.34      -0.28       0.12    0.02       16       1371   Level 3
18    0.38     0.44       0.46       0.10    0.27       17       1382   Level 3
19    0.35     0.43       0.63       0.31    0.22       18       1393   Level 3
20    0.28     0.37       1.02       0.29    0.24       19       1404   Level 3
21    0.27     0.37       1.04       0.05    0.07       20       1415   Level 4
22    0.42     0.34       0.27       0.35    0.20       21       1427   Level 4
23    0.67     0.42      -0.97       0.18    0.17       22       1439   Level 4
24    0.46     0.42       0.08       0.16    0.00       23       1451   Level 4
25    0.30     0.42       0.87       0.13    0.10       24       1465   Level 4
26    0.29     0.46       0.96       0.12    0.16       25       1479   Level 4
27    0.60     0.47      -0.60       0.03    0.16       26       1495   Level 4
28    0.37     0.28       0.50       0.09    0.22       27       1513   Level 4
29    0.33     0.50       0.70       0.02    0.26       28       1534   Level 4
30    0.50     0.29      -0.11       0.25    0.26       29       1559   Level 4
31    0.69     0.37      -1.04       0.09    0.14       30       1593   Level 4
32    0.35     0.31       0.62       0.11    0.31       31       1649   Level 4
                                                        32       1742   Level 4

Technical Data


Reading Grade 4

Test Statistics


Number of Items 34


Std. Deviation 6.31


Reliability 0.83


Question Statistics

Item  P-Value  Biserial  Rasch Item  DIF     DIF        No.      Scale  Level
No.                      Difficulty  Gender  Ethnicity  Correct  Score
1     0.62     0.50      -0.53       0.16    0.33       0        1080   Level 1
2     0.80     0.47      -1.55       0.15    0.17       1        1173   Level 1
3     0.49     0.36       0.11       0.17    0.13       2        1228   Level 1
4     0.67     0.40      -0.77       0.21    0.11       3        1262   Level 1
5     0.65     0.44      -0.66       0.20    0.01       4        1287   Level 1
6     0.36     0.37       0.75       0.11    0.13       5        1308   Level 1
7     0.60     0.28      -0.40       0.17    0.22       6        1325   Level 1
8     0.44     0.33       0.37       0.16    0.01       7        1341   Level 1
9     0.60     0.45      -0.42       0.06    0.17       8        1355   Level 1
10    0.44     0.46       0.35       0.09    0.11       9        1368   Level 1
11    0.68     0.49      -0.80       0.06    0.12       10       1380   Level 1
12    0.52     0.43      -0.01       0.30    0.05       11       1391   Level 2
13    0.58     0.41      -0.30       0.03    0.11       12       1402   Level 2
14    0.57     0.40      -0.25       0.05    0.11       13       1413   Level 2
15    0.19     0.17       1.79       0.33    0.48       14       1423   Level 2
16    0.34     0.30       0.87       0.06    0.04       15       1433   Level 2
17    0.32     0.32       0.98       0.08    0.11       16       1443   Level 2
18    0.51     0.41       0.04       0.06    0.02       17       1453   Level 3
19    0.42     0.40       0.44       0.06    0.15       18       1463   Level 3
20    0.62     0.43      -0.48       0.21    0.04       19       1473   Level 3
21    0.63     0.43      -0.55       0.31    0.29       20       1484   Level 3
22    0.35     0.42       0.80       0.30    0.14       21       1494   Level 3
23    0.73     0.46      -1.07       0.26    0.05       22       1505   Level 3
24    0.66     0.45      -0.68       0.02    0.21       23       1516   Level 3
25    0.57     0.42      -0.27       0.09    0.13       24       1527   Level 4
26    0.40     0.42       0.54       0.11    0.03       25       1540   Level 4
27    0.54     0.43      -0.12       0.25    0.17       26       1553   Level 4
28    0.51     0.35       0.05       0.01    0.04       27       1567   Level 4
29    0.49     0.46       0.14       0.16    0.28       28       1582   Level 4
30    0.40     0.22       0.55       0.07    0.43       29       1600   Level 4
31    0.56     0.35      -0.20       0.02    0.03       30       1621   Level 4
32    0.32     0.14       0.99       0.05    0.22       31       1646   Level 4
33    0.67     0.47      -0.74       0.13    0.08       32       1680   Level 4
34    0.30     0.18       1.06       0.22    0.13       33       1736   Level 4
                                                        34       1829   Level 4

Technical Data


Math Grade 4

Test Statistics


Number of Items 34


Std. Deviation 5.52


Reliability 0.78


Question Statistics

Item  P-Value  Biserial  Rasch Item  DIF     DIF        No.      Scale  Level
No.                      Difficulty  Gender  Ethnicity  Correct  Score
1     0.71     0.35      -0.90       0.18    0.24       0        1061   Level 1
2     0.64     0.42      -0.50       0.10    0.24       1        1155   Level 1
3     0.71     0.45      -0.89       0.12    0.28       2        1212   Level 1
4     0.53     0.46       0.02       0.06    0.14       3        1247   Level 1
5     0.24     0.12       1.45       0.03    0.50       4        1273   Level 1
6     0.69     0.27      -0.76       0.07    0.33       5        1294   Level 1
7     0.43     0.43       0.46       0.15    0.18       6        1313   Level 1
8     0.82     0.31      -1.61       0.03    0.06       7        1329   Level 1
9     0.73     0.48      -1.01       0.24    0.53       8        1343   Level 1
10    0.56     0.36      -0.11       0.18    0.33       9        1357   Level 1
11    0.46     0.21       0.33       0.07    0.14       10       1370   Level 1
12    0.73     0.35      -0.97       0.01    0.11       11       1382   Level 1
13    0.55     0.33      -0.09       0.15    0.23       12       1393   Level 1
14    0.64     0.43      -0.51       0.23    0.36       13       1405   Level 2
15    0.35     0.27       0.88       0.17    0.38       14       1416   Level 2
16    0.51     0.29       0.08       0.09    0.45       15       1426   Level 2
17    0.51     0.43       0.09       0.18    0.15       16       1437   Level 2
18    0.60     0.45      -0.30       0.03    0.02       17       1447   Level 2
19    0.55     0.48      -0.09       0.24    0.39       18       1458   Level 3
20    0.61     0.42      -0.35       0.24    0.73       19       1469   Level 3
21    0.87     0.38      -2.03       0.45    0.13       20       1479   Level 3
22    0.45     0.30       0.36       0.01    0.03       21       1490   Level 3
23    0.53     0.30       0.02       0.06    0.44       22       1502   Level 3
24    0.35     0.28       0.87       0.16    0.22       23       1513   Level 4
25    0.62     0.40      -0.40       0.19    0.28       24       1526   Level 4
26    0.78     0.46      -1.32       0.15    0.43       25       1538   Level 4
27    0.61     0.51      -0.39       0.20    0.00       26       1552   Level 4
28    0.52     0.37       0.03       0.24    0.37       27       1567   Level 4
29    0.13     0.20       2.31       0.07    0.61       28       1583   Level 4
30    0.40     0.32       0.61       0.22    0.31       29       1602   Level 4
31    0.24     0.14       1.46       0.02    0.51       30       1623   Level 4
32    0.23     0.13       1.55       0.29    0.50       31       1649   Level 4
33    0.35     0.35       0.88       0.10    0.16       32       1685   Level 4
34    0.35     0.18       0.85       0.05    0.55       33       1741   Level 4
                                                        34       1835   Level 4

Technical Data


Reading Grade 5

Test Statistics


Number of Items 34


Std. Deviation 6.90


Reliability 0.87


Question Statistics

Item  P-Value  Biserial  Rasch Item  DIF     DIF        No.      Scale  Level
No.                      Difficulty  Gender  Ethnicity  Correct  Score
1     0.89     0.36      -1.91       0.03    0.14       0        1097   Level 1
2     0.80     0.37      -1.16       0.05    0.16       1        1190   Level 1
3     0.69     0.36      -0.44       0.03    0.19       2        1246   Level 1
4     0.65     0.37      -0.17       0.15    0.27       3        1280   Level 1
5     0.64     0.39      -0.16       0.03    0.29       4        1306   Level 1
6     0.74     0.50      -0.73       0.14    0.22       5        1327   Level 1
7     0.71     0.47      -0.51       0.00    0.04       6        1344   Level 1
8     0.61     0.51       0.00       0.14    0.14       7        1360   Level 1
9     0.55     0.38       0.32       0.17    0.08       8        1374   Level 1
10    0.40     0.42       1.09       0.07    0.15       9        1387   Level 1
11    0.64     0.36      -0.15       0.29    0.08       10       1399   Level 1
12    0.45     0.38       0.86       0.34    0.06       11       1411   Level 1
13    0.64     0.45      -0.14       0.00    0.07       12       1422   Level 1
14    0.60     0.42       0.05       0.07    0.07       13       1433   Level 1
15    0.59     0.42       0.11       0.12    0.10       14       1443   Level 2
16    0.71     0.52      -0.52       0.23    0.00       15       1453   Level 2
17    0.50     0.39       0.59       0.28    0.22       16       1463   Level 2
18    0.77     0.49      -0.91       0.17    0.04       17       1473   Level 2
19    0.51     0.48       0.55       0.14    0.22       18       1483   Level 2
20    0.34     0.36       1.44       0.35    0.04       19       1493   Level 2
21    0.65     0.48      -0.20       0.03    0.11       20       1504   Level 2
22    0.70     0.53      -0.49       0.03    0.01       21       1514   Level 2
23    0.80     0.49      -1.09       0.31    0.05       22       1525   Level 3
24    0.36     0.41       1.31       0.07    0.02       23       1536   Level 3
25    0.64     0.39      -0.12       0.05    0.24       24       1547   Level 3
26    0.70     0.56      -0.44       0.24    0.32       25       1559   Level 3
27    0.51     0.48       0.52       0.00    0.19       26       1572   Level 3
28    0.52     0.37       0.47       0.31    0.34       27       1586   Level 4
29    0.67     0.48      -0.28       0.32    0.09       28       1602   Level 4
30    0.73     0.43      -0.64       0.05    0.03       29       1619   Level 4
31    0.69     0.50      -0.40       0.24    0.02       30       1640   Level 4
32    0.35     0.37       1.35       0.17    0.04       31       1665   Level 4
33    0.27     0.19       1.83       0.07    0.23       32       1699   Level 4
34    0.62     0.48      -0.04       0.54    0.01       33       1755   Level 4
                                                        34       1848   Level 4

Technical Data


Math Grade 5

Test Statistics


Number of Items 34


Std. Deviation 6.11


Reliability 0.82


Question Statistics

Item  P-Value  Biserial  Rasch Item  DIF     DIF        No.      Scale  Level
No.                      Difficulty  Gender  Ethnicity  Correct  Score
1     0.57     0.27      -0.46       0.18    0.09       0        1147   Level 1
2     0.60     0.40      -0.61       0.28    0.06       1        1240   Level 1
3     0.62     0.38      -0.72       0.22    0.05       2        1296   Level 1
4     0.35     0.37       0.59       0.21    0.07       3        1331   Level 1
5     0.37     0.34       0.51       0.08    0.04       4        1356   Level 1
6     0.59     0.32      -0.56       0.06    0.16       5        1377   Level 1
7     0.63     0.41      -0.76       0.06    0.08       6        1395   Level 1
8     0.42     0.42       0.23       0.16    0.15       7        1411   Level 1
9     0.66     0.40      -0.93       0.04    0.24       8        1425   Level 1
10    0.29     0.53       0.91       0.20    0.13       9        1438   Level 1
11    0.21     0.34       1.46       0.30    0.04       10       1450   Level 1
12    0.63     0.38      -0.76       0.33    0.02       11       1462   Level 2
13    0.33     0.33       0.70       0.13    0.10       12       1473   Level 2
14    0.48     0.40      -0.05       0.14    0.39       13       1484   Level 2
15    0.26     0.36       1.10       0.22    0.08       14       1494   Level 2
16    0.60     0.40      -0.62       0.11    0.06       15       1504   Level 2
17    0.43     0.37       0.18       0.05    0.20       16       1514   Level 3
18    0.26     0.22       1.12       0.15    0.45       17       1524   Level 3
19    0.51     0.43      -0.17       0.10    0.14       18       1534   Level 3
20    0.35     0.54       0.57       0.17    0.08       19       1545   Level 3
21    0.45     0.36       0.12       0.21    0.06       20       1555   Level 3
22    0.40     0.39       0.33       0.19    0.14       21       1565   Level 3
23    0.65     0.36      -0.85       0.17    0.18       22       1576   Level 4
24    0.50     0.40      -0.14       0.06    0.16       23       1587   Level 4
25    0.23     0.12       1.32       0.04    0.64       24       1599   Level 4
26    0.41     0.40       0.32       0.26    0.04       25       1611   Level 4
27    0.62     0.39      -0.71       0.18    0.21       26       1624   Level 4
28    0.37     0.39       0.51       0.01    0.26       27       1638   Level 4
29    0.35     0.29       0.59       0.13    0.20       28       1653   Level 4
30    0.76     0.42      -1.49       0.15    0.06       29       1671   Level 4
31    0.70     0.37      -1.13       0.16    0.22       30       1692   Level 4
32    0.68     0.41      -1.01       0.08    0.14       31       1717   Level 4
33    0.43     0.31       0.20       0.13    0.39       32       1751   Level 4
34    0.43     0.40       0.19       0.17    0.05       33       1807   Level 4
                                                        34       1900   Level 4

Technical Data


Reading Grade 6

Test Statistics


Number of Items 34


Std. Deviation 6.90


Reliability 0.87


Question Statistics

Item  P-Value  Biserial  Rasch Item  DIF     DIF        No.      Scale  Level
No.                      Difficulty  Gender  Ethnicity  Correct  Score
1     0.78     0.40      -0.84       0.54    0.36       0        1114   Level 1
2     0.56     0.37       0.45       0.09    0.27       1        1207   Level 1
3     0.83     0.43      -1.24       0.15    0.04       2        1262   Level 1
4     0.71     0.25      -0.38       0.11    0.45       3        1296   Level 1
5     0.85     0.33      -1.42       0.11    0.11       4        1322   Level 1
6     0.71     0.42      -0.39       0.08    0.25       5        1342   Level 1
7     0.84     0.42      -1.29       0.20    0.03       6        1359   Level 1
8     0.59     0.42       0.31       0.21    0.09       7        1375   Level 1
9     0.55     0.41       0.52       0.01    0.17       8        1388   Level 1
10    0.62     0.42       0.12       0.39    0.12       9        1401   Level 1
11    0.74     0.50      -0.56       0.02    0.00       10       1413   Level 1
12    0.74     0.41      -0.55       0.19    0.12       11       1424   Level 1
13    0.61     0.48       0.21       0.20    0.21       12       1435   Level 1
14    0.52     0.42       0.63       0.20    0.28       13       1446   Level 1
15    0.57     0.45       0.42       0.12    0.12       14       1456   Level 2
16    0.26     0.15       2.08       0.09    0.45       15       1466   Level 2
17    0.62     0.51       0.16       0.23    0.03       16       1476   Level 2
18    0.58     0.50       0.35       0.19    0.22       17       1485   Level 2
19    0.68     0.58      -0.20       0.17    0.18       18       1495   Level 2
20    0.69     0.49      -0.27       0.15    0.06       19       1505   Level 2
21    0.64     0.48       0.04       0.08    0.07       20       1515   Level 2
22    0.47     0.45       0.92       0.16    0.01       21       1525   Level 2
23    0.77     0.47      -0.76       0.37    0.02       22       1536   Level 3
24    0.54     0.41       0.53       0.15    0.08       23       1546   Level 3
25    0.61     0.44       0.17       0.06    0.07       24       1558   Level 3
26    0.75     0.55      -0.63       0.07    0.44       25       1570   Level 3
27    0.44     0.35       1.04       0.15    0.30       26       1583   Level 3
28    0.72     0.43      -0.46       0.01    0.05       27       1596   Level 3
29    0.78     0.56      -0.84       0.10    0.10       28       1612   Level 4
30    0.52     0.50       0.66       0.36    0.09       29       1629   Level 4
31    0.39     0.37       1.33       0.00    0.21       30       1650   Level 4
32    0.66     0.49      -0.07       0.05    0.09       31       1675   Level 4
33    0.63     0.35       0.07       0.11    0.21       32       1709   Level 4
34    0.66     0.45      -0.10       0.09    0.12       33       1765   Level 4
                                                        34       1858   Level 4

Technical Data


Math Grade 6

Test Statistics


Number of Items 34


Std. Deviation 5.31


Reliability 0.76


Question Statistics



Rasch Item

Difficulty DIF

Gender DIF

Ethnicity

No. Correct

Scale Score Level

1 0.25 0.10 1.02 0.09 0.33

0 1186 Level 1

2 0.53 0.33 -0.34 0.08 0.18

1 1279 Level 1

3 0.72 0.27 -1.26 0.28 0.13

2 1335 Level 1

4 0.36 0.34 0.44 0.08 0.32

3 1369 Level 1

5 0.51 0.37 -0.27 0.16 0.19

4 1395 Level 1

6 0.49 0.44 -0.15 0.14 0.04

5 1416 Level 1

7 0.27 0.19 0.91 0.15 0.3

6 1433 Level 1

8 0.53 0.19 -0.35 0.23 0.44

7 1449 Level 1

9 0.20 0.29 1.38 0 0.13

8 1464 Level 1

10 0.66 0.40 -0.98 0.17 0.08

9 1477 Level 1

11 0.73 0.43 -1.33 0.09 0

10 1490 Level 1

12 0.54 0.43 -0.39 0.12 0.06

11 1501 Level 2

13 0.39 0.25 0.29 0.05 0.27

12 1513 Level 2

14 0.79 0.39 -1.71 0.21 0.16

13 1524 Level 2

15 0.72 0.28 -1.26 0.03 0.23

14 1535 Level 2

16 0.58 0.39 -0.55 0.03 0.03

15 1545 Level 2

17 0.47 0.49 -0.08 0.2 0.39

16 1556 Level 3

18 0.36 0.13 0.46 0.03 0.32

17 1566 Level 3

19 0.54 0.35 -0.39 0 0.22

18 1577 Level 3

20 0.43 0.33 0.13 0.01 0.37

19 1588 Level 3

21 0.40 0.43 0.24 0.04 0.22

20 1598 Level 4

22 0.49 0.43 -0.17 0.22 0.06

21 1609 Level 4

23 0.33 0.20 0.61 0.04 0.13

22 1621 Level 4

24 0.55 0.40 -0.43 0.27 0.19

23 1632 Level 4

25 0.43 0.23 0.11 0.06 0.33

24 1645 Level 4

26 0.35 0.36 0.51 0.09 0.03

25 1658 Level 4

27 0.19 0.32 1.44 0.35 0.23

26 1672 Level 4

28 0.61 0.49 -0.73 0.12 0.38

27 1687 Level 4

29 0.21 0.24 1.27 0.01 0.09

28 1704 Level 4

30 0.69 0.49 -1.12 0.02 0.14

29 1723 Level 4

31 0.11 0.24 2.19 0.05 0.58

30 1745 Level 4

32 0.36 0.31 0.47 0.03 0.04

31 1772 Level 4

33 0.23 0.23 1.14 0 0.14

32 1808 Level 4

34 0.69 0.44 -1.1 0.31 0.5

33 1866 Level 4

34 1961 Level 4
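The number-correct conversion above can be read as a simple lookup. The following is a minimal sketch, assuming only the Math Grade 6 level boundaries visible in the table (Level 2 begins at 11 correct, Level 3 at 16, Level 4 at 20); the function and constant names are hypothetical and are not part of the manual.

```python
from bisect import bisect_right

# Lowest number-correct score at which each level begins, read from the
# Math Grade 6 conversion table above (Level 1 starts at 0 correct).
LEVEL_STARTS = [0, 11, 16, 20]
LEVEL_NAMES = ["Level 1", "Level 2", "Level 3", "Level 4"]
NUM_ITEMS = 34  # "Number of Items" reported for this test


def level_for_raw_score(num_correct: int) -> str:
    """Return the proficiency level for a Math Grade 6 number-correct score."""
    if not 0 <= num_correct <= NUM_ITEMS:
        raise ValueError(f"number correct must be between 0 and {NUM_ITEMS}")
    # bisect_right counts how many level start points are <= num_correct.
    return LEVEL_NAMES[bisect_right(LEVEL_STARTS, num_correct) - 1]


if __name__ == "__main__":
    for raw in (10, 11, 16, 20, 34):
        print(raw, level_for_raw_score(raw))
```

Each grade and subject has its own boundaries, so the start points would need to be re-read from the corresponding table.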

Technical Data


Reading Grade 7

Test Statistics


Number of Items 34


Std. Deviation 6.19


Reliability 0.83


Question Statistics



Rasch Item Difficulty    DIF Gender    DIF Ethnicity

No. Correct    Scale Score    Level

1 0.90 0.34 -2.20 0.42 0.04

0 1163 Level 1

2 0.71 0.42 -0.70 0.15 0.30

1 1256 Level 1

3 0.85 0.38 -1.67 0.07 0.16

2 1313 Level 1

4 0.53 0.36 0.21 0.10 0.09

3 1348 Level 1

5 0.70 0.44 -0.66 0.04 0.79

4 1374 Level 1

6 0.62 0.22 -0.22 0.19 0.26

5 1395 Level 1

7 0.56 0.56 0.06 0.13 0.51

6 1413 Level 1

8 0.31 0.36 1.30 0.12 0.04

7 1429 Level 1

9 0.65 0.39 -0.38 0.06 0.16

8 1444 Level 1

10 0.59 0.35 -0.10 0.19 0.18

9 1457 Level 1

11 0.76 0.42 -1.03 0.32 0.02

10 1470 Level 1

12 0.64 0.39 -0.32 0.03 0.14

11 1482 Level 1

13 0.58 0.53 -0.05 0.13 0.23

12 1493 Level 1

14 0.53 0.27 0.20 0.12 0.16

13 1504 Level 2

15 0.76 0.46 -1.01 0.08 0.21

14 1514 Level 2

16 0.43 0.43 0.68 0.24 0.18

15 1525 Level 2

17 0.79 0.39 -1.20 0.35 0.13

16 1535 Level 2

18 0.37 0.24 1.00 0.06 0.05

17 1545 Level 2

19 0.36 0.26 1.06 0.12 0.28

18 1555 Level 2

20 0.69 0.51 -0.60 0.07 0.01

19 1565 Level 3

21 0.43 0.30 0.69 0.16 0.12

20 1575 Level 3

22 0.56 0.51 0.04 0.00 0.08

21 1586 Level 3

23 0.64 0.37 -0.35 0.12 0.00

22 1596 Level 3

24 0.48 0.45 0.46 0.13 0.35

23 1607 Level 3

25 0.64 0.43 -0.31 0.03 0.18

24 1619 Level 3

26 0.37 0.33 0.97 0.33 0.13

25 1631 Level 4

27 0.50 0.46 0.36 0.05 0.00

26 1643 Level 4

28 0.36 0.25 1.04 0.02 0.09

27 1657 Level 4

29 0.54 0.46 0.14 0.13 0.08

28 1673 Level 4

30 0.58 0.35 -0.01 0.19 0.29

29 1690 Level 4

31 0.47 0.28 0.51 0.23 0.10

30 1710 Level 4

32 0.49 0.39 0.41 0.03 0.04

31 1735 Level 4

33 0.38 0.34 0.95 0.18 0.17

32 1769 Level 4

34 0.42 0.39 0.73 0.23 0.34

33 1824 Level 4

34 1917 Level 4
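To illustrate how a Rasch item difficulty can be interpreted, the sketch below evaluates the standard Rasch (one-parameter logistic) response function for three Reading Grade 7 items from the table above. The ability value `theta` is a hypothetical value in logits; the manual's transformation from logits to the reported scale scores is not given in this appendix, so no scale scores are computed here.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of a correct response under the Rasch model.

    theta and difficulty are both expressed in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


# Rasch item difficulties for Reading Grade 7 items 1, 4 and 8,
# copied from the table above.
difficulties = {1: -2.20, 4: 0.21, 8: 1.30}

if __name__ == "__main__":
    theta = 0.0  # hypothetical student ability, assumed for illustration
    for item, b in difficulties.items():
        print(f"item {item}: P(correct) = {rasch_probability(theta, b):.2f}")
```

Items with negative difficulty are easier than average for a student at `theta = 0`, and items with positive difficulty are harder.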

Technical Data


Math Grade 7

Test Statistics


Number of Items 34


Std. Deviation 5.88


Reliability 0.8


Question Statistics



Rasch Item Difficulty    DIF Gender    DIF Ethnicity

No. Correct    Scale Score    Level

1 0.41 0.34 0.29 0.33 0.18

0 1225 Level 1

2 0.38 0.43 0.4 0.34 0.41

1 1318 Level 1

3 0.30 0.26 0.81 0.25 0.06

2 1373 Level 1

4 0.14 0.22 1.9 0.14 0.15

3 1407 Level 1

5 0.80 0.41 -1.77 0.08 0.16

4 1432 Level 1

6 0.75 0.38 -1.46 0.21 0.3

5 1453 Level 1

7 0.62 0.40 -0.76 0.03 0.01

6 1470 Level 1

8 0.32 0.38 0.71 0.16 0.18

7 1485 Level 1

9 0.21 0.34 1.38 0.08 0.05

8 1499 Level 1

10 0.48 0.49 -0.07 0.35 0.28

9 1512 Level 1

11 0.38 0.28 0.41 0.23 0.18

10 1524 Level 2

12 0.47 0.37 -0.02 0.05 0.01

11 1535 Level 2

13 0.34 0.21 0.61 0.04 0.19

12 1546 Level 2

14 0.67 0.38 -0.98 0.14 0

13 1557 Level 2

15 0.40 0.28 0.29 0 0.04

14 1567 Level 2

16 0.41 0.31 0.24 0.5 0.08

15 1577 Level 2

17 0.41 0.43 0.24 0.31 0.58

16 1587 Level 3

18 0.31 0.18 0.76 0.03 0.35

17 1597 Level 3

19 0.35 0.26 0.54 0.17 0.28

18 1606 Level 3

20 0.49 0.20 -0.09 0.12 0.34

19 1616 Level 3

21 0.55 0.52 -0.37 0.08 0.38

20 1626 Level 3

22 0.41 0.31 0.25 0.33 0.18

21 1637 Level 4

23 0.46 0.42 0.01 0.23 0

22 1647 Level 4

24 0.48 0.40 -0.05 0.18 0.03

23 1658 Level 4

25 0.55 0.39 -0.4 0.23 0.31

24 1669 Level 4

26 0.52 0.48 -0.25 0.08 0.07

25 1681 Level 4

27 0.22 0.13 1.33 0.07 0.59

26 1694 Level 4

28 0.58 0.48 -0.56 0.03 0.1

27 1708 Level 4

29 0.70 0.40 -1.12 0.17 0.01

28 1723 Level 4

30 0.62 0.46 -0.75 0.02 0.08

29 1741 Level 4

31 0.57 0.46 -0.48 0.07 0.04

30 1761 Level 4

32 0.47 0.39 -0.04 0.1 0.24

31 1786 Level 4

33 0.76 0.45 -1.5 0.17 0

32 1820 Level 4

34 0.36 0.38 0.51 0.11 0.11

33 1876 Level 4

34 1968 Level 4

Technical Data


Reading Grade 8

Test Statistics


Number of Items 34


Std. Deviation 6.75


Reliability 0.86


Question Statistics



Rasch Item Difficulty    DIF Gender    DIF Ethnicity

No. Correct    Scale Score    Level

1 0.81 0.48 -1.14 0.24 0.25

0 1181 Level 1

2 0.70 0.38 -0.44 0.10 0.16

1 1274 Level 1

3 0.80 0.51 -1.08 0.04 0.28

2 1329 Level 1

4 0.85 0.38 -1.49 0.62 0.04

3 1363 Level 1

5 0.71 0.44 -0.51 0.18 0.07

4 1388 Level 1

6 0.59 0.43 0.17 0.19 0.37

5 1409 Level 1

7 0.74 0.42 -0.69 0.15 0.11

6 1426 Level 1

8 0.77 0.43 -0.84 0.16 0.07

7 1442 Level 1

9 0.68 0.52 -0.33 0.22 0.00

8 1456 Level 1

10 0.60 0.44 0.14 0.14 0.13

9 1469 Level 1

11 0.63 0.30 -0.05 0.08 0.09

10 1481 Level 1

12 0.75 0.50 -0.75 0.07 0.12

11 1492 Level 1

13 0.50 0.28 0.65 0.27 0.06

12 1503 Level 1

14 0.73 0.51 -0.60 0.06 0.12

13 1514 Level 1

15 0.54 0.51 0.42 0.11 0.16

14 1524 Level 2

16 0.83 0.53 -1.29 0.22 0.13

15 1535 Level 2

17 0.73 0.53 -0.57 0.06 0.13

16 1545 Level 2

18 0.34 0.27 1.46 0.26 0.30

17 1555 Level 2

19 0.71 0.55 -0.47 0.45 0.11

18 1565 Level 2

20 0.53 0.45 0.50 0.37 0.17

19 1575 Level 2

21 0.51 0.30 0.60 0.04 0.07

20 1585 Level 2

22 0.63 0.47 -0.04 0.26 0.12

21 1596 Level 3

23 0.46 0.33 0.82 0.37 0.14

22 1606 Level 3

24 0.49 0.43 0.71 0.12 0.15

23 1618 Level 3

25 0.73 0.42 -0.61 0.16 0.37

24 1629 Level 3

26 0.29 0.26 1.72 0.56 0.21

25 1642 Level 3

27 0.72 0.45 -0.54 0.15 0.06

26 1655 Level 3

28 0.50 0.40 0.62 0.05 0.33

27 1669 Level 4

29 0.36 0.23 1.34 0.13 0.23

28 1685 Level 4

30 0.59 0.56 0.19 0.35 0.37

29 1703 Level 4

31 0.55 0.47 0.40 0.04 0.13

30 1724 Level 4

32 0.51 0.52 0.60 0.13 0.03

31 1749 Level 4

33 0.48 0.41 0.73 0.11 0.26

32 1784 Level 4

34 0.55 0.31 0.41 0.00 0.15

33 1840 Level 4

34 1933 Level 4

Technical Data


Math Grade 8

Test Statistics


Number of Items 34


Std. Deviation 6.10


Reliability 0.82


Question Statistics



Rasch Item Difficulty    DIF Gender    DIF Ethnicity

No. Correct    Scale Score    Level

1 0.41 0.40 0.2 0.11 0.04

0 1228 Level 1

2 0.54 0.42 -0.39 0.04 0.23

1 1322 Level 1

3 0.47 0.34 -0.07 0.32 0.06

2 1380 Level 1

4 0.22 0.28 1.3 0.09 0.51

3 1415 Level 1

5 0.36 0.38 0.45 0.17 0.16

4 1442 Level 1

6 0.29 0.34 0.83 0.19 0.14

5 1464 Level 1

7 0.29 0.39 0.82 0.3 0.23

6 1482 Level 1

8 0.70 0.31 -1.21 0.31 0.1

7 1498 Level 1

9 0.47 0.55 -0.08 0.17 0.42

8 1513 Level 1

10 0.78 0.44 -1.7 0.45 0.27

9 1526 Level 1

11 0.55 0.47 -0.45 0.08 0.17

10 1539 Level 2

12 0.48 0.44 -0.1 0.16 0.26

11 1551 Level 2

13 0.48 0.45 -0.1 0.35 0.13

12 1562 Level 2

14 0.40 0.28 0.26 0.09 0.67

13 1573 Level 2

15 0.40 0.31 0.24 0.04 0.27

14 1583 Level 2

16 0.43 0.33 0.12 0.25 0.19

15 1594 Level 3

17 0.37 0.30 0.4 0.26 0.08

16 1604 Level 3

18 0.70 0.41 -1.2 0.22 0.28

17 1614 Level 3

19 0.41 0.39 0.23 0.12 0.17

18 1623 Level 3

20 0.43 0.36 0.11 0.15 0.17

19 1633 Level 3

21 0.30 0.33 0.78 0.09 0.1

20 1643 Level 3

22 0.81 0.43 -1.89 0.17 0.05

21 1654 Level 4

23 0.32 0.17 0.68 0.16 0.32

22 1664 Level 4

24 0.37 0.39 0.4 0.21 0

23 1675 Level 4

25 0.18 0.24 1.56 0.18 0.07

24 1686 Level 4

26 0.51 0.39 -0.25 0.02 0

25 1698 Level 4

27 0.57 0.51 -0.56 0.27 0.42

26 1711 Level 4

28 0.33 0.20 0.62 0.14 0.33

27 1725 Level 4

29 0.60 0.43 -0.67 0.11 0.15

28 1740 Level 4

30 0.45 0.33 0 0.15 0.3

29 1757 Level 4

31 0.61 0.44 -0.75 0.21 0.03

30 1777 Level 4

32 0.44 0.51 0.06 0.06 0.37

31 1802 Level 4

33 0.37 0.22 0.42 0 0.2

32 1836 Level 4

34 0.46 0.42 -0.04 0.16 0.09

33 1891 Level 4

34 1984 Level 4

Appendix B: Web Alignment

Web Alignment Study of Discovery Education Assessment Benchmarks with Common Core Standards

Purpose

This study reports the results of a Web Alignment Study of Discovery Education Assessment (DEA) benchmarks in reading and mathematics, grades 3 to 12, with Common Core Standards. Discovery Education Assessment has created three benchmark assessments (for use in fall, winter, and spring) for reading grades 3 to 10 and mathematics grades 3 to 11. These benchmarks were created based on Common Core Standards in reading and mathematics. The Web Alignment Tool (WAT) version 2 was used to record and analyze the results of this study. An alignment study measures “the degree to which expectations and assessments are in agreement and serve in conjunction with one another to guide the system toward students learning what they are expected to know and do.”

This study has two phases. In Phase I, Common Core Standards are entered into the WAT and subject matter experts rate the depth of knowledge (DOK) of each objective in each of the standards. In Phase II, subject matter experts rate each question in each assessment for the objective and standard it matches and for its depth of knowledge. Phase II results are reported for each of the following categories:

Categorical Concurrence: This criterion measures the extent to which the same or consistent categories of content appear in the standards and the assessments. The criterion is met for a given standard if there are more than five assessment items targeting that standard.

Depth-of-Knowledge Consistency: This criterion measures the degree to which the knowledge elicited from students on the assessment is as complex within the context area as what students are expected to know and do as stated in the standards. The criterion is met if more than half of targeted objectives are hit by items of the appropriate complexity.

Range-of-Knowledge Correspondence: This criterion determines whether the span of knowledge expected of students on the basis of a standard corresponds to the span of knowledge that students need in order to correctly answer the corresponding assessment items/activities. The criterion is met for a given standard if more than half of the objectives that fall under that standard are targeted by assessment items.

Balance of Representation: This criterion measures whether objectives that fall under a specific standard are given relatively equal emphasis on the assessment.

Source of Challenge: This criterion is met if the primary difficulty of the assessment items is significantly related to students’ knowledge and skill in the content area as represented in the standards.
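This appendix does not state how Balance of Representation is computed. The sketch below assumes the balance index as it is commonly defined in Webb's alignment method, 1 - (sum over hit objectives of |1/O - I_k/H|) / 2, where O is the number of objectives hit under a standard, I_k is the number of items coded to objective k, and H is the total number of items coded to the standard. That formula and the function name are assumptions, not taken from the manual.

```python
def balance_index(items_per_objective: list[int]) -> float:
    """Balance index for one standard (assumed Webb-style definition).

    items_per_objective holds the number of items coded to each objective
    under the standard; objectives with zero items are ignored.
    """
    hits = [n for n in items_per_objective if n > 0]
    if not hits:
        raise ValueError("at least one objective must have a coded item")
    num_objectives_hit = len(hits)
    total_items = sum(hits)
    # Sum of absolute gaps between an even share and each objective's share.
    deviation = sum(abs(1.0 / num_objectives_hit - n / total_items) for n in hits)
    return 1.0 - deviation / 2.0


if __name__ == "__main__":
    print(balance_index([2, 2, 2]))     # items spread evenly
    print(balance_index([7, 1, 1, 1]))  # items concentrated on one objective
```

An index of 1 means items are spread evenly across the objectives that were hit; lower values mean items cluster on a few objectives.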

Phase I: Entry of Standards and Depth of Knowledge Consensus

The Common Core Standards for Reading and Mathematics were entered into WAT. The WAT identifies three levels of entry for a standard:

Standard is the most general. It may be a broad statement of student activities, such as “Students read for understanding,” or it may simply be a content classification like “Geometry.” The data are reported out at this level.

Goal is the middle level of specificity. Each standard is composed of goals that may involve smaller topic areas or more precise student activities.

Objective is the most specific level. Each goal is composed of objectives, which specify particular kinds of activities or skills (e.g., “Read and identify types of poetry and the use of inversion, rhyme, and rhythm,” or “Convert units within a measurement system.”). When reviewers are coding items, they will match assessment items with one or more objectives, if possible.

For English language arts Grade 3, the following presents two examples of these three levels, one for RL 3.1 and the other for RI 3.7:

Standard: Reading: Literature
Goal: Key Ideas and Details
Objective: Ask and answer questions to demonstrate understanding of a text, referring explicitly to the text as the basis for the answers.

Standard: Reading: Informational Text
Goal: Integration of Knowledge and Ideas
Objective: Use information gained from illustrations (e.g., maps, photographs) and the words in a text to demonstrate understanding of the text (e.g., where, when, why, and how key events occur).

For mathematics Grade 3, the following presents two examples of these three levels, one for 3.NBT.1 and the other for 3.MD.5:

Standard: Number and Operations in Base Ten
Goal: Use place value understanding and properties of operations to perform multi-digit arithmetic.
Objective: Use place value understanding to round whole numbers to the nearest 10 or 100.

Standard: Measurement and Data
Goal: Geometric measurement: understand concepts of area and relate area to multiplication and to addition.
Objective: Recognize area as an attribute of plane figures and understand concepts of area measurement.

The SMARTER Balanced Assessment Consortium commissioned a study by WestEd to determine, among other factors, the depth of knowledge of each Common Core objective. The results of this DOK analysis were published in March 2011: SMARTER Balanced Assessment Consortium Common Core State Standards Analysis: Eligible Content for the Summative Assessment: Final Report. For each objective in reading and mathematics, a DOK value or range of values was assigned.

For purposes of this study, these ranges were used as the initial consensus judgment on a DOK level. If the range had two values, such as 1-2 or 2-3, the higher DOK value was selected. If the DOK had a range of 1-3, the middle value of 2 was selected. No value of 4 was selected in this initial judgment. All DEA assessment items were written in multiple-choice or short-answer constructed-response format, and these types of items are often unable to measure a depth of knowledge of 4. These initial consensus values were then presented to an additional subject matter expert, one for reading and one for mathematics. These two experts concurred with the chosen values or offered their own revisions. A final group consensus was undertaken to reconcile differences. Final DOK values for each objective were entered into WAT.
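The initial selection rule described above can be sketched as a small function. This is a minimal illustration, not the procedure's actual implementation: the function name is hypothetical, and capping the result at 3 is one reading of the statement that no value of 4 was selected (an assumption). Ranges the text does not cover raise an error rather than guess.

```python
def initial_dok(low: int, high: int) -> int:
    """Initial consensus DOK for a published range low..high (inclusive)."""
    if not 1 <= low <= high <= 4:
        raise ValueError("DOK values must satisfy 1 <= low <= high <= 4")
    if low == high:
        value = low            # a single published value is used as-is
    elif high - low == 1:
        value = high           # two-value range such as 1-2 or 2-3: take the higher
    elif (low, high) == (1, 3):
        value = 2              # range 1-3: take the middle value
    else:
        raise ValueError("range not covered by the rules in the text")
    # Assumption: "no value of 4 was selected" is modeled as a cap at 3.
    return min(value, 3)


if __name__ == "__main__":
    for rng in [(1, 2), (2, 3), (1, 3), (2, 2)]:
        print(rng, "->", initial_dok(*rng))
```

These values are only the starting point; the text describes a subsequent expert review and group consensus that could revise them.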

Phase II: Reviewer Judgment of Objective and DOK Level

This study measured the alignment of the following DEA benchmarks. DEA offered three benchmarks in three time periods: fall, winter, and spring. All benchmarks from fall and winter 2012 were aligned along with a sampling of benchmarks from spring 2012. The following tables summarize the benchmarks aligned in reading and mathematics. In total, sixteen reading tests and nineteen mathematics tests were used in this alignment study.

Grade 3 Reading fall 2012 winter 2012


Grade 5 Reading fall 2012 winter 2012 spring 2012


Grade 7 Reading fall 2012 winter 2012 spring 2012


Grade 9 Reading fall 2012

Grade 10 Reading fall 2012

Grade 3 Math fall 2012 winter 2012


Grade 5 Math fall 2012 winter 2012 spring 2012



Grade 8 Math fall 2012 winter 2012 spring 2012

Algebra I fall 2012 winter 2012

Geometry fall 2012

Algebra II fall 2012 winter 2012

Three trained subject matter experts reviewed each benchmark test. These reviewers were first trained on depth of knowledge using materials provided in the Web Alignment Tool (WAT): Training Manual Version 1.1 July 2005. Then reviewers were trained on Common Core Standards and Objectives for reading and mathematics.

Each reviewer completed three tasks with each assessment item: (1) judging the primary objective to which that item corresponds; (2) judging the depth of knowledge of that item; and (3) judging whether there is a source-of-challenge with an assessment item.

Results

A total of 93 standards across 19 mathematics tests and 71 standards across 16 reading tests were judged using WAT2. The four major alignment categories and their rating thresholds are as follows:

Categorical Concurrence: A judgment of “YES” indicates that six or more items target a standard; a judgment of “WEAK” indicates that five items target a standard; and a judgment of “NO” indicates that fewer than five items target a standard.

Depth-of-Knowledge Consistency: A judgment of “YES” indicates that 50% or more of the items were rated “at” or “above” the depth-of-knowledge level of the corresponding objectives; “WEAK” indicates that 40% to 50% of the items were rated as “at” or “above” the depth-of-knowledge level of the corresponding objectives; and “NO” indicates that less than 40% of the items were rated as “at” or “above” the depth-of-knowledge level of the corresponding objectives.

Range-of-Knowledge Correspondence: “YES” indicates that 50% or more of the objectives had at least one coded item. “WEAK” indicates that 40% to 50% of the objectives had at least one coded item. “NO” indicates that 40% or less of the objectives had at least one coded item.

Balance of Representation: “YES” indicates that the Balance Index was .7 or above (items evenly distributed among objectives). “WEAK” indicates that the Balance Index was .6 to .7 (a high percentage of items coded as corresponding to two or three objectives). “NO” indicates that the Balance Index was .6 or less (a high percentage of items coded as corresponding to one objective).
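The thresholds above can be written as four small classifiers. This is a minimal sketch with hypothetical function names. Where the stated bands touch (for example, a value of exactly 40% or exactly .6), the code follows the literal wording of the “NO” and “YES” rules, which is an interpretation rather than something the text spells out.

```python
def categorical_concurrence(num_items: int) -> str:
    """Rate a standard by the number of items targeting it."""
    if num_items >= 6:
        return "YES"
    return "WEAK" if num_items == 5 else "NO"


def dok_consistency(pct_at_or_above: float) -> str:
    """Rate by the percentage of items at or above the objective's DOK level."""
    if pct_at_or_above >= 50:
        return "YES"
    return "WEAK" if pct_at_or_above >= 40 else "NO"  # NO is "less than 40%"


def range_of_knowledge(pct_objectives_hit: float) -> str:
    """Rate by the percentage of objectives with at least one coded item."""
    if pct_objectives_hit >= 50:
        return "YES"
    return "WEAK" if pct_objectives_hit > 40 else "NO"  # NO is "40% or less"


def balance_of_representation(balance_index: float) -> str:
    """Rate by the Balance Index."""
    if balance_index >= 0.7:
        return "YES"
    return "WEAK" if balance_index > 0.6 else "NO"  # NO is ".6 or less"


if __name__ == "__main__":
    print(categorical_concurrence(5), dok_consistency(45),
          range_of_knowledge(40), balance_of_representation(0.72))
```

Keeping each criterion in its own function makes the differing boundary rules explicit instead of hiding them behind a shared threshold helper.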

The following table summarizes the Web alignment results for the mathematics tests. For Categorical Concurrence, 63% of the 93 standards received a “YES”. For Depth-of-Knowledge Consistency, 100% of the standards received a “YES”. For Range of Knowledge, 97% of the standards received a “YES” and 98% received a “YES” for Balance of Representation. Thus, the mathematics tests are highly aligned to Common Core Standards. Some standards had fewer than six questions. This probably reflects the design of DEA benchmark tests: all benchmarks comprise 30 to 40 questions and are designed to be completed in a class period. Furthermore, some standards have numerous objectives. To ensure that each benchmark is sampling these objectives, more questions are written to some standards than others.

Mathematics Tests Alignment Summary

                                  YES           WEAK          NO            TOTAL
                                  #     %       #     %       #     %       #
Categorical Concurrence           59    63%                   34    37%     93
Depth-of-Knowledge Consistency    93    100%    0     0%      0     0%      93
Range of Knowledge                90    97%     3     3%      0     0%      93
Balance of Representation         91    98%     2     2%      0     0%      93

The following table summarizes the Web alignment results for the reading tests. For Categorical Concurrence, 82% of the 71 standards received a “YES”. For Depth-of-Knowledge Consistency, 80% of the standards received a “YES” and 11% a “WEAK”. For Range of Knowledge, 87% of the standards received a “YES” and 13% received a “WEAK”. For Balance of Representation, 99% of the standards received a “YES”. Thus, the reading tests are highly aligned to Common Core Standards. Some standards had fewer than six questions. This probably reflects the design of DEA benchmark tests: all benchmarks comprise 30 to 40 questions and are designed to be completed in a class period. Furthermore, some standards have numerous objectives. To ensure that each benchmark is sampling these objectives, more questions are written to some standards than others. In addition, the depth of knowledge of some standards was rated at level 3 during consensus, and some questions written to objectives under those standards were rated slightly below this level.

Reading Tests Alignment Summary

                                  YES           WEAK          NO            TOTAL
                                  #     %       #     %       #     %       #
Categorical Concurrence           58    82%                   13    18%     71
Depth-of-Knowledge Consistency    57    80%     8     11%     6     8%      71
Range of Knowledge                62    87%     9     13%     0     0%      71
Balance of Representation         70    99%     1     1%      0     0%      71
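The percentages in the two summary tables follow from the counts and the totals of 93 mathematics standards and 71 reading standards. The sketch below recomputes them as a consistency check; the counts are copied from the tables in the order they appear, and the function and variable names are hypothetical.

```python
def percentages(counts: list[int], total: int) -> list[int]:
    """Convert category counts for one criterion into whole-number percentages."""
    if sum(counts) != total:
        raise ValueError("counts must add up to the total number of standards")
    return [round(100 * c / total) for c in counts]


# Counts per criterion, in the order they appear in each summary table.
MATH_TOTAL = 93
math_counts = {
    "Categorical Concurrence": [59, 34],
    "Depth-of-Knowledge Consistency": [93, 0, 0],
    "Range of Knowledge": [90, 3, 0],
    "Balance of Representation": [91, 2, 0],
}

READING_TOTAL = 71
reading_counts = {
    "Categorical Concurrence": [58, 13],
    "Depth-of-Knowledge Consistency": [57, 8, 6],
    "Range of Knowledge": [62, 9, 0],
    "Balance of Representation": [70, 1, 0],
}

if __name__ == "__main__":
    for name, counts in math_counts.items():
        print("Mathematics", name, percentages(counts, MATH_TOTAL))
    for name, counts in reading_counts.items():
        print("Reading", name, percentages(counts, READING_TOTAL))
```

Because each percentage is rounded separately, a row can add up to 99% rather than 100%, as the reading Depth-of-Knowledge Consistency row does.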

The actual Web alignment results for all four categories for all 35 tests are presented in the tables on the next pages.

Grade 3 Mathematics Fall 2012

Standard                       Categorical Concurrence   Depth-of-Knowledge Consistency   Range of Knowledge   Balance of Representation
Operations & Algebra           YES                       YES                              YES                  YES
Number/Operations Base Ten     NO                        YES                              YES                  YES
Number/Operations Fractions    NO                        YES                              YES                  YES
Measurement & Data             YES                       YES                              YES                  YES
Geometry                       NO                        YES                              YES                  YES




Range of Knowledge




YES YES YES YES


YES YES YES YES






Range of Knowledge


Operations & Algebra NO YES YES YES


YES YES YES YES


YES YES YES YES


Geometry YES YES YES YES




Range of Knowledge


Ratios and Proportions NO YES YES YES

Number System YES YES YES YES

Expressions & Equations YES YES YES YES


Statistics and Probability

NO YES YES YES




Range of Knowledge


Ratios and Proportions YES YES YES YES

Number System NO YES YES YES




YES YES YES YES




Range of Knowledge




Functions YES YES YES WEAK



NO YES YES YES

Algebra I Fall 2012



Range of Knowledge


Number and Quantity NO YES YES YES

Algebra YES YES YES YES

Functions YES YES YES YES


NO YES YES YES

Algebra II Fall 2012



Range of Knowledge






YES YES YES YES

Geometry Fall 2012

Standard                       Categorical Concurrence   Depth-of-Knowledge Consistency   Range of Knowledge   Balance of Representation
Congruence                     YES                       YES                              YES                  YES
Similarity, Right Triangles    YES                       YES                              WEAK                 YES
Circles                        NO                        YES                              YES                  YES
Expressing Geometric           NO                        YES                              YES                  YES
Geometric Measurement          NO                        YES                              YES                  YES
Modeling with Geometry         NO                        YES                              YES                  YES
Statistics                     NO                        YES                              WEAK                 YES

Grade 3 Mathematics Winter 2012



Range of Knowledge




NO YES YES YES


NO YES YES YES






Range of Knowledge


Operations & Algebra YES YES YES WEAK


YES YES YES YES


YES YES YES YES






Range of Knowledge




YES YES YES YES


YES YES YES YES






Range of Knowledge


Ratios and Proportions NO YES YES YES

Number System YES YES YES YES




NO YES YES YES




Range of Knowledge


Ratios and Proportions YES YES YES YES





YES YES YES YES




Range of Knowledge







NO YES YES YES

Algebra 1 Winter 2012



Range of Knowledge






NO YES WEAK YES

Algebra 2 Winter 2012



Range of Knowledge






YES YES YES YES

Grade 5 Mathematics Spring 2012



Range of Knowledge




YES YES YES YES


YES YES YES YES



Grade 8 Mathematics Spring 2012



Range of Knowledge







NO YES YES YES

Grade 3 Reading Fall 2012

Standard                       Categorical Concurrence   Depth-of-Knowledge Consistency   Range of Knowledge   Balance of Representation
Reading: Literature            YES                       YES                              YES                  YES
Reading: Informational         YES                       NO                               YES                  YES
Reading Foundation             NO                        YES                              YES                  YES
Writing                        YES                       YES                              YES                  YES
Language                       YES                       YES                              YES                  YES




Range of Knowledge



Reading: Informational YES WEAK WEAK YES


Writing YES WEAK WEAK YES





Range of Knowledge


Reading: Literature YES WEAK YES YES

Reading: Informational YES YES WEAK YES


Writing YES WEAK YES YES





Range of Knowledge


Reading: Literature YES NO YES YES

Reading: Informational YES NO WEAK YES






Range of Knowledge



Reading: Informational YES NO YES WEAK

Writing YES WEAK YES YES

Language NO YES YES YES




Range of Knowledge


Reading: Literature YES NO YES YES

Reading: Informational YES NO YES YES






Range of Knowledge



Reading: Informational YES YES YES YES

Writing YES YES WEAK YES





Range of Knowledge



Reading: Informational YES WEAK YES YES



Grade 3 Reading Winter 2012



Range of Knowledge










Range of Knowledge










Range of Knowledge










Range of Knowledge





Language YES YES WEAK YES




Range of Knowledge









Range of Knowledge






Grade 5 Reading Spring 2012



Range of Knowledge







Grade 7 Reading Spring 2012



Range of Knowledge





