

Generalizability and Dependability of Direct Behavior Ratings (DBRs) to Assess Social Behavior of Preschoolers

Sandra M. Chafouleas1, Theodore J. Christ2, T. Chris Riley-Tillman3, Amy M. Briesch1, & Julie A.M. Chanese1

University of Connecticut1 University of Minnesota2 East Carolina University3

Introduction

Although high-quality behavior assessment tools exist for many purposes of assessment (e.g., screening, diagnosis, outcome evaluation), the same cannot be said of feasible tools for the formative assessment of social behavior. Formative assessment becomes highly relevant when the goal is to have ongoing data regarding behavior in order to quickly modify an intervention strategy as appropriate. Thus, there is a need to establish reliable, valid, and feasible tools that can be customized to estimate a variety of social behaviors over time. One potentially feasible tool for use in the formative assessment of social behavior is the Direct Behavior Rating (DBR). Direct Behavior Ratings are a hybrid assessment tool in that they combine characteristics of systematic direct observation and behavior rating scales. That is, when using a DBR, the rating process is similar to that of a behavior rating scale (e.g., On a scale of 1-6, how well did Johnny pay attention?), yet also similar to systematic direct observation in that the rating occurs following a specified, shorter period of time (Chafouleas, Riley-Tillman, & Sugai, in press). Empirical support for the reliability of DBR use is limited; as such, systematic empirical investigation of the psychometric properties of the DBR is needed. Thus, the purpose of this study was to provide preliminary psychometric data regarding the generalizability and dependability of the DBR for assessing the social behavior of preschoolers through investigation of the following questions:

1. What percentage of the variance in DBR ratings of social behavior in preschool students is accounted for by raters, time, and setting?

2. Is the DBR a reliable and valid method for assessing the social behavior of preschool students?

Method

Participants included four female, Caucasian teachers working in the preschool classroom at a center affiliated with a large university located in the Northeast. The teachers served as the primary participants in their role as observers; the children attending the preschool also served as participants, as their behavior was observed and recorded. The 15 students ranged in age from 3 years, 9 months, to 4 years, 9 months, with an average age of 4 years, 4 months. Thirteen of the children were Caucasian and two were Hispanic.

The DBR created for use in this study included two social behaviors selected from preschool curricular goals and benchmarks provided by the associated state guidelines (Connecticut State Department of Education): Works to Resolve Conflicts (WRC) and Interacts Cooperatively with Peers (IC). When rating a student on each behavior, teachers were asked to place an X on a continuous 115-mm line, indicating the proportion of time that exemplary behavior was observed during the observation period. Data were collected daily over 13 consecutive school days in late spring. Two 30-minute observations were conducted each day, with all four teachers observing and rating all student participants during the same time period (i.e., a fully crossed design). In all, 2576 data points were collected.
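
To make the scoring rule concrete, the sketch below converts a mark position into a percentage-of-time rating. This is a minimal illustration of ours, not the study's materials; it assumes the X is measured in millimeters from the left anchor of the 115-mm line, and the function name is hypothetical.

```python
# Minimal sketch of DBR line-mark scoring as described above. Assumes
# the rater's X is measured in millimeters from the left anchor of the
# 115 mm line; the function name and convention are illustrative only.

LINE_LENGTH_MM = 115.0

def dbr_score(mark_position_mm: float) -> float:
    """Convert a mark on the continuous line into a 0-100 rating:
    the estimated percentage of the observation period during which
    the target behavior was displayed."""
    if not 0.0 <= mark_position_mm <= LINE_LENGTH_MM:
        raise ValueError("mark must fall on the 115 mm line")
    return 100.0 * mark_position_mm / LINE_LENGTH_MM

# Example: an X placed 86 mm from the left anchor -> ~74.8
print(round(dbr_score(86.0), 1))
```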

Subsequent to data collection, three major analyses were conducted. First, generalizability (G) theory was used to analyze the variance components of the full model. Next, a second set of G-studies was conducted to examine DBR ratings within rater. Finally, dependability (D) studies were conducted to examine the likely magnitudes of generalizability coefficients (Eρ²) and dependability coefficients (Φ), along with the magnitudes of the relative SEM (δ) and absolute SEM (Δ).
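
The poster does not reproduce its computations, but the logic of the G- and D-study coefficients can be sketched for the single-facet person x observation case used in the within-rater analyses. The sketch below is our own, using standard single-facet G-theory formulas; the variance components are the rater 1 WRC estimates from the "G-study Between Raters" table in the Results, and all names are illustrative.

```python
# A minimal sketch (not the authors' code) of a single-facet
# person x observation D-study. Variance components are the rater 1
# WRC estimates from the within-rater G-study table.
import math

var_person = 145.75       # sigma^2(p): universe-score variance
var_observation = 30.21   # sigma^2(o): observation main effect (day x setting)
var_residual = 308.94     # sigma^2(po,e): person x observation residual

def d_study(n_obs: int):
    """Return (E rho^2, Phi, relative SEM delta, absolute SEM Delta)
    for a D-study averaging over n_obs observations."""
    rel_err = var_residual / n_obs                      # sigma^2(delta)
    abs_err = (var_observation + var_residual) / n_obs  # sigma^2(Delta)
    e_rho2 = var_person / (var_person + rel_err)        # generalizability coefficient
    phi = var_person / (var_person + abs_err)           # dependability coefficient
    return e_rho2, phi, math.sqrt(rel_err), math.sqrt(abs_err)

# Example: averaging over 7 ratings of rater 1's WRC components
e_rho2, phi, sem_rel, sem_abs = d_study(7)
print(f"E rho^2 = {e_rho2:.2f}, Phi = {phi:.2f}, "
      f"delta = {sem_rel:.1f}, Delta = {sem_abs:.1f}")
# -> E rho^2 = 0.77, Phi = 0.75, delta = 6.6, Delta = 7.0
```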


For additional information, please direct all correspondence to Sandra Chafouleas at: [email protected]

Reference: Chafouleas, S. M., Christ, T. J., Riley-Tillman, T. C., Briesch, A. M., & Chanese, J. A. M. (2007, March). Generalizability and dependability of Direct Behavior Ratings (DBRs) to assess social behavior of preschoolers. Poster presented at the National Association of School Psychologists Annual Convention, New York, NY.

Results

Results of the G-studies suggested that:

1. Although the most substantial proportion of variance for both WRC and IC was attributed to person (18% and 38%, respectively), a fairly substantial proportion of measurement variance was attributable to the different raters (41% and 20%; the derivation of these percentages from the variance components is sketched after this list). That is, individual raters (i.e., teachers) tended to yield divergent ratings when the same person (i.e., student) was observed during the same interval. These inconsistencies in judgment of students’ WRC and IC should discourage the generalization of DBR ratings across raters. However, when the rating profiles in the figure (Comparison of DBR Ratings Across Teachers) are visually compared, patterns among raters become apparent: a high degree of consistency was noted within and across students in the obtained profiles. Therefore, DBRs are not currently recommended for use in assessing behavior in relation to an absolute criterion; however, they do appear to have the potential to assist in intra-individual assessment.

2. When results were analyzed within raters, the proportion of variance attributed to person ranged from 30% to 63%, an increase of 12 to 40 percentage points over the full-model analysis. That is, when DBR ratings were analyzed within rater, the data were more indicative of the target student.

3. The percentage of variance accounted for by day and setting was somewhat surprising: both facets accounted for 0% of the variance, suggesting that these particular DBR ratings were not sensitive to small fluctuations in behavior across time or setting. This has important implications for the implementation of DBRs, in that behaviors that are more static might permit less frequent rating (e.g., weekly) while providing equally useful information. Future research is needed to discern which behaviors are and are not likely to be more variable within similar and different types of settings, as well as to determine the optimal frequency with which ratings should be conducted.
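
As a check on point 1, the % Var columns in the tables below are simply each component's share of the total variance. The short sketch that follows (our own code, using the WRC components from the full-model table) reproduces those percentages:

```python
# Sketch: how the % Var columns in the full-model table are obtained.
# WRC variance components (Type III estimates) copied from the table;
# negative estimates are truncated at zero before computing shares.
wrc = {
    "Person": 229.17, "Rater": 524.86, "Day": 4.46, "Setting": 0.0,
    "Person x Rater": 98.23, "Person x Day": 28.89,
    "Person x Setting": -0.18, "Rater x Day": 57.65,
    "Rater x Setting": 0.0, "Day x Setting": 3.10,
    "Person x Rater x Day": 33.41, "Person x Rater x Setting": 0.0,
    "Person x Day x Setting": 0.0, "Error": 292.53,
}
total = sum(wrc.values())  # 1272.12, matching the table's Total row
for name, var in wrc.items():
    share = 100.0 * max(var, 0.0) / total
    print(f"{name:<28s}{var:>9.2f}{share:>6.0f}%")
```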

Results of the D-studies suggested that:

1. DBRs are likely to approximate or exceed a reliability coefficient of .70 after seven ratings have been collected across 4 to 7 days, or .90 after 10 DBR ratings have been collected (see the projection sketch after this list).

2. The behavior “Interacts Cooperatively” consistently demonstrated better dependability coefficients, smaller SEMs, and values that were less rater-dependent than the behavior “Works to Resolve Conflicts.” Future research should therefore investigate the dependability of the DBR when used to rate classroom behaviors that are both more commonly assessed (e.g., on-task behavior) and more discretely defined (e.g., raising hand).
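
To see where such benchmarks come from, the projection below sweeps the number of ratings through the same single-facet formulas used in the Method sketch. This is our illustrative calculation with rater 1's WRC components, not the study's D-study output; crossing points differ across raters and behaviors.

```python
# Sketch: projecting reliability as ratings accumulate (rater 1, WRC).
# Same single-facet formulas as the Method sketch; illustrative only.
var_person, var_residual = 145.75, 308.94  # sigma^2(p), sigma^2(po,e)

for n in range(1, 16):
    e_rho2 = var_person / (var_person + var_residual / n)
    print(f"n = {n:2d} ratings: E rho^2 = {e_rho2:.2f}")
# For these components, E rho^2 first exceeds .70 at n = 5; raters with
# larger person variance (e.g., rater 4) reach .90 with fewer ratings.
```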

G-study Results for the Full Model: Person, Rater, Day, Setting, and Interactions

                                   WRC a                IC a
Component                      Var b     % Var c     Var       % Var
Person                          229.17    18          404.62    38
Rater                           524.86    41          216.61    20
Day                               4.46     0            0.51     0
Setting                           0        0            0        0
Person x Rater                   98.23     8           44.94     4
Person x Day                     28.89     2           37.73     4
Person x Setting                 -0.18     0            8.16     1
Rater x Day                      57.65     5           40.41     4
Rater x Setting                   0        0            0        0
Day x Setting                     3.10     0            6.20     1
Person x Rater x Day             33.41     3           11.90     1
Person x Rater x Setting          0        0            0        0
Person x Day x Setting            0        0            0        0
Error                           292.53    23          290.69    27
Total                          1272.12   100         1061.77   100

Note. a WRC refers to “works to resolve conflicts” and IC refers to “interacts cooperatively”. b Var = variance calculated using Type III sums of squares. c % Var = percentage of total variance.

G-study Between Raters: Person by Observation for Each Behavior

                                   WRC a                IC a
Rater  Component               Var b     % Var       Var       % Var
1      Person                   145.75   .30          464.07   .55
       Observation c             30.21   .06           31.59   .04
       Error                    308.94   .64          349.50   .41
2      Person                   513.40   .48          539.06   .49
       Observation              128.50   .12          118.27   .11
       Error                    431.60   .40          443.83   .40
3      Person                   214.06   .36          377.58   .63
       Observation               37.55   .06           21.89   .04
       Error                    348.03   .58          201.76   .34
4      Person                   452.28   .58          459.59   .55
       Observation               58.45   .07           51.95   .06
       Error                    271.77   .35          319.05   .38

Note. a WRC refers to “works to resolve conflicts” and IC refers to “interacts cooperatively”. b Var = variance calculated using Type III sums of squares; % Var = proportion of total variance. c Observation is the day and setting facets combined.

Summary and Conclusions

The most substantial proportions of variance for both WRC and IC were attributed to person (18% and 38%, respectively) and rater (41% and 20%). Both day and setting accounted for 0% of the variance, suggesting that these particular DBR ratings were not sensitive to small fluctuations in behavior across time or setting.

[Figure: Comparison of DBR Ratings Across Teachers]