Training Teachers for Large-Scale Fitness Testing: The Georgia Experience
Mike Metzler
Shannon Williams
Georgia State University
NASPE PETE Conference
Las Vegas, NV
October 5, 2012
Overview
History of required fitness assessments in Georgia
2010-2011 Pilot Evaluation Project
2011-2012 First Year Implementation Evaluation Project (EP)
What we’ve learned in Georgia
Q&A as time permits
History of Required Fitness Assessments in Georgia Public Schools
Spring 2010: HB 229 passed and signed:
• Established SHAPE Partnership
• Required fitness assessments in all grades
• Data to be reported by school, district, and state
• First annual report due to Governor in October 2012
Spring-Summer 2010: GA DOE Fitness Assessment Committee:
• 5 components of FITNESSGRAM would be used statewide
• FITNESSGRAM 9 on-line software to be used for data entry and reporting
• Determined rules for student exemptions from testing
• FITNESSGRAM Parent Reports to be sent home for all tested students
• “Test familiarity” in grades 1-3
• BMI measured in 1st-3rd grade but not reported
• All other grades are to be tested and reported
History of Required Fitness Assessments in Georgia Public Schools
2010-2011 School Year Pilot Training and Evaluation:
• 4 participating districts
• 250 teachers trained by state/national FG trainers
• ~25,000 students tested and data entered
• GSU research team contracted by SHAPE to conduct evaluation of the training and testing program
2011-2012 statewide implementation of required fitness assessments:
• All 182 districts participated
• ~3,000 teachers trained by state/national FG trainers
• ~1.2 million students tested, with entered data on some/all FG tests
• GSU research team contracted by SHAPE to conduct evaluation of the training and testing program
Teacher Training for 2011-2012 Statewide Implementation
~3,000 teachers trained by 6 state/national FG trainers
1-day, 8-hour training
Training manual developed by HealthMPowers
Training mostly on testing, some on data entry
Data entry training done with on-line webinars
Subject Selection for 2011-2012 Statewide Implementation Evaluation
Requests sent to 177 school district superintendents
27 districts agreed to allow their teachers to be asked to participate in the EP
Initial pool of 1,050 teachers, taken from training sign-in logs
Random sample of 371 teachers, distributed by location, district enrollment, school type (ES, MS, HS), gender, and experience
Final pool of 351 teachers with usable email contact addresses
N of teacher participants varied by evaluation component
Scope of the 2011-2012 Evaluation Project
1. Fitness testing training
2. Fitness test administration
3. Preparing and distributing fitness assessment reports
4. Teacher, student, and parent/guardian perceptions of fitness testing
5. Recommendations for future assessments
Evaluation of Fitness Testing Training
Four data sources were used to evaluate the fitness testing training:
1. “Ticket out the Door” comment slips completed immediately after each training session (n = 331)
2. Teacher knowledge test before and immediately following training (n = 331)
3. Teacher responses on surveys after completion of testing (n = 71)
4. Teacher comments on surveys after completion of testing (n = 157)
“Ticket Out the Door”
“Actually seeing the test and getting a chance to test on each other” (multiple comments)
“Every aspect!” or “Everything” (by many teachers)
“Hands on activities, pre-written letters for parents, score sheets.”
“Helpful hints on how to administer the test”
“Seeing the test and the protocol; [trainer] modeled useful teaching techniques”
“[trainer’s name]’s knowledge of the material”
“The training manual and the powerpoint”
Teacher Knowledge Test After Training
Teachers were given a short test of their knowledge of the fitness testing procedures, applicable policies, and testing requirements before and immediately after each training session.
Pre-training mean score correct = 58.0% (50% for pilot-year teachers)
Post-training mean score correct = 77.0%
Evaluation of Fitness Testing
Five data sources were used to evaluate the fitness testing:
1. Observations of teachers’ adherence to test protocols
2. Observations for test reliability
3. Time analysis for testing, data entry, and report generation and distribution
4. Teacher comments on surveys after completion of testing
5. Focus group interviews with teachers and students
Teachers’ Adherence to Test Protocols

Test component | N of students observed | N of items on checklist | Mean % included and correct | Low-High %
Push Ups | 271 | 7 | 92.8 | 42.9-100
PACER | 454 | 6 | 84.0 | 44.4-100
Sit and Reach | 557 | 9 | 78.9 | 57.1-100
Curl Ups | 426 | 11 | 94.5 | 36.4-100
Height | 130 | 6 | 87.6 | 83.3-100
Weight | 184 | 4 | 82.9 | 50.0-100
Observations for Test Reliability
“Expert” Observer Training
• 8 GSU graduate students (PETE and Exercise Science)
• 2 GSU faculty trainers
• On-campus training (similar to the 8-hour teacher training)
• On-campus training for observation/data collection
• School-based (high school) observation of all test components
• Video-based form-break training (curl-ups, push-ups, and sit and reach)
• All observers had >80% IOA before collecting data in schools
Observations for Test Reliability
On-site observations by expert observers:
• Prior arrangements with teacher and signed consent
• Conducted checklist for adherence to testing protocols
• Observed student(s) being tested
• Recorded their “expert” score and the student’s “official” score (what was used by the teacher in that student’s report)
• Recorded the number of students being tested at one time
• Recorded what non-tested students were doing
• Recorded the duration of each test
• Recorded who the “counter” was (self, peer, PE teacher, paraprofessional, volunteer)
• Did not report student age and gender
Observations for Test Reliability
Agreement definitions

Test component | Unit of measurement | Reliability agreement range
Push ups | Number completed correctly | +/- 1
Curl ups | Number completed correctly | +/- 1
Sit and reach | .50 inch | +/- .50 inch
PACER | Laps completed | +/- 1
Height | .25 in. | +/- .25 in.
Weight | .25 lbs. | +/- .25 lbs.
Observations for Test Reliability

Test component | Scorer/Reporter (n scores reported) | Agreement with expert observers’ scores | Mean variation from expert observers’ scores
Height | PE Teacher (130) | 98.1% | -1.8%
Weight | PE Teacher (184) | 100% | 0.0%
Sit and Reach | PE Teacher (557) | 71.8% | +0.05%
PACER | All (454) | 72.5% | +15.1%
PACER | PE Teacher (186) | 70.4% | +29.2%
PACER | Other Teacher (18) | 88.9% | -1.1%
PACER | Peer Student (200) | 66.0% | +6.8%
PACER | Self-reported (50) | 60.0% | +10.4%
Push Ups | All (271) | 45.7% | +90.3%
Push Ups | PE Teacher (146) | 65.8% | +32.8%
Push Ups | Paraprofessional (15) | 33.3% | +226.5%
Push Ups | Peer Student (110) | 20.9% | +198.4%
Curl Ups | All (426) | 56.9% | +45.7%
Curl Ups | PE Teacher (208) | 69.7% | +13.6%
Curl Ups | Paraprofessional (19) | 63.2% | +89.0%
Curl Ups | Peer Student (194) | 42.3% | +72.0%
Curl Ups | Self-reported (5) | 0.0% | +550.0%
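The agreement and variation statistics above can be reproduced with a simple tolerance check. This is a sketch under assumptions the slides do not spell out: “agreement” is taken to mean the official score falls within the component’s reliability agreement range of the expert score, and “mean variation” is taken as the mean signed percent difference relative to the expert score. The function names and the sample scores are hypothetical.

```python
# Sketch of the tolerance-based reliability check (assumed interpretation).
# "Agreement" = official score within the component's tolerance of the
# expert score; "mean variation" = mean signed percent difference vs. expert.

TOLERANCES = {              # reliability agreement range per test component
    "push_ups": 1.0,        # +/- 1 repetition
    "curl_ups": 1.0,        # +/- 1 repetition
    "sit_and_reach": 0.5,   # +/- .50 inch
    "pacer": 1.0,           # +/- 1 lap
    "height": 0.25,         # +/- .25 in.
    "weight": 0.25,         # +/- .25 lbs.
}

def percent_agreement(expert, official, tolerance):
    """Percent of paired scores where |official - expert| <= tolerance."""
    hits = sum(1 for e, o in zip(expert, official) if abs(o - e) <= tolerance)
    return 100.0 * hits / len(expert)

def mean_variation(expert, official):
    """Mean signed percent difference of official scores vs. expert scores."""
    diffs = [100.0 * (o - e) / e for e, o in zip(expert, official) if e != 0]
    return sum(diffs) / len(diffs)

# Hypothetical push-up counts for five students (expert vs. official scorer):
expert = [10, 12, 8, 20, 15]
official = [10, 15, 9, 30, 15]
print(percent_agreement(expert, official, TOLERANCES["push_ups"]))  # 60.0
print(round(mean_variation(expert, official), 1))                   # 17.5
```

Under these assumptions, a +90.3% mean variation on push ups would mean official counts averaged nearly double the expert counts.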
Observations for Test Reliability
1. On tests where the PE teacher scored and recorded the performance of one student at a time (height, weight), the data are extremely reliable.
2. On the sit and reach, even with one student tested at a time by the teacher, the data are not reliable (a problem with multiple scales on the box).
3. On tests where PE teachers shared the responsibility for scoring/reporting and multiple students were tested at once, the reliability of the data is unacceptable (PACER, 72.5%; Push ups, 45.7%; Curl ups, 56.9%).
4. Student peer and self-reported scores were extremely unreliable on the PACER, Push ups, and Curl ups.
5. With the exception of “Other teachers” on the PACER, the means for scorers/recorders on the PACER, Push up, and Curl up tests overestimated actual performance, often by large amounts.
Compliance with Testing Guidelines

Test component | Recommended number to test at one time | Compliance with recommendation
Height | 1 | 100%
Sit and reach | 1 | 100%
Weight | 1 | 100%
Curl ups | No more than 4 | 64.5%
Push ups | No more than 4 | 54.6%
PACER | No more than 6 | 41.4%
Involvement by Non-Tested Students

Test | Sitting and/or waiting | Counting other students | Engaged in physical activity or lesson content
Sit and Reach | 52.9% | 0.0% | 47.1%
Height | 57.1% | 0.0% | 42.9%
Weight | 63.6% | 0.0% | 36.4%
Curl Ups | 32.5% | 48.4% | 19.4%
PACER | 48.3% | 35.5% | 16.1%
Push Ups | 46.2% | 46.2% | 7.6%
All tests combined | 46.3% | 30.9% | 22.8%
Time analysis for testing, data entry, and report generation and distribution*

Indicator | Statistic | Elem. | MS | HS
Class size | Mean | 41.4 | 32.5 | 22.6
PE class time | Mean | 44.5 mins | 66.6 mins | 74.3 mins
PE classes/week | Mode | 2 | 5 | 5
PE class days to complete testing | Mode | 10 (5 weeks) | 10 (2 weeks) | 3 (< 1 week)
Percent of PE instructional time | Mean | 14% (annual) | 11% (9 weeks) | 3.5% (9 weeks)
Data entry | Mean | 157 mins | 62.7 mins | 64.5 mins
Report prep and distribution | Mean | 105 mins | 62.7 mins | 110.5 mins

*Based on 1 intact class of tested students, identified by each teacher
Teacher comments on surveys after completion of testing
On their training:
“The training was excellent.”
“Practicing with the students helped work out the kinks. The training also helped because we got to see exactly what was expected of the students during the test.”
“I have had experience with conducting the FITNESSGRAM in 1994-95. HOWEVER, the training was a refresher course and was helpful towards collecting data from students.”
Teacher comments on surveys after completion of testing
On conducting testing in their schools:
“There were no problems.” (many same or similar responses on all tests)
“It is hard to watch for all the form errors that might occur when testing more than 2 students per trained adult.” (Curl Ups)
“It is just time-consuming. To test with fidelity, one student per adult was the most that could be tested at a time.” (Push Ups)
“Keeping non testing students engaged and supervised while testing. It was difficult to count and supervise at the same time.”
“It was difficult to motivate some students to do their best. A few students walked a good bit even after being encouraged to run at a steady pace and walk only if they needed to.”
Teacher comments on surveys after completion of testing
On data entry:
“The data entry was the worst part, it was not organized like it has been in the past. I like it better when only my students are on a list. It is too much to have to sort through 20 different classes. I hated it [data entry] this year.”
“I had to use pencil and paper to record my scores, then take them back to the office to input data, so it took twice as long. It would have better if I had a device (like an Ipad) to enter scores directly into the program.”
“Instead of only my class appearing to enter data, all physical education teachers’ students were on the list. I accidentally deleted my coworkers' scores because I thought only my class was listed on my log in.”
“The entire process is time consuming. More training is definitely needed. We were actually taught HOW to administer the test, but data entry was where the training REALLY needed to be.”
Teacher Comments on Surveys After Completion of Testing
On report preparation and distribution:
“While generating reports, I had several student reports that were not printed. I discovered that this was due to the fact that these students had a report in another class in which they were tested. Even though the testing data was shared between teachers, it was a problem printing the reports. It would only print with one teacher's name and not both.”
“Generating reports--very slow and labor-intensive. I finally printed them out from home on a Sunday evening. This was much quicker than trying to print them from the FITNESSGRAM (sic) website during my 30-minute planning period. The software would often time out before the info to be printed was loaded.”
“Additionally, I couldn't sort it because the name of the teacher (my identifier) is not listed at the top of the report when printed out. So unless you have an awesome memory once the reports print all mixed up there is no way to determine what students were in what class. I also had some issues with how the report came out based upon Spanish or not. The school I teach in is majority Hispanic. I got it figured out but it was through trial and error.”
Teacher Confidence in Accurate Scoring

Level of confidence | After training / before testing (N = 331) | After testing (N = 71)
1. Not at all confident | 0.3% | 0.0%
2. | 0.3% | 1.7%
3. Somewhat confident | 3.0% | 20.3%
4. | 34.0% | 52.5%
5. Extremely confident | 62.3% | 25.4%
Mean confidence score | 4.58 | 4.02**

** p < .000
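As a quick arithmetic check, the mean confidence scores on this slide can be reproduced as the percentage-weighted mean of the 1-5 ratings. This is a sketch: the original analysis presumably used the raw survey responses, and `weighted_mean` is a hypothetical helper, not part of the evaluation project.

```python
# Reproduce the mean confidence scores from the rating distributions above.

def weighted_mean(percent_by_rating):
    """Weighted mean of ratings, where values are percents of respondents."""
    total = sum(percent_by_rating.values())
    return sum(r * p for r, p in percent_by_rating.items()) / total

before = {1: 0.3, 2: 0.3, 3: 3.0, 4: 34.0, 5: 62.3}  # after training / before testing
after = {1: 0.0, 2: 1.7, 3: 20.3, 4: 52.5, 5: 25.4}  # after testing

print(round(weighted_mean(before), 2))  # 4.58
print(round(weighted_mean(after), 2))   # 4.02
```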
Focus Group Interviews with Teachers and Students
From 6 different districts around the state
57 PE teachers
56 students (upper elementary and middle school)
Semi-structured interviews, transcribed and analyzed with NVivo Qualitative Software
Focus Group Interviews with Students
Overall Experience.
Focus group facilitators asked students to tell them about the best and worst parts of the testing. A majority of children reported that fitness testing made them feel good about themselves. Focus groups showed that children seemed to enjoy testing more as they saw themselves progress in their physical abilities. One student echoed a response heard from many: “…we set a goal or we had our own personal goal, but if we exceeded that, we felt really good…I like the personal boost you get from doing well.” An elementary student reported, “I like the test, because if you do the test you’ll have a healthy life and you can do more stuff than you used to be able to do.”
Focus Group Interviews with Students
Communication to them about the Fitness Assessments
Students had a clear understanding of the purpose of the testing, as communicated by their teachers, and many had a sense of the value of fitness in their lives. Students expressed an understanding that the “state is trying to get a good assessment of the general health of the school’s population…” or “trying to find out how healthy is the state of Georgia.” Students also reported feeling informed and prepared before testing began. One student commented, “they didn’t just throw it at us. We knew long before that we were going to have to do this.”
Focus Group Interviews with Students
Time for Conducting Testing
The amount of time students perceived fitness testing to take varied greatly among districts. Some perceived testing to have only “taken three days so it wasn’t that bad…” Some students “didn’t mind doing it at all…” Others felt that the time taken for testing detracted from regular PE, which made it less enjoyable. One student commented, “Games are the best part…but we only got to play [games] half the time.” For a fair number of students this was a reason they did not like testing, whereas others did not mind the time taken to test. Students recommended cutting down on the time spent testing, but did not offer specific recommendations for how to do so.
Focus Group Interviews with Students
Accuracy of Data
It was widely reported among students that peer testing generated inaccurate results. One student remarked, “…if it was your friend spotting you, they’d let you slide… and we’re all kind of friends, so the numbers might not be terribly accurate. I think with like a peer review, it’s not very accurate because your friends cheat for you all the time.” Students also expressed frustration when they realized scores were not being recorded accurately, or when obvious cheating was occurring; cheating was particularly obvious to some students when classes were tested in large groups. Students recommended not using peer testers, because they tend to report inaccurate scores for their friends, and recommended testing students in smaller groups to avoid cheating.
Focus Group Interviews with Students
Make Fitness Testing More Enjoyable
Students had general recommendations for how to improve fitness testing. One student commented that “girls don’t like that everyone is watching them, so they don’t go as far as they can.” This same student, among others, suggested that separating testing for boys and girls would improve girls’ testing experiences. Students also recommended that fitness testing could be done in another way: “…ya’ll could make a game out of this fitness test. That would be cool. So we don’t actually know we’re doing the fitness test.”
Focus Group Interviews with Teachers
Overall Experience:
The teachers who participated in the focus groups were generally supportive of FITNESSGRAM® testing. Overall, it was viewed as important and as an improvement upon the Presidential Fitness Test. Teachers overwhelmingly agreed that the strength of the program was that tests were based on individual goals, and that this contributed to children developing self-esteem, confidence, and in some cases highlighted the accomplishments of students who may not have realized what they could achieve. Teachers expressed general satisfaction with training, preparedness, and administrative support provided to them to implement testing. They also discussed some of the challenges, and had suggestions for improvements.
Focus Group Interviews with Teachers
Primary Challenge 1: Time Required for Testing
Teachers in elementary and middle grades unanimously expressed concern regarding the large amount of time required to complete fitness testing. Teachers noted that field days, CRCTs, tutoring, and other kinds of disruptions frequently challenged their ability to test. One teacher complained, “We don’t have enough time to get this done. That is the biggest challenge.” Time challenges varied by grade level. Middle school teachers faced a particular challenge because they received a new class every nine weeks and had to test every new student who entered at that time. Elementary teachers were challenged because “[young students] need time; they need help and it really goes slow.” High school classes, on the other hand, seemed to be able to “roll really quickly.” A number of teachers commented that it would be valuable to test at both the beginning and end of the year, but added that this would be difficult due to time constraints.
Focus Group Interviews with Teachers
Primary Challenge 2: Software Issues
A very clear theme of the focus groups was that teachers faced challenges with the FITNESSGRAM® software in entering, maintaining, and printing data. A number of teachers described the software as “aggravating.” School district technology staff support was available but often slow to respond because staff were dealing with many complaints and issues. Some teachers commented that data entry, “once it got rolling, was a piece of cake.” A few teachers described using an iPad for data entry, explaining that the software did not work well with it. Most significantly, teachers overwhelmingly reported wanting control over their class rosters, to be able to add and remove students as needed. Teachers described their frustration when students whose names appeared on their roster were “not even enrolled in our school.”
Focus Group Interviews with Teachers
Primary Challenge 3: Achieving Accurate Scores
Many teachers were concerned with the accuracy of the data. It was difficult to ensure standardized readings on the push-ups and curl-ups. Teachers also explained that because this was the first year of testing, students did not know how to properly do curl-ups and push-ups. According to one teacher, “it was a matter of just getting it done, the data will reflect that.” Teachers also reported that they did not have confidence in peer testers, noting that “there [were] definitely some inconsistencies with that.” Another teacher commented that human error would always be a factor in testing: “…one person is going to implement and test them in a certain way, and then another teacher may do it another way. And so the data that you get for one class may be totally differently skewed from what you would get from another teacher.”
What Have We Learned in Georgia?
• It is possible to train almost every physical education teacher in a state as large as Georgia
• It takes more than good training to get reliable data
• We know that some teachers did a really good job, but other teachers didn’t
• To get consistently reliable data, teachers need to comply with the recommended testing group sizes
• Kids can’t be trusted to count for peers or themselves
• There is a tradeoff between accuracy and time
• There needs to be a high degree of coordination between the FG software and district technology coordinators
• Teachers need a voice in how fitness testing (and the reporting of results) will be administered