
(2016). Practical measurement and productive persistence: Strategies for using digital learning system data. Journal of Learning Analytics, 3(2), 116–138. http://dx.doi.org/10.18608/jla.2016.32.6

ISSN 1929-7750 (online). The Journal of Learning Analytics works under a Creative Commons License, Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0)

Practical Measurement and Productive Persistence: Strategies for Using Digital Learning System Data to Drive Improvement

Andrew E. Krumm, SRI International, USA
[email protected]

Rachel Beattie, Carnegie Foundation for the Advancement of Teaching, USA

Sola Takahashi, Carnegie Foundation for the Advancement of Teaching, USA

Cynthia D’Angelo, SRI International, USA

Mingyu Feng, SRI International, USA

Britte Cheng, SRI International, USA

ABSTRACT: This paper outlines the development of practical measures of productive persistence using digital learning system data. Practical measurement refers to data collection and analysis approaches originating from improvement science; productive persistence refers to the combination of academic and social mindsets as well as learning behaviours that are important drivers of student success within the Carnegie Foundation for the Advancement of Teaching's Community College Pathways (CCP) Network Improvement Community (NIC). Strategies for operationalizing non-cognitive factors using digital learning system data as well as approaches for utilizing them during improvement efforts are described.

Keywords: Improvement science, evidence-centred design, productive persistence.

1 INTRODUCTION

As interest in 21st century skills and non-cognitive factors increases, there is a growing need to measure these factors in valid, reliable, and actionable ways (Kosovich, Hulleman, Barron, & Getty, 2014). Researchers have developed self-report and observation instruments with high internal reliability and construct validity (Duckworth & Yeager, 2015), and while many instruments have attractive measurement properties, they are often time-intensive to complete and can require teachers and learners to disengage from learning activities in order to collect the desired information (Schraw, 2010).


Data collection techniques that keep teachers and learners engaged in the work of teaching and learning can be an important resource for improving schools (Yeager, Bryk, Muhich, Hausman, & Morales, 2013). One ready source of data that can be leveraged by both researchers and practitioners is generated from learners' direct interactions with digital learning tools and environments (Hadwin, Nesbit, Jamieson-Noel, Code, & Winne, 2007).

In this paper, we describe approaches for developing practical measures of hard-to-measure constructs, such as intrapersonal competencies and 21st century skills, using system log data (Stecher & Hamilton, 2014). Practical measures, also referred to as "improvement measures," are developed and used as part of quality improvement efforts and are distinguished from two other, perhaps more familiar measurement purposes: research and accountability. Data collected for the purposes of research and accountability are used to build knowledge and evaluate the performances of individuals and organizations, respectively. However, research and accountability measures can have limited meaning and utility to non-researchers for reasons that we detail below. As more and more attention is placed on developing students' 21st century skills and non-cognitive competencies, researchers and practitioners will need new approaches for collecting data in rapid and valid ways that can inform their efforts to develop these skills and competencies in learners.

As Stecher and Hamilton (2014) observe, a critical challenge in designing interventions to aid students in developing 21st century skills and competencies is measurement. Our focus in this paper is on the ways we as researchers developed measures that could aid instructors in community college developmental mathematics courses in delivering more targeted interventions to improve student persistence and use of good learning strategies, i.e., productive persistence. Productive persistence is related to what the National Research Council refers to as skills within the intrapersonal domain: "intellectual openness, work ethic and conscientiousness, and positive core self-evaluation. These clusters include competencies, such as flexibility, initiative, appreciation for diversity, and metacognition (the ability to reflect on one's own learning and make adjustments accordingly)" (2012, p. 4). Persistence addresses effort and continuing to work on a task, often in the face of initial failure (DiCerbo, 2015); good strategies represent behaviours, such as self-regulation and study strategies, that can support the successful completion of a task (Zimmerman, 2002).

In this paper, we describe how we used online learning system data to develop behavioural measures of productive persistence within an overarching improvement effort directed at increasing the success rates of students enrolled in community college developmental math courses. Behavioural measures were developed as part of the Carnegie Foundation for the Advancement of Teaching's Community College Pathways (CCP) Network Improvement Community (NIC), which organizes colleges and universities throughout the United States around a common aim of tripling the success rate of learners in half the time of traditional developmental math offerings. Key to achieving this ambitious aim is implicating instructors, administrators, and researchers in the collective work of creating more effective learning environments using improvement science best practices (e.g., Langley et al., 2009).


Improvement science represents a different approach than traditional educational research, which typically focuses on what works (e.g., evaluation research) or why something works (e.g., theory building; Bryk, Gomez, Grunow, & LeMahieu, 2015). Importantly, improvement research uses both evaluation and theory-building to improve processes that occur within educational organizations (Russell, Jackson, Krumm, & Frank, 2013). However, improving processes in complex systems like schools requires not only knowledge of what works and why but also knowledge of how to integrate what works into regular organizational activities (Lewis, 2015; Penuel, Fishman, Cheng, & Sabelli, 2011). Based on the Institute for Healthcare Improvement's Model for Improvement, integrating what works can be focused on answering three questions: 1) What are we trying to accomplish? 2) How will we know that a change is an improvement? 3) What change can we make that will result in improvement? Answering these deceptively simple questions is complex work, and central to that work is the role of measurement.

2 PRACTICAL MEASUREMENT

Measures developed for the purposes of supporting improvement efforts involve different considerations and can take on different forms than those developed for the purposes of accountability or research (Solberg, Mosser, & McDonald, 1997). Over the last few decades, considerable effort has been invested in the development of accountability and research measures. In particular, the development of accountability measures has been prompted by the broader standards and accountability movement in the United States (Coburn & Turner, 2012). Accountability measures, such as graduation rates, student test scores, and teacher evaluation ratings, can be useful in defining the ultimate aims of improvement efforts, but they are often too general, infrequently collected, and distant from day-to-day activity to inform improvement work (Coburn & Turner, 2011). Research measures, which represent another common purpose for educational measurement, prioritize the disciplinary standards of a research community over practical use; moreover, they tend to be collected infrequently and are not necessarily designed to measure change over time (Solberg et al., 1997). The measures that most centrally benefit improvement efforts are "practical measures" — those taken directly from practice and easy to use in working to change educational processes (Bryk et al., 2015).

Practical measures are characterized by several qualities that distinguish them from measures for accountability or research (Yeager, Bryk, et al., 2013). First, practical measures are specific to an improvement effort. In contrast to accountability measures, which capture general system outcomes, practical measures provide information about the particular processes that educators aim to improve. Second, because practical measures are used to learn about specific changes to educational processes, they need to be sensitive to fluctuations in behaviours or performance. When changes to processes are tested, a practical measure should quantify the degree to which changes occurred. Third, in the role of providing feedback about specific changes, measures also need to be accessible in a timely manner to the practitioners implementing the changes.


Fourth, the effectiveness of practical measures depends on their usefulness as information to guide subsequent action (Bryk et al., 2015). Practitioners should be able to use these measures to inform decisions and next steps related to changes being tested. Moreover, practical measures should be meaningful to both practitioners and those in administrative roles supporting an improvement project. Therefore, practical measures depend upon the organizational routines enabling their use (Spillane, Parise, & Sherer, 2011). A culture of data use driven by judgment and evaluation is not conducive to the transparency necessary to use data in support of improvement. Instead, practical measures are most beneficial within a context where failures are seen as opportunities to learn, where trust pervades, and where practitioners are willing to take risks as well as make changes in their practice.

Fifth, practical measures should be based on a theory that outlines how changes to the system will lead to improvements in the service of an ultimate aim (Bennett & Provost, 2015). Without an improvement theory, practical measures are of limited value because the theory provides the rationale for why the measures are important to an overall improvement effort. By articulating a theory, aligning specific changes to that theory, and testing changes against measures organized around the theory, improvement teams can learn what is working in their local system, where there are breakdowns, and the conditions under which changes are not leading to improvements. Sixth, along with being based in a local improvement theory, effective practical measures are also predictive of the outcomes that practitioners are trying to improve. In this way, practical measures can serve as leading indicators of improvement goals (Provost & Murray, 2011).

Potential sources of practical measures in schools are data generated by student use of online learning systems (U.S. Department of Education, 2012, 2013). As more and more technology finds its way into schools under the banner of blended learning, there is a growing opportunity for both researchers and practitioners to take advantage of these data for research, accountability, and process improvement. For improvement purposes, online learning system data are attractive because they can be collected during learning activities without interrupting learners as well as analyzed rapidly to support formative decision-making (Nelson, Nugent, & Rupp, 2012). As we illustrate in the following sections, online learning system data can be used to develop behavioural measures that are predictive of long-term outcomes and useful for understanding changes made as part of an overall improvement effort.

3 PRODUCTIVE PERSISTENCE

The behavioural measures described in this paper were developed within an overall improvement effort directed at enhancing student outcomes in developmental math courses in community colleges throughout the United States. Upwards of 70% of students who enroll in a community college are assigned to developmental math; upwards of 80% of those students will not receive college-level credit even after three years (Bailey, Jeong, & Cho, 2010). Recognizing that the failure to serve developmental math students has numerous, interrelated causes, Carnegie developed Statway and Quantway to address multiple factors affecting student success. Statway is a two-term course that combines both developmental mathematics and statistical reasoning content. Quantway is a single-term, accelerated developmental math course paired with a college-level, credit-bearing quantitative reasoning course. Both Pathways emphasize the core mathematical skills needed for work, personal life, and citizenship. Moreover, both Pathways stress conceptual understanding and the ability to apply what is learned in a variety of contexts and problems. Cutting across both Pathways is a focus on rapid analytics, faculty development, network engagement, relevant content, and productive persistence, which, combined, have contributed to tripling the success rate of students in less than half the time of traditional developmental math programs (Strother, Van Campen, & Grunow, 2013; Strother & Sowers, 2014; Van Campen, Sowers, & Strother, 2013).

Carnegie and the CCP NIC define productive persistence as tenacity plus the use of good strategies. Under this definition, productive persistence entails students having the academic and social mindsets as well as strategies and behaviours necessary to move effectively past challenges (see Figure 1). Academic and social mindsets, which are also referred to as intra-personal competencies (Stecher & Hamilton, 2014), are associated with student success in a variety of educational settings (Farrington et al., 2012) and can often be addressed by targeted interventions (Dweck, Walton, & Cohen, 2014). For example, it is common for a student in the U.S. to believe that he or she is not a "math person" when math does not come easily (Blackwell, Trzesniewski, & Dweck, 2007; Stigler & Hiebert, 1999). Students may also question whether they truly belong or will be respected in a college setting — something that may especially beset students of colour, who may face negative stereotypes about the academic potential of members of their racial or ethnic group (CCCSE, 2010; Gardenshire-Crooks, Collado, Martin, & Castro, 2010; Walton & Cohen, 2007).

Figure 1: Productive persistence driver diagram.

Based on multiple literature reviews as well as interviews with leading experts, faculty members, and students, researchers at Carnegie identified that many students who do not complete a developmental math course either withdraw effort or get too far behind during the first four weeks of a course (Cook, Purdie-Vaughns, Garcia, & Cohen, 2012; Vaquero & Cebrian, 2013). For that reason, Carnegie developed the Starting Strong Package to prepare faculty to launch a course successfully and help students develop the mindsets and skills needed to succeed in college. The Starting Strong Package is a combination of 10 instructor-led daily routines and activities launched during the first month of all CCP courses. The routines and activities aim to reduce stereotype threat and belonging uncertainty (Walton & Cohen, 2007), reduce mathematical anxiety (Jamieson, Mendes, Blackstock, & Schmader, 2010), increase the perceived relevance of mathematics and statistics (Hulleman & Harackiewicz, 2009), and foster a sense of purpose (Yeager et al., 2014). It includes a brief, one-time "growth mindset" reading and writing activity that has been shown to increase overall mathematics grades among community college students (Yeager & Dweck, 2012; Yeager, Paunesku, Walton, & Dweck, 2013). There are also routines embedded into the online learning environment that promote the development of self-regulated learning strategies (Zimmerman, Moylan, Hudesman, White, & Flugman, 2011).

Measuring when students are not productively persisting is an increasingly important practice within the CCP NIC. Academic and social mindsets around productive persistence are currently measured in the CCP through a series of short, context-sensitive self-report items given to students in the first and fourth weeks of the course (see Yeager, Bryk, et al., 2013, for a description of the development of these self-report measures). Behavioural measures stemming from online learning system data were intended to supplement pre-existing self-report measures within an overall measurement system used within the CCP NIC (Martin, Nelson, Lloyd, & Nolan, 2007).

When using online learning system data, there are few standardized processes for going from clicks to constructs. The relative newness of using online learning system data necessitated the identification of an approach that gives "communicable meaning" to behaviours operationalized using system log data (Provost & Murray, 2011). In our efforts to develop practical, behavioural measures of productive persistence, we used elements of the assessment process known as evidence-centred design (ECD) and the specific tool of design patterns to specify key elements of each measure.

4 EVIDENCE-CENTRED DESIGN

Many digital learning environments generate large amounts of data on student interactions within the system, making it possible to track and identify an array of behaviours. To support our efforts in understanding student use of online systems within the CCP NIC, we utilized a series of ECD tools and processes. Mislevy and Haertel succinctly define ECD as follows: "Evidence-centred assessment design (ECD) provides language, concepts, and knowledge representations for designing and delivering educational assessments, all organized around the evidentiary argument that an assessment is meant to embody" (2006, p. v). Key to this definition is the idea of an "evidentiary argument," whereby assessment developers inherently make claims about what students know and can do based on the evidence collected from student performances. The aim of ECD is to promote the ideals of explicitness and reusability when making claims about students from often-imperfect evidence.

ECD is a process; it organizes the work of forming an evidentiary argument into five layers: 1) domain analysis, 2) domain modelling, 3) conceptual assessment framework, 4) assessment implementation, and 5) assessment delivery. Domain analysis involves understanding the focal domain of interest, such as scientific inquiry or self-regulated learning strategies. Domain modelling includes laying out one's argument for connecting claims about constructs to the tasks in which students are engaged and from which the evidence will be drawn. Design patterns are a specific tool used in the domain modelling layer; design patterns are built around three models and their interrelationships: 1) a student model (what knowledge, skills, and abilities are being measured?), 2) a task model (what activities will allow students to demonstrate those behaviours?), and 3) an evidence model (what data provide evidence — clicks, events, or log files — of those behaviours?). The conceptual assessment framework builds from the argument expressed in the domain-modelling layer as the actual algorithms and measurement models used to analyze collected evidence. Assessment implementation and assessment delivery involve preparing and deploying the assessment as well as collecting and analyzing data.
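Because design patterns recur throughout the rest of this paper, the sketch below represents one as a small data structure. The field names, and the example entries drawn from the persistence task features discussed later, are illustrative assumptions, not the authors' actual design pattern.

```python
from dataclasses import dataclass, field

@dataclass
class DesignPattern:
    """A minimal sketch of an ECD design pattern's three interrelated models."""
    construct: str                                        # student model: what is measured
    task_features: list = field(default_factory=list)     # task model: activities that elicit it
    evidence_sources: list = field(default_factory=list)  # evidence model: observable data

# Hypothetical entry for the persistence behaviours developed later in the paper.
persistence_pattern = DesignPattern(
    construct="persistence after initial failure",
    task_features=["threshold for determining success",
                   "feedback in relation to that threshold",
                   "opportunities to retry the task"],
    evidence_sources=["CP scores", "events logged before/after a CP",
                      "session-ending events"],
)
```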

For the purposes of developing practical measures of productive persistence, we engaged in four ECD layers, paying particular attention to our evidentiary argument in the format of a design pattern. For the domain-modelling layer, SRI and Carnegie drew on existing research conducted by Carnegie on productive persistence; we also engaged in subsequent literature scans to better understand the types of behaviours identified using online learning system data. Following this step, we then aligned research-based behaviours and strategies in relation to the way in which learning tasks were organized within the online learning systems used in the CCP NIC. Lastly, we reconciled research-based behaviours and online learning system tasks with the actual data tracked and stored by the online systems. This reconciliation helped in framing our evidentiary argument for the analytical approaches (i.e., conceptual assessment framework) we used to distill each behaviour from the broader dataset (i.e., assessment implementation).

While ECD is often characterized as an approach for prospectively developing assessment tasks geared toward surfacing specific constructs, ECD and design patterns can also be used retrospectively to make sense of data generated by an online environment (DiCerbo, 2015). Using design patterns to reverse engineer pre-existing learning environments holds a great deal of potential for learning analytics researchers, as design patterns draw attention to the ways in which certain digital learning environments could be used to measure specific constructs based on the data logged by the system and the structure of learning tasks within the system.

Given the new ways in which data can be collected and analyzed from digital learning environments, there is an increasing need to address the meaning behind these measures where millions of observations can be logged longitudinally. Measurement challenges, such as 1) under-representing a construct and 2) introducing construct-irrelevant variance (Messick, 1994), can come about regardless of the analytical technique used or the size of the dataset. Under-representation of a construct, for example, implies that the data used to measure a construct do not adequately address underlying components of the construct. This is particularly problematic when using online learning system data because, unless engineered around specific tasks to generate clear evidence (e.g., Cognitive Tutors), the learning environment may not collect or store data on the specific behaviours a researcher might wish to use. Instead, researchers must make do with the data at hand, which could lead to using data unrelated to the construct. In providing a structured process, ECD, and in particular design patterns, holds promise for developing measures using online learning system data because, as a process, ECD can help researchers see where and how specific behaviours can be measured within specific activities in digital environments (Mislevy, Behrens, DiCerbo, & Levy, 2012; Rupp et al., 2012).

5 BEHAVIOURAL MEASURES OF PRODUCTIVE PERSISTENCE

During the 2013–2014 academic year, Statway courses used Carnegie Mellon's Open Learning Initiative (OLI) platform. Using data from OLI as well as primary data collected by Carnegie researchers and community college faculty members, we distilled multiple practical measures associated with student use of the system. The types of events logged by OLI included pages viewed by students (PV), assessments referred to as "Checkpoints" (CP), and practice activities referred to as "Learning by Doing" (LBD) and "Did I Get This?" (DIGT). LBDs and DIGTs are interspersed throughout pages students are expected to read and, as their names imply, help students practice content-related knowledge and skills through interactive tasks (i.e., LBD) and self-assessments (i.e., DIGT). Statway I, which is often implemented in the first semester of an academic year, is typically comprised of six modules (e.g., "Types of Statistical Studies and Producing Data") and twelve topics (e.g., "Collecting Data — Sampling"). For the purposes of this paper, we report on behavioural measures developed using PV, CP, LBD, and DIGT events across six modules within Statway I.

Along with OLI data, Carnegie also collected common assessments from community colleges participating in the CCP NIC. The pre-assessment was developed to assess students' understanding of mathematical concepts (e.g., fractions, decimals, and algebraic notation) before entering a CCP course. The pre-assessment is comprised of 21 items with an internal reliability of .80. It is given to students in the first week of the course and has been identified as one of the best early predictors of student success in CCP courses. The summative assessment is made up of 40 items; it is also a rigorously validated and comprehensive instrument. For example, all items on the post-assessment were field-tested with over 500 members of a comparison sample who had taken a college mathematics or statistics course. A recent report by Strother and Sowers (2014) describes the relationship between higher summative assessment scores and students' end-of-term grades within the CCP. Both the pre-assessment and the summative assessment were scored using a 1-parameter item-response-theory model, i.e., a Rasch model.
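Under a Rasch model, the probability of a correct response depends only on the difference between a student's ability and an item's difficulty, both expressed in logits. As a hedged illustration of how scores such as PRE and SUM might be produced, the sketch below estimates one student's ability by maximum likelihood, assuming item difficulties have already been calibrated; it is not Carnegie's actual scoring procedure.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def rasch_ability(responses, difficulties):
    """Maximum-likelihood ability (theta, in logits) for one student's 0/1
    responses, given calibrated item difficulties under a Rasch (1PL) model."""
    responses = np.asarray(responses, dtype=float)
    difficulties = np.asarray(difficulties, dtype=float)

    def neg_log_lik(theta):
        # P(correct on item i) = 1 / (1 + exp(-(theta - b_i)))
        p = 1.0 / (1.0 + np.exp(-(theta - difficulties)))
        return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

    return minimize_scalar(neg_log_lik, bounds=(-4, 4), method="bounded").x

# Example: ability for a student answering 14 of 21 pre-assessment items correctly,
# with hypothetical item difficulties spread between -2 and 2 logits.
theta = rasch_ability([1] * 14 + [0] * 7, np.linspace(-2, 2, 21))
```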

5.1 Development Process

The dataset used to develop behavioural measures of productive persistence included the pre-assessment, the summative assessment, and OLI data, which included PV, CP, LBD, and DIGT events. This historical dataset provided the opportunity to identify predictive measures that could then be used by CCP instructors using real-time data displays in later semesters to measure ongoing improvement efforts. As we used the productive persistence framework to identify potential behaviours, make sense of OLI learning tasks, and analyze evidence generated from those tasks, we used the student, task, and evidence prompts from a typical ECD design pattern to organize the process. For example, an effective task model includes characteristic features and variable features associated with the task in which students are engaged along with specific task products (i.e., what students explicitly or implicitly produce as they engage in and complete a task). Characteristic features help in clarifying what is required of a task in order to identify a focal behaviour. For example, characteristic features of a task from which one might wish to measure persistence would require 1) a threshold for determining success, 2) student feedback in relation to that threshold, and 3) opportunities for students to retry the task. Variable features are aspects of the task that can vary or be varied intentionally. Variables such as difficulty of the task, whether a task is graded, and time allotted for the task can vary or be varied intentionally when measuring persistence.

The evidence model, unlike the task model, specifies how task products are analyzed in the form of potential observations and potential frameworks. Potential observations in the evidence model outline what to focus on across various task products along with specifications for the qualities of task products, such as high, medium, or low, as well as right or wrong. Lastly, potential frameworks help in putting qualitative assessments of evidence into context. For example, potential frameworks aid in articulating the conditions (e.g., early or late in a task) under which high or low is appropriate. Through this iterative and comparative process using design patterns, we began to identify potential behaviours that could be operationalized using OLI data elements in relation to the behaviours identified across multiple research scans.1

As we aligned student, task, and evidence models, we also engaged in preliminary descriptive and correlational analyses related to CPs, PVs, LBDs, and DIGTs. Early work, for example, pointed to the importance of completing and succeeding on CPs in relation to student performance on the summative assessment. With an initial understanding of the importance of passing CPs, we returned to the literature in an effort to better understand how to conceptualize that which students do before and after engaging in these assessments. Specifically, we used literature related to a "preparation for future learning" orientation, which addresses students' abilities to adapt to new environments and act as independent learners (Bransford & Schwartz, 1999; Schwartz, Bransford, & Sears, 2005). We focused task and evidence models around the choices that students made prior to and after taking a CP. Schwartz and Arena (2013) argue that in learning environments where students have discretion, the activities and resources used in the service of learning can be more illustrative than traditional, item-based assessments in understanding students' development as independent learners.

Schwartz, Arena, and colleagues prospectively design the environments that assess the choices that students make in order to evoke specific constructs. Thus, the choices that students make in terms of the activities and resources they use in completing a task are designed to elicit data that are interpretable in relation to the task and intended construct, such as critical thinking. Moreover, whether or not a learner is said to have demonstrated the construct combines system log data and an outcome measure, whereby the outcome, such as a win-state in a game, is not a standalone activity but embedded in the overall task. As Rupp et al. (2012) observe, the ability to see commonality across diverse tasks is one of the many benefits of using ECD. Within OLI, a CP is a standalone activity; reading and practice activities available to students are aligned with the CP and are at the student's discretion to use during an OLI session. These characteristic features of learning tasks within OLI provided the opportunity to make sense of the activities students engaged in prior to and after taking a CP.

1 Visit www.a4l.sri.com to view sample design patterns as well as the productive persistence design pattern.

The choices that students made prior to and following CPs were operationalized in the following ways. We first ordered the events that students generated across PV, LBD, DIGT, and CP activities by the timestamp logged by OLI within a session, defined as the time between a unique login and logout. We organized our overall task model around individual sessions, whereby students could exercise discretion in the activities in which they engaged. For example, students could take an assessment that had a threshold for success, retake assessments multiple times, and move freely among reading, practice, and assessment activities.

One of the first behaviours that we were interested in understanding was the degree to which students read and practiced during a session where they took one or more CPs. In some courses, CPs were required, so it was likely that we would see consistent CP use within these courses but that students might not engage in other types of activities. Thus, the first measure that we developed captured what we referred to as a CP-only session. CP-only sessions were defined as a student logging into OLI, taking one or more CPs, and not logging other activities during that same session. Preliminary analyses revealed that the more CP-only sessions a student logged, the less well he or she did on the summative assessment.
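As a hedged sketch of this operationalization, the pandas code below flags CP-only sessions in an event log. The column names ('student_id', 'session_id', 'timestamp', 'event_type') are assumptions about how an OLI export might be structured, not the platform's actual schema.

```python
import pandas as pd

def flag_cp_only_sessions(events: pd.DataFrame) -> pd.Series:
    """Return, per (student_id, session_id), whether the session contained
    one or more CPs and no other activity types (a CP-only session)."""
    ordered = events.sort_values(["student_id", "session_id", "timestamp"])
    # event_type is assumed to take the values 'PV', 'LBD', 'DIGT', or 'CP'.
    return (ordered
            .groupby(["student_id", "session_id"])["event_type"]
            .agg(lambda types: set(types) == {"CP"}))
```

A student's percentage of CP-only sessions is then the mean of this flag across his or her sessions, multiplied by 100.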

The CP-only measure was one way of understanding the types of sessions that students logged. Related to CP-only sessions, we then developed a measure capturing whether students read, practiced, and assessed within a single session. We referred to these multi-activity sessions as robust sessions. For this measure, students needed to engage in PV, LBD/DIGT, and CP within the same session; however, we were agnostic to the order in which students engaged in these activities. We further operationalized student behaviours in relation to the number of events that a student logged before taking their first CP within a session. While robust sessions did not attend to the order of events, for this measure, referred to as events before first CP, we wanted to capture what students did in preparation for an assessment. Schwartz and Arena (2013) point to the importance of attending to what students do prior to an assessment as indicative of broader patterns of development in relation to becoming independent learners. The final behaviour that we illustrate in this paper captures how students end an OLI session. For this measure, referred to as no attempt after low score, we focused on when students scored at or below 60% on a CP and whether or not they engaged in any activity following a low score. Unlike CP-only sessions, robust sessions, and the number of events before taking a CP (all of which could be argued to be productive or less productive learning strategies), ending a session on a low score directly addressed the persistence element of productive persistence.
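The remaining session-level behaviours can be sketched in the same style. The function below assumes each session's events arrive time-ordered with an 'event_type' column and a 'score' column holding CP scores on a 0–1 scale (NaN for non-CP events); these names are illustrative rather than OLI's actual fields.

```python
import pandas as pd

def session_behaviours(session: pd.DataFrame) -> dict:
    """Derive robust-session, events-before-first-CP, and ended-on-low-score
    flags for one session's time-ordered events."""
    types = session["event_type"].tolist()

    # Robust session: read (PV), practiced (LBD or DIGT), and assessed (CP),
    # in any order, within the same session.
    robust = "PV" in types and ("LBD" in types or "DIGT" in types) and "CP" in types

    # Preparation: the number of events logged before the session's first CP.
    events_before_first_cp = types.index("CP") if "CP" in types else len(types)

    # Persistence: did the session end on a CP scored at or below 60%,
    # with no further activity afterward?
    last = session.iloc[-1]
    ended_on_low_score = last["event_type"] == "CP" and last["score"] <= 0.60

    return {"robust": robust,
            "events_before_first_cp": events_before_first_cp,
            "ended_on_low_score": ended_on_low_score}
```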


We assessed the predictive validity of the above behaviours in the following forms: 1) the percentage of CP-only sessions logged across all sessions (PCPONLY), 2) the percentage of robust sessions, i.e., sessions in which students read, practiced, and took a CP, across all sessions (PROBUST), 3) the average number of events, across all sessions, engaged in prior to taking the first CP within a session (AVGBFCP), and 4) the percentage of times a student ended a session after scoring at or below 60% on a CP (PNOATT). Table 1 provides descriptive statistics for the two assessments, the number of overall sessions logged by students, and the behavioural measures. The four behavioural measures are reported as continuous variables in Table 1. For both modelling and interpretability reasons, we created categorical versions of these behavioural measures in our final predictive validity models.

Table 1: Descriptive statistics of assessments, number of sessions, and behaviours.

                                                      Mean     SD      Min     Max
Pre-Assessment (PRE)                                  -.074    1.19    -3.02   3.78
Summative Assessment (SUM)                            .72      .78     -1.32   3.65
Number of Sessions Logged (SESS)                      19.14    14.55   0       160
% No Attempt After Low Score Sessions (PNOATT)        14.59    22.58   0       100
% Read, Practice, and Assess Sessions (PROBUST)       19.09    15.91   0       100
% CP Only Sessions (PCPONLY)                          25.18    28.10   0       100
Average Number of Events Before First CP (AVGBFCP)    6.90     5.91    0       48
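Rolling the session-level flags up to the student-level measures reported in Table 1 is then a grouped average. The sketch below assumes a 'sessions' table with one row per session holding the outputs of the functions above; the column and variable names are illustrative.

```python
import pandas as pd

def student_measures(sessions: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-session flags into the student-level measures of Table 1."""
    grouped = sessions.groupby("student_id")
    return pd.DataFrame({
        "PCPONLY": grouped["cp_only"].mean() * 100,           # % CP-only sessions
        "PROBUST": grouped["robust"].mean() * 100,            # % read/practice/assess sessions
        "AVGBFCP": grouped["events_before_first_cp"].mean(),  # avg. events before first CP
        "PNOATT": grouped["ended_on_low_score"].mean() * 100, # % sessions ended on a low score
    })
```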

5.2 Assessing Predictive Validity of Behavioural Measures

The previous section described how we used ECD to align research-based behaviours and strategies to data generated by student use of OLI. After aligning student, task, and evidence models through design patterns, we identified whether behaviours predicted student performance in Statway I. Critical to helping instructors promote productive persistence is the degree to which behavioural measures attached to specific interventions are predictive, interpretable, and can support instructors in working with students in a more timely and targeted way.

To help in creating more interpretable measures, we created categorical versions of each variable that compared students who did not engage in the behaviour with students who did engage in the behaviour to some degree and those who did so to a much higher degree. Categorical versions of variables also helped in overcoming more general interpretation issues in that the PNOATT, PROBUST, and PCPONLY behaviours had multiple students who did not evidence the behaviour. For PNOATT, 730 students who scored 60% or lower ended a session by engaging in another activity. However, 644 students ended a session, at least once, on that low score. Given the large proportion of students who did not evidence the behaviour, we created separate categories, where appropriate, for students who didn't engage in the behaviour in order to isolate the costs and benefits of engaging in these behaviours in the first place and to varying degrees.

For PNOATT, we created four categories. The first category, "No Low Score," was unique to this behaviour because it was premised on students experiencing a low score in the first place. Therefore, we treated students who did not score at or below 60% on a CP (n=92) as a separate group. Students who did score at or below 60% but did not end a session on a low score were coded as "None" (n=730). Those who did end a session on a low score were placed into two groups, "Below Median" and "Above Median," based on whether their percentage of such sessions was above or below the median for non-0% scores. PROBUST and PCPONLY did not have similar issues around multiple interpretations for 0% scores; therefore, those who did not engage in the behaviour were coded as "None," and those who did engage in the focal behaviour were likewise placed into two groups based on whether they were above or below the median for non-0% scores. AVGBFCP did not have the same proportion of students who evidenced 0, so this behaviour was broken out into tertiles and coded as "Low," "Medium," or "High." Table 2 provides the degree to which students engaged in a focal behaviour for each re-coded category.
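The recoding just described can be sketched as follows: zeros form their own category, the median is computed over non-zero values only, and AVGBFCP is cut into tertiles. The function names and the tie-breaking at the median are illustrative assumptions; for PNOATT, students who never scored at or below 60% on a CP would first be split off into the separate "No Low Score" group.

```python
import numpy as np
import pandas as pd

def recode_percentage(values: pd.Series) -> pd.Series:
    """Recode a percentage behaviour into None / Below Median / Above Median,
    with the median computed over non-zero values only (ties coded as below)."""
    median = values[values > 0].median()
    coded = np.select(
        [values == 0, values <= median],  # conditions are checked in order
        ["None", "Below Median"],
        default="Above Median")
    return pd.Series(coded, index=values.index)

def recode_tertiles(values: pd.Series) -> pd.Series:
    """Recode AVGBFCP into Low / Medium / High tertiles."""
    return pd.qcut(values, q=3, labels=["Low", "Medium", "High"])
```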

For students who experienced a low score on a CP, the median percentage of sessions was 25%, and students below this median (n=321) averaged 16.69% of their sessions ending on a low score. The group of students above the median averaged more than three times that percentage of sessions per individual (Mean %=50.57, n=323). Some 320 students did not have a robust session and were coded as "None." The median percentage of non-0% robust sessions was 23.26%; students below this median averaged 14.60% robust sessions (n=578) and students above this median averaged 35.12% robust sessions (n=568). In a similar way, the percentages of students' overall sessions comprised only of CP (PCPONLY) events were as follows: "None" (n=358), "Below Median" (Mean %=11.33, n=560), and "Above Median" (Mean %=56.74, n=548). Lastly, the average number of events for the "Low" category of AVGBFCP was 1.38 (n=483), the "Medium" group (n=495) averaged 5.55, and the "High" group (n=488) averaged 14.03 events.

Table 2: Descriptive statistics for re-coded behavioural categories.

                                                      Mean     SD      Min     Max     N
% No Attempt After Low Score Sessions (PNOATT)
  No Low Score                                        0        0       0       0       92
  None                                                0        0       0       0       730
  Below Median                                        16.69    5.36    6.25    25      321
  Above Median                                        50.57    21.57   25      100     323
% Read, Practice, and Assess Sessions (PROBUST)
  None                                                0        0       0       0       320
  Below Median                                        14.60    5.30    2.86    23.26   578
  Above Median                                        35.12    11.72   23.26   100     568
% CP Only Sessions (PCPONLY)
  None                                                0        0       0       0       358
  Below Median                                        11.33    6.98    0.81    26.32   560
  Above Median                                        56.74    21.49   26.32   100     548
Average Number of Events Before First CP (AVGBFCP)
  Low                                                 1.38     0.61    0       2.69    483
  Medium                                              5.55     1.88    2.7     9.24    495
  High                                                14.03    4.19    9.25    48      488


Our modelling strategy for assessing predictive validity involved using the above categorical measures of students' enactment of behaviours as predictors of their summative test performance (SUM). Along with the behavioural measures, we included two other covariates: the Z-scored number of sessions that the student logged during the semester (i.e., SESSZ) and student performance on a pre-assessment of mathematical conceptual knowledge (i.e., PRE). By modelling both sessions and students' incoming conceptual knowledge, we controlled for two factors that could contribute to student performance on the summative assessment and that were also valuable measures for instructors.

We first fit a null model with SUM as the dependent variable and no other covariates using a 2-level hierarchical linear model with students nested within instructors (HLM; Raudenbush & Bryk, 2002).2 The null model assessed the degree to which student performance on SUM varied within and between instructors (Table 3). The model-based estimate for the grand mean of SUM (γ00) was .74, which is nearly identical to the naïve average reported in Table 1. The between-instructor variation around the grand mean (u0) was .28. Therefore, instructors 2 standard deviations above or below the mean, on average, scored .56 logits higher or lower on SUM. At the student level (r), a student who scored 2 standard deviations above or below the mean scored 1.44 logits higher or lower. The intra-class correlation (ICC) for the null model (i.e., the degree to which students with the same instructor resembled one another) was 12.9%, signalling that most of the variation in SUM scores was attributable to student-level as opposed to instructor-level differences.3

Table 3: Null model.

Fixed Effect       Coefficient   Std. Err.   t-ratio   App. d.f.   p-value
SUM, γ00           .741          .045        16.629    52          <0.001

Random Effect      SD            Variance    d.f.      χ2          p-value
INTRCPT1, u0       .280          .078        52        224.760     <0.001
level-1, r         .724          .524

Note: 1,158 students and 53 instructors.

Table 4 presents the inclusion of baseline covariates, PRE and SESSZ. The following 2-level HLM was fit:

SUMij = γ00 + γ10*PREij + γ20*SESSZij + u0j + rij

SUMij is the summative assessment score for student i with instructor j. All predictors were grand-mean centred; thus, γ00 is the intercept for students with an average pre-assessment (PRE) score and average number of Z-scored sessions (SESSZ). Both PRE and SESSZ were significant predictors. For example, a 1-unit change (i.e., logits) on PRE (γ10) corresponded to a .307 unit change in SUM. Similarly, a 1-standard-deviation change in sessions logged on OLI (SESSZ, γ20) corresponded to a .075 unit change in SUM.

2 All models were fit using HLM 7.
3 ICC = u0² / (u0² + r²)
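For readers who want to approximate these models outside HLM 7, the sketch below fits the null and baseline 2-level models with statsmodels' MixedLM and computes the ICC from footnote 3. MixedLM is a stand-in for the authors' software, so estimates and standard errors may differ slightly; the DataFrame 'df' and its column names are assumptions.

```python
import statsmodels.formula.api as smf

# df is assumed to hold one row per student with columns
# 'SUM', 'PRE', 'SESSZ', and 'instructor'.
def fit_null_and_baseline(df):
    """Fit 2-level models (students nested in instructors) and compute the ICC."""
    null = smf.mixedlm("SUM ~ 1", df, groups=df["instructor"]).fit()
    tau2 = null.cov_re.iloc[0, 0]  # between-instructor variance (u0 squared)
    sigma2 = null.scale            # student-level residual variance (r squared)
    icc = tau2 / (tau2 + sigma2)   # footnote 3: ICC = u0^2 / (u0^2 + r^2)

    # Baseline model with grand-mean-centred covariates, mirroring Table 4.
    centred = df.assign(PRE_c=df["PRE"] - df["PRE"].mean(),
                        SESSZ_c=df["SESSZ"] - df["SESSZ"].mean())
    baseline = smf.mixedlm("SUM ~ PRE_c + SESSZ_c", centred,
                           groups=centred["instructor"]).fit()
    return null, baseline, icc
```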


Table 4: Baseline covariates.

Fixed Effect       Coefficient   Std. Err.   t-ratio   App. d.f.   p-value
INTERCEPT, γ00     .694          .039        17.894    52          <0.001
PRE, γ10           .307          .019        16.447    985         <0.001
SESSZ, γ20         .075          .026        2.887     985         0.004

Random Effect      SD            Variance    d.f.      χ2          p-value
INTRCPT1, u0       .240          .058        52        193.965     <0.001
level-1, r         .646          .417

Note: 1,040 students and 53 instructors; robust standard errors reported.

The baseline covariates included in Table 4 were important for understanding the predictive validity of each behavioural measure, as we were interested in the degree to which each behaviour provided new insights into students' summative test performances. Including PRE and SESSZ, we fit 2-level HLMs for each behavioural measure. Final models for each behavioural measure were specified in the following ways using mixed-effects notation:

Model 1: SUMij = γ00 + γ10*PREij + γ20*SESSZij + γ30*PNOATT0ij + γ40*PNOATT2ij + γ50*PNOATT3ij + u0j + rij
Model 2: SUMij = γ00 + γ10*PREij + γ20*SESSZij + γ30*PROBUST1ij + γ40*PROBUST2ij + u0j + rij
Model 3: SUMij = γ00 + γ10*PREij + γ20*SESSZij + γ30*PCPONLY1ij + γ40*PCPONLY2ij + u0j + rij
Model 4: SUMij = γ00 + γ10*PREij + γ20*SESSZij + γ30*AVGBFCP1ij + γ40*AVGBFCP2ij + u0j + rij
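As a sketch of how one of these specifications might be expressed with the same tooling, the code below builds Model 1's PNOATT indicators against the 'None' reference category and grand-mean centres them, matching the centring strategy described after Table 5; 'df', 'centred', and the category labels follow the recoded variables above and are assumptions.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Dummy-code PNOATT against its 'None' reference category, then grand-mean
# centre the indicators so the intercept is comparable across models.
dummies = (pd.get_dummies(df["PNOATT_cat"])
             .drop(columns=["None"])
             .astype(float))
dummies = dummies - dummies.mean()

data = pd.concat([df[["SUM", "instructor"]], centred[["PRE_c", "SESSZ_c"]],
                  dummies], axis=1)
model1 = smf.mixedlm(
    "SUM ~ PRE_c + SESSZ_c + Q('No Low Score') + Q('Below Median') + Q('Above Median')",
    data, groups=data["instructor"]).fit()
```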

Table 5: Behavioural measures with baseline covariates.

Fixed Effect                   Model 1           Model 2          Model 3          Model 4
INTERCEPT, γ00                 .704*** (.038)    .682*** (.039)   .697*** (.040)   .689*** (.039)
PRE, γ10                       .286*** (.018)    .303*** (.018)   .304*** (.019)   .305*** (.019)
SESSZ, γ20                     .087*** (.027)    .071* (.029)     .038 (.029)      .047 (.028)
PNOATT0 "No Low Score," γ30    .405*** (.126)
PNOATT2 "Below Med.," γ40      -.187*** (.054)
PNOATT3 "Above Med.," γ50      -.142* (.057)
PROBUST1 "Below Med.," γ30                       .128^ (.070)
PROBUST2 "Above Med.," γ40                       .295*** (.074)
PCPONLY1 "Below Med.," γ30                                        .046 (.053)
PCPONLY2 "Above Med.," γ40                                        -.143* (.070)
AVGBFCP2 "Medium," γ30                                                             .178** (.067)
AVGBFCP3 "High," γ40                                                               .215*** (.061)

Random Effect
INTRCPT1, u0                   .231*** (.053)    .243*** (.059)   .249*** (.062)   .248*** (.061)
level-1, r                     .636 (.405)       .639 (.408)      .642 (.412)      .641 (.411)

Note: ^<.1, *<.05, **<.01, ***<.001; for fixed effects, ( ) = standard error; for random effects, ( ) = variance; robust standard errors used for significance tests.

Based on the above model specifications, γ00 represents the mean of SUMij across the grand mean for all covariates. The variance of γ00, u0j, is therefore interpreted as variance between instructors across the adjusted means. Each γn0 represents the adjusted mean for a given covariate; covariates, such as SESSZij, represent the score for student i with instructor j, and rij represents the student-level residual variance. As noted above, PNOATT was modelled using four as opposed to three distinct categories given the multiple interpretations for 0% values. All other behaviours were modelled as three categories. For all models, "None" (i.e., PNOATT, PROBUST, and PCPONLY) or "Low" (i.e., AVGBFCP) served as the reference category. As with previous models, SUMij represented the summative assessment score for each student i with instructor j, and all covariates, including categorical variables, were grand-mean centred. Therefore, γ00 across all models represented the average SUM score for students with average PRE and SESSZ values, irrespective of behavioural category. We chose this centring strategy in order to compare intercepts across models. Had we chosen not to centre the categorical variables, the intercept would have represented the average SUM score, dependent upon other covariates, for students in the reference category ("None" or "Low" depending upon the behaviour), which would have led to multiple values for the intercept across models, including the null model (Table 3) and the baseline covariate model (Table 4). We explored both group- and grand-mean centring strategies, and a deviance likelihood ratio test indicated that grand-mean centred versions of the model fit the data better than group-mean centred models.
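Because the two centring strategies estimate the same number of parameters, this comparison amounts to comparing deviances (-2 × log-likelihood) from full maximum-likelihood fits. A minimal sketch, assuming the two prepared DataFrames and the formula are as named:

```python
import statsmodels.formula.api as smf

FORMULA = "SUM ~ PRE_c + SESSZ_c + Q('Below Median') + Q('Above Median')"  # e.g., Model 2

# Fit with full maximum likelihood (reml=False) so deviances are comparable.
fit_grand = smf.mixedlm(FORMULA, grand_centred,
                        groups=grand_centred["instructor"]).fit(reml=False)
fit_group = smf.mixedlm(FORMULA, group_centred,
                        groups=group_centred["instructor"]).fit(reml=False)

print(-2 * fit_grand.llf, -2 * fit_group.llf)  # lower deviance = better fit
```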

Table 5 presents the final models for each behavioural measure. Across all models, PRE was a significant predictor with similar orders of magnitude. The number of sessions that a student logged (SESSZ), however, was only significant for Models 1 and 2. For Model 1, all three categories of PNOATT were significant. Students who did not have a low CP score ("No Low Score," γ30=.405***) scored significantly higher on SUM than students who did not end a session on a low score, i.e., the reference category. Those who did end sessions on a low score, on average, did less well on SUM (PNOATT "Below Med.," γ40=-.187***; PNOATT "Above Med.," γ50=-.142*).


For Model 2, students whose percentage of robust sessions was above the median, on average, performed higher on SUM (PROBUST "Above Med.," γ40=.295***). Aside from students not experiencing a low CP score (PNOATT "No Low Score," γ30=.405***), students experiencing a high proportion of robust sessions had the largest effect on SUM, controlling for all other factors. For Model 3, the percentage of CP-only sessions (PCPONLY) was significant for the above-median category (PCPONLY2 "Above Med.," γ40=-.143*). For the same model but with the highest group, "Above Med.," held out as the reference category, "None" and "Below Med." were also found to be different ("None," γ30=.143, p=.041; "Below Med.," γ40=.189, p=.004). Lastly, an increase in the average number of activities a student logged before taking a CP across all sessions was related to increased SUM scores, with students in both groups scoring higher on SUM (AVGBFCP2 "Medium," γ30=.178**; AVGBFCP3 "High," γ40=.215***).

Across the four models described above, the highest categories (i.e., "Above Med." or "High") were significantly related to SUM, controlling for PRE and SESSZ. For PROBUST and PCPONLY, the below-median categories were not significant at the .05 level; however, the directionality of the parameter estimates corresponds to the general observation that, across all behaviours, the more students engaged in the behaviour, the stronger the effect on summative test performance. The statistical models presented above represent one element of validity, namely predictive validity, and the preceding discussion on how the behavioural measures were constructed using ECD and design patterns speaks to their construct validity. Each aspect of the work adds to the evidentiary argument around measuring behavioural elements of productive persistence.

To summarize, the above analyses addressed both productive and persistence behaviours based on data collected from student use of an online learning system across an entire semester. For productive behaviours, we developed a measure that captured the degree to which students availed themselves of multiple types of activities within a given session. Another productive behaviour entailed reading and practicing before taking an assessment for the first time. Both behaviours, when engaged in across multiple sessions, were related to higher performance on the summative assessment. While these two behaviours are potentially productive, we also developed measures of potentially less productive behaviours, such as logging into the online system and only taking assessments, whereby students who consistently engaged in this behaviour did less well than students who did not. To capture aspects of student persistence, we measured the degree to which students ended a session on a score of 60% or lower without engaging in any further activity. Students who were consistently less persistent did less well, on average, on the summative assessment than students who demonstrated persistence on a more consistent basis.

For practical measurement purposes, each behavioural measure described above was predictive of distal outcomes, and each captured easily understandable, composite behaviours that could be tracked over time as well as used to measure formative processes associated with an improvement effort. Importantly, each measure was gathered from data tracking actual student behaviours without interrupting their learning.


6 DISCUSSION AND CONCLUSION

Supporting students' development of 21st century skills requires intentional effort (NRC, 2012). While commonsensical, the ways in which efforts to improve specific skills are organized (e.g., evaluation, research, or improvement) play a significant role in how measurement is positioned in that work (Duckworth & Yeager, 2015). In this paper, we described the ways in which data from digital learning environments can be used to develop practical measures of productive persistence, defined by the Carnegie Foundation for the Advancement of Teaching as tenacity plus the use of good strategies. Productive persistence is a key driver within Carnegie's Community College Pathways (CCP) Network Improvement Community (NIC). The way in which we developed behavioural measures has potential implications for the field of learning analytics in supporting improvement science research. For example, learning analytics researchers potentially have a unique role to play in leveraging previously untapped sources of data from learning technologies, longitudinal data systems, and student information systems to both better understand educational processes and provide just-in-time measures for practitioners.

Two approaches informed the development work described in this paper: improvement science and evidence-centred design (ECD). From an improvement science perspective, we developed measures that were linked to an evidence- and practice-based theory for improvement (see Figure 1), that tracked key processes, and that were predictively valid. We used the principles of ECD and design patterns to align potential behaviours and strategies with tasks and evidence stemming from student engagement with an online learning system. The overarching aim for these measures was to help instructors participating in the CCP NIC track student behaviours, and we are currently engaged in co-development activities around instructor-facing data visualizations and interventions tied to co-developed visualizations.

Throughout this paper, predictive validity was held out as an important component of practical measurement because predictive measures of valued outcomes can increase practitioner motivation to use data in support of their improvement work (Provost & Murray, 2011; Solberg et al., 1997). Along with predictive validity, theoretically grounded measures can also support practitioner perceptions of value in using a measure over time. For example, based on the success of self-report productive persistence measures used in CCP NIC courses (see Yeager, Bryk, et al., 2013), predictive validity and a robust research base, combined, promoted initial as well as sustained use of the measures by instructors and researchers.

While predictive validity was established using a summative test of mathematical knowledge, it was not the only outcome explored, nor the only outcome valued by researchers or instructors. In order to be valuable for improvement purposes, each measure had to predict outcomes associated with the CCP NIC's overall improvement goal of increasing the number of students who complete developmental math courses. Therefore, we explored outcomes such as grades and course completion. Establishing the link between behavioural measures and an outcome is crucial in that there are many opportunities to measure student use of the online system and only one opportunity to measure how well students did in a course. By developing predictive measures of distal outcomes, change ideas attempted by instructors as part of the overall NIC can be assessed using these measures while there is still time to learn and improve.
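As an illustration, linking the same behavioural measures to a distal, binary outcome such as course completion could take the form of a logistic regression. The completed_course column, file name, and variable names below are assumptions for illustration, not the analysis actually reported.

```python
# A minimal sketch, assuming a student-level data frame like the one in the
# earlier regression sketch plus a binary completed_course column
# (1 = completed the developmental math course). Names are illustrative.
import pandas as pd
import statsmodels.formula.api as smf

students = pd.read_csv("student_semester_measures.csv")  # hypothetical file

logit = smf.logit(
    "completed_course ~ C(PROBUST) + PRE + SESSZ", data=students
).fit()
print(logit.summary())  # log-odds estimates for each behaviour level
```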

To develop practical measures of productive persistence, we leveraged a historical dataset to engineer features in relation to known outcomes. Along with known outcomes, the historical dataset included a pre-assessment of students' incoming conceptual mathematical knowledge. During early descriptive and statistical analyses, we observed relationships between student scores on the pre-assessment and the number of reading and practice activities in which they engaged, whereby students with higher incoming content knowledge engaged with the system to higher degrees and in different ways. The relationships between domain knowledge and use of the system not only highlight the utility of including the pre-assessment in our predictive models (see Tables 3 and 4), they also point to potentially important future research directions on the relationships between content knowledge and 21st century skills.
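A descriptive check like the one sketched below can surface this kind of relationship before any modelling. The reading_count and practice_count columns are assumed names for the activity counts, not the dataset's actual fields.

```python
# Sketch of a quick descriptive look at how incoming knowledge relates to
# system use; column and file names are illustrative assumptions.
import pandas as pd

students = pd.read_csv("student_semester_measures.csv")  # hypothetical file

# Correlations between pre-assessment score and activity counts.
print(students[["PRE", "reading_count", "practice_count"]].corr())

# Mean activity counts by pre-assessment quartile.
students["pre_quartile"] = pd.qcut(students["PRE"], 4, labels=False)
print(
    students.groupby("pre_quartile")[["reading_count", "practice_count"]].mean()
)
```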

There are important limitations to the measures described in this paper. First, the measures were developed using historical data. Because the data are historical, important factors affecting both student use of the online system and their overall performance in a course could be missing. Second, the four behavioural measures are specific to the improvement work of the CCP NIC and are not intended to support theory building or generalize beyond the CCP NIC. Future work, however, is directed at surfacing new behavioural measures and validating the ones outlined in this paper using data from new students. While these measures, in their current form, are limited in their generalizability, they can serve as example behaviours for other improvement and research teams to use, and the process by which the current measures were developed and validated could also serve as a ready approach for future teams.

Another important limitation of the measures described in this paper is that, even as a whole, they only measure certain elements of productive persistence, namely, student use of the online learning system. Productive persistence entails multiple beliefs about oneself as a learner and as a member of a learning community along with how one engages in learning activities (see Yeager, Bryk, et al., 2013). Multiple learning behaviours were not measured, as they were not captured by the online learning system. Therefore, the data reported in this paper represent potential elements of an overall measurement system that can be used to develop and test strategies that instructors could enact in their courses to promote students' productive persistence.

As noted above, the types of behaviours identified and then validated were based on data already collected and stored by the online system. We used evidence-centred design (ECD) as an overarching approach along with relevant prior literature in making sense of how students could use the online learning system in relation to evidence collected by the system. In this way, ECD was used to reverse-engineer potential behaviours from what was available. However, ECD is more commonly used as a prospective approach, whereby designers engineer specific tasks to generate specific evidence in support of making claims related to specific constructs.


One benefit of practical measures stemming from student use of an online learning system is that data can be collected frequently and without interrupting teaching and learning activities. While there are benefits to using online learning system data, practical measures are not limited to certain data collection approaches; they are recognizable in terms of their utility in helping frontline workers learn and improve. A fundamental tenet of improvement science is that variation in outcomes is the key problem to solve (Bryk et al., 2015; Langley et al., 2009). Data from currently untapped sources hold great potential for both the fields of improvement science and learning analytics in measuring and positively intervening in these sources of variation.

ACKNOWLEDGEMENTS

This paper is based upon work supported in part by the National Science Foundation (NSF) (SMA-1338487) and SRI International. Any findings or opinions expressed in this paper are those of the authors and do not necessarily reflect the views of NSF or SRI.

REFERENCES

Bailey, T., Jeong, D. W., & Cho, S. (2010). Referral, enrollment, and completion in developmental education sequences in community colleges. Economics of Education Review, 29(2), 255–270. http://dx.doi.org/10.1016/j.econedurev.2009.09.002

Bennett, B., & Provost, L. (2015, July). What's your theory? Driver diagram serves as tool for building and testing theories for improvement. Quality Progress, 36–43.

Blackwell, L., Trzesniewski, K., & Dweck, C. S. (2007). Implicit theories of intelligence predict achievement across an adolescent transition: A longitudinal study and an intervention. Child Development, 78, 246–263. http://dx.doi.org/10.1111/j.1467-8624.2007.00995.x

Bransford, J. D., & Schwartz, D. L. (1999). Rethinking transfer: A simple proposal with multiple implications. Review of Research in Education, 24(1), 61–100.

Bryk, A. S., Gomez, L. M., Grunow, A., & LeMahieu, P. G. (2015). Learning to improve: How America's schools can get better at getting better. Cambridge, MA: Harvard Education Press.

CCCSE (Center for Community College Student Engagement). (2010). The heart of student success: Teaching, learning, and college completion (2010 CCCSE Findings). Austin, TX: The University of Texas at Austin, Community College Leadership Program.

Coburn, C. E., & Turner, E. O. (2011). Research on data use: A framework and analysis. Measurement, 9, 173–206. http://dx.doi.org/10.1080/15366367.2011.626729

Coburn, C. E., & Turner, E. O. (2012). The practice of data use: An introduction. American Journal of Education, 118(2), 99–111. http://dx.doi.org/10.1086/663272

Cook, J. E., Purdie-Vaughns, V., Garcia, J., & Cohen, G. L. (2012). Chronic threat and contingent belonging: Protective benefits of values affirmation on identity development. Journal of Personality and Social Psychology, 102, 479–496. http://dx.doi.org/10.1037/a0026312


DiCerbo, K. E. (2015). Assessment of task persistence. In Y. Rosen, S. Ferrara, & M. Mosharraf (Eds.), Handbook of research on computational tools for real-world skill development (pp. 780–806). Hershey, PA: IGI Global.

Duckworth, A. L., & Yeager, D. S. (2015). Measurement matters: Assessing personal qualities other than cognitive ability for educational purposes. Educational Researcher, 44(4), 237–251. http://dx.doi.org/10.3102/0013189X15584327

Dweck, C., Walton, G., & Cohen, G. (2014). Academic tenacity: Mindsets and skills that promote long-term learning (Research report). Seattle, WA: Bill & Melinda Gates Foundation.

Farrington, C. A., Roderick, M., Allensworth, E., Nagaoka, J., Keyes, T. S., Johnson, D. W., & Beechum, N. O. (2012). Teaching adolescents to become learners. The role of non-cognitive factors in shaping school performance: A critical literature review. Chicago, IL: University of Chicago Consortium on Chicago School Research.

Gardenshire-Crooks, A., Collado, H., Martin, K., & Castro, A. (2010). Terms of engagement: Men of color discuss their experiences in community college. New York/Oakland, CA: MDRC.

Hadwin, A. F., Nesbit, J. C., Jamieson-Noel, D., Code, J., & Winne, P. H. (2007). Examining trace data to explore self-regulated learning. Metacognition and Learning, 2(2–3), 107–124. http://dx.doi.org/10.1007/s11409-007-9016-7

Hulleman, C. S., & Harackiewicz, J. M. (2009). Promoting interest and performance in high school science classes. Science, 326, 1410–1412. http://dx.doi.org/10.1126/science.1177067

Jamieson, J. P., Mendes, W. B., Blackstock, E., & Schmader, T. (2010). Turning the knots in your stomach into bows: Reappraising arousal improves performance on the GRE. Journal of Experimental Social Psychology, 46, 208–212. http://dx.doi.org/10.1016/j.jesp.2009.08.015

Kosovich, J. J., Hulleman, C. S., Barron, K. E., & Getty, S. (2014). A practical measure of student motivation: Establishing validity evidence for the expectancy-value-cost scale in middle school. The Journal of Early Adolescence, 35(5–6), 790–816. http://dx.doi.org/10.1177/0272431614556890

Langley, G. J., Moen, R. D., Nolan, K. M., Nolan, T. W., Norman, C. L., & Provost, L. P. (2009). The improvement guide. San Francisco, CA: Jossey-Bass.

Lewis, C. (2015). What is improvement science? Do we need it in education? Educational Researcher, 44(1), 54–61. http://dx.doi.org/10.3102/0013189X15570388

Martin, L. A., Nelson, E. C., Lloyd, R. C., & Nolan, T. W. (2007). Whole system measures (IHI Innovation Series white paper). Cambridge, MA: Institute for Healthcare Improvement.

Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13–23. http://dx.doi.org/10.3102/0013189X023002013

Mislevy, R., & Haertel, G. (2006). Implications of evidence-centered design for educational testing (Draft PADI Technical Report 17). Menlo Park, CA: SRI International.

Mislevy, R. J., Behrens, J. T., DiCerbo, K. E., & Levy, R. (2012). Design and discover in educational assessment: Evidence-centered design, psychometrics, and educational data mining. Journal of Educational Data Mining, 4(1), 11–48. Retrieved from http://educationaldatamining.org/JEDM13/index.php/JEDM/article/view/22

Nelson, B., Nugent, R., & Rupp, A. A. (2012). On instructional utility, statistical methodology, and the added value of ECD: Lessons learned from the special issue. Journal of Educational Data Mining, 4(1), 224–230. Retrieved from http://www.educationaldatamining.org/JEDM/index.php/JEDM/article/view/27

NRC (National Research Council). (2012). Education for life and work: Developing transferable knowledge and skills in the 21st century (J. W. Pellegrino & M. L. Hilton, Eds.). Washington, DC: Committee on Defining Deeper Learning and 21st Century Skills, Board on Testing and Assessment and Board on Science Education, Division of Behavioral and Social Sciences and Education, The National Academies Press.

Penuel, W. R., Fishman, B. J., Cheng, B. H., & Sabelli, N. (2011). Organizing research and development at the intersection of learning, implementation, and design. Educational Researcher, 40, 331–337. http://dx.doi.org/10.3102/0013189X11421826

Provost, L. P., & Murray, S. K. (2011). The health care data guide: Learning from data for improvement. San Francisco, CA: Jossey-Bass.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models in social and behavioral research: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage Publications.

Rupp, A. A., Levy, R., DiCerbo, K. E., Sweet, S. J., Crawford, A. V., Calico, T., Benson, M., Fay, D., Kunze, K. L., Mislevy, R. J., & Behrens, J. T. (2012). Putting ECD into practice: The interplay of theory and data in evidence models within a digital learning environment. Journal of Educational Data Mining, 4(1), 49–110. Retrieved from http://www.educationaldatamining.org/JEDM/index.php/JEDM/article/view/23

Russell, J., Jackson, K., Krumm, A., & Frank, K. (2013). Theories and research methodologies for design-based implementation research: Examples from four cases. In B. J. Fishman, W. R. Penuel, A. R. Allen, & B. H. Cheng (Eds.), Design based implementation research: Theories, methods, and exemplars. National Society for the Study of Education Yearbook, 112(2), 157–191. New York: Teachers College Press.

Schraw, G. (2010). Measuring self-regulation in computer-based environments. Educational Psychologist, 45(4), 258–266. http://dx.doi.org/10.1080/00461520.2010.515936

Schwartz, D. L., & Arena, D. (2013). Measuring what matters most: Choice-based assessments for the digital age. Cambridge, MA: MIT Press.

Schwartz, D. L., Bransford, J. D., & Sears, D. (2005). Efficiency and innovation in transfer. In J. P. Mestre (Ed.), Transfer of learning from a modern multidisciplinary perspective (pp. 1–51). Greenwich, CT: Information Age.

Solberg, L., Mosser, G., & McDonald, S. (1997). The three faces of performance measurement: Improvement, accountability, and research. The Joint Commission Journal on Quality Improvement, 23(3), 135–147.


Spillane, J. P., Parise, L. M., & Sherer, J. Z. (2011). Organizational routines as coupling mechanisms: Policy, school administration, and the technical core. American Educational Research Journal, 48(3), 586–619. http://dx.doi.org/10.3102/0002831210385102

Stecher, B. M., & Hamilton, L. S. (2014). Measuring hard-to-measure student competencies: A research and development plan. Santa Monica, CA: RAND Corporation.

Stigler, J. W., & Hiebert, J. (1999). The teaching gap. New York: Free Press.

Strother, S., & Sowers, N. (2014). Community college pathways: Summative assessments and student learning. Stanford, CA: Carnegie Foundation for the Advancement of Teaching. Retrieved from http://www.carnegiefoundation.org/resources/publications/community-college-pathways-summative-assessments-student-learning/

Strother, S., Van Campen, J., & Grunow, A. (2013). Community college pathways: 2011–2012 descriptive report. Stanford, CA: Carnegie Foundation for the Advancement of Teaching. Retrieved from http://www.carnegiefoundation.org/sites/default/files/CCP_Descriptive_Report_Year_1.pdf

U.S. Department of Education. (2012). Enhancing teaching and learning through educational data mining and learning analytics. Washington, DC: Author. Retrieved from http://www.ed.gov/edblogs/technology/files/2012/03/edm-la-brief.pdf

U.S. Department of Education. (2013). Expanding evidence approaches for learning in a digital world. Washington, DC: Author. Retrieved from http://www.ed.gov/edblogs/technology/files/2013/02/Expanding-Evidence-Approaches.pdf

Van Campen, J., Sowers, N., & Strother, S. (2013). Community college pathways: 2012–2013 descriptive report. Stanford, CA: Carnegie Foundation for the Advancement of Teaching. Retrieved from http://www.carnegiefoundation.org/resources/publications/community-college-pathways-2012-2013-descriptive-report/

Vaquero, L. M., & Cebrian, M. (2013). The "rich club" phenomenon in the classroom. Nature (Scientific Reports), 3, 1174. http://dx.doi.org/10.1038/srep01174

Walton, G. M., & Cohen, G. L. (2007). A question of belonging: Race, social fit, and achievement. Journal of Personality and Social Psychology, 92, 82–96. http://dx.doi.org/10.1037/0022-3514.92.1.82

Yeager, D. S., & Dweck, C. S. (2012). Mindsets that promote resilience: When students believe that personal characteristics can be developed. Educational Psychologist, 47, 1–13. http://dx.doi.org/10.1080/00461520.2012.722805

Yeager, D. S., Bryk, A., Muhich, J., Hausman, H., & Morales, L. (2013). Practical measurement. Stanford, CA: Carnegie Foundation for the Advancement of Teaching.

Yeager, D. S., Henderson, M., Paunesku, D., Walton, G., Spitzer, B., D'Mello, S., & Duckworth, A. L. (2014). Boring but important: A self-transcendent purpose for learning fosters academic self-regulation. Journal of Personality and Social Psychology, 107(4), 559–580. http://dx.doi.org/10.1037/a0037637

Yeager, D. S., Paunesku, D., Walton, G., & Dweck, C. S. (2013). How can we instill productive mindsets at scale? A review of the evidence and an initial R&D agenda. A white paper prepared for the White House meeting on "Excellence in Education: The Importance of Academic Mindsets."


Zimmerman, B. J. (2002). Becoming a self-regulated learner: An overview. Theory into Practice, 41, 64–70. http://dx.doi.org/10.1207/s15430421tip4102_2

Zimmerman, B. J., Moylan, A., Hudesman, J., White, N., & Flugman, B. (2011). Enhancing self-reflection and mathematics achievement of at-risk urban technical college students. Psychological Test and Assessment Modeling, 53, 141–160.