
Validity Evidence of an Electronic Portfolio for Preservice Teachers

Yuankun Yao, Matt Thomas, Nicole Nickens, Joyce Anderson Downing, Ruth S. Burkett, and Sharon Lamson, University of Central Missouri

This study applied Messick's unified, multifaceted concept of construct validity to an electronic portfolio system used in a teacher education program. The subjects included 128 preservice teachers who recently completed their final portfolio reviews and student teaching experiences. Four of Messick's six facets of validity were investigated for the portfolio in this study, along with a discussion of the remaining facets examined in two previous studies. The evidence provided support for the substantive and generalizability aspects of validity, and limited support for the content, structural, external, and consequential aspects of validity. It was suggested that the electronic portfolio may be used as one requirement for certification purposes, but may not be valid for the purpose of assessing teacher competencies.

Keywords: electronic portfolio, validity, teacher education

Introduction

Over the past decade, the quality of public education has come under increased scrutiny. Some believe that America's teachers lack the content knowledge and pedagogical skills needed to help P-12 students achieve in school (Levine, 2006). Many university-based teacher education programs have responded to demands for accountability by instituting relatively new assessment systems (Bredo, 2005; Ledoux & McHenry, 2006; Ma & Rada, 2005). The assessments were designed to measure such program outcomes as the competencies of the preservice teacher, instead of the inputs of a program such as the number of student credit hours (Vaughn & Everhart, 2005). The shift from the assessment of program inputs to that of outputs, however, is a relatively recent phenomenon (AASCU, 2007). Although national professional organizations have established content-specific teacher competency standards, there is little consensus about the best way to measure the preservice teacher's essential knowledge, skills, and dispositions that will result in P-12 classroom success. Many teacher education programs have developed or adopted strategies for assessing the competencies of their preservice teachers, often without examining the reliability or validity of those measures (Burns & Haight, 2005).

The portfolio is one such measure widely used in teacher education programs in the United States. A portfolio is a systematic and purposeful collection of work samples that document student achievement or progress over a period of time. A recent, specialized form of the portfolio, the electronic portfolio, has become particularly popular (Herner, Karayan, McKean, & Love, 2003; Norton-Meier, 2003; Strudler & Wetzel, 2005). An electronic portfolio is defined as “a digital container capable of storing visual and auditory content including text, images, video, and sound” (Abrami & Barrett, 2005, p. 2). It contains the same contents as the regular paper-based portfolio, except that “the information is collected, stored, and managed electronically” (Lambert, Depaepe, Lambert, & Anderson, 2007, p. 76). According to Wetzel and Strudler (2006), the electronic portfolio provides the preservice teacher with better access to and organization of portfolio documents, and an opportunity to enhance the student's technology skills, develop reflective skills, and understand the standards. It was considered a seamless way for the teacher candidate to develop, demonstrate, and reflect on their own pedagogical practice, knowledge, skills, and attitudes (Sherry & Bartlett, 2005). It was also believed to allow the preservice teacher to demonstrate more complex learning outcomes (Woodward & Nanlohy, 2004).

Yuankun Yao is an Assistant Professor, Department of Curriculum and Instruction, 3300 Lovinger, University of Central Missouri, Warrensburg, MO 64093; [email protected]. Matt Thomas is an Associate Professor, Department of Curriculum and Instruction, 3300 Lovinger, University of Central Missouri, Warrensburg, MO 64093. Nicole Nickens is an Assistant Professor, Department of Curriculum and Instruction, 3300 Lovinger, University of Central Missouri, Warrensburg, MO 64093. Joyce Anderson Downing is an Associate Professor and Associate Dean, College of Education, Lovinger 2190, University of Central Missouri, Warrensburg, MO 64093. Ruth S. Burkett is an Assistant Professor, Department of Curriculum and Instruction, 3300 Lovinger, University of Central Missouri, Warrensburg, MO 64093. Sharon Lamson is a Professor and Chair, Department of Curriculum and Instruction, 3300 Lovinger, University of Central Missouri, Warrensburg, MO 64093.

In a recent themed issue on electronic portfolios, the editor of the Journal of Adolescent & Adult Literacy labeled the electronic portfolio as an “emerging genre” (Goodson, 2007) and made the following prediction:

The position electronic portfolios will have in the future assessment landscape remains to be seen; however, it is difficult to imagine that their use will not continue to increase . . . it is likely teacher education institutions will continue the ongoing transition to electronic portfolios. . . As a generation of teachers completes electronic portfolios at each stage of their careers, it is reasonable to assume they will demand similar systems for the assessment of their students (pp. 433–434).

The use of the electronic portfolio in the field of teacher education is not without its issues, however. Wetzel and Strudler (2006), for instance, found that preservice teachers using the electronic portfolio faced challenges regarding access to and reliability of technology, and the amount of time and effort involved in the process. Challenges have also been identified regarding the design of the portfolio. According to Beck, Livne, and Bear (2005), electronic portfolios designed primarily for accountability purposes were not as effective as electronic portfolios for formative purposes in facilitating the professional outcomes of the preservice teacher. To ensure the success of the electronic portfolio, it was important for it to have clearly defined purposes.

Portfolio Purposes

A variety of purposes have been associated with the use of portfolios in teacher education, including: (a) documenting a teacher candidate's competencies; (b) providing opportunities for reflection; (c) documenting program effectiveness; (d) developing student understanding of relevant standards; and (e) preparing the preservice teacher for job interviews (Darling-Hammond, 2006; Delandshere & Arens, 2003; Zeichner & Wray, 2001). While these purposes may have been articulated, they are not always agreed upon by all parties (Delandshere & Arens).

In some teacher education programs, the portfolio was designed in such a way that it could serve multiple purposes. For example, the portfolio implemented at Washington College (Donnelly, 2005) was introduced as a working portfolio early in the program and involved ongoing reflection and self-assessment at various checkpoints. During the student teaching semester, the portfolio was restructured as a showcase portfolio to demonstrate the preservice teacher's readiness to enter the profession.

The use of the portfolio as a basis for certifying entry-level teachers was a response to the need for standards-based assessment in teacher education (Pecheone & Stansbury, 1996; Quatroche, Duarte, Huffman-Joley, & Watkins, 2002). This approach was modeled after the certification portfolio of the National Board for Professional Teaching Standards (NBPTS), which required teachers desiring advanced certification to analyze and comment on their practices using the Board's own standards for accomplished teacher practices (Burroughs, Schwartz, & Hendricks-Lee, 2000; Darling-Hammond, 1999). NBPTS defines accomplished teachers as “those who are able to articulate reasons for the many practices they engage in as teachers” (Burroughs et al., p. 348). Such teachers are expected to have the ability to “apply steady, disciplined judgment and reflective scrutiny within the bounds set by (a) constantly expanding body of knowledge” (NBPTS, 2006, p. 13).

The articulation of the use of a portfolio for such purposes as initial teacher certification, however, does not ensure that the portfolio will eventually serve such purposes. In order to verify that a portfolio has been designed and implemented in such a way as to serve the intended purposes, there is a need to examine its validity evidence.

Portfolio Validity

Although the portfolio has been promoted as a valid, authentic approach to assessment in teacher education (Darling-Hammond, 2000), the validation of the portfolio as implemented in specific programs has received little attention. This is especially true in the case of the electronic portfolio (Derham & Diperna, 2007), although concerns about its validity have been expressed by education faculty and preservice teachers alike (Reis & Villaume, 2002).

In one of the few studies on portfolio validity in teacher education, Burns and Haight (2005) examined a portfolio designed for an undergraduate special education assessment course. The concurrent validity was examined by correlating the portfolio scores with results of quizzes and papers assigned to the students in the course. The portfolio scores were also correlated with assessment-related scores in an internship course to obtain evidence for predictive validity, and with non-assessment-related scores in the internship course for evidence of discriminative validity. The Pearson r values for the three correlations were .60, .53, and −.01, respectively. The interrater reliability was also high (r = .91). A potential concern with the study was that the validity results were based on comparisons of portfolio scores with assessment data that may have reliability and validity issues of their own. In addition, since the same person was responsible for grading all the assignments, the measures could have been confounded with each other.

In another study of a course-level portfolio, Naizer (1997) examined the validity of a portfolio that was required for a mathematics/science elementary methods course. The generalizability analysis showed that the rater effect accounted for very little of the total variance, in spite of some degree of disagreement among the raters in their relative rankings of the portfolios. The concurrent validity of the portfolio was investigated through a discriminant analysis of placement in different groups based on the portfolio scores. The predictors included the number of education courses taken, the total hours of prior teaching experience, the final exam scores, and the scores from two standardized instruments: the Test of Logical Thinking, and the Motivated Strategies for Learning Questionnaire. It was found that 69% of the cases were correctly classified by the predictors. Major contributors to the prediction power included the number of education courses taken, and the scores on the Test of Logical Thinking. However, the correlation between the portfolio scores and the final exam scores was only .22.
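The kind of classification analysis Naizer describes can be sketched briefly. The code below is a hypothetical reconstruction, not Naizer's procedure or data: it fits a linear discriminant analysis predicting portfolio-score groups from predictors like those listed above and reports the percentage of cases correctly classified. All variable names and values are illustrative assumptions.

```python
# Hedged sketch of a discriminant analysis of group placement based on
# portfolio scores; simulated data, hypothetical variable names.
import numpy as np
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n = 120
predictors = pd.DataFrame({
    "education_courses": rng.integers(2, 12, n),      # number of education courses taken
    "prior_teaching_hours": rng.integers(0, 200, n),
    "final_exam": rng.normal(75, 10, n),
    "logical_thinking": rng.normal(6, 2, n),           # Test of Logical Thinking
    "mslq": rng.normal(4.5, 1.0, n),                   # Motivated Strategies for Learning Questionnaire
})
portfolio_group = rng.integers(0, 3, n)                # low / middle / high portfolio-score groups

lda = LinearDiscriminantAnalysis().fit(predictors, portfolio_group)
pct_correct = (lda.predict(predictors) == portfolio_group).mean() * 100
print(f"{pct_correct:.0f}% of cases correctly classified")
```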

Derham and Diperna (2007) recently reported a study of the validity of a program-wide electronic portfolio designed to assess the preservice teacher's competencies based on the Interstate New Teacher Assessment and Support Consortium (INTASC) standards. The study examined the interrater reliability and internal consistency of the portfolio scores. In addition, the authors correlated the portfolio scores of 30 preservice teachers with their scores on the student teacher evaluations, the PRAXIS I Test (academic skills in reading, writing, and mathematics) scores, the PRAXIS II Test (knowledge of principles of learning and teaching) scores, and their GPA in the education methods courses. The portfolio scores were found to correlate significantly with only the PRAXIS II scores (r = .39, p < .05), and the GPA (r = .34, p < .05). Limited support was found for interrater reliability. The median Cohen's kappa that accounted for chance rater agreement was found to be low (k = .17) across the different areas of ratings, although the Pearson correlation that measured rater agreement in relative rankings was moderately high (r = .60) for the portfolio composite score. Both sets of portfolio scores (from the two raters) reached high levels of internal consistency (α1 = .88; α2 = .80). The results of the study needed validation from similar studies, since the study used a small sample size.
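The contrast between the two agreement statistics reported here can be illustrated with a small sketch. The data below are hypothetical, not Derham and Diperna's ratings: Cohen's kappa reflects chance-corrected agreement on the exact rating categories, while the Pearson correlation only reflects agreement in relative standing.

```python
# Hypothetical ratings of ten portfolios by two raters.
import numpy as np
from sklearn.metrics import cohen_kappa_score

rater1 = [2, 3, 3, 1, 2, 3, 2, 1, 3, 2]   # rubric levels from rater 1
rater2 = [3, 3, 3, 2, 3, 3, 3, 2, 3, 3]   # rater 2 scores systematically higher

print("kappa:", round(cohen_kappa_score(rater1, rater2), 2))      # near zero here
print("Pearson r:", round(np.corrcoef(rater1, rater2)[0, 1], 2))  # still high (.80)
```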

One of the earliest studies on portfolio reliability and validity was conducted by Koretz, Stecher, Klein, McCaffrey, and Deibert (1992), based on a statewide portfolio assessment of K-12 students in Vermont. The study found a lack of interrater reliability, with the Pearson coefficient on the math section of the portfolio averaging around .56 for the total score, .45 for the mean dimension score, and .35 for the mean item score. To obtain validity evidence, the researchers correlated the portfolio scores with scores on the multiple-choice section of the state uniform assessments. After controlling for the effect of the low reliability of the portfolio scores, the correlations between the portfolio total scores and the state uniform test scores reached moderate levels (r ranged from .43 to .61).
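One standard way to "control for" unreliability is the classical correction for attenuation; the exact adjustment Koretz et al. used is not described here, so the sketch below is only an illustration of that correction with made-up numbers, not the reported results.

```python
# Classical correction for attenuation; illustrative values only.
import math

def disattenuate(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Estimated true-score correlation given an observed correlation and the
    reliabilities of the two measures."""
    return r_xy / math.sqrt(rel_x * rel_y)

print(round(disattenuate(0.40, 0.45, 0.90), 2))   # an observed .40 corrects to about .63
```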

According to Koretz et al. (1992), the low level of interrater reliability presented a barrier to the investigation of the portfolio validity. Another challenge for the validity study was the lack of a clear definition of the targeted portfolio assessment domain, and the unclear relationship between the portfolio scores and the alternative measures of student achievement. As Koretz et al. pointed out, the portfolio often acted as a bridge across traditional domains of learning, unlike nonperformance assessments that were strictly aligned with the commonly defined domains.

Concept of Validity

Validity has been defined as “the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests” (AERA, APA, & NCME, 1999, p. 9). The concept of validity applies to all kinds of assessment activities, including portfolio and other performance assessments (Messick, 1994a, 1994b, 1995a, 1995b). According to Birenbaum (2007), the quality of any assessment is determined by “the appropriateness, meaningfulness, and usefulness of these inferences/interpretations” (p. 30).

Traditionally, validity has been conceptualized as three related yet separate aspects: content, criterion (further divided into concurrent and predictive validity), and construct-related validity (which includes convergent and discriminant validity). The validity studies reviewed above were mostly based on this traditional conceptualization of validity. According to Ellis and Blustein (1991), the traditional trinitarian conception of validity led one to believe in the existence of three distinct types of validity, and to focus the validation on the test instead of the inferences of the test scores. It also emphasized unduly the observable human behaviors without paying sufficient attention to the underlying theoretical constructs.

Messick’s Framework of ConstructValidity

An alternative way to conceptualize validity is represented by Messick's unified concept of construct validity, which consists of six interrelated facets: (a) content, (b) substantive, (c) structural, (d) external, (e) generalizability, and (f) consequential validity. In the unified conceptualization of validity, content validity was expanded to two related facets of validity: content and substantive, with the former focused on the fit of content in the target domain with that in the test domain, and the latter on the fit of targeted cognitive processes with the response processes. A commonly used method to validate the content facet of validity is the use of content alignment based on judgments of content experts (Birenbaum, 2007; Porter, 2002, 2007). Validation of the response processes is often based on the concurrent or retrospective verbal reports of the test takers (Birenbaum, 2007; Leighton, 2004). Due to challenges in obtaining quality verbal reports, the substantive facet has been the least frequently investigated aspect of validity (NRC, 2001).

A key characteristic of Messick's concept is the obscuration of the traditional distinction between validity and reliability, with each aspect subsumed under the concept of construct validity. The traditional definition of interrater reliability was expanded to that of generalizability, the consistency of assessment results across raters, tasks, and occasions (Brennan, 2001). Important results from a generalizability study include the percentage of variance accounted for by each factor, the degree of relative agreement (generalizability coefficient), and the degree of absolute agreement (index of dependability). Messick's framework of construct validity also expanded the traditional concept of internal consistency to that of structural validity, the fit of the internal structure of the target domain with that of the response data. Statistical techniques that may be used in validating the internal structure of an assessment include factor analysis, cluster analysis, and multidimensional scaling analysis (Birenbaum, 2007).
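For a fully crossed persons-by-raters design, these quantities are typically estimated from variance components, as in the minimal sketch below. The code and ratings are illustrative assumptions, not the generalizability analysis reported later in this article.

```python
# Sketch of a persons x raters generalizability analysis (one score per cell):
# estimate variance components, then form the generalizability coefficient
# (relative agreement) and the index of dependability (absolute agreement).
import numpy as np

def g_study(scores: np.ndarray):
    """scores: persons x raters matrix of ratings."""
    n_p, n_r = scores.shape
    grand = scores.mean()
    ss_p = n_r * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_r = n_p * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_res = ((scores - grand) ** 2).sum() - ss_p - ss_r      # interaction/error

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    var_pr_e = ms_res                         # person-by-rater interaction confounded with error
    var_p = max((ms_p - ms_res) / n_r, 0.0)   # person (true-score) variance
    var_r = max((ms_r - ms_res) / n_p, 0.0)   # rater (leniency) variance

    g_coef = var_p / (var_p + var_pr_e / n_r)             # relative decisions
    phi = var_p / (var_p + (var_r + var_pr_e) / n_r)      # absolute decisions
    return {"var_p": var_p, "var_r": var_r, "var_pr_e": var_pr_e,
            "g_coefficient": g_coef, "dependability": phi}

# Hypothetical ratings of six portfolios by two review teams.
ratings = np.array([[42, 44], [35, 37], [48, 47], [39, 41], [30, 33], [45, 45]], dtype=float)
print(g_study(ratings))
```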

In Messick’s conception of constructvalidity, the traditional concepts of con-current, predictive, convergent, anddiscriminant validity were merged intoa single facet of external validity, or therelationship with external variables. In-vestigation of this validity facet may bebased on convergent and discriminantevidence by correlating the targetedassessment data with alternative mea-sures. The last facet in Messick’s valid-ity conceptualization is consequentialvalidity, the investigation of intendedand unintended consequences of as-sessment use. Evidence for consequen-tial validity is obtained if an assessmentdevice is found to serve its intendedpurposes and minimize the adverseconsequences (Birenbaum, 2007).

Taken as a whole, Messick's conceptualization of construct validity has provided a more coherent framework for test validation than the traditional conception of validity. As a seminal work on test validity, the framework has received increasing endorsement in the measurement community (Birenbaum, 2007). It also has been adopted as part of the new standards for educational assessment (AERA, APA, & NCME, 1999).

The study presented in this article used Messick's unified concept of construct validity to examine the use of an electronic portfolio in a teacher education program at the University of Central Missouri (UCM). The study was intended to inform fellow teacher educators of inferences that may be made about the portfolio used in the initial certification of the preservice teacher, and changes that may be needed in the portfolio structure in order to measure the teacher competencies. The study also provided specific steps that may be taken for validating portfolio use in other teacher education programs based on Messick's framework of construct validity.

Method

Context of Study

This study was situated in the Elementary Cluster (early childhood, elementary, and middle school programs) within the Department of Curriculum and Instruction at UCM. At the time of the study, the number of full-time faculty in the cluster was 20. As part of a campus-wide teacher education program that had received continuous NCATE accreditation since 1954, the Elementary Cluster faculty had been developing and implementing an electronic portfolio system for its preservice teachers for more than a decade.

Portfolio purpose and design

Two major purposes were associated with the Elementary Cluster's electronic portfolio: (a) facilitation of the professional development and reflective skills of the preservice teacher, and (b) documentation of a candidate's competencies as a basis for initial teacher certification. The focus of this study was to validate whether the portfolio served the purpose of documenting teacher competencies.

In order to successfully complete the electronic portfolio, the preservice teacher was required to provide artifacts and corresponding reflective narratives addressing the Missouri Standards for Teacher Education Programs (MoSTEP), which were based on the Interstate New Teacher Assessment and Support Consortium (INTASC) standards. The 11 general MoSTEP standards were referred to as the Quality Indicators (QIs) (Appendix A), and the more specific benchmarks for the standards were referred to as the Performance Indicators (PIs) (see Appendix B for a partial list). The artifacts used to address the PIs came from education course assignments. Accompanying each artifact was a reflection written by the preservice teacher as an explanation of how the artifact demonstrated the candidate's competency related to the PI.

The portfolio was submitted to the review team at three different points of the preservice teacher's program. The three reviews were referred to as the initial, midlevel, and final reviews. The initial portfolio was submitted in the preservice teacher's sophomore year prior to entry into the education program. This initial portfolio submission indicated that the preservice teacher had established the electronic portfolio matrix, could save and link artifacts to the matrix, and could generate appropriate reflections for the PIs. The initial review served both a formative and a summative function, with each candidate receiving evaluations and comments from the reviewers as well as any personal coaching needed for improvement. A passing score was required for admission to the teacher education program. The feedback, which accompanied the initial review score, was usually detailed enough for the preservice teacher to make revisions and prepare for the next level of portfolio review. The second review, the midlevel portfolio review, occurred prior to student teaching, when a majority of the portfolio artifacts and reflections were linked to the portfolio matrix. This review also served a formative and a summative function, with extensive feedback and coaching on a single QI Meta-Reflection due at that point. A passing score was required for recommendation to student teaching. Finally, the third review served a summative purpose, with a passing score required for teacher certification. The completed portfolio, due at a designated point near the end of student teaching, contained artifacts and reflections for all of the PIs, as well as the comprehensive Meta-Reflections written for the 11 QIs. In addition to the three reviews, the preservice teacher also could contact their portfolio advisors before each review for any questions or suggestions regarding the portfolio.

Evolution of the electronic portfolio

Since the original implementation of the portfolio, several iterations had occurred in attempts to improve the portfolio structure and the requirements for the preservice teacher. When the portfolio program was initiated, the Elementary Cluster faculty developed a set of approximately 40 criteria or outcomes, stratified across developmental levels, to judge the competencies of the preservice teacher as the candidate moved through the program. In 1999, when the MoSTEP standards were officially published, the program criteria were aligned with the new MoSTEP standards. A further revision of the portfolio in Spring 2002 resulted in eliminating the original 40 criteria and having the preservice teacher directly address the MoSTEP standards. Also during Spring 2002, based on the findings of a task force consisting of faculty members involved in the portfolio program, the cluster faculty made several changes to the technology interface and the general structure of the portfolio. While the original electronic portfolio had used Netscape Composer to develop and manage the documents and artifacts, the revised version used Microsoft Word, listing the required MoSTEP standards in a simple matrix template (Appendix B). Using MS Word's hyperlink function, the preservice teacher was able to link the standards on the template to the artifacts and reflections, which could be saved in such formats as MS Word, Excel, PowerPoint, html, PDF, or text.

Portfolio score

The portfolio was reviewed by teams of two or three faculty members, who remained with that preservice teacher through the three checkpoints. A scoring guide was used at each checkpoint of portfolio review. Appendix C contains the scoring guide for the final portfolio review. A preservice teacher's portfolio would receive a score on a 0–50 numeric scale, which would be converted to a nominal grade: (a) not passing (0–34 points), (b) passing: satisfactory (35–39 points), (c) passing: good (40–44 points), and (d) passing: excellent (45–50 points). A portfolio had to receive at least a "satisfactory" grade at each checkpoint in order for the preservice teacher to progress through the program of study. If a candidate did not pass a portfolio review on the first try, the individual would be provided with corrective feedback and coaching, then allowed to revise and resubmit the portfolio. Passing the final review signified the completion of the portfolio process for the preservice teacher and made the candidate eligible for receiving recommendation for teacher certification.
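As a concrete illustration of the conversion just described, the following minimal sketch (not the program's actual software) maps a numeric portfolio score onto the nominal grades using the cut scores from the scoring guide.

```python
# Score-to-grade conversion based on the 0-50 scale described above.
def portfolio_grade(score: int) -> str:
    if score >= 45:
        return "passing: excellent"     # 45-50 points
    if score >= 40:
        return "passing: good"          # 40-44 points
    if score >= 35:
        return "passing: satisfactory"  # 35-39 points
    return "not passing"                # 0-34 points

assert portfolio_grade(42) == "passing: good"
assert portfolio_grade(34) == "not passing"
```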

It may be worthwhile to note that the portfolio score was primarily based on the quality of the reflections (for both the PIs and the QIs), especially the Meta-Reflections for the QIs. Although the quality of the artifacts was listed as part of the scoring criteria (Appendix C), the cluster faculty decided not to factor it into the portfolio score, for the reason that the artifacts had already been scored by the faculty teaching the various education courses.

Generalizability study

In Fall 2004, the cluster faculty participated in a whole-day portfolio workshop, facilitated by some of the researchers of the study. After a regular training session on portfolio grading, eight portfolio review teams were paired up into four groups. Each group of review teams reviewed six to 10 initial level portfolios, with some of the reviews completed during the workshop and the rest completed after the workshop. Altogether 31 portfolios were reviewed by different pairs of the portfolio review teams. The generalizability of the portfolio scores across the different review teams was analyzed (Yao, Foster, & Aldrich, 2006). It was found that the contribution of the rating team factor to the overall variance of the portfolio composite score was small, ranging from 0% within one pair of review teams to 11% within another pair. The average generalizability coefficient (a measure of relative agreement) was .83, and the average dependability index (a measure of absolute agreement) reached .81. Since the portfolio scores at the three stages of portfolio review were based on basically the same scoring guide (Appendix C) and the same faculty review teams were involved in each stage, it was reasonable to assume that the generalizability results based on the initial portfolio study would also apply to the final portfolio scores.

Portfolio interview study

In the summer of 2006, four faculty members in the Elementary Cluster interviewed eight preservice teachers who had each just completed their midlevel portfolio reviews regarding their perceptions of the portfolio (Yao, Aldrich, Foster, & Pecina, 2007). The participants felt that the portfolio process had provided a mechanism for them to store documents that could be used for the future, and helped them to develop their reflective skills. However, they felt that their portfolio score did not factor in the quality of their artifacts and therefore did not reflect their competencies. Some of the participants also felt that the artifacts should all be experience based. They also expressed concern about the amount of work and time required by the portfolio process, and about a number of barriers that they had encountered.

Student teaching assessment data

As an alternative measure of the preservice teacher's qualifications, assessment data collected during student teaching were used as part of the measures to validate the external facet of the portfolio scores. The data were based on a summative student teacher evaluation instrument completed by the university supervisors. The instrument consisted of 11 items for assessing the student teacher's knowledge, skills, and dispositions based on the 11 MoSTEP standards. The supervisors provided the ratings based on their observations of the competencies that each student teacher demonstrated at the end of the student teaching semester. Prior to the summative evaluation, each university supervisor made four visits, which were approximately four weeks apart, to the student teaching site to observe a preservice teacher. The last formative evaluation was conducted one or two weeks prior to the summative evaluation. Based on each site visit, the supervisor provided a formative evaluation of the student teacher by filling out a formative evaluation instrument, which was almost identical to the summative evaluation instrument. The results of the formative evaluations were provided immediately to the student teacher following each observation, so that the student teacher could address the areas of concern raised by the supervisor.

The item scores on both the formative and summative student teacher evaluations were based on a scale ranging from "not observed," "does not meet = 0," "progressing = 1," "meets = 2," to "exceeds = 3." Descriptions and explanations for each point on the scale were provided in a scoring rubric that incorporated language drawn directly from the MoSTEP standards. Based on the data collected for this study, it was found that, for both the formative and summative evaluations, the scores across the 11 standards were highly homogeneous, with Cronbach's α ranging from .88 for the first formative evaluation to .95 for the fourth formative evaluation. As a result, the composite scores across all 11 standards were used in the subsequent analyses.
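The homogeneity check and composite formation described above can be sketched as follows. The ratings are simulated and the code is only an illustration of the computation, not the analysis actually run on the study's evaluation data.

```python
# Cronbach's alpha across 11 standard ratings, then a composite score.
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """item_scores: student teachers (rows) x standards (columns)."""
    k = item_scores.shape[1]
    item_var = item_scores.var(axis=0, ddof=1).sum()
    total_var = item_scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

rng = np.random.default_rng(0)
competence = rng.normal(2.0, 0.5, 100)                                   # latent overall competence
ratings = np.clip(competence[:, None] + rng.normal(0, 0.4, (100, 11)), 0, 3)

print("alpha:", round(cronbach_alpha(ratings), 2))
composite = ratings.sum(axis=1)                                          # composite across the 11 standards
```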

The university supervisors who provided the ratings for the student teachers were either full-time or adjunct faculty members with at least a Master's degree and state certification as teachers or administrators. All had received training on the evaluation forms in August 2005, prior to their first observations using the new formative and summative instruments. The intrarater reliability of the student teacher evaluation data was investigated by correlating the summative ratings with the results of the fourth formative evaluations. Since the two evaluations were based on the same standards and conducted at approximately the same time, when little change would have taken place in the targeted variable (i.e., competencies of the student teacher), the correlation represented a measure of intrarater reliability for the student teaching assessment data, or the consistency of the university supervisor's ratings across different occasions. Table 1 provides the bivariate correlations between the four formative evaluations and the semester-end summative evaluation. Since the formative evaluation data could be missing (i.e., coded "Not Observed") for a standard if a supervisor did not find the opportunity in a particular visit to observe the student teacher on that standard, the composite score for some students could not be obtained. Consequently, the number of cases with nonmissing composite scores varied greatly across the formative evaluations. The pairwise deletion method, instead of listwise deletion, was used to obtain the correlations, because the latter would have resulted in a much smaller number of cases usable for checking the correlations. As indicated in Table 1, all correlations were found to be significant at p < .01. The strongest correlation was found between the summative evaluation and the last formative evaluation (r = .81), suggesting an adequate level of intrarater reliability for the student teaching assessment data. At the end of each semester, student teaching supervisors again met briefly with one of the authors of this study to further refine evaluation forms, instructions, and scoring criteria. No data on the interrater reliability of the student teaching assessment data were available at this point.
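The pairwise-deletion approach behind Table 1 can be sketched briefly with simulated scores rather than the study's records: pandas computes each correlation from every case that has both scores present, whereas listwise deletion would drop any case with a missing composite anywhere.

```python
# Pairwise versus listwise deletion when some composites are missing.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
base = rng.normal(25, 4, 150)                    # latent level of each student teacher
frame = pd.DataFrame({f"Formative {i}": base + rng.normal(0, 3, 150) for i in range(1, 5)})
frame["Summative"] = base + rng.normal(0, 2, 150)
# Simulate "Not Observed" composites by blanking some formative scores.
frame.iloc[rng.choice(150, 60, replace=False), 0] = np.nan

print(frame.corr())            # pairwise deletion (pandas default)
print(frame.dropna().corr())   # listwise deletion, for comparison
```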

Other external measures

Other assessment data used in the examination of the external validity of the portfolio score included the American College Test (ACT) composite score, the College Basic Academic Subjects Examination (CBASE) score, the overall Grade Point Average (GPA), and the PRAXIS II Subject Assessment score.


Table 1. Correlations of Formative and Summative Student Teacher Evaluation Data

              Formative 1  Formative 2  Formative 3  Formative 4  Summative
              (N = 75^a)   (N = 114^a)  (N = 86^a)   (N = 92^a)   (N = 146^b)
Formative 1   –
Formative 2   .62∗∗        –
Formative 3   .37∗∗        .64∗∗        –
Formative 4   .32∗∗        .52∗∗        .76∗∗        –
Summative     .38∗∗        .66∗∗        .72∗∗        .81∗∗        –

Note: Formative 1, 2, 3, 4 = University supervisor formative ratings of the student teacher evaluation on the first, second, third, and fourth site visit, respectively; Summative = Summative ratings negotiated between the university supervisor, the district supervisor/cooperating teacher, and the student teacher.
^a As the correlations were based on composite scores summed over the 11 MoSTEP standards, there were many cases missing because one or two subscores were missing. Pairwise deletion was used for the correlations instead of listwise deletion because otherwise the sample size would be substantially reduced. Correlations using listwise deletion produced similar results, with the correlation between the last formative evaluation and the summative evaluation the largest (r = .67), although it was smaller than the one obtained through pairwise deletion (r = .81).
^b Since some of the student teachers did not complete the final level portfolio, they were included in the reliability analysis but excluded from the validity study.

As the majority of the students came from the Midwest, the ACT score provided a commonly used measure of a preservice teacher's academic preparedness when the student entered college. The CBASE is a criterion-referenced achievement test measuring knowledge and skills in language arts, mathematics, science, and social studies that a student usually has obtained through a general education program during the first year of undergraduate study. Successful completion of the test is required by the State of Missouri as a condition for a candidate entering a teacher education program. The passing score for each subtest of the CBASE was 235. The GPA used in this study was an accumulated grade point average that represented all college courses a preservice teacher had attempted. The PRAXIS II is part of the PRAXIS Series for Professional Assessments for Beginning Teachers, which was developed by the Educational Testing Service to assess a teacher candidate's knowledge and skills acquired through professional education programs. The scores used in this study were from the PRAXIS II Subject Assessment—Elementary Education: Curriculum, Instruction, and Assessment. Missouri requires a passing score of 164 on this test for elementary certification candidates.

The portfolio scores were obtained from the Curriculum and Instruction Department's assessment database. The student teaching assessment data were aggregated through the Dean's Office of the College of Education and Human Services. Other assessment data for the subjects, including scores on the ACT, CBASE, and PRAXIS II and the overall GPA, were obtained from the university's Office of Institutional Research.

Subjects of Study

A total of 195 preservice teachers in the early childhood, elementary, and middle school education programs had portfolios due for final review in either Fall 2005 or Spring 2006. Of those, 161 of the portfolios were due for the first time, with an additional 34 that had either not been submitted or not passed the final review in the previous semesters. Only 128 candidates finished their student teaching and submitted their final portfolios during the semesters of Fall 2005 and Spring 2006. These 128 preservice teachers constituted the subjects of this study. They were predominantly Caucasian (98%), and mostly female (83%). Descriptive statistics on the various assessments are provided in Table 2. Although the minimum portfolio score achieved was well below the passing score of 35, most of the students scored above the passing score (M = 42.61, SD = 5.73). The subscale portfolio scores for the first and third sections ("complete information" and "mechanics") seemed to have very limited variability, with most students receiving a score approaching the maximum score of 5. The limited variation in scores was particularly pronounced for the PRAXIS II, with the standard deviation around 11, compared with a maximum score of 196. The limitation may reflect the fact that a student had multiple opportunities to take the PRAXIS II test until the person passed the exam.

Validation Procedures

The procedures used in this study to validate the electronic portfolio in the early childhood, elementary, and middle school education programs at UCM are summarized in Table 3. The table also lists the intended inferences that are matched with the various validation procedures.

As indicated in Table 3, the content facet of a portfolio may be examined through the comparison of portfolio documents with the relevant standards (Klecker, 2000). In this study, we examined how the content of the portfolio artifacts and reflections was related to that of the MoSTEP standards, and checked the qualifications of the people involved in the matching of the artifacts and reflections with the standards.

To investigate the substantive facet of validity, we used Bloom's cognitive taxonomy as a framework to compare the cognitive processes in the portfolio artifacts that were self-reported in the corresponding Meta-Reflections with the processes of the 11 QIs of the MoSTEP standards. The portfolio reflections are not equivalent to verbal reports designed for the student to report the actual cognitive processes the student experienced when performing an assessment task (Birenbaum, 2007). However, the second component of the Meta-Reflections (and the PI reflections) in this study contained the same information as usually found in a verbal report, by requiring the preservice teacher to describe how the artifacts provided evidence that the candidate had met the relevant standards (Appendix D).


Table 2. Descriptive Statistics of the Subjects

                              N     Min.   Max.   M        SD
Portfolio scores:
  Complete Information        128   2      5      4.74     .69
  Artifacts and Reflections   128   4      10     8.89     1.28
  Mechanics                   128   2      5      4.46     .74
  Meta-Reflections            128   6      30     24.49    4.65
  Total Score                 128   15     50     42.61    5.73
Other measures:
  ACT Score                   90    15     32     21.66    3.46
  C-Base Composite Score      119   211    399    286.87   41.19
  GPA                         128   2.73   4      3.46     .35
  PRAXIS II Score             97    143    196    176.35   11.02
  Summative Evaluation        106   12     33     28.00    4.89

Table 3. A Summary of Validation Procedures and Intended Inferences Based on Messick's Framework

Content Validity
  Validation procedures: Examined how artifacts were linked to PIs and QIs; checked qualifications of faculty.
  Intended inferences: Did the portfolio artifacts address all the topics of the MoSTEP standards?

Substantive Validity
  Validation procedures: Examined cognitive processes described in Meta-Reflections with those of the QIs.
  Intended inferences: Did the cognitive processes reported match the target processes?

Structural Validity
  Validation procedures: Checked the internal consistencies and factor structures of portfolio subscale scores and Meta-Reflection scores.
  Intended inferences: Did the subscale scores contribute to the composite portfolio scores in expected ways?

Generalizability^a
  Validation procedures: Variance components; generalizability coefficient; index of dependability.
  Intended inferences: Could the portfolio scores generalize across different review teams?

External Validity
  Validation procedures: Correlated portfolio scores with ACT, overall GPA, CBASE, PRAXIS II, and student teaching assessment data.
  Intended inferences: Did portfolio ratings correlate with other measures of preservice teachers in an expected way?

Consequential Validity^b
  Validation procedures: Interviewed students regarding the perceived benefits and concerns of the portfolio.
  Intended inferences: Were the benefits relevant to the intended use of the portfolio? Were there negative consequences?

^a Generalizability was investigated in a separate study.
^b The consequential validity was based on data that were available from another study.


The structural facet of construct validity was investigated by checking the internal consistency and factor structure of the portfolio scores, including the section scores of the portfolio and the Meta-Reflection ratings pertaining to the 11 Quality Indicators. Prior to the data analysis, however, some preliminary judgments can be made regarding the internal structure of the portfolio based on the scoring guide for the final portfolio review (Appendix C). Examination of the scoring guide suggested that the structural validity of the portfolio score was most likely compromised, as it was primarily based on the quality of the reflections for the PIs and QIs rather than on the actual qualifications based on the competency standards.

The external facet of construct validity was investigated by judging the correlations of portfolio scores with selected external variables against the expected correlations between the implied variables. For this study, a correlation around .8 was considered high, a correlation around .5 was considered moderate, and a correlation around .2 was considered small.


The various measures of student achievement were expected to have significant correlations with a portfolio score that measured the preservice teacher's teaching competencies. In particular, significant correlations were expected between the portfolio score and the student teacher evaluation ratings and the PRAXIS II score. Since the student teacher evaluations and the portfolios were both based on the same 11 MoSTEP standards, moderate correlations would be expected between the two measures. However, the fact that the student teacher evaluations were solely based on the supervisor's site visits during the student teaching experience, and that the portfolio reflected student work throughout their teacher education program, led the investigators to believe that the correlations would not be high. The lack of correlation found by Derham and Diperna (2007) between portfolio scores and student teaching evaluation data seemed to suggest that it would be unrealistic to expect a high correlation between the two. The relationship between the PRAXIS II subject test scores and a measure of teaching competencies based on the MoSTEP standards seems to be complicated. Unlike in the Derham and Diperna (2007) study, where the PRAXIS II test was a measure of general knowledge of teaching and learning, the PRAXIS II test used in this study was a content-based test. "Although some questions concern general issues, most questions are set in the context of the subject matters most commonly taught in the elementary school: reading/language arts, mathematics, science, social studies, fine arts, and physical education" (ETS, 2005, p. 1). The Elementary Cluster portfolio that was based on the MoSTEP (or INTASC) standards, on the other hand, was primarily a measure of a preservice teacher's general pedagogy, since only the first of the 11 QIs pertained directly to content knowledge (Appendix A). As a result, even if the electronic portfolio was a valid measure of the preservice teachers' competencies, it may not reach a high level of correlation with the scores on the PRAXIS II subject test.

The evidence for examining the generalizability and consequential validity of the portfolio came from the two aforementioned studies. A summary of the results of these two studies is included in the discussion of this study.

Results

Content Facet

Appendix B includes portions of the portfolio matrix developed by the cluster faculty and distributed to the preservice teacher to guide the organization of the electronic portfolio. The template specified what artifacts were to be included in the portfolio, and linked them to the 11 MoSTEP standards and benchmarks (QIs and PIs). The portfolio template served as a table of specifications that may be used to judge the content validity of the portfolio (Birenbaum, 2007). The faculty who developed the template were full-time instructors directly responsible for teaching the various courses from which the portfolio artifacts were obtained. Ninety percent of the faculty had a doctoral degree in their relevant field. Most had extensive experience teaching in K-12 settings, and were well versed in standards setting and the NCATE accreditation process. It was determined that they were well qualified to make the match between the portfolio artifacts and the MoSTEP standards.

The portfolio template, however, also shows that there was only one assignment addressing each benchmark (PI) of the MoSTEP standards. Although the portfolio entries covered all the relevant standards or competencies, the use of a single entry most likely underrepresented the domain of each PI, thus reducing the content validity of the electronic portfolio.

Substantive Facet

Using Bloom’s Taxonomy of cognitivelearning, the primary researcher wentthrough the cognitive processes im-plied in each QI, and looked for sim-ilar cognitive processes reported inthe Meta-Reflections in a portfolio thatreceived an excellent rating duringthe final level review. Appendix Erepresented the portion of a Meta-Reflection that described how the threeartifacts addressed the following QIof the MoSTEP standards: “The pre-service teacher understands how stu-dents learn and develop, and pro-vides learning opportunities that sup-port the intellectual, social, and per-sonal development of all students.”The cognitive processes implied inthis QI were found to be at twolevels: comprehension and applica-tion. As the candidate pointed outin the Meta-Reflection (Appendix E),

the three artifacts all came from herstudent teaching experience, suggest-ing that the application level was usedfor each artifact. The candidate alsoexplained how each artifact providedevidence of her comprehension of therelevant concepts and theories. For in-stance, in her discussion of the firstartifact, the preservice teacher ob-served: “This lesson (on sentence frag-ment) supported the intellectual de-velopment of students as well as Vygot-sky’s social learning theory because thestudents were strengthening their sen-tence skills while learning from eachother” (Appendix E). Similar matcheswere found between the cognitive pro-cesses reported in the other Meta-Reflections and the processes in theircorresponding QIs.

Structural Facet

The correlations between the different portfolio subsection scores and the total score are summarized in Table 4. Although the total score was significantly correlated with each section score, the correlations were fairly small between the overall portfolio scores and the scores for Section 1 and Section 3. The strongest correlation was found between the portfolio's total score and the score for Section 4 (Meta-Reflections). A check on the internal consistency of the four subscale scores revealed a relatively low alpha (α = .53), suggesting that more than one factor was involved in the composite portfolio score. Factor analysis of the four subsection scores identified two factors, with the eigenvalue for the first factor being 1.95 and that for the second factor being 1.00. The first and third section scores were found to load on the first factor, and the second and fourth section scores loaded on the second factor (Table 5). This result was largely expected, as the first and third section scores dealt with the mechanical aspect of the portfolio, while the other two section scores dealt with the portfolio reflections.
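The factor analysis just described can be sketched as follows. This is a hedged illustration with simulated section scores (so the eigenvalues and loadings will not reproduce Table 5), showing principal-components extraction from the correlation matrix of the four section scores followed by a varimax rotation.

```python
# Two-factor extraction from a 4 x 4 correlation matrix, then varimax rotation.
import numpy as np

def varimax(loadings: np.ndarray, max_iter: int = 100, tol: float = 1e-6) -> np.ndarray:
    """Standard varimax rotation of a factor-loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p))
        rotation = u @ vt
        if s.sum() < criterion * (1 + tol):
            break
        criterion = s.sum()
    return loadings @ rotation

rng = np.random.default_rng(3)
mechanics = rng.normal(0, 1, 128)    # drives Sections 1 and 3
reflection = rng.normal(0, 1, 128)   # drives Sections 2 and 4
sections = np.column_stack([
    mechanics + rng.normal(0, 0.6, 128),
    reflection + rng.normal(0, 0.6, 128),
    mechanics + rng.normal(0, 0.6, 128),
    reflection + rng.normal(0, 0.6, 128),
])

corr = np.corrcoef(sections, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]                  # largest eigenvalues first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
loadings = eigvecs[:, :2] * np.sqrt(eigvals[:2])   # unrotated two-factor loadings
print("eigenvalues:", np.round(eigvals, 2))
print("rotated loadings:\n", np.round(varimax(loadings), 2))
```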

The correlations between the individual Meta-Reflections are summarized in Table 6. The bivariate Pearson correlations (r) ranged from .47 to .80, with all of the correlations significant at p < .01. Factor analysis of the 11 Meta-Reflection scores failed to yield more than one factor. This was expected since the Meta-Reflections were meant primarily to demonstrate the preservice teacher's reflective skills.


Table 4. Correlation Matrix for Portfolio Subsection Scores and Total Score (N = 128)

              Section 1   Section 2   Section 3   Section 4   Total Score
Section 1     –
Section 2     .29∗∗       –
Section 3     .43∗∗       .44∗∗       –
Section 4     .15         .44∗∗       .11         –
Total Score   .36∗∗       .67∗∗       .37∗∗       .94∗∗       –

Note: Section 1 = Complete information; Section 2 = Quality of artifacts and reflections; Section 3 = Aesthetics and mechanics; Section 4 = Quality of Meta-Reflections.
∗∗p < .01.

Table 5. Factor Loadings of Portfolio Section Scores

                                                     Factor 1^a   Factor 2^b
Section 1: Complete Information                      .81          .07
Section 2: Reflections for Performance Indicators    .44          .72
Section 3: Aesthetics/Mechanics                      .84          .14
Section 4: Meta-Reflections                          −.03         .92

Note: The Varimax rotation method was used in deriving the factors.
^a The eigenvalue for Factor 1 was 1.93, accounting for 49% of the total variance.
^b The eigenvalue for Factor 2 was 1.00, accounting for 25% of the total variance.

Table 6. Correlations between Portfolio Meta-Reflection Ratings (N = 128)

       M1      M2      M3      M4      M5      M6      M7      M8      M9      M10     M11
M1     –
M2     .75∗∗   –
M3     .75∗∗   .79∗∗   –
M4     .72∗∗   .65∗∗   .66∗∗   –
M5     .63∗∗   .76∗∗   .76∗∗   .68∗∗   –
M6     .62∗∗   .60∗∗   .58∗∗   .77∗∗   .69∗∗   –
M7     .68∗∗   .57∗∗   .65∗∗   .72∗∗   .66∗∗   .70∗∗   –
M8     .66∗∗   .69∗∗   .66∗∗   .69∗∗   .74∗∗   .76∗∗   .72∗∗   –
M9     .63∗∗   .65∗∗   .67∗∗   .66∗∗   .68∗∗   .69∗∗   .72∗∗   .78∗∗   –
M10    .68∗∗   .68∗∗   .72∗∗   .58∗∗   .67∗∗   .64∗∗   .65∗∗   .72∗∗   .75∗∗   –
M11    .52∗∗   .64∗∗   .59∗∗   .47∗∗   .55∗∗   .54∗∗   .49∗∗   .57∗∗   .66∗∗   .80∗∗   –

∗∗p < .01.

the preservice teacher’s reflectiveskills.

External Facet

The correlations between portfolio ratings and other measures of the preservice teachers are shown in Table 7. The portfolio total score had small but significant correlations with the preservice teachers' ACT score (r = .33, p < .01), CBASE composite score (r = .24, p < .05), overall GPA (r = .34, p < .01), and summative student teacher evaluation (r = .21, p < .05). The portfolio score failed to correlate significantly with the PRAXIS II score.

Conclusions and Discussions

Based on the results of this study and two other related studies, the following section makes inferences regarding the validity of the electronic portfolio, discusses the implications of the study, and makes suggestions for possible portfolio revisions and future investigations.

Validity Inferences

Results from the generalizability study (Yao et al., 2006) suggested that the electronic portfolio scores were generalizable across different review teams. However, since the scores were primarily based on the reflections, the generalizability only applied to the reflection-based scores. The portfolio interview study (Yao et al., 2007) suggested that the use of the electronic portfolio provided the candidates with opportunities to reflect on their teaching and develop a better understanding of the standards. However, the participants did not believe the portfolio measured the construct of their teaching competencies. In addition, the portfolio process had cost the preservice teacher a large amount of time that could have been used more effectively for other learning activities. This factor, along with the existence of various barriers in the portfolio process, had resulted in an overall negative attitude of the preservice teacher toward the portfolio project. In other words, the portfolio interview study provided limited evidence to support the consequential validity of the electronic portfolio.


Table 7. Correlation Matrix for Portfolio Scores and Other Measures of Preservice Teachers

               Portfolio     ACT        C-Base      GPA       PRAXIS II   Summative(a)
               (N = 128)   (N = 90)   (N = 119)   (N = 128)   (N = 97)    (N = 106)
Portfolio          –
ACT              .33∗∗         –
C-Base           .25∗        .84∗∗         –
GPA              .34∗∗       .41∗∗       .34∗∗        –
PRAXIS II        .12         .63∗∗       .75∗∗      .44∗∗        –
Summative(a)     .21∗        .19         .09        .26∗∗      .12            –

(a) Summative = Composite score on the summative evaluations of student teachers by the university supervisor.
∗p < .05; ∗∗p < .01.


This study found support for the substantive validity of the portfolio. The cognitive processes underlying the portfolio artifacts, as described in the second part of the Meta-Reflections, were found to match the processes implied in the competencies. There was also evidence to support the belief that the artifacts of the portfolio matched closely the target domain of the portfolio, yet the use of a single artifact for each performance indicator suggested that the domain of competencies was most likely underrepresented, thus reducing the content validity of the portfolio. Limited support was also found for the structural validity of the portfolio. The anticipated factor structures of the portfolio scores and the Meta-Reflection scores confirmed the suspicion that the portfolio score was primarily a reflection of the preservice teacher's reflective skills. Furthermore, the relatively small, although sometimes significant, correlations between the portfolio ratings and the external measures of student performance suggested that the portfolio had limited external validity. The weak correlations would not have been expected of a portfolio that was supposed to measure the competencies of a preservice teacher.

The evidence that was found pertaining to the structural, external, and consequential validity of the electronic portfolio suggests that the portfolio score lacked a critical component: artifacts that would measure the actual teaching competencies of the preservice teacher. Nevertheless, the various artifacts that were specified in and linked to the portfolio template were aligned with the competencies defined in the MoSTEP standards. The substantive validity evidence from this study also suggested that the cognitive processes represented in those artifacts mirrored the target processes. As a result, the artifacts in the portfolios, if included in the portfolio score, could provide documentation of the preservice teacher's teaching competencies.

Discussions

According to Koretz et al. (1992), finding validity evidence for a comprehensive portfolio project can be challenging, due to the potential lack of reliability and the fact that portfolios often bridge traditional content domains. Possibly due to the extensive training in portfolio review and effective teamwork among the reviewers, the portfolio scores investigated in this study were found to have adequate levels of interrater reliability (Yao et al., 2006). However, since the scores left out the artifacts, the evidence of reliability may only apply to the portfolio reflection scores.
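The generalizability analysis itself is reported in Yao et al. (2006) and is not reproduced here. As a simpler illustration of the kind of interrater index involved, the sketch below computes a common two-way intraclass correlation, ICC(2,1), for a hypothetical portfolios-by-raters matrix of ratings; it is an illustrative stand-in, not the method used in the cited study.

    import numpy as np

    def icc_2_1(x):
        # two-way random-effects, absolute-agreement, single-rater ICC
        n, k = x.shape
        grand = x.mean()
        ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between portfolios
        ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between raters
        ss_total = ((x - grand) ** 2).sum()
        ms_rows = ss_rows / (n - 1)
        ms_cols = ss_cols / (k - 1)
        ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
        return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

    ratings = np.array([[4, 5], [3, 3], [5, 4], [2, 3], [4, 4]], dtype=float)  # hypothetical ratings
    print(icc_2_1(ratings))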

In this study, one factor that would have further attenuated the correlations was restriction of range. Since a preservice teacher had multiple opportunities to resubmit the portfolio until the student finally passed the assessment, there was limited variability in the portfolio scores. The same was true of the PRAXIS II scores, since a student had multiple opportunities to take and retake the exam until the person finally passed it.
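Where an estimate of the unrestricted variability is available, the size of this attenuation can be illustrated with the classical correction for direct range restriction. The sketch below is purely illustrative: the standard deviations are hypothetical, and no corrected coefficients were computed in this study.

    def correct_range_restriction(r, sd_restricted, sd_unrestricted):
        # classical (Thorndike Case 2) correction for direct range restriction
        u = sd_unrestricted / sd_restricted
        return r * u / (1 - r ** 2 + (r ** 2) * (u ** 2)) ** 0.5

    # e.g., an observed r of .12 when the unrestricted SD is 1.5 times the restricted SD
    print(correct_range_restriction(0.12, sd_restricted=2.0, sd_unrestricted=3.0))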

The issues of reliability, content bridging, and restriction of range, however, do not seem to fully explain the overall weak correlations between the portfolio scores and the other measures of the preservice teacher. Of particular concern was the lack of significant correlation between the portfolio and the PRAXIS II scores. Although there was not an exact match in the targeted domains, which would result in a correlation that was lower than the moderate correlation reported in Derham and Diperna (2007), a significant correlation between the two measures would be expected. The lack of significant correlation between the two measures seemed to highlight the perception held by the preservice teachers, who felt that a portfolio that did not factor in the quality of the artifacts would not reflect their competencies as teachers (Yao et al., 2007). To the preservice teachers who participated in the interviews, the portfolio score demonstrated their ability to reflect upon the state standards, not their standards-based competency in the classroom. This view was also supported by the structural evidence found in this study.

Suggestions

The results of the portfolio validity study seem to suggest a need to reflect the quality of the artifacts in the portfolio score, if the purpose of the portfolio is to document the complete set of competencies for the preservice teacher. The electronic portfolio in this study had a list of artifacts already built into it. In addition, the substantive validity evidence in this study suggested that the cognitive processes underlying these artifacts mirror the target processes of the teaching competencies. What seems needed is to make these artifacts a major criterion in the scoring guide (Appendix C). Although the artifacts had been graded in the various courses where they were originally assigned, the standards against which the artifacts need to be scored in the portfolio may be different; thus a second grading may be warranted. The reflection, instead of being the primary criterion for determining the quality of the portfolio, should serve as an aid to the grading/interpretation of the artifacts, as in the case of the portfolio required for NBPTS certification (NBPTS, 2006).

There also seems to be a need to reduce the number of artifacts in the portfolio, in order to make the portfolio task manageable for both the preservice teacher and the faculty reviewer. For instance, the portfolio may only contain a limited number of artifacts documenting the process of teaching and the impact of the teaching upon P-12 student learning, which are both emphasized in the NBPTS portfolio (NBPTS, 2006). Instead of being tied to one standard only, each artifact should address multiple standards. At the same time, the same standard may be addressed by multiple artifacts (see the sketch below). This may resolve the issue of content underrepresentation for the portfolio, while making the portfolio task manageable for the preservice teacher. Whether this idea is feasible needs to be tested in practice.

The overall lack of validity evidence in this study, however, seems to question the use of the electronic portfolio as a measure of the teaching competencies. Two barriers seem to be difficult, if not impossible, to overcome. One is the difficulty of reliably grading artifacts in comprehensive portfolios (Koretz et al., 1992), even though reliable grading of reflections may be possible (Yao et al., 2006). The other is the potential for underrepresentation of the domain of competencies by the portfolio entries. If one artifact can only be used for one benchmark of a standard, as was the case in this study, then resolving the issue of underrepresentation would mean at least doubling the number of artifacts and reflections for all the standards involved, which would make the already large workload of the portfolio almost impossible to deal with for the preservice teacher. Unless these two barriers are resolved, the portfolio cannot be used as a measure of the candidate's complete teaching competencies, even though its use in the initial teacher certification process may be justified on the grounds that it assesses the preservice teacher's reflective skills and understanding of the relevant standards (Yao et al., 2007).
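The many-to-many mapping between artifacts and standards suggested above could be prototyped as a simple tagging scheme with a coverage check. The artifact names and standard labels below are hypothetical; the sketch only illustrates how underrepresented standards could be flagged.

    from collections import defaultdict

    # each artifact is tagged with the quality indicators it addresses (illustrative labels)
    artifact_map = {
        "unit_plan": {"1.2.1", "1.2.4", "1.2.5"},
        "assessment_set": {"1.2.2", "1.2.8"},
        "impact_analysis": {"1.2.8", "1.2.9"},
    }
    required_standards = {f"1.2.{i}" for i in range(1, 12)}

    coverage = defaultdict(list)
    for artifact, standards in artifact_map.items():
        for s in standards:
            coverage[s].append(artifact)

    missing = sorted(required_standards - set(coverage))
    print("standards with no supporting artifact:", missing)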

Future Investigations

This study examined the various facets of construct validity for an electronic portfolio that was used as part of the basis for teacher certification. The lack of support found in this study and a previous study for the content, structural, external, and consequential facets of validity seemed to suggest that a portfolio built for certification purposes did not provide a measure of the complete competencies of the preservice teacher. The study, however, did not investigate the use of the portfolio for developing the reflective skills of the candidate, although the portfolio perception study (Yao et al., 2007) provided some information regarding such use. A complete validation of the portfolio for helping the preservice teacher to develop reflective skills remains an important area for future research.

One facet of validity that needs further examination in a complete validation of the portfolio is consequential validity, or the verification of the intended and unintended consequences of using the portfolio. It is important, for instance, to investigate the potential consequences if the portfolio is used as a major criterion for a preservice teacher to receive a full-time teaching position. It will also be important to know if the inservice teacher will continue to use the same type of reflection in the profession. There is also a need to assess the impact of the preservice teacher's portfolio on P-12 student learning.

The issues raised in this study, however, suggest a need to reevaluate the worth of the portfolio in teacher education in general, based on a cost/benefit analysis. What are the benefits of using the electronic portfolio, if it cannot provide a measure of the complete competencies of the preservice teacher? What are some of the formative functions that the portfolio can serve? What is the potential cost that is involved in the portfolio process, aside from the time and resources spent by the preservice teacher and the faculty? Is there an alternative assessment tool that provides a better cost/benefit ratio?

This study also introduced to teacher education a validation model for portfolio assessment based on Messick's concept of construct validity. The method may be applied to the validation of portfolios used in other teacher education programs. The results of such studies would inform us whether the findings from this study and the two related studies only applied to the portfolio at UCM, or to portfolios in teacher education in general.

References

Abrami, P. C., & Barrett, H. (2005). Directions for research and development on electronic portfolio. Canadian Journal of Learning and Technology, 31(3), 1–15.

American Association of State Colleges and Universities (2007). Developing evidence and gathering data about teacher education program quality. Washington, DC: Author.

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.

Beck, R. J., Livne, N. L., & Bear, S. L. (2005). Teachers' self-assessment of the effects of formative and summative electronic portfolios on professional development. European Journal of Teacher Education, 28(3), 221–244.

Birenbaum, M. (2007). Evaluating the assessment: Sources of evidence for quality assurance. Studies in Educational Evaluation, 33, 29–49.

Bredo, E. (2005). Addressing the social foundations "accountability" dilemma. Educational Studies, 38(3), 230–241.

Brennan, R. L. (2001). Generalizability theory. New York: Springer.

Burns, M. K., & Haight, S. L. (2005). Psychometric properties and instructional utility of assessing special education teacher candidate knowledge with portfolios. Teacher Education & Special Education, 28(3–4), 185–194.

Burroughs, R., Schwartz, T. A., & Hendricks-Lee, M. (2000). Communities of discourse and discourse communities: Negotiating boundaries in NBPTS certification. Teachers College Record, 102(2), 344–374.

Darling-Hammond, L. (1999). Reshaping teaching policy, preparation, and practice: Influence of the National Board for Professional Teaching Standards. Washington, DC: Office of Educational Research and Improvement (ERIC Document Reproduction Service No. ED432570).

Darling-Hammond, L. (2000). Authentic assessment of teaching in context. Teaching and Teacher Education, 16(5–6), 523–545.

Darling-Hammond, L. (2006). Powerful teacher education: Lessons from exemplary programs. San Francisco, CA: Jossey-Bass.

Delandshere, G., & Arens, S. A. (2003). Examining the quality of evidence in preservice teacher portfolios. Journal of Teacher Education, 54(1), 57–72.

Derham, C., & Diperna, J. (2007). Digital professional portfolios of preservice teaching: An initial study of score reliability and validity. Journal of Technology and Teacher Education, 15(3), 363–381.

Donnelly, A. M. (2005). Let me show you my portfolio! Demonstrating competency through peer interview. Action in Teacher Education, 27(3), 55–63.


Educational Testing Service (2005). The PRAXIS Series—Elementary education: Curriculum, instruction, and assessment. Retrieved August 2, 2007, from http://www.ets.org/Media/Tests/PRAXIS/pdf/0011.pdf

Ellis, M. V., & Blustein, D. L. (1991). Developing and using educational and psychological tests and measures: The unificationist perspective. Journal of Counseling and Development, 69(6), 550–555.

Goodson, F. T. (2007). The electronic portfolio: Shaping an emerging genre. Journal of Adolescent & Adult Literacy, 50(6), 432–434.

Herner, L. M., Karayan, S., McKean, G., & Love, D. (2003). Special education teacher preparation and the electronic portfolio. Journal of Special Education Technology, 18(1), 44–49.

Klecker, B. M. (2000). Content validity of preservice teachers' portfolios in a standards-based program. Journal of Instructional Psychology, 27(1), 35–38.

Koretz, D., Stecher, B., Klein, S., McCaffrey, D., & Deibert, E. (1992). Can portfolios assess student performance and influence instruction? The 1991–92 Vermont experience (Report No. CSE-TR-371). Washington, DC: Office of Educational Research and Improvement (ERIC Document Reproduction Service No. ED365699).

Lambert, C., DePaepe, J., Lambert, L., & Anderson, D. (2007). E-folios in action. Kappa Delta Pi Record, 43(2), 76–83.

Ledoux, M. W., & McHenry, N. (2006). Electronic portfolio adoption for teacher education candidates. Early Childhood Education Journal, 34(2), 103–116.

Leighton, J. P. (2004). Avoiding misconception, misuse and missed opportunities: The collection of verbal reports in educational achievement testing. Educational Measurement: Issues and Practice, 24(4), 6–15.

Levine, A. (2006). Educating school teachers. Washington, DC: The Education Schools Project.

Ma, X., & Rada, R. (2005). Building a web-based accountability system in a teacher education program. Interactive Learning Environments, 13(1–2), 93–119.


Messick, S. (1994a). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 13–23.

Messick, S. (1994b, October). Alternative modes of assessment, uniform standards of validity. Paper presented at a Conference on Evaluating Alternatives to Traditional Testing for Selection, Bowling Green, OH.

Messick, S. (1995a). Standards of validity and validity of standards in performance assessment. Educational Measurement: Issues and Practice, 14(4), 5–8.

Messick, S. (1995b). Validity of psychological assessment: Validity of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50(9), 741–749.

Naizer, G. L. (1997). Validity and reliability issues of performance-portfolio assessment. Action in Teacher Education, 18(4), 1–9.

National Research Council (2001). Knowing what students know: The science and design of educational assessment. Committee on the Foundation of Assessment, J. W. Pellegrino, N. Chudowsky, & R. Glaser (Eds.). Division of Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.

NBPTS (2006). Portfolio instructions: Middle childhood generalist. Retrieved July 21, 2007, from http://www.nbpts.org/for_candidates/the_portfolio?ID=27&x=38&y=10

Norton-Meier, L. A. (2003). To efoliate or not to efoliate? The rise of the electronic portfolio in teacher education. Journal of Adolescent and Adult Literacy, 46(6), 516–518.

Pecheone, R. L., & Stansbury, K. (1996). Connecting teacher assessment and school reform. Elementary School Journal, 97(2), 163–177.

Porter, A. C. (2002). Measuring the content of instruction: Uses in research and practice. Educational Researcher, 31(7), 3–14.

Porter, A. C. (2007). Alignment as a teacher variable. Applied Measurement in Education, 20(1), 27–51.

Quatroche, D. J., Duarte, V., Huffman-Joley, G., & Watkins, S. (2002). Redefining assessment of preservice teachers: Standards-based exit portfolios. The Teacher Educator, 37(4), 268–281.

Reis, N. K., & Villaume, S. K. (2002). The benefits, tensions, and visions of portfolios as a wide-scale assessment for teacher education. Action in Teacher Education, 23(4), 10–17.

Sherry, A. C., & Bartlett, A. (2005). Worth of electronic portfolios to education majors: A "two by four" perspective. Journal of Educational Technology Systems, 33(4), 399–419.

Strudler, N., & Wetzel, K. (2005). The diffusion of electronic portfolios in teacher education: Issues of initiation and implementation. Journal of Research on Technology in Education, 37(4), 411–433.

Vaughn, M., & Everhart, B. (2005). A process of analysis of predictors on an assessment continuum of licensure candidates' success in K-12 classrooms. Research for Educational Reform, 10(1), 3–15.

Wetzel, K., & Strudler, N. (2006). Costs and benefits of electronic portfolios in teacher education: Student voices. Journal of Computing in Teacher Education, 22(3), 99–108.

Woodward, H., & Nanlohy, P. (2004). Digital portfolios: Fact or fashion? Assessment and Evaluation in Higher Education, 29(2), 227–238.

Yao, Y., Foster, K., & Aldrich, J. (2006, April). Generalizability study of a team-based scoring approach. Paper presented at the Annual Meeting of the American Educational Research Association, San Francisco, CA.

Yao, Y., Aldrich, J., Foster, K., & Pecina, U. (2007). Preservice teachers' perceptions of portfolio as an assessment tool. Manuscript submitted for publication.

Zeichner, K., & Wray, S. (2001). The teaching portfolio in US teacher education programs: What we know and what we need to know. Teaching and Teacher Education, 17, 613–621.


Appendix A. MoSTEP Standards (Quality Indicators) for Preservice Teachers

1. Understands the central concepts, tools of inquiry, and structures of the discipline(s) within the context of a global society and creates learning experiences that make these aspects of subject matter meaningful for students.

2. Understands how students learn and develop, and provides learning opportunities that support the intellectual, social, and personal development of all students.

3. Understands how students differ in their approaches to learning and creates instructional opportunities that are adapted to diverse learners.

4. Recognizes the importance of short term and long range planning and curriculum development and develops, complements, and evaluates curriculum based upon students, district, and state performance standards.

5. Uses a variety of instructional strategies to encourage students' development of critical thinking, problem solving, and performance skills.

6. Uses an understanding of individual and group motivation and behavior to create a learning environment that encourages positive social interaction, active engagement in learning, and self-motivation.

7. Models effective verbal, nonverbal, and media communication techniques to foster active inquiry, collaboration, and supportive interaction in the classroom.

8. Understands and uses formal assessment strategies to evaluate and ensure the continuous intellectual, social, and physical development of the learner.

9. Is a reflective practitioner who continually assesses the effects of choices and actions on others and who actively seeks opportunities to grow professionally, utilizing assessment and professional growth to generate more learning for more students.

10. Fosters relationships with school colleagues, parents, and educational partners in the larger community to support student learning and well-being.

11. Understands theories and applications of technology in educational settings and has adequate technological skills to create meaningful learning opportunities for all students.

Appendix B. The Portfolio Template

Template columns: Quality Indicator / Performance Indicator, Course, Artifact or Reflection, and Date.

1.2.1 The preservice teacher understands the central concepts, tools of inquiry, and structures of the discipline(s) within the context of a global society and creates learning experiences that make these aspects of subject matter meaningful for students. Meta-Reflection (completed during Student Teaching).

1.2.1.1 knows the subject(s) applicable to the area(s) of certification or endorsement (defined by Subject Specific Competencies for Beginning Teachers in Missouri). Course/Artifact: EDCI 2101, Program of Study.

1.2.1.2 presents the subject(s) in multiple ways. Course/Artifact: EDCI 3220, Lesson plans (also add to LP file).

1.2.1.3 uses students' prior knowledge. Course/Artifact: EDCI 3210, Field Experience Reflection Journal.

1.2.1.4 engages students in the methods of inquiry used in the subject(s). Courses/Artifacts: EDCI 1310, Inquiry Project (includes partnered inquiry with elementary students); EDCI 4340, in-class activity, research paper, field experience; EDCI 4350.

1.2.1.5 creates interdisciplinary learning. Courses/Artifacts: EDCI 3420, Web quest, unit (also add to LP file); research paper, science integrated unit, lesson plan (also add to LP file); EDCI 4340.

1.2.2 The preservice teacher understands how students learn and develop, and provides learning opportunities that support the intellectual, social, and personal development of all students. Meta-Reflection (completed during Student Teaching).

1.2.2.1 knows and identifies child/adolescent development. Courses/Artifacts: EDCI 2240 or EDCI 2101, Developmental Stages Summary; EDCI 4830, Case Study.

1.2.2.2 strengthens prior knowledge with new ideas. Course/Artifact: EDCI 2310, iAdventure or Unit (also add to LP file).

1.2.2.3 encourages student responsibility. Courses/Artifacts: EDCI 4400, Classroom management plan or philosophy paper; EDCI 4340.

1.2.2.4 knows theories of learning. Course/Artifact: EDCI 2240 or EDCI 2101, Theories of Learning Summary.

Note: Only part of the portfolio template is reproduced here.

Appendix C. The Scoring Guide for Final Portfolio Review

Student Name ______________________________   Student No. ______________   Date ______________
Total Points* ______________   Rating** ______________   Raters ______________________________

*Points awarded according to the following scale:
1 = Still needs a lot of work   2 = Needs some work   3 = Satisfactory   4 = Solid work   5 = Excellent

Final Level Review Criteria (points; qualitative feedback is recorded for each criterion):

1. Artifacts and reflections for all of the Performance Indicators are complete, and the strategy and assessment file and lesson plan file are complete.   1 2 3 4 5   X 1 = ___/5

2. Quality of the artifacts and reflections used to address the Performance Indicators.   1 2 3 4 5   X 2 = ___/10

3. Quality of the aesthetics (colors, graphics, etc.), writing mechanics, spacing/formatting, and functioning of portfolio links and organization.   1 2 3 4 5   X 1 = ___/5

4. Quality of the eleven comprehensive Quality Indicator Meta-Reflections.   1 2 3 4 5   X 6 = ___/30

Total Points: ___/50

**Ratings awarded according to the following scale:
Not Passing Yet: fewer than 35 points   Passing: Satisfactory: 35–39 points   Passing: Good: 40–44 points   Passing: Excellent: 45–50 points

Meta-Reflections for Quality Indicators: 1.2.1   1.2.2   1.2.3   1.2.4   1.2.5   1.2.6   1.2.7   1.2.8   1.2.9   1.2.10   1.2.11
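A minimal sketch of the arithmetic implied by this scoring guide: four criteria rated 1 to 5, weighted 1, 2, 1, and 6 for a 50-point total, then mapped to the rating bands above. The criterion names below are shorthand for this illustration, not labels from the official guide.

    WEIGHTS = {"complete": 1, "artifact_quality": 2, "aesthetics": 1, "meta_reflections": 6}

    def portfolio_rating(ratings):
        # ratings: dict mapping each criterion to a score from 1 to 5
        total = sum(ratings[c] * w for c, w in WEIGHTS.items())  # maximum of 50
        if total < 35:
            band = "Not Passing Yet"
        elif total < 40:
            band = "Passing: Satisfactory"
        elif total < 45:
            band = "Passing: Good"
        else:
            band = "Passing: Excellent"
        return total, band

    print(portfolio_rating({"complete": 4, "artifact_quality": 4, "aesthetics": 5, "meta_reflections": 4}))
    # -> (41, 'Passing: Good')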


Appendix D. Writing the Reflections

The written reflections (for both the Performance Indicator assignments/artifacts and the Quality Indicator Meta-Reflections) are required to include the following:

• A short description of the assignment/artifact. This should include a brief description of the artifact's context and should also explain the knowledge base (experiential, observational, and theoretical) and decision-making that informed its creation.

• A description of how the assignment/artifact you are discussing meets a specific indicator (Performance Indicator or Quality Indicator). Be sure to use keywords from the indicator that make a clear connection between the assignment/artifact and the specific indicator. This should also demonstrate your understanding of the standards by which the teaching profession is evaluated.

• An assessment of what you have learned and the competence you gained from the activity or experience your assignment/artifact represents.

• An analysis of the assignment's/artifact's impact on PreK-12 student learning.

• A projection of what you might do in the future to increase your effectiveness related to the activity or experience reflected in the assignment/artifact.

Appendix E. Excerpts of a Sample Meta-Reflection on Quality Indicator (QI) 2

The three examples from my student teaching that serve as evidence that I have met this Quality Indicator are: (1) A sentence fragment lesson plan that applies the Constructivist Theory and supports intellectual development, (2) An editor's marks lesson plan that applies Vygotsky's Theory and supports social development, and (3) A personal poem lesson plan that applies Gardner's multiple intelligences and supports personal development. All of these examples were experiential during my student teaching. The first example, the sentence fragment lesson plan that applies the Constructivist Theory and supports intellectual development (Link: Sentence Fragment Lesson Plan), demonstrated my proficiency with the concepts of this Quality Indicator. I used peer partners to create complete sentences from sentence fragments found in magazine advertisements. Pairs of students actively looked to find sentence fragments in the advertisements and wrote complete sentences, building upon the sentence fragment. The students constructed new knowledge from activating their prior knowledge and connecting it to the concept. This lesson supported the intellectual development of students as well as Vygotsky's social learning theory because the students were strengthening their sentence skills while learning from each other. In the second example, an editor's marks lesson plan that applies Vygotsky's theory and supports social development (Link: Editor's Marks Lesson Plan), used peer partners working together to learn the editor's marks and demonstrated my proficiency of the Quality Indicator. From Vygotsky's theory, I learned that students learn from each other, and this idea was extended when each pair taught their editor's mark to the rest of the class using mnemonics and artwork. Besides developing the students' learning intellectually, the lesson helped students develop socially by using active listening, cooperation, and other social skills. For the third example, a personal poem lesson plan that applies Gardner's Multiple Intelligences and supports personal development (Link: Personal Poem Lesson Plan), demonstrated my proficiency of the Quality Indicator. The students wrote a personal poem that expressed their feelings, personal interests, and biographical information. They illustrated the poem in a variety of ways that demonstrated their individual strengths in verbal/linguistic, spatial, kinesthetic, and other multiple intelligences. When children are given choices of expression, such as in the personal poem, their personal development is supported and encouraged.

Note: The QI addressed by this Meta-Reflection is: Understands how students learn and develop, and provides learning opportunities that support the intellectual, social, and personal development of all students. Highlights were provided by the preservice teacher herself.
