The International Journal of Educational and Psychological Assessment, January 2012, Vol. 9(2)
© 2012 Time Taylor Academic Journals, ISSN 2094-0734

Assessing Higher Education Teachers through Peer Assistance and Review

Carlo Magno
De La Salle University, Manila
Abstract
The present study advances the practice of assessing teacher performance by constructing a rubric that is systematically anchored on an amalgamated professional practice and learner-centered framework (see Magno & Sembrano, 2009). The validity and reliability of the rubric were determined using both classical test theory and item response theory, and implications for a new way of looking at the function of teacher performance assessment results for higher education institutions are discussed. The rubric used by fellow teachers is called the Peer Assistance and Review Form (PARF). The items reflect learner-centered practices with four domains anchored on Danielson's Components of Professional Practice principles: planning and preparation, classroom environment, instruction, and professional responsibility. The rubric was pilot tested with 183 higher education faculty. The participants were observed by two raters in their class. Concordance of the two raters was established across the four domains (coefficient = .47).
2004; Kerchner & Koppich, 1993; Bruce & Ross, 2008). Peer evaluations in teaching are described as "involving teachers in the summative [also formative] evaluation of other teachers" (Goldstein, 2004, p. 397). It was further described by Graves, Sulewski, Dye, Deveans, Agras, and Pearson (2009, p. 186) as "evaluating one's peers allows the assessment of one's teaching by another person who has similar experience and goals." A more explicit definition of peer evaluation was provided by Bruce and Ross (2008, p. 350), who described it as:
a structured approach for building a community in which pairs of
teachers of similar experience and competence observe each other
teach, establish improvement goals, develop strategies to implement
goals, observe one another during the revised teaching, and provide
specific feedback.
The purposes of rating teachers, such as hiring, clinical supervision, and modeling, are best facilitated using peer evaluations. Teachers' performance from peer reviews should be conceptualized with the aim of helping teachers improve their teaching rather than solely pointing out their mistakes (Oakland & Hambleton, 2006; Stiggins, 2008). Peer review is described as a constructive process where the peer aims to provide assistance to a less experienced teacher in improving their instruction, with a focus on student-teacher interaction. Blackmore (2005) reiterated the constructive idea of peer review, where the aim of assessing teachers should be to bring about changes and improvement in the practice of teaching.
Goldstein (2003) indicates that there is a need for extensive research in the area of peer assessment of teacher performance, especially with regard to implementation issues. The present study constructed an instrument that serves the purpose of peer assistance and review for higher education faculty. This instrument is administered by faculty peers who provide feedback to the faculty in higher education.
Teachers' View of Peer Review
Peer review of teachers' performance is defined and described with several intentions, but the teachers who are constantly observed create their own views. These views are described in studies as thoughts and perceptions created by teachers as part of the process. Views were also quantitatively assessed using attitude scales reflecting certain components such as general attitudes and domain-specific attitudes (Wen, Tsai, & Chang, 2006).
The teachers' view of their fellow teachers' assessment was shown in the study of Kell and Annetts (2009), who invited teaching staff to verbalize their perceptions about the Peer Review of Teaching (PRT) and clarify issues. The teaching staff were asked to provide their personal reflections about the PRT. They found that giving the teaching staff ownership of the PRT makes them autonomous and helps them develop flexibility in the process. In terms of the rationale and purpose of the PRT, the staff saw it as formative and useful for personal and professional development, while the newer staff viewed it as summative and audit-like. Ethical concerns about the PRT included comments about lack of time and the review being potentially biased, which made some staff reluctant to participate. The affective issues
were complaints about pulling of rank and undercurrents of power gains. On the other hand, the study by Atwood, Taylor, and Hutchings (2000) on the peer review of teaching program for science professors was able to identify the barriers to the peer review practice. The barriers include: (1) fear, (2) uncertainty about what should be reviewed, and (3) uncertainty about how the process is reviewed.
A more positive approach to studying peer reviews was taken by Carter (2008). He presented useful ways for peer reviewers to enrich the peer review process. The tips are meant to make the review as pleasant as possible: (1) understand alternative views of teaching and learning; (2) prepare seriously for the pre-visit interview; (3) listen carefully to what the students say in the pre-visit interview; and (4) watch the students, not just the teacher. The views of Carter (2008) provide alternative ways of implementing the peer review process that focus more on the constructive aspect.
Milanowski (2006) explained that peer review can become more constructive when peers discuss performance problems and suggestions (without the responsibility for making an administrative evaluation, evaluators are able to provide more assistance toward improving performance). It is constructive when the function of the review is split into administrative and developmental roles. In the split role arrangement, developmental evaluation and feedback are provided by a peer mentor while administrative evaluation is provided by managers and peer evaluators; in the combined role arrangement, developmental evaluation, feedback, and administrative evaluation are all provided by a peer. The views of the teachers in the study about the peer review showed that ratees in the split role group were slightly less open to discussions of problems or weaknesses than those in the combined role group. The results of the interview showed that a larger proportion of those in the split role group reported being comfortable discussing their problems or weaknesses than those in the combined group. However, the difference is small. The study by Keig (2000) determined the perceptions of the faculty on several factors that might detract from and/or enhance their likelihood to take part in formative peer review of teaching. The study also determined the perceptions of faculty about how peer assessment might benefit the faculty, colleagues, students, and the institution. It found that faculty who are more willing to participate in peer review are less likely to detract from their fellow faculty. This indicates that faculty who engage in peer reviews have good intentions toward their fellow faculty.
Effects of Peer Reviews of Teaching
Different studies have shown that when peer reviews are intended as a positive and constructive approach, they can be beneficial for their intended outcomes (Bruce & Ross, 2009; Reid, 2008; Bernstein & Bass, 2000; Blackmore, 2005; Yon, Burnap, & Kohut, 2002; Kumrow & Dahlem, 2002). For example, an anonymous writer (2006) reported that when peer assistance and review was implemented statewide in Canada, it reinforced the value of teaching as a highly skilled vocation, helped teachers become more reflective about their teaching, and increased student learning as reflected in increased SAT scores. Bruce and Ross (2008) found that peer reviews increased teachers' efficacy. Moreover, Reid (2008) found
that teachers and peers saw opportunities for developing relationships. The implementation by Kumrow and Dahlem (2002) reported that the number and quality of classroom observations increased substantially.

The Present Study
The empirical studies about peer assistance and review are still not as rich as those about teachers' performance based on students' perspectives. The majority of the literature about peer assistance consists of articles or reviews explaining the process and how it can be implemented. The few completed studies report improvement in practice (Bruce & Ross, 2008; Kumrow & Dahlem, 2002), highlighting of teaching practices (Bernstein & Bass, 2000), development of a framework for teaching (Blackmore, 2005), and teacher autonomy (Yon, Burnap, & Kohut, 2002). These benefits necessitate the proper implementation of peer assistance and review in a higher education setting.
The present study constructed a rubric called the Peer Assistance and Review Form (PARF) that is applicable to Philippine higher education institutions and that also purports to yield the same benefits mentioned in the reviews. The rubric's validity and reliability were established using concordance among raters, convergence, item fit through the Rasch partial credit model, and confirmatory factor analysis. The items in the rubric are anchored on the learner-centered principles and Danielson's Components of Professional Practice. The learner-centered principles are perspectives that focus on the teacher's ability to facilitate the learners in their learning, the learning in the programs, and other processes that involve the learner (Magno & Sembrano, 2007; McCombs, 1997). On the other hand, Danielson's Components of Professional Practice identified aspects of the teacher's responsibilities that have been documented through empirical studies and theoretical research as promoting improved student learning (Danielson, 1996). The framework is divided into 22 components clustered into four domains of teaching (planning and preparation, classroom environment, instruction, and professional responsibility). The theoretical combination of the learner-centered principles and the components of professional practice in one framework was discussed in the study by Magno and Sembrano (2009, p. 168). The amalgamation was further described as a combination of aspects of the teaching and learning process. Moreover, this amalgamation is representative of the assessment of the teaching and learning process in higher education.
Method

Participants
The participants in the study were 183 randomly selected teachers in a higher education institution in Manila, Philippines. These teachers had finished their master's and doctorate degrees, while some were still in progress. These teachers were teaching in five major areas: multidisciplinary studies; management and information technology; hotel, restaurant, and institutional management; design and
arts; and deaf studies. A proportion of faculty members was randomly selected from each school to serve as ratees.
Instrument
The criteria used in the Peer Assistance and Review Form (PARF) were based on the four domains and the underlying components of Danielson's Components of Professional Practice. The descriptions for each criterion and the four gradations of responses are also framed within the learner-centered principles. The gradations of the responses for each criterion were established based on the descriptions of each domain and its components. The descriptions were confirmed and reviewed by higher education teachers and administrators through a focus group discussion (FGD) method. The faculty members invited as reviewers arrived at a consensus on the rating categories according to their suitability for ideal teaching and facilitation of learning in higher education. The FGD was facilitated by allowing the participants to determine whether the descriptors provided in the rubric were applicable to them, relevant to their teaching, phrased appropriately, produced consistent meanings for different users, and would have a wide variety of uses.
The revised rubric was distributed to all teaching faculty, who judged whether the items were relevant to their teaching.
A copy of the revised version of the PARF was given to experts in the field of performance assessment, specifically teacher performance assessment. The reviewers were given one week to accomplish the task. The definitions of the components and the purpose of the PARF were also provided so that the reviewers were guided in judging whether the criteria were relevant. After receiving the forms with comments, the instrument was revised once again.
The instrument that was pretested was composed of 88 items across the four domains: planning and preparation (25 items), classroom environment (21 items), instruction (22 items), and professional responsibility (20 items). Each item is rated with an analytic rubric using a four-point scale (4 = exemplary, 3 = successful, 2 = limited, 1 = poor).
Procedure
Before the actual observations commenced, the selected faculty members who served as raters were oriented on the process of conducting the peer assistance and review and how to accomplish the forms. The orientation was meant to train the faculty about the purpose, importance, and specific processes involved in the peer assistance and review. The orientation was conducted before the start of the term. After the training, each ratee was informed of the schedule of when they would be observed and rated. Each ratee was provided with a copy of the PARF in advance to prepare them for the actual observation. The faculty members serving as ratees were informed that the purpose of the observation was simply to test the instrument; it would have no impact on administrative evaluation or salary. The observations took place during class periods throughout the term. The raters visited and communicated with the ratee several times to complete
evidence for the scale. These visits and meetings were conducted outside of the classroom. The ratee was requested to provide a syllabus and other pertinent documents during the period of observation for the raters' reference. A detailed implementation guide for the observation was provided to the ratees and raters.
During the period of observation, the ratee was requested to refrain from giving exams, writing activities, group work, reporting, and similar tasks that would consume the entire period. This was to ensure that there would be some teaching samples to be observed and rated.
In the observation period, there were two raters for each faculty member: a primary and a secondary rater. This procedure was done to establish the concordance of the ratings. If there was no common time among the raters, the observations could take place in different periods. Each rater observed the same teacher in the same class.
The data from the pretest were encoded and analyzed for reliability and validity. Acceptable items were determined using the Polytomous Rasch Model (rating scale analysis) by assessing item fit (Andrich, 1978). The approach is a probabilistic measurement model for sequential integer scores such as those from a Likert scale. The WINSTEPS software was used to generate the results of the Polytomous Rasch Model. The PARF criteria with inadequate fit were revised.
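For reference, the general form of the Andrich (1978) rating scale model that underlies this analysis is sketched below; the person measure, item difficulty, and category thresholds use standard Rasch notation introduced here for illustration, since the article itself does not present the formula.

```latex
% Probability that person n responds in category x (x = 0, 1, ..., m) of item i
% under the Andrich (1978) rating scale model, with \tau_0 \equiv 0:
P(X_{ni} = x) =
  \frac{\exp\left[\sum_{k=0}^{x}\left(\beta_n - \delta_i - \tau_k\right)\right]}
       {\sum_{j=0}^{m}\exp\left[\sum_{k=0}^{j}\left(\beta_n - \delta_i - \tau_k\right)\right]}
```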
Results

The data from N = 183 teachers were used to analyze the reliability and validity of the PARF. Each ratee was rated by a primary and a secondary rater. Missing values in the data were treated using mean replacement, and descriptive statistics including means, standard deviations, skewness, and kurtosis were obtained. The reliability was also obtained using Cronbach's alpha. Convergent validity of the rating scale was established by correlating the factor scores for each rater and between the two raters. The Polytomous Rasch Model was used to investigate the step calibration of the scale and the fit of the items. The factor structure of the theoretical model was tested using Confirmatory Factor Analysis (CFA). Parceled solutions resulted in less bias in estimates of structural parameters under normal distributions than did solutions based on the individual items (Bandalos, 2002).
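The descriptive and internal consistency analyses described above can be illustrated with a minimal sketch; the statistical package actually used is not named in the article, and the simulated ratings matrix below is hypothetical.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 183 ratees x 88 PARF items rated 1-4 by one rater,
# with a missing entry to illustrate mean replacement.
rng = np.random.default_rng(0)
ratings = pd.DataFrame(rng.integers(1, 5, size=(183, 88)).astype(float))
ratings.iloc[0, 5] = np.nan

# Mean replacement of missing values (per item), as described in the Results.
ratings = ratings.fillna(ratings.mean())

# Descriptive statistics of the overall rating per teacher.
totals = ratings.mean(axis=1)
print(totals.mean(), totals.std(ddof=1), totals.skew(), totals.kurt())

# Internal consistency of the full 88-item scale.
print("alpha =", round(cronbach_alpha(ratings), 2))
```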
The means for the ratings given by the primary and secondary raters (M = 3.40 and M = 3.41) are high given the highest possible score of 4.0. The means provided by both raters are almost the same, indicating that the ratings were very consistent. The distribution of scores tends to be negatively skewed with peaked modes. This is consistent with the high values of the means, where the majority of the ratings were between 3 and 4 and very few gave a rating of 1.00.
The overall internal consistency of the scores, using Cronbach's alpha, for both primary and secondary raters is .98, which indicates high reliability. For the primary raters alone, the internal consistency is .98, and for the secondary raters alone, the internal consistency is .97, which also indicates high reliability.
When the ratings of the primary raters and secondary raters were tested for concordance, the resulting correlation coefficient (.47) was significant and at a
moderate level. This significant concordance also indicates the reliability of the
scale across two external raters. This implies similarity in the understanding of the
items and observations for the same teacher being rated.
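The concordance check between the two raters can be sketched as follows. The transcript does not preserve which coefficient was used (only the value .47), so the sketch assumes a Spearman rank correlation between the raters' overall ratings of the same teachers; the data and variable names are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical overall ratings (1-4 scale) of the same 183 teachers
# by the primary and secondary raters.
primary = rng.uniform(2.5, 4.0, size=183)
secondary = np.clip(primary + rng.normal(0, 0.3, size=183), 1, 4)

# Spearman rank correlation as one possible concordance index
# (the article reports a coefficient of .47 but not the statistic used).
rho, p_value = stats.spearmanr(primary, secondary)
print(f"rho = {rho:.2f}, p = {p_value:.4f}")
```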
The means, standard deviations, and internal consistencies were broken down by the domains of the instrument, and the results were still consistent across the primary and secondary raters. The mean ratings were still high (M = 3.38 to M = 3.45). This shows that even across domains the ratings between the primary and secondary raters for one teacher were consistent. In the same way, the Cronbach's alpha for each domain showed very high internal consistency.
The convergence of the domains was tested within the same rater and between the primary and secondary raters (Table 1).
Table 1
Convergence of the Domains for Primary and Secondary Raters

                                         Secondary Rater
Primary Rater                     1       2       3       4       M     SD   Cronbach's Alpha
1. Planning and preparation      ---    .76**a  .83**a  .64**a   3.45  0.36  .94
2. Classroom environment        .85**b   ---    .82**a  .66**a   3.38  0.37  .93
3. Instruction                  .88**b  .87**b   ---    .67**a   3.38  0.37  .94
4. Professional responsibility  .76**b  .73**b  .79**b   ---     3.40  0.38  .93
M                                3.45    3.38    3.39    3.39
SD                               0.32    0.33    0.32    0.33
Cronbach's alpha                 .93     .92     .92     .91

Note. a values represent correlations among the secondary raters; b values are correlations for the primary raters.
**p
Item fit mean square (MNSQ) using WINSTEPS was computed to
determine if the items under each domain have a unidimensional structure. MNSQ
INFIT values between 0.8 and 1.2 are acceptable. High values of item MNSQ indicate a lack of construct homogeneity with the other items in a scale, whereas low values indicate redundancy with other items (Linacre & Wright, 1998). Two Rasch analyses were conducted, one for the ratings provided by the primary raters and one for the ratings provided by the secondary raters.
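The screening rule described above can be illustrated with a minimal sketch that flags items from a hypothetical item-fit table, such as one exported from WINSTEPS; the item names and column labels are assumptions.

```python
import pandas as pd

# Hypothetical item-fit output (column names and values assumed for illustration).
item_fit = pd.DataFrame({
    "item": ["service_to_school", "quality_of_feedback", "lesson_adjustment", "student_interaction"],
    "infit_mnsq": [1.45, 0.62, 0.71, 1.31],
})

# Screening rule used in the article: INFIT MNSQ between 0.8 and 1.2 is acceptable.
heterogeneous = item_fit[item_fit["infit_mnsq"] > 1.2]  # lack construct homogeneity
redundant = item_fit[item_fit["infit_mnsq"] < 0.8]      # redundant with other items

print("Flagged for misfit (too high):", heterogeneous["item"].tolist())
print("Flagged for redundancy (too low):", redundant["item"].tolist())
```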
For the primary rater, four items lacked construct homogeneity, which means that they are not measuring the same construct as the other items. These items are about service to the school, participation in college-wide activities, enhancement of content-knowledge pedagogy, and service to the profession. On the other hand, six items are redundant with other items. These items are about instructional materials, lesson and unit structure, quality of feedback, lesson adjustment, and student progress in learning.
For the secondary rater, eight items lacked construct homogeneity. These items are about student interaction, importance of content, student pride in work, quality of questions, engagement of families and student services, service to the school, participation in college-wide activities, and service to the profession. On the other hand, three items were redundant with other items. These items are about quality of feedback, timeliness of feedback, and lesson adjustment.
A Confirmatory Factor Analysis (CFA) was conducted to examine the factor structure of Danielson's Components of Professional Practice as a four-factor scale. The first model tested a four-factor structure in which the indicators or manifest variables were the actual items (the ratings of the primary and secondary raters for each item were averaged). There were 25 items for planning and preparation, 21 items for classroom environment, 22 items for instruction, and 20 items for professional responsibility. The results of the measurement model showed that the four factors are significantly related and all 88 indicators had significant paths to their respective factors. However, the data did not fit the specified model, χ² = 8829.23, df = 3734, PGI = .57, Bentler-Bonett Normed Fit Index = .46, Bentler-Bonett Non-Normed Fit Index = .56. A second measurement model was constructed retaining the four factors with fewer constraints. The constraints were reduced by having fewer parameter estimates in the model. This was done by creating three parcels as indicators for each factor. The parcels were created by combining the item scores of both the primary and secondary raters. Given fewer indicators per factor, the df in the second analysis was reduced to 132, which yielded larger statistical power and better model fit. The results of the second analysis showed that all four factors are significantly correlated and each parcel is also significant. The fit of the model improved compared with the measurement model with more constraints, χ² = 262.47, df = 132, PGI = .86, Bentler-Bonett Normed Fit Index = .89, Bentler-Bonett Non-Normed Fit Index = .87. The results of the CFA showed that the four-factor structure of Danielson's Components of Professional Practice is adequate and can be used.
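The parceling step can be illustrated with a minimal sketch for one domain; the item count follows the article, but the simulated ratings, the item-to-parcel assignment, and the use of simple averaging to combine the two raters' scores are assumptions, and the CFA itself would be estimated in dedicated SEM software.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n_teachers, n_items = 183, 25  # e.g. the planning and preparation domain

# Hypothetical item-level ratings from the two raters for one domain.
primary = pd.DataFrame(rng.integers(1, 5, size=(n_teachers, n_items)))
secondary = pd.DataFrame(rng.integers(1, 5, size=(n_teachers, n_items)))

# Average the two raters' ratings item by item, as described for the first model.
averaged = (primary + secondary) / 2.0

# Form three parcels by splitting the items into three groups and averaging
# within each group (the actual item-to-parcel assignment is assumed here).
groups = np.array_split(np.arange(n_items), 3)
parcels = pd.DataFrame({
    f"planning_parcel_{i + 1}": averaged[list(idx)].mean(axis=1)
    for i, idx in enumerate(groups)
})

print(parcels.head())  # three indicators that would enter the CFA for this factor
```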
Discussion

A rating scale anchored on Danielson's Components of Professional Practice and the learner-centered principles was constructed to rate teachers' performance. The analysis involved statistics to determine the internal consistencies, convergence, item fit, and factor structure of the scale. These analyses yielded generally favorable results regarding the validity and reliability of the scale.
For the scale's internal consistency, the obtained Cronbach's alpha was high, given the ratings provided by the primary and secondary raters for the whole scale and for each factor. Internal consistency of the items was achieved in a similar fashion for the two raters. The items indicate that the scale is measuring the same
overall construct. When the internal consistency of the items was computed for each domain, high Cronbach's alpha values were also obtained. Even when the number of items was reduced, as in the case of each factor, high internal consistency was still achieved.
When the primary and secondary raters were tested on whether they concord on the same observation, a significant coefficient was obtained (.47). There is consistency of ratings across two separate raters. This consistency reflects a common understanding of the items' meaning and of the observation of the teacher being rated. This is a good indicator for future use of the instrument, considering that the actual implementation involves two or even multiple raters. These raters need to concord in their ratings of the same teacher to achieve a more consistent result of the teacher's performance. This concordance is facilitated by the items, where each rater had a common understanding and frame of assessment for the teacher being observed. When the concordance analysis was conducted for each domain, significant relationships occurred for the four factors across the two raters. The two raters not only have a consistent understanding and reference of observation for the whole scale; the same consistency is carried over to each factor.
The scale also showed convergence across the domains for each rater. The results show significant correlations among all four factors for both the primary and secondary raters. The same pattern of correlations was achieved for the primary and secondary raters. The pattern of correlations showed that the domains of planning and preparation, classroom environment, and instruction were highly correlated. Although all four factors were significantly correlated, the correlation coefficients of professional responsibility with the other factors were not as high as the coefficients among the first three. The same pattern of correlations holds for both the primary and secondary raters. This shows that professional responsibility is not seen as highly linked to teaching compared to the first three domains (planning and preparation, classroom environment, and instruction). The raters and teachers do not seem to consider professional responsibility to be integrated strongly with classroom performance or its translation into the actual teaching process, as compared to the kind of integration seen in the first three domains.
The item analysis using the Polytomous Rasch Model showed that the items on student interaction, importance of content, student pride in work, quality of questions, engagement of families and student services, service to the school, participation in college-wide activities, enhancement of content-knowledge pedagogy, and service to the profession are out of bounds compared to the other items. These items did not seem applicable to the majority of the teachers. There was agreement between the primary and secondary raters on this misfit, especially on three items (participation in college-wide activities, enhancement of content-knowledge pedagogy, and service to the profession). This was consistent with the convergence of the domains. Given these three items, the raters and teachers have a tendency to view participation in college activities, attendance at seminars, and publication as weakly integrated with their teaching performance or with their role in improving one's teaching (items of professional responsibility).
The item analysis using the Polytomous Rasch Model also showed that the items on instructional materials, lesson and unit structure, quality of feedback, timeliness of feedback, lesson adjustment, and student progress in learning
are redundant with the other items. There was agreement between the primary and secondary raters on quality of feedback and lesson adjustment. These items were most likely rated in the same way as the other items. These items were carefully reviewed again by the faculty, who agreed to remove them from the pool of items.
The adequacy of the model composed of a four-factor structure was supported in the study. This shows that the four factors (planning and preparation, classroom environment, instruction, and professional responsibility) can be used as essential components in assessing teacher performance in higher education. This shows that the scale effectively measures four distinct domains. Previous studies using Danielson's Components of Professional Practice were widely applied to teachers in elementary and high school. However, the present study showed the appropriateness of the domains even for higher education institutions.
The results of the present study point to three perspectives on assessing teacher performance: (1) the need to inculcate professional responsibility, such as research and continuing education programs, among higher education faculty; (2) the advantage of the instrument having multiple raters; and (3) the expectations that need to be set for higher education institutions in the Philippines.
Professional responsibility is an important part of higher education faculty work requirements. However, the study found that service to the profession such as research and publications, participation in school activities, and enhancement of pedagogy were less integrated with instruction among teachers in higher education. This scenario is typical in most higher education institutions, where the teacher's work is concentrated on teaching, whereas professional responsibility is underrated. Once a teacher is hired in a higher education institution, the teacher is defined by how much teaching load is given, and much of the expectation is placed on teaching. The teacher's entire semester is devoted to teaching, and no time is provided for professional responsibility such as engaging in research, looking for publication opportunities, and attending professional development activities. In comparison, universities and colleges in other countries balance both teaching and
research (Calma, 2009; Magno, 2009a). Colleges and universities in the Philippines have limited opportunities and resources for faculty to conduct research and establish their own research laboratories and facilities. This is reflected in the very few Philippine universities entering, and their very low standing in, the world university rankings of Times Higher Education (Magno, 2009a). For other professional and pedagogical enhancements, the selection is very limited and the funds provided for faculty to attend conferences within and outside the country are very minimal. The same scenario is true for teachers in grade school and high school. Much of the reward is for teaching and not for professional responsibilities such as research, publications, and involvement in professional organizations.
The strength of the Peer Assistance and Review Form developed in the study rests on the consistency obtained through multiple raters and the scale calibration procedure. The raters were consistent in their interpretations, ratings, and calibration of the scales. The calibration of the scale from lowest to highest in terms of its degree is one aspect of scale fidelity that most test developers neglect to report (Magno, 2009b). This procedure can be accurately estimated using a Polytomous Rasch Model. A new perspective for rating scales is not only to establish its internal
consistencies and factorial structure; it is also important to determine and report its scale calibration. The category structure allows scale developers to decide on the appropriateness of the scale length and the type of scale used. Another advantage that led to the results is the refined description of the scale framed in an analytic rubric format (Reddy & Andrade, 2010). The raters can easily and elaborately distinguish among the skills presented in each global criterion. This ensures the appropriate gradation of the scale.
Lastly, there is the need to look further at the standards and competencies for higher education teachers. This issue is addressed in the study by testing specific competencies required of higher education teachers. These standards of competencies need to be set to ensure that students benefit through instruction (Berdrow & Evers, 2009). Colleges and universities need to adhere to teaching and learning frameworks that will serve and carry out their mission and vision well. Very few universities in the Philippines adhere to specific teaching and learning thrusts, which has led to poor educational standards (Magno, 2009a). In the Philippine setting, the competencies of teachers in basic education are specified. Making the same move for higher education is not impossible because of the rich tradition of literature on higher education. The present study attempted to frame these competencies using an amalgamated framework of the learner-centered practices and Danielson's Components of Professional Practice (see Magno & Sembrano, 2009). This study pioneers the setting of specific teaching and learning frameworks for faculty in the Philippines.
The move toward rigorously assessing teacher performance needs to be advocated in Philippine higher education institutions to ensure accountability for graduates and quality of faculty. Assessing teacher performance also needs to take a developmental approach in which results are used to help teachers reach specified expectations (Bruce & Ross, 2009; Reid, 2008; Bernstein & Bass, 2000; Blackmore, 2005; Yon, Burnap, & Kohut, 2002; Kumrow & Dahlem, 2002). This move is carried out by having a good instrument to facilitate these benefits. The use of assessment instruments for rating teachers should coincide with practices that will also help teachers improve their teaching.
Having established a valid and reliable scale for teachers' performance means that a proper and appropriate assessment tool can be used. Rigorous assessment of teacher performance is known to occur in basic education (grade school and high school teachers) in the Philippine setting. There is very limited advocacy for sustaining teacher performance assessment and measures in Philippine higher education institutions because of the complexity of its structure (involvement in research and professional development). However, the present study pushed these frontiers first by providing an instrument evidenced to be appropriate and by demonstrating the possibility of proper assessment practices among higher education faculty.
References

Allison-Jones, L. L., & Hirt, J. B. (2004). Comparing the teaching effectiveness of part-time and full-time clinical nurse faculty. Nursing Education Perspectives, 25, 238-242.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561-573.
Anonymous. (2006). Standards-based teacher evaluations. Gifted Child Today, 29, 8-9.
Atwood, C. H., Taylor, J. W., & Hutchings, P. A. (2000). Why are chemists and other scientists afraid of the peer review of teaching? Journal of Chemical Education, 77, 239-244.
Bandalos, D. L. (2002). The effects of item parceling on goodness-of-fit and parameter estimate bias in structural equation modeling. Structural Equation Modeling, 9, 78-102.
Berdrow, I., & Evers, F. T. (2009). Bases of competence: An instrument for self and institutional assessment. Assessment and Evaluation in Higher Education, 35, 419-434.
Bernstein, D., & Bass, R. (2005). The scholarship of teaching and learning. Academe, 91, 37-44.
Blackmore, J. A. (2005). A critical evaluation of peer review via teaching observation within higher education. The International Journal of Educational Management, 19, 215-320.
Bruce, C. D., & Ross, J. A. (2008). A model for increasing reform implementation and teacher efficacy: Teacher peer coaching in grades 3 and 6 mathematics. Canadian Journal of Education, 31, 346-370.
Calma, A. (2010). The context of research training in the Philippines: Some key areas and their implications. The Asia-Pacific Education Researcher, 18, 167-184.
Carter, V. K. (2008). Five steps to become a better peer reviewer. College Teaching, 56, 85-90.
Centra, J. A. (1998). The development of the student instructional report II. Princeton, NJ: Educational Testing Service.
Danielson, C. (1996). Enhancing professional practice: A framework for teaching. Alexandria, VA: Association for Supervision and Curriculum Development.
Goldstein, J. (2003). Making sense of distributed leadership: The case of peer assistance and review. Educational Evaluation and Policy Analysis, 25, 397-421.
Goldstein, J. (2004). Making sense of distributed leadership: The case of peer assistance and review. Educational Evaluation and Policy Analysis, 26, 173-197.
Gosling, D. (2002). Models of peer observation of teaching. LTSN Generic Centre.
Graves, G., Sulewski, C. A., Dye, H. A., Deveans, T. M., Agras, N. M., & Pearson, J. M. (2009). How are you doing? Assessing effectiveness in teaching mathematics. Primus, 19, 174-193.
Heckert, T. M., Latier, A., Ringwald, A., & Silvey, B. (2006). Relation of course, instructor, and student characteristics to dimensions. College Student Journal, 40, 1-11.
Howard, F. J., Helms, M. M., & Lawrence, E. P. (1997). Development and assessment of effective teaching: An integrative model for implementation in schools of business administration. Quality Assurance in Education, 5, 159-161.
Keig, L. (2000). Formative peer review of teaching: Attitudes of faculty at liberal arts colleges toward colleague assessment. Journal of Personnel Evaluation in Education, 14, 67-87.
Kell, C., & Annetts, S. (2009). Peer review of teaching: Embedded practice or policy-holding complacency? Innovations in Education and Teaching International, 46, 61-70.
Kerchner, C. T., & Koppich, J. E. (1993). A union of professionals: Labor relations and education reform. New York: Teachers College Press.
Kumrow, D., & Dahlem, B. (2002). Is peer review an effective approach for evaluating teachers? The Clearing House, 75, 236-240.
Linacre, J. M., & Wright, B. D. (1998). A user's guide to Winsteps, Bigsteps, and Ministeps: Rasch-model computer programs. Chicago: MESA Press.
Louis, K. S., & Marks, H. M. (1998). Does professional community affect the classroom? Teachers' work and student experience in restructuring schools. American Journal of Education, 106, 532-575.
Magno, C. (2009a). A metaevaluation study on the assessment of teacher performance in an assessment center in the Philippines. The International Journal of Educational and Psychological Assessment, 3, 75-93.
Magno, C. (2009b). Demonstrating the difference between classical test theory and item response theory using derived test data. The International Journal of Educational and Psychological Assessment, 1, 1-11.
Magno, C., & Sembrano, J. (2007). The role of teacher efficacy and characteristics on teaching effectiveness, performance, and use of learner-centered practices. The Asia-Pacific Education Researcher, 16, 73-91.
Magno, C., & Sembrano, J. (2010). Integrating learner-centeredness and teaching performance in a theoretical model. International Journal of Teaching and Learning in Higher Education, 21(2), 158-170.
Marsh, H. W., & Bailey, M. (1993). Multidimensional students' evaluations of teaching effectiveness. The Journal of Higher Education, 64, 1-18.
McCombs, B. L. (1997). Self-assessment and reflection: Tools for promoting teacher changes toward learner-centered practices. NASSP Bulletin, 81, 1-14.
McLymont, E. F., & da Costa, J. L. (1998, April). Cognitive coaching: The vehicle for professional development and teacher collaboration. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Oakland, T., & Hambleton, R. K. (2006). International perspectives on academic assessment. New York: Springer.
Pike, C. K. (1998). A validation study of an instrument designed to measure teaching effectiveness. Journal of Social Work Education, 34, 261-272.
Reddy, Y. M., & Andrade, H. (2010). A review of rubric use in higher education. Assessment and Evaluation in Higher Education, 35, 435-448.
Reid, E. S. (2008). Mentoring peer mentors: Mentor education and support in the composition program. Composition Studies, 36, 1-31.
Ross, J. A., McDougall, D., & Hogaboam-Gray, A. (2002). Research on reform in mathematics education, 1993-2000. Alberta Journal of Educational Research, 48, 122-138.
Scriven, M. (1994). Duties of the teacher. Journal of Personnel Evaluation in Education, 8, 151-184.
Stiggins, R. (2008). Assessment for learning, the achievement gap, and truly effective schools. Portland, OR: ETS Assessment Training Institute.
Stolle, C., Goerss, B., & Watkins, M. (2005). Implementing portfolios in a teacher education program. Issues in Teacher Education, 14, 25-34.
Stringer, M., & Irwing, P. (1998). Students' evaluations of teaching effectiveness: A structural modelling approach. British Journal of Educational Psychology, 68, 409-511.
Tang, L. T. (1997). Teaching evaluation at a public institution of higher education: Factors related to the overall teaching effectiveness. Public Personnel Management, 26, 379-380.
Wen, M. L., Tsai, C., & Chang, C. (2006). Attitudes towards peer assessment: A comparison of the perspectives of pre-service and in-service teachers. Innovations in Education and Teaching International, 43, 83-93.
Wray, S. (2008). Swimming upstream: Shifting the purpose of an existing teaching portfolio requirement. Professional Educator, 32, 1-17.
Yon, M., Burnap, C., & Kohut, G. (2002). Evidence of effective teaching: Perceptions of peer reviewers. College Teaching, 50, 104-111.
Young, S., & Shaw, D. G. (1999). Profiles of effective college and university teachers. The Journal of Higher Education, 70, 670-687.
About the Author

Dr. Carlo Magno is presently a faculty member of the Counseling and Educational Psychology Department at De La Salle University, Manila. Most of his research focuses on the development of different forms of teacher assessment protocols. He is involved with several projects that involve the assessment of teacher competencies in the Philippines. Further correspondence can be addressed to him at the College of Education, De La Salle University, 2401 Taft Ave., Manila, Philippines, e-mail: