The International Journal of Educational and Psychological Assessment, January 2012, Vol. 9(2)
© 2012 Time Taylor Academic Journals, ISSN 2094-0734

Assessing Higher Education Teachers through Peer Assistance and Review

Carlo Magno
De La Salle University, Manila
Abstract
The present study advances the practice of assessing teacher performance by constructing a rubric that is systematically anchored on an amalgamated professional practice and learner-centered framework (see Magno & Sembrano, 2009). The validity and reliability of the rubric were determined using both classical test theory and item response theory, and implications for a new way of looking at the function of teacher performance assessment results for higher education institutions are discussed. The rubric used by fellow teachers is called the Peer Assistance and Review Form (PARF). The items reflect learner-centered practices with four domains anchored on Danielson's Components of Professional Practice principles: planning and preparation, classroom environment, instruction, and professional responsibility. The rubric was pilot tested with 183 higher education faculty. The participants were observed by two raters in their class. Concordance of the two raters was established across the four domains (coefficient = .47).
2004; Kerchner & Koppich, 1993; Bruce & Ross, 2008). Peer evaluations in teaching are described as "involving teachers in the summative [also formative] evaluation of other teachers" (Goldstein, 2004, p. 397). It was further described by Graves, Sulewski, Dye, Deveans, Agras, and Pearson (2009, p. 186) as "evaluating one's peers allows the assessment of one's teaching by another person who has similar experience and goals." A more explicit definition of peer evaluation was provided by Bruce and Ross (2008, p. 350), who described it as:
a structured approach for building a community in which pairs of
teachers of similar experience and competence observe each other
teach, establish improvement goals, develop strategies to implement
goals, observe one another during the revised teaching, and provide
specific feedback.
The purposes of rating teachers, such as hiring, clinical supervision, and modeling, are best facilitated using peer evaluations. Teachers' performance from peer reviews should be conceptualized with the aim of helping teachers improve their teaching rather than solely pointing out their mistakes (Oakland & Hambleton, 2006; Stiggins, 2008). Peer review is described as a constructive process where the peer aims to provide assistance to a less experienced teacher in improving their instruction, with a focus on student-teacher interaction. Blackmore (2005) reiterated the constructive idea of peer review, where the aim of assessing teachers should be to bring about changes and improvement in the practice of teaching.
Goldstein (2003) indicates that there is a need for extensive research in the area of peer assessment of teacher performance, especially with regard to implementation issues. The present study constructed an instrument that serves the purpose of peer assistance and review for higher education faculty. This instrument is administered by faculty peers who provide feedback to the faculty in higher education.
Teachers' View of Peer Review
Peer review of teachers' performance is defined and described with several intentions, but the teachers who are constantly observed create their own views. These views are described in studies as thoughts and perceptions created by teachers as part of the process. Views were also quantitatively assessed using attitude scales reflecting certain components such as general attitudes and domain-specific attitudes (Wen, Tsai, & Chang, 2006).
The teachers' view of their fellow teachers' assessment was shown in the study of Kell and Annetts (2009), who invited teaching staff to verbalize their perceptions about the Peer Review of Teaching (PRT) and clarify issues. The teaching staff were asked to provide their personal reflections about the PRT. They found that giving the teaching staff ownership of the PRT makes them autonomous and helps them develop flexibility in the process. In terms of the rationale and purpose of the PRT, the staff saw it as formative and useful for personal and professional development, while the newer staff viewed it as summative and audit-like. Ethical concerns about the PRT included comments about lack of time and the review being potentially biased, which made some staff reluctant to participate. The affective issues
were complaints about pulling of rank and undercurrents of power gains. On the other hand, the study by Atwood, Taylor, and Hutchings (2000) on the peer review of teaching program for science professors was able to identify the barriers to the peer review practice. The barriers include: (1) fear, (2) uncertainty about what should be reviewed, and (3) uncertainty about how the process is reviewed.
A more positive approach to studying peer reviews was taken by Carter (2008). He presented useful ways for peer reviewers to enrich the peer review process. The tips are meant to make the review as pleasant as possible: (1) understand alternative views of teaching and learning; (2) prepare seriously for the pre-visit interview; (3) listen carefully to what the students say in the pre-visit interview; and (4) watch the students, not just the teacher. The views of Carter (2008) provide alternative ways of implementing the peer review process that focus more on the constructive aspect.
Milanowski (2006) explained that peer review can become more constructive when peers discuss performance problems and suggestions (without the responsibility for making an administrative evaluation, evaluators are able to provide more assistance toward improving performance). It is constructive when the function of the review is split into administrative and developmental roles. In the split role arrangement, developmental evaluation and feedback are provided by a peer mentor while administrative evaluation is provided by managers and peer evaluators; in the combined role arrangement, developmental evaluation, feedback, and administrative evaluation are all provided by a peer. The views of the teachers in the study about the peer review showed that ratees in the split role group were slightly less open to discussions of problems or weaknesses than those in the combined role group. The results of the interview showed that a larger proportion of those in the split role group reported being comfortable discussing their problems or weaknesses than those in the combined group. However, the difference is small. The study by Keig (2000) determined the perceptions of the faculty on several factors that might detract from and/or enhance their likelihood to take part in formative peer review of teaching. The study also determined the perceptions of faculty about how peer assessment might benefit the faculty, colleagues, students, and the institution. It found that faculty who are more willing to participate in peer review are less likely to detract from their fellow faculty. This indicates that faculty who engage in peer reviews have good intentions toward their fellow faculty.
Effects of Peer Reviews of Teaching
Different studies have shown that when peer reviews are intended as a positive and constructive approach, they can be beneficial for their intended outcomes (Bruce & Ross, 2009; Reid, 2008; Bernstein & Bass, 2000; Blackmore, 2005; Yon, Burnap, & Kohut, 2002; Kumrow & Dahlem, 2002). For example, an anonymous writer (2006) reported that when peer assistance and review was implemented statewide in Canada, it reinforced the value of teaching as a highly skilled vocation, helped teachers become more reflective about their teaching, and increased student learning as reflected in increased SAT scores. Bruce and Ross (2008) found that peer reviews increased teachers' efficacy. Moreover, Reid (2008) found
that teachers and peers saw opportunities for developing relationships. The implementation by Kumrow and Dahlem (2002) reported that the number and quality of classroom observations increased substantially.

The Present Study
The empirical studies about peer assistance and review are still not as rich as those about teachers' performance based on students' perspectives. The majority of the literature about peer assistance consists of articles or reviews explaining the process and how it can be implemented. The few completed studies report improvement in practice (Bruce & Ross, 2008; Kumrow & Dahlem, 2002), highlighting of teaching practices (Bernstein & Bass, 2000), development of a framework for teaching (Blackmore, 2005), and teacher autonomy (Yon, Burnap, & Kohut, 2002). These benefits necessitate the proper implementation of peer assistance and review in a higher education setting.
The present study constructed a rubric called the Peer Assistance and Review Form (PARF) that is applicable to Philippine higher education institutions and that also purports to yield the same benefits mentioned in the reviews. The rubric's validity and reliability were established using concordance among raters, convergence, item fit through the Rasch partial credit model, and confirmatory factor analysis. The items in the rubric are anchored on the learner-centered principles and Danielson's Components of Professional Practice. The learner-centered principles are perspectives that focus on the teacher's ability to facilitate the learners in their learning, the learning in the programs, and other processes that involve the learner (Magno & Sembrano, 2007; McCombs, 1997). On the other hand, Danielson's Components of Professional Practice identified aspects of the teacher's responsibilities that have been documented through empirical studies and theoretical research as promoting improved student learning (Danielson, 1996). The framework is divided into 22 components clustered into four domains of teaching (planning and preparation, classroom environment, instruction, and professional responsibility). The theoretical combination of the learner-centered principles and the components of professional practice in one framework was discussed in the study by Magno and Sembrano (2009, p. 168). The amalgamation was further described as a combination of aspects of the teaching and learning process. Moreover, this amalgamation is representative of the assessment of the teaching and learning process in higher education.
Method

Participants
The participants in the study were 183 randomly selected teachers in a higher education institution in Manila, Philippines. These teachers had finished their master's and doctorate degrees, while some were still in progress. These teachers were teaching in five major areas: multidisciplinary studies; management and information technology; hotel, restaurant, and institutional management; design and
arts; and deaf studies. A proportion of faculty members was randomly selected from each school to serve as ratees.
Instrument
The criteria used in the Peer Assistance and Review Form (PARF) were based on the four domains and the underlying components of Danielson's Components of Professional Practice. The descriptions for each criterion and the four gradations of responses are also framed within the learner-centered principles. The gradations of the responses for each criterion were established based on the descriptions of each domain and its components. The descriptions were confirmed and reviewed by higher education teachers and administrators through a focus group discussion (FGD) method. The faculty members invited as reviewers arrived at a consensus on the rating categories according to their suitability for ideal teaching and facilitation of learning in higher education. The FGD was facilitated by allowing the participants to determine whether the descriptors provided in the rubric were applicable to them, relevant to their teaching, phrased appropriately, produced consistent meanings for different users, and would have a wide variety of uses.
The revised rubric was distributed to all teaching faculty, who judged whether the items were relevant to their teaching.
A copy of the revised version of the PARF was given to experts in the field of performance assessment, specifically teacher performance assessment. The reviewers were given one week to accomplish the task. The definitions of the components and the purpose of the PARF were also provided so that the reviewers were guided in judging whether the criteria were relevant. After receiving the forms with comments, the instrument was revised once again.
The instrument that was pretested was composed of 88 items across the four domains: planning and preparation (25 items), classroom environment (21 items), instruction (22 items), and professional responsibility (20 items). Each item is rated with an analytic rubric using a four-point scale (4 = exemplary, 3 = successful, 2 = limited, 1 = poor).
Procedure
Before the actual observations commenced, the selected faculty members who served as raters were oriented on the process of conducting the peer assistance and review and how to accomplish the forms. The orientation was meant to train the faculty about the purpose, importance, and specific processes involved in the peer assistance and review. The orientation was conducted before the start of the term. After the training, each ratee was informed of the schedule of when they would be observed and rated. Each ratee was provided with a copy of the PARF in advance to prepare them for the actual observation. The faculty members serving as ratees were informed that the purpose of the observation was simply to test the instrument; it would have no impact on administrative evaluation or salary. The observations took place during class periods throughout the term. The raters visited and communicated with the ratee several times to complete
evidence for the scale. These visits and meetings were conducted outside of the classroom. The ratee was requested to provide a syllabus and other pertinent documents during the period of observation for the raters' reference. A detailed implementation guide for the observation was provided to the ratees and raters.
During the period of observation, the ratee was requested to refrain from giving exams, writing activities, group work, reporting, and similar tasks that would consume the entire period. This was to ensure that there would be some teaching samples to be observed and rated.
In the observation period, there were two raters for each faculty member: a primary and a secondary rater. This procedure was done to establish the concordance of the ratings. If there was no common time among the raters, the observations could take place in different periods. Each rater observed the same teacher in the same class.
The data from the pretest were encoded and analyzed for reliability and validity. Acceptable items were determined using the Polytomous Rasch Model (rating scale analysis) by assessing item fit (Andrich, 1978). The approach is a probabilistic measurement model for sequential integer scores such as those from a Likert scale. The WINSTEPS software was used to generate the results of the Polytomous Rasch Model. The PARF criteria with inadequate fit were revised.
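For reference, the general form of the Andrich (1978) rating scale model that underlies this analysis is sketched below; the person measure, item difficulty, and category thresholds use standard Rasch notation introduced here for illustration, since the article itself does not present the formula.

```latex
% Probability that person n responds in category x (x = 0, 1, ..., m) of item i
% under the Andrich (1978) rating scale model, with \tau_0 \equiv 0:
P(X_{ni} = x) =
  \frac{\exp\left[\sum_{k=0}^{x}\left(\beta_n - \delta_i - \tau_k\right)\right]}
       {\sum_{j=0}^{m}\exp\left[\sum_{k=0}^{j}\left(\beta_n - \delta_i - \tau_k\right)\right]}
```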
Results

The data from N = 183 teachers were used to analyze the reliability and validity of the PARF. Each ratee was rated by a primary and a secondary rater. Missing values in the data were treated using mean replacement, and descriptive statistics including means, standard deviations, skewness, and kurtosis were obtained. The reliability was also obtained using Cronbach's alpha. Convergent validity of the rating scale was established by correlating the factor scores for each rater and between the two raters. The Polytomous Rasch Model was used to investigate the step calibration of the scale and the fit of the items. The factor structure of the theoretical model was tested using Confirmatory Factor Analysis (CFA). Parceled solutions resulted in less bias in estimates of structural parameters under normal distributions than did solutions based on the individual items (Bandalos, 2002).
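The descriptive and internal consistency analyses described above can be illustrated with a minimal sketch; the statistical package actually used is not named in the article, and the simulated ratings matrix below is hypothetical.

```python
import numpy as np
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 183 ratees x 88 PARF items rated 1-4 by one rater,
# with a missing entry to illustrate mean replacement.
rng = np.random.default_rng(0)
ratings = pd.DataFrame(rng.integers(1, 5, size=(183, 88)).astype(float))
ratings.iloc[0, 5] = np.nan

# Mean replacement of missing values (per item), as described in the Results.
ratings = ratings.fillna(ratings.mean())

# Descriptive statistics of the overall rating per teacher.
totals = ratings.mean(axis=1)
print(totals.mean(), totals.std(ddof=1), totals.skew(), totals.kurt())

# Internal consistency of the full 88-item scale.
print("alpha =", round(cronbach_alpha(ratings), 2))
```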
The means for the ratings given by the primary and secondary raters (M = 3.40 and M = 3.41) are high given the highest possible score of 4.0. The means provided by both raters are almost the same, indicating that the ratings were very consistent. The distribution of scores tends to be negatively skewed with peaked modes. This is consistent with the high values of the means, where the majority of the ratings were between 3 and 4 and very few gave a rating of 1.00.
The overall internal consistency of the scores, using Cronbach's alpha, for both primary and secondary raters is .98, which indicates high reliability. For the primary raters alone, the internal consistency is .98, and for the secondary raters alone, the internal consistency is .97, which also indicates high reliability.
When the ratings of the primary raters and secondary raters were tested for concordance, the resulting correlation coefficient (.47) was significant and at a
moderate level. This significant concordance also indicates the reliability of the
scale across two external raters. This implies similarity in the understanding of the
items and observations for the same teacher being rated.
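The concordance check between the two raters can be sketched as follows. The transcript does not preserve which coefficient was used (only the value .47), so the sketch assumes a Spearman rank correlation between the raters' overall ratings of the same teachers; the data and variable names are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical overall ratings (1-4 scale) of the same 183 teachers
# by the primary and secondary raters.
primary = rng.uniform(2.5, 4.0, size=183)
secondary = np.clip(primary + rng.normal(0, 0.3, size=183), 1, 4)

# Spearman rank correlation as one possible concordance index
# (the article reports a coefficient of .47 but not the statistic used).
rho, p_value = stats.spearmanr(primary, secondary)
print(f"rho = {rho:.2f}, p = {p_value:.4f}")
```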
The means, standard deviations, and internal consistencies were broken down by the domains of the instrument, and the results were still consistent across the primary and secondary raters. The mean ratings were still high (M = 3.38 to M = 3.45). This shows that even across domains the ratings between the primary and secondary raters for one teacher were consistent. In the same way, the Cronbach's alpha for each domain showed very high internal consistency.
The convergence of the domains was tested within the same rater and between the primary and secondary raters (Table 1).
Table 1
Convergence of the Domains for Primary and Secondary Raters

                                         Secondary Rater
Primary Rater                     1       2       3       4       M     SD   Cronbach's Alpha
1. Planning and preparation      ---    .76**a  .83**a  .64**a   3.45  0.36  .94
2. Classroom environment        .85**b   ---    .82**a  .66**a   3.38  0.37  .93
3. Instruction                  .88**b  .87**b   ---    .67**a   3.38  0.37  .94
4. Professional responsibility  .76**b  .73**b  .79**b   ---     3.40  0.38  .93
M                                3.45    3.38    3.39    3.39
SD                               0.32    0.33    0.32    0.33
Cronbach's alpha                 .93     .92     .92     .91

Note. a values represent correlations among the secondary raters; b values are correlations for the primary raters.
**p
Item fit mean square (MNSQ) using WINSTEPS was computed to
determine if the items under each domain have a unidimensional structure. MNSQ
INFIT values between 0.8 and 1.2 are acceptable. High values of item MNSQ indicate a lack of construct homogeneity with the other items in a scale, whereas low values indicate redundancy with other items (Linacre & Wright, 1998). Two Rasch analyses were conducted, one for the ratings provided by the primary raters and one for the ratings provided by the secondary raters.
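The screening rule described above can be illustrated with a minimal sketch that flags items from a hypothetical item-fit table, such as one exported from WINSTEPS; the item names and column labels are assumptions.

```python
import pandas as pd

# Hypothetical item-fit output (column names and values assumed for illustration).
item_fit = pd.DataFrame({
    "item": ["service_to_school", "quality_of_feedback", "lesson_adjustment", "student_interaction"],
    "infit_mnsq": [1.45, 0.62, 0.71, 1.31],
})

# Screening rule used in the article: INFIT MNSQ between 0.8 and 1.2 is acceptable.
heterogeneous = item_fit[item_fit["infit_mnsq"] > 1.2]  # lack construct homogeneity
redundant = item_fit[item_fit["infit_mnsq"] < 0.8]      # redundant with other items

print("Flagged for misfit (too high):", heterogeneous["item"].tolist())
print("Flagged for redundancy (too low):", redundant["item"].tolist())
```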
For the primary rater, four items lacked construct homogeneity, which means that they are not measuring the same construct as the other items. These items are about service to the school, participation in college-wide activities, enhancement of content-knowledge pedagogy, and service to the profession. On the other hand, six items are redundant with other items. These items are about instructional materials, lesson and unit structure, quality of feedback, lesson adjustment, and student progress in learning.
For the secondary rater, eight items lacked construct homogeneity. These items are about student interaction, importance of content, student pride in work, quality of questions, engagement of families and student services, service to the school, participation in college-wide activities, and service to the profession. On the other hand, three items were redundant with other items. These items are about quality of feedback, timeliness of feedback, and lesson adjustment.
A Confirmatory Factor Analysis (CFA) was conducted to examine the factor structure of Danielson's Components of Professional Practice as a four-factor scale. The first model tested a four-factor structure in which the indicators or manifest variables were the actual items (the ratings of the primary and secondary raters for each item were averaged). There were 25 items for planning and preparation, 21 items for classroom environment, 22 items for instruction, and 20 items for professional responsibility. The results of the measurement model showed that the four factors are significantly related and all 88 indicators had significant paths to their respective factors. However, the data did not fit the specified model, χ² = 8829.23, df = 3734, PGI = .57, Bentler-Bonett Normed Fit Index = .46, Bentler-Bonett Non-Normed Fit Index = .56. A second measurement model was constructed retaining the four factors with fewer constraints. The constraints were reduced by having fewer parameter estimates in the model. This was done by creating three parcels as indicators for each factor. The parcels were created by combining the item scores of both the primary and secondary raters. Given fewer indicators per factor, the df in the second analysis was reduced to 132, which yielded larger statistical power and better model fit. The results of the second analysis showed that all four factors are significantly correlated and each parcel is also significant. The fit of the model improved compared with the measurement model with more constraints, χ² = 262.47, df = 132, PGI = .86, Bentler-Bonett Normed Fit Index = .89, Bentler-Bonett Non-Normed Fit Index = .87. The results of the CFA showed that the four-factor structure of Danielson's Components of Professional Practice is adequate and can be used.
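The parceling step can be illustrated with a minimal sketch for one domain; the item count follows the article, but the simulated ratings, the item-to-parcel assignment, and the use of simple averaging to combine the two raters' scores are assumptions, and the CFA itself would be estimated in dedicated SEM software.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n_teachers, n_items = 183, 25  # e.g. the planning and preparation domain

# Hypothetical item-level ratings from the two raters for one domain.
primary = pd.DataFrame(rng.integers(1, 5, size=(n_teachers, n_items)))
secondary = pd.DataFrame(rng.integers(1, 5, size=(n_teachers, n_items)))

# Average the two raters' ratings item by item, as described for the first model.
averaged = (primary + secondary) / 2.0

# Form three parcels by splitting the items into three groups and averaging
# within each group (the actual item-to-parcel assignment is assumed here).
groups = np.array_split(np.arange(n_items), 3)
parcels = pd.DataFrame({
    f"planning_parcel_{i + 1}": averaged[list(idx)].mean(axis=1)
    for i, idx in enumerate(groups)
})

print(parcels.head())  # three indicators that would enter the CFA for this factor
```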
Discussion

A rating scale anchored on Danielson's Components of Professional Practice and the learner-centered principles was constructed to rate teachers' performance. The analysis involved statistics to determine the internal consistencies, convergence, item fit, and factor structure of the scale. These analyses yielded generally favorable results regarding the validity and reliability of the scale.
For the scale's internal consistency, the obtained Cronbach's alpha was high, given the ratings provided by the primary and secondary raters for the whole scale and for each factor. Internal consistency of the items was achieved in a similar fashion for the two raters. The items indicate that the scale is measuring the same
overall construct. When the internal consistency of the items was computed for each domain, high Cronbach's alpha values were also obtained. Even when the number of items was reduced, as in the case of each factor, high internal consistency was still achieved.
When the primary and secondary raters were tested on whether they concord on the same observation, a significant coefficient was obtained (.47). There is consistency of ratings across two separate raters. This consistency reflects a common understanding of the items' meaning and of the observation of the teacher being rated. This is a good indicator for future use of the instrument, considering that the actual implementation involves two or even multiple raters. These raters need to concord in their ratings of the same teacher to achieve a more consistent result of the teacher's performance. This concordance is facilitated by the items, where each rater had a common understanding and frame of assessment for the teacher being observed. When the concordance analysis was conducted for each domain, significant relationships occurred for the four factors across the two raters. The two raters not only have a consistent understanding and reference of observation for the whole scale; the same consistency is carried over to each factor.
The scale also showed convergence across the domains for each rater. The results show significant correlations among all four factors for both the primary and secondary raters. The same pattern of correlations was achieved for the primary and secondary raters. The pattern of correlations showed that the domains of planning and preparation, classroom environment, and instruction were highly correlated. Although all four factors were significantly correlated, the correlation coefficients of professional responsibility with the other factors were not as high as the coefficients among the first three. The same pattern of correlations holds for both the primary and secondary raters. This shows that professional responsibility is not seen as highly linked to teaching compared to the first three domains (planning and preparation, classroom environment, and instruction). The raters and teachers do not seem to consider professional responsibility to be integrated strongly with classroom performance or its translation into the actual teaching process, as compared to the kind of integration seen in the first three domains.
The item analysis using the Polytomous Rasch Model showed that the items on student interaction, importance of content, student pride in work, quality of questions, engagement of families and student services, service to the school, participation in college-wide activities, enhancement of content-knowledge pedagogy, and service to the profession are out of bounds compared to the other items. These items did not seem applicable to the majority of the teachers. There was agreement between the primary and secondary raters on this misfit, especially on three items (participation in college-wide activities, enhancement of content-knowledge pedagogy, and service to the profession). This was consistent with the convergence of the domains. Given these three items, the raters and teachers have a tendency to view participation in college activities, attendance at seminars, and publication as weakly integrated with their teaching performance or with their role in improving one's teaching (items of professional responsibility).
The item analysis using the Polytomous Rasch Model also showed that the items on instructional materials, lesson and unit structure, quality of feedback, timeliness of feedback, lesson adjustment, and student progress in learning
are redundant with the other items. There was agreement between the primary and secondary raters on quality of feedback and lesson adjustment. These items were most likely rated in the same way as the other items. These items were carefully reviewed again by the faculty, who agreed to remove them from the pool of items.
The adequacy of the model composed of a four-factor structure was supported in the study. This shows that the four factors (planning and preparation, classroom environment, instruction, and professional responsibility) can be used as essential components in assessing teacher performance in higher education. This shows that the scale effectively measures four distinct domains. Previous studies using Danielson's Components of Professional Practice were widely applied to teachers in elementary and high school. However, the present study showed the appropriateness of the domains even for higher education institutions.
The results of the present study point to three perspectives on assessing teacher performance: (1) the need to inculcate professional responsibility, such as research and continuing education programs, among higher education faculty; (2) the advantage of the instrument having multiple raters; and (3) the expectations that need to be set for higher education institutions in the Philippines.
Professional responsibility is an important part of higher education faculty work requirements. However, the study found that service to the profession such as research and publications, participation in school activities, and enhancement of pedagogy were less integrated with instruction among teachers in higher education. This scenario is typical in most higher education institutions, where the teacher's work is concentrated on teaching, whereas professional responsibility is underrated. Once a teacher is hired in a higher education institution, the teacher is defined by how much teaching load is given, and much of the expectation is placed on teaching. The teacher's entire semester is devoted to teaching, and no time is provided for professional responsibility such as engaging in research, looking for publication opportunities, and attending professional development activities. In comparison, universities and colleges in other countries balance both teaching and
research (Calma, 2009; Magno, 2009a). Colleges and universities in the Philippines have limited opportunities and resources for faculty to conduct research and establish their own research laboratories and facilities. This is reflected in the very few Philippine universities entering, and their very low standing in, the world university rankings of Times Higher Education (Magno, 2009a). For other professional and pedagogical enhancements, the selection is very limited and the funds provided for faculty to attend conferences within and outside the country are very minimal. The same scenario is true for teachers in grade school and high school. Much of the reward is for teaching and not for professional responsibilities such as research, publications, and involvement in professional organizations.
The strength of the Peer Assistance and Review Form developed in the study rests on the consistency obtained through multiple raters and the scale calibration procedure. The raters were consistent in their interpretations, ratings, and calibration of the scales. The calibration of the scale from lowest to highest in terms of its degree is one aspect of scale fidelity that most test developers neglect to report (Magno, 2009b). This procedure can be accurately estimated using a Polytomous Rasch Model. A new perspective for rating scales is not only to establish its internal
consistencies and factorial structure; it is also important to determine and report its scale calibration. The category structure allows scale developers to decide on the appropriateness of the scale length and the type of scale used. Another advantage that led to the results is the refined description of the scale framed in an analytic rubric format (Reddy & Andrade, 2010). The raters can easily and elaborately distinguish among the skills presented in each global criterion. This ensures the appropriate gradation of the scale.
Lastly, there is the need to look further at the standards and competencies for higher education teachers. This issue is addressed in the study by testing specific competencies required of higher education teachers. These standards of competencies need to be set to ensure that students benefit through instruction (Berdrow & Evers, 2009). Colleges and universities need to adhere to teaching and learning frameworks that will serve and carry out their mission and vision well. Very few universities in the Philippines adhere to specific teaching and learning thrusts, which has led to poor educational standards (Magno, 2009a). In the Philippine setting, the competencies of teachers in basic education are specified. Making the same move for higher education is not impossible because of the rich tradition of literature on higher education. The present study attempted to frame these competencies using an amalgamated framework of the learner-centered practices and Danielson's Components of Professional Practice (see Magno & Sembrano, 2009). This study pioneers the setting of specific teaching and learning frameworks for faculty in the Philippines.
The move toward rigorously assessing teacher performance needs to be advocated in Philippine higher education institutions to ensure accountability for graduates and quality of faculty. Assessing teacher performance also needs to take a developmental approach in which results are used to help teachers reach specified expectations (Bruce & Ross, 2009; Reid, 2008; Bernstein & Bass, 2000; Blackmore, 2005; Yon, Burnap, & Kohut, 2002; Kumrow & Dahlem, 2002). This move is carried out by having a good instrument to facilitate these benefits. The use of assessment instruments for rating teachers should coincide with practices that will also help teachers improve their teaching.
Having established a valid and reliable scale for teachers' performance means that a proper and appropriate assessment tool can be used. Rigorous assessment of teacher performance is known to occur in basic education (grade school and high school teachers) in the Philippine setting. There is very limited advocacy for sustaining teacher performance assessment and measures in Philippine higher education institutions because of the complexity of its structure (involvement in research and professional development). However, the present study pushed these frontiers first by providing an instrument evidenced to be appropriate and by demonstrating the possibility of proper assessment practices among higher education faculty.
References

Allison-Jones, L. L., & Hirt, J. B. (2004). Comparing the teaching effectiveness of part-time and full-time clinical nurse faculty. Nursing Education Perspectives, 25, 238-242.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561-573.
Anonymous. (2006). Standards-based teacher evaluations. Gifted Child Today, 29, 8-9.
Atwood, C. H., Taylor, J. W., & Hutchings, P. A. (2000). Why are chemists and other scientists afraid of the peer review of teaching? Journal of Chemical Education, 77, 239-244.
Bandalos, D. L. (2002). The effects of item parceling on goodness-of-fit and parameter estimate bias in structural equation modeling. Structural Equation Modeling, 9, 78-102.
Berdrow, I., & Evers, F. T. (2009). Bases of competence: An instrument for self and institutional assessment. Assessment and Evaluation in Higher Education, 35, 419-434.
Bernstein, D., & Bass, R. (2005). The scholarship of teaching and learning. Academe, 91, 37-44.
Blackmore, J. A. (2005). A critical evaluation of peer review via teaching observation within higher education. The International Journal of Educational Management, 19, 215-320.
Bruce, C. D., & Ross, J. A. (2008). A model for increasing reform implementation and teacher efficacy: Teacher peer coaching in grades 3 and 6 mathematics. Canadian Journal of Education, 31, 346-370.
Calma, A. (2010). The context of research training in the Philippines: Some key areas and their implications. The Asia-Pacific Education Researcher, 18, 167-184.
Carter, V. K. (2008). Five steps to become a better peer reviewer. College Teaching, 56, 85-90.
Centra, J. A. (1998). The development of the student instructional report II. Princeton, NJ: Educational Testing Service.
Danielson, C. (1996). Enhancing professional practice: A framework for teaching. Alexandria, VA: Association for Supervision and Curriculum Development.
Goldstein, J. (2003). Making sense of distributed leadership: The case of peer assistance and review. Educational Evaluation and Policy Analysis, 25, 397-421.
Goldstein, J. (2004). Making sense of distributed leadership: The case of peer assistance and review. Educational Evaluation and Policy Analysis, 26, 173-197.
Gosling, D. (2002). Models of peer observation of teaching. LTSN Generic Centre.
Graves, G., Sulewski, C. A., Dye, H. A., Deveans, T. M., Agras, N. M., & Pearson, J. M. (2009). How are you doing? Assessing effectiveness in teaching mathematics. Primus, 19, 174-193.
Heckert, T. M., Latier, A., Ringwald, A., & Silvey, B. (2006). Relation of course, instructor, and student characteristics to dimensions. College Student Journal, 40, 1-11.
Howard, F. J., Helms, M. M., & Lawrence, E. P. (1997). Development and assessment of effective teaching: An integrative model for implementation in schools of business administration. Quality Assurance in Education, 5, 159-161.
Keig, L. (2000). Formative peer review of teaching: Attitudes of faculty at liberal arts colleges toward colleague assessment. Journal of Personnel Evaluation in Education, 14, 67-87.
Kell, C., & Annetts, S. (2009). Peer review of teaching: Embedded practice or policy-holding complacency? Innovations in Education and Teaching International, 46, 61-70.
Kerchner, C. T., & Koppich, J. E. (1993). A union of professionals: Labor relations and education reform. New York: Teachers College Press.
Kumrow, D., & Dahlem, B. (2002). Is peer review an effective approach for evaluating teachers? The Clearing House, 75, 236-240.
Linacre, J. M., & Wright, B. D. (1998). A user's guide to Winsteps, Bigsteps, and Ministeps: Rasch-model computer programs. Chicago: MESA Press.
Louis, K. S., & Marks, H. M. (1998). Does professional community affect the classroom? Teachers' work and student experience in restructuring schools. American Journal of Education, 106, 532-575.
Magno, C. (2009a). A metaevaluation study on the assessment of teacher performance in an assessment center in the Philippines. The International Journal of Educational and Psychological Assessment, 3, 75-93.
Magno, C. (2009b). Demonstrating the difference between classical test theory and item response theory using derived test data. The International Journal of Educational and Psychological Assessment, 1, 1-11.
Magno, C., & Sembrano, J. (2007). The role of teacher efficacy and characteristics on teaching effectiveness, performance, and use of learner-centered practices. The Asia-Pacific Education Researcher, 16, 73-91.
Magno, C., & Sembrano, J. (2010). Integrating learner-centeredness and teaching performance in a theoretical model. International Journal of Teaching and Learning in Higher Education, 21(2), 158-170.
Marsh, H. W., & Bailey, M. (1993). Multidimensional students' evaluations of teaching effectiveness. The Journal of Higher Education, 64, 1-18.
McCombs, B. L. (1997). Self-assessment and reflection: Tools for promoting teacher changes toward learner-centered practices. NASSP Bulletin, 81, 1-14.
McLymont, E. F., & da Costa, J. L. (1998, April). Cognitive coaching: The vehicle for professional development and teacher collaboration. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Oakland, T., & Hambleton, R. K. (2006). International perspectives on academic assessment. New York: Springer.
Pike, C. K. (1998). A validation study of an instrument designed to measure teaching effectiveness. Journal of Social Work Education, 34, 261-272.
Reddy, Y. M., & Andrade, H. (2010). A review of rubric use in higher education. Assessment and Evaluation in Higher Education, 35, 435-448.
Reid, E. S. (2008). Mentoring peer mentors: Mentor education and support in the composition program. Composition Studies, 36, 1-31.
Ross, J. A., McDougall, D., & Hogaboam-Gray, A. (2002). Research on reform in mathematics education, 1993-2000. Alberta Journal of Educational Research, 48, 122-138.
Scriven, M. (1994). Duties of the teacher. Journal of Personnel Evaluation in Education, 8, 151-184.
Stiggins, R. (2008). Assessment for learning, the achievement gap, and truly effective schools. Portland, OR: ETS Assessment Training Institute.
Stolle, C., Goerss, B., & Watkins, M. (2005). Implementing portfolios in a teacher education program. Issues in Teacher Education, 14, 25-34.
Stringer, M., & Irwing, P. (1998). Students' evaluations of teaching effectiveness: A structural modelling approach. British Journal of Educational Psychology, 68, 409-511.
Tang, L. T. (1997). Teaching evaluation at a public institution of higher education: Factors related to the overall teaching effectiveness. Public Personnel Management, 26, 379-380.
Wen, M. L., Tsai, C., & Chang, C. (2006). Attitudes towards peer assessment: A comparison of the perspectives of pre-service and in-service teachers. Innovations in Education and Teaching International, 43, 83-93.
Wray, S. (2008). Swimming upstream: Shifting the purpose of an existing teaching portfolio requirement. Professional Educator, 32, 1-17.
Yon, M., Burnap, C., & Kohut, G. (2002). Evidence of effective teaching: Perceptions of peer reviewers. College Teaching, 50, 104-111.
Young, S., & Shaw, D. G. (1999). Profiles of effective college and university teachers. The Journal of Higher Education, 70, 670-687.
About the Author

Dr. Carlo Magno is presently a faculty member of the Counseling and Educational Psychology Department at De La Salle University, Manila. Most of his research focuses on the development of different forms of teacher assessment protocols. He is involved with several projects that involve the assessment of teacher competencies in the Philippines. Further correspondence can be addressed to him at the College of Education, De La Salle University, 2401 Taft Ave., Manila, Philippines, e-mail: