Cornell University Teaching Evaluation Handbook
Third Edition, 1997
Chapter III-Supporting Data: Collection and Presentation
To increase the efficiency of the review process for all involved, it is imperative that the supporting data be collected in a way that reduces bias, is representative, and yields a valid assessment. It is also important to understand that the manner in which the data are presented can itself bias the evaluative outcome. Fortunately, this is an area that has stimulated abundant research. This chapter provides a synthesis of some of that work, focusing on two major sources of evaluation data: students and peers.
Use of Student Evaluation Data
It has been argued that students are not valid sources of evaluation information,
that their numerical and written responses on questionnaires used to make
tenure and promotion decisions are based on superficial criteria like
appearance and popularity. This assumption has not been empirically supported.
"Based on the findings of the meta-analysis, we can safely say that
student ratings of instruction are a valid index of instructional effectiveness.
Students do a pretty good job of distinguishing among teachers on the
basis of how much they have learned. Thus, the present study lends support
to the use of ratings as one component in the evaluation of teaching effectiveness.
Both administrators and faculty should feel secure that to some extent
ratings reflect an instructor's impact on students."1
Key in the use of student data is the notion that, as a data source, they are
only one component available to a committee making an informed judgment.
When incorporated into a thorough analysis, student evaluation data are
useful not only because they represent the learner's perspective, but also
because they can round out the picture of a candidate's teaching quality
when presented alongside peer data and data supplied from the
candidate's own perspective.
Student data can be solicited and presented in both quantitative and qualitative forms. Quantitative data, in the form of numerical student evaluation questionnaire scores, were the more prevalent form in the tenure files analyzed during the preparation of the report, Evaluation and Recognition of Teaching. Qualitative student evaluation data, in the form of written letters of evaluation, appeared less often, although when such data were provided they took up considerable space in the file: in one case, a tenure file contained 88 student evaluation letters.
The potential to statistically manipulate quantitative data is a mixed blessing. Assuming the data were gathered properly (using questionnaire items that were validated to ensure that what each question measured was, in fact, what it purported to measure), quantitative data can be very efficient, in that many teaching factors from many individual perspectives can be presented in relatively little space. In the analysis of tenure files conducted for the report, however, many of the guidelines listed below were not followed, which resulted in a student picture of the candidate's teaching that was not only limited but possibly biased.
Instrumentation
Research in the area of student evaluation of instruction has resulted
in the publication of more than 2,500 studies. Much has been learned about
proper questionnaire design. One finding is that the purpose of the evaluation
should determine the format and kinds of questions included in the evaluation
instrument. In general, summative evaluation questionnaires designed for
tenure and promotion decisions contain fewer items than formative questionnaires.
Summative instruments focus on global items ("Overall, how would
you rate the quality of the instructor's teaching?") and use evaluative
scales (Excellent, Good, Fair, Poor; or Strongly Agree, Strongly Disagree)
rather than frequency scales (Frequently, Somewhat Frequently, Rarely,
Never).
The use of "core" items allows an individual's scores to be compared with scores aggregated over a group, such as faculty across departments or across a college. Core items address more generic aspects of teaching that are not influenced as much by course design or size. Core items enable the development of normative scores so that an individual can be validly compared with his or her peers. Examples of such core items that have been validated through controlled quantitative methods include the following:
The instructor is well prepared for class.
The instructor has a thorough knowledge of the subject.
The instructor communicated his/her subject well.
The instructor stimulated interest in the course subject.
The instructor is one of the best Cornell teachers I have known.
The instructor clearly interprets abstract ideas and theories.
The instructor demonstrates a favorable attitude toward students.
The instructor is willing to experiment and be flexible.
The instructor encourages students to think for themselves.
Administration of Questionnaires
Research on questionnaire validity suggests that if the following guidelines
are followed for administering student evaluation questionnaires, reliability
and validity of results will be improved.
response format should be clear and consistent
students should remain anonymous
students should be given adequate time to complete the questionnaire
students should not be allowed to discuss their ratings while the questionnaire is being administered
questionnaires should be administered during the last 2 weeks of the semester (but not on the last day and not during or after an exam)
someone other than the one being evaluated should administer the questionnaire, or at the very least, the one being evaluated should leave the room
a student should collect the questionnaires and mail them to an independent office for scoring
80% minimum attendance of the student population in a course is necessary on the day an evaluation is administered
don't use a numeric questionnaire in courses with fewer than 10 students (use open-ended, written response items instead)2
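The two numeric thresholds in the guidelines above (80% attendance, and a minimum of 10 students for a numeric questionnaire) can be expressed as a simple check made before a questionnaire is handed out. What follows is a minimal sketch and not part of the handbook's procedure: the function name `choose_instrument`, its parameters, and its return strings are hypothetical. It assumes that attendance is measured as students present divided by students enrolled, and that the attendance rule applies to small courses as well.

```python
def choose_instrument(enrolled: int, present: int) -> str:
    """Suggest how to proceed on evaluation day.

    Hypothetical helper: the names and return strings are illustrative only.
    Returns "reschedule", "open-ended", or "numeric".
    """
    if enrolled <= 0:
        raise ValueError("enrolled must be positive")
    if not 0 <= present <= enrolled:
        raise ValueError("present must be between 0 and enrolled")

    # Guideline: at least 80% of the course's students should attend.
    # Integer arithmetic avoids floating-point surprises at the boundary.
    if present * 100 < enrolled * 80:
        return "reschedule"

    # Guideline: with fewer than 10 students, use open-ended written items
    # instead of a numeric questionnaire.
    if enrolled < 10:
        return "open-ended"

    return "numeric"
```

For instance, `choose_instrument(40, 31)` falls just short of the attendance threshold, while `choose_instrument(8, 8)` meets it but is too small a course for numeric scores.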
Reporting Scores
How summative evaluation scores are reported in a tenure file or in the
tenure/promotion process can bias that process, either positively or negatively.
Some general principles for proper questionnaire score reporting include:
report frequency distribution for each item
don't carry mean scores beyond one decimal place
multiple sets of scores should be provided for each type of course (survey, lab, seminar) and collected over a period of time
narrative (qualitative) data from the candidate, colleagues or chair about the contextual circumstances of the quantitative student rating scores aid in their interpretation
normative data sets should be established yearly for each course type (elective, required, lecture, lab, etc.) at both the department level and the college level for comparison with a tenure candidate's own scores
appropriate normative data should be provided wherever possible
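Several of these reporting principles are mechanical enough to sketch in code. The example below is illustrative only and rests on assumptions not stated in the handbook: a 5-point scale coded 1 to 5 with 5 as the most favorable response, a hypothetical function name `summarize_item`, and a departmental norm supplied as a single mean. It reports the full frequency distribution for one item, carries the mean to one decimal place, and, where a norm is given, shows the difference from it.

```python
from collections import Counter

# Assumed coding of a 5-point evaluative scale (5 = most favorable).
SCALE = (1, 2, 3, 4, 5)


def summarize_item(ratings, norm_mean=None):
    """Summarize one questionnaire item for reporting.

    Hypothetical helper. `ratings` is a list of integer responses on SCALE;
    `norm_mean` is an optional departmental or college mean for the item.
    """
    if not ratings:
        raise ValueError("no ratings to summarize")
    if any(r not in SCALE for r in ratings):
        raise ValueError("rating outside the assumed 1-5 scale")

    counts = Counter(ratings)
    # Report every scale point, including those nobody chose.
    distribution = {point: counts.get(point, 0) for point in SCALE}

    # Do not carry the mean beyond one decimal place.
    mean = round(sum(ratings) / len(ratings), 1)

    summary = {"n": len(ratings), "distribution": distribution, "mean": mean}
    if norm_mean is not None:
        summary["difference_from_norm"] = round(mean - round(norm_mean, 1), 1)
    return summary


report = summarize_item([5, 4, 4, 3, 5, 2, 4], norm_mean=3.6)
```

Keeping the distribution next to the mean matters because two courses with the same mean can have very different spreads of responses.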
Figure 8 is an example of a visually clear way of reporting a candidate's
relative standing in relation to departmental normative data.
Figure 8
Qualitative data are less generalizable and harder to aggregate because they are in a more open-ended form. Their potential bulkiness can be reduced through a synthesis by an objective individual familiar with the case, such as a department head. For others who must review this kind of synthesized data and who are less familiar with the candidate's situation, such as deans and administrators outside of the department, a supplementary reflective statement from the candidate synthesizing the student letters can be useful in concert with the department head's report. If these reports are well written and address major developmental issues in the candidate's teaching practice, the time necessary to write them is well justified, especially if their creation leads to improved practice. The work of synthesizing can be spread out over time, on a year-by-year basis, as part of an annual review process.
During the preparation of the report, Evaluation and Recognition of Teaching, the deans of the colleges were interviewed. One dean raised the issue of anonymity of student evaluation data. Quantitative questionnaire scores, combined with letters of recommendation, provide a good balance of general and specific information. Letters in their original form, however, do not preserve the anonymity of the student. While students, either undergraduate or graduate, are still working with the candidate, they are in what one dean called the candidate's "power web." Students may be less candid in their written remarks if they know the candidate may identify them at some point during the tenure decision process. The validity of their responses will be enhanced if letters are returned to someone other than the candidate (the department head or ad hoc chair, for example), if the letters are then transcribed and summarized by an independent person (such as a member of a departmental standing committee on teaching), and if students are told, when they are asked to write, that these precautions are being taken.
An example of a department chair's synthesis of relevant comments from undergraduate student reviewers who were asked to write letters of recommendation is included below.
undergraduates uniformly describe him as an unusually effective, conscientious, enthusiastic teacher who enables students to do their best work, master difficult subject-matter, and gain confidence in their own intellectual abilities.
This [student quote from a review letter] clear and convincing testimony describes the experience of all the students who wrote to us from the courses he taught in spring 1988 and in fall 1989. Since the most disturbing aspect of some of the student responses two years ago was the suggestion that he could be authoritarian and coercive in his teaching, we are reassured by all these letters which suggest precisely the opposite.
It seems clear that like many young assistant professors [candidate] was too demanding in his first dealings with graduate students, imposing admirable but often excessive standards of professionalism both in the classroom and as a special committee member, and expecting his students to share his commitment to his own projects. As the letter from [student] suggests, however, he has since become more realistic and flexible. And all the letters attest that he is always extremely conscientious and helpful.
One should conclude, I think, that [candidate] is an intellectually stimulating and enabling graduate teacher, with an expertise and commitment that many of our students find particularly valuable, one who has had trouble finding the appropriate mode in which to exercise authority, but who has now learned to do so.3
The usefulness and reliability of student letters of evaluation, whether undergraduate or graduate, can be improved if specific criteria are communicated when letters are solicited to help focus the students. If the students are all requested to respond to the same questions, reliability will be enhanced and it will be easier to summarize all the letters. The following is an example of the kinds of questions about teaching that can be used to aid students in writing evaluation letters:
1. Factual Knowledge: How well did the candidate help you acquire and integrate new terms, information and methods? Please give explicit examples where possible.
2. Concepts and Principles: How well did the candidate organize the material covered into a comprehensive whole? Were important concepts and principles from theory interrelated? Please give explicit examples where possible.
3. Application: Do you feel that the candidate's teaching and course structure enabled you to apply what you learned in the course to concrete problems? Were you able to generalize beyond the text? Please give explicit examples where possible.
4. Motivation: Did you feel the candidate was sufficiently motivated about the subject matter to excite your own interest in it? Describe how the candidate communicated a sense of enthusiasm about teaching.
5. Self Understanding: To what degree did the candidate help you become more aware of yourself as a learner? Describe specific experiences where the candidate contributed to your feeling empowered in your ability to learn.
6. Improvement of Instruction: Did the candidate seek out information from you and experiment with ways of improving his or her teaching? To what degree was the candidate open to feedback on improving the course? How confident are you in the candidate's ability to continually develop as a teacher? Please be as specific as possible.
To avoid biasing faculty opinion of a candidate's teaching effectiveness, student letters, in any form, summarized or not, should not be available to the voting faculty until all file data on both teaching and research have been assembled into the tenure file. This is true for all data: everyone voting on the candidate should have the same body of data from which to make an informed and unbiased decision.
Peer Data
Evaluation of the candidate's teaching by peers is a practice that has
become more prevalent in tenure and promotion decisions over the last
20 years (Seldin, 1984). During that time peer review has taken on an
increasingly significant role in the tenure process. Effective peer review
depends to a large degree on the explicitness of the criteria by which
candidates are to be judged. Colleagues and peers are necessary contributors
to evaluation of a tenure candidate's teaching. They are best qualified
to evaluate the candidate's breadth and depth of subject matter knowledge,
course design skills, and assessment strategies for determining whether
students have learned the material. The information necessary for colleagues and peers
to evaluate these kinds of skills must be thorough without being redundant.
The candidate can help in peer evaluation by supplying the kind of information
described in Chapter II; however, colleagues from within the institution,
both within the candidate's own department and outside it, and peers outside
the candidate's institution who represent the discipline, will be required
to provide their own data.
To be most effective, the peer evaluation process should be neutral, open, relatively unthreatening and structured, all of which can be enhanced through the use of standardized rating and observation procedures and criteria. Standardization is a precaution prompted by evidence that colleagues' ratings may not be statistically reliable. In one study (Centra, 1979), the average correlation between colleagues was .26 per item. Another study4 revealed the potential for positive bias in peer evaluation: fifty-four teachers were each evaluated on the basis of two classroom visits by each of three different colleagues, and 94 percent of all ratings fell in the top two categories of a 5-point scale.
Several factors are critical in ensuring a valid and fair peer review process. What questions are asked and answered by the reviewers is central. Some kind of replicable protocol is necessary to ensure fairness and accountability for the process. This is true for whatever data are being reviewed, whether course materials, classroom performance or student learning. Developing a set of questions to focus the reviewer can make the task less arbitrary and subjective. The entire review process by peers should be governed by a set of procedures established within the department. Examples of such procedures include the following:
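A department that wants to check its own peer ratings for the two problems described above (low agreement between colleagues and ratings clustered at the top of the scale) could compute both figures from its rating sheets. The sketch below is one common way of doing so and is not claimed to be the method used in the cited studies. All names are hypothetical, and it assumes that each rater scored the same teachers, in the same order, on the same item, using a 5-point scale coded 1 to 5.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 ratings")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a rater never varies")
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(ratings_by_rater):
    """Average the correlation over every pair of raters for one item.

    `ratings_by_rater` maps a rater label to that rater's scores for the
    same teachers in the same order (an assumed data layout).
    """
    pairs = list(combinations(ratings_by_rater.values(), 2))
    if not pairs:
        raise ValueError("need at least two raters")
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def share_in_top_categories(ratings, top=(4, 5)):
    """Fraction of ratings that fall in the top categories of the scale."""
    if not ratings:
        raise ValueError("no ratings supplied")
    return sum(1 for r in ratings if r in top) / len(ratings)
```

A low average correlation together with a high share of top-category ratings would suggest that the ratings distinguish poorly among candidates, which is the concern that standardized procedures are meant to address.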
peer ratings should be used in conjunction with student ratings . . . dimensions [of teaching] should be decided upon in advance . . . [the] procedure should guarantee the anonymity and independence of the rater . . . at least three colleagues be chosen to rate an instructor's teaching . . . these raters . . . may come from . . . an elected committee of the college faculty whose function is to evaluate teaching. . . . raters do not meet together and preferably do not know who else is involved in the evaluation process. Rather, each judge independently rates the instructor on the preselected dimensions and submits the ratings to the dean [or department head], who then compiles a pooled rating for each dimension.5
Developing these procedures and the questions used to review the candidate can be a useful accomplishment of a departmental standing committee on teaching.
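The pooling step in the quoted procedure can be sketched as follows. This is an illustration under stated assumptions and not a prescribed method: the quotation does not say how the pooled rating is compiled, so the arithmetic mean is assumed here, and the function name `pool_ratings`, the rater labels, and the dimension names are all hypothetical. The sketch enforces the minimum of three raters and returns results by dimension only, so that no individual rater's scores appear in the output.

```python
from statistics import mean


def pool_ratings(ratings_by_rater, dimensions, min_raters=3):
    """Compile one pooled rating per preselected dimension.

    Hypothetical helper. `ratings_by_rater` maps an opaque rater label to a
    dict of {dimension: score}; each rater is assumed to have rated
    independently. Pooling by arithmetic mean is an assumption.
    """
    if len(ratings_by_rater) < min_raters:
        raise ValueError(f"at least {min_raters} raters are required")

    pooled = {}
    for dimension in dimensions:
        scores = []
        for ratings in ratings_by_rater.values():
            if dimension not in ratings:
                raise ValueError(f"a rater omitted the dimension {dimension!r}")
            scores.append(ratings[dimension])
        # Only the pooled figure is kept; rater identities are dropped.
        pooled[dimension] = round(mean(scores), 1)
    return pooled


example = pool_ratings(
    {
        "rater-1": {"subject knowledge": 5, "course design": 4},
        "rater-2": {"subject knowledge": 4, "course design": 5},
        "rater-3": {"subject knowledge": 4, "course design": 3},
    },
    dimensions=["subject knowledge", "course design"],
)
```

Fixing the list of dimensions in advance, rather than reading it from whatever the raters submit, mirrors the requirement that dimensions be decided before rating begins.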
Qualification of Peer Reviewers
How peer reviewers are selected is another critical factor in establishing validity in peer review. No one should be placed in a position to review or observe a colleague for tenure or promotion decisions who is not qualified to carry out that task. A consistent finding of peer observation studies is that observers should have some kind of training that prepares them for this responsibility. Peers typically are capable of evaluating subject matter knowledge, what must be taught by the candidate, whether the appropriate methodology is being employed for teaching specific content areas, the degree to which the candidate has applied adequate and appropriate evaluation techniques for course objectives, and the degree to which professional behavior has been exhibited according to current ethical standards. The following questions may be useful to a reviewer undertaking a specific peer review task:
Do you believe you can properly judge the teaching-learning process in the classroom visited?
Would you recommend this instructor to students advised by you? Why or why not?
What specific changes are needed to strengthen teaching performance?
How would you rate this instructor against others teaching similar courses in the department?
Footnotes
1. Peter Cohen (1981). "Student Ratings of Instruction and Student Achievement: A Meta-analysis of Multisection Validity Studies." Review of Educational Research 51, no. 3: 305.
2. G. R. Sell and N. Chism (1988). Assessing Teaching Effectiveness for Promotion and Tenure: A Compendium of Reference Materials, Center for Teaching Excellence (Columbus, Ohio: Ohio State University), 5-6.
3. A Report of the Select Committee, Jan. 14, 1992. Evaluation and Recognition of Teaching Appendices (Ithaca, N.Y.: Cornell University), 21.
4. J. A. Centra (1975). "Colleagues as Raters of Classroom Instruction." Journal of Higher Education 46: 327-337.
5. Peter Cohen and Wilbert McKeachie. "The Role of Colleagues in the Evaluation of College Teaching." Improving College and University Teaching 28 (4): 150.

