tugas language testing

Upload: nur-alfi

Post on 09-Oct-2015

58 views

Category:

Documents


0 download

DESCRIPTION

Language testing

TRANSCRIPT

NUR ALFI LAELA20400112016PBI 1

CHAPTER 3: DESIGNING CLASSROOM LANGUAGE TESTS

In this chapter we will examine tests type to begin the process of designing tests or revising existing ones.

TEST TYPESDetermine the purpose for the test is the first task that we will face in designing a test for students so that we as a teacher can focus on the specific objectives of the test.

Language Aptitude TestsA language aptitude test is designed to measure capacity or general ability to learn a foreign language. Language aptitude tests are ostensibly designed to apply to the classroom learning of any language. MLAT (Modern Language Aptitude Test) task consists of five different tasks: Number learning, phonetic script, spelling clues, word in sentence, and paired associates. Virtually, theres no unequivocal evidence that language aptitude test predict communicative success in a language. Moreover, any test that claims to predict success in learning a language is undoubtedly flawed.

Proficiency TestsThe purpose of proficiency test is to test global competence in a language. A proficiency test is not limited to any one course, curriculum, or single skill in the language; rather, it test overall. It includes: standardized multiple choice items on grammar, vocabulary, reading comprehension, and aural comprehension. Proficiency test are almost always summative and norm referenced. These kinds of tests are usually not equipped to provide diagnostic feedback. One of a standardized proficiency test is TOEFL.

Placement TestsThe purpose of placement test is to place a student into a particular level or section of a language curriculum or school. It usually includes a sampling of the material to be covered in the various courses in a curriculum. In a placement test, a student should find the test material neither too easy nor too difficult but appropriately challenging. Placement tests come in many varieties: assessing comprehension and production, responding through written and oral performance, multiple choice, and gap filling formats. One of the examples of Placement tests is the English as a Second Language Placement Test (ESLPT) at San Francisco State University.

Diagnostic TestsThe purpose is to diagnose specific aspects of a language. These tests offer a checklist of features for the teacher to use in discovering difficulties. It should elicit information on what students need to work in the future; therefore the test will typically offer more detailed subcategorized information on the learner. For example, a writing diagnostic test would first elicit a writing sample of the students. Then, the teacher would identify the organization, content, spelling, grammar, or vocabulary of their writing. Based on that identifying, teacher would know the needs of students that should have special focus. A typical diagnostic test of oral production was created by Clifford Prator (1972) to accompany a manual of English pronunciation. Test-takers are directed to read a 150-word passage while they are tape recorded. The test administrator then refers to an inventory of phonological items for analyzing a learners production. After multiple listening, the administrator produces a checklist for errors in five separate categories. The main categories include: Stress and rhythm, Intonation, Vowels, Consonants, and Other factors.

Achievement TestsAn achievement test is related directly to classroom lessons, units, or even a total curriculum. The purpose of achievement tests is to determine whether course objectives have been met with skills acquired by the end of a period of instruction. Achievement tests should be limited to particular material addressed in a curriculum within a particular time frame. Achievement tests belong to summative because they are administered at the end on a unit/term of study but effective achievement tests can serve as useful wash back by showing the errors of a students and helping them analyze their weaknesses and strengths.Achievement tests are often summative because they are administered at the end of a unit or term of study. The specifications for an achievement test should be determined by : The objectives of the lesson, unit, or course being assessed The relative importance (or weight) assigned to each objective The tasks employed in classroom lessons during the unit of time.

SOME PRACTICAL STEPS TO TEST CONSTRUCTIONSome practical steps in constructing classroom tests:1) Assessing Clear, Unambiguous Objective.In this part, you should know the purpose of the test you are creating and what do you want to test should be according with curriculum objectives, of course, you cannot possibly test each one of the objectives but choose a possible one in order to evaluate it.2) Drawing Up Test SpecificationIn this criterion, you have to create a practical outline of your test, what skills you test, and what the items will look like. So the objectives must be according to the skill that is going to be examined.3) Devising Test TasksAs you devise your test items, consider such factors as how students will perceive them (face validity), the extent to which authentic language and contexts are present, potential difficult caused by cultural schemata, the length of the listening stimuli, how well a story line comes across, how things like the cloze testing format will work, and other practicalities.4) Designing multiple choice test itemsHoghes (2003, pp. 76 78) cautions against a number of weaknesses of multiple choice items : The technique tests only recognition knowledge Guessing may have a considerable effect on test scores The technique severely restricts what can be tested It is very difficult to write successful items Wash back may be harmful Cheating may be facilitated

If the objective is to design a large-scale standardized test for repeated administrations, then a multiple choice format does indeed become viable. A primer on terminology:1. Multiple choice items are all receptive, or selective, response items in that the test-taker chooses from a set of responses (commonly called a supply type of response) rather than creating a response. Other receptive item types include true-false questions and matching lists. (In the discussion here, the guidelines apply primarily to multiple-choice item types and not necessarily to other receptive types).2. Every multiple-choice item has a stem, which presents a stimulus, and several (usually between three and five) options or alternatives to choose from.3. One of those options, the key, is the correct response, while the others serve as distractions.Since there will be occasions when multiple-choice items are appropriate, consider the following four guidelines for designing multiple-choice items for both classroom based and large-scale situations (adapted from Gronlund, 1998, pp. 60-75 and J.D. Brown, 1996, pp. 54-57)1. Design each item to measure a specific objective. 2. State both stem and options as simply and directly as possible.3. Make certain that the intended answer is clearly the only correct one.4. Use item indices to accept, discard, or revise items.

SCORING, GRADING, AND GIVING FEED BACK

ScoringScoring plan reflects the relative weight that you place on each section and items in each season.

Grading Grading doesnt mean just giving A for 90-100, and a B for 80-89. Its not that simple. How you assign letter grades to this test is a product of the country, culture, and context of the English classroom, institutional expectations (most of them unwritten),\ explicit and implicit definitions of grades that you have set forth, the relationship you have established with the class, and student expectations that have been engendered (cause) in previous test and quizzes in the class.

Giving Feedback Feedback should become beneficial wash back. Those are some examples of feedback:1. A letter grade2. A total score3. Four sub scores (speaking, listening, reading, writing)4. For the listening and reading sections An indication of correct/incorrect responses Marginal comments5. For the oral interview Scores for each element being rated A checklist of areas needing work A post-interview conference to go over the results6. On the essay Scores for each element being rated A checklist of areas needing work Marginal and end of-essay comments, suggestions A post-test conference to go over work A self-assessment7. On all or selected parts of the test, peer checking of results8. A whole-class discussion of results of the test9. Individual conferences with each student to review the whole test 1. A latter grade

In this chapter, guidelines and tools were provided to enable you to address the five questions about (1) how to determine the purpose or criterion of test, (2) how to state objectives, (3) how to design specifications, (4) how to select and arrange test tasks, including evaluating those task with item indices, and (5) how to ensure appropriate washback to the student.

CHAPTER 4: STANDARDIZED TESTING

This chapter has two goal: to introduce the process of constructing, validating, administering, and interpreting standardized tests of language; and to acquaint you with a variety of current standardized tests that claim to test overall language proficiency.

WHAT IS STANDARDIZATION?A standardized test presupposes certain objectives, or criteria, that are held constant across one form of the test to another. They measure a broad band of competencies that are usually not exclusive to one particular curriculum. They are norm-referenced and the main goal is to place test-takers on their relative ranking. Standardized tests are used to assess progress in schools (child's academic performance), ability to attend institutions of higher education, and to place students in programs suited to their abilities. The example of them are TOEFL, IELTS, and etc. The tests are standardized because they specify a set of competencies for a given domain and through a process of construct validation they program a set of tasks.

ADVANTAGES AND DISADVANTAGES OF STANDARDIZED TESTAdvantages standardized testing are a ready-made previously validated product that frees the teacher from having to spend hours creating a test, it can be administered to a large groups in a time limit, and it is also easy to score (computerized or hole-punched grid scoring). Moreover it has face validity (to measure test-taker knowledge: better or worse)Disadvantages center largely on inappropriate use of such tests, for example, using an overall proficiency test as an achievement test simply because of the convenience of the standardization. Standardized tests are ultimately not a very good measure of individual student performance and intelligence, because the system is extremely simplistic. A standardized test can measure whether or not a student knows when Sangkuriang was written, for example, but it cannot determine whether or not the student has absorbed and thought about the larger issues surrounding the historical document.

DEVELOPING A STANDARDIZED TESTKnowing how to develop a standardized test can be helpful to revise an existing test, adapt or expand an existing test, create a smaller-scale standardized test. In the steps outlined below, three different standardized tests will be used to exemplify the process of standardized test design:(A) The Test of English as a Foreign Language (TOEFL) test of general language ability or proficiency.(B) The English as a Second Language Placement Test (ESLPT), San Francisco State University (SFSU) placement test at a university.(C) The Graduate Essay Test (GET), SFSU gate-keeping essay test.

1. Determine the purpose and objectives of the testStandardized tests are expected to provide high practically in administration and scoring without unduly compromising validity. (A) TOEFL : To evaluate the English proficiency of people whose native language is not English. Most of colleges and universities in the US use TOEFL score to admit or refuse international applicants for admission.(B) ESLPT To place already admitted students at SFSU in an appropriate course in academic writing, oral production and grammar-editing. To provide teachers with some diagnostic information about their students.(C) GET To determine whether their writing ability is sufficient to permit them to enter graduate-level courses in their programs and it is offered beginning of each term.

2. Design test specification(A) TOEFL Because TOEFL is a proficiency test, the first step is to define the construct of language proficiency. After breaking language competence down into subset of 4 skills each performance mode can be examined on a continuum of linguistic units: pronunciation, spelling, word, grammar, discourse, pragmatic features of language. Oral production tests can be tests of overall conversational fluency or pronunciation of a particular subset of phonology and can take form of imitation. Listening comprehension test can focuses on a particular feature of language or on overall listening for general meaning. Reading test aims to test comprehension of long/short passages, single sentences, phrases, and words. Writing section tests writing ability in the form of open-ended (free composition) or it can be structured to elicit anything from correct spelling to discourse-level competence.(B) ESLPT Designing test specs for ESLPT was simpler tasks because the purpose is placement and construct validation of the test consisted of an examination of the content of the ESL courses. In recent revision of ESLPT, content & face validity was the central theoretical issues to be considered. Then, the major issue centered on designing practical and reliable task and item response formats. The specification mirrored reading-based and process writing approach used in the courses.(C) GET Specifications for GET are the skills of writing grammatically and rhetorically acceptable prose on a topic, with clearly produced organization of ideas and logical development. The GET is a direct test of writing ability in which test-takers must write an essay on a given topic in a two-hour time period.

3. Design, select, and arrange test tasks/items(A) TOEFL Content coding: the skills and a variety of subject matter without biasing (the content must be universal and as neutral as possible) Statistical characteristic: it include IF and ID Before items are released into a form of TOEFL, they are piloted and scientifically selected to meet difficulty specifications within each subsection, section and the test overall.(B) ESLPT For written parts; the main problems are; selecting appropriate passages (conform the standards of content validity) providing appropriate prompts (they should fit the passages) processing data form pilot testing In the multiple-choice editing test; first (easier task) choose an appropriate essay within which to embed errors. A more complicated one is to embed a specified number errors from a previously determined taxonomy of error categories.(Teacher can perceive the categories from student previous error in written work & students error can be used as distractors)(C) GET Topics are appealing and capable of yielding intended product of an essay that requires an organized logical arguments conclusion. No pilot testing of prompts is conducted. Be careful about the potential cultural effect on the numerous international students who must take the GET.4. Make appropriate evaluations of different kinds of itemsIF, ID and distractor analysis may not be necessary for classroom (one-time) test, but they are must for standardized multiple-choice test. For production responses, different forms of evaluation become important. (ex: practicality, reliability & facility) practicality: clarity of directions, timing of test, ease of administration & how much time is required to score reliability: a major player is instances where more than one scorer is employed and to a lesser extent when a single scorer has to evaluate tests over long spans of time that could lead to deterioration of standards facilities: key for valid and successful items. Unclear direction, complex language, obscure topic, fuzzy data, culturally biased information may lead to higher level of difficulty(A) TOEFL The IF, ID, and efficiency statistics of the multiple choice items of current forms the TOEFL are not publicly available information

(B) ESLTP The multiple-choice editing passage showed the value of statistical findings in determining the usefulness of items and pointing administrator toward revisions.(C) GET No data are collected from students on their perceptions, but the scorers have an opportunity to reflect on the validity of given topic

5. Specify scoring procedures and reporting formats(A) TOEFL Scores are calculated and reported for three sections of TOEFL a total score a separate score(B) ESLPT It reports a score for each of the essay section (each essay is read by 2 readers) Editing section is machined scanned It provides data to place student and diagnostic information Student dont receive their essay back(C) GET Each GET is read by two trained reader. They give scores between 1 to 4 Recommended score is 6 as threshold for allowing student to pursue graduate-level courses If the students gets score below 6, he either repeat the test or take a remedial course

6. Performing ongoing construct validation studiesAny standardized test must be accompanied by systematic periodic corroboration of its effectiveness and by steps toward its improvement(A) TOEFL The latest study on TOEFL examined the content characteristics of the TOEFL from a communicative perspective based on current research in applied linguistics and language proficiency assessment.(B) ESLPT The developments of the new ESLPT involved a lengthy process of both content and construct validation, along with facing such practical issues as scoring the written sections and a machine-scorable multiple-choice answer sheet.(C) GET At this time, there is no research to validate the GET itself. Administrators rely on the research on university level academic writing tests such as TWE. In recent years, some criticism of the GET has come from international test-takers who posit that the topics and time limits of the GET work to the disadvantage of writers whose native language is not English.

STANDARDIZED LANGUAGE PROFIENCY TESTINGTest of language proficiency presuppose a comprehensive definition of the specific competencies that comprise overall language ability. Swain (1990) offered multidimensional view of proficiency assessment by referring to the three linguistic traits (grammar, discourse, and sociolinguistic) that can be assessed by means of oral, multiple choice, and written responses.ACTFL association describing about proficiency in four levels: superior, advanced, intermediate, and novice. Within each level, description of listening, speaking, reading, and writing are provided as guidelines for assessment.