A Comparative Analysis of Efficacy Among Three Writing Scoring Rubrics for EFL Teacher-Raters
- Subject (keywords): writing assessment, EFL writing assessment, scoring rubrics, rating scales, performance test
- Issuing institution: Korea University Graduate School
- Advisor: 최인철
- Year of publication: 2015
- Degree conferred: Feb. 2015
- Degree type: Doctoral
- Department: Department of English Education, Graduate School
- Major: English Education
- Pages: 174 p.
- URI: http://www.dcollection.net/handler/korea/000000058278
- Language of text: English
- Submitted original: 000045825729
Abstract
The present study investigates the efficacy of the Evidence Anchored Rating Scales (EARS), a newly designed alternative rating scale for scoring EFL secondary-level test-takers' writing performance, in three areas: reliability, pedagogical validity, and practicality. To this end, EARS was empirically developed by five teacher-raters recruited for this study, and its efficacy was examined in comparison with an alternative binary-choice scale, EBB, and the conventional grid-pattern rating scale, TABLE. A mixed-methods design combining quantitative and qualitative analyses was adopted to gain in-depth observations and ensure a thorough investigation of the efficacy of EARS. Psychometric analyses of the scores were conducted using Cronbach's alpha, the multi-facet Rasch model, and Generalizability Theory. Qualitative analysis drew on two data sources: 1) responses to the Questionnaire for Raters and 2) the raters' Think Aloud Protocols (TAPs). One notable finding is that the alternative scales designed for a specific group of examinees surpassed the conventional, generic scale in reliability. This suggests that scoring scales developed through analysis of actual writing samples from a specific examinee group can measure those samples more reliably. EARS showed a high degree of score reliability by precluding raters' idiosyncratic interpretations through transparency between scale descriptors and performance data, which also yields objective and justifiable scores. Its micro-level, text-based descriptors are expected to maximize pedagogical validity, since they can serve directly as concrete feedback for improving examinees' writing. It is also concluded that a dichotomized dimension, e.g., Within Sentence and Across Sentences, better fits EFL secondary-level examinees at a developmental stage.
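As a minimal illustration of one of the reliability measures named above, Cronbach's alpha can be computed over a rater-by-essay score matrix by treating each rater as an "item" and each essay as a case. This sketch is not taken from the thesis; the rating data below are hypothetical.

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    scores: list of rows, one per essay; each row holds one score per rater.
    """
    n_raters = len(scores[0])

    def var(xs):
        # Population variance; the n vs. n-1 choice cancels in the ratio.
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Variance of each rater's score column.
    item_vars = [var([row[j] for row in scores]) for j in range(n_raters)]
    # Variance of the total (summed) score per essay.
    total_var = var([sum(row) for row in scores])
    return (n_raters / (n_raters - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical scores: five essays, three raters (columns).
ratings = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 4, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(ratings)  # → roughly 0.96 for this toy matrix
```

A high alpha here would indicate that the raters rank-order the essays consistently; the thesis compares such coefficients across the three scales (TABLE, EBB, EARS).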
However, contrary to the hypothesis that evidence-based decision-making would ease raters' cognitive burden and streamline the scoring process, the participant raters considered EARS the most time-consuming of the three scales. This weakness in practicality could be mitigated through thorough rater training and/or by employing a computerized text analyzer such as Coh-Metrix (Graesser, McNamara, & Louwerse, 2003) as a supplementary toolkit.
Table of Contents
CHAPTER 1 INTRODUCTION 1
CHAPTER 2 RESEARCH BACKGROUND 7
1. Rater behavior 7
1.1. Positive view toward rater training 8
1.2. Negative view toward rater training 9
2. Assessment criteria 11
2.1. Effort to specify criteria 12
2.2. Empirical approach to develop assessment criteria 15
2.3. Empirical approach in Korean EFL context 18
3. Rating scale format 21
3.1. Checklist format 21
3.2. Binary choice format 24
3.3. Revised type of binary choice format 28
3.4. Experimental format, EARS 31
3.5. Automated scoring 34
CHAPTER 3 RESEARCH METHOD 38
1. Research Questions 39
2. Participants 39
3. Instruments 42
4. Procedure 43
5. Analysis 48
CHAPTER 4 RESULTS 54
1. Reliability 54
1.1. Comparison of Cronbach Alpha coefficients 54
1.2. Results of the multi-facet Rasch analysis 57
1.2.1. Inter-rater reliability 62
1.2.2. Intra-rater reliability 64
1.2.3. Severity gap among raters 68
1.3. Generalizability theory 70
1.3.1. G-study 71
1.3.2. D-study 74
2. Pedagogical Validity 78
2.1. Descriptive analysis of the questionnaire for raters 78
2.1.1. Relevancy to the target examinees 79
2.1.2. Measurement accuracy 81
2.1.3. Usefulness of feedback 82
2.2. Descriptive analysis of Think Aloud Protocols by raters 84
2.2.1. TAPs of TABLE 85
2.2.2. TAPs of EBB 88
2.2.3. TAPs of EARS 90
2.2.4. The size of common references adopted 94
3. Practicality 96
3.1. Descriptive analysis of the questionnaire on practicality 97
3.2. Possibility of using a computational tool 99
CHAPTER 5 DISCUSSION 104
1. EARS vs. TABLE 106
2. EARS vs. EBB 110
3. Practicality issue of EARS 113
CHAPTER 6 CONCLUSION 116
1. Summary 116
2. Implications 119
3. Limitations 123
REFERENCES 125
APPENDICES 143

