
Validation of the Criterion-referenced English Assessment in the College Scholastic Ability Test (CSAT) in Korea

Abstract/Summary

ABSTRACT
Wonhwa Seo

This study aims to validate the English section of the newly implemented criterion-referenced College Scholastic Ability Test (CR-CSAT). Because the English section of the CSAT strongly influences test-takers, English classrooms, and English education in Korea generally, validation of the CSAT is required for the successful implementation of the CR-CSAT. The study examined the new criterion-referenced English test within the framework of test usefulness proposed by Bachman and Palmer (1996), in which usefulness consists of six qualities: reliability, construct validity, authenticity, interactiveness, impact, and practicality. The study first described the specifications of the test, including the test environment, format, and language input, and then evaluated five test qualities of the CR-CSAT: reliability, construct validity, authenticity, interactiveness, and impact on test-takers, teachers, and education. To evaluate the construct validity of the CSAT, sample tests from 2014 to 2018 were selected and examined. Regarding lexical input over time, fluctuations were observed in the tokens, types, and type-token ratio of all sections examined. Such fluctuations and inconsistencies adversely affect exam fairness and reliability, since they fail to consider that the same test-taker who performed well on one test could fail the next. The task types employed throughout the exams are very limited: both tests consist of multiple-choice questions, which may enhance objectivity and reliability but fail to measure 'true communicative skills.' The language input of the listening subtest was composed mainly of low-level vocabulary, whereas the vocabulary of the reading subtest was at a high level. To examine the reliability of the CR-CSAT, the researcher conducted a number of analyses.
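The lexical-input analysis above rests on three corpus measures: token count, type count, and their ratio. A minimal sketch of how these can be computed is shown below; the sample passage and the simple regex tokenizer are illustrative assumptions, not the study's actual data or tools (the study used WordSmith and GSL Range).

```python
import re

def lexical_profile(text: str) -> dict:
    """Return token count, type count, and type-token ratio (TTR)."""
    # Naive word tokenizer: lowercase alphabetic strings (an assumption;
    # WordSmith applies its own tokenization rules).
    tokens = re.findall(r"[a-z']+", text.lower())
    types = set(tokens)
    ttr = len(types) / len(tokens) if tokens else 0.0
    return {"tokens": len(tokens), "types": len(types), "ttr": ttr}

# Invented example passage, for illustration only.
passage = "The test measures reading skill and the test measures listening skill"
profile = lexical_profile(passage)
```

A falling TTR across years would indicate more repetitive vocabulary; the fluctuations the study reports are year-to-year swings in exactly these figures.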
Grade-level and test-score data were analyzed from 1,226 students who took the simulated CSAT administered in June, 1,218 students who took the simulated CSAT administered in September, and 1,195 students who took the 2018 CSAT. The analyses included qualitative analyses of test content and quality, a correlation analysis of the test scores, and item analyses based on classical test theory (CTT) and item response theory (IRT) applied to the two mock CSATs and the 2018 CSAT. The CTT and IRT analyses revealed that the difficulty and discrimination of the listening subtest items were very low, whereas some reading test items were extremely difficult yet had very low discrimination power; these items need revision. Items with a low degree of difficulty and discrimination were allocated three points, while some difficult items were assigned two points. As a result of this problematic score scaling, test-takers with a higher ability parameter obtained lower scaled scores than those with a lower ability parameter. Test items that are extremely easy or extremely difficult and that discriminate inappropriately should be revised in order to design more reliable test items. Three native English-speaking instructors were invited to examine the CSAT items from 2014 to 2018. The results revealed that although the CSAT had relatively high internal reliability, the consistency between tests of different years and the tests' discriminability did not reach a satisfactory level. Several test items requiring revision were identified through both quantitative and qualitative item analyses. For the authenticity and interactiveness analysis, six raters were asked to score the degree of authenticity and interactiveness of the item types in the 2018 CSAT; the indirect speaking item type was found to have the lowest degree of authenticity and interactiveness.
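The two CTT indices underlying the item analysis above are item facility (the proportion of test-takers answering correctly) and item discrimination (here estimated as the point-biserial correlation between an item and the total score). A hedged sketch follows; the 0/1 response matrix is invented for illustration and is not the study's data.

```python
def item_facility(item_scores):
    """Proportion of test-takers who answered the item correctly."""
    return sum(item_scores) / len(item_scores)

def point_biserial(item_scores, total_scores):
    """Point-biserial correlation of item scores with total scores."""
    n = len(item_scores)
    mean_i = sum(item_scores) / n
    mean_t = sum(total_scores) / n
    cov = sum((i - mean_i) * (t - mean_t)
              for i, t in zip(item_scores, total_scores)) / n
    sd_i = (sum((i - mean_i) ** 2 for i in item_scores) / n) ** 0.5
    sd_t = (sum((t - mean_t) ** 2 for t in total_scores) / n) ** 0.5
    return cov / (sd_i * sd_t)

# Rows = test-takers, columns = items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
totals = [sum(row) for row in responses]
item1 = [row[0] for row in responses]
facility = item_facility(item1)          # 0.75: a fairly easy item
discrimination = point_biserial(item1, totals)
```

An item flagged as problematic in the study would show an extreme facility value (near 0 or 1) combined with a low point-biserial, meaning it separates high- and low-ability test-takers poorly.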
In contrast, the item types assessing understanding of details and understanding of context in both listening and reading comprehension displayed high degrees of authenticity and interactiveness. Overall, the reading subtest's authenticity and interactiveness were higher than those of the listening subtest, and the raters' qualitative reviews supported the quantitative results. Concerning representation of the national curriculum, the subject matter of the listening subtest was confined to general, familiar topics, whereas that of the reading subtest was concentrated on specialized academic topics such as social studies and the natural sciences; greater diversification of topics should be considered when designing the CSAT. Finally, teachers' and students' perceptions of the 2018 CR-CSAT were surveyed via a questionnaire administered to 372 high school students and 102 secondary school teachers, and a descriptive analysis of the results was completed. It indicated that the CSAT has substantial impacts on English teaching and learning in high schools. The survey results captured high school students' and English teachers' perceptions of the CSAT with respect to the test-usefulness qualities of reliability, construct validity, authenticity, interactiveness, and impact. One noticeable result was that neither group believed that the 2018 CSAT contributed to equality in English education, decreased private education costs, or changed language teaching and learning. The qualitative analysis of the high school English teachers' responses showed that the number of students enrolling in after-school English programs fell significantly after the implementation of the 2018 CSAT, and the number of English class hours also decreased. The English teachers who participated in the written interviews offered alternatives for the successful implementation of the CR-CSAT and the normalization of public English education.
The findings of this study offer pedagogical implications for entrance-test developers and education policy makers in Korea with regard to exam quality and methods. The CSAT should be developed to include test items more closely related to the current national English curriculum and to satisfy the requirements of construct validity, authenticity, and positive impact on both candidates and society. It is important that the various voices of stakeholders be made known to the public in order to promote further discussion of how to improve test quality and normalize public English education. This study contributes to validating the first criterion-referenced CSAT, implemented in 2018, on the basis of Bachman and Palmer's test usefulness framework. Finally, it is hoped that the study's comprehensive investigation of construct validity, reliability, authenticity, interactiveness, and impact, through the analysis of both quantitative and qualitative data, will raise the awareness of the CSAT's various stakeholders.


Table of Contents



CONTENTS

LIST OF TABLES vii
LIST OF FIGURES xiii

CHAPTER 1
INTRODUCTION
1. Context of the Study 1
2. Research Questions 6

CHAPTER 2
LITERATURE REVIEW
1. Norm-referenced vs. Criterion-referenced Tests 8
1.1 Norm-referenced Tests 9
1.2 Criterion-referenced Tests 10
2. The Concept of Test Usefulness 12
2.1 Reliability 13
2.2 Construct Validity 14
2.3 Authenticity 17
2.4 Interactiveness 18
2.5 Impact and Backwash 19
2.6 Practicality 21
3. Classical Test Theory (CTT) and Item Response Theory (IRT) 22
3.1 Classical Test Theory (CTT) 22
3.1.1 Item Facility 23
3.1.2 Item Discrimination 24
3.2 Item Response Theory (IRT) 27
3.2.1 Item Characteristic Curve (ICC) 28
3.2.2 IRT Assumptions 30
3.2.3 IRT Models 30
3.2.4 Comparison of True Score Estimation in CTT and IRT 34
4. Corpus-based Estimation of Difficulty 35
4.1 WordSmith 35
4.2 GSL Range 36
4.3 Readability 37
4.3.1 Flesch Reading Ease Formula & Flesch-Kincaid Grade Level 37
4.3.2 Gunning's Fog Index 38
5. The Test Tasks of the CSAT and National Curriculum of High School English 39
6. Previous Studies 42
6.1 Studies on English Section in the College Scholastic Ability Test 42
6.2 Studies on the Criterion-referenced English Section of CSAT 46


CHAPTER 3
RESEARCH METHOD
1. Research Questions 49
2. Participants 50
2.1 A High School English Teacher and Three Native Instructors 52
2.2 Native English-Speaking Instructors and Korean Instructors 53
2.3 Students as Test-takers and Survey Respondents 54
2.4 High School English Teachers as Survey Respondents 55
3. Materials 56
3.1 The College Scholastic Ability Tests (CSAT) and Simulated CSATs 56
3.2 Survey Questionnaire 57
3.3 Evaluation of Authenticity and Interactiveness in the CSAT 59
4. Data Collection Procedures 61
4.1 The CSATs and the Simulated CSATs 61
4.2 Test-takers’ True Scores and Grades 61
4.3 Item Analysis of the CSAT by Native English Instructors 62
4.3.1 Evaluation for Authenticity and Interactiveness of the 2018 CSAT 62
4.3.2 Qualitative Item Analysis 63
4.4 Survey Questionnaire 63
5. Data Analysis Criteria and Analysis Tools 64
5.1 Data Analysis Criteria 64
5.2 Data Analysis Tools 65
6. Data Analysis Procedures 68
6.1 Test Specification 68
6.2 Input Language Analysis 68
6.3 Evaluation of Authenticity and Interactiveness 69
6.4 Analysis of Survey Results 70

CHAPTER 4
RESULTS AND DISCUSSION
1. Construct Validity 72
1.1 Test Specification of the CR-CSAT 72
1.1.1 Characteristics of Setting 72
1.1.2 Characteristics of the Test Format 74
1.1.3 Characteristics of the Input (Corpus Analysis of the CSAT) 82
1.1.4 Expected Response 91
1.1.5 The Relationship between Input and Expected Response 91
1.2 Internal Correlations among Test Tasks in 2018 CSAT 91
1.3 Qualitative Item Analysis of the CSAT 94
1.3.1 Example of a Problematic Answer Key 96
1.3.2 Example of Problematic Distractors 97
1.3.3 Example of a Problematic Passage 98
1.3.4 Example of Difficulty Level 100
1.3.5 Example of Inaccurate Grammar 101
2. Reliability 103
2.1 Descriptive Statistics of the Simulated CSATs and the 2018 CSAT 107
2.2 Correlational Analysis of the Test-takers’ CSAT and Mock Tests Grades 109
2.3 Results of the 2018 CSAT 110
2.3.1 Analysis of the 2018 CSATs based on CTT 112
2.3.2 Item Analysis of the 2018 CSAT by IRT 119
2.4 Analyses of the 2018 Simulated CSATs in June and September 129
3. Authenticity and Interactiveness 135
3.1 The Degree of Authenticity and Interactiveness 135
3.1.1 Inter-rater Reliability Analysis 136
3.1.2 Raters’ Analyses of Authenticity and Interactiveness of the Test Items in the CR-CSAT 137
3.2 Correspondence of the CSAT Task Characteristics to the National Curriculum 147
4. Impact 153
4.1 Students’ Survey 153
4.1.1 Demographic Information of Student Participants 153
4.1.2 Respondents' Perception of the CSAT 158
4.2 Teachers' Survey 175
4.2.1 Quantitative Analysis (Descriptive Statistics) 176
4.2.2 Qualitative Analysis: Teachers' Open-ended Responses 194
4.2.2.1 English Teachers’ Responses to Question 1 197
4.2.2.2 English Teachers’ Responses to Question 2 198
4.2.2.3 English Teachers’ Responses to Question 3 200
4.2.2.4 English Teachers’ Responses to Question 4 202
4.2.2.5 English Teachers’ Responses to Question 5 203



CHAPTER 5
CONCLUSION & IMPLICATIONS
1. Summary 206
2. Implications 213
3. Limitations and Further Study 217

References 219
Appendix A 230
Appendix B 238
Appendix C 240
Appendix D 242
Appendix E 261
Appendix F 275
