The Concise Encyclopedia of Applied Linguistics. Carol A. Chapelle



the situation. In other words, examinees use functional knowledge to express language functions in context. Consider how the proposition "I'm Italian" is used to convey two contextualized language functions, as shown in Figure 6.

[Figure 6]

      In sum, the ability to perform real‐world competencies depends on the learners' semantico‐grammatical knowledge and their ability to use context to accurately form utterances that communicate not only propositions, but also contextualized pragmatic meanings. From an assessment perspective, these components are all implicated in real‐life language use. However, depending on the purpose of the test, these components can also be measured separately (see Grabowski, 2009; Kim, 2009), especially when finer‐grained information is needed.

      L2 educators have proposed three approaches to measuring the linguistic resources of communication: a trait/task‐based approach, a production features approach, and a developmental approach. The trait/task‐based approach can be based on a conception of L2 proficiency as a mental trait involving L2 knowledge, skills, and abilities, similar to those mentioned in the meaning‐oriented model of L2 knowledge. Alternatively, it can be based on the view that L2 proficiency is determined by the knowledge, skills, and abilities underlying task completion. Regardless of the basis for the trait/task‐based approach, the assessment method includes a single task or a carefully sequenced set of tasks that allows test takers to display their receptive, emergent, or productive knowledge of the L2; the responses are then scored, mostly by human raters, using scoring rubrics.


      Selected‐response (SR) tasks, and some limited‐production (LP) tasks, are typically scored right/wrong for L2 features based on one criterion for correctness (e.g., accurate form). Scoring criteria might involve accuracy, precision, range, complexity, fluency, acceptability, meaningfulness, appropriateness, naturalness, or conventionality. Dichotomous scoring such as this assumes that an item elicits only one underlying dimension of knowledge (e.g., form), that it measures full or no knowledge of the feature, and that item difficulty resides in the interaction between the input and the response key, and not with the distractors.

      In other SR or LP tasks, response choices may represent complete knowledge of the feature (e.g., form), partial knowledge, misinformation, or a total lack of knowledge. If the distractors represent “partial” knowledge of the feature, then the use of partial credit scoring should be considered, as dichotomous scoring would deflate test scores by failing to reward examinees for partial knowledge (e.g., Purpura, Dakin, Ameriks, & Grabowski, 2010). In the case of grammaticality judgments, for example, grammatical acceptability depends on what feature is being measured. If knowledge of both form and meaning is required for an acceptable response, then dichotomous scoring would be inappropriate, which is the case in several studies of second language acquisition (SLA) using grammaticality judgment tasks. In these cases, right/wrong scoring with multiple areas of correctness or a partial credit scoring method could be used. Partial credit scores are assigned according to the dimensions of knowledge being measured (e.g., 1 point for form + 1 for meaning = 2 points). The measurement of different levels of knowledge can also be accomplished by using an analytic or holistic rubric based on a rating scale such as the following: 0, 0.3, 0.6, or 1.
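
      The contrast between dichotomous and partial credit scoring can be illustrated with a minimal sketch. The test item, the response options, and all names in the code (`Option`, `form_ok`, `meaning_ok`, and the two scoring functions) are hypothetical and are not drawn from the cited studies; the only detail taken from the text is the scheme of 1 point for form plus 1 point for meaning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One response choice, tagged by the knowledge dimensions it gets right."""
    text: str
    form_ok: bool     # assumption: a rater judged the grammatical form accurate
    meaning_ok: bool  # assumption: a rater judged the intended meaning conveyed


def dichotomous_score(option: Option) -> int:
    """Right/wrong scoring: 1 only when every targeted dimension is correct."""
    return 1 if option.form_ok and option.meaning_ok else 0


def partial_credit_score(option: Option) -> int:
    """Partial credit: 1 point for form + 1 point for meaning (maximum 2)."""
    return int(option.form_ok) + int(option.meaning_ok)


# Hypothetical item: "She ___ in Rome since 2010."
# The form/meaning judgments below are illustrative only.
options = [
    Option("has lived", form_ok=True, meaning_ok=True),    # complete knowledge
    Option("lived", form_ok=True, meaning_ok=False),       # partial knowledge
    Option("have lived", form_ok=False, meaning_ok=True),  # partial knowledge
    Option("living", form_ok=False, meaning_ok=False),     # no knowledge
]

for opt in options:
    print(f"{opt.text:<12} dichotomous={dichotomous_score(opt)} "
          f"partial={partial_credit_score(opt)}")
```

      Under these assumptions, the two partially correct options receive 0 under dichotomous scoring but 1 of 2 points under partial credit scoring, which is the score deflation the paragraph above describes.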

      Extended‐production (EP) tasks vary considerably in the quantity and quality of the response. As a result, they are typically scored with a more comprehensive rating scale (e.g., five‐point rubric: 1 to 5 or five bands from 1 to 10). Rating scales provide hierarchical descriptions of observed behavior associated with different levels of performance related to some construct. The more focused the descriptors are for each rating scale, the greater the potential for measurement precision and feedback utility (see Purpura, 2004). Although rating scales are efficient for many contexts, the information they provide may be too coarse‐grained for other assessment contexts, where detailed feedback is required.
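
      As a rough sketch of how an analytic rating scale yields a profile rather than a single number, the following example averages two raters' bands on a five‐point scale. The criteria names are borrowed from the list of scoring criteria given earlier; the ratings, the function names, and the choice to average across raters and criteria are assumptions made for illustration, not a procedure described in the source.

```python
from statistics import mean

# Assumption: a five-point rubric with bands 1 to 5.
SCALE_MIN, SCALE_MAX = 1, 5


def analytic_profile(ratings: dict[str, list[int]]) -> dict[str, float]:
    """Average each criterion across raters, rejecting out-of-range bands."""
    profile = {}
    for criterion, bands in ratings.items():
        for band in bands:
            if not SCALE_MIN <= band <= SCALE_MAX:
                raise ValueError(f"{criterion}: band {band} is off the scale")
        profile[criterion] = mean(bands)
    return profile


def overall_score(profile: dict[str, float]) -> float:
    """Collapse the profile into one number (unweighted mean of criteria)."""
    return mean(profile.values())


# Hypothetical ratings: two raters score one extended response per criterion.
ratings = {
    "accuracy": [4, 3],
    "range": [3, 3],
    "meaningfulness": [5, 4],
}

profile = analytic_profile(ratings)
print(profile)
print(round(overall_score(profile), 2))
```

      The per‐criterion profile keeps the information that a single overall score discards, which is one reason more focused descriptors can offer greater feedback utility.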

      In sum, many studies in L2 assessment have used these techniques to measure the linguistic resources of L2 communication. These same methods have also been used in mainstream educational measurement to measure other learner characteristics.

