The Concise Encyclopedia of Applied Linguistics. Carol A. Chapelle
Assessment of Speaking
APRIL GINTHER
Assessment of speaking requires that we either observe a “live” oral performance or capture the performance by some means for later evaluation. Capturing speaking performances in audio and video, once a considerable challenge, has become a relatively easy task due to technological advances. As Lim (2018) observes:
Technology has made face‐to‐face conversations between people thousands of miles apart—once by definition a contradiction in terms—not only possible but increasingly commonplace, giving rise to learners who are “digital natives” (Prensky, 2001). Automated technologies are also beginning to be applied more widely to the delivery and scoring of speaking tests (Bernstein, 2012; Xi, Higgins, Zechner, & Williamson, 2012). (p. 215)
Facilitated by the availability of computer‐based assessment platforms, language programs now have expanded opportunities to place, evaluate, and track the progress of their own students/examinees. The merits of localization (O'Sullivan, 2005), the process of examining the fit between large‐scale, international assessments and the characteristics of test takers within their own contexts, easily extend to the growth of locally developed, embedded speaking assessments.
Language testers and applied linguists have offered cognitive, psycholinguistic, and sociocultural perspectives that provide strong practical and theoretical foundations for the development of and research on speaking assessments, including but not limited to research on fluency (Segalowitz, 2010; De Jong, 2018), pronunciation (Isaacs & Trofimovich, 2017; Kang & Ginther, 2017), varieties (Dimova, 2017), and interaction (Galaczi & Taylor, 2018; Plough, Banerjee, & Iwashita, 2018). These multifaceted perspectives enrich and expand our conceptualizations of the underlying constructs (see Ginther & McIntosh, 2018) and allow test developers to create speaking assessments that focus on the aspects of speaking they value most highly. However, to develop reliable and valid speaking assessments, it makes sense to begin with the basics: methods, scales, and raters.
Speaking Assessment Methods
Clark's (1979) classification of language assessment methods as indirect, semi‐direct, and direct has proven useful for understanding speaking assessment methods. O'Loughlin (2001) explains,
Indirect tests generally refer to those procedures where the test taker is not actually required to speak and belong to the “precommunicative” era in language testing. Examples of this kind of procedure are the pronunciation tests of Lado (1961) in which the candidate is asked to indicate which of a series of printed words is pronounced differently from others. (p. 4)
Indirect methods have largely given way to direct and semi‐direct methods.
Direct methods are defined as “procedures in which the examinee is asked to engage in face‐to‐face communicative exchanges with one or more human interlocutors” (Clark, 1979, p. 36), such as an interview in which participants engage in structured or semi‐structured interactions with an evaluator. Speaking assessment methods centered on interviews are collectively referred to as oral proficiency interviews (OPIs). A well‐known OPI is the American Council on the Teaching of Foreign Languages oral proficiency interview, or ACTFL OPI (ACTFL, 2009), and many locally developed OPIs are modifications of ACTFL guidelines and elicitation procedures. Common OPI structures involve a series of warm‐up questions followed by increasingly difficult questions, with examinees expected to display increasing levels of complexity in their responses. Interviewers may elicit a preselected set of responses, decide to follow up on topics or comments that the participant has introduced, or both. Examinee performance may be rated simultaneously by the interviewer or by an additional rater who rates as the interview proceeds. When an audio or video recording is made, responses can be rated after the interview is completed.
A variation of the direct method may require examinees to give a presentation on a selected topic, which often includes face‐to‐face engagement with members of an audience who pose follow‐up questions. Performance tests that require examinees to teach remain popular in international teaching assistant (ITA) contexts, but the assessment of actual teaching abilities may or may not be included in the final score (Ginther, 2003).
Direct methods have the perceived advantage of eliciting speaking skills in a manner that recreates “the setting and operation of the real‐life situations in which proficiency is normally demonstrated” (Shohamy, 1994, p. 100), and they have considerable face validity. An important qualification is one that Clark (1979) identified early on: “In the interview situation, the examinee is certainly aware that he or she is talking to a language assessor and not a waiter, taxi driver, or personal friend” (p. 38). Indeed, the fidelity of OPIs to natural conversation has been challenged by a number of researchers (Ross & Berwick, 1992; Johnson & Tyler, 1998), leading others to qualify OPIs as a specific genre of face‐to‐face interaction (He & Young, 1998); nevertheless, research on actual interaction in such tests indicates the genre does share important characteristics with natural conversation (Lazaraton, 1992, 1997).
While OPIs have traditionally been administered with a single interviewer and a single interviewee, speaking assessment of examinees in pairs, or even groups, has attracted growing attention from both researchers and language assessment practitioners (Brooks, 2009; Ducasse & Brown, 2009). The procedure is often referred to as “paired orals” or “group orals.” Such formats hold potential for increased interactivity and authenticity relative to a one‐on‐one interview; however, this added complexity makes rating more difficult. Nevertheless, paired and group oral assessments have successfully been incorporated into large‐scale assessment programs (Hasselgreen, 2005; Van Moere, 2006).
In other testing contexts, semi‐direct methods may be preferred. Semi‐direct methods do not require the presence of an interlocutor to administer the assessment. Examinees are presented with a set of prerecorded questions or tasks, typically under laboratory conditions, and responses are recorded and can be rated later. Advantages of semi‐direct methods are their potential for efficiency, time and cost savings, and high reliability. The absence of a human interlocutor may reduce construct‐irrelevant variance associated with interviewer effects.
Researchers comparing direct and semi‐direct OPI testing methods have reported strong, positive correlations (.89 to .95), leading Stansfield (1991) and Stansfield and Kenyon (1992) to argue that the methods are largely equivalent, statistically speaking. Nevertheless, qualitative analyses have revealed differences in the language produced. Semi‐direct responses tend to display greater formality and more cohesion while being accompanied by longer pauses and hesitations (Shohamy, 1994; Koike, 1998; O'Loughlin, 2001).
Recently, psycholinguistic methods (Van Moere, 2012) and a focus on automaticity have undergone a renaissance. Elicited imitation (EI), which requires examinees to repeat sentences they have heard, is an example of a method that has enjoyed a second look (Yan, Maeda, Lv, & Ginther, 2016), in part because of its effectiveness in distinguishing proficiency levels. Ultimately, Shohamy (1994) concludes that the selection of a direct or semi‐direct method is dependent on four related concerns: accuracy (a function of reliability), utility (the assessment's relation to instruction and the difficulties associated with rater training), feasibility (ease and cost of administration), and fairness. Becker, Matsugu, and Mansoor (2017) also address the balance between practicality and construct representativeness in speaking assessments.
Rating Scales and Scale Descriptors