The Handbook of Multimodal-Multisensor Interfaces, Volume 1. Sharon Oviatt
processing during cognitive activities. Working memory is a limited-capacity system that is critical for basic cognitive functions, including planning, problem solving, inferential reasoning, language comprehension, written composition, and others. It focuses on goal-oriented task processing and is susceptible to distraction, especially from simultaneous processes and closely related information. Working Memory theory, Multiple Resource theory, and Cognitive Load theory are all limited-resource theories. They address a fundamental issue in multimodal interface design: how to manage input and output modalities so that this bottleneck is alleviated and human performance is optimized. See Kopp et al.’s Chapter 6 in this volume for a related description of recent extensions of Working Memory theory, and Zhou and colleagues’ chapter in Volume 2 [Zhou et al. 2017] for current approaches to real-time assessment of cognitive load based on different modalities and sensors.
Working memory refers to the ability to store information temporarily in mind, usually for a matter of seconds without external aids, before it is consolidated into long-term memory. To be consolidated into long-term memory, information in working memory requires continual rehearsal; otherwise it becomes unavailable. Loss of information from working memory is influenced by cognitive load, which can be due to task difficulty, dual tasking, interface complexity, and similar factors. It also can occur when the content of distractors interferes with to-be-remembered information [Waugh and Norman 1965].
Miller and colleagues originally introduced the term “working memory” over 50 years ago [Miller et al. 1960]. They described the span of working memory as limited to approximately seven elements or “chunks,” which could involve different types of content such as digits or words [Miller 1956]. Expansion of this limit can be achieved under some circumstances, for example when information content involves different modalities that are processed in different brain areas. The development of domain expertise also can effectively expand working memory limits, because it enables a person to perceive and group isolated units of information into larger organized wholes. As a result, domain experts do not need to retain and retrieve as many units of information from working memory when completing a task, which frees up memory reserves for focusing on other or more difficult tasks.
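The effect of chunking on span limits can be illustrated with a toy sketch. It assumes a fixed span of seven chunks and represents "expertise" as a hypothetical set of familiar patterns; the greedy longest-match grouping and the names `chunk`, `fits_in_span`, and `SPAN_LIMIT` are illustrative inventions, not a model proposed by Miller.

```python
# Toy illustration of chunking: a greedy longest-match grouping of a symbol
# string into patterns the person already knows (a stand-in for expertise).
SPAN_LIMIT = 7  # hypothetical: the approximate span of about seven chunks


def chunk(sequence: str, known_patterns: set[str]) -> list[str]:
    """Group `sequence` into familiar patterns, falling back to single symbols."""
    chunks = []
    i = 0
    while i < len(sequence):
        # Use the longest familiar (non-empty) pattern starting at position i.
        match = next(
            (p for p in sorted(known_patterns, key=len, reverse=True)
             if p and sequence.startswith(p, i)),
            sequence[i],  # unfamiliar material: each symbol is its own chunk
        )
        chunks.append(match)
        i += len(match)
    return chunks


def fits_in_span(chunks: list[str]) -> bool:
    """True if the number of chunks does not exceed the assumed span."""
    return len(chunks) <= SPAN_LIMIT


digits = "149217761066"
novice = chunk(digits, set())
expert = chunk(digits, {"1492", "1776", "1066"})  # hypothetical familiar dates
print(len(novice), fits_in_span(novice))  # 12 False
print(len(expert), fits_in_span(expert))  # 3 True
```

The same twelve digits exceed the assumed span for someone without the relevant knowledge, but collapse into three chunks for someone who recognizes them as familiar dates, leaving room for other material.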
Baddeley and Hitch [1974] proposed a particularly consequential theory of working memory in the 1970s, which preceded modern neuroscience findings on multisensory-multimodal brain processing. According to Baddeley’s theory, working memory consists of multiple semi-independent processors associated with different modalities [Baddeley 1986, 2003]. A visual-spatial “sketch pad” processes visual materials such as pictures and diagrams, whereas a separate “phonological loop” stores auditory-verbal information in a different brain area. These lower-level modality-specific processing systems are viewed as functioning largely independently. They are responsible for constructing and maintaining information in mind through rehearsal activities. In addition, Baddeley describes a higher-level “central executive” component that plans actions, directs attention to relevant information while suppressing irrelevant information, manages integration of information from the lower-level modality stores, coordinates processing when two tasks are performed at the same time, initiates retrieval of long-term memories, and manages overall decision-making processes [Baddeley 1986, 2003].
It is the semi-independence of lower-level modality-specific processing that enables people to use multiple modalities during a task in a way that circumvents short-term memory limitations, effectively expanding the size of working memory. For example, during dual tasking it is easier to maintain digits in mind while working on a spatial task than while working on another numeric one [Maehara and Saito 2007]. Likewise, it is easier to simultaneously process information presented auditorily and visually than information from two auditory tasks. This “expansion” of working memory reserves is especially important as tasks become more difficult, because under these circumstances more elements of information typically must be integrated to solve a problem. Two key implications of these theoretical contributions for interface design are the following:
• Human performance improves when a computer interface combines different modalities that can support complementary information processing in separate brain regions conducted simultaneously. An advantage can accrue whether simultaneous information processing involves two input streams, an input and output stream, or two output streams.
• Flexible multimodal interfaces that support these processing advantages are essential as tasks become more difficult, or whenever users’ processing abilities are limited.
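The capacity advantage of distributing information across semi-independent stores can be sketched with a deliberately simplified model. Everything here is an assumption for illustration: the per-store capacities are arbitrary numbers, the class and function names are hypothetical, and the model treats the stores as fully independent, ignoring the central executive and any residual cross-modal interference that "semi-independent" implies.

```python
from dataclasses import dataclass

# Toy model of modality-specific stores with arbitrary illustrative capacities.
# Real limits vary by person, material, and task.
CAPACITY = {"phonological": 4, "visuospatial": 4}


@dataclass
class Item:
    label: str
    modality: str  # "phonological" or "visuospatial"


def overloaded_stores(items: list[Item]) -> list[str]:
    """Return the stores whose load exceeds their own capacity."""
    load = {store: 0 for store in CAPACITY}
    for item in items:
        load[item.modality] += 1
    return [store for store, n in load.items() if n > CAPACITY[store]]


# The same total load (six items), distributed in two different ways.
same_modality = [Item(f"digit{i}", "phonological") for i in range(6)]
cross_modal = (
    [Item(f"digit{i}", "phonological") for i in range(3)]
    + [Item(f"shape{i}", "visuospatial") for i in range(3)]
)
print(overloaded_stores(same_modality))  # ['phonological']
print(overloaded_stores(cross_modal))    # []
```

Because each store is checked against its own limit, six items overload a single store when they share one modality, yet fit when split across two. This mirrors the dual-task pattern described above in a highly simplified form.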
Multiple Resource theory, which is related to Working Memory theory, directly addresses the above processing advantages due to modality complementarity [Wickens et al. 1983, Wickens 2002]. It states that modalities can compete for processing resources during tasks, so the attention and processing required during input and output yield better human performance when information is distributed across complementary modalities. For example, verbal input is more compatible with simultaneous visual output than with simultaneous auditory output. In other words, cross-modal time-sharing is more effective than intra-modal time-sharing. The implication of both Working Memory and Multiple Resource theories is that multimodal interface designs that distribute processing across different modality-specific brain regions can minimize interference and cognitive load, thereby improving performance.
Working memory is a theoretical concept that is actively being researched in both cognitive psychology and neuroscience. During the past few decades, research on the neural basis of memory function has advanced especially rapidly [D’Esposito 2008]. This research has confirmed and elaborated our understanding of modality-specific brain regions, the process of multisensory fusion, and the circumstances under which interference occurs during consolidation of information in memory. Neurological evidence has confirmed that working memory is lateralized, with the right prefrontal cortex more engaged in visual-spatial working memory and the left more active during verbal-auditory tasks [Owen et al. 2005, Daffner and Searl 2008]. Working Memory theory is well aligned with Activity Theory (see Section 1.3) in emphasizing the dynamic processes that construct and actively suppress memories, which are a byproduct of neural activation and inhibition. For example, active forgetting is now understood to be an inhibitory process at the neural level that is under conscious control [Anderson and Green 2001].
Cognitive Load theory, introduced by John Sweller and colleagues, applies working memory concepts to learning theory [Sweller 1988]. It maintains that during the learning process, students can acquire new schemas and automate them more easily if instructional methods or computer interfaces minimize demands on students’ attention and working memory, thereby reducing extraneous cognitive load [Baddeley 1986, Mousavi et al. 1995, Oviatt 2006, Paas et al. 2003, van Merrienboer and Sweller 2005]. Cognitive load researchers assess the extraneous complexity associated with instructional methods and tools separately from the intrinsic complexity and load of a student’s main learning task. Assessments typically compare performance indices of cognitive load as students use different curriculum materials or computer interfaces. Educational researchers then focus on evidence-based redesign of these materials and tools to decrease students’ extraneous cognitive load, so their learning progress can be enhanced.
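The separation of intrinsic and extraneous load can be made concrete with a toy calculation. This sketch assumes that the two loads simply add and are compared against a single fixed capacity. It omits other load components discussed in the Cognitive Load literature, and all numbers are invented, arbitrary units rather than measured values. The class `Condition` and the constant `WM_CAPACITY` are hypothetical names.

```python
from dataclasses import dataclass

# Toy additive model: total load = intrinsic (task) + extraneous (interface).
WM_CAPACITY = 10.0  # hypothetical capacity, arbitrary units


@dataclass
class Condition:
    name: str
    intrinsic: float   # load from the learning task itself
    extraneous: float  # load added by the instructional materials or interface

    def total_load(self) -> float:
        return self.intrinsic + self.extraneous

    def spare_capacity(self) -> float:
        # Capacity left for acquiring schemas; a negative value means overload.
        return WM_CAPACITY - self.total_load()


task_difficulty = 6.0  # the same learning task in both conditions
conditions = [
    Condition("cluttered interface", task_difficulty, extraneous=5.0),
    Condition("redesigned interface", task_difficulty, extraneous=1.5),
]
for c in conditions:
    print(f"{c.name}: total={c.total_load()}, spare={c.spare_capacity()}")
# cluttered interface: total=11.0, spare=-1.0
# redesigned interface: total=7.5, spare=2.5
```

Holding intrinsic load constant isolates the effect of redesign. Under these assumptions, lowering only the extraneous component moves the learner from overload to having spare capacity, which is the logic behind evidence-based redesign of materials and tools.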
Numerous learning studies have shown that a multimodal presentation format supports students’ learning more successfully than does unimodal presentation. For example, presentation of educational information that includes diagrams and audiotapes improves students’ ability to solve geometry problems, compared with visual-only presentation of comparable information content [Mousavi et al. 1995]. With the multimodal format, larger performance advantages have been demonstrated on more difficult tasks than on simpler ones [Tindall-Ford et al. 1997]. These performance advantages of a multimodal presentation format have been replicated in different content domains, with different types of instructional materials (e.g., computer-based multimedia animations), and using different dependent measures [Mayer and Moreno 1998, Tindall-Ford et al. 1997]. These research findings based on educational activities are consistent with the general literature on multimodal processing advantages.
In recent years, Cognitive Load theory has been applied more broadly to computer interface design [Oviatt 2006]. It has supported the development of multimodal interfaces for education, and adaptive interface design tailored to a