Robot Learning from Human Teachers. Sonia Chernova
recognize, seek proximity to, and interact with their caregivers. They assume that the caregiver has their best interest in mind and even very young infants use this to their advantage when faced with an unknown situation [219].
The ability and desire to engage, communicate, and interact with others is seen from an early age. By the time infants are two months old, they can actively engage in communicative interactions or turn-taking routines with adults. Studies have shown that infants can start and stop communication with their mother through gesture and gaze, and that it is the infants who control the pace of the turn-taking interaction [130, 257]. This turn-taking capability is the foundation of many situated learning activities, and is a precursor to more sophisticated interactions, such as imitation. For example, Arbib characterizes learning as assisted imitation, a dynamic turn-taking activity [274]. Bruner characterizes social scaffolding interactions in general as asymmetric cooperation that becomes symmetric over time [99]. Thus, turn-taking engagements are an underlying framework in which learning takes place.
Turn-taking abilities are characteristically based on causal assumptions about the world. There is an expectation that the world, and particularly other actors in the world, will have some contingent response to one’s activity. Thus, the ability to take advantage of these social interactions requires a robot to have models of engagement, turn-taking, and other fundamental social skills. A growing body of research within the HRI field has focused on models for engagement and turn-taking. The work of [218] and [110] identifies and generates “connection events” in order for a robot to maintain engagement with a human interaction partner. Other systems have been developed to control multimodal dialog for social robots, such as the work of [128], which controls dynamic switching of behaviors in the speech and gesture modalities, and the framework of [185], which controls task-based dialog using parallelized processes with interruption handling. The work of [62] and [63] centers on building autonomous robot controllers for successfully engaging in human-like turn-taking interactions, with a computational model for regulating the speaking floor that explicitly represents and reasons about all four components of the behavior regulation problem: seizing the speaking floor, yielding the floor, holding the floor, and auditing the owner of the floor.
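To make the four floor-regulation behaviors concrete, the following is a minimal sketch of a rule-based selector. It is not the computational model of [62] or [63]; the names `FloorAction` and `choose_floor_action`, the three boolean inputs, and the decision rules themselves are assumptions introduced purely for illustration.

```python
from enum import Enum


class FloorAction(Enum):
    """The four floor-regulation behaviors named in the text."""
    SEIZE = "seize"   # try to take the speaking floor
    YIELD = "yield"   # give the floor up to the partner
    HOLD = "hold"     # keep a floor the robot already owns
    AUDIT = "audit"   # attend to the current owner without acting


def choose_floor_action(robot_has_floor: bool,
                        robot_wants_floor: bool,
                        human_wants_floor: bool) -> FloorAction:
    """Pick one behavior from hypothetical, hand-written rules."""
    if robot_has_floor:
        # The owner either lets the partner in or keeps talking.
        if human_wants_floor or not robot_wants_floor:
            return FloorAction.YIELD
        return FloorAction.HOLD
    # The robot does not own the floor: take it only if it is uncontested.
    if robot_wants_floor and not human_wants_floor:
        return FloorAction.SEIZE
    return FloorAction.AUDIT
```

In this sketch the robot always defers to the human when both want the floor. A real controller would need to estimate these inputs from speech, gaze, and gesture, and would likely reason over time rather than from a single snapshot.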
Motivated to Learn
Another important influence on human learning is the idea of a “like-me” bias: the propensity and ability to map between actions seen in others and actions done by oneself, which appears at a very early age [174]. As the child grows older, interacting with adults, they come to understand that the adult is “like-me” and is therefore a source of information about actions and skills [274]. For example, both Bruner and Leontiev indicate that play is intrinsically motivated and that the object of play is the desire to be like adults and participate in the adult world [107]. Lave and Wenger make a similar argument for the motivation of learning altogether [155]. They develop a theory of “Legitimate Peripheral Participation,” in which the driving force for learning a new practice is the learner’s motivation to form their identity and become a full participant in the practice. On a large scale, this is the motivation of all learning: children “wanting to become full participants in the adult world.”
Litowitz has a similar explanation: the child wishes to be like the adult and is thus motivated to imitate and be led through activities by the adult. Litowitz goes one step further, however, and poses an elegant theory of why the process stops. The child gets out of the subordinate learner role and becomes capable on its own through the very same mechanism. The desire to be like the adult extends to the meta-activity level: the child comes to want the adult role of structuring activity (wanting to choose the clothes they wear, resisting being told what to do, etc.) [163].
Given this motivation to imitate, there are several ways in which an adult’s behavior can influence a child’s exploration or learning process. The following four social learning mechanisms have been identified in both human and animal learners [56, 254].
• Stimulus (local) enhancement is a mechanism through which an observer (child, novice) is drawn to objects others interact with. This facilitates learning by focusing the observer’s exploration on interesting objects—ones useful to other social group members.
• Emulation is a process where the observer witnesses someone produce a particular result on an object, but then employs their own action repertoire to produce the result. Learning is facilitated both by attention direction to an object of interest and by observing the goal.
• Mimicking corresponds to the observer copying the actions of others without an appreciation of their purpose. The observer later comes to discover the effects of the action in various situations. Mimicking suggests, to the observer, actions that can produce useful results.
• Imitation refers to reproducing the actions of others to obtain the same results with the same goal.
Cakmak et al. [46] present an implementation of these four social learning mechanisms and articulate the distinct computational benefits of each. Their results show that all four social strategies provide learning benefits over self-exploration, particularly when the target goal of learning is a rare occurrence in the environment. The work characterizes the differences between strategies, showing that the “best” one depends on both the nature of the problem space and the current behavior of the social partner.
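One way to see how the four mechanisms differ is to ask which parts of an observed demonstration the learner copies and which parts it still has to explore. The sketch below encodes that reading; it is not the implementation of Cakmak et al. [46]. The `Demo` record, the `propose_trial` function, the strategy names, and the choice to sample the uncopied parts uniformly at random are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Demo:
    """One observed interaction: what was done, to what, and what happened."""
    action: str
    obj: str
    result: str


def propose_trial(strategy: str, demo: Demo,
                  actions: Sequence[str], objects: Sequence[str],
                  rng: random.Random) -> Tuple[str, str, Optional[str]]:
    """Return (action, object, goal) for the learner's next trial.

    A goal of None means the learner has no target result and simply
    observes what its trial produces.
    """
    if strategy == "stimulus_enhancement":
        # Copy only the object of interest; explore own actions, no goal.
        return rng.choice(actions), demo.obj, None
    if strategy == "emulation":
        # Copy the object and the result; reach it with own actions.
        return rng.choice(actions), demo.obj, demo.result
    if strategy == "mimicking":
        # Copy only the action; its purpose is not understood.
        return demo.action, rng.choice(objects), None
    if strategy == "imitation":
        # Copy the action, the object, and the goal.
        return demo.action, demo.obj, demo.result
    raise ValueError(f"unknown strategy: {strategy}")
```

For example, given `Demo("push", "ball", "rolls")`, imitation always proposes `("push", "ball", "rolls")`, whereas stimulus enhancement fixes only the ball and leaves the action to exploration.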
The general concept of motivation has also been studied in the context of reinforcement learning. Intrinsically motivated RL has been proposed as a framework within which agents exploit “internal reinforcement” that rewards novel situations or experiences [65, 233]. A number of other techniques for integrating self-motivation and curiosity have also been studied within the context of developmental learning [121, 200, 229]; however, these methodologies have not yet been applied in the context of interactive learning agents or LfD.
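As a rough illustration of an internal reward for novelty, the sketch below uses a count-based bonus that shrinks as a state is revisited. This is one simple, commonly used form and is not claimed to be the formulation of [65] or [233]; the class name `NoveltyBonus`, the `scale` parameter, and the 1/sqrt(n) decay are assumptions.

```python
import math
from collections import Counter
from typing import Hashable


class NoveltyBonus:
    """Count-based intrinsic reward: rarely visited states pay more."""

    def __init__(self, scale: float = 1.0) -> None:
        self.scale = scale
        self.visits: Counter = Counter()

    def reward(self, state: Hashable) -> float:
        """Record a visit to `state` and return its novelty bonus."""
        self.visits[state] += 1
        # The bonus decays with the square root of the visit count.
        return self.scale / math.sqrt(self.visits[state])
```

An agent could add this bonus to any task reward it receives, so that unfamiliar states remain attractive even when the environment itself offers no feedback.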
Figure 2.3: Examples of scaffolding the learning process through attention direction and simplification of the task or environment.
2.2 TEACHERS SCAFFOLD THE LEARNING PROCESS
An important characteristic of a good learner is the ability to learn both on one’s own and by interacting with another. Children are capable of exploring and learning on their own, but in the presence of a teacher they can take advantage of the social cues and communicative acts provided to accomplish more. For instance, the teacher often guides the child’s search process by providing timely feedback, luring the child to perform desired behaviors, and controlling the environment so the appropriate cues are easy to attend to, thereby allowing the child to learn more effectively, appropriately, and flexibly. Scaffolding is the process by which an adult organizes a new skill into manageable steps and provides support such that a child can achieve something they would not be able to accomplish independently [99, 265]. A good teacher will scale instruction appropriately and create a good environment for learning the task at hand. In robotics, the human may be able to help the robot with hard problems like “what to learn,” “when to learn,” “what action to try,” and “how to measure success” [35].
2.2.1 ATTENTION DIRECTION
Attention direction is one of the essential mechanisms that contribute to the learning process [268, 274]. Analyzing parent-child tutoring sessions reveals a number of ways that adults provide structure and guide attention to let children succeed: placing important objects close to the child’s face, arranging the physical environment such that the desired action is within reach, or doing a demonstration in the infant’s line of sight to introduce object affordances.
The adult also implicitly directs the child’s attention through gaze direction. The tendency to follow eye gaze is seen very early on and is a first step toward reference and joint attention. It has also been shown that in order to hold joint attention and direct the infant’s attention, a communicative situation must first be established. This can be done through a period of eye contact or through verbal or behavioral contingent responses [76].
Within HRI research, a growing body of work has focused on social gaze behavior [117, 127, 153, 181, 182, 230, 256, 270], for example in the use of gaze for regulating turn-taking in two-party [153, 270]