Multi-Agent Coordination
…or otherwise following classical techniques. The superiority of the proposed learning and learning-based planning algorithms is validated over competing algorithms in terms of the speed of convergence and the run-time complexity, respectively.
In Chapter 3, it is shown that robots may select a suboptimal equilibrium in the presence of multiple types of equilibria (here NE or CE). In this setting, robots need to adopt a strategy that selects the optimal equilibrium at each step of learning and planning. To address this bottleneck of optimal equilibrium selection among multiple types, Chapter 3 presents a novel consensus Q-learning (CoQL) algorithm for multi-robot coordination, obtained by extending the equilibrium-based multi-agent Q-learning algorithms. It is also shown that a consensus (joint action) jointly satisfies the conditions of a coordination-type pure strategy NE and a pure strategy CE. The superiority of the proposed CoQL algorithm over traditional reference algorithms in terms of average reward collection is shown in the experimental section. In addition, the proposed consensus-based planning algorithm is verified using the multi-robot stick-carrying problem as the testbed.
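As a rough illustration of the consensus idea, the sketch below selects, at a joint state, a joint action that simultaneously maximizes every robot's Q-value; the data layout and function name are illustrative assumptions, not the book's implementation.

```python
def select_consensus(q_tables, joint_state, joint_actions):
    """Return a consensus joint action at `joint_state`, if one exists.

    A consensus is a joint action that simultaneously maximizes every
    agent's Q-value at the joint state; by construction it satisfies the
    conditions of a coordination-type pure strategy NE and a pure
    strategy CE.  `q_tables[i][(joint_state, a)]` holds agent i's
    Q-value for joint action `a` (illustrative layout).
    """
    best = [max(q[(joint_state, a)] for a in joint_actions) for q in q_tables]
    for a in joint_actions:
        if all(q[(joint_state, a)] == b for q, b in zip(q_tables, best)):
            return a          # every agent attains its maximum here
    return None               # no consensus: fall back to NE/CE selection


# Toy example with two robots and joint actions (a1, a2):
q0 = {(('s0', 's0'), ('up', 'up')): 5.0, (('s0', 's0'), ('up', 'down')): 2.0}
q1 = {(('s0', 's0'), ('up', 'up')): 4.0, (('s0', 's0'), ('up', 'down')): 4.0}
print(select_consensus([q0, q1], ('s0', 's0'), [('up', 'up'), ('up', 'down')]))
# -> ('up', 'up'), since both robots are simultaneously at their maxima
```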
Unlike CQL, the approach proposed in Chapter 4 adapts the composite rewards of all the agents in one Q-table in joint state-action space during learning; these rewards are subsequently employed to compute CE in the planning phase. Two separate models of multi-agent Q-learning are proposed. If the success of only one agent is enough to make the team successful, Model-I is employed. However, if an agent's success is contingent upon the other agents and simultaneous success of all agents is required, Model-II is employed. It is also shown that the CE obtained by the proposed algorithms and by the traditional CQL are identical. In order to restrict exploration to the feasible joint states, constrained versions of the above algorithms are also proposed. Complexity analysis and experiments validate the performance of the proposed algorithms in multi-robot planning on both simulated and real platforms.
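A minimal sketch of the two reward-composition rules in a single joint-space Q-table is given below, assuming a standard Q-learning update; the combination rules (max for Model-I, min for Model-II) are simplified illustrations of "any one agent succeeding" versus "all agents succeeding simultaneously", not the exact formulation in the chapter.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9       # assumed learning rate and discount factor
Q = defaultdict(float)        # single Q-table over joint state-action space

def composite_reward(rewards, model):
    """Combine the individual rewards into one composite reward.

    Model-I  (loosely coupled): the team succeeds if any one agent
             succeeds, so the best individual reward is taken.
    Model-II (tightly coupled): the team succeeds only if all agents
             succeed simultaneously, so the worst reward dominates.
    """
    return max(rewards) if model == 1 else min(rewards)

def update(joint_state, joint_action, rewards, next_joint_state,
           next_joint_actions, model):
    """One Q-learning step on the composite reward in joint space."""
    r = composite_reward(rewards, model)
    best_next = max((Q[(next_joint_state, a)] for a in next_joint_actions),
                    default=0.0)
    Q[(joint_state, joint_action)] += ALPHA * (
        r + GAMMA * best_next - Q[(joint_state, joint_action)])

# Toy usage: two robots, Model-II (both must succeed simultaneously)
update(('s0', 's1'), ('right', 'left'), rewards=[10.0, -1.0],
       next_joint_state=('s1', 's1'),
       next_joint_actions=[('stay', 'stay')], model=2)
print(Q[(('s0', 's1'), ('right', 'left'))])   # 0.1 * (-1.0) = -0.1
```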
Chapter 5 hybridizes the Firefly Algorithm (FA) and the Imperialist Competitive Algorithm (ICA). This hybridization results in the Imperialist Competitive Firefly Algorithm (ICFA), which is employed to determine the time-optimal trajectory of a stick, carried by two robots, from a given starting position to a predefined goal position amidst static obstacles in a robot world map. The motion dynamics of the fireflies in the FA are embedded into the sociopolitical evolution-based meta-heuristic ICA. In addition, the trade-off between exploration and exploitation is balanced by modifying the random walk strategy based on the position of the candidate solutions in the search space. The superiority of the proposed ICFA is studied with run-time and accuracy as the performance metrics. Finally, the proposed algorithm is verified on a real-time multi-robot stick-carrying problem.
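The essence of the hybridization can be sketched as follows: within an empire, each colony is attracted toward better ("brighter") colonies following the FA motion dynamics, and its random-walk step is scaled by its relative rank so that inferior solutions explore widely while good solutions refine locally. The parameter values and ranking rule below are assumptions for illustration only.

```python
import numpy as np

BETA0, GAMMA_ABS, ALPHA0 = 1.0, 1.0, 0.2   # attraction, absorption, base step (assumed)

def icfa_move(positions, costs, idx, rng):
    """Return the updated position of colony `idx` within one empire.

    FA-style attraction: the colony moves toward every colony in the
    empire with a lower cost (a brighter firefly); the attraction decays
    with distance.  The random-walk step is scaled by the colony's
    relative rank, so inferior solutions explore widely while good
    solutions stay in their local neighborhood.
    """
    x = positions[idx].copy()
    for xj, cj in zip(positions, costs):
        if cj < costs[idx]:                            # xj is brighter than x
            r2 = np.sum((xj - x) ** 2)
            x += BETA0 * np.exp(-GAMMA_ABS * r2) * (xj - x)
    rank = sorted(costs).index(costs[idx]) / max(len(costs) - 1, 1)  # 0 = best, 1 = worst
    step = ALPHA0 * rank                               # worse solutions take bigger steps
    return x + step * rng.uniform(-0.5, 0.5, size=x.shape)

# Toy usage: three 2-D colonies in one empire, move the worst one
rng = np.random.default_rng(0)
pos = [np.array([0.0, 0.0]), np.array([1.0, 1.0]), np.array([2.0, 2.0])]
cost = [3.0, 1.0, 2.0]
print(icfa_move(pos, cost, idx=0, rng=rng))
```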
Chapter 6 concludes the book based on the analysis, experimental results, and simulation results obtained in the earlier chapters. The chapter also examines the prospects of the work in view of future research trends.
In summary, the book aims at developing multi-robot coordination algorithms with a lower computational burden and a smaller storage requirement than the traditional algorithms. The novelty, originality, and applicability of the book are outlined below.
Chapter 1 introduces the fundamentals of multi-robot coordination. Chapter 2 offers two useful properties developed to speed up the convergence of TMAQL algorithms in view of team-goal exploration, where team-goal exploration refers to the simultaneous exploration of the individual goals. The first property accelerates exploration of the team-goal: each agent accumulates a high (immediate) reward for a team-goal state-transition, thereby improving the entries in the Q-table for state-transitions leading to the team-goal. The Q-table thus obtained offers the team the additional benefit of identifying the joint action leading to a transition to the team-goal during planning, where traditional TMAQL-based planning stops inadvertently. The second property offers an alternative approach to speed up the convergence of TMAQL by identifying the preferred joint action for the team. Finding the preferred joint action for the team is crucial when robots act synchronously in a tightly cooperative system. The superiority of the proposed algorithms in Chapter 2 is verified both theoretically and experimentally in terms of convergence speed and run-time complexity.
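A rough sketch of the first property, under assumed reward constants, is that a joint transition taking every robot to its individual goal simultaneously (the team-goal) receives an inflated immediate reward, so the corresponding Q-entries come to dominate and the team-goal joint action is readily identified during planning.

```python
# Illustrative reward shaping for team-goal exploration (assumed constants):
R_TEAM_GOAL, R_STEP = 100.0, -1.0

def shaped_reward(next_joint_state, team_goal):
    """High immediate reward only when the joint transition reaches the
    team-goal, i.e. every robot reaches its individual goal simultaneously."""
    at_team_goal = all(s == g for s, g in zip(next_joint_state, team_goal))
    return R_TEAM_GOAL if at_team_goal else R_STEP

print(shaped_reward(('g1', 'g2'), ('g1', 'g2')))   # 100.0: team-goal transition
print(shaped_reward(('g1', 's7'), ('g1', 'g2')))   # -1.0: only one robot at its goal
```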
Chapter 3 proposes the novel CoQL algorithm, which addresses the equilibrium selection problem when multiple equilibria exist at a joint state by adapting the Q-functions at a consensus. Analytically, it is shown that a consensus at a joint state is both a coordination-type pure strategy NE and a pure strategy CE. Experimentally, it is shown that the average rewards earned by the robots are higher when adapting at a consensus than at either an NE or a CE.
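In standard notation (a sketch of the underlying condition, not the book's exact formalism), a joint action a* is a consensus at joint state s for n robots with joint-action set A if

```latex
Q_i(s,\mathbf{a}^{*}) \;\ge\; Q_i(s,\mathbf{a})
\qquad \forall\, \mathbf{a}\in A,\;\; \forall\, i\in\{1,\dots,n\},
```

so no robot can gain by a unilateral deviation from a* (the coordination-type pure strategy NE condition), and no robot can gain by deviating when a* is recommended (the pure strategy CE condition).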
Chapter 4 introduces a new dimension in the literature of the traditional CQL. In traditional CQL, CE is evaluated in both the learning and planning phases. In Chapter 4, CE is computed partly in the learning phase and the rest in the planning phase, thereby requiring the CE computation only once. It is shown analytically that the CE obtained by the proposed techniques is the same as that obtained by the traditional CQL algorithms. In addition, the computational cost of evaluating CE by the proposed techniques is much smaller than that of the traditional CQL algorithms for the following reason: computation of CE in traditional CQL requires consulting m Q-tables in joint state-action space for m robots, whereas here a single Q-table in the joint state-action space is used to evaluate CE. The complexity analysis (both time- and space-complexity) undertaken in the chapter confirms this point. Two schemes are proposed: one for a loosely coupled and the other for a tightly coupled multi-robot system. In addition, problem-specific constraints are handled in Chapter 4 to avoid unwanted exploration of the infeasible state-space during the learning phase, thereby saving additional run-time during the planning phase. Experiments validate the proposed concepts on simulated and practical multi-agent robotic platforms (here, the Khepera environment).
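The storage saving can be illustrated with a simple count of Q-entries, under the assumption of identical state and action sets for every robot; the function below is only an order-of-magnitude illustration, not the book's complexity analysis.

```python
def q_table_entries(num_robots, num_states, num_actions, scheme):
    """Number of Q-entries kept in joint state-action space.

    'cql'      : traditional CQL keeps one joint-space table per robot.
    'proposed' : a single joint-space table of composite entries.
    (Illustrative count; per-entry constants are ignored.)
    """
    joint = (num_states ** num_robots) * (num_actions ** num_robots)
    return num_robots * joint if scheme == 'cql' else joint

# Two robots, 25 states and 4 actions each:
print(q_table_entries(2, 25, 4, 'cql'))       # 2 * 10000 = 20000 entries
print(q_table_entries(2, 25, 4, 'proposed'))  # 10000 entries
```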
Chapter 5 offers an evolutionary optimization approach to the multi-robot stick-carrying problem using the proposed ICFA. ICFA is a synergistic fusion of the motion dynamics of a firefly in the FA and the local exploration capabilities of the ICA. In the ICA, an evolving colony is not guided by the experience of more powerful colonies within the same empire. In the ICFA, however, each colony attempts to contribute to the improvement of its governing empire by improving its sociopolitical attributes following the motion dynamics of a firefly in the FA. To further improve the performance of this hybrid algorithm, the step-size for the random movement of each firefly is modulated according to its relative position in the search space: an inferior solution is driven by an explorative force, while a high-quality solution is confined to its local neighborhood. The chapter also recommends a novel approach to evaluating the threshold value for uniting empires without imposing any serious computational overhead on the traditional ICA. Simulation and experimental results confirm the superiority of the proposed ICFA over state-of-the-art techniques. Chapter 6 concludes the book with interesting future research directions.
Arup Kumar Sadhu
Amit Konar