Underlying relationships among multiagent systems (MAS) in hazardous scenarios can be represented as Game-theoretic models. This paper proposes a new hierarchical network-based model called Game-theoretic Utility Tree (GUT), which decomposes high-level strategies into executable low-level actions for cooperative MAS decisions. It combines with a new payoff measure based on agent needs for real-time strategy games. We present an Explore game domain, where we measure the performance of MAS achieving tasks from the perspective of balancing the success probability and system costs. We evaluate the GUT approach against state-of-the-art methods that greedily rely on rewards of the composite actions. Conclusive results on extensive numerical simulations indicate that GUT can organize more complex relationships among MAS cooperation, helping the group achieve challenging tasks with lower costs and higher winning rates. We also demonstrated the applicability of the GUT using the Robotarium platform, which is a simulator-hardware testbed for verifying multi-robot system algorithms.
Extend to Pursuit Domain in Robotarium Paper: Game-theoretic Utility Tree for Multi-Robot Cooperative Pursuit Strategy
Structure of the Game-theoretic Utility Tree (GUT)
In most cases, computing an optimal complex strategy through calculating the Nash Equilibrium in a big game is less efficient. GUT decomposes a complex big game into conditionally dependent simple games and organizes them as a tree structure. We take a game in the Explore Domain as an example (Fig. (c)), supposing that when explorers encounter aliens, agents need to decide their strategy's specific purpose (what), target (who), and group organization (how). To describe the strategy combination in a big game, we assume the strategy spaces of the explorer and alien are 18 and 12 based on their different purpose, target, and group organization. And it is difficult to calculate the optimal strategy through the nash equilibrium for the entire game with 216 strategy combinations. However, we can decompose the big game into three simple games describing agents' different motivations. Then, we can organize those games as a tree structure through the corresponding dependence among tactics. By calculating the nash equilibrium of those simple games at each level, we can find a specific trajectory in the tree as an optimal tactic combination for the current situation. In this way, we essentially decrease the size of the strategy space in the game and improve the computation efficiency.
Note: Please check the paper for more details about the proof of the GUT Decision Theorem.
Considering cross-platform, scalability, and efficiency of the simulations, we chose the Unity game engine to simulate the Explorers and Aliens Game and selected Gambit toolkit for calculating each level's Nash Equilibrium in the GUT.
Note: Gambit is an open-source collection of tools for doing computation in game theory which can build, analyze, and explore game models.
In this domain, the explorers will group in patrol formation to explore the circumstance when they do not detect aliens. They are provided with the location of the treasure and always choose the shortest path to the goal, then circle around the treasure location once reached. In the whole process, explorers present a kind of global behaviors using collective rationality and caring about their group interest. In contrast, aliens are more powerful than explorers but show only self-interest and do not cooperate within themselves.
For this game implementation, we decompose the team strategy into three levels.
- At the highest strategy level, the explorer agents decide "what" to do (attack or defend) under the presence of an adversary, using their teaming needs as the utility function expressed through a win probability function for a specific Attach/Defend strategy combination. This helps make group-aware decisions to maximize the chance of collectively reaching the treasure as a team while minimizing the overall team costs.
- At the second strategy level (deciding "who" to attack or defend against), the explorers use their collective basic needs expressed as a function of their current energy level in their payoff table. This helps the decision energy-aware.
- At the lowest strategy level (deciding "how" to attack/defend against), the explorers use their collective safety needs expressed through their health status (HP value) to calculate the payoff at this level. This ensures the decision is safety-aware since safety is the highest priority of needs as per the needs hierarchy defined.
Note: The design of the utility functions at each level is critical to determine whether an agent can calculate reasonable tactics.
Note: Please check the Link for the full video representation.
To demonstrate the GUT on the multi-robot applications, we implement our method in the Robotarium platform, a remote-accessible multi-robot experiment testbed that supports controlling up to 20 robots simultaneously on a 3.2m x 2.0m large rectangular area. Each robot has the dimensions 0.11 m x 0.1 m x 0.07 m in the testbed. The platform also provides a simulator helping users test their code, which can rapidly prototype their distributed control algorithms and receive feedback about their implementation feasibility before sending them to be executed by the robots on the Robotarium.
Our Robotarium experiments consider four different strategies for the explorer team: attacking and changing direction, attacking and changing speed, defending and changing direction, and defending and changing speed. We decompose this strategy set into two levels for GUT implementation in Robotarium: Level 1 considers deciding attack or defend; and Level 2 considers changing direction or speed for a single explorer game while it considers triangle or diamond formation shape for a multiple explorer game. Two different tactics payoff matrices are designed in Level 2 to differentiate the strategies between single-agent and multiagent cooperation.
Note: Check the Link for the full demonstration video.
We introduce a new Game-theoretic Utility Tree (GUT) for multiagent decision-making in adversarial scenarios. We then present an example real-time strategy game called Explore Domain where a group of explorer agents tackles physically attacking adversary agents. Through extensive numerical simulations, we analyze GUT and compare it against a state-of-the-art cooperative decision-making approach, such as the greedy selection method in QMIX. We verified the effectiveness of GUT through two types of experiments involving interaction and information prediction between the agents. The results showed that the GUT could organize more complex relationships among MAS cooperation, helping the group achieve more challenging tasks with lower costs and higher winning rates.
Further, we verified the effectiveness of the GUT in the real robot application through the implementation of the Explore Domain on the Robotarium hardware-simulator multiagent testbed and compared it with the random and greedy approaches in three scenarios. These results demonstrated that the GUT could effectively organize multiagent cooperation strategies, enabling a group with fewer advantages to achieve higher performance.