MARL XAI

Project

Google Drive Link

Pettingzoo MPE

simple_adversary_v3 simple_crypto_v3 simple_push_v3 simple_reference_v3 simple_speaker_listener_v4 simple_spread_v3 simple_tag_v3 simple_world_comm_v3
------------------------------------
simple_adversary_v3
=> adversary_0 : Box(-inf, inf, (8,), float32) / Discrete(5)
=> agent_0 : Box(-inf, inf, (10,), float32) / Discrete(5)
=> agent_1 : Box(-inf, inf, (10,), float32) / Discrete(5)
- [size] raw action: (3,)
- [size] raw next obs, reward: (3, 10) (3,)
------------------------------------
simple_crypto_v3
=> eve_0 : Box(-inf, inf, (4,), float32) / Discrete(4)
=> bob_0 : Box(-inf, inf, (8,), float32) / Discrete(4)
=> alice_0 : Box(-inf, inf, (8,), float32) / Discrete(4)
- [size] raw action: (3,)
- [size] raw next obs, reward: (3, 8) (3,)
------------------------------------
simple_push_v3
=> adversary_0 : Box(-inf, inf, (8,), float32) / Discrete(5)
=> agent_0 : Box(-inf, inf, (19,), float32) / Discrete(5)
- [size] raw action: (2,)
- [size] raw next obs, reward: (2, 19) (2,)
------------------------------------
simple_reference_v3
=> agent_0 : Box(-inf, inf, (21,), float32) / Discrete(50)
=> agent_1 : Box(-inf, inf, (21,), float32) / Discrete(50)
- [size] raw action: (2,)
- [size] raw next obs, reward: (2, 21) (2,)
------------------------------------
simple_speaker_listener_v4
=> speaker_0 : Box(-inf, inf, (3,), float32) / Discrete(3)
=> listener_0 : Box(-inf, inf, (11,), float32) / Discrete(5)
- [size] raw action: (2,)
- [size] raw next obs, reward: (2, 11) (2,)
------------------------------------
simple_spread_v3
=> agent_0 : Box(-inf, inf, (18,), float32) / Discrete(5)
=> agent_1 : Box(-inf, inf, (18,), float32) / Discrete(5)
=> agent_2 : Box(-inf, inf, (18,), float32) / Discrete(5)
- [size] raw action: (3,)
- [size] raw next obs, reward: (3, 18) (3,)
------------------------------------
simple_tag_v3
=> adversary_0 : Box(-inf, inf, (16,), float32) / Discrete(5)
=> adversary_1 : Box(-inf, inf, (16,), float32) / Discrete(5)
=> adversary_2 : Box(-inf, inf, (16,), float32) / Discrete(5)
=> agent_0 : Box(-inf, inf, (14,), float32) / Discrete(5)
- [size] raw action: (4,)
- [size] raw next obs, reward: (4, 16) (4,)

Direct step in environment for rendering

------------------------
simple_adversary_v3
obs length: [11, 7, 7] | act length: [5, 5, 5]
input action: [4, 3, 2]
------------------------
simple_crypto_v3
obs length: [5, 5, 7] | act length: [4, 4, 4]
input action: [3, 0, 3]
------------------------
simple_push_v3
obs length: [11, 7] | act length: [5, 5]
input action: [1, 1]
------------------------
simple_reference_v3
obs length: [7, 7] | act length: [50, 50]
input action: [48, 31]
------------------------
simple_speaker_listener_v4
obs length: [9, 10] | act length: [3, 5]
input action: [2, 3]
------------------------
simple_spread_v3
obs length: [7, 7, 7] | act length: [5, 5, 5]
input action: [4, 2, 2]
------------------------
simple_tag_v3
obs length: [11, 11, 11, 7] | act length: [5, 5, 5, 5]
input action: [0, 1, 2, 1]

Direct step in environment with Communication Model

------------------------------------
simple_crypto_v3
actions: tensor([[3, 1, 3]]) | obs_size: (3, 8) | termination: [0 0 0]
actions: tensor([[3, 2, 2]]) | obs_size: (3, 8) | termination: [0 0 0]
actions: tensor([[0, 1, 1]]) | obs_size: (3, 8) | termination: [0 0 0]
------------------------------------
simple_push_v3
actions: tensor([[1, 4]]) | obs_size: (2, 19) | termination: [0 0]
actions: tensor([[3, 0]]) | obs_size: (2, 19) | termination: [0 0]
actions: tensor([[0, 2]]) | obs_size: (2, 19) | termination: [0 0]
------------------------------------
simple_reference_v3
actions: tensor([[29, 22]]) | obs_size: (2, 21) | termination: [0 0]
actions: tensor([[17, 38]]) | obs_size: (2, 21) | termination: [0 0]
actions: tensor([[41, 7]]) | obs_size: (2, 21) | termination: [0 0]
------------------------------------
simple_speaker_listener_v4
actions: tensor([[1, 0]]) | obs_size: (2, 11) | termination: [0 0]
actions: tensor([[1, 2]]) | obs_size: (2, 11) | termination: [0 0]
actions: tensor([[0, 1]]) | obs_size: (2, 11) | termination: [0 0]
------------------------------------
simple_spread_v3
actions: tensor([[3, 2, 3]]) | obs_size: (3, 18) | termination: [0 0 0]
actions: tensor([[3, 2, 3]]) | obs_size: (3, 18) | termination: [0 0 0]
actions: tensor([[2, 1, 4]]) | obs_size: (3, 18) | termination: [0 0 0]
------------------------------------
simple_tag_v3
actions: tensor([[0, 0, 4, 4]]) | obs_size: (4, 16) | termination: [0 0 0 0]
actions: tensor([[3, 1, 1, 0]]) | obs_size: (4, 16) | termination: [0 0 0 0]
actions: tensor([[0, 1, 3, 1]]) | obs_size: (4, 16) | termination: [0 0 0 0]


simple adversary

Trained Model's Return


simple push

Trained Model's Return


simple crypto

Trained Model's Return


simple reference

Trained Model's Return


simple speaker and listener

Trained Model's Return


simple spread

Trained Model's Return


simple tag

Trained Model's Return


simple world comm

Trained Model's Return


Three types of models and training results

To analyze how message congestion varies with the environment and with model complexity, we first train the models and evaluate their performance. Each environment contains three kinds of entities (good agents, adversaries, and obstacles), and their counts can be configured to increase the environment's complexity. In this experiment, the agents were trained with the following settings.

num_steps = 128
update_epochs = 4
num_layers = 4
total_timesteps = 1000000
hidden_dim = 128
env_max_cycles = 50
seed = 0
msg_activation = Sigmoid

message_dim : [1, 2, 4, 8, 16]
activation : [ReLU, Sigmoid]
num_layers : 3
update_epochs : 4
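The `message_dim` and `activation` lists above read as a sweep grid. A small sketch of how such a grid could be enumerated (the variable names here are illustrative, not the project's actual config code):

```python
# Illustrative sketch: enumerate the (message_dim, activation) sweep grid
# listed above. Names are hypothetical, not the project's config code.
from itertools import product

message_dims = [1, 2, 4, 8, 16]
activations = ["ReLU", "Sigmoid"]

runs = [{"message_dim": d, "activation": a}
        for d, a in product(message_dims, activations)]
print(len(runs))  # 10 runs per environment
```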

Model V1

V1 generates no messages; each agent acts on its own observation alone.

def step1(self, obs):
    # V1 emits no message
    message = None
    return message

def step2(self, obs, messages):
    # No message fusion: the policy consumes the raw observation alone
    combined = obs
    return combined

def forward(self, obs):
    message = self.step1(obs)
    combined = self.step2(obs, message)
    return combined

Model V2

V2 generates a message for each agent and combines the messages from allied agents by average pooling.

# Element-wise mean (average pooling) over the allied agents' messages
pooled_message = torch.stack(gathered_messages, dim=0).mean(dim=0)
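As a plain-Python illustration of the same averaging (the message values below are made up; the project does this with `torch.stack(...).mean(dim=0)`):

```python
# Plain-Python illustration of V2's average pooling; message values are made up.
gathered_messages = [
    [0.2, 0.8],  # message from agent_0
    [0.4, 0.6],  # message from agent_1
    [0.6, 0.4],  # message from agent_2
]

# Element-wise mean across agents, matching torch.stack(msgs, dim=0).mean(dim=0)
pooled_message = [sum(col) / len(gathered_messages)
                  for col in zip(*gathered_messages)]
print(pooled_message)
```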

Model V3

V3 generates a message for each agent and combines the allied messages with attention, using a query derived from the agent's own observation.

# The agent's own observation produces the query; allied messages serve as keys and values
query = self.queries[group_id](agent_obs).unsqueeze(1)
gathered_messages = torch.stack(gathered_messages, dim=1)
pooled_message, scores = self.attentions[group_id](query, gathered_messages, gathered_messages)
pooled_message = pooled_message.squeeze(1)
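The project uses a learned query and an attention module for this; the plain-Python sketch below shows the underlying scaled dot-product pooling with made-up numbers:

```python
# Plain-Python sketch of attention pooling: one query attends over three
# allied messages (keys == values). All numbers are made up for illustration.
import math

query = [1.0, 0.0]  # would come from the agent's own observation
messages = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

d = len(query)
# Scaled dot-product scores, one per message
scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in messages]
# Softmax over messages
exp = [math.exp(s) for s in scores]
weights = [e / sum(exp) for e in exp]
# Weighted sum of the messages is the pooled message
pooled = [sum(w * m[i] for w, m in zip(weights, messages)) for i in range(d)]
```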

Training results: Google Drive
Full training-result PNGs: Google Drive