Google Drive Link
Pettingzoo MPE
simple_adversary_v3
simple_crypto_v3
simple_push_v3
simple_reference_v3
simple_speaker_listener_v4
simple_spread_v3
simple_tag_v3
simple_world_comm_v3
------------------------------------
simple_adversary_v3
=> adversary_0 : Box(-inf, inf, (8,), float32) / Discrete(5)
=> agent_0 : Box(-inf, inf, (10,), float32) / Discrete(5)
=> agent_1 : Box(-inf, inf, (10,), float32) / Discrete(5)
- [size] raw action: (3,)
- [size] raw next obs, reward: (3, 10) (3,)
------------------------------------
simple_crypto_v3
=> eve_0 : Box(-inf, inf, (4,), float32) / Discrete(4)
=> bob_0 : Box(-inf, inf, (8,), float32) / Discrete(4)
=> alice_0 : Box(-inf, inf, (8,), float32) / Discrete(4)
- [size] raw action: (3,)
- [size] raw next obs, reward: (3, 8) (3,)
------------------------------------
simple_push_v3
=> adversary_0 : Box(-inf, inf, (8,), float32) / Discrete(5)
=> agent_0 : Box(-inf, inf, (19,), float32) / Discrete(5)
- [size] raw action: (2,)
- [size] raw next obs, reward: (2, 19) (2,)
------------------------------------
simple_reference_v3
=> agent_0 : Box(-inf, inf, (21,), float32) / Discrete(50)
=> agent_1 : Box(-inf, inf, (21,), float32) / Discrete(50)
- [size] raw action: (2,)
- [size] raw next obs, reward: (2, 21) (2,)
------------------------------------
simple_speaker_listener_v4
=> speaker_0 : Box(-inf, inf, (3,), float32) / Discrete(3)
=> listener_0 : Box(-inf, inf, (11,), float32) / Discrete(5)
- [size] raw action: (2,)
- [size] raw next obs, reward: (2, 11) (2,)
------------------------------------
simple_spread_v3
=> agent_0 : Box(-inf, inf, (18,), float32) / Discrete(5)
=> agent_1 : Box(-inf, inf, (18,), float32) / Discrete(5)
=> agent_2 : Box(-inf, inf, (18,), float32) / Discrete(5)
- [size] raw action: (3,)
- [size] raw next obs, reward: (3, 18) (3,)
------------------------------------
simple_tag_v3
=> adversary_0 : Box(-inf, inf, (16,), float32) / Discrete(5)
=> adversary_1 : Box(-inf, inf, (16,), float32) / Discrete(5)
=> adversary_2 : Box(-inf, inf, (16,), float32) / Discrete(5)
=> agent_0 : Box(-inf, inf, (14,), float32) / Discrete(5)
- [size] raw action: (4,)
- [size] raw next obs, reward: (4, 16) (4,)
### Direct step in environment for rendering
- action masking is required when sampling actions (e.g. for simple_speaker_listener_v4, whose agents have different action-space sizes)
- observations require zero padding before they can be stacked into a single batch (agents have different observation lengths)
- the environment's step input is a list of discrete actions, one per agent (a minimal stepping sketch follows below)
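A minimal sketch of this direct-stepping loop, assuming the PettingZoo parallel API (recent versions, where reset returns observations and infos); the zero-padding and the masked random sampling are illustrative stand-ins for the project's own sampling code.

import numpy as np
from pettingzoo.mpe import simple_speaker_listener_v4

env = simple_speaker_listener_v4.parallel_env(max_cycles=50)
obs, infos = env.reset(seed=0)

obs_dim = max(o.shape[0] for o in obs.values())              # listener obs is the longest (11)
act_sizes = {a: env.action_space(a).n for a in env.agents}   # {speaker_0: 3, listener_0: 5}

while env.agents:
    # zero-pad each observation to a common length before stacking into one batch
    # (the batch would be fed to a policy; unused here because actions are sampled randomly)
    batch = np.stack([np.pad(obs[a], (0, obs_dim - obs[a].shape[0])) for a in env.agents])
    # masked sampling: only draw action indices valid for each agent's own action space
    actions = {a: int(np.random.randint(act_sizes[a])) for a in env.agents}
    obs, rewards, terminations, truncations, infos = env.step(actions)
env.close()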
------------------------
simple_adversary_v3
obs length: [11, 7, 7] | act length: [5, 5, 5]
input action: [4, 3, 2]
------------------------
simple_crypto_v3
obs length: [5, 5, 7] | act length: [4, 4, 4]
input action: [3, 0, 3]
------------------------
simple_push_v3
obs length: [11, 7] | act length: [5, 5]
input action: [1, 1]
------------------------
simple_reference_v3
obs length: [7, 7] | act length: [50, 50]
input action: [48, 31]
------------------------
simple_speaker_listener_v4
obs length: [9, 10] | act length: [3, 5]
input action: [2, 3]
------------------------
simple_spread_v3
obs length: [7, 7, 7] | act length: [5, 5, 5]
input action: [4, 2, 2]
------------------------
simple_tag_v3
obs length: [11, 11, 11, 7] | act length: [5, 5, 5, 5]
input action: [0, 1, 2, 1]
### Direct step in environment with Communication Model
------------------------------------
simple_crypto_v3
actions: tensor([[3, 1, 3]]) | obs_size: (3, 8) | termination: [0 0 0]
actions: tensor([[3, 2, 2]]) | obs_size: (3, 8) | termination: [0 0 0]
actions: tensor([[0, 1, 1]]) | obs_size: (3, 8) | termination: [0 0 0]
------------------------------------
simple_push_v3
actions: tensor([[1, 4]]) | obs_size: (2, 19) | termination: [0 0]
actions: tensor([[3, 0]]) | obs_size: (2, 19) | termination: [0 0]
actions: tensor([[0, 2]]) | obs_size: (2, 19) | termination: [0 0]
------------------------------------
simple_reference_v3
actions: tensor([[29, 22]]) | obs_size: (2, 21) | termination: [0 0]
actions: tensor([[17, 38]]) | obs_size: (2, 21) | termination: [0 0]
actions: tensor([[41, 7]]) | obs_size: (2, 21) | termination: [0 0]
------------------------------------
simple_speaker_listener_v4
actions: tensor([[1, 0]]) | obs_size: (2, 11) | termination: [0 0]
actions: tensor([[1, 2]]) | obs_size: (2, 11) | termination: [0 0]
actions: tensor([[0, 1]]) | obs_size: (2, 11) | termination: [0 0]
------------------------------------
simple_spread_v3
actions: tensor([[3, 2, 3]]) | obs_size: (3, 18) | termination: [0 0 0]
actions: tensor([[3, 2, 3]]) | obs_size: (3, 18) | termination: [0 0 0]
actions: tensor([[2, 1, 4]]) | obs_size: (3, 18) | termination: [0 0 0]
------------------------------------
simple_tag_v3
actions: tensor([[0, 0, 4, 4]]) | obs_size: (4, 16) | termination: [0 0 0 0]
actions: tensor([[3, 1, 1, 0]]) | obs_size: (4, 16) | termination: [0 0 0 0]
actions: tensor([[0, 1, 3, 1]]) | obs_size: (4, 16) | termination: [0 0 0 0]
simple adversary
- adversary
- agents = [adversary_0, agent_0, agent_1]
Trained Model's Return

simple push
- adversary
- agents = [adversary_0, agent_0]
Trained Model's Return

simple crypto
- adversary
- agents = [eve_0, bob_0, alice_0]
Trained Model's Return

simple reference
- cooperative
- agents = [agent_0, agent_1]
Trained Model's Return

simple speaker and listener
- cooperative
- agents = [speaker_0, listener_0]
Trained Model's Return

simple spread
- cooperative
- agents = [agent_0, agent_1, agent_2]
Trained Model's Return

simple tag
- adversary
- predator-prey
- agents = [adversary_0, adversary_1, adversary_2, agent_0]
Trained Model's Return

simple world comm
- adversary
- agents = [leadadversary_0, adversary_0, adversary_1, adversary_3, agent_0, agent_1]
Trained Model's Return
### Three types of models and training results
To analyze how message congestion depends on the environment and on model complexity, we first train the models and evaluate their performance.
An environment has three main kinds of entities, good agents, adversaries, and obstacles; increasing their counts raises the complexity of the environment.
In this experiment, the agents were trained with the following counts (an example constructor call is sketched after the list):
- num_adversaries : [2, 3, 4]
- num_goods : [1, 2]
- num_obstacles : [0, 2, 4]
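The sketch below shows how one point of this sweep could be instantiated; simple_tag_v3 is used only as an example because it exposes num_good, num_adversaries, and num_obstacles, and the concrete values are illustrative rather than the project's configuration.

from pettingzoo.mpe import simple_tag_v3

# one point from the sweep above: 3 adversaries, 1 good agent, 2 obstacles
env = simple_tag_v3.parallel_env(
    num_good=1,
    num_adversaries=3,
    num_obstacles=2,
    max_cycles=50,   # env_max_cycles
)
obs, infos = env.reset(seed=0)
print(env.agents)    # ['adversary_0', 'adversary_1', 'adversary_2', 'agent_0']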
num_steps=128
update_epochs=4
num_layers=4
total_timesteps=1000000
hidden_dim=128
env_max_cycles=50
seed=0
msg_activation = Sigmoid
message_dim : [1, 2, 4, 8, 16]
activation : [ReLU, Sigmoid]
num_layers : 3
update_epochs : 4
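For reference, the settings above can be collected into a single config object; the grouping below and the single-value defaults for the swept fields are assumptions, not the project's actual config class.

from dataclasses import dataclass

@dataclass
class TrainConfig:
    # fixed training settings listed above
    num_steps: int = 128
    update_epochs: int = 4
    num_layers: int = 4
    total_timesteps: int = 1_000_000
    hidden_dim: int = 128
    env_max_cycles: int = 50
    seed: int = 0
    msg_activation: str = "Sigmoid"
    # communication-model settings, swept over [1, 2, 4, 8, 16] and [ReLU, Sigmoid]
    message_dim: int = 8
    activation: str = "ReLU"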
Model V1
It generates no message and acts on the observation alone.
def step1(self, obs):
    # V1 emits no message
    message = None
    return message

def step2(self, obs, messages):
    # nothing to combine; the policy input is the raw observation
    combined = obs
    return combined

def forward(self, obs):
    message = self.step1(obs)
    combined = self.step2(obs, message)
    return combined
Model V2
It generates a message and combines the messages produced by its teammates.
The messages are combined by average pooling.
# average over the teammates' messages stacked along a new leading dimension
pooled_message = torch.stack(gathered_messages, dim=0).mean(dim=0)
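A sketch of how V2's step1/step2 could look under the same interface as V1; the message head (self.msg_head) and the concatenation in step2 are assumptions about the implementation, not the project's exact code.

def step1(self, obs):
    # V2 produces a message from the local observation (msg_activation = Sigmoid assumed)
    message = torch.sigmoid(self.msg_head(obs))
    return message

def step2(self, obs, gathered_messages):
    # average pooling over the teammates' messages, then concatenation with the observation
    pooled_message = torch.stack(gathered_messages, dim=0).mean(dim=0)
    combined = torch.cat([obs, pooled_message], dim=-1)
    return combined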
Model V3
It generates a message and combines the messages produced by its teammates.
The messages are combined by attention, using a query derived from the agent's own observation.
# query from this agent's observation: (batch, 1, d)
query = self.queries[group_id](agent_obs).unsqueeze(1)
# teammates' messages stacked as keys/values: (batch, n_messages, d)
gathered_messages = torch.stack(gathered_messages, dim=1)
# attention pooling over the messages; scores are the attention weights
pooled_message, scores = self.attentions[group_id](query, gathered_messages, gathered_messages)
pooled_message = pooled_message.squeeze(1)
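The call above, which returns both the pooled output and the attention scores, is consistent with torch.nn.MultiheadAttention in batch_first mode; the construction below is a hypothetical sketch of how self.queries and self.attentions might be defined per agent group.

import torch.nn as nn

# hypothetical setup inside the model's __init__:
# one query projection and one attention block per agent group
self.queries = nn.ModuleList(
    [nn.Linear(obs_dim, message_dim) for _ in range(num_groups)]
)
self.attentions = nn.ModuleList(
    [nn.MultiheadAttention(embed_dim=message_dim, num_heads=1, batch_first=True)
     for _ in range(num_groups)]
)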
Training results: Google Drive
Training results (full PNG): Google Drive