[Experiment Note] Domain Neurons

A series of experiments conducted for domain neurons.

📌 Why do CartPole checkpoints have the same return (500) for all models? To validate the code, we ran an additional training experiment on Mountain-Car. [mail-link]
✔️ Mountain-Car Training is implemented [mail-link]
✔️ Mountain-Car evaluation showed an increasing return, unlike CartPole. [mail-link] Although the training is underfit, we do not pursue Mountain-Car and CartPole further, as they will not be used in the paper.
❌ Initial PPO training in CarRacing. [mail-link]
- This experiment failed: the return did not increase.
✔️ CartPole-Randomized environment training (V0 ~ V2). (to be done). [mail-link v2] [mail-link v1] [mail-link v0]
- V2 has a higher return than V1 because a smaller pole length yields a higher return.
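
On the CartPole question at the top of this list, one possible explanation (an assumption, not something confirmed in these notes) is that the environment is CartPole-v1, which gives +1 reward per step and truncates episodes at 500 steps, so every checkpoint that balances the pole for the full episode reports exactly 500. The sketch below flags checkpoints whose mean evaluation return sits at that cap; the function name, checkpoint names, and return values are hypothetical and only for illustration.

```python
from statistics import mean

# Assumption: CartPole-v1 gives +1 reward per step and truncates at 500 steps,
# so 500 is the maximum possible episode return.
MAX_RETURN = 500.0


def saturated_checkpoints(returns_by_checkpoint, cap=MAX_RETURN, tol=1e-6):
    """Return the names of checkpoints whose mean evaluation return is at the cap."""
    return [
        name
        for name, returns in returns_by_checkpoint.items()
        if abs(mean(returns) - cap) <= tol
    ]


# Hypothetical evaluation results, for illustration only.
example = {
    "ckpt_a": [500.0, 500.0, 500.0],
    "ckpt_b": [500.0, 500.0, 500.0],
    "ckpt_c": [212.0, 500.0, 340.0],
}
flagged = saturated_checkpoints(example)
# True only when every checkpoint is at the cap, i.e. the metric cannot rank them.
all_saturated = len(flagged) == len(example)
```

If every checkpoint is flagged, the return alone cannot distinguish the models, which would be consistent with moving to a harder or randomized variant for comparison.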