| ✅ Proximal Policy Optimization (PPO) | ppo.py, docs |
| | ppo_atari.py, docs |
| | ppo_continuous_action.py, docs |
| | ppo_atari_lstm.py, docs |
| | ppo_atari_envpool.py, docs |
| | ppo_atari_envpool_xla_jax.py, docs |
| | ppo_procgen.py, docs |
| | ppo_atari_multigpu.py, docs |
| | ppo_pettingzoo_ma_atari.py, docs |
| | ppo_continuous_action_isaacgym.py, docs |
| ✅ Deep Q-Learning (DQN) | dqn.py, docs |
| | dqn_atari.py, docs |
| | dqn_jax.py, docs |
| | dqn_atari_jax.py, docs |
| ✅ Categorical DQN (C51) | c51.py, docs |
| | c51_atari.py, docs |
| ✅ Soft Actor-Critic (SAC) | sac_continuous_action.py, docs |
| ✅ Deep Deterministic Policy Gradient (DDPG) | ddpg_continuous_action.py, docs |
| | ddpg_continuous_action_jax.py, docs |
| ✅ Twin Delayed Deep Deterministic Policy Gradient (TD3) | td3_continuous_action.py, docs |
| | td3_continuous_action_jax.py, docs |
| ✅ Phasic Policy Gradient (PPG) | ppg_procgen.py, docs |
| ✅ Random Network Distillation (RND) | ppo_rnd_envpool.py, docs |