Regular updates on deep learning, reinforcement learning, and their applications to combinatorial optimization problems.
This repository provides insights into various reinforcement learning algorithms and their implementations, accompanied by clean and well-structured code.
Deep Q Network (DQN)
Implementation of the foundational DQN algorithm, which combines Q-learning with deep neural networks for decision making.
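The heart of DQN is the bootstrapped Q-learning target computed from a separate target network. A minimal numpy sketch of that target (function name and array shapes are illustrative, not this repository's API):

```python
import numpy as np

def dqn_targets(rewards, next_q, dones, gamma=0.99):
    """Q-learning target: y = r + gamma * max_a' Q_target(s', a').

    rewards, dones: shape (batch,); next_q: (batch, n_actions) of
    Q-values from the target network. dones = 1.0 at terminal
    transitions, which disables bootstrapping there.
    """
    return rewards + gamma * (1.0 - dones) * next_q.max(axis=1)
```

The online network is then regressed toward these targets, e.g. with a mean-squared-error loss on the chosen actions.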
Double DQN (DDQN)
A more stable variant of DQN that reduces overestimation bias in the Q-values.
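The overestimation fix is to decouple action selection from action evaluation: the online network picks the next action, and the target network scores it. A sketch under the same illustrative shapes as above:

```python
import numpy as np

def ddqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN target: online net selects a' = argmax Q_online(s', .),
    target net evaluates Q_target(s', a'), avoiding the max-operator bias."""
    best = next_q_online.argmax(axis=1)                    # action selection
    bootstrap = next_q_target[np.arange(len(best)), best]  # action evaluation
    return rewards + gamma * (1.0 - dones) * bootstrap
```

When the online network's preferred action happens to be overvalued by noise, the target network's independent estimate tempers the bootstrap value.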
Dueling DQN
An enhancement to DQN that separates the state-value and advantage functions, improving the network's performance.
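The two streams are recombined at the head of the network. A numpy sketch of the standard aggregation (the mean-subtraction term is what keeps the decomposition identifiable):

```python
import numpy as np

def dueling_q(value, advantage):
    """Combine dueling streams: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a).

    value: (batch, 1) state-value stream; advantage: (batch, n_actions)
    advantage stream. Subtracting the mean advantage prevents V and A
    from drifting by an arbitrary constant.
    """
    return value + advantage - advantage.mean(axis=1, keepdims=True)
```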
REINFORCE
A Monte Carlo policy gradient method that directly optimizes policy performance.
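"Monte Carlo" here means each step is weighted by the full discounted return observed from that step onward. A sketch of the per-episode computation (names are illustrative):

```python
import numpy as np

def returns_to_go(rewards, gamma=0.99):
    """Discounted Monte Carlo return G_t for every step of one episode."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return np.array(out[::-1])

def reinforce_loss(log_probs, returns):
    """Minimizing this ascends the policy gradient E[G_t * grad log pi(a_t|s_t)]."""
    return -(log_probs * returns).sum()
```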
REINFORCE with Baseline
A variation of REINFORCE that reduces gradient variance by subtracting a learned baseline value from the return.
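Subtracting a baseline leaves the expected gradient unchanged but shrinks the magnitude of the per-step weights when the baseline tracks the expected return. A small numerical sketch (the helper name is illustrative):

```python
import numpy as np

def baseline_weights(returns, baselines):
    """Variance-reduced policy-gradient weights: G_t - b(s_t).

    Because E[b(s_t) * grad log pi] = 0 for a state-dependent baseline,
    the gradient stays unbiased while its variance drops.
    """
    return returns - baselines
```

With a baseline near the mean return, the weights are centered around zero, so their second moment (which drives gradient variance) is much smaller than that of the raw returns.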
Actor-Critic
Combines the benefits of value-based and policy-based methods by learning both a policy and a value function.
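Both networks are driven by the same one-step temporal-difference error: the critic regresses toward the bootstrapped target, and the actor reweights its log-probabilities by the error. A sketch of that shared quantity:

```python
import numpy as np

def td_error(reward, value_s, value_next, done, gamma=0.99):
    """One-step TD error: delta = r + gamma * V(s') - V(s).

    Critic update: move V(s) toward r + gamma * V(s').
    Actor update: scale grad log pi(a|s) by delta.
    """
    return reward + gamma * (1.0 - done) * value_next - value_s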
Advantage Actor-Critic (A2C)
A synchronous version of Actor-Critic that uses the advantage function to guide policy updates.
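In the synchronous setting, all workers roll out a fixed number of steps, returns are bootstrapped from the value estimate at the cut-off, and the advantage is the return minus the critic's value. A sketch under those assumptions (names illustrative):

```python
import numpy as np

def rollout_returns(rewards, bootstrap_value, dones, gamma=0.99):
    """Discounted returns over a fixed-length rollout, bootstrapping
    from V(s_T) where the rollout was cut off mid-episode."""
    R, out = bootstrap_value, []
    for r, d in zip(reversed(rewards), reversed(dones)):
        R = r + gamma * (1.0 - d) * R
        out.append(R)
    return np.array(out[::-1])

def a2c_advantages(returns, values):
    """Advantage estimate A_t = R_t - V(s_t), used to weight the policy loss."""
    return returns - values
```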
Proximal Policy Optimization (PPO)
A state-of-the-art policy gradient method that ensures stable learning by limiting policy updates.
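The limiting mechanism is the clipped surrogate objective: the probability ratio between the new and old policy is clipped so a single batch cannot push the policy too far. A numpy sketch of that loss (shapes and names illustrative):

```python
import numpy as np

def ppo_clip_loss(log_probs, old_log_probs, advantages, eps=0.2):
    """Clipped surrogate objective (to minimize):
    -E[min(rho * A, clip(rho, 1 - eps, 1 + eps) * A)], rho = pi / pi_old."""
    ratio = np.exp(log_probs - old_log_probs)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return -np.minimum(ratio * advantages, clipped * advantages).mean()
```

Taking the elementwise minimum makes the objective a pessimistic bound, so the incentive to move the ratio outside the [1 - eps, 1 + eps] band vanishes.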
[ ] LSTM and Pointer Network, A2C, Greedy & Sampling & Active Search for TSP
Reference: Bello, I., Pham, H., Le, Q. V., et al. Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940, 2016.
[ ] Multi-Head Self-Attention, REINFORCE, Greedy & Sampling for TSP
Reference: Kool, W., Van Hoof, H., Welling, M. Attention, learn to solve routing problems! arXiv preprint arXiv:1803.08475, 2018.
Stay tuned for more updates!