Feature/user choice sample tensor #172

Merged (121 commits, Dec 19, 2023)

Commits
7135dcb
Add first PPO numpy buffer implementation
belerico Oct 5, 2023
055cacd
Merge branch 'main' of https://github.com/Eclectic-Sheep/sheeprl into…
belerico Oct 6, 2023
91dbdcb
Add distribution cfg to agent
belerico Oct 6, 2023
8174704
No need for tensordict
belerico Oct 6, 2023
9da5a86
Add SAC numpy
belerico Oct 6, 2023
3cc29cf
Improve sample_next_obs
belerico Oct 8, 2023
1882b6a
Add DV1 with numpy buffer
belerico Oct 8, 2023
024cc98
Too much reshapes
belerico Oct 8, 2023
4e4389e
Add Sequential and EnvIndipendent np buffers
belerico Oct 9, 2023
c5fc229
Fewer number of reshapes
belerico Oct 9, 2023
34b3261
Faster indexing + from_numpy parameter
belerico Oct 9, 2023
f549019
Dreamer-V2 numpy
belerico Oct 9, 2023
5131e60
Fix buffer add
belerico Oct 10, 2023
ded1b19
Better indexing
belerico Oct 10, 2023
f513171
Fix indexes to sample
belerico Oct 10, 2023
48d20d3
Fix metrics when they are nan
belerico Oct 10, 2023
82b576a
Fix reshape when bootstrapping + fix normalization
belerico Oct 10, 2023
f029e1e
Guard timer metrics
belerico Oct 10, 2023
6b423f5
Merge branch 'fix/algos' of github.com:Eclectic-Sheep/sheeprl into fe…
belerico Oct 10, 2023
f12ecb1
np.intp for indexing
belerico Oct 10, 2023
f8546f4
Change dtype after creating the tensor
belerico Oct 10, 2023
662e54c
Merge branch 'main' of https://github.com/Eclectic-Sheep/sheeprl into…
belerico Oct 10, 2023
fb7b7c8
Merge branch 'main' of github.com:Eclectic-Sheep/sheeprl into feature…
belerico Oct 10, 2023
b79d67d
Fix buf[key] after __getstate__ is called upon checkpoint
belerico Oct 11, 2023
1260c06
Securely close fd on __getstate__()
belerico Oct 11, 2023
2b7ef10
Add MemmapArray
belerico Oct 11, 2023
385405d
Merge branch 'feature/buffer-np' of https://github.com/Eclectic-Sheep…
belerico Oct 11, 2023
75f0664
Add __len__ function
belerico Oct 11, 2023
63ec5e6
Fix len
belerico Oct 11, 2023
a9b323c
Merge branch 'feature/buffer-np' of github.com:Eclectic-Sheep/sheeprl…
belerico Oct 11, 2023
3094295
Better array setter and __del__ now controls ownership
belerico Oct 11, 2023
c5794f7
Do not transfer ownership upon array setter
belerico Oct 12, 2023
7b81238
Add properties
belerico Oct 12, 2023
b134b4f
Feature/episode buffer np (#121)
michele-milesi Oct 12, 2023
64b60d2
Fix not use self._obs_keys
belerico Oct 12, 2023
4be0442
Sample only if n > 0
belerico Oct 12, 2023
11d2c68
Fix shapes
belerico Oct 12, 2023
d71f09d
feat: added possibility to specify sequence length in sample() + adde…
michele-milesi Oct 12, 2023
d3f477e
tests: update episode buffer numpy tests
michele-milesi Oct 12, 2023
f11e886
Merge branch 'feature/buffer-np' of github.com:Eclectic-Sheep/sheeprl…
michele-milesi Oct 12, 2023
26cc62d
tests: added replay buffer np tests
michele-milesi Oct 12, 2023
5be26de
tests: added sequential replay buffer np tests
michele-milesi Oct 12, 2023
9b60f07
fix: env independent repla buffer name
michele-milesi Oct 12, 2023
3d67ca5
fix: replay buffer + add tests
michele-milesi Oct 12, 2023
70c2b59
Safely release buffer on Windows
belerico Oct 12, 2023
cc36d97
Safely delets memmaps
belerico Oct 12, 2023
0d025d1
Del buffer
belerico Oct 12, 2023
82fd261
Safer array setter
belerico Oct 13, 2023
28008fa
Add Memmap.from_array
belerico Oct 14, 2023
63efdc0
Fix ReplayBuffer __set_item__
belerico Oct 14, 2023
f0fc7ed
fix: sac_np sample
michele-milesi Oct 16, 2023
775ec1d
Merge branch 'main' of https://github.com/Eclectic-Sheep/sheeprl into…
belerico Oct 16, 2023
b36775c
tests: update tests
michele-milesi Oct 16, 2023
0f54d7c
tests: update
michele-milesi Oct 17, 2023
bc51ad3
fix: sequential replay buffer sample clone
michele-milesi Oct 17, 2023
7c66ee4
Add tests + Fix MemmapArray on Windows
belerico Oct 17, 2023
ae36340
Merge branch 'feature/buffer-np' of https://github.com/Eclectic-Sheep…
belerico Oct 17, 2023
611d4ad
Add tests to run only on Linux
belerico Oct 17, 2023
25ae53f
Fix tests
belerico Oct 17, 2023
bea4211
Fix skip test on Windows
belerico Oct 17, 2023
944f3b3
Dreamer-V2 with EpisodeBuffer np
belerico Oct 17, 2023
9a957a1
Add user warning if file exists when creating a new MemmapArray
belerico Oct 17, 2023
6d97cda
feat: added dreamer v3 np
michele-milesi Oct 17, 2023
4b47902
Add docstrings + Fix array setter if shapes differ
belerico Oct 18, 2023
bdc87fe
Fix tests
belerico Oct 18, 2023
38eeafe
Add docstring
belerico Oct 20, 2023
207de84
Docstrings
belerico Oct 23, 2023
3d00eca
fix: sample of env independent buffer
michele-milesi Oct 23, 2023
80d968a
Fix locked tensordict
belerico Oct 24, 2023
f7974e2
Merge branch 'main' of github.com:Eclectic-Sheep/sheeprl into feature…
belerico Oct 27, 2023
28f7f58
Add configs
belerico Oct 27, 2023
49e6083
Merge branch 'main' of github.com:Eclectic-Sheep/sheeprl into feature…
belerico Oct 27, 2023
fb270a4
merge: update numpy-np branch
michele-milesi Nov 30, 2023
ada75fc
feat: update np algorithms with new specifications
michele-milesi Nov 30, 2023
a6c64f3
fix: mypy
michele-milesi Dec 1, 2023
62383d7
PokemonRed env from https://github.com/PWhiddy/PokemonRedExperiments/…
belerico Dec 4, 2023
77cd103
Merge branch 'main' of github.com:Eclectic-Sheep/sheeprl into feature…
belerico Dec 4, 2023
a3e70ba
Update dreamer_v3 with main
belerico Dec 4, 2023
ca05368
Update dreamer_v2 with main
belerico Dec 4, 2023
3a24773
Update dreamer_v1 with main
belerico Dec 4, 2023
a2792d2
Update ppo with main
belerico Dec 4, 2023
1e78ad6
Update sac with main
belerico Dec 4, 2023
8c16305
Amend numpy to torch dtype and back dicts
belerico Dec 4, 2023
fbb5743
Merge branch 'feature/pokemon' into feature/buffer-np
belerico Dec 4, 2023
99c1efa
feat: added np callback
michele-milesi Dec 12, 2023
74ac79a
fix: np callback
michele-milesi Dec 12, 2023
1b4a527
feat: add support functions in np checkpoint callback
michele-milesi Dec 12, 2023
7c02f87
feat: added droq np
michele-milesi Dec 14, 2023
90331d2
feat: added ppo recurrent np
michele-milesi Dec 14, 2023
bb43a5b
feat: added sac-ae np
michele-milesi Dec 14, 2023
c686bfa
Merge branch 'main' of https://github.com/Eclectic-Sheep/sheeprl into…
belerico Dec 14, 2023
b78a38a
Update dreamer algos with main
belerico Dec 14, 2023
0cff9c7
Merge branch 'feature/buffer-np' of github.com:Eclectic-Sheep/sheeprl…
michele-milesi Dec 14, 2023
8328c76
feat: added p2e dv1 np
michele-milesi Dec 14, 2023
4c01893
feat: added p2e dv2 np
michele-milesi Dec 15, 2023
b8b5f66
feat: add p2e dv3 np
michele-milesi Dec 15, 2023
cfbeb0a
feat: added ppo decoupled np
michele-milesi Dec 15, 2023
efc3491
Merge branch 'main' of https://github.com/Eclectic-Sheep/sheeprl into…
belerico Dec 15, 2023
6139b16
Merge branch 'feature/buffer-np' of https://github.com/Eclectic-Sheep…
belerico Dec 15, 2023
933030c
feat: add sac decoupled
michele-milesi Dec 15, 2023
b2f2859
Merge branch 'feature/buffer-np' of github.com:Eclectic-Sheep/sheeprl…
michele-milesi Dec 15, 2023
0c47f73
np.tanh instead of torch.tanh
belerico Dec 15, 2023
706d728
Merge branch 'feature/buffer-np' of https://github.com/Eclectic-Sheep…
belerico Dec 15, 2023
dc6c9ce
feat: from tensordict to buffers np
michele-milesi Dec 18, 2023
39974e0
from td to np
belerico Dec 18, 2023
495661c
Merge branch 'feature/buffer-np' of https://github.com/Eclectic-Sheep…
belerico Dec 18, 2023
2150065
exclude mlflow from tests
belerico Dec 18, 2023
21412af
No more tensordict
belerico Dec 18, 2023
818b490
Updated howto
belerico Dec 18, 2023
a82e5a2
Fix tests
belerico Dec 18, 2023
a0f108a
.cpu().numpy() just one time
belerico Dec 18, 2023
f2c5e38
Removed old cfgs
belerico Dec 18, 2023
a8766df
Convert all when hydra instantiating
belerico Dec 18, 2023
2ad65f3
convert all on instantiate
belerico Dec 18, 2023
ad10fe2
[skip-ci] Removed pokemon files
belerico Dec 18, 2023
ba04da0
fix: git merge related errors
michele-milesi Dec 18, 2023
63b1f7e
Fix get absolute path
belerico Dec 19, 2023
ab3ee64
Merge branch 'feature/buffer-np' of https://github.com/Eclectic-Sheep…
belerico Dec 19, 2023
fb0c7d0
Amend dreamer-v3 pokemon config
belerico Dec 19, 2023
4757583
feat: added user choice from as_tensor and from_numpy in sample_tenso…
michele-milesi Dec 19, 2023
b7258a4
mearge: main into feature/user-choice-sample_tensor
michele-milesi Dec 19, 2023
1 change: 1 addition & 0 deletions sheeprl/algos/dreamer_v1/dreamer_v1.py
@@ -689,6 +689,7 @@ def main(fabric: Fabric, cfg: Dict[str, Any]):
     n_samples=1,
     dtype=None,
     device=device,
+    from_numpy=cfg.buffer.from_numpy,
 ) # [N_samples, Seq_len, Batch_size, ...]
 batch = {k: v[0].float() for k, v in sample.items()}
 train(
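The only change in this hunk is threading the new `from_numpy` flag into the sampling call. The trade-off the flag name points at can be sketched as follows (a generic illustration of PyTorch semantics, not the sheeprl implementation): `torch.from_numpy` returns a tensor that shares memory with the source array, while `torch.tensor` always materializes an independent copy.

```python
import numpy as np
import torch

arr = np.zeros(2, dtype=np.float32)

t_shared = torch.from_numpy(arr)  # zero-copy view over the NumPy buffer
t_copied = torch.tensor(arr)      # always an independent copy

arr[0] = 5.0                      # mutate the original array in place
print(t_shared[0].item())         # 5.0: the shared view sees the update
print(t_copied[0].item())         # 0.0: the copy does not
```

Sharing avoids a copy per sample, but the tensor then aliases the (possibly memory-mapped) buffer, so later in-place writes to the buffer are visible through it.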
1 change: 1 addition & 0 deletions sheeprl/algos/dreamer_v2/dreamer_v2.py
@@ -733,6 +733,7 @@ def main(fabric: Fabric, cfg: Dict[str, Any]):
     n_samples=n_samples,
     dtype=None,
     device=fabric.device,
+    from_numpy=cfg.buffer.from_numpy,
 )
 with timer("Time/train_time", SumMetric(sync_on_compute=cfg.metric.sync_on_compute)):
     for i in range(next(iter(local_data.values())).shape[0]):
1 change: 1 addition & 0 deletions sheeprl/algos/dreamer_v3/dreamer_v3.py
@@ -676,6 +676,7 @@ def main(fabric: Fabric, cfg: Dict[str, Any]):
     ),
     dtype=None,
     device=fabric.device,
+    from_numpy=cfg.buffer.from_numpy,
 )
 with timer("Time/train_time", SumMetric(sync_on_compute=cfg.metric.sync_on_compute)):
     for i in range(next(iter(local_data.values())).shape[0]):
6 changes: 4 additions & 2 deletions sheeprl/algos/droq/droq.py
@@ -41,7 +41,9 @@ def train(
     # Sample a minibatch in a distributed way: Line 5 - Algorithm 2
     # We sample one time to reduce the communications between processes
     sample = rb.sample_tensors(
-        cfg.algo.per_rank_gradient_steps * cfg.algo.per_rank_batch_size, sample_next_obs=cfg.buffer.sample_next_obs
+        cfg.algo.per_rank_gradient_steps * cfg.algo.per_rank_batch_size,
+        sample_next_obs=cfg.buffer.sample_next_obs,
+        from_numpy=cfg.buffer.from_numpy,
     )
     critic_data = fabric.all_gather(sample)
     flatten_dim = 3 if fabric.world_size > 1 else 2
@@ -63,7 +65,7 @@ def train(
     critic_sampler = BatchSampler(sampler=critic_idxes, batch_size=cfg.algo.per_rank_batch_size, drop_last=False)

     # Sample a different minibatch in a distributed way to update actor and alpha parameter
-    sample = rb.sample_tensors(cfg.algo.per_rank_batch_size)
+    sample = rb.sample_tensors(cfg.algo.per_rank_batch_size, from_numpy=cfg.buffer.from_numpy)
     actor_data = fabric.all_gather(sample)
     actor_data = {k: v.view(-1, *v.shape[flatten_dim:]) for k, v in actor_data.items()}
     if fabric.world_size > 1:
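The `flatten_dim` logic that follows the gather merges the extra leading dimension that `fabric.all_gather` adds when more than one process is running. A shape-only sketch of that reshape (the sizes B=4, F=3 are made up for illustration; this is not the sheeprl code itself):

```python
import torch

world_size = 2  # hypothetical: two distributed processes

# After all_gather, each per-rank batch of shape [B, 1, F] is stacked
# into [World, B, 1, F].
v = torch.zeros(world_size, 4, 1, 3)

# Mirror of the hunk's flatten logic: with W > 1 the leading
# world/batch/unit dims are merged into one, otherwise only batch/unit.
flatten_dim = 3 if world_size > 1 else 2
flat = v.view(-1, *v.shape[flatten_dim:])
print(flat.shape)  # torch.Size([8, 3])
```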
1 change: 1 addition & 0 deletions sheeprl/algos/p2e_dv1/p2e_dv1_exploration.py
@@ -729,6 +729,7 @@ def main(fabric: Fabric, cfg: Dict[str, Any]):
     n_samples=1,
     dtype=None,
     device=device,
+    from_numpy=cfg.buffer.from_numpy,
 ) # [N_samples, Seq_len, Batch_size, ...]
 batch = {k: v[0].float() for k, v in sample.items()}
 train(
1 change: 1 addition & 0 deletions sheeprl/algos/p2e_dv1/p2e_dv1_finetuning.py
@@ -357,6 +357,7 @@ def main(fabric: Fabric, cfg: Dict[str, Any], exploration_cfg: Dict[str, Any]):
     n_samples=1,
     dtype=None,
     device=device,
+    from_numpy=cfg.buffer.from_numpy,
 ) # [N_samples, Seq_len, Batch_size, ...]
 batch = {k: v[0].float() for k, v in sample.items()}
 train(
1 change: 1 addition & 0 deletions sheeprl/algos/p2e_dv2/p2e_dv2_exploration.py
@@ -878,6 +878,7 @@ def main(fabric: Fabric, cfg: Dict[str, Any]):
     n_samples=n_samples,
     dtype=None,
     device=fabric.device,
+    from_numpy=cfg.buffer.from_numpy,
 )
 # Start training
 with timer("Time/train_time", SumMetric(sync_on_compute=cfg.metric.sync_on_compute)):
1 change: 1 addition & 0 deletions sheeprl/algos/p2e_dv2/p2e_dv2_finetuning.py
@@ -380,6 +380,7 @@ def main(fabric: Fabric, cfg: Dict[str, Any], exploration_cfg: Dict[str, Any]):
     n_samples=n_samples,
     dtype=None,
     device=fabric.device,
+    from_numpy=cfg.buffer.from_numpy,
 )
 # Start training
 with timer("Time/train_time", SumMetric(sync_on_compute=cfg.metric.sync_on_compute)):
1 change: 1 addition & 0 deletions sheeprl/algos/p2e_dv3/p2e_dv3_exploration.py
@@ -947,6 +947,7 @@ def main(fabric: Fabric, cfg: Dict[str, Any]):
     ),
     dtype=None,
     device=fabric.device,
+    from_numpy=cfg.buffer.from_numpy,
 )
 # Start training
 with timer("Time/train_time", SumMetric(sync_on_compute=cfg.metric.sync_on_compute)):
1 change: 1 addition & 0 deletions sheeprl/algos/p2e_dv3/p2e_dv3_finetuning.py
@@ -379,6 +379,7 @@ def main(fabric: Fabric, cfg: Dict[str, Any], exploration_cfg: Dict[str, Any]):
     ),
     dtype=None,
     device=fabric.device,
+    from_numpy=cfg.buffer.from_numpy,
 )
 # Start training
 with timer("Time/train_time", SumMetric(sync_on_compute=cfg.metric.sync_on_compute)):
2 changes: 1 addition & 1 deletion sheeprl/algos/ppo/ppo.py
@@ -343,7 +343,7 @@ def main(fabric: Fabric, cfg: Dict[str, Any]):
     fabric.print(f"Rank-0: policy_step={policy_step}, reward_env_{i}={ep_rew[-1]}")

     # Transform the data into PyTorch Tensors
-    local_data = rb.to_tensor(dtype=None, device=device)
+    local_data = rb.to_tensor(dtype=None, device=device, from_numpy=cfg.buffer.from_numpy)

     # Estimate returns with GAE (https://arxiv.org/abs/1506.02438)
     with torch.no_grad():
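This hunk sits directly above the "Estimate returns with GAE" step. For context, the recursion GAE performs can be sketched for a single environment (a textbook sketch with illustrative gamma/lam values, not sheeprl's vectorized implementation):

```python
import torch

def gae(rewards, values, next_value, dones, gamma=0.99, lam=0.95):
    # advantage_t = delta_t + gamma * lam * (1 - done_t) * advantage_{t+1},
    # where delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
    T = rewards.shape[0]
    advantages = torch.zeros_like(rewards)
    last_adv = 0.0
    next_v = next_value
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_v * not_done - values[t]
        last_adv = delta + gamma * lam * not_done * last_adv
        advantages[t] = last_adv
        next_v = values[t]
    return advantages, advantages + values  # advantages, returns

rewards = torch.tensor([1.0, 1.0, 1.0])
values = torch.tensor([0.5, 0.5, 0.5])
dones = torch.tensor([0.0, 0.0, 1.0])
adv, rets = gae(rewards, values, next_value=0.0, dones=dones)
```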
2 changes: 1 addition & 1 deletion sheeprl/algos/ppo/ppo_decoupled.py
@@ -264,7 +264,7 @@ def player(
     fabric.print(f"Rank-0: policy_step={policy_step}, reward_env_{i}={ep_rew[-1]}")

     # Transform the data into PyTorch Tensors
-    local_data = rb.to_tensor(dtype=None, device=device)
+    local_data = rb.to_tensor(dtype=None, device=device, from_numpy=cfg.buffer.from_numpy)

     # Estimate returns with GAE (https://arxiv.org/abs/1506.02438)
     normalized_obs = normalize_obs(next_obs, cfg.algo.cnn_keys.encoder, obs_keys)
2 changes: 1 addition & 1 deletion sheeprl/algos/ppo_recurrent/ppo_recurrent.py
@@ -372,7 +372,7 @@ def main(fabric: Fabric, cfg: Dict[str, Any]):
     fabric.print(f"Rank-0: policy_step={policy_step}, reward_env_{i}={ep_rew[-1]}")

     # Transform the data into PyTorch Tensors
-    local_data = rb.to_tensor(dtype=None, device=device)
+    local_data = rb.to_tensor(dtype=None, device=device, from_numpy=cfg.buffer.from_numpy)

     # Estimate returns with GAE (https://arxiv.org/abs/1506.02438)
     with torch.no_grad():
1 change: 1 addition & 0 deletions sheeprl/algos/sac/sac.py
@@ -286,6 +286,7 @@ def main(fabric: Fabric, cfg: Dict[str, Any]):
     sample_next_obs=cfg.buffer.sample_next_obs,
     dtype=None,
     device=device,
+    from_numpy=cfg.buffer.from_numpy,
 ) # [G*B]
 gathered_data: Dict[str, torch.Tensor] = fabric.all_gather(sample) # [World, G*B]
 for k, v in gathered_data.items():
1 change: 1 addition & 0 deletions sheeprl/algos/sac/sac_decoupled.py
@@ -235,6 +235,7 @@ def player(
     sample_next_obs=cfg.buffer.sample_next_obs,
     dtype=None,
     device=device,
+    from_numpy=cfg.buffer.from_numpy,
 )
 # chunks = {k1: [k1_chunk_1, k1_chunk_2, ...], k2: [k2_chunk_1, k2_chunk_2, ...]}
 chunks = {
1 change: 1 addition & 0 deletions sheeprl/algos/sac_ae/sac_ae.py
@@ -368,6 +368,7 @@ def main(fabric: Fabric, cfg: Dict[str, Any]):
     sample = rb.sample_tensors(
         training_steps * cfg.algo.per_rank_gradient_steps * cfg.algo.per_rank_batch_size,
         sample_next_obs=cfg.buffer.sample_next_obs,
+        from_numpy=cfg.buffer.from_numpy,
     ) # [G*B, 1]
     gathered_data = fabric.all_gather(sample) # [G*B, World, 1]
     flatten_dim = 3 if fabric.world_size > 1 else 2
3 changes: 2 additions & 1 deletion sheeprl/configs/buffer/default.yaml
@@ -1,3 +1,4 @@
 size: ???
 memmap: True
 validate_args: False
+from_numpy: False
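With the new key in the buffer config, an algorithm can read `cfg.buffer.from_numpy` and pick a conversion strategy. A hypothetical helper showing one way such a switch could work (the function name and branching are assumptions for illustration, not sheeprl's code):

```python
import numpy as np
import torch

def to_torch(array: np.ndarray, from_numpy: bool, device: str = "cpu") -> torch.Tensor:
    # Hypothetical switch: share memory with the NumPy buffer when asked,
    # otherwise defer to torch.as_tensor (which may copy if dtype/device differ).
    if from_numpy:
        return torch.from_numpy(array).to(device)
    return torch.as_tensor(array, device=device)

batch = np.ones((2, 3), dtype=np.float32)
print(to_torch(batch, from_numpy=True).shape)  # torch.Size([2, 3])
```

Defaulting the flag to `False` keeps the previous behavior, so existing configs are unaffected unless a user opts in.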