
[gym.vector] Add BatchedVectorEnv, (chunking + flexible n_envs) #2072

Closed
wants to merge 13 commits

Conversation

lebrice commented Oct 18, 2020

Adds the following features, compared to using the vectorized Async and Sync
VectorEnvs:

  • Chunking: Running more than one environment per worker. This is done by
    passing SyncVectorEnvs as the env_fns to the AsyncVectorEnv.

  • Flexible batch size: Supports any number of environments, irrespective
    of the number of workers or of CPUs. The number of environments will be
    spread out as equally as possible between the workers.

    For example, if you want to have a batch_size of 17, and n_workers is 6,
    then the number of environments per worker will be: [3, 3, 3, 3, 3, 2].

    Internally, this works by creating up to two AsyncVectorEnvs, env_a and
    env_b. If the number of envs (batch_size) isn't a multiple of the number
    of workers, then we create the second AsyncVectorEnv (env_b).

    In the first AsyncVectorEnv (env_a), each worker runs
    ceil(n_envs / n_workers) environments. If env_b is needed, each of its
    workers runs floor(n_envs / n_workers) environments (see the sketch after
    this list).

    The observations/actions/rewards are reshaped to be (n_envs, *shape), i.e.
    they don't have an extra 'chunk' dimension.
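
A minimal sketch of the per-worker split described above (the helper name split_across_workers is hypothetical, not the PR's actual distribute implementation):

from typing import List

def split_across_workers(n_envs: int, n_workers: int) -> List[int]:
    # The first (n_envs % n_workers) workers get ceil(n_envs / n_workers) envs,
    # the remaining workers get floor(n_envs / n_workers).
    base, remainder = divmod(n_envs, n_workers)
    return [base + 1 if i < remainder else base for i in range(n_workers)]

# Matches the example above: a batch_size of 17 spread over 6 workers.
assert split_across_workers(17, 6) == [3, 3, 3, 3, 3, 2]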

Signed-off-by: Fabrice Normandin <fabrice.normandin@gmail.com>

lebrice (Author) commented Oct 18, 2020

Resolves #1795

lebrice changed the title from "Add BatchedVectorEnv, (chunking + flexible n_envs)" to "[gym.vector] Add BatchedVectorEnv, (chunking + flexible n_envs)" on Oct 18, 2020
jkterry1 (Collaborator) commented Nov 2, 2020

@lebrice can you please get this to pass the tests? This is of particular interest to me.

jkterry1 (Collaborator) commented Nov 3, 2020

@lebrice this still appears to be failing the tests?

lebrice (Author) commented Nov 4, 2020

Hey @justinkterry, yes, the tests for Python versions below 3.6 are failing; I think it might have to do with the type hints.

I'll take a quick look, but if it comes down to the type hints, I'm not currently planning on removing them, to be honest. This PR probably wouldn't get accepted even if I did, given that the people at OpenAI are probably working on other things.

jkterry1 (Collaborator) commented Nov 7, 2020

@lebrice Python 3.5 is removed now (#2084), try again?

lebrice (Author) commented Nov 17, 2020

Hey @justinkterry, @pzhokhov, @tristandeleu, the issues above have been fixed. Would you mind taking a look over the code and letting me know what you think?

tristandeleu (Contributor) left a comment

The main strength of BatchedVectorEnv is to allow the batch size to not be a multiple of the number of workers. In my opinion this is a somewhat niche use case. Even the chunking option in SubprocVecEnv, which prompted #1795, only handles cases where the overall number of environments is a multiple of the number of workers.

I would rather see a fix to AsyncVectorEnv to allow it to run SyncVectorEnv environments first, possibly with a VectorEnvWrapper to flatten the first two batch dimensions (similar to your unroll function), before going for a new VectorEnv subclass. And if there is major interest in having instances where the batch size is not a multiple of the number of workers, then we can think about adding BatchedVectorEnv.
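
A rough sketch of that alternative, i.e. chunking by passing SyncVectorEnv factories as the env_fns of an AsyncVectorEnv and flattening the leading (worker, chunk) dimensions afterwards; it assumes AsyncVectorEnv can host nested vector envs, which is precisely the fix being suggested:

import gym
import numpy as np
from gym.vector import AsyncVectorEnv, SyncVectorEnv

n_workers, envs_per_worker = 4, 2

def make_env():
    return gym.make("CartPole-v1")

def make_chunk():
    # Each worker process hosts a SyncVectorEnv running several envs in-process.
    return SyncVectorEnv([make_env for _ in range(envs_per_worker)])

env = AsyncVectorEnv([make_chunk for _ in range(n_workers)])
obs = env.reset()
# Observations come back chunked as (n_workers, envs_per_worker, *obs_shape);
# a wrapper along the lines of the suggested VectorEnvWrapper would do this
# reshape on every reset/step:
flat_obs = np.reshape(obs, (n_workers * envs_per_worker,) + np.shape(obs)[2:])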

That being said, @pzhokhov and @christopherhesse have the final say on it, and this is a good PR!

from gym.vector.utils.spaces import _BaseGymSpaces


@singledispatch
Review comment (Contributor):

That's really cool! It could be worth updating some of the utility functions in gym.vector with singledispatch (in another PR of course; this would be outside the scope of this one).

EDIT: I just saw #2093, which keeps track of that. Thanks!
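
For context, a minimal sketch of the singledispatch pattern applied to a space utility (the function name create_empty_sample is made up for illustration and is not an existing gym.vector helper):

from functools import singledispatch

import numpy as np
from gym.spaces import Box, Discrete, Space, Tuple

@singledispatch
def create_empty_sample(space: Space):
    # Fallback for space types with no registered implementation.
    raise NotImplementedError(f"Unsupported space type: {type(space)}")

@create_empty_sample.register(Box)
def _(space: Box):
    return np.zeros(space.shape, dtype=space.dtype)

@create_empty_sample.register(Discrete)
def _(space: Discrete):
    return 0

@create_empty_sample.register(Tuple)
def _(space: Tuple):
    # Recurse into sub-spaces, dispatching on each sub-space's type.
    return tuple(create_empty_sample(sub) for sub in space.spaces)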

gym/vector/vector_env.py (outdated; thread resolved)
# return self.viewer.isopen


def distribute(values: Sequence[T], n_groups: int) -> List[Sequence[T]]:
Review comment (Contributor):

These utility functions could possibly be moved somewhere under gym.vector.utils if you think that makes sense.

They also require tests (for distribute, chunk, unroll, fuse_and_batch, n_consecutive, and zip_dicts).
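
A hedged sketch of one such test, based only on distribute's signature above and the batch_size-17-over-6-workers example from the PR description (the import path and the exact ordering of group sizes are assumptions):

from gym.vector.batched_vector_env import distribute  # hypothetical location

def test_distribute_spreads_items_evenly():
    values = list(range(17))
    groups = distribute(values, n_groups=6)
    # Every input item appears exactly once across the groups...
    assert sorted(item for group in groups for item in group) == values
    # ...and group sizes differ by at most one (17 over 6 -> five 3s and one 2).
    assert sorted(len(group) for group in groups) == [2, 3, 3, 3, 3, 3]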

env = BatchedVectorEnv(env_fns, n_workers=n_workers)
env.seed(123)

assert env.single_observation_space[0].shape == (4,)
Review comment (Contributor):

It would be more explicit to compare env.single_observation_space to its Tuple space directly, instead of checking individual elements. Maybe something like

assert env.single_observation_space == Tuple((Box(-high, high, dtype=np.float32), Discrete(1)))

And the same for env.observation_space below. I understand that the observation space for CartPole is a bit verbose in its definition of high, which might not be relevant in this test, but maybe this calls for a simpler environment (e.g. one of gym.envs.unittest).
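
A self-contained sketch of that comparison style; the bounds below are placeholders rather than CartPole's actual limits, and gym.vector.utils.batch_space is just one way to build the batched counterpart mentioned in the reply below:

import numpy as np
from gym.spaces import Box, Discrete, Tuple
from gym.vector.utils import batch_space

high = np.ones(4, dtype=np.float32)  # placeholder bounds standing in for `high`
single_space = Tuple((Box(-high, high, dtype=np.float32), Discrete(1)))

# Comparing whole spaces states the expectation in a single assertion:
assert single_space == Tuple((Box(-high, high, dtype=np.float32), Discrete(1)))

# The expected batched observation space for n environments can be derived too:
expected_batched = batch_space(single_space, n=4)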

lebrice (Author) replied:

I removed the hard-coded "CartPole" environment reference and now compare against the single observation spaces and their batched counterparts.

return groups


def unroll(chunks: Sequence[Sequence[T]], item_space: Space = None) -> List[T]:
Review comment (Contributor):

This breaks if chunks is a dict (if the space is a dict space). Here is a snippet:

import gym
import numpy as np

from collections import OrderedDict
from gym.spaces import Dict, Box, Discrete
from gym.vector import BatchedVectorEnv

class DictEnv(gym.Env):
    def __init__(self):
        super(DictEnv, self).__init__()
        self.observation_space = Dict(OrderedDict([
            ('position', Box(-2., 2., shape=(2,), dtype=np.float32)),
            ('velocity', Box(0., 1., shape=(2,), dtype=np.float32))
        ]))
        self.action_space = Discrete(2)

    def reset(self):
        return self.observation_space.sample()

    def step(self, action):
        observation = self.observation_space.sample()
        reward, done = 0., False
        return (observation, reward, done, {})

def make_env(seed):
    def _make_env():
        return DictEnv()
    return _make_env

env = BatchedVectorEnv([make_env(i) for i in range(10)], n_workers=4)
observations = env.reset()

Here obs_a (after the call to unroll) in reset is ['p', 'o', 's', 'i', 't', 'i', 'o', 'n', 'v', 'e', 'l', 'o', 'c', 'i', 't', 'y'].

EDIT: For reference, with env = AsyncVectorEnv([make_env(i) for i in range(4)]), observations is a dict with two arrays of shape (4, 2).
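
For what it's worth, one plausible way to get exactly that output is flattening the dict as if it were a sequence of chunks: iterating a dict yields its keys, and chaining those key strings yields their characters (a guess at the symptom's cause, not taken from the PR's unroll code):

from itertools import chain

chunk = {'position': [0.0, 0.0], 'velocity': [0.0, 0.0]}
# Unpacking the dict iterates over its keys ('position', 'velocity'), and
# chaining the key strings yields the individual characters seen above:
flattened = list(chain(*chunk))
assert flattened == ['p', 'o', 's', 'i', 't', 'i', 'o', 'n', 'v', 'e', 'l', 'o', 'c', 'i', 't', 'y']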

lebrice (Author) replied Nov 18, 2020:

Good catch! Thanks for pointing this out. I was indeed missing a test case for this, so I reused your snippet directly as a test, and this is now behaving correctly.
I'll probably also add other tests to cover cases where the actions are Tuple/Dict/other nested spaces as well.

gym/vector/batched_vector_env.py (outdated; thread resolved)
jkterry1 (Collaborator) commented:

@lebrice I'm a maintainer now, so I can actually merge this. If you add detailed automated testing for this, I'd be happy to merge it, pending secondary approval from @benblack769.

jkterry1 (Collaborator) commented:

Reviewer: @benblack769

In addition to the features described above, this PR also adds the following change to the AsyncVectorEnv and SyncVectorEnv classes:

-   When some environments have `done=True` while stepping, those
    environments are reset, as was done previously. Additionally, the final
    observation for those environments is placed in the info dict at key
    FINAL_STATE_KEY (currently 'final_state').
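
A hedged usage sketch of that change, assuming the tuple-of-dicts infos returned by gym's vector envs at the time and the 'final_state' key named above (behaviour that only exists with this PR applied):

import gym
import numpy as np
from gym.vector import AsyncVectorEnv

env = AsyncVectorEnv([lambda: gym.make("CartPole-v1") for _ in range(4)])
obs = env.reset()
actions = np.array([env.single_action_space.sample() for _ in range(4)])
obs, rewards, dones, infos = env.step(actions)
for i, done in enumerate(dones):
    if done:
        # obs[i] is already the first observation of the auto-reset episode;
        # the last observation of the finished episode is kept under 'final_state'.
        final_obs = infos[i]["final_state"]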

pseudo-rnd-thoughts (Contributor) commented:

@lebrice Do you have any plans for this PR?

lebrice (Author) commented Apr 25, 2022

Hey @pseudo-rnd-thoughts, yes I do, but I'm waiting on #2104 before I work more on this.

jkterry1 closed this May 23, 2022