diff --git a/docs/making_dataloaders.rst b/docs/making_dataloaders.rst
index c24c8a11..ba0846c2 100644
--- a/docs/making_dataloaders.rst
+++ b/docs/making_dataloaders.rst
@@ -49,6 +49,13 @@ takes an ``enum`` provided by :class:`ffcv.loader.OrderOption`:
     # Memory-efficient but not truly random loading
     # Speeds up loading over RANDOM when the whole dataset does not fit in RAM!
     ORDERING = OrderOption.QUASI_RANDOM
+
+.. note::
+    The ``order`` options require different amounts of RAM, so choose one based on how much RAM is available on your machine:
+
+    - ``RANDOM`` requires the most RAM, since it caches the entire dataset in order to sample perfectly at random. If the available RAM is not sufficient, it will throw an exception.
+    - ``QUASI_RANDOM`` requires much less RAM than ``RANDOM`` (but slightly more than ``SEQUENTIAL``), since it only caches a subset of the samples at a time. Use it when the entire dataset cannot fit in RAM.
+    - ``SEQUENTIAL`` requires the least RAM. It only keeps the next few samples loaded ahead of time for upcoming training iterations.
 
 Pipelines
 '''''''''
@@ -165,12 +172,12 @@ Other options
 
 You can also specify the following additional options when constructing an
 :class:`ffcv.loader.Loader`:
 
-- ``os_cache``: If True, the entire dataset is cached
+- ``os_cache``: If ``True``, the operating system decides whether the dataset is held in memory, depending on available RAM. If ``False``, FFCV manages the caching itself, and the amount of RAM needed depends on the ``order`` option.
 - ``distributed``: For training on :ref:`multiple GPUs`
 - ``seed``: Specify the random seed for batch ordering
 - ``indices``: Provide indices to load a subset of the dataset
 - ``custom_fields``: For specifying decoders for fields with custom encoders
-- ``drop_last``: If True, drops the last non-full batch from each iteration
+- ``drop_last``: If ``True``, drops the last non-full batch from each iteration
 - ``batches_ahead``: Set the number of batches prepared in advance. Increasing it absorbs variation in processing time to make sure the training loop does not stall for too long to process batches. Decreasing it reduces RAM usage.
 - ``recompile``: Recompile every iteration. Useful if you have transforms that change their behavior from epoch to epoch, for instance code that uses the shape as a compile time param. (But if they just change their memory usage, e.g., the resolution changes, it's not necessary.)
diff --git a/docs/parameter_tuning.rst b/docs/parameter_tuning.rst
index e9bbcba1..13f1306e 100644
--- a/docs/parameter_tuning.rst
+++ b/docs/parameter_tuning.rst
@@ -22,7 +22,7 @@ Scenario: Large scale datasets
 
 If your dataset is too large to be cached on the machine we recommend:
 
 - Use ``os_cache=False``. Since the data can't be cached, FFCV will have to read it over and over. Having FFCV take over the operating system for caching is beneficial as it knows in advance the which samples will be needed in the future and can load them ahead of time.
-- For ``order``, we recommend using the ``QUASI_RANDOM`` traversal order if you need randomness but perfect uniform sampling isn't mission critical. This will optimize the order to minimize the reads on the underlying storage while maintaining very good randomness properties. If you have experience with the ``shuffle()`` function of ``webdataset`` and the quality of the randomness wasn't sufficient, we still suggest you give ``QUASI_RANDOM`` a try as it should be significantly better. 
+- For ``order``, we recommend using the ``QUASI_RANDOM`` traversal order if you need randomness but perfect uniform sampling isn't mission critical. This will optimize the order to minimize the reads on the underlying storage while maintaining very good randomness properties. If you have experience with the ``shuffle()`` function of ``webdataset`` and the quality of the randomness wasn't sufficient, we still suggest you give ``QUASI_RANDOM`` a try as it should be significantly better. Using ``RANDOM`` is infeasible in this situation because it would need to load the entire dataset into RAM, causing an out-of-memory exception.
 
 Scenario: Multi-GPU training (1 model, multiple GPUs)
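To illustrate how the options documented above fit together, here is a minimal sketch of constructing a loader with ``QUASI_RANDOM`` ordering and FFCV-managed caching. The ``.beton`` path, batch size, and worker count are placeholders, and the default decoding pipelines are used for brevity:

.. code-block:: python

    from ffcv.loader import Loader, OrderOption

    # Placeholder path to a dataset previously written with FFCV's DatasetWriter.
    loader = Loader(
        '/path/to/dataset.beton',
        batch_size=256,                   # placeholder batch size
        num_workers=8,                    # placeholder worker count
        order=OrderOption.QUASI_RANDOM,   # good randomness without caching the whole dataset
        os_cache=False,                   # let FFCV manage caching (dataset larger than RAM)
        drop_last=True,                   # drop the last non-full batch of each epoch
        batches_ahead=3,                  # number of batches prepared in advance
    )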