Distributed Training
====================

REAX makes it easy to scale your training to multiple GPUs or TPUs.

Strategies
----------

REAX supports several distributed strategies:

*   **'ddp'** (Data Distributed Parallel): Replicates the model on each device and synchronises
    gradients.
*   **'fsdp'** (Fully Sharded Data Parallel): Shards the model parameters across devices to save
    memory.
*   **'auto'**: Automatically selects the best strategy based on the available hardware.

Configuration
-------------

To enable distributed training, simply set the ``devices`` and ``strategy`` arguments in the
Trainer:

.. code-block:: python

    # Train on 4 GPUs using DDP
    trainer = reax.Trainer(accelerator="gpu", devices=4, strategy="ddp")

Launch Methods
--------------

You can launch your script using standard tools like ``mpirun`` or SLURM. REAX will automatically
detect the environment and initialise the distributed backend.