
How to use PyTorch on Gradient



Use PyTorch to train models on Gradient

PyTorch is an open-source machine learning framework developed by Facebook's AI Research lab (FAIR) for training and deploying ML models. It is widely used both in research and in large-scale production environments.

Gradient supports any version of PyTorch for Notebooks, Experiments, or Jobs. In Gradient, the ML framework used to execute workloads runs within a Docker container. Containers are lightweight and portable environments that can easily be customized to include various framework versions and other libraries. Any Docker container is supported on the Gradient platform. This flexibility makes it easy to switch between different frameworks, to update them from one version to another, and to incorporate other libraries to be used alongside the framework itself.
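As an illustration, a custom container might start from an official PyTorch image and layer additional libraries on top. The base image tag below is an example; choose whichever framework version you need:

```dockerfile
# Start from an official PyTorch image on Docker Hub (tag is illustrative)
FROM pytorch/pytorch:1.5-cuda10.1-cudnn7-runtime

# Add any extra libraries your workload needs alongside the framework
RUN pip install --no-cache-dir pandas scikit-learn
```

Pushing this image to a public or private registry makes it usable as the container for any Gradient workload.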

A set of pre-built containers is provided out of the box, though any container hosted on a public or private container registry can be used.

Running workloads with PyTorch

When launching a workload via the web interface, CLI, or Workflows, you can pass in a Docker image path (e.g. <inline-code>paperspace/dl-containers:pytorch-py36-cu100-jupyter</inline-code>). There are also several pre-configured templates available. These templates are updated regularly and optimized to run on Gradient.

A set of pre-built containers can be used as a starting point within Gradient
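The script you run inside the container is ordinary PyTorch. As a minimal sketch, a training script of the kind you might launch as a Gradient workload could look like this (the toy data and model here are illustrative, not a Gradient API):

```python
# Minimal PyTorch training loop: fit a linear model to y = 2x + 1
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy dataset: 64 points on the line y = 2x + 1
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 2 * x + 1

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

initial_loss = loss_fn(model(x), y).item()

for epoch in range(200):
    optimizer.zero_grad()           # clear accumulated gradients
    loss = loss_fn(model(x), y)     # forward pass + loss
    loss.backward()                 # backpropagate
    optimizer.step()                # update weights

final_loss = loss_fn(model(x), y).item()
print(f"final loss: {final_loss:.6f}")
```

Because the framework runs inside the container, the same script works unchanged whether the container is launched as a Notebook, Experiment, or Job.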

When using the CLI, the command would look something like this:
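A sketch of such an invocation is below. The project ID, machine type, and script name are placeholders; check <inline-code>gradient experiments run --help</inline-code> for the exact flags available in your CLI version:

```shell
gradient experiments run singlenode \
  --name pytorch-train \
  --projectId <your-project-id> \
  --container paperspace/dl-containers:pytorch-py36-cu100-jupyter \
  --machineType P4000 \
  --command "python train.py"
```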

Distributed training with MPI

Gradient offers push-button distributed / multi-node training as a first-class citizen. Scaling out your workloads with an MPI-based architecture like Horovod doesn't require any background in DevOps and can be accomplished with a few additional lines of code. By specifying the <inline-code>multinode</inline-code> mode and a few additional parameters, you can take any PyTorch model and execute training across as many nodes as desired. Learn more in the docs.
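The "few additional lines" for an MPI-based setup can be sketched with Horovod's PyTorch bindings. This assumes Horovod is installed in the container and the script is launched under an MPI runner (e.g. <inline-code>horovodrun</inline-code>); it is a sketch of the standard Horovod pattern, not Gradient-specific code:

```python
# Sketch: adapting a PyTorch training script for Horovod (MPI-based)
import torch
import torch.nn as nn
import horovod.torch as hvd

hvd.init()  # one process per GPU/node under MPI

# Pin each worker process to its local GPU, if available
if torch.cuda.is_available():
    torch.cuda.set_device(hvd.local_rank())

model = nn.Linear(10, 1)

# Common practice: scale the learning rate by the number of workers
optimizer = torch.optim.SGD(model.parameters(), lr=0.01 * hvd.size())

# Wrap the optimizer so gradients are averaged across workers each step
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters()
)

# Ensure every worker starts from identical weights and optimizer state
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)

# ...the regular training loop follows unchanged...
```

The rest of the training loop is the same as in the single-node case; Horovod handles the gradient averaging behind the optimizer interface.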