Welcome to this repo about High Performance Deep Learning

This repository contains example Jupyter notebooks on training large-scale models with PyTorch. They are inspired by materials, examples, and exercises taken mainly from the official PyTorch tutorials and from other authors. Each notebook lists its reference material.

Topics covered in this PyTorch Multi-GPU approach to Deep learning Models include:

Data and Model Parallelism
Message Passing
Distributed training using Horovod
Mixed Precision and Memory Format
Pipeline Parallelism
and a challenge to test your knowledge

Prerequisites

To run this tutorial you will need a machine with an NVIDIA GPU and either of the two options listed below.

PyTorch, including the primitives it provides for writing distributed applications and training distributed models.
The latest Docker or Singularity. With this option, you will first have to build a Docker or Singularity container.
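
Before building anything, it can help to confirm which of these tools are already on the machine. The following is a minimal sketch, not part of the repository: the helper name find_tools is hypothetical, and it only checks whether each executable is on your PATH, not whether it is correctly configured.

```python
import shutil


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


# nvidia-smi ships with the NVIDIA driver; docker and singularity are the two
# container runtimes mentioned above (only one of them is needed).
for name, path in find_tools(["nvidia-smi", "docker", "singularity"]).items():
    print(f"{name}: {path or 'not found'}")
```

A missing nvidia-smi usually means the NVIDIA driver is not installed or not on PATH, in which case the --gpus and --nv flags used later are unlikely to work.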

Docker Container

To build a docker container, run: sudo docker build --network=host -t <imagename>:<tagnumber> .

For instance: sudo docker build -t pytorch:1.0 .

The code labs have been written using Jupyter notebooks and a Dockerfile has been built to simplify deployment. The following command would expose port 8888 inside the container as port 8888 on the lab machine:

sudo docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -it --rm --network=host -v ~/hpdl/Pytorch_Distributed_Deep_Learning/workspace:/workspace pytorch:1.0 jupyter-lab --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=/workspace

The --gpus all flag enables all NVIDIA GPUs at container runtime. The --rm flag removes the container when it exits, so no stopped container is left behind. The -it flag makes it possible to stop the Jupyter server with ctrl-c. The --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 flags allow sufficient memory allocation to run PyTorch inside the Docker environment.

The jupyter-lab --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=/workspace command launches the Jupyter notebook server inside the container. The -v flag maps the working directory on your local machine, ~/hpdl/Pytorch_Distributed_Deep_Learning/workspace, to the /workspace directory inside the container.
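
If you need to change the workspace path, image tag, or port, one option is to assemble the same command from parameters. The sketch below is illustrative only: build_docker_run and its arguments are hypothetical names, and the flags are copied from the command above rather than taken from any official tooling.

```python
import os
import shlex


def build_docker_run(workspace, image="pytorch:1.0", port=8888, use_sudo=True):
    """Return the docker run command from this README as an argument list."""
    # Expand ~ here because no shell will do it when the list is run directly.
    host_dir = os.path.abspath(os.path.expanduser(workspace))
    cmd = [
        "docker", "run",
        "--gpus", "all",                  # enable all NVIDIA GPUs
        "--ipc=host",
        "--ulimit", "memlock=-1",
        "--ulimit", "stack=67108864",
        "-it", "--rm",
        "--network=host",
        "-v", f"{host_dir}:/workspace",   # map the local workspace into the container
        image,
        "jupyter-lab", "--no-browser", "--allow-root",
        "--ip=0.0.0.0", f"--port={port}",
        "--NotebookApp.token=",           # empty token, as --NotebookApp.token="" in a shell
        "--notebook-dir=/workspace",
    ]
    return ["sudo"] + cmd if use_sudo else cmd


cmd = build_docker_run("~/hpdl/Pytorch_Distributed_Deep_Learning/workspace")
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

The list form can also be passed to subprocess.run(cmd) directly, which avoids shell quoting issues when the workspace path contains spaces.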

This command may be customized for your hosting environment. Now, open the Jupyter notebook in your browser: http://localhost:8888

Start by clicking on the Start_Here.ipynb notebook.
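
If the page does not load, a quick way to tell whether anything is listening on the port is a plain TCP connection test. This is a minimal sketch with a hypothetical helper name, port_is_open. A successful connection only shows that some server accepts connections on that port, not that it is Jupyter.

```python
import socket


def port_is_open(host="localhost", port=8888, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


print("Port 8888 reachable:", port_is_open())
```

If this reports False while the container is running, check that the --port value and the port in the URL match.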

Singularity Container

To build the singularity container, run: sudo singularity build --fakeroot <image_name>.simg Singularity

For example: singularity build --fakeroot pytorch.simg Singularity

Then, run the container: singularity run --nv --bind ~/hpdl/Pytorch_Distributed_Deep_Learning/workspace:/workspace pytorch.simg jupyter-lab --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=/workspace

Then, open the Jupyter notebook in your browser: http://localhost:8888

Start working on the lab by clicking on the Start_Here.ipynb notebook.

Tutorial Duration

The total bootcamp material takes approximately 4 hours.
