An example showing how to use jax to train resnet50 on multi-node multi-GPU

Last update: Jul 04, 2022

Overview

jax-multi-gpu-resnet50-example

This repo shows how to use JAX for multi-node multi-GPU training. The example is adapted from the ResNet-50 example in dm-haiku (https://github.com/deepmind/dm-haiku/tree/main/examples/imagenet). Each node only needs to know the IP address of the rank 0 node, much as in PyTorch's DDP.

Once two containers are running on the same cluster, run the following command in each container to launch a multi-node multi-GPU training job:

python train.py --server_ip=$ROOT_IP --server_port=$PORT --num_hosts=$NUM_HOSTS --host_idx=$HOST_IDX
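The flags in the command above map naturally onto JAX's multi-process setup. The following is a minimal sketch, not the repo's actual train.py: it assumes the script calls jax.distributed.initialize, and the helper names parse_args, coordinator_address, and init_distributed are hypothetical. Only the four flag names come from the command itself.

```python
import argparse


def parse_args(argv=None):
    """Parse the four flags that appear in the launch command."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--server_ip", type=str, required=True)
    parser.add_argument("--server_port", type=int, required=True)
    parser.add_argument("--num_hosts", type=int, required=True)
    parser.add_argument("--host_idx", type=int, required=True)
    return parser.parse_args(argv)


def coordinator_address(args):
    """Build the "ip:port" string of the rank 0 node."""
    return f"{args.server_ip}:{args.server_port}"


def init_distributed(args):
    """Connect this host to the rank 0 coordinator (assumed approach)."""
    # Imported lazily so the helpers above can be used without JAX installed.
    import jax

    jax.distributed.initialize(
        coordinator_address=coordinator_address(args),
        num_processes=args.num_hosts,
        process_id=args.host_idx,
    )
    # After initialization, jax.device_count() counts devices on all hosts,
    # while jax.local_device_count() counts only the GPUs on this host.
    return jax.process_index(), jax.local_device_count(), jax.device_count()


def main(argv=None):
    args = parse_args(argv)
    rank, local_devices, global_devices = init_distributed(args)
    print(f"host {rank}: {local_devices} local / {global_devices} global devices")
```

Calling main() at the bottom of a script and launching it once per container, with the same --server_ip and --server_port but a different --host_idx on each host, would mirror the command shown above. Every host must start for initialization to complete, since the call blocks until all processes have connected.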

Owner

Yangzihao Wang

UC Davis PhD from @owensgroup. Now at Sea AI Lab.


Source code, data, and evaluation details for “Cross-Lingual Citations in English Papers: A Large-Scale Analysis of Prevalence, Formation, and Ramifications”

Analysis of cross-lingual citations in English papers Contents initial_analysis Source code, data, and evaluation details as published at ICADL2020 ci

1 Oct 27, 2022

Code and dataset for ACL2018 paper "Exploiting Document Knowledge for Aspect-level Sentiment Classification"

Aspect-level Sentiment Classification Code and dataset for ACL2018 [paper] "Exploiting Document Knowledge for Aspect-level Sentiment Classification"

146 Nov 29, 2022

Code for paper: Towards Tokenized Human Dynamics Representation

Video Tokenization Codebase for video tokenization, based on our paper Towards Tokenized Human Dynamics Representation. Prerequisites (tested under Py

20 May 31, 2022

BABEL: Bodies, Action and Behavior with English Labels [CVPR 2021]

BABEL is a large dataset with language labels describing the actions being performed in mocap sequences. BABEL labels about 43 hours of mocap sequences from AMASS [1] with action labels.

113 Dec 28, 2022

A final-year Diploma Engineering project based on computer vision.

Face-Recognition-Based-Attendance-System A final-year Diploma Engineering project based on computer vision. Brief idea about ou

10 Nov 02, 2022

[CVPR2022] Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos

Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos Created by Muheng Li, Lei Chen, Yueqi Duan, Zhilan Hu, Jianjiang Feng, Jie

58 Dec 23, 2022

Easy genetic ancestry predictions in Python

ezancestry Easily visualize your direct-to-consumer genetics next to 2500+ samples from the 1000 genomes project. Evaluate the performance of a custom

38 Jan 02, 2023

Unofficial implementation of "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" (https://arxiv.org/abs/2103.14030)

Swin-Transformer-Tensorflow A direct translation of the official PyTorch implementation of "Swin Transformer: Hierarchical Vision Transformer using Sh

52 Dec 29, 2022

Facial Image Inpainting with Semantic Control

Facial Image Inpainting with Semantic Control In this repo, we provide a model for the controllable facial image inpainting task. This model enables u

8 Nov 22, 2021

Pairwise Learning for Neural Link Prediction for OGB (PLNLP-OGB)

Pairwise Learning for Neural Link Prediction for OGB (PLNLP-OGB) This repository provides evaluation codes of PLNLP for OGB link property prediction t

31 Oct 10, 2022

A tutorial on training a DarkNet YOLOv4 model for the CrowdHuman dataset

YOLOv4 CrowdHuman Tutorial This is a tutorial demonstrating how to train a YOLOv4 people detector using Darknet and the CrowdHuman dataset. Table of c

118 Nov 10, 2022

AdaNet is a lightweight TensorFlow-based framework for automatically learning high-quality models with minimal expert intervention

AdaNet is a lightweight TensorFlow-based framework for automatically learning high-quality models with minimal expert intervention. AdaNet buil

3.4k Jan 07, 2023

High frequency AI based algorithmic trading module.

UI2I via StyleGAN2 - Unsupervised image-to-image translation method via pre-trained StyleGAN2 network

Multi-Joint dynamics with Contact. A general purpose physics simulator.

ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi-Object Segmentation

Trajectory Variational Autoencoder baseline for Multi-Agent Behavior challenge 2022

ComPhy: Compositional Physical Reasoning of Objects and Events from Videos

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Generating Anime Images by Implementing Deep Convolutional Generative Adversarial Networks paper