FedTorch is an open-source Python package for distributed and federated training of machine learning models using PyTorch distributed API

Last update: Dec 23, 2022

Overview

FedTorch is an open-source Python package for distributed and federated training of machine learning models using PyTorch distributed API. Various algorithms for federated learning and local SGD are implemented for benchmarking and research, including our own proposed methods:

And other common algorithms such as:

We are actively trying to expand the library to include more training algorithms as well.

NEWS

Recent updates to the package:

Paper Accepted (01/22/2021): Our paper titled Federated Learning with Compression: Unified Analysis and Sharp Guarantees accepted to AISTAT 2021 🎉
Public Release (01/17/2021): We are releasing the package to the public with a docker image

Installation

First you need to clone the repo into your computer:

git clone https://github.com/MLOPTPSU/FedTorch.git

The PyPi package will be added soon.

This package is built based on PyTorch Distributed API. Hence, it could be run with any supported distributed backend of GLOO, MPI, and NCCL. Among these three, MPI backend since it can be used for both CPU and CUDA runnings, is the main backend we use for developement. Unfortunately installing the built version of PyTorch does not support MPI backend for distributed training and needed to be built from source with a version of MPI installed that supports CUDA as well. However, do not worry since we got you covered. We provide a docker file that can create an image with all dependencies needed for FedTorch. The Dockerfile can be found here, where you can edit based on your need to create your customized image. In addition, since building this docker image might take a lot of time, we provide different versions that we built before along with this repository in the packages section.

For instance, you can pull one of the images that is built with CUDA 10.2 and OpenMPI 4.0.1 with CUDA support and PyTorch 1.6.0, using the following command:

docker pull docker.pkg.github.com/mloptpsu/fedtorch/fedtorch:cuda10.2-mpi

The docker images can be used for cloud services such as Azure Machine Learning API for running FedTorch. The instructions for running on cloud services will be added in near future.

Get Started

Running different trainings with different settings is easy in FedTorch. The only thing you need to take care of is to set the correct parameters for the experiment. The list of all parameters used in this package is in parameters.py file. For different algorithms we will provide examples, so the relevant parameters can be set correctly. YAML support will be added in future for each distinct training. The parameters can be parsed from the input of the commandline using the following method:

from fedtorch.parameters import get_args

args = get_args()

When the parameters are set, we need to setup the nodes using those parameters and start the training. First, we need to setup the distributed backend. For instance, we can use MPI backend as:

import torch.distributed as dist

dist.init_process_group('mpi')

This will initialize the backend for distributed training. Next, we need to setup each node based on the parameters and create the graph of nodes. Then we need to initialize nodes and load their data. For this, we can use the node object in the FedTorch:

from fedtorch.nodes import Client

client = Client(args, dist.get_rank())
# Initialize the node
client.initialize()
# Initialize the dataset if not downloaded
client.initialize_dataset()
# Load the dataset
client.load_local_dataset()
# Generate auxiliary models and params for training
client.gen_aux_models()

Then, we need to call the appropriate training method and run the training. For instance, if the parameters are set for a FedAvg training, we can run:

from fedtorch.comms.trainings.federated import train_and_validate_federated

train_and_validate_federated(client)

Different distributed and federated algorithms in this package can be run using this procedure, and for simplicity, we provide main.py file, where can be used for running those algorithms following the same procedure. To run this file, we should run it using mpi and define number of clients (processes) that will run the same file for training using:

mpirun -np {NUM_CLIENTS} python main.py {args:values}

where {args:values} should be filled with appropriate parameters needed for the training. Next we provide a file to automatically build this command for mpi running for various situations.

Running Examples

To make the process easier, we have provided a run_mpi.py file that covers most of parameters needed for running different training algrithms. We first get into the details of different parameters and then provide some examples for running.

Dataset Parameters

For setting up the dataset there are some parameters involved. The main parameters are:

--data : Defines the dataset name for training.
--batch_size : Defines the size of the batch in the training.
--partition_data : Defines whether the data should be partitioned or each client access to the whole dataset.
--reshuffle_per_epoch : This can be set True for distributed training to have iid data accross clients and faster convergence. This is not inline with Federated Learning settings.
--iid_data : If set True, the data is randomly distributed accross clients. If is set to False, either the dataset itself is non-iid by default (like the EMNIST dataset) or it can be manullay distributed to be non-iid (like the MNIST dataset using parameter num_class_per_client). The default is True in the package, but in the run_mpi.py file the default is False.
--num_class_per_client: If the parameter iid is set to False, we can distribute the data heterogeneously by attributing certain number of classes to each client. For instance if setting --num_class_per_client 2, then each client will only has access to two randomly selected classes' data in the entire training process.

Federated Learning Parameters

To run the training using federated learning setups some main parameters are:

--federated : If set to True the training will be in one of federated learning setups. If not, it will be in a distributed mode using local SGD and with periodic averaging (that could be set using --local_step) and possibly reshuffling after each epoch. The default is False.
--federated_type : defines the type of fderated learning algorithm we want to use. The default is fedavg.
--federated_sync_type : It could be either epoch or local_step and it will be used to determine when to synchronize the models. If set to epoch, then the parameter --num_epochs_per_comm should be set as well. If set to local_step, then the parameter --local_steps should be set. The default is epoch.
--num_comms : Defines the number of communication rounds needed for trainings. This is only for federated learning, while in normal distirbuted mode the number of total iterations should be set either by --num_epochs or --num_iterations, and hence the --stop_criteria should be either epoch or iteration.
--online_client_rate : Defines the ratio of clients that are online and active during each round of communication. This is only for federated learning. The default value is 1.0, which means all clients will be active.

Learning Rate Schedule

Different learning rate schedules can be set using their corresponding parameters. The main parameter is --lr_schedule_scheme, which defines the scheme for learning rate scheduling. For more information about different learning rate schedulers, please see learning.py file.

Examples

Now we provide some simple examples for running some of the training algorithms on a single node with multiple processes using mpi. To do so, we first need to run the docker container with installed dependencies.

docker run --rm -it --mount type=bind,source="{path/to/FedTorch}",target=/FedTorch docker.pkg.github.com/mloptpsu/fedtorch/fedtorch:cuda10.2-mpi
cd /FedTorch

This will run the container and will mount the FedTorch repo to it. The {path/to/FedTorch} should be replaced with your local path to the FedTorch repo directory. Now we can run the training on it.

FedAvg and FedGATE

Now, we can run the FedAvg algorithm for training an MLP model using MNIST data by the following command.

python run_mpi.py -f -ft fedavg -n 10 -d mnist -lg 0.1 -b 50 -c 20 -k 1.0 -fs local_step -l 10 -r 2

This will run the training on 10 nodes with initial learning rate of 0.1, the batch size of 50, for 20 communication rounds each with 10 local steps of SGD. The dataset is distributed hetergeneously with each client has access to only 2 classes of data from the MNIST dataset.

Changing -ft fedavg to -ft fedgate will run the same training using the FedGATE algorithm. To run the FedCOMGATE algorithm we need to add -q to the parameter to enable quantization as well. Hence the command will be:

python run_mpi.py -f -ft fedgate -n 10 -d mnist -lg 0.1 -b 50 -c 20 -k 1.0 -fs local_step -l 10 -r 2 -q

APFL

To run APFL algorithm a simple command will be:

python run_mpi.py -f -ft apfl -n 10 -d mnist -lg 0.1 -b 50 -c 20 -k 1.0 -fs local_step -l 10 -r 2 -pa 0.5 -fp

where -pa 0.5 sets the alpha parameter of the APFL algorithm. The last parameter -fp will turn on the fed_personal parameter, which evaluate the personalized or the localized model using a local validation dataset. This will be mostly used for personalization algorithms such as APFL.

DRFA

To run a DRFA training we can use the following command:

python run_mpi.py -f -fd -ft fedavg -n 10 -d mnist -lg 0.1 -b 50 -c 20 -k 1.0 -fs local_step -l 10 -r 2 -dg 0.1

where -dg 0.1 sets the gamma parameter in the DRFA algorithm. Note that DRFA is a framework that can be run using any federated learning aggergator such as FedAvg or FedGATE. Hence the parameter -fd will enable DRFA training and -ft will define the federated type to be used for aggregation.

References

For this repository there are several different references used for each training procedure. If you use this repository in your research, please cite the following paper:

@article{haddadpour2020federated,
  title={Federated learning with compression: Unified analysis and sharp guarantees},
  author={Haddadpour, Farzin and Kamani, Mohammad Mahdi and Mokhtari, Aryan and Mahdavi, Mehrdad},
  journal={arXiv preprint arXiv:2007.01154},
  year={2020}
}

Our other papers developed using this repository should be cited using the following bibitems:

@inproceedings{haddadpour2019local,
  title={Local sgd with periodic averaging: Tighter analysis and adaptive synchronization},
  author={Haddadpour, Farzin and Kamani, Mohammad Mahdi and Mahdavi, Mehrdad and Cadambe, Viveck},
  booktitle={Advances in Neural Information Processing Systems},
  pages={11082--11094},
  year={2019}
}
@article{deng2020distributionally,
  title={Distributionally Robust Federated Averaging},
  author={Deng, Yuyang and Kamani, Mohammad Mahdi and Mahdavi, Mehrdad},
  journal={Advances in Neural Information Processing Systems},
  volume={33},
  year={2020}
}
@article{deng2020adaptive,
  title={Adaptive Personalized Federated Learning},
  author={Deng, Yuyang and Kamani, Mohammad Mahdi and Mahdavi, Mehrdad},
  journal={arXiv preprint arXiv:2003.13461},
  year={2020}
}

Acknowledgement and Disclaimer

This repository is developed, mainly by MM. Kamani, based on our group's research on distributed and federated learning algorithms. We also developed the works of other groups' proposed methods using FedTorch for a better comparison. However, this repo is not the official code for those methods other than our group's. Some parts of the initial stages of this repository were based on a forked repo of Local SGD code from Tao Lin, which is not public now.

Comments

Errors when running the code using Docker

Hi, I followed your README instructions to run your algorithm but I encountered multiple errors along the way:

I tried to pull the image you provided on This issue #3 and run the following command: python run_mpi.py -f -ft apfl -n 10 -d cifar10 -lg 0.1 -b 50 -c 20 -k 1.0 -fs local_step -l 10 -r 2 -pa 0.5 -fp -oc I keep getting warnings about my RTX 2080ti card which I think is not been recognized.

warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Traceback (most recent call last):
  File "main.py", line 49, in <module>
    main(args)
  File "main.py", line 21, in main
    client.initialize()
  File "/workspace/fedtorch/fedtorch/nodes/nodes.py", line 44, in initialize
    init_config(self.args)
  File "/workspace/fedtorch/fedtorch/utils/init_config.py", line 36, in init_config
    torch.cuda.set_device(args.graph.device)
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 281, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: cuda runtime error (101) : invalid device ordinal at /workspace/pytorch/torch/csrc/cuda/Module.cpp:59
THCudaCheck FAIL file=/workspace/pytorch/torch/csrc/cuda/Module.cpp line=59 error=101 : invalid device ordinal
/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py:125: UserWarning:
GeForce RTX 2080 Ti with CUDA capability sm_75 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_35 sm_37 sm_52 sm_60 sm_61 sm_70 compute_70.
If you want to use the GeForce RTX 2080 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Traceback (most recent call last):
  File "main.py", line 49, in <module>
    main(args)
  File "main.py", line 21, in main
    client.initialize()
  File "/workspace/fedtorch/fedtorch/nodes/nodes.py", line 44, in initialize
    init_config(self.args)
  File "/workspace/fedtorch/fedtorch/utils/init_config.py", line 36, in init_config
    torch.cuda.set_device(args.graph.device)
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 281, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: cuda runtime error (101) : invalid device ordinal at /workspace/pytorch/torch/csrc/cuda/Module.cpp:59

Regarding these warnings the code does not collapse but not even one log regarding the training procedure is printed.

In addition when I run nvidia-smi I see the following gpu utilization:

The utilization percentage across all 3 GPUs remains zero.

As second option I tried to build the image given the Dockerfile located in the docker folder. Yet again I encountered the following run error:

Traceback (most recent call last):
  File "main.py", line 4, in <module>
    import torch.distributed as dist
  File "/usr/local/lib/python3.8/dist-packages/torch/__init__.py", line 189, in <module>
    from torch._C import *
RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd

--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[51835,1],1]
  Exit code:    1
--------------------------------------------------------------------------

It looks like there is a problem with PyTorch installation so I tried something simple:

Upgrading Numpy's version didn't help me either. Can you please help and explain how to run your code?

Thank you!

opened by AvivSham 5

A question about APFL algorithm

Hi, I read your paper and am interested in the algorithm of APFL. I have a question about the code. It seems that the code of APFL is a little different from the algorithm written on the paper. In the paper, each client maintains 3 models and first update the local version of global model, then update the local model and finally mix them to get the new personalized model. However, in this code, it just maintains 2 models and first update the local version of global model, then get the grad of personalized model by mixing the output of local version of global model and personalized model.

The code: `

inference and get current performance.

                    client.optimizer.zero_grad()
                    loss, performance = inference(client.model, client.criterion, client.metrics, _input, _target)

                    # compute gradient and do local SGD step.
                    loss.backward()
                    client.optimizer.step(
                        apply_lr=True,
                        apply_in_momentum=client.args.in_momentum, apply_out_momentum=False
                    )
                    
                    client.optimizer.zero_grad()
                    client.optimizer_personal.zero_grad()
                    loss_personal, performance_personal = inference_personal(client.model_personal, client.model, 
                                                                             client.args.fed_personal_alpha, client.criterion, 
                                                                             client.metrics, _input, _target)

                    # compute gradient and do local SGD step.
                    loss_personal.backward()
                    client.optimizer_personal.step(
                        apply_lr=True,
                        apply_in_momentum=client.args.in_momentum, apply_out_momentum=False
                    )

` Are they the same? Looking forward to your reply. Thank you!

opened by BrightHaozi 3

a question about fedprox

in this page "https://github.com/MLOPTPSU/FedTorch/blob/main/fedtorch/comms/trainings/federated/main.py". line 123 -> 129, code is below.

elif client.args.federated_type == 'fedprox':
    # Adding proximal gradients and loss for fedprox
    for client_param, server_param in zip(client.model.parameters(), client.model_server.parameters()):
        if client.args.graph.rank == 0:
            print("distance norm for prox is:{}".format(torch.norm(client_param.data - server_param.data )))
        loss += client.args.fedprox_mu /2 * torch.norm(client_param.data - server_param.data)
        client_param.grad.data += client.args.fedprox_mu * (client_param.data - server_param.data)

client.args.fedprox_mu /2 * torch.norm(client_param.data - server_param.data) may mean $\frac{\mu}{2} \left\|w-w_t\right\|$ . But, $\frac{\mu}{2} \left\|w-w_t\right\|^2$ is used in fedprox.

I'm not sure what I said above is true. Thank you very much for your kind consideration.

opened by bird-two 2

Could you release the code in "Federated Learning with Compression: Unified Analysis and Sharp Guarantees"?

Hi,

I just read the paper of "Federated Learning with Compression: Unified Analysis and Sharp Guarantees", it looks very interesting While searching for the code I was directed to this repo, however, I didn't found the implementation of FedCOM, so I would like to ask if you would release it?

Thanks!

opened by Dzhange 1
Does BrokenPipeError Matters？

Hi, I am very interested in your proposed method. It is really nice of you to share such a helpful and clear repo. I follow your repo and use your docker images to run the code, but I keep getting some errors as follows after some random starting round of the traning.

Traceback (most recent call last): File "/usr/lib/python3.8/multiprocessing/queues.py", line 245, in _feed send_bytes(obj) File "/usr/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes self._send_bytes(m[offset:offset + size]) File "/usr/lib/python3.8/multiprocessing/connection.py", line 411, in _send_bytes self._send(header + buf) File "/usr/lib/python3.8/multiprocessing/connection.py", line 368, in _send n = write(self._handle, buf) BrokenPipeError: [Errno 32] Broken pipe

After all, I still can get a result, but I wonder whether the brokenpipeerror matters, am I still getting the correct results? Hope to hear your response soon.

opened by RongDai430 1
No basic auth credentials

I read your paper and am interested in your proposed method. It is really nice of you to share such a helpful and clear repo. I pulled one of the images built with CUDA 10.2 and OpenMPI 4.0.1 with CUDA support and PyTorch 1.6.0, using the following command: $ docker pull docker.pkg.github.com/mloptpsu/fedtorch/fedtorch:cuda10.2-mpi

However it reported an error "Error response from daemon: Head https://docker.pkg.github.com/v2/mloptpsu/fedtorch/fedtorch/manifests/cuda10.2-mpi: no basic auth credentials".
How can I solve this error?

opened by Sybil-Huang 1
Adding EMNIST full dataset

Adding EMNIST full dataset in addition to its digits_only version, which was implemented before. Now, we have emnist and emnist_full datasets. The emnist dataset has 3383 clients with 10 classes, but the emnist_full has 3400 clients with 62 classes. Resolve the Batch Norm issue for training with 1 sample in the batch.

opened by mmkamani7 1
setup problem in docker

Hello,when I ran docker pull docker.pkg.github.com/mloptpsu/fedtorch/fedtorch:cuda10.2-mpi in my docker,there's a problem do you have any ideas to solve it? Thanks.

opened by b856741 0
A question about drawing the plots

Hi, Recently I've read your paper "Distributionally Robust Federated Averaging" carefully and I'm trying to reproduce the experiments in it. However, I don't know how to get the plots from the codes, I think modifying main.py and run_mpi.py would help but it seems really complicated. Could you please give me some instructions? Thanks!

opened by Zhg9300 2
Models not saved for Federated Centered

Hi While running main_centered.py the model checkpoints not saved. It is not Implemented?

Is there anything else missing for main_centered.py?

Regards, Ahmed
enhancement

opened by aldahdooh 1

Releases(v0.1.1)

v0.1.1(Feb 1, 2021)
In this release, the dependency on the Tensorflow Federated for accessing its datasets is dropped 🎉. Now we can get the dataset from their provided URL and process it there.

Also, the EMNIST dataset using its full 62 classes is updated in all modules and makes it available for training.

Source code(tar.gz)
Source code(zip)
v0.1.0(Jan 18, 2021)
This is the first public release of FedTorch. In this repository, we develop different distributed and federated learning schema for the training of machine learning models. It uses PyTorch Distributed API for running on multiple nodes and/or processes in parallel. Some of the main algorithms in this package:

Distributed Sync-SGD

Distributed Local SGD

Local SGD with Adaptive Synchoronization (LUPA-SGD)

Adaptive Personalized Federated Learning (APFL)

Distributionally Robust Federated Learning (DRFA)

Federated Learning with Gradient Tracking and Compression (FedGATE and FedCOMGATE)

FedAvg

SCAFFOLD

Qsparse Local SGD

AFL

FedProx

Different models for training:

Convex:

Least Squared Linear Regression

Logistic Regression

Robust Regression

Robust Logistic Regression

Non-Convex:

MLP

CNN

RNN

ResNet

DenseNet

WideResNet

Robust MLP

Datasets included in this package, with iid and non-iid distribution mechanisms:

MNIST

CIFAR10 and CIFAR100

Fashion MNIST

EMNIST and FEMNIST

STL10

Epsilon

Year Prediction MSD

RCV1

Higgs

Synthetic Dataset

Adult

Shakespear Dataset

Source code(tar.gz)
Source code(zip)

Owner

Machine Learning and Optimization Lab @PennState

This is the GitHub repository of the Machine Learning and Optimization lab at Penn State University.

GitHub Repository

A Pose Estimator for Dense Reconstruction with the Structured Light Illumination Sensor

Phase-SLAM A Pose Estimator for Dense Reconstruction with the Structured Light Illumination Sensor This open source is written by MATLAB Run Mode Open

14 Dec 19, 2022

A PyTorch Lightning solution to training OpenAI's CLIP from scratch.

train-CLIP 📎 A PyTorch Lightning solution to training CLIP from scratch. Goal ⚽ Our aim is to create an easy to use Lightning implementation of OpenA

396 Dec 30, 2022

"SOLQ: Segmenting Objects by Learning Queries", SOLQ is an end-to-end instance segmentation framework with Transformer.

SOLQ: Segmenting Objects by Learning Queries This repository is an official implementation of the paper SOLQ: Segmenting Objects by Learning Queries.

179 Jan 02, 2023

BOOKSUM: A Collection of Datasets for Long-form Narrative Summarization

BOOKSUM: A Collection of Datasets for Long-form Narrative Summarization Authors: Wojciech Kryściński, Nazneen Rajani, Divyansh Agarwal, Caiming Xiong,

125 Dec 31, 2022

GemNet model in PyTorch, as proposed in "GemNet: Universal Directional Graph Neural Networks for Molecules" (NeurIPS 2021)

GemNet: Universal Directional Graph Neural Networks for Molecules Reference implementation in PyTorch of the geometric message passing neural network

124 Dec 30, 2022

Progressive Image Deraining Networks: A Better and Simpler Baseline

Progressive Image Deraining Networks: A Better and Simpler Baseline [arxiv] [pdf] [supp] Introduction This paper provides a better and simpler baselin

190 Dec 01, 2022

This is the code of NeurIPS'21 paper "Towards Enabling Meta-Learning from Target Models".

ST This is the code of NeurIPS 2021 paper "Towards Enabling Meta-Learning from Target Models". If you use any content of this repo for your work, plea

7 Dec 06, 2022

GalaXC: Graph Neural Networks with Labelwise Attention for Extreme Classification

GalaXC GalaXC: Graph Neural Networks with Labelwise Attention for Extreme Classification @InProceedings{Saini21, author = {Saini, D. and Jain,

28 Dec 05, 2022

Channel Pruning for Accelerating Very Deep Neural Networks (ICCV'17)

1k Jan 03, 2023

Deep Learning Based Fasion Recommendation System for Ecommerce

Project Name: Fasion Recommendation System for Ecommerce A Deep learning based streamlit web app which can recommened you various types of fasion prod

13 Dec 13, 2022

Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition

Zen-NAS: A Zero-Shot NAS for High-Performance Deep Image Recognition How Fast Compare to Other Zero-Shot NAS Proxies on CIFAR-10/100 Pre-trained Model

190 Dec 29, 2022

Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

6.5k Jan 04, 2023

Time Series Forecasting with Temporal Fusion Transformer in Pytorch

Forecasting with the Temporal Fusion Transformer Multi-horizon forecasting often contains a complex mix of inputs – including static (i.e. time-invari

6 Jan 24, 2022

PyTorch implementation of Histogram Layers from DeepHist: Differentiable Joint and Color Histogram Layers for Image-to-Image Translation

deep-hist PyTorch implementation of Histogram Layers from DeepHist: Differentiable Joint and Color Histogram Layers for Image-to-Image Translation PyT

10 Dec 06, 2022

Torchserve server using a YoloV5 model running on docker with GPU and static batch inference to perform production ready inference.

Yolov5 running on TorchServe (GPU compatible) ! This is a dockerfile to run TorchServe for Yolo v5 object detection model. (TorchServe (PyTorch librar

82 Nov 29, 2022