Differential Privacy for Heterogeneous Federated Learning: Utility & Privacy Tradeoffs

Last update: Nov 10, 2022

Overview

In this work, we propose DP-SCAFFOLD(-warm), a new version of the SCAFFOLD algorithm (the warm version uses a wise initialisation of parameters), to tackle heterogeneity issues in federated learning under the formal privacy constraints known as Differential Privacy (DP). Using fine results from DP theory, we establish both privacy and utility guarantees, which show the superiority of DP-SCAFFOLD over the naive algorithm DP-FedAvg. We provide numerical experiments that confirm our analysis and demonstrate that the gains of DP-SCAFFOLD are significant, especially when the number of local updates or the level of heterogeneity between users grows.
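To make the role of SCAFFOLD's correction terms concrete, here is a minimal sketch of one user's local phase, following the update rules of the SCAFFOLD paper listed in the references (the control variate refresh is the paper's "option II"). It is not the code of this repository: the function name `scaffold_local_update`, the list-based vectors and the `grad_fn` callback are assumptions made for illustration. In a differentially private variant, `grad_fn` would return a clipped and noised mini-batch gradient.

```python
def scaffold_local_update(x, c, c_i, grad_fn, lr, num_steps):
    """Run the local steps of one user (hypothetical helper, not the repo API).

    x       : server model received at the start of the round
    c       : server control variate
    c_i     : control variate of this user
    grad_fn : callable returning a (stochastic) gradient at a given point
    """
    y = list(x)
    for _ in range(num_steps):
        g = grad_fn(y)
        # The drift correction (c - c_i) is what distinguishes SCAFFOLD from FedAvg.
        y = [yj - lr * (gj - cij + cj) for yj, gj, cij, cj in zip(y, g, c_i, c)]
    # Control variate refresh ("option II" in the SCAFFOLD paper).
    new_c_i = [
        cij - cj + (xj - yj) / (num_steps * lr)
        for cij, cj, xj, yj in zip(c_i, c, x, y)
    ]
    return y, new_c_i


if __name__ == "__main__":
    # Toy quadratic objective f(y) = (y - 1)^2 with gradient 2 (y - 1).
    y, new_c_i = scaffold_local_update(
        x=[0.0], c=[0.0], c_i=[0.0],
        grad_fn=lambda p: [2.0 * (pj - 1.0) for pj in p],
        lr=0.1, num_steps=1,
    )
    print(y, new_c_i)
```

With a single local step and zero control variates, the refreshed control variate equals the gradient evaluated at the server model, which is the quantity the correction is meant to track.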

Two datasets are studied:

a real-world dataset called Femnist (an extended version of the EMNIST dataset for federated learning), for which we report the accuracy as it grows with the number of communication rounds (first with 50 local updates, then with 100 local updates),

synthetic data called Logistic, designed for logistic regression models, for which we report the train loss as it decreases with the number of communication rounds (first with 50 local updates, then with 100 local updates).

Significant results are available for both of these datasets for logistic regression models.

Structure of the code

main.py: four global options are available.
- generate: to generate data, introduce heterogeneity, split data between users for federated learning and preprocess data
- optimum (after generate): to run a training phase on the unsplit data and save the "best" empirical model in a centralized setting, so that rates of convergence can be compared properly
- simulation (after generate and optimum): to run several simulations of federated learning and save the results (accuracy, loss...)
- plot (after simulation): to plot visuals

./data

Contains generators of synthetic (Logistic) and real-world (Femnist) data (file data_generator.py), designed for a federated learning framework under some similarity parameter. Each folder contains a file data where the generated data (train and test) is stored.
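As an illustration of how a similarity parameter can control heterogeneity, the sketch below follows the scheme described in the experiments of the SCAFFOLD paper: a fraction of the samples is spread i.i.d. across users and the remainder is sorted by label and handed out in contiguous blocks. Whether data_generator.py does exactly this is an assumption; the function name `split_by_similarity` and its arguments are hypothetical.

```python
import math
import random


def split_by_similarity(labels, num_users, similarity, rng):
    """Return one list of sample indices per user (hypothetical helper).

    similarity = 1.0 gives an i.i.d. split; similarity = 0.0 gives a split
    fully sorted by label, so each user sees very few classes.
    """
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    n_iid = int(similarity * len(indices))
    iid_part, rest = indices[:n_iid], indices[n_iid:]
    rest.sort(key=lambda i: labels[i])  # group the remainder by label

    shards = [[] for _ in range(num_users)]
    # i.i.d. part: deal the shuffled indices round-robin.
    for pos, idx in enumerate(iid_part):
        shards[pos % num_users].append(idx)
    # Heterogeneous part: contiguous label-sorted blocks, one per user.
    block = math.ceil(len(rest) / num_users)
    for u in range(num_users):
        shards[u].extend(rest[u * block:(u + 1) * block])
    return shards


if __name__ == "__main__":
    labels = [0] * 10 + [1] * 10
    shards = split_by_similarity(labels, num_users=2, similarity=0.0,
                                 rng=random.Random(0))
    print([sorted({labels[i] for i in shard}) for shard in shards])
```

Lower values of `similarity` make the users' label distributions diverge more, which is the regime where heterogeneity matters most.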

./flearn

differential_privacy: contains code to apply the Gaussian mechanism (designed to add differential privacy to mini-batch stochastic gradients)
optimizers: contains the optimization framework for each algorithm (adaptation of stochastic gradient descent)
servers: contains the super class Server (in server_base.py), which is adapted to FedAvg and SCAFFOLD (algorithm from the point of view of the server)
trainmodel: contains the learning model structures
users: contains the super class User (in user_base.py), which is adapted to FedAvg and SCAFFOLD (algorithm from the point of view of any user)
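The Gaussian mechanism applied to mini-batch gradients can be sketched as follows: clip each per-sample gradient to a maximum L2 norm, sum the clipped gradients, add Gaussian noise scaled to the clipping threshold, and average. This is a generic sketch in plain Python, not the code in differential_privacy; the names `clip_gradient`, `private_mean_gradient`, `max_norm` and `noise_multiplier` are assumptions, and the exact noise calibration used in this repository may differ.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Rescale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_sample_grads, max_norm, noise_multiplier, rng):
    """Clip per-sample gradients, add Gaussian noise to their sum, then average."""
    batch_size = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    clipped = [clip_gradient(g, max_norm) for g in per_sample_grads]
    summed = [sum(g[j] for g in clipped) for j in range(dim)]
    # Assumed calibration: noise standard deviation proportional to the clipping norm.
    std = noise_multiplier * max_norm
    return [(summed[j] + rng.gauss(0.0, std)) / batch_size for j in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.0, 0.5]]
    print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Clipping bounds the influence of any single sample on the sum, which is what allows the added noise to be calibrated independently of the data.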

./models

Stores the latest models over the training phase of federated learning.

./results

Stores several metrics of convergence for each simulation, each similarity/privacy setting and each algorithm.

Metrics (evaluated at each round of communication):

test accuracy over all users,
train loss over all users,
highest norm of parameter difference (server/user) over all selected users,
train gradient dissimilarity over all users.
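The last two metrics can be illustrated with a short sketch. The definitions used here are assumptions, since the exact formulas are not stated above: the parameter difference is taken as the largest L2 distance between the server model and a selected user's model, and the gradient dissimilarity as the average squared L2 distance between each user's gradient and the mean gradient. The function names are hypothetical.

```python
import math


def l2_norm(vec):
    return math.sqrt(sum(v * v for v in vec))


def max_param_difference(server_params, user_params_list):
    """Largest L2 distance between the server model and any selected user model."""
    return max(
        l2_norm([u - s for u, s in zip(user_params, server_params)])
        for user_params in user_params_list
    )


def gradient_dissimilarity(user_grads):
    """Average squared L2 distance between user gradients and their mean (assumed definition)."""
    n = len(user_grads)
    dim = len(user_grads[0])
    mean = [sum(g[j] for g in user_grads) / n for j in range(dim)]
    return sum(
        l2_norm([g[j] - mean[j] for j in range(dim)]) ** 2 for g in user_grads
    ) / n


if __name__ == "__main__":
    print(max_param_difference([0.0, 0.0], [[3.0, 4.0], [1.0, 0.0]]))
    print(gradient_dissimilarity([[1.0, 0.0], [-1.0, 0.0]]))
```

Under these definitions, both quantities are zero when all users hold identical models and gradients, and they grow as the users drift apart.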

Software requirements:

To install the dependencies: pip install -r requirements.txt

References

Code (main structure): https://github.com/ramshi236/Accelerated-Federated-Learning-Over-MAC-in-Heterogeneous-Networks
Code (utils/autograd_hacks.py): https://github.com/cybertronai/autograd-hacks/blob/master/autograd_hacks.py
SCAFFOLD & FedAvg paper: https://arxiv.org/abs/1910.06378
Generation of Logistic data and introduction of heterogeneity: https://arxiv.org/abs/1812.06127
Creation of dissimilarity for FEMNIST data: https://arxiv.org/abs/1909.06335
