Vertical Federated Principal Component Analysis and Its Kernel Extension on Feature-wise Distributed Data based on Pytorch Framework

Last update: Sep 18, 2022

Overview

VFedPCA+VFedAKPCA

This is the official source code for the Paper: Vertical Federated Principal Component Analysis and Its Kernel Extension on Feature-wise Distributed Data based on Pytorch Framework.

Despite enormous research interest and rapid application of federated learning (FL) to various areas, existing studies mostly focus on supervised federated learning under the horizontally partitioned local dataset setting. This paper will study the unsupervised FL under the vertically partitioned dataset setting.

Server-Clients Architecture

Figure: Server-Clients Architecture

Master Branch

VFedPCA+VFedAKPCA                    
└── case                        // Case Studies
    └── figs                    // Save experimental results' figures in '.eps' / '.png' format 
        ├── img_name*.eps              
        └── img_name*.png           
    ├── main.py          
    ├── model.py              
    └── utils.py                 
├── dataset                     // Put downloaded dataset in this folder
└── figs                        // Save experimental results' figures in '.eps' / '.png' format
    ├── img_name*.eps              
    └── img_name*.png           
├── README.md               
├── main.py                     // Experiment on Structured Dataset
├── model.py                   
└── utils.py

Environments

python = 3.8.8
numpy = 1.20.1
pandas = 1.2.4
scikit-learn = 0.24.1
scipy = 1.6.2
imageio = 2.9.0

Prepare Dataset

To demonstrate the superiority of our method, we utilized FIVE types of real-world datasets coming with distinct nature.

structured datasets from different domains;
medical image dataset;
face image dataset;
gait image dataset;
person re-identification image dataset.

Step 1: Download Dataset from the Google Drive URL

Step 2: Specify Dataset Path by Command Argument

$ python main.py --data_path="./dataset/xxx"

Experiments

We conduct extensive experiments on structured datasets to exmaines the effect of feature size, local iterations, warm-start power iterations, and weight scaling method on structed datasets. Furthermore, we investigate some case studies with image dataset to demonstrate the effectiveness of VFedPCA and VFedAKPCA.

A. Experiment on Structured Dataset

First, you need to choose the dataset.

python main.py --data_path './dataset/College.csv' --batch_size 160

Then, you only need to set different flag, p_list, iter_list and sampler_num to exmaines the effect of feature size, local iterations, warm-start power iterations, and weight scaling method on structed datasets. The example is as follows.

flag ='clients'
p_list = [3, 5, 10]         # the number of involved clients
iter_list = [100, 100, 100] # the number of local power iterations
sampler_num = 5

B. Case Studies

python main.py --data_path '../dataset/Image/DeepLesion' /
               --client_num 8 / 
               --iterations 100 / 
               --re_size 512

Citation

@inproceedings{
title = {{Vertical Federated Principal Component Analysis and Its Kernel Extension on Feature-wise Distributed Data}},
author = {Yiu-ming Cheung, Fellow, IEEE, Feng Yu, and Jian Lou},
year = 2021
}

Vertical Federated Principal Component Analysis and Its Kernel Extension on Feature-wise Distributed Data based on Pytorch Framework

Related tags

Overview

VFedPCA+VFedAKPCA

Server-Clients Architecture

Master Branch

Environments

Prepare Dataset

Experiments

A. Experiment on Structured Dataset

B. Case Studies

Citation

Owner

John

Python implementation of Project Fluent

Fashion Recommender System With Python

PyTorch implementation of TSception V2 using DEAP dataset

Utilities and information for the signals.numer.ai tournament

Zero-Shot Text-to-Image Generation VQGAN+CLIP Dockerized

A big endian Gentoo port developed on a Pine64.org RockPro64

Code for MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks

The Agriculture Domain of ERPNext comes with features to record crops and land

Implementation of the paper All Labels Are Not Created Equal: Enhancing Semi-supervision via Label Grouping and Co-training

SimBERT升级版（SimBERTv2）！

AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty

[3DV 2020] PeeledHuman: Robust Shape Representation for Textured 3D Human Body Reconstruction

Efficient Two-Step Networks for Temporal Action Segmentation (Neurocomputing 2021)

CvT-ASSD: Convolutional vision-Transformerbased Attentive Single Shot MultiBox Detector (ICTAI 2021 CCF-C 会议)The 33rd IEEE International Conference on Tools with Artificial Intelligence

计算机视觉中用到的注意力模块和其他即插即用模块PyTorch Implementation Collection of Attention Module and Plug&Play Module

ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels

Implementations of the algorithms in the paper Approximative Algorithms for Multi-Marginal Optimal Transport and Free-Support Wasserstein Barycenters

Fast mesh denoising with data driven normal filtering using deep variational autoencoders

PyTorch Implementation of Small Lesion Segmentation in Brain MRIs with Subpixel Embedding (ORAL, MICCAIW 2021)

This is the source code for generating the ASL-Skeleton3D and ASL-Phono datasets. Check out the README.md for more details.