A Python 3 package for state-of-the-art statistical dimension reduction methods

Last update: Dec 14, 2022

Overview

`direpack`: a Python 3 library for state-of-the-art statistical dimension reduction techniques

This package delivers scikit-learn compatible Python 3 implementations of a range of state-of-the-art multivariate statistical methods, with a focus on dimension reduction.

The categories of methods delivered in this package are:

Projection pursuit dimension reduction (ppdire)
Sufficient dimension reduction (sudire)
Robust M-estimators for dimension reduction (sprm)

Each category is implemented as a set of scikit-learn compatible objects in the corresponding folder.

We hope that this package contributes to your scientific success. If it does, we kindly ask you to cite the direpack vignette [0], as well as the original publication of the corresponding method.

The package also contains a set of tools for pre- and postprocessing:

The preprocessing folder provides classical and robust centring and scaling, as well as spatial sign transforms [4].
The dicomo folder contains a versatile class to access a wide variety of moment and co-moment statistics, and statistics derived from those. See the dicomo documentation file and the dicomo examples notebook.
Plotting utilities in the plot folder
Cross-validation utilities in the cross-validation folder
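As a rough illustration of what the robust preprocessing steps compute, here is a minimal pure-Python sketch. It assumes the median and the normalized MAD for robust centring and scaling, and the coordinate-wise median as the center for the spatial sign transform. The function names robust_center_scale and spatial_sign are hypothetical and are not the API of direpack's preprocessing classes.

```python
import math
import statistics


def robust_center_scale(X):
    """Center each column at its median and scale it by its normalized MAD.

    X is a list of rows (lists of floats). Returns a new list of rows.
    """
    columns = list(zip(*X))
    medians = [statistics.median(col) for col in columns]
    # Median absolute deviation, multiplied by 1.4826 for consistency at the normal.
    mads = [
        1.4826 * statistics.median(abs(v - m) for v in col)
        for col, m in zip(columns, medians)
    ]
    return [
        [(v - m) / s if s > 0 else v - m for v, m, s in zip(row, medians, mads)]
        for row in X
    ]


def spatial_sign(X, center=None):
    """Project each centered row onto the unit sphere (spatial sign transform)."""
    p = len(X[0])
    if center is None:
        # Coordinate-wise median as a simple robust center (an assumption;
        # the spatial median is another common choice).
        center = [statistics.median(col) for col in zip(*X)]
    out = []
    for row in X:
        diff = [v - c for v, c in zip(row, center)]
        norm = math.sqrt(sum(d * d for d in diff))
        # Rows that coincide with the center are mapped to the zero vector.
        out.append([d / norm for d in diff] if norm > 0 else [0.0] * p)
    return out
```

Because every transformed row has unit length, a single far-away observation can no longer dominate subsequent covariance-based estimates, which is the source of the moderate robustness described in [4].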

Methods in the `sprm` folder

The sparse partial robust M regression (SPRM) estimator (sprm.py) [1]
The sparse NIPALS (SNIPLS) estimator (snipls.py) [3]
Robust M regression estimator (rm.py)
Ancillary functions for M-estimation (_m_support_functions.py)
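To give an idea of what M-estimation involves, the following sketch shows two commonly used weight functions (Huber and Fair) and an iteratively reweighted least squares (IRLS) estimate of location. The tuning constants, the fixed MAD scale and all function names are illustrative assumptions; this is not the content of rm.py or _m_support_functions.py.

```python
import statistics


def huber_weight(r, c=1.345):
    """Huber weight: 1 inside [-c, c], c/|r| outside."""
    a = abs(r)
    return 1.0 if a <= c else c / a


def fair_weight(r, c=4.0):
    """Fair weight function: 1 / (1 + |r/c|)^2."""
    return 1.0 / (1.0 + abs(r / c)) ** 2


def m_location(x, weight=huber_weight, tol=1e-10, max_iter=100):
    """Robust location estimate by IRLS.

    The scale is held fixed at the normalized MAD around the median,
    a common simplification.
    """
    mu = statistics.median(x)
    scale = 1.4826 * statistics.median(abs(v - mu) for v in x)
    if scale == 0:
        return mu
    for _ in range(max_iter):
        # Downweight observations with large standardized residuals.
        w = [weight((v - mu) / scale) for v in x]
        new_mu = sum(wi * vi for wi, vi in zip(w, x)) / sum(w)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu
```

For x = [1, 2, 3, 4, 100] the arithmetic mean is 22, while both weighted estimates stay close to the bulk of the data; the same reweighting idea, applied to regression residuals and leverage, underlies partial robust M regression [2].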

Methods in the `ppdire` folder

The ppdire class gives access to a wide range of projection pursuit dimension reduction techniques. For well-established methods such as PCA, PLS and continuum regression, these provide slower, approximate estimates. However, the class provides unique access to a set of robust options, such as robust continuum regression (RCR) [5], through its native grid optimization algorithm, which was also first published for RCR [6]. Moreover, ppdire can be used to calculate generalized betas, using the CAPI projection index [7].
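The idea behind a grid-based projection pursuit optimizer can be sketched for two variables: scan a grid of unit directions and keep the one whose projected scores maximize a projection index. The sketch below is a deliberately simplified, hypothetical illustration (a single angle grid, one component, no deflation), not ppdire's actual algorithm; it shows how swapping the variance for a robust index such as the MAD changes what is being estimated.

```python
import math
import statistics


def mad(values):
    """Normalized median absolute deviation, a robust measure of spread."""
    m = statistics.median(values)
    return 1.4826 * statistics.median(abs(v - m) for v in values)


def grid_pp_direction(X, index=statistics.pvariance, n_grid=180):
    """First projection pursuit direction for 2-column data via an angle grid.

    Candidate directions are unit vectors (cos t, sin t) with t in [0, pi);
    the one maximizing the projection index of the projected scores is returned.
    """
    best_value, best_dir = -math.inf, None
    for k in range(n_grid):
        t = math.pi * k / n_grid
        a = (math.cos(t), math.sin(t))
        scores = [a[0] * x1 + a[1] * x2 for x1, x2 in X]
        value = index(scores)
        if value > best_value:
            best_value, best_dir = value, a
    return best_dir, best_value
```

With the variance as index this approximates the first principal component direction; with the MAD as index, a single gross outlier has little influence on the chosen direction.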

The code is organized in:

ppdire.py - the main PP dimension reduction class
capi.py - the co-moment analysis projection index.
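The co-moments that capi.py and the dicomo class build on are standard quantities. As a hedged illustration, here is a pure-Python sketch of a central co-moment of order (a, b) and a standardized version; with a = 1 and b = 2 or b = 3 it gives one common convention for the co-skewness and co-kurtosis of x with respect to y. The function names are hypothetical and this is not the dicomo API.

```python
import statistics


def comoment(x, y, a, b):
    """Population central co-moment: mean of (x - mean x)^a * (y - mean y)^b."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return statistics.fmean((xi - mx) ** a * (yi - my) ** b for xi, yi in zip(x, y))


def standardized_comoment(x, y, a, b):
    """Co-moment divided by sd(x)^a * sd(y)^b (population standard deviations)."""
    return comoment(x, y, a, b) / (
        statistics.pstdev(x) ** a * statistics.pstdev(y) ** b
    )
```

With a = b = 1 the co-moment reduces to the covariance, and the standardized version to the Pearson correlation.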

Methods in the `sudire` folder

The sudire folder gives access to an extensive set of methods that fall under the umbrella of sufficient dimension reduction. These range from long-standing, well-accepted approaches, such as sliced inverse regression (SIR) and the closely related SAVE [8,9], through methods such as directional regression [10] and principal Hessian directions [11], and more. The package also contains some of the most recently developed, state-of-the-art sufficient dimension reduction techniques, which require no distributional assumptions. The options provided in this category are based on energy statistics (distance covariance [12] or martingale difference divergence [13]) and ball statistics (ball covariance [14]). All of these options can be selected by setting the corresponding parameters of the sudire class; see the documentation. Note: the ball covariance option requires uncommenting some lines in the code, as indicated there. We decided not to make that option generally available, since it depends on the Ball package, which can be difficult to install on certain architectures.
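Distance covariance, one of the energy statistics mentioned above, can be computed from double-centered pairwise distance matrices. The following pure-Python sketch for two univariate samples uses the V-statistic form of the squared sample distance covariance. It only illustrates the statistic; it is not the sudire estimation procedure, which optimizes such a criterion over candidate subspaces. Function names are hypothetical.

```python
import math


def _double_centered_distances(v):
    """Pairwise |v_i - v_j| distances, double-centered by row, column and grand means."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    # d is symmetric, so column means equal row means.
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def dcov2(x, y):
    """Squared sample distance covariance (V-statistic) of two 1D samples."""
    n = len(x)
    A, B = _double_centered_distances(x), _double_centered_distances(y)
    return sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n ** 2


def dcor(x, y):
    """Sample distance correlation, a value in [0, 1]."""
    vxy, vx, vy = dcov2(x, y), dcov2(x, x), dcov2(y, y)
    if vx * vy == 0:
        return 0.0
    return math.sqrt(vxy / math.sqrt(vx * vy))
```

For example, for x = [-2, -1, 0, 1, 2] and y = x^2 the Pearson correlation is exactly zero, while the distance correlation is clearly positive, because distance covariance also detects nonlinear dependence.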

How to install

The package is distributed through PyPI and can be installed with:

    pip install direpack

Note that some of the key methods in the sudire subpackage rely on the IPOPT optimization package, which, according to its maintainers' recommendation, is best installed directly as:

    conda install -c conda-forge cyipopt
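Before using the sudire options that need IPOPT, it can help to check whether its Python wrapper is importable. This is a minimal sketch using only the standard library; the import name cyipopt is an assumption about how the conda-forge package exposes itself, and has_module is a hypothetical helper.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


# Assumed import name of the IPOPT Python wrapper.
if not has_module("cyipopt"):
    print("cyipopt not found; install it with: conda install -c conda-forge cyipopt")
```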

Documentation

Detailed documentation can be found in the ReadTheDocs page.
A more extensive description on the background is presented in the direpack vignette.
Examples on how to use each of the dicomo, ppdire, sprm and sudire classes are presented as Jupyter notebooks in the examples folder.
Furthermore, the docs folder contains a few markdown files on usage of the classes.

References

[0] direpack: A Python 3 package for state-of-the-art statistical dimension reduction methods.
[1] Sparse partial robust M regression, Irene Hoffmann, Sven Serneels, Peter Filzmoser, Christophe Croux, Chemometrics and Intelligent Laboratory Systems, 149 (2015), 50-59.
[2] Partial robust M regression, Sven Serneels, Christophe Croux, Peter Filzmoser, Pierre J. Van Espen, Chemometrics and Intelligent Laboratory Systems, 79 (2005), 55-64.
[3] Sparse and robust PLS for binary classification, I. Hoffmann, P. Filzmoser, S. Serneels, K. Varmuza, Journal of Chemometrics, 30 (2016), 153-162.
[4] Spatial Sign Preprocessing: A Simple Way To Impart Moderate Robustness to Multivariate Estimators, Sven Serneels, Evert De Nolf, Pierre J. Van Espen, Journal of Chemical Information and Modeling, 46 (2006), 1402-1409.
[5] Robust Continuum Regression, Sven Serneels, Peter Filzmoser, Christophe Croux, Pierre J. Van Espen, Chemometrics and Intelligent Laboratory Systems, 76 (2005), 197-204.
[6] Robust Multivariate Methods: The Projection Pursuit Approach, Peter Filzmoser, Sven Serneels, Christophe Croux, Pierre J. Van Espen, in: From Data and Information Analysis to Knowledge Engineering, M. Spiliopoulou, R. Kruse, C. Borgelt, A. Nuernberger, W. Gaul (eds.), Springer Verlag, Berlin, Germany (2006), 270-277.
[7] Projection pursuit based generalized betas accounting for higher order co-moment effects in financial market analysis, Sven Serneels, in: JSM Proceedings, Business and Economic Statistics Section, American Statistical Association, Alexandria, VA (2019), 3009-3035.
[8] Sliced Inverse Regression for Dimension Reduction, K.-C. Li, Journal of the American Statistical Association, 86 (1991), 316-327.
[9] Sliced Inverse Regression for Dimension Reduction: Comment, R.D. Cook, Sanford Weisberg, Journal of the American Statistical Association, 86 (1991), 328-332.
[10] On directional regression for dimension reduction, B. Li, S. Wang, Journal of the American Statistical Association, 102 (2007), 997-1008.
[11] On principal Hessian directions for data visualization and dimension reduction: Another application of Stein's lemma, K.-C. Li, Journal of the American Statistical Association, 87 (1992), 1025-1039.
[12] Sufficient Dimension Reduction via Distance Covariance, Wenhui Sheng, Xiangrong Yin, Journal of Computational and Graphical Statistics, 25(1) (2016), 91-104.
[13] A martingale-difference-divergence-based estimation of central mean subspace, Yu Zhang, Jicai Liu, Yuesong Wu, Xiangzhong Fang, Statistics and Its Interface, 12(3) (2019), 489-501.
[14] Robust Sufficient Dimension Reduction Via Ball Covariance, Jia Zhang, Xin Chen, Computational Statistics and Data Analysis, 140 (2019), 144-154.

Release notes can be found in the repository.

A list of possible topics for further development is provided as well. Additions and comments are welcome!

Comments

`p` should never be smaller than `n_components` in `sprm.fit`
The variable p should never be smaller than n_components in sprm.fit; otherwise an error occurs. This is checked at the top of fit, but p can be redefined at line 185.

Inserting as line 186:

    self.n_components = min(p, self.n_components)

...appears to fix the issue, but I have not done extensive testing. It may also be advisable to raise a warning if n_components is reduced in this way.
Opened by MattWenham
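The suggested fix, combined with the proposed warning, could look roughly like this. clamp_n_components is a hypothetical helper written for illustration, not a function in direpack; inside fit it would be used as self.n_components = clamp_n_components(self.n_components, p) right after p is redefined.

```python
import warnings


def clamp_n_components(n_components, p):
    """Return min(p, n_components), warning when the request had to be reduced."""
    if n_components > p:
        warnings.warn(
            f"n_components={n_components} exceeds the {p} available columns; "
            f"reducing it to {p}.",
            UserWarning,
        )
        return p
    return n_components
```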
gsspp.GenSpatialSignPrePprocessor().transform() is not working

Dear sirs,

I wanted to apply a spatial sign transform to my data, came across your module and found that it does not work. My code is as follows:

    scaler = gsspp.GenSpatialSignPrePprocessor(center = 'kstepLTS', fun = 'ball').fit(X_train)
    X_scaled = scaler.transform(X_train)

It fails because scaler ends up as None (the fit call does not seem to return the fitted object), so there is no transform method to call. The error message is:

AttributeError: 'NoneType' object has no attribute 'transform'

maurice

Opened by shinhongwu
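The error indicates that fit returned None, so the chained call .fit(X_train).transform(X_train) cannot work. scikit-learn compatible estimators are expected to return self from fit; the toy class below (hypothetical, not direpack code) shows that convention. If the direpack class stores its fitted state on the instance (not verified here), calling fit and transform on separate lines on the same object would sidestep the error.

```python
class CenteringSketch:
    """Toy transformer following the scikit-learn fit/transform convention."""

    def fit(self, X):
        n = len(X)
        # Store the fitted state on the instance, with a trailing underscore.
        self.center_ = [sum(col) / n for col in zip(*X)]
        return self  # returning self is what makes fit(X).transform(X) work

    def transform(self, X):
        return [[v - c for v, c in zip(row, self.center_)] for row in X]
```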

coef_ attribute expected but missing when using ppdire

Below is a reproducible example of the error. Cells marked # NB code are code cells; the others are Jupyter outputs.

# NB code
import numpy as np
from direpack import dicomo, ppdire

X = np.random.rand(5,5)

reducer = ppdire(
    projection_index = dicomo,
    # mode of projection_index class defines dim reduction 'method'
    pi_arguments = {'mode' : 'var'},
    n_components=4,
    optimizer='SLSQP'
)
reducer.fit(X)
reducer.x_loadings_

array([[-0.36157257,  0.59084429,  0.31816485, -0.13799567],
       [-0.59046145, -0.14633256,  0.28087908, -0.57627361],
       [ 0.52330409,  0.27622013, -0.27929959, -0.75601132],
       [ 0.09839508,  0.72132604,  0.11781207,  0.27450752],
       [-0.48692072,  0.18133122, -0.85322337,  0.04425411]])

# NB code
reducer.transform(X)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_63144/911793123.py in <module>
----> 1 reducer.transform(X)

~/.conda/envs/prod3/lib/python3.9/site-packages/direpack/ppdire/ppdire.py in transform(self, Xn)
    759         Xn = convert_X_input(Xn)
    760         (n,p) = Xn.shape
--> 761         if p!= self.coef_.shape[0]:
    762             raise(ValueError('New data must have seame number of columns as the ones the model has been trained with'))
    763         Xnc = scale_data(Xn,self.x_loc_,self.x_sca_)

AttributeError: 'ppdire' object has no attribute 'coef_'

I looked into the code, and the issue seems to be that this attribute is only created when the one-block flag is not set, but a data check in the transform and predict functions uses that attribute.

Opened by nikml
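One possible fix, sketched here under assumptions, is to base the column check on an attribute that exists in every mode. The loadings printed above have one row per original column (5 rows for a 5-column X), so their row count could serve. The helper below is hypothetical, works on plain nested lists, and is not ppdire's code.

```python
def check_new_data_columns(Xn, x_loadings):
    """Raise if new data do not have as many columns as the loadings have rows."""
    p_new = len(Xn[0])
    p_train = len(x_loadings)
    if p_new != p_train:
        raise ValueError(
            f"New data must have the same number of columns ({p_train}) "
            f"as the data the model was trained with; got {p_new}."
        )
```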

A possible mistake in the estimation basis of SDR

Thanks for providing this package. I found a confusing problem in src/direpack/sudire/sudire.py, line 489. When scaling is used, x_loadings should be set to N2 multiplied by P, not P, because the data are scaled. I notice you intended to do so at line 225 in src/direpack/sudire/_sudire_utils.py (taking SIR as an example), but the x passed to this function has already been scaled, so the variable "signsqrt" in this function is always the identity matrix and cannot have the intended effect.

Opened by I-zhouqh
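The reasoning behind this report can be made explicit with a short derivation. Assume (our reading of the issue, not confirmed by the source) that the directions in P are estimated on standardized data z = Σ^{-1/2}(x - μ) and that N2 denotes Σ^{-1/2}. Since Σ^{-1/2} is symmetric, a direction η in z-space corresponds to the direction Σ^{-1/2}η in the original space:

```latex
z = \Sigma^{-1/2}(x - \mu), \qquad
\eta^{\top} z = \eta^{\top} \Sigma^{-1/2} (x - \mu)
             = \left(\Sigma^{-1/2} \eta\right)^{\top} (x - \mu),
\qquad\text{so}\qquad B = \Sigma^{-1/2} P ,
```

where P = [η_1, ..., η_d] and B holds the loadings on the original scale. If Σ^{-1/2} is instead computed from data that have already been standardized in this way, Σ is the identity, Σ^{-1/2} = I, and the back-transformation has no effect, which matches the behaviour described above.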

Releases (latest: 1.0.25)

1.0.25 (Dec 25, 2022): ReadTheDocs updated.
1.0.24 (Dec 11, 2022): Documentation updated.
1.0.23b (Oct 22, 2022): Option to use IPOPT and differential evolution as optimizers in ppdire; linting.
1.0.23 (Oct 22, 2022): Option to use IPOPT and differential evolution as optimizers in ppdire; linting.
1.0.22 (Oct 9, 2022): Continued sunsetting of np.matrix.
1.0.21 (Oct 9, 2022): Avoid the obsolete np.matrix and substitute np.array in dicomo and ppdire.
1.0.20b (Aug 1, 2022): Updated notebooks; minor updates for Python 3.10.
1.0.20 (Aug 1, 2022): Updated notebooks; minor fixes for Python 3.10.
1.0.19 (Sep 13, 2021): Fixed an issue that led to a NaN scale in sprm when autoscaling with scaleTau2.
1.0.18 (May 20, 2021): Fix in sudire.
1.0.17 (Apr 26, 2021): Further documentation formatting adjustments.
1.0.16 (Apr 26, 2021): Changed some formatting in the documentation.
1.0.15 (Apr 26, 2021): Fixed a name in preprocessing.
1.0.14 (Apr 26, 2021)
1.0.13 (Apr 25, 2021): ReadTheDocs updated.
1.0.12g (Apr 15, 2021)
1.0.12f (Apr 15, 2021)
1.0.12e (Apr 15, 2021)
1.0.12d (Apr 15, 2021)
1.0.12c (Apr 15, 2021)
1.0.12b (Apr 15, 2021)
1.0.12 (Apr 15, 2021)
1.0.11 (Apr 7, 2021): Documentation restructured for the go-live of the readthedocs.io page.
1.0.10 (Dec 24, 2020): Reduced the use of np.matrix in the sprm branch, as recommended by NumPy; fixed a bug when calling the GenSpatialSignPreProcessor.
1.0.9b (Dec 21, 2020): Updated version ID.
1.0.9 (Dec 21, 2020): Added code for the martingale difference divergence matrix.
1.0.8 (Sep 25, 2020): Documentation update.
1.0.7 (Sep 15, 2020)
1.0.6 (Aug 26, 2020)
1.0.5b (Aug 8, 2020)

Owner

Sven Serneels

I presently manage a team working on statistics, machine learning and AI. On the side, I am an avid method developer for high-dimensional statistics and machine learning.
