Code for the paper "There is no Double-Descent in Random Forests"

Overview

Code for the paper "There is no Double-Descent in Random Forests"

This repository contains the code to run the experiments for our paper called "There is no Double-Descent in Random Forest". In the paper we highlight experiments on the 5 different datasets (adult, bank, eeg, magic, nomao), but this implementation also supports more datasets out of the box. Most of the code should be somewhat commented and self-explanatory given the two caveats below. To run the experiments simply clone this repository

[email protected]:sbuschjaeger/rf-double-descent.git

(Optional) Build the conda environment and activate it:

conda env creat -f environment.yml --force
conda activate rfdd

Run experiments on the adult dataset with M = 256 trees over a 5 fold cross validation with different number of max_nodes with 96 threads:

./run.py -x 5 -M 256 --n_jobs 96 --max_nodes 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 -d adult

Important 1: This will run all experiments with 96 threads. The experiments are executed in a multiprocessing.Pool environment which means that the entire dataset is copied for each cross-validation run. Hence this may take a decent amount of memory (up to 200GB) and some time.

Important 2: The command-line argument n_jobs only determines the total number of threads in the processing pool, but not the total number of threads used by this script. We currently supply n_jobs = n_jobs_per_forest = None to scikit-learns RandomForestClassifier when fitting the (initial) RF. Hence, scikit-learn uses a heuristic to choose the number of jobs used for fitting the RF. If required, then you can set n_jobs_per_forest in the script manually (line 132).

Important 3: Datasets which are not found in the tempfolder (issued by tempfile.gettmpdir() which likely points to /tmp on Linux systems) are automatically downloaded. If you have already downloaded the datasets or you simply do not like the temp folder you can set this via --tmpdir ${your_new_tmp_dir}.

Plot the results on the adult dataset and store the them in the current folder:

./plot.py -d adult -o .

Alternativley, plot.py is also divided into execution cells which you can run via an inline interpreter (e.g. VSCode or a Juypter Notebook).

CenterPoint 3D Object Detection and Tracking using center points in the bird-eye view.

CenterPoint 3D Object Detection and Tracking using center points in the bird-eye view. Center-based 3D Object Detection and Tracking, Tianwei Yin, Xin

Tianwei Yin 134 Dec 23, 2022
Face Recognition & AI Based Smart Attendance Monitoring System.

In today’s generation, authentication is one of the biggest problems in our society. So, one of the most known techniques used for authentication is h

Sagar Saha 1 Jan 14, 2022
Official implementation of "Watermarking Images in Self-Supervised Latent-Spaces"

🔍 Watermarking Images in Self-Supervised Latent-Spaces PyTorch implementation and pretrained models for the paper. For details, see Watermarking Imag

Meta Research 32 Dec 13, 2022
Tackling the Class Imbalance Problem of Deep Learning Based Head and Neck Organ Segmentation

Info This is the code repository of the work Tackling the Class Imbalance Problem of Deep Learning Based Head and Neck Organ Segmentation from Elias T

2 Apr 20, 2022
Official implementation of FCL-taco2: Fast, Controllable and Lightweight version of Tacotron2 @ ICASSP 2021

FCL-Taco2: Towards Fast, Controllable and Lightweight Text-to-Speech synthesis (ICASSP 2021) Paper | Demo Block diagram of FCL-taco2, where the decode

Disong Wang 39 Sep 28, 2022
A PyTorch implementation of unsupervised SimCSE

A PyTorch implementation of unsupervised SimCSE

99 Dec 23, 2022
Simple tutorials using Google's TensorFlow Framework

TensorFlow-Tutorials Introduction to deep learning based on Google's TensorFlow framework. These tutorials are direct ports of Newmu's Theano Tutorial

Nathan Lintz 6k Jan 06, 2023
Modeling Category-Selective Cortical Regions with Topographic Variational Autoencoders

Modeling Category-Selective Cortical Regions with Topographic Variational Autoencoders

1 Oct 11, 2021
FedJAX is a library for developing custom Federated Learning (FL) algorithms in JAX.

FedJAX: Federated learning with JAX What is FedJAX? FedJAX is a library for developing custom Federated Learning (FL) algorithms in JAX. FedJAX priori

Google 208 Dec 14, 2022
Streaming Anomaly Detection Framework in Python (Outlier Detection for Streaming Data)

Python Streaming Anomaly Detection (PySAD) PySAD is an open-source python framework for anomaly detection on streaming multivariate data. Documentatio

Selim Firat Yilmaz 181 Dec 18, 2022
IGCN : Image-to-graph convolutional network

IGCN : Image-to-graph convolutional network IGCN is a learning framework for 2D/3D deformable model registration and alignment, and shape reconstructi

Megumi Nakao 7 Oct 27, 2022
PyTorch Kafka Dataset: A definition of a dataset to get training data from Kafka.

PyTorch Kafka Dataset: A definition of a dataset to get training data from Kafka.

ERTIS Research Group 7 Aug 01, 2022
BTC-Generator - BTC Generator With Python

Что такое BTC-Generator? Это генератор чеков всеми любимого @BTC_BANKER_BOT Для

DoomGod 3 Aug 24, 2022
JFB: Jacobian-Free Backpropagation for Implicit Models

JFB: Jacobian-Free Backpropagation for Implicit Models

Typal Research 28 Dec 11, 2022
[IROS2021] NYU-VPR: Long-Term Visual Place Recognition Benchmark with View Direction and Data Anonymization Influences

NYU-VPR This repository provides the experiment code for the paper Long-Term Visual Place Recognition Benchmark with View Direction and Data Anonymiza

Automation and Intelligence for Civil Engineering (AI4CE) Lab @ NYU 22 Sep 28, 2022
This is the official implementation of the paper "Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation".

[CVPRW 2021] - Object Propagation via Inter-Frame Attentions for Temporally Stable Video Instance Segmentation

Anirudh S Chakravarthy 6 May 03, 2022
Code for our ACL 2021 paper "One2Set: Generating Diverse Keyphrases as a Set"

One2Set This repository contains the code for our ACL 2021 paper “One2Set: Generating Diverse Keyphrases as a Set”. Our implementation is built on the

Jiacheng Ye 63 Jan 05, 2023
The Python code for the paper A Hybrid Quantum-Classical Algorithm for Robust Fitting

About The Python code for the paper A Hybrid Quantum-Classical Algorithm for Robust Fitting The demo program was only tested under Conda in a standard

Anh-Dzung Doan 5 Nov 28, 2022
[ECCV'20] Convolutional Occupancy Networks

Convolutional Occupancy Networks Paper | Supplementary | Video | Teaser Video | Project Page | Blog Post This repository contains the implementation o

622 Dec 30, 2022
TalkingHead-1KH is a talking-head dataset consisting of YouTube videos

TalkingHead-1KH Dataset TalkingHead-1KH is a talking-head dataset consisting of YouTube videos, originally created as a benchmark for face-vid2vid: On

173 Dec 29, 2022