Code for the Higgs Boson Machine Learning Challenge organised by CERN & EPFL

Overview

A method to solve the Higgs boson challenge using Least Squares - Novae

This project is the Project 1 of EPFL CS-433 Machine Learning. The project is the same as the Higgs Boson Machine Learning Challenge posted on Kaggle. The dataset and the detailed description can also be found in the GitHub repository of the course.

Team name: Novae

Team members: Giacomo Orsi, Vittorio Rossi, Chun-Tso Tsai

About the Project

The task of this project is to train a model based on the provided train.csv to have the best prediction on the data given in test.csv or any other general case.

We built our model for the problem using regularized linear regression after applying some data cleaning and features engineering techniques. A report describing our approach and our results can be found in the file report.pdf. In the end, we obtained an accuracy of 0.836 and an F1 score of 0.751 on the test.csv dataset.

Instructions

  • The project runs under Python 3.8 and requires NumPy=1.19.
  • Please make sure to place train.csv and test.csv inside the data folder. Those files can be downloaded here.
  • Go to the script/ folder and execute run.py. A model will be trained with the given hyper-parameters and predictions for the test dataset will be outputed in the file out.csv.

Modules

implementations.py

Contains the implementations of different learning algorithms. Including

  • Least squares linear regression
    • least_squares: Direct computation from linear equations.
    • least_squares_GD: Gradient descent.
    • least_squares_SGD: Stochastic gradient descent.
    • ridge_regression: Regularized linear regression from direct computation.
  • Logistic regression
    • logistic_regression: Gradient descent
    • reg_logistic_regression: Gradient descent with regularization.

There are also some helper functions in this file to facilitate the above functions.

data_processing.py

Calls the following files to process the data.

  • data_cleaning.py: Contains functions used to
    1. Categorize data into subgroups.
    2. Replace missing values with the median.
    3. Standardize the features.
  • feature_engineering.py: Contains functions used to generate our interpretable features.

run.py

Generates the submission .csv file based on the data of test.csv stored in the folder data/. Our optimized model is also defined in this file.

Some helper Functions

  • models.py: Create the models for predicting the labels for new data points without true labels.
  • expansions.py: Contains a function to apply polynomial expansion to our features to add extra degrees of freedom for our models.
  • proj1_helpers.py: Contains functions which loads the .csv files as training or testing data, and create the .csv file for submission.
  • cross_validation.py: Contains a function to build the index for k-fold cross_validation.
  • disk_helper.py: Save/load the NumPy array to disk for further usage. Useful for saving hyper-parameters when trying a long training process.

Notebook

It is possible to use the Jupyter notebook project_notebook.ipynb located in the scripts folder to train the best hyper-parameters for the model. In the notebook it is possible to cross-validate a logistic and a least square regression model over given lambdas and degrees.

Owner
Giacomo Orsi
CS Student at EPFL. Previously at University of Bologna
Giacomo Orsi
Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2020

Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2020

Phillip Lippe 1.1k Jan 07, 2023
Official PyTorch implementation of the preprint paper "Stylized Neural Painting", accepted to CVPR 2021.

Official PyTorch implementation of the preprint paper "Stylized Neural Painting", accepted to CVPR 2021.

Zhengxia Zou 1.5k Dec 28, 2022
Easy-to-use micro-wrappers for Gym and PettingZoo based RL Environments

SuperSuit introduces a collection of small functions which can wrap reinforcement learning environments to do preprocessing ('microwrappers'). We supp

Farama Foundation 357 Jan 06, 2023
PyTorch implementation of Constrained Policy Optimization

PyTorch implementation of Constrained Policy Optimization (CPO) This repository has a simple to understand and use implementation of CPO in PyTorch. A

Sapana Chaudhary 25 Dec 08, 2022
Code and Datasets from the paper "Self-supervised contrastive learning for volcanic unrest detection from InSAR data"

Code and Datasets from the paper "Self-supervised contrastive learning for volcanic unrest detection from InSAR data" You can download the pretrained

Bountos Nikos 3 May 07, 2022
Hybrid Neural Fusion for Full-frame Video Stabilization

FuSta: Hybrid Neural Fusion for Full-frame Video Stabilization Project Page | Video | Paper | Google Colab Setup Setup environment for [Yu and Ramamoo

Yu-Lun Liu 430 Jan 04, 2023
TransReID: Transformer-based Object Re-Identification

TransReID: Transformer-based Object Re-Identification [arxiv] The official repository for TransReID: Transformer-based Object Re-Identification achiev

569 Dec 30, 2022
Implementation of ICCV2021(Oral) paper - VMNet: Voxel-Mesh Network for Geodesic-aware 3D Semantic Segmentation

VMNet: Voxel-Mesh Network for Geodesic-Aware 3D Semantic Segmentation Created by Zeyu HU Introduction This work is based on our paper VMNet: Voxel-Mes

HU Zeyu 82 Dec 27, 2022
CoSMA: Convolutional Semi-Regular Mesh Autoencoder. From Paper "Mesh Convolutional Autoencoder for Semi-Regular Meshes of Different Sizes"

Mesh Convolutional Autoencoder for Semi-Regular Meshes of Different Sizes Implementation of CoSMA: Convolutional Semi-Regular Mesh Autoencoder arXiv p

Fraunhofer SCAI 10 Oct 11, 2022
PyTorch implementation of "LayoutTransformer: Layout Generation and Completion with Self-attention"

PyTorch implementation of "LayoutTransformer: Layout Generation and Completion with Self-attention" to appear in ICCV 2021

Kamal Gupta 75 Dec 23, 2022
Clustering is a popular approach to detect patterns in unlabeled data

Visual Clustering Clustering is a popular approach to detect patterns in unlabeled data. Existing clustering methods typically treat samples in a data

Tarek Naous 24 Nov 11, 2022
Tool cek opsi checkpoint facebook!

tool apa ini? cek_opsi_facebook adalah sebuah tool yang mengecek opsi checkpoint akun facebook yang terkena checkpoint! tujuan dibuatnya tool ini? too

Muhammad Latif Harkat 2 Jul 17, 2022
MWPToolkit is a PyTorch-based toolkit for Math Word Problem (MWP) solving.

MWPToolkit is a PyTorch-based toolkit for Math Word Problem (MWP) solving. It is a comprehensive framework for research purpose that integrates popular MWP benchmark datasets and typical deep learnin

119 Jan 04, 2023
2021搜狐校园文本匹配算法大赛 分比我们低的都是帅哥队

sohu_text_matching 2021搜狐校园文本匹配算法大赛Top2:分比我们低的都是帅哥队 本repo包含了本次大赛决赛环节提交的代码文件及答辩PPT,提交的模型文件可在百度网盘获取(链接:https://pan.baidu.com/s/1T9FtwiGFZhuC8qqwXKZSNA ,

hflserdaniel 43 Oct 01, 2022
Fuzzing the Kernel Using Unicornafl and AFL++

Unicorefuzz Fuzzing the Kernel using UnicornAFL and AFL++. For details, skim through the WOOT paper or watch this talk at CCCamp19. Is it any good? ye

Security in Telecommunications 283 Dec 26, 2022
Eff video representation - Efficient video representation through neural fields

Neural Residual Flow Fields for Efficient Video Representations 1. Download MPI

41 Jan 06, 2023
LiDAR R-CNN: An Efficient and Universal 3D Object Detector

LiDAR R-CNN: An Efficient and Universal 3D Object Detector Introduction This is the official code of LiDAR R-CNN: An Efficient and Universal 3D Object

TuSimple 295 Jan 05, 2023
IPATool-py: download ipa easily

IPATool-py Python version of IPATool! Installation pip3 install -r requirements.txt Usage Quickstart: download app with specific bundleId into DIR: p

159 Dec 30, 2022
PyTorch implementation of "Contrast to Divide: self-supervised pre-training for learning with noisy labels"

Contrast to Divide: self-supervised pre-training for learning with noisy labels This is an official implementation of "Contrast to Divide: self-superv

55 Nov 23, 2022
A general and strong 3D object detection codebase that supports more methods, datasets and tools (debugging, recording and analysis).

ALLINONE-Det ALLINONE-Det is a general and strong 3D object detection codebase built on OpenPCDet, which supports more methods, datasets and tools (de

Michael.CV 5 Nov 03, 2022