Classifies galaxy morphology with a Bayesian CNN

Last update: Dec 20, 2022

Overview

Zoobot

Zoobot classifies galaxy morphology with deep learning. This code will let you:

Reproduce and improve the Galaxy Zoo DECaLS automated classifications
Finetune the classifier for new tasks

For example, you can train a new classifier like so:

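import tensorflow as tf
# define_model, losses and training_config are Zoobot modules; their import paths depend on the installed version.
# schema, initial_size, resize_size, train_config and the two datasets must be defined beforehand.

model = define_model.get_model(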
    output_dim=len(schema.label_cols),  # schema defines the questions and answers
    input_size=initial_size, 
    crop_size=int(initial_size * 0.75),
    resize_size=resize_size
)

model.compile(
    loss=losses.get_multiquestion_loss(schema.question_index_groups),
    optimizer=tf.keras.optimizers.Adam()
)

training_config.train_estimator(
    model, 
    train_config,  # parameters for how to train e.g. epochs, patience
    train_dataset,
    test_dataset
)

Install using git and pip:

git clone git@github.com:mwalmsley/zoobot.git
pip install -r zoobot/requirements.txt  # virtual env or conda highly recommended
pip install -e zoobot

The main branch is for stable-ish releases. The dev branch includes the shiniest features but may change at any time.

To get started, see the documentation.

I also include some working examples for you to copy and adapt:

Latest cool features on dev branch (June 2021):

Multi-GPU distributed training
Support for Weights and Biases (wandb)
Worked examples for custom representations

Contributions are welcome and will be credited in any future work.

If you use this repo for your research, please cite the paper.

Comments

Benchmarks
It's important that Zoobot has proper benchmarks so that we can be confident new releases work properly for users. This PR adds those benchmarks.

In the course of setting up the benchmarks, I have made some major changes/improvements:

pytorch-galaxy-datasets refactored to work for tensorflow, imports adapted

both tensorflow and pytorch zoobot versions use albumentations for augmentations. Old TF code removed.

tensorflow version bumped to 2.10 (current latest) while I'm at it

pytorch version now has logging for per-question loss. Loss func aggregation has new option to support this.

The TensorFlow version also has per-question logging, but enabling it is waiting on an issue with the Keras team

Created minimal_example.py for TensorFlow (thanks, @katgre )

Support CPU-only PyTorch training

Refactor TF TrainingConfig to Trainer object, Lightning style, for consistency
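
The per-question loss logging mentioned above can be pictured with a minimal, framework-free sketch. It assumes that schema.question_index_groups is a list of inclusive (start, end) index pairs, one per question, and that a loss value is available for each answer column. The function names and the reduce option are hypothetical and are not Zoobot's actual API.

```python
def per_question_loss(per_answer_loss, question_index_groups):
    """Sum the per-answer losses that belong to each question.

    Assumes each group is an inclusive (start, end) pair of answer indices.
    """
    return [
        sum(per_answer_loss[start:end + 1])
        for start, end in question_index_groups
    ]


def aggregate_loss(per_answer_loss, question_index_groups, reduce="sum"):
    """Hypothetical aggregation option: keep per-question values or sum them."""
    per_question = per_question_loss(per_answer_loss, question_index_groups)
    if reduce == "none":
        return per_question  # one value per question, suitable for logging
    if reduce == "sum":
        return sum(per_question)  # single scalar, suitable for optimisation
    raise ValueError(f"Unknown reduce option: {reduce!r}")


# Two questions: answers 0-1 belong to the first, answers 2-4 to the second.
example_losses = [0.5, 0.25, 0.25, 1.0, 2.0]
example_groups = [(0, 1), (2, 4)]
logged = aggregate_loss(example_losses, example_groups, reduce="none")
total = aggregate_loss(example_losses, example_groups, reduce="sum")
```

Keeping the unreduced per-question values lets a logger report each question separately, while the summed value can still be used for training.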

enhancement
opened by mwalmsley 3
on_train_batch_end is slow in TF
Unclear what's causing this slowness. Presumably a callback I added - but none look like they should be heavy? Perhaps something wandb is doing?

Remove all callbacks and rerun

Remove wandb and rerun

For each, check whether the slow warning continues (or whether training speed changes at all)

enhancement
opened by mwalmsley 3
add gh action to publish package to pypi

Related to https://github.com/mwalmsley/zoobot/issues/18#issuecomment-1278635788

This PR adds an auto CI release mechanism for publishing zoobot to pypi. It uses the GH action to release to pypi https://github.com/pypa/gh-action-pypi-publish

opened by camallen 3
Publish latest version to PyPi?

A question rather than a request: are there any plans to publish the refactored work?

PyPI shows v0.0.1 was published on 15th March 2021 (https://pypi.org/project/zoobot/#history), but the latest code is ~v0.0.3 (tags) and the refactor seems to be working well.

Ideally I can pull in these packages to my own env / container and then train with the latest code vs pulling in from github etc.

opened by camallen 3
setup branch protection rules on 'main'

https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/defining-the-mergeability-of-pull-requests/managing-a-branch-protection-rule

It may be too restrictive for your use case / dev flows, but we use this for contributor PRs etc. Basically, we ensure that a PR meets certain criteria in our CI runs: a PR can only be merged once one of the CI test runs (v3.7 or v3.9) passes.

Feel free to close if you don't think this is useful.
enhancement

opened by camallen 2
Deprecate TFRecords

TFRecords are cumbersome and take up a lot of disk space. It's much simpler to learn directly from images on disk, at the cost of some I/O performance.

This PR removes support for TFRecords in favour of images-on-disk. This will ultimately enable new TensorFlow weights trained on all of DESI (impractical with TFRecords).

Breaking change for anyone using TFRecords (i.e. everyone using TensorFlow to train from scratch). Finetuning should not be affected.

TODO - will require new greyscale/colour pretrained models, just for safety.

opened by mwalmsley 2
feat(CI): Add proposed python CI GH Action

This PR proposes to add a simple GH Action script that establishes a python environment, downloads the requirements and runs pytest.

Some other things to consider might be to use conda for virtual environments and creating CI scripts for Docker as well.

opened by SauravMaheshkar 2
Improve data files for docker
This PR changes the docker / compose setup, specifically it

consolidates the docker files to cuda and tensorflow base images (no need for a python base image)

adds a .dockerignore entry for all data files when building the container to keep the size down

and provides an easy way to inject them at run time via local directory mounts in the compose file

finally, this removes the local directory setup, specific to my machine, that injected unrelated data files
opened by camallen 2
add wandb logging, freeze batchnorm by default
Doing some polishing on finetuning

Add wandb logging to the full_tree example. @camallen, use this for the dashboard. You will need to add import wandb, wandb.init(authkey, etc) just beforehand when running on Azure.

Freeze batch norm layers by default when finetuning, with new recursive function

Pass additional params via config (thanks Cam)

Minor cleanup
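
The recursive batch norm freezing described above can be sketched without any deep learning framework. The Layer and BatchNorm classes below are illustrative stand-ins, not Keras or Zoobot classes, and freeze_batchnorm is a hypothetical name. The sketch only assumes that a model is a tree of layers, each with a trainable flag and an optional list of nested layers.

```python
class Layer:
    """Stand-in for a framework layer: a name, a trainable flag, nested layers."""

    def __init__(self, name, layers=None):
        self.name = name
        self.trainable = True
        self.layers = layers or []  # nested sub-layers, empty for leaf layers


class BatchNorm(Layer):
    """Stand-in for a batch normalisation layer."""


def freeze_batchnorm(layer):
    """Recursively set trainable=False on every batch norm layer.

    Returns the number of layers that were frozen.
    """
    frozen = 0
    if isinstance(layer, BatchNorm):
        layer.trainable = False
        frozen += 1
    for child in layer.layers:
        frozen += freeze_batchnorm(child)
    return frozen


# A small nested model: batch norm layers sit at different depths.
model = Layer("model", [
    Layer("conv"),
    Layer("block", [BatchNorm("bn1"), Layer("dense")]),
    BatchNorm("bn2"),
])
n_frozen = freeze_batchnorm(model)
```

The recursion matters because batch norm layers are often buried inside nested blocks, so a single pass over the top-level layers would miss them.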
opened by mwalmsley 1
Add PyTorch Finetuning Capability, Examples
The key change is adding the pytorch.training.finetune() method. It works on either classification data (e.g. 0, 1) or count data (e.g. 12 said yes, 4 said no).
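
To make the difference between the two label types concrete, here is a minimal sketch that scores a predicted "yes" probability against vote counts using a binomial negative log-likelihood. This is an illustrative assumption about how count data could be scored. The loss that finetune() actually uses is not stated here and may differ, and binomial_nll is a hypothetical helper.

```python
import math


def binomial_nll(p_yes, yes_votes, no_votes, eps=1e-7):
    """Negative log-likelihood of the observed votes given a predicted P(yes)."""
    p = min(max(p_yes, eps), 1.0 - eps)  # clamp to avoid log(0)
    total = yes_votes + no_votes
    log_coeff = (
        math.lgamma(total + 1)
        - math.lgamma(yes_votes + 1)
        - math.lgamma(no_votes + 1)
    )
    log_likelihood = (
        log_coeff + yes_votes * math.log(p) + no_votes * math.log(1.0 - p)
    )
    return -log_likelihood


# Count-style label: 12 volunteers said yes, 4 said no.
good_fit = binomial_nll(0.75, yes_votes=12, no_votes=4)
poor_fit = binomial_nll(0.25, yes_votes=12, no_votes=4)

# Classification-style label (class 1) is the special case of a single vote.
hard_label = binomial_nll(0.5, yes_votes=1, no_votes=0)
```

A prediction close to the observed vote fraction (0.75 here) gives a lower loss than one far from it, and a hard 0/1 label reduces to the usual binary cross-entropy.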

Includes three working examples:

Binary classification, with tiny rings subset

Counts for single question, with full internal rings data

Counts for all questions, with GZ Cosmic Dawn schema

Also updates various imports for the galaxy-datasets refactor, fixes prediction method to work on unlabelled data, minor QoL improvements.

Finally, changes PyTorch dense layer initialisation to custom high-uncertainty initialisation - see efficientnet_custom.py

cc @camallen
opened by mwalmsley 1
Add v0.02 changes
Adds support (minimal working examples, a guide) for calculating new representations with a trained model.

Also adds significant new features:

Distributed training with several GPUs

Metric logging with Weights&Biases (add your own login credentials)

Train on color (3-band) images, not just greyscale

Also adds a critical bugfix: when loading images for direct predictions (i.e. not via TFRecords), images are now correctly normalised to the 0-1 interval that the tf.keras.experimental.preprocessing layers expect, an expectation that is not documented.
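
The normalisation this bugfix refers to amounts to rescaling pixel values to the 0-1 interval. The sketch below uses plain Python lists and assumes 8-bit input (maximum value 255). It is not the code from the PR, and normalise_to_unit_interval is a hypothetical name.

```python
def normalise_to_unit_interval(pixels, max_value=255):
    """Rescale a 2D grid of pixel values from [0, max_value] to [0, 1]."""
    return [[value / max_value for value in row] for row in pixels]


# A tiny 2x2 greyscale "image" with 8-bit values.
raw_image = [[0, 255], [51, 102]]
scaled_image = normalise_to_unit_interval(raw_image)
```

Feeding unscaled 0-255 values to layers that expect 0-1 inputs would not raise an error, which is why this kind of bug is easy to miss.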

Also adds misc. minor fixes and documentation tweaks.

This code was used for the morphology tools paper (to be submitted shortly).
opened by mwalmsley 1
Avoid --extra-index-url via dependency_links

It should be possible to search for non-standard package repositories using just setup.py, without having the user also set --extra-index-url.

https://setuptools.pypa.io/en/latest/deprecated/dependency_links.html

But I couldn't get this to work on a quick try.
enhancement help wanted

opened by mwalmsley 1
Can't import finetune while going through finetune_binary_classification.py

I tried to go through finetune_binary_classification.py, but got the error:

ImportError: cannot import name 'finetune' from 'zoobot.pytorch.training' (/usr/local/lib/python3.8/dist-packages/zoobot/pytorch/training/__init__.py)

I tried it with both the kasia and dev branches, and went through "git clone" and "pip install" (I remembered there were some issues with that during the Hackathon). I also tried to import other features from the folder (i.e. losses) and it worked fine.
bug

opened by katgre 2
Create a simple decision tree in minimal_example.py

Instead of using one of the complicated decision trees from DECaLS DR5, let's create a simple decision tree with one dependency, written directly in minimal_example.py.

opened by katgre 0

Releases(v0.0.3)

v0.0.3(Apr 25, 2022)

Improved documentation and refactored train API (pytorch).

Awaiting results from several segmentation experiments ahead of public release (inc pytorch version).
v0.0.2(Oct 4, 2021)

Polish throughout and add new features. See #2 for a full description.
beta(Sep 29, 2021)

Initial release.

This had enough documentation and code to replicate the DECaLS model and make predictions. There are a few minor missing arguments and similar typos that you might have stumbled into, because I made some last minute changes without updating the docs, but everything worked with a little stack tracing.

Owner

Mike Walmsley
