Interactive dimensionality reduction for large datasets

Last update: Dec 14, 2022

BlosSOM 🌼

BlosSOM is a graphical environment for running semi-supervised dimensionality reduction with EmbedSOM. You can use it to explore multidimensional datasets, and produce great-looking 2-dimensional visualizations.

WARNING: BlosSOM is still under development. Some things may not work right, and behavior may change (and hopefully improve) without notice. Feel free to open an issue if something looks wrong.

❓ Overview
🔧 Compiling and running
➡️ How-To 💡
📘 Documentation

BlosSOM was developed at the MFF UK Prague, in cooperation with IOCB Prague.

Overview

BlosSOM creates a landmark-based model of the dataset and dynamically projects all dataset points to your screen (using EmbedSOM). Several other algorithms and tools are provided to manage the landmarks; a quick overview follows:

- High-dimensional landmark positioning:
  - Self-organizing maps
  - k-Means
- 2D landmark positioning:
  - k-NN graph generation (only adds edges, not vertices)
  - force-based graph layout
  - dynamic t-SNE
- Dimensionality reduction:
  - EmbedSOM
  - CUDA EmbedSOM (with roughly 500x speedup, enabling smooth display of a few million points)
- Manual landmark position optimization
- Visualization settings (colors, transparencies, cluster coloring, ...)
- Dataset transformations and dimension scaling
- Import from matrix-like data files:
  - FCS3.0 (Flow Cytometry Standard files)
  - TSV (tab-separated values)
- Export of the data for plotting

Compiling and running BlosSOM

You will need the CMake build system and SDL2.

For CUDA EmbedSOM to work, you need the NVIDIA CUDA toolkit. Append -DBUILD_CUDA=1 to cmake options to enable the CUDA version.

Windows (Visual Studio 2019)

Dependencies

The project requires SDL2 as an external dependency:

- Install the vcpkg tool and remember your vcpkg directory.
- Install SDL2: vcpkg install SDL2:x64-windows

Compilation

git submodule init
git submodule update

mkdir build
cd build

# You need to fix the path to vcpkg in the following command:
cmake .. -G "Visual Studio 16 2019" -A x64 -DCMAKE_BUILD_TYPE="Release" -DCMAKE_INSTALL_PREFIX=./inst -DCMAKE_TOOLCHAIN_FILE=your-vcpkg-clone-directory/scripts/buildsystems/vcpkg.cmake

cmake --build . --config Release
cmake --install . --config Release

Running

Open the Visual Studio solution BlosSOM.sln, set blossom as the startup project, set the configuration to Release, and run the project.

Linux (and possibly other unix-like systems)

Dependencies

The project requires SDL2 as an external dependency. Install libsdl2-dev (on Debian-based systems), SDL2-devel (on Red Hat-based systems), or the equivalent package for your Linux distribution. You should be able to install the cmake package the same way.

Compilation

git submodule init
git submodule update

mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=./inst    # or any other directory
make install                              # use -j option to speed up the build

Running

./inst/bin/blossom

Documentation

Basic usage of the software and the user interface are described in HOWTO.md.
Some technical details about the code can be found in src/README.md.
Doxygen-generated documentation of the source code can be found at https://molnsona.github.io/blossom/

Quickstart

- Click on the "plus" button on the bottom right side of the window.
- Choose Open file (the first button from the top) and open a file from the demo_data/ directory.
- You can now add and delete landmarks using Ctrl+click, and drag them around.
- Use the tools and settings available under the "plus" button to optimize the landmark positions and get a better visualization.

See the HOWTO for more details and hints.

Performance and CUDA

If you pass -DBUILD_CUDA=1 to the cmake commands, you will get an extra executable called blossom_cuda (or blossom_cuda.exe on Windows).

The two versions of the BlosSOM executable differ mainly in the performance of the EmbedSOM projection, which is more than 100× faster on GPUs than on CPUs. If the dataset gets large, only a fixed-size slice of the dataset gets processed each frame (e.g., at most 1000 points on the CPU) to keep the framerate in a usable range. The defaults in BlosSOM should work smoothly for many use cases (1k points per frame on the CPU and 50k points per frame on the GPU).

If required (e.g., if you have a really fast GPU), you may modify the constants in the corresponding source files, around the call sites of clean_range(), which is the function that manages the round-robin refreshing of the data. Functionality that dynamically chooses the best data-crunching rate is being implemented and should be available soon.
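To make the round-robin refreshing described above more concrete, here is a minimal sketch of the idea in C++. It is not BlosSOM's actual clean_range() implementation; the names RoundRobinCursor, take, and frames_per_pass are hypothetical and chosen only for illustration. The sketch assumes each frame processes one contiguous slice of at most `batch` points and then continues from where the previous frame stopped, wrapping around at the end of the dataset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper (not BlosSOM's clean_range()): remembers which slice
// of an n-point dataset should be refreshed in the current frame.
struct RoundRobinCursor {
    std::size_t next = 0; // index of the first point to refresh next frame

    // Returns the half-open range [begin, end) to process this frame and
    // advances the cursor, wrapping back to 0 at the end of the dataset.
    std::pair<std::size_t, std::size_t> take(std::size_t n, std::size_t batch) {
        assert(batch > 0);
        if (n == 0) return {0, 0};
        if (next >= n) next = 0; // the dataset shrank since the last frame
        const std::size_t begin = next;
        const std::size_t end = std::min(n, begin + batch);
        next = (end == n) ? 0 : end;
        return {begin, end};
    }
};

// Number of frames needed until every point has been refreshed once.
inline std::size_t frames_per_pass(std::size_t n, std::size_t batch) {
    assert(batch > 0);
    return (n + batch - 1) / batch; // ceiling division
}
```

Under these assumptions, a dataset of one million points would need 1000 frames for a full refresh at 1k points per frame, and 20 frames at 50k points per frame, which is why a larger per-frame constant may be worth trying on a fast GPU.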

License

BlosSOM is licensed under GPLv3 or later. Several small libraries bundled in the repository are licensed with MIT-style licenses.
