A very lightweight monitoring system for Raspberry Pi clusters running Kubernetes.

Last update: Dec 29, 2022

OMNI

Why?

When I finished building my Kubernetes cluster from a few Raspberry Pis, the first thing I wanted to do was install Prometheus + Grafana for monitoring, and so I did. But once I had all of it working I found a few drawbacks:

The Prometheus exporter pods use a lot of RAM
The Prometheus exporter pods use a considerable amount of CPU
Prometheus gathers far more data than I really need.
The node running the main Prometheus pod receives all of the information and saves it in its own database, constantly writing to the SD card. SD cards subjected to constant writes tend to die.

Last but not least, I like to learn how these things work.

Advantages

Omni has (what I consider) some advantages over the regular Prometheus + Grafana combo:

It uses almost no RAM (13 MB)
It uses almost no CPU
It gathers only the information I need
All of the information is sent to an InfluxDB instance that can live outside of the cluster. This means that no information is persisted on the Pis, extending the lifetime of their SD cards.
InfluxDB acts as the database and the graph dashboard at the same time, so there is no need to also install Grafana (although you could if you wanted to).
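As a rough illustration of what "sending information to InfluxDB" involves, here is a minimal sketch that serializes one sample into InfluxDB line protocol, the text format InfluxDB accepts for writes. This is not taken from Omni's source: the function names, the measurement name `node_stats`, the `host` tag and the field names are all hypothetical.

```python
import time


def _escape_key(value):
    """Escape commas, equals signs and spaces in tag keys/values and field keys."""
    return value.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def _format_field(value):
    """Render a field value using line protocol type conventions."""
    if isinstance(value, bool):  # bool must be checked before int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'  # strings are double-quoted


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Build one line: measurement,tag=value field=value timestamp."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(
        f",{_escape_key(k)}={_escape_key(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape_key(k)}={_format_field(v)}" for k, v in fields.items()
    )
    return f"{name}{tag_part} {field_part} {timestamp_ns}"


# Hypothetical sample for one node (names and values are illustrative only).
line = to_line_protocol(
    "node_stats",
    {"host": "pi-node-1"},
    {"cpu_percent": 3.5, "mem_used_bytes": 13000000},
    timestamp_ns=1672272000000000000,
)
print(line)
```

Because each sample is just a short line of text sent over the network, nothing needs to be written to the node's SD card.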

Prerequisites

For Omni to work, you'll need to have a couple of things running first.

InfluxDB

InfluxDB is a time series database (just like Prometheus) with nice built-in charts and a good UI overall.

One of the goals of this project is to avoid constant writing to the SD cards, so you have a few options for the placement of the database:

Use InfluxDB's online service (there is even a free tier https://www.influxdata.com/influxdb-pricing/)
Run an InfluxDB instance on a server outside the Pi cluster (this is what I'm doing right now)
If you have better storage in your cluster (like M.2, SSD, etc.) and don't have the SD card limitation, run InfluxDB in the same cluster.

Libraries

You'll need to have the libseccomp2 library (installed from its .deb package) on each of your nodes to avoid a Python error:

Fatal Python Error: pyinit_main: can't initialize time

(more info here)

You can install it in either of two ways (only one is needed):

Ansible: all nodes at the same time

Edit the file ansible-playbook-libs.yaml in this repo, add your hosts and run:
```
ansible-playbook install-libs.yaml
```

SSH: one by one

Connect to each of your nodes and run:

```
wget http://ftp.us.debian.org/debian/pool/main/libs/libseccomp/libseccomp2_2.5.1-1_armhf.deb
sudo dpkg -i libseccomp2_2.5.1-1_armhf.deb
```

Once you have it, everything should work ok.

Installation

Before deploying Omni you'll have to specify the attributes of your InfluxDB instance.

Open omni-install.yaml and fill the variables with your InfluxDB instance information.

NOTE: The attribute OMNI_DATA_RATE_SECONDS specifies the number of seconds between data reporting events that are sent to the InfluxDB server.
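To show what a setting like OMNI_DATA_RATE_SECONDS controls, here is a minimal sketch of a reporting loop driven by that interval. It assumes the value reaches the process as an environment variable, which is my assumption and not confirmed here. The function names, the 5-second default, and the `collect`/`send` callables are hypothetical and not Omni's actual code.

```python
import os
import time


def read_data_rate(env=os.environ, default=5.0):
    """Return the reporting interval in seconds (default is a hypothetical fallback)."""
    raw = env.get("OMNI_DATA_RATE_SECONDS")
    if raw is None:
        return default
    interval = float(raw)
    if interval <= 0:
        raise ValueError("OMNI_DATA_RATE_SECONDS must be positive")
    return interval


def run_reporter(collect, send, interval, iterations=None, sleep=time.sleep):
    """Collect one sample and send it, then wait `interval` seconds, repeatedly.

    `iterations=None` loops forever; a number limits the loop (handy for testing).
    """
    count = 0
    while iterations is None or count < iterations:
        send(collect())
        count += 1
        sleep(interval)


if __name__ == "__main__":
    # Demo with stand-in collect/send functions and no real sleeping.
    run_reporter(
        collect=lambda: {"cpu_percent": 3.5},
        send=print,
        interval=read_data_rate(),
        iterations=2,
        sleep=lambda seconds: None,
    )
```

A larger interval means fewer samples and less network traffic, at the cost of coarser charts.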
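Then deploy the manifest to your cluster (for a standard Kubernetes manifest this is typically kubectl apply -f omni-install.yaml) and check that everything is running as expected: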

```
kubectl get all -n omni-system
```

And you are done! 🎉

Contributions

Pull requests with improvements and new features are more than welcome.

Owner

Matias Godoy
