Baseline is a cross-platform library and command-line utility that creates file-oriented baselines of your systems.

Overview

Baselining, on steroids!

Baseline is a cross-platform library and command-line utility that creates file-oriented baselines of your systems.

The project aims to offer an open-source alternative to the famous NSRL or HashSets and allows you to generate baselines from your own systems. Plus, it is cross-platform, so you can use it the same way whether on a Windows or a GNU/Linux system.

Currently available extractors:

  • fs: Extracts filesystem-related metadata.
  • hash: Computes several hashes from the entry's data (e.g. MD5, SHA-1, ssdeep).
  • pe: Extracts detailed information from Portable Executable (PE) files.

Table of contents

Installation

Installing from PyPI

Baseline is currently not available on PyPI. The main reason is that the name is currently taken by another project.

Installing from source

Since Baseline uses Poetry as its packaging toolkit of choice, so installing it from source is as simple as:

git clone httpe://github.com/sk4la/baseline.git
cd baseline

python3 -m pip install poetry
python3 -m poetry install

Precompiled binaries

Precompiled binaries are available in the Releases section.

Docker

Baseline is also available as a Docker image.

To pull the latest image from Docker Hub:

docker pull sk4la/baseline

See the official Docker documentation for details on how to install and use it.

Usage

The help menu for the baseline command-line utility:

Usage: baseline 
   
     COMMAND [ARGS]...

  Command-line utility that creates file-oriented baselines.

Options:
  --ensure-administrator          Ensure that the current user has administrative privileges (i.e.
                                  is `root` or equivalent on GNU/Linux systems).
  --log-file 
    
                    Set the log file path (e.g. 'baseline.log').  [default:
                                  /home/sk4la/baseline/20211031110323.25756fb1b706.log]
  --logging-configuration 
     
        Set the logging configuration using a custom file. The file must
                                  adhere to the official specification. See
                                  https://docs.python.org/3/library/logging.config.html#logging-
                                  config-dictschema for more details.
  --monochrome                    Disable console output coloring. This can be useful when piping
                                  the output to a log file.
  -v, --verbose                   Increase the logging verbosity. Supports up to 4 occurrences of
                                  the same option (e.g. -vvvv).  [0<=x<=4]
  --version                       Show the version and exit.
  --help                          Show this message and exit.

Commands:
  new     Creates a new filesystem-based baseline.
  schema  Show the JSON representation of the actual schema.

     
    
   

The help menu for the baseline new subcommand:

Usage: baseline new 
    
    
     ...

  Creates a new filesystem-based baseline.

Options:
  --comment 
     
                       Add an arbitrary comment to the generated output file.
  --exclude-directory 
      
       ...
                                  Exclude a specific directory from the baseline. Can be specified
                                  multiple times (e.g. `--exclude-directory /dev --exclude-
                                  directory /proc`).
  --exclude-extractor [hash|pe|fs]
                                  Exclude extractors. Can be specified multiple times (e.g.
                                  `--exclude-extractor hash --exclude-extractor pe`).
  --max-size 
       
         Set the maximum file size (in bytes) to inspect. [default: 5000000; x>=1] -o, --output-file 
        
          Set the output file path (e.g. 'baseline.ndjson'). [default: /workspaces/baseline/20211102200452.58bd60a3b16a.ndjson] --output-file-encoding [utf-8|utf-16le] Set the output file encoding. Only applies when writing to an actual file. [default: utf-8] -f, --output-format [ndjson] Set the output format. [default: ndjson] --partition-size 
         
           Set the partition size (i.e. number of entries per process). [default: 200; x>=1] --processes 
          
            Set the number of parallel processes. [default: 2; x>=1] --recursive / --non-recursive Whether to walk the filesystem recursively. When set, the program will only inspect the files and directories specifies on the first level of any included path. For example, if '/mnt/image' is specified as an included path, then only the directory '/mnt/image' itself and its direct children will be inspected. [default: recursive] --remap 
           
            ... Artificially remap included paths (e.g. '/mnt/image:/'). Can be specified multiple times (e.g. `--remap /mnt/image:/ --remap /dev/null:/dev/void`). --report / --no-report Whether to show a final report at the end. [default: report] --skip-compression Whether to skip on-the-fly compression of the resulting file. --skip-directories Whether to skip directories. --skip-empty Whether to skip empty entries. --help Show this message and exit. 
           
          
         
        
       
      
     
    
   

The help menu for the baseline schema subcommand:

Usage: baseline schema 
   
    

  Show the JSON representation of the actual schema.

Options:
  --compact                       Render compact JSON instead of the default idented version.
  --output-file 
    
                 Set the output file path (e.g. 'schema.json').
  --output-file-encoding [utf-8|utf-16le]
                                  Set the output file encoding. Only applies when writing to an
                                  actual file.  [default: utf-8]
  --help                          Show this message and exit.

    
   

Creating a baseline of a live system

Creating a baseline of a live system is as simple as:

baseline new

When using Baseline from a removable device, you may want to exclude its path (for example /mnt/usb) from the generated baseline:

baseline --ensure-administrator new --exclude-directory /mnt/usb

See the Usage section for a complete list of options and arguments.

Creating a baseline from a mounted image

When creating a baseline of a mounted image, you may want the baseline to represent the files as if they were read from the actual system, not the mounted image.

For example, if your image is currently mounted on /mnt/IMG-001, you can then execute the following command to remap all entries read from this path to /:

baseline new --remap /mnt/IMG-001:/ /mnt/IMG-001

You can think of this as a chroot jail.

Displaying the schema

Baseline uses a fixed schema for rendering the information. This schema is enforced using the Pydantic package and produces a heavily-typed output that can later be ingested as-is.

To print the standardized JSON schema:

baseline schema

To dump a compact version of the JSON schema to schema.min.json:

baseline schema --compact --output-file schema.min.json

The JSON schema produced by Pydantic is compatible with the specifications from JSON Schema Core, JSON Schema Validation and OpenAPI Data Types. See the official Pydantic documentation for more details.

Advanced usage

Building binaries

Baseline currently supports the following packaging systems:

Although precompiled binaries are available in the Releases section, you should always build your own binaries.

To produce a binary using PyInstaller:

make pyinstaller-linux

To produce a binary using Nuitka:

make nuitka-linux

As Nuitka is a Python compiler by itself and does not rely on the standard CPython interpreter, you should be aware that there may be bugs and/or issues unrelated to Baseline itself.

Building the Docker image

To build the official Docker image:

make docker

Additional instructions can be added to the Dockerfile in order to customize the image.

The official Docker image is available at https://hub.docker.com/r/sk4la/baseline. You can use the FROM docker.io/sk4la/baseline:latest instruction in your own Dockerfile to derive your own image.

API

Using Baseline from Python is possible using the Baseline class:

from baseline.core import Baseline

with Baseline() as baseline:
    for record in baseline.compute(*[
        "/mnt/IMG-001",
        "/mnt/IMG-002",
    ]):
        print(record.json(exclude_none=True))

See the actual code for a more thorough example.

The Baseline class emits logging messages to the baseline logger, to which you can subscribe to if you wish. The command-line utility displays these messages to the console by default.

Contribute

Baseline is a work in progress, everyone is welcome to contribute! πŸ‘

Writing a new extractor

In order for new extractors to be able to enrich the generated records, the global schema first needs to be updated. To do this, you must create a sublass of Pydantic's BaseModel in schema.py that references the fields that will eventually be filled by the extractor. This class will then be referenced in the schema's root Record class.

In this example, we want to extract the first 50 lines of any *.txt file. Here we arbitrarily decide that the extracted text will be stored in the content attribute and that the extractor's key will be text:

class Text(pydantic.BaseModel):
    content: str

class Record(pydantic.BaseModel):
    ...
    text: typing.Optional[Text]

We can then start to write the actual code. All extractors must inherit from the base Extractor class:

None: with self.entry.open() as stream: setattr( record, self.KEY, schema.Text( content=stream.read(50), ), ) ">
from baseline.models import Extractor
from baseline.schema import Text

class Text(Extractor):
    """Extracts the first line of any `.txt` file."""

    EXTENSION_FILTERS = (
      r"\.txt$",
    )
    KEY = "text"

    def run(self: object, record: schema.Record) -> None:
        with self.entry.open() as stream:
            setattr(
                record,
                self.KEY,
                schema.Text(
                    content=stream.read(50),
                ),
            )

The extractor's KEY class variable must correspond to the one that was specified in the schema's root Record class (text in this example).

Support

In case you encounter a problem or want to suggest a new feature, please submit a ticket.

License

Baseline is licensed under the GNU General Public License (GPL) version 3.

You might also like...
A Python module and command line utility for working with web archive data using the WACZ format specification

py-wacz The py-wacz repository contains a Python module and command line utility for working with web archive data using the WACZ format specification

A Python module and command-line utility for converting .ANS format ANSI art to HTML

ansipants A Python module and command-line utility for converting .ANS format ANSI art to HTML. Installation pip install ansipants Command-line usage

A command line utility to export Google Keep notes to markdown.

Keep-Exporter A command line utility to export Google Keep notes to markdown files with metadata stored as a frontmatter header. Supports exporting: S

A command line utility for tracking a stock market portfolio. Primarily featuring high resolution braille graphs.
A command line utility for tracking a stock market portfolio. Primarily featuring high resolution braille graphs.

A command line stock market / portfolio tracker originally insipred by Ericm's Stonks program, featuring unicode for incredibly high detailed graphs even in a terminal.

πŸ“¦ A command line utility to put text in a box.
πŸ“¦ A command line utility to put text in a box.

boxie A command line utility to put text in a box. Installation pip install boxie If you are on Linux you may need to use sudo to access this globally

Tiny command-line utility for mapping broken keys to other positions.

brokenkey Tiny command-line utility for mapping broken keys to other positions. Installation Clone this repository using git: git clone https://github

This is a CLI utility that allows you to view RedFlagDeals.com on the command line.
This is a CLI utility that allows you to view RedFlagDeals.com on the command line.

RFD Description Motivation Installation Usage View Hot Deals View and Sort Hot Deals Search Advanced View Posts Shell Completion bash zsh Description

img-proof (IPA) provides a command line utility to test images in the Public Cloud

overview img-proof (IPA) provides a command line utility to test images in the Public Cloud (AWS, Azure, GCE, etc.). With img-proof you can now test c

A Python command-line utility for validating that the outputs of a given Declarative Form Azure Portal UI JSON template map to the input parameters of a given ARM Deployment Template JSON template

A Python command-line utility for validating that the outputs of a given Declarative Form Azure Portal UI JSON template map to the input parameters of a given ARM Deployment Template JSON template

Comments
  • executable variable is not defined in extractors/executables

    executable variable is not defined in extractors/executables

    self.executable is not defined in baseline/extractors/executables.py, so in the __del__ method, it crashes.

    $ baseline new /etc      
    Exception ignored in: <function PortableExecutable.__del__ at 0x7fd1142c98b0>
    Traceback (most recent call last):
      File "/home/git/baseline/env/lib/python3.9/site-packages/baseline/extractors/executables.py", line 80, in __del__
        if self.executable:
    AttributeError: 'PortableExecutable' object has no attribute 'executable'
    All done! πŸ’ͺ
    
    Entries Processed : 1592
    Output Format     : ndjson
    Output File       : /home/git/baseline/20211106191044.heaven.ndjson.xz
    Size              : 1.2 MB
    Total Time        : 0:00:01.993753
    
    opened by jurelou 0
Releases(0.1.0)
Owner
Nelson
Random bits of code
Nelson
A simple reverse shell in python

RevShell A simple reverse shell in python Getting started First, start the server python server.py Finally, start the client (victim) python client.py

Lojacopsen 4 Apr 06, 2022
Ros command - Unifying the ROS command line tools

Unifying the ROS command line tools One impairment to ROS 2 adoption is that all

37 Dec 15, 2022
An open source terminal project made in python

Calamity-Terminal An open source terminal project made in python. Calamity Terminal is a free and open source lightweight terminal. Its made 100% off

1 Mar 08, 2022
Salesforce object access auditor

Salesforce object access auditor Released as open source by NCC Group Plc - https://www.nccgroup.com/ Developed by Jerome Smith @exploresecurity (with

NCC Group Plc 90 Sep 19, 2022
Stream comments, submissions from subreddits and users across reddit right in your terminal

reddit_from_terminal stream comments, submissions from subreddits and users across reddit right in your terminal Alert! : Can't watch media contents(p

Pritam Dhara 2 Dec 30, 2021
Python script to tabulate data formats like json, csv, html, etc

pyT PyT is a a command line tool and as well a library for visualising various data formats like: JSON HTML Table CSV XML, etc. Features Print table o

Mobolaji Abdulsalam 1 Dec 30, 2021
organize your books on the command line

organize your books on the command line

Ben Winston 19 Jan 21, 2022
Arithmos cipher on CLI based

Arithmos Cipher CLI This is the CLI version of Arithmos Cipher. Install pip inst

LyQuid :3 1 Jan 16, 2022
spade is the next-generation networking command line tool.

spade is the next-generation networking command line tool. Say goodbye to the likes of dig, ping and traceroute with more accessible, more informative and prettier output.

Vivaan Verma 5 Jan 28, 2022
A Yahtzee-solving python package and command line tool.

yahtzee A Yahtzee-solving python package and command line tool. The algorithm is mathematically guaranteed to have the best strategy. That is, it maxi

David Merrell 0 Aug 19, 2022
Juniper Command System is a Micro CLI Tool that allows you to manage your files, launch applications, as well as providing extra tools for OS Management.

Juniper Command System is a Micro CLI Tool that allows you to manage your files, launch applications, as well as providing extra tools for OS Management.

Juan Carlos JuΓ‘rez 1 Feb 02, 2022
This is a simple Termo application in command line style

my-termo This is a simple Termo application in command line style. This app run a Linux crontab task every day to get a new word. Type termo in your t

Gustavo Soares 1 Feb 14, 2022
Get latest astronomy job and rumor news in your command line

astrojobs Tired of checking the AAS job register and astro rumor mill for job news? Get the latest updates in the command line! astrojobs automaticall

Philip Mocz 19 Jul 20, 2022
Python commandline tool for remembering linux/terminal commands

ehh Remember linux commands Commandline tool for remembering linux/terminal commands. It stores your favorite commands in ~/ehh.json in your homedir a

56 Nov 10, 2022
PyWordle: A Python-made wordle manual solver

PyWordle: A Python-made wordle manual solver How to use it Start the program with python3 pywordlesolver.py. How it works The program has a simple 5-l

Federico Torrielli 5 Nov 24, 2022
This is a command line program to play cricket made using Python.

SimpleCricketPython This is a command line program to play cricket made using Python How it works First you have the option of selecting whether you

Imira Randeniya 1 Sep 11, 2022
CLI based diff viewer

Rich Diff CLI based diff viewer

Suresh Kumar 24 Nov 15, 2022
A CLI for advanced management of your notes with simple commands

PyNoteManager This is a CLI for advanced management of your notes with simple co

3 Dec 30, 2021
A simple python script to execute a command when a YubiKey is disconnected

YubiKeyExecute A python script to execute a command when a YubiKey / YubiKeys are disconnected. β€β€β€Ž β€Ž How to use: 1. Download the latest release and d

6 Mar 12, 2022
Autosub - Command-line utility for auto-generating subtitles for any video file

Auto-generated subtitles for any video Autosub is a utility for automatic speech recognition and subtitle generation. It takes a video or an a

Anastasis Germanidis 3.9k Jan 05, 2023