🧬 Performant Evolutionary Algorithms For Python with Ray support

Last update: Oct 20, 2022

Overview

🧬 Ruck 🏉

Performant evolutionary algorithms for Python

Visit the examples to get started, or browse the releases.

Contributions are welcome!

Goals

Ruck aims to meet the following criteria:

Provide high-quality, readable implementations of algorithms.
Be easily extensible and debuggable.
Be performant while remaining simple.

Citing Ruck

Please use the following citation if you use Ruck in your research:

@Misc{Michlo2021Ruck,
  author =       {Nathan Juraj Michlo},
  title =        {Ruck - Performant evolutionary algorithms for Python},
  howpublished = {Github},
  year =         {2021},
  url =          {https://github.com/nmichlo/ruck}
}

Overview

Ruck takes inspiration from PyTorch Lightning's module system. The population creation, offspring generation, evaluation and selection steps are all contained within a single module inheriting from EaModule, while the training logic and components are separated into their own Trainer class.

Members of a Population (a list of Members) are intended to be read-only. Existing members should not be modified; instead, new members should be created with the modified values. This makes it easy to implement efficient multi-threading, as described in the Ray example further down.
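
The read-only convention can be sketched in plain Python. This is a minimal illustration and not Ruck's implementation: FrozenMember and flip_bits are hypothetical stand-ins for Ruck's Member and a mutation function, and values are tuples of bools rather than NumPy arrays.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FrozenMember:
    """Hypothetical stand-in for a read-only population member."""
    value: Tuple[bool, ...]
    fitness: Optional[float] = None


def flip_bits(member: FrozenMember, p: float, rng: random.Random) -> FrozenMember:
    """Return a NEW, unevaluated member with each bit flipped with probability p."""
    flipped = tuple(bit ^ (rng.random() < p) for bit in member.value)
    return FrozenMember(value=flipped)


rng = random.Random(0)
parent = FrozenMember(value=(False,) * 8, fitness=0.0)
child = flip_bits(parent, p=0.5, rng=rng)

# the parent is untouched, so it can be shared safely between workers
assert parent.value == (False,) * 8
```

Because a mutation never writes to its input, the same parent can be handed to several workers at once without copying or locking.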

The trainer automatically constructs HallOfFame and LogBook objects which keep track of your population and offspring. EaModule provides defaults for get_stats_groups and get_progress_stats that can be overridden if you wish to customize the tracked statistics and statistics displayed by tqdm.

Minimal OneMax Example

import random
import numpy as np
from ruck import *


class OneMaxMinimalModule(EaModule):
    """
    Minimal onemax example
    - The goal is to flip all the bits of a boolean array to True
    - Offspring are generated as bit flipped versions of the previous population
    - Selection tournament is performed between the previous population and the offspring
    """

    # evaluate unevaluated members according to their values
    def evaluate_values(self, values):
        return [v.sum() for v in values]

    # generate 300 random members of size 100 with 50% bits flipped
    def gen_starting_values(self):
        return [np.random.random(100) < 0.5 for _ in range(300)]

    # randomly flip 5% of the bits of each member in the population
    # the previous population members should never be modified
    def generate_offspring(self, population):
        return [Member(m.value ^ (np.random.random(m.value.shape) < 0.05)) for m in population]

    # selection tournament between population and offspring
    def select_population(self, population, offspring):
        combined = population + offspring
        return [max(random.sample(combined, k=3), key=lambda m: m.fitness) for _ in range(len(population))]


if __name__ == '__main__':
    # create and train the population
    module = OneMaxMinimalModule()
    pop, logbook, halloffame = Trainer(generations=100, progress=True).fit(module)

    print('initial stats:', logbook[0])
    print('final stats:', logbook[-1])
    print('best member:', halloffame.members[0])

Advanced OneMax Example

Ruck provides various helper functions and implementations of evolutionary algorithms for convenience. The following example makes use of these additional features so that components and behaviour can easily be swapped out.

The three basic evolutionary algorithms provided are based on deap's default algorithms from deap.algorithms. They can be created with ruck.functional.make_ea. We provide the alias ruck.R for ruck.functional. R.make_ea supports the following modes: simple, mu_plus_lambda and mu_comma_lambda.
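
The difference between the last two modes lies in which members compete for survival. The sketch below follows the conventions of deap's eaMuPlusLambda and eaMuCommaLambda and is not Ruck's implementation; the survivors function, the integer fitness values, and the use of truncation selection are assumptions made for illustration.

```python
from typing import List


def survivors(population: List[int], offspring: List[int], mu: int, mode: str) -> List[int]:
    """Pick the `mu` fittest survivors; members here are just their fitness values."""
    if mode == 'mu_plus_lambda':
        # parents compete with their offspring, so the best member is never lost
        candidates = population + offspring
    elif mode == 'mu_comma_lambda':
        # only offspring can survive, so lambda (len(offspring)) must be >= mu
        if len(offspring) < mu:
            raise ValueError('mu_comma_lambda requires at least mu offspring')
        candidates = offspring
    else:
        raise KeyError(f'unsupported mode: {mode!r}')
    # truncation selection stands in for a configurable select_fn
    return sorted(candidates, reverse=True)[:mu]


parents = [9, 4, 1]
children = [7, 6, 2, 0]

plus = survivors(parents, children, mu=3, mode='mu_plus_lambda')    # [9, 7, 6]
comma = survivors(parents, children, mu=3, mode='mu_comma_lambda')  # [7, 6, 2]
```

In deap's eaSimple, by contrast, parents are selected first and the varied offspring then replace the whole population each generation.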

Code Example

"""
OneMax serial example based on:
https://github.com/DEAP/deap/blob/master/examples/ga/onemax_numpy.py
"""

import functools
import numpy as np
from ruck import *


class OneMaxModule(EaModule):

    def __init__(
        self,
        population_size: int = 300,
        offspring_num: int = None,  # offspring_num (lambda) is automatically set to population_size (mu) when `None`
        member_size: int = 100,
        p_mate: float = 0.5,
        p_mutate: float = 0.5,
        ea_mode: str = 'simple'
    ):
        # save the arguments to the .hparams property. values are taken from the
        # local scope so modifications can be captured if the call to this is delayed.
        self.save_hyperparameters()
        # implement the required functions for `EaModule`
        self.generate_offspring, self.select_population = R.make_ea(
            mode=self.hparams.ea_mode,
            offspring_num=self.hparams.offspring_num,
            mate_fn=R.mate_crossover_1d,
            mutate_fn=functools.partial(R.mutate_flip_bit_groups, p=0.05),
            select_fn=functools.partial(R.select_tournament, k=3),
            p_mate=self.hparams.p_mate,
            p_mutate=self.hparams.p_mutate,
        )

    def evaluate_values(self, values):
        return map(np.sum, values)

    def gen_starting_values(self) -> Population:
        return [
            np.random.random(self.hparams.member_size) < 0.5
            for i in range(self.hparams.population_size)
        ]


if __name__ == '__main__':
    # create and train the population
    module = OneMaxModule(population_size=300, member_size=100)
    pop, logbook, halloffame = Trainer(generations=40, progress=True).fit(module)

    print('initial stats:', logbook[0])
    print('final stats:', logbook[-1])
    print('best member:', halloffame.members[0])

Multithreading OneMax Example (Ray)

If we need to scale up the computational requirements, for example requiring increased member and population sizes, the above serial implementations will soon run into performance problems.

The basic Ruck implementations of various evolutionary algorithms are designed around a map function that can be swapped out to add multi-threading support. We can easily do this using ray and we even provide various helper functions that enhance ray support.

We begin by placing members' values into shared memory using ray's read-only object store and the ray.put function. The resulting ObjectRefs point to the original np.ndarray values. When retrieved with ray.get, the original arrays are obtained using an efficient zero-copy procedure. This is advantageous over something like Python's multiprocessing module, which uses expensive pickle operations to pass data around.
The second step is to swap out the aforementioned map function from the previous example for a multiprocessing equivalent. We use ray.remote along with ray.get, and provide the ray_map function, which has the same API as Python's map but accepts ray.remote(my_fn).remote values instead.
Finally, we need to update our mate and mutate functions to handle ObjectRefs. We provide a convenient wrapper that automatically calls ray.put on function results so that you can reuse your existing code.
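
The role of the swappable map function can be illustrated with the standard library alone. The sketch below is an analogy and uses neither ray nor Ruck: evaluate_population, count_ones and thread_map are hypothetical names, and a thread pool stands in for ray's remote tasks.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence


def count_ones(bits: Sequence[bool]) -> int:
    """OneMax fitness: the number of True bits."""
    return sum(bits)


def evaluate_population(values: Iterable[Sequence[bool]], map_fn: Callable = map) -> List[int]:
    """Evaluate every value; the algorithm only depends on the map-like `map_fn`."""
    return list(map_fn(count_ones, values))


def thread_map(fn: Callable, items: Iterable) -> list:
    """Drop-in replacement for `map` that runs `fn` in a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(fn, items))


values = [[True, False, True], [True, True, True], [False, False, False]]

serial = evaluate_population(values)                       # builtin map
parallel = evaluate_population(values, map_fn=thread_map)  # swapped-in map
```

Both calls go through the same map_fn argument, so the algorithm code stays unchanged when the execution strategy changes.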

Code Example

"""
OneMax parallel example using ray's object store.

1 byte * 1_000_000 * 128 members ~= 128 MB of memory to store this population.
This is quite a bit of processing that needs to happen! But using ray
and its object store we can do this efficiently!
"""

from functools import partial
import numpy as np
import ray
from ruck import *
from ruck.external.ray import *


class OneMaxRayModule(EaModule):

    def __init__(
        self,
        population_size: int = 300,
        offspring_num: int = None,  # offspring_num (lambda) is automatically set to population_size (mu) when `None`
        member_size: int = 100,
        p_mate: float = 0.5,
        p_mutate: float = 0.5,
        ea_mode: str = 'mu_plus_lambda'
    ):
        self.save_hyperparameters()
        # implement the required functions for `EaModule`
        self.generate_offspring, self.select_population = R.make_ea(
            mode=self.hparams.ea_mode,
            offspring_num=self.hparams.offspring_num,
            # wrap the functions with `ray_remote_puts` / `ray_remote_put`, which automatically
            # `ray.get` arguments that are `ObjectRef`s and `ray.put` the returned results
            mate_fn=ray_remote_puts(R.mate_crossover_1d).remote,
            mutate_fn=ray_remote_put(R.mutate_flip_bit_groups).remote,
            # efficient to compute locally
            select_fn=partial(R.select_tournament, k=3),
            p_mate=self.hparams.p_mate,
            p_mutate=self.hparams.p_mutate,
            # ENABLE multiprocessing
            map_fn=ray_map,
        )
        # eval function, we need to cache it on the class to prevent
        # multiple calls to ray.remote. We use ray.remote instead of
        # ray_remote_put like above because we want the returned values
        # not object refs to those values.
        self._ray_eval = ray.remote(np.mean).remote

    def evaluate_values(self, values):
        # values is a list of `ray.ObjectRef`s not `np.ndarray`s
        # self._ray_eval is the cached `ray.remote` version of np.mean, which
        # automatically handles `ray.get`ing of `ray.ObjectRef`s passed as arguments
        return ray_map(self._ray_eval, values)

    def gen_starting_values(self):
        # generate objects and place in ray's object store
        return [
            ray.put(np.random.random(self.hparams.member_size) < 0.5)
            for i in range(self.hparams.population_size)
        ]


if __name__ == '__main__':
    # initialize ray to use the specified system resources
    ray.init()

    # create and train the population
    module = OneMaxRayModule(population_size=128, member_size=1_000_000)
    pop, logbook, halloffame = Trainer(generations=200, progress=True).fit(module)

    print('initial stats:', logbook[0])
    print('final stats:', logbook[-1])
    print('best member:', halloffame.members[0])

Comments

MANIFEST.in

I'm trying to get this project onto Conda Forge. To facilitate that, would it be possible to include a MANIFEST.in file to enumerate the contents of the sdist? https://packaging.python.org/guides/using-manifest-in/
enhancement good first issue question

opened by thewchan 1
[FEATURE]: Multi-Threading & Algorithm Helpers/Factories
Currently all algorithms extend from EaModule. Implementations often share similarities, and code is repeated.

Helper functions, subclasses of EaModule or maybe even mixins might be worth investigating.

eg.

class Problem(EaModuleRay):
    ...

# OR

class Problem(EaModule, EaRayMixin):
    ...

I haven't really pondered too much about these but they may be worthwhile?
enhancement
opened by nmichlo 0

Releases(v0.2.4)

v0.2.4(Oct 20, 2021)
Additions

A functionally equivalent version of NSGA-II is implemented at ruck.functional.select_nsga2.

If numba is installed, the function will be JIT compiled for up to 65x faster performance.

Minor differences exist compared to ruck.external.deap.select_nsga2, but overall results should be similar.

Deprecations

ruck.external.deap.select_nsga2 has been deprecated and will be removed in v0.3.0

v0.2.3(Oct 17, 2021)
Additions

Added MANIFEST.in

v0.2.2(Sep 27, 2021)
This is a breaking change if you are using the ray helper functions. It allows us to remove the optional dependencies. The core ruck API remains the same.

ruck.util._ray moved to ruck.external.ray

Added ruck.external.deap with select_nsga2, which uses deap behind the scenes. The goal is to implement this ourselves in the near future. Currently it is a bit slow and means we need extra deps.

More examples!

v0.2.1(Sep 25, 2021)
Changes

Allow tuples of floats as fitness values

v0.2.0(Sep 25, 2021)
Changes

Ray remote function support was broken because of algorithm design decisions and the use of nested functions and decorators. Updated to remove these limitations. The API changes slightly.

R.mate_crossover_nd added to complement R.mate_crossover_1d

Training checks that all the values in a population maintain the same type. This can help avoid ray multithreading errors when object refs are required instead of values.

v0.1.0(Sep 25, 2021)

Initial Release
v0.1.0.dev1(Sep 25, 2021)

Initial dev version

Owner

Nathan

This repository is not maintained
