Python script to preprocess images of all Pokémon to finetune ruDALL-E

Last update: Dec 11, 2022

Overview

ai-generated-pokemon-rudalle

Python script to preprocess images of all Pokémon (the "official artwork" of each Pokémon via PokéAPI) into a format such that it can be used to finetune ruDALL-E using the finetuning example Colab Notebook linked in that repo. This workflow was used to create a model that resulted in AI-Generated Pokemon that went viral (10k+ retweets on Twitter + 30k+ upvotes on Reddit)

My modified Colab Notebook that I used to finetune the model on Pokémon is here: this Notebook's release is purely for demonstration/authentication purposes and no support will be given on how to use it because it is incredibly messy and embarrassing, but there may be a few ideas there that are useful for future generation. Some notes on how the process works are included below, with oppertunity to reproduce/improve it.

The script outputs two things: an images folder with all the preprocessed images plus a data_desc.csv file which contains the image path and Russian caption pairs for finetuning. Some examples of the preprocessed input images are present in the images folder, plus the final data_desc.csv.

The model used is not included in this repo because it's currently too large (~3GB) to distribute (will add the model to Hugging Face at some point).

Preprocessing Script Notes

The GraphQL interface to PokéAPI is used as it allows to retrieve the type information plus IDs of all Pokémon in a single request. As a bonus, the returned IDs include the alternate forms of Pokémon (e.g. Mega) which would not otherwise be present just by incrementing IDs.
ruDALL-E requires 256x256px, RGB input images. In this case the source input images from PokéAPI are conveiently both square and larger than 256x256 so they downsample nicely. Since the images have transparency (RGBA), they are composited onto a white background.
The translation service used is Yandex, which apparently has decent rate limits, plus as a Russian company the translations from English to Russian should theoetically be better.
The captions (which are later translated into Russian) are determined by type. For example, a Grass/Poison type will have the caption A Grass-type and Poison-type Pokémon, which is then translated into Russian. In theory, this improves the finetuning process by allowing ruDALL-E to notice trends, plus in theory this can be leveraged at generation-time to control the generation (e.g. prompt with A Grass-type Pokémon and have ruDALL-E generate only Grass-type Pokémon)
Due to potential rate limits on translation, translations are cached at runtime by Pokémon type(s) so the API is pinged only once.

Finetuning and Generation Notes

The model used above was trained for 12 epochs (4.5 hours on a P100), at a max learning rate of 1e-5. The pct_start param of the OneCycleLR scheudler was set to 0.1 so that learning rate decay happens faster. Despite that, the model converged quickly.

The parameters for finetuning ruDALL-E are very difficult to get the expected results. Too little training and the output images will be too incoherent; too much training and the model will overfit and output the source images, and also ignore any text prompts. In the social media posts above, the model is slightly overfit and attempts at using text prompts to control generation failed. But overfitting is not necessairly a bad thing as long as it avoids verbatim output.

Usage

You can install the dependences via:

pip3 install Pillow requests translatepy tqdm

Then run build_image_dataset.py

Getting the images into the ruDALL-E finetuning Colab Notebook is up to the user, but the recommended way to do so is to ZIP the generated images folder (~42 MB!), upload it to Colab (or upload to Google Drive and copy it into the Notebook from there), and unzip the folder in Colab itself via !unzip.

Maintainer/Creator

Max Woolf (@minimaxir)

Max's open-source projects are supported by his Patreon and GitHub Sponsors. If you found this project helpful, any monetary contributions to the Patreon are appreciated and will be put to good creative use.

License

MIT

Python script to preprocess images of all Pokémon to finetune ruDALL-E

Related tags

Overview

ai-generated-pokemon-rudalle

Preprocessing Script Notes

Finetuning and Generation Notes

Usage

Maintainer/Creator

License

Owner

Max Woolf

A clock purely made with python(turtle)...

Python script to autodetect a base set of swiftlint rules.

Implemented Exploratory Data Analysis (EDA) using Python.Built a dashboard in Tableau and found that 45.87% of People suffer from heart disease.

Web站点选优工具 - 优化GitHub的打开速度、高效Clone

JLC2KICAD_lib is a python script that generate a component library for KiCad from the JLCPCB/easyEDA library.

A faster Python generator that get function results from multi-process workers

It is Keqin Wang first project in CMU, trying to use DRL(PPO) to control a 5-dof manipulator to draw line in space.

Painel de consulta

SkyPort console user terminal written in python

Block the annoying Token Grabbers on your discord

A simple program to recolour simple png icon-like pictures with just one colour + transparent or white background. Resulting images all have transparent background and a new colour.

A MCPI hack with many features.

Python client SDK designed to simplify integrations by automating key generation and certificate enrollment using Venafi machine identity services.

Repositorio com arquivos processados da CPI da COVID para facilitar analise

Bootcamp de Introducción a la Programación. Módulo 6: Matemáticas Discretas

WordlistPasswordGenerator - Shuhfab Basheer

✔️ Create to-do lists to easily manage your ideas and work.

Tool for running a high throughput data ingestion/transformation workload with MongoDB

ChieriBot,词云API版,用于统计群友说过的怪话

Just some mtk tool for exploitation, reading/writing flash and doing crazy stuff