Python script to preprocess images of all Pokémon to finetune ruDALL-E

Overview

ai-generated-pokemon-rudalle

Python script to preprocess images of all Pokémon (the "official artwork" of each Pokémon via PokéAPI) into a format that can be used to finetune ruDALL-E using the finetuning example Colab Notebook linked in that repo. This workflow was used to create the model behind the AI-generated Pokémon that went viral (10k+ retweets on Twitter and 30k+ upvotes on Reddit).

My modified Colab Notebook that I used to finetune the model on Pokémon is here. This Notebook's release is purely for demonstration/authentication purposes, and no support will be given on how to use it because it is incredibly messy and embarrassing, but it may contain a few ideas that are useful for future generation. Some notes on how the process works are included below, with the opportunity to reproduce or improve it.

The script outputs two things: an images folder with all the preprocessed images plus a data_desc.csv file which contains the image path and Russian caption pairs for finetuning. Some examples of the preprocessed input images are present in the images folder, plus the final data_desc.csv.
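
As a rough illustration of the second output, the sketch below writes image path/caption pairs with Python's csv module. The column names name and caption, the example file names, and the placeholder caption strings are all assumptions made for this example; check them against what the finetuning Notebook actually reads.

```python
import csv
import io


def write_data_desc(rows, out_file):
    """Write (image_path, caption) pairs as CSV with a header row.

    The header names "name" and "caption" are assumptions for
    illustration, not confirmed by the original script.
    """
    writer = csv.writer(out_file)
    writer.writerow(["name", "caption"])
    for image_path, caption in rows:
        writer.writerow([image_path, caption])


# Hypothetical rows: the real script derives paths and Russian captions
# from PokéAPI data and the translation step. The captions here are
# placeholders, not real translations.
rows = [
    ("images/1.png", "<Russian caption for Grass/Poison>"),
    ("images/4.png", "<Russian caption for Fire>"),
]

buffer = io.StringIO()
write_data_desc(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of a string buffer, pass a file opened with open("data_desc.csv", "w", newline="", encoding="utf-8") so that Cyrillic captions are stored as UTF-8.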

The model used is not included in this repo because it's currently too large (~3GB) to distribute (will add the model to Hugging Face at some point).

Preprocessing Script Notes

  • The GraphQL interface to PokéAPI is used because it can retrieve the type information plus the IDs of all Pokémon in a single request. As a bonus, the returned IDs include the alternate forms of Pokémon (e.g. Mega), which would not be found just by incrementing IDs.
  • ruDALL-E requires 256x256px, RGB input images. In this case the source images from PokéAPI are conveniently both square and larger than 256x256, so they downsample nicely. Since the images have transparency (RGBA), they are composited onto a white background.
  • The translation service used is Yandex, which apparently has decent rate limits. Since Yandex is a Russian company, its translations from English to Russian should theoretically be better.
  • The captions (which are later translated into Russian) are determined by type. For example, a Grass/Poison type will have the caption A Grass-type and Poison-type Pokémon, which is then translated into Russian. In theory, this improves the finetuning process by allowing ruDALL-E to notice trends, and it can also be leveraged at generation time to control the output (e.g. prompt with A Grass-type Pokémon and have ruDALL-E generate only Grass-type Pokémon).
  • Due to potential rate limits on translation, translations are cached at runtime by Pokémon type(s), so the API is pinged only once per type combination.
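
The image and caption steps above can be sketched roughly as follows. This is a minimal illustration and not the code in build_image_dataset.py: the function names, the cache dictionary, and the translate callable (a stand-in for whatever translation client is used) are all hypothetical.

```python
from PIL import Image


def to_rgb_256(img, size=256):
    """Composite an image with transparency onto white, then downsample."""
    rgba = img.convert("RGBA")
    background = Image.new("RGBA", rgba.size, (255, 255, 255, 255))
    composited = Image.alpha_composite(background, rgba).convert("RGB")
    return composited.resize((size, size), Image.LANCZOS)


def build_caption(types):
    """Build the English caption from a list of type names."""
    parts = [f"{t.capitalize()}-type" for t in types]
    return "A " + " and ".join(parts) + " Pokémon"


# Hypothetical in-memory cache keyed by the type combination.
_translation_cache = {}


def translated_caption(types, translate):
    """Translate a caption at most once per type combination.

    `translate` is any callable mapping English text to Russian text;
    it is an assumption standing in for the real translation service.
    """
    key = tuple(types)
    if key not in _translation_cache:
        _translation_cache[key] = translate(build_caption(types))
    return _translation_cache[key]
```

Because every Pokémon with the same type combination shares one caption, the number of translation requests is bounded by the number of distinct type combinations rather than by the number of images.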

Finetuning and Generation Notes

  • The model used above was trained for 12 epochs (4.5 hours on a P100), at a max learning rate of 1e-5. The pct_start param of the OneCycleLR scheduler was set to 0.1 so that learning rate decay starts sooner. Despite that, the model converged quickly.

  • It is very difficult to find finetuning parameters for ruDALL-E that give the expected results. With too little training, the output images will be too incoherent; with too much training, the model will overfit, output the source images, and ignore any text prompts. In the social media posts above, the model is slightly overfit and attempts at using text prompts to control generation failed. But overfitting is not necessarily a bad thing as long as it avoids verbatim output.
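
To see what pct_start changes, here is a simplified pure-Python approximation of a two-phase cosine one-cycle schedule. It is a sketch of the schedule's shape and not PyTorch's OneCycleLR itself: the total of 1000 steps is hypothetical, and div_factor and final_div_factor are assumed to mirror PyTorch's defaults (25 and 1e4).

```python
import math


def one_cycle_lr(step, total_steps, max_lr=1e-5, pct_start=0.1,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at `step` for a simplified cosine one-cycle schedule."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    peak_step = pct_start * total_steps - 1   # last step of the warm-up
    last_step = total_steps - 1

    def cos_anneal(start, end, pct):
        # Cosine interpolation from `start` (pct=0) to `end` (pct=1).
        return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1)

    if step <= peak_step:
        return cos_anneal(initial_lr, max_lr, step / peak_step)
    decay_pct = (step - peak_step) / (last_step - peak_step)
    return cos_anneal(max_lr, min_lr, decay_pct)


total_steps = 1000  # hypothetical; the real run's step count is not stated
schedule = [one_cycle_lr(s, total_steps) for s in range(total_steps)]
peak_index = max(range(total_steps), key=lambda s: schedule[s])
print(f"peak LR {schedule[peak_index]:.1e} at step {peak_index} of {total_steps}")
```

With pct_start=0.1 the warm-up occupies only the first tenth of training, so the remaining nine tenths are spent decaying from the max learning rate.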

Usage

You can install the dependencies via:

pip3 install Pillow requests translatepy tqdm

Then run build_image_dataset.py

Getting the images into the ruDALL-E finetuning Colab Notebook is up to the user, but the recommended way to do so is to ZIP the generated images folder (~42 MB!), upload it to Colab (or upload to Google Drive and copy it into the Notebook from there), and unzip the folder in Colab itself via !unzip.
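
If you prefer to create the ZIP from Python rather than from a file manager, a minimal sketch using only the standard library might look like this. The helper name zip_folder is hypothetical and not part of the repo.

```python
import shutil
from pathlib import Path


def zip_folder(folder, archive_stem):
    """Create <archive_stem>.zip containing `folder` and return its path.

    The folder itself is kept as the top-level entry, so unzipping
    recreates e.g. images/ rather than spilling files loose.
    """
    folder = Path(folder)
    return shutil.make_archive(
        str(archive_stem),
        "zip",
        root_dir=str(folder.parent),
        base_dir=folder.name,
    )
```

Calling zip_folder("images", "images") from the directory that contains the images folder should produce images.zip, which can then be uploaded and extracted in Colab with !unzip images.zip.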

Maintainer/Creator

Max Woolf (@minimaxir)

Max's open-source projects are supported by his Patreon and GitHub Sponsors. If you found this project helpful, any monetary contributions to the Patreon are appreciated and will be put to good creative use.

License

MIT
