Python script to preprocess images of all Pokémon to finetune ruDALL-E

Overview

ai-generated-pokemon-rudalle

Python script to preprocess images of all Pokémon (the "official artwork" of each Pokémon via PokéAPI) into a format such that it can be used to finetune ruDALL-E using the finetuning example Colab Notebook linked in that repo. This workflow was used to create a model that resulted in AI-Generated Pokemon that went viral (10k+ retweets on Twitter + 30k+ upvotes on Reddit)

My modified Colab Notebook that I used to finetune the model on Pokémon is here: this Notebook's release is purely for demonstration/authentication purposes and no support will be given on how to use it because it is incredibly messy and embarrassing, but there may be a few ideas there that are useful for future generation. Some notes on how the process works are included below, with oppertunity to reproduce/improve it.

The script outputs two things: an images folder with all the preprocessed images plus a data_desc.csv file which contains the image path and Russian caption pairs for finetuning. Some examples of the preprocessed input images are present in the images folder, plus the final data_desc.csv.

The model used is not included in this repo because it's currently too large (~3GB) to distribute (will add the model to Hugging Face at some point).

Preprocessing Script Notes

  • The GraphQL interface to PokéAPI is used as it allows to retrieve the type information plus IDs of all Pokémon in a single request. As a bonus, the returned IDs include the alternate forms of Pokémon (e.g. Mega) which would not otherwise be present just by incrementing IDs.
  • ruDALL-E requires 256x256px, RGB input images. In this case the source input images from PokéAPI are conveiently both square and larger than 256x256 so they downsample nicely. Since the images have transparency (RGBA), they are composited onto a white background.
  • The translation service used is Yandex, which apparently has decent rate limits, plus as a Russian company the translations from English to Russian should theoetically be better.
  • The captions (which are later translated into Russian) are determined by type. For example, a Grass/Poison type will have the caption A Grass-type and Poison-type Pokémon, which is then translated into Russian. In theory, this improves the finetuning process by allowing ruDALL-E to notice trends, plus in theory this can be leveraged at generation-time to control the generation (e.g. prompt with A Grass-type Pokémon and have ruDALL-E generate only Grass-type Pokémon)
  • Due to potential rate limits on translation, translations are cached at runtime by Pokémon type(s) so the API is pinged only once.

Finetuning and Generation Notes

  • The model used above was trained for 12 epochs (4.5 hours on a P100), at a max learning rate of 1e-5. The pct_start param of the OneCycleLR scheudler was set to 0.1 so that learning rate decay happens faster. Despite that, the model converged quickly.

  • The parameters for finetuning ruDALL-E are very difficult to get the expected results. Too little training and the output images will be too incoherent; too much training and the model will overfit and output the source images, and also ignore any text prompts. In the social media posts above, the model is slightly overfit and attempts at using text prompts to control generation failed. But overfitting is not necessairly a bad thing as long as it avoids verbatim output.

Usage

You can install the dependences via:

pip3 install Pillow requests translatepy tqdm

Then run build_image_dataset.py

Getting the images into the ruDALL-E finetuning Colab Notebook is up to the user, but the recommended way to do so is to ZIP the generated images folder (~42 MB!), upload it to Colab (or upload to Google Drive and copy it into the Notebook from there), and unzip the folder in Colab itself via !unzip.

Maintainer/Creator

Max Woolf (@minimaxir)

Max's open-source projects are supported by his Patreon and GitHub Sponsors. If you found this project helpful, any monetary contributions to the Patreon are appreciated and will be put to good creative use.

License

MIT

Owner
Max Woolf
Data Scientist @buzzfeed. Plotter of pretty charts.
Max Woolf
A project to work with databases in 4 worksheets, insert, update, select, delete using Python and MySqI

A project to work with databases in 4 worksheets, insert, update, select, delete using Python and MySqI As a small project for school or college hope it is useful

Sina Org 1 Jan 11, 2022
Simply create JIRA releases based on your github releases

Simply create JIRA releases based on your github releases

8 Jun 17, 2022
Senior Comprehensive Project For Python

Senior Comprehensive Project Author: Grey Hutchinson My project, which I nicknamed “Murmur”, was to create a research tool that would use neural netwo

1 May 29, 2022
JARVIS PC Assistant is an assisting program to make your computer easier to use

JARVIS-PC-Assistant JARVIS PC Assistant is an assisting program to make your computer easier to use Welcome to the J.A.R.V.I.S. PC Assistant help file

Dasun Nethsara 2 Dec 02, 2022
This code can help you with auto update for-TV-advertisements in the store.

Auto-update-files-for-TV-advertisements-in-the-store This code can help you with auto update for-TV-advertisements in the store. It was write for Rasp

Max 2 Feb 20, 2022
Fithub is a website application for athletes and fitness enthusiasts of all ages and experience levels.

Fithub is a website application for athletes and fitness enthusiasts of all ages and experience levels. Our website allows users to easily search, filter, and sort our comprehensive database of over

Andrew Wu 1 Dec 13, 2021
A self contained invitation management system for gatekeeping.

Invitease Description A self contained invitation management system for gatekeeping. Purpose Serves as a focal point for inviting guests to a venue pr

מעגן מיכאל 7 Jul 19, 2022
Blender-3D-SH-Dma-plugin - Import and export Sonic Heroes Delta Morph animations (.anm) into Blender 3D

io_scene_sonic_heroes_dma This plugin for Blender 3D allows you to import and ex

Psycrow 3 Mar 22, 2022
⚡KiCad library containing footprints and symbols for inductive analog keyboard switches

Inductive Analog Switches This library contains footprints and symbols for inductive analog keyboard switches for use with the Texas Instruments LDC13

Elias Sjögreen 3 Jun 30, 2022
Donatus Prince 6 Feb 25, 2022
Collection of functions for working with interlaced content in VapourSynth.

vsfieldkit Collection of functions for working with interlaced content in VapourSynth. It does not have any hard dependencies outside of VapourSynth.

Justin Turner Arthur 11 May 27, 2022
🌲 Um simples criador de arvore de items feito em Python para o Prompt 🐍

Esse projeto foi feito em Python com, intuito de fortificar meu aprendizado de programação. Sobre • Tecnologias • Pré Requisitos • Licença • Autor 📄

Kawan Henrique 1 Aug 02, 2021
Dotfiles & list of programs

dotfiles & list of programs So I wanted to just backup my most used files. I have a bad habit, sometimes I get tired of a distro and do a wipe and sta

2 Sep 04, 2022
A simple chatbot that I made for school project

Chatbot: Python A simple chatbot that I made for school Project. Tho this chatbot is dumb sometimes, but it's not too bad lol. Check it Out! FAQ How t

Prashant 2 Nov 13, 2021
Organize seu linux - organize your linux

OrganizeLinux Organize seu linux - organize your linux Organize seu linux Uma forma rápida de separar arquivos dispersos em pastas. formatos a serem c

Marcus Vinícius Ribeiro Andrade 1 Nov 30, 2021
This is a Python 3.10 port of mock, a library for manipulating human-readable message strings.

This is a Python 3.10 port of mock, a library for manipulating human-readable message strings.

Alexander Bartolomey 1 Dec 31, 2021
Beacon Object File (BOF) to obtain a usable TGT for the current user.

Beacon Object File (BOF) to obtain a usable TGT for the current user.

Connor McGarr 109 Dec 25, 2022
🌌A Python library to exhaustively enumerate a combinatorial space represented by a function

exhaust A Python library to exhaustively enumerate a combinatorial space represented by a function. The API is modelled after Python's random module a

Maik Riechert 1 Dec 05, 2021
Library to emulate the Sneakers movie effect

py-sneakers Port to python of the libnms C library To recreate the famous data decryption effect shown in the 1992 film Sneakers. Install pip install

Nicolas Rebagliati 11 Aug 27, 2021
Ml-design-patterns - Source code accompanying O'Reilly book: Machine Learning Design Patterns

This is not an official Google product ml-design-patterns Source code accompanying O'Reilly book: Title: Machine Learning Design Patterns Authors: Val

Google Cloud Platform 1.5k Jan 05, 2023