Using AI and GANs to create (weed) NFTs from custom data
Walkthrough of creating Citral - Blunt Fact #7 with a custom GAN and Stable Diffusion
Since the last post in May, I have been improving image generation for Cluutch's Blunt Facts NFT collection. If I don’t get lazy, this will be the first in a series about using machine learning, specifically GANs, to generate images that can be used for NFTs. This week’s post will walk through the creation of Citral, Blunt Fact #7.
tl;dr: Citral was created with a custom GAN trained on images of weed scraped from online dispensaries. That machine-generated output was then upscaled and beautified with Stable Diffusion.
Context
The artwork for the older Blunt Facts NFTs is garbage. The very first release, Moonbow, was somewhat artistic: it was made by tracing an image of a bud and then styling it with GIMP.
But that process was complex and required an underlying image to trace. After that, I standardized the process by taking pictures of buds on white backgrounds and using basic image processing to crop out the backgrounds.
That produced shoddy images with jagged edges. Let’s see if we can do better with AI.
Code
All of the code used is below. Weed info was pulled from Cluutch (see earlier posts).
Post-processing (add text to image)
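A minimal sketch of that post-processing step with Pillow; the font file and label text below are placeholders, not the exact values used for Citral:

```python
from PIL import Image, ImageDraw, ImageFont

def overlay_strain_facts(image_path, output_path, lines):
    """Stamp strain facts onto the finished artwork."""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    # Placeholder font; point this at whatever typeface fits the art.
    font = ImageFont.truetype("DejaVuSans-Bold.ttf", 28)
    y = 20
    for line in lines:
        draw.text((20, y), line, fill="white", font=font)
        y += 36
    img.save(output_path)

overlay_strain_facts("citral_art.png", "citral_final.png",
                     ["Citral", "Blunt Fact #7"])
```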
What is a GAN?
GANs were invented in 2014 and are a powerful method for image generation. There are already lots of great resources covering GAN basics.
For now, you just need to understand two terms: generator and discriminator. The generator produces images and tries to fool the discriminator. The discriminator is shown both real images from the training set and fakes from the generator, and must learn to tell them apart. As training progresses, the generator should get better at generating realistic images and the discriminator should get better at spotting fakes.
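To make those two roles concrete, here is a minimal sketch of the pair in Keras. The layer sizes are illustrative rather than the exact architecture used for Citral; only the 100x100 output shape matches the training images described later:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(latent_dim=100):
    """Maps random noise to a 100x100 RGB image with values in [-1, 1]."""
    return tf.keras.Sequential([
        layers.Dense(25 * 25 * 128, activation="relu", input_shape=(latent_dim,)),
        layers.Reshape((25, 25, 128)),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="tanh"),
    ])

def build_discriminator():
    """Maps a 100x100 RGB image to P(image is real)."""
    return tf.keras.Sequential([
        layers.Conv2D(64, 4, strides=2, padding="same", input_shape=(100, 100, 3)),
        layers.LeakyReLU(0.2),
        layers.Conv2D(128, 4, strides=2, padding="same"),
        layers.LeakyReLU(0.2),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),
    ])
```

During training, the discriminator is fit on batches of real and generated images, while the generator is updated to push the discriminator's verdict on its fakes toward "real".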
Building a GAN
I am not an AI expert. There are no novel technical implementations covered in this post. Instead, what I found missing from existing GAN resources was an end-to-end guide for custom labeled data sets. The GAN in this post does not yet use labeled data, but it does use custom data scraped from dispensary websites. This is the process I followed (the image-preparation steps are sketched in code after the list):
1. Prepare images for the GAN
    - Query recent listings from the Cluutch API
    - Download images from online dispensaries
    - Downscale and convert images to a numpy array
    - Save the raw numpy array in a cache
2. Run the model
3. Use Stable Diffusion for polish
4. Overlay strain facts on the image
5. Mint the NFT
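Here is a rough sketch of step 1. The endpoint URL and JSON field names are placeholders (see earlier posts for the real Cluutch API), but the 100x100 downscale and the numpy cache match what the GAN consumes:

```python
import io
import numpy as np
import requests
from PIL import Image

LISTINGS_URL = "https://example.com/cluutch/listings"  # placeholder endpoint

def fetch_training_images(size=(100, 100)):
    """Download dispensary images and return them as one numpy array."""
    listings = requests.get(LISTINGS_URL, timeout=30).json()
    samples = []
    for listing in listings:
        url = listing.get("image_url")  # placeholder field name
        if not url:
            continue
        resp = requests.get(url, timeout=30)
        img = Image.open(io.BytesIO(resp.content)).convert("RGB").resize(size)
        # Scale pixels to [-1, 1] to match the generator's tanh output.
        samples.append(np.asarray(img, dtype=np.float32) / 127.5 - 1.0)
    return np.stack(samples)

if __name__ == "__main__":
    np.save("bud_images_100x100.npy", fetch_training_images())  # the cache
```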
Preliminary results
If you take a look at random outputs from the model, you will notice a few issues.
- **Mode collapse:** Many of the outputs look very similar. This suggests the generator has gotten stuck producing a narrow set of outputs instead of drawing on a wide range of underlying traits.
- **Upscaling needed:** The GAN operates on downscaled 100x100 pixel images. A beautiful piece of art needs a higher resolution.
- **Erratic generator loss:** The model was run for over 200 epochs, but the generator loss never stabilized (a quick way to visualize this is sketched below).
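Assuming the training loop appends per-epoch losses to two lists (the names below are hypothetical), a plot like this makes the instability easy to see:

```python
import matplotlib.pyplot as plt

def plot_losses(g_losses, d_losses):
    """Plot per-epoch generator and discriminator losses side by side."""
    plt.plot(g_losses, label="generator loss")
    plt.plot(d_losses, label="discriminator loss")
    plt.xlabel("epoch")
    plt.ylabel("loss")
    plt.legend()
    plt.savefig("gan_losses.png")
```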
Stable Diffusion as an upscaling hack
It’s clear the current GAN is not perfect, and it will take effort to improve. Even once the model is tweaked, its output will need to be upscaled to a higher resolution. To get something beautiful sooner, I am using Stable Diffusion, which alongside DALL-E is one of the most popular tools for text-to-image generation, to transform the low-quality outputs.
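The img2img pass can be sketched with Hugging Face's diffusers library. The model id, prompt, and strength here are illustrative, and argument names vary a bit between diffusers versions:

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The GAN's 100x100 output, upscaled so Stable Diffusion has room to work.
init_image = Image.open("gan_output.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="macro photograph of a cannabis bud, studio lighting",  # illustrative
    image=init_image,
    strength=0.75,       # how far to stray from the GAN image
    guidance_scale=7.5,  # how strongly to follow the prompt
).images[0]
result.save("citral_polished.png")
```

Lower strength values keep more of the GAN's composition; higher values hand more control to the prompt.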
The resulting image looks nice, but, like me, you are probably curious how much of the final image can be attributed to the custom GAN vs. Stable Diffusion. Did I just add a lot of unnecessary complexity that contributes nothing to Stable Diffusion's output? The table below attempts to address that question. It shows the outputs of two different prompts when generated with Stable Diffusion alone vs. with the image from the custom GAN as a starting point.
Although I don’t propose a way to quantify the impact of the custom GAN, these observations can be made:
- Without help from the custom GAN, Stable Diffusion is more likely to misinterpret the prompt.
- There is less variance in the img2img output than in the unconstrained output. Stable Diffusion really is using the supplied image as a starting point.
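One way to reproduce this comparison yourself is to fix the random seed so the only difference between the two runs is whether the GAN image is supplied. As before, the model id and prompt are illustrative:

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline, StableDiffusionPipeline
from PIL import Image

MODEL_ID = "runwayml/stable-diffusion-v1-5"
prompt = "macro photograph of a cannabis bud, studio lighting"

def seeded():
    """Fresh generator with a fixed seed, so both runs start identically."""
    return torch.Generator("cuda").manual_seed(7)

# Unconstrained: text-to-image only.
txt2img = StableDiffusionPipeline.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16).to("cuda")
unconstrained = txt2img(prompt, generator=seeded()).images[0]

# Constrained: same prompt and seed, starting from the GAN's output.
init_image = Image.open("gan_output.png").convert("RGB").resize((512, 512))
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16).to("cuda")
constrained = img2img(prompt, image=init_image, strength=0.75,
                      generator=seeded()).images[0]
```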
Future work
The end goal is not only to create beautiful NFTs for Cluutch, but to generalize the work and provide GAN as a service.
- [x] GAN for base image creation
- [x] Upscaling with Stable Diffusion
- [ ] Labeled GAN based on strain name
- [ ] Improve the GAN model + add evaluation score
- [ ] Add styling / theme
- [ ] Upscale with custom net, remove Stable Diffusion
- [ ] Use AI to select training data (remove irrelevant images)
- [ ] GAN as a service