DanteAffinityXD

The Challenges of Dante LoRA V 2.0

As many of you know, I love AI a bunch. It's been a lot of fun for me and I've integrated it really well into my art pipeline for creating silly pictures of Dante. To do this, however, I can't just feed prompts into YiffyMix; I actually have my own custom LoRA of Dante that I've been working on for some time. A LoRA is a means of adding your own concept, object or style to a Stable Diffusion model, so that it "learns" the new thing.
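For the curious, the core trick a LoRA relies on can be sketched in a few lines: rather than retraining a full weight matrix, you learn a small low-rank update that gets added on top of the frozen base weights. A minimal numpy illustration (the dimensions and rank below are made up for demonstration, not taken from any real model):

```python
import numpy as np

# Frozen base weight from the original model (e.g. one attention projection).
# These shapes and the rank are illustrative placeholders.
d_out, d_in, rank = 320, 768, 8
W = np.random.randn(d_out, d_in)

# A LoRA learns two thin matrices whose product is the update.
A = np.random.randn(rank, d_in) * 0.01   # "down" projection
B = np.zeros((d_out, rank))              # "up" projection, starts at zero

# At inference the adapted weight is W + scale * (B @ A).
scale = 1.0
W_adapted = W + scale * (B @ A)

# The payoff: two thin matrices instead of one full one.
full_params = W.size
lora_params = A.size + B.size
ratio = lora_params / full_params  # ~3.5% of the full matrix at these sizes
```

Because B starts at zero, a fresh LoRA changes nothing until training nudges it — which is also why a finished LoRA file is so much smaller than the model it modifies.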

The general process is itself a kind of art, in that you have to do a lot of experimentation to get things right, tweaking and molding the results to try to improve the outputs. But until each LoRA is created, you're basically blind. The basics involve cropping and resizing art pictures of your character, then creating, for each picture, a text file containing a prompt that describes it - except for the character you want imprinted in the LoRA. That said, and this is part of the art, I was finding that I got better results when I *DID* give it some details about what Dante is, like "fennec" or "furry", "blue fur", etc. etc.
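The captioning step above is easy to script. A hedged sketch of the idea — all filenames and captions here are hypothetical placeholders, and my real captions are hand-written: each training image gets a sibling `.txt` file describing the scene, with the trigger word carrying the character identity plus a few anchor traits.

```python
import tempfile
from pathlib import Path

# Placeholder captions: they describe the scene and keep a few anchor traits
# ("fennec", "blue fur") alongside the trigger word "dante", per the approach
# described above. Every name here is made up for illustration.
captions = {
    "dante_beach.png": "dante, fennec, blue fur, standing on a beach, sunset",
    "dante_couch.png": "dante, fennec, blue fur, lying on a couch, indoors",
}

def write_captions(image_dir: Path, captions: dict) -> None:
    """Write one caption .txt file next to each training image."""
    for image_name, caption in captions.items():
        txt_path = image_dir / Path(image_name).with_suffix(".txt")
        txt_path.write_text(caption + "\n", encoding="utf-8")

image_dir = Path(tempfile.mkdtemp())  # stand-in for the real dataset folder
write_captions(image_dir, captions)
```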

Choosing pictures is also hard, and I'm finding myself trying to "think like the AI" to figure out what will confuse it. I have no idea how they managed to just drop E621 into YiffyMix and get the results they get. AI is super finicky and struggles with multiple characters, or body positions that are not upright. Apparently AI believes we can't be upside down, which eliminates a LOT of fun postures. But also, if you hide an arm too many times, it might believe your character is an amputee; if the tail comes out the other side, it might believe you have TWO tails; and for a while, it was trying to draw Dante with four ears.

My initial embedding efforts for nearly a year were pretty much useless (embeddings being the precursor to LoRAs). I was so excited to even see a blue furry cat person. That said, this last time I really seemed to hit a good little niche: I got just the right number of epochs and retries for learning, I realized the clip skip setting was off (YiffyMix is 1 instead of the usual anime 2 O_O), and I got much better at writing my prompts by hand. I also found a nice mixture of backgrounds that let the LoRA focus on what a "Dante" was without losing the ability to create backgrounds.
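For anyone retracing this: the clip skip value is exposed as a flag in the kohya-ss sd-scripts that most LoRA trainers wrap. A hedged command fragment — all paths are placeholders, and I'm assuming the kohya trainer here, since the post doesn't name the exact tool:

```shell
# kohya-ss sd-scripts: train with clip_skip=1 to match YiffyMix,
# not the clip_skip=2 that anime-derived models usually expect.
accelerate launch train_network.py \
  --pretrained_model_name_or_path=/path/to/yiffymix.safetensors \
  --train_data_dir=/path/to/dante_dataset \
  --output_dir=/path/to/output \
  --network_module=networks.lora \
  --clip_skip=1
```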

The results have given me a ton of great pictures/bases, most of which aren't shared on here, and a number are awaiting my paint brush to complete them. That is, even though they look way better than before, and many of the basic anatomy segments are an improvement toward what I want Dante to be over the base artwork, the outputs are still majorly flawed. The most glaring issue I always note is that the ear dashes are missing - but if you look at any of the pictures, you'll often notice the base pictures are generally full of little issues that I'm hoping to correct.

So. After months, I'm now trying to build a new version 2.0 LoRA, with way, way more effort than normal... It may all be wasted, who knows. This is the kind of science/art thing where you can put in a ton of effort and be rewarded with a worse model than you had before. That's why it's always good to have backups!

While I'd love to use Stable Diffusion XL LoRAs, as of yet my computer is terrible at XL (20 minutes per picture) and training it requires a GPU beyond my abilities (I only have my 3070). I can't even build a Dreambooth. I also only had about 36 base images to begin with - and made extra use of the fact that I could often double-use pictures for a far shot, sometimes a midshot, and occasionally even a close-up. The data is split between a few image sizes (512x512, 768x512, 512x768 and 768x768), and these do matter, since I can't fit the same information into each aspect ratio.
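Sorting images into those four sizes is essentially aspect-ratio bucketing. A tiny sketch of one reasonable selection rule — the bucket list comes from the sizes above, but the nearest-ratio rule is my assumption, not necessarily what any particular trainer does:

```python
# The four training resolutions used in this data set; the rule below simply
# picks the bucket whose aspect ratio best matches the source image.
BUCKETS = [(512, 512), (768, 512), (512, 768), (768, 768)]

def pick_bucket(width: int, height: int) -> tuple:
    """Return the bucket whose aspect ratio is closest to the source image's."""
    aspect = width / height
    return min(BUCKETS, key=lambda wh: abs(wh[0] / wh[1] - aspect))

print(pick_bucket(1200, 800))   # wide source -> (768, 512)
print(pick_bucket(800, 1200))   # tall source -> (512, 768)
```

This is also why the same drawing can serve as a far shot and a close-up: each crop lands in whichever bucket suits its framing.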

My latest version has 10 really good pictures of Dante on the runway - but... to my surprise, I'm not actually getting 46 images. In the process I instead identified 10 images I should probably prune from the original data set. I might be adding about 4-5 more before I go for launch, though, so that might give me 40 or 41 images. That's about 11-14% more data overall, but more importantly, the data I DO have should be of higher quality on this iteration.

In particular, some older images had multiple characters visible, which confuses the AI. I also had a couple without ear dashes, which I was deeply motivated to remove. But... by my calculations, 86% of my pictures ALREADY have ear dashes. So the AI SHOULD know they're there, yet it almost never produces them, and they're always gone by the time I hit a high-res fix. The new version will be at 100%, but... will that improve anything that much?

More comical is tracking all the other features. We pick on AI for getting the wrong number of fingers, but my own art is just as inconsistent: about 50% of my pictures have 5 digits and 50% have 4... The ref sheet has 4, so in actuality, the new data set is a 6% downgrade XD. I might be able to do a few edits to improve this a bit. Likewise for toes, 59% of my data had 3 toes, while 61% had 4 toes. The new set will actually be a significant improvement, at 81% or higher! But it makes me laugh that I picked on AI for drawing hands or feet, and here I don't have consistent fingers or toes on my own character. In fact, I found a couple where I was like "Did I draw them with THREE fingers here?!"

Tail stripes are likewise ALL over the place, and even the new data set barely helps - a pitiful 3-4% improvement in matching the canonical ref sheet.

Apparently getting this data right will be hard. On the plus side, all the colors will be almost exclusively from the official ref color sheet now. That should improve color quality, maybe? There are some good shots of his boy bits too, so I'm really hopeful that I might get better results there.

As for other things I want to try, I'm hoping to leverage the AI this time around to create multiple backdrops for my images. I have them with alphas, so I can readily place my characters in any scene the AI creates. I am also thinking of trying the normal mapping code that comes with the ability to make 3D scenes. If THAT works, I might be able to create my training pictures with multiple light sources and multiple backgrounds, synthetically increasing the amount of data without accidentally teaching the model that A) all pictures of Dante have plain boring backgrounds, or B) all pictures of Dante have the same background. I don't want the AI to attach the background to Dante at all.
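Swapping backgrounds behind an alpha-cut character is plain "over" compositing: out = fg*a + bg*(1-a). A minimal numpy sketch, with random arrays standing in for real images:

```python
import numpy as np

def composite(fg_rgba: np.ndarray, bg_rgb: np.ndarray) -> np.ndarray:
    """Alpha-composite an RGBA foreground over an RGB background."""
    rgb = fg_rgba[..., :3].astype(float)
    alpha = fg_rgba[..., 3:4].astype(float) / 255.0
    out = rgb * alpha + bg_rgb.astype(float) * (1.0 - alpha)
    return out.astype(np.uint8)

# Stand-in images: one 64x64 character cut-out over two different backdrops,
# producing two synthetic training variants of the same pose.
rng = np.random.default_rng(0)
character = rng.integers(0, 256, (64, 64, 4), dtype=np.uint8)
backdrops = [rng.integers(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(2)]
variants = [composite(character, bg) for bg in backdrops]
```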

I've also thought of making the background blue noise, just to see what the AI does with that :D. Blue noise is basically featureless, so there is no "image" to be learned; it is... unlearnable, and lacks any concept of a shape to grasp onto. This is why we use it to blend colors together! On the flip side, it might have also harmed a few of my training images, since for a while I had a habit of adding a little blue noise to my images as a trick to make them look better.
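If anyone wants to play with this too: true blue-noise masks are built with algorithms like void-and-cluster, but a quick approximation of the spectral idea (energy only in high frequencies) is to high-pass filter white noise in Fourier space. A sketch, assuming numpy, with an arbitrary cutoff:

```python
import numpy as np

def blue_noise_approx(size: int = 256, cutoff: float = 0.1, seed: int = 0) -> np.ndarray:
    """Approximate blue noise by zeroing the low frequencies of white noise.
    (Real blue-noise masks use void-and-cluster; this is only the spectral idea.)"""
    rng = np.random.default_rng(seed)
    white = rng.standard_normal((size, size))
    spectrum = np.fft.fftshift(np.fft.fft2(white))
    # Radial frequency grid, centered so the DC term sits in the middle.
    freqs = np.fft.fftshift(np.fft.fftfreq(size))
    fy, fx = np.meshgrid(freqs, freqs, indexing="ij")
    radius = np.sqrt(fx**2 + fy**2)
    spectrum[radius < cutoff] = 0.0  # kill low-frequency (visible-shape) content
    noise = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))
    # Normalize to 0..255 so it can be dropped in as an image background.
    noise = (noise - noise.min()) / (noise.max() - noise.min())
    return (noise * 255).astype(np.uint8)

bg = blue_noise_approx()
```

With the low frequencies gone there are no blobs or gradients left for the model to latch onto as a "background concept".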

The final aspect is that I'm going to be trying my hand at regularization images, though... that will be a new experiment and I'm not sure how well it will go. Supposedly it makes it less likely that the model overtrains on the data, so it produces more original content.
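For reference, in the kohya-ss scripts regularization images are just a second directory of generic "class" examples mixed into training. A hedged fragment — the folder names, repeat counts, and "fennec" class are placeholders I'm assuming, not the actual setup:

```shell
# kohya-ss folder convention is "<repeats>_<token> <class>" / "<repeats>_<class>":
#   dante_dataset/10_dante fennec/   <- the character training images
#   reg_images/1_fennec/             <- generic fennec images for regularization
accelerate launch train_network.py \
  --train_data_dir=dante_dataset \
  --reg_data_dir=reg_images \
  --prior_loss_weight=1.0
```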

Another hard bit I'm running into is that the new YiffyMix V35 doesn't seem to produce as cute a Dante as V34. It's making him look older and more muscular, even though Dante was never particularly muscular. This might just be a consequence of applying a LoRA trained against V34 to V35; I will have to retrain with my old parameters next week and see. But knowing this community, it's also possible they scrubbed all the younger characters from E621. V35 isn't all that much better for anything else, so if I must train on V34, I have a copy, and hooray!

But that's the long and crazy of everything I'm working through on the road to making better Dante pictures. The dream goal is to be able to start making Dante pictures with the model that would be good enough to TRAIN a model on, even if only occasionally. So far, I've made hundreds or thousands of pictures with this, but... nothing I've made has ever been good enough to use as a base for training without me having to go back in and redraw it. The wild part is, once I DO redraw, it creates better training data than I ever could have hoped for. That's part of why I'm excited that the new data set will have 25-38% of its training set created with AI origins. If that works, it's quite possible that a Dante V3 LoRA, potentially on Stable Diffusion XL or another base model, will be 100% AI assisted in origin - or even 100% AI generated. AI training itself is a crazy idea, even if I'm guiding it by saying "This picture good. This picture bad."
Added: 2 years, 4 months ago