by Logically

SD & A1111 Tutorial 2: Prompt & Parameter Tuning

Stable Diffusion and Automatic1111 Tutorial Series (journal link)

Covered in this tutorial

• Writing and tuning prompts, word selection, and prompt weights
• Parameter settings in SD/A1111, what they do, and how to experiment with values
• Links to more comprehensive tutorials on individual topics

On Personal Taste

Just a general bit of advice before we get into things. The values for settings I use here are not meant to be prescriptive. They are just the settings that happen to work for me.

I want the main takeaway from this tutorial to be that you should experiment with your settings to see what works best for you. Generate renders with different values and see which ones you like best. This can be influenced by many factors, but most importantly will come down to personal taste.

X/Y/Z plot script

The grid images in this tutorial are made with the “X/Y/Z plot” script in A1111. This is a super useful tool for testing parameter values. I also use it for generating upscales of a hand picked set of seeds (see Tutorial 1).

To use it, select it from the Script dropdown at the bottom. Set “X type” to the variable you are testing; “X values” is usually just a comma-separated list of values. If you add Y and Z axes, be aware that the script renders every X/Y/Z combination, so don’t use too many values.

Images 3A and 3B show the settings I used for the iteration steps grid in image 2C. The setups for the other grids are similar except the X settings are different. The attached text file contains the exact settings used.
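Because the script renders every axis combination, grid size multiplies fast. Here is a quick sanity-check sketch in Python (grid_size is just an illustrative helper, not part of A1111):

```python
def grid_size(x_values, y_values=None, z_values=None):
    """Number of renders the X/Y/Z plot script produces:
    one image per combination of axis values."""
    count = 1
    for axis in (x_values, y_values, z_values):
        if axis:
            count *= len(axis)
    return count

# "X values" is typically a comma-separated list in the UI:
steps = "10,20,30,50,100".split(",")
cfgs = "5,7,10,15".split(",")

print(grid_size(steps))        # X only: 5 renders
print(grid_size(steps, cfgs))  # X and Y: 5 * 4 = 20 renders
```

Three axes with even five values each is already 125 renders, which is why I keep the Y and Z lists short.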

Prompt: Structure

Let’s start with writing prompts. If you look at the prompt info in AI posts on Inkbunny, you’ll see that prompts are usually just an arbitrary list of terms and phrases separated by commas. As far as I know, the ordering of terms has only a small effect and ultimately doesn’t matter much, so you can put terms in more or less any order you want.

However, this freedom means prompts can get messy. A little organization goes a long way toward remembering where everything is. This is how I typically keep things organized:

• First line: info about the subject, including species, gender, clothing, pose, visible (ahem) body parts, and other visual characteristics.
• The actions/activity the subject is doing can get its own line, unless it fits at the end of the first line.
• If the scene involves certain (ahem) bodily fluids, the descriptions of their placement and amounts usually need their own line.
• The scene, setting, and background objects can get their own line, unless they fit at the start of the last line.
• Last line: visual style info, including point-of-view (front/back/side view), camera angle (low/high angle view), closeness (full body view/closeup), visual style (realistic/anime/cartoon), and quality (masterpiece/high quality/detailed background).

These lines are explicit newlines. I try to keep each line smaller than the width of the A1111 text box.

The negative prompt for me is usually short enough that it all fits on one line.
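To make the layout concrete, here is a purely illustrative Python sketch of the line structure described above; the section names and example terms are my own invention, not anything A1111 requires:

```python
# Illustrative only: each section becomes one explicit line in the prompt box.
sections = {
    "subject":  "female, fox, sundress, standing",
    "activity": "sniffing flowers",
    "scene":    "park, fountain in background, flower bed",
    "style":    "front view, full body view, cartoon, detailed background",
}

def build_prompt(sections):
    # Join sections with explicit newlines; terms stay comma-separated within a line.
    return "\n".join(sections.values())

print(build_prompt(sections))
```

The model sees the whole thing as one prompt either way; the newlines are only for your own sanity when editing.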

Prompt: Word Selection

When you are starting out, it may be confusing to know which words to put in the prompt. Doing a lot of trial and error can be a good way to figure out what words work. But there are other things you can look at.

AI posts on Inkbunny, CivitAI, and other places include prompt data. Find posts with the thing you want and look at the prompt text to see how they made it work. Don’t stick to just one person’s work; everyone does things differently, but the similarities in the words they use will reveal what works best.

I mentioned this in the last tutorial, but it bears repeating: e621.net is a great source for finding the correct words to put into prompts. If you are looking for a specific concept but don’t know the right phrase, check that it is a common e621 search term. There may be different ways to phrase a concept, but the one that generates search results is more likely to do what you want.

Also, be aware that some words are actually textual inversions (TI), also called embeddings. If you don’t have the specific inversion file, the word will not have the same effect. Look for “TI hashes” in the prompt info to see which terms these are, then search CivitAI with the hash value. It should be a .pt file. Putting this file in the “embeddings” folder and including the term in the prompt should apply it.

Just a general PSA: for posts on Inkbunny, do not use artist names in your prompts. This goes against the Inkbunny Acceptable Content Policy. https://wiki.inkbunny.net/wiki/ACP#AI

Image 2A shows the effect of different “[word] in background” prompts. This is made with the “Prompt S/R” (search/replace) option of “X/Y/Z plot”. This replaces the first search term. So for my input: “buildings,fountain,trees,playground,lake”, the word “buildings” was replaced in the original prompt.
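The “Prompt S/R” behavior can be mimicked in a few lines (a sketch of the idea, not the actual script code):

```python
def prompt_sr(prompt, values):
    # The first value doubles as the search term; each value (including the
    # first, which leaves the prompt unchanged) yields one prompt on the axis.
    search = values[0]
    return [prompt.replace(search, v) for v in values]

values = "buildings,fountain,trees,playground,lake".split(",")
for p in prompt_sr("park, buildings in background", values):
    print(p)
```

This is why the first entry in the S/R list must appear verbatim in your prompt, or nothing gets replaced.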

Prompt: Weights

Sometimes, a term in the prompt can be too weak or too strong. Prompt weights are a way to control how strongly a term influences the prompt.

Every term has a default weight of 1.0. Certain syntax can be applied to change this.

• (keyword:1.2) or (keyword:0.8) sets the exact number. This is how I usually do it. In A1111, Ctrl+Up and Ctrl+Down will automatically add parentheses and change the weight by 0.1.
• (keyword)+ or (keyword)++ or (keyword)- or (keyword)-- can also work, but I don’t normally do this.
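A1111 also has a nesting shorthand: as I understand its documented behavior, each layer of parentheses multiplies the weight by 1.1 and each layer of square brackets divides it by 1.1. The arithmetic:

```python
def nested_weight(parens=0, brackets=0):
    # Each '(' multiplies the weight by 1.1; each '[' divides it by 1.1.
    return round(1.1 ** parens / 1.1 ** brackets, 4)

print(nested_weight(parens=1))    # (word)   -> 1.1
print(nested_weight(parens=2))    # ((word)) -> 1.21
print(nested_weight(brackets=1))  # [word]   -> 0.9091
```

I still prefer the explicit (keyword:1.2) form, since stacked brackets are easy to miscount.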

Weights can be as high or low as you want. I typically try to stay under 1.4, and almost never go higher than 1.6. Having to go that high usually means I haven’t found the right words, or I’m just desperate; weights that strong can overpower the rest of your prompt.

I honestly don’t have an explanation of the mathematical formula the weight plugs into, and it’s not really important to know. What helps your understanding most is experimenting with different weights and watching the effect on the output.

Image 2B shows the effect of different prompt weights on “sniffing flowers”. Going too high on this messes up the result. Going too low doesn’t achieve the right pose. This also used the “Prompt S/R” option.

A more comprehensive tutorial: https://getimg.ai/guides/guide-to-stable-diffusion-prom...

Setting: Iteration Steps

AI rendering is an iterative process where an image is refined over multiple steps. Each step takes time, so getting a good result in a reasonable number of steps matters for speed. The step count you test and iterate with should probably be the same one you use for the final render; increasing steps just for the final render will usually change the output.

My default is 30, which is a good sweet spot for my process. The best value for you will depend strongly on your setup and the style you are going for. On a low-end graphics card you can get away with 10–20, though results may be less vibrant. I sometimes see people use the 50–100 range; going higher than that is probably overkill for most applications, and more steps don’t always mean better results.

Image 2C shows the effect of number of steps. I actually think in this case 100 looks nice, but the tripled render time is not good for my tuning process.
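As a rough model (my assumption, ignoring fixed per-image overhead), render time scales about linearly with step count, which is where the “tripled” figure comes from:

```python
def relative_render_time(steps, baseline_steps=30):
    # Rough model: each step costs about the same, so time scales linearly.
    return steps / baseline_steps

print(round(relative_render_time(100), 1))  # ~3.3x the 30-step baseline
print(relative_render_time(15))             # half the time at 15 steps
```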

A more comprehensive tutorial: https://blog.segmind.com/beginners-guide-to-stable-diff...

Setting: CFG

Classifier-free guidance (CFG) is a number that controls how strongly the prompt is followed. It also has a very strong effect on how vibrant the image is: lower values can produce dull, unsaturated renders, while higher values can produce overblown, oversaturated renders and can also make the output less stable.
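Under the hood, most implementations compute two noise predictions per step, one with the prompt and one without, and extrapolate between them. This sketch shows the standard combination formula, simplified to scalars for illustration:

```python
def cfg_combine(uncond, cond, cfg_scale):
    # Start from the unconditional prediction and extrapolate toward
    # (and, for scales > 1, past) the prompt-conditioned prediction.
    return uncond + cfg_scale * (cond - uncond)

print(cfg_combine(0.0, 1.0, 1.0))  # scale 1: just the conditioned prediction
print(cfg_combine(0.0, 1.0, 7.0))  # scale 7: pushed well past it
```

This is why very high CFG values blow out the image: the prediction gets pushed far outside the range either branch actually suggested.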

My default is 7. This is a fairly common value, but you can go higher if you are going for a style that suits it. I’ve occasionally seen around 10-15.

Image 2D shows the effect of CFG. Going over 7 can make for a nicer result, but 30 (comically large) is just completely oversaturated.

A more comprehensive tutorial: https://stable-diffusion-art.com/cfg-scale/

Setting: Sampling Method

My knowledge is lacking on what samplers actually do. From what I’ve read, they affect how image denoising is performed on each step. There are several named samplers available to choose from, and the choice can have a profound effect on the result. This is probably the trickiest setting to pick, since it isn’t just a number.

My default is “Euler a”. I see a lot of people use this. It was probably the A1111 default and I never had a reason to switch it. This will probably work fine for most cases, but if you are willing to experiment some, you may get something that suits you better.

The DPM++ family of samplers is also commonly used, and I think can produce nicer results. However, there are a lot of options in this family, and I don’t know what their differences are.

Image 2E shows 5 arbitrarily selected samplers. I do actually like some of these results.

A more comprehensive tutorial: https://stable-diffusion-art.com/samplers/

Setting: Denoising Strength

When upscaling, denoising strength is a fraction (0.0 to 1.0) controlling how far the upscaler is allowed to stray from the original image. At 0.0 you are basically just scaling up the original image; at 1.0 the original image is completely disregarded.

For upscaling a low resolution image, I default to 0.4. I find that to be the sweet spot on quality vs stability. I’ve seen some people go a bit higher, maybe up to 0.6, which can produce nicer results, but also is more unstable and more likely to result in visual errors. Since upscales take much longer, getting predictable and consistent results is really important.

Denoising is also a setting in img2img, which doesn’t necessarily involve upscaling. When I make edits in GIMP (a free Photoshop alternative), I use img2img to reprocess the image. For these renders I usually use between 0.05 and 0.2 (the lower the better): I’m not trying to change the image, just smooth out my edits.
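A useful rule of thumb (this is how A1111 behaves by default, as I understand it): img2img only runs roughly steps × denoising-strength actual sampling steps, skipping the early part of the schedule. That is why low strengths are both fast and gentle. The exact rounding below is my assumption; UIs vary:

```python
import math

def effective_steps(steps, denoise):
    # Roughly steps * denoise sampling steps actually execute;
    # the rounding here is an assumption for illustration.
    return max(1, math.ceil(steps * denoise))

print(effective_steps(30, 0.4))   # upscale default: ~12 real steps
print(effective_steps(30, 0.05))  # light touch-up: ~2 real steps
```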

Out of all the settings, this is the one I’m most likely to change on any given project. Sometimes I’ll notice that my upscaled renders have minor visual defects not seen in the original. If I’m desperate, I’ll retry the upscale in 0.05 decrements (0.35, 0.3, etc). Sometimes this works, other times I have to fix it some other way.

Image 2F shows the effect of different denoising values. You have to look closely to see the differences, but the higher values do add more detail. 1.0 fails completely, which is typical.

A more comprehensive tutorial: https://stable-diffusion-art.com/denoising-strength/

Setting: Hires Upscaler

As with the sampling method, I don’t know much about the different named upscalers. They determine the algorithm that runs during upscaling.

My default is “R-ESRGAN 4x+”. I switched from “ESRGAN_4x” at some point because I liked it better. There’s tons of options and maybe there are some new ones that are better.

Image 2G shows 5 arbitrarily selected upscalers. The differences seem really minor to me. They all are starting from the same original image, so I think it should be expected that they’re very similar.

A more comprehensive tutorial: https://stable-diffusion-art.com/ai-upscaler/

Setting: Checkpoint Model

This isn’t really a setting and I’ve already gone over models in the last tutorial. But just for fun, Image 2H shows 5 different models on the same prompt: Indigo (realistic, anime, and hybrid), BB95, and Furryrock. Other popular furry models include Fluffyrock, YiffyMix, and Domesticated Mix.

Setting: Seed

Tutorial 1 goes a lot into seed selection, so I won’t repeat that here. Just for fun, Image 2I shows 5 seeds that I hand picked out of seeds 0-29. I settled on seed 18 for this tutorial, but some of these other seeds I kind of like better.

Once you have settled on your personal default settings, the seed is the most important thing that will change on every new project. Seeds can differ from each other significantly, and the prompt you are tuning may work better on a different one.

Keywords: female, fox, ai generated, stable diffusion, park, flowers, tutorial, tutorials
Details
Type: Picture Series
Published: 9 months, 3 weeks ago
Rating: General

Stats
353 views
18 favorites
2 comments

kazenosaga316
9 months, 2 weeks ago
What a delightful guide! I wish there was something like this when I was just starting up with AI generation!
Thank you for your effort!
Logically
9 months, 2 weeks ago
You're welcome! 🥰