SD & A1111 Tutorial 6: OpenPose

Stable Diffusion and Automatic1111 Tutorial Series (journal link)

Covered in this tutorial

• What is ControlNet
• What is OpenPose
• Example render

Most of the links included are optional reading material, mainly documentation from GitHub and research papers on ControlNet and OpenPose.

What is ControlNet

https://github.com/lllyasviel/ControlNet
https://github.com/lllyasviel/ControlNet-v1-1-nightly (contains details on individual control types)
https://arxiv.org/abs/2302.05543

In their own words, ControlNet is “a neural network architecture to add spatial conditioning controls to large, pretrained text-to-image diffusion models.” In other words, ControlNet can apply a wide variety of controls to Stable Diffusion renders. This is extremely powerful and gives users a profound amount of control over the composition and visuals of renders.
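To make the idea concrete, here is a minimal sketch of ControlNet conditioning a render outside of A1111, using the Hugging Face diffusers library. The model names and the pose image path are illustrative assumptions, not something from this tutorial:

import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Load a ControlNet trained on OpenPose conditioning and attach it to a
# base Stable Diffusion 1.5 checkpoint.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The conditioning image is a rendered pose skeleton, like image 1C below.
pose = load_image("pose.png")
image = pipe("two anthro foxes hugging in a park", image=pose,
             num_inference_steps=20).images[0]
image.save("render.png")

The point is that the pose image is passed in alongside the prompt, and the ControlNet steers the diffusion process so the output follows the skeleton.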

https://github.com/Mikubill/sd-webui-controlnet

The A1111 plugin “sd-webui-controlnet” provides a UI for ControlNet. Each control type is selectable and comes with its own parameter options, and you can combine multiple controls in the same render.
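The plugin also works headlessly: if the webui is started with --api, the same controls can be sent through A1111's /sdapi/v1/txt2img endpoint. The sketch below follows the extension's “alwayson_scripts” payload as I understand it; treat the exact field names and the model name as assumptions to verify against your installed version:

import base64
import requests

with open("pose.png", "rb") as f:
    pose_b64 = base64.b64encode(f.read()).decode()

payload = {
    "prompt": "two anthro foxes hugging in a park",
    "steps": 20,
    "alwayson_scripts": {
        "controlnet": {
            "args": [{
                "input_image": pose_b64,
                "module": "none",  # "none" = the image is already a pose
                "model": "control_v11p_sd15_openpose",  # must match an installed model
            }]
        }
    },
}
r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
r.raise_for_status()
with open("render.png", "wb") as f:
    f.write(base64.b64decode(r.json()["images"][0]))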

Installing ControlNet

https://inkbunny.net/s/3386763 My last tutorial covers how to install the Latent Couple plugin in A1111. You can install ControlNet the same way by searching for “sd-webui-controlnet”.

What is OpenPose

https://github.com/CMU-Perceptual-Computing-Lab/openpose
https://arxiv.org/abs/1812.08008

In their own words, OpenPose is a “real-time multi-person keypoint detection library for body, face, hands, and foot estimation”.

In ControlNet, OpenPose has been adapted into a control type that lets the user define a “pose” for the subjects in a render, which determines the number of subjects and the placement of their body parts in the composition. This is the control I use by far the most often; it is extremely useful for composing subjects in a scene and is also pretty easy to use.

Pros:

• OpenPose in ControlNet makes it trivial to define the number of subjects in a scene and the placement of their body and limbs.
• As far as I can tell, OpenPose does not increase render time at all; if there is an increase, it is barely noticeable. It also does not seem to constrain how much you can upscale.
• The built-in pose editor works just fine. It takes a bit of getting used to, but it is what I use to make poses. (There is a second built-in editor called Photopea; I think it is meant for other controls. The regular one is fine.)
• There is a built-in preprocessor that can take a sample image and turn it into a pose. It doesn’t always work well, but it is usually a good starting point (see the sketch after this list).
• No use of GIMP/Photoshop is needed.
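As a sketch of that preprocessor step outside the UI: the extension also exposes detection over HTTP. The /controlnet/detect route and its field names below are my assumptions from the extension's API docs; verify them against your installed version:

import base64
import requests

with open("sample_render.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

# Ask the OpenPose preprocessor to turn a sample image into a pose.
r = requests.post("http://127.0.0.1:7860/controlnet/detect", json={
    "controlnet_module": "openpose",
    "controlnet_input_images": [img_b64],
})
r.raise_for_status()
with open("detected_pose.png", "wb") as f:
    f.write(base64.b64decode(r.json()["images"][0]))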

Cons:

• OpenPose alone does not solve the problem of rendering multiple subjects that are visually different. You are able to combine OpenPose with Latent Couple to solve this, but having both of these together can make the render difficult to work with.
• Poses will not always behave the way you expect them to. Sometimes renders will do things like merge bodies, assign body parts to the wrong subjects, or have subjects facing the wrong way. Just as with tuning prompts, you will sometimes need to meticulously craft the pose to get the result you want.

Using OpenPose: making a pose

A pose is composed of a set of humanoid stick figures, each one defined by a set of nodes/vertices that correspond to body parts (mainly joints and facial features). The way these vertices are connected is predetermined (e.g. the left shoulder connects to the left elbow). Each body part is assigned an X/Y coordinate in the image, and each one can also be turned off.
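For reference, here is the 18-keypoint body layout and its predetermined connections as I understand the convention (the COCO-style OpenPose skeleton); a small Python sketch:

# Index -> body part for the 18-keypoint OpenPose body model.
KEYPOINTS = [
    "nose", "neck",
    "right_shoulder", "right_elbow", "right_wrist",
    "left_shoulder", "left_elbow", "left_wrist",
    "right_hip", "right_knee", "right_ankle",
    "left_hip", "left_knee", "left_ankle",
    "right_eye", "left_eye", "right_ear", "left_ear",
]

# Predetermined connections, e.g. left shoulder (5) -> left elbow (6).
LIMBS = [
    (1, 2), (2, 3), (3, 4),       # right arm
    (1, 5), (5, 6), (6, 7),       # left arm
    (1, 8), (8, 9), (9, 10),      # right leg
    (1, 11), (11, 12), (12, 13),  # left leg
    (1, 0), (0, 14), (14, 16), (0, 15), (15, 17),  # head
]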

Detailed hands and faces can also be added to any person. This is an option if you need more precise control, but I personally have not gotten much use out of these.

The actual pose is stored and processed as a JSON file, but it can be visualized into an image. For the sample render, 1C is the pose visualization and next to it is the actual pose JSON that can be used to reproduce the render.
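Because the pose is just JSON, you can also edit it outside the browser. The sketch below assumes the schema the built-in editor exports (canvas size plus a flat x, y, confidence list per person); check it against a file you saved yourself:

import json

with open("pose.json") as f:
    pose = json.load(f)

# Each person is a flat list [x0, y0, c0, x1, y1, c1, ...], where index 0
# is the nose in the keypoint layout sketched earlier.
person = pose["people"][0]["pose_keypoints_2d"]
person[0] += 10  # nudge the nose 10 pixels right
person[1] -= 5   # and 5 pixels up

with open("pose_edited.json", "w") as f:
    json.dump(pose, f)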

As I said, the built-in pose editor is usable but there are some minor usability issues to be aware of. Most importantly, there is no undo feature that I know of. The biggest tip I have is to save often by hitting “Send pose to ControlNet”. This takes you out of Edit mode and saves the current in-browser pose. You will have to re-enter the Edit window, but that is better than losing all your work. If you make a mistake, you either have to work with it or hit X to revert to the last time you saved.

Another tip is that you should also save to a JSON file frequently by hitting the “JSON” button under “Edit”. This is super important because you can accidentally close your browser window and lose all your work. So in summary:
• Save pose from editor to browser often.
• Save pose from browser to JSON often.

While you are working on your pose, it is helpful to run renders often. As long as you are not upscaling, these should be just as fast as any regular low-res render. Try different seeds too; sometimes a tricky pose will work better with a different seed.
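If you have the API enabled, a seed sweep is easy to script. This is a sketch using the same assumed endpoint as earlier; add the ControlNet block from the previous sketch to test the pose itself:

import base64
import requests

for seed in (1, 2, 3, 4):
    payload = {
        "prompt": "two anthro foxes hugging in a park",
        "seed": seed,
        "steps": 20,
        "width": 512,   # keep the resolution low while iterating
        "height": 512,
    }
    r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
    r.raise_for_status()
    with open(f"test_seed_{seed}.png", "wb") as f:
        f.write(base64.b64decode(r.json()["images"][0]))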

Example render

Like with some of my previous tutorials, this sample render is of a couple hugging. Since their arms are doing different things, you can clearly see the influence the pose has on the render.

The attached video shows the entire process of me making and testing the pose. Around 2:35, you can see that I do a test render and notice that their noses aren’t touching. A quick correction to the pose fixes this on the next render.

The image I uploaded to ControlNet is just a blank white picture. Usually, I will make a test render without OpenPose and process that into the initial pose, but since I usually just throw that pose away, I didn’t want to do that for this video. Preprocessing a white picture simply creates a blank pose that I can edit.
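Making that blank white input is a one-liner with Pillow, if you would rather not hunt for one:

from PIL import Image

# A plain white 512x512 image; preprocessing it yields an empty pose.
Image.new("RGB", (512, 512), "white").save("blank.png")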

The video only contains two low-res renders. The uploaded image 1A is an upscale using the same prompt and pose; 1B is the original low-res render created during the video. Images 2A and 2B show the settings used, and the attached text files also contain them.

Keywords
male, fox, ai generated, stable diffusion, park, tutorial, tutorials, nose kiss
Details
Type: Picture/Pinup
Published: 4 months, 2 weeks ago
Rating: General

Round
4 months, 2 weeks ago
it might be funny to see some of these errors it makes too :D
Logically
4 months, 2 weeks ago
oooh, interesting idea. I would have to go out of my way to make these mistakes happen, which would be kind of funny.
confusioncookie
4 months, 1 week ago
A great tutorial, thank you. This only works with SD1.5 models, correct?
Logically
4 months, 1 week ago
Great question! I didn't know the answer, so I tried it a bit with SDXL. I haven't worked with SDXL much, so there might be some setup I need to do. I ran into some weird matrix errors with OpenPose.

I found this guide that claims that it is possible to use ControlNet with SDXL. I've referenced this website before, so it should be legit.
https://stable-diffusion-art.com/controlnet-sdxl/
confusioncookie
4 months, 1 week ago
I'll have a look at that link, thank you.

I've been experimenting with XL & Pony models because they have the resolution capability I want and the semi-realistic look that I desire. The furry SD models seem to do much better at a digital paint or anime style image, which is fine, but not what I hope to generate.