From time to time I get the question "What AI are you using?" or something similar. The answer is simple: ComfyUI. But sometimes it's followed by more questions: "Is it free?" "Do I need to download it?" "Is it hard to use?" Well: yes, yes, and it depends. That's it for today, good night folks. Okay, I'm just joking. I haven't seen a journal like this about ComfyUI, so hopefully it will help some of you who want to try this thing out but have no clue how to start.
But first things first. It's not just a click-of-a-button thing, as most people think. It has a learning curve. Of course it's a lot easier than drawing, but still, if you think "I write shit and I'll get a masterpiece", then you are wrong. If you write shit, you'll get shit. You have to learn how to prompt your thing and how to configure the settings in a way that works well.
Basically there are two ways to generate images. The easier way is to find a site that lets you make AI-generated images. The other is locally, from your own computer, with the help of a WebUI. Both have pros and cons.
Website:
pros:
⦁ usually low effort, it's ready and easy to use
⦁ you don't need a good setup to run it, you can even use it from your phone
cons:
⦁ most of the time it's a paid service
⦁ content restrictions
WebUI:
pros:
⦁ pretty much no restrictions
⦁ free
cons:
⦁ you need a computer that can run the WebUI
⦁ you have to learn how to use it and set things up, you can't make images right away
There are probably other things too, but I think these are the important details for most people when they have to decide which one they want to use. I know of 3 UI options: EasyDiffusion, Automatic1111 and ComfyUI. For sites I only know one, CivitAI, and this site is very important even if you choose to use a local WebUI. I'll tell you why later.
Comfy isn't the easiest UI to use, but it's worth learning. If you want a slightly easier UI, or you just have a severe fear of spaghetti, you can use A1111/Forge instead.
So what do you need for ComfyUI?
⦁ An NVIDIA GPU with at least 8GB of VRAM, an RTX 3060 or better (it works with other GPUs too, but you'll need a lot of tinkering for that)
⦁ Willingness to learn how this thing works, and maybe to edit
⦁ Patience
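(Not sure what your card can do? Here's a small optional Python check of my own, not part of Comfy, that prints the GPU and VRAM that PyTorch sees. It needs a Python with torch installed; the portable ComfyUI ships its own, so you can use that one too.)

# Optional check: does PyTorch see your GPU and how much VRAM does it have?
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / (1024 ** 3)
    print(f"GPU:  {props.name}")
    print(f"VRAM: {vram_gb:.1f} GB")
    if vram_gb < 8:
        print("Under 8GB of VRAM - expect slow generations or out-of-memory errors.")
else:
    print("No CUDA GPU detected - Comfy would fall back to CPU, which is painfully slow.")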
Do you have all of them? Good, then let's set things up. Download the portable build from the official ComfyUI site ( https://docs.comfy.org/installation/comfyui_portable_wi... ) and follow the instructions. Basically: extract -> open the folder -> run run_nvidia_gpu.bat and let it install. Important: do not close the command window while installing or using Comfy or any WebUI. It should install the latest version, but if you want to be sure Comfy is up to date, just go into the update folder and run update_comfyui.bat, or update_comfyui_and_python_dependencies.bat if you want to make sure the Python dependencies are up to date too.
Okay, now you have Comfy installed, but we aren't finished with setting things up. As I mentioned, CivitAI is a very important site for us, because we need models for the generation. There are 3 model components that are a must for generating:
⦁ Diffusion model
⦁ VAE
⦁ CLIP (text encoder)
But let's keep it simple and use a checkpoint, which is the combination of these 3 things. The checkpoint will determine what you can generate, how it will look and how the AI will interpret your prompts. To get a checkpoint, go to the CivitAI website and search for one that suits your needs. One important note here: there are different checkpoint families like Pony, Illustrious, FLUX, etc. The difference between them is that they process prompts very differently. I myself, and I believe most AI creators on this site, use Illustrious, so I advise you to use it too, at least in the beginning. I usually use these three:
⦁ https://civitai.com/models/503815?modelVersionId=2362113
⦁ https://civitai.com/models/681901?modelVersionId=2343338
⦁ https://civitai.com/models/135477?modelVersionId=2157057
but you can use whatever you want (FLUX won't work with the workflow I'm going to show). When you've found the checkpoint you want to use, click download and save it into the following folder: ComfyUI_windows_portable\ComfyUI\models\checkpoints
There are other things you can download to bring your images to the next level, but I'll point out only one for now: loras. Loras are basically small add-on models trained for a style, character, or pose. They aren't necessary for the generation, but they can help. For example, if you want an image of Jerry from Tom & Jerry, you can use a lora to make it easier to generate. BUT! With the right prompts and editing you can make him without a lora, so you don't need them all the time. The installation is the same as with checkpoints, just download them into the "loras" folder instead of "checkpoints".
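(Optional: if you want to double-check that your downloads landed in the right place, here's a tiny Python sketch of mine that lists the model files in those two folders. The base path is an assumption based on the portable install described above; adjust it if you extracted Comfy somewhere else.)

# List the checkpoint and lora files ComfyUI will be able to see.
from pathlib import Path

BASE = Path(r"ComfyUI_windows_portable\ComfyUI\models")  # adjust to your install location

for folder in ("checkpoints", "loras"):
    print(f"--- {folder} ---")
    path = BASE / folder
    if not path.exists():
        print("(folder not found - check the path)")
        continue
    # most models come as .safetensors files
    for f in sorted(path.glob("*.safetensors")):
        print(f"{f.name}  ({f.stat().st_size / (1024 ** 3):.1f} GB)")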
Got it? Very nice. You only have the very basics, but that's enough to start generating. Let's open Comfy. It will open a command window again; be patient, and in a few seconds it will open your browser, where you'll see your workspace. I know, I know, it doesn't look nice at first, but it's not that hard to use. When you first start it you should get an example workflow, but honestly I don't know what it contains lol, so just press the + icon next to the current workflow (it's at the top of the screen near the Comfy logo), and you'll get a nice empty workflow. I'll guide you through making a simple text to image workflow. If you're lazy, I'll link a ready workflow at the end.
Okay, double click on any empty space, search for the "Load Checkpoint" node and click on it. Easy, right? Next, do the same with "Empty Latent Image", "CLIP Text Encode (Prompt)" twice (you'll need two), "KSampler", "VAE Decode", and either "Save Image" or "Preview Image". So now you have 7 boxes in your workflow. Now we have to connect them; it's simpler than it looks, because you can't connect incompatible things. To make a connection between nodes, click on a dot on one node and drag it to a dot on another node.
1. Let's start with the Load Checkpoint node: connect its MODEL to the KSampler, the CLIP to the 2 CLIP Text Encode nodes, and the VAE to the VAE Decode.
2. Next, connect the 2 CLIP Text Encode nodes' CONDITIONING outputs to the KSampler, one to negative and one to positive. It's important to remember which box is which. To make it easier, you can rename one to positive and the other to negative by clicking on their names. Or do what I do: right click on the box, select color, and make the positive box green and the negative red.
3. Empty Latent Image's LATENT to the KSampler's latent_image.
4. Then the KSampler's LATENT to the VAE Decode's samples.
5. Finally, the VAE Decode's IMAGE to the Save/Preview Image's image.
For the lazybones: https://files.catbox.moe/hx5w2n.json just drag it into an empty space.
+If you want to add a lora, search for "Load LoRA" and connect it between the checkpoint and the CLIP Text Encode nodes and the KSampler. Here's a workflow with the Load LoRA node: https://files.catbox.moe/earzle.json
(If you're curious what this graph looks like under the hood, there's a rough sketch of it in Comfy's JSON form right below.)
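This is a hand-written sketch in Comfy's "API format": each node gets an id, a class_type and its inputs, and a connection is written as [source_node_id, output_index]. The node ids, checkpoint filename, prompts and resolution here are placeholders of mine, and your own exported file may look a bit different, so treat it as an illustration only, not something you need to type in.

workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "your_checkpoint.safetensors"}},           # placeholder name
    "2": {"class_type": "CLIPTextEncode",                                    # positive prompt
          "inputs": {"text": "1girl, cat ears, masterpiece", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode",                                    # negative prompt
          "inputs": {"text": "lowres, bad anatomy", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 832, "height": 1216, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                     "latent_image": ["4", 0], "seed": 123456, "steps": 30,
                     "cfg": 6.0, "sampler_name": "euler_ancestral",
                     "scheduler": "normal", "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "my_first_gen"}},
}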
Okay, now we have to talk about the nodes (the boxes):
⦁ Load Checkpoint: select one of your checkpoints.
⦁ CLIP Text Encode: This is where you put your prompts. Things you want go into the positive and things you don't want go into the negative. (Example: if you want a cat, you write "cat" as a positive prompt. If you want a black cat, you write "cat, black fur" as positive.) Important: you also have to write quality and style prompts to determine those, and checkpoints and loras have recommended prompts that are mentioned on their CivitAI pages; it's recommended to use those too.
⦁ Empty Latent Image: Here you can set the image's dimensions and the batch size.
⦁ KSampler:
Seed: it creates the noise for the generation; it's just a random number. If you run the same settings with different seeds you'll get different images. Keep the seed control on randomize if you want different images, set it to fixed if you want to make slight changes but keep a similar image.
Steps: the number of iterations the AI model uses to refine a noisy image into the final image. More steps usually give better images but take longer to generate. I think you should keep it between 20 and 60.
CFG: how closely the AI will follow the prompts. Lower CFG gives more freedom to the AI.
Sampler and scheduler: the algorithm for how noise is removed from the image. You'll only know which suits you best if you play around with them.
Denoise: how much denoising the AI does. For text to image use 1.0.
⦁ VAE Decode: nothing to do here.
⦁ Save/Preview Image: you can give a prefix to the image name if you want. If you use Save Image, the image is automatically saved here: ComfyUI_windows_portable\ComfyUI\output. If you use Preview Image, you have to save the images manually or they will be lost after you close the application.
After you've filled out and set up everything, you can hit the blue RUN button at the bottom of the screen and wait till the goodies are done baking. If you did everything right you'll have an image that resembles what you imagined; if not, you have to try different prompts or settings, maybe a different checkpoint or lora. If it resembles what you wanted but you don't like it, just hit that run button again, maybe the next seed will be better. If you really can't figure something out, here is a tip: you can go to IB and check out someone else's settings, maybe you'll find something. But I advise against copying all the prompts all the time; try to find your own style and methods. Or you can just ask them nicely for a tip.
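(Side note for the tinkerers: the browser isn't the only way to press RUN. A running Comfy also listens on a small local HTTP API, by default at http://127.0.0.1:8188, and you can queue a workflow by POSTing its API-format JSON to /prompt. The Python sketch below assumes you've exported your workflow with the "Save (API Format)" option, which you may need to enable via the dev/API options in Comfy's settings, to a file called workflow_api.json, and that node "5" is your KSampler like in the sketch earlier; both are assumptions, adjust them to your own file.)

# Queue one generation against a locally running ComfyUI (default port 8188).
import json
import random
import urllib.request

with open("workflow_api.json", "r", encoding="utf-8") as f:  # your exported API-format workflow
    workflow = json.load(f)

# Give the KSampler a fresh random seed so every run makes a new image.
# "5" is the KSampler's node id in the sketch above; check your own file.
workflow["5"]["inputs"]["seed"] = random.randint(0, 2**32 - 1)

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request("http://127.0.0.1:8188/prompt", data=payload,
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))  # Comfy replies with the queued prompt id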
I hope this helped. This is only a simple txt2img workflow and there are a lot more things you can do with Comfy. If there's a need for more tutorials, I'll probably make another one later. Here is a useful video series about AI image generation: https://www.youtube.com/watch?v=IIy3YwsXtTE&t=13s
Damn, not bad! That is quite the good tutorial. It also showcases that there is quite some work behind making actually higher quality content with AI. There is still a huge stigma around AI, that it's just one push of a button. It can be, of course, but what you get out that way is most of the time not the best quality. There are a lot of things to think of if the goal is to actually make specific content matching what you envision. Well done and good work!
Thank you, I really just wrote down the most basic thing you can do, but I think it's good for a start. If there's a need for another tutorial with more workflows and stuff, I'll do a 2nd journal for that.
By the way, to anyone who isn't rich and thus can't give their kidney for a good GPU, there's a service called Runpod that allows you to rent GPUs in the cloud. The best part is that you don't need to use your own resources and you have pretty much the same freedom as you would in a local installation. I made a tutorial for it, but it's focused on A1111. They have ComfyUI as well, though.