I just wanted to share some of my first impressions while using SDXL 0.9, along with a random image generated with it to shamelessly get more visibility. I would like to see if others had similar impressions, or if your experience has been different.
- The base model, when used on its own, is good for spatial coherence. It basically prevents the generation of multiple subjects in bigger images. However, the result is generally “low frequency”. For example, a full 1080x1080 image looks more like a lazy linear upscale of a 640x640 one in terms of visual detail.
- The detail model is not good for spatial coherence when starting from a random latent. When used directly as a normal model, results are pretty much like those we get from good-quality SD1.5 merges. However, since it has been co-trained to use the same latent space representation, we get the power of latent2latent in place of img2img upscaling techniques.
- The detail model seems to be strongly biased, and that bias will affect the final generation. From what I can see, all nude images in its training set are “censored”, in the sense that they hand-picked high-quality photos of people wearing some degree of clothing.
- While the two models share the same latent space, they do not converge to the same image in generation. A face generated with the first model will be extremely affected by the latent2latent detail injection phase. As I said, I found the detail model very biased, which is potentially a big problem in generation: for example, all the faces I tried to generate converge to more “I am a model” ones, often with issues capturing a specific ethnicity. I can see this being a bit of a problem when training LoRAs.
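If anyone wants to check the “low frequency” impression with numbers instead of by eye, here is a rough sketch of one way to do it. It is not anything from the SDXL tooling: `detail_score`, `bilinear_upscale` and `random_image` are hypothetical helpers I made up for illustration, and the random grids are only stand-ins for real renders (noise is the extreme case of a "detailed" image). The idea is to score local detail as the mean absolute difference between neighbouring pixels, then compare a native image against a linear upscale of a smaller one at the same 640:1080 ratio.

```python
import random


def detail_score(img):
    """Mean absolute difference between horizontally and vertically adjacent pixels."""
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                total += abs(img[y][x + 1] - img[y][x])
                count += 1
            if y + 1 < h:
                total += abs(img[y + 1][x] - img[y][x])
                count += 1
    return total / count


def bilinear_upscale(img, new_h, new_w):
    """Plain bilinear resize of a 2D grid of floats (the 'lazy linear upscale')."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(new_h):
        fy = y * (h - 1) / (new_h - 1)
        y0 = int(fy)
        y1 = min(y0 + 1, h - 1)
        ty = fy - y0
        row = []
        for x in range(new_w):
            fx = x * (w - 1) / (new_w - 1)
            x0 = int(fx)
            x1 = min(x0 + 1, w - 1)
            tx = fx - x0
            top = img[y0][x0] * (1 - tx) + img[y0][x1] * tx
            bottom = img[y1][x0] * (1 - tx) + img[y1][x1] * tx
            row.append(top * (1 - ty) + bottom * ty)
        out.append(row)
    return out


def random_image(h, w, rng):
    """Assumption: uniform noise as a stand-in for a grayscale image in [0, 1]."""
    return [[rng.random() for _ in range(w)] for _ in range(h)]


rng = random.Random(0)
small = random_image(64, 64, rng)       # stand-in for a 640x640 render
native = random_image(108, 108, rng)    # stand-in for a detailed 1080x1080 render
upscaled = bilinear_upscale(small, 108, 108)

ratio = detail_score(upscaled) / detail_score(native)
print(f"upscaled/native detail ratio: {ratio:.2f}")
```

To try it on real outputs, you would replace the random grids with grayscale pixel values from your own images, for example a base-only 1080x1080 render against an upscaled 640x640 one. If the two scores come out close, that would support the “lazy upscale” impression; it is a crude metric though, so take it as a sanity check and not a proper measurement.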
What are your experiences? Have you encountered other issues? Things you liked?