Looking completely realistic and being able to discern between real and fake are competing goals. If you can discern the difference, then it does not look completely realistic.
I think what they’re alluding to is generative adversarial networks https://en.m.wikipedia.org/wiki/Generative_adversarial_network where creating a better discriminator that can detect a good image from bad is how you get a better image.
The Brewster’s Millions genie