Getting used to synthetic visual reality

In my previous blog post I focused on the byproducts of neural networks generating human faces – various visually intriguing glitches and interesting crossovers between categories.

Now, let’s see how real StyleGAN can get (in home conditions). To be able to evaluate the “realness”, and also to deliberately go through some cognitive dissonance shock, the best strategy is to use a selfie. Looking into one’s own eyes is always a strong, maybe a bit narcissistic, but definitely moving thing to do. I was wondering what it would feel like to look into eyes which are not a reflection of the physical me, but rather a generated vision of me, based on a translation of some peculiar numbers.

Why is it so disturbing to look at generated fakes? Seeing myself generated by a neural network was a sort of revelation for me: the worrying thing is not the fact that a machine can render pixels that resemble physical reality, nor our inability to distinguish between real and fake. It’s more about the perception of the whole reality outside.

Our understanding of the outside world strongly depends on what we are used to looking at. This has been shifting constantly throughout history. There were times when people were not exposed to any kind of visual representation of the world, except maybe religious paintings in church or some lame drawings. Before the Renaissance these images didn’t even use realistic perspective, and the parts of an image were often organised only by hierarchy or time. Since the Renaissance, European art aimed for an ever better and more realistic copy of our physical world, until we invented photography. Photography created the illusion that we can document the world exactly, as its pure and indisputable reflection. Until the digital age and hyperrealistic 3D renders. Actually, I wonder why we weren’t already debating the broken trust in visual media back then. Deceptive 3D renders, Photoshop manipulation, even analogue photomanipulation were responsible for so many fakes circulating out there. But it’s only now that the media spread panic about deepfakes and the shocking fact that our eyes are simply too easy to fool to tell what is real and what is fake.

But what actually happened to my selfie?

First we used the StyleGAN Encoder to convert my selfie into latent space. Simply put, the Encoder defined my face by 18 × 512 different values and placed it somewhere in between all other combinations of face values. The selfie is now translated into numbers. The result shows (not surprisingly) the same face.
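
If you want to reproduce this step, the encoding is roughly a two-script affair in the Puzer repo: first align the raw selfie, then let the encoder optimise a latent that reproduces it. A minimal sketch – the script names come from the repo’s README, the selfie filename is hypothetical, and exact arguments may differ between versions:

```python
# Sketch of the encoding step, assuming the Puzer stylegan-encoder repo layout.
# Run from the repo root after placing your selfie in raw_images/:
#
#   python align_images.py raw_images/ aligned_images/
#   python encode_images.py aligned_images/ generated_images/ latent_representations/
#
# The second script saves one .npy latent per face. Loading it back:
import numpy as np

latent = np.load('latent_representations/selfie_01.npy')  # hypothetical filename
print(latent.shape)  # (18, 512): one 512-value vector per StyleGAN synthesis layer
```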

But the result honestly looks quite real – the only odd things are the distorted necklace and the painting-like finish of the hairline. This StyleGAN was trained on photos of human faces to generate human faces, not accessories – therefore things like glasses, earrings, piercings or necklaces act as unwanted noise. So far this is still a good way to spot deepfakes – but I don’t think it will remain that way for too long.

StyleGAN – smile

Now it starts to get interesting – we can move my face along different vectors to either generate a smile, or transform it through gender or age (the arithmetic is sketched below the images). This might be especially beneficial for people with a resting bitch face like me – I never had a proper good-looking frontal smiling headshot! And now imagine all those other possibilities!

StyleGAN – gender
StyleGAN – age
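
The “moving” is literally vector arithmetic on the encoded latent. Here is a minimal numpy sketch, assuming the (18, 512) latent saved by the encoder and the attribute-direction files shipped with the Puzer repo; rendering the shifted latent back into an image needs the repo’s generator, as in the Colab notebook linked at the end:

```python
import numpy as np

# Latent of the encoded selfie and a learned attribute direction.
# The selfie filename is hypothetical; the direction .npy files ship with the Puzer repo.
face  = np.load('latent_representations/selfie_01.npy')        # shape (18, 512)
smile = np.load('ffhq_dataset/latent_directions/smile.npy')    # shape (18, 512)

def move(latent, direction, coeff, layers=8):
    """Shift the latent along a direction. Only the first (coarse/mid) layers
    are moved, as in the repo's example notebook, so fine details stay put."""
    shifted = latent.copy()
    shifted[:layers] = (latent + coeff * direction)[:layers]
    return shifted

more_smile = move(face, smile, 1.5)    # positive coefficient adds a smile
less_smile = move(face, smile, -1.5)   # negative coefficient goes the other way
# Feed more_smile to the StyleGAN generator to render the smiling version.
```

The same trick with gender.npy or age.npy produces the gender and age transformations above.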

The possibilities go far beyond what we are used to looking at. If I can have a smiling face, there can also be me as Asian, or me combined as half human, half cat. But these are still concepts that could be created in image-editing software. What concept goes even further beyond our imagination? Look at the picture below. It’s the representation of my face with negative values. An anti-face. A non-face. Non-existent, non-physical and non-perceptual. I am super excited that StyleGAN has no problem delivering such a concept! What would an idea of a face with negative values look like in your mind? Does it have shapes, textures, colours? Seeing how the neural network sees it is mind-blowing. I’m not saying this representation is right or wrong – it is undoubtedly different.

negative values of my face
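
The anti-face is even simpler to produce: flip the sign of every number in the latent and render the result. A sketch under the same assumptions as above (hypothetical filename, rendering done by the repo’s generator):

```python
import numpy as np

face = np.load('latent_representations/selfie_01.npy')  # shape (18, 512)

anti_face = -face  # every coordinate negated: the "negative values" of my face
# Rendering anti_face through the StyleGAN generator gives the non-face shown above.
```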

And looking through the eyes of AI will not be good or bad, it will just be different. We now spend most of the day looking at screens – communicating and interacting with the world partly through a computer interface, partly still offline, by moving and talking in a physical environment. The digital and non-digital worlds are often experienced at once. We don’t distinguish between these two worlds any more, although I bet there were times when we had to get used to it. The same applies to getting used to the synthetic visual media around us. It’s a new concept, it’s different, but I’m sure we soon won’t care about the difference at all.

Coming back to my revelation: what is scary indeed is to jump into this visual uncertainty. To stop holding on to the idea that what we see must necessarily be real. It was never real after all. What is real is constantly changing, and co-living with AI will definitely change this paradigm again.

Artificial intelligence might in the end teach us a lot about ourselves and our world. It gives us back what we feed her. Sometimes with seemingly unrelated contexts and connections, sometimes it’s not “real enough”, but I think this reflection of our reality is as real as those we have considered real until now. Anyway, we see only what we want to see, right?

Play with these tools too:

StyleGAN Encoder: https://github.com/Puzer/stylegan-encoder
Colab notebook: https://colab.research.google.com/drive/139OhnW0O_3-4IrnUCXRkO9nJn38qcdsi

These experiments and thoughts were formed together with my collaborator Pavol Rusnak. Also check out his blog post about a StyleGAN-generated Czech Prime Minister Andrej Babiš: https://rusnak.io/smejici-se-hyeny/ (text in Czech).