Somewhere inside Twitter, an algorithm decides what counts as striking or important in your oddly sized images. That notion, saliency, drives the choices of the automated cropping function. But can saliency be biased?
I was working on a story about cropping today when this tweet on the subject popped up on Twitter:
For years and years I was professionally burdened by the cropping of images. Every social media channel or website template requires some form of cropping to make images fit, mostly for previews or thumbnails. It is a tedious, humble task, something you would love to see automated. Personally I admire the talent and ingenuity that gave rise to this technology, but the huge impact of unintended mistakes and biases calls for more transparency in how these AIs work. And I was shocked when I repeated the experiment from Bascule and tried some others. I had no idea why it happened, and I wanted to find out.
When you need only a couple of these sizing or ratio adjustments, you do the cropping by hand. Squeezing an image into a much smaller area, especially one with a different width-to-height ratio, looks ugly or loses what matters. So you take out the digital scissors, make some aesthetic considerations, maybe scale the image, and cut something off the sides to make it fit.
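The arithmetic behind this manual chore is simple enough to sketch. The function below, a minimal illustration and not any particular tool's implementation, computes the largest centered crop box that matches a target aspect ratio:

```python
def center_crop_box(width, height, target_ratio):
    """Return (left, top, right, bottom) of the largest centered crop
    with the given width/height aspect ratio."""
    if width / height > target_ratio:
        # Image is too wide: keep the full height, trim the sides.
        new_w = round(height * target_ratio)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    else:
        # Image is too tall: keep the full width, trim top and bottom.
        new_h = round(width / target_ratio)
        top = (height - new_h) // 2
        return (0, top, width, top + new_h)

# A 4000x3000 photo cropped to a 2:1 preview ratio loses equal
# strips from top and bottom:
print(center_crop_box(4000, 3000, 2.0))  # (0, 500, 4000, 2500)
```

Note that this always keeps the center, which is exactly the assumption that breaks down once the interesting part of a photo sits near an edge.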
Automated cropping with controlled input
Here you start cropping automatically, and you make sure the images going in will turn out fine once some simple cropping procedures are applied. The rules might dictate that you always focus on the center content, which requires some caution. Agree on some visual rules with yourself and the team and you will be fine. Stupid things will happen, but in general you can correct them.
Cropping photos from users with AI
But once you let a crowd of users upload freely, the rules will be broken and you need a different approach to make cropping work. Here Artificial Intelligence comes in. You set the basic rules for finding the area of interest, then have the algorithm (machine) learn on big piles of data, probably assisted by a group of people. Once the code reaches a level of maturity and you are confident it will perform, you can start using it on new images coming in from real users, and, for instance, have it pick a face.
So how does Twitter approach automatic cropping? Do they look for faces? The only people who know what is in the AI recipe are the cooks. And the nice thing is that Twitter is in this case pretty transparent about how things work, and allowed the engineers responsible for automatic cropping, Lucas Theis and Zehan Wang, to publicly share their story in detail:
Speedy Neural Networks for Smart Auto-Cropping of Images
*The ability to share photos directly on Twitter has existed since 2011 and is now an integral part of the Twitter…* — blog.twitter.com
In their story they make very clear that their cropping AI does not work with faces. At least, not any more.
Previously, we used face detection to focus the view on the most prominent face we could find. While this is not an unreasonable heuristic, the approach has obvious limitations since not all images contain faces.
So when it is not faces that drive the decision, what is then?
Saliency of the topic
The Twitter engineers explain that a better way to crop is to focus on “salient” image regions: a region has high saliency if a person is likely to look at it when freely viewing the image. They refer to published research on visual saliency.
We suppose that the most visually salient areas of a photo are also the most relevant ones to the users.
This suggests that the reason for picking one of the faces in the uploaded portraits on Twitter is that that area of the picture is scored as more salient. Saliency is an abstract term, but it boils down to: that which jumps out. And although the concept is hard to pin down rationally, intuitively you probably sense what is going on, because we humans work with saliency as well. But real people would probably not agree on what is the most striking or important part of a picture.
The real answer is that “saliency” is based on a combination of visual elements (colour, shape, contrast) and what the people involved in training the algorithm, on sets of photos, deemed striking or important. The algorithm later applies this learned notion of saliency to newly uploaded photos.
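The mechanism the engineers describe can be sketched in miniature: score every pixel for saliency, then place the crop window where the total saliency is highest. In the sketch below the saliency map is a hand-made toy grid, standing in for the output of Twitter's trained neural network; only the window-placement step is shown.

```python
def crop_by_saliency(saliency, crop_w, crop_h):
    """Find the (left, top) corner of the crop_w x crop_h window
    that covers the most total saliency. `saliency` is a 2D list
    of scores, one per pixel."""
    rows, cols = len(saliency), len(saliency[0])
    best, best_pos = -1.0, (0, 0)
    for top in range(rows - crop_h + 1):
        for left in range(cols - crop_w + 1):
            total = sum(saliency[r][c]
                        for r in range(top, top + crop_h)
                        for c in range(left, left + crop_w))
            if total > best:
                best, best_pos = total, (left, top)
    return best_pos

# A toy 4x6 saliency map with a "hot spot" on the right-hand side:
smap = [
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 9, 9],
    [0, 0, 0, 0, 9, 9],
    [0, 0, 0, 0, 1, 1],
]
# A 3x3 crop snaps to the salient region instead of the center:
print(crop_by_saliency(smap, 3, 3))  # (3, 0)
```

Everything here hinges on the scores in the map: whatever the training data and the people labelling it deemed salient is exactly what the window will lock onto.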
Where to go from here
In the Twitter case it is not clear where and how these training images were sourced, or who made up the jury of people that helped train the algorithm. Twitter has already offered to open-source the cropping code, but this perhaps calls more for Explainable AI (XAI): an approach where the user can trace back how inferences are made, shedding light on the black box we have now. Probably a bit heavy for cropping images on Twitter, but very much required for decisions in education, insurance, health care, etc.
And to prevent mishaps with your own Twitter pics, I would recommend uploading a properly sized image. To speed things up you can use this tool by Sprout:
Social Media Image Resizing Tool | Landscape by Sprout Social
*Your go-to social image resizing tool for Instagram, Twitter, Facebook, LinkedIn, Pinterest, YouTube and more.…* — sproutsocial.com