A while ago I started my first experiments with AI, using RunwayML and Arttoy images. One of the last manual tasks was sorting through the images to clean the dataset. A clear view of a single toy on a clean background: good. Multiple toys, people in the picture, etc.: bad. Fast forward, and I am now working on a dynamic Self-Portrait using StyleGAN. I have about 450,000 images and definitely don’t want to go through those manually.
Basically, my dataset is made up of the two sorts of situations shown in the images below.
Left: Me working, or at least doing something on the computer.
Right: Empty room, either because I left the camera running accidentally, or because I left the room for a short period of time.
Between these two projects, I rediscovered Teachable Machine, a web-based app by Google that lets you play around with machine learning easily. It uses transfer learning to make training extremely fast, so within a few seconds you can have a custom image classifier.
I say rediscovered because Google has since made it more “Maker”-friendly. They now offer image and sound classifiers as well as an implementation of PoseNet, their human-pose-recognition algorithm. Everything is geared more toward building projects, including an export function for the trained model!
I started up the image-classifier web project and put in 140 images for class 1 (me working) and about 60 for class 2 (empty room). (Note: images have to be square.) Within two minutes, all pictures were uploaded and the training was already finished.
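Since the images have to be square, a preprocessing pass may be needed; one way to satisfy that requirement is a center crop with Pillow (a sketch of my own, not part of the Teachable Machine workflow — the function name is mine):

```python
from PIL import Image  # Pillow


def center_square(img: Image.Image) -> Image.Image:
    """Crop the largest centered square out of an image."""
    w, h = img.size
    side = min(w, h)
    left, top = (w - side) // 2, (h - side) // 2
    return img.crop((left, top, left + side, top + side))
```

A 640×480 webcam frame, for instance, comes out as a 480×480 crop taken from the middle of the image.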
I downloaded the model and used the provided code snippet to run it with Python, only adding a loop to go through all images in a folder and save each one into the respective “sorted” folder based on the model’s class prediction.
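The loop itself is only a few lines. A minimal sketch of the idea, assuming the model is already loaded (the function and folder names here are mine, and `predict` stands in for whatever wrapper you build around the exported model’s prediction call):

```python
import shutil
from pathlib import Path


def sort_images(src_dir, class_names, predict):
    """Move each image into a subfolder named after its predicted class.

    `predict` is any callable that maps an image path to a class index,
    e.g. a thin wrapper around the exported classifier.
    """
    src = Path(src_dir)
    for name in class_names:
        (src / name).mkdir(exist_ok=True)  # one "sorted" folder per class
    for img in sorted(src.glob("*.jpg")):  # non-recursive, top level only
        label = class_names[predict(img)]
        shutil.move(str(img), str(src / label / img.name))
```

Decoupling the folder logic from the model call also makes the sorter easy to test with a dummy predictor before letting it run overnight.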
Finishing the setup around midnight, I let it run overnight without much testing beforehand. So when I woke up today and checked the results, two things became clear: 1. It worked, and quite well too. 2. Some edge cases were probably missing from my tiny sample set for class 1, leading to a number of false negatives, but no false positives.
I then went through the results collecting edge cases, which quite often were literally the cases where I was only partly in the picture, plus some of the rarer images, like recordings when the room was almost dark and the camera switched to black-and-white mode. It was now easy to collect these cases and simply start a second run with a refined model.
Let’s tally up for now. I started with 450k images, of which I knew a small proportion was unsuitable because they showed only an empty room. The first run sorted these into 370k fitting images and 80k “empty” ones, but some declared “empty” were not actually so. The second run identified 15k of those as suitable for class 1, bringing the total to 385k “working” images and 65k “empty” ones. There are probably still some where you can see a sliver of me walking away or similar, but those are just as well declared unfit.
Conclusion: I found it remarkably easy to hack these things together, even as someone who hardly ever touches code. And it worked really well. When I showed it to others, some asked if it might be used to separate blurry from non-blurry images and the like. I can also imagine it being useful for photographers sorting images by more individual, especially subjective, criteria, though of course it might quickly run into problems once the content is too complex to judge. In any case, easily accessible AI tools like this show how they can help us build for situations where you individualize what you want. I am now thinking about how this could be put into a single workflow, and maybe an interface, and how something like t-SNE could supercharge the edge-case selection and labeling, making it more of an interactive machine learning, human-in-the-loop process.