How Does Gender Balance In Training Data Affect Face Recognition Accuracy?

Abstract

Deep learning methods have greatly increased the accuracy of face recognition, but an old problem persists: accuracy is usually higher for men than for women. It is often speculated that lower accuracy for women is caused by under-representation in the training data. This work investigates whether female under-representation in the training data is truly the cause of lower accuracy for females on test data. Using a state-of-the-art deep CNN, three different loss functions, and two training datasets, we train each combination on seven subsets with different male/female ratios, for a total of forty-two trainings, and test the resulting models on three different datasets. Results show that (1) gender balance in the training data does not translate into gender balance in the test accuracy, (2) the “gender gap” in test accuracy is minimized not by a gender-balanced training set but by a training set with more male images than female images, and (3) training to minimize the accuracy gap does not result in the highest female, male, or average accuracy.
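As a rough illustration of how the experimental grid reaches forty-two trainings, the sketch below enumerates the combinations in Python. The abstract gives only the counts, so every label here (`loss_1`, `train_set_1`, `ratio_subset_1`, `test_set_1`, and so on) is a hypothetical placeholder, not the name of an actual loss function, dataset, or ratio. The evaluation count assumes each trained model is tested on each of the three test datasets.

```python
from itertools import product

# Placeholder labels only: the abstract states the counts, not the names.
LOSSES = [f"loss_{i}" for i in range(1, 4)]            # three loss functions
TRAIN_SETS = [f"train_set_{i}" for i in range(1, 3)]   # two training datasets
SUBSETS = [f"ratio_subset_{i}" for i in range(1, 8)]   # seven male/female ratio subsets
TEST_SETS = [f"test_set_{i}" for i in range(1, 4)]     # three test datasets

# One training per (loss, training dataset, ratio subset) combination.
trainings = list(product(LOSSES, TRAIN_SETS, SUBSETS))

# Assumption: every trained model is evaluated on every test dataset.
evaluations = list(product(trainings, TEST_SETS))

print(len(trainings))    # 3 * 2 * 7 = 42
print(len(evaluations))  # 42 * 3 = 126
```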

Publication
2020 International Joint Conference on Biometrics (IJCB)
Vítor Albiero
Research Scientist

My research interests include responsible AI, computer vision, machine learning and biometrics.