TY - GEN
T1 - Dataset Cleaning - A Cross Validation Methodology for Large Facial Datasets using Face Recognition
AU - Varkarakis, Viktor
AU - Corcoran, Peter
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/5
Y1 - 2020/5
N2 - In recent years, large 'in the wild' face datasets have been released to facilitate progress in tasks such as face detection and face recognition. Most of these datasets are acquired from webpages with automatic procedures. As a consequence, noisy data are often found. Furthermore, in these large face datasets, identity annotations are important because they are used to train face recognition algorithms. However, because these datasets are gathered automatically and are very large, many identity folders contain mislabeled samples, which degrades the quality of the datasets. This work presents a semi-automatic method for cleaning noisy large face datasets using face recognition. The methodology is applied to the CelebA dataset to demonstrate its effectiveness. Furthermore, the list of mislabeled samples in the CelebA dataset is made available.
AB - In recent years, large 'in the wild' face datasets have been released to facilitate progress in tasks such as face detection and face recognition. Most of these datasets are acquired from webpages with automatic procedures. As a consequence, noisy data are often found. Furthermore, in these large face datasets, identity annotations are important because they are used to train face recognition algorithms. However, because these datasets are gathered automatically and are very large, many identity folders contain mislabeled samples, which degrades the quality of the datasets. This work presents a semi-automatic method for cleaning noisy large face datasets using face recognition. The methodology is applied to the CelebA dataset to demonstrate its effectiveness. Furthermore, the list of mislabeled samples in the CelebA dataset is made available.
KW - CelebA
KW - clean face dataset
KW - face datasets
KW - mislabeled identities
KW - noisy samples
KW - semi-automatic cleaning
UR - https://www.scopus.com/pages/publications/85087662807
U2 - 10.1109/QoMEX48832.2020.9123123
DO - 10.1109/QoMEX48832.2020.9123123
M3 - Conference Publication
AN - SCOPUS:85087662807
T3 - 2020 12th International Conference on Quality of Multimedia Experience, QoMEX 2020
BT - 2020 12th International Conference on Quality of Multimedia Experience, QoMEX 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 12th International Conference on Quality of Multimedia Experience, QoMEX 2020
Y2 - 26 May 2020 through 28 May 2020
ER -