TY - JOUR
T1 - Systematic Investigation into the Performance of Neural Networks with Increasing Number of Classes
AU - Natarajan, Sai Abinesh
AU - Madden, Michael G.
N1 - Publisher Copyright:
© This is an open access article published by the IET under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/)
PY - 2024
Y1 - 2024
AB - Neural networks, including variants such as transformers, dominate image- and language-based machine learning applications. Datasets have widely varying numbers of class labels; e.g. ImageNet1K has 1000 classes, while MNIST has just 10. Performance benchmarks also differ significantly: ImageNet's top-1 accuracy increased from 63.3% in 2011 to over 92% today, whereas MNIST's accuracy has stood at 99.8% since 2013. While some tasks are inherently simpler than others, one natural hypothesis is that a lower number of classes contributes to higher performance; with fewer classes, a random guess has a higher probability of being correct. However, we find this is not always the case. Specifically, we test this hypothesis on ubiquitous computer vision tasks - image classification, object detection, and semantic segmentation - examining how performance changes as the number of class labels increases, while controlling for variables such as CNN architecture and training methodology. We use multiple datasets for each task. We find that in image classification and semantic segmentation, performance decreases as the number of classes increases. Conversely, we discover that performance improves with more classes in object detection. We explore this difference further by visualizing and analyzing feature maps in terms of their clustering performance. We conclude that in object detection, the feature map clusters become tighter and better separated as the number of classes increases, leading to an increase in performance. While prior research has explored the relationship between performance and the number of classes theoretically, this study is the first to test it empirically and systematically, particularly in computer vision. This helps to advance our understanding of the performance characteristics of CNNs and of classification models generally.
KW - Convolutional Neural Networks
KW - Deep Learning
KW - Image Classification
KW - Object Detection
KW - Performance Benchmarking
UR - https://www.scopus.com/pages/publications/85216769315
U2 - 10.1049/icp.2024.3297
DO - 10.1049/icp.2024.3297
M3 - Conference article
AN - SCOPUS:85216769315
SN - 2732-4494
VL - 2024
SP - 146
EP - 153
JO - IET Conference Proceedings
JF - IET Conference Proceedings
IS - 10
T2 - 26th Irish Machine Vision and Image Processing Conference, IMVIP 2024
Y2 - 21 August 2024 through 23 August 2024
ER -