Abstract
Neural networks, including variants such as transformers, dominate image and language-based machine learning applications. Datasets have widely varying numbers of class labels; e.g. ImageNet1K has 1000 classes, while MNIST has just 10. Performance benchmarks also differ significantly. Top-1 accuracy on ImageNet has risen from 63.3% in 2011 to over 92% today, whereas accuracy on MNIST has stood at 99.8% since 2013. While some tasks are inherently simpler than others, one natural hypothesis is that a lower number of classes contributes to higher performance: with fewer classes, a random guess has a higher probability of being correct. However, we find this is not always the case. Specifically, we test this hypothesis on ubiquitous computer vision tasks - image classification, object detection, and semantic segmentation - examining how performance changes as the number of class labels increases, while controlling for variables such as CNN architecture and training methodology. We use multiple datasets for each task. We find that in image classification and semantic segmentation, performance decreases as the number of classes increases. Conversely, we find that in object detection, performance improves with more classes. We further explore this difference by visualizing feature maps and analyzing their clustering performance. We conclude that in object detection, the feature map clusters become tighter and better separated as the number of classes increases, leading to an increase in performance. While prior research has explored the relationship between performance and number of classes theoretically, this study is the first to test it empirically and systematically, particularly in computer vision. This helps to advance our understanding of the performance characteristics of CNNs, and of classification models generally.
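The two quantities the abstract reasons about can be made concrete with a minimal sketch. It is not the authors' code: the abstract does not say which clustering metric was used, so the silhouette coefficient here is an assumption chosen because it rewards clusters that are both tight and well separated. The function names and the toy 2-D points are hypothetical, and the chance baseline assumes balanced classes.

```python
import math


def chance_accuracy(num_classes: int) -> float:
    """Expected top-1 accuracy of a uniform random guess (balanced classes assumed)."""
    if num_classes < 1:
        raise ValueError("num_classes must be positive")
    return 1.0 / num_classes


def silhouette_score(points, labels):
    """Mean silhouette coefficient; values near 1 mean tight, well-separated clusters.

    Assumption: a stand-in for the unspecified clustering metric in the abstract.
    """
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    def mean_dist(i, members):
        # Mean Euclidean distance from point i to the other points in `members`.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        a = mean_dist(i, own)  # cohesion: distance within the point's own cluster
        b = min(mean_dist(i, m) for k, m in clusters.items() if k != label)  # separation
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


# Toy "feature" points: two clusters that are tight and far apart,
# versus two clusters that are spread out and close together.
tight = [(0, 0), (0, 1), (10, 0), (10, 1)]
loose = [(0, 0), (0, 6), (4, 0), (4, 6)]
cluster_labels = [0, 0, 1, 1]

print(chance_accuracy(10), chance_accuracy(1000))
print(silhouette_score(tight, cluster_labels), silhouette_score(loose, cluster_labels))
```

The chance baseline falls from 0.1 for 10 classes to 0.001 for 1000, which is the intuition behind the hypothesis. A higher silhouette score on the first toy set than on the second mirrors the "tighter and better separated" description given for the object detection feature maps.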
| Original language | English |
|---|---|
| Pages (from-to) | 146-153 |
| Number of pages | 8 |
| Journal | IET Conference Proceedings |
| Volume | 2024 |
| Issue number | 10 |
| Publication status | Published - 2024 |
| Event | 26th Irish Machine Vision and Image Processing Conference, IMVIP 2024 - Limerick, Ireland; Duration: 21 Aug 2024 → 23 Aug 2024 |
Keywords
- Convolutional Neural Networks
- Deep Learning
- Image Classification
- Object Detection
- Performance Benchmarking
Fingerprint
Dive into the research topics of 'Systematic Investigation into the Performance of Neural Networks with Increasing Number of Classes'. Together they form a unique fingerprint.