Deep learning with convolutional neural networks (CNNs) has shown tremendous success in classifying images, as demonstrated by the ImageNet competition (1), whose dataset consists of millions of everyday color images of subjects such as animals, vehicles, and natural objects. For example, recent artificial intelligence (AI) systems have achieved a top-five accuracy (correct answer within the top five predictions) of greater than 96% in the ImageNet competition (2). To achieve such performance, computer vision scientists have generally found that deeper networks perform better; as a result, modern AI architectures frequently have more than 100 layers (2).
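To make the metric concrete, the short sketch below computes top-five accuracy for a batch of predictions. It is purely illustrative of the definition above; the tensor shapes and toy values are our own assumptions, not drawn from any of the cited systems.

```python
import torch

def top5_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fraction of samples whose true class is among the five highest-scoring predictions."""
    top5 = logits.topk(5, dim=1).indices              # (N, 5) indices of the 5 best classes
    hits = (top5 == labels.unsqueeze(1)).any(dim=1)   # (N,) True where the label is in the top 5
    return hits.float().mean().item()

# Toy batch: 4 samples scored over 10 classes (ImageNet itself has 1000 classes).
logits = torch.randn(4, 10)
labels = torch.tensor([3, 7, 0, 9])
print(f"Top-5 accuracy: {top5_accuracy(logits, labels):.2f}")
```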
Because of the sheer size of such networks, which contain millions of parameters, most AI solutions use significantly downsampled images. For example, the famous AlexNet CNN that won ImageNet in 2012 used an input size of 227 × 227 pixels (1), a fraction of the native resolution of images taken by cameras and smartphones (usually greater than 2000 pixels in each dimension). Lower-resolution images are used for a variety of reasons. First, smaller images are easier to distribute across the Web; ImageNet alone is approximately 150 GB of data. Second, common objects such as planes or cars can be readily discerned at lower resolutions. Third, downsampled images make deep neural networks easier and much faster to train. Finally, using lower-resolution images may reduce overfitting and improve the generalizability of deep learning models by encouraging them to focus on important high-level features.
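To illustrate what this downsampling looks like in practice, the following minimal sketch uses a common torchvision preprocessing idiom to reduce a hypothetical multi-megapixel photograph to AlexNet’s 227 × 227 input; the file name and image dimensions are assumptions for the example, not values from the cited work.

```python
from PIL import Image
from torchvision import transforms

# ImageNet-style preprocessing: a multi-megapixel photograph is shrunk to a
# small, fixed-size tensor before it ever reaches the network.
preprocess = transforms.Compose([
    transforms.Resize(256),        # resize so the shorter side is 256 pixels
    transforms.CenterCrop(227),    # crop to AlexNet's 227 x 227 input
    transforms.ToTensor(),         # convert to a float tensor in [0, 1]
])

img = Image.open("photo.jpg")      # hypothetical image, e.g. 4000 x 3000 pixels
x = preprocess(img).unsqueeze(0)   # resulting shape: (1, 3, 227, 227)
print(x.shape)
```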
Given the success of deep learning in general image classification, many researchers have applied the same techniques used in the ImageNet competitions to medical imaging (3). With chest radiographs, for example, researchers have downsampled the input images from originals of more than 2000 pixels in each dimension to about 256 pixels in each dimension. Nevertheless, relatively high accuracy has been reported for the detection of several conditions on chest radiographs, including tuberculosis, pleural effusion, atelectasis, and pneumonia (4,5).
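A typical recipe for this kind of transfer learning is sketched below as an illustration; it is an assumption about common practice, not the exact setup of the cited studies. An ImageNet-pretrained backbone is given a new classification head, and the gray-scale radiograph is downsampled to 256 × 256 and replicated to three channels so that the pretrained RGB weights can be reused. The layer names follow torchvision, and the 14 output findings mirror the ChestX-ray14 labels discussed below.

```python
import torch.nn as nn
from torchvision import models, transforms

# One common recipe (an assumption, not the cited authors' exact method):
# start from an ImageNet-pretrained backbone and replace its classifier head.
model = models.densenet121(weights=models.DenseNet121_Weights.IMAGENET1K_V1)
model.classifier = nn.Linear(model.classifier.in_features, 14)   # 14 chest findings

# Radiographs (>2000 pixels per side) are downsampled to 256 x 256, and the
# single gray-scale channel is replicated to match the RGB pretrained weights.
preprocess = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.Grayscale(num_output_channels=3),
    transforms.ToTensor(),
])
```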
However, subtle radiologic findings, such as pulmonary nodules, hairline fractures, or small pneumothoraces, are less likely to be visible at lower resolutions. As such, the optimal resolution for detecting such abnormalities using CNNs is an important research question. For example, in the 2017 Radiological Society of North America competition for determining bone age on skeletal radiographs (6), many competitors used an input size of 512 pixels or greater. For the DREAM (Dialogue for Reverse Engineering Assessments and Methods) challenge of classifying screening mammograms, resolutions of up to 1700 × 2100 pixels were used in top solutions (7). Recently, for the Society of Imaging Informatics in Medicine and American College of Radiology Pneumothorax Challenge (8), many top entries used an input size of up to 1024 × 1024 pixels.
In their article, “The Effect of Image Resolution on Deep Learning in Radiography,” Sabottke and Spieler (9) address this important question using the public ChestX-ray14 dataset from the National Institutes of Health, which consists of more than 100 000 chest radiographs stored as 8-bit gray-scale images at a resolution of 1024 × 1024 pixels (10). These radiographs have been labeled for 14 conditions, including lung nodule, pneumothorax, emphysema, and cardiomegaly, or as normal (10). The authors used two popular deep CNNs, ResNet-34 and DenseNet-121, and analyzed the models’ ability to classify radiographs at image resolutions ranging from 32 × 32 pixels to 600 × 600 pixels.
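Because both backbones end in adaptive average pooling, they accept variable input sizes without architectural changes; the sketch below (our own illustration, not the authors’ code, with training and data loading omitted) shows how the same two networks can be run across the range of resolutions studied.

```python
import torch
from torchvision import models

# Illustration only: instantiate the two architectures with 14 output classes
# and pass dummy batches through them at several of the studied resolutions.
resolutions = [32, 64, 128, 224, 256, 320, 448, 512, 600]

resnet34 = models.resnet34(num_classes=14).eval()
densenet121 = models.densenet121(num_classes=14).eval()

for size in resolutions:
    x = torch.randn(1, 3, size, size)                    # random dummy batch at this resolution
    with torch.no_grad():
        out_resnet = resnet34(x)
        out_densenet = densenet121(x)
    print(size, out_resnet.shape, out_densenet.shape)    # each: torch.Size([1, 14])
```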
The authors found that the performance of most models tended to plateau at resolutions of around 256 × 256 and 320 × 320 pixels. However, classification of emphysema and lung nodules performed better at 512 × 512 pixels and 448 × 448 pixels, respectively, than at lower resolutions. Emphysema can be subtle in mild cases, manifesting as faint lucencies, which probably explains the need for higher resolution. Similarly, small lung nodules may be “blurred out” and not visible at lower resolutions, which can explain the improvement in classification performance at higher resolutions.
The authors’ work is important. As the application of AI in medical imaging moves forward, we should be more cognizant of the potential impact of image resolution on the performance of AI models, whether for segmentation, classification, or another task. Moreover, groups that create public datasets to advance machine learning in medical imaging should consider releasing the images at full or near-full resolution. This would allow researchers to further understand the impact of image resolution and could lead to more robust models that translate better into clinical practice.