Can Artificial Intelligence Help Reduce False-positive Mammograms?
Editor’s note: October is Breast Cancer Awareness Month. You can read our recent blog posts on breast cancer research and treatment advances here.
In a study published in the AACR’s journal Clinical Cancer Research, a team of scientists from the University of Pittsburgh discuss yet another area of cancer research where artificial intelligence (AI) can potentially solve a decades-long problem: false-positive results and high patient recall rates from breast cancer screening mammography.
Mammography, while an indispensable tool for early detection of breast cancer and potential reduction in breast cancer mortality, has an important drawback. According to the National Cancer Institute, about 10 percent of women receiving mammography are recalled for additional tests because their screening mammogram is read as positive, but only 0.5 percent of those screened are found to have cancer. This means that about 9.5 percent of the women screened needed additional tests but had a false-positive exam. Further, about 50 percent of women screened annually for 10 years in the United States will experience a false-positive result.
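To make the percentages above concrete, here is a minimal Python sketch of the same arithmetic for a group of women screened once. The function name `recall_breakdown` is hypothetical, and the sketch assumes, as the figures above imply, that every cancer found is among the recalled women.

```python
def recall_breakdown(screened, recall_rate=0.10, cancer_rate=0.005):
    """Split one round of screening into recalls, cancers, and false positives.

    Rates default to the approximate figures quoted in the text.
    Assumption: all detected cancers come from the recalled group.
    """
    recalled = round(screened * recall_rate)
    cancers = round(screened * cancer_rate)
    false_positives = recalled - cancers
    return recalled, cancers, false_positives


recalled, cancers, false_positives = recall_breakdown(1000)
print(f"Recalled: {recalled}, cancers found: {cancers}, "
      f"false positives: {false_positives}")
```

Under these assumptions, 95 of every 100 recalled women turn out not to have cancer.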
These recalls and follow-up tests cause pain, anxiety, and stress to the patients, besides adding to the workload of health care professionals and to health care costs overall.
Looking to newer technology for a potential solution, Shandong Wu, PhD, director of the Intelligent Computing for Clinical Imaging lab in the Department of Radiology at the University of Pittsburgh, and his team explored whether AI approaches such as deep learning and convolutional neural networks (CNNs) could evaluate mammograms beyond what is humanly possible.
Currently, the application of AI is being investigated in many areas of cancer research, including diagnosis and treatment, identification of molecular targets, and drug discovery. Several research groups are utilizing AI to study radiological breast images and determine breast cancer risk. In fact, these topics will be discussed extensively at the 30th Anniversary AACR Special Conference on Convergence: Artificial Intelligence, Big Data, and Prediction in Cancer, to be held in Newport, Rhode Island, Oct. 14-17.
So far, AI has been used to study mammograms for different purposes, including classifying breast density and determining whether a lesion is malignant or benign. “Our study is different and unique because we not only looked at the data for positive or negative results, but we explicitly included a third category for the recalled data,” Wu said. “This insight can potentially lead to an in-depth understanding of why these false recalls are happening in radiologists’ decisions when reading screening mammograms.”
Deep learning and CNN – new tools to study mammograms
Wu and colleagues studied whether deep learning, a type of machine learning, can be used to characterize mammograms as positive, negative, or false-positive. Machine learning is an area of AI that uses statistical methods to let a computer program learn from data without being explicitly instructed by the programmer. The CNN, a core architecture of deep learning models, is widely used to analyze images, Wu explained.
Deep learning with CNNs works in a data-driven manner, meaning it can automatically learn, identify, and hierarchically organize features from a large dataset. By contrast, conventional machine learning paradigms require that these features be predefined or formulated explicitly beforehand by researchers. A limitation of the conventional process is that when analyzing medical images, it is often difficult to hand-craft “good” features that can capture the essential aspects of the images required for a specific task, such as stratification or prediction, Wu said. A deep learning CNN can automatically identify these essential features without any hand-crafting.
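As a rough illustration of what a "hand-crafted" feature looks like, the Python sketch below slides a fixed edge-detecting filter over a tiny made-up image. This is not the study's code: the filter `VERTICAL_EDGE`, the toy image, and the function name are all hypothetical. A CNN performs the same sliding-window operation, but the numbers inside its filters are learned from training data rather than chosen by a researcher.

```python
def cross_correlate2d(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hand-crafted filter: responds where brightness increases from left to right.
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

# Toy 4x5 "image": dark on the left, bright on the right.
toy_image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]

feature_map = cross_correlate2d(toy_image, VERTICAL_EDGE)
print(feature_map)  # [[0, 27, 27], [0, 27, 27]]
```

The filter gives a strong response only where its window overlaps the dark-to-bright boundary. Choosing such filters by hand is the step that deep learning replaces with training.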
Wu and colleagues used a total of 14,860 images from 3,715 patients in two independent mammography datasets and built CNN models to investigate six classification scenarios (five binary and one three-class) that would help distinguish images of malignant, negative, and recalled-benign mammograms. The six classification scenarios were designed to reveal different aspects of performance of their CNN models:
- malignant vs. recalled-benign and negative
- malignant vs. negative
- malignant vs. recalled-benign
- negative vs. recalled-benign
- recalled-benign vs. malignant and negative
- malignant vs. negative vs. recalled-benign
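One way to picture the six scenarios listed above is as six different relabelings of the same three image categories. The Python sketch below is only an illustration: the dictionary `SCENARIOS`, the function `relabel`, and the choice of which group is numbered 1 or 0 are assumptions, not details reported by the study.

```python
# Each scenario maps the original category to a class index.
# A category missing from a mapping is left out of that scenario.
# Assumption: the first-named group in each binary scenario is class 1.
SCENARIOS = {
    "malignant vs. recalled-benign and negative":
        {"malignant": 1, "recalled-benign": 0, "negative": 0},
    "malignant vs. negative":
        {"malignant": 1, "negative": 0},
    "malignant vs. recalled-benign":
        {"malignant": 1, "recalled-benign": 0},
    "negative vs. recalled-benign":
        {"negative": 1, "recalled-benign": 0},
    "recalled-benign vs. malignant and negative":
        {"recalled-benign": 1, "malignant": 0, "negative": 0},
    "malignant vs. negative vs. recalled-benign":
        {"malignant": 0, "negative": 1, "recalled-benign": 2},
}


def relabel(categories, scenario):
    """Return class indices for one scenario, skipping excluded categories."""
    mapping = SCENARIOS[scenario]
    return [mapping[c] for c in categories if c in mapping]


images = ["malignant", "negative", "recalled-benign", "negative"]
print(relabel(images, "malignant vs. negative"))        # [1, 0, 0]
print(relabel(images, "negative vs. recalled-benign"))  # [1, 0, 1]
```

Framing the task this way lets the same set of images be reused across all six comparisons.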
“The assumption is that there may be some nuanced imaging features associated with some mammogram images that could lead to a false/unnecessary recall when the images are interpreted by human radiologists, and our goal is to utilize a deep learning CNN-based method to build a computer toolkit to identify those potential mammogram images,” Wu said.
To evaluate how well the models performed, the researchers generated the receiver operating characteristic curve and calculated the area under the curve (AUC). When data from the two independent datasets were combined, the AUC for the six scenarios evaluated ranged from 0.76 to 0.91. The higher the AUC, the better the performance, with a maximum of 1; a value of 0.5 corresponds to chance-level discrimination. “AUC is a metric that summarizes the comparison of true positives against false positives, so it gives an indication not only of accuracy (how many were correctly identified), but also how many were falsely identified,” Wu noted.
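For readers curious how an AUC is obtained for a two-class comparison, here is a minimal Python sketch. It uses a standard equivalent definition: the AUC is the fraction of (positive, negative) pairs in which the model scores the positive case higher, with ties counted as half. The labels and scores are invented for illustration and are not data from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Made-up example: three positive and three negative images with model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(round(auc(labels, scores), 3))  # 0.889
```

In this toy case the model ranks the positive image higher in 8 of the 9 possible pairs, giving an AUC of about 0.89.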
False-positive mammograms may have unique features leading to recalls
The team found that the AUC was relatively high for specific scenarios comparing the recalled-benign images with the negative and malignant images. The best performance was observed for negative vs. recalled-benign, with an AUC of 0.91. The authors argue that the distinction in this scenario suggests there are certain imaging features in the recalled-benign images that result in these women being recalled instead of being identified as negative to begin with.
“Based on the consistent ability of our algorithm to discriminate all categories of mammography images, our findings indicate that there are indeed some distinguishing features/characteristics unique to images that are unnecessarily recalled,” Wu noted. “Our AI models can augment radiologists in reading these images and ultimately benefit patients by helping reduce unnecessary recalls.”
To accelerate clinical translation of their findings, Wu’s team is hoping to test and apply their methods to digital breast tomosynthesis (3D mammography) as it is increasingly being used in clinical settings for breast cancer screening.
What’s in the future?
As the number of applications of AI in cancer research and treatment keeps growing, there is also growing skepticism among many experts about what is realistic: what AI can and cannot do. But considering that cancer is an extremely complex set of diseases that keep evolving, a collaboration between human wisdom and complex algorithms appears inevitable if progress against cancer is to be steered in the right direction. This collaboration will hopefully deliver new discoveries, durable treatments, and better outcomes for patients – sooner.