This study evaluated the performance of a DLM system in the differential diagnosis of superficial soft-tissue masses, with particular attention to its value for both less experienced and experienced radiologists. DLM assistance significantly improved the diagnostic performance of both radiologists.
DLM-1 and DLM-2 are two deep learning diagnostic models. DLM-1 was trained to distinguish benign from malignant masses and, as shown in Table 2, performed excellently. In the validation cohort, DLM-1 achieved an AUC of 0.992 (95% CI: 0.980, 1.0) and an accuracy (ACC) of 0.987 (95% CI: 0.968, 1.0), strongly suggesting that the model distinguished benign from malignant masses more accurately than the clinicians. DLM-2 was trained to classify the five most common benign masses (lipomyoma, hemangioma, neurinoma, epidermal cyst, and calcifying epithelioma); its AUCs in the validation cohort were 0.986, 0.993, 0.944, 0.973, and 0.903, respectively. In test cohort B, the DLM performed slightly worse because the ultrasound images in that cohort were acquired on machines of a different make and model from those used at the other two centers. Overall, all performance metrics of DLM-2 were approximately 0.9 or higher, indicating that DLM-2 classifies the five types of benign soft-tissue masses well. Combined, the two models can accurately diagnose soft-tissue masses. Moreover, unlike human readers, a deep learning model is not influenced by subjective judgment, so it can classify lesions accurately and consistently and help reduce the missed diagnoses and misdiagnoses that arise from subjective interpretation of disease types.
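The AUCs and 95% CIs above are standard discrimination metrics. The sketch below shows one common way to compute them, assuming a Mann-Whitney estimate of the AUC and a percentile bootstrap for the CI; the source does not state which CI method was used, and all function names here are hypothetical. For a multiclass model such as DLM-2, a per-class AUC can be obtained one-vs-rest, treating one benign type as positive and the other four as negative.

```python
import random

def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative case
    (ties count as 0.5). labels are 1 (positive) or 0 (negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap CI, resampling cases with replacement.
    Assumption: percentile bootstrap; the source does not specify its CI method."""
    rng = random.Random(seed)
    idx = range(len(scores))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if 0 < sum(ys) < len(ys):  # skip resamples that contain only one class
            stats.append(auc([scores[i] for i in sample], ys))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(scores, labels), lo, hi

def one_vs_rest_auc(prob_rows, true_classes, cls):
    """Per-class AUC for a multiclass model: class `cls` is positive, all others negative."""
    scores = [row[cls] for row in prob_rows]
    labels = [1 if t == cls else 0 for t in true_classes]
    return auc(scores, labels)
```

For example, `auc([0.9, 0.4, 0.35, 0.8, 0.2, 0.1], [1, 1, 0, 0, 0, 0])` returns 0.875, because 7 of the 8 positive-negative pairs are ranked correctly.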
In the reader study, DLM assistance substantially improved the radiologists' diagnostic accuracy for both benign versus malignant differentiation and benign subtype classification, with the larger gain in benign classification. The only exception was calcifying epithelioma: because the radiologists' unaided accuracy for this lesion was already high, DLM assistance produced no significant improvement. In addition, with DLM assistance, a junior radiologist could achieve the diagnostic accuracy of a senior radiologist. Thus, the DLM has clinical value in assisting radiologists in the diagnosis of soft-tissue masses.
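Because each radiologist reads the same cases with and without the DLM, the accuracy comparison is paired. The source does not name its significance test; McNemar's test is one standard choice for paired accuracy and is used here purely as an assumption. The sketch below counts the discordant cases and applies the continuity-corrected chi-square statistic; the function names are hypothetical.

```python
import math

def accuracy(correct):
    """Proportion of cases read correctly (correct is a list of booleans)."""
    return sum(correct) / len(correct)

def mcnemar(unaided_correct, assisted_correct):
    """McNemar chi-square test with continuity correction for paired reads of
    the same cases without and with DLM assistance. Returns (chi2, p_value)."""
    if len(unaided_correct) != len(assisted_correct):
        raise ValueError("both reading sessions must cover the same cases")
    pairs = list(zip(unaided_correct, assisted_correct))
    # Only discordant pairs carry information about a change in accuracy.
    b = sum(1 for u, a in pairs if u and not a)  # right unaided, wrong with DLM
    c = sum(1 for u, a in pairs if a and not u)  # wrong unaided, right with DLM
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of any change
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With hypothetical counts of 2 cases lost and 10 cases gained under assistance, the statistic is (|2 - 10| - 1)^2 / 12, about 4.08, giving p of roughly 0.043.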
We used Grad-CAM to visualize the DLM. When we compared the regions the DLM attended to most with those identified by the radiologists, we found many shared areas of focus: the two regions overlapped completely or mostly in more than 75% of comparisons. For example, (1) for malignant masses [29], both focused on the rich blood flow inside the lesion (Fig. 4a); (2) for lipomyoma [30], both focused on the strong echogenic lines inside the lesion (Fig. 4b); (3) for hemangioma, both focused on the obvious internal honeycomb structure and the posterior acoustic enhancement (Fig. 4c); (4) for neurinoma [31, 32], both focused on the “bright cap sign” of the lesion (Fig. 4d); (5) for epidermal cyst [33, 34], both focused on the posterior acoustic enhancement (Fig. 4e); and (6) for calcifying epithelioma [35, 36], both focused on the marked posterior acoustic attenuation (Fig. 4f).
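Grad-CAM weights each feature map of a convolutional layer by the spatial average of the gradient of the class score and keeps the positive part of the weighted sum. The sketch below reproduces that computation from precomputed activations and gradients; extracting them from the trained DLM is framework-specific and omitted. The source does not describe how overlap with the radiologists' regions was graded, so `overlap_fraction` and its 0.5 threshold are a hypothetical illustration only.

```python
def grad_cam(activations, gradients):
    """Grad-CAM map from one conv layer. activations and gradients (of the
    target class score w.r.t. those activations) are nested lists [K][H][W]."""
    K = len(activations)
    H, W = len(activations[0]), len(activations[0][0])
    # Channel weights: global-average-pooled gradients.
    alphas = [sum(sum(row) for row in gradients[k]) / (H * W) for k in range(K)]
    # Weighted sum of feature maps, then ReLU to keep positive evidence only.
    cam = [[max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(K)))
            for j in range(W)] for i in range(H)]
    # Scale by the maximum so the map lies in [0, 1] for display.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam

def overlap_fraction(cam, reader_mask, threshold=0.5):
    """Hypothetical measure: fraction of the radiologist-marked cells that fall
    inside the CAM hotspot (cells with cam >= threshold)."""
    marked = [(i, j) for i, row in enumerate(reader_mask)
              for j, m in enumerate(row) if m]
    if not marked:
        return 0.0
    hit = sum(1 for i, j in marked if cam[i][j] >= threshold)
    return hit / len(marked)
```

In practice the coarse map is upsampled to the input image size before being overlaid on the ultrasound image, as in Fig. 4.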
However, the DLM and the radiologists also differed in many of their areas of focus. For example, (1) for malignant masses, the radiologists focused on the sharp but irregular margins of the lesion, whereas the DLM focused on the hyperechoic rim of uneven thickness surrounding the lesion, which reflects the many small interfaces created by infiltration and to which the radiologists paid insufficient attention (Fig. 4a); (2) for lipomyoma, when thick lines were few, the DLM focused on the thick lines, and when thin lines were numerous, it focused on pairs of thin lines lying very close together. Numerous fine lines indicate many normal fascial lines within the lesion, suggesting that it is benign, whereas malignant masses contain fewer fascial lines; sonographers generally do not pay attention to this feature (Fig. 4b); (3) for neurinoma, the DLM focused more on blood flow signals inside the lesion, which indicate a solid nodule (Fig. 4d); and (4) for epidermal cyst, the DLM focused on the origin of the lateral acoustic shadow, which indicates that the lesion wall is smooth at that site and which radiologists do not easily notice at a glance (Fig. 4e).
We found both many similarities and many differences between the regions emphasized by the DLM and the signs used by the radiologists. The similarities further support the validity and feasibility of the model and can help physicians quickly locate the key parts of a lesion. The differences can point clinicians to additional lesion regions worth attention and offer new ideas for clinical diagnosis. One possible explanation lies in our image labeling: rather than tracing the lesion boundary in detail as in conventional annotation, we cropped each image with a wide field of view, which gave the model more room to discover features on its own. Whereas a conventionally trained model sees only the details that physicians intend it to see, our approach may allow the model to discover details that physicians have missed.
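The labeling choice described above amounts to cropping a generous region around each lesion instead of tracing its boundary. A minimal sketch, assuming an axis-aligned lesion bounding box and a hypothetical margin of 50% of the box width and height on each side; the source does not report the actual field of view, and the helper names are illustrative.

```python
import math

def expand_box(box, image_w, image_h, margin=0.5):
    """Enlarge a lesion box (x0, y0, x1, y1) by `margin` times its width and
    height on every side, clamped to the image, so the crop keeps the
    surrounding tissue. The default margin is a hypothetical choice."""
    x0, y0, x1, y1 = box
    dx = (x1 - x0) * margin
    dy = (y1 - y0) * margin
    return (max(0, math.floor(x0 - dx)), max(0, math.floor(y0 - dy)),
            min(image_w, math.ceil(x1 + dx)), min(image_h, math.ceil(y1 + dy)))

def crop(image, box):
    """Crop a row-major image (list of pixel rows) to the half-open box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]
```

For a 640 x 480 image and a lesion box of (100, 100, 200, 160), `expand_box` returns (50, 70, 250, 190), so the crop is four times the area of the tight box and retains the adjacent fascia, shadowing, and surrounding tissue.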
To our knowledge, the only closely related work is an artificial intelligence model proposed by Benjamin Wang et al. [22] to distinguish soft-tissue masses. Their model performed well in distinguishing benign from malignant masses. However, their work has several limitations. First, their sample was small (n = 419), and many cases lacked two-dimensional grayscale or color Doppler ultrasound images. Second, they had no external test cohort, so the model was not validated on data from other hospitals and its reported performance is less convincing. Third, although their model distinguished benign from malignant masses well, it failed completely to identify the three types of benign masses, and their conclusions did not even mention benign differentiation. Finally, the artificial intelligence model used in their study was simple in structure and inefficient, which limits its clinical value. In contrast, we propose and validate a DLM that addresses these deficiencies and achieves excellent performance, providing a more effective and clinically applicable model for the differentiation of soft-tissue masses.
The main limitation of this study relates to the design of the reader study, which involved two specialist radiologists. In the reader study, the radiologists could interpret only selected static two-dimensional grayscale and CDFI images, whereas in practice radiologists combine patient history, clinical symptoms, and real-time dynamic imaging to reach a diagnosis. Because the reader study did not account for this, it may have underestimated the radiologists' performance. Another limitation is that the malignant cases were too few and too heterogeneous to be further subclassified. In future work, we will collect more clinical data to classify malignant masses, which may further improve the diagnostic performance of the DLM system.