Department of Pathology, Yonsei University College of Medicine, Seoul, South Korea
Minsun Jung, Seung Geun Song, Soo Ick Cho, Wonkyung Jung, Chiyoon Oum, Minuk Ma, Seonwook Park, Sergio Pereira, Sanghoon Song, Kyunghyun Paeng, Donggeun Yoo, Chan-Young Ock, Ji-Youn Sung, So-Woon Kim, Heon Song
Background: Human epidermal growth factor receptor 2 (HER2) expression is a predictive marker for HER2-targeted therapy in patients with breast cancer. Because pathologists vary in how they interpret HER2 levels, a method to make the evaluation more consistent is needed. This study evaluated the performance of the artificial intelligence (AI)-based Lunit SCOPE HER2 in assisting pathologists to evaluate HER2 expression levels in breast cancer.
Methods: Lunit SCOPE HER2 was developed with a 1.04 × 10^10 μm^2 area and 7.31 × 10^5 tumor cells from 1,133 HER2 immunohistochemistry-stained whole-slide images (WSIs) of breast cancer, annotated by 113 board-certified pathologists. The AI model is based on a semantic segmentation algorithm comprising two atrous spatial pyramid pooling blocks, for tissue-level and tumor cell-level classification. To validate the model, 209 HER2 WSIs from patients diagnosed with breast cancer were obtained from Kyung Hee University Hospital in Korea and used as an external validation set. Three board-certified pathologists evaluated slide-level HER2 expression (3+, 2+, 1+, or 0) twice: first without AI assistance, then with it. The second reading was performed for WSIs on which the pathologist's reading disagreed with the AI model.
Results: In the external validation set, all pathologists scored the same HER2 grade in 103 WSIs (49.3%), and the Fleiss kappa value was 0.512. The HER2 grade from the AI model and the pathologists was the same in 151 WSIs (72.2%), and the weighted kappa value was 0.844. The three pathologists re-evaluated 43, 63, and 83 WSIs, respectively. After AI assistance, all pathologists scored the same HER2 grade in 156 WSIs (74.6%), and the Fleiss kappa value increased to 0.762 (Table).
Conclusions: This study demonstrates that an AI-powered HER2 analyzer can help achieve consistent HER2 expression level evaluation in breast cancer by reducing interobserver variability. Thus, the AI model can be applied as an assistance tool for pathologists in HER2 grade evaluation.
HER2 grade | Concordance of all pathologists (without AI) | Concordance of all pathologists (with AI) | Fleiss kappa (without AI) | Fleiss kappa (with AI) |
---|---|---|---|---|
0 | 48.0% (12/25) | 77.8% (21/27) | 0.628 | 0.842 |
1+ | 25.7% (9/35) | 68.9% (42/61) | 0.242 | 0.687 |
2+ | 45.8% (44/96) | 68.8% (55/80) | 0.475 | 0.712 |
3+ | 71.7% (38/53) | 92.7% (38/41) | 0.733 | 0.874 |
All | 49.3% (103/209) | 74.6% (156/209) | 0.512 | 0.762 |
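The two agreement measures reported in the table can be illustrated with a short sketch. The abstract does not say how the values were computed, so the code below simply applies the standard definition of Fleiss' kappa and of full concordance (all readers giving the same grade). The function names and the `toy` ratings are hypothetical and are not study data; only the slide counts in the final check (103/209 and 156/209) come from the table.

```python
from collections import Counter

GRADES = ("0", "1+", "2+", "3+")  # HER2 grades used in the abstract


def full_concordance(ratings):
    """Fraction of slides on which every reader assigned the same grade."""
    agree = sum(1 for row in ratings if len(set(row)) == 1)
    return agree / len(ratings)


def fleiss_kappa(ratings, categories=GRADES):
    """Fleiss' kappa for slides each graded by the same number of readers.

    ratings: list of tuples, one tuple of grades per slide.
    Undefined (division by zero) if every rating falls in one category.
    """
    n_slides = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j] = number of readers who put slide i in category j
    counts = [[Counter(row)[c] for c in categories] for row in ratings]
    # Observed agreement per slide, then averaged over slides
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_slides
    # Chance agreement from the overall share of each category
    p_j = [
        sum(row[j] for row in counts) / (n_slides * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical ratings from three readers on four slides (not study data)
toy = [
    ("3+", "3+", "3+"),
    ("2+", "2+", "1+"),
    ("0", "0", "0"),
    ("1+", "2+", "1+"),
]
toy_kappa = fleiss_kappa(toy)
toy_concordance = full_concordance(toy)

# Percentages recomputed from the slide counts in the "All" row of the table
pct_without_ai = round(100 * 103 / 209, 1)
pct_with_ai = round(100 * 156 / 209, 1)
```

Full concordance counts only slides with unanimous grades, whereas Fleiss' kappa also credits partial agreement and corrects for chance, which is why the table reports both.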