SM Journal of Radiology

Impact of Criteria, Training, and Diagnostic Certainty on Community Radiologists’ Assessment of Imaging-Detected Extranodal Extension in HPV Positive Oropharyngeal Carcinoma

[ ISSN : 3067-9702 ]


Received: 16-Sep-2025

Accepted: 08-Nov-2025

Published: 10-Nov-2025

Shao Hui Huang, Eugene Yu, Jie Su, Kristoff Nelson, Marie Duguet Armand, Youri Kaitoukov, Brij Kapadia, Nayela Keen, Anne C Kim, Cerny Milena, Sapna J Palrecha, Anne Preville-Gendreau, Kirk Simon, Suhad Tantawi, Joongchul Paul Yoon, Horia Vulpe, Wei Xu, Houda Bahig, William Lydiatt, Barton F. Branstetter IV, Ezra Hahn, Brian O’Sullivan

1Department of Radiation Oncology, Princess Margaret Cancer Center/University of Toronto, Canada

2Department of Otolaryngology-Head and Neck Surgery, Princess Margaret Cancer Center/University of Toronto, Canada

3Department of Neuroradiology and Head and Neck Imaging, Princess Margaret Cancer Centre, University of Toronto, Canada

4Department of Biostatistics, Princess Margaret Cancer Center/University of Toronto, Canada

5Department of Radiology, Centre Hospitalier de l’Université de Montréal (CHUM)/Université de Montréal, Canada

6Department of Radiology, Hôpital de Saint-Eustache, Centre intégré de santé et de services sociaux des Laurentides, Canada

7Department of Radiology, Jewish General Hospital, Canada

8Department of Radiology, the Permanente Medical Group, USA

9Department of Radiology, the Regional Hospital du Grand Portage, Canada

10Department of Radiation Oncology, the Permanente Medical Group, USA

11Department of Radiation Oncology, Centre Hospitalier de l’Université de Montréal (CHUM)/Université de Montréal, Canada

12Department of Surgery, Creighton University, USA

13Department of Radiology, University of Pittsburgh, USA

 

Corresponding Author:

Brian O’Sullivan, Department of Radiation Oncology, University of Toronto, RM 7-323, Department of Radiation Oncology, the Princess Margaret Hospital, 700 University Ave. Toronto, ON, Canada, M5G 1Z5, Tel: (416) 946-2983; Fax: (416) 946-4586

Keywords

HPV-Positive Oropharyngeal Carcinoma; Extranodal Extension; Computed Tomography; Magnetic Resonance Imaging; Inter-Rater Concordance

Abstract

Purpose/Objective(s): The UICC and AJCC have recently added imaging-detected extranodal extension (iENE) as a cN modifier for the upcoming 9th edition TNM Classification (TNM9) of HPV-positive oropharyngeal carcinoma (HPV+ OPC). While academic radiologists demonstrate good inter-rater reliability in identifying iENE, its reliability among community radiologists has not been studied. Therefore, we conducted an international study to assess the reliability and impact of training on iENE recognition by two groups of community radiologists in the USA and Canada. Materials/Methods: Community radiologists from The Permanente Medical Group (TPMG) and a Quebec Radiology Group (QR) who responded to “Expression-of-Interest” emails were recruited. They were asked to consult training material addressing the Head and Neck Cancer International Group consensus definitions of iENE status before proceeding with a Round-1 (20 cases) and a Round-2 (30 cases) iENE review. After each round, expert interpretations were provided to both groups for self-reflection. The TPMG group also had an online group review following Round-1. Gwet’s AC1 concordance scores were estimated for the overall, TPMG, and QR groups. Results: A total of 10 radiologists (5 each from the TPMG and QR groups) were recruited. The mean (standard deviation) agreement for Round-1 and Round-2 was 86.0% (7.4) and 86.0% (6.0), respectively. The Gwet’s AC1 concordance scores were 0.72 (0.50-0.93) (moderate) and 0.76 (0.62-0.89) (substantial) in Round-1 and Round-2, respectively. The Gwet’s AC1 score in Round-1 vs Round-2 was 0.74 (0.53-0.95) (moderate) vs 0.82 (0.69-0.94) (substantial) for the TPMG group, and 0.72 (0.48-0.97) (moderate) vs 0.69 (0.50-0.88) (moderate) for the QR group, respectively.
Conclusion: This study shows good inter-rater reliability for iENE status among community radiologists after applying the Head and Neck Cancer International Group consensus definitions and requiring high diagnostic certainty. With adherence to guidelines, wider dissemination of HPV+ OPC prognostic models that include iENE status appears feasible. Educational materials are also available to further augment reproducibility.


Citation

O’Sullivan B, Huang SH, Yu E, Su J, Nelson K, et al. (2025) Impact of Criteria, Training, and Diagnostic Certainty on Community Radiologists’ Assessment of Imaging-Detected Extranodal Extension in HPV-Positive Oropharyngeal Carcinoma. SM J Radiol 8(1): 8.

INTRODUCTION

Imaging-detected extranodal extension (iENE) is an important prognostic factor for head and neck cancer [1,2]. iENE status has recently been shown to have prognostic significance in HPV-positive oropharyngeal carcinoma (HPV+ OPC), even in stage I disease [3,4], and has been added as a cN modifier in the upcoming ninth edition TNM of the UICC and AJCC [5,6]. Several studies have reported significant inter-rater variability in iENE evaluation based on individual interpretation of iENE [7-10]. However, these studies either did not use the recently published Head and Neck Cancer International Group (HNCIG) consensus criteria for iENE, developed by pooling international expert radiologists [11], or did not emphasize high certainty for its declaration [12,13]. Thus, a major challenge in adopting the iENE parameter in staging and clinical care is the reliability of its assessment, even in academic centers, and especially in the absence of training and predetermined criteria for its declaration. The HNCIG consensus iENE definition also provided an online atlas depicting various forms of iENE [11]. In addition, the AOSHNHR-ASHNR-ESHNR (Asian-Oceanian Society of Neuroradiology and Head & Neck Radiology, American Society of Head & Neck Radiology, and European Society of Head and Neck Radiology) recently established a guide for iENE assessment (https://ashnr.org/iene/) [14], which refined iENE definitions and emphasized the high-certainty concept in its declaration. The International Collaboration of Oropharyngeal Cancer Network for N-Classification (ICON-N) showed the value of training and high certainty in iENE declaration, with good concordance among academic radiologists following training and employing guidelines [5-12].
However, whether non-academic radiologists can also achieve high inter-rater reliability following training with validated examples of imaging studies, with an emphasis on high certainty for declaration, has not been studied. This is especially relevant since these specialists are responsible for the majority of such interpretations in real-world settings. The purpose of this study was to evaluate this question in two groups of community-based radiologists receiving two different forms of training in iENE assessment: self-directed review of training materials (mimicking real-world self-learning) and/or interactive group learning (simulating a workshop-based approach). Both groups received training materials regarding the HNCIG consensus recommendations addressing the definitions of iENE. To potentially augment the initial training experience, one group also participated in an online group discussion session to review cases with an experienced head and neck radiologist (EY). The other group was offered only self-learning and self-reflection after reviewing introductory materials, in an effort to mimic the real-world experience of radiologists without dedicated training in head and neck imaging.

METHODS

Study Population

The study received research ethics board approval from University Health Network, Toronto, serving as the Board of Record for the principal investigator’s (BOS) institution. An “Expression of Interest” (EOI) email outlining the study objectives was sent to community radiologists affiliated with either The Permanente Medical Group (TPMG), headquartered in Oakland, California, United States, or a Quebec Radiology (QR) group comprising radiologists practicing outside the established academic head and neck programs in the province of Quebec, Canada. For both groups, volunteers were sought who had some experience in interpreting head and neck radiology. Radiologists who responded to the EOI email were recruited.

Training and iENE Assessment Procedure

The study procedures comprised the following 4 steps (Figure 1): (1) pre-assessment training (prior to Round-1); (2) Round-1 iENE assessment (20 cases); (3) inter-round training (after Round-1 and prior to Round-2); and (4) Round-2 iENE assessment (30 different cases). Pre-assessment training was a self-learning process. Volunteer radiologists were asked to acquaint themselves with the definitions of iENE status (iENE+ vs iENE–) by studying selected training material, which included the HNCIG recommendations and its supplementary atlas [11] as well as the initial publication describing methods to augment iENE assessment reliability [12]. They were also instructed to interpret with high specificity; that is, to consider only “highly suspicious for iENE” as “iENE+”, while “low suspicion for iENE”, “equivocal”, or “no iENE” were recorded as “iENE-negative (iENE–)” for this study. This high-specificity approach is consistent with the AJCC/UICC general staging rule that assigns the lesser category when in doubt, to minimise potentially spurious declarations. Figure 2 illustrates various forms of iENE.

Round-1 and Round-2 comprised assessment of iENE on anonymized CT or MRI scans from 20 and 30 cases, respectively. These 50 cases were selected randomly from a previous study evaluating the prognostic value of iENE in HPV+ OPC [4]. A PowerPoint® (PPT) slide deck of the 20 and 30 anonymized cases from the leading institution was prepared by EY, an experienced radiologist with established expertise in iENE assessment and education. The slide deck for each patient included 20-30 images from a pre-treatment CT or MRI of a patient with HPV+ OPC. These were sequential images from multiple imaging sequences in multiple imaging planes, to mirror usual clinical image interpretation. The images were chosen to allow the radiologists to focus on specific metastatic lymph nodes with known iENE status reported in the previous study [4].
To mimic DICOM images of an actual CT or MRI, raters were instructed to use the “slide show” function within PowerPoint® to view the images loaded on the PPT slides, mimicking scrolling in a picture archiving and communication system (PACS). This PPT deck was the same file used for the previous multicenter study demonstrating the reliability of iENE assessment by academic radiologists [12] and the subsequent ICON-N study [5]. After Round-1, the validated iENE status for each case was shared with each community radiologist from both the TPMG and QR groups. The QR radiologists were instructed to review their results against the validated results using self-reflection, without formal discussion. This learning process was created to resemble a real-world situation where knowledge is predominantly updated by self-learning and self-reflection. In contrast, to assess the value of workshop/conference group learning, a group discussion was held for the TPMG group to review iENE results with EY and to discuss the reasoning processes underpinning the choices. This group discussion was undertaken by video conference and attended by the TPMG radiologists, with EY and the main investigators (SHH and BO’S) as facilitators. This process allowed experience to be shared and judgements to be explained. It also permitted appreciation of the individual “certainty level” used in iENE declaration.
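The high-specificity rule described above (only “highly suspicious” reads count as positive, everything lower is recorded as negative) amounts to a simple binarization of a suspicion scale. A minimal sketch in Python; the category strings here are illustrative, not a formal lexicon from the study:

```python
# Sketch of the study's high-specificity mapping: only "highly suspicious"
# calls become iENE+; low suspicion, equivocal, and no-iENE calls are all
# recorded as iENE-. Category names are illustrative assumptions.
def binarize_iene(suspicion_level):
    positive = {"highly suspicious"}
    negative = {"low suspicion", "equivocal", "no iENE"}
    if suspicion_level in positive:
        return "iENE+"
    if suspicion_level in negative:
        return "iENE-"
    raise ValueError(f"unknown suspicion level: {suspicion_level}")
```

This mirrors the AJCC/UICC “assign the lesser category when in doubt” rule: every uncertain call falls to the negative side.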

Statistical Analysis

We used the validated iENE status as the reference standard, since pENE was not available as a reference because many patients were treated with definitive chemoradiotherapy or radiotherapy alone. The agreement between each individual community radiologist’s iENE assessment and the reference standard was calculated. Multi-rater concordance for all participating radiologists, and for the TPMG and QR groups separately, was calculated using Gwet’s AC1 test [15,16].
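For illustration only (this is not the authors’ analysis code), Gwet’s AC1 for multi-rater, two-category Yes/No ratings can be computed from per-case label counts, following Gwet’s 2008 formulation of observed agreement and AC1 chance agreement:

```python
# Minimal sketch of Gwet's AC1 for multi-rater, two-category (Yes/No)
# ratings, following Gwet (2008). `ratings` is a list with one inner
# list of labels per case. Illustrative only, not the study's code.
def gwet_ac1(ratings):
    cats = sorted({label for case in ratings for label in case})
    n = len(ratings)
    pa_sum = 0.0
    pi = {c: 0.0 for c in cats}  # average proportion of ratings per category
    for case in ratings:
        r = len(case)
        counts = {c: case.count(c) for c in cats}
        # observed agreement for this case: fraction of rater pairs that agree
        pa_sum += sum(v * (v - 1) for v in counts.values()) / (r * (r - 1))
        for c in cats:
            pi[c] += counts[c] / (r * n)
    pa = pa_sum / n
    # AC1 chance agreement: (1/(K-1)) * sum_k pi_k * (1 - pi_k)
    pe = sum(p * (1 - p) for p in pi.values()) / (len(cats) - 1)
    return (pa - pe) / (1 - pe)

# Example: 4 cases, 2 raters, half the cases in agreement
print(round(gwet_ac1([["Yes", "Yes"], ["No", "No"],
                      ["Yes", "No"], ["No", "Yes"]]), 3))  # 0.0
```

In practice, validated implementations (e.g., in established statistical packages) also provide the confidence intervals reported in this study.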

RESULTS

Clinical Experience of Participants

A total of 5 radiologists from the TPMG group and 5 from the QR group were recruited. Accumulated years of specialty training and working experience in head and neck oncology reporting are tabulated for each participant (Table 1). The clinical experience in radiology reporting

Figure 1: Study Flow Chart.

Figure 2: CT and MR Images Showing Various Forms of iENE.

Table 1. Specialty Training and Working Experience in Radiology Reporting of Head and Neck Malignancies

| Experience | QR1 | QR2 | QR3 | QR4 | QR5 | TPMG1 | TPMG2 | TPMG3 | TPMG4 | TPMG5 |
| Years of diagnostic neuroradiology fellowship training | 1 | 0 | 0 | 0 | 0 | 1.5 | 2 | 1 | 2 | 2 |
| Years of clinical practice as an attending radiologist | 1 | 7 | 5 | 2 | 8 | 18 | 13 | 20 | 20 | 16 |
| % of radiology reporting pertaining to HNC | 2% | 20% | 25% | 2% | 8% | 25% | 15% | 4% | 25% | 20% |

Abbreviation: RT: Radiotherapy; CCRT: Concurrent Chemoradiotherapy, comprised cisplatin 100 mg/m2 tri-weekly or 40 mg/m2 weekly, given concurrently with radiotherapy 70 Gy in 35 fractions over 7 weeks; CCRT-AC: concurrent chemoradiotherapy followed by adjuvant chemotherapy (cisplatin + 5-FU); IC-CCRT: induction chemotherapy (gemcitabine + 5-FU) followed by concurrent chemoradiotherapy.

 

ranged from 1-20 years. The TPMG group had more experience than the QR group (median 18 vs 3.5 years, p<0.001). However, the proportion of head and neck cancer reporting among their routine clinical work was similar: ranging from 2% to 25% for both groups.

Inter-rater Agreement between Individual Radiologists vs the Reference Radiologist

The agreement of individual radiologists against the reference standard for iENE status ranged from 75% to 95% in Round-1 and from 77% to 90% in Round-2 (Table 2). The Gwet’s AC1 concordance score for all radiologists was 0.72 (95% CI: 0.50-0.93) in Round-1, and slightly higher at 0.76 (0.62-0.89) in Round-2 (Table 3). Several community radiologists commented that the quality of the CT or MR slices for Round-2 was suboptimal, with motion artifact in several cases. The detailed assessment by each individual radiologist in Round-1 and Round-2 is summarized in Table 4. Of note, 11 out of 20 (55%) Round-1 cases and 17 out of 30 (57%) Round-2 cases reached 100% agreement among all 11 radiologists in iENE status declaration.
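The per-rater agreement percentages against the reference standard reported in Tables 2 and 4 are simple proportions of matching Yes/No calls. A minimal sketch with made-up example calls (not study data):

```python
# Percent agreement of one rater's Yes/No iENE calls against the
# reference standard (REF). Example data below are illustrative only.
def percent_agreement(ref_calls, rater_calls):
    assert len(ref_calls) == len(rater_calls)
    matches = sum(a == b for a, b in zip(ref_calls, rater_calls))
    return 100.0 * matches / len(ref_calls)

ref   = ["Yes", "No", "Yes", "Yes", "No"]
rater = ["Yes", "No", "No",  "Yes", "No"]
print(percent_agreement(ref, rater))  # 80.0
```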

Potential Effectiveness of Training in iENE Assessment

Comparing iENE status assessment between Round-1 and Round-2, the TPMG group (which received the additional online group training after Round-1) showed a non-significant improvement in the Gwet’s AC1 score in Round-2 vs Round-1 [0.82 (95% CI 0.69-0.94) vs 0.74 (95% CI 0.53-0.95)]. The Gwet’s AC1 score for the QR group (self-reflection only) was similar between Round-2 and Round-1 [0.68 (95% CI 0.48-0.88) vs 0.70 (95% CI 0.42-0.97)] (Table 3).

DISCUSSION

Notwithstanding the limitations of PowerPoint® in displaying the test material (specifically, lack of windowing and leveling, and lack of access to all sequences and previous scans), this international cohort study shows the feasibility of achieving good inter-rater concordance in iENE assessment among community radiologists following two forms of training. Although not statistically significant, a slight improvement in Gwet’s AC1 concordance score was observed in the TPMG group in Round-2 after receiving additional online, workshop-style group training. Although compelling evidence supports the prognostic importance of iENE in HPV+ OPC, concerns remain regarding the feasibility of incorporating this parameter into future staging and clinical practice in real-world settings. Several studies have reported substantial inter-rater variability in iENE assessment across specialties, including radiologists, surgical oncologists, and radiation oncologists [7-10]. Notably, these assessments were conducted prior to the release of the HNCIG iENE consensus definition, relied on individual interpretation, in some situations involved non-radiology specialists, and often lacked emphasis on declaring findings with high certainty. Without standardized training, even academic clinicians display considerable variability in iENE recognition and limited concordance between iENE and pathological ENE, as shown in international multispecialty expert studies [9,10]. In contrast, several studies involving radiologists from academic centers have shown improved inter-rater concordance after training and self-directed learning [12,13], particularly when using consensus definitions and applying high-certainty criteria in their assessments [12-17]. Our findings extend this observation to community radiologists, highlighting the potential for broader implementation with appropriate training and emphasis on high certainty. It is important to recognize that iENE differs from pENE.
For instance, a coalescent nodal mass (also known as grade 2 iENE or “matted nodes”) was not previously emphasized in pathology reports [18], nor included in the HNCIG consensus definition [19]. This feature was only recently incorporated into the pENE definition by HN-CLEAR [20]. Additionally, the sensitivity of iENE for detecting pENE will inherently be limited, given that gross imaging and microscopic examination can never be equivalent; the tools of assessment differ substantially. Nonetheless, the inability to detect subtle ENE should not diminish the value of iENE recognition, as it often identifies more extensive or aggressive forms of ENE. An analogous situation has always been accepted in contemporary practice: not all lymph node involvement is obvious on radiology, and some is detected only on microscopic evaluation of neck dissection specimens. Despite this, radiological declaration of gross nodal disease remains a critical part of clinical care and prognostication, even though microscopic disease may remain undeclared or undetected. Radiology training is often by apprenticeship, and it may be anticipated that radiologists with more extensive specialty training and greater experience in head and neck radiology reporting would demonstrate higher interpretation accuracy. However, our study did not find an apparent association between iENE agreement and education level or experience in radiology reporting. It is possible that the specific training in iENE assessment, a relatively new concept in head and neck imaging reporting, as used in this study, might place all participants at a similar level. Both training methods (self-learning and group discussion) applied in this study appear to perform well, as evidenced by acceptable inter-rater Gwet’s AC1 concordance scores (≥0.7).
It is conceivable that with accumulated experience and self-reflection, or with more highly tailored and validated teaching tools, the reliability of iENE assessment could further improve.

Table 2. Inter-rater Agreement of iENE Status and Concordance (Gwet’s AC1 Score)

| Round | EY | QR1 | QR2 | QR3 | QR4 | QR5 | TPMG1 | TPMG2 | TPMG3 | TPMG4 | TPMG5 | Concordance* (All 11 Raters) |
| Round-1 | REF | 95% | 75% | 95% | 85% | 90% | 75% | 90% | 90% | 80% | 85% | 0.70 (0.47-0.93) |
| Round-2 | REF | 90% | 77% | 83% | 90% | 83% | 80% | 87% | 90% | 87% | 90% | 0.77 (0.64-0.90) |
| Total | REF | 92% | 76% | 88% | 88% | 86% | 78% | 88% | 90% | 84% | 88% | 0.74 (0.63-0.86) |

*Multi-rater concordance was assessed using Gwet’s AC1 test (Reference: Gwet KL. Computing inter-rater reliability and its variance in the presence of high agreement. The British journal of mathematical and statistical psychology 2008; 61(Pt 1): 29-48).

 

Table 3. Gwet’s Agreement Score (Gwet’s AC1 Score) Among QR and TPMG Groups

| Round | Overall | QR Group (Self-Learning Between Round-1 & Round-2) | TPMG Group (Group-Learning Between Round-1 & Round-2) |
| Round-1 | 0.72 (0.50-0.93) | 0.72 (0.48-0.97) | 0.74 (0.53-0.95) |
| Round-2 | 0.76 (0.62-0.89) | 0.69 (0.50-0.88) | 0.82 (0.69-0.94) |
| Total | 0.74 (0.63-0.85) | 0.70 (0.56-0.85) | 0.79 (0.68-0.89) |

Table 4. iENE Assessment Results in Round-1 (20 Cases) and Round-2 (30 Cases)

Round-1: 20 Cases

| Case No. | Imaging Type | REF | QR1 | QR2 | QR3 | QR4 | QR5 | TPMG1 | TPMG2 | TPMG3 | TPMG4 | TPMG5 | Agreement on Each Case |
| 1 | CT | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 2 | CT | No | No | No | No | No | No | No | No | No | No | No | 100% |
| 3 | MR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 4 | MR | Yes | Yes | No | No | Yes | Yes | No | Yes | No | Yes | Yes | 64% |
| 5 | MR | No | No | No | No | No | No | Yes | Yes | No | Yes | Yes | 64% |
| 6 | MR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 7 | MR | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | 91% |
| 8 | MR | No | No | Yes | No | No | Yes | No | Yes | Yes | Yes | Yes | 45% |
| 9 | MR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 10 | CT | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 11 | CT | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 12 | MR | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | 91% |
| 13 | MR | No | Yes | No | No | No | No | No | No | No | Yes | Yes | 73% |
| 14 | MR | Yes | Yes | No | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | 82% |
| 15 | MR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 16 | MR | No | No | Yes | No | Yes | Yes | Yes | No | No | Yes | No | 55% |
| 17 | MR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 18 | MR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 19 | MR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 20 | MR | No | No | Yes | No | Yes | No | No | No | No | No | No | 82% |
| Agreement | | REF | 95.0% | 75.0% | 95.0% | 85.0% | 90.0% | 75.0% | 90.0% | 90.0% | 80.0% | 90.0% | |

Round-2: 30 Cases

| Case No. | Imaging Type | REF | QR1 | QR2 | QR3 | QR4 | QR5 | TPMG1 | TPMG2 | TPMG3 | TPMG4 | TPMG5 | Agreement on Each Case |
| 1 | MR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 2 | MR | Yes | Yes | No | No | Yes | No | Yes | Yes | Yes | Yes | Yes | 73% |
| 3 | MR | No | Yes | No | No | Yes | No | No | Yes | Yes | Yes | Yes | 45% |
| 4 | CT | Yes | Yes | No | No | No | Yes | Yes | Yes | Yes | Yes | Yes | 73% |
| 5 | MR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 6 | CT | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 7 | MR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 8 | MR | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | 100% |
| 9 | MR | No | Yes | No | No | No | No | No | Yes | Yes | Yes | Yes | 55% |
| 10 | CT | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 11 | MR | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | 91% |
| 12 | MR | Yes | Yes | Yes | No | Yes | Yes | Yes | No | Yes | Yes | Yes | 91% |
| 13 | MR | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | 91% |
| 14 | CT | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 15 | MR | Yes | Yes | No | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | 82% |
| 16 | MR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 17 | MR | No | No | Yes | No | No | No | No | No | Yes | Yes | No | 73% |
| 18 | CT | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 91% |
| 19 | MR | Yes | Yes | No | No | Yes | No | No | No | Yes | Yes | Yes | 55% |
| 20 | MR | Yes | Yes | No | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | 82% |
| 21 | MR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 22 | MR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 23 | MR | No | Yes | No | No | Yes | Yes | Yes | No | No | Yes | Yes | 45% |
| 24 | MR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 25 | MR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 26 | MR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 27 | MR | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 91% |
| 28 | MR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| 29 | MR | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | 100% |
| 30 | MR | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | 100% |
| Agreement | | REF | 90.0% | 76.7% | 83.3% | 93.3% | 83.3% | 76.7% | 87% | 90.0% | 86.7% | 93% | |

Besides training, radiologists’ level of certainty in declaring the presence or absence of a radiological feature also impacts the diagnostic accuracy of that feature [21]. Declaring uncertain, equivocal, or low-suspicion findings as genuine risks inaccuracy and may detect clinically less relevant findings in the milieu of contemporary radiotherapy and chemotherapy, which can eradicate subclinical disease. In this study, we emphasized high specificity (>75% certainty) for iENE declaration for staging purposes, to minimise dilution of the prognostic effect caused by potentially spurious findings. However, for clinical care, high sensitivity is also useful to minimize the risk that relevant findings which could alert treating physicians go unrecognized. Clearly communicating radiological findings is a complex skill that requires ongoing effort and attention throughout training and practice [22]. Therefore, it is important to adopt a standardized lexicon in radiology reports to facilitate communication among radiology report consumers, including surgical, radiation, and medical oncologists, and allied health professionals [23]. To standardize assessment of lymph nodes in cancer generally, the Node Reporting and Data System 1.0 (Node-RADS 1.0) was introduced in 2021, incorporating “size”, “configuration”, and “suspicion level” [24]. This system has been tested in several disease sites (including nasopharyngeal carcinoma) with radiologists in academic centers and shows promising performance [25]. However, it has not been tested in community practice in head and neck cancer. Our study adopted similar principles for iENE assessment, including morphological features and suspicion level in iENE reporting. To our knowledge, this is the first study testing a Node-RADS-like concept in community radiology practice, and it showed similarly good reliability for community vs academic radiologists.
These encouraging results should not be considered unexpected. Recently, in another study using predefined head and neck radiological criteria, Lee et al. [26] reported that general neuroradiologists, rather than head and neck subspecialists, fared well in separating head and neck cancers into discrete categories using the Neck Imaging Reporting and Data System (NI-RADS) for predicting recurrent disease. In their study, NI-RADS scores provided by 21 neuroradiologists for the primary site and lymph nodes showed that, for categories 1, 2, and 3, primary site recurrence rates were 5%, 29%, and 65% with an AUC of 0.765, while lymph node recurrence rates were 3%, 10%, and 80% with an AUC of 0.820. Therefore, committed radiologists employing a standardized lexicon for radiology assessment can perform well in real-world situations where subspecialty expertise may not always be regularly available. There are several limitations to our study. Firstly, the participants were recruited by responding to the EOI email invitation, which could conceivably favor more eager radiologists with a specific interest in head and neck imaging. We are unable to assess reliability among the non-responders. Secondly, the images we provided in PowerPoint® included only selected cuts, without the entire image data-set volumes in PACS that would normally allow scrolling and window/levelling, multiplanar sequences, T1 and T2 weighting in several planes, and the ability to cross-reference and magnify. For these reasons, our study focused only on iENE status and not “grade”. However, we did provide the HNCIG description of “grade” to depict various radiological features of iENE [11]. Using a PowerPoint® slide deck was a practical compromise to respect the ethical and regulatory clearances that would have been required to share full Digital Imaging and Communications in Medicine (DICOM) data sets across multiple institutions and international jurisdictions.
However, it probably posed challenges for participants in some cases. For example, consistent feedback from participants was that some cases included in the PowerPoint® slide deck for Round-2 were of lower quality due to motion artifact and comprised fewer images than cases in Round-1. Despite these challenges, the concordance among all 11 raters in iENE declaration still reached 70% or above, which is reassuring and consistent with the studies where the same PPT files were used [5-12]. Moreover, the ability of radiologists to agree despite non-standard image presentation suggests that similar or even better results would be achieved in general practice. Thirdly, most cases in the slide deck comprised MRI images, and some radiologists were more familiar with CT in assessment of the head and neck. Finally, we relied on a single observer to establish our reference standard. However, the multi-rater concordance that included every radiologist (including the reference standard) showed an overall good score for iENE status, which is reassuring.

CONCLUSIONS

Reliably assessing iENE in both academic and community practice will be crucial for future patient care with the evolution in TNM criteria. Several large international studies have shown variability in iENE assessment without standardized training and/or emphasis on high certainty in declaration. This international cohort study highlights the importance of training and criteria for iENE declaration. It provides much needed evidence supporting the adoption of iENE status in future iterations of prognostic factor risk models and stage classification, and may also help direct the choice of cancer therapy. Workshop-like group training appears to provide additional value in improving interrater concordance. Additional advanced training paradigms will be needed to ensure that radiologists provide reliable and reproducible data for staging of head and neck cancer patients.

FUNDING

None

Conflict of Interest:

• No actual or potential conflicts of interest exist for any authors

Author Contributions:

• Study concept: Shao Hui Huang, Brian O’Sullivan, Eugene Yu, Kristoff Nelson, Houda Bahig

• Study design: Shao Hui Huang, Brian O’Sullivan, Eugene Yu, Kristoff Nelson, Houda Bahig

• Data acquisition: All

• Quality control of data: Shao Hui Huang, Jie Su, Brian O’Sullivan

• Data analysis and interpretation: All

• Statistical Analysis: Jie Su

• Manuscript drafting: Shao Hui Huang, Brian O’Sullivan

• Manuscript editing: All

• Manuscript review: All

ACKNOWLEDGMENT

We acknowledge the Bartley-Smith/Wharton, the Gordon Tozer, the Wharton Head and Neck Translational, the “Discovery Fund”, the Dr. Mariano Elia, Petersen-Turofsky Funds, and “The Joe & Cara Finley Center for Head & Neck Cancer Research” at the Princess Margaret Cancer Foundation for supporting the authors’ (SHH, JS, BOS) academic activities. We also acknowledge the O. Harold Warwick Prize of the Canadian Cancer Society for supporting the author’s (BOS) academic activities.

REFERENCES

  1. Benchetrit L, Torabi SJ, Givi B, Haughey B, Judson BL. Prognostic Significance of Extranodal Extension in HPV-Mediated Oropharyngeal Carcinoma: A Systematic Review and Meta-analysis. Otolaryngol Head Neck Surg. 2021; 164: 720-732.
  2. Mermod M, Tolstonog G, Simon C, Monnier Y. Extracapsular spread in head and neck squamous cell carcinoma: A systematic review and meta-analysis. Oral Oncol. 2016; 62: 60-71.
  3. Billfalk-Kelly A, Yu E, Su J, O’Sullivan B, Waldron J, Ringash J, et al. Radiologic Extranodal Extension Portends Worse Outcome in cN+ TNM-8 Stage I Human Papillomavirus-Mediated Oropharyngeal Cancer. Int J Radiat Oncol Biol Phys. 2019; 104: 1017-1027.
  4. Huang SH, O’Sullivan B, Su J, Bartlett E, Kim J, Waldron JN, et al. Prognostic importance of radiologic extranodal extension in HPV-positive oropharyngeal carcinoma and its potential role in refining TNM-8 cN-classification. Radiother Oncol. 2020; 144: 13-22.
  5. Huang SH, Su J, Koyfman SA, Routman D, Hoebers F, Bahig H, et al. A Proposal for HPV-Associated Oropharyngeal Carcinoma in the Ninth Edition Clinical TNM Classification. JAMA Otolaryngol Head Neck Surg. 2025; 151: 655-664.
  6. Brierley J, Giuliani M, O’Sullivan B, Rous B, Van Eycken E. UICC TNM Classification of Malignant Tumours, 9th Edition. Hoboken, New Jersey, USA: John Wiley & Sons Ltd; 2025.
  7. Tran NA, Palotai M, Hanna GJ, Schoenfeld JD, Bay CP, Rettig EM, et al. Diagnostic performance of computed tomography features in detecting oropharyngeal squamous cell carcinoma extranodal extension. Eur Radiol. 2023; 33: 3693-3703.
  8. Patel MR, Hudgins PA, Beitler JJ, Magliocca KR, Griffith CC, Liu Y, et al. Radiographic Imaging Does Not Reliably Predict Macroscopic Extranodal Extension in Human Papilloma Virus-Associated Oropharyngeal Cancer. ORL J Otorhinolaryngol Relat Spec. 2018; 80: 85-95.
  9. Sahin O, Kamel S, Wahid KA, Dede C, Taku N, He R, et al. International Multi-Specialty Expert Physician Preoperative Identification of Extranodal Extension in Oropharyngeal Cancer Patients using Computed Tomography: Prospective Blinded Human Inter-Observer Performance Evaluation. medRxiv [Preprint]. 2025; 131: e35815.
  10. Mehanna H, Abou-Foul AK, Henson C, Kristunas C, Nankivell PC, McDowell L, et al. Accuracy and Prognosis of Extranodal Extension on Radiologic Imaging in Human Papillomavirus-Mediated Oropharyngeal Cancer: A Head and Neck Cancer International Group (HNCIG) Real-world Study. Int J Radiat Oncol Biol Phys. 2025; 123: 432-441.
  11. Henson C, Abou-Foul AK, Yu E, Glastonbury C, Huang SH, King AD, et al. Criteria for the diagnosis of extranodal extension detected on radiological imaging in head and neck cancer: Head and Neck Cancer International Group consensus recommendations. Lancet Oncol. 2024; 25: e297-e307.
  12. Hoebers F, Yu E, O’Sullivan B, Postma AA, Palm WM, Bartlett E, et al. Augmenting inter-rater concordance of radiologic extranodal extension in HPV-positive oropharyngeal carcinoma: A multicenter study. Head Neck. 2022; 44: 2361-2369.
  13. Chin O, Alshafai L, O’Sullivan B, Su J, Hope A, Bartlett E, et al. Inter-rater concordance and operating definitions of radiologic nodal feature assessment in human papillomavirus-positive oropharyngeal carcinoma. Oral Oncol. 2022; 125: 105716.
  14. AOSHNHR-ASHNR-ESHNR Joint Task Force. Imaging-derived Extranodal Extension (iENE) in Head and Neck Cancer. 2025.
  15. Gwet KL. Chapter 6: Benchmarking Inter-Rater Coefficients. Gaithersburg: Advanced Analytics, LLC, USA. 2014.
  16. Gwet KL. Handbook of Inter-Rater Reliability, 5th Edition, Volume 1: Analysis of Categorical Ratings. Gaithersburg, MD: AgreeStat Analytics; 2021.
  17. Hu Y, Lu T, Huang SH, Lin S, Chen Y, Fang Y, et al. High-grade radiologic extra-nodal extension predicts distant metastasis in stage II nasopharyngeal carcinoma. Head Neck. 2019; 41: 3317-3327.
  18. Huang SH, Chernock R, O’Sullivan B, Fakhry C. Assessment Criteria and Clinical Implications of Extranodal Extension in Head and Neck Cancer. Am Soc Clin Oncol Educ Book. 2021; 41: 265-278.
  19. Abou-Foul AK, Henson C, Chernock RD, Huang SH, Lydiatt WM, McDowell L, et al. Standardised definitions and diagnostic criteria for extranodal extension detected on histopathological examination in head and neck cancer: Head and Neck Cancer International Group consensus recommendations. Lancet Oncol. 2024; 25: e286-e296.
  20. Gupta R, Fielder T, Bal M, Chiosea SI, Dahlstrom JE, Kakkar A, et al. International Consensus Recommendations of Diagnostic Criteria and Terminologies for Extranodal Extension in Head and Neck Squamous Cell Carcinoma: An HN CLEAR Initiative (Update 1). Head Neck Pathol. 2025; 19: 20.
  21. Panicek DM, Hricak H. How Sure Are You, Doctor? A Standardized Lexicon to Describe the Radiologist’s Level of Certainty. AJR Am J Roentgenol. 2016; 207: 2-3.
  22. Hartung MP, Bickle IC, Gaillard F, Kanne JP. How to Create a Great Radiology Report. Radiographics. 2020; 40: 1658-1670.
  23. Das JP, Panicek DM. Added Value of a Diagnostic Certainty Lexicon to the Radiology Report. Radiographics. 2021; 41: E64-E65.
  24. Elsholtz FHJ, Asbach P, Haas M, Becker M, Beets-Tan RGH, Thoeny HC, et al. Introducing the Node Reporting and Data System 1.0 (Node-RADS): a concept for standardized assessment of lymph nodes in cancer. Eur Radiol. 2021; 31: 6116-6124.
  25. Zhong J, Mao S, Chen H, Wang Y, Yin Q, Cen Q, et al. Node-RADS: a systematic review and meta-analysis of diagnostic performance, category-wise malignancy rates, and inter-observer reliability. Eur Radiol. 2025; 35: 2723-2735.
  26. Lee J, Kaht D, Ali S, Johnson S, Bullen J, Karakasis C, et al. Performance of the Neck Imaging Reporting and Data System as applied by general neuroradiologists to predict recurrence of head and neck cancers. Head Neck. 2022; 44: 2257-2264.

Citation

O’Sullivan B, Huang SH, Yu E, Su J, Nelson K, et al. (2025) Impact of Criteria, Training, and Diagnostic Certainty on Community Radiologists’ Assessment of Imaging-Detected Extranodal Extension in HPV-Positive Oropharyngeal Carcinoma. SM J Radiol 8(1): 8.