Approximating missing epidemiological data for cervical cancer through Footprinting: A case study in India
Figures

Figure 1. Hierarchical structure of availability of cervical cancer epidemiological data.
-
Figure 1—source data 1
Overview of availability of cervical cancer epidemiological data by state.
- https://cdn.elifesciences.org/articles/81752/elife-81752-fig1-data1-v1.docx

Figure 2. Identified clusters of registry-specific cervical cancer incidence.
Clusterings under (A) 2, (B) 3, and (C) 4 prefixed clusters. Each panel within a row corresponds to a cluster within a k-clustering, with the cluster label given on top of the panel. The cervical cancer incidence data were extracted from volume XI of Cancer Incidence in Five Continents (CI5) (Bray et al., 2017) and the 2012–2016 report by the Indian National Centre for Disease Informatics and Research (NCDIR) (Report of National Cancer Registry Programme, 2020). Black: cluster mean of cervical cancer incidence; dark grey: registry incidence assigned to the cluster; light grey: registry incidence assigned to other clusters.
-
Figure 2—source data 1
Registry-specific cervical cancer incidence data from Cancer Incidence in Five Continents (CI5) and National Centre for Disease Informatics and Research (NCDIR).
- https://cdn.elifesciences.org/articles/81752/elife-81752-fig2-data1-v1.xlsx
-
Figure 2—source data 2
Estimated model parameters under Poisson regression models.
- https://cdn.elifesciences.org/articles/81752/elife-81752-fig2-data2-v1.docx

Registry-specific cervical cancer incidence data from Cancer Incidence in Five Continents (CI5) and National Centre for Disease Informatics and Research (NCDIR).
See Figure 1—source data 1 for whether registries belong to CI5 or NCDIR.

Figure 3. Sexual behaviour data from National AIDS Control Organization (NACO) by Indian state.
Indian state-specific data on (A) median age of first sex, (B) proportion of respondents reporting sex with non-regular partners in the last 12 months, (C) proportion of male respondents reporting sex with commercial partners in the last 12 months, and (D) proportion of male respondents by number of commercial partners in the last 12 months. Each violin plot and the associated cloud of circles correspond to a sexual behaviour variable. Each circle corresponds to the data of a state (or group of states). The data were extracted from the 2006 National Behaviour Surveillance Survey of the National AIDS Control Organization of India (National Behavioural Surveillance Survey: General Population, 2006). Blue and red: Indian states identified in the high and low cervical cancer incidence clusters. Grey: states without cervical cancer incidence data and therefore unknown cluster.
-
Figure 3—source data 1
Indian state-specific sexual behaviour data from National AIDS Control Organization (NACO).
- https://cdn.elifesciences.org/articles/81752/elife-81752-fig3-data1-v1.xlsx
-
Figure 3—source data 2
Predictive values of the sexual behaviour variables for cervical cancer incidence cluster.
- https://cdn.elifesciences.org/articles/81752/elife-81752-fig3-data2-v1.docx

Indian state-specific sexual behaviour data from National AIDS Control Organization (NACO).
States or groups of states as reported in the 2006 National Behaviour Surveillance Survey of the National AIDS Control Organization of India. Other North Eastern states include Arunachal Pradesh, Nagaland, Meghalaya, Mizoram, and Tripura.
Tables
Table 1. Estimated parameters of clusters of cervical cancer incidence patterns.
Number of prefixed clusters | BIC* | Cluster label | Number (%) of registries in cluster | Maximum incidence† | Maximum incidence pattern | Maximum incidence age group‡ | Maximum incidence age group pattern |
---|---|---|---|---|---|---|---|
2 | 6933 | 1 | 27 (82%) | 47 cases | Low | 60–64 years | Late |
2 | 6933 | 2 | 6 (18%) | 91 cases | High | 55–59 years | Early |
3 | 5700 | 1 | 19 (58%) | 38 cases | Low | 60–64 years | Late |
3 | 5700 | 2 | 5 (15%) | 92 cases | High | 55–59 years | Early |
3 | 5700 | 3 | 9 (27%) | 64 cases | Intermediate | 60–64 years | Late |
4 | 5532 | 1 | 18 (55%) | 39 cases | Low | 60–64 years | Late |
4 | 5532 | 2 | 5 (15%) | 92 cases | High | 55–59 years | Early |
4 | 5532 | 3 | 9 (27%) | 64 cases | Intermediate | 60–64 years | Late |
4 | 5532 | 4 | 1 (3%) | 20 cases | Very low | 60–64 years | Early |
* Bayesian information criterion for evaluating the goodness-of-fit of the obtained clustering.
† Maximum incidence given in cases per 100,000 women-years.
‡ Five-year age group in which the maximum incidence is located.
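For orientation, the general textbook form of the Bayesian information criterion is given below. This is a generic sketch, not the authors' exact definition: the likelihood and parameter count used for the Poisson-regression-based clustering are described in Supplementary file 1, and the symbols $\hat{L}$ (maximised likelihood), $p$ (number of free parameters), and $n$ (number of observations) are assumed notation.

```latex
\mathrm{BIC} = -2 \ln \hat{L} + p \ln n
```

Under this sign convention, a lower BIC indicates a better balance between fit and model complexity. In the table above, the BIC decreases from 6933 (2 clusters) to 5700 (3 clusters) to 5532 (4 clusters).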
Table 2. Clustering of cervical cancer incidence of Indian states based on clustering of registries.
State/group of states* | 2-Clustering | 2-Clustering | 3-Clustering | 3-Clustering | 3-Clustering | 4-Clustering | 4-Clustering | 4-Clustering | 4-Clustering |
---|---|---|---|---|---|---|---|---|---|
Cluster label (pattern) | 1 (low, late) | 2 (high, early) | 1 (low, late) | 2 (high, early) | 3 (interm., late) | 1 (low, late) | 2 (high, early) | 3 (interm., late) | 4 (very low, early) |
Andhra Pradesh | ● |  | ● |  |  | ● |  |  |  |
Assam | ●●● |  | ●●● |  |  | ●● |  |  | ● |
Delhi | ● |  |  |  | ● |  |  | ● |  |
Gujarat+Dadra and Nagar Haveli | ● |  | ● |  |  | ● |  |  |  |
Karnataka | ● |  |  |  | ● |  |  | ● |  |
Kerala+Lakshadweep | ●● |  | ●● |  |  | ●● |  |  |  |
Madhya Pradesh | ● |  |  |  | ● |  |  | ● |  |
Maharashtra | ●●●●●● | ● | ●●●● |  | ●●● | ●●●● |  | ●●● |  |
Manipur | ●● |  | ●● |  |  | ●● |  |  |  |
Other North Eastern states† | ●●●●● | ●●●● | ●●●● | ●●●● | ● | ●●●● | ●●●● | ● |  |
Punjab+Chandigarh | ● |  |  |  | ● |  |  | ● |  |
Sikkim | ● |  | ● |  |  | ● |  |  |  |
Tamil Nadu+Puducherry | ● | ● |  | ● | ● |  | ● | ● |  |
West Bengal+Andaman and Nicobar Islands | ● |  | ● |  |  | ● |  |  |  |
Each circle represents one registry assigned to the corresponding cluster. Grey shading represents the cluster including the highest number of registries, either exclusively or tied with another cluster.
Cluster labels and the corresponding patterns of maximum incidence and maximum incidence age group given in the second row were defined in the third, sixth, and eighth columns of Table 1, respectively.
* States/groups of states were defined as reported in the 2006 National Behaviour Surveillance Survey of the National AIDS Control Organization of India (National Behavioural Surveillance Survey: General Population, 2006).
† Other North Eastern states included Arunachal Pradesh, Nagaland, Meghalaya, Mizoram, and Tripura.
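The rule described in the notes above (a state is associated with the cluster holding the most of its registries, with ties kept) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `majority_clusters` is hypothetical, and the counts are the 2-clustering registry counts for the three states whose registries are split across both clusters.

```python
from typing import Dict, List


def majority_clusters(counts: Dict[int, int]) -> List[int]:
    """Return the cluster label(s) holding the most registries; ties are kept."""
    top = max(counts.values())
    return sorted(label for label, n in counts.items() if n == top)


# Registries per cluster label under the 2-clustering
# (1 = low, late; 2 = high, early), for states split across both clusters.
two_clustering = {
    "Maharashtra": {1: 6, 2: 1},
    "Other North Eastern states": {1: 5, 2: 4},
    "Tamil Nadu+Puducherry": {1: 1, 2: 1},
}

for state, counts in two_clustering.items():
    print(state, majority_clusters(counts))
```

Returning a list rather than a single label keeps a draw such as Tamil Nadu+Puducherry visible instead of silently picking one cluster.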
Table 3. Identified and classified cluster of cervical cancer incidence pattern by Indian state.
Cervical cancer incidence data* | State/group of states† | Identified cluster‡ | Classified cluster§ | Probability of belonging to the low-incidence cluster |
---|---|---|---|---|
Available | Andhra Pradesh | Low | Low | 0.60 |
Available | Assam | Low | Low | 0.69 |
Available | Delhi | High | High | 0.42 |
Available | Gujarat+Dadra and Nagar Haveli | Low | Low | 0.69 |
Available | Karnataka | High | Low | 0.63 |
Available | Kerala+Lakshadweep | Low | Low | 0.60 |
Available | Madhya Pradesh | High | High | 0.44 |
Available | Maharashtra | Low | Low | 0.57 |
Available | Manipur | Low | Low | 0.65 |
Available | Other North Eastern states¶ | High | Low | 0.53 |
Available | Punjab+Chandigarh | High | High | 0.41 |
Available | Sikkim | Low | Low | 0.63 |
Available | Tamil Nadu+Puducherry | High | High | 0.38 |
Available | West Bengal+Andaman and Nicobar Islands | Low | Low | 0.71 |
Unavailable | Bihar | - | Low | 0.67 |
Unavailable | Chhattisgarh | - | Low | 0.66 |
Unavailable | Goa+Daman and Diu | - | Low | 0.54 |
Unavailable | Haryana | - | Low | 0.66 |
Unavailable | Himachal Pradesh | - | Low | 0.58 |
Unavailable | Jammu and Kashmir | - | Low | 0.63 |
Unavailable | Jharkhand | - | Low | 0.71 |
Unavailable | Orissa | - | Low | 0.68 |
Unavailable | Rajasthan | - | Low | 0.66 |
Unavailable | Uttar Pradesh | - | Low | 0.64 |
Unavailable | Uttarakhand | - | Low | 0.69 |
* Availability of cervical cancer incidence data was based on the incidence data from volume XI of Cancer Incidence in Five Continents (CI5) (Bray et al., 2017) and the 2012–2016 report of the National Centre for Disease Informatics and Research (NCDIR) (Report of National Cancer Registry Programme, 2020).
† States/groups of states were defined as reported in the 2006 National Behaviour Surveillance Survey of the National AIDS Control Organization of India (National Behavioural Surveillance Survey: General Population, 2006).
‡ Identified clusters derived in the Clustering step.
§ Classified clusters derived in the Classification step. A given state was classified into the low-incidence cluster if the probability of belonging to the low-incidence cluster (given in the next column) was above 0.50. For the Indian states with available cervical cancer incidence data, and hence already in an identified cluster, classification was done for the purpose of validation.
¶ Other North Eastern states included Arunachal Pradesh, Nagaland, Meghalaya, Mizoram, and Tripura.
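The classification rule and the validation described in the notes above can be sketched as follows. This is an illustrative sketch only: the names `THRESHOLD`, `classify`, and `available` are hypothetical, and the values are copied from the rows of the table above for states with available incidence data. It does not reproduce how the probabilities themselves were estimated.

```python
THRESHOLD = 0.50  # a state is classified as "Low" if P(low) is above this value

# state: (identified cluster, probability of belonging to the low-incidence cluster)
available = {
    "Andhra Pradesh": ("Low", 0.60),
    "Assam": ("Low", 0.69),
    "Delhi": ("High", 0.42),
    "Gujarat+Dadra and Nagar Haveli": ("Low", 0.69),
    "Karnataka": ("High", 0.63),
    "Kerala+Lakshadweep": ("Low", 0.60),
    "Madhya Pradesh": ("High", 0.44),
    "Maharashtra": ("Low", 0.57),
    "Manipur": ("Low", 0.65),
    "Other North Eastern states": ("High", 0.53),
    "Punjab+Chandigarh": ("High", 0.41),
    "Sikkim": ("Low", 0.63),
    "Tamil Nadu+Puducherry": ("High", 0.38),
    "West Bengal+Andaman and Nicobar Islands": ("Low", 0.71),
}


def classify(p_low: float, threshold: float = THRESHOLD) -> str:
    """Apply the threshold rule from the table notes."""
    return "Low" if p_low > threshold else "High"


classified = {state: classify(p) for state, (_, p) in available.items()}
mismatches = sorted(
    state for state, (identified, _) in available.items()
    if classified[state] != identified
)
agreement = len(available) - len(mismatches)

print(f"{agreement} of {len(available)} states agree")
print("Mismatched:", mismatches)
```

With the tabulated values, the classified cluster matches the identified cluster in 12 of the 14 states with available data; Karnataka and the Other North Eastern states are identified as high but classified as low.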
Additional files
-
Supplementary file 1
Appendix 1 - Poisson-regression-based CEM clustering algorithm.
- https://cdn.elifesciences.org/articles/81752/elife-81752-supp1-v1.docx
-
MDAR checklist
- https://cdn.elifesciences.org/articles/81752/elife-81752-mdarchecklist1-v1.docx