Pancreatic cancer symptom trajectories from Danish registry data and free text in electronic health records

  1. Jessica Xin Hjaltelin
  2. Sif Ingibergsdóttir Novitski
  3. Isabella Friis Jørgensen
  4. Troels Siggaard
  5. Siri Amalie Vulpius
  6. David Westergaard
  7. Julia Sidenius Johansen
  8. Inna M Chen
  9. Lars Juhl Jensen
  10. Søren Brunak (corresponding author)
  1. Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Denmark
  2. Department of Oncology, Copenhagen University Hospital - Herlev and Gentofte, Denmark
  3. Copenhagen University Hospital, Rigshospitalet, Blegdamsvej, Denmark
4 figures, 1 table and 2 additional files

Figures

Figure 1 with 1 supplement
Comparison of pancreatic cancer symptoms in the Danish National Patient Registry (NPR) and electronic health records (EHRs).

(A) Symptoms before the pancreatic cancer diagnosis identified in NPR (N_NPR = 24), by text mining of clinical notes from EHRs (N_notes = 16) or in both data sources (N_both = 57). (B) The top 10 most …

Figure 1—figure supplement 1
Comparing symptoms found from text mining, registry (NPR) and established well-known symptoms of pancreatic cancer.

Type 2 diabetes/new-onset diabetes was not included in the well-known symptoms list (Supplementary file 1a), since this is coded as a disease in the ICD-10 chapter. This comparison only covers the …

Figure 2 with 2 supplements
The most frequent text-mined symptoms from the clinical notes 5 years prior to pancreatic cancer diagnosis.

The most common and significant (p<0.05) symptoms in the text-mined clinical notes are shown with survival information and time to pancreatic cancer diagnosis (Supplementary file 3). The symptoms …

Figure 2—figure supplement 1
The most frequent registry-based symptoms from the NPR prior to pancreatic cancer diagnosis.

Only symptoms that occur prior to a pancreatic cancer diagnosis with RR > 1 and p-value < 0.05 are shown. Outlier dots have been removed to safeguard patient-sensitive information.
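
The filter described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout (`name`, `rr`, `p_value`), the function name and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SymptomStat:
    name: str       # symptom label (hypothetical field name)
    rr: float       # relative risk versus controls
    p_value: float  # p-value for the association


def filter_significant(stats, rr_min=1.0, alpha=0.05):
    """Keep symptoms with RR > rr_min and p-value < alpha."""
    return [s for s in stats if s.rr > rr_min and s.p_value < alpha]


# Made-up values for illustration only; these are not results from the study.
example = [
    SymptomStat("symptom_a", 2.4, 0.001),
    SymptomStat("symptom_b", 0.8, 0.010),  # dropped: RR is not above 1
    SymptomStat("symptom_c", 1.6, 0.200),  # dropped: p-value is not below 0.05
]
kept = [s.name for s in filter_significant(example)]
```

Both thresholds are strict inequalities, matching the caption's "RR > 1" and "p-value < 0.05".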

Figure 2—figure supplement 2
Staging information for pancreatic cancer patients.

(A) Pancreatic cancer cases with at least one clinical note 5 years prior to the cancer diagnosis. (B) Pancreatic cancer patients without any clinical notes 5 years prior to the cancer diagnosis. …

Figure 3 with 2 supplements
Symptom trajectories before and after pancreatic cancer diagnosis.

(A) The registry symptom trajectories consist of significant disease pairs with a Relative Risk (RR) >1 (Supplementary file 1d). Each trajectory has a minimum of 100 patients. (B) Symptom …

Figure 3—figure supplement 1
Registry-based trajectories with both symptoms and diseases.

Significant length 3 and length 4 trajectories for the registry-based analysis using the 18,523 pancreatic cancer cases from the Danish National Patient Registry. The diseases and chapters are from …

Figure 3—figure supplement 2
Survival of patients in significant trajectories.

(A) Survival in months for patients in the registry-based disease trajectories for both length 3 and 4 (N=10,542). (B) Survival in months for patients in the text-mined symptom trajectories (N=311). …

Figure 4
The text mining pipeline.

A dictionary was generated with symptoms and expanded with word endings to capture multiple forms of the same symptom. Afterwards, the dictionary and the corpus (clinical notes) were tokenized to …
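
The dictionary expansion and tokenized matching mentioned in this caption could look roughly like the sketch below. It is an assumption-laden illustration rather than the published pipeline: the stems, the endings, the function names and the single-token matching are all hypothetical, and multi-word symptoms are not handled.

```python
import re


def expand_dictionary(stems, endings):
    """Map every surface form (stem + ending) back to its base symptom."""
    forms = {}
    for stem in stems:
        for ending in ("",) + tuple(endings):
            forms[stem + ending] = stem
    return forms


def tokenize(text):
    """Lowercase the text and split it into runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def find_symptoms(note, forms):
    """Return the base symptoms whose surface forms occur as tokens in a note."""
    return {forms[tok] for tok in tokenize(note) if tok in forms}


# Hypothetical English stems and endings, chosen only for readability.
STEMS = ["nausea", "jaundice", "vomit"]
ENDINGS = ["s", "ing", "ed"]

forms = expand_dictionary(STEMS, ENDINGS)
found = find_symptoms("Patient reports vomiting and mild nausea.", forms)
```

Mapping each expanded form back to its stem means that different inflections of one symptom are counted as the same symptom.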

Tables

Table 1
Data set and patient characteristics.
General cohort information           The National Patient Registry (NPR)  Electronic Health Records (EHRs)
Data set timeline                    1994–2018                            2006–2016
N pancreatic cancer patients         23,592                               3078
N controls                           6.9 million                          30,780

Pancreatic cancer cohort information
Female                               9328 (50.4%)                         1506 (48.9%)
Male                                 9195 (49.6%)                         1572 (51.1%)
Mean age at diagnosis (female/male)  73/69                                72/70

Age distributions (years)
<40                                  139 (0.8%)                           16 (0.52%)
40–50                                733 (4.0%)                           107 (3.48%)
50–60                                2523 (13.6%)                         352 (11.4%)
60–70                                5288 (28.5%)                         941 (30.6%)
70–80                                6017 (32.5%)                         1037 (33.7%)
>80                                  3821 (20.6%)                         625 (20.3%)
