Introduction

Influenza, caused by influenza viruses, remains a significant global health challenge, responsible for millions of infections and considerable mortality annually (Iuliano et al. 2018). Influenza viruses, primarily transmitted via respiratory droplets and contact, predominantly belong to types A and B during human epidemics (Uyeki et al. 2022). Transmission dynamics are influenced by viral characteristics, host susceptibility, and environmental conditions (Javanian et al. 2021), particularly meteorological variables, such as temperature, humidity, and precipitation, which are increasingly affected by climate change (Moriyama, Hugentobler, and Iwasaki 2020). In temperate climates, influenza epidemics typically peak in winter (Ryu and Cowling 2021), while subtropical regions exhibit year-round sporadic cases with less pronounced seasonality (Li et al. 2019).

Laboratory and epidemiological studies suggest that cold temperatures and low humidity in high-latitude temperate areas enhance influenza transmissibility. Proposed mechanisms include stabilization of the virus in aerosolized form and impairment of host mucociliary clearance at lower temperatures (5-8°C) (Moriyama, Hugentobler, and Iwasaki 2020), together with the formation of smaller aerosol droplets that remain airborne longer, mucosal desiccation, and weakening of the respiratory epithelial barrier at low humidity (<40%) (Peci et al. 2019; Lowen and Steel 2014; Eccles 2002; Tamerius et al. 2013). Conversely, in subtropical regions, concentrated influenza outbreaks have been linked to high humidity and moderate temperatures, with sporadic cases observed throughout the year (Deyle et al. 2016; Soebiyanto et al. 2015). While these findings highlight the complex interaction between meteorological factors and influenza incidence across different climatic zones, current research lacks systematic quantification of climate-disease dynamics, particularly in subtropical regions, necessitating further investigation to elucidate mechanisms and improve preventive strategies (Bloom-Feshbach et al. 2013; Tamerius et al. 2013).

Current approaches to investigating meteorological influences on subtropical influenza transmission face several limitations. Existing studies often focus on individual meteorological indicators (e.g., temperature or humidity), neglecting the interactive effects of multiple factors, such as temperature, precipitation, and solar radiation (Serman et al. 2022; Matsuki et al. 2023). Moreover, the region-specific patterns of these interactions in subtropical areas remain underexplored (Zheng et al. 2021). Traditional statistical methods, such as Autoregressive Integrated Moving Average (ARIMA) models, although widely used for influenza prediction (Wang et al. 2017), struggle to capture time-lagged effects and complex non-linear relationships between influenza incidence and multiple meteorological variables (Shoji, Katayama, and Sano 2011). These gaps hinder the development of accurate early warning systems, particularly in climate-vulnerable areas.

The introduction of Distributed Lag Non-linear Models (DLNM) offers a framework for analyzing exposure-response and lag-response relationships (Gasparrini, Armstrong, and Kenward 2010). While DLNM has been preliminarily applied in empirical analyses of influenza transmission patterns (Liu et al. 2020; Guo et al. 2019), limitations persist, including reliance on relatively small-scale, single-centre surveillance data (Ng et al. 2022; Si et al. 2024), which compromises generalizability. Current studies also inadequately explore the lagged effects of meteorological factors on influenza transmission, often overlooking spatiotemporal heterogeneity (Zhang et al. 2022). Recent advances in time series analysis and deep learning for influenza trend prediction, particularly Recurrent Neural Networks (RNNs), show promise in improving prediction accuracy because they can capture non-linear patterns in time series data (Venna et al. 2018). However, traditional RNNs encounter issues such as gradient explosion and memory decay, limiting their effectiveness for long-term influenza predictions (Liu et al. 2021). Long Short-Term Memory (LSTM) networks, a more advanced type of RNN, address these issues by incorporating gating mechanisms for learning long-term dependencies (Absar et al. 2022; Liu et al. 2021). Despite LSTM’s potential in public health applications (Absar et al. 2022), its use in influenza prediction remains limited by challenges such as complexity, high computational cost, and poor interpretability (Liu et al. 2021). For instance, LSTM algorithms have demonstrated good predictive performance in the subtropical Fujian region of China but have struggled to differentiate between influenza virus subtypes (Zhu et al. 2022). Another study proposed a complex seasonal autoregressive integrated moving average-LSTM (SARIMA-LSTM) model, utilizing singular spectrum analysis to predict influenza in Shanxi; however, its complexity and computational intensity limit broader applicability (Zhao et al. 2023). While recent studies employing DLNM and LSTM models have mainly focused on temperate regions, there remains a notable gap in research integrating these advanced methodologies to analyze and predict subtype-specific influenza dynamics in subtropical zones, particularly in the context of comprehensive meteorological drivers and large-scale surveillance data.

To address these gaps, this study utilized six years (2018-2023) of comprehensive, multi-site influenza surveillance data and detailed meteorological records from Putian, a representative subtropical city on the southeastern coast of China. Our primary objectives were twofold: first, to employ DLNM to rigorously quantify the non-linear and lagged associations between multiple meteorological variables and laboratory-confirmed influenza A and B incidence; second, to develop and validate a Bayesian-optimized LSTM neural network, incorporating meteorological insights and coronavirus disease 2019 (COVID-19) pandemic indicators, to predict influenza trends and benchmark its performance against a traditional ARIMA model. By elucidating the specific meteorological conditions and lag periods associated with increased risk for influenza A versus B, this study provides evidence to inform targeted public health strategies. Specifically, our integrated analytical framework supports action in three key areas: (1) optimizing resource allocation through precise identification of high-risk meteorological windows for different influenza types, allowing health systems to adjust staffing and resource distribution accordingly; (2) enhancing vaccination campaigns by identifying optimal timing based on type-specific meteorological risk patterns; and (3) improving risk communication by developing region-specific early warning systems that translate meteorological forecasts into influenza risk assessments. Our findings contribute to emerging digital health surveillance systems for climate-adaptive influenza management by bridging meteorological insights with advanced predictive modeling to strengthen global health preparedness, particularly in climatically complex subtropical regions facing increased environmental variability due to climate change.

Methods

Ethics statement

This study was institutionally approved by the Research Ethics Committee of the Putian Centre for Disease Control and Prevention (Putian CDC, approval number: Ethics Review 2020-003), Putian, and the Medicine and Biological Engineering Technology Research Centre of the Ministry of Health (MBETRC), Guangzhou, China. All procedures complied with the 1964 Declaration of Helsinki and its later amendments, as well as relevant ethical guidelines for biomedical research. As a primarily prospective study with retrospective elements, data collected prior to formal ethical approval (from January 1, 2018 to October 13, 2020) were retrospectively analyzed. These data were originally gathered as part of established public health surveillance activities, following standardized protocols consistently applied since the start of data collection. Data collection from October 13, 2020, onwards proceeded prospectively under the approved ethical framework. Data quality was maintained throughout the retrospective (2018–2020) and prospective (2020–2023) phases by following the China CDC protocols in place since 2004, ensuring consistent case definitions, sample collection, and laboratory testing. The Putian CDC’s use of Da’an Gene influenza A/B detection reagents since 2017 ensured diagnostic continuity, minimizing bias.

Rigorous procedures were employed to ensure patient confidentiality and data privacy. Influenza surveillance data provided for this research were fully anonymized prior to being accessed by the research team. The dataset included only essential epidemiological information (such as onset date, gender, age, and residential district) necessary for the analysis, and excluded any personally identifiable information, such as names, national identification numbers, addresses, or contact details. Additionally, individual-level data on vaccination status or detailed socioeconomic profiles were not routinely collected as part of this surveillance. Any potential linkage between coded data and original patient identifiers, if maintained within the primary surveillance system by the Putian CDC, was not accessible to the researchers involved in this study. Consequently, the Research Ethics Committee waived the requirement for individual informed consent from patients. Throughout the data collection process, the diagnosis of all influenza-like illness (ILI) cases, information and specimen collection, and influenza virus nucleic acid testing strictly followed the prevailing national standards, including the Diagnostic Criteria for Influenza (source: National Health Commission of the People’s Republic of China) and the National Influenza Surveillance Technical Specifications (source: China CDC). All medical professionals, CDC personnel, and laboratory technicians participating in the study underwent professional training to ensure research quality. Meteorological and COVID-19 pandemic data were sourced from publicly available databases and therefore raised no privacy concerns. The anonymized data were securely managed, used solely for the research purposes described herein, and were not disclosed to any third party or utilized for other objectives.

Study design

This collaborative study by the Putian CDC and MBETRC utilized a prospective cohort design, with retrospective analysis of historical data to systematically assess non-linear quantitative relationships between multiple meteorological factors and both influenza A and B infection risks, ultimately constructing a predictive network for outbreaks. The technical workflow is illustrated in Figure 1.

Figure 1. Schematic illustration describing the design of the study.

The study spanned January 1, 2018, to December 31, 2023, collecting ILI and laboratory-confirmed cases from 7 influenza sentinel hospitals in Putian across urban and rural areas, alongside daily meteorological data and, from 2020 onward, COVID-19 monitoring data. To ensure consistency between meteorological and influenza incidence data, rigorous quality control and pre-processing were applied to the raw datasets, including those from public health surveillance, laboratory testing, and downloaded public databases. Descriptive statistics were used to analyze the temporal and demographic distribution patterns of influenza cases, and time series plots were used to visualize the dynamic relationship between influenza incidence and meteorological conditions.

DLNM was employed to characterize the non-linear exposure-lag-response relationships between meteorological factors and influenza risk, with particular emphasis on lag effects under extreme weather conditions. Subsequently, we developed an LSTM neural network-based predictive model, incorporating both key meteorological factors identified through DLNM analysis and COVID-19-related indicators (e.g., mask-wearing index, COVID-19 case numbers). The LSTM network was trained using influenza surveillance and meteorological data from 2018 to 2022, while 2023 data from Putian served as the internal validation set. For external validation, we utilized analogous data from Sanming, a neighboring subtropical city. The predictive performance of the LSTM algorithm for influenza A/B prevalence in 2023 was benchmarked against a parallel ARIMA model to assess its comparative effectiveness.

Study area

Putian city, situated on the southeast coast of China (between latitudes 24°59’N and 25°46’N and longitudes 118°27’E and 119°39’E), serves as an exemplary site for examining the impact of meteorological factors on influenza transmission in subtropical regions. The city experiences a typical subtropical monsoon climate, characterized by warm, humid summers and mild, dry winters. Annual average temperature ranges from 18 to 21°C, with summer highs reaching 38°C and winter lows dropping to 0.4°C. Annual precipitation averages 1538.8 mm, with 72% of rainfall occurring between April and September. These pronounced seasonal variations provide a dynamic natural laboratory for studying climate-driven influenza dynamics, reflecting conditions common across subtropical zones globally. In addition to its climatic profile, Putian covers an area of 4200 square kilometres, comprises five counties (or districts), and has a population exceeding 3.2 million; it serves as a commercial and manufacturing centre characterized by a high population density and comprehensive transportation infrastructure. This environment promotes significant mobility and frequent social interactions, which are critical factors in the transmission dynamics of influenza. Since 2004, the Putian CDC has operated a comprehensive influenza surveillance network, adhering to national standards and utilizing advanced diagnostic methods, including standardized real-time reverse transcription polymerase chain reaction (RT-PCR) testing for influenza A/B since late 2017, supported by a trained workforce. This infrastructure ensures access to high-quality, longitudinal data, underpinning the robustness of our analyses. Together, Putian’s climatic variability, socio-economic characteristics, and surveillance capabilities position it as a representative and ideal location for this study, with insights potentially extensible to other subtropical settings sharing similar environmental and epidemiological traits.
Sanming city, located between latitudes 25°30’N and 27°97’N and longitudes 116°22’E and 118°39’E, is a subtropical area primarily characterized by its mountainous and hilly landscape, about 330 kilometres from Putian city. It was selected as an external validation data source for this study.

Study cohort

The study systematically screened patients in the outpatient and emergency departments of 7 influenza surveillance sentinel hospitals within Putian’s prefecture, with an emphasis on high-incidence departments such as fever clinics, internal medicine, and pediatrics. The screening encompassed ILI cases from January 1, 2018, to December 31, 2023. According to the aforementioned Diagnostic Criteria for Influenza, ILI was defined as acute onset of high fever (≥38°C) accompanied by respiratory symptoms such as cough or sore throat, without a definitive causative diagnosis. Respiratory samples, together with onset dates, ages, genders, and residence information, were collected from all ILI patients. Unified diagnostic standards, laboratory testing methods, and quality control measures were employed throughout the research process to ensure the reliability and comparability of results.

Following nucleic acid screening for influenza viruses in the respiratory samples collected from ILI cases, the inclusion criteria were: 1) the patient presented with ILI symptoms; 2) the nucleic acid test for the influenza virus yielded a positive result; 3) the patient was a permanent resident of Putian city; 4) the duration of symptoms from onset to consultation did not exceed three days. No restrictions were placed on age or gender. The exclusion criteria were: 1) patients with clear alternative diagnoses, such as bacterial pneumonia or tuberculosis; 2) patients with unstable underlying conditions, such as acute exacerbations of chronic obstructive pulmonary disease (COPD); 3) patients with missing data. This focused the analysis on acute ILI presentations.

Sample collection, nucleic acid extraction and pathogen typing

Respiratory specimens (nasopharyngeal swabs) were collected from ILI patients during their initial visit, prior to treatment, and stored at 4°C in viral transport medium. All samples were delivered to the Putian CDC laboratory within 24 hours of collection, where nucleic acid extraction and real-time RT-PCR detection of influenza viruses were performed within 24 hours of arrival.

For nucleic acid extraction, a commercial magnetic bead-based kit (Da’an Gene, Guangzhou, China) was used, following the manufacturer’s protocol. The extracted nucleic acid was dissolved in 50 µL of elution buffer, with concentration and purity measured using a Nanodrop 2000 spectrophotometer (Thermo-Fisher Scientific, Waltham, MA). Acceptable nucleic acid purity ratios were defined as an optical density (OD)260/280 ratio between 1.7 and 2.5, and an OD260/230 ratio between 0.5 and 2.5. The integrity of the extracted nucleic acids was confirmed using an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA), with RNA Integrity Number (RIN) values ≥8 and 28S/18S ratios ≥1.5 considered indicative of high-quality samples.

Qualified nucleic acids underwent one-step real-time fluorescence RT-PCR detection within two hours, utilizing commercially available detection kits for influenza A (H1N1) virus (2009) RNA, seasonal influenza virus H3N2 subtype, and influenza B (Victoria and Yamagata lineages), all purchased from Da’an Gene (Guangzhou, China). The detection procedure was performed on a QuantStudio real-time fluorescence quantitative PCR instrument (Thermo-Fisher Scientific, Waltham, MA), following the manufacturer’s instructions. Each detection batch contained negative, positive, and internal control replicates to ensure the reliability of the results. The PCR results were interpreted adhering to the assay manufacturer’s protocols, with amplification curves and cycle threshold (Ct) values used to determine positivity. Positive results were determined if typical amplification curves were observed in both the internal reference fluorescence channel and the target gene fluorescence channel, with corresponding Ct values within the reference range. Specific positive criteria were as follows: for influenza A H1N1, FAM channel amplification with Ct value ≤38; for H3N2, FAM channel amplification with Ct value ≤ 38; for influenza B Victoria, FAM channel amplification with Ct value ≤38; for Yamagata, VIC channel amplification with Ct value ≤ 38. All tests were conducted in laboratories meeting Biosafety Level 2 (BSL-2) standards.
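The positivity rule described above can be sketched in a few lines. This is a minimal illustration rather than the laboratory's actual software: the function name `is_positive` and the use of `None` to represent "no typical amplification curve" are hypothetical, and because the text does not give a reference range for the internal control, the sketch only requires that the internal control amplified.

```python
from typing import Optional

CT_CUTOFF = 38.0  # cycle-threshold cutoff stated in the text for all four targets


def is_positive(target_ct: Optional[float],
                internal_ct: Optional[float],
                cutoff: float = CT_CUTOFF) -> bool:
    """Return True when a sample meets the stated positivity rule.

    A Ct of None stands for "no typical amplification curve observed"
    (a convention assumed for this sketch only).
    """
    # Both the internal reference channel and the target channel must amplify.
    if target_ct is None or internal_ct is None:
        return False
    # The target Ct must not exceed the cutoff (Ct <= 38 in the text).
    return target_ct <= cutoff


if __name__ == "__main__":
    print(is_positive(31.2, 27.5))   # amplified, below cutoff
    print(is_positive(38.5, 27.5))   # amplified, above cutoff
    print(is_positive(30.0, None))   # internal control failed
```

Treating a failed internal control as "not positive" rather than "invalid" is a simplification; a production workflow would normally flag such runs for repeat testing.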

Meteorological and COVID-19 data sources

Daily meteorological observation data were sourced from the authoritative public database Visual Crossing (https://www.visualcrossing.com/), which provides extensive historical time-series data across multiple meteorological variables on a global scale. We extracted meteorological variables for Putian city, including daily average temperature (°C), relative humidity (%), precipitation (mm), diurnal temperature range (°C), wind speed (km/h), wind direction, solar radiation (W/m²), and the ultraviolet (UV) index, wherein the diurnal temperature range was defined as the difference between the highest and lowest temperatures of the day. All indicators were synchronized with the influenza data. To ensure data quality, we conducted missing value and anomaly checks on the raw datasets.

Early COVID-19 studies revealed a notable decline in positivity rates of seasonal respiratory viruses relative to pre-pandemic levels, likely due to containment measures such as masking and behavioral changes during surges (Groves et al. 2021; Xiao et al. 2022). To evaluate the potential impact of the COVID-19 epidemic and its control measures on influenza incidence, we collected daily COVID-19 case data from Putian city, spanning January 1, 2020, to December 31, 2022, obtained from the municipal website (https://www.putian.gov.cn/ztzl/tcyqfkhjjshfz/yqdt/qptsyqdt/). Additionally, we obtained the face-covering policy stringency index for China during the same period from the Our World in Data database (https://ourworldindata.org/coronavirus) as a reference indicator for social prevention measures. This database categorizes the stringency index into five levels (0-4): 0 represents “No policy”; 1 represents “Recommended”; 2 represents “Required in some specified shared/public spaces outside the home with other people present, or some situations when social distancing not possible”; 3 represents “Required in all shared/public spaces outside the home with other people present or all situations when social distancing not possible”; 4 represents “Required outside the home at all times regardless of location or presence of other people”. Based on the established criteria, we set the stringency index to 0 for the period prior to 2020; for 2023, due to adjustments in China’s COVID-19 outbreak control policies recommending public mask-wearing, we set the stringency index to 1 as defined by the Our World in Data database; other years’ indices were configured contextually to reflect the dynamic impact of pandemic prevention intensity on influenza incidence. Additionally, using the same definitions, we collected data of the aforementioned types from Sanming city, covering the period from January 1 to December 31, 2023, for external validation of the LSTM network.
The datasets from these public databases, along with influenza surveillance data, are devoid of missing values, eliminating the need for imputation. For influenza and COVID-19 data, days with no reported cases were explicitly recorded as zero per the sentinel hospitals’ and government’s daily reporting protocols, respectively.

DLNM analysis

To quantify the non-linear and lagged effects of meteorological factors on influenza incidence risk, this study employed DLNM. DLNM, proposed by Gasparrini et al., integrates generalized linear models, non-linear exposure-response relationships, and lag effects, enabling the simultaneous characterization of complex non-linear and time-varying exposure-lag-response associations (Gasparrini 2011). Separate DLNM models were constructed for influenza A and B incidence risks. The dependent variable was the daily number of confirmed influenza cases, with meteorological factors serving as independent variables. The maximum lag time was set to 15 days for all models, optimizing biological validity and analytical precision to capture typical influenza incubation and transmission cycles while minimizing model complexity and statistical noise from extended periods, aligning with previous research. A quasi-Poisson function was used to account for overdispersion in the incidence data. The model formula is as follows:
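Assembled from the terms defined immediately below, and assuming a log link for the quasi-Poisson family with no covariates beyond those named, the model can be written in the following form. The symbol $Y_t$ (the daily number of confirmed cases on day $t$) and the subscripted notation $df_x$, $df_{time}$ are introduced here for readability.

```latex
\log\big[E(Y_t)\big] = \alpha + cb\big(x_t;\; ns(x, df_x),\; g(k)\big) + ns(time, df_{time})
```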

where α is the intercept; cb is the cross-basis function of the meteorological factor x, which combines the exposure-response relationship ns(x, dfx), defined using natural splines, with the lag-response relationship g(k), defined using polynomial functions, where k is the lag time; and ns(time, dftime) is a natural spline function of time used to control for long-term trends and seasonality. The degrees of freedom dfx for the exposure-response spline were selected from {2, 3, 4, 5}, while the degrees of freedom dftime per year for the long-term trend spline were selected from {6, 7, 8, 9, 10}. To determine the optimal values, the combinations of dfx and dftime across the selected meteorological factors were systematically evaluated using the quasi-Akaike information criterion (QAIC), and the combination yielding the smallest QAIC was selected as the best balance between goodness-of-fit and model complexity for our influenza-specific dataset. The QAIC formula is:
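One common statement of QAIC for quasi-Poisson DLNMs, written with the three symbols defined in the next sentence, is given here. Which convention was applied in this analysis is an assumption: an alternative form divides the log-likelihood term by Φ instead of multiplying the penalty term by it, and the two forms rank candidate models identically only when Φ is held fixed across them.

```latex
QAIC = -2\lambda + 2\,\Phi\,k
```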

where λ represents the log-likelihood of the model, Φ is the dispersion parameter, and k indicates the number of model parameters. When plotting prediction results, the median was selected as the reference value for its robustness against outliers, offering a consistent depiction of typical meteorological conditions in Putian and enabling the evaluation of bidirectional risk deviations, especially during extreme weather events. To further explore the impact of extreme weather conditions, we predicted and displayed the effects of extreme scenarios across the meteorological factors: except for precipitation extremes set at 5 mm and 25 mm and wind direction extremes at the most common eastern and rarest northwestern winds, all other meteorological factors were assessed at their 5th and 95th percentiles to estimate the lagged risk curves relative to the median. All results underwent seasonality and long-term trend corrections to eliminate the inherent temporal variation effects between meteorological factors and influenza incidence.
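The choice of reference and extreme values described above can be sketched as follows. This is an illustrative helper only: the function name `reference_and_extremes` is hypothetical, and the interpolation rule used for the percentiles (the "inclusive" method of Python's standard library) is an assumption, since the text does not state which percentile definition was used.

```python
import statistics


def reference_and_extremes(values):
    """Return the median (reference) and the 5th/95th percentiles (extremes).

    `values` is a sequence of daily observations of one meteorological factor.
    """
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    return {
        "p5": cuts[0],                        # low extreme
        "median": statistics.median(values),  # reference value
        "p95": cuts[-1],                      # high extreme
    }


if __name__ == "__main__":
    # Hypothetical daily mean temperatures in degrees Celsius, for illustration only.
    temps = [12.1, 14.3, 15.0, 18.2, 19.9, 21.4, 23.8, 26.5, 28.0, 29.7, 31.2]
    print(reference_and_extremes(temps))
```

Precipitation and wind direction would bypass this helper, because the text fixes their extreme scenarios by hand (5 mm and 25 mm; eastern and northwestern winds).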

LSTM modeling and analysis

Building on the findings of the DLNM analysis, which identified significant non-linear and lagged effects, this study further attempted to construct multi-factor influenza A and B prediction networks based on LSTM to model these dynamics.

Data preprocessing

The LSTM network training utilized time series data from January 1, 2018, to December 31, 2022 (88.26% of the total Putian data), with 10% randomly selected as the cross-validation set. Data from January 1, 2023, to December 31, 2023 (11.74%) served as the internal validation set. Data from Sanming city, collected from the same period, was utilized as the external validation set. First, we performed data normalization, a crucial step to ensure that training and test set data are compared on a unified scale. We normalized the training and test set data separately within the range [0, 1]. For the test set normalization, we used the maximum and minimum values from the training set as boundaries. This approach ensured consistency between the normalized test set data and the training set data. After the algorithm conducted predictions on the test set data, we performed denormalization to convert the predicted results back to the original data scale and rounded them to integers. These steps ensured that the final prediction results accurately and objectively reflected the LSTM’s performance in real-world scenarios and provided reliable data for subsequent calculation of evaluation metrics.

The normalization formula is as follows:

$$y_{normalization} = \frac{y - minY}{maxY - minY}$$

The denormalization formula is as follows:

$$y_{denormalization} = \hat{y}\,(maxY - minY) + minY$$

where $y_{normalization}$ is the normalized result, $y$ is the original data, $maxY$ is the maximum value of the variable in the training set, $minY$ is the minimum value of the variable in the training set, $\hat{y}$ is the predicted value output by the network, and $y_{denormalization}$ is the denormalized result.
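The scaling steps described in this subsection can be sketched as below. This is a minimal sketch under stated assumptions, not the study's code: the function names are hypothetical, and Python's built-in `round` (which rounds exact halves to the nearest even integer) stands in for whatever rounding rule was actually used.

```python
def fit_min_max(train_values):
    """Learn the scaling bounds from the training set only."""
    return min(train_values), max(train_values)


def normalize(values, min_y, max_y):
    """Scale values with training-set bounds; test values may fall outside [0, 1]."""
    span = max_y - min_y
    if span == 0:
        raise ValueError("training data are constant; min-max scaling is undefined")
    return [(v - min_y) / span for v in values]


def denormalize(scaled_predictions, min_y, max_y):
    """Map network outputs back to case counts and round to integers."""
    span = max_y - min_y
    return [round(s * span + min_y) for s in scaled_predictions]


if __name__ == "__main__":
    train = [0, 5, 10]            # hypothetical daily case counts
    test = [5, 12]                # 12 exceeds the training maximum
    lo, hi = fit_min_max(train)
    scaled = normalize(test, lo, hi)
    print(scaled)
    print(denormalize(scaled, lo, hi))
```

Reusing the training-set bounds for the test set avoids leaking information from the evaluation period into the scaling step.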

Finally, we used the above data as input variables, together with features that account for potential confounders, such as the day of the week (DOW) and COVID-19-related variables. We set the time step to 15 (i.e., using data from the past 15 days to predict the number of influenza cases for the next day) and reorganized the two-dimensional data into three-dimensional arrays as input for the LSTM network.
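The reshaping from a two-dimensional table into three-dimensional input with a 15-day time step can be sketched with plain lists. The function name `make_windows` is hypothetical, and whether the input rows also contain past case counts is not stated in the text, so the sketch simply windows whatever feature rows it is given.

```python
def make_windows(features, targets, time_step=15):
    """Build (samples, time_step, n_features) inputs and next-day targets.

    features: list of per-day feature rows (the 2-D table)
    targets:  list of daily case counts, aligned with `features`
    """
    if len(features) != len(targets):
        raise ValueError("features and targets must cover the same days")
    x, y = [], []
    for end in range(time_step, len(features)):
        # Rows for the previous `time_step` days form one input sample ...
        x.append([list(row) for row in features[end - time_step:end]])
        # ... and the count on the following day is the value to predict.
        y.append(targets[end])
    return x, y


if __name__ == "__main__":
    # 20 hypothetical days with two features per day.
    feats = [[day, day * 2] for day in range(20)]
    cases = list(range(100, 120))
    x, y = make_windows(feats, cases)
    print(len(x), len(x[0]), len(x[0][0]))  # samples, time steps, features
```

In practice the nested lists would be converted to an array type accepted by the deep learning framework before training.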

Network construction

Our algorithm is a unidirectional LSTM, using Adam as the optimizer and Mean Square Error (MSE) as the loss function. The number of hidden layers was determined through hyperparameter optimization to ensure optimal network performance. Each LSTM layer consists of multiple LSTM units, including input gates, output gates, and forget gates. The basic framework of LSTM and the structure of LSTM units are shown in the LSTM model training section of Figure 1. Detailed principles and formulas are provided in the LSTM cell description section of Supplementary information and illustrated in Figure S1.

In the LSTM network, we used Hyperopt for hyperparameter tuning, with MSE established as the reference metric for evaluating and selecting optimal parameters. The LSTM network hyperparameter setting strategy was as follows: We first defined the selection range for the number of hidden layers, choosing between 1, 2, and 3 layers. Next, for the number of neurons in each hidden layer, we set a candidate set of {8, 16, 32, 64, 128, 256, 512}. Subsequently, the activation function for each hidden layer was optimally selected from various functions including sigmoid, Rectified Linear Unit (ReLU), Hyperbolic Tangent (tanh), Swish, Exponential Linear Unit (ELU), and Softplus. Furthermore, we carefully planned the batch size for each training session, with candidate values covering the range {8, 16, 32, 64, 128, 256, 512}. For the number of iterations (epochs), we set a natural number selection interval from 300 to 1000 to flexibly control the training depth. Finally, the learning rate of the optimizer was selected from the set {0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001}.
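The search space listed above can be written down as a small data structure. The sketch below is not the study's Hyperopt configuration: to stay self-contained it substitutes plain random sampling for Hyperopt's Bayesian search, the names `SEARCH_SPACE`, `sample_config`, and `random_search` are hypothetical, and the epoch interval is assumed to include both 300 and 1000.

```python
import random

# Candidate values as listed in the text.
SEARCH_SPACE = {
    "n_layers": [1, 2, 3],
    "units": [8, 16, 32, 64, 128, 256, 512],
    "activation": ["sigmoid", "relu", "tanh", "swish", "elu", "softplus"],
    "batch_size": [8, 16, 32, 64, 128, 256, 512],
    "epochs": range(300, 1001),  # assumed inclusive of both endpoints
    "learning_rate": [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6],
}


def sample_config(rng):
    """Draw one configuration; units and activation are chosen per hidden layer."""
    n_layers = rng.choice(SEARCH_SPACE["n_layers"])
    return {
        "n_layers": n_layers,
        "units": [rng.choice(SEARCH_SPACE["units"]) for _ in range(n_layers)],
        "activation": [rng.choice(SEARCH_SPACE["activation"]) for _ in range(n_layers)],
        "batch_size": rng.choice(SEARCH_SPACE["batch_size"]),
        "epochs": rng.choice(SEARCH_SPACE["epochs"]),
        "learning_rate": rng.choice(SEARCH_SPACE["learning_rate"]),
    }


def random_search(objective, n_trials, seed=0):
    """Return (best_loss, best_config) over n_trials random draws."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        config = sample_config(rng)
        loss = objective(config)  # e.g. validation MSE of a trained network
        if best is None or loss < best[0]:
            best = (loss, config)
    return best


if __name__ == "__main__":
    # Toy objective standing in for "train the LSTM and return validation MSE".
    print(random_search(lambda cfg: cfg["learning_rate"], n_trials=20, seed=1))
```

A Bayesian optimizer differs from this stand-in by using the losses of earlier trials to decide where to sample next, which usually reaches a good configuration in fewer trials.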

LSTM performance evaluation

To comprehensively assess the predictive performance of the LSTM network, this study employed multiple commonly used error metrics, including Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and Symmetric Mean Absolute Percentage Error (SMAPE) (Chicco, Warrens, and Jurman 2021).

MAE reflects the actual magnitude of the error between predicted and true values. A smaller MAE indicates more accurate network predictions. The calculation formula is:

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|$$

RMSE is the square root of the mean of the squared errors. A smaller RMSE indicates better network prediction performance. The calculation formula is:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}$$

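A smaller MAPE indicates better network fitting. Since the denominator cannot be zero, this study only calculated MAPE for data where the true value was not zero, so that n here counts only those observations. The calculation formula is:

$$MAPE = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{\hat{y}_i - y_i}{y_i}\right|$$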

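A smaller SMAPE indicates better network fitting. Following the definition in the cited reference (Chicco, Warrens, and Jurman 2021), the calculation formula is:

$$SMAPE = \frac{100\%}{n}\sum_{i=1}^{n}\frac{\left|\hat{y}_i - y_i\right|}{\left(\left|\hat{y}_i\right| + \left|y_i\right|\right)/2}$$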

In the above formulas, ŷi is the denormalized predicted value, yi is the observed value, and n is the number of observations.
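The four error metrics can be sketched in plain Python as follows. Two choices here are assumptions rather than details from the text: SMAPE uses the mean of the absolute predicted and observed values as its denominator, and a day on which both values are zero contributes zero error to SMAPE instead of an undefined 0/0 term.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error over days with a non-zero observed count."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every observed value is zero")
    return 100.0 * sum(abs((p - t) / t) for t, p in pairs) / len(pairs)


def smape(y_true, y_pred):
    """Symmetric MAPE; a 0-vs-0 day counts as zero error (assumption)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2
        if denom != 0:
            total += abs(p - t) / denom
    return 100.0 * total / len(y_true)


if __name__ == "__main__":
    observed = [0, 2, 4]    # hypothetical daily case counts
    predicted = [0, 3, 2]
    print(mae(observed, predicted), rmse(observed, predicted))
    print(mape(observed, predicted), smape(observed, predicted))
```

Because daily influenza counts contain many zeros, MAPE and SMAPE are sensitive to how zero days are handled, so the convention should be reported alongside the values.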

Additionally, we plotted the loss function curves for the network on the training and validation sets to monitor LSTM convergence and the risk of overfitting. By comparing the trends and differences between the two curves, we could intuitively judge the training quality and generalization ability of the LSTM.

To further explain the prediction logic of the LSTM, this study employed the SHapley Additive exPlanations (SHAP) method for interpretability analysis. Based on Shapley values from game theory, SHAP can quantify the contribution of each feature to the LSTM’s predicted values and generate intuitive feature importance rankings and dependency plots.

Positive SHAP values indicate that the feature increased the predicted result, while negative values indicate a decrease in the predicted result, with the absolute value representing the intensity of the influence. SHAP values were computed by treating each meteorological variable’s lagged sequence as a unified input feature, preserving the temporal dependencies within each sequence rather than decomposing lags into independent features. This approach not only aligns with the sequential nature of our LSTM inputs but also supports our focus on population-level trends rather than individual-level causal effects.

To rigorously compare the predictive performance between the LSTM and traditional approaches, we concurrently developed an ARIMA model, employing the same training and validation datasets (detailed methodological description in Supplementary Information Additional File 4).

Statistical analysis

Statistical analyses were conducted using R software version 4.3.1 and Python version 3.8.3. DLNM modeling was implemented using the dlnm package in R. The LSTM networks were built in Python, with the main packages including tensorflow 2.2.0, keras 2.3.1, and shap 0.44.1. The ARIMA model was developed using the forecast package in R. All statistical tests were two-sided, with P < 0.05 considered statistically significant.

Results

Descriptive epidemiological characteristics of influenza

Between 2018 and 2023, influenza surveillance conducted at 7 sentinel hospitals in Putian city documented a total of 20488 ILI cases, with the majority being influenza-negative (17333 cases, 84.60%). Of these, 3155 cases were confirmed positive for influenza through PCR testing, representing a positivity rate of 15.40%. By viral type, 1685 cases were attributed to influenza A (8.23%), and 1470 cases were influenza B (7.17%). Further subtype analysis of influenza A revealed 790 cases of H1N1 (46.88%) and 895 cases of H3N2 (53.12%), while for influenza B, 1270 cases were of the Victoria lineage (86.39%) and 200 cases were of the Yamagata lineage (13.61%). Notably, following the onset of the COVID-19 pandemic in the spring of 2020, there was a significant reduction in the incidence of influenza. The annual incidence rates for influenza in 2020 and 2021 were 4.54 per 100,000 and 5.59 per 100,000, respectively, markedly lower than the pre-pandemic rates observed in 2018 (16.01 per 100,000) and 2019 (18.74 per 100,000), suggesting that public health measures implemented during the COVID-19 pandemic, such as mask-wearing and social distancing, were highly effective in decreasing the transmission of respiratory infections, including influenza.

In terms of seasonal distribution, surveillance data (Figure 2 A) revealed a notable dynamic interplay between influenza A and B viruses over the study period. In 2018, influenza A H1N1 was the dominant strain. However, by 2019, cases of the influenza B Victoria lineage gradually increased, surpassing H1N1 during the summer to become the predominant strain. In 2020, there was a sharp decline in influenza cases, with influenza A H3N2 emerging as the leading strain. The dominance of influenza B Victoria returned in 2021, only to be overtaken by a resurgence of influenza A H3N2 in the summer of 2022. By the winter of 2023, both influenza A H3N2 and influenza B Victoria exhibited concurrent surges, highlighting a dual epidemic. Interestingly, contrary to the conventional understanding that influenza A typically drives larger epidemics, our findings show that influenza B accounted for 52.18%, 100%, and 64.76% of total influenza cases in 2019, 2021, and 2022, respectively.

Temporal and demographic distribution of influenza cases in Putian city from 2018 to 2023. This figure presents the temporal and demographic characteristics of influenza cases in Putian city over the period from 2018 to 2023. (A) The heatmap displays the monthly incidence trends of various influenza subtypes from 2018 to 2023. The x-axis represents the years, while the y-axis corresponds to the months, with color intensity indicating the number of influenza cases in each time period. The heatmap highlights the seasonal pattern of influenza, particularly the peaks during winter and spring. Notably, the trends of various influenza subtypes differ across years, reflecting distinct epidemiological dynamics. The color blocks above the heatmap denote the corresponding influenza subtypes (Influenza A: H1N1, H3N2; Influenza B: Victoria, Yamagata). The observed interruption in the influenza cases during 2020-2021 may be associated with the global impact of the COVID-19 pandemic, which likely altered typical influenza transmission patterns. (B) The bar plot illustrates the distribution of influenza cases across different age groups and genders. The x-axis represents the number of cases, and the y-axis indicates age intervals. The plot is bifurcated by gender, with male cases on the left and female cases on the right. This visualization underscores the differential impact of influenza on various demographic groups, identifying children (0-10 years) and the elderly (60 years and above) as particularly vulnerable populations.

In the demographic analysis (Figure 2 B), among the 3155 laboratory-confirmed influenza cases, males (1811 cases; 57.40%) outnumbered females (1344 cases; 42.60%), with a male-to-female ratio of 1.35:1. Significant gender differences were observed across age groups (χ² = 37.36, P < 0.001), with males constituting 60.68% of cases in individuals under 18 years, whereas females were more prevalent among adults (51.53%). A statistically significant difference was noted in the gender distribution between influenza A and B (χ² = 6.64, P = 0.01), with males comprising a higher proportion (59.86%) of influenza B cases. No statistically significant gender differences were found in the distribution of influenza subtypes (χ² = 2.43, P = 0.49). Additionally, children and adolescents under 18 years of age were identified as the most at-risk population, comprising 73.12% of all cases. Influenza B was more prevalent in this age group (50.33% vs. 49.67%), while influenza A predominated in the adult population (63.56% vs. 36.44%). These findings suggest that susceptibility to different influenza virus types may vary across age groups.
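
As a worked check of the comparison between influenza A and B, the sketch below computes a chi-square statistic for a 2×2 table. The cell counts are back-calculated from the totals and percentages reported above (1470 influenza B cases with 59.86% male gives 880 males; 1811 males overall leaves 931 among the 1685 influenza A cases), so they are derived values rather than counts stated in the text. The use of Yates' continuity correction is likewise an assumption about the test that was applied.

```python
def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]].

    With yates=True, Yates' continuity correction is applied.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Counts back-calculated from the reported totals and percentages
# (an assumption, see the paragraph above).
a_male, a_female = 931, 754   # influenza A
b_male, b_female = 880, 590   # influenza B

chi2_corrected = chi_square_2x2(a_male, a_female, b_male, b_female, yates=True)
chi2_uncorrected = chi_square_2x2(a_male, a_female, b_male, b_female, yates=False)
print(round(chi2_corrected, 2), round(chi2_uncorrected, 2))
```

Under these assumptions the corrected statistic comes out close to the reported value of 6.64, whereas the uncorrected statistic is somewhat larger.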

Overall, our findings suggest that in subtropical regions, the epidemic intensity of Influenza B may rival or even exceed that of Influenza A. Furthermore, the widespread implementation of non-pharmaceutical interventions (e.g., mask-wearing) during the COVID-19 pandemic significantly curtailed the transmission of respiratory viruses, including influenza. Continuous monitoring of Influenza A (H1N1, H3N2) and Influenza B Victoria strains is essential, alongside heightened surveillance and preventive measures targeting children and adolescents, particularly in densely populated settings such as schools. These insights are crucial for informing vaccine strain selection, vaccination strategies, and targeted public health interventions.

Temporal trends in meteorological factors and influenza incidence

Between 2018 and 2023, influenza incidence in Putian city exhibited distinct seasonal patterns. The majority of cases occurred between December and April of the following year, accounting for over 60% of the annual cases, with a secondary peak observed from May to September. Concurrently, meteorological factors demonstrated clear cyclical variations (Figure 3), with detailed descriptive statistics provided in Supplementary Table S1. During the summer months (July–August), the average daily temperature peaked above 32°C, while the winter months (December to February) experienced the lowest temperatures, dropping below 10°C (Figure 3 A). This period was also marked by a significant increase in diurnal temperature variation, with the maximum daily range exceeding 13°C (Figure 3 E). The annual average relative humidity was approximately 76.10%, with the highest levels observed during the monsoon season from April to June, reaching or exceeding 97% (Figure 3 B). Precipitation showed a pronounced seasonal distribution, peaking from June to September, while the period from November to February was characterized by markedly reduced rainfall (Figure 3 C). Solar radiation (Figure 3 D) and UV index (Figure 3 F) both reached their annual maxima between July and September. High wind speeds were frequently observed in Putian city from July to September during the study period, which may correlate with the increased frequency of typhoons during this period (Figure 3 G). Throughout the year, the easterly wind direction was predominant, and influenza cases were recorded in every month (Figure 3 H).

Time-series analysis of the relationship between meteorological factors and influenza cases in Putian city from 2018 to 2023. This figure illustrates the time-series relationship between key meteorological factors and influenza cases in Putian city over the period from 2018 to 2023. Panels (A) to (H) display the temporal fluctuations of various meteorological parameters, including average temperature, humidity, precipitation, solar radiation, daily temperature range, ultraviolet index, wind speed, and wind direction, respectively, in relation to the number of influenza cases. The left y-axis of each subplot represents the observed values of the meteorological factor (indicated by the yellow line), while the right y-axis shows the number of influenza cases, with the red line representing Influenza A cases and the blue line representing Influenza B cases. The x-axis represents the time in years, covering five consecutive influenza seasons. These plots elucidate potential patterns and lag effects between climatic conditions and influenza incidence.

A comparative analysis of the temporal patterns of influenza cases and meteorological factors revealed several noteworthy correlations. Firstly, peaks in influenza A and B cases were predominantly observed during January to March, characterized by lower average temperatures and larger diurnal temperature ranges (Figures 3 A and E). Interestingly, influenza B occasionally exhibited summer outbreaks, indicating a potentially greater adaptability to higher temperatures. Secondly, extreme rainfall events and periods of high humidity were often temporally aligned with short-term surges in influenza A and B cases (Figures 3 B and C). For instance, the peak in influenza B cases in June 2022 coincided with an episode of extreme rainfall. Moreover, the majority of influenza outbreaks occurred during periods of reduced solar radiation and lower UV indices (Figures 3 D and F). Variations in wind speed were not significantly correlated with influenza incidence in our analysis (Figure 3 G). During periods of easterly winds, there was a noticeable decrease in the incidence of influenza A and B cases in Putian city (Figure 3 H). Our findings indicate that the incidence of influenza in the subtropical city of Putian was closely associated with various meteorological conditions.

DLNM analysis for the cumulative and lagged effects of multiple meteorological factors on influenza incidence risks

To explore the relationship between meteorological factors and influenza incidence, we initially quantified the linear associations between various meteorological variables and the incidence of influenza A, influenza B, and ILI cases. This step was essential to identify whether a simple linear relationship could adequately explain the observed patterns in influenza incidence. To do this, we calculated the Pearson correlation coefficient matrix for these variables, providing a preliminary assessment of the strength and direction of the linear relationships. The complete correlation matrix of meteorological variables is presented in Supplementary Figure S2. Although certain meteorological factors exhibited statistically significant linear correlations with influenza incidence, the correlation coefficients were generally modest. This indicated that a linear model might not fully capture the complexity of the relationship between meteorological conditions and influenza incidence and suggested the presence of non-linear or lagged effects that could not be adequately represented by linear associations alone. Therefore, we proceeded with the use of DLNM to further investigate the non-linear and lagged associations between meteorological factors and influenza risk.

Using median values of each meteorological factor as reference points, Figure 4 illustrates the cumulative effects of various meteorological factors on the risk of influenza A and B infections over a 15-day lag period, as estimated using univariate DLNM models. For each influenza-specific DLNM model, the combination of dfx for the natural spline modeling the exposure-response relationship and dftime for the natural spline modeling the long-term trend that yielded the minimum QAIC was selected as the optimal configuration and applied to the final model. Detailed QAIC values for all tested combinations are provided in Supplementary Table S2. The DLNM results reveal significant non-linear relationships between these factors and cumulative risks of influenza incidence, with distinct effect curves for influenza A and B (Figures 4 A and B). Overall, the findings indicate that high temperatures and high humidity levels were associated with an increased cumulative risk of influenza A, whereas low temperatures and high precipitation levels correlated with a heightened cumulative risk of influenza B.
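
The selection of dfx and dftime by minimum QAIC can be sketched as follows. The sketch assumes the commonly used quasi-Poisson form QAIC = -2·logL + 2·φ·k, where logL is the Poisson log-likelihood at the fitted means, φ is the estimated overdispersion, and k is the number of model parameters; whether this exact form was used here is an assumption. The observed counts, fitted means, parameter counts, and candidate (dfx, dftime) pairs are hypothetical placeholders, since fitting the DLNM itself is outside the scope of the example.

```python
import math


def poisson_loglik(y, mu):
    """Poisson log-likelihood of counts y given fitted means mu."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))


def overdispersion(y, mu, n_params):
    """Pearson chi-square divided by the residual degrees of freedom."""
    pearson = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return pearson / (len(y) - n_params)


def qaic(y, mu, n_params):
    """QAIC = -2*logL + 2*phi*k for a quasi-Poisson fit."""
    return (-2.0 * poisson_loglik(y, mu)
            + 2.0 * overdispersion(y, mu, n_params) * n_params)


def select_min_qaic(scores):
    """Return the (dfx, dftime) key with the smallest QAIC."""
    return min(scores, key=scores.get)


# Hypothetical daily counts and fitted means from two candidate models,
# keyed by (dfx, dftime); the second tuple element is the parameter count.
observed = [3, 5, 2, 8, 6, 4, 7, 3]
candidate_fits = {
    (3, 7): ([3.5, 4.5, 2.5, 7.0, 6.0, 4.5, 6.5, 3.5], 3),
    (4, 8): ([3.2, 4.9, 2.1, 7.8, 6.1, 4.1, 6.8, 3.0], 5),
}
qaic_scores = {key: qaic(observed, mu, k)
               for key, (mu, k) in candidate_fits.items()}
best_df = select_min_qaic(qaic_scores)
print(best_df, qaic_scores)
```

In practice the fitted means and parameter counts would come from the fitted models for each tested combination, and the same loop would run separately for influenza A and influenza B.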

Cumulative risk of Influenza associated with meteorological factors based on a 15-day Lag analysis using DLNM.

This figure illustrates the non-linear influence of various meteorological factors on the 15-day cumulative risk of influenza A and B subtypes, as determined through Distributed Lag Non-linear Models (DLNM). (A) Non-linear relationship between meteorological factors and cumulative risk of influenza A over a 15-day lag period. These plots depict the impact of temperature, humidity, precipitation, and other weather-related factors on the cumulative relative risk of influenza A. The x-axis of each subplot represents the observed values of a specific meteorological factor, while the y-axis indicates the cumulative relative risk. The shape of the curve reflects the non-linear trend of risk variation with changes in meteorological factors, and the shaded area denotes the 95% confidence interval. (B) Non-linear relationship between meteorological factors and cumulative risk of influenza B over a 15-day lag period. Similarly, these plots show the effect of various meteorological factors on the cumulative relative risk of influenza B. Each subplot highlights the non-linear association between the cumulative risk and specific weather conditions, providing insights into how these factors may differently influence the transmission of influenza B compared to influenza A.

Regarding temperature effects, using 24.1°C as the reference, the cumulative risk for influenza A initially increased as the average temperature rose from 8°C to 21°C, peaking at 15.5°C with a relative risk (RR) of 5.23 (95% CI: 2.15-12.73). However, when the average temperature exceeded 26°C, the cumulative risk escalated rapidly, reaching its maximum at 30°C (RR = 97.73, 95% CI: 15.24-626.71), before declining sharply as temperatures continued to rise. In contrast, the cumulative risk for influenza B decreased with rising average temperatures, peaking at 8°C (RR = 40.41, 95% CI: 11.47-142.31), followed by a gradual decline to a stable level.
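
Because relative risks from log-link models are estimated on the log scale, their 95% confidence intervals are expected to be symmetric around log(RR). The sketch below uses the three temperature estimates reported above to check this and to recover the implied log-scale standard error. The helper name and the use of z = 1.96 are assumptions made for illustration.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Return (geometric midpoint, implied log-scale standard error).

    For a Wald-type interval exp(beta +/- z*se), the geometric mean of the
    bounds equals exp(beta) and the log-width equals 2*z*se.
    """
    midpoint = math.sqrt(lower * upper)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return midpoint, se


# (reported RR, lower bound, upper bound) taken from the text above.
reported = {
    "influenza A at 15.5 C": (5.23, 2.15, 12.73),
    "influenza A at 30 C": (97.73, 15.24, 626.71),
    "influenza B at 8 C": (40.41, 11.47, 142.31),
}
checks = {}
for label, (rr, lower, upper) in reported.items():
    midpoint, se = log_scale_summary(lower, upper)
    checks[label] = (midpoint, se, abs(midpoint - rr))
    print(f"{label}: RR={rr}, midpoint={midpoint:.2f}, log-scale SE={se:.2f}")
```

The wide interval around the estimate at 30°C corresponds to a large log-scale standard error, which is worth keeping in mind when interpreting the very high point estimate.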

In terms of diurnal temperature range, with an 8°C difference as the reference, the cumulative risk for influenza A initially increased and then decreased with increasing temperature differences, reaching its lowest risk at a diurnal temperature range of only 1°C (RR = 0.06, 95% CI: 0.00-0.98). For influenza B, the cumulative risk displayed a double-peaked pattern, with peaks at 5°C (RR = 21.25, 95% CI: 5.70-79.26) and 11°C (RR = 20.80, 95% CI: 5.34-81.00).

Regarding humidity, using 75.8% relative humidity as the reference, the cumulative risk for influenza A gradually declined within the 38%-52% humidity range, peaking at 38% (RR = 9.94, 95% CI: 1.05-93.83). Between 63% and 75% humidity, the risk initially increased before declining again, reaching its maximum at 71% (RR = 4.42, 95% CI: 2.15-9.14). The risk continued to decrease to its lowest point at 78% humidity (RR = 0.45, 95% CI: 0.24-0.82). When humidity rose further, the cumulative risk sharply increased again, reaching a peak at 87% (RR = 16.27, 95% CI: 6.39-41.42) before declining. In contrast, the cumulative risk for influenza B generally decreased with increasing humidity, although this trend was not statistically significant.

Regarding daily precipitation, using 0 mm as the reference, the cumulative risk for influenza A first rose and then fell with increasing rainfall, becoming statistically significant when precipitation exceeded 30 mm (RR = 0.19, 95% CI: 0.05-0.83), and thereafter decreasing gradually as precipitation continued to rise. Conversely, the cumulative risk for influenza B increased steadily with rising precipitation.

For solar radiation, with 205.3 W/m² as the reference, the cumulative risk for influenza A followed a rise-then-fall pattern within the 220-275 W/m² range, peaking at 260 W/m² (RR = 6.73, 95% CI: 1.41-32.22), before stabilizing. The cumulative risk for influenza B remained generally low, with significant reductions observed at 115 W/m² (RR = 0.14, 95% CI: 0.06-0.30) and 285 W/m² (RR = 0.17, 95% CI: 0.06-0.45).

For the UV index, with an index of 7 as the reference, the cumulative risk for influenza A exhibited two non-significant fluctuations, while the risk for influenza B generally declined. The cumulative risk for influenza B was highest at a UV index of 0 (RR = 80.78, 95% CI: 6.26-1042.87), but significantly decreased when the UV index exceeded 3. The impact of wind direction and speed on influenza risk appeared to be relatively minor, with only certain specific intervals showing a non-linear association with the risk of influenza A or B. Notably, high wind speeds (16 km/h, RR = 6.94, 95% CI: 1.98-24.38) increased the cumulative risk for influenza A, while under similar conditions, the risk for influenza B decreased, although these changes were not sufficient to alter the overall trend of influenza B prevalence.

Overall, the results from the DLNM analysis highlight the complex and differential impacts of various meteorological factors on the cumulative risk of influenza A and B. These findings underscore the importance of tailoring influenza prevention strategies to specific meteorological conditions and virus subtypes. A deeper understanding of the roles of temperature, humidity, and precipitation in influenza transmission could inform more effective public health interventions and reduce the risk of influenza outbreaks.

DLNM analysis for influenza incidence risks under extreme meteorological conditions

To further investigate the dynamic temporal variations in influenza incidence risks under extreme meteorological conditions, we again employed DLNM models to estimate the influenza A and B risks associated with the 5th and 95th percentiles of various meteorological factors across different lag times (Figures 5 A and B). The results reveal that most meteorological factors exhibited significant lagged effects on influenza risk under extreme conditions, with distinct lag patterns observed across different meteorological factors and influenza types. For instance, under extremely low temperatures (14°C), the cumulative risk for influenza A peaked at a lag of 8 days (RR = 1.18, 95% CI: 1.06-1.30), followed by a gradual decline. Similarly, influenza B showed a peak at lag day 8 (RR = 1.35, 95% CI: 1.20-1.52) before gradually declining. In contrast, extremely high temperatures (30.5°C) influenced only influenza A, with a cumulative risk peak at lag day 8 (RR = 1.54, 95% CI: 1.27-1.86), followed by a gradual decrease and a second rise at lag day 14.
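
The extreme values used in this analysis are the 5th and 95th percentiles of each meteorological series. A minimal sketch of how such thresholds can be obtained is shown below; the temperature series is a hypothetical placeholder, and the interpolation rule is that of Python's `statistics.quantiles`, which may differ slightly from the rule used in the original analysis.

```python
import statistics


def extreme_thresholds(values):
    """Return the (5th percentile, 95th percentile) of a numeric series."""
    # n=20 yields 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(values, n=20)
    return cuts[0], cuts[-1]


# Hypothetical daily mean temperatures in degrees Celsius (illustration only).
temperatures = [10 + 0.2 * i for i in range(100)]
low, high = extreme_thresholds(temperatures)
n_below = sum(1 for t in temperatures if t < low)
n_above = sum(1 for t in temperatures if t > high)
print(low, high, n_below, n_above)
```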

Lagged effects of extreme meteorological conditions on Influenza A and B transmission based on DLNM.

This figure illustrates the lagged risk effects of extreme meteorological conditions on influenza A and B, analyzed using a Distributed Lag Non-linear Model (DLNM). The analysis compares the relative risks associated with extreme values of meteorological variables (e.g., extreme temperatures, humidity, and precipitation) across different lag times, highlighting the heterogeneous impacts of these variables on the transmission dynamics of different influenza types. Panel (A) depicts the lagged risk effects for influenza A under extreme meteorological conditions, while Panel (B) shows the corresponding effects for influenza B. In each subfigure, the x-axis represents the lag days, and the y-axis shows the relative risk. The orange and blue curves represent the relative risks under extreme high and low conditions of each meteorological factor, respectively. The shaded areas around the curves denote the 95% confidence intervals, indicating the uncertainty range of the estimates. This figure provides critical insights into the role of the temporal lag effects and offers valuable guidance for the development of preventive strategies against influenza outbreaks under extreme weather conditions. For Influenza A, extremely high conditions are indicated by solid orange lines and low conditions by solid blue lines. For Influenza B, extremely high conditions use solid yellow lines and low conditions use solid green lines.

The impact of diurnal temperature ranges also varied over time. Using an 8°C difference as the reference, the cumulative risks for both influenza A and B decreased with smaller daily temperature differences (3°C), reaching their lowest points at lag day 14 (RR = 0.78, 95% CI: 0.70-0.87) and lag day 12 (RR = 0.89, 95% CI: 0.81-0.98), respectively. However, when the daily temperature difference reached 13°C, the cumulative risks for both influenza A and B peaked at lag day 7 (RR = 1.11, 95% CI: 1.00-1.22 for influenza A; RR = 1.18, 95% CI: 1.07-1.31 for influenza B), suggesting that large fluctuations in diurnal temperature may trigger influenza outbreaks within a week.

For humidity, relative to the reference of 75.8%, low humidity (58%) significantly increased the risk of influenza A on lag day 1 (RR = 1.16, 95% CI: 1.04-1.30), while extremely high humidity (94%) led to a gradual decrease in cumulative risk, reaching its lowest point at lag day 14 (RR = 0.87, 95% CI: 0.79-0.97). For influenza B, the cumulative risk steadily declined under low humidity, reaching its minimum at lag day 6 (RR = 0.87, 95% CI: 0.78-0.97), whereas high humidity only caused a significant decline within the first 4 days, with no substantial changes thereafter.

Similarly, extreme precipitation events had noteworthy impacts. For a 5 mm rainfall event, the cumulative risk for influenza A peaked at lag day 7 (RR = 1.10, 95% CI: 1.02-1.19), while influenza B showed a decrease at lag day 2 (RR = 0.80, 95% CI: 0.72-0.88), followed by a gradual increase. When precipitation reached 25 mm, influenza A risk peaked at lag day 3 (RR = 1.16, 95% CI: 1.00-1.35) and reached its lowest at lag day 8 (RR = 0.80, 95% CI: 0.70-0.92). For influenza B, the peak risk occurred at lag day 7 (RR = 1.24, 95% CI: 1.13-1.39). These observations suggest that extreme precipitation events may significantly increase the risk of influenza outbreaks within one week.

The lagged effects of solar radiation, the UV index, wind speed, and wind direction were less pronounced, particularly for influenza A (Figures 5 A and B). The impact of these factors on influenza incidence was generally weaker, with some specific intervals showing minor non-linear associations. Overall, findings from the DLNM analysis demonstrate the complex lagged effects of extreme meteorological conditions on influenza incidence dynamics, with significant differences in the temporal responses of different influenza types to the same meteorological factor. These findings further elucidate the dynamic complexity of influenza’s meteorological effects, providing important insights that can inform more effective public health interventions and improve influenza early warning systems.

Influenza prediction using LSTM network

Building upon the previous analysis of the impact of meteorological factors on influenza dynamics, this study further aimed to construct a multifactorial influenza prediction network utilizing LSTM. To optimize the LSTM network, we employed Bayesian optimization through the Hyperopt framework, which allowed us to fine-tune key hyperparameters. This optimization process identified the optimal configurations for the influenza A and B prediction networks (Table 1).

Optimal LSTM hyperparameters for predicting influenza A and B incidence.

LSTM: Long short-term memory. Hyperparameters were optimized through Bayesian optimization using the Hyperopt framework. This table presents the optimal LSTM network structure (number of layers and neurons), activation function, batch size, learning rate, and number of epochs for predicting both influenza A and B infections. The number of neurons refers to the count of LSTM cells in each layer; detailed LSTM architecture is provided in Supplementary Information Additional file 1.

For influenza A, the optimal LSTM network consisted of three hidden layers with 92, 128, and 8 neurons, respectively, a learning rate of 0.0001, and a batch size of 32. In contrast, the influenza B LSTM network had two hidden layers with 64 and 256 neurons, a learning rate of 0.001, and a batch size of 128. Leveraging these optimal hyperparameters, alongside a combination of meteorological data, COVID-19 pandemic variables, and mask-wearing policies, we trained the LSTM network using historical data from 2018 to 2022. Ten percent of these data were randomly selected as the validation set, while data from 2023 were subsequently used to test the network’s performance. The loss curves plotted during training revealed the convergence behavior of the LSTM. As shown in Figures 6 C and D, the training loss and validation loss for both the influenza A and B LSTM networks decreased progressively with increasing epochs, initially displaying a steep decline followed by stabilization. This pattern indicates that the networks effectively learned the characteristics of the data and converged to a stable solution. Furthermore, the close alignment between training and validation loss curves suggests that both LSTM networks for influenza A and B performed well and exhibited minimal overfitting, demonstrating strong learning capabilities and generalizability. Following adequate training, the LSTM networks were employed to predict influenza A and B cases in Putian city for the year 2023. The results revealed a strong match between the predicted and observed values (Figures 6 A and B). The constructed LSTM networks accurately captured the actual trends in influenza incidence, including the two outbreak peaks of influenza A during February-March and November-December of 2023, as well as the peak of influenza B in November-December, demonstrating good predictive performance.
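
The data partition described above (2018 to 2022 for training, a random 10% of those data for validation, and 2023 held out for testing) can be sketched as follows. The function name, the record layout, the random seed, and the placeholder values are assumptions for illustration; in the actual pipeline the units being split would presumably be windowed input sequences rather than single daily records.

```python
import random
from datetime import date, timedelta


def split_by_year(records, test_year=2023, val_fraction=0.10, seed=0):
    """Split (date, value) records into train, validation and test sets.

    Records dated before test_year form the training pool, from which a
    random val_fraction is set aside for validation; records dated in
    test_year form the test set.
    """
    pool = [r for r in records if r[0].year < test_year]
    test = [r for r in records if r[0].year == test_year]
    indices = list(range(len(pool)))
    random.Random(seed).shuffle(indices)
    n_val = round(len(pool) * val_fraction)
    val_idx = set(indices[:n_val])
    train = [r for i, r in enumerate(pool) if i not in val_idx]
    val = [r for i, r in enumerate(pool) if i in val_idx]
    return train, val, test


# Placeholder daily records from 2018-01-01 to 2023-12-31 (values are dummies).
start = date(2018, 1, 1)
n_days = (date(2023, 12, 31) - start).days + 1
records = [(start + timedelta(days=i), 0) for i in range(n_days)]
train, val, test = split_by_year(records)
print(len(train), len(val), len(test))
```
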
The predictive performance of the LSTM networks was quantitatively assessed using MAE, RMSE, MAPE, and SMAPE. For influenza A, the MAE was 1.71, RMSE was 3.69, MAPE was 0.90, and SMAPE was 1.19; for influenza B, the MAE was 0.38, RMSE was 0.84, MAPE was 0.53, and SMAPE was 1.28 (Table 2). These findings underscore the precision of the LSTM in characterizing and predicting influenza trends. To evaluate the impact of COVID-19-related variables, we conducted a sensitivity analysis by excluding these variables and retraining the LSTM networks with identical hyperparameters. For influenza A, the network without COVID-19 variables yielded an MAE of 1.82, RMSE of 3.69, MAPE of 0.88, and SMAPE of 1.26; for influenza B, the metrics were MAE of 1.68, RMSE of 1.83, MAPE of 0.70, and SMAPE of 1.74. Compared with the original networks, predictive accuracy declined on most metrics, most markedly for influenza B (e.g., MAE rising from 0.38 to 1.68); for influenza A, MAE and SMAPE worsened, while RMSE was unchanged and MAPE was marginally lower. Overall, these results highlight the contribution of COVID-19-related variables to predictive performance, particularly for influenza B.
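
The four metrics named above can be computed as in the following sketch. MAPE is restricted to observations with a nonzero true value, as described in the methods. The SMAPE variant (denominator equal to the mean of the absolute observed and predicted values, with a zero contribution when both are zero) is an assumption, and the example series are hypothetical.

```python
import math


def evaluate(y_true, y_pred):
    """Return MAE, RMSE, MAPE and SMAPE as fractions (not percentages)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # MAPE uses only the points whose observed value is nonzero.
    nonzero = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    mape = (sum(abs(p - t) / abs(t) for t, p in nonzero) / len(nonzero)
            if nonzero else float("nan"))

    # SMAPE: a term is taken as 0 when observed and predicted are both 0.
    smape_terms = []
    for t, p in zip(y_true, y_pred):
        denom = (abs(t) + abs(p)) / 2
        smape_terms.append(0.0 if denom == 0 else abs(p - t) / denom)
    smape = sum(smape_terms) / n
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "SMAPE": smape}


# Hypothetical observed and predicted case counts (illustration only).
observed = [0, 2, 4, 8]
predicted = [0, 1, 4, 10]
metrics = evaluate(observed, predicted)
print(metrics)
```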

Performance of the LSTM network in predicting Influenza A and B, along with SHAP analysis results.

This figure comprehensively illustrates the performance of the long short-term memory (LSTM) network in predicting Influenza A and B cases, as well as the interpretability of the network using SHAP (SHapley Additive exPlanations) values. (C) and (D) Training loss curves: These panels show the variation in loss values during the training of the LSTM network. The x-axis represents the number of training epochs, while the y-axis indicates the loss values. The smooth decline and stabilization of the loss curves indicate effective learning by the network, with no significant signs of overfitting. This is a crucial measure of the network’s stability and effectiveness throughout the training process. (A) and (B) Comparison of predictions and observations: These panels compare the LSTM network’s predicted number of influenza cases (red line) with the observed cases (blue line) over time. The x-axis represents the date, and the y-axis reflects the number of cases. This comparison visually demonstrates the network’s predictive accuracy across different time points, particularly during peak and low influenza periods. Such visualization is instrumental in quantitatively assessing the network’s performance across various phases of influenza outbreaks, identifying potential systematic biases. (E) and (F) Feature importance ranking: These bar charts rank the importance of various input features based on SHAP values. The x-axis denotes the mean SHAP value, while the y-axis lists the variables. The charts reveal which meteorological and other factors are most critical in the model’s predictions, such as the number of Influenza A and B cases, the day of the week, and the ultraviolet index. This ranking allows researchers to identify key drivers and explore the underlying mechanisms by which these factors influence influenza spread. (G) and (H) SHAP Value Distributions: These scatter plots illustrate the distribution of SHAP values across different features. The x-axis shows the SHAP value, and the y-axis lists the features. The color of the points reflects the feature values (red indicating high values, blue indicating low values), while the magnitude of the SHAP values represents the impact of each feature on the prediction outcome. This graph provides a visual understanding of how and to what extent different features influence the influenza predictions, offering critical insights into the network’s decision-making process. (I) and (J) External validation: These panels show a comparison between the number of influenza cases predicted by the LSTM network (red line) and the observed cases (blue line) over time. The date is plotted on the x-axis and the number of cases is plotted on the y-axis.

LSTM network performance evaluation metrics for predicting influenza A and B incidence.

Note: MAE (Mean Absolute Error), RMSE (Root Mean Square Error), MAPE (Mean Absolute Percentage Error), and SMAPE (Symmetric Mean Absolute Percentage Error) were used to assess Long short-term memory (LSTM) network prediction accuracy. Lower values indicate better predictive performance.

The external validation using independent 2023 influenza surveillance and corresponding meteorological data from Sanming confirmed the LSTM networks’ potential generalizability. For influenza A, the network achieved an MAE of 1.29, RMSE of 2.24, MAPE of 0.85, and SMAPE of 0.61; for influenza B, the metrics were MAE of 0.99, RMSE of 1.23, MAPE of 0.68, and SMAPE of 0.45. These results, illustrated in Figures 6 I (influenza A) and J (influenza B), demonstrate the networks’ capability to accurately predict influenza trends in a new region, with errors consistently low across both influenza types, underscoring their robustness beyond the training and internal validation data from Putian. To benchmark the performance of the validated LSTM networks, we constructed traditional ARIMA models using the same training and validation datasets and evaluated them with the same metrics. For influenza A, the ARIMA model yielded an MAE of 2.03, RMSE of 4.99, MAPE of 1.00, and SMAPE of 2.00. For influenza B, the ARIMA model produced an MAE of 1.02, RMSE of 1.37, MAPE of 0.48, and SMAPE of 0.75. Overall, the LSTM model demonstrated superior performance compared to the ARIMA model, particularly for influenza A prediction, where all four metrics improved. For influenza B, the LSTM model outperformed ARIMA in terms of MAE (0.38 vs. 1.02) and RMSE (0.84 vs. 1.37), though ARIMA achieved lower MAPE (0.48 vs. 0.53) and SMAPE (0.75 vs. 1.28) values.
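
The metric-by-metric comparison between the LSTM networks (Putian 2023 test set) and the ARIMA models can be tabulated as in the sketch below, which uses only the values reported in the text. The dictionary layout and function name are illustrative choices.

```python
# Metrics reported in the text (lower is better for all four).
lstm = {
    "A": {"MAE": 1.71, "RMSE": 3.69, "MAPE": 0.90, "SMAPE": 1.19},
    "B": {"MAE": 0.38, "RMSE": 0.84, "MAPE": 0.53, "SMAPE": 1.28},
}
arima = {
    "A": {"MAE": 2.03, "RMSE": 4.99, "MAPE": 1.00, "SMAPE": 2.00},
    "B": {"MAE": 1.02, "RMSE": 1.37, "MAPE": 0.48, "SMAPE": 0.75},
}


def better_model(lstm_scores, arima_scores):
    """For each metric, name the model with the lower error."""
    return {
        metric: "LSTM" if lstm_scores[metric] < arima_scores[metric] else "ARIMA"
        for metric in lstm_scores
    }


comparison = {flu: better_model(lstm[flu], arima[flu]) for flu in lstm}
for flu, winners in comparison.items():
    print(f"influenza {flu}: {winners}")
```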

To further elucidate the predictive logic behind the LSTM, we conducted a SHAP analysis. The results indicated that historical influenza incidence data were the most influential predictor for both influenza A and B, followed by DOW activity and solar radiation or UV index (Figures 6 E and F). This suggests that the LSTM autonomously learned the autoregressive characteristics of influenza incidence and the effects of meteorological factors. Specifically, SHAP visualizations revealed that higher historical incidence levels, dates closer to weekends, and lower solar radiation (or UV) intensities were associated with higher predicted new cases (Figures 6 G and H). These observations are consistent with established epidemiological insights, reinforcing the interpretability of the networks and demonstrating that the LSTM effectively identified key drivers of influenza outbreaks.
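To make the attribution idea behind SHAP concrete, the sketch below computes exact Shapley values for a three-feature toy predictor by enumerating all feature coalitions. It is not the SHAP library and not the study's LSTM: `toy_model`, its coefficients, and the `instance` and `baseline` values are hypothetical, chosen only so that higher previous incidence, a weekend indicator, and lower UV each push the prediction upward, mirroring the direction of the reported associations.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating all coalitions.

    Features outside a coalition are set to their baseline value.
    Feasible only for a handful of features (cost grows as 2**n).
    """
    n = len(instance)

    def value(subset):
        x = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return model(x)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis

def toy_model(x):
    """Hypothetical linear predictor of next-day cases (illustration only)."""
    prev_cases, weekend, uv_index = x
    return 0.8 * prev_cases + 1.5 * weekend - 0.3 * uv_index

instance = [10.0, 1.0, 2.0]   # high previous incidence, weekend, low UV
baseline = [4.0, 0.0, 6.0]    # reference day

phi = shapley_values(toy_model, instance, baseline)
print(phi)
```

The attributions sum to the difference between the prediction for `instance` and the prediction for `baseline`, which is the additivity property that SHAP summary plots rely on.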

Discussion

This study, an ecological investigation in nature, comprehensively analysed the population-level relationship between influenza incidence and key meteorological factors in Putian, a subtropical coastal city in southeastern China, using time-series influenza surveillance and meteorological data from 2018 to 2023. The primary aim was to develop a predictive framework for public health surveillance rather than to infer individual-level causality. By employing epidemiological and data mining methods, including descriptive statistics, DLNM models, and LSTM neural networks, we have developed a novel approach for assessing and predicting meteorological effects on influenza incidence. This work not only expands the traditional perspective of environmental epidemiology but also offers actionable insights and tools for effective influenza control in subtropical regions. Our findings reveal complex, virus-specific dynamics that may inform future climate-based surveillance strategies for influenza viruses.

Unlike temperate regions, subtropical areas exhibit year-round influenza incidence, characterized by considerable epidemiological heterogeneity. For instance, a Hong Kong-based study indicates that the peak incidence of local influenza A outbreaks typically occurs between January and April, whereas influenza B peaks from May to July (Kim, Park, and Lee 2020). Similarly, a study from Okinawa, Japan, has identified analogous trends, with influenza A peaking from December to March and influenza B from April to June (Iha et al. 2016). Our six-year influenza surveillance data show that both viruses circulate year-round in Putian, with notable peaks observed during winter-spring and summer. We observed that influenza A predominantly peaked from December to April, while influenza B outbreaks occurred before and after this peak, suggesting that meteorological factors such as extreme cold or notable temperature fluctuations may trigger influenza epidemics. This variability in influenza incidence dynamics underscores the influence of meteorological factors on influenza transmission patterns (Dave and Lee 2019). Our analysis of age and gender distribution shows a higher incidence among individuals under 18 years (73.12%), with male predominance in childhood and female predominance in adulthood. This observation aligns with literature suggesting increased infection risk among school-aged children and adolescents due to more frequent close contact in group settings (Jackson et al. 2013). Notably, the significant decrease in influenza cases observed in 2020 likely reflects the impact of strict isolation measures and mask-wearing implemented during the COVID-19 pandemic, which effectively reduced direct human contact. Our findings highlight the heterogeneity of influenza incidence across subtropical regions.

The association between meteorological factors and influenza incidence is a critical topic in environmental epidemiology. Previous studies indicate that variations in average temperature, relative humidity, rainfall, diurnal temperature range, and wind speed significantly influence influenza transmission through mechanisms affecting virus survival and transmissibility, host susceptibility and immunity, and population contact behaviors (Polozov et al. 2008; Sooryanarain and Elankumaran 2015). Our analysis of Pearson correlation coefficients, as shown in Supplementary Figure S2, revealed weak linear correlations (all below 0.3) between influenza morbidity and meteorological factors, highlighting that linear models inadequately capture the complexities of this relationship. Utilizing the DLNM’s cross-basis functions, we identified significant non-linear dose-response relationships and lag effects between influenza incidence risk and meteorological variables including average temperature, relative humidity, rainfall, and diurnal temperature range. These findings, representing an investigation of climate-influenza dynamics from a novel computational perspective, address a critical knowledge gap in the field, notwithstanding their concordance with previous studies.
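The point that a weak Pearson coefficient does not rule out a strong non-linear dependence can be shown with a small synthetic example. In the sketch below, a hypothetical U-shaped risk curve centred on 20°C is fully determined by temperature, yet its Pearson correlation with temperature is essentially zero. The temperature range and curve are invented for illustration and do not represent the fitted DLNM exposure-response curves.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Synthetic temperatures, symmetric about 20 degrees C (illustration only).
temps = list(range(10, 31))

# Hypothetical U-shaped relative risk: elevated at both cold and hot extremes.
u_shaped_risk = [1.0 + 0.02 * (t - 20) ** 2 for t in temps]

# Hypothetical strictly linear relationship, for contrast.
linear_risk = [2.0 * t + 1.0 for t in temps]

r_u = pearson(temps, u_shaped_risk)
r_linear = pearson(temps, linear_risk)
print(r_u, r_linear)
```

The U-shaped series yields a coefficient near zero while the linear series yields a coefficient near one, which is why a flexible exposure-response basis is needed before concluding that a factor has no effect.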

We first observed significant differences between virus types in the non-linear associations between meteorological factors and cumulative influenza incidence risk. For instance, temperatures between 26°C and 31.5°C favored influenza A, while the risk for influenza B incidence increased at temperatures below 20°C, contrasting with findings from other subtropical cities, such as Shanghai, where the highest cumulative risk for influenza A occurred at 1.4°C and 25.8°C, and for influenza B at 1.4°C (Zhang et al. 2020). Proposed hypotheses include enhanced virus stability at lower temperatures (Lowen and Steel 2014), impaired respiratory defences due to cold air inhalation (Eccles 2002), and increased indoor confinement during colder weather (Liao, Chang, and Liang 2005), all contributing to higher transmission rates. Conversely, elevated temperatures also correlate with increased influenza incidence due to reduced outdoor activity and reliance on air conditioning (Lofgren et al. 2007). The divergent responses of influenza A and B to meteorological factors may reflect virological and immunological differences. Influenza A’s stability in high temperatures and humidity could stem from heat-resistant glycoproteins, while influenza B may favor cooler, wetter conditions due to less tolerant structures (Marr et al. 2019). Immunologically, high temperatures impair adaptive immunity, potentially benefiting influenza A (Moriyama and Ichinohe 2019), whereas cooler settings preserve innate responses, facilitating control of influenza B (Lofgren et al. 2007). However, this complex interaction may also suppress certain immune components like immunoglobulin M, which could inadvertently favor influenza B’s transmission (Xu, Hu, and Tian 2017).

High relative humidity enhances the cumulative risk of influenza A by promoting virus transmission, corroborated by a study in eight cities around Chengdu (Zhou et al. 2022). Cellular models indicate that humidity influences respiratory droplet evaporation, altering solute concentration and influenza A virus viability (Yang, Elankumaran, and Marr 2012). Research in guinea pigs shows that lower humidity (20% and 35%) facilitates greater influenza A transmission compared to moderate (50%) or humid (80%) conditions (Lowen and Steel 2014). Conversely, low relative humidity increases the infection risk of influenza B (Wu et al. 2021), a phenomenon not observed in our study, possibly due to geographical variability in transmission dynamics (Bloom-Feshbach et al. 2013; Tamerius et al. 2013). Influenza incidence in Putian typically peaked during the rainy season, with increased rainfall correlating positively with cumulative influenza B risk, aligning with findings from Arizona, where influenza risk increased during the rainy season (Soebiyanto, Adimi, and Kiang 2010). Our findings indicate that cumulative influenza A risk decreased with rainfall exceeding 30 mm, suggesting moderate rainfall may elevate incidence while heavy rainfall could inhibit transmission, potentially due to altered social behaviors, such as increased indoor gatherings.

Additionally, peaks in influenza A cases corresponded with declines in solar radiation and UV index values, suggesting that insufficient sunlight exposure may promote influenza A virus transmission, a pattern not observed with influenza B. Literature suggests that adequate solar radiation improves vitamin D absorption, modulates immune responses, and reduces airborne influenza virus viability (Cannell et al. 2006; Sagripanti and Lytle 2007). In our DLNM analysis, the cumulative influenza A risk gradually increased with the UV index and solar radiation, although the increase was not statistically significant, while the cumulative influenza B risk decreased. The contradictory effects of UV and solar radiation on influenza A and B, together with findings from a Japanese study focusing on vitamin D supplementation in children (Urashima et al. 2010), necessitate further investigation into the interplay between UV exposure, solar radiation, and influenza, as does the impact of diurnal temperature range and wind speed on influenza incidence. A study reported that the diurnal temperature range may serve as a potential risk factor for influenza, with significant fluctuations between 11°C and 14°C greatly increasing infection risk due to alterations in respiratory epithelial barrier function, inflammatory response modulation, and mucus production (Park et al. 2020). Our analysis corroborated that broader diurnal temperature ranges heightened the cumulative risks for both influenza A and B, potentially due to increased susceptibility under such conditions (Cheng et al. 2014). Wind speed did not significantly impact influenza, aligning with existing literature that deems it a non-critical meteorological variable in cumulative risk assessments (Ianevski et al. 2019). Future studies incorporating viral genomic sequencing would be valuable to determine whether these differential responses reflect evolutionary adaptations.

The lag effects of meteorological factors on influenza incidence emerged as another key finding. DLNM analysis revealed varying lag effects within 0-15 days post-extreme meteorological conditions. For example, low average temperatures of 14°C and high averages of 30.5°C showed lag effects on influenza incidence risk lasting 6-15 days. Additionally, low temperatures recorded 8 days prior to patient visits significantly increased influenza infection risk (Tsuchihashi et al. 2011). Conversely, an elevated average temperature of 28°C exhibited a 7.3-day lag effect on influenza B (Wang et al. 2023). Extreme rainfall amounts of 5 mm and 25 mm also produced lag effects on both influenza A and B, consistent with a meta-analysis indicating that extreme rainfall in various subtropical regions results in lag effects on infectious disease incidences, including influenza (Aune, Davis, and Smith 2021). Our data further reveal that a diurnal temperature range of 13°C resulted in a 5 to 9-day lag risk for influenza B, while a wind speed of 26 km/h correlated with lag risk for influenza A, contradicting findings by Qi et al., who found no lag effect of wind speed on influenza (Qi et al. 2021). Overall, our results suggest that the 1-2 weeks following extreme weather events may be critical windows for influenza prevention and control, necessitating heightened vigilance for potential outbreaks. However, specific meteorological warning thresholds and time windows require optimization based on local conditions. This finding underscores the need for enhanced meteorological criteria in current influenza surveillance and early warning systems, emphasizing that monitoring influenza incidence alone is insufficient without considering local meteorological anomalies as early warning indicators. Notably, the 15-day maximum lag, chosen on the basis of previous studies, may not be optimal for every meteorological factor, indicating that future research should systematically vary lag periods to refine exposure-response relationships.
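The 0-15 day lag window discussed above corresponds to a lagged exposure matrix, in which each row holds the exposure on the outcome day and on each of the preceding 15 days. The sketch below builds only this raw matrix; the DLNM cross-basis additionally applies basis functions along both the exposure and lag dimensions, which is not reproduced here. The function name `lag_matrix` and the `daily_temp` series are hypothetical.

```python
def lag_matrix(series, max_lag):
    """Return one row per day t >= max_lag.

    Column `lag` of a row holds the exposure observed `lag` days
    before day t (column 0 is the same-day exposure).
    Days without a full lag history are dropped.
    """
    rows = []
    for t in range(max_lag, len(series)):
        rows.append([series[t - lag] for lag in range(max_lag + 1)])
    return rows

# Hypothetical daily mean temperatures for 20 consecutive days.
daily_temp = [float(20 + day) for day in range(20)]

MAX_LAG = 15  # maximum lag in days, as used in the analysis described above
lagged = lag_matrix(daily_temp, MAX_LAG)

print(len(lagged), len(lagged[0]))   # rows with full history, lags 0..15
print(lagged[0][0], lagged[0][15])   # same-day value and value 15 days earlier
```

Varying `MAX_LAG` in such a construction is one simple way to test the sensitivity of exposure-response estimates to the chosen lag window.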

The DLNM effectively isolates the individual effects of meteorological factors on influenza incidence, but does not consider potential interaction effects, such as the combined influence of high temperature and humidity on transmission risk. Although this focus on independent effects enhances model interpretability and mitigates multicollinearity risks, it limits our ability to capture synergistic effects. To address this limitation, we developed an LSTM prediction algorithm that captures non-linear relationships and interactions through its recurrent architecture, specifically for assessing and predicting meteorological effects on influenza in subtropical regions. The LSTM was selected for its capacity to automatically learn long-term dependencies in time series data through its gating mechanism, thus integrating historical influenza incidence with future predictions and overcoming limitations of traditional autoregressive models. The LSTM network incorporated significant meteorological factors identified in the DLNM analysis and additionally included variables reflecting the influence of the COVID-19 pandemic on influenza incidence, such as daily new COVID-19 cases and a mask-wearing strictness index, together with DOW. To the best of our knowledge, this is the first instance of simultaneously integrating these factors into an influenza LSTM prediction network. Compared to a similar previous study (Zhu et al. 2022), our approach expands the LSTM input dimension and enhances its applicability and authenticity in the complex real-world environment of concurrent respiratory pathogen epidemics. While the previously reported LSTM algorithm demonstrated feasibility in time-series prediction of influenza incidence, it primarily focused on a limited set of meteorological variables, neglecting other potentially significant factors such as the UV index and wind variations, which we incorporated into our LSTM networks for improved prediction accuracy.

Compared to the previous study, which implicitly relied on standardized preset parameters of a generic LSTM algorithm, we conducted more refined adjustments and optimizations. Hyperparameters, crucial for LSTM training and prediction precision (Li, Zhang, and Ma 2023), were optimized using the Hyperopt framework, based on the Tree-structured Parzen Estimator (TPE), a Bayesian optimization strategy that significantly outperforms traditional grid search methods (Hanifi et al. 2022). Previous comparisons of Scikit-opt, Hyperopt, and Optuna indicate Hyperopt’s superior performance in LSTM applications (Hanifi, Cammarono, and Zare-Behtash 2024). This study is the first to successfully apply Hyperopt for hyperparameter tuning in the LSTM influenza prediction network, with key hyperparameters, including learning rate, number of hidden layers, and number of neurons, meticulously tuned within established ranges. In 2012, Bengio identified the learning rate as a critical hyperparameter, recommending a range of 10^-6 to 1 (Bengio 2012). Subsequent research confirmed its significance, demonstrating that learning rates of 0.0001, 0.001, or 0.01 resulted in superior convergence and performance of LSTM networks (Du et al. 2023). Consequently, we established a learning rate set of {0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1} to optimize hyperparameter configurations, acknowledging the potential increase in computational cost and time consumption. Theoretically, increasing the number of hidden layers and neurons enhances the algorithm’s fitting ability, yet excessive complexity risks overfitting and impedes convergence. Panchal’s findings suggest that 1 to 2 hidden layers suffice for most applications, with 3 hidden layers considered optimal for accuracy (Gaurang Panchal 2011). Thus, we limited our exploration to LSTM configurations with 1-3 hidden layers, ultimately employing 2 or 3 hidden layers with neuron counts that are powers of 2, achieving commendable fitting results. We evaluated various activation functions, including newer options such as softplus, ELU, and swish, alongside traditional functions such as sigmoid, ReLU, and tanh, recognizing that no single activation function universally optimizes network performance (Jagtap and Karniadakis 2022). All six activation functions were ultimately included as candidates for screening during optimization.
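The discrete search space described above can be written down explicitly. The sketch below encodes the learning-rate set, the 1-3 hidden layers, and the six candidate activation functions, then explores the space with plain random search. Random search is used here only as a dependency-free stand-in and is not the TPE strategy applied in the study. The specific neuron counts and the `placeholder_objective` are hypothetical; in practice the objective would train an LSTM and return its validation loss.

```python
import math
import random

SEARCH_SPACE = {
    # Learning-rate set stated in the text.
    "learning_rate": [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
    # One to three hidden layers, as stated in the text.
    "hidden_layers": [1, 2, 3],
    # Powers of 2; these particular values are hypothetical.
    "neurons": [16, 32, 64, 128],
    # The six candidate activation functions named in the text.
    "activation": ["softplus", "elu", "swish", "sigmoid", "relu", "tanh"],
}

def sample_config(rng):
    """Draw one configuration uniformly from the discrete space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}

def random_search(objective, n_trials, seed=0):
    """Keep the configuration with the lowest objective value."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        loss = objective(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss

def placeholder_objective(config):
    """Hypothetical stand-in for a validation loss (no model is trained)."""
    return (abs(math.log10(config["learning_rate"]) + 3)
            + abs(config["hidden_layers"] - 2))

best_config, best_loss = random_search(placeholder_objective, n_trials=200)
print(best_config, best_loss)
```

A model-based optimizer such as TPE would replace the uniform draw in `sample_config` with proposals informed by earlier trials, which is what makes it more sample-efficient than grid or random search on expensive objectives.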

The optimized LSTM algorithm demonstrated strong predictive performance, with loss curves for both influenza prediction networks exhibiting favourable convergence trends, consistent with previous research (Du et al. 2023). The internal evaluation metrics yielded an MAE of 1.71 and RMSE of 3.69 for influenza A, while the influenza B prediction network achieved an MAE of 0.38 and RMSE of 0.84. External validation with Sanming CDC data confirmed the generalizability of findings, yielding MAE values of 1.29 for influenza A and 0.99 for influenza B, indicating predictive capability across different geographic contexts within the subtropical climate zone. This suggests that the meteorological associations and prediction framework developed in this study may be applicable to other subtropical regions with similar climatic patterns. However, generalizability to subtropical regions in other continents (e.g., Southeast Asia, East Australia) necessitates further validation due to varying socioeconomic factors and healthcare infrastructure. Comparative analysis demonstrated that LSTM consistently outperformed ARIMA in predicting influenza A, while for influenza B, LSTM excelled in MAE (0.38 vs. 1.02) and RMSE (0.84 vs. 1.37) but slightly underperformed in MAPE (0.53 vs. 0.48) and SMAPE (1.28 vs. 0.75) due to the predominance of zero cases throughout much of 2023, a scenario where ARIMA tends to predict consistently low values. The superior MAE and RMSE values underscore LSTM’s efficacy in predicting epidemic peaks, essential for timely public health interventions. In addition, the comparative edge over previously reported ARIMA models (Li et al. 2024) and similar LSTM influenza prediction models (Zhu et al. 2022) positions our approach as a significant advancement in predictive modeling for climate-sensitive diseases. Although minor discrepancies were noted in certain instances (e.g., cases where the actual number of cases was 0 while the predicted number was 1), the overall predictive performance remained largely intact.
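The divergence between absolute and relative metrics on zero-dominated series can be reproduced with a small synthetic example. In the sketch below, a flat forecast that always predicts zero is compared with a forecast that captures the peak but is off by one case on a few zero-count days. All series are hypothetical, and SMAPE is computed on a fractional scale with days where both values are zero scored as zero error, which is an assumption of this sketch.

```python
def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def symmetric_mape(actual, predicted):
    """SMAPE as a fraction in [0, 2]; a 0-vs-0 day counts as zero error."""
    total = 0.0
    for a, p in zip(actual, predicted):
        denom = (abs(a) + abs(p)) / 2
        total += abs(a - p) / denom if denom else 0.0
    return total / len(actual)

# Hypothetical week-long series dominated by zero counts with one peak.
observed = [0, 0, 0, 0, 0, 0, 0, 5]

# Forecast that always predicts zero and therefore misses the peak.
flat_forecast = [0, 0, 0, 0, 0, 0, 0, 0]

# Forecast that captures the peak but predicts 1 on three zero-count days.
peak_forecast = [1, 0, 1, 0, 0, 1, 0, 5]

flat_mae = mean_absolute_error(observed, flat_forecast)
peak_mae = mean_absolute_error(observed, peak_forecast)
flat_smape = symmetric_mape(observed, flat_forecast)
peak_smape = symmetric_mape(observed, peak_forecast)

print(flat_mae, peak_mae)      # the peak-capturing forecast has the lower MAE
print(flat_smape, peak_smape)  # but the higher SMAPE
```

Each day with zero observed cases and one predicted case contributes the maximum SMAPE term of 2, so a forecast can rank better on MAE and RMSE while ranking worse on SMAPE when most observed counts are zero.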

SHAP analysis revealed that the historical incidence of influenza cases was the primary factor influencing LSTM prediction outcomes. Furthermore, increased DOW activity near weekends and reduced solar radiation or UV index significantly affected the LSTM prediction results, likely due to heightened respiratory transmission risk during weekends and the inactivation of the influenza virus by solar radiation and UV exposure. While SHAP analysis enhances LSTM interpretability, supporting its practical application and dissemination, several methodological considerations require discussion. Although we treated each meteorological variable’s lagged sequence as a unified feature to preserve temporal dependencies, SHAP’s underlying assumption of feature independence may still not fully capture the complex temporal interactions within our LSTM network. This limitation suggests that our findings primarily reflect the aggregate importance of meteorological variables rather than their time-specific effects. Future work could explore temporal SHAP extensions or alternative interpretability methods designed for sequential data to better capture these temporal dynamics. Despite computational constraints limiting our exploration of additional hyperparameters, such as decay rate and weight initialization, our LSTM network demonstrated robust predictive performance for influenza incidence using meteorological and historical case data in subtropical coastal regions, with external validation confirming its generalizability to inland subtropical areas.

This study innovatively integrates DLNM with LSTM to explore the relationship between influenza and meteorological factors, establishing a framework that balances interpretability and predictive accuracy. By leveraging DLNM to identify critical meteorological influences and LSTM to predict influenza trends, including the effects of COVID-19 interventions, our approach distinguishes itself from prior work (Zhu et al. 2022), highlighting its novelty in climate-sensitive disease modeling. While meteorological factors remain the primary drivers of influenza transmission, the inclusion of COVID-19-related variables, such as mask-wearing stringency and daily case numbers, provides valuable insights into the LSTM network’s adaptability to public health interventions. Sensitivity analyses reveal that excluding these variables reduces predictive accuracy, underscoring their significance in capturing the dynamics of influenza during the pandemic. This study thus represents a novel exploration of how COVID-19-related factors can be integrated into forecasting models to improve their practical utility. Notably, the lack of reliable data from late 2019, when the pandemic began, may restrict our capacity to accurately assess its initial impact. Nevertheless, the findings affirm deep learning’s potential in surveilling climate-driven infectious diseases, extensible to conditions like dengue or hand-foot-and-mouth disease.

Integrating demographic and socio-economic data could further evolve this framework into a comprehensive predictive tool, enhancing precision in public health responses. Furthermore, our LSTM network offers actionable insights for public health in subtropical regions. By pinpointing meteorological thresholds, such as high humidity and low temperatures, and their lag effects on influenza peaks, it enables early warning systems. This predictive capacity informs targeted strategies. For instance, forecasting high-risk periods allows pre-emptive vaccination campaigns to protect vulnerable populations before outbreaks escalate. Similarly, anticipated case surges guide efficient stockpiling of antivirals and allocation of healthcare resources, optimizing preparedness. Enhanced surveillance during predicted risk windows can also accelerate case detection, facilitating swift containment. In a warming climate, shifting weather patterns may disrupt influenza seasonality. The model’s adaptability permits recalibration of these strategies to maintain efficacy.

This study unavoidably presents certain limitations. Firstly, our primary analysis relies on influenza surveillance data collected from 7 sentinel hospitals in Putian. While these hospitals were designated by the local CDC, cover main urban and rural areas, and possess long-term surveillance experience ensuring data quality, this approach inherently captures only a fraction of all influenza cases occurring in the community. Consequently, our incidence rates likely underestimate the true burden of influenza activity. Additionally, while retrospective data met national standards, the absence of pre-2020 ethics oversight might suggest unrecorded deviations. Nonetheless, consistent protocol application and six-year diagnostic uniformity mitigate measurement bias and temporal confounding risks. Although our study focused primarily on a single city, we have demonstrated the generalizability of our LSTM neural network through both internal validation using Putian’s 2023 surveillance data and external validation using concurrent data from Sanming, another subtropical city. Nevertheless, the transferability of our findings to subtropical regions with different socioeconomic structures, healthcare systems, or population behaviors remains uncertain. The meteorological thresholds and lag patterns identified may vary across different subtropical regions globally, necessitating region-specific calibration. Future research should aim to integrate broader data sources, such as community-level surveillance or syndromic data streams, if available. Additionally, further investigation should prioritize multi-centre studies across diverse subtropical regions in other parts of the world to distinguish the core meteorological determinants that transcend geographical boundaries from those that require local adaptation.

Secondly, our DLNM analysis concentrated on conventional meteorological parameters, primarily due to their established influence on influenza transmission and the availability of high-quality data. This focus allowed for a robust examination of individual weather effects but excluded potential confounders such as air pollutants (particularly particulate matter 2.5), vaccination coverage and social behavioral factors. To address autocorrelation, we incorporated time trend variables and optimized df using the QAIC criterion. Given computational constraints and the inherent complexity of distributed lag non-linear modeling, our DLNM framework primarily analyzed the independent effects of individual meteorological variables without modeling their interactions. Consequently, our sensitivity analyses centered on QAIC-driven spline adjustments, while the lack of multidimensional sensitivity analyses, such as expanded testing of lags and reference points, remains a crucial validation avenue. While these represent methodological limitations, our subsequent use of LSTM networks partially addressed them by capturing complex, non-linear relationships and interactions in temporal data. This dual-modeling approach reduces the limitations of manual parameter selection while potentially revealing more intricate relationships than traditional statistical approaches. Future investigation should explicitly model interaction effects within advanced DLNM frameworks, optimize lag period selection, and incorporate autoregressive components such as outcome lags or their logarithmic transformations. Additionally, including air pollutants, vaccination rates and social behaviors would improve our understanding of influenza transmission dynamics in subtropical regions.

Thirdly, our study utilized aggregate-level surveillance data, limiting the incorporation of individual-level variables such as vaccination status, detailed comorbidities, and socioeconomic status, all known confounders in influenza susceptibility and transmission. The national surveillance protocol does not routinely capture these details for city-wide, continuous monitoring, thereby restricting our framework’s granularity and causal inference capabilities. Although our sentinel hospitals are mainly in urban areas with less socioeconomic heterogeneity, this does not replace the need for individual socioeconomic status data. Although our LSTM network learns long-term temporal patterns from historical data, which may implicitly capture some slow-varying unobserved influences, this is not a direct adjustment for specific individual confounders. Therefore, while our findings reflect population-level associations and predictive capacities based on environmental and broad intervention measures, they should be interpreted within the context of an ecological study. Future research integrating individual-level data through health registries or cohort studies would further elucidate the interplay of these individual factors with environmental drivers of influenza dynamics.

In conclusion, this study illuminates the distinct meteorological influences on influenza A and B transmission in subtropical regions and develops an advanced DLNM-LSTM framework that successfully captures these complex dynamics. Our predictive model demonstrates superior accuracy over traditional approaches and maintains robustness across different subtropical locations. These findings provide a foundation for climate-sensitive digital surveillance systems that can enhance targeted public health interventions and strengthen preparedness for climate-driven shifts in influenza dynamics in an era of environmental change and emerging infectious threats.

Data Availability

The influenza surveillance data and meteorological records utilized in this study, as well as the code written for constructing the DLNM models and LSTM algorithms, are available upon reasonable request to the corresponding authors X-W. J. and M-J. Z. All data are managed in accordance with the ethical standards and data management policies of the affiliated institutions. Requests for data access will be reviewed and fulfilled in compliance with these policies and regulations.

Acknowledgements

We acknowledge the invaluable contribution of the Putian CDC and its affiliated sentinel hospitals to this study. We express our deep respect and gratitude to the dedication of all participating healthcare workers, disease prevention personnel, clinical medical staff, and laboratory technicians involved in case screening, data recording, sample collection, and laboratory testing for their significant efforts and contributions to this study.

Additional information

Author Contributions

Conceptualisation: X-W. J., L-M. L., X. H., S. X., and L. X.; Data curation: M-J. Z., J-L. T., Y-X. L., J-L. C., J-J. H., J. Q., and H-Y. P.; Formal analysis: L. X., M-J. Z., J-L. T., Y-X. L., Z-Q. X., S. X., and X. H.; Funding acquisition: L-M. L. and X-W. J.; Investigation: M-J. Z., Y-X. L., J. Q., J-J. H., J-L. C., H-Y. P., and Z-F. R.; Methodology: J-L. T., L. X., X-W. J., X. H., S. X., and M-J. Z.; Software: J. Q., H-Y. P., J-L. T., and Z-F. R.; Project administration: X-W. J., L-M. L., and X. H.; Resources: X-W. J., L-M. L., S. X., and X. H.; Supervision: X-W. J., L-M. L., M-J. Z., and Y-X. L.; Validation: L. X., Y-X. L., J-L. T., Z-Q. X., X. H., and M-J. Z.; Visualisation: L. X., J-L. T., J-L. C., J. Q., J-J. H., and Z-F. R.; Writing-original draft: L. X. and J-L. T.; Writing-review & editing: L. X., X-W. J., L-M. L., S. X., and X. H. All authors have reviewed and approved the final version of the manuscript.

Code Availability Statement

The computational codes used in this study, including those for the distributed lag non-linear models (DLNM) and the Bayesian-optimized Long Short-Term Memory (LSTM) neural network implementations, were written in R and Python. All custom scripts used for data analysis, model development, and visualization are available upon reasonable request from the corresponding authors X-W. J. and M-J. Z. The codes include data preprocessing, model training, validation procedures, and the generation of prediction results. Researchers interested in reproducing or building upon our methodology may contact the corresponding authors with specific requirements regarding the code components needed for their research purposes.

Additional files

Supplemental Information