Modelling the climatic suitability of Chagas disease vectors on a global scale
Abstract
The Triatominae are vectors for Trypanosoma cruzi, the aetiological agent of the neglected tropical Chagas disease. Their distribution stretches across Latin America, with some species occurring outside of the Americas. In particular, the cosmopolitan vector, Triatoma rubrofasciata, has already been detected in many Asian and African countries. We applied an ensemble forecasting niche modelling approach to project the climatic suitability of 11 triatomine species under current climate conditions on a global scale. Our results revealed potential hotspots of triatomine species diversity in tropical and subtropical regions between 21°N and 24°S latitude. We also determined the climatic suitability of two temperate species (T. infestans, T. sordida) in Europe, western Australia and New Zealand. Triatoma rubrofasciata has been projected to find climatically suitable conditions in large parts of coastal areas throughout Latin America, Africa and Southeast Asia, emphasising the importance of an international vector surveillance program in these regions.
Introduction
The Triatominae are haematophagous insects of the order Hemiptera and comprise 152 species subdivided into 16 genera, including two fossils (Poinar, 2013; Mendonça et al., 2016; da Rosa et al., 2017; Monteiro et al., 2018). Triatomines are mainly distributed in Central and South America, inhabiting environments ranging from tropical to temperate regions with cold winters (de la Vega and Schilman, 2018). Eleven species have also been recorded in the southern United States (Curtis-Robles et al., 2018). In the Old World, species belonging to the genus Linshcosteus occur in India, whereas eight species of the genus Triatoma, especially the cosmopolitan T. rubrofasciata, occur in Africa, the Middle-East, Southeast Asia, and in the Western Pacific (Gorla et al., 1997; Monteiro et al., 2018).
Through their haematophagous lifestyle, triatomines function as vectors for pathogens such as Trypanosoma conorhini, T. rangeli and T. cruzi, the aetiological agent of Chagas disease (American trypanosomiasis) (Deane and Deane, 1961; Ferreira et al., 2015; Vieira et al., 2018). The flagellated protozoan parasite T. cruzi is transmitted by infectious faeces of the triatomines, which are rubbed into the bite wound. Further transmission routes include oral infection by the consumption of contaminated food or raw meat of infected mammalian hosts, congenital infection and transmission through blood transfusion or organ transplantation. Acute symptoms include fever, fatigue, headache, and myalgia, with long-term effects such as acute and chronic chagasic heart disease and gastrointestinal manifestations being more severe (Nóbrega et al., 2009; Coura and Viñas, 2010; Carlier et al., 2011; Rosas et al., 2012). Lee et al., 2013 calculated a global economic loss of $7.19 billion per year attributable to Chagas disease due to high health-care costs and lost productivity from early mortality. It is estimated that 6 to 7 million people worldwide are infected with Chagas disease, most of them living in Latin America. Owing to global immigration and travel, Chagas disease has been increasingly detected in the United States, Canada, European and some Western Pacific countries (WHO, 2019). However, due to the lack of vectors, there has been no vector-borne transmission outside of the Americas. This could change if the spread of the disease is followed by a propagation of the vectors. Although the mobility of triatomine bugs is generally limited, they can be passively transported by the shipment of infested goods and animals or luggage and along air transportation routes (Fleming-Moran, 1992; Coura and Viñas, 2010; Pinto Dias, 2013).
Among triatomine species, Triatoma rubrofasciata is most widely distributed and recorded from the United States of America, Central and South America, coastal regions of Africa and the Middle-East, the Atlantic Ocean (Azores) and port areas of the Indo-Pacific region (Dujardin et al., 2015a). It is the only member of the Triatoma genus occurring in the New and Old World, with frequent records in previously non-endemic areas (Liu et al., 2017; Huang et al., 2018). The evolutionary origin of T. rubrofasciata is unclear, although a New World origin seems more likely (Dujardin et al., 2015b). It is believed that through its close association with domestic rats (especially Rattus rattus), T. rubrofasciata was spread along the shipping routes of the 16th to 19th century (Schofield, 1988; Gorla et al., 1997; Patterson et al., 2001). T. rubrofasciata is able to transmit Trypanosoma cruzi in Latin America; however, there are no records of vector transmission in the Old World (Dujardin et al., 2015b).
In the past, Chagas disease vectors have frequently experienced range expansions (Pinto Dias, 2013). For instance, Rhodnius prolixus, one of the most important vectors of Chagas disease, was carried from Venezuela to Mexico and Central America by sea commerce and possibly by bird migration (Hashimoto and Schofield, 2012). In the light of ongoing globalization, changing climate and a concomitant shift of trade and travel routes, the probability of further migration of triatomine bugs and especially Triatoma rubrofasciata increases. This entails the risk of vector-associated transmission of Chagas disease (Schofield et al., 2009). Therefore, constant monitoring of the vectors, but also the determination of potentially suitable habitats for these vectors, appears to be of great importance. The potential spread of various triatomine species under current and future climatic conditions in South, Central and North America has been extensively studied, but there is a lack of knowledge about climatic suitability outside the Americas (Gurgel-Gonçalves et al., 2012; Garza et al., 2014; Parra-Henao et al., 2016). The aim of this study was to investigate the climatic suitability under current climatic conditions for eleven triatomine species on a global scale using ecological niche modelling (ENM). We concentrated on domestic and peri-domestic triatomine species representing different biogeographical regions and deemed to be the main vectors of Chagas disease. An ensemble forecasting approach was applied (Araújo and New, 2007), which is considered to yield robust estimations of habitat suitability. In this way, we are able to identify areas at risk and to pinpoint triatomine species that find suitable habitats outside their current range and therefore might possess a high potential for expansion.
Results
Potential distribution under current climate conditions
Global species distribution modelling revealed several regions with current suitable climatic conditions for the considered triatomine species. Comparing the modelled potential distribution of the species, differences in the preference of climatic conditions are evident. Rhodnius brethesi, R. ecuadoriensis and Triatoma maculata are limited to one or a few areas with mostly tropical climate. Triatoma brasiliensis, Panstrongylus geniculatus, P. megistus, R. prolixus, T. dimidiata and T. rubrofasciata find suitable climate conditions in a broad range of tropical and sub-tropical regions, while T. sordida and T. infestans possess a broad potential range in temperate regions (Figure 1).

Modelled current climatic suitability.
(A–J) Modelled climatic suitability (consensus map) of 10 triatomine species under current climate conditions. Hatched areas indicate regions where the projection is uncertain. Maps were built using WGS 84 as geographical system and ESRI ArcGIS (ESRI, 2018).
The projected range of R. brethesi and R. ecuadoriensis is limited to areas with equatorial, tropical wet climate. In the case of R. brethesi, these areas include, above all, the Amazon region in South America and Southeast Asia (Indonesia, Malaysia, New Guinea) (Figure 1A). R. ecuadoriensis shows only a small modelled potential distribution in western Ecuador, parts of Indonesia, Malaysia and Papua-New Guinea as well as the Congo Basin (Figure 1B). Beyond its observed distribution area in Venezuela, Guyana, Suriname and French Guiana, T. maculata possesses distribution potential under current climate conditions in the North of Brazil, Peru, Central Africa and parts of Southeast Asia (Malaysia, Indonesia, Philippines, New Guinea) (Figure 1C). T. brasiliensis prefers dry and wet savannah climate as found in eastern Brazil. Furthermore, this species has a modelled climatic distribution potential in southern West Africa, northern and southern Central Africa and East Africa (Figure 1D). Models project large climatically suitable areas for P. geniculatus, P. megistus, R. prolixus and T. dimidiata in tropical regions in South America, Central America, the Caribbean, Central and East Africa, eastern Madagascar, the South of India and Sri Lanka and throughout Southeast Asia (Figure 1E–H). In addition to species preferring tropical, wet climate, species that occur in temperate, semi-arid regions have also been modelled. These include T. sordida and especially T. infestans. The climatic suitability models of the latter show extensive potential spread in both semi-arid to humid and temperate to cold regions comprising the Southern Cone of South America, parts of Brazil, Bolivia and Peru, Mexico, the Caribbean and Florida (USA), the South of Africa, parts of the Arabian Peninsula, the West and South of Australia, New Zealand and, in Europe, Portugal, Spain, France, Italy, Greece, Ireland and Great Britain (Figure 1J). For T. rubrofasciata, primarily coastal regions in tropical savannah and monsoon areas were predicted as climatically suitable, including the east coast of South America and Central America, large parts of Brazil and Venezuela, the Caribbean, Florida (USA), the coasts of Central and Eastern Africa, the eastern coast of Madagascar, southern India and Sri Lanka, Thailand, Malaysia and Indonesia, Vietnam, southern China and Japan, the Philippines and the eastern coast of Australia (Figure 2).

Modelled current climatic suitability of T. rubrofasciata (consensus map) and observed occurrence records outside the Americas.
Hatched areas indicate regions where the projection is uncertain. Maps were built using WGS 84 as geographical system and ESRI ArcGIS (ESRI, 2018).
The observed occurrence of the considered species is mainly consistent with the projected climatic suitability in Latin America. Nevertheless, it is noteworthy that for some species the modelled climate suitability in Central and South America exceeds the area of current occurrence. For example, T. dimidiata is not observed in large parts of Brazil, Peru and Bolivia, although a good climatic suitability has been projected (Figure 1H).
Potential hotspots of triatomine diversity are revealed by the species diversity map displaying the number of the modelled triatomine species that find suitable climatic conditions in the respective regions. The quantity of species varies between zero and seven. Areas of great triatomine diversity have foremost tropical forest and savannah-like climate and include predominantly regions between 21°N and 24°S latitude (Figure 3). However, even in temperate regions, some species could find suitable climatic conditions, for example in Portugal, Spain and eastern Australia.

Species diversity.
The map is based on the combined binary modelling results highlighting potential hotspots of triatomine species diversity. Hatched areas indicate regions where the projection is uncertain. Maps were built using WGS 84 as geographical system and ESRI ArcGIS (ESRI, 2018).
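As a rough illustration of how such a diversity map can be derived from binary results, the sketch below sums per-species presence/absence grids cell by cell, so that each cell holds the number of species for which it is classified as suitable. The function name `richness_map` and the tiny example grids are hypothetical and do not reproduce the study's rasters; the threshold used to binarise the suitability maps is not given in this section and is not modelled here.

```python
def richness_map(binary_maps):
    """Sum per-species presence/absence grids (lists of rows) cell by cell."""
    rows = len(binary_maps[0])
    cols = len(binary_maps[0][0])
    return [
        [sum(grid[r][c] for grid in binary_maps) for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical 2 x 3 binary suitability grids for three species
# (1 = climatically suitable, 0 = unsuitable); not the study's data.
species_a = [[1, 0, 0],
             [1, 1, 0]]
species_b = [[1, 1, 0],
             [0, 1, 0]]
species_c = [[1, 0, 0],
             [0, 1, 1]]

richness = richness_map([species_a, species_b, species_c])
for row in richness:
    print(row)
```

Cells with the highest counts would correspond to the potential diversity hotspots described in the text.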
The most important bioclimatic variable is the temperature seasonality (BIO4) for all considered species, closely followed by the minimum temperature of the coldest month (BIO6) and maximum temperature of the warmest month (BIO5). The three precipitation variables (BIO13, BIO14, and BIO15) seem to shape the species distribution in a subordinate way.
Model evaluation
The evaluation of the global projection of the climatic suitability shows that almost all actual occurrence points of T. rubrofasciata (with coordinates provided) are within an area classified as climatically suitable by the models (Supplementary file 1). This is particularly evident in South India, Vietnam, South China, Taiwan or the Philippines. A few occurrence points are located in areas projected as less suitable including seaports such as Singapore and the Okinawa islands in Japan. According to the models, every country in which T. rubrofasciata has been found (but without specific coordinates of occurrence records given) provides at least one area offering suitable climatic conditions, for example Indonesia, Madagascar and several African countries.
All models show good discriminatory capacity, which is reflected in AUC values of over 0.9. However, differences in the performance of the models become apparent. GBM (generalised boosted models) and GAM (generalised additive models) perform particularly well for all species and achieve the highest AUC values. Slightly lower AUC values are generated by ANN (artificial neural networks) and MAXENT (maximum entropy) (Supplementary file 2, Supplementary file 3).
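For readers unfamiliar with the metric, the AUC (area under the receiver operating characteristic curve) can be read as the probability that a randomly chosen presence point receives a higher suitability score than a randomly chosen absence or pseudo-absence point, with ties counted as one half. The following minimal sketch computes it directly from that definition. It is an illustration only, not the evaluation routine used in the study, and the scores are invented.

```python
def auc(presence_scores, absence_scores):
    """AUC as the share of presence/absence pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical suitability scores, not taken from the study.
presence = [0.91, 0.85, 0.78, 0.60]
absence = [0.10, 0.35, 0.62, 0.20]

print(auc(presence, absence))
```

A value of 0.5 corresponds to a model that ranks presences and absences no better than chance, and 1.0 to perfect separation.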
Discussion
With a few exceptions, Triatominae are currently widespread in Central and South America where they transmit the causative agent of Chagas disease, Trypanosoma cruzi. However, it is not yet known whether areas outside the Americas provide suitable habitats for triatomine species. This study analyses the potential geographical distribution of 11 triatomine species under current climatic conditions on a global scale. For this purpose, we used ENM to identify regions that offer climatically suitable conditions for the examined species.
Apart from their climatic requirements, the triatomine species have different microhabitat preferences and host spectra and therefore possess dissimilar vector potential. For instance, R. brethesi is closely associated with palm trees and features prominently in the sylvatic transmission of Trypanosoma cruzi, feeding particularly on opossums (Didelphis spp.). Domestic or peri-domestic behaviour has not been observed; thus, the domestic vector potential of R. brethesi is most likely low (Coura et al., 2002; Rocha et al., 2004). Rhodnius ecuadoriensis is distributed in southern Ecuador and northern Peru, where it exhibits domestic and peri-domestic behaviour, invading chicken coops and human dwellings. It establishes dense populations and is commonly infected with T. cruzi, indicating a high vector potential. In the sylvatic environment, R. ecuadoriensis is mainly found in palm trees (Phytelephas aequatorialis) (Abad-Franch et al., 2005). The T. cruzi infection rate of Triatoma maculata depends on the geographical area. In most regions T. maculata has ornithophilic feeding preferences, whereas studies in the Colombian Caribbean region reported active transmission in peri-domestic human dwellings involving dogs as reservoir hosts (Cantillo-Barraza et al., 2014; Cantillo-Barraza et al., 2015). For these three species, only a few regions outside the Americas with distinct tropical climate have been projected as climatically suitable, including the Congo Basin in Central Africa and a few parts of Southeast Asia. Of the three, T. maculata shows the largest projected distribution, comprising potentially suitable areas within tropical rainforest and savannah climates (Figure 1C).
Triatoma brasiliensis occurs in tropical savannah regions with a hot and dry climate alternating with intensive rain. There, it is found under rock piles, feeding on a broad range of reptile and mammalian hosts, including humans. Due to its high Trypanosoma cruzi infection and intradomiciliary infestation rates, Triatoma brasiliensis is considered an important vector for Chagas disease (Carcavallo et al., 1997). The climatic niche of Panstrongylus geniculatus was projected to be very broad (Figure 1E), a result that is reinforced by the literature, in which the species is described as eurythermic and adapted to several dry as well as humid ecotopes (Patterson et al., 2009). It feeds on the blood of various hosts, such as marsupials, opossums, anteaters, armadillos, bats, cats, birds and chicken, in whose nests and coops it can be found. Additionally, it invades human houses and possesses a high susceptibility to Trypanosoma cruzi (Feliciangeli et al., 2004). Similar climatic distribution patterns are shown by the anthropophilic species R. prolixus, which permanently colonises human dwellings (Rabinovich et al., 2011). P. megistus occurs primarily in the Atlantic rain forests of South America and requires high relative humidity for breeding. With the exception of the triatomine species occurring in temperate climate regions, only the modelled distribution of P. megistus is highly influenced by the bioclimatic variable describing the maximum temperature of the warmest month (BIO5). The reason could be that the species is strongly limited by arid climates (Forattini, 1980). Outside of South and Central America, the models of these six species show widespread potential distribution areas with suitable climate conditions in the Caribbean, Central and East Africa, Madagascar, South India and throughout Southeast Asia.
Triatoma infestans and T. sordida find climatically suitable habitats in subtropical, but also temperate regions as they exist in the southern cone of South America and northern Central America, respectively. They are also the only triatomine species for which habitats in southern Europe, eastern Australia and large parts of South Africa have been projected to be climatically suitable (Figure 1I–J). T. infestans is adapted to cold temperatures and therefore has a greater diversity of wild habitats, which is probably related to the behavioural plasticity of the species. It shows exceptional domestic behaviour and is classified as one of the most important Chagas disease vectors in South America (Brenière et al., 2017; Belliard et al., 2019). T. sordida also has a propensity for peri-domestic behaviour in the absence of primary domestic vector species; thus, it often occurs when T. infestans is eradicated from human dwellings (Diotaiuti et al., 1993; Galvão and Justi, 2015).
Previous niche modelling approaches focused mainly on the potential distribution in North, Central and South America neglecting the actual global distribution achieved by T. rubrofasciata and the risks emerging from other Triatominae. For example, Garza et al., 2014 predicted a potential northern shift in the distribution of T. gerstaeckeri and a northern and southern distributional shift of T. sanguisuga, both important vectors of T. cruzi in the United States, under future climate conditions. Similar studies have been conducted for regions in Brazil, Mexico, Colombia, Chile and Venezuela in Central and South America (Sandoval-Ruiz et al., 2008; Gurgel-Gonçalves et al., 2012; Hernández et al., 2013; Ceccarelli and Rabinovich, 2015; Parra-Henao et al., 2016).
In general, the projected climatic habitat suitability reflects the realised species distribution in South America exceptionally well (Supplementary file 4). In some areas, however, the models appear to slightly overestimate the potential distribution, as can be seen in the modelling of T. dimidiata. Although eastern and western South America were projected as climatically suitable, the observed distribution of T. dimidiata covers exclusively Central America and northern South America. Such discrepancies could be ascribed to factors not related to climate conditions, such as dispersal limitations and interspecific competition. Furthermore, highly effective vector control measures such as indoor residual spraying and housing improvements significantly reduce the distribution of triatomine insects in areas that would otherwise be considered a suitable environment (Cucunubá et al., 2018). Sampling biases, such as insufficient or inhomogeneous sampling, can result in species distribution models reflecting sampling effort rather than actual distribution. Nevertheless, the occurrence data used were taken from a published atlas of the Chagas disease vectors and thus represent a comprehensive and reliable source (Carcavallo et al., 1998; Fergnani et al., 2013). The applied ensemble forecast approach is considered to return more robust estimations than single algorithms and is therefore a suitable tool for ENM. Nevertheless, like all modelling approaches, ensemble forecasting is also subject to certain restrictions. This includes in particular the generation of pseudo-absences, since reliable absence data were not available. We have sought to minimize this problem by using a geographically filtered absence selection, where pseudo-absences are sampled at a defined spatial distance from occurrence points.
The fact that explanatory variables are only available up to the year 2000 entails that the modelled distribution could shift slightly under present conditions. Thus, the distribution of thermophilic species might be underestimated due to a changing climate. Areas in which the projection is uncertain are additionally displayed. It is striking that this affects particularly dry and cold regions, such as the Atacama Desert, the Sahara, the Namib and Kalahari Desert, Antarctica, or parts of the Arabian Peninsula. The reason for this is probably the extreme maximum and minimum temperatures as well as the precipitation patterns in these areas, which deviate significantly from the values of the bioclimatic variables used for model training. The comparison between the global projected climatic suitability and the actual occurrences of T. rubrofasciata implies that the algorithms also project well outside the Americas. Almost all occurrence records are located in areas projected to be at least partially climatically suitable (Figure 2). Occurrence points located in areas with lower projected suitability often lie in large trade centres, such as Singapore, where individual triatomines could be introduced by shipping or air transport. This could also be true for smaller islands, such as the Japanese Okinawa islands. Whether established populations exist there has not been confirmed. Each of the countries where T. rubrofasciata has been reported but no coordinates are available possesses at least one area projected as climatically suitable for T. rubrofasciata. This applies especially to coastal regions of Angola, Cambodia, China, Comoros, the Republic of the Congo, the Democratic Republic of the Congo, Guinea, India, Indonesia, Japan, Madagascar, Malaysia, Martinique (France), Mauritius, Myanmar, Philippines, Saudi Arabia, Seychelles, Sierra Leone, Singapore, South Africa, Sri Lanka, Taiwan (China), Tanzania, Thailand, Tonga and Vietnam. Nevertheless, the occurrence data for T. rubrofasciata are very scarce and probably incomplete due to a lack of monitoring. Therefore, occurrence points from 1958 to 2018 were used for validation, including data collected outside the period in which the climatic variables were recorded. This could lead to a slight shift between the actually observed occurrence of T. rubrofasciata and the projected climatically suitable habitat. For example, it is possible that environments that were modelled as climatically unsuitable for the period 1970 to 2000 now have climatic conditions suitable for T. rubrofasciata.
The diversity map, which was compiled based on the binary modelling results, highlights regions possessing suitable climatic conditions for various triatomine species in tropical regions, especially between 21°N and 24°S latitude. In South America, diversity hotspots are for the most part modelled for regions south of the equator. Indeed, a direct correlation exists between temperature and triatomine species richness, leading to a significant increase of diversity from the poles towards the equator. This effect seems to be pronounced in the southern hemisphere of South America (Rodriguero and Gorla, 2004). According to Péneau et al., 2016, highly diverse vector communities as well as less diverse communities can lead to peaks of Chagas transmission, while intermediate levels of triatomine diversity lower the risk of transmission. This correlation is mostly associated with a dominant, highly vector competent key species being more abundant in less diverse communities. An increase in biodiversity reduces the contribution of this key species to the Chagas disease transmission rate, while the contribution of secondary species increases. At intermediate levels of biodiversity, this does not compensate for the reduced risk associated with the key species, whereas in highly diverse communities, the contribution of the secondary species to the transmission rate nearly matches the contribution of the key species. Outside their native range, highest triatomine species richness can also be found in tropical regions. In Africa, the regions of greatest triatomine diversity are projected north and south of the equator in a tropical savannah climate similar to the climate south of the equator in South America (Geiger, 1961). This climatic characteristic also applies to Southeast Asia, where diversity hotspots are less prominent.
Furthermore, our findings indicate that there are suitable climatic habitats for triatomine species in temperate areas, such as Portugal, Spain and Italy in Europe or eastern Australia.
The results of the relative contribution of the bioclimatic variables correspond to findings of other studies, which also identify temperature seasonality (BIO4) as an important determinant of triatomine species distribution. This is probably attributable to physiological limiting factors, such as the temperature-dependent development and temperature-induced dispersal stimulation of the triatomines (Diniz-Filho et al., 2013; Pereira et al., 2013; Ceccarelli et al., 2015). Whether temperature or precipitation has a higher impact on triatomine distribution appears to depend on the considered species and further ecological factors (Gorla, 2002; Bustamante et al., 2007; de la Vega et al., 2015). For instance, although its lifecycle, unlike that of many insects, is not bound to water, a high relative humidity seems to be crucial for the development of T. vitticeps, a triatomine species occurring in the Atlantic forests of Brazil (de Souza et al., 2010). Our results indicate a low influence of the precipitation variables (BIO13-15) on the triatomine distribution. However, species interactions and non-climatic influences on the Triatominae are not taken into account with this approach. For example, the niche of host species as well as microclimatic effects and habitat structures play an important role in the distribution of triatomine species.
Through our work, the global climatic suitability for many triatomine species has been demonstrated by ENM. Based on the results of this study, the ability to transmit Trypanosoma cruzi and the wide distribution achieved, Triatoma rubrofasciata currently seems to be the potentially most perilous source of autochthonous Chagas infections outside of the Americas. However, it has not yet been possible to provide evidence of such disease transmission (Rebêlo et al., 1998; Dujardin et al., 2015b). Due to their limited dispersal by flight and the fact that they do not have resting stages, transport of triatomine bugs seems to be very rare. Nonetheless, older instars of some triatomine species are able to survive several months without a blood meal and therefore might survive longer transports (Costa and Perondini, 1973; Cortéz and Gonçalves, 1998; Almeida et al., 2003). Vulnerable dispersal routes could be shipping traffic fostering the distribution of coastal triatomine species, as shown for T. rubrofasciata, but also trade and travel by air traffic. Our results, as well as the immigration of infected people, show that the potential transmission of Trypanosoma cruzi is not necessarily a solely Latin American problem (WHO, 2019). It has been estimated that there are >80,000 individuals infected with Chagas in Europe and the Western Pacific region, >300,000 in the United States, >3000 in Japan and >1500 in Australia (Coura and Viñas, 2010). A further spread of Triatoma rubrofasciata into regions with suitable climatic conditions, but also the possible introduction of other Chagas vectors in non-endemic areas, could aggravate the situation and increase the number of infections. In particular, regions offering suitable climatic conditions to a large number of different triatomine species are at risk. Therefore, it may be beneficial to establish national and international vector surveillance programs to monitor the spread of vectors, in particular for T. rubrofasciata, as is already implemented in southern China (Liu et al., 2017), and to register Chagas disease as a reportable disease.
Materials and methods
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
---|---|---|---|---|
Software, algorithm | RStudio | R Development Core Team, 2019 | RRID:SCR_000432 | |
Software, algorithm | ArcGIS for Desktop | ESRI, 2018 | RRID:SCR_011081 | |
Software, algorithm | biomod2 package | Thuiller et al., 2019 | Available at https://cran.r-project.org/package=biomod2 | |
Occurrence data
Eleven triatomine species were considered for species distribution modelling representing different biogeographical regions in Latin America. The selection was also based on their importance as Chagas disease vectors and their presence in domestic and peri-domestic environments.
Occurrence data of the triatomine species were obtained from data provided by Fergnani et al., 2013. This American distribution dataset contains point data for each species with associated coordinates and was generated to study patterns on morphological diversity and species assemblages in Neotropical Triatominae (Fergnani et al., 2013). In total, 4155 unique occurrence points were provided ranging from 31 for Rhodnius ecuadoriensis to 1180 for Panstrongylus geniculatus (Table 1, Supplementary file 5). Fergnani et al., 2013 abstracted the occurrence data from distribution data from the 'Atlas of Chagas disease vectors in the Americas' (Carcavallo et al., 1998). In this atlas, the distribution of the species in the Americas is presented as detailed maps. These maps were copied and digitised at a 0.1° x 0.1° resolution and converted into a grid comprising the information of occurrence for each grid cell using an equal area Mollweide map projection. With the help of the map projection, occurrence points with coordinates were created (Fergnani et al., 2013).
Model specifications.
Occurrence points for all considered species used for modelling and model evaluation (AUC).
Species | Occurrence records | AUC ensemble models |
---|---|---|
Panstrongylus geniculatus | 1180 | 0.985 |
Panstrongylus megistus | 401 | 0.976 |
Rhodnius brethesi | 85 | 0.991 |
Rhodnius ecuadoriensis | 31 | 0.989 |
Rhodnius prolixus | 540 | 0.981 |
Triatoma brasiliensis | 178 | 0.994 |
Triatoma dimidiata | 300 | 0.962 |
Triatoma infestans | 631 | 0.977 |
Triatoma maculata | 132 | 0.992 |
Triatoma rubrofasciata | 268 | 0.98 |
Triatoma sordida | 409 | 0.978 |
Total | 4155 | |
In order to assess the reliability and completeness of the data obtained from Carcavallo et al., 1998 and Fergnani et al., 2013, we compared them to further occurrence datasets. The publication by Ceccarelli et al., 2018 also contains comprehensive distribution data on triatomines. However, a direct comparison of both plotted datasets showed that the data points obtained from Ceccarelli et al., 2018 are completely covered by the data from the ‘Atlas of Chagas disease vectors in the Americas’ (Carcavallo et al., 1998; Fergnani et al., 2013; Supplementary file 6). Furthermore, the occurrence data from the ‘Atlas of Chagas disease vectors in the Americas’ (Carcavallo et al., 1998; Fergnani et al., 2013) match the time period of the climatic conditions used as predictor variables (1970–2000) and are probably less susceptible to sampling bias.
Additional global occurrences of Triatoma rubrofasciata from an intensive literature search were used solely for independent global model validation and were not included in the modelling approach (Jurberg and Galvão, 2006; Eugenio and Minakawa, 2012; VAST, 2014; Dujardin et al., 2015b; Liu et al., 2017; Ceccarelli et al., 2018; Dong et al., 2018; Huang et al., 2018; GBIF.org, 2019a). This type of global validation was only feasible for T. rubrofasciata, as it is the only triatomine species with known occurrences both inside and outside the Americas. For the data collection, the search engines ‘Google Scholar’ and ‘Web of Knowledge’ were searched for the keywords ‘Triatoma rubrofasciata occurrence’, ‘Triatoma rubrofasciata distribution’ and ‘Triatoma rubrofasciata records’, considering only data points from outside the Americas. Records in English that included coordinates were taken into account, without restriction to any temporal period.
Climate data
It is well described that temperature and relative humidity have a strong impact on the development and distribution of Triatominae, which favour mild temperatures and moderate to high humidity (Guarneri et al., 2003; Lazzari, 1991; Luz et al., 1998; Catalá et al., 2017). Therefore, we proceeded on the assumption that the distribution of the Triatominae is mainly climatically controlled. Bioclimatic variables provided by WorldClim, comprising data on temperature and precipitation patterns, were used as environmental variables (Fick and Hijmans, 2017). Nineteen variables are available, referring to the climate conditions empirically recorded over a period of 30 years from 1970 to 2000. We chose a subset of six variables to train the models. Studies have indicated that a major limiting factor of triatomine distribution is the minimum temperature of the coldest month (de la Vega et al., 2015). However, this seems to be species-specific, since temperature seasonality has also often been identified as an important determinant (Pereira et al., 2013; Ceccarelli et al., 2015). Hence, temperature seasonality (BIO4), maximum temperature of the warmest month (BIO5) and minimum temperature of the coldest month (BIO6) were chosen as explanatory variables for temperature. Precipitation and relative humidity play a decisive part not only in the distribution but also in the spatial delimitation of different triatomine species (Gurgel-Gonçalves et al., 2011; Ibarra-Cerdeña et al., 2014). Therefore, as explanatory variables for precipitation, we considered precipitation of the wettest month (BIO13), precipitation of the driest month (BIO14) and precipitation seasonality (BIO15). In order to avoid collinearity between the environmental variables, pairwise Pearson correlation coefficients were computed with the function cor of R’s stats package, and all pairs were required to stay below 0.8 (R Development Core Team, 2013).
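The collinearity screen described above can be illustrated with a minimal sketch. It is written in plain Python rather than with R's cor, the variable values are hypothetical toy numbers (not WorldClim data), and the use of the absolute value of r is an assumption, since the text only states the threshold of 0.8.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def collinear_pairs(variables, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold.

    Assumption: the absolute value is compared, so strong negative
    correlations are flagged as well.
    """
    flagged = []
    for (name_a, a), (name_b, b) in combinations(variables.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical values at five grid cells, for illustration only.
toy_variables = {
    "BIO5": [30.1, 28.4, 33.0, 25.2, 31.5],
    "BIO6": [12.0, 15.5, 8.1, 18.3, 10.2],
    "BIO13": [210.0, 95.0, 180.0, 240.0, 60.0],
}
for name_a, name_b, r in collinear_pairs(toy_variables):
    print(f"{name_a} and {name_b} are collinear (r = {r:.2f})")
```

A variable set passes the screen when `collinear_pairs` returns an empty list; otherwise one variable of each flagged pair would have to be dropped.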
Species distribution modelling
The modelling of the habitat suitability was performed with an ensemble forecasting approach incorporating six different algorithms. Modelling was executed in the R environment (R Development Core Team, 2019) using the biomod2 package (Thuiller et al., 2019) (Source code 1). The algorithms were selected based on their modelling performance and advantages and comprised artificial neural networks (ANN), generalised additive models (GAM), generalised boosted models (GBM), generalised linear models (GLM), multivariate adaptive regression splines (MARS) and the maximum entropy approach (MAXENT) (Elith et al., 2006; Li and Wang, 2013). The models were trained solely on the South American dataset with a spatial extent of 105°W to 35°W longitude and 30°N to 45°S latitude. The discriminatory capacity of the algorithms was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC). The AUC ranges from 0 to 1, and a greater value indicates better predictive model performance. The results were then projected on a global scale.
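To make the evaluation metric concrete, the following minimal sketch computes the AUC as the probability that a randomly chosen presence receives a higher suitability score than a randomly chosen (pseudo-)absence, with ties counted as one half. This is an illustration in plain Python, not the biomod2 implementation, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a presence, 0 for a (pseudo-)absence.
    scores: predicted suitability, where higher means more suitable.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # presence ranked above absence
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical evaluation subset: two presences, two pseudo-absences.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

The pairwise formulation is quadratic in the number of points, which is fine for a sketch; rank-based formulas give the same value more efficiently.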
The models were run using the following single-algorithm parameters: a stepwise feature selection with quadratic terms based on the Akaike Information Criterion (AIC) was used to generate the generalised linear models (GLM); generalised boosted models (GBM) were run with a maximum of 5 000 trees to ensure fitting, a minimum number of observations in the trees’ terminal nodes of 10, a learning rate of 0.01 and an interaction depth of 7; for generalised additive models (GAM), a binomial distribution with a logit link function was applied and the initial degrees of smoothing were set to 4; the minimum interaction degree of the multivariate adaptive regression splines (MARS) was set to two, with the number of terms to retain in the final model set to 17; artificial neural networks (ANN) were produced with fivefold cross-validation, resulting in eight units in the hidden layer and a weight decay of 0.001; for the maximum entropy approach (MAXENT), we used linear, quadratic and product features and deactivated threshold and hinge features, while the number of iterations was increased to 10 000 to ensure convergence of the algorithm.
A background selection process was implemented and 10 000 pseudo-absences were chosen. For the presence/absence models ANN, GAM, GBM, GLM and MARS, we used the pseudo-absence selection parameter ‘disk’, which defines a minimal distance (80 km) to presence points for selecting pseudo-absence candidates. Since MAXENT is a presence/background modelling tool, its pseudo-absences were chosen randomly. Cross-validation of the models was carried out by splitting the dataset into a subset used for calibrating the models (70%) and a second subset used to evaluate them (30%). All model predictions were scaled with a binomial GLM, which should reduce the amplitude of the projection scale and ensure comparable predictions. Consensus maps were built by combining the modelling results of all algorithms with an AUC value >0.75. Their impact on the consensus maps was weighted by the mean of the AUC scores. Applying an ensemble forecasting approach is expected to yield a more robust projection of the species’ climatic suitability.
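The consensus step can be sketched as follows. The example assumes that each retained algorithm contributes to a cell's consensus value in proportion to its mean AUC, which is one plausible reading of the weighting described above; the function name, algorithm scores and suitability values are hypothetical.

```python
def weighted_consensus(predictions, auc_scores, min_auc=0.75):
    """Combine per-algorithm suitability maps into one consensus map.

    predictions: {algorithm: [suitability per grid cell]}
    auc_scores:  {algorithm: mean AUC of that algorithm}
    Algorithms with an AUC not above min_auc are excluded; the rest are
    averaged with weights proportional to their AUC (an assumption).
    """
    kept = [name for name in predictions if auc_scores[name] > min_auc]
    if not kept:
        raise ValueError("no algorithm exceeds the AUC threshold")
    total_weight = sum(auc_scores[name] for name in kept)
    n_cells = len(predictions[kept[0]])
    return [
        sum(auc_scores[name] * predictions[name][i] for name in kept) / total_weight
        for i in range(n_cells)
    ]


# Hypothetical suitability values for three grid cells.
toy_predictions = {
    "GLM": [0.20, 0.80, 0.55],
    "GBM": [0.40, 0.60, 0.65],
    "ANN": [1.00, 1.00, 1.00],  # excluded below because its AUC is too low
}
toy_auc = {"GLM": 0.95, "GBM": 0.98, "ANN": 0.70}
print(weighted_consensus(toy_predictions, toy_auc))
```

Because the weights are normalised by their sum, the consensus values stay on the same 0 to 1 scale as the individual predictions.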
We considered the relative contribution of the bioclimatic variables for all six algorithms and calculated their average importance over all models. Finally, the variables contributing most to the projection of the considered species were identified.
The modelled climatic suitability for eleven triatomine species was projected on a global extent under current climatic conditions. During model projection with biomod2, clamping masks were created. These masks mark areas where projections are uncertain because the values of the bioclimatic variables lie outside the range used for calibrating the models. More precisely, the models were trained with climatic variables whose values for temperature and precipitation correspond to the values occurring in South America. If the models are projected globally onto areas in which the temperature and precipitation patterns are outside the range of the training values, extrapolation occurs and the projection can be regarded as uncertain. The clamping masks are integrated into the climatic suitability maps, with hatching indicating areas where at least one variable falls outside its training range. Additionally, a detailed clamping mask is given in Supplementary file 7.
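The idea behind a clamping mask can be sketched in a few lines: record the minimum and maximum of each variable in the training region, then list the variables that leave this range in a projected cell. This is a simplified illustration, not the biomod2 routine, and all names and values are hypothetical.

```python
def training_ranges(training_cells):
    """Return {variable: (min, max)} over the cells used for calibration."""
    names = list(training_cells[0])
    return {
        name: (
            min(cell[name] for cell in training_cells),
            max(cell[name] for cell in training_cells),
        )
        for name in names
    }


def variables_out_of_range(cell, ranges):
    """List the variables whose value in `cell` lies outside its training range."""
    return [
        name
        for name, (low, high) in ranges.items()
        if not low <= cell[name] <= high
    ]


# Hypothetical training cells and one hypothetical projected cell.
ranges = training_ranges([
    {"BIO5": 25.0, "BIO14": 10.0},
    {"BIO5": 35.0, "BIO14": 120.0},
])
projected_cell = {"BIO5": 41.0, "BIO14": 50.0}
outside = variables_out_of_range(projected_cell, ranges)
print("uncertain (extrapolated)" if outside else "within training range", outside)
```

A cell would be hatched on the map whenever the returned list is non-empty, and the length of the list gives the kind of per-cell count a detailed clamping mask could display.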
In order to convert the continuous climatic suitability maps into binary presence/absence maps, the equal sensitivity and specificity threshold was applied. Based on the binary modelling results, we compiled a diversity map combining all considered triatomine species. This merged map displays the number of species which find suitable climatic conditions in the respective grid cell on a global scale.
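As a hedged illustration of this step, the sketch below picks the threshold at which sensitivity and specificity are closest to each other, converts a continuous map into a binary one and sums binary maps into a species count per grid cell. Candidate thresholds are taken from the observed scores, which is a simplifying assumption; function names and data are hypothetical.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when cells with score >= threshold are 'suitable'."""
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= threshold)
    fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < threshold)
    tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s < threshold)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def equal_sens_spec_threshold(labels, scores):
    """Observed score at which |sensitivity - specificity| is smallest."""
    def gap(threshold):
        sens, spec = sensitivity_specificity(labels, scores, threshold)
        return abs(sens - spec)
    return min(sorted(set(scores)), key=gap)


def binarise(suitability, threshold):
    """Convert a continuous suitability map into 0/1 values."""
    return [1 if value >= threshold else 0 for value in suitability]


def species_richness(binary_maps):
    """Number of species with suitable conditions in each grid cell."""
    return [sum(cell_values) for cell_values in zip(*binary_maps)]


# Hypothetical evaluation data: three presences, three pseudo-absences.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
threshold = equal_sens_spec_threshold(labels, scores)
print(threshold, binarise([0.7, 0.3, 0.5], threshold))
```

Each species gets its own threshold, so the binary maps are comparable in the sense that every one of them balances omission and commission errors in the same way before they are summed.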
All maps were created with ESRI ArcGIS (ESRI, 2018).
Model evaluation for Triatoma rubrofasciata
The evaluation of the global projection of climatic suitability was performed with occurrence data of T. rubrofasciata, the only member of the genus Triatoma distributed in both the Old and the New World (Dujardin et al., 2015b). The global projection was compared with occurrence data of T. rubrofasciata outside of South America. Occurrence references with two levels of accuracy were taken into account: records with exact coordinates and records at country level. Occurrence points in areas with uncertain prediction were not considered. The consistency between the modelled global climatic suitability and the actual occurrence of the species was examined and the resulting model performance was assessed.
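One simple way to quantify such a comparison is the share of independent occurrence records that fall into cells predicted as suitable, after dropping records located in areas of uncertain prediction. The text does not state which metric was used, so this sketch is only an assumption-laden illustration with hypothetical names and records.

```python
def validation_hit_rate(records):
    """Fraction of usable records that fall into cells predicted as suitable.

    records: list of (predicted_suitable, in_uncertain_area) boolean pairs,
    one per independent occurrence record. Records in uncertain areas are
    excluded before the fraction is computed.
    """
    usable = [suitable for suitable, uncertain in records if not uncertain]
    if not usable:
        raise ValueError("no records outside uncertain areas")
    return sum(usable) / len(usable)


# Hypothetical records: (predicted suitable?, inside clamping mask?)
toy_records = [
    (True, False),
    (True, False),
    (False, False),
    (True, True),   # dropped: prediction uncertain at this location
]
print(validation_hit_rate(toy_records))
```

Because only presences are available for validation, this measure reflects sensitivity alone and says nothing about how often unsuitable areas are correctly identified.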
Data availability
All data generated or analysed during this study are included in the manuscript and supporting files.
- Global Biodiversity Information Facility. GBIF Occurrence Download. https://doi.org/10.15468/dl.yneo2v
- figshare. Large-scale patterns in morphological diversity, and species assembly in Neotropical Triatominae (Heteroptera: Reduviidae). https://doi.org/10.6084/m9.figshare.653959.v6
References
- Field ecology of sylvatic Rhodnius populations (Heteroptera, Triatominae): risk factors for palm tree infestation in western Ecuador. Tropical Medicine & International Health 10:1258–1266. https://doi.org/10.1111/j.1365-3156.2005.01511.x
- Ensemble forecasting of species distributions. Trends in Ecology & Evolution 22:42–47. https://doi.org/10.1016/j.tree.2006.09.010
- Thermal tolerance plasticity in Chagas disease vectors Rhodnius prolixus (Hemiptera: Reduviidae) and Triatoma infestans. Journal of Medical Entomology 56:997–1003. https://doi.org/10.1093/jme/tjz022
- Geographical distribution and alti-latitudinal dispersion. In: Atlas of Chagas Disease Vectors in the Americas. Editora Fiocruz. pp. 747–792.
- Book: Biology of Triatominae. In: Telleria J, Tibayrenc M, editors. American Trypanosomiasis Chagas Disease: One Hundred Years of Research. Elsevier. pp. 145–167.
- Resistance to starvation of Triatoma rubrofasciata (De Geer, 1773) under laboratory conditions (Hemiptera: Reduviidae: Triatominae). Memórias Do Instituto Oswaldo Cruz 93:549–554. https://doi.org/10.1590/S0074-02761998000400024
- Resistência do Triatoma brasiliensis ao jejum. Revista De Saúde Pública 7:207–217. https://doi.org/10.1590/S0034-89101973000300003
- Emerging Chagas disease in Amazonian Brazil. Trends in Parasitology 18:171–176. https://doi.org/10.1016/S1471-4922(01)02200-0
- Complementary paths to Chagas disease elimination: the impact of combining vector control with etiological treatment. Clinical Infectious Diseases 66:S293–S300. https://doi.org/10.1093/cid/ciy006
- Bionomics and spatial distribution of triatomine vectors of Trypanosoma cruzi in Texas and Other Southern States, USA. The American Journal of Tropical Medicine and Hygiene 98:113–121. https://doi.org/10.4269/ajtmh.17-0526
- Ecological and physiological thermal niches to understand distribution of Chagas disease vectors in Latin America. Medical and Veterinary Entomology 32:1–13. https://doi.org/10.1111/mve.12262
- Studies on the lifecycle of Trypanosoma conorrhini. “In vitro” development and multiplication of the bloodstream trypanosomes. Rev Inst Med Trop Sao Paulo 3:149–160.
- Geographical patterns of Triatominae (Heteroptera: Reduviidae) richness and distribution in the Western Hemisphere. Insect Conservation and Diversity 6:704–714. https://doi.org/10.1111/icad.12025
- The ecology of Triatoma sordida in natural environments in two different regions of the state of Minas Gerais, Brazil. Revista Do Instituto De Medicina Tropical De São Paulo 35:237–245. https://doi.org/10.1590/S0036-46651993000300004
- Complete mitochondrial genome of the Chagas disease vector, Triatoma rubrofasciata. The Korean Journal of Parasitology 56:515–519. https://doi.org/10.3347/kjp.2018.56.5.515
- The rising importance of Triatoma rubrofasciata. Memórias Do Instituto Oswaldo Cruz 110:319–323. https://doi.org/10.1590/0074-02760140446
- Software: ArcGIS Desktop. Release 10.7. Environmental Systems Research Institute.
- Conference: The kissing bug in Quezon City. The 64th Annual Meeting of the Japan Society of Medical Entomology and Zoology. https://doi.org/10.11536/jsmez.64.0_67_1
- Mixed domestic infestation by Rhodnius prolixus Stäl, 1859 and Panstrongylus geniculatus Latreille, 1811, vector incrimination, and seroprevalence for Trypanosoma cruzi among inhabitants in El Guamito, Lara State, Venezuela. The American Journal of Tropical Medicine and Hygiene 71:501–505. https://doi.org/10.4269/ajtmh.2004.71.501
- Large-scale patterns in morphological diversity and species assemblages in Neotropical Triatominae (Heteroptera: Reduviidae). Memórias Do Instituto Oswaldo Cruz 108:997–1008. https://doi.org/10.1590/0074-0276130369
- WorldClim 2: new 1‐km spatial resolution climate surfaces for global land areas. International Journal of Climatology 37:4302–4315. https://doi.org/10.1002/joc.5086
- Biogeografia, origem e distribuição da domiciliação de triatomíneos no Brasil. Revista De Saúde Pública 14:265–299. https://doi.org/10.1590/S0034-89101980000300002
- Projected future distributions of vectors of Trypanosoma cruzi in North America under climate change scenarios. PLOS Neglected Tropical Diseases 8:e2818. https://doi.org/10.1371/journal.pntd.0002818
- Data: GBIF occurrence download. Global Biodiversity Information Facility. https://doi.org/10.15468/dl.yneo2v
- Data: GBIF occurrence download. Global Biodiversity Information Facility. https://doi.org/10.15468/dl.z5ujep
- Data: GBIF occurrence download. Global Biodiversity Information Facility. https://doi.org/10.15468/dl.f7llvz
- Data: GBIF occurrence download. Global Biodiversity Information Facility. https://doi.org/10.15468/dl.wlpggh
- Data: GBIF occurrence download. Global Biodiversity Information Facility. https://doi.org/10.15468/dl.tivrxd
- Data: GBIF occurrence download. Global Biodiversity Information Facility. https://doi.org/10.15468/dl.j3ameo
- Data: GBIF occurrence download. Global Biodiversity Information Facility. https://doi.org/10.15468/dl.vjqcgv
- Data: GBIF occurrence download. Global Biodiversity Information Facility. https://doi.org/10.15468/dl.y9thnj
- Data: GBIF occurrence download. Global Biodiversity Information Facility. https://doi.org/10.15468/dl.lfcmbf
- Data: GBIF occurrence download. Global Biodiversity Information Facility. https://doi.org/10.15468/dl.cyfvru
- Data: GBIF occurrence download. Global Biodiversity Information Facility. https://doi.org/10.15468/dl.hi7c3s
- Data: GBIF occurrence download. Global Biodiversity Information Facility. https://doi.org/10.15468/dl.kw4n5q
- Software: Klima der Erde (Überarbeitete Neuausgabe von Geiger, R.: Koppen-Geiger). Wandkarte 1:16 Mill., Klett-Perthes.
- Biosystematics of Old World Triatominae. Acta Tropica 63:127–140. https://doi.org/10.1016/S0001-706X(97)87188-4
- Variables ambientales registradas por sensores remotos como indicadores de la distribución geográfica de Triatoma infestans (Heteroptera: Reduviidae). Ecología Austral 12:117–127.
- The effect of temperature on the behaviour and development of Triatoma brasiliensis. Physiological Entomology 28:185–191. https://doi.org/10.1046/j.1365-3032.2003.00330.x
- Geometric morphometrics and ecological niche modelling for delimitation of near-sibling triatomine species. Medical and Veterinary Entomology 25:84–93. https://doi.org/10.1111/j.1365-2915.2010.00920.x
- Geographic distribution of Chagas disease vectors in Brazil based on ecological niche modeling. Journal of Tropical Medicine 2012:1–15. https://doi.org/10.1155/2012/705326
- Elimination of Rhodnius prolixus in Central America. Parasites & Vectors 5:45–52. https://doi.org/10.1186/1756-3305-5-45
- Modeling the spatial distribution of Chagas disease vectors using environmental variables and people's knowledge. International Journal of Health Geographics 12:29. https://doi.org/10.1186/1476-072X-12-29
- Biology, ecology and systematics of Triatominae (Heteroptera, Reduviidae), vectors of Chagas disease, and implications for human health. Biol Linz 50:1096–1116.
- Temperature preference in Triatoma infestans (Hemiptera: Reduviidae). Bulletin of Entomological Research 81:273–276. https://doi.org/10.1017/S0007485300033538
- Global economic burden of Chagas disease: a computational simulation model. The Lancet Infectious Diseases 13:342–348. https://doi.org/10.1016/S1473-3099(13)70002-1
- Applying various algorithms for species distribution modelling. Integrative Zoology 8:124–135. https://doi.org/10.1111/1749-4877.12000
- The effect of fluctuating temperature and humidity on the longevity of starved Rhodnius prolixus (Hem., Triatominae). Journal of Applied Entomology 122:219–222. https://doi.org/10.1111/j.1439-0418.1998.tb01487.x
- Evolution, systematics, and biogeography of the Triatominae, vectors of Chagas disease. Advances in Parasitology 99:265–344. https://doi.org/10.1016/bs.apar.2017.12.002
- Oral transmission of Chagas disease by consumption of açaí palm fruit, Brazil. Emerging Infectious Diseases 15:653–655. https://doi.org/10.3201/eid1504.081450
- Amazonian triatomine biodiversity and the transmission of Chagas disease in French Guiana: in medio stat sanitas. PLOS Neglected Tropical Diseases 10:e0004427. https://doi.org/10.1371/journal.pntd.0004427
- Climatic factors influencing triatomine occurrence in Central-West Brazil. Memórias Do Instituto Oswaldo Cruz 108:335–341. https://doi.org/10.1590/S0074-02762013000300012
- Human Chagas disease and migration in the context of globalization: some particular aspects. Journal of Tropical Medicine 2013:789758. https://doi.org/10.1155/2013/789758
- Panstrongylus hispaniolae sp. n. (Hemiptera: Reduviidae: Triatominae), a fossil triatomine in Dominican amber, with evidence of gut flagellates. Palaeodiversity 6:1–8.
- Software: R: A language and environment for statistical computing, version 3.0.2. R Foundation for Statistical Computing, Vienna, Austria.
- Software: R: A language and environment for statistical computing, version 3.6.0. R Foundation for Statistical Computing, Vienna, Austria.
- Ecological patterns of blood-feeding by kissing-bugs (Hemiptera: Reduviidae: Triatominae). Memórias Do Instituto Oswaldo Cruz 106:479–494. https://doi.org/10.1590/S0074-02762011000400016
- Espécies de Triatominae (Hemiptera: Reduviidae) do estado do Maranhão, Brasil. Cadernos De Saúde Pública 14:187–192. https://doi.org/10.1590/S0102-311X1998000100027
- Latitudinal gradient in species richness of the New World Triatominae (Reduviidae). Global Ecology and Biogeography 13:75–84. https://doi.org/10.1111/j.1466-882X.2004.00071.x
- Book: Biosystematics of Triatominae. Biosystematics of haematophagous insects. In: Service M. W, editors. Systematics Association Special. Clarendon Press. pp. 287–312.
- Distribución de los vectores de la enfermedad de Chagas en países “no endémicos”: La posibilidad de transmisión vectorial fuera de América Latina. Revista Argentina De Zoonosis Y Enfermedades Infecciosas Emergentes 11:20–27.
- Triatomines: trypanosomatids, bacteria, and viruses potential vectors? Frontiers in Cellular and Infection Microbiology 8:405. https://doi.org/10.3389/fcimb.2018.00405
Decision letter
- Anna Akhmanova, Senior and Reviewing Editor; Utrecht University, Netherlands
- Zulma Cucunubá, Reviewer
In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.
Acceptance summary:
This paper provides an interesting analysis of the climate and environmental factors driving the vector occurrence for Chagas disease, a tropical disease that is transmitted by insects of the Triatominae subfamily and affects several million people worldwide. The topic and the results of this research are relevant, novel and with important public health implications for a global vector surveillance effort.
Decision letter after peer review:
Thank you for submitting your article "Modelling the climatic suitability of Chagas disease vectors on a global scale" for consideration by eLife. Your article has been reviewed by Neil Ferguson as the Senior Editor, a Reviewing Editor, and two reviewers. The following individuals involved in review of your submission have agreed to reveal their identity: Zulma Cucunubá (Reviewer #2).
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.
Summary:
This paper provides an interesting analysis of the climate and environmental drivers of Chagas vector occurrence, and comments on the implications for surveillance.
Both reviewers found the paper of interest but identified major limitations which need to be addressed. Given the nature of some comments, acceptance of a revised manuscript is not guaranteed.
Essential revisions:
Refer to the full reviews for details but the following are the most critical issues:
- the source data – both reviewers comment that it is not very current. More recent occurrence data should be included if at all possible. Why was the Ceccarelli, 2018 dataset not used? More detail on the data is also needed (see reviewer 2).
- pseudo absence points – comments of reviewer 1 need to be fully addressed, including the lack of absence points in the validation dataset.
- validation set – data from only one species was used (see reviewer 1) – again, the rationale for this needs to be given, and I would prefer to see spatially stratified model selection and cross-validation used (e.g. spatial block bootstrap).
- model choice and settings used – these need to be justified and sensitivity analyses undertaken (see reviewer 1). In general, more detail of the modelling (including the ensemble approach) is needed – see both reviews.
Reviewer #1:
Overall, an interesting topic with some relevant approaches applied. Currently some serious lack of detail and rigour in the modelling approach that prevents this from being a valuable addition to the literature and may not be feasible to address in reasonable timescales. This makes the results and their significance difficult to interpret.
Essential revisions:
The occurrence data from these models come from a single source published in 1998 (Carcavallo et al., 1998). Surely there must be more up-to-date data on occurrence of these species? Particularly with the advent of services like GBIF (which the authors cite for one species). The robustness of these maps could be substantially improved if more modern data were included.
The "validation set" is comprised of data from a literature review for Triatoma rubrofasciata and appears to cover a more modern time period (citations dated 2006-2009). Why was this only done for one species? Doing a prospective evaluation of the ENM is certainly one approach, but the limitations of this validation approach should be explored, e.g. confirmation bias (are people just doing surveys in areas where the atlas indicates presence?), important changes in the distribution over time, etc, etc.
Pseudo-absence and lack of absence data in validation set. The choice of random pseudo absence generation when combined with non-systematically sampled occurrence data is problematic for both accuracy metrics and overprediction and has been discussed at length (e.g. Chefaoui and Lobo, 2008) – effectively it means you map surveillance effort not occurrence of the species. I don't think random pseudo absence data is a suitable choice for this approach given the variable surveillance effort. Also including Arctic and Antarctic areas and generally areas that are a long way away from presence points is a good way to artificially boost your AUC – is anyone really hypothesising that these species can spread to these regions? The lack of absence data in the "validation set" is also problematic and leads the models to prioritise sensitivity over specificity. Arguably this should be the other way around as the primary use for the maps is to target surveillance to areas where importation may be a problem. The authors should consider a more nuanced approach to absence data. Could occurrence points for other species be indicative of surveillance effort?
Subsection “Species distribution modelling”, "All algorithms were run with default settings" – these are a complex set of methods with a large number of tuneable hyperparameters. I'm not sure it is a fair comparison to just leave them with default settings, nor is it a good way to optimise fit. I'd like to see a clear rationale for why these classes of methods were chosen, relevant choices for hyperparameters and ideally some experiments to validate these choices.
Reviewer #2:
The manuscript reports a niche modelling predicting the global climatic suitability of eleven triatomine species (competent Trypanosoma cruzi vector).
The topic and the results of this research are very relevant, novel and with important public health implications for a global vector surveillance effort, especially regarding Triatoma rubrofasciata. The paper also provides a comprehensive discussion. However, the manuscript lacks some methodological details to help the reader understand how the analysis was conducted and which are the limitations and implications of both the methodological approach and therefore the results.
Occurrence records:
The section describing the data should be extended. I acknowledge the authors have used a dataset of 4155 unique points, most of them already collated by other authors (Fergnani et al., 2013) who in turn extracted the data from another publication (Carcavallo et al., 1998). But some basic information is needed in order to assess what the dataset encompasses.
Was there any quality control used for data extraction?
Are there any concerns about biases in data collection?
Did the data undergo any time standardisation?
What are the potential concerns of not having occurrence data beyond 1999 for the American triatomine species?
It would be important to have a figure showing the distribution of the data points per species, even if it is just on the Supplementary materials.
Very important, some of the authors from the main source of information (Fergnani et al., 2013), published a data paper (Ceccarelli et al., 2018) with 21815 georeferenced triatomine records updated until 2017. What are the implications of using (or not using) a more updated dataset like this one?
Results section:
I acknowledge this is a prediction effort at a global scale, but I found it hard to understand how the model predicts about 70% climatic suitability for some species across very large areas that include the highlands (above 2500 MAMSL) in South America (i.e. R. prolixus or P. geniculatus in Bogota), even considering that the model does not include a climate change scenario. Also, when compared to previous publications on climatic suitability in the Americas, I found concerning differences for some species, such as R. prolixus in Colombia (Parra-Henao et al., 2016) or P. megistus in Brazil (Gurgel-Gonçalves et al., 2012).
From the maps it seems this work tends to predict a much larger distribution of some of these species in the Americas than previous works did. Not having high resolution maps and the absence of country boundaries makes it harder to tell about potential problems in the predictions at a smaller scale.
It is interesting to see that although for some species the number of records is very scarce (i.e. Rhodnius ecuadoriensis n = 31) the AUC values are still very high. There is no mention about the limitations regarding the data on the Discussion section.
On Figure 1A 'consensus model' is mentioned. But the basic details about this model have not been mentioned on the Materials and methods section.
Discussion section:
The authors mention "we were able to divide the considered species roughly into three groups dependent on their climatic habitat preferences". I did not find clearly which are those three groups and which were the methods to identify them.
To put this work into context, it would be important to include a discussion point about the highly effective vector control and other factors (i.e. housing conditions) that would potentially determine environmental suitability, beyond the climatic suitability.
[Editors’ note: further revisions were suggested prior to acceptance, as described below.]
Thank you for submitting your article "Modelling the climatic suitability of Chagas disease vectors on a global scale" for consideration by eLife. Your article has been reviewed by Anna Akhmanova as the Senior Editor, a Reviewing Editor, and two reviewers. The reviewers have opted to remain anonymous.
The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission. In recognition of the fact that revisions may take longer than the two months we typically allow, until the research enterprise restarts in full, we will give authors as much time as they need to submit revised manuscripts.
The reviewers agreed that the paper has been substantially improved, but also identified some remaining points that need to be addressed. No collection of new data will be needed to address reviewer comments.
Essential revisions:
1) Please add additional detail about the data used, how they were extracted, curated and filtered prior to analysis to the Materials and methods section of the manuscript.
2) Running the models with default parameter values only – both reviewers felt this point was insufficiently addressed. Please conduct further dataset-specific analyses to support your choice of model parameters.
3) Please review the manuscript figures to make it clearer how well the model prediction matches the data and be more explicit how uncertainty was calculated and represented and include this in the main text when discussing findings.
Please see below the individual comments from each reviewer for a more detailed explanation of issues related to each of the above points. All reviewer points will need to be addressed point-by-point in your revised submission.
Reviewer #1:
I'd like to thank the authors for their detailed responses and additions to this work in regards to the majority of my points raised. I think all but one of these have now been adequately addressed. On point 6 [running models with default parameters only] – I don't think this particular comment has been addressed. Suggesting that such parameters have been "optimised by the biomod2 development team" is not realistic given the breadth of problems that these algorithms are applied to. To take one example, in the documentation for GAMs in the "mgcv" package (that biomod2 calls) there is extensive advice on basis dimension choice for smooths and the explicit statement "The choice of the basis dimension (k in the s, te, ti and t2 terms) is something that should be considered carefully" and a range of model diagnostic statistics and plots are suggested to tune such parameters. This is one example of many and, as a reader, I do not have great confidence in the work if some of these model flexibility parameters are not at least explored. What makes the issue worse is that a reader currently has no way of diagnosing what impact this oversight might have as there are no model coefficients or effects plots presented in the manuscript. I appreciate that this is a common oversight in many ML modelling applications, but even a basic sensitivity analysis would be a big improvement over using the default values.
Reviewer #2:
I acknowledge the authors have made substantial improvements to the original version of the manuscript following the reviewers' recommendations. The modifications imply a remarkable change on the original predicted distributions. However, some considerations in terms of the methodology and the presentation of the results remain.
About the data:
My main worry is that the methods section remains limited in its details, particularly in terms of the data that have been used, which makes it very difficult to understand all the work that has been done. I suggest the authors consider adding a sub-section to the Materials and methods section dedicated exclusively to explaining where the data come from.
For example, the authors mention as data source the "Atlas of Chagas disease vectors in the Americas (Carcavallo et al., 1998) which were digitised at a 0.1° x 0.1° resolution by Fergnani et al. (2013)". What exactly does "digitised" mean? Is Fergnani already a modelling work on the Atlas data? What is the difference between the Carcavallo and Fergnani data? This becomes even more important as Carcavallo is a book with restricted access, so it is difficult to trace the original source.
This is further confusing later when the authors cite Supplementary file 4 as the occurrence data, citing Carcavallo and not Fergnani.
In subsection “Occurrence data” they mention that "In total, 4155 unique occurrence points were collected ranging from 31 for Rhodnius ecuadoriensis to 1180 for Panstrongylus geniculatus (Table 1)." Were these points collected by the authors? This is somehow contradictory to the use of already collected data from Carcavallo/Fergnani.
Further on the same topic, the authors state in their reply to the reviewers: "We carefully compared both datasets and plotted them in ArcGIS. It turned out that the Ceccarelli as well as the GBIF occurrence records are completely covered by the Atlas data". This should be explicitly mentioned in the Materials and methods section and the comparison map added as supplementary information.
Also, the authors mention (subsection “Occurrence data”) that "Additional global occurrences of Triatoma rubrofasciata from an intensive literature search were used". However, in the Materials and methods section there is no mention of the details of the review process followed to obtain such data (which databases, which quality control, which languages, which temporal filter they have used, etc). If the data for Triatoma rubrofasciata are used as data points, how different is the methodology for this species compared to the other species?
About the statistical methods:
In subsection “Species distribution modelling”, the authors mention "All algorithms were run with default settings except for MAXENT, GLM and GBM." In response to a reviewer's comment about what those default settings imply, the authors mention that "We have carefully examined the different parameters and changed the information criteria for the stepwise selection procedure in GLM to 'Akaike Information Criteria (AIC)' and the number of terminal nodes in GBM to 6 as it is recommended by Friedman (2002)". I believe the reasoning behind the "default settings" has not been clarified yet.
About the Results section and Discussion section:
In Supplementary file 4 there is no need to show the background colours; simply show the distribution of the data. The background does not really help to see the data.
Could you please explain why in the Global validation it was possible to estimate sensitivity but not specificity for T. rubrofasciata?
In the Results section, agreements and disagreements between the model and the data are mentioned several times for various species. For example, in the Discussion section "the models appear to slightly overestimate the potential distribution as it could be noted in the modelling of T. dimidiata". However, it is actually hard for the reader to see exactly where these potential overestimates occur. It would be great to have a figure (even if it is a set of figures in a Supplementary file) showing the model predictions with the occurrence data on top, so that the reader can judge where the model fits well and where it does not, as you have done for T. rubrofasciata in Figure 2.
In Figure 1 (and also Figure 2) it is mentioned that "Hatched areas indicate regions where the projection is uncertain". There are two problems with this uncertainty:
- The size of the panels makes the figures so small that it is impossible to actually see the hatched areas.
- What does "uncertain" mean? It should be clearly explained in the Materials and methods section how such uncertainty was estimated. Is there a metric for such uncertainty?
These problems with showing uncertainty in both Figure 1 and Figure 2 could be solved by having other similar figures exclusively for uncertainty.
To avoid confusion, I encourage authors to use a more cautious language when referring to climate suitability rather than actual presence of a particular species. For example in subsection “Potential distribution under current climate conditions” they mention "T. brasiliensis prefers dry and wet savannah climate as found in eastern Brazil and southern West Africa, northern and southern Central Africa and East Africa". But, in reality T. brasiliensis hasn't ever been found in Africa.
https://doi.org/10.7554/eLife.52072.sa1
Author response
Essential revisions:
Refer to the full reviews for details but the following are the most critical issues:
- the source data – both reviewers comment that it is not very current. More recent occurrence data should be included if at all possible. Why was the Ceccarelli, 2018 dataset not used? More detail on the data is also needed (see reviewer 2).
- pseudo absence points – comments of reviewer 1 need to be fully addressed, including the lack of absence points in the validation dataset.
- validation set – data from only one species was used (see reviewer 1) – again, the rationale for this needs to be given, and I would prefer to see spatially stratified model selection and cross-validation used (e.g. spatial block bootstrap).
- model choice and settings used – these need to be justified and sensitivity analyses undertaken (see reviewer 1). In general, more detail of the modelling (including the ensemble approach) is needed – see both reviews.
Before answering point by point, we would like to briefly summarise our answers to the main issues:
A) Occurrence data used for modelling
There are generally two potential datasets that can be used as occurrence records, each with different advantages and drawbacks: the Atlas data of Carcavallo et al. (1998), which were digitised at a 0.1° x 0.1° resolution by Fergnani et al. (2013) and used in our first version, and point data from e.g. Ceccarelli et al. (2018) and GBIF (Global Biodiversity Information Facility). We carefully compared both datasets and plotted them in ArcGIS. It turned out that the Ceccarelli as well as the GBIF occurrence records are completely covered by the Atlas data. We therefore do not assume that the Atlas data underestimate the distribution of the vectors.
Furthermore, we decided to use the Atlas data for the following reasons:
- the Atlas data match the time period of the climatic conditions used as predictor variables (WorldClim version 2, 1970-2000).
- point data such as GBIF data reflect the sampling and surveillance effort in a considered area, while the Atlas data are probably less affected by sampling bias.
- it also follows that the selected pseudo-absences are more reliable.
B) Background selection
We agree that the random background selection is not appropriate for all applied algorithms. We thus reran the analysis and now differentiate between presence-background models (MAXENT) and presence-absence models (ANN, GAM, GBM, GLM and MARS). During modelling with MAXENT, we still selected the background randomly as it is required. During modelling with the presence-absence algorithms, we selected the background (to be interpreted as pseudo-absences) using two different stratified sampling methods and compared the results. In the end, we decided to choose a background selection with geographical filtration (‘disk’) and not environmental filtration (‘sre’), because the latter tends to overestimate the distribution of the species (Figure 1).
C) Model validation
Triatoma rubrofasciata is the only triatomine species with recorded occurrences both in Latin America and outside the Americas. Therefore, in our first version, we used this species to independently evaluate the global projection of the models by means of a sensitivity analysis. In the revised version, we also applied a cross-validation by splitting the dataset into a training set (70%) and a test set (30%).
D) More details on Materials and methods
We carefully reworked the Materials and methods section, added more details to this part and expanded on some points in the Discussion section.
Reviewer #1:
Overall, an interesting topic with some relevant approaches applied. Currently some serious lack of detail and rigour in the modelling approach that prevents this from being a valuable addition to the literature and may not be feasible to address in reasonable timescales. This makes the results and their significance difficult to interpret.
Essential revisions:
The occurrence data from these models come from a single source published in 1998 (Carcavallo et al., 1998). Surely there must be more up-to-date data on occurrence of these species? Particularly with the advent of services like GBIF (which the authors cite for one species). The robustness of these maps could be substantially improved if more modern data were included.
We agree with reviewer 1 that data acquisition is quite often a challenge and at the same time the most crucial aspect for modelling. We are aware of the publication of recent occurrence data, in particular of the paper Ceccarelli et al. (2018) and the data from GBIF. We compared the recorded distribution of both datasets in ArcGIS and found that the distribution of the occurrence records from Ceccarelli as well as from GBIF are completely covered by the data of the Atlas (Carcavallo et al., 1998). This suggests that in recent years there has been no significant range expansion despite changes in climate. It could further indicate that the different species have stable distributions in South America.
Furthermore, the environmental variables from WorldClim comprise a time period from 1970 to 2000. Unfortunately, newer comprehensive climatic data are not available. Therefore, the Atlas data fit the explanatory climate variables much better than the newer GBIF and Ceccarelli distribution data. In addition, the comparison of the data, as already described, has not shown any noteworthy distribution shifts in recent years.
We would also like to point out that the Atlas data do not represent single point data, but have been digitised from maps in the Atlas (Carcavallo et al., 1998). This greatly mitigates a potential sampling or surveillance bias, which is evidently present in the Ceccarelli data as well as the GBIF data. This becomes clear, for example, in P. geniculatus, which is widespread in humid tropical forests to tropical dry forests and savannah in Colombia, Venezuela, Trinidad, Guyana, Surinam, French Guiana, Brazil, Ecuador, Peru, Bolivia, Paraguay, Uruguay, Argentina, Mexico, Guatemala, Nicaragua, Costa Rica and Panama (Patterson et al., 2009). The Ceccarelli and GBIF data cover many occurrence points in Colombia, French Guiana and in the state of Espírito Santo in Brazil, where a ten-year entomological surveillance program was applied (Leite et al., 2007). In the Amazon Basin, however, occurrence points are very scarce and all registered points are located along river branches. Furthermore, the GBIF as well as the Ceccarelli data underestimate the distribution that the species have attained, especially for Triatoma rubrofasciata. For example, the vector has been detected in the Caribbean and in Central and North America, which is reproduced in the Atlas data but in neither the GBIF nor the Ceccarelli data sets.
Due to a lower sampling bias, the Atlas data are also more suitable for presence-absence (PA) models such as GLM, GBM, GAM, ANN and MARS since the calculated pseudo-absence points are more reliable.
The "validation set" is comprised of data from a literature review for Triatoma rubrofasciata and appears to cover a more modern time period (citations dated 2006-2009). Why was this only done for one species? Doing a prospective evaluation of the ENM is certainly one approach, but the limitations of this validation approach should be explored, e.g. confirmation bias (are people just doing surveys in areas where the atlas indicates presence?), important changes in the distribution over time, etc, etc.
Triatoma rubrofasciata is the only triatomine species found both in the Americas and in the Old World and has already spread to various countries. Therefore, it is the only species suitable for independent model validation outside the Americas. Due to very limited data, occurrence points of T. rubrofasciata from 1958 to 2018 were collected and used for validation. It is true that this results in certain limitations that need to be addressed. We have adjusted our Discussion section accordingly.
Also, a manual sensitivity analysis of the T. rubrofasciata modelling results was performed. For this purpose, the modelled climatic habitat suitability of T. rubrofasciata was read at the global occurrence points and a cut-off of 50.95% was applied. This cut-off was obtained from the calculation of the ROC. The number of ‘true positives’ among all presence records was then determined. This resulted in a sensitivity of 66.6%.
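The sensitivity calculation described here can be sketched as follows. This is a minimal illustration and not the authors' workflow: the suitability values are made up, the function name is hypothetical, and treating a value exactly at the cut-off as a true positive is an assumption.

```python
def sensitivity_at_cutoff(suitability_at_presences, cutoff):
    """Share of known presence points whose modelled suitability reaches the cut-off.

    Sensitivity = true positives / all presences. Absences are not needed,
    which is also why specificity cannot be derived from presence-only data.
    """
    if not suitability_at_presences:
        raise ValueError("need at least one presence point")
    # Assumption: a value equal to the cut-off counts as a predicted presence.
    true_positives = sum(1 for s in suitability_at_presences if s >= cutoff)
    return true_positives / len(suitability_at_presences)


# Hypothetical suitability values (in percent) read at five occurrence points.
example_values = [72.4, 18.0, 55.1, 43.3, 90.2]
CUTOFF_PERCENT = 50.95  # cut-off reported in the response
example_sensitivity = sensitivity_at_cutoff(example_values, CUTOFF_PERCENT)
```

With these invented values, three of the five points exceed the cut-off, so the function returns 0.6.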
Additionally, we applied a cross-validation of the South American training data set.
Pseudo-absence and lack of absence data in validation set. The choice of random pseudo absence generation when combined with non-systematically sampled occurrence data is problematic for both accuracy metrics and overprediction and has been discussed at length (e.g. Chefaoui and Lobo, 2008) – effectively it means you map surveillance effort not occurrence of the species. I don't think random pseudo absence data is a suitable choice for this approach given the variable surveillance effort.
We thank reviewer 1 for this valuable advice and fully implemented it. A stratified background selection is strongly required when using presence-absence models but a random selection is appropriate when using presence-background models. We account for this in the revised analysis and used a stratified background sampling (with two different strategies: ‘disk’ and ‘sre’, for details see below) for the presence-absence models (GLM, GBM, GAM, ANN and MARS) but retained the random background selection for MAXENT.
We used both the pseudo-absence selection option ‘sre’ and the option ‘disk’ and compared them to one another. For the option ‘sre’, a surface range envelope model is first fitted (using a specified quantile of 2.5%) for the species of interest, and the pseudo-absence data are then extracted outside of this envelope, which broadly comprises the environmental conditions suitable for the species. This avoids selecting absences within the niche of the considered species. For the option ‘disk’, a minimal distance to presence points (80 km, diagonal between two occurrence points) is defined for selecting pseudo-absence candidates. This avoids absence selection close to observed presences, and thus in similar climatic conditions. In the end, we opted for the absence selection with parameter ‘disk’ (Materials and methods section), because models with an absence selection based on a surface range envelope tend to project very large areas as climatically suitable for the considered species, a circumstance that we could also observe in part in our models (Figure 1).
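The two selection strategies can be illustrated with a short sketch. It is not the biomod2 implementation: all function names are hypothetical, coordinates are assumed to be (latitude, longitude) in degrees, distances are great-circle distances on a spherical Earth, and the envelope is assumed to span the 2.5% to 97.5% quantiles of each variable.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def disk_pseudo_absences(presences, candidates, min_dist_km, n, seed=0):
    """'disk'-like selection: keep candidates at least min_dist_km from every presence."""
    eligible = [c for c in candidates
                if all(haversine_km(c, p) >= min_dist_km for p in presences)]
    return random.Random(seed).sample(eligible, min(n, len(eligible)))


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def outside_envelope(presence_env, candidate_env, q=0.025):
    """'sre'-like test: True if any variable falls outside the [q, 1 - q] presence range."""
    for j, value in enumerate(candidate_env):
        column = [row[j] for row in presence_env]
        if value < quantile(column, q) or value > quantile(column, 1 - q):
            return True
    return False


# Hypothetical data: one presence at the origin, three candidate cells.
selected = disk_pseudo_absences([(0.0, 0.0)],
                                [(0.0, 0.1), (0.0, 1.0), (5.0, 5.0)],
                                min_dist_km=80.0, n=10)
```

In this toy case the candidate 0.1° away (roughly 11 km) is rejected by the distance filter, while the other two are kept. The envelope test works in environmental rather than geographical space, which is why the two strategies can yield quite different pseudo-absence sets.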
MAXENT is a presence-background modelling tool applying the principle of maximum entropy in such a way that the estimated species distribution deviates from a uniform distribution as little as is necessary to explain the observations. Therefore, ‘random’ background selection is still required (nicely explained in Elith et al., 2011). Finally, we built the consensus maps from all six models. For ensemble models that combine presence-absence and presence-background algorithms, it is probably advisable always to use such a nuanced pseudo-absence selection approach. Unfortunately, assembling all models requires a lot of manual work.
Also including Arctic and Antarctic areas and generally areas that are a long way away from presence points is a good way to artificially boost your AUC – is anyone really hypothesising that these species can spread to these regions?
We agree with the reviewer that an AUC validation applied to a broad range of environmental conditions can artificially improve the AUC outcome. However, the AUC values we provide are based only on the South American data (the training set) and do not include these regions. Only the projections of the potential distributions were carried out at a global scale. Nevertheless, we have now removed the Antarctic region for the depiction of the projection results (Figure 1) and tried to describe the process of modelling more clearly (Materials and methods section).
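The reviewer's point about inflated AUC values can be made concrete with a small sketch. The scores are invented, and the function is a plain rank-based (Mann-Whitney) AUC rather than the routine used in the study.

```python
def auc(presence_scores, absence_scores):
    """Probability that a random presence scores higher than a random absence.

    Ties count as half a win (rank-based / Mann-Whitney formulation).
    """
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical suitability scores.
presences = [0.9, 0.7, 0.6, 0.4]
nearby_absences = [0.8, 0.5, 0.3, 0.2]   # climatically similar sites
remote_absences = [0.01] * 20            # e.g. polar cells, trivially unsuitable

auc_nearby = auc(presences, nearby_absences)
auc_with_remote = auc(presences, nearby_absences + remote_absences)
```

Here the AUC rises from 0.75 to about 0.96 although the model's ability to separate presences from the climatically similar absences is unchanged. This is why restricting the evaluation to the training region matters.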
The lack of absence data in the "validation set" is also problematic and leads the models to prioritise sensitivity over specificity. Arguably this should be the other way around as the primary use for the maps is to target surveillance to areas where importation may be a problem. The authors should consider a more nuanced approach to absence data. Could occurrence points for other species be indicative of surveillance effort?
We followed the reviewer’s suggestion and adapted the validation of the models. It was carried out by splitting the original Latin American dataset into a subset used for calibrating the models (70% of the dataset) and a second subset to evaluate them (30%, cross-validation) (Materials and methods section).
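A random 70%/30% split of this kind can be sketched as follows. The function name, the seed and the integer stand-ins for occurrence records are assumptions for illustration. Note that this is a simple random hold-out, not the spatially stratified (block) cross-validation suggested by the editor.

```python
import random


def split_calibration_evaluation(records, calibration_fraction=0.7, seed=42):
    """Randomly partition records into calibration and evaluation subsets."""
    shuffled = list(records)               # copy so the input stays untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_calibration = round(calibration_fraction * len(shuffled))
    return shuffled[:n_calibration], shuffled[n_calibration:]


# Hypothetical stand-in for 100 occurrence and pseudo-absence records.
records = list(range(100))
calibration, evaluation = split_calibration_evaluation(records)
```

Because neighbouring records can end up on both sides of a random split, evaluation scores obtained this way tend to be optimistic when the data are spatially autocorrelated.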
For the manual global validation with T. rubrofasciata, we cannot provide meaningful absence data due to potential dispersal of the species.
The distribution of all South American GBIF data could be used to map the surveillance effort and to incorporate this into the modelling as a correction factor. This is a very interesting method, but not feasible here, since the Atlas data are not point data.
Subsection “Species distribution modelling”, "All algorithms were run with default settings" – these are a complex set of methods with a large number of tuneable hyperparameters. I'm not sure it is a fair comparison to just leave them with default settings, nor is it a good way to optimise fit. I'd like to see a clear rationale for why these classes of methods were chosen, relevant choices for hyperparameters and ideally some experiments to validate these choices.
We have carefully examined the different parameters and changed the information criteria for the stepwise selection procedure in GLM to ‘Akaike Information Criteria (AIC)’ and the number of terminal nodes in GBM to 6 as it is recommended by Friedman (2002) (Materials and methods section). The other settings in biomod2 have already been optimised by the biomod2 development team and are not consistent with the default settings of the individual algorithms. However, if you have specific suggestions, we are happy to implement them.
Reviewer #2:
The manuscript reports a niche modelling predicting the global climatic suitability of eleven triatomine species (competent Trypanosoma cruzi vector).
The topic and the results of this research are very relevant, novel and with important public health implications for a global vector surveillance effort, especially regarding Triatoma rubrofasciata. The paper also provides a comprehensive discussion. However, the manuscript lacks some methodological details to help the reader understand how the analysis was conducted and which are the limitations and implications of both the methodological approach and therefore the results.
Occurrence records:
The section describing the data should be extended. I acknowledge the authors have used a dataset of 4155 unique points, most of them already collated by other authors (Fergnani et al., 2013) who in turn extracted the data from another publication (Carcavallo et al., 1998). But some basic information is needed in order to assess what the dataset encompasses.
Was there any quality control used for data extraction?
We agree with the reviewer that a detailed quality control would be appropriate if the distribution data were point data from different sources, such as GBIF. Accordingly, it would be reasonable to sort out points that are close to one another, uncertain data, very old or new data that do not fit the explanatory climate variables. However, this is not useful for the Atlas data, as these are digitised distribution maps. For further details regarding the distribution data see point A ‘Occurrence data used for modelling’.
Are there any concerns about biases in data collection?
A sampling bias should not be very pronounced due to the digitised distribution maps and the lack of point occurrences from different sources.
Did the data undergo any time standardisation?
As part of the revision, we dealt a lot with the comparison of Atlas data and point data from GBIF and the literature and have considered this point in more depth. The Atlas data from 1998 fit well with the period in which the climate data (1970-2000) were recorded.
What are the potential concerns of not having occurrence data beyond 1999 for the American triatomine species?
Since we train the models with climate data from 1970 to 2000, it is advisable not to use any occurrence data after 2000. It is possible and very likely that the climate has changed since then and that the modelled habitat suitability would look slightly different under present conditions. Particularly in the course of climate change, the models might underestimate the current distribution of thermophilic species. We included this point in the discussion (Discussion section). All in all, it would be good to have more recent and up-to-date climatic data, but these have not (yet) been published.
It would be important to have a figure showing the distribution of the data points per species, even if it is just on the Supplementary materials.
Done as suggested. We included distribution maps in the supplementary materials.
Very important, some of the authors from the main source of information (Fergnani et al., 2013), published a data paper (Ceccarelli et al., 2018) with 21815 georeferenced triatomine records updated until 2017. What are the implications of using (or not using) a more updated dataset like this one?
The issue was also mentioned by reviewer 1 and we combined our reply. For detailed explanation, please refer to point A “Occurrence data used for modelling”.
Results section:
I acknowledge this is a prediction effort at a global scale, but I found it hard to understand how the model predicts about 70% climatic suitability for some species across very large areas that include the highlands (above 2500 MAMSL) in South America (i.e. R. prolixus or P. geniculatus in Bogota), even considering that the model does not include a climate change scenario. Also, when comparing with previous publications on climatic suitability in the Americas, I found concerning differences for some species such as R. prolixus in Colombia (Parra-Henao et al., 2016) or P. megistus in Brazil (Gurgel-Gonçalves et al., 2012).
From the maps it seems this work tends to predict a much larger distribution of some of these species in the Americas than previous works did. Not having high resolution maps and the absence of country boundaries makes it harder to tell about potential problems in the predictions at a smaller scale.
This could be due to the coarse resolution of the occurrence data used (rasterised polygon data). With the changed pseudo-absence selection, the modelled potential distribution has decreased. The algorithms now discriminate more strongly, especially at higher altitudes. This can be seen, for example, in the different models for Rhodnius prolixus and Triatoma dimidiata (Figure 1). Furthermore, this work strives to indicate areas at risk on a global scale where triatomine species find climatically suitable environments. Therefore, we tried not to apply the algorithms too restrictively.
High resolution maps (500dpi) are now available in the supplementary materials and administrative borders are included in the maps.
It is interesting to see that although for some species the number of records is very scarce (i.e. Rhodnius ecuadoriensis n = 31) the AUC values are still very high. There is no mention about the limitations regarding the data on the Discussion section.
This is not surprising; on the contrary, the AUC value is high for this species precisely because of the few occurrence points. The AUC value depends on the prevalence and, therefore, cannot be compared between species, but can only be used to compare the discriminatory quality of different models for a single species.
On Figure 1A 'consensus model' is mentioned. But the basic details about this model have not been mentioned on the Materials and methods section.
‘Consensus model’ refers to the ‘consensus maps’ of the ensemble forecasting described in the Materials and methods section. We harmonised our wording so that this becomes clearer.
Discussion section:
The authors mention "we were able to divide the considered species roughly into three groups dependent on their climatic habitat preferences". I could not clearly find what those three groups are or which methods were used to identify them.
This was a rough classification based on modelled habitat preferences as it is described in the Results section. We acknowledge that this apportionment into groups might be misleading and therefore opted for its removal (Results section; Discussion section).
To put this work into context, it would be important to include a discussion point about the highly effective vector control and other factors (i.e. housing conditions) that would potentially determine environmental suitability, beyond the climatic suitability.
A good point, which we failed to mention in the first version. We included information about vector control measures (Discussion section).
[Editors’ note: further revisions were suggested prior to acceptance, as described below.]
Essential revisions:
1) Please add additional detail about the data used, how they were extracted, curated and filtered prior to analysis to the Materials and methods section of the manuscript.
We have completely revised the subsection “Data collection” and tried to explain the processing of the occurrence data more precisely. We also go into more detail about the compilation of the Triatoma rubrofasciata dataset.
2) Running the models with default parameter values only – both reviewers felt this point was insufficiently addressed. Please conduct further dataset-specific analyses to support your choice of model parameters.
A tuning step of the individual algorithm parameters was conducted using the BIOMOD_tuning function of the R package biomod2. The optimised parameters were then used to perform a new modelling analysis.
3) Please review the manuscript figures to make it clearer how well the model prediction matches the data and be more explicit how uncertainty was calculated and represented and include this in the main text when discussing findings.
We have added several supplementary figures, which compare the modelling results with the occurrence of the species as well as the different datasets. An additional figure of the modelling uncertainty has also been attached and addressed in the main text.
Reviewer #1:
I'd like to thank the authors for their detailed responses and additions to this work in regards to the majority of my points raised. I think all but one of these have now been adequately addressed. On point 6 [running models with default parameters only] – I don't think this particular comment has been addressed. Suggesting that such parameters have been "optimised by the biomod2 development team" is not realistic given the breadth of problems that these algorithms are applied to. To take one example, in the documentation for GAMs in the "mgcv" package (that biomod2 calls) there is extensive advice on basis dimension choice for smooths and the explicit statement "The choice of the basis dimension (k in the s, te, ti and t2 terms) is something that should be considered carefully" and a range of model diagnostic statistics and plots are suggested to tune such parameters. This is one example of many and, as a reader, I do not have great confidence in the work if some of these model flexibility parameters are not at least explored. What makes the issue worse is that a reader currently has no way of diagnosing what impact this oversight might have as there are no model coefficients or effects plots presented in the manuscript. I appreciate that this is a common oversight in many ML modelling applications, but even a basic sensitivity analysis would be a big improvement over using the default values.
Based on the valuable suggestions of both reviewers, we decided to rerun the modelling and completely revise the selection of the algorithm parameters. For this purpose, we used the internal function BIOMOD_tuning from the biomod2 package. This function was designed to tune the parameters of the biomod single models based on the optimisation of the AUC values and returns a ModelingOptions object which can be used for modelling. The following optimised parameters were then used to perform a new modelling approach: (1) a stepwise feature selection with quadratic terms based on the Akaike Information Criterion (AIC) was used to generate the generalised linear models (GLM); (2) generalised boosted models (GBM) were run with a maximum of 5 000 trees to ensure fitting, a minimum number of observations in trees’ terminal nodes of 10 as we have large training datasets, a learning rate of 0.01 and an interaction depth (maximum number of nodes per tree) of 7; (3) for generalised additive models (GAM) a binomial distribution and logit link function were applied, and the initial degree of smoothing was set to 4; (4) the minimum interaction degree of the multivariate adaptive regression splines (MARS) was set to 2 with the number of terms to retain in the final model set to 17; (5) artificial neural networks (ANN) were produced with fivefold cross-validation resulting in 8 units in the hidden layer and a weight decay of 0.001; (6) for the maximum entropy approach (MAXENT) we used linear, quadratic and product features and deactivated threshold and hinge features, and the number of iterations was increased to 10 000 (Materials and methods section).
The use of the adjusted parameters did not result in major changes in the modelling output (see Figure 1). It can therefore be assumed that an ensemble forecasting approach with several algorithms produces results that are rather robust against changes in the single algorithm parameters. However, we opted to include the new modelling results in the manuscript since the AUC values were improved by the parameter optimisation indicating a better predictive model performance.
Reviewer #2:
I acknowledge the authors have made substantial improvements to the original version of the manuscript following the reviewers' recommendations. The modifications imply a remarkable change in the original predicted distributions. However, some considerations in terms of the methodology and the presentation of the results remain:
About the data:
My main worry is that the Materials and methods section remains limited in the details and particularly in terms of the data that has been used, which makes it very difficult to understand all the work that has been done. I suggest the authors consider adding a sub-section to the Materials and methods section dedicated exclusively to explaining where the data come from.
For example, the authors mention as data source "the 'Atlas of Chagas disease vectors in the Americas' (Carcavallo et al., 1998) which were digitised at a 0.1° x 0.1° resolution by Fergnani et al. (2013)". What exactly does "digitised" mean? Is Fergnani already a modelling work on the Atlas data? What is the difference between the Carcavallo and Fergnani data? This becomes even more important as Carcavallo is a book with restricted access, so that it is difficult to trace the original source.
This is further confusing later when the authors cite Supplementary file 4 as the occurrence data citing Carcavallo and not Fergnani.
In subsection “Occurrence data” they mention that "In total, 4155 unique occurrence points were collected ranging from 31 for Rhodnius ecuadoriensis to 1180 for Panstrongylus geniculatus (Table 1)." By whom were these points collected, by the authors? This is somewhat contradictory to the use of already collected data from Carcavallo/Fergnani.
We have completely revised the subsection “Occurrence data” and explain the collection and processing of the data in more detail. In particular, we discuss the conversion of the distribution maps from the Atlas into point data by Fergnani et al. (2013) (Materials and methods section):
“Occurrence data of the triatomine species were obtained from data provided by Fergnani et al. (2013). […] With the help of the map projection, occurrence points with coordinates were created (Fergnani et al., 2013).”
Furthermore, since both Carcavallo et al. (1998) and Fergnani et al. (2013) are equally relevant for providing the occurrence data, we have decided to always cite both sources.
Further on the same topic, the authors mention in their reply to the reviewers: "We carefully compared both datasets and plotted them in ArcGIS. It turned out that the Ceccarelli as well as the GBIF occurrence records are completely covered by the Atlas data". This should be explicitly mentioned in the Materials and methods section and the comparison map added as supplementary information.
Done as suggested. We explicitly mentioned the comparison of the Atlas data and the Ceccarelli and GBIF data in the Materials and methods section, explained the choice of the Atlas dataset (Materials and methods section) and added a set of figures (Supplementary file 6) showing occurrence data from the ‘Atlas of Chagas disease vectors in the Americas’ (Carcavallo et al., 1998), occurrence data from Ceccarelli et al. (2018) and occurrence data from GBIF.org (2019).
Also, the authors mention (subsection “Occurrence data”) that "Additional global occurrences of Triatoma rubrofasciata from an intensive literature search were used". However, in the Materials and methods section there is no mention of the details of the review process followed to obtain such data (which databases, which quality control, which languages, which temporal filter they have used, etc). If the data for Triatoma rubrofasciata are used as data points, how different is the methodology for this species compared to the other species?
The occurrence data for the modelling approach of all species were solely obtained from the ‘Atlas of Chagas disease’ (Carcavallo et al., 1998) and more specifically the publication by Fergnani et al. (2013) which covers only the Americas. Individual records from other sources were not included.
The independent global model validation was only feasible with T. rubrofasciata, as it is the only triatomine species with known occurrence data both in the Americas and outside the Americas. Therefore, additional sources had to be used, as the Atlas provides no records outside of the Americas.
However, we now define both datasets in more detail and describe the collection of the occurrence records of T. rubrofasciata (Materials and methods section).
About the statistical methods:
In subsection “Species distribution modelling”, the authors mention "All algorithms were run with default settings except for MAXENT, GLM and GBM.". In response to a reviewer's comment about what those default settings imply, the authors mention that "We have carefully examined the different parameters and changed the information criteria for the stepwise selection procedure in GLM to 'Akaike Information Criteria (AIC)' and the number of terminal nodes in GBM to 6 as it is recommended by Friedman (2002)". I believe the reasoning behind the "default settings" has not been clarified yet.
We tuned the parameters of the algorithms and reran the models, with no major changes in the modelled climatic suitability results. However, the AUC values were slightly improved and we decided to include the new modelling results in the manuscript. For a more detailed description, please refer to point 1.
About the Results section and Discussion section:
In Supplementary file 4 there is no need to show the background colours, but simply the distribution of the data. The background does not really help to see the data.
Done as suggested. The background colours of Supplementary file 5 (former Supplementary file 4) have been removed.
Could you please explain why in the Global validation it was possible to estimate sensitivity but not specificity for T. rubrofasciata?
The sensitivity measures the percentage of true positives, which in this case means the congruence of projected and actual occurrence. The specificity measures the percentage of correctly identified (true) negatives. However, since it is not possible to determine actual ‘climatic’ absences for a species that is currently spreading to non-endemic areas, the specificity cannot be calculated here.
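The distinction can be made concrete with a minimal Python sketch, assuming the projection has been reduced to a set of grid cells classified as suitable. The function names, the cell coordinates and the binary classification are hypothetical and only illustrate why sensitivity is computable from presence records alone, whereas specificity needs confirmed absences.

```python
def sensitivity(predicted_suitable, presence_cells):
    """Share of known presence cells that the model projects as suitable."""
    if not presence_cells:
        raise ValueError("need at least one presence record")
    hits = sum(1 for cell in presence_cells if cell in predicted_suitable)
    return hits / len(presence_cells)


def specificity(predicted_suitable, absence_cells):
    """Share of confirmed absence cells projected as unsuitable.

    Returns None when no confirmed absences are available.
    """
    if not absence_cells:
        return None
    true_negatives = sum(
        1 for cell in absence_cells if cell not in predicted_suitable
    )
    return true_negatives / len(absence_cells)


# Hypothetical grid cells (row, column); not data from the study.
predicted = {(0, 0), (0, 1), (1, 1), (2, 2)}
presences = [(0, 0), (1, 1), (2, 0), (2, 2)]

sens = sensitivity(predicted, presences)  # 3 of 4 presences fall in suitable cells
spec = specificity(predicted, [])         # no true absences are known
print(sens, spec)
```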
In the Results section, several agreements and disagreements between the model and the data are mentioned for various species. For example, in the Discussion section "the models appear to slightly overestimate the potential distribution as it could be noted in the modelling of T. dimidiata". However, it is actually hard for the reader to note exactly where these potential overestimates are occurring. It would be great if you could add a figure (even if it is a set of figures in a Supplementary file) showing the model predictions with the occurrence data on top, so that the reader can judge and understand where the model fits well and where it does not, as you have done for T. rubrofasciata in Figure 2.
A set of figures was added (Supplementary file 4) depicting the modelling results as well as the occurrence of each species in South America obtained from the ‘Atlas of Chagas disease vectors in the Americas’ (Carcavallo et al., 1998).
In Figure 1 (and also Figure 2) it is mentioned that "Hatched areas indicate regions where the projection is uncertain". There are two problems with this uncertainty:
- The size of the panels makes the figures so small that it is impossible to actually see the hatched areas.
- What does "uncertain" mean? It should be clearly explained in the Materials and methods section how such uncertainty was estimated. Is there a metric for such uncertainty?
These problems with showing uncertainty in both Figure 1 and Figure 2 could be solved by having other similar figures exclusively for uncertainty.
A short description of the clamping mask (indicated as hatched areas in the figures) was given in the Materials and methods section. However, we have expanded on the explanation and tried to use a clearer vocabulary (Materials and methods section). Additionally, a figure has been added (Supplementary file 7) depicting the areas where the environmental variables are outside their training range.
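The idea behind such a mask can be sketched in a few lines of Python. This is a simplified illustration rather than the procedure used in biomod2: a projection cell is flagged as soon as at least one environmental variable lies outside the range observed in the training area. The function names and the two example variables (a temperature and a precipitation value per cell) are assumptions.

```python
def training_ranges(training_rows):
    """Minimum and maximum of each environmental variable over the training cells."""
    columns = list(zip(*training_rows))
    return [(min(column), max(column)) for column in columns]


def clamping_mask(projection_rows, ranges):
    """True for each projection cell where any variable leaves its training range."""
    return [
        any(value < low or value > high for value, (low, high) in zip(row, ranges))
        for row in projection_rows
    ]


# Hypothetical cells: (mean temperature in °C, annual precipitation in mm).
training = [(24.0, 1200.0), (26.5, 1800.0), (22.0, 900.0)]
projection = [(25.0, 1000.0), (30.0, 1000.0), (23.0, 400.0)]

ranges = training_ranges(training)
mask = clamping_mask(projection, ranges)
print(ranges, mask)
```

Cells flagged `True` correspond to the hatched areas, where the model extrapolates beyond the conditions it was trained on and the projection should be read with caution.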
To avoid confusion, I encourage authors to use a more cautious language when referring to climate suitability rather than actual presence of a particular species. For example in subsection “Potential distribution under current climate conditions” they mention "T. brasiliensis prefers dry and wet savannah climate as found in eastern Brazil and southern West Africa, northern and southern Central Africa and East Africa". But, in reality T. brasiliensis hasn't ever been found in Africa.
Done as suggested. We carefully examined the manuscript for inaccuracies and improved them if necessary (e.g. Results section).
https://doi.org/10.7554/eLife.52072.sa2
Article and author information
Funding
No external funding was received for this work.
Acknowledgements
We thank Nicholas J Tobias (Senckenberg Gesellschaft für Naturforschung, Frankfurt am Main) for kindly revising the manuscript.
Senior and Reviewing Editor
- Anna Akhmanova, Utrecht University, Netherlands
Reviewer
- Zulma Cucunubá
Publication history
- Received: September 20, 2019
- Accepted: May 5, 2020
- Accepted Manuscript published: May 6, 2020 (version 1)
- Version of Record published: May 19, 2020 (version 2)
Copyright
© 2020, Eberhard et al.
This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.