The genomes of polyextremophilic cyanidiales contain 1% horizontally transferred genes with diverse adaptive functions

  1. Alessandro W Rossoni
  2. Dana C Price
  3. Mark Seger
  4. Dagmar Lyska
  5. Peter Lammers
  6. Debashish Bhattacharya
  7. Andreas PM Weber (corresponding author)
  1. Heinrich Heine University, Germany
  2. Rutgers University, United States
  3. Arizona State University, United States
34 figures, 6 tables and 1 additional file

Figures

Geographic origin and habitat description of the analyzed Cyanidiales strains.

Available reference genomes are marked with an asterisk (*), whereas ‘na’ indicates missing information.

https://doi.org/10.7554/eLife.45017.002
Species tree of the 13 analyzed extremophilic Cyanidiales genomes using mesophilic red (Porphyra umbilicalis, Porphyridium purpureum) and green algae (Ostreococcus tauri, Chlamydomonas reinhardtii) as outgroups.

IQTREE was used to infer a single maximum-likelihood phylogeny based on orthogroups containing single-copy representative proteins from at least 12 of the 17 taxa (13 Cyanidiales + 4 others). Each orthogroup alignment represented one partition with unlinked models of protein evolution chosen by IQTREE. Consensus tree branch support was determined by 2000 rapid bootstraps. All nodes in this tree had 100% bootstrap support, so support values are not shown. Divergence time estimates are taken from Yang et al. (2016). Similarity is derived from the average one-way best BLAST hit protein identity (minimum protein identity threshold = 30%). The minimal protein identity between two strains was 65.4%, measured between G. sulphuraria SAG21.92, which represent the second most distant sampling locations (12,350 km). Similar lineage boundaries were obtained for the C. merolae samples (66.4% protein identity), which are separated by only 1150 km.
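The similarity measure described in this caption (average one-way best BLAST hit protein identity with a 30% minimum) can be illustrated with a minimal sketch. The input layout (query, subject, percent identity, bitscore), the choice of bitscore to pick the best hit, and the function name are assumptions made for illustration; they are not taken from the authors' pipeline.

```python
def mean_best_hit_identity(hits, min_identity=30.0):
    """Average percent identity over each query's best hit.

    hits: iterable of (query_id, subject_id, percent_identity, bitscore).
    Assumption: the "best" hit is the one with the highest bitscore, and
    the identity threshold is applied to that best hit.
    """
    best = {}  # query_id -> (percent_identity, bitscore) of the best hit so far
    for query, _subject, pid, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (pid, bitscore)
    # Keep only best hits that reach the minimum identity threshold.
    kept = [pid for pid, _ in best.values() if pid >= min_identity]
    return sum(kept) / len(kept) if kept else None


# Hypothetical hits of strain X proteins against strain Y proteins.
example_hits = [
    ("q1", "s1", 80.0, 200.0),
    ("q1", "s2", 90.0, 150.0),  # lower bitscore, so not the best hit for q1
    ("q2", "s3", 25.0, 50.0),   # best hit for q2, but below the 30% threshold
    ("q3", "s4", 60.0, 100.0),
]
similarity = mean_best_hit_identity(example_hits)
```

Because the measure is one-way, swapping query and subject strains may give a slightly different value.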

https://doi.org/10.7554/eLife.45017.004
Differential gene expression of G. sulphuraria 074W (A) and C. merolae 10D (B).

Expression is measured as log fold change (logFC) vs transcription rate (logCPM). Differentially expressed genes are colored red (quasi-likelihood (QL) F-test, Benjamini-Hochberg, p <= 0.01). HGT candidates are shown as large circles. The blue dashes indicate the average logCPM of the dataset. Although HGT candidates are not significantly more or less expressed than native genes, they react significantly more strongly to temperature changes in G. sulphuraria 074W (‘more red than black dots’). This is not the case in high-CO2-treated C. merolae 10D.
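The Benjamini-Hochberg correction named in this caption can be sketched as follows. This is a generic implementation of the standard step-up procedure, not the authors' code, and the p-values in the example are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.001, 0.02, 0.03, 0.5]
adj = benjamini_hochberg(raw)
# Genes flagged at the threshold used in the caption (adjusted p <= 0.01).
flagged = [p <= 0.01 for p in adj]
```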

https://doi.org/10.7554/eLife.45017.005
Comparative analysis of the 96 OGs potentially derived from HGT.

(A) OG count vs. the number of Cyanidiales species contained in an OG (=OG size). Only genes from the sequenced genomes were considered (13 species). A total of 60 OGs are exclusive to the Galdieria lineage (11 species), 23 OGs are exclusive to the Cyanidioschyzon lineage (two species), and 13 OGs are shared by both lineages. A total of 46/96 HGT events seem to be affected by later gene erosion/partial fixation. (B) OG-wise PID between HGT candidates vs. their potential non-eukaryotic donors. Point size represents the number of sequenced species contained in each OG. Because only two genomes of Cyanidioschyzon were sequenced, the maximum point size for this lineage is 2. The whiskers span minimum and maximum shared PID of each OG. The PID within Cyanidiales HGTs vs. PID between Cyanidiales HGTs and their potential non-eukaryotic donors is positively correlated (Kendall's tau coefficient, p=0.000747), showing evolutionary constraints that are gene-function-dependent rather than time-dependent. (C) Density curve of average PID towards potential non-eukaryotic donors. The area under each curve is equal to 1. The average PID of HGT candidates found in both lineages (‘ancient HGT’, left dotted line) is ~5% lower than the average PID of HGT candidates exclusive to Galdieria or Cyanidioschyzon (‘recent HGT’, right dotted lines). This difference is not significant (pairwise Wilcoxon rank-sum test, Benjamini-Hochberg, p>0.05). (D) Presence/absence pattern (green/white) of Cyanidiales species in HGT OGs. Some patterns strictly follow the branching structure of the species tree. They represent either recent HGTs that affect a monophyletic subset of the Galdieria lineage, or are the last eukaryotic remnants of an ancient gene that was eroded through differential loss. In other cases, the presence/absence pattern of Galdieria species is random and conflicts with the Galdieria lineage phylogeny. HGT would assume either multiple independent acquisitions of the same HGT candidate, or a partial fixation of the HGT candidate in the lineage, while still allowing for gene erosion. According to differential loss (DL), these are the last existing paralogs of an ancient gene, whose erosion within the eukaryotic kingdom is nearly complete.
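The lineage breakdown in panel A (OGs exclusive to Galdieria, exclusive to Cyanidioschyzon, or shared) amounts to a simple tally over a presence/absence table like the one in panel D. The sketch below shows one way to compute such a tally; the species labels, OG names, and data structure are hypothetical and are not the authors' data.

```python
def classify_orthogroups(presence, galdieria, cyanidioschyzon):
    """Count OGs by the lineage(s) in which they are present.

    presence: dict mapping OG name -> set of species carrying a member gene.
    galdieria, cyanidioschyzon: sets of species labels for each lineage.
    """
    counts = {"Galdieria only": 0, "Cyanidioschyzon only": 0, "both lineages": 0}
    for species in presence.values():
        in_galdieria = bool(species & galdieria)
        in_cyanidioschyzon = bool(species & cyanidioschyzon)
        if in_galdieria and in_cyanidioschyzon:
            counts["both lineages"] += 1
        elif in_galdieria:
            counts["Galdieria only"] += 1
        elif in_cyanidioschyzon:
            counts["Cyanidioschyzon only"] += 1
    return counts


# Hypothetical toy data: three Galdieria and two Cyanidioschyzon genomes.
galdieria_species = {"G1", "G2", "G3"}
cyanidioschyzon_species = {"C1", "C2"}
toy_presence = {
    "OG_a": {"G1", "G2", "G3", "C1", "C2"},  # present in both lineages
    "OG_b": {"G1", "G3"},                    # patchy within Galdieria
    "OG_c": {"C1", "C2"},
    "OG_d": {"G2"},
}
toy_counts = classify_orthogroups(toy_presence, galdieria_species, cyanidioschyzon_species)
```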

https://doi.org/10.7554/eLife.45017.006
Figure 5 with 96 supplements
The analysis of OGs containing HGT candidates revealed different patterns of HGT acquisition.

Some OGs contain genes that are shared by all Cyanidiales, whereas others are unique to the Galdieria or Cyanidioschyzon lineage. In some cases, HGT appears to have replaced the eukaryotic genes in one lineage, whereas the other lineage maintained the eukaryotic ortholog. Here, some examples of OG phylogenies are shown, which were simplified for ease of presentation. The first letter of the tip labels indicates the kingdom. A = Archaea (yellow), B = Bacteria (blue), E = Eukaryota (green). Branches containing Cyanidiales sequences are highlighted in red. (A) Example of an ancient HGT that occurred before Galdieria and Cyanidioschyzon split into separate lineages. As such, both lineages are monophyletic (e.g., OG0001476). (B) HGT candidates are unique to the Galdieria lineage (e.g., OG0001760). (C) HGT candidates are unique to the Cyanidioschyzon lineage (e.g., OG0005738). (D) Galdieria and Cyanidioschyzon HGT candidates are derived from different HGT events and share monophyly with different non-eukaryotic organisms (e.g., OG0003085). (E) Galdieria HGT candidates cluster with non-eukaryotes, whereas the Cyanidioschyzon lineage clusters with eukaryotes (e.g., OG0001542). (F) Cyanidioschyzon HGT candidates cluster with non-eukaryotes, whereas the Galdieria lineage clusters with eukaryotes (e.g., OG0006136).
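The tip-label convention described in this caption (first letter gives the kingdom, each kingdom has a color) can be expressed as a small lookup. The label format after the first letter is an assumption made for illustration, and the example labels are invented.

```python
# Kingdom code -> (kingdom name, display color), as given in the caption.
KINGDOMS = {
    "A": ("Archaea", "yellow"),
    "B": ("Bacteria", "blue"),
    "E": ("Eukaryota", "green"),
}


def kingdom_of(tip_label):
    """Return (kingdom, color) from the first letter of a tip label."""
    code = tip_label[:1]
    if code not in KINGDOMS:
        raise ValueError(f"unknown kingdom code in tip label: {tip_label!r}")
    return KINGDOMS[code]


# Hypothetical tip labels; only the leading letter follows the caption.
labels = ["B_example_bacterium", "A_example_archaeon", "E_example_alga"]
colors = [kingdom_of(label)[1] for label in labels]
```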

https://doi.org/10.7554/eLife.45017.007
Figure 5—figure supplement 1
Sequence tree of orthogroup OG0001476.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.008
Figure 5—figure supplement 2
Sequence tree of orthogroup OG0001486.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.009
Figure 5—figure supplement 3
Sequence tree of orthogroup OG0001509.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.010
Figure 5—figure supplement 4
Sequence tree of orthogroup OG0001513.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.011
Figure 5—figure supplement 5
Sequence tree of orthogroup OG0001542.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.012
Figure 5—figure supplement 6
Sequence tree of orthogroup OG0001613.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.013
Figure 5—figure supplement 7
Sequence tree of orthogroup OG0001658.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.014
Figure 5—figure supplement 8
Sequence tree of orthogroup OG0001760.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.015
Figure 5—figure supplement 9
Sequence tree of orthogroup OG0001807.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.016
Figure 5—figure supplement 10
Sequence tree of orthogroup OG0001810.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.017
Figure 5—figure supplement 11
Sequence tree of orthogroup OG0001929.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.018
Figure 5—figure supplement 12
Sequence tree of orthogroup OG0001938.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.019
Figure 5—figure supplement 13
Sequence tree of orthogroup OG0001955.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.020
Figure 5—figure supplement 14
Sequence tree of orthogroup OG0001976.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.021
Figure 5—figure supplement 15
Sequence tree of orthogroup OG0001994.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.022
Figure 5—figure supplement 16
Sequence tree of orthogroup OG0002036.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.023
Figure 5—figure supplement 17
Sequence tree of orthogroup OG0002051.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.024
Figure 5—figure supplement 18
Sequence tree of orthogroup OG0002191.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.025
Figure 5—figure supplement 19
Sequence tree of orthogroup OG0002305.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.026
Figure 5—figure supplement 20
Sequence tree of orthogroup OG0002337.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.027
Figure 5—figure supplement 21
Sequence tree of orthogroup OG0002431.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.028
Figure 5—figure supplement 22
Sequence tree of orthogroup OG0002483.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.029
Figure 5—figure supplement 23
Sequence tree of orthogroup OG0002574.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.030
Figure 5—figure supplement 24
Sequence tree of orthogroup OG0002578.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.031
Figure 5—figure supplement 25
Sequence tree of orthogroup OG0002609.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.032
Figure 5—figure supplement 26
Sequence tree of orthogroup OG0002676.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.033
Figure 5—figure supplement 27
Sequence tree of orthogroup OG0002727.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.034
Figure 5—figure supplement 28
Sequence tree of orthogroup OG0002785.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.035
Figure 5—figure supplement 29
Sequence tree of orthogroup OG0002871.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.036
Figure 5—figure supplement 30
Sequence tree of orthogroup OG0002896.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.037
Figure 5—figure supplement 31
Sequence tree of orthogroup OG0002999.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.038
Figure 5—figure supplement 32
Sequence tree of orthogroup OG0003085.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.039
Figure 5—figure supplement 33
Sequence tree of orthogroup OG0003250.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.040
Figure 5—figure supplement 34
Sequence tree of orthogroup OG0003367.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.041
Figure 5—figure supplement 35
Sequence tree of orthogroup OG0003441.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.042
Figure 5—figure supplement 36
Sequence tree of orthogroup OG0003539.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.043
Figure 5—figure supplement 37
Sequence tree of orthogroup OG0003777.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.044
Figure 5—figure supplement 38
Sequence tree of orthogroup OG0003782.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.045
Figure 5—figure supplement 39
Sequence tree of orthogroup OG0003834.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.046
Figure 5—figure supplement 40
Sequence tree of orthogroup OG0003846.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.047
Figure 5—figure supplement 41
Sequence tree of orthogroup OG0003856.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.048
Figure 5—figure supplement 42
Sequence tree of orthogroup OG0003901.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.049
Figure 5—figure supplement 43
Sequence tree of orthogroup OG0003905.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.050
Figure 5—figure supplement 44
Sequence tree of orthogroup OG0003907.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.051
Figure 5—figure supplement 45
Sequence tree of orthogroup OG0003929.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.052
Figure 5—figure supplement 46
Sequence tree of orthogroup OG0003954.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.053
Figure 5—figure supplement 47
Sequence tree of orthogroup OG0004030.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.054
Figure 5—figure supplement 48
Sequence tree of orthogroup OG0004102.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.055
Figure 5—figure supplement 49
Sequence tree of orthogroup OG0004142.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.056
Figure 5—figure supplement 50
Sequence tree of orthogroup OG0004203.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.057
Figure 5—figure supplement 51
Sequence tree of orthogroup OG0004258.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.058
Figure 5—figure supplement 52
Sequence tree of orthogroup OG0004339.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.059
Figure 5—figure supplement 53
Sequence tree of orthogroup OG0004392.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.060
Figure 5—figure supplement 54
Sequence tree of orthogroup OG0004405.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.061
Figure 5—figure supplement 55
Sequence tree of orthogroup OG0004486.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.062
Figure 5—figure supplement 56
Sequence tree of orthogroup OG0004658.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.063
Figure 5—figure supplement 57
Sequence tree of orthogroup OG0005083.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.064
Figure 5—figure supplement 58
Sequence tree of orthogroup OG0005087.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.065
Figure 5—figure supplement 59
Sequence tree of orthogroup OG0005153.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.066
Figure 5—figure supplement 60
Sequence tree of orthogroup OG0005224.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.067
Figure 5—figure supplement 61
Sequence tree of orthogroup OG0005235.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.068
Figure 5—figure supplement 62
Sequence tree of orthogroup OG0005280.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.069
Figure 5—figure supplement 63
Sequence tree of orthogroup OG0005479.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.070
Figure 5—figure supplement 64
Sequence tree of orthogroup OG0005540.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.071
Figure 5—figure supplement 65
Sequence tree of orthogroup OG0005561.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.072
Figure 5—figure supplement 66
Sequence tree of orthogroup OG0005596.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.073
Figure 5—figure supplement 67
Sequence tree of orthogroup OG0005683.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.074
Figure 5—figure supplement 68
Sequence tree of orthogroup OG0005694.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.075
Figure 5—figure supplement 69
Sequence tree of orthogroup OG0005738.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.076
Figure 5—figure supplement 70
Sequence tree of orthogroup OG0005963.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.077
Figure 5—figure supplement 71
Sequence tree of orthogroup OG0005984.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.078
Figure 5—figure supplement 72
Sequence tree of orthogroup OG0006136.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.079
Figure 5—figure supplement 73
Sequence tree of orthogroup OG0006143.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.080
Figure 5—figure supplement 74
Sequence tree of orthogroup OG0006191.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.081
Figure 5—figure supplement 75
Sequence tree of orthogroup OG0006251.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.082
Figure 5—figure supplement 76
Sequence tree of orthogroup OG0006252.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.083
Figure 5—figure supplement 77
Sequence tree of orthogroup OG0006435.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.084
Figure 5—figure supplement 78
Sequence tree of orthogroup OG0006482.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.085
Figure 5—figure supplement 79
Sequence tree of orthogroup OG0006498.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.086
Figure 5—figure supplement 80
Sequence tree of orthogroup OG0006623.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.087
Figure 5—figure supplement 81
Sequence tree of orthogroup OG0006670.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.088
Figure 5—figure supplement 82
Sequence tree of orthogroup OG0007051.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.089
Figure 5—figure supplement 83
Sequence tree of orthogroup OG0007123.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.090
Figure 5—figure supplement 84
Sequence tree of orthogroup OG0007346.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.091
Figure 5—figure supplement 85
Sequence tree of orthogroup OG0007383.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.092
Figure 5—figure supplement 86
Sequence tree of orthogroup OG0007550.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.093
Figure 5—figure supplement 87
Sequence tree of orthogroup OG0007551.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.094
Figure 5—figure supplement 88
Sequence tree of orthogroup OG0007596.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.095
Figure 5—figure supplement 89
Sequence tree of orthogroup OG0008189.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.096
Figure 5—figure supplement 90
Sequence tree of orthogroup OG0008334.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.097
Figure 5—figure supplement 91
Sequence tree of orthogroup OG0008335.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.098
Figure 5—figure supplement 92
Sequence tree of orthogroup OG0008579.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.099
Figure 5—figure supplement 93
Sequence tree of orthogroup OG0008680.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.100
Figure 5—figure supplement 94
Sequence tree of orthogroup OG0008822.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.101
Figure 5—figure supplement 95
Sequence tree of orthogroup OG0008898.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.102
Figure 5—figure supplement 96
Sequence tree of orthogroup OG0008996.

The tree is based on amino acid sequences. Archaea (yellow), Bacteria (blue), Cyanidiales (red), Eukaryotes including other red algae (green). Branches containing Cyanidiales sequences are highlighted in red.

https://doi.org/10.7554/eLife.45017.103
HGT vs. non-HGT orthogroup comparisons.

(A) Maximum PID of Cyanidiales genes in native (blue) and HGT (yellow) orthogroups when compared to non-eukaryotic sequences in each OG. The red lines denote the 70% PID threshold for assembly artifacts according to ‘the 70% rule’. Dots located in the top-right corner depict the 73 OGs that appear to contradict this rule, plus the 5 HGT candidates that score higher than 70%. Of those 73 OGs, 18 are not derived from EGT or contamination within eukaryotic assemblies. (B) Density curve of average PID towards non-eukaryotic species in the same orthogroup (potential non-eukaryotic donors in the case of HGT candidates). The area under each curve is equal to 1. The average PID of HGT candidates (left dotted line) is 6.1% higher than the average PID of native OGs also containing non-eukaryotic species (right dotted line). This difference is significant (Wilcoxon rank-sum test, p<0.01). (C) Distribution of OG sizes (= number of Galdieria species present in each OG) in the native and HGT datasets. A total of 80% of the HGT OGs and 89% of the native OGs are present in either ≥10 species or ≤2 species. Whereas 52.5% of the native gene set is conserved in ≥10 Galdieria strains, only 36.1% of the HGT candidates are conserved to this extent. In contrast, about 50% of the HGT candidates are present in only one Galdieria strain. (D) Pairwise OG-size comparison between HGT OGs and native OGs. A significantly higher PID when compared to non-eukaryotic sequences was measured in the HGT OGs at OG sizes of 1 and 11 (Wilcoxon rank-sum test, BH, p<0.01). No evidence of cumulative effects was detected in the HGT dataset. In the native dataset, however, the fewer Galdieria species an OG contains, the higher its average PID when compared to non-eukaryotic species in the same tree (Jonckheere-Terpstra, p<0.01).

https://doi.org/10.7554/eLife.45017.104
Cyanidiales live in hostile habitats, necessitating a broad range of adaptations to polyextremophily.

The majority of the 96 HGT-impacted OGs were annotated and putative functions identified (in the image, colored fields are from HGT, whereas gray fields are native functions). The largest number of HGT candidates is involved in carbon and amino acid metabolism, especially in the Galdieria lineage. The excretion of lytic enzymes and the high number of importers (protein/AA symporter, glycerol/H2O symporter) within the HGT dataset suggest a preference for import and catabolic function.

https://doi.org/10.7554/eLife.45017.106
Appendix 1—figure 1
Raw read length distribution of the sequenced Cyanidiales strains.

The strains were sequenced in 2016/2017 using PacBio’s RS2 sequencing technology and P6-C4 chemistry (the only exception being C. merolae Soos, which was sequenced as a pilot study using P4-C2 chemistry in 2014). Seven strains, namely G. sulphuraria 5572, G. sulphuraria 002, G. sulphuraria SAG21.92, G. sulphuraria Azora, G. sulphuraria MtSh, G. sulphuraria RT22 and G. sulphuraria MS1, were sequenced at the University of Maryland Institute for Genome Sciences (Baltimore, USA). The remaining three strains, G. sulphuraria YNP5578.1, G. phlegrea Soos and C. merolae Soos, were sequenced at the Max-Planck-Institut für Pflanzenzüchtungsforschung (Cologne, Germany).

https://doi.org/10.7554/eLife.45017.109
Appendix 3—figure 1
%GC – Galdieria sulphuraria 074W: (Left) Violin plot showing the %GC distribution across native transcripts and HGT candidates.

(Mid) Cumulative %GC distribution of transcripts. The red line shows the average; the blue line shows a normal distribution based on the average value. (Right) Ranking of all transcripts by %GC content. Red ‘*’ marks HGT candidates. As the %GC content was normally distributed, Student's t-test was used to test for significant differences between the native gene subset and the HGT candidate subset.

https://doi.org/10.7554/eLife.45017.114
Appendix 3—figure 2
%GC – Galdieria sulphuraria MS1: (Left) Violin plot showing the %GC distribution across native transcripts and HGT candidates.

(Mid) Cumulative %GC distribution of transcripts. The red line shows the average; the blue line shows a normal distribution based on the average value. (Right) Ranking of all transcripts by %GC content. Red ‘*’ marks HGT candidates. As the %GC content was normally distributed, Student's t-test was used to test for significant differences between the native gene subset and the HGT candidate subset.

https://doi.org/10.7554/eLife.45017.115
Appendix 3—figure 3
%GC – Galdieria sulphuraria RT22: (Left) Violin plot showing the %GC distribution across native transcripts and HGT candidates.

(Mid) Cumulative %GC distribution of transcripts. The red line shows the average; the blue line shows a normal distribution based on the average value. (Right) Ranking of all transcripts by %GC content. Red ‘*’ marks HGT candidates. As the %GC content was normally distributed, Student's t-test was used to test for significant differences between the native gene subset and the HGT candidate subset.

https://doi.org/10.7554/eLife.45017.116
Appendix 3—figure 4
%GC – Galdieria sulphuraria SAG21: (Left) Violin plot showing the %GC distribution across native transcripts and HGT candidates.

(Mid) Cumulative %GC distribution of transcripts. The red line shows the average; the blue line shows a normal distribution based on the average value. (Right) Ranking of all transcripts by %GC content. Red ‘*’ marks HGT candidates. As the %GC content was normally distributed, Student's t-test was used to test for significant differences between the native gene subset and the HGT candidate subset.

https://doi.org/10.7554/eLife.45017.117
Appendix 3—figure 5
%GC – Galdieria sulphuraria Mount Shasta (MtSh): (Left) Violin plot showing the %GC distribution across native transcripts and HGT candidates.

(Mid) Cumulative %GC distribution of transcripts. The red line shows the average; the blue line shows a normal distribution based on the average value. (Right) Ranking of all transcripts by %GC content. Red ‘*’ marks HGT candidates. As the %GC content was normally distributed, Student's t-test was used to test for significant differences between the native gene subset and the HGT candidate subset.

https://doi.org/10.7554/eLife.45017.118
Appendix 3—figure 6
%GC – Galdieria sulphuraria Azora: (Left) Violin plot showing the %GC distribution across native transcripts and HGT candidates.

(Mid) Cumulative %GC distribution of transcripts. The red line shows the average; the blue line shows a normal distribution based on the average value. (Right) Ranking of all transcripts by %GC content. Red ‘*’ marks HGT candidates. As the %GC content was normally distributed, Student's t-test was used to test for significant differences between the native gene subset and the HGT candidate subset.

https://doi.org/10.7554/eLife.45017.119
Appendix 3—figure 7
%GC – Galdieria sulphuraria YNP5578.1: (Left) Violin plot showing the %GC distribution across native transcripts and HGT candidates.

(Mid) Cumulative %GC distribution of transcripts. The red line shows the average; the blue line shows a normal distribution based on the average value. (Right) Ranking of all transcripts by %GC content. Red ‘*’ marks HGT candidates. As the %GC content was normally distributed, Student's t-test was used to test for significant differences between the native gene subset and the HGT candidate subset.

https://doi.org/10.7554/eLife.45017.120
Appendix 3—figure 8
%GC – Galdieria sulphuraria 5572: (Left) Violin plot showing the %GC distribution across native transcripts and HGT candidates.

(Mid) Cumulative %GC distribution of transcripts. The red line shows the average; the blue line shows a normal distribution based on the average value. (Right) Ranking of all transcripts by %GC content. Red ‘*’ marks HGT candidates. As the %GC content was normally distributed, Student's t-test was used to test for significant differences between the native gene subset and the HGT candidate subset.

https://doi.org/10.7554/eLife.45017.121
Appendix 3—figure 9
%GC – Galdieria sulphuraria 002: (Left) Violin plot showing the %GC distribution across native transcripts and HGT candidates.

(Mid) Cumulative %GC distribution of transcripts. The red line shows the average; the blue line shows a normal distribution based on the average value. (Right) Ranking of all transcripts by %GC content. Red ‘*’ marks HGT candidates. As the %GC content was normally distributed, Student's t-test was used to test for significant differences between the native gene subset and the HGT candidate subset.

https://doi.org/10.7554/eLife.45017.122
Appendix 3—figure 10
%GC – Galdieria phlegrea Soos: (Left) Violin plot showing the %GC distribution across native transcripts and HGT candidates.

(Mid) Cumulative %GC distribution of transcripts. The red line shows the average; the blue line shows a normal distribution based on the average value. (Right) Ranking of all transcripts by %GC content. Red ‘*’ marks HGT candidates. As the %GC content was normally distributed, Student's t-test was used to test for significant differences between the native gene subset and the HGT candidate subset.

https://doi.org/10.7554/eLife.45017.123
Appendix 3—figure 11
%GC – Galdieria phlegrea DBV009: (Left) Violin plot showing the %GC distribution across native transcripts and HGT candidates.

(Mid) Cumulative %GC distribution of transcripts. The red line shows the average; the blue line shows a normal distribution based on the average value. (Right) Ranking of all transcripts by %GC content. Red ‘*’ marks HGT candidates. As the %GC content was normally distributed, Student's t-test was used to test for significant differences between the native gene subset and the HGT candidate subset.

https://doi.org/10.7554/eLife.45017.124
Appendix 3—figure 12
%GC – Cyanidioschyzon merolae Soos: (Left) Violin plot showing the %GC distribution across native transcripts and HGT candidates.

(Mid) Cumulative %GC distribution of transcripts. The red line shows the average; the blue line shows a normal distribution based on the average value. (Right) Ranking of all transcripts by %GC content. Red ‘*’ marks HGT candidates. As the %GC content was normally distributed, Student's t-test was used to test for significant differences between the native gene subset and the HGT candidate subset.

https://doi.org/10.7554/eLife.45017.125
Appendix 3—figure 13
%GC – Cyanidioschyzon merolae 10D: (Left) Violin plot showing the %GC distribution across native transcripts and HGT candidates.

(Mid) Cumulative %GC distribution of transcripts. The red line shows the average; the blue line shows a normal distribution based on the average value. (Right) Ranking of all transcripts by %GC content. Red ‘*’ marks HGT candidates. As the %GC content was normally distributed, Student's t-test was used to test for significant differences between the native gene subset and the HGT candidate subset.

https://doi.org/10.7554/eLife.45017.126
Appendix 4—figure 1
Exon/Intron – Galdieria sulphuraria 074W: (Left) Cumulative distribution of the number of exons per transcript.

The red line shows the average; the blue line shows a normal distribution based on the average value. The data are categorical (genes have one, two, three, etc. exons) and do not follow a normal distribution. (Mid) Ranking of all transcripts by number of exons. Red ‘*’ marks HGT candidates. Because the number of exons was not normally distributed, transcripts were ranked by number of exons. To resolve the high number of tied ranks (e.g. many transcripts have two exons), a bootstrap was applied in which the ranks of transcripts sharing the same number of exons were randomly assigned 1000 times. A Wilcoxon-Mann-Whitney test was applied to determine significant rank differences between the native gene subset and the HGT candidate subset. (Right) Violin plot showing the distribution of the number of exons per transcript across native transcripts and HGT candidates.

https://doi.org/10.7554/eLife.45017.130
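The random tie-breaking procedure described in the Appendix 4 legends can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `mean_u_with_random_ties`, the fixed seed, and the choice to average the Mann-Whitney U statistic over the 1000 rounds are all hypothetical, since the legends do not say how the rounds were combined.

```python
import random


def mean_u_with_random_ties(native, hgt, n_rounds=1000, seed=0):
    """Average Mann-Whitney U of the HGT group over random tie-breaking rounds.

    `native` and `hgt` are lists of exon counts per transcript.
    """
    rng = random.Random(seed)
    # Label each exon count with its group: 0 = native, 1 = HGT candidate.
    labelled = [(x, 0) for x in native] + [(x, 1) for x in hgt]
    n_hgt = len(hgt)
    total_u = 0.0
    for _ in range(n_rounds):
        # The random secondary sort key gives transcripts with the same
        # exon count a random relative order, i.e. randomly assigned ranks.
        order = sorted(labelled, key=lambda item: (item[0], rng.random()))
        rank_sum_hgt = sum(
            rank
            for rank, (count, is_hgt) in enumerate(order, start=1)
            if is_hgt
        )
        total_u += rank_sum_hgt - n_hgt * (n_hgt + 1) / 2
    return total_u / n_rounds


# Toy exon counts only; these are not data from the study.
u = mean_u_with_random_ties(native=[1, 2, 2, 3, 5], hgt=[1, 1, 2])
```

A U value close to half the product of the two group sizes indicates no rank difference between the groups; a p-value would still have to be derived from the U distribution.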
Appendix 4—figure 2
Exon/Intron – Galdieria sulphuraria MS1: (Left) Cumulative distribution of the number of exons per transcript.

The red line shows the average; the blue line shows a normal distribution based on the average value. The data are categorical (genes have one, two, three, etc. exons) and do not follow a normal distribution. (Mid) Ranking of all transcripts by number of exons. Red ‘*’ marks HGT candidates. Because the number of exons was not normally distributed, transcripts were ranked by number of exons. To resolve the high number of tied ranks (e.g. many transcripts have two exons), a bootstrap was applied in which the ranks of transcripts sharing the same number of exons were randomly assigned 1000 times. A Wilcoxon-Mann-Whitney test was applied to determine significant rank differences between the native gene subset and the HGT candidate subset. (Right) Violin plot showing the distribution of the number of exons per transcript across native transcripts and HGT candidates.

https://doi.org/10.7554/eLife.45017.131
Appendix 4—figure 3
Exon/Intron – Galdieria sulphuraria RT22: (Left) Cumulative distribution of the number of exons per transcript.

The red line shows the average; the blue line shows a normal distribution based on the average value. The data are categorical (genes have one, two, three, etc. exons) and do not follow a normal distribution. (Mid) Ranking of all transcripts by number of exons. Red ‘*’ marks HGT candidates. Because the number of exons was not normally distributed, transcripts were ranked by number of exons. To resolve the high number of tied ranks (e.g. many transcripts have two exons), a bootstrap was applied in which the ranks of transcripts sharing the same number of exons were randomly assigned 1000 times. A Wilcoxon-Mann-Whitney test was applied to determine significant rank differences between the native gene subset and the HGT candidate subset. (Right) Violin plot showing the distribution of the number of exons per transcript across native transcripts and HGT candidates.

https://doi.org/10.7554/eLife.45017.132
Appendix 4—figure 4
Exon/Intron – Galdieria sulphuraria SAG21: (Left) Cumulative distribution of the number of exons per transcript.

The red line shows the average; the blue line shows a normal distribution based on the average value. The data are categorical (genes have one, two, three, etc. exons) and do not follow a normal distribution. (Mid) Ranking of all transcripts by number of exons. Red ‘*’ marks HGT candidates. Because the number of exons was not normally distributed, transcripts were ranked by number of exons. To resolve the high number of tied ranks (e.g. many transcripts have two exons), a bootstrap was applied in which the ranks of transcripts sharing the same number of exons were randomly assigned 1000 times. A Wilcoxon-Mann-Whitney test was applied to determine significant rank differences between the native gene subset and the HGT candidate subset. (Right) Violin plot showing the distribution of the number of exons per transcript across native transcripts and HGT candidates.

https://doi.org/10.7554/eLife.45017.133
Appendix 4—figure 5
Exon/Intron – Galdieria sulphuraria MtSh: (Left) Cumulative distribution of the number of exons per transcript.

The red line shows the average; the blue line shows a normal distribution based on the average value. The data are categorical (genes have one, two, three, etc. exons) and do not follow a normal distribution. (Mid) Ranking of all transcripts by number of exons. Red ‘*’ marks HGT candidates. Because the number of exons was not normally distributed, transcripts were ranked by number of exons. To resolve the high number of tied ranks (e.g. many transcripts have two exons), a bootstrap was applied in which the ranks of transcripts sharing the same number of exons were randomly assigned 1000 times. A Wilcoxon-Mann-Whitney test was applied to determine significant rank differences between the native gene subset and the HGT candidate subset. (Right) Violin plot showing the distribution of the number of exons per transcript across native transcripts and HGT candidates.

https://doi.org/10.7554/eLife.45017.134
Appendix 4—figure 6
Exon/Intron – Galdieria sulphuraria Azora: (Left) Cumulative distribution of the number of exons per transcript.

The red line shows the average; the blue line shows a normal distribution based on the average value. The data are categorical (genes have one, two, three, etc. exons) and do not follow a normal distribution. (Mid) Ranking of all transcripts by number of exons. Red ‘*’ marks HGT candidates. Because the number of exons was not normally distributed, transcripts were ranked by number of exons. To resolve the high number of tied ranks (e.g. many transcripts have two exons), a bootstrap was applied in which the ranks of transcripts sharing the same number of exons were randomly assigned 1000 times. A Wilcoxon-Mann-Whitney test was applied to determine significant rank differences between the native gene subset and the HGT candidate subset. (Right) Violin plot showing the distribution of the number of exons per transcript across native transcripts and HGT candidates.

https://doi.org/10.7554/eLife.45017.135
Appendix 4—figure 7
Exon/Intron – Galdieria sulphuraria YNP5578.1: (Left) Cumulative distribution of the number of exons per transcript.

The red line shows the average; the blue line shows a normal distribution based on the average value. The data are categorical (genes have one, two, three, etc. exons) and do not follow a normal distribution. (Mid) Ranking of all transcripts by number of exons. Red ‘*’ marks HGT candidates. Because the number of exons was not normally distributed, transcripts were ranked by number of exons. To resolve the high number of tied ranks (e.g. many transcripts have two exons), a bootstrap was applied in which the ranks of transcripts sharing the same number of exons were randomly assigned 1000 times. A Wilcoxon-Mann-Whitney test was applied to determine significant rank differences between the native gene subset and the HGT candidate subset. (Right) Violin plot showing the distribution of the number of exons per transcript across native transcripts and HGT candidates.

https://doi.org/10.7554/eLife.45017.136
Appendix 4—figure 8
Exon/Intron – Galdieria sulphuraria 5572: (Left) Cumulative distribution of the number of exons per transcript.

The red line shows the average; the blue line shows a normal distribution based on the average value. The data are categorical (genes have one, two, three, etc. exons) and do not follow a normal distribution. (Mid) Ranking of all transcripts by number of exons. Red ‘*’ marks HGT candidates. Because the number of exons was not normally distributed, transcripts were ranked by number of exons. To resolve the high number of tied ranks (e.g. many transcripts have two exons), a bootstrap was applied in which the ranks of transcripts sharing the same number of exons were randomly assigned 1000 times. A Wilcoxon-Mann-Whitney test was applied to determine significant rank differences between the native gene subset and the HGT candidate subset. (Right) Violin plot showing the distribution of the number of exons per transcript across native transcripts and HGT candidates.

https://doi.org/10.7554/eLife.45017.137
Appendix 4—figure 9
Exon/Intron – Galdieria sulphuraria 002: (Left) Cumulative distribution of the number of exons per transcript.

The red line shows the average; the blue line shows a normal distribution based on the average value. The data are categorical (genes have one, two, three, etc. exons) and do not follow a normal distribution. (Mid) Ranking of all transcripts by number of exons. Red ‘*’ marks HGT candidates. Because the number of exons was not normally distributed, transcripts were ranked by number of exons. To resolve the high number of tied ranks (e.g. many transcripts have two exons), a bootstrap was applied in which the ranks of transcripts sharing the same number of exons were randomly assigned 1000 times. A Wilcoxon-Mann-Whitney test was applied to determine significant rank differences between the native gene subset and the HGT candidate subset. (Right) Violin plot showing the distribution of the number of exons per transcript across native transcripts and HGT candidates.

https://doi.org/10.7554/eLife.45017.138
Appendix 4—figure 10
Exon/Intron – Galdieria phlegrea Soos: (Left) Cumulative distribution of the number of exons per transcript.

The red line shows the average; the blue line shows a normal distribution based on the average value. The data are categorical (genes have one, two, three, etc. exons) and do not follow a normal distribution. (Mid) Ranking of all transcripts by number of exons. Red ‘*’ marks HGT candidates. Because the number of exons was not normally distributed, transcripts were ranked by number of exons. To resolve the high number of tied ranks (e.g. many transcripts have two exons), a bootstrap was applied in which the ranks of transcripts sharing the same number of exons were randomly assigned 1000 times. A Wilcoxon-Mann-Whitney test was applied to determine significant rank differences between the native gene subset and the HGT candidate subset. (Right) Violin plot showing the distribution of the number of exons per transcript across native transcripts and HGT candidates.

https://doi.org/10.7554/eLife.45017.139
Appendix 4—figure 11
Exon/Intron – Cyanidioschyzon merolae Soos: (Left) Cumulative distribution of the number of exons per transcript.

The red line shows the average; the blue line shows a normal distribution based on the average value. The data are categorical (genes have one, two, three, etc. exons) and do not follow a normal distribution. (Mid) Ranking of all transcripts by number of exons. Red ‘*’ marks HGT candidates. Because the number of exons was not normally distributed, transcripts were ranked by number of exons. To resolve the high number of tied ranks (e.g. many transcripts have two exons), a bootstrap was applied in which the ranks of transcripts sharing the same number of exons were randomly assigned 1000 times. A Wilcoxon-Mann-Whitney test was applied to determine significant rank differences between the native gene subset and the HGT candidate subset. (Right) Violin plot showing the distribution of the number of exons per transcript across native transcripts and HGT candidates.

https://doi.org/10.7554/eLife.45017.140
Appendix 4—figure 12
Exon/Intron – Cyanidioschyzon merolae 10D: (Left) Cumulative distribution of the number of exons per transcript.

The red line shows the average; the blue line shows a normal distribution based on the average value. The data are categorical (genes have one, two, three, etc. exons) and do not follow a normal distribution. (Mid) Ranking of all transcripts by number of exons. Red ‘*’ marks HGT candidates. Because the number of exons was not normally distributed, transcripts were ranked by number of exons. To resolve the high number of tied ranks (e.g. many transcripts have two exons), a bootstrap was applied in which the ranks of transcripts sharing the same number of exons were randomly assigned 1000 times. A Wilcoxon-Mann-Whitney test was applied to determine significant rank differences between the native gene subset and the HGT candidate subset. (Right) Violin plot showing the distribution of the number of exons per transcript across native transcripts and HGT candidates.

https://doi.org/10.7554/eLife.45017.141
Appendix 5—figure 1
Best Blast Hit between each of the 13 Cyanidiales species and their most similar non-eukaryotic Ortholog in each OG-phylogeny.

Values are given as average percent protein identity between Cyanidiales and non-eukaryotic ortholog. White boxes represent missing Cyanidiales orthologs.

https://doi.org/10.7554/eLife.45017.143

Tables

Table 1
Summary of the 13 analyzed Cyanidiales genomes.

The existing genomes of Galdieria sulphuraria 074W, Cyanidioschyzon merolae 10D, and Galdieria phlegrea are marked with ‘#’. The remaining 10 genomes are novel. Genome Size (Mb): size of the genome assembly in megabases. Contigs: number of contigs produced by the genome assembly. The contigs were polished with Quiver. Contig N50 (kb): contig N50. %GC Content: GC content of the genome given in percent. Genes: transcriptome size of species. Orthogroups: all Cyanidiales genes were clustered into a total of 9075 OGs; shown here is the number of OGs per species. HGT Orthogroups: number of OGs derived from HGT. HGT Genes: number of HGT gene candidates found in species. %GC Native: GC content of the native transcriptome given in percent. %GC HGT: GC content of the HGT gene candidates given in percent. % Multiexon Native: percent of multiexonic genes in the native transcriptome. % Multiexon HGT: percent of multiexonic genes in the HGT gene candidates. S/M Native: ratio of multiexonic vs single-exonic genes in the native transcriptome. S/M HGT: ratio of multiexonic vs single-exonic genes in the HGT candidates. Asterisks (*) denote a significant difference (p<=0.05) between native and HGT gene subsets. EC, PFAM, GO, KEGG: number of species-specific annotations in EC, PFAM, GO, KEGG.

https://doi.org/10.7554/eLife.45017.003
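Column groups: Genome features (Genome Size (Mb), Contigs, Contig N50 (kb), %GC Content); Gene and OG counts (Genes, Orthogroups); HGTs (HGT orthogroups, HGT genes); HGT vs native gene subsets (%GC Native, %GC HGT, (%) Multiexon Native, (%) Multiexon HGT, Exon/Gene Native, Exon/Gene HGT); Annotations (EC, PFAM, KEGG, GO).

| Strain | Genome Size (Mb) | Contigs | Contig N50 (kb) | %GC Content | Genes | Orthogroups | HGT orthogroups | HGT genes | %GC Native | %GC HGT | (%) Multiexon Native | (%) Multiexon HGT | Exon/Gene Native | Exon/Gene HGT | EC | PFAM | KEGG | GO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| G. sulphuraria 074W# | 13.78 | 433 | 172.3 | 36.89 | 7174 | 5265 | 51 | 55 | 38.99 | 39.62* | 73.6 | 47.3* | 2.25 | 3.2* | 938 | 3073 | 3241 | 6572 |
| G. sulphuraria MS1 | 14.89 | 129 | 172.1 | 37.62 | 7441 | 5389 | 54 | 58 | 39.59 | 40.79* | 83.4 | 62.1* | 2.5 | 3.88* | 930 | 3077 | 3178 | 6564 |
| G. sulphuraria RT22 | 15.62 | 118 | 172.9 | 37.43 | 6982 | 5186 | 51 | 54 | 39.54 | 40.85* | 74.7 | 51.9* | 2.63 | 3.95* | 941 | 3118 | 3223 | 6504 |
| G. sulphuraria SAG21 | 14.31 | 135 | 158.2 | 37.92 | 5956 | 4732 | 44 | 47 | 40.04 | 41.47* | 84.8 | 83.0 | 4.02 | 5.03* | 931 | 3047 | 3143 | 6422 |
| G. sulphuraria MtSh | 14.95 | 101 | 186.6 | 40.04 | 6160 | 4746 | 46 | 47 | 41.33 | 42.48* | 79.7 | 63.8* | 3.15 | 4.32* | 939 | 3114 | 3244 | 6450 |
| G. sulphuraria Azora | 14.06 | 127 | 162.3 | 40.10 | 6305 | 4905 | 49 | 58 | 41.34 | 42.57* | 84.5 | 75.9* | 2.68 | 4.03* | 934 | 3072 | 3181 | 6474 |
| G. sulphuraria YNP5587.1 | 14.42 | 115 | 170.8 | 40.05 | 6118 | 4846 | 46 | 46 | 41.33 | 42.14* | 74.5 | 54.3* | 2.61 | 3.65* | 938 | 3084 | 3206 | 6516 |
| G. sulphuraria 5572 | 14.28 | 108 | 229.7 | 37.99 | 6472 | 5009 | 46 | 53 | 39.68 | 40.5* | 78.4 | 45.3* | 2.15 | 3.53* | 936 | 3108 | 3252 | 6540 |
| G. sulphuraria 002 | 14.11 | 107 | 189.3 | 39.16 | 5912 | 4701 | 46 | 52 | 40.76 | 41.35* | 97.1 | 50.0* | 2.37 | 3.73* | 927 | 3060 | 3184 | 6505 |
| G. phlegrea DBV009# | 11.41 | 9311 | 2.0 | 37.86 | 7836 | 5562 | 54 | 62 | 39.97 | 40.58* | na | na | na | na | 935 | 3018 | 3125 | 6512 |
| G. phlegrea Soos | 14.87 | 108 | 201.1 | 37.52 | 6125 | 4624 | 44 | 47 | 39.57 | 40.73* | 77.5 | 43.2* | 2.19 | 3.33* | 929 | 3034 | 3197 | 6493 |
| C. merolae 10D# | 16.73 | 22 | 859.1 | 54.81 | 4803 | 3980 | 33 | 33 | 56.57 | 56.57 | 0.5 | 0.0 | 1 | 1.01 | 883 | 2811 | 2832 | 6213 |
| C. merolae Soos | 12.33 | 35 | 567.5 | 54.33 | 4406 | 3574 | 34 | 34 | 54.84 | 54.26 | 9.4 | 2.9 | 1.06 | 1.1 | 886 | 2787 | 2823 | 6188 |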
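The Contig N50 column follows the usual N50 convention: the contig length at which contigs of that size or larger contain at least half of the assembled sequence. A minimal sketch of the calculation, assuming contig lengths are available as a list of integers (the function name `n50` is hypothetical and not part of the authors' pipeline):

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of the total."""
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contig lengths given")


# Toy contig lengths only; these are not values from the assemblies above.
example = n50([8, 8, 4, 3, 3, 2, 2, 2])
```

The same function applies to raw read lengths when computing a read N50.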
Table 2
Natural habitats of extant prokaryotes harboring the closest orthologs to Cyanidiales HGTs.

Numbers in brackets represent how many times HGT candidates from Cyanidiales shared monophyly with non-eukaryotic organisms; for example Proteobacteria were found in 53/96 of the OG monophylies. Kingdom: Taxon at kingdom level. Species: Scientific species name. Habitat: habitat description of the original sampling site. pH: pH of the original sampling site. Temp: Temperature in Celsius of the sampling site. Salt: Ion concentration of the original sampling site. na: no information available.

https://doi.org/10.7554/eLife.45017.105
Column groups: Phylogeny (Division, Species); Natural habitat of closest non-eukaryotic ortholog (Habitat description, pH, Max. temp, Salt).

| Kingdom | Division | Species | Habitat description | pH | Max. temp | Salt |
|---|---|---|---|---|---|---|
| Bacteria | Proteobacteria (53) | Acidithiobacillus thiooxidans (4) | Mine drainage/Mineral ores | 2.0–2.5 | 30°C | ‘hypersaline’ |
| | | Carnimonas nigrificans (4) | Raw cured meat | 3.0 | 35°C | 8% NaCl |
| | | Methylosarcina fibrata (4) | Landfill | 5.0–9.0 | 37°C | 1% NaCl |
| | | Sphingomonas phyllosphaerae (3) | Phyllosphere of Acacia caven | na | 28°C | na |
| | | Gluconacetobacter diazotrophicus (3) | Symbiont of various plant species | 2.0–6.0 | na | ‘high salt’ |
| | | Gluconobacter frateurii (3) | na | na | na | na |
| | | Luteibacter yeojuensis (3) | River | na | na | na |
| | | Thioalkalivibrio sulfidiphilus (3) | Soda lake | 8.0–10.5 | 40°C | 15% total salts |
| | | Thiomonas arsenitoxydans (3) | Disused mine site | 3.0–8.0 | 30°C | ‘halophilic’ |
| | Firmicutes (28) | Sulfobacillus thermosulfidooxidans (6) | Copper mining | 2.0–2.5 | 45°C | ‘salt tolerant’ |
| | | Alicyclobacillus acidoterrestris (4) | Soil sample | 2.0–6.0 | 53°C | 5% NaCl |
| | | Gracilibacillus lacisalsi (3) | Salt lake | 7.2–7.6 | 50°C | 25% total salts |
| | Actinobacteria (19) | Amycolatopsis halophila (3) | Salt lake | 6.0–8.0 | 45°C | 15% NaCl |
| | | Rubrobacter xylanophilus (3) | Thermal industrial runoff | 6.0–8.0 | 60°C | 6.0% NaCl |
| | Chloroflexi (12) | Caldilinea aerophila (4) | Thermophilic granular sludge | 6.0–8.0 | 65°C | 3% NaCl |
| | | Ardenticatena maritima (3) | Coastal hydrothermal field | 5.5–8.0 | 70°C | 6% NaCl |
| | | Ktedonobacter racemifer (3) | Soil sample | 4.8–6.8 | 33°C | >3% NaCl |
| | Bacteroidetes Chlorobi (10) | Salinibacter ruber (4) | Saltern crystallizer ponds | 6.5–8.0 | 52°C | 30% total salts |
| | | Salisaeta longa (3) | Experimental mesocosm (Salt) | 6.5–8.5 | 46°C | 20% NaCl |
| | Nitrospirae (7) | Leptospirillum ferriphilum (4) | Arsenopyrite biooxidation tank | 0–3.0 | 40°C | 2% NaCl |
| | Fibrobacteres (6) | Acidobacteriaceae bacterium TAA166 (3) | na | na | na | na |
| | Deinococcus (5) | Truepera radiovictrix (3) | Hot spring runoffs | 7.5–9.5 | na | 6% NaCl |
| Archaea | Euryarchaeota (6) | Ferroplasma acidarmanus (3) | Acid mine drainage | 0–2.5 | 40°C | ‘halophilic’ |
Appendix 1—table 1
Sequencing and Assembly stats.

The strains were sequenced using PacBio’s RS2 sequencing technology and P6-C4 chemistry (the only exception being C. merolae Soos, which was sequenced using P4-C2 chemistry). For genome assembly, canu version 1.5 was used, followed by three rounds of polishing with the Quiver algorithm. Genes were predicted with MAKER v3 beta (Doolittle, 1999). The performance of genome assemblies (not shown here) and gene prediction was assessed using BUSCO v.3. Raw Reads: Number of raw PacBio RSII reads. Raw Reads N50: 50% of the raw sequence is contained in reads with sizes greater than the N50 value. Raw Reads GC: GC content of the raw reads in percent. Raw Reads (bp): Total number of sequenced basepairs (nucleotides) per species. Raw Reads Coverage: Genomic coverage by raw reads (fold). This figure was computed once the assembly was finished. Unitigging (bp): Total number of basepairs that survived read correction and trimming. This amount of sequence is what the assembler considered when constructing the genome. Unitigging Coverage: Genomic coverage by corrected and trimmed reads (fold). Genome Size (bp): Size of the polished genome. Genome GC: GC content of the polished genome. Contigs: Number of contigs. Contig N50: 50% of the final genomic sequence is contained in contigs with sizes greater than the N50 value. Genes: Number of genes predicted by MAKER v3 beta. BUSCO (C): Percentage of complete gene models. BUSCO (C + F): Percentage of complete and fragmented gene models. BUSCO (D): Percentage of duplicated gene models. BUSCO (M): Percentage of missing gene models.

https://doi.org/10.7554/eLife.45017.110
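| Species | Raw reads | Raw reads N50 | Raw reads GC | Raw reads (bp) | Raw reads coverage | Unitigging (bp) | Unitigging coverage | Genome size (bp) | Genome GC | Contigs | Contig N50 | Genes | BUSCO (C) | BUSCO (C + F) | BUSCO (D) | BUSCO (M) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| G. sulphuraria RT22 | 163764 | 12023 | 35.83% | 1424372481 | 91.20 | 1108677098 | 70.99 | 15617852 | 37.43% | 118 | 172878 | 6982 | 92.8% | 94.5% | 6.3% | 5.5% |
| G. sulphuraria 002 | 131978 | 10109 | 37.90% | 946093501 | 67.05 | 805608410 | 57.09 | 14110219 | 39.16% | 107 | 189293 | 5912 | 87.5% | 92.5% | 5.0% | 7.5% |
| G. sulphuraria 5572 | 101472 | 10449 | 36.45% | 802203307 | 56.19 | 664626554 | 46.55 | 14277368 | 37.99% | 108 | 229711 | 6472 | 91.5% | 93.5% | 5.0% | 6.5% |
| G. sulphuraria MS1 | 128294 | 9991 | 36.18% | 934546621 | 62.77 | 777587876 | 52.23 | 14887946 | 37.62% | 129 | 172087 | 7441 | 90.8% | 94.1% | 4.0% | 5.9% |
| G. sulphuraria MtSh | 158936 | 13617 | 39.19% | 1523875693 | 101.95 | 1235394614 | 82.65 | 14947614 | 40.04% | 101 | 186619 | 6160 | 87.4% | 91.7% | 6.9% | 8.3% |
| G. sulphuraria Azora | 82544 | 10244 | 37.09% | 651280930 | 46.31 | 551720524 | 39.23 | 14063793 | 40.10% | 127 | 162248 | 6305 | 88.4% | 92.0% | 2.3% | 8.0% |
| G. sulphuraria SAG21.92 | 71480 | 10341 | 36.67% | 564874149 | 39.47 | 413793659 | 28.91 | 14312824 | 37.92% | 135 | 158217 | 5956 | 83.8% | 88.4% | 3.6% | 11.6% |
| G. sulphuraria YNP5587.1 | 77421 | 13842 | 36.69% | 769606723 | 53.38 | 613905250 | 42.58 | 14416547 | 40.05% | 115 | 170797 | 6118 | 91.8% | 93.5% | 5.0% | 6.5% |
| G. phlegrea Soos | 92263 | 14365 | 36.01% | 966702049 | 65.00 | 619580741 | 41.66 | 14872696 | 37.52% | 108 | 201071 | 6125 | 92.1% | 93.8% | 7.9% | 6.2% |
| C. merolae Soos | 154461 | 7924 | 52.92% | 848542698 | 68.82 | 570542830 | 46.27 | 12329961 | 54.33% | 35 | 567466 | 4406 | 85.2% | 89.5% | 2.0% | 10.5% |
| G. sulphuraria 074W* | | | | | | | | 13712004 | 36.89% | 433 | 172322 | 7177 | 83.8% | 87.4% | 2.3% | 10.3% |
| C. merolae 10D* | | | | | | | | 16728945 | 54.81% | 22 | 859119 | 5044 | 90.4% | 93.4% | 1.3% | 6.6% |
| G. phlegrea DBV009* | | | | | | | | 11413183 | 37.86% | 9311 | 1993 | 7836 | 68.3% | 88.1% | 3.6% | 11.9% |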
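The N50 and coverage definitions given in the legend of this table can be illustrated with a short sketch. It assumes that coverage is simply sequenced basepairs divided by polished genome size, which reproduces the values reported for G. sulphuraria RT22. The function names and the toy contig lengths are hypothetical, and `n50` uses "greater than or equal to" where the legend says "greater than".

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    if total == 0:
        raise ValueError("need at least one sequence with non-zero length")
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def fold_coverage(sequenced_bp, genome_size_bp):
    """Genomic coverage as sequenced basepairs per basepair of assembled genome."""
    return sequenced_bp / genome_size_bp


# Values for G. sulphuraria RT22 taken from the table above.
genome_size = 15_617_852
raw_coverage = fold_coverage(1_424_372_481, genome_size)
unitig_coverage = fold_coverage(1_108_677_098, genome_size)
print(f"raw coverage: {raw_coverage:.2f}x, unitigging coverage: {unitig_coverage:.2f}x")

# Toy contig lengths (made up for illustration, not from the assemblies).
print("toy N50:", n50([2, 2, 2, 3, 3, 4, 8, 8]))
```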
Appendix 3—table 1
%GC analysis of the Cyanidiales transcriptomes.

%GC content of HGT genes was compared to the %GC content of native genes using Student’s t-test. Legend: HGT Genes: number of HGT gene candidates found in species. Avg. %GC Native: average %GC of native transcripts. Avg. %GC HGT: average %GC of HGT candidates. p-Val (T-test): significance value (p-value) of Student’s t-test. Delta: difference between the average %GC of HGT candidates and the average %GC of native genes (HGT minus native).

https://doi.org/10.7554/eLife.45017.113
| Strain | HGT genes | Avg. %GC Native | Avg. %GC HGT | p-Val (T-test) | Delta |
|---|---|---|---|---|---|
| Galdieria_sulphuraria_074W | 55 | 38.99 | 39.62 | 0.046 | 0.63 |
| Galdieria_sulphuraria_MS1 | 58 | 39.59 | 40.79 | 0 | 1.2 |
| Galdieria_sulphuraria_RT22 | 54 | 39.54 | 40.85 | 0 | 1.31 |
| Galdieria_sulphuraria_SAG21 | 47 | 40.04 | 41.47 | 0 | 1.43 |
| Galdieria_sulphuraria_MtSh | 47 | 41.33 | 42.48 | 0 | 1.15 |
| Galdieria_sulphuraria_Azora | 58 | 41.34 | 42.57 | 0 | 1.23 |
| Galdieria_sulphuraria_YNP55871 | 46 | 41.33 | 42.14 | 0.006 | 0.81 |
| Galdieria_sulphuraria_5572 | 53 | 39.68 | 40.5 | 0.002 | 0.82 |
| Galdieria_sulphuraria_002 | 52 | 40.76 | 41.35 | 0.016 | 0.59 |
| Galdieria_phlegrea_DBV009 | 54 | 39.97 | 40.58 | 0.016 | 0.61 |
| Galdieria_phlegrea_Soos | 44 | 39.57 | 40.73 | 0 | 1.16 |
| Cyanidioschyzon_merolae_10D | 33 | 56.57 | 56.57 | 0.996 | 0 |
| Cyanidioschyzon_merolae_Soos | 34 | 54.84 | 54.26 | 0.479 | −0.58 |
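As a rough illustration of the quantities in this table, the sketch below computes %GC for a nucleotide sequence and the Delta column as average %GC of HGT candidates minus average %GC of native genes. Both helper names are assumptions made for this example; the p-values themselves would come from a t-test on the per-gene %GC values, which are not listed here.

```python
def gc_percent(sequence):
    """Percent G+C among the unambiguous bases (A, C, G, T) of a sequence."""
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    acgt = gc + sequence.count("A") + sequence.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / acgt


def delta_gc(avg_native, avg_hgt):
    """Delta as used in the table: HGT average minus native average, 2 decimals."""
    return round(avg_hgt - avg_native, 2)


# Averages copied from the table above.
print(delta_gc(38.99, 39.62))  # G. sulphuraria 074W
print(delta_gc(54.84, 54.26))  # C. merolae Soos

# Toy sequence, made up for illustration.
print(gc_percent("ATGC"))
```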
Appendix 4—table 1
Single exon genes vs multiexonic.

The ratio of single exon genes vs multiexonic genes was compared between HGT candidates and native Cyanidiales genes (Fisher enrichment test). Legend: HGT Genes: number of HGT gene candidates found in species. Single Exon HGT: number of single exon genes in HGT candidates. Multi Exon HGT: number of multiexonic genes in HGT candidates. Single Exon Native: number of single exon genes in native Cyanidiales genes. Multi Exon Native: number of multiexonic genes in native Cyanidiales genes. HGT SM Ratio: percentage of single exon genes within the HGT candidate genes. Native SM Ratio: percentage of single exon genes within the native genes. Delta: difference between the percentage of single exon genes in the native genes and in the HGT candidates. Fisher p-val: p-value of the Fisher enrichment test.

https://doi.org/10.7554/eLife.45017.128
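| Strain | HGT genes | Single exon (HGT) | Multi exon (HGT) | Single exon (Native) | Multi exon (Native) | Fisher's p | Single exon % (HGT) | Single exon % (Native) | Multi exon % (HGT) | Multi exon % (Native) |
|---|---|---|---|---|---|---|---|---|---|---|
| Galdieria_sulphuraria_074W | 55 | 29 | 26 | 1879 | 5240 | 4.05E-05 | 52.7% | 26.4% | 47.3% | 73.6% |
| Galdieria_sulphuraria_MS1 | 58 | 22 | 36 | 1224 | 6159 | 0.0001098 | 37.9% | 16.6% | 62.1% | 83.4% |
| Galdieria_sulphuraria_RT22 | 54 | 26 | 28 | 1756 | 5172 | 0.0004079 | 48.1% | 25.3% | 51.9% | 74.7% |
| Galdieria_sulphuraria_SAG21 | 47 | 8 | 39 | 901 | 5008 | 0.6852 | 17.0% | 15.2% | 83.0% | 84.8% |
| Galdieria_sulphuraria_MtSh | 47 | 17 | 30 | 1239 | 4874 | 0.01054 | 36.2% | 20.3% | 63.8% | 79.7% |
| Galdieria_sulphuraria_Azora | 58 | 14 | 39 | 966 | 5286 | 0.03558 | 24.1% | 15.5% | 75.9% | 84.5% |
| Galdieria_sulphuraria_YNP55871 | 46 | 21 | 25 | 1548 | 4524 | 0.00341 | 45.7% | 25.5% | 54.3% | 74.5% |
| Galdieria_sulphuraria_5572 | 53 | 29 | 24 | 1389 | 5030 | 1.75E-07 | 54.7% | 21.6% | 45.3% | 78.4% |
| Galdieria_sulphuraria_002 | 52 | 26 | 26 | 140 | 4720 | 8.75E-07 | 50.0% | 2.9% | 50.0% | 97.1% |
| Galdieria_phlegrea_DBV009 | 54 | na | na | na | na | na | na | na | na | na |
| Galdieria_phlegrea_Soos | 44 | 25 | 22 | 1369 | 4709 | 5.17E-06 | 56.8% | 22.5% | 43.2% | 77.5% |
| Cyanidioschyzon_merolae_10D | 33 | 33 | 0 | 4744 | 26 | 1 | 100.0% | 99.5% | 0.0% | 0.5% |
| Cyanidioschyzon_merolae_Soos | 34 | 33 | 1 | 3960 | 412 | 0.367 | 97.1% | 90.6% | 2.9% | 9.4% |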
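The single-exon versus multi-exon comparison in this table amounts to a test on a 2x2 contingency table. The sketch below is one way to compute a Fisher exact p-value using only the Python standard library. It assumes a two-sided test that sums all tables no more probable than the observed one; whether the original analysis used exactly this variant is not stated, so results need not match the reported p-values digit for digit. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: HGT candidates / native genes. Columns: single exon / multi exon.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        # Hypergeometric probability of x single-exon genes among the HGT row.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    observed = probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at most as likely as the observed one.
    return sum(
        p
        for p in (probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Counts for Galdieria_sulphuraria_074W taken from the table above.
p_value = fisher_exact_two_sided(29, 26, 1879, 5240)
print(f"two-sided Fisher p-value: {p_value:.3g}")
```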
Appendix 4—table 2
Exon/Gene ratio.

The ratio of exons per gene was compared between HGT candidates and native Cyanidiales genes (Wilcoxon rank-sum test). Legend: HGT Genes: number of HGT gene candidates found in species. E/G All: average number of exons per gene across the whole transcriptome. E/G Native: average number of exons per gene in native genes. E/G HGT: average number of exons per gene in HGT gene candidates. p-Val (Wilcox): p-value of the non-parametric Wilcoxon test for significant differences. Delta: difference in average number of exons per gene between the native genes and HGT candidates.

https://doi.org/10.7554/eLife.45017.129
| Strain | HGT genes | Mean exon per transcript (HGT) | Mean exon per transcript (Native) | Wilcox (p) | Delta |
|---|---|---|---|---|---|
| Galdieria_sulphuraria_074W | 55 | 2.25 | 3.2 | 9.40E-06 | 0.95 |
| Galdieria_sulphuraria_MS1 | 58 | 2.5 | 3.88 | 1.41E-05 | 1.38 |
| Galdieria_sulphuraria_RT22 | 54 | 2.63 | 3.95 | 3.42E-06 | 1.32 |
| Galdieria_sulphuraria_SAG21 | 47 | 4.02 | 5.03 | 0.0004 | 1.01 |
| Galdieria_sulphuraria_MtSh | 47 | 3.15 | 4.32 | 0.0011 | 1.17 |
| Galdieria_sulphuraria_Azora | 58 | 2.68 | 4.03 | 9.92E-05 | 1.35 |
| Galdieria_sulphuraria_YNP55871 | 46 | 2.61 | 3.65 | 2.30E-04 | 1.04 |
| Galdieria_sulphuraria_5572 | 53 | 2.15 | 3.53 | 2.25E-07 | 1.38 |
| Galdieria_sulphuraria_002 | 52 | 2.37 | 3.73 | 2.65E-06 | 1.36 |
| Galdieria_phlegrea_DBV009 | 54 | na | na | na | na |
| Galdieria_phlegrea_Soos | 44 | 2.19 | 3.33 | 1.19E-05 | 1.14 |
| Cyanidioschyzon_merolae_10D | 33 | 1 | 1.01 | 1.00E+00 | 0.01 |
| Cyanidioschyzon_merolae_Soos | 34 | 1.06 | 1.1 | 2.10E-01 | 0.04 |
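The non-parametric comparison of exons per gene can be sketched as a rank-sum (Mann-Whitney U) test. The version below is a simplified illustration under stated assumptions: it uses average ranks for ties, a normal approximation for the p-value, and no tie correction of the variance, so it will be less accurate than a statistics package on heavily tied data such as exon counts. The function name and the exon counts are made up for this example.

```python
from statistics import NormalDist


def rank_sum_test(x, y):
    """Mann-Whitney U for sample x and a two-sided normal-approximation p-value."""
    combined = sorted([(value, 0) for value in x] + [(value, 1) for value in y])
    n = len(combined)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5  # no tie correction applied
    z = (u - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p


# Hypothetical exon counts per gene, not taken from the genomes.
hgt_exons = [1, 1, 2, 2, 3]
native_exons = [2, 3, 4, 5, 6, 7]
u_stat, p_value = rank_sum_test(hgt_exons, native_exons)
print(f"U = {u_stat}, p = {p_value:.3f}")
```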

Cite this article

  1. Alessandro W Rossoni
  2. Dana C Price
  3. Mark Seger
  4. Dagmar Lyska
  5. Peter Lammers
  6. Debashish Bhattacharya
  7. Andreas PM Weber
(2019)
The genomes of polyextremophilic cyanidiales contain 1% horizontally transferred genes with diverse adaptive functions
eLife 8:e45017.
https://doi.org/10.7554/eLife.45017