Figures and data

Overview of Lake Malawi cichlids.
A summary of the Lake Malawi cichlids and the results from this paper. On the left, a phylogeny shows the seven major lineages that currently live in the lake, which are referred to as ecogroups. AC is Astatotilapia calliptera, a riverine species that lives in the margins of the river/lake interface. For each ecogroup, we summarize 1) the estimated number of species in the lake, 2) the number of species for which we have Bionano optical read data, 3) the number of species for which we have PacBio-based genome assemblies, 4) the number of species with Illumina short-read sequencing data, and 5) the primary habitat in which the ecogroup lives. Deep and shallow benthics are distinguished by the lake-bottom depths at which they live, while utakas live in the water column but mate over the shore. We also summarize the distribution of the six large inversions we have identified, colored by the chromosome on which they are found (legend). We propose that the inversions on chromosomes 9, 11, and 20 originated in the pelagic lineage and spread to the benthics by hybridization very early in the evolution of the benthic lineage. Alternatively, the distribution of these inversions could be explained by incomplete lineage sorting.

Generation of a new high quality reference genome for Metriaclima zebra.
Alignment of Bionano maps derived from a single Metriaclima zebra male to two different reference sequences for Chromosome 17 (white and grey, respectively). M_zebra_UMD2a reference (white) is the existing reference genome and M_zebra_GT3a reference (grey) was created for this manuscript using a combination of PacBio HiFi reads and Bionano molecules. For each reference, the top bar shows the expected location of each CTTAAG motif (blue) created from an in silico annotation of the reference sequence. These are aligned (grey lines) to observed locations in maps generated from Bionano Saphyr optical molecules (bottom bar). Deviations from co-linearity indicate reference errors (green, blue, and red highlights). Yellow indicates regions of the genome that lack aligned motifs. A summary of the improvements in the new genome is found in Table S1.
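
The in silico annotation step described above can be illustrated with a minimal sketch. It is not the pipeline used for this manuscript; the function names and the toy sequence are hypothetical. It scans a reference sequence for CTTAAG and reports the spacing between consecutive sites, which is the kind of predicted pattern that observed optical-map labels are compared against. Because CTTAAG is its own reverse complement, scanning one strand is sufficient.

```python
def find_motif_sites(sequence, motif="CTTAAG"):
    """Return 0-based start positions of every occurrence of motif in sequence."""
    sequence = sequence.upper()
    motif = motif.upper()
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Advance by one base so overlapping occurrences are not missed.
        start = sequence.find(motif, start + 1)
    return positions


def inter_site_distances(positions):
    """Distances between consecutive motif sites along the sequence."""
    return [b - a for a, b in zip(positions, positions[1:])]


# Toy sequence (hypothetical, not taken from either reference assembly).
toy_reference = "AACTTAAGGGGTTTCTTAAGACGTACGTCTTAAGTT"
sites = find_motif_sites(toy_reference)
gaps = inter_site_distances(sites)
print(sites, gaps)
```

On a real assembly the same scan would be run per chromosome, and a misassembly or inversion would show up as a run of predicted distances that match the observed map only in reverse order or not at all.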

Inversion genotypes of 31 samples based on Bionano data.

Identification of six large single or double inversions segregating within Lake Malawi cichlids.
Representative alignments of inversions or double inversions identified from blood samples obtained from 30 individuals from eleven species. For each inversion, the top shows predicted CTTAAG motifs from an in silico annotation of the reference genome, the bottom shows motifs identified using Bionano molecules obtained from an individual of the species indicated on the left, and the grey lines indicate matching motifs based upon predicted and observed distances. Yellow indicates regions containing no matches between motifs in the reference genome and the mapped sample. Single inversions were identified on chromosomes 2, 9, 10, 13, and 20. Tandem double inversions were identified on chromosomes 11 and 20. The estimated length of each inversion and the percentage of the chromosome it spans are shown on the right. The single inversion on chromosome 20 identified in Rhamphochromis longiceps has the same position as the first inversion of the double inversion on chromosome 20 identified in Mchenga conophoros. A list of all samples and their inversion genotypes is in Table 1. Note that an error or polymorphism in the reference genome on chromosome 2 is indicated in the first panel.

Distribution of inversions within the Lake Malawi ecogroups.
Principal component analysis (PCA) of the SNVs located within each inversion was used to genotype 365 samples for all six inversions. (A) We illustrate the approach for genotyping the chromosome 10 inversion. The PCA plot for the entire genome is shown on the left, with each sample colored by its ecogroup. Samples from each ecogroup cluster together, reflecting their evolutionary history. Individuals with Bionano data are shown as grey Xs. On the right, for the SNVs that fall within the chromosome 10 inversion, the deep benthic, shallow benthic, and utaka individuals split into multiple clusters. Using the samples genotyped with Bionano data, we assign each cluster to an inversion genotype, shown by the black, green, and red boxes. To make the whole-genome and chromosome 10 PCA plots easier to compare, we reversed the y-axis on the right panel. The assigned genotype of each sample can be found in Supplemental Table 2. Interactive PCA plots for each inversion are included in Figures S9-S17. (B) The derived allele frequency was calculated for each inversion within the seven ecogroups. The number of species that were genotyped and the estimated number of species that live in Lake Malawi are listed in the whole-genome phylogeny (left).
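
The genotyping logic in this legend (label clusters using samples of known genotype, then summarize allele frequency per ecogroup) can be sketched as follows. This is an illustration under stated assumptions, not the method used in the paper: the legend does not say how clusters were delimited, so a nearest-centroid rule is substituted here; all coordinates and group names are hypothetical; and the inverted arrangement is assumed to be the derived allele.

```python
from collections import defaultdict
from math import dist


def cluster_centroids(reference_points):
    """Mean PC coordinates of the reference samples for each known genotype."""
    grouped = defaultdict(list)
    for coords, genotype in reference_points:
        grouped[genotype].append(coords)
    return {
        genotype: tuple(sum(axis) / len(axis) for axis in zip(*points))
        for genotype, points in grouped.items()
    }


def assign_genotype(coords, centroids):
    """Label a sample with the genotype whose centroid is nearest in PC space."""
    return min(centroids, key=lambda genotype: dist(coords, centroids[genotype]))


def inverted_allele_frequency(genotypes):
    """Fraction of chromosomes carrying the inversion (0, 1 or 2 copies per sample)."""
    return sum(genotypes) / (2 * len(genotypes))


# Hypothetical (PC1, PC2) coordinates of samples whose genotype is known,
# coded as the number of inverted copies: 0, 1 or 2.
reference_points = [
    ((-4.0, 0.1), 0), ((-3.8, -0.2), 0),
    ((0.1, 0.3), 1), ((-0.2, 0.0), 1),
    ((4.1, -0.1), 2), ((3.9, 0.2), 2),
]
centroids = cluster_centroids(reference_points)

# Hypothetical samples without optical-map data, grouped by ecogroup.
unlabelled = {
    "ecogroup_A": [(-3.9, 0.0), (0.0, 0.1), (4.0, 0.0), (3.7, 0.1)],
    "ecogroup_B": [(-4.2, 0.2), (-3.6, -0.1)],
}
frequencies = {
    group: inverted_allele_frequency([assign_genotype(p, centroids) for p in points])
    for group, points in unlabelled.items()
}
print(frequencies)
```

A nearest-centroid rule is only reasonable when the clusters are well separated, as they are in this toy data; ambiguous samples would need manual inspection or a more careful clustering method.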

Evolutionary history of large inversions violates species tree.
(A) Maximum likelihood trees built from either whole-genome SNVs (top) or SNVs within the inversions on chromosomes 9, 11, and 20. Between 2 and 25 representative samples were selected for each ecogroup (see Table S2), and individuals that were heterozygous for an inversion were excluded from this analysis. Clades were collapsed by ecogroup for visualization purposes. Full phylogenies are presented in Figures S18 to S25. The whole-genome tree indicates that the Diplotaxodon and Rhamphochromis species split first and formed their own clade. For each of the displayed inversions, the first split was instead between samples carrying the inverted and non-inverted genotypes, and the Diplotaxodon individuals formed a clade with the benthics that carry the inversion. (B) Density plots of dXY values comparing benthic species with species of the indicated ecogroup. Within the displayed inversions, the Diplotaxodon species were much closer in evolutionary distance to the benthic species than they were in the whole-genome comparison.
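
For readers unfamiliar with dXY, the statistic is the average number of differences per site between sequences drawn from two different groups. The sketch below computes it from aligned haplotypes; the sequences and group names are hypothetical, and it does not reproduce the site filtering or windowing behind panel B, which the legend does not describe.

```python
def pairwise_differences(seq_a, seq_b):
    """Number of sites at which two aligned haplotypes differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("haplotypes must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def dxy(pop_x, pop_y):
    """Mean per-site differences over all between-group haplotype pairs."""
    n_sites = len(pop_x[0])
    total = sum(pairwise_differences(x, y) for x in pop_x for y in pop_y)
    return total / (len(pop_x) * len(pop_y) * n_sites)


# Toy aligned haplotypes (hypothetical, 10 sites each).
benthic = ["ACGTACGTAC", "ACGTACGTAT"]
close_group = ["ACGTACGTAC", "ACGAACGTAC"]
distant_group = ["TCGAACCTAC", "TCGAACCTAT"]

dxy_close = dxy(benthic, close_group)
dxy_distant = dxy(benthic, distant_group)
print(dxy_close, dxy_distant)
```

A lower dXY between two groups within an inversion than across the whole genome is the kind of pattern panel B reports for Diplotaxodon and the benthics.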

A role for the chromosome 10 inversion in sex determination.
(A) Schematic of the two inversion states (normal and inverted) and their assignment as X or Y chromosomes. (B) 24 offspring (12 male and 12 female) from each of two separate male/female pairs were genotyped for the inversion. For 47 of the 48 individuals, XY animals were male and XX animals were female; one male was XX. (C) FST analysis comparing 24 males and 24 females identified chromosome 10 as the primary genomic region that segregated with sex. (D) Comparison of heterozygosity levels between the 24 male and 24 female animals also identified chromosome 10 as an outlier.
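
The expected signal in panels C and D can be illustrated with a toy calculation at a single biallelic site. The legend does not state which FST estimator was used, so this sketch assumes the simple heterozygosity-based form FST = (HT - HS) / HT with equal weight on the two sexes; the genotype data are hypothetical and coded as the number of copies of the Y-linked allele.

```python
def allele_frequency(genotypes):
    """Frequency of the counted allele given genotypes coded 0, 1 or 2."""
    return sum(genotypes) / (2 * len(genotypes))


def fst(genotypes_a, genotypes_b):
    """FST = (HT - HS) / HT for two groups at one biallelic site."""
    p_a = allele_frequency(genotypes_a)
    p_b = allele_frequency(genotypes_b)
    # HS: mean expected heterozygosity within the two groups.
    h_s = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
    # HT: expected heterozygosity of the pooled groups (equal weights assumed).
    p_bar = (p_a + p_b) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # site is fixed in both groups, so there is no differentiation
    return (h_t - h_s) / h_t


def observed_heterozygosity(genotypes):
    """Fraction of individuals that are heterozygous at the site."""
    return sum(g == 1 for g in genotypes) / len(genotypes)


# Hypothetical fully sex-linked site: every male is XY (1), every female XX (0).
males_linked = [1] * 24
females_linked = [0] * 24

# Hypothetical autosomal site with identical genotype counts in both sexes.
males_unlinked = [0, 1, 2] * 8
females_unlinked = [0, 1, 2] * 8

fst_linked = fst(males_linked, females_linked)
fst_unlinked = fst(males_unlinked, females_unlinked)
print(fst_linked, fst_unlinked)
```

Under this estimator a perfectly sex-linked site in an XY system gives an FST of one third rather than 1, because males carry both alleles; the accompanying excess of heterozygosity in males is what panel D examines.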










