Chromosome-scale genome assembly of the European common cuttlefish Sepia officinalis
Figures
Sepia officinalis assembly statistics and quality control.
(A) Specimen of S. officinalis (credit: Stephan Junek, MPI for Brain Research). (B) Overview of the genome assembly workflow. Genome size was estimated from short DNA reads (Illumina) using GenomeScope (Ranallo-Benavidez et al., 2020; Vurture et al., 2017). The primary assembly was generated from long DNA reads (PacBio Sequel II) and chromatin conformation capture (Hi-C) reads (Dovetail OmniC) with hifiasm (Cheng et al., 2021). The assembly was scaffolded with YAHS (Zhou et al., 2023), and residual small scaffolds were manually placed in chromosomes. (C) Snail plot of the chromosome-scale S. officinalis assembly generated using blobtools2 (Challis et al., 2020) showing scaffold statistics (e.g. number of scaffolds, scaffold N50 length), base composition, and completeness measured using Benchmarking Universal Single-Copy Orthologs (BUSCO) (Simão et al., 2015) against the metazoa_odb12 database. (D) Hi-C heatmap showing the 47 chromosome-scale scaffolds with few sequences remaining in unplaced scaffolds. The x- and y-axes show the genome position in Mbp. The heatmap was generated using juicebox (Dudchenko et al., 2018); 0–7039 observed counts (balanced) are shown.
HapHiC scaffolding for different numbers of expected chromosome scaffolds shows 47 chromosomes as most supported.
Hi-C contact maps from HapHiC (Zeng et al., 2024) are shown for 46, 47, 48, 49, and 50 expected chromosome scaffolds. Assembled chromosomes are shown as blue boxes. Hi-C signal indicating a false (unsupported) merger is marked by a cyan arrow; false splits are marked by black arrows. The contact maps differ from the map shown in Figure 1, which was created using YAHS and manual curation.
Comparison of two Sepia officinalis chromosome-scale assemblies indicates chromosome number of 1n=46.
Datasets were collected from two S. officinalis animals, one as described in this study (MPIBR), the second by the Darwin Tree of Life consortium (DToL) (Blaxter et al., 2022). Both datasets were assembled using a common pipeline (hifiasm and YAHS). (A) Hi-C contact map of the MPIBR primary assembly, scaffolded using YAHS without manual curation. The 47 assembled chromosome scaffolds are shown as blue boxes. (B) Hi-C contact map of the DToL primary assembly, scaffolded using YAHS without manual curation, showing 49 assembled chromosome scaffolds as blue boxes. (C) Whole-genome alignment of both scaffolded assemblies using Winnowmap2 (Jain et al., 2022), showing DToL on the x-axis and MPIBR on the y-axis. The four ‘breakpoints’ of chromosomes in either of the assemblies (three breaks in DToL chromosomes compared to MPIBR, one break in MPIBR compared to DToL) are highlighted in different colors. (D) Ribbon diagram showing the four breakpoints from (C) compared to the chromosome-scale assembly from another cuttlefish, Acanthosepion esculentum (1n=46). The colors of the breakpoints are the same in panels C and D.
BUSCO completeness results.
(A) Comparison of two S. officinalis chromosome-scale assemblies, which were constructed from two independent datasets (this study: MPIBR, Darwin Tree of Life project: DToL), assembled using a common pipeline (hifiasm (Cheng et al., 2021) with PacBio HiFi and Hi-C reads). Results are shown for the metazoa_odb12 database; the zoom-in shows only duplicated, fragmented, and missing fractions to improve readability. The DToL assemblies have slightly higher completeness than MPIBR, due to the higher sequencing coverage used as input. In both datasets, compared to the primary assembly (‘.hic’), the phased haplotypes (‘.hic.hap1’ and ‘.hic.hap2’) have fewer duplicated but more missing genes. (B) BUSCO results for the mollusca_odb12 database, showing the same trend as in (A). (C) Comparison of different BUSCO databases odb10 and odb12 on the manually curated assembly (‘sepoff241117’). For the mollusca gene sets (top), a strong improvement in completeness was observed between odb10 and odb12, reflecting that the updated gene set is more concise and conserved across species. For the metazoa gene sets (bottom), the completeness was marginally increased for odb12 compared to odb10.
Analysis of raw data at breakpoints between S. officinalis assemblies hints at a technical cause of breakpoints.
(A) Coverage of Hi-C and HiFi data shown for pairs of scaffolds exhibiting breakpoints. Blue shows MPIBR data, orange shows Darwin Tree of Life project (DToL) data. For each breakpoint, trans Hi-C contacts are shown on top across the full scaffold, with terminal 200 kb windows highlighted in yellow. Both terminal windows are shown below with aligned HiFi reads (gray horizontal bars) and normalized HiFi read density. Trans Hi-C contacts are shown as purple dots. Right gray box: same data shown for the complete breakpoint scaffold of the other assembly, with trans Hi-C contacts calculated to a size-matched scaffold. (B) Distribution of normalized trans Hi-C contact rate (pairs per Mb²) for random scaffold pairs (‘background pairs,’ gray) and within scaffolds (‘intra scaffold,’ green) for MPIBR (left) and DToL (right) data. Values for scaffolds with breakpoints are indicated in blue and orange, respectively. (C) Histogram of contact rates from (B) shown for random scaffold pairs and breakpoint pairs. Contact rates and empirical p-values of breakpoint pairs are indicated in blue (left, MPIBR) and orange (right, DToL). The joint p-value for the three rates of the DToL breakpoints is indicated in the box (Wilcoxon rank-sum, one-tailed). (D) Repeat analysis of 200 kb scaffold ends at breakpoints and control scaffolds (gray box). Overall repeat content (% of base pairs) and type are shown.
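The empirical p-values in (C) compare the contact rate of each breakpoint pair with the rates of random scaffold pairs. A minimal Python sketch of one common way to compute such a value is given here. The legend does not state the exact estimator, so the add-one correction and the function name `empirical_p_value` are assumptions, and the example rates are invented for illustration only.

```python
def empirical_p_value(observed, background):
    """One-sided empirical p-value for an unusually HIGH observed rate.

    Counts how many background rates are at least as large as the
    observed rate. The +1 in numerator and denominator (an assumption,
    not taken from the legend) keeps the p-value above zero.
    """
    if not background:
        raise ValueError("background must contain at least one value")
    at_least_as_large = sum(1 for rate in background if rate >= observed)
    return (at_least_as_large + 1) / (len(background) + 1)


# Hypothetical contact rates for random scaffold pairs (illustrative only).
background_rates = [1.0, 2.0, 3.0, 4.0]

# A breakpoint pair with a rate far above every background pair.
print(empirical_p_value(10.0, background_rates))  # 0.2
```

With only four background pairs the smallest attainable p-value is 0.2, which illustrates why a large number of random pairs is needed to resolve small p-values.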
Syntenic comparison of three decapod species.
(A) Taxonomy of selected cephalopod species showing their genome size (in gigabases, Gb) and haploid chromosome numbers. Taxonomy information was downloaded from the NCBI taxonomy browser, divergence times for Coleoidea and Decapodiformes from Kröger et al., 2011 and for Sepiidae from López-Córdova et al., 2022. (B) Genome-wide syntenic relationship between chromosomes of E. scolopes (Albertin et al., 2022) (top), D. pealeii (Albertin et al., 2022) (middle), and S. officinalis (bottom). Colored braids connect syntenic regions across genomes, with chromosomes drawn to physical scale. Euprymna chromosomes 45 and 46 are not shown because they contain too few orthogroups. (C) Detailed synteny of Sepia chromosomes 40 (magenta) and 43 (dark blue), which are joined in the other species and account for the different haploid chromosome number in Sepia. Riparian plots were generated using GENESPACE v1.2.3 (Lovell et al., 2022).
Syntenic relationship between S. officinalis and D. pealeii chromosomes.
Dot plot showing finer-resolution syntenic anchor hits (perfectly collinear blast hits within the same orthogroup). Genes are ordered along the chromosomes; only chromosome pairs with a minimum synteny score of 10 and at least 10 syntenic genes are shown. Synteny analysis and visualization were performed using GENESPACE v1.2.3 (Lovell et al., 2022).
Syntenic relationship between S. officinalis and E. scolopes chromosomes.
Dot plot showing finer-resolution syntenic anchor hits (perfectly collinear blast hits within the same orthogroup). Genes are ordered along the chromosomes; only chromosome pairs with a minimum synteny score of 10 and at least 10 syntenic genes are shown. E. scolopes chromosomes 45 and 46 are not shown because they contain too few orthogroups. Synteny analysis and visualization were performed using GENESPACE v1.2.3 (Lovell et al., 2022).
Syntenic comparison of four decapod species hints at a cephalopod sex chromosome.
(A) Riparian plot showing synteny relationships of chromosomes from four decapod species, generated using GENESPACE (Lovell et al., 2022) with orthogroups. Euprymna chromosomes 45 and 46 are not shown because they contain too few orthogroups. The chromosome split in S. officinalis compared to the other species is shown in purple; the putative sex chromosome as identified recently (Coffing et al., 2025) is shown in cyan. (B) Normalized coverage of sequencing data in S. officinalis chromosomes. (C) Normalized coverage of short reads to the female A. esculentum genome, reproduced from Coffing et al., 2025. A decrease in read coverage is visible for chromosome 46, the putative Z sex chromosome. Read depth was calculated from Illumina gDNA reads in windows of 500,000 bp and normalized to the median coverage of chromosome 1. Box plots show median divergence (box dividing line), interquartile range (box), and 1.5 times the interquartile range (whiskers). The putative Z chromosome is highlighted in cyan. Chromosomes with significantly reduced read coverage (orange label) were identified by a one-sided Wilcoxon rank-sum test of each chromosome’s normalized depth windows against all remaining chromosomes (Benjamini-Hochberg-corrected, at least 10% decrease in median normalized depth, *p<0.05, **p<0.01, ***p<0.001).
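The depth normalization and multiple-testing correction described in this legend can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' script: the function names `normalize_depth` and `benjamini_hochberg`, the chromosome labels, and all numbers are hypothetical, and the Wilcoxon rank-sum test that would produce the raw p-values is left out.

```python
from statistics import median


def normalize_depth(windows_by_chrom, reference="chr1"):
    """Divide every window depth by the median depth of the reference chromosome."""
    ref_median = median(windows_by_chrom[reference])
    return {
        chrom: [depth / ref_median for depth in depths]
        for chrom, depths in windows_by_chrom.items()
    }


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical window depths (illustrative only): chr46 at half the coverage.
depths = {"chr1": [10, 20, 30], "chr46": [5, 10, 15]}
normalized = normalize_depth(depths)
print(median(normalized["chr46"]))  # 0.5

# Hypothetical raw p-values, one per chromosome.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

A chromosome would then be flagged only if its adjusted p-value is below the chosen threshold and its median normalized depth is at least 10% lower, as the legend states.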
Genome annotation for Sepia officinalis.
(A) Annotation of the repeat landscape of the S. officinalis genome, annotated using RepeatModeler (Flynn et al., 2020). The full repeat landscape is shown on the left; annotated repeats (excluding unclassified or simple repeats) are shown on the right. (B–C) Quality control of gene annotation and comparison to two other cuttlefish species using OMArk (Nevers et al., 2022). Results are shown for Acanthosepion lycidas (GCA_963932145.1, Ensembl Genebuild), Sepia officinalis (BRAKER, this study), and Acanthosepion pharaonis (Song et al., 2021) (BRAKER). Lophotrochozoa was used as the ancestral clade. (B) Completeness assessed by the presence of genes conserved in the clade, classified as single or multiple copies (duplicated), or missing. (C) Consistency assessed by the proportion of proteins placed in the correct lineage (consistent); placement in incorrect lineages randomly (inconsistent) or to specific species (contamination), or no placement in known gene families (unknown). (D) Phylogenetic tree of 13 molluscan species used for analysis of gene families with OrthoFinder (Emms et al., 2025). Species are colored by clade: purple = coleoid cephalopods, blue = nautiloid (non-coleoid cephalopod), green = non-cephalopod mollusk. (E) Heatmap of the largest gene families (orthogroups from OrthoFinder, with more than 100 genes in any species), ordered from largest gene count across all species on the left. Families with at least one gene in S. officinalis are depicted. Rows show gene counts for each species (color capped at 500 genes), columns show orthogroups and their annotation by eggNOG mapper (Cantalapiedra et al., 2021; Huerta-Cepas et al., 2017) or InterProScan (Blum et al., 2025), if available. Clade colors match (D).
Gene family expansion analysis.
(A) Gene family expansion analysis using CAFE5 (Mendes et al., 2021) with a gamma model (k=3) on all smaller gene families (fewer than 100 genes in any species). 30 families with the most change in different categories are shown (expanded only in S. officinalis (pink), in all coleoids (orange), in all species (yellow), in non-cephalopod mollusks (green), or overall contraction (blue)). Rows show change (expansion or contraction) of gene families in any species, columns show orthogroups and annotation, if available. Dots show significant change (p<0.05); gene counts are shown for any orthogroup with at least 12 genes in any species. (B) Gene families with differential expression in bulk RNA-seq data. Dot size shows the number of differentially expressed (DE) genes for each tissue. (C) Dotplots of enriched gene ontology (GO) terms for large gene families, enriched using clusterProfiler with a hypergeometric test. Dot size shows the number of expressed genes per family with this GO term, x-axis shows the percentage of expressed genes from all genes with this GO term. Dot color shows the adjusted p-value after Benjamini-Hochberg false discovery rate (FDR) correction. CC: cellular component, MF: molecular function, BP: biological process. (D) Heatmap of z-scored expression of all DE genes from the gene families with enriched GO terms.
Expression of expanded gene families in tissue bulk RNA-seq data.
Bulk RNA-seq data collected from one adult S. officinalis from different brain tissues (optic lobes - yellow, basal lobes - turquoise, vertical and subvertical lobes - orange, posterior subesophageal mass - purple), retina (red), and skin (blue, from the dorsal mantle). Tissue color code is identical throughout the figure. (A) Principal component analysis (PCA) of the data, showing the first 2 PCs, colored by tissue. (B) Barplot showing the number of differentially expressed (DE) genes (i.e. marker genes) for each tissue, calculated against all other tissues using DESeq2 (Love et al., 2014). (C) Largest gene families (orthogroups) with differential expression in bulk RNA-seq data. Dot size shows the number of DE genes for each tissue. Families with enriched gene ontology (GO) terms are highlighted in gray. (D+E) Dotplots of enriched GO terms (Aleksander et al., 2026; Ashburner et al., 2000) for large gene families, enriched using clusterProfiler (Xu et al., 2024) with a hypergeometric test. Dot size shows the number of expressed genes per family with this GO term, x-axis shows the percentage of expressed genes from all genes with this GO term. Dot color shows the adjusted p-value after Benjamini-Hochberg false discovery rate (FDR) correction. CC: cellular component, MF: molecular function, BP: biological process. (F) Heatmap of z-scored expression of all DE genes from the largest gene families with enriched GO terms.
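The hypergeometric test named in this legend asks how likely it is to see at least the observed number of genes with a given GO term in a gene family, if family members were drawn at random from all annotated genes. The Python sketch below shows that calculation in plain form. It is not clusterProfiler's implementation, the function name `hypergeom_enrichment_p` is hypothetical, and the counts are invented for illustration.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: genes in the background, K: background genes carrying the GO term,
    n: genes in the family of interest, k: family genes carrying the term.
    """
    upper = min(n, K)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Illustrative counts only: 10 background genes, 5 with the term,
# a family of 5 genes in which all 5 carry the term.
p = hypergeom_enrichment_p(k=5, n=5, K=5, N=10)
print(p)  # 1/252, about 0.00397
```

The resulting p-values, one per GO term, would then be adjusted for multiple testing, which the legend reports as Benjamini-Hochberg FDR correction.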
Analysis of Hi-C read pairs from both S. officinalis assemblies.
Hi-C reads were aligned to the primary contigs from hifiasm (as is used for scaffolding with YAHS) and analyzed using pairtools. Note the higher fraction of long-range contacts (at least 1 kb cis pairs or trans pairs) in the MPIBR data (top) compared to DToL (bottom). Due to overall higher coverage, the absolute number of read pairs is higher for DToL than for MPIBR data.
Quantification of repeat content in chromosome scaffolds and unplaced residual scaffolds.
Density plot showing fraction of repeat masked bases in total sequence length for chromosome scaffolds (i.e. scaffolds 1-47) in teal and all remaining small scaffolds (1840 scaffolds) in purple. Median repeat fraction is shown as vertical lines.
Tables
Statistics of S. officinalis assemblies from two independent datasets, assembled using a common pipeline.
| | MPIBR reassembly | | | DToL reassembly | | |
|---|---|---|---|---|---|---|
| version | p_ctg | hap1 | hap2 | p_ctg | hap1 | hap2 |
| number of contigs | 8.289 | 10.651 | 10.425 | 8.783 | 11.026 | 11.089 |
| raw length [bp] | 6.049.669.443 | 5.675.386.986 | 5.662.586.038 | 6.053.996.452 | 5.721.157.269 | 5.950.565.264 |
| N50 length [bp] | 1.723.203 | 1.032.632 | 1.010.375 | 1.810.137 | 1.165.578 | 1.182.649 |
| average contig length [bp] | 729.843 | 532.850 | 543.173 | 689.285 | 518.878 | 536.618 |
Overview of gene annotation of 13 molluscan species used for gene family analysis.
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
|---|---|---|---|---|
| Biological sample (Sepia officinalis) | European cuttlefish, F1 individual | Eggs supplied by Flying Sharks – consultoria e inovação, Lda., Horta, Azores, Portugal | | 6-month-old adult; F1 from eggs collected in the Portuguese Atlantic; used for long-read DNA, Iso-Seq, Omni-C and short-read DNA library preparation |
| Biological sample (Sepia officinalis) | European cuttlefish, F0 individual | Eggs supplied by Université de Caen Normandie, France | | 8-month-old adult; F0 from eggs collected in Normandie, France; used for short-read RNA-seq library preparation |
| Commercial assay or kit | MagAttract HMW DNA Kit | Qiagen | Cat#67563 | Genomic DNA extraction from flash-frozen brain tissue for PacBio HiFi library |
| Commercial assay or kit | NEB Monarch gDNA Purification Kit | New England Biolabs | Cat#T3010S | Genomic DNA extraction from optic lobe; used both for short-read Illumina library preparation and as template for sex-genotyping qPCR |
| Commercial assay or kit | Direct-zol RNA Miniprep Kit | Zymo Research | Cat#R2050 | RNA isolation and DNase I treatment for Iso-Seq libraries |
| Commercial assay or kit | Direct-zol RNA Microprep Kit | Zymo Research | Cat#R2062 | RNA isolation and DNase I treatment for short-read RNA-seq libraries |
| Commercial assay or kit | TeloPrime Full-Length cDNA Amplification Kit V2 | Lexogen | Cat#013.08; Cat#013.24 | Full-length cDNA synthesis targeting 5' cap and poly-A tail for Iso-Seq |
| Commercial assay or kit | SMRTbell express template prep kit 2.0 | PacBio | Cat#100-938-900 | Long-read library preparation for both HiFi DNA and Iso-Seq sequencing |
| Commercial assay or kit | Sequel II binding kit 2.2 | PacBio | Cat#102-089-000 | Used for HiFi DNA sequencing |
| Commercial assay or kit | Sequel II binding kit 2.1 | PacBio | Cat#101-843-000 | Used for Iso-Seq sequencing |
| Commercial assay or kit | Sequel II sequencing kit 2.0 | PacBio | Cat#101-820-200 | Used for PacBio Sequel II runs (HiFi DNA and Iso-Seq) |
| Commercial assay or kit | SMRT Cell 8M tray | PacBio | Cat#101-389-001 | 5 SMRT cells used for HiFi DNA sequencing; 2 SMRT cells for Iso-Seq (pooled tissues + optic lobe) |
| Commercial assay or kit | Dovetail Omni-C Kit | Dovetail Genomics | Cat#21005 | Omni-C proximity ligation library prepared from brain tissue |
| Commercial assay or kit | Illumina DNA PCR-Free Tagmentation Library Prep Kit | Illumina | Cat#20041794 | Short-read DNA library preparation from 500 ng of high-MW gDNA |
| Commercial assay or kit | IDT for Illumina DNA/RNA UD Indexes, Set A | Illumina | Cat#20026121 | Dual indexes for short-read DNA library |
| Commercial assay or kit | Illumina DNA PCR-Free Sequencing and Indexing primer | Illumina | Cat#20041797 | Used during NextSeq2000 P3 sequencing of short-read DNA library |
| Commercial assay or kit | Qubit ssDNA Assay Kit | Thermo Fisher Scientific | Cat#Q10212 | Quantification of dual-indexed single-stranded short-read DNA libraries |
| Commercial assay or kit | Illumina TruSeq Stranded mRNA Library Prep Kit | Illumina | Cat#20020594 | Short-read RNA-seq libraries prepared from 300 ng total RNA |
| Commercial assay or kit | IDT for Illumina xGen UDI-UMI Adapters | Integrated DNA Technologies (IDT) | Cat#10005903 | Adapters used with TruSeq Stranded mRNA library prep |
| Commercial assay or kit | Illumina NextSeq500 mid output flow cell (300 cycles) | Illumina | Cat#20024905 | Used for short-read RNA sequencing |
| Commercial assay or kit | Illumina NextSeq2000 P3 flow cell (300 cycles) | Illumina | Cat#20040561 | Used for short-read RNA and DNA sequencing |
| Commercial assay or kit | KAPA SYBR FAST qPCR Master Mix (2×Universal) | Roche / KAPA Biosystems | Cat#KK4600 | qPCR master mix used for sex-chromosome genotyping qPCR |
| Chemical compound, drug | TRIzol Reagent | Invitrogen / Thermo Fisher Scientific | Cat#15596026 | Homogenization reagent for RNA isolation from flash-frozen tissues (Iso-Seq and short-read RNA-seq) |
| Sequence-based reagent | SepOff_chr2_auto_G2_F (qPCR primer) | Rubino et al., 2025; 10.1101/2025.10.28.685099 | | 5'-TTTGCCACTGTGTCCCTTTATAC-3'; forward primer targeting an autosomal locus on chromosome 2; used in qPCR sex genotyping; synthesized by IDT at 100 nmol DNA-oligo scale with standard desalting |
| Sequence-based reagent | SepOff_chr2_auto_G2_R (qPCR primer) | Rubino et al., 2025; 10.1101/2025.10.28.685099 | | 5'-ACACACACAGGCTGCTTATTG-3'; reverse primer targeting an autosomal locus on chromosome 2; used in qPCR sex genotyping; synthesized by IDT at 100 nmol DNA-oligo scale with standard desalting |
| Sequence-based reagent | SepOff_chr46_sex_H2_F (qPCR primer) | Rubino et al., 2025; 10.1101/2025.10.28.685099 | | 5'-TTTCAACCCATCTGCGTCTATAG-3'; forward primer targeting a sex-chromosomal locus on chromosome 46; used in qPCR sex genotyping; synthesized by IDT at 100 nmol DNA-oligo scale with standard desalting |
| Sequence-based reagent | SepOff_chr46_sex_H2_R (qPCR primer) | Rubino et al., 2025; 10.1101/2025.10.28.685099 | | 5'-ACTCCTCTCGTTGCATGATTAC-3'; reverse primer targeting a sex-chromosomal locus on chromosome 46; used in qPCR sex genotyping; synthesized by IDT at 100 nmol DNA-oligo scale with standard desalting |
| Other | Lambda DNA-HindIII Digest | New England Biolabs | Cat#3012 | Molecular weight ladder; 100 ng loaded alongside gDNA on 0.75% agarose gel to assess DNA integrity |
| Other | Hard-Shell 96-Well PCR Plates | Bio-Rad | Cat#HSP9601 | 96-well qPCR plates used for sex-chromosome genotyping |
| Other | Microseal 'B' PCR Plate Sealing Film | Bio-Rad | Cat#MSB1001 | Adhesive sealing film used to seal 96-well qPCR plates for sex-chromosome genotyping |
| Software, algorithm | VecScreen | NCBI; https://www.ncbi.nlm.nih.gov/tools/vecscreen/ | RRID:SCR_016577 | Adapter/vector trimming of PacBio HiFi reads prior to assembly |
| Software, algorithm | Meryl | Rhie et al., 2020; 10.1186/s13059-020-02134-9 | | k-mer counting (k=21) for k-mer distribution estimation; bundled with Merqury |
| Software, algorithm | Merfin | Formenti et al., 2022; 10.1038/s41592-022-01445-y | | Provides Meryl wrapper used for k-mer distribution estimation |
| Software, algorithm | GenomeScope 2.0 | Ranallo-Benavidez et al., 2020; 10.1038/s41467-020-14998-3 | RRID:SCR_017014 | Genome size estimation from Illumina short reads and PacBio HiFi data |
| Software, algorithm | hifiasm | Cheng et al., 2021; 10.1038/s41592-020-01056-5 | RRID:SCR_021069 | Primary genome assembly from combined HiFi + Hi-C reads; also used for mitochondrial assembly |
| Software, algorithm | YAHS | Zhou et al., 2023; 10.1093/bioinformatics/btac808 | RRID:SCR_022965 | Hi-C scaffolding on phased haplotype 1 with custom -r/-R/-q/--telo-motif parameters |
| Software, algorithm | JBAT (Juicebox Assembly Tools) | Dudchenko et al., 2018; 10.1101/254797 | | Manual curation of scaffolds into chromosome-scale scaffolds |
| Software, algorithm | BUSCO v5.5.0 | Simão et al., 2015; 10.1093/bioinformatics/btv351 | RRID:SCR_015008 | Assembly and annotation completeness assessment using metazoa_odb10, metazoa_odb12, mollusca_odb10 and mollusca_odb12 lineages |
| Software, algorithm | minimap2 | Li, 2018; 10.1093/bioinformatics/bty191 | RRID:SCR_018550 | Used for: aligning mt genome reference NC_007895.1 to long reads; aligning short and long RNA reads to genome; aligning HiFi reads to scaffolded assemblies for coverage |
| Software, algorithm | seqtk | Li, 2013 | RRID:SCR_018927 | seqtk subseq used to extract reads matching mt genome reference for mitochondrial assembly |
| Software, algorithm | RepeatMasker v4.1.7-p1 | Smit et al., 2025; http://www.repeatmasker.org | RRID:SCR_012954 | Soft-masking of repetitive elements (-xsmall, -gff); also used to characterize repeat content at scaffold junctions; run with rmblast v2.14.1+ |
| Software, algorithm | RepeatModeler v2.0.6 | Flynn et al., 2020; 10.1073/pnas.1921046117 | RRID:SCR_015027 | De novo repeat library construction (without LTRstruct option) |
| Software, algorithm | BRAKER3 (incl. TSEBRA) | Simão et al., 2015; Hoff et al., 2019; Brůna et al., 2021; Gabriel et al., 2021; Hoff et al., 2016; Stanke et al., 2006; Stanke et al., 2008; Li, 2023; Iwata and Gotoh, 2012; Gotoh, 2008; Buchfink et al., 2015; Kovaka et al., 2019; Huang and Li, 2023; Pertea and Pertea, 2020; Gabriel et al., 2024; 10.1007/978-1-4939-9173-0_5 | RRID:SCR_018964 | Gene model prediction via Docker container on softmasked genome; used both RNA-seq (--bam) and protein (--prot_seq) input; UTRs added with --addUTR=on; TSEBRA tuned to maximize BUSCO completeness on metazoa_odb10 |
| Software, algorithm | StringTie v3.0.0 | Shumate et al., 2022; 10.1371/journal.pcbi.1009730 | RRID:SCR_016323 | Transcript model prediction with --conservative and --mix options; GTFs merged with transcript merge mode |
| Software, algorithm | TransDecoder v5.7.0 | Haas, 2026; https://github.com/TransDecoder/TransDecoder | RRID:SCR_017647 | Translation of coding regions in transcripts (default parameters) |
| Software, algorithm | OMArk v0.3.0 | Nevers et al., 2022; 10.1101/2022.11.25.517970 | | Annotation completeness assessment; ancestral clade Lophotrochozoa; run on webserver without splice information |
| Software, algorithm | InterProScan v5.73–104 | Blum et al., 2025; 10.1093/nar/gkae1082 | RRID:SCR_005829 | Protein orthology and GO annotation with options -iprlookup -goterms |
| Software, algorithm | eggNOG-mapper v2.1.12 | Cantalapiedra et al., 2021; 10.1093/molbev/msab293 | RRID:SCR_021165 | Functional/orthology annotation via webserver with eggNOG v5.0 database, default parameters |
| Software, algorithm | Winnowmap2 | Jain et al., 2022; 10.1038/s41592-022-01457-8 | RRID:SCR_025349 | Whole-genome pairwise alignments of S. officinalis and A. esculentum (GCA_964036315.1) assemblies |
| Software, algorithm | R v4.4.2 | R Development Core Team, 2024 | RRID:SCR_001905 | Statistical environment for downstream analyses and visualization (whole-genome alignment plots and other custom scripts) |
| Software, algorithm | GENESPACE v1.2.3 | Lovell et al., 2022; 10.7554/eLife.78526 | | Pairwise synteny analysis across all chromosomes of compared species with default parameters; riparian plots and pairwise dotplots |
| Software, algorithm | DIAMOND2 | Buchfink et al., 2015; 10.1038/nmeth.3176 | RRID:SCR_016071 | Protein sequence similarity in fast mode within GENESPACE |
| Software, algorithm | OrthoFinder v2.5 | Emms and Kelly, 2019; 10.1186/s13059-019-1832-y | RRID:SCR_017118 | Orthogroup and pairwise orthologue inference with hierarchical orthogroups (HOGs); used within GENESPACE |
| Software, algorithm | OrthoFinder v3.1.0 | Emms et al., 2025; 10.1101/2025.07.15.664860 | RRID:SCR_017118 | Orthogroup inference across 13 molluscan species for gene family expansion analysis; default parameters; rooted species tree generated via STAG (Ponte et al., 2023) and STRIDE (Andrews et al., 2013) |
| Software, algorithm | MCScanX | Wang et al., 2012; 10.1093/nar/gkr1293 | RRID:SCR_022067 | Pairwise syntenic block identification (onlyOgAnchors = TRUE, blkSize = 5, nGaps = 5, blkRadius = 25, synBuff = 100, nSecondaryHits = 0) |
| Software, algorithm | dbscan (R package) | Hahsler et al., 2019; 10.18637/jss.v091.i01 | | Density-based clustering of MCScanX anchor hits into syntenic regions |
| Software, algorithm | bwa-mem2 v2.3 | Vasimuddin et al., 2019; 10.1109/IPDPS.2019.00041 | RRID:SCR_022192 | Alignment of Hi-C reads for breakpoint coverage analysis |
| Software, algorithm | pairtools v1.1.0 | Abdennur et al., 2023; 10.1101/2023.02.13.528389 | RRID:SCR_023038 | Quantification of Hi-C contacts; extraction of trans pairs from deduplicated read pairs (pair type UU) |
| Software, algorithm | pysam v0.22.1 | pysam-developers, 2026; https://github.com/pysam-developers/pysam | RRID:SCR_021017 | HiFi read depth via count_coverage (MAPQ ≥10, 1 kb bins); spanning reads identified by querying split alignments |
| Software, algorithm | STAR v2.7.11b | Dobin et al., 2013; 10.1093/bioinformatics/bts635 | RRID:SCR_004463 | Alignment of short reads to chromosome-scale assembly (sex chromosome analysis and RNA-seq) |
| Software, algorithm | mosdepth | Pedersen and Quinlan, 2018; 10.1093/bioinformatics/btx699 | RRID:SCR_018929 | Sequencing coverage calculation for sex chromosome analysis |
| Software, algorithm | ape v5.8.1 (R package) | Paradis and Schliep, 2019; 10.1093/bioinformatics/bty633 | RRID:SCR_017343 | Conversion of rooted OrthoFinder species tree to ultrametric tree |
| Software, algorithm | CAFE5 v5.1.1 | Mendes et al., 2021; 10.1093/bioinformatics/btaa1022 | RRID:SCR_018924 | Gene family evolution rate estimation |
| Software, algorithm | bedtools v2.30 | Quinlan and Hall, 2010; 10.1093/bioinformatics/btq033 | RRID:SCR_006646 | bedtools intersect for CDS–RepeatMasker overlap analysis of expanded gene family members |
| Software, algorithm | featureCounts (Subread v2.0.8) | Liao et al., 2014; 10.1093/bioinformatics/btt656 | RRID:SCR_012919 | Gene-level read counting from STAR-aligned RNA-seq BAMs (-t exon, -g gene_id, -p --countReadPairs, -Q 255) |
| Software, algorithm | DESeq2 v1.42.0 | Love et al., 2014; 10.1186/s13059-014-0550-8 | RRID:SCR_015687 | Tissue marker identification in bulk RNA-seq data |
| Software, algorithm | apeglm | Zhu et al., 2019; 10.1093/bioinformatics/bty895 | RRID:SCR_026951 | log2 fold-change shrinkage applied to DESeq2 results |
| Software, algorithm | clusterProfiler v4.12.6 | Yu et al., 2012; 10.1089/omi.2011.0118 | RRID:SCR_016884 | GO enrichment via enricher() with custom GO annotations from InterProScan |
| | soff250801_mpibr | | | soff250801_dtol | | |
|---|---|---|---|---|---|---|
| contigs | p_ctg | hap1 | hap2 | p_ctg | hap1 | hap2 |
| records | 8.289 | 10.651 | 10.425 | 8.783 | 11.026 | 11.089 |
| length.raw | 6.049.669.443 | 5.675.386.986 | 5.662.586.038 | 6.053.996.452 | 5.721.157.269 | 5.950.565.264 |
| length._min | 2.496 | 2.959 | 2.959 | 5.151 | 5.999 | 6.915 |
| length.n25 | 825.171 | 518.304 | 518.787 | 865.207 | 560.043 | 580.263 |
| length.n50 | 1.723.203 | 1.032.632 | 1.010.375 | 1.810.137 | 1.165.578 | 1.182.649 |
| length.n75 | 3.129.788 | 1.845.042 | 1.789.755 | 3.317.923 | 2.125.467 | 2.203.556 |
| length.max | 14.924.420 | 7.470.229 | 10.284.785 | 15.032.877 | 9.223.473 | 11.008.627 |
| length.med | 309.429 | 293.763 | 314.184 | 245.676 | 239.367 | 250.010 |
| length.avg | 729.843 | 532.850 | 543.173 | 689.285 | 518.878 | 536.618 |
| length.top46 | 339.372.830 | 213.767.346 | 225.912.626 | 364.533.811 | 278.738.760 | 283.344.104 |
| frac.top46 | 5,609774769 | 3,766568633 | 3,98956633 | 6,021374705 | 4,872069529 | 4,761633415 |
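The table above writes integers with a dot as thousands separator and fractions with a decimal comma. The Python sketch below shows how such values could be parsed and how two of the statistics relate to the raw numbers. It assumes that `length.top46` is the summed length of the 46 longest contigs and that `frac.top46` is that sum as a percentage of `length.raw`; this reading is not stated in the table, although the arithmetic agrees with the listed values. All function names are hypothetical.

```python
def parse_int_eu(text):
    """Parse an integer with '.' thousands separators, e.g. '6.049.669.443'."""
    return int(text.replace(".", "").replace(" ", ""))


def parse_float_eu(text):
    """Parse a number with a decimal comma, e.g. '5,609774769'."""
    return float(text.replace(",", "."))


def n50(lengths):
    """Length of the contig at which the running sum, taken from the
    longest contig downwards, first reaches half of the total length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("lengths must not be empty")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


def percent_of_total(part, total):
    """Express part as a percentage of total."""
    return 100 * part / total


# Values copied from the p_ctg column of soff250801_mpibr.
raw = parse_int_eu("6.049.669.443")
top46 = parse_int_eu("339.372.830")
print(percent_of_total(top46, raw))  # close to the listed 5,609774769

# Toy contig lengths to illustrate the N50 definition.
print(n50([6, 5, 4, 3, 2]))  # 5
```

The same check can be repeated for the other columns to confirm that the listed fractions follow from the lengths.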