Information on each culture sequenced in this study, divided into sections by topic. See Materials and methods for details on each topic. Note that we sequenced and assembled two strains of Stephanoeca diplocostata. We determined their protein catalogs to be equivalent for the purposes of gene family construction, and so we used only one of the two cultures to represent the species (7.2, ATCC 50456, isolated from France). Strain information: information about the cultures used, including the American Type Culture Collection (ATCC) number for each culture and the NCBI Taxonomy ID for each species. Previous names for cultures are names formerly used as labels in culture collections or publications that were subsequently determined to have been incorrectly applied [some of these species names are no longer valid (Codosiga gracilis, Acanthoecopsis unguiculata, Diplotheca costata, Savillea micropora, Monosiga gracilis and Monosiga ovata), whereas others are still valid at the time of publication, but their descriptions did not match the cultures (Salpingoeca amphoridium, Salpingoeca minuta, Salpingoeca pyxidium, Salpingoeca gracilis and Salpingoeca napiformis)]. Previous names for species were never applied to the cultures used; they are invalid names that were formerly used for the species and have since been replaced. Origin of cultures: source, isolation location and year, if known. Growth media, antibiotic treatments and cell types: the sequence of antibiotic treatments used to obtain a culture for high-volume growth, the growth medium and temperature, and the estimated proportion of each cell type in sequenced cultures. Culture conditions in large batches for harvesting: information on how each culture was grown and harvested for mRNA sequencing.
Amount of total RNA used for each sample and groupings for preparation and sequencing: the amount of total RNA used for each culture, based on the estimated proportion of choanoflagellate RNA versus bacterial RNA in each total RNA extraction, with the goal of beginning each sequencing preparation with 2 µg of choanoflagellate total RNA. Library prep. group and Seq. flow cell group are arbitrary labels that indicate which sets of samples were prepared for sequencing at the same time or sequenced on the same flow cell. Read counts and quality trimming: results of sequencing before and after quality trimming. Read error correction: parameters for error correction by Reptile, adapted to each set of reads independently. Assembly read counts and N50s: the counts of contigs and predicted proteins at each step of the assembly process. N50 is the largest length such that at least 50% of the nucleotides/amino acids in the assembly are contained in contigs/proteins greater than or equal to that length. NCBI Short Read Archive: identifiers to retrieve the raw, unprocessed reads for each library at the NCBI Short Read Archive. NCBI Transcriptome Shotgun Assembly (TSA): identifiers to retrieve the unannotated assembled contigs for each species, the counts of contigs entirely excluded or trimmed due to adapter or bacterial sequences identified during the submission process, and the counts of contigs removed because they were below the minimum length of 200 bases (after trimming) imposed by the TSA.
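
The two calculations described in this legend can be illustrated with a minimal sketch. It is not the code used in this study: the function names `n50` and `total_rna_input` are hypothetical, and the RNA calculation assumes that the input amount is simply the 2 µg target divided by the estimated choanoflagellate fraction of the extraction.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or protein lengths.

    Lengths are sorted from longest to shortest and accumulated until
    the running total reaches at least half of the summed length; the
    length at which this happens is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def total_rna_input(choano_fraction, target_ug=2.0):
    """Total RNA (in µg) needed to contain `target_ug` of choanoflagellate RNA.

    Assumption for illustration only: the input scales inversely with the
    estimated choanoflagellate fraction of the total RNA extraction.
    """
    if not 0 < choano_fraction <= 1:
        raise ValueError("choano_fraction must be in the interval (0, 1]")
    return target_ug / choano_fraction
```

For example, `n50([2, 2, 2, 3, 3, 4, 8, 8])` returns 8, because the two longest sequences already account for 16 of the 32 total units, and `total_rna_input(0.5)` returns 4.0, i.e. 4 µg of total RNA for an extraction estimated to be half choanoflagellate RNA.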