(a) Strategy for designing germline-optimized coding sequences. We used an RNA-seq dataset from one-cell embryos (Gerstein et al., 2014; Hillier et al., 2009) as a proxy for germline-expressed mRNAs, because one-cell embryos are not transcriptionally active and thus their entire mRNA content should be derived from the maternal germline. We made a list of all 12-nucleotide coding words and assigned each word a score equal to the sum of the RPKM expression values of the germline mRNAs in which that word appears. Thus, words that appear frequently, or in highly expressed germline genes, will have high scores. Although the germline silencing machinery is presumably blind to reading frame, we considered only in-frame 12mers so that our design algorithm would implicitly account for codon usage. Finally, we developed an algorithm to assemble any desired coding sequence from our list of coding 12mers. The algorithm is as follows: (1) Based on the desired amino acid sequence, assemble a list of all 12mers that could appear in the coding sequence, and sort these words by score; (2) Assemble a draft coding sequence by plugging in one word at a time, beginning with the highest-scoring possible word and continuing until the sequence is complete; (3) Compute a score for the entire draft sequence by averaging the scores of the words it contains; (4) Refine the sequence by randomly choosing one word at a time, changing it to a different word, and checking whether the overall sequence score improves. Repeat step (4) until no further improvements are found over a set number of iterations, a number chosen based on the sequence length such that each residue has a 99% chance of being tested at least once. The random optimization in step (4) is necessary because step (2) favors the highest-scoring words without considering context. Choosing a high-scoring word at one position constrains the subsequent choice of words that overlap it.
In some cases, a higher overall score results from choosing two moderately scoring words, rather than one high-scoring and one low-scoring word, when the two words overlap. Random optimization ensures that these cases are found and maximizes the overall score of the designed sequence. (b) Results of applying our germline optimization algorithm to a selected set of fluorescent protein tags. Codon adaptation index (Redemann et al., 2011) and our germline optimization score are shown for coding sequences designed to maximize optimal codon usage (CAI = 1 indicates that the sequence was designed to have a codon adaptation index as close as possible to 1.00) or to maximize the germline optimization score (GLO, GermLine Optimized). Germline optimization significantly increases the germline optimization score, as expected, at the cost of a moderate reduction in codon adaptation index. Despite the lower codon adaptation index, germline-optimized transgenes were usually expressed at higher levels than their codon-optimized counterparts (see above and our unpublished observations). Constructs containing the listed germline-optimized fluorescent protein sequences will be deposited at Addgene.
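
The design procedure described in panel (a) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`score_words`, `sequence_score`, `patience`, `design`) are hypothetical. The sketch assumes the standard genetic code, counts each mRNA's RPKM once per distinct word, fills any codon left unset after the greedy pass with an arbitrary synonymous codon, and derives the stopping rule by assuming that words are picked uniformly at random.

```python
import math
import random
from collections import defaultdict
from itertools import product

K = 4  # codons per word (12 nucleotides)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
SYNONYMS = defaultdict(list)  # amino acid -> synonymous codons (standard code)
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO):
    SYNONYMS[aa].append(codon)

def score_words(transcripts):
    """transcripts: iterable of (coding_sequence, rpkm). Returns word -> score."""
    scores = defaultdict(float)
    for cds, rpkm in transcripts:
        # In-frame 12-mers only: windows start at codon boundaries.
        words = {cds[i:i + 3 * K] for i in range(0, len(cds) - 3 * K + 1, 3)}
        for w in words:  # assumption: each mRNA counts once per distinct word
            scores[w] += rpkm
    return scores

def sequence_score(codons, scores):
    """Step 3: mean score of the in-frame 12-mers in a list of codons."""
    words = ["".join(codons[i:i + K]) for i in range(len(codons) - K + 1)]
    return sum(scores.get(w, 0.0) for w in words) / len(words)

def patience(n_words, p=0.99):
    """Number of uniform random draws after which a given word has been
    picked at least once with probability p (an assumed stopping rule)."""
    if n_words <= 1:
        return 1
    return math.ceil(math.log(1 - p) / math.log(1 - 1 / n_words))

def design(protein, scores, rng=random):
    """protein: one-letter amino acid string with at least K residues."""
    n = len(protein)
    # Step 1: every scored word that could occur at each position, best first.
    candidates = []
    for i in range(n - K + 1):
        for combo in product(*(SYNONYMS[aa] for aa in protein[i:i + K])):
            s = scores.get("".join(combo), 0.0)
            if s > 0:
                candidates.append((s, i, combo))
    candidates.sort(reverse=True)
    # Step 2: greedily place words that agree with codons already fixed.
    codons = [None] * n
    for s, i, combo in candidates:
        if all(codons[i + j] in (None, c) for j, c in enumerate(combo)):
            codons[i:i + K] = combo
    # Assumption: unset positions get an arbitrary synonymous codon.
    codons = [c or SYNONYMS[aa][0] for c, aa in zip(codons, protein)]
    # Steps 3 and 4: random refinement, stopping after `limit` tries in a row
    # that fail to improve the score.
    best = sequence_score(codons, scores)
    limit, stale = patience(n - K + 1), 0
    while stale < limit:
        i = rng.randrange(n - K + 1)
        trial = codons[:]
        trial[i:i + K] = [rng.choice(SYNONYMS[aa]) for aa in protein[i:i + K]]
        s = sequence_score(trial, scores)
        if s > best:
            codons, best, stale = trial, s, 0
        else:
            stale += 1
    return "".join(codons), best
```

Here `design` returns the designed nucleotide sequence together with its mean word score. Recomputing the whole-sequence score on every trial keeps the sketch short; a practical version would probably rescore only the windows that overlap the changed word.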