(A) Example of a Multiple-Sequence Alignment (MSA), here of the WW domain (PF00397). Each column corresponds to a site on the protein, and each line to a different sequence in the family. The color code for amino acids is as follows: red = negative charge (E, D), blue = positive charge (H, K, R), purple = uncharged polar (hydrophilic) (N, T, S, Q), yellow = aromatic (F, W, Y), black = aliphatic hydrophobic (I, L, M, V), green = cysteine (C), grey = other, small amino acids (A, G, P). (B) In a Restricted Boltzmann Machine (RBM), weights connect the visible layer (carrying protein sequences v) to the hidden layer (carrying representations h). Biases on the visible and hidden units are introduced by local potentials acting on each unit. Owing to the bipartite nature of the weight graph, hidden units are conditionally independent given a visible configuration, and vice versa. (C) Sequences in the MSA (dots in sequence space, left) code for proteins with different phenotypes (dot colors). The RBM defines a probabilistic mapping from sequences onto the representation space (right), which is indicative of the phenotype of the corresponding protein and is encoded in the conditional distribution P(h|v), Equation (3) (black arrow). The reverse mapping from representations to sequences is P(v|h), Equation (4) (black arrow). In turn, sampling a subspace of the representation space (colored domains) defines a complex subset of the sequence space, and allows the design of sequences with putative phenotypic properties that are either found in the MSA (green circled dots) or not encountered in Nature (arrow out of blue domain). (D) Three examples of potentials defining the hidden-unit type in an RBM (see Equation (1) and panel (B)): quadratic (black) and double Rectified Linear Unit (dReLU; dReLU1, green, and dReLU2, purple) potentials. In practice, the parameters of the hidden-unit potentials are fixed through learning from the sequence data.
(E) Average activity of a hidden unit, calculated from Equation (3), as a function of its input, defined in Equation (2). The three curves correspond to the three choices of potentials in panel (D). For the quadratic potential (black), the average activity is a linear function of the input. For dReLU1 (green), small inputs barely activate the hidden unit, whereas dReLU2 (purple) essentially binarizes the inputs.
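The qualitative behavior described in panel (E) can be checked numerically. The sketch below is illustrative only and rests on assumptions not stated in this caption: that the conditional density of a hidden unit given its input I is proportional to exp(-U(h) + h I), that the quadratic potential has the form U(h) = (gamma/2) h^2 + theta h, and that the dReLU potential applies separate quadratic and linear terms to the positive and negative parts of h. All function names and parameter values are hypothetical choices, not the ones used in the figure.

```python
import numpy as np


def quadratic_potential(h, gamma=0.2, theta=0.0):
    """Assumed quadratic form: U(h) = gamma/2 * h^2 + theta * h."""
    return 0.5 * gamma * h**2 + theta * h


def drelu_potential(h, gamma_plus, gamma_minus, theta_plus, theta_minus):
    """Assumed dReLU form: separate quadratic and linear terms for the
    positive part and the negative part of h."""
    h_plus = np.maximum(h, 0.0)
    h_minus = np.minimum(h, 0.0)
    return (0.5 * gamma_plus * h_plus**2 + theta_plus * h_plus
            + 0.5 * gamma_minus * h_minus**2 + theta_minus * h_minus)


def average_activity(potential, inputs, h_grid):
    """Mean of h under a density proportional to exp(-U(h) + h * I),
    approximated by sums over a uniform grid, for each input I."""
    inputs = np.atleast_1d(np.asarray(inputs, dtype=float))
    # Log-weights, shape (n_inputs, n_grid).
    log_w = np.outer(inputs, h_grid) - potential(h_grid)[None, :]
    # Subtract the row-wise maximum for numerical stability.
    log_w -= log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w)
    return (w * h_grid).sum(axis=1) / w.sum(axis=1)


# Illustrative (hypothetical) grid, inputs, and parameter values.
h_grid = np.linspace(-60.0, 60.0, 24001)
inputs = np.linspace(-3.0, 3.0, 13)

quad_avg = average_activity(quadratic_potential, inputs, h_grid)
drelu_avg = average_activity(
    lambda h: drelu_potential(h, 0.1, 0.1, 1.0, -1.0), inputs, h_grid
)
```

Under these assumptions, the quadratic potential gives an average activity of (I - theta) / gamma, which is linear in the input. The symmetric dReLU choice above gives a smaller response than the quadratic one for small inputs.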