Repeated clustering with Gaussian mixture models (GMMs) produces reliable clusters for zebra finch syllable latent features with six clusters, but not for mouse syllable latent features with more than two clusters. Both sets of syllable latent descriptions (zebra finch, 14,270 syllables; mouse, 15,712 syllables) are repeatedly split into thirds. The first and second splits are each used to train a GMM (full covariance, best of five fits, fit via expectation maximization), and both GMMs are then used to predict labels on the third split. Given these predicted labels and a matching of labels between the two GMMs, a syllable is considered consistently labeled if the two GMMs assign it the same label class. The Hungarian method is used to find the label matching that maximizes the portion of consistently labeled syllables. Top row: the portion of consistently labeled syllables for 20 repetitions of this procedure is shown for varying numbers of clusters, k. Note that zebra finch syllables achieve near-perfect consistency for six clusters, the number of clusters found by hand labeling (syllables A–F in Figure 6c), and that this consistency degrades with more clusters. By contrast, the consistency of mouse USV clusters is poor for more than two clusters. Somewhat surprisingly, the mouse USV clustering is very consistent for two clusters (k = 2). To test whether this is a trivial effect of rarely used clusters, we calculated the entropy of the empirical label distributions (bottom row). We find in each case, and specifically in the k = 2 case, that the empirical distribution entropy is close to the maximum possible entropy (plotted as dashed horizontal lines), indicating that all clusters are frequently used. While consistent with mouse USVs forming two clusters, this result is not sufficient proof of syllable clustering or even bimodality. Yet, for cluster identity to be a practical syllable descriptor, clusters should be readily identifiable from the data.
Thus, the combination of VAE-identified latent features and Gaussian clusters does not appear to be a suitable description of mouse USVs for k > 2 clusters. Note that the portion of consistently labeled zebra finch syllables for k < 6 clusters forms well-defined bands at multiples of 1/6, indicating that the cluster structure is so well defined by the data that the GMMs do not split individual clusters across multiple Gaussian components.
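One repetition of the consistency procedure described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn's `GaussianMixture` and SciPy's `linear_sum_assignment` as stand-ins for the GMM fitting and the Hungarian method, and the function name `consistency_and_entropy`, the synthetic `latents` array, the use of the first GMM's labels for the entropy, and the choice of bits as the entropy unit are all hypothetical choices made for the example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.mixture import GaussianMixture


def consistency_and_entropy(latents, k, rng):
    """One repetition: (portion consistently labeled, label entropy in bits)."""
    # Shuffle the syllables and split them into thirds.
    idx = rng.permutation(len(latents))
    split_a, split_b, split_c = np.array_split(idx, 3)

    # Fit one GMM per training split (full covariance, best of five EM fits)
    # and predict labels on the held-out third split.
    labels = []
    for split in (split_a, split_b):
        gmm = GaussianMixture(
            n_components=k,
            covariance_type="full",
            n_init=5,
            random_state=int(rng.integers(2**31 - 1)),
        )
        gmm.fit(latents[split])
        labels.append(gmm.predict(latents[split_c]))

    # counts[i, j] = held-out syllables labeled i by GMM 1 and j by GMM 2.
    counts = np.zeros((k, k), dtype=int)
    np.add.at(counts, (labels[0], labels[1]), 1)

    # One-to-one label matching that maximizes the number of agreements.
    rows, cols = linear_sum_assignment(counts, maximize=True)
    consistency = counts[rows, cols].sum() / len(split_c)

    # Entropy of the empirical label distribution (assumption: labels from
    # the first GMM, measured in bits); the maximum possible value is log2(k).
    p = np.bincount(labels[0], minlength=k) / len(split_c)
    p = p[p > 0]
    entropy = -(p * np.log2(p)).sum()
    return consistency, entropy


# Synthetic stand-in for syllable latent features: three well-separated blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
latents = np.concatenate([c + rng.normal(size=(200, 2)) for c in centers])

consistency, entropy = consistency_and_entropy(latents, k=3, rng=rng)
print(f"consistency = {consistency:.3f}")
print(f"entropy = {entropy:.3f} bits (maximum {np.log2(3):.3f})")
```

Repeating the call 20 times for each value of k and plotting the two returned quantities would give the kind of summary shown in the top and bottom rows. For well-separated data such as this toy example, the consistency should sit near 1 and the entropy near log2(k).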