Figures and data

Overview and quality assessment of the ProteinConformers and ProteinConformers-lite datasets.
(A) Schematic conformational energy landscape of T1035, with representative conformers positioned in distinct minima, illustrating the breadth of physically realistic states captured by ProteinConformers. (B) Sequence-length distribution of ProteinConformers. (C) Conformer counts per protein, sorted by sequence length; points are colored by the log10 of the number of near-native (TM-score ⩾ 0.5) conformations, with representative structures shown as insets. (D) TM-score coverage across proteins for ProteinConformers-lite and the sampled ATLAS dataset, binned into 32 equal-width intervals from non-native to near-native. (E) Comparison of the φ, ψ, and ω dihedral-angle distributions and the C–N bond-length distribution between ProteinConformers-lite and the Top2018 dataset, quantified by Jensen–Shannon divergence (JS), Earth Mover’s Distance (EMD), and Pearson correlation coefficient (r), demonstrating consistent local stereochemical quality. (F) Ramachandran outlier rates averaged within 32 equal-width TM-score bins. The mean outlier rate of the Top2018 dataset (13%, green dashed line) and the TM-score threshold of 0.5 separating non-native and near-native states are shown, with a cubic fit (blue curve) summarizing the trend.
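
The three agreement measures named in panel (E) can be illustrated on two samples of a single quantity, such as a dihedral angle in degrees. The following is a minimal sketch rather than the pipeline used for the figure: the function name `compare_distributions`, the 72-bin grid, and the (-180, 180) range are assumptions chosen for illustration. The EMD here is the plain one-dimensional form on a linear axis, so it does not account for the periodicity of angles.

```python
import numpy as np

def compare_distributions(a, b, bins=72, value_range=(-180.0, 180.0)):
    """Compare two 1-D samples through histograms on a shared grid.

    Returns (JS divergence in bits, EMD in the units of the data, Pearson r).
    Assumes each sample has at least one value inside value_range.
    """
    p, edges = np.histogram(a, bins=bins, range=value_range)
    q, _ = np.histogram(b, bins=edges)
    p = p / p.sum()  # normalize counts to probabilities
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture distribution used by JS divergence

    def kl(x, y):
        # Kullback-Leibler divergence, skipping empty bins of x
        mask = x > 0
        return float(np.sum(x[mask] * np.log2(x[mask] / y[mask])))

    js = 0.5 * kl(p, m) + 0.5 * kl(q, m)

    # 1-D EMD on equal-width bins: area between the two cumulative curves
    width = edges[1] - edges[0]
    emd = float(np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * width)

    # Pearson correlation between the two binned profiles
    r = float(np.corrcoef(p, q)[0, 1])
    return js, emd, r
```

With a base-2 logarithm the JS divergence lies between 0 (identical histograms) and 1 (no overlapping bins), which makes values comparable across the angle and bond-length panels.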

Interactive web interface of the ProteinConformers dataset.
(A) Main overview table displaying all 734 proteins in the dataset, with sortable columns for structural metadata and CASP-related annotations. (B) Dashboard view for a selected target (e.g. T0819), showing an interactive 3D alignment of the native structure (green) with a selected decoy (orange), along with basic protein metadata and secondary structure composition. (C) Visualization of the native structure’s distance map (top) and orientation map (bottom), supporting global structural comparison. (D) Detailed table listing decoy models associated with the selected target, including secondary structure content, similarity scores, and energetic profiles. (E) Interface for selecting and filtering decoys by structural or energetic criteria. Selected decoys can be downloaded as a customized dataset. (F) Download panel summarizing metadata for the selected protein, including file sizes and sequence length, and providing options for downloading either the native structure or the full conformer set.

Geometric features used for CGM.
For Ω, Θ, and Φ, a pseudo Cβ is used for glycine.
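
The caption does not say how the pseudo Cβ is placed. One widely used construction, found in trRosetta-style feature pipelines, builds it from the backbone N, Cα, and C coordinates using fixed ideal-geometry coefficients. The sketch below assumes that construction; the function name `pseudo_cb` is hypothetical, and the source may use a different placement.

```python
import numpy as np

def pseudo_cb(n, ca, c):
    """Place a virtual C-beta from backbone N, CA, C coordinates.

    Inputs are arrays of shape (..., 3) in angstroms. The coefficients are
    the ideal-geometry constants commonly used for this purpose; their use
    here is an assumption, not something stated in the source.
    """
    n, ca, c = (np.asarray(x, dtype=float) for x in (n, ca, c))
    b = ca - n             # N -> CA bond vector
    cc = c - ca            # CA -> C bond vector
    a = np.cross(b, cc)    # normal to the N-CA-C plane
    return -0.58273431 * a + 0.56802827 * b - 0.54067466 * cc + ca

# Example with idealized backbone geometry (CA at the origin)
n_atom = np.array([1.458, 0.0, 0.0])
ca_atom = np.array([0.0, 0.0, 0.0])
c_atom = np.array([-0.54651, 1.42371, 0.0])
cb_atom = pseudo_cb(n_atom, ca_atom, c_atom)
print(np.linalg.norm(cb_atom - ca_atom))  # close to a typical CA-CB bond length
```

Because the construction depends only on bond vectors, the same call can be applied to every residue, so glycine needs no special handling downstream once its virtual atom is in place.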