Tools and Resources

Computational and Systems Biology

Martinize2 and Vermouth provide a unified framework for molecular topology generation

Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Netherlands
Heidelberg Institute for Theoretical Studies (HITS), Germany
Interdisciplinary Center for Scientific Computing, Heidelberg University, Germany
CiTIUS Intelligent Technologies Research Centre, Spain
Laboratoire de Biologie et Modélisation de la Cellule, CNRS, France
Centre Blaise Pascal de Simulation et de Modélisation Numérique, Ecole Normale Supérieure de Lyon, France

Nov 20, 2025

https://doi.org/10.7554/eLife.90627.4

Open access

11 figures, 3 tables and 1 additional file

Figures

Figure 1

Fundamental stages in topology generation from atomistic structures.

First, the provided input is parsed (step 1). Second, for every parsed residue, its atoms are identified and, if needed, atom names are corrected and missing atoms are added (step 2). Third, mappings are taken from the library and a resolution transformation to the required output resolution is performed (step 3). Fourth, intra-residue interactions are added from blocks taken from the library, and inter-residue interactions are added from links taken from the library (step 4). Fifth, optional post-processing is performed, for example to add an elastic network (EN) (step 5). Finally, the produced topology is written to output files (step 6).
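
The six stages can be pictured as an ordered chain of functions that each take and return the same data object. The sketch below is only an illustration of that ordering: `Topology`, `make_stage`, and `run_pipeline` are hypothetical names, the stages are placeholders that merely record that they ran, and none of this is Vermouth's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Topology:
    """Hypothetical stand-in for the data passed between stages."""
    steps_done: List[str] = field(default_factory=list)


def make_stage(name: str) -> Callable[[Topology], Topology]:
    """Build a placeholder stage that only records that it ran."""
    def stage(topology: Topology) -> Topology:
        topology.steps_done.append(name)
        return topology
    return stage


def run_pipeline(topology: Topology, elastic_network: bool = False) -> Topology:
    """Apply the stages in order; step 5 is optional."""
    stages = [
        make_stage("1 parse input"),
        make_stage("2 identify and repair atoms"),
        make_stage("3 map to output resolution"),
        make_stage("4 add blocks and links"),
    ]
    if elastic_network:
        stages.append(make_stage("5 post-process (elastic network)"))
    stages.append(make_stage("6 write output"))
    for stage in stages:
        topology = stage(topology)
    return topology


result = run_pipeline(Topology(), elastic_network=True)
print(result.steps_done)
```

Keeping every stage behind the same call signature is what makes step 5 easy to switch on or off without touching the other stages.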

Figure 2

Organization of the Vermouth library.

The Vermouth library defines five types of data structures (blue) to store molecular information and force field information. For convenience, it also defines two collection classes (orange) composed of several data-structure instances. Data structures are initiated by, or receive input from, parsers, which read six types of data files (see Appendix 1—table 1 for more details on file types). The central data structures are **Molecule** and **System**. These are changed, updated, or transformed by so-called **Processor** classes, which take force field data as input. Parsers, data structures, and **Processors** depend on only three Python libraries, as shown. At the moment, Vermouth also exposes four types of writers (not shown here) to go along with the parsers (see Appendix 1—table 2).

Figure 3

Illustration of atom recognition, mapping, and linking in topology generation.

(a) To identify all atoms in the input molecule (black and orange) every residue in the molecule is overlaid with its canonical reference (blue and green). Atoms are recognized when they overlap with atoms in the reference (green). Atoms not present in the molecule are also identified (blue) and will be added later. Finally, atoms in the molecule not described by the canonical references are also labeled (orange) so that they may be identified later. (b) Identifying the terminal atoms that are not part of the canonical residues. The modification templates are depicted in blue and the atoms they match in orange. The cysteine does not participate since it does not carry any unexpected atoms and is depicted in gray for clarity. (c) Mappings (blue, orange, and green) describe a molecular fragment at two different resolutions and a correspondence between their particles. The correspondence is depicted approximately here. The mappings are applied to the molecule (black). (d) Example of applying a Link. The link depicted (dark blue) adds an angle potential over CG backbone beads.
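
The three atom categories in panel (a) can be illustrated with a much simpler stand-in for the overlay. The sketch below compares atom names only, which assumes the names in the input are already correct; the procedure in the caption overlays each residue on its reference and so does not rely on that. The function `classify_atoms` and the glycine-like atom lists are hypothetical.

```python
def classify_atoms(residue_atoms, reference_atoms):
    """Sort atoms into the three categories of panel (a), by name only."""
    found = set(residue_atoms)
    expected = set(reference_atoms)
    return {
        "recognized": sorted(found & expected),   # green in the figure
        "missing": sorted(expected - found),      # blue: to be added later
        "unexpected": sorted(found - expected),   # orange: labeled for later
    }


# Hypothetical backbone heavy atoms of a reference residue.
reference = ["N", "CA", "C", "O"]
# Hypothetical input residue: O is absent and a terminal OXT is present.
residue = ["N", "CA", "C", "OXT"]

result = classify_atoms(residue, reference)
print(result)
```

In this toy input, `OXT` ends up in the unexpected category, which is the kind of atom that panel (b) then matches against a modification template.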

Figure 4

Workflows for identifying protonation states or PTMs exemplified on protonated histidine.

In route (a), the residue name of the protonated histidine extracted from the atomistic coordinates matches the residue name in the library, and its atoms match the library fragment. Hence, the protonation state is correctly picked up. In route (b), the residue name matches that of neutral histidine in the library. A mismatch of the fragments is recognized, and the extra hydrogen is labeled. Subsequently, by matching the extra hydrogen to a modification of the histidine block, the protonated histidine is recognized as neutral histidine plus a protonation modification, and the correct parameters for protonated histidine are generated.
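
The two routes can be sketched as a lookup followed by a fallback. Everything in the sketch below is an assumption made for illustration: the residue names `HIS` and `HIP`, the reduction of each block to its ring hydrogens, and the `BLOCKS` and `MODIFICATIONS` dictionaries are hypothetical and do not reproduce the library's file contents.

```python
# Hypothetical blocks, reduced to ring hydrogens (heavy atoms omitted).
BLOCKS = {
    "HIS": frozenset({"HD1"}),         # neutral histidine
    "HIP": frozenset({"HD1", "HE2"}),  # protonated histidine
}

# Hypothetical modifications: (block name, extra atoms) -> label.
MODIFICATIONS = {
    ("HIS", frozenset({"HE2"})): "protonation",
}


def identify(resname, atoms):
    """Return (block name, modification or None) for one residue."""
    atoms = frozenset(atoms)
    block = BLOCKS[resname]
    if atoms == block:
        # Route (a): name and fragment both match the library.
        return resname, None
    # Route (b): label the atoms the block does not describe,
    # then look for a modification that accounts for them.
    extra = atoms - block
    modification = MODIFICATIONS.get((resname, extra))
    if modification is None:
        raise ValueError(f"cannot explain atoms {sorted(extra)} in {resname}")
    return resname, modification


print(identify("HIP", ["HD1", "HE2"]))  # route (a)
print(identify("HIS", ["HD1", "HE2"]))  # route (b)
```

Both calls describe the same chemical species; they differ only in whether the input file already names the protonated form.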

Figure 5

Example of automated identification of PTMs.

CG Martini model of the phosphorylated tyrosine found in the EGFR kinase activation loop. The mapped structure of the phosphorylated residue is shown as beads overlaid on the atomistic structure.

Figure 6

Fine-tuning options for the elastic network.

(a) EN and backbone bonds within the human insulin dimer when generated with the molecule or all option. The dimer consists of two chains colored in red and orange, which are connected by two disulfide bridges shown in purple. EN bonds are generated both between the two chains and within each chain. (b) EN and backbone bonds within the insulin dimer when generated with the chain option. In this case, no elastic bonds are generated between the two chains; they are connected only by the disulfide bridges and non-bonded interactions. (c) EN within the FtsZ protein when generated for both the intrinsically disordered tail domains (orange) and the structural domain (red). (d) EN within the FtsZ protein when the EN is generated only within the structural domain, by defining the EN unit as spanning residues 12–320.
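
The effect of the three options can be sketched as filters applied before a distance test. The sketch below is a simplification under stated assumptions: it uses a single upper distance cutoff, does not exclude pairs that are already bonded, and the function `elastic_bonds`, its parameters, and the four-bead toy system are hypothetical rather than Martinize2's implementation.

```python
import math
from itertools import combinations


def elastic_bonds(beads, cutoff, unit="molecule", residue_range=None):
    """Return index pairs of backbone beads joined by an elastic bond.

    beads: list of (chain, resid, (x, y, z)) tuples, coordinates in nm.
    unit: "molecule" allows bonds across chains, "chain" does not.
    residue_range: optional (first, last) residue numbers; both beads of a
        pair must fall inside it.
    """
    bonds = []
    for (i, a), (j, b) in combinations(enumerate(beads), 2):
        if unit == "chain" and a[0] != b[0]:
            continue  # panel (b): no bonds between chains
        if residue_range is not None:
            first, last = residue_range
            if not (first <= a[1] <= last and first <= b[1] <= last):
                continue  # panel (d): restrict to the structured region
        if math.dist(a[2], b[2]) <= cutoff:
            bonds.append((i, j))
    return bonds


# Hypothetical toy system: two chains of two beads on a 0.6 nm square.
beads = [
    ("A", 1, (0.0, 0.0, 0.0)),
    ("A", 2, (0.6, 0.0, 0.0)),
    ("B", 1, (0.0, 0.6, 0.0)),
    ("B", 2, (0.6, 0.6, 0.0)),
]

print(elastic_bonds(beads, cutoff=0.7))
print(elastic_bonds(beads, cutoff=0.7, unit="chain"))
print(elastic_bonds(beads, cutoff=0.7, residue_range=(1, 1)))
```

With the chain option, the two inter-chain pairs drop out even though they lie within the cutoff, which mirrors the difference between panels (a) and (b).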

Figure 7

Ligands, cofactors, and polymers transformed to CG Martini level.

(a) Flavin Reductase with two FMNs and one NDP cofactor bound in the reference AA state and mapped to Martini CG as indicated by the spheres. The inset shows a zoom onto FMN; (b) Lysozyme with benzene ligand bound in the reference AA structure and mapped to Martini CG resolution; (c) Crown ether with Martini beads shown on top of the AA structure; (d) Branched polyethylene at AA resolution (left) and Martini resolution (right) with the linear chain part shown in gray and the branches in yellow.

Figure 8

Summary of the successes and failures of the high-throughput pipeline.

We ran the pipeline on the 87084 structures from the template library used by the I-TASSER (Yang et al., 2015) protein prediction software, of which 73% could be converted with Martinize2. The remaining 27% failed, mostly due to missing coordinates and unrecognized residues. For 100% of the converted structures, a GROMACS run input file (i.e., a tpr file) could be generated, and for all but 13 of the converted structures, an energy minimization could be performed.

Figure 9

Two examples of problematic atomistic protein structures flagged by Martinize2.

(a) A cysteine residue whose O-O and O-C distances are too small, which leads to superfluous bonds being recognized. (b) Incorrect interatomic distances in the histidine ring lead to missing bonds (transparent) and to an erroneous O-N bond connecting the histidine to a neighboring asparagine. Additionally, a nitrogen atom is switched for an oxygen atom in the asparagine.

Appendix 3—figure 1

Comparison of processing speeds between Martinize and Martinize2.

Eleven protein structures of increasing size, taken from a dataset of 200,000 structures, were processed with Martinize and Martinize2 to record the computation time. The protein structures were spaced evenly by residue count.

Appendix 4—figure 1

Example of finding all LCISs between graphs X and Y.

Grayed out nodes are not used (they are excluded from the comparison by the shrinking step), but are depicted for clarity. Since nodes A₁ and A₂ in Y are symmetry equivalent, not all subgraphs are taken into account. Those that are excluded due to symmetry reasons are depicted in the box Symmetry pruned. Iteration 1: We try to find a subgraph isomorphism between X and Y. None is found. Iteration 2: Y is shrunk to produce the graphs depicted. We try to find subgraph isomorphisms between these and X. None are found. Iteration 3: all graphs from iteration 2 are shrunk further. Since a subgraph isomorphism can be found between at least one of these ({A₁, A₂, B}) and X, the algorithm terminates afterwards. To highlight how often the algorithm discovers that {A₁, B} is subgraph isomorphic to X, it is shown in bold.
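
The shrinking idea can be sketched with a brute-force search, assuming LCIS stands for largest common induced subgraph. The sketch below tries all node subsets of Y from largest to smallest and stops at the first size for which some subset maps onto an induced subgraph of X with matching node labels. It is an illustration only: it performs no symmetry pruning, does not require the subgraphs to be connected, scales factorially, and the graphs and function names are hypothetical.

```python
from itertools import combinations, permutations


def embeds(sub_nodes, y_labels, y_edges, x_labels, x_edges):
    """True if the subgraph of Y induced by sub_nodes maps onto an
    induced subgraph of X with identical labels and edges."""
    sub_nodes = list(sub_nodes)
    for image in permutations(x_labels, len(sub_nodes)):
        mapping = dict(zip(sub_nodes, image))
        if any(y_labels[n] != x_labels[m] for n, m in mapping.items()):
            continue
        if all(
            (frozenset((a, b)) in y_edges)
            == (frozenset((mapping[a], mapping[b])) in x_edges)
            for a, b in combinations(sub_nodes, 2)
        ):
            return True
    return False


def largest_common_induced_subgraphs(y_labels, y_edges, x_labels, x_edges):
    """Shrink Y one node at a time; return all hits at the first size that works."""
    for size in range(len(y_labels), 0, -1):
        hits = [
            set(nodes)
            for nodes in combinations(sorted(y_labels), size)
            if embeds(nodes, y_labels, y_edges, x_labels, x_edges)
        ]
        if hits:
            return hits
    return []


# Hypothetical graphs: X is A-B; Y is A1-B-A2 with two equivalent A nodes.
x_labels = {"x1": "A", "x2": "B"}
x_edges = {frozenset(("x1", "x2"))}
y_labels = {"a1": "A", "a2": "A", "b": "B"}
y_edges = {frozenset(("a1", "b")), frozenset(("a2", "b"))}

print(largest_common_induced_subgraphs(y_labels, y_edges, x_labels, x_edges))
```

The toy run returns both `{a1, b}` and `{a2, b}`. Because `a1` and `a2` are interchangeable in Y, these two results are equivalent, which is the kind of redundancy that the symmetry pruning in the caption avoids.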

Tables

Appendix 1—table 1

Data parsers, with the object each returns, the input format it reads, and the corresponding file extension.

Extension	Data class	Parser name	Input format
.ff	Link, Block, Modification	read_ff	in-house force-field format
.itp	Block	read_itp	GROMACS topology file; all [molecule] directive content
.map	Mapping	read_mapping	mapping file as defined using backwards style
.pdb	System	read_pdb	canonical PDB format
.gro	Molecule	read_gro	GROMACS .gro file
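
A table like this lends itself to dispatching on the file extension. The sketch below copies the extension and parser-name columns from the table above into a dictionary and returns the parser name as a string; it does not import or call Vermouth, and the helper `parser_name_for` is hypothetical.

```python
from pathlib import Path

# Extension -> parser name, as listed in the table above.
PARSER_BY_EXTENSION = {
    ".ff": "read_ff",
    ".itp": "read_itp",
    ".map": "read_mapping",
    ".pdb": "read_pdb",
    ".gro": "read_gro",
}


def parser_name_for(path):
    """Return the name of the parser listed for this file's extension."""
    suffix = Path(path).suffix.lower()
    try:
        return PARSER_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"no parser listed for extension {suffix!r}") from None


print(parser_name_for("protein.pdb"))
print(parser_name_for("martini.ff"))
```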

Appendix 1—table 2

Data writers, with the object each takes as input, the output format it produces, and the corresponding file extension.

Extension	Data class	Writer name	Output format
.gro	System	write_gro	G96 gro file
.pdb	System	write_pdb	PDB file
.top	System	write_top	Pseudo topology file
.itp	System	write_itp	GROMACS topology file; all [molecule] directive content

Appendix 2—table 1

Limited overview of selected competing tools capable of generating MD topologies.

‘Force field’ lists the force fields for which the tool can generate topologies without changes to its source code. ‘Type of system’ describes the type of system the tool can generate topologies for. ‘External data files’ indicates whether the force field parameters are stored in separate data files, making it possible to change them easily. ‘Notes’ lists additional remarks and comments; there, ‘builds coordinates’ means that the tool is capable of constructing coordinates for complete systems, rather than only for, e.g., missing side chains.

Name	Force field	Type of system	External data files	Notes
pdb2gmx (Abraham et al., 2015; Páll et al., 2015)	Any AA/UA	Linear polymers	Yes
LEaP (Case et al., 2005)	Any AA/UA	Linear polymers	Yes
CHARMM (Brooks et al., 2009)	Any AA/UA	Linear polymers	Yes
Psfgen (Phillips et al., 2005)	Any AA/UA	Linear polymers	Yes
Martinize 1 (de Jong et al., 2013; Uusitalo et al., 2015)	Martini	Proteins, DNA	No
SIRAH Tools (Machado and Pantano, 2016)	Any CG	Linear polymers	Yes	Performs mapping only
DoGlycans (Danne et al., 2017)	AMBER, OPLS	Sugars	Yes	Builds coordinates
HOOBAS (Girard et al., 2019)	Multiple	Multiple	Yes	No user interface, builds coordinates
CHARMM-GUI (Jo et al., 2017; Qi et al., 2015; Jo et al., 2008)	Multiple	Multiple	No	Web server, builds coordinates
VerMoUTH/Martinize2	Multiple	Multiple	Yes	This work
ATB (Malde et al., 2011; Canzar et al., 2013)	GROMOS 54A7, GROMOS 54A8	Small molecules	N/A	Automatic de novo parametrization
LigParGen (Jorgensen and Tirado-Rives, 2005; Dodda et al., 2017b; Dodda et al., 2017a)	OPLS-AA	Small molecules	N/A	Automatic de novo parametrization
CGenFF (Vanommeslaeghe and MacKerell, 2012)	CHARMM General Force Field	Small molecules	N/A	Automatic de novo parametrization