Architectural overview of the proposed model. AMR labels of spectrum-drug pairs can be represented as an incomplete matrix. A microbial sample that is susceptible to a drug is denoted by a negative label (orange), whereas a positive label (blue) signifies an intermediate or resistant combination. Instance (spectrum) and target (drug) embeddings $x_i$ and $t_j$ are obtained by passing their respective input representations through their respective neural network branches. The two resulting embeddings are aggregated into a single score by their (scaled) dot product. The cross-entropy loss optimizes this score to be maximal for positive and minimal for negative combinations of microbial spectra and drugs. On the upper right-hand side, different metrics are visualized. Whereas micro ROC-AUC pools all prediction-label pairs, the instance-wise and macro ROC-AUC compute their score per spectrum or per drug, respectively, and then average.
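The scoring and loss described in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the scaling by the square root of the embedding dimension is an assumption (the caption only says "(scaled)"), and missing labels are marked with `None`.

```python
import math

def scaled_dot_score(x_i, t_j):
    """Score for one spectrum embedding x_i and one drug embedding t_j."""
    # Assumption: scale by sqrt(embedding dimension); the caption does not specify the scale.
    return sum(a * b for a, b in zip(x_i, t_j)) / math.sqrt(len(x_i))

def binary_cross_entropy(score, label):
    """Cross-entropy of a logit against a 0/1 label (1 = intermediate or resistant)."""
    # Numerically stable form of -[y*log(sigmoid(s)) + (1-y)*log(1-sigmoid(s))].
    return max(score, 0.0) - score * label + math.log1p(math.exp(-abs(score)))

def masked_loss(X, T, labels):
    """Mean loss over the observed entries of the incomplete label matrix."""
    losses = [
        binary_cross_entropy(scaled_dot_score(X[i], T[j]), y)
        for i, row in enumerate(labels)
        for j, y in enumerate(row)
        if y is not None  # unobserved spectrum-drug pairs contribute nothing
    ]
    return sum(losses) / len(losses)

# Toy data: two spectra, two drugs, two observed labels.
X = [[1.0, 0.0], [0.0, 2.0]]       # spectrum embeddings
T = [[2.0, 0.0], [0.0, -2.0]]      # drug embeddings
labels = [[1, None], [None, 0]]    # incomplete label matrix
loss = masked_loss(X, T, labels)
```

Minimizing this loss pushes the score up for positive pairs and down for negative pairs, while unobserved pairs are left out of the average.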

All tested model sizes for the instance (spectrum) branch.

Hidden sizes represent the evolution of the hidden state dimensionality as it passes through the model, with every hyphen defining one fully connected layer. The listed number of parameters includes only those of the instance (spectrum) branch.
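To make the hyphen notation concrete, the sketch below parses a hidden-size string and counts the weights and biases of the implied fully connected layers. The helper names and the example string `"8-4-2"` are hypothetical, and the count assumes plain linear layers with bias terms. Any normalization or other parameterized layers in the actual models would add to the totals listed in the table.

```python
def parse_hidden_sizes(spec):
    """Turn a string such as '8-4-2' into a list of layer widths."""
    return [int(s) for s in spec.split("-")]

def count_linear_parameters(spec, bias=True):
    """Count parameters of the fully connected layers implied by the hyphens."""
    sizes = parse_hidden_sizes(spec)
    # Each hyphen connects two consecutive widths: one weight matrix plus optional bias.
    return sum(n_in * n_out + (n_out if bias else 0)
               for n_in, n_out in zip(sizes, sizes[1:]))

n_layers = len(parse_hidden_sizes("8-4-2")) - 1   # two hyphens, two layers
n_params = count_linear_parameters("8-4-2")       # (8*4 + 4) + (4*2 + 2) = 46
```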

Barplots showing test performance results for all trained models. Errorbars represent the standard deviation over three random seeds. The x-axis and colors show the different drug and spectrum embedders, respectively.

Test performance of selected dual-branch models, compared to the performance of a collection of “specialists”: models that are each trained on only one species-drug combination.

Performance is computed on the subset of labels spanning the 200 most-common species-drug combinations.

Transfer learning of DRIAMS-A models to other hospitals. Errorbands show the standard deviation over three runs. Results in terms of other evaluation metrics are shown in Appendix C Figure 10.

UMAP scatterplots of test set MALDI-TOF spectra embeddings $x_i$. Only embeddings belonging to the 25 most-occurring species in the test set are shown. The panels on the right show the same embeddings as on the left, but colored according to their AMR status for a certain drug. The four displayed drugs are selected based on a ranking of the product of the number of positive and negative labels. In this way, the drugs that have many observed labels, both positive and negative, are displayed.
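The drug selection rule can be illustrated with a short sketch. The function name, the drug names, and the label lists are all hypothetical, and labels are assumed to be encoded as 1 (positive) and 0 (negative). The product favors drugs with many labels of both classes: a drug with many labels of only one class scores low.

```python
def rank_drugs_by_label_balance(labels_by_drug, top_k=4):
    """Rank drugs by (number of positive labels) * (number of negative labels)."""
    scores = {}
    for drug, labels in labels_by_drug.items():
        n_pos = sum(1 for y in labels if y == 1)
        n_neg = sum(1 for y in labels if y == 0)
        scores[drug] = n_pos * n_neg
    # Highest product first; keep only the top_k drugs.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical label counts for three placeholder drugs.
labels_by_drug = {
    "drug_a": [1] * 5 + [0] * 5,    # product 25
    "drug_b": [1] * 1 + [0] * 20,   # product 20, despite having the most labels
    "drug_c": [1] * 3 + [0] * 3,    # product 9
}
selected = rank_drugs_by_label_balance(labels_by_drug, top_k=2)
```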

Full list of modifications made to drug names in DRIAMS.

Modifications consist of (1) removal of drugs, (2) merging of drugs, and (3) renaming drugs.

Structure of the residual blocks used in the 1D CNN, 2D CNN, and transformer. In the case of convolutions, zero padding is applied so that the output has the same dimensions as the input.
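The reason for the zero padding is that a residual connection adds the block input to the block output, so the two must have the same shape. The sketch below shows this for a 1D convolution in plain Python. It is an illustration only: the function names are hypothetical, an odd kernel length is assumed, and the single convolution followed by a ReLU stands in for the actual layer sequence shown in the figure.

```python
def conv1d_same(x, kernel):
    """1D cross-correlation with zero padding so that len(output) == len(x)."""
    pad = len(kernel) // 2  # assumes an odd kernel length
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]

def residual_block(x, kernel):
    """Add the input back onto a transformed copy of itself."""
    # Assumption: one convolution + ReLU; the real block may contain more layers.
    h = [max(v, 0.0) for v in conv1d_same(x, kernel)]
    return [a + b for a, b in zip(x, h)]  # valid because both lists have equal length

x = [1.0, 2.0, 3.0, 4.0]
smoothed = conv1d_same(x, [1.0, 1.0, 1.0])   # [3.0, 6.0, 9.0, 7.0]
y = residual_block(x, [0.0, 1.0, 0.0])       # identity kernel doubles the input
```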

Overview of all different drug embedders tested in this work. One-hot embeddings are the only technique not incorporating prior knowledge of the structure of the compound. Hence, they are the only technique incapable of directly transferring to new compounds. Morgan fingerprints produce a bit-vector containing information on the presence of certain substructures. DeepSMILES strings are encoded and processed with a 1D CNN, GRU, or transformer. Drawings of molecules are processed with a 2D CNN. A string kernel on SMILES strings produces a numerical vector for every drug (taken as the row in the resulting Gram matrix).

All hyperparameter tuning experiments. All evaluations are listed in terms of validation Micro ROC-AUC. All numbers are averages over three model runs, with errorbars showing standard deviations. In every experiment, the setting with the highest average is used in the final models.

Full table of test results.

The listed averages and standard deviations are calculated over three independent runs of the same model. The best models for every metric per drug embedder are underlined. The overall best model for every metric is in bold face.

Barplots showing test performance results for all trained models. Colors represent the different spectrum embedder model sizes. Performance is shown in terms of Macro ROC-AUC (computed per drug and averaged) and in terms of Instance-wise Prec@1 of the negative class. The Prec@1 evaluates how often the top-ranked prediction is correct. The Prec@1 of the negative class, hence, reports the proportion of cases for which the “most-likely susceptible drug” prediction is actually an effective one. In a scenario where the top recommended drug is always administered, it corresponds to the percentage of correctly-suggested treatments. Errorbars represent the standard deviation over three random seeds.
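The instance-wise Prec@1 of the negative class can be sketched as follows. The function name and the toy data are hypothetical. The sketch assumes that scores are predicted probabilities of the positive (intermediate or resistant) class, that unobserved labels are marked with `None`, and that spectra without any observed label are skipped.

```python
def negative_prec_at_1(scores, labels):
    """Fraction of spectra whose most-likely-susceptible drug is truly susceptible."""
    hits = []
    for score_row, label_row in zip(scores, labels):
        # Keep only drugs with an observed label for this spectrum.
        observed = [(s, y) for s, y in zip(score_row, label_row) if y is not None]
        if not observed:
            continue  # assumption: spectra without observed labels are ignored
        # The lowest positive-class score is the top-ranked "susceptible" prediction.
        _, top_label = min(observed, key=lambda pair: pair[0])
        hits.append(1.0 if top_label == 0 else 0.0)
    return sum(hits) / len(hits)

scores = [[0.9, 0.1, 0.5],
          [0.2, 0.8, 0.3]]
labels = [[1, 0, None],   # top-ranked drug (score 0.1) is susceptible: hit
          [1, 0, 0]]      # top-ranked drug (score 0.2) is resistant: miss
prec = negative_prec_at_1(scores, labels)   # 0.5
```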

Performance of models compared against a linear spectrum embedder baseline. The comparison is only shown for the two best-performing drug embedders (in terms of Micro ROC-AUC). Errorbars represent the standard deviation over three random seeds.

Transfer learning of DRIAMS-A models to other hospitals. Errorbands show the standard deviation over three runs.

UMAP scatterplots of test set MALDI-TOF spectra embeddings $x_i$. Only embeddings belonging to the 25 most-occurring species in the test set are shown. Spectra are colored according to their AMR status for a certain drug. The twenty displayed drugs are selected based on a ranking of the product of the number of positive and negative labels. In this way, the drugs that have many observed labels, both positive and negative, are displayed. The drugs here are ranked 5-24 (the first four are shown in Figure 4).