Classification of sequencing reads in Thlaspi arvense WGS data.

(A) Workflow of the analyses, including read classification (orange nodes) into target, ambiguous and exogenous reads, and downstream analyses (dark blue nodes) (see Methods). (B) Fractions of exogenous reads assigned to different taxonomic groups by MG-RAST (44,45). (C) Read counts assigned to nine selected groups in our 207 T. arvense samples from different European regions. (D) Aphids and mildew occurring on T. arvense leaves during our experiment.

Population differences and SNP-based heritability for different types of exogenous read counts.

Population differences were tested with a linear model; SNP-based heritabilities (and their confidence intervals) were estimated with the R package heritability.
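With population as the only (categorical) predictor, a linear-model test for population differences reduces to a one-way ANOVA. The minimal sketch below computes that F statistic from read counts grouped by population. The function name and input layout are hypothetical, and any count transformation applied before modelling is not stated here and is not reproduced.

```python
from statistics import mean


def population_f_statistic(counts_by_population):
    """F statistic of a linear model with population as the sole categorical predictor.

    counts_by_population: dict mapping a population label (hypothetical input
    layout) to the list of read counts for samples from that population.
    """
    groups = [list(values) for values in counts_by_population.values()]
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k = len(groups)      # number of populations
    n = len(all_values)  # number of samples
    # Variation explained by the population factor
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Residual variation within populations
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within)
```

The P-value then follows from the F distribution with (k - 1, n - k) degrees of freedom, for example via `scipy.stats.f.sf(F, k - 1, n - k)`.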

Relationships between the plants' climates of origin or glucosinolate levels and their exogenous read loads.

(A) Correlations with bioclimatic variables. (B) Correlations with baseline glucosinolate (GS) levels measured in the same pennycress lines in another experiment. All correlations in (A) and (B) were computed after correcting for population structure. Aphid-related read counts are in green, mildew-related in gray, and others in black. (C) Boxplot of the aphid read residuals in samples with vs. without detected benzyl GS.

Genome-wide association analyses for aphid and mildew loads.

We show only the results for M. persicae and MG-RAST Erysiphales read counts; for full results, see S3 Fig. (A) Manhattan plots, annotated with genes potentially affecting aphid or mildew colonization. The genome-wide significance threshold (horizontal red line) was calculated based on unlinked variants (53); the blue line corresponds to –log(p) = 5. (B) Enrichment of a priori candidates and expected false discovery rates (as in (52)) at increasing significance thresholds, for the Manhattan plots on the left. (C) Allelic effects of the red-marked variants in the corresponding Manhattan plots, with genotypes on the x-axes and read-count residuals on the y-axes. (D) The candidate genes marked in panel A, their putative functions, and their distances to the top variant of the neighboring peak. Candidates in dark blue are a priori candidates involved in defense response (GO:0006952) and included in the enrichment analyses. GS: glucosinolates. (E) Zoom-in of the Manhattan plot for Erysiphales load around the first peak on Scaffold 1, with gene and TE models below and a priori candidates in blue.
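Enrichment of a priori candidates at a given threshold can be expressed as the observed fraction of significant variants near candidate genes divided by the genome-wide (background) fraction of variants near candidates. The minimal sketch below assumes this definition. It does not reproduce the expected-FDR calculation of (52). The window that defines "near a candidate" and the function name are hypothetical.

```python
def candidate_enrichment(pvalues, near_candidate, threshold):
    """Observed / expected fraction of significant variants near a priori candidates.

    pvalues: P-values for all tested variants.
    near_candidate: booleans, True if the variant lies within a (hypothetical)
        window around an a priori candidate gene.
    threshold: variants with P < threshold are counted as significant.
    """
    # Expected fraction if significant variants were placed at random
    background = sum(near_candidate) / len(near_candidate)
    hits = [near for p, near in zip(pvalues, near_candidate) if p < threshold]
    if not hits or background == 0:
        return float("nan")  # enrichment undefined without hits or candidates
    observed = sum(hits) / len(hits)
    return observed / background
```

Evaluating this function over a series of decreasing thresholds (increasing –log(p)) produces an enrichment curve of the kind shown in panel B.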

Differential methylation associated with aphid and mildew loads.

(A) Differentially methylated region (DMR) densities in different genomic features when comparing the 20 samples with the highest vs. the lowest M. persicae (top) or Erysiphales (bottom) load. CDS: coding sequences. (B) Manhattan plots from EWA analyses based on individual cytosines within DMRs, with sequence contexts shown in different colours and genes close to low-P-value cytosines annotated. The genome-wide Bonferroni significance thresholds (dashed red lines) were calculated based on the number of DMRs. (C) Candidate genes and TEs marked in panel B, their putative functions, the genomic locations of the associated DMRs, and whether affected samples were hyper- or hypomethylated. (D) Zoomed-in Manhattan plot for Erysiphales load around the peak on Scaffold 4, with gene and TE models below. CG methylation in the 20 most and least affected samples was calculated over 50 bp bins (see S5C Fig for methylation in other contexts).
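The two calculations named in this legend can be sketched briefly. The first is a Bonferroni threshold on the –log10 scale, with the number of DMRs taken as the number of tests. The second is methylation levels summarised over 50 bp bins. The sketch assumes alpha = 0.05. It also assumes that bin levels pool methylated and total read counts across cytosines, rather than averaging per-cytosine levels. The function names and input layout are hypothetical.

```python
import math
from collections import defaultdict


def bonferroni_log10_threshold(n_tests, alpha=0.05):
    """-log10 of the Bonferroni-corrected threshold alpha / n_tests
    (here n_tests is assumed to be the number of DMRs)."""
    return -math.log10(alpha / n_tests)


def bin_methylation(sites, bin_size=50):
    """Methylation level per fixed-width bin.

    sites: iterable of (position, methylated_reads, total_reads) for CG
        cytosines in one region (hypothetical input layout).
    Returns {bin_start: methylated / total}, pooling read counts per bin.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for pos, meth, cov in sites:
        start = (pos // bin_size) * bin_size  # left edge of the 50 bp bin
        methylated[start] += meth
        total[start] += cov
    return {start: methylated[start] / total[start]
            for start in sorted(total) if total[start] > 0}
```

Running `bin_methylation` separately on the pooled high-load and low-load samples gives two profiles that can be plotted against the gene and TE models, as in panel D.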