Overview of the benchmark. (A) The datasets consist of silver standards generated from single-cell RNA-seq data, gold standards from imaging-based data, and a liver case study. Our simulation engine, synthspot, enables the creation of artificial tissue patterns. (B) We evaluated deconvolution methods on three overall performance metrics: root-mean-square error (RMSE), area under the precision-recall curve (AUPR), and Jensen-Shannon divergence (JSD). We further examined specific aspects of performance, namely how well methods detect rare cell types and how well they handle reference datasets from different sequencing technologies. (C) Our benchmarking pipeline is fully accessible and reproducible through the use of Docker containers and Nextflow. (D) To evaluate performance on the liver case study, we leveraged prior knowledge of the localization and composition of cell types in the liver to calculate the AUPR and JSD. We also investigated method performance on three different sequencing protocols.
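To make the three metrics in panel B concrete, the following is a minimal sketch of how they could be computed from a matrix of true and predicted cell type proportions (rows are spots, columns are cell types). It is an illustration under stated assumptions, not the benchmark's own implementation: the function names are hypothetical, the JSD uses a base-2 logarithm and is averaged over spots, and the AUPR is approximated by average precision after binarizing the ground truth into presence or absence and pooling all spot and cell type pairs.

```python
import math


def rmse(true, pred):
    """Root-mean-square error over all spot x cell-type entries."""
    diffs = [(t - p) ** 2 for tr, pr in zip(true, pred) for t, p in zip(tr, pr)]
    return math.sqrt(sum(diffs) / len(diffs))


def _kl(p, m):
    """Kullback-Leibler divergence in bits, with the convention 0 * log(0) = 0."""
    return sum(pi * math.log2(pi / mi) for pi, mi in zip(p, m) if pi > 0)


def mean_jsd(true, pred):
    """Jensen-Shannon divergence per spot (base 2, so in [0, 1]), averaged over spots."""
    values = []
    for p, q in zip(true, pred):
        m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
        values.append(0.5 * _kl(p, m) + 0.5 * _kl(q, m))
    return sum(values) / len(values)


def aupr(true, pred):
    """Average precision, a common step-wise estimate of the area under the PR curve.

    Assumption: a cell type counts as present in a spot if its true proportion
    is above zero; predicted proportions serve as scores.
    """
    labels = [t > 0 for tr in true for t in tr]
    scores = [p for pr in pred for p in pr]
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AUPR is undefined without any positive labels")
    pairs = sorted(zip(scores, labels), key=lambda sl: -sl[0])
    tp = fp = 0
    area = 0.0
    prev_recall = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all entries tied at this score so ties do not depend on sort order.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Toy example: two spots, three cell types (values are illustrative only).
true_props = [[0.5, 0.5, 0.0], [1.0, 0.0, 0.0]]
pred_props = [[0.4, 0.4, 0.2], [0.8, 0.1, 0.1]]
print(rmse(true_props, pred_props), mean_jsd(true_props, pred_props), aupr(true_props, pred_props))
```

Lower RMSE and JSD indicate predictions closer to the ground truth, whereas a higher AUPR indicates better separation of present from absent cell types. Other choices, such as computing the AUPR per cell type and averaging, are equally plausible and would give different values.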