A. Keypoint detection performance of HRNet-W48 models, measured as PCK values at different thresholds. Left: Models trained on the full training sets of COCO, OpenApePose (OAP), and OpenMonkeyPose (OMP), and tested both on the same dataset and across datasets. Right: Models trained on subsets of the OAP training set of different sizes and tested on the OAP test set. B. Bar plots showing the keypoint detection performance of state-of-the-art (HRNet-W48) models, measured as the percentage of correct keypoints at a threshold of 0.2 (PCK@0.2) and the area under the curve (AUC) of the PCK curves for thresholds ranging from 0.01 to 1. Error bars: standard deviation of the performance metrics. Models are trained on subsets of the OAP training set of different sizes and tested on held-out OAP test sets. C. Same as 4B, but models are trained on the full training sets of COCO, OAP, and OMP, and tested both on the same dataset and across datasets.
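
The two metrics named in the caption, PCK at a given threshold and the AUC of the PCK curve, can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function names `pck` and `pck_auc`, the per-instance normalization length `norm` (assumed here to be something like the bounding-box size), and the 100-point threshold grid are all assumptions made for illustration; missing or occluded keypoints are not handled.

```python
import numpy as np

def pck(pred, gt, norm, threshold):
    """Fraction of keypoints whose prediction lies within threshold * norm of the ground truth.

    pred, gt: arrays of shape (N, K, 2) with predicted and true (x, y) keypoints.
    norm: array of shape (N,), an assumed per-instance normalization length
          (for example the bounding-box size).
    """
    dist = np.linalg.norm(pred - gt, axis=-1)        # (N, K) Euclidean errors
    correct = dist <= threshold * norm[:, None]      # (N, K) boolean hits
    return float(correct.mean())

def pck_auc(pred, gt, norm, thresholds):
    """Area under the PCK-vs-threshold curve, normalized by the threshold range."""
    thresholds = np.asarray(thresholds, dtype=float)
    values = np.array([pck(pred, gt, norm, t) for t in thresholds])
    # Trapezoidal rule written out explicitly.
    area = np.sum((values[1:] + values[:-1]) / 2.0 * np.diff(thresholds))
    return float(area / (thresholds[-1] - thresholds[0]))

# Toy example (made-up numbers): one instance, two keypoints, unit normalization.
gt = np.zeros((1, 2, 2))
pred = np.array([[[0.1, 0.0], [0.5, 0.0]]])          # errors of 0.1 and 0.5
norm = np.array([1.0])

pck_02 = pck(pred, gt, norm, 0.2)                    # one of two keypoints is within 0.2
auc = pck_auc(pred, gt, norm, np.linspace(0.01, 1.0, 100))
print(pck_02, auc)
```

Dividing the area by the threshold range keeps the AUC in [0, 1], so it is directly comparable to a single PCK value; whether the reported AUC was normalized this way is not stated in the caption.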