Weight loss, insulin resistance, and study design confound results in a meta-analysis of animal models of fatty liver

  1. Harriet Hunter
  2. Dana de Gracia Hahn
  3. Amedine Duret
  4. Yu Ri Im
  5. Qinrong Cheah
  6. Jiawen Dong
  7. Madison Fairey
  8. Clarissa Hjalmarsson
  9. Alice Li
  10. Hong Kai Lim
  11. Lorcan McKeown
  12. Claudia-Gabriela Mitrofan
  13. Raunak Rao
  14. Mrudula Utukuri
  15. Ian A Rowe
  16. Jake P Mann (corresponding author)
  1. School of Clinical Medicine, University of Cambridge, United Kingdom
  2. Leeds Institute for Medical Research & Leeds Institute for Data Analytics, University of Leeds, United Kingdom
  3. Institute of Metabolic Science, University of Cambridge, United Kingdom

Abstract

The classical drug development pipeline necessitates studies using animal models of human disease to gauge future efficacy in humans; however, there is a low conversion rate from success in animals to humans. Non-alcoholic fatty liver disease (NAFLD) is a complex chronic disease without any established therapies and a major field of animal research. We performed a meta-analysis with meta-regression of 603 cohorts from 414 interventional rodent studies (10,364 animals) in NAFLD to assess which variables influenced treatment response. Weight loss and alleviation of insulin resistance were consistently associated with improvement in NAFLD. Multiple drug classes that do not affect weight in humans caused weight loss in animals. Other study design variables, such as age of animals and dietary composition, influenced the magnitude of treatment effect. Publication bias may have increased effect estimates by 37–79%. These findings help to explain the challenge of reproducibility and translation within the field of metabolism.

eLife digest

Obesity and diabetes are increasingly common diseases that can lead to other complications such as fatty liver disease. Fatty liver disease affects one in five people and is caused by a build-up of fat in the liver, which can result in scarring of the liver tissue and other serious complications.

There is currently no cure for fatty liver disease. Drugs that have been effective in treating the condition in mice lack efficacy in humans. To better understand why this is the case, Hunter, de Gracia Hahn, Duret, Im et al. conducted a review of over 5,000 published studies, analysing over 600 experiments.

Hunter et al. asked which drugs improved fatty liver in mice the most and if they had the same effect in humans. They also tested whether the age of the mice affected the outcome of the experiments. The analyses revealed that the drugs that work best in mice are different to the ones that show some effect in humans.

In mice, many of the drugs reduced the animals' weight or lowered their blood sugar levels, which also improved the fatty liver condition. Moreover, drugs appeared to be less effective the older the mice were. However, most of these drugs do not cause weight loss or lower blood sugar levels in humans, suggesting that factors other than the intended action of these drugs could affect the outcome of a mouse study.

These findings will help shape future research into obesity, diabetes and fatty liver disease using mice. They highlight that results obtained from studies with mice so far do not predict if a drug will work in humans to treat fatty liver disease. Moreover, weight loss seems to be the most important factor linked to how efficiently a drug treats fatty liver disease.

Introduction

Interventional studies in animals are an integral component of drug development. If a disease can be suitably modelled in an animal, then the therapeutic response to a treatment observed in animals should inform its potential efficacy in humans (Howells et al., 2014). However, there is a well-documented translational gap between preclinical studies and subsequent outcomes in humans (Hackam and Redelmeier, 2006; Landis et al., 2012; Perel et al., 2007). Multiple factors contribute to this, including bias within study design (Macleod et al., 2015), insufficiently powered preclinical studies (Macleod et al., 2005), and biological differences between species (Mestas and Hughes, 2004; Rangarajan and Weinberg, 2003).

Systematic analyses of preclinical studies have found that publication bias may account for at least a third of the estimate of efficacy in trials (Henderson et al., 2015; Sena et al., 2010; van der Worp et al., 2010). In addition, other variables of animal model design can influence the magnitude of the treatment response (Watzlawick et al., 2019) and reporting of model design is often incomplete (Flórez-Vargas et al., 2016). These findings are highly relevant in the context of the ‘reproducibility crisis’ (Baker, 2016; von Herrath et al., 2019) as well as having ethical implications for the use of animals in research that is not of optimum quality (Prescott and Lidster, 2017).

Non-alcoholic fatty liver disease (NAFLD) is a highly active field of animal research (Brenner, 2018; Farrell et al., 2019). NAFLD is a common condition characterised by increased liver fat (hepatic steatosis) that may progress to inflammation in the form of non-alcoholic steatohepatitis (NASH) and fibrosis (Sanyal, 2019). Cirrhosis, end-stage liver disease, and hepatocellular carcinoma develop in a small proportion of patients. However, due to the high prevalence of obesity, NAFLD is the second most common indication for liver transplant in the United States (Younossi et al., 2018) and is predicted to overtake hepatitis C virus infection as an indication. NAFLD is intricately related to insulin resistance and therefore usually coexists with other features of the metabolic syndrome, such as type 2 diabetes and its recognised complications, including cerebrovascular disease, coronary artery disease, and chronic kidney disease (Byrne and Targher, 2015).

There are currently no approved pharmacological therapies for NAFLD (Chalasani et al., 2018). Several Phase 3 trials are ongoing (Ratziu et al., 2019), but many interventions that appeared to have substantial efficacy in preclinical models have failed to replicate this efficacy in humans (Budas et al., 2016; Harrison et al., 2018; STELLAR-3 and STELLAR-4 Investigators et al., 2020; Sanyal et al., 2014). These studies have used a wide range of preclinical NAFLD models, including genetically modified animals (e.g. leptin-deficient ob/ob mice), hypercaloric diets (e.g. high-fat diet), and toxic insults (e.g. streptozocin injections), all of which may be used in varying combinations and with different parameters (Anstee and Goldin, 2006). It is not known whether, or which of, these variables influence treatment response to therapeutic agents in preclinical models of NAFLD, and which models are better predictors of response in humans.

Therefore, we performed a meta-analysis of interventional rodent studies of NAFLD to describe which drug classes were associated with improvement in NAFLD and whether any study characteristics (or biases) were linked to the magnitude of effect.

Results

We performed a systematic search to identify interventional studies in rodent models of NAFLD. Our searches yielded 8621 articles, which after screening gave 5458 articles for full-text review (Figure 1). Studies were included in the meta-analysis if they used a pharmacological class that had been used in Phase 2 or 3 trials for NAFLD in humans (Supplementary file 1) and reported at least one of: hepatic triglyceride content, NAFLD Activity Score (NAS, or any of its components), portal inflammation, or fibrosis stage. After adjustments for shared controls, 414 studies were included in the meta-analysis, comprising 603 cohorts of rodents (10,364 animals). Studies were predominantly performed in male animals (527/578, 91%). The median age at the start of intervention was 9 weeks (range 0.6–80 weeks), and the median duration of intervention was 6 weeks (range 1 day to 60 weeks).

Figure 1
Study inclusion and exclusion flow chart.
Figure 1—source data 1

Dataset used in this meta-analysis.

Details and raw data of all studies included in the meta-analysis. This data can be used with the R code found in Supplementary Methods to run all analyses.

https://cdn.elifesciences.org/articles/56573/elife-56573-fig1-data1-v2.xlsx

Hepatic triglyceride content was the most widely reported measure (474/603 cohorts, 79%). Steatosis grade was the most frequently reported histological measure (174/603 cohorts, 29%), followed by NAS (144/603, 24%), lobular inflammation (143/603, 24%), ballooning (106/603, 18%), and fibrosis (58/603, 9.6%). Portal inflammation was reported in only 8 cohorts from three studies; therefore, meta-analysis was not possible for this outcome.

Meta-analysis of hepatic triglyceride content

We used random-effects meta-analysis to estimate the mean difference (MD) in hepatic triglyceride (TG) content between intervention and control groups (Figure 2A). The overall mean difference in hepatic TG content was −29.9% (95% CI −33%, −27%) with considerable between-study heterogeneity (I² = 90% (95% CI 89%, 90%), P_Q < 1×10⁻³⁰⁰). Exclusion of outliers minimally affected the overall estimate (−30.2% (95% CI −33%, −27%), Figure 2—source data 1).
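As a rough illustration of what the pooled MD and I² represent, the sketch below pools cohort-level effects with a random-effects model. It assumes the DerSimonian-Laird moment estimator of τ² and a normal-approximation 95% CI; the authors' own R analysis may use a different τ² estimator, so figures from this sketch need not match the reported values exactly. The function name `random_effects_dl` is hypothetical.

```python
import math


def random_effects_dl(effects, variances):
    """Pool per-cohort effects with a DerSimonian-Laird random-effects model.

    effects: per-cohort mean differences (e.g. % difference in hepatic TG vs control)
    variances: the sampling variance of each effect (at least two cohorts)
    Returns (pooled estimate, 95% CI, tau^2, Cochran's Q, I^2 in %).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    theta_fe = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - theta_fe) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)  # moment estimate of between-study variance
    # I^2: share of total variability attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    theta_re = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    ci = (theta_re - 1.96 * se, theta_re + 1.96 * se)  # normal-approximation 95% CI
    return theta_re, ci, tau2, q, i2
```

Passing cohort-level percentage differences and their variances returns the pooled MD alongside I²; an I² near 90%, as reported above, means most of the spread between cohorts reflects genuine differences rather than sampling error.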

Figure 2 (with 2 supplements)
Meta-analysis of hepatic triglyceride content in rodent studies of NAFLD.

(A) Forest plot with subgrouping by class of drug. Individual studies have been hidden and only subgroup summaries are illustrated. Results are expressed as a percentage difference relative to control (/placebo). The total number of animals per subgroup is calculated from the sum of control and interventional animals for each subgroup. CI, confidence interval; DPP4, Dipeptidyl peptidase-4; FXR, Farnesoid X receptor; GLP-1, Glucagon-like peptide-1; MD, mean difference; LXR, Liver X receptor; PDE, Phosphodiesterase; PPAR, Peroxisome proliferator-activated receptor; SCD-1, Stearoyl–CoA desaturase-1; SGLT2, Sodium-glucose co-transporter-2; TUDCA, Tauroursodeoxycholic acid. (B) Meta-regression bubble plot using (log) difference in weight between intervention and control animals, after removal of studies using models that induce weight loss. (C) Meta-regression bubble plot using (log) difference in fasting insulin between intervention and control animals, after removal of studies using models that induce weight loss.

Figure 2—source data 1

Results of meta-analysis and meta-regression of hepatic triglyceride content in rodent studies of NAFLD.

Tab 1. Results from meta-analysis of hepatic triglyceride with subgroup by drug class. Tab 2. Results from meta-analysis of hepatic triglyceride with subgroup by individual drug. Tab 3. Results from meta-analysis of hepatic triglyceride with subgroup by drug class, after removal of outlier studies. Tab 4. Results from univariable meta-regression analyses. Tab 5. Results from model 1 (without drug) and model 2 (including drug used) multivariable meta-regression analyses.

https://cdn.elifesciences.org/articles/56573/elife-56573-fig2-data1-v2.xlsx

For comparison, a relative decline of liver fat by ≥30%, as measured by magnetic resonance imaging proton-density fat fraction (MRI-PDFF), has been determined as the reduction required to achieve histological response in humans with NAFLD (Jayakumar et al., 2019; Loomba et al., 2020; Stine et al., 2020).

We hypothesised that much of this heterogeneity would be due to the different drug class interventions, with some classes having a greater effect than others. On meta-analysis using drug class as a subgroup, 22/28 (79%) of drug classes demonstrated a significant reduction in hepatic TG (i.e. the upper limit of their 95% CI was negative). If we were to use ≥30% reduction as a benchmark for clinical significance (analogous to change in MRI-PDFF), only 3/28 (11%) of drug classes passed this cut-off: fibrates, omega-3 polyunsaturated fatty acids (mixtures), and DPP-4 inhibitors.

The 95% CI of 24/28 drug classes overlapped with the CI of the overall effect estimate. Two drug classes, thiazolidinediones and vitamin E, had a smaller mean reduction in hepatic TG than the overall estimate (non-overlapping 95% CI), and two classes had a greater reduction: fibrates and mixtures of omega-3 polyunsaturated fatty acids (PUFA). However, ‘PUFA mixtures’ was a comparatively broad drug class, and many PUFA mixtures included eicosapentaenoic acid (EPA) or docosahexaenoic acid (DHA), which individually showed no significant reduction in hepatic TG. There remained substantial or considerable heterogeneity within drug class subgroups (P_Q < 0.05 for 21/28 drug classes, Figure 2—source data 1).
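The three labels used for each drug class (significant reduction, differential efficacy, and the ≥30% benchmark) can be written as simple rules on the subgroup and overall CIs. This is a minimal sketch: `Summary` and `classify_subgroup` are hypothetical names, and applying the 30% benchmark to the point estimate rather than to a CI limit is our reading of the text, not a stated rule.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Pooled estimate and 95% CI, in % difference relative to control."""
    estimate: float
    lower: float
    upper: float


def classify_subgroup(sub: Summary, overall: Summary, benchmark: float = -30.0) -> dict:
    """Label a drug-class subgroup using the rules described in the text."""
    # Significant reduction: the whole 95% CI lies below zero.
    significant = sub.upper < 0
    # Differential efficacy: the subgroup CI does not overlap the overall CI.
    overlaps = sub.lower <= overall.upper and overall.lower <= sub.upper
    if overlaps:
        differential = "none"
    elif sub.upper < overall.lower:
        differential = "greater reduction"  # subgroup CI entirely below overall CI
    else:
        differential = "smaller reduction"  # subgroup CI entirely above overall CI
    # Benchmark (assumed to apply to the point estimate): >=30% relative reduction.
    meets_benchmark = sub.estimate <= benchmark
    return {"significant": significant, "differential": differential,
            "meets_benchmark": meets_benchmark}
```

Note that a class can meet the 30% benchmark without showing differential efficacy, because its CI may still overlap the overall CI (−33%, −27%).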

In order to investigate whether this heterogeneity was due to variation between individual drugs within classes, we repeated the meta-analysis with subgrouping by individual drug (Figure 2—figure supplement 1). There were sufficient data for meta-analysis of 28 individual drugs (from the original 28 drug classes). 22/28 (79%) individual drugs were found to have a significant reduction in hepatic TG. Vitamin E was associated with a smaller reduction in hepatic TG, with a mean difference falling outside the 95% CI of the overall estimate, whilst fenofibrate was the only drug with a greater reduction than the overall estimate. There remained considerable heterogeneity within subgroups for 20/28 drugs (I² = 75–100%, P_Q < 0.05).

We then performed univariable meta-regression to investigate which variables accounted for the heterogeneity in results (Figure 2—source data 1). Although the individual drug used was the single variable that accounted for the most heterogeneity (adj R² = 4.9%, p=0.02), the majority of variation in results remained unaccounted for. An association was also observed for weight difference (adj R² = 3.3%, p=6.4×10⁻⁴), where greater weight loss in the intervention group was associated with a greater reduction in hepatic TG. This association was stronger after removal of NAFLD models that induce weight loss (e.g. methionine-choline deficient (MCD) diet, Figure 2B), and similar results were obtained for difference in fasting insulin levels (Figure 2C).
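A univariable meta-regression of this kind can be sketched for one continuous moderator, such as the log of the intervention-to-control weight ratio. This sketch assumes a method-of-moments (DerSimonian-Laird type) estimate of residual τ² and reports a pseudo-R², the proportional drop in τ² once the moderator is added; the authors' "adj R²" may be defined differently, and all function names are hypothetical.

```python
import math


def _wls(x, y, w):
    """Weighted least squares for y = b0 + b1*x; also returns weighted sums."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx ** 2
    b1 = (sw * swxy - swx * swy) / det
    b0 = (swy - b1 * swx) / sw
    return b0, b1, sw, swx, swxx, det


def tau2_intercept_only(y, v):
    """DerSimonian-Laird tau^2 with no moderators."""
    w = [1.0 / vi for vi in v]
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
    return max(0.0, (q - (len(y) - 1)) / (sw - sum(wi ** 2 for wi in w) / sw))


def meta_regression(x, y, v):
    """Mixed-effects meta-regression with one moderator x."""
    k = len(y)
    w = [1.0 / vi for vi in v]
    b0, b1, sw, swx, swxx, det = _wls(x, y, w)
    # Residual heterogeneity statistic from the fixed-effect fit.
    q_e = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    w2 = [wi ** 2 for wi in w]
    sw2 = sum(w2)
    sw2x = sum(a * xi for a, xi in zip(w2, x))
    sw2xx = sum(a * xi * xi for a, xi in zip(w2, x))
    # trace(P) = tr(W) - tr((X'WX)^-1 X'W^2X) for X = [1, x]
    trace_p = sw - (swxx * sw2 - 2 * swx * sw2x + sw * sw2xx) / det
    tau2_res = max(0.0, (q_e - (k - 2)) / trace_p)
    tau2_0 = tau2_intercept_only(y, v)
    r2 = 100 * max(0.0, (tau2_0 - tau2_res) / tau2_0) if tau2_0 > 0 else 0.0
    # Refit with random-effects weights to test the slope.
    ws = [1.0 / (vi + tau2_res) for vi in v]
    _, slope, sws, _, _, det_s = _wls(x, y, ws)
    se = math.sqrt(sws / det_s)
    p = math.erfc(abs(slope / se) / math.sqrt(2))  # two-sided normal p-value
    return {"slope": slope, "se": se, "p": p, "tau2": tau2_res, "r2": r2}
```

A negative slope here would mean that cohorts with a larger weight reduction in the intervention arm also showed a larger reduction in hepatic TG, which is the direction described above.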

When these study characteristics were combined for multivariable meta-regression using an unbiased variable-selection approach (multiple-variable inference), 10 variables were predicted to contribute substantially to the variation in hepatic TG difference (Table 1). In final model 1, weight difference was the only variable significantly associated with MD in hepatic TG (p=0.003). Including drug used in model 2 accounted for all heterogeneity in results (Figure 2—source data 1) in a small subset of cohorts (k = 42), though neither model was significantly predictive of outcome following permutation tests (p-value* > 0.05).
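The permutation p-values (p-value*) guard against chance associations in small subsets of cohorts. The sketch below shows the idea for a single moderator: shuffle the moderator across cohorts, recompute the Wald statistic, and report the share of shuffles at least as extreme as the observed one. Simplifications to note: τ² is held fixed across permutations rather than re-estimated, and the real analyses permuted multivariable models. Function names are hypothetical.

```python
import random


def qm_statistic(x, y, w):
    """Wald chi-square (1 df) for the slope of a weighted regression of y on x."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx ** 2
    b1 = (sw * swxy - swx * swy) / det
    var_b1 = sw / det
    return b1 ** 2 / var_b1


def permutation_p(x, y, v, tau2, n_perm=1000, seed=1):
    """Permutation p-value for one moderator, with tau^2 held fixed (simplification)."""
    w = [1.0 / (vi + tau2) for vi in v]
    observed = qm_statistic(x, y, w)
    rng = random.Random(seed)
    x_perm = list(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(x_perm)  # break any link between moderator and outcome
        if qm_statistic(x_perm, y, w) >= observed - 1e-12:
            exceed += 1
    # Count the observed arrangement itself so p is never exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

With only a few dozen cohorts, a model that looks highly predictive on its nominal p-value can still fail this test, which is the pattern reported above for hepatic TG.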

Table 1
Summary of findings across all outcomes and multivariable meta-regression analyses.

Six separate meta-analyses were performed with subgrouping by class of drug. Drug classes associated with outcome showed a significant reduction in the severity of NAFLD for that outcome, defined by the upper limit of their 95% confidence interval (CI) being below zero. Differential efficacy refers to drug classes whose 95% CI did not overlap with that of the overall estimate. Multivariable meta-regression was performed using two models where there were sufficient data: model 1 did not include drug class; model 2 included drug. For each analysis and model, the top variables are those identified as substantially accounting for heterogeneity using multiple-variable inference. k refers to the number of cohorts included in each analysis. P-val* for each model refers to the overall model p-value (test of moderators) obtained after running multiple permutation tests, where p<0.1 should be considered indicative of an effect. ARB, angiotensin receptor blocker; DPP4-i, Dipeptidyl peptidase-4 inhibitor; EPA, eicosapentaenoic acid; FXR, Farnesoid X receptor; GLP-1, glucagon-like peptide-1; PPAR, peroxisome proliferator-activated receptor; PUFA, omega-3 polyunsaturated fatty acid; SCD1-i, stearoyl–CoA desaturase-1 inhibitor; SGLT2-i, sodium-glucose co-transporter-2 inhibitor; TUDCA, tauroursodeoxycholic acid.

Hepatic TG
  Drug classes associated with outcome: 22/28 (79%): SCD1-i, PUFA-mix, Fibrates, Bifidobacterium sp., DPP4-i, Curcumin, EPA, Silymarin, TUDCA, Polyphenol, GLP1 agonist, ARB, FXR agonist, SGLT2-i, PPARα-δ agonist, Cholesterol Absorption Inhibitor, Berberine, Statin, Biguanide, Lactobacillus sp., Vitamin E
  Differential efficacy: greater reduction: Fibrates, PUFA-mix; smaller reduction: Thiazolidinediones, Vitamin E
  Model 1 top predictors (k = 333): Weight, Insulin, Fat (%kcal), Model, Age at start, Background, Glucose, Sex, Duration, Quality score
  Model 1 final model: R² = 48.9%, P-val* = 0.22, k = 67
  Model 2 top predictors (k = 222): Insulin, Fat (%kcal), Weight, Glucose, Age at start, Sex, Drug
  Model 2 final model: R² = 100%, P-val* = 0.26, k = 42

Steatosis
  Drug classes associated with outcome: 9/22 (41%): Fibrates, GLP-1 agonist, DPP4-i, Probiotic (mix), Curcumin, Thiazolidinediones, Lactobacillus sp., Statin, ARB
  Differential efficacy: greater reduction: Fibrates
  Model 1 top predictors (k = 94): Glucose, Fat (%kcal), Sex
  Model 1 final model: R² = 91.8%, P-val* = 0.03, k = 19
  Model 2 top predictors (k = 62): Fat (%kcal), Sex, Weight
  Model 2 final model: R² = 60.3%, P-val* = 0.098, k = 27

Lobular inflammation
  Drug classes associated with outcome: 9/16 (56%): Fibrates, Probiotic (mix), Statin, ARB, FXR agonist, DPP4-i, Biguanide, Thiazolidinediones, Vitamin D
  Differential efficacy: none
  Model 1 top predictors (k = 81): Glucose, Fat (%kcal)
  Model 1 final model: R² = 49.8%, P-val* = 0.43, k = 19
  Model 2: not reported

Ballooning
  Drug classes associated with outcome: 8/14 (57%): Fibrates, Biguanide, Thiazolidinediones, Vitamin D, DPP4-i, ARB, FXR agonist, Probiotic (mix)
  Differential efficacy: greater reduction: Fibrates; smaller reduction: Probiotic (mix)
  Model 1 top predictors (k = 56): Glucose
  Model 1 final model: R² = 8.1%, P-val* = 0.38, k = 26
  Model 2: not reported

NAFLD Activity Score
  Drug classes associated with outcome: 10/14 (71%): Fibrates, DPP4-i, GLP1 agonist, Probiotic (mix), Vitamin D, Silymarin, Biguanide, Thiazolidinediones, FXR agonist, ARB
  Differential efficacy: greater reduction: Fibrates
  Model 1 top predictors (k = 89): Glucose, Fat (%kcal), Age at start, Weight
  Model 1 final model: R² = 78.0%, P-val* = 0.03, k = 19
  Model 2 top predictors (k = 58): Fat (%kcal), Weight, Background, Age at start, Sex
  Model 2 final model: R² = 63.1%, P-val* = 0.001, k = 30

Fibrosis
  Drug classes associated with outcome: 2/5 (40%): FXR agonist, Statin
  Differential efficacy: none
  Model 1 top predictors (k = 58): Model, Weight, Glucose, Fat (%kcal), Duration, Age at start
  Model 1 final model: R² = 100%, P-val* = 0.67, k = 16
  Model 2: not reported
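The "top predictors" in Table 1 come from multiple-variable inference across many candidate models. One common way to rank variables in such a search, shown here as an assumption rather than the authors' confirmed procedure, is multimodel inference with Akaike weights: each model gets weight exp(−ΔAICc/2), normalised, and a variable's importance is the summed weight of the models containing it. `variable_importance` is a hypothetical helper, and any cut-off used to call a variable "top" would be a further assumption.

```python
import math


def variable_importance(models):
    """Relative importance of each moderator from Akaike weights.

    models: list of (set_of_variables, aicc) pairs, one per candidate model.
    Returns {variable: summed Akaike weight}, sorted from most to least important.
    """
    best = min(aicc for _, aicc in models)
    raw = [math.exp(-(aicc - best) / 2) for _, aicc in models]  # relative likelihoods
    total = sum(raw)
    weights = [r / total for r in raw]  # Akaike weights sum to 1
    importance = {}
    for (variables, _), w in zip(models, weights):
        for var in variables:
            importance[var] = importance.get(var, 0.0) + w
    return dict(sorted(importance.items(), key=lambda kv: -kv[1]))
```

A variable that appears in nearly every well-supported model, as weight difference does for several outcomes here, approaches an importance of 1.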

Given that meta-regression implicated weight loss and improved insulin sensitivity in results, we explored how these traits were distributed by drug class (Figure 3A). Including all available data, 12/33 (36%) drug classes showed a significant reduction in weight (i.e. the upper limit of the 95% CI of the intervention-to-control weight ratio was below 1, Figure 3—source data 1). 17/32 (53%) and 15/25 (60%) of drug classes were associated with reductions in fasting glucose (Figure 3B) and insulin (Figure 3—figure supplement 1A), respectively. There was a positive correlation between weight, glucose, and insulin differences (Figure 3—figure supplement 1B). In addition, there was a negative correlation between weight difference and study duration or the age of mice at the end of intervention; that is, longer studies (or those in older mice) were associated with greater weight loss in interventional groups.
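The weight criterion above (upper 95% CI limit of the intervention-to-control ratio below 1) can be sketched as follows. This assumes an unweighted mean of cohort-level ratios and a normal-approximation CI; the exact summary method behind Figure 3 source data 1 is not stated here, and `ratio_summary` is a hypothetical name.

```python
import math
import statistics


def ratio_summary(intervention_means, control_means):
    """Summarise cohort-level intervention/control ratios for one drug class.

    A ratio of 0.9 means intervention animals weighed 90% of controls.
    """
    ratios = [i / c for i, c in zip(intervention_means, control_means)]
    if len(ratios) < 2:
        raise ValueError("need at least two cohorts to estimate a CI")
    mean = statistics.mean(ratios)
    sd = statistics.stdev(ratios)
    half_width = 1.96 * sd / math.sqrt(len(ratios))  # normal approximation
    upper = mean + half_width
    return {"mean": mean, "sd": sd, "lower": mean - half_width, "upper": upper,
            "reduced": upper < 1.0}  # whole CI below 1 => significant reduction
```

The same function applies unchanged to fasting glucose or insulin by passing those means instead of body weight.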

Figure 3 (with 1 supplement)
Weight and glucose difference associated with use of each drug class.

(A) Box plot illustrating the difference in weight in interventional animals, expressed as a decimal of the weight of the control animals. Raw data points are plotted for each drug class. (B) Box plot illustrating the difference in fasting glucose in interventional animals, expressed as a decimal of the fasting glucose of the control animals. Raw data points are plotted for each drug class.

Figure 3—source data 1

Results of difference in weight, glucose, and insulin for each drug class.

Mean, standard deviation, and 95% confidence intervals for the percentage difference in weight, fasting glucose, and fasting insulin between interventional and placebo animals. ACC, acetyl-CoA carboxylase; ACE, angiotensin-converting enzyme; CB1, cannabinoid receptor 1; DPP4, Dipeptidyl peptidase-4; FXR, Farnesoid X receptor; GLP-1, glucagon-like peptide-1; LXR, liver X receptor; PDE, phosphodiesterase; PPAR, peroxisome proliferator-activated receptor; SCD1, stearoyl–CoA desaturase-1; SGLT2, sodium-glucose co-transporter-2; TUDCA, tauroursodeoxycholic acid; and UDCA, ursodeoxycholic acid.

https://cdn.elifesciences.org/articles/56573/elife-56573-fig3-data1-v2.xlsx

We then explored whether these results showed study distribution (publication) bias or were heavily influenced by individual outliers (Figure 2—figure supplement 2). There was an uneven distribution of studies with a bias towards a reduction in hepatic TG, which was supported by Egger’s test (β = −0.83 (95% CI −1.3, −0.4), p=2.2×10⁻⁴). Using the trim-and-fill method to account for this bias, we estimated that the true overall mean difference in hepatic TG would be −18.7% (95% CI −21%, −16%), over a third smaller than the original estimate of −29.9%.
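Egger's test checks whether small, imprecise studies report systematically different effects from large ones. The sketch below uses the classical formulation: regress each standardised effect (effect/SE) on its precision (1/SE) by ordinary least squares and test whether the intercept differs from zero. The authors' implementation may use a different parameterisation, so the β quoted above need not equal this intercept; the p-value here uses a normal approximation to the t distribution, and `egger_test` is a hypothetical name.

```python
import math


def egger_test(effects, standard_errors):
    """Classical Egger regression test for funnel-plot asymmetry.

    Returns (intercept, SE of intercept, t statistic, approximate two-sided p).
    """
    z = [y / s for y, s in zip(effects, standard_errors)]  # standardised effects
    prec = [1.0 / s for s in standard_errors]              # precision
    n = len(z)
    sx, sy = sum(prec), sum(z)
    sxx = sum(p * p for p in prec)
    sxy = sum(p * zi for p, zi in zip(prec, z))
    denom = n * sxx - sx ** 2
    slope = (n * sxy - sx * sy) / denom  # approximates the pooled effect
    intercept = (sy - slope * sx) / n    # non-zero intercept signals asymmetry
    resid = [zi - intercept - slope * p for zi, p in zip(z, prec)]
    s2 = sum(r * r for r in resid) / (n - 2)  # residual variance
    se_int = math.sqrt(s2 * sxx / denom)
    t = intercept / se_int
    p_approx = math.erfc(abs(t) / math.sqrt(2))  # normal approximation to t(n-2)
    return intercept, se_int, t, p_approx
```

A negative intercept, as here, indicates that less precise cohorts tend to report larger reductions than precise ones, consistent with missing small null or unfavourable studies.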

Meta-analysis of histological steatosis grade

Whilst hepatic TG was the most widely reported measure, histological assessment is considered the gold standard for assessing disease in patients with NAFLD. Therefore, we performed a meta-analysis of MD in steatosis grade (Figure 4A). The overall MD in steatosis was −0.7 (95% CI −0.8, −0.5), again with considerable heterogeneity (I² = 94% (95% CI 93%, 95%), P_Q < 1×10⁻³⁰⁰). Compared to hepatic TG, fewer drug classes were associated with a significant reduction in steatosis grade (8/22, 36%), though again fibrates showed the largest effect size. Similar results were obtained when subgrouping by individual drugs rather than classes (Figure 4—source data 1).

Figure 4 (with 1 supplement)
Meta-analysis of steatosis grade in rodent studies of NAFLD.

(A) Forest plot with subgrouping by class of drug. Individual studies have been hidden and only subgroup summaries are illustrated. The total number of animals is calculated from the sum of control and interventional animals for each subgroup. CI, confidence interval; DPP4, Dipeptidyl peptidase-4; FXR, Farnesoid X receptor; GLP-1, Glucagon-like peptide-1; MD, mean difference; TUDCA, Tauroursodeoxycholic acid. (B) Meta-regression bubble plot using (log) difference in fasting glucose between interventional and control animals, after removal of studies using models that induce weight loss. (C) Meta-regression bubble plot using (log) difference in fasting insulin between interventional and control animals, after removal of studies using models that induce weight loss.

Figure 4—source data 1

Results of meta-analysis and meta-regression of steatosis grade in rodent studies of NAFLD.

Tab 1. Results from meta-analysis of steatosis grade with subgroup by drug class. Tab 2. Results from meta-analysis of steatosis grade with subgroup by individual drug. Tab 3. Results from meta-analysis of steatosis grade with subgroup by drug class, after removal of outlier studies. Tab 4. Results from univariable meta-regression analyses. Tab 5. Results from model 1 (without drug) and model 2 (including drug class used) multivariable meta-regression analyses.

https://cdn.elifesciences.org/articles/56573/elife-56573-fig4-data1-v2.xlsx

Univariable meta-regression found a marked association between difference in plasma glucose levels and MD in steatosis grade (Figure 4B, adj R² = 21%, p=2.4×10⁻⁶). Similar associations were observed for difference in weight and insulin levels, particularly after removal of weight-loss-inducing models (Figure 4C). In addition, the sex of animals (adj R² = 7%, p=0.01) and genetic background were associated with MD in steatosis grade (Figure 4—source data 1). When factors were combined in multivariable meta-regression (Table 1), a model using sex, fasting glucose difference, and fat (%kcal) in diet accounted for 92% of variability in a small subset of cohorts (k = 19), which remained robust after a multiple permutation test (p-value* = 0.03).

Meta-analysis of lobular inflammation

9/16 (56%) drug classes were associated with a reduction in lobular inflammation (Figure 5A). Again, there was considerable heterogeneity within drug classes and when subgrouping by individual drugs (Figure 5—source data 1).

Figure 5 (with 1 supplement)
Meta-analysis of lobular inflammation in rodent studies of NAFLD.

(A) Forest plot with subgrouping by class of drug. Individual studies have been hidden and only subgroup summaries are illustrated. The total number of animals is calculated from the sum of control and interventional animals for each subgroup. CI, confidence interval; DPP4, Dipeptidyl peptidase-4; FXR, Farnesoid X receptor; GLP-1, Glucagon-like peptide-1; MD, mean difference. (B) Meta-regression bubble plot using (log) difference in weight between interventional and control animals, after removal of studies using models that induce weight loss. (C) Meta-regression bubble plot using (log) fat (%kcal) in diet for each cohort.

Figure 5—source data 1

Results of meta-analysis and meta-regression of lobular inflammation in rodent studies of NAFLD.

Tab 1. Results from meta-analysis of lobular inflammation with subgroup by drug class. Tab 2. Results from meta-analysis of lobular inflammation with subgroup by individual drug. Tab 3. Results from meta-analysis of lobular inflammation with subgroup by drug class, after removal of outlier studies. Tab 4. Results from univariable meta-regression analyses. Tab 5. Results from multivariable meta-regression analyses.

https://cdn.elifesciences.org/articles/56573/elife-56573-fig5-data1-v2.xlsx

Univariable meta-regression identified an association with difference in weight (Figure 5B, adj R² = 15%, p=4.0×10⁻⁴), as had been observed for steatosis grade and hepatic TG content. In addition, fat (%kcal) in diet was associated with MD in lobular inflammation: a higher %kcal fat in diet was associated with a smaller difference in lobular inflammation (Figure 5C, adj R² = 21%, p=1.7×10⁻⁵), indicating that study design was associated with the size of treatment response. The bubble plot of fat content in diet also illustrated that most studies reporting fat content in diet used either 40–45% or 60% kcal fat (Figure 5C).

Meta-analysis of hepatocellular ballooning

8/14 (57%) drug classes were associated with a reduction in hepatocellular ballooning (Figure 6A). Fibrates showed greater reduction in ballooning than other studied drug classes, however this could not be replicated at an individual drug level (Figure 6—figure supplement 1).

Figure 6 (with 1 supplement)
Meta-analysis of hepatocellular ballooning in rodent studies of NAFLD.

(A) Forest plot with subgrouping by class of drug. Individual studies have been hidden and only subgroup summaries are illustrated. The total number of animals is calculated from the sum of control and interventional animals for each subgroup. CI, confidence interval; DPP4, Dipeptidyl peptidase-4; FXR, Farnesoid X receptor; GLP-1, Glucagon-like peptide-1; MD, mean difference; TUDCA, tauroursodeoxycholic acid. (B) Meta-regression bubble plot using (log) fat (%kcal) in diet for each cohort. (C) Meta-regression bubble plot using (log) fructose/glucose (% weight) in diet for each cohort. (D) Meta-regression bubble plot using (log) duration of intervention (in weeks) for each cohort.

Figure 6—source data 1

Results of meta-analysis and meta-regression of hepatocellular ballooning in rodent studies of NAFLD.

Tab 1. Results from meta-analysis of hepatocellular ballooning with subgroup by drug class. Tab 2. Results from meta-analysis of hepatocellular ballooning with subgroup by individual drug. Tab 3. Results from meta-analysis of hepatocellular ballooning with subgroup by drug class, after removal of outlier studies. Tab 4. Results from univariable meta-regression analyses.

https://cdn.elifesciences.org/articles/56573/elife-56573-fig6-data1-v2.xlsx

Similar to previous analyses, differences in fasting glucose (adj R² = 17%, p=9.0×10⁻⁴) and weight (adj R² = 8%, p=0.01) were associated with the magnitude of treatment effect. Study design characteristics also influenced difference in ballooning, namely percentage of fat in diet (Figure 6B, greater reduction in ballooning where a lower %kcal fat was used) and percentage of fructose/glucose in diet (Figure 6C); however, only 12 studies contributed to this analysis. In addition, longer studies were associated with larger reductions in ballooning severity (Figure 6D).

Meta-analysis of NAFLD activity score (NAS)

The NAFLD activity score is a composite of steatosis, lobular inflammation, and ballooning scores. The results largely reflected those of the previous three meta-analyses (Figure 7A). 10/14 (71%) drug classes were associated with a significant reduction in NAS, with fibrates being the most beneficial drug class. On meta-regression, differences in weight (Figure 7B) and glucose (Figure 7C) accounted for 11% and 12% of heterogeneity in results, respectively.
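As a worked example of the composite, NAS is the sum of its three component grades. The component ranges used below (steatosis 0–3, lobular inflammation 0–3, ballooning 0–2, total 0–8) are the standard ranges of the human scoring system and are not restated in this paper; rodent studies may have used adapted scales, so treat the bounds as an assumption. `nafld_activity_score` is a hypothetical helper.

```python
def nafld_activity_score(steatosis: int, lobular_inflammation: int, ballooning: int) -> int:
    """Sum the three NAS components after checking each is in its assumed range."""
    limits = (
        ("steatosis", steatosis, 3),
        ("lobular_inflammation", lobular_inflammation, 3),
        ("ballooning", ballooning, 2),
    )
    for name, value, top in limits:
        if not 0 <= value <= top:
            raise ValueError(f"{name} must be between 0 and {top}, got {value}")
    # Fibrosis is scored separately and is not part of NAS.
    return steatosis + lobular_inflammation + ballooning
```

Because NAS is a sum, a drug that mainly lowers steatosis can reduce NAS substantially without any change in inflammation or ballooning, which is one reason the NAS results mirror the steatosis results.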

Figure 7 (with 1 supplement)
Meta-analysis of NAFLD Activity Score (NAS) in rodent studies of NAFLD.

(A) Forest plot with subgrouping by class of drug. Individual studies have been hidden and only subgroup summaries are illustrated. k represents the number of cohorts in each subgroup. The total number of animals is calculated from the sum of control and interventional animals for each subgroup. CI, confidence interval; DPP4, Dipeptidyl peptidase-4; FXR, Farnesoid X receptor; GLP-1, Glucagon-like peptide-1; MD, mean difference. (B) Meta-regression bubble plot using (log) difference in weight between interventional and control animals, after removal of studies using models that induce weight loss. (C) Meta-regression bubble plot using (log) difference in glucose between interventional and control animals, after removal of studies using models that induce weight loss.

Figure 7—source data 1

Results of meta-analysis and meta-regression of NAFLD Activity Score (NAS) in rodent studies of NAFLD.

Tab 1. Results from meta-analysis of NAS with subgroup by drug class. Tab 2. Results from meta-analysis of NAS with subgroup by individual drug. Tab 3. Results from meta-analysis of NAS with subgroup by drug class, after removal of outlier studies. Tab 4. Results from univariable meta-regression analyses. Tab 5. Results from model 1 (without drug) and model 2 (including drug used) multivariable meta-regression analyses.

https://cdn.elifesciences.org/articles/56573/elife-56573-fig7-data1-v2.xlsx

Multiple-variable meta-regression models were able to account for more than 60% of variation in results (in a small subset of cohorts) using genetic background, fat in diet, age at start of intervention, and weight and glucose difference, without requiring drug or drug class (Table 1).

Meta-analysis of fibrosis stage

Fibrosis stage is the histological feature that most strongly correlates with liver-related outcomes in humans with NAFLD (Angulo et al., 2015; Ekstedt et al., 2015) and was therefore pre-specified as the primary outcome measure for this study. However, it was reported in only 58/603 (9.6%) cohorts. Only FXR agonists and statins (2/5 drug classes, 40%) were associated with a significant reduction in fibrosis stage (Figure 8A), where the overall mean difference was −0.5 (95% CI −0.6, −0.3) stages. Meta-regression replicated previous findings for other traits, showing that difference in weight was associated with reduction in fibrosis stage (Figure 8B, adj R² = 27%, p=0.004).

Figure 8 (with 1 supplement)
Meta-analysis of fibrosis stage in rodent studies of NAFLD.

(A) Forest plot with subgrouping by class of drug. Individual studies have been hidden and only subgroup summaries are illustrated. The total number of animals is calculated from the sum of control and interventional animals for each subgroup. CI, confidence interval; FXR, Farnesoid X receptor; GLP-1, Glucagon-like peptide-1; MD, mean difference. (B) Meta-regression bubble plot using (log) difference in weight between interventional and control animals, after removal of studies using models that induce weight loss.

Figure 8—source data 1

Results of meta-analysis and meta-regression of fibrosis stage in rodent studies of NAFLD.

Tab 1. Results from meta-analysis of fibrosis stage with subgroup by drug class. Tab 2. Results from meta-analysis of fibrosis stage with subgroup by individual drug. Tab 3. Results from meta-analysis of fibrosis stage with subgroup by drug class, after removal of outlier studies. Tab 4. Results from univariable meta-regression analyses. Tab 5. Results from multivariable meta-regression analyses.

https://cdn.elifesciences.org/articles/56573/elife-56573-fig8-data1-v2.xlsx

Bias analyses of histological outcomes and study quality

Funnel plots for steatosis grade, lobular inflammation, fibrosis stage, and NAS were asymmetric (Figure 9), and Egger’s test supported this asymmetry in each analysis.

Figure 9 (with 1 supplement)
Funnel plots illustrating study distribution bias from meta-analyses of histological features.

(A) Funnel plot illustrating study distribution (publication) bias in 145 original studies (solid grey circles) with 54 added studies (from trim-and-fill) for meta-analysis of steatosis grade. The statistical significance associated with each study is illustrated with the coloured background. The Egger’s test p-value is for the null hypothesis that the original studies came from a symmetrical distribution. (B) Funnel plot for lobular inflammation meta-analysis with 103 original studies and 42 added studies. (C) Funnel plot for fibrosis stage meta-analysis with 34 original studies and 14 added studies. (D) Funnel plot for NAS meta-analysis with 106 original studies and 43 added studies.

Using the trim-and-fill method to account for this asymmetry substantially altered the overall effect estimates: for steatosis grade, there was a 79% reduction in estimated effect size to −0.14 (95% CI −0.3, +0.01); for lobular inflammation, a 70% reduction to −0.18 (95% CI −0.32, −0.05); for fibrosis, a 72% reduction to −0.12 (95% CI −0.33, +0.08); and for NAS, a 55% reduction to −0.82 (95% CI −1.1, −0.5).

We used a four-item scale to estimate study quality (Figure 9—figure supplement 1). We found that 497/603 (82%) cohorts were at high risk of bias due to either absence of randomisation or absence of blinding. In addition, we used post-hoc power calculations to estimate the proportion of studies that were adequately powered. For analysis of hepatic TG, 185/474 (39%) cohorts had a power of 80% or greater on post-hoc calculation. However, based on the overall effect estimate from this meta-analysis, a group size of n = 16 would be needed to achieve 80% power at a significance level of p=0.05. Only 20/474 (4.2%) cohorts included 16 or more animals and would therefore have been sufficiently powered to detect associations, based on these data.

Similar results were obtained for histological steatosis grade: 70/174 (40%) reported results consistent with >80% power but only 27/174 (16%) had a group size large enough to be expected to reach 80% power.

Summary of findings across traits

Most drug classes (or individual drugs) showed a significant reduction in severity of NAFLD. Fibrates (for which most data came from studies of fenofibrate) demonstrated the greatest improvement in several outcome measures (Table 1).

Univariable meta-regression found that weight loss and lower fasting glucose were associated with a greater improvement in multiple outcomes (Figure 10). In addition, diet composition influenced the magnitude of treatment response for lobular inflammation, ballooning, and fibrosis.

Summary of univariable meta-regression results across all outcomes.

Heatmap illustrating the results of univariable meta-regression analyses using continuous variables. Beta (regression) coefficients were normalized within each outcome analysis (e.g. steatosis grade) to mean = 0 and standard deviation = 1. Rows (variables used as predictors in meta-regression) and columns (outcome measures for NAFLD) are clustered by similarity.

Discussion

Through meta-analysis and meta-regression we have illustrated that weight loss and alleviation of insulin resistance are consistently associated with treatment response in interventional trials for NAFLD in rodents. This extends beyond drugs that cause weight loss in humans. In addition, we have found that study design characteristics (e.g. diet composition) can influence the magnitude of treatment response. These findings suggest that factors other than the pharmacological mechanism of the trialled drug may confound the results observed in such studies.

All stages of NAFLD show a strong, positive correlation with severity of insulin resistance in humans, and type 2 diabetes is a major risk factor for the presence of advanced fibrosis (Younossi et al., 2019). Consistent with this, weight loss and improvement in insulin sensitivity are associated with histological improvement in NAFLD (Koutoukidis et al., 2019), particularly evident from studies of bariatric surgery (Lassailly et al., 2015; Lee et al., 2019) and liraglutide (Armstrong et al., 2015). It is therefore not surprising to see this replicated in our meta-regression analyses, and it is consistent with previous observations (Hui et al., 2015). On multiple-variable inference, weight loss or fasting glucose were the most important variables across several outcome metrics. This provides strong evidence that, in rodents, alleviation of insulin resistance, usually mediated by weight loss, improves features of NAFLD independent of the drug used.

Some drug classes that caused weight loss in rodents are also well established to cause weight loss in humans (e.g. GLP-1 agonists and metformin), whilst others are not (e.g. vitamin D and statins). The findings for insulin sensitivity were similar, with over 50% of drugs reducing fasting glucose. Again, some drugs had effects consistent with those in humans (e.g. thiazolidinediones, DPP4-inhibitors) but others did not (e.g. ezetimibe). It is not clear whether this is due to reduced food intake or other toxic effects of the drugs. It should be noted that some individual studies faithfully recapitulated observations in humans, for example weight gain, adipose expansion, and improved insulin sensitivity with thiazolidinedione use. However, across the dataset as a whole, these observations suggest that ‘off-pharmacological-target’ effects, causing changes in weight and glucose homeostasis, may account for some of the translational gap between agents efficacious in rodents but not humans.

Though there are no licensed therapies for NAFLD, drug development is a highly active field (Friedman et al., 2018) and over 30 drugs have been used in Phase 2 or 3 trials. Some have demonstrated potential efficacy in well-conducted randomized controlled trials, most notably GLP-1 agonists (Armstrong et al., 2015) and pioglitazone (Cusi et al., 2016; Sanyal et al., 2010). However, the majority of early phase trials did not find substantial benefit from the trialled interventions (Supplementary file 1). In animals, by contrast, a large number of drugs (and classes) demonstrated significant efficacy across several outcome measures. This did not appear to be consistent with the results from human trials: for example, we observed that vitamin D was associated with a significant reduction in NAS, whereas several trials have not found any benefit from its use in humans (Barchetta et al., 2016; Dabbaghmanesh et al., 2018). In addition, the magnitude of effect observed in rodents was not consistent with human data. For example, there is reasonably convincing evidence that pioglitazone improves NAFLD in humans, yet it had one of the smallest improvements in hepatic TG. Similarly, GLP-1 agonists, which met their primary outcome in a human Phase 2 study (Armstrong et al., 2015), ranked in the middle for most outcomes in this analysis. Fibrates had one of the largest treatment effects across multiple analyses, but this does not appear to be consistent with human evidence to date (Fabbrini et al., 2010; Oscarsson et al., 2018). Fibrate use was also associated with a median 10% weight loss in these analyses, which has not been observed in large randomised trials in humans (Keech et al., 2005). Even though we found evidence for efficacy of the majority of drugs included in this analysis, the 95% CIs for treatment effect size overlapped for most drug classes. This is generally consistent with findings in preclinical models of spinal cord injury, where the effect sizes of several different types of treatment overlapped (Watzlawick et al., 2019). Overall, the trends observed are not consistent with findings in humans, and there do not appear to be any clear patterns that indicate potentially successful translation.

Several study design characteristics affected treatment response across multiple outcome measures, including the age of animals, sex, genetic background, and dietary composition. There is a very large number of variables in the design of an interventional animal study, and many were simplified for input into the analyses. For example, the ‘model’ used was simplified to a ‘core’ model (e.g. leptin-deficient (ob/ob) mice) and separated from the genetic background of the animals. Similarly, we studied several dietary components in isolation, which may have contributed to the observation that a higher proportion of dietary fat (e.g. 60% kcal) was associated with a smaller treatment response. This may be because diets with lower fat content (e.g. 40% kcal) may be combined with added cholesterol or other components, such as fructose. However, these data do illustrate the concept that multiple factors associated with model design influence not only animal phenotype but also the magnitude of treatment response. This was demonstrated using multiple-variable meta-regression models, where in some analyses most of the variation in results could be accounted for (in a small subset of cohorts) without including drug as a covariate, particularly for NAS and steatosis grade.

It should be noted that there have been more systematic analyses of the effect of genetic background on NAFLD (Chella Krishnan et al., 2018; Hui et al., 2015), as well as in other fields, including immunology (Martin et al., 2017) and behavioural neuroscience (Homanics et al., 1999; Liu and Gershenfeld, 2001). We were surprised to find that genetic background was a top variable in comparatively few of our multivariable models. Based on observations from the Hybrid Mouse Diversity Panel (Chella Krishnan et al., 2018; Hui et al., 2018; Hui et al., 2015), we anticipate that the true impact of genetic background may be greater than we could quantify, because we included only the narrow range of backgrounds that had been used in multiple studies and excluded mixed genetic backgrounds from analysis.

The vast majority of included studies demonstrated an improvement in NAFLD, which could be partly accounted for by a tendency to report positive results, that is, publication bias. Using the trim-and-fill method, we estimated that study distribution bias (most likely publication bias in this case) may have substantially increased the reported magnitude of effect (e.g. an overall reduction in hepatic TG of 19% after adjustment, compared with 30% before). The presence of publication bias did not come as a surprise (Tsilidis et al., 2013), and this dataset usefully replicates the strong evidence base for publication bias in preclinical neurological studies. Previous work on preclinical models of sunitinib estimated the overestimate from potential publication bias at 45% (Henderson et al., 2015). The results from power calculations are also likely to reflect publication bias: based on the overall effect summary, only a minority of cohorts were large enough to be predicted to achieve 80% power. Similarly, we have replicated previously described low rates of randomisation and blinding in animal studies (Bahor et al., 2017).

Very few studies reported portal inflammation severity. In humans, (peri-)portal inflammatory activity has been shown to correlate with severity of fibrosis in both adults and children with NAFLD (Brunt et al., 2009; Mann et al., 2016; Rakha et al., 2010). This therefore remains a relatively unexplored area worthy of investigation, as targeting portal inflammation may be beneficial in slowing disease progression.

There are several implications of these results. Firstly, given that study design has a considerable effect on treatment response, it is not surprising that there are multiple reports of difficulty in reproducing preclinical studies in the field of metabolism (von Herrath et al., 2019). Variations in what may appear to be small details (such as age at the start of study diet) influence results and could therefore mask subtle differences or generate false positives.

Secondly, these results also help to explain the difficulty in bridging the translational gap between preclinical and human studies (Denayer et al., 2014), which may be relevant beyond metabolism research. For example, we did not observe an association between drug dose and treatment effect size. In addition, studies were overwhelmingly performed in male animals, whereas human studies are more evenly balanced (e.g. 60% female in the ‘STELLAR-3/–4’ trial [STELLAR-3 and STELLAR-4 Investigators et al., 2020]). Sex was a top predictor in several multivariable inference models, and therefore the lack of inclusion of female mice may hinder identification of drugs for translation. Similarly, studies were almost uniformly performed in young, still-growing mice, unlike the adult patients who are the focus of all major phase 3 NAFLD trials.

The main strength of this work is the number of included studies, interventions, and variables, which has facilitated a detailed analysis of a single disease area. However, this study has simplified some study characteristics to facilitate meta-regression analyses, which may have under-estimated the impact of particular variables on outcome measures. One such simplification was the grouping of drugs into classes, some of which (e.g. ‘Probiotics (mix)’) were comparatively vague relative to those with well-defined mechanisms (e.g. thiazolidinediones). Similarly, we used a simplified categorisation of rodent models (e.g. high-fat diet), combined with individual continuous metrics (e.g. fat %kcal), which will not capture the full variation of models used. We used fasting glucose and insulin as proxies for insulin resistance, but these are not direct measures of insulin resistance. Direct measurement would require results from hyperinsulinaemic-euglycaemic clamps, or at least insulin tolerance tests, but these were performed in comparatively few studies. Similarly, we elected to record histological outcomes only where they were reported according to standard criteria for reporting human biopsies of NAFLD. There is a wide variety of other methods of interpreting liver histology, some of which are more quantitative (e.g. collagen proportionate area), though again these were less frequently reported. It should also be noted that this study did not have a pre-specified statistical analysis plan, which increases its risk of bias.

There is a wide range of other variables that were not considered in this analysis. Some were unreported, such as the technique of animal handling. A further factor of potential relevance is the bacterial status of rodents, which is known to affect liver phenotypes (Kaden-Volynets et al., 2019), potentially via intestinal dysbiosis (Balmer et al., 2014; Mazagova et al., 2015). Furthermore, many studies did not report certain variables; for example, genetic background was not reported in 32/603 (5.3%) cohorts, which reduced the number of studies included in meta-regression analyses. This was most obvious for multiple-variable meta-regression, where some final models included fewer than 20 data points. However, this meta-analysis included a large number of articles, which gives considerable confidence in the findings we have replicated across several outcome measures.

Conclusion

Multiple drug classes improve NAFLD in rodents, however these results may be confounded by weight loss and alleviation of insulin resistance not observed in humans treated with the same drugs. Publication bias over-estimates these effect sizes by at least a third and a variety of other study design characteristics also influence treatment response. Therefore, standardisation of practices is needed in preclinical studies of metabolism to improve the translatability and reproducibility of findings.

Materials and methods

Key resources table
Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information
Software, algorithm | R [base], dmetar, metafor, meta | R | RRID:SCR_019054 (dmetar); RRID:SCR_003450 (metafor); RRID:SCR_019055 (meta) | R 4.0.2
Software, algorithm | GraphPad Prism | GraphPad Prism | RRID:SCR_002798 | GraphPad Prism v8

Review protocol and search strategy

The systematic review protocol was prospectively registered with SyRF (Systematic Review Facility) and is available from: https://drive.google.com/file/d/0B7Z0eAxKc8ApQ0p4OG5SblRlRTA/view.

PubMed (MEDLINE) and EMBASE were searched for published articles on experimental rodent models of fatty liver, NAFLD, or non-alcoholic steatohepatitis (NASH). The following search term was used: (‘Non-alcoholic fatty liver disease’ OR ‘Nonalcoholic fatty liver disease’ OR ‘NAFLD’ OR ‘non-alcoholic steatohepatitis’ OR ‘nonalcoholic steatohepatitis’ OR ‘NASH’ OR ‘fatty liver’ OR ‘hepatic steatosis’) AND (‘mouse’ OR ‘animal’ OR ‘rat’ OR ‘murine’ OR ‘animal model’ OR ‘murine model’ OR ‘rodent model’ OR ‘experimental model’) NOT (‘Review’). Both databases were searched using the ‘Animal’ filters (de Vries et al., 2014; Hooijmans et al., 2010), the results were combined, and duplicates were eliminated. The search was completed in January 2019.

Study selection and eligibility criteria

Our inclusion criteria were as follows: primary research articles using mice or rats to model NAFLD (including hepatic steatosis, NASH, and NASH-fibrosis); use of a pharmacological intervention with a control (or placebo) group; and use of a pharmacological intervention class (e.g. statins) that had been used in Phase 2 or 3 trials in humans for treatment of NAFLD/NASH. Studies were excluded if they: did not model NAFLD/NASH; were performed in humans or in any animal other than mice and rats; were reviews, comments, letters, editorials, meta-analyses, or ideas; were not in English (unless a translation was available); did not report any relevant outcome metric (hepatic triglyceride content relative to hepatic protein, e.g. mg/mg or µM/mg; NAFLD Activity Score [Brunt et al., 2011; Kleiner et al., 2005] or any of its components; portal inflammation grade [Brunt et al., 2009]; or histological fibrosis stage (0–4)); or used a pharmacological agent class that had not been used in Phase 2/3 studies in humans for NAFLD.

Titles and abstracts were screened to identify relevant studies using Rayyan (Ouzzani et al., 2016). Potentially relevant studies were retrieved in full text and assessed against the inclusion/exclusion criteria independently by two reviewers, with discrepancies settled by discussion with JPM.

Data collection

The variables extracted were as follows: phenotypic characteristics of the animal model used (sex, diet [including percentage of fat, glucose, fructose, sucrose, and cholesterol in diet], rodent age, genetic alterations, background animal strain); drug treatment (dose, drug class, duration, age at intervention); hepatic triglyceride content; and liver histology. Fructose and glucose content of the diet was collected as a single data point, as they were frequently combined in diets. Liver histology results were extracted where the (human) NAFLD Activity Score (NAS [0–8]) and/or any of its components had been used (steatosis grade [0–3], lobular inflammation [0–3], and ballooning severity [0–2]), and where portal inflammation severity [0–2] and/or histological fibrosis stage [0–4] were reported. Studies frequently included multiple cohorts or interventional arms, which were defined by use of a different animal model of NAFLD, a different drug, or a different drug dose. Data were extracted for each cohort or interventional arm separately.

Quality assessment

Each paper was assessed in four areas: use of a protocol, reporting use of randomisation, reporting use of blinding, and a power calculation. ‘Use of a protocol’ required the article to refer specifically to a protocol that was in place before the start of the study. Each item was scored 1 if present, and the item scores were summed to give each paper an overall ‘quality score’. A post-hoc power calculation was performed for each study using the means of each group and a common SD (Cohen, 1988) using the pwr (Champely, 2018) package in R. In addition, a ‘pre-test’ sample size calculation was performed using the overall effect summary from meta-analysis, power = 80%, and p-value=0.05.
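
The post-hoc power and ‘pre-test’ sample size calculations can be illustrated with a short sketch. It is not the authors’ code: they used the pwr package in R, which is based on the noncentral t distribution, whereas this Python sketch uses a normal approximation for a two-sided comparison of two equal-sized groups, so it will slightly underestimate the required group size for small samples. The function names and the example cohort are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def cohens_d(mean_control, mean_intervention, sd_common):
    """Standardised mean difference using a common (pooled) SD."""
    return abs(mean_intervention - mean_control) / sd_common


def power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate two-sided power for two equal groups (normal approximation)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n_per_group / 2)  # expected z statistic under H1
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def n_per_group(d, power=0.80, alpha=0.05):
    """Approximate group size needed to reach the target power."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


# Hypothetical cohort: control TG 100%, treated 70%, common SD 30%, 8 animals per group
d = cohens_d(100.0, 70.0, 30.0)
print(d, round(power_two_sample(d, 8), 2), n_per_group(d))
```

Under this approximation a standardised effect of d = 1.0 needs 16 animals per group; an exact noncentral-t calculation, as in pwr, gives a slightly larger number.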

Shared control group adjustment

Multiple studies used a single placebo (or control) group for several experimental arms. Where possible, the experimental arms were combined into a single experimental cohort and compared to the control group (Higgins and Green, 2011). Where this was not appropriate (e.g. interventions from different drug classes), the control group was divided evenly across interventional groups. Therefore, each control animal was included only once in analyses.
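
Both adjustments can be sketched as follows. Combining arms uses the Cochrane Handbook formula for merging two groups’ sample sizes, means, and SDs. Splitting the shared control assumes whole animals, with any remainder allocated to the first arms; the text does not say how remainders were handled, and fractional sample sizes would also be a valid choice. The function names are hypothetical.

```python
import math


def combine_groups(n1, m1, sd1, n2, m2, sd2):
    """Merge two arms into one group (Cochrane Handbook formula for means and SDs)."""
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
           + n1 * n2 / n * (m1 ** 2 + m2 ** 2 - 2 * m1 * m2)) / (n - 1)
    return n, mean, math.sqrt(var)


def split_control(n_control, n_arms):
    """Divide a shared control group across arms so each animal is counted once."""
    base, extra = divmod(n_control, n_arms)
    return [base + 1 if i < extra else base for i in range(n_arms)]


# Two hypothetical doses of the same drug, then a control shared by three drug classes
print(combine_groups(6, 55.0, 10.0, 6, 65.0, 12.0))
print(split_control(10, 3))
```

The combined SD is larger than either arm’s SD when the arm means differ, because the between-arm spread is added back into the pooled variance.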

Data processing

Where possible, drugs were grouped into classes based upon their pharmacological mechanism of action. The majority were well-established classes of drugs: angiotensin receptor blockers, biguanides, dipeptidyl peptidase 4 (DPP4) inhibitors, fibrates, glucagon-like peptide-1 (GLP-1) agonists, statins, etc. In some cases only a single drug represented its class, for example polyphenols (resveratrol) and cholesterol absorption inhibitors (ezetimibe). More novel agents fell into less well-established mechanistic classes, for example stearoyl-CoA desaturase-1 inhibitors or PPARα/δ agonists. Other agents, particularly those whose mechanism of action is unclear, were assigned a class of their own. For example, although eicosapentaenoic acid and docosahexaenoic acid are both omega-3 polyunsaturated fatty acids (PUFA), their mechanism is not clear, so each was classed individually, and other PUFA mixtures were classed separately. Similarly, berberine and silymarin were classed individually. Where individual bacterial strains were used as probiotics they were classed accordingly (e.g. Lactobacillus sp.), but where a mixture of strains was used a ‘Probiotic (mix)’ category was allocated. For analyses by individual drug, all agents were separated, though for some drugs (e.g. berberine) this was unchanged from their ‘drug class’ grouping.

Prior to analysis, hepatic triglyceride content was normalized as a percentage of placebo (or control) for each cohort.

Weight, fasting glucose, and fasting insulin of interventional groups were expressed relative to placebo as a ratio of interventional to control values (e.g. 20% lower fasting glucose in the interventional group = 0.8).

All continuous variables were examined for normality using histograms and, where distributions were skewed, variables were logarithmically transformed prior to use in regression analyses.
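
The normalisation steps and the skewness check can be sketched as below. The authors inspected histograms visually; the numeric skewness statistic here is a swapped-in, illustrative aid rather than their criterion, and the function names and example values are hypothetical.

```python
import math


def percent_of_control(value_intervention, value_control):
    """Hepatic TG expressed as a percentage of the control (placebo) mean."""
    return 100.0 * value_intervention / value_control


def ratio_to_control(value_intervention, value_control):
    """Weight, fasting glucose or insulin as a ratio (20% lower -> 0.8)."""
    return value_intervention / value_control


def skewness(xs):
    """Population (moment) skewness; values well above 0 suggest right skew."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed covariate (e.g. drug dose) before and after log transform
doses = [1, 1, 2, 2, 3, 10, 40]
log_doses = [math.log(x) for x in doses]
print(round(skewness(doses), 2), round(skewness(log_doses), 2))
```

A log-transformed ratio is also symmetric around zero (a halving and a doubling have equal magnitude), which is one reason ratios such as weight difference are often analysed on the log scale.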

Statistical analysis – meta-analysis

The primary outcome was the mean difference in histological fibrosis stage in the interventional group compared with control/placebo. Secondary outcomes were hepatic triglyceride (TG) content and the histological features steatosis grade, lobular inflammation, ballooning, and overall NAS. There were insufficient data to perform meta-analysis for portal inflammation severity.

Random-effects meta-analysis using the Hartung-Knapp-Sidik-Jonkman method was used to calculate the mean difference in each outcome measure. Each meta-analysis was run three times: once with subgrouping by drug class; then as a sensitivity analysis with subgrouping by drug class after excluding outliers (as described below); and then once using individual drugs. Drug classes, or individual drugs, were only included in meta-analyses where data were available from at least three unique articles reporting that outcome.
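
A minimal pure-Python sketch of the pooling step is shown below. It assumes that ‘Hartung-Knapp-Sidik-Jonkman’ means the Sidik-Jonkman estimator of the between-study variance τ² combined with the Hartung-Knapp adjustment of the confidence interval (a t distribution with k − 1 degrees of freedom); the authors used the meta package in R, and this is not their code. To stay dependency-free, the t quantile is found by bisection on a numerically integrated t density. Inputs are per-cohort mean differences y and their sampling variances v; all names are hypothetical.

```python
import math


def _t_pdf(x, df):
    """Density of Student's t distribution."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_quantile(p, df, steps=2000):
    """Quantile of Student's t for 0.5 < p < 1 (adequate for df >= 2, p <= 0.999)."""
    def cdf(x):  # Simpson's rule on [0, x], plus the lower half of the distribution
        h = x / steps
        s = _t_pdf(0.0, df) + _t_pdf(x, df)
        for i in range(1, steps):
            s += (4 if i % 2 else 2) * _t_pdf(i * h, df)
        return 0.5 + s * h / 3

    lo, hi = 0.0, 100.0
    for _ in range(60):  # bisection
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def hksj_meta(y, v, level=0.95):
    """Random-effects pooled mean difference: Sidik-Jonkman tau^2, Hartung-Knapp CI."""
    k = len(y)
    if k < 2:
        raise ValueError("need at least two studies")
    ybar = sum(y) / k
    tau2_0 = sum((yi - ybar) ** 2 for yi in y) / k  # crude initial estimate
    if tau2_0 > 0:
        w0 = [1 / (vi / tau2_0 + 1) for vi in v]
        mu0 = sum(wi * yi for wi, yi in zip(w0, y)) / sum(w0)
        tau2 = sum(wi * (yi - mu0) ** 2 for wi, yi in zip(w0, y)) / (k - 1)
    else:
        tau2 = 0.0
    w = [1 / (vi + tau2) for vi in v]  # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y)) / ((k - 1) * sum(w))
    half = t_quantile(1 - (1 - level) / 2, k - 1) * math.sqrt(q)
    return mu, mu - half, mu + half, tau2
```

For example, hksj_meta([-0.2, -0.6, -0.4, -0.8], [0.05] * 4) pools four hypothetical cohorts with equal variances to a mean difference of −0.5; because the Hartung-Knapp interval uses a t quantile on k − 1 degrees of freedom, it is noticeably wider than a normal-based interval when only a few studies contribute.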

Drugs or drug classes were considered to have a significant effect on the outcome if their 95% CI did not cross zero. Drugs (or drug classes) were considered to have a greater (or smaller) effect on the outcome measure than the overall estimate if their 95% CI did not overlap with the 95% CI of the overall effect estimate. Additionally, for hepatic TG only, drugs were compared to a benchmark of 30% reduction in liver fat. This was based on data from MRI-PDFF in humans suggesting that a ≥30% reduction in liver fat is associated with a substantial histological response (Jayakumar et al., 2019; Loomba et al., 2020; Stine et al., 2020).
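
The decision rules can be written as a small function. It assumes outcomes are coded so that negative values indicate improvement, and that the 30% benchmark is applied to the point estimate of the hepatic TG difference expressed in percentage points of control; the text does not specify whether the point estimate or the CI was compared, so treat that detail as an assumption. The function name is hypothetical.

```python
def classify_effect(lower, upper, overall_lower, overall_upper,
                    estimate=None, benchmark=None):
    """Apply CI-based decision rules to one drug (or drug class).

    Assumes negative values mean improvement (e.g. lower NAS than control).
    """
    result = {"significant": upper < 0 or lower > 0}  # 95% CI excludes zero
    if upper < overall_lower:
        result["versus_overall"] = "greater reduction"
    elif lower > overall_upper:
        result["versus_overall"] = "smaller reduction"
    else:
        result["versus_overall"] = "overlapping"
    if benchmark is not None and estimate is not None:
        # Hepatic TG only: difference in % of control; -30 means a 30% reduction
        result["meets_benchmark"] = estimate <= benchmark
    return result


# Hypothetical drug class: NAS mean difference 95% CI (-1.0, -0.6), overall CI (-0.5, -0.3)
print(classify_effect(-1.0, -0.6, -0.5, -0.3))
```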

Heterogeneity within drug classes (or individual drugs) and across the whole dataset was reported using Cochran’s Q, Higgins and Thompson’s I², and τ². I² was interpreted according to the Cochrane Handbook, where ‘considerable heterogeneity’ refers to p(Q) < 0.05 and I² = 75–100% (Higgins and Green, 2011). Potential outliers were identified using a Baujat plot (Baujat et al., 2002) and by assessment of standard deviation (SD): all studies with an excess contribution to heterogeneity on visual inspection of the Baujat plot, or with SD >95th centile, were excluded in a sensitivity analysis.
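
The heterogeneity statistics can be sketched as follows, assuming inverse-variance (fixed-effect) weights for Q, which is the usual convention for Cochran’s Q and for the horizontal axis of a Baujat plot. Only that axis (each study’s contribution to Q) is computed; the vertical axis (influence on the pooled estimate) is not shown, and flagging studies ‘on visual inspection’ remains a judgement. The function name is hypothetical.

```python
def heterogeneity(y, v):
    """Cochran's Q, I^2, and each study's contribution to Q (Baujat x-axis)."""
    w = [1 / vi for vi in v]  # inverse-variance weights
    mu_fe = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    contributions = [wi * (yi - mu_fe) ** 2 for wi, yi in zip(w, y)]
    q = sum(contributions)
    df = len(y) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of variation beyond chance
    return q, i2, contributions


# Four hypothetical cohorts; the outer two contribute most to Q
q, i2, contrib = heterogeneity([-0.2, -0.6, -0.4, -0.8], [0.05] * 4)
print(round(q, 2), round(i2, 2), [round(c, 2) for c in contrib])
```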

Study distribution (‘publication’) bias was assessed using funnel plots and Egger’s test. Given evidence of study distribution bias, Duval and Tweedie’s trim-and-fill procedure (Duval and Tweedie, 2000) was performed to estimate the impact of bias on the overall effect estimate.
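
Egger’s regression test can be sketched as an ordinary least-squares fit of each study’s standardised effect (y/SE) on its precision (1/SE); in the absence of small-study effects the intercept should be close to zero. The sketch returns the intercept, its standard error, and the t statistic, which would be compared with a t distribution on k − 2 degrees of freedom. The trim-and-fill step is not reproduced here, and the names are hypothetical; the authors ran these analyses in R.

```python
import math


def egger_test(y, se):
    """Egger's regression: regress y/se on 1/se and test the intercept."""
    k = len(y)
    if k < 3:
        raise ValueError("need at least three studies")
    x = [1 / s for s in se]                 # precision
    z = [yi / s for yi, s in zip(y, se)]    # standardised effect
    x_bar, z_bar = sum(x) / k, sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar
    residuals = [zi - intercept - slope * xi for xi, zi in zip(x, z)]
    s2 = sum(r * r for r in residuals) / (k - 2)
    se_intercept = math.sqrt(s2 * (1 / k + x_bar ** 2 / sxx))
    # A perfect fit gives se_intercept == 0; report an infinite statistic in that case
    t_stat = intercept / se_intercept if se_intercept > 0 else math.inf
    return intercept, se_intercept, t_stat


# Hypothetical studies where smaller (less precise) studies report larger effects
print(egger_test([-0.9, -0.6, -0.5, -0.2], [0.5, 0.3, 0.25, 0.1]))
```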

Statistical analysis – meta-regression

Mixed-effects meta-regression was performed to assess which baseline variables were associated with heterogeneity in each outcome measure. Meta-regression was performed using both categorical variables (e.g. drug class, sex, animal background, NAFLD model design) and continuous variables (e.g. percentage of components in diet, age at intervention, drug dose). For each regression analysis, variables were only included where three or more unique articles reported each variable. The number of cohorts included in each regression analysis is reported with the results. Univariable meta-regressions were considered significant where p<0.05 and the association was replicated in more than one outcome metric (e.g. hepatic TG and steatosis grade).
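
For a single continuous covariate, a mixed-effects meta-regression reduces to weighted least squares with weights 1/(v_i + τ²), where τ² is the residual between-study variance. The sketch below takes τ² as given rather than estimating it (metafor’s rma() estimates it, by REML by default), and uses the (X'WX)⁻¹ standard error for the slope. The covariate and all names are hypothetical.

```python
import math


def meta_regression(y, v, x, tau2):
    """Weighted least squares fit of y on one covariate x, weights 1/(v + tau2).

    tau2 is the residual between-study variance, assumed already estimated.
    Returns intercept, slope, SE of slope, and z = slope / SE.
    """
    w = [1 / (vi + tau2) for vi in v]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1 / sxx)  # from (X'WX)^-1 with known variances
    return intercept, slope, se_slope, slope / se_slope


# Hypothetical: NAS mean difference versus log weight ratio (treated / control)
log_weight_ratio = [math.log(r) for r in [0.80, 0.90, 0.95, 1.00, 1.05]]
print(meta_regression([-1.6, -1.1, -0.8, -0.5, -0.3],
                      [0.10, 0.08, 0.12, 0.09, 0.11], log_weight_ratio, 0.02))
```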

Univariable meta-regression was repeated for weight, glucose, and insulin difference after removal of models causing weight loss. These analyses of weight loss (or gain), with secondary changes in glycaemic control, are most relevant to obese or insulin-resistant animals. We hypothesised that trends would be strengthened after removal of models that did not recapitulate the metabolic syndrome. Models excluded were: methionine-choline deficient diet (with or without added high-fat), orotic acid, choline deficient diet (with or without added high-fat), and choline deficient L-amino-acid defined diet. Models were excluded irrespective of their genetic background; for example, leptin receptor deficiency (db/db) plus methionine-choline deficient diet was excluded from this sensitivity analysis. Because these three variables were each tested twice, statistical significance was set at p<0.025.

Multiple-variable meta-regression was performed to assess what proportion of between-study heterogeneity could be accounted for by baseline characteristics (using adjusted R²). First, variables were examined for multicollinearity, and where two variables had a Pearson correlation >0.6, one was removed. Then, multimodel inference (dmetar::multimodel.inference, RRID:SCR_019054) was used to obtain the model with the best fit for the data. Initially, drug (or drug class) was not included as an input variable, as this greatly increased the number of variables and reduced the number of studies available for inclusion. The optimum model (defined by the lowest Akaike Information Criterion, AIC) was then used in multiple-variable meta-regression (‘final model 1’). The robustness of this model was tested using a permutation test (metafor::permutest, RRID:SCR_003450).
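
The multicollinearity screen can be sketched as a greedy filter. The text does not say which variable of a correlated pair was removed or whether negative correlations counted, so this sketch assumes that |r| > 0.6 triggers removal and that the variable listed first is kept. Complete data for all cohorts are assumed, and the variable names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def prune_collinear(variables, threshold=0.6):
    """Greedily keep variables whose |r| with every kept variable is <= threshold.

    `variables` maps a name to its values across the same cohorts (no missing data);
    earlier entries win when a pair is too highly correlated.
    """
    kept = []
    for name, values in variables.items():
        if all(abs(pearson(values, variables[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


candidates = {  # hypothetical covariates for five cohorts
    "fat_pct_kcal": [10, 20, 30, 40, 60],
    "energy_density": [1, 2, 3, 4, 6],
    "age_weeks": [8, 6, 10, 5, 9],
}
print(prune_collinear(candidates))
```

Because the result depends on the order of the inputs, listing the more interpretable variable of each expected pair first is one reasonable convention.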

This process was repeated to generate ‘final model 2’, additionally including individual drugs (for TG) or drug class (for steatosis grade and NAS) as input variables in the multimodel inference stage. It was not possible to generate a second multivariable meta-regression model including drug (or drug class) for lobular inflammation, ballooning, and fibrosis due to insufficient data.

For multivariable meta-regression, individual variables were defined as ‘Top predictors’ if they had a predictor importance >0.8 on dmetar::multimodel.inference analysis. Individual variables were considered significant within each model where p<0.05. Models were considered to significantly predict outcomes where the permutation-test p-value (metafor::permutest) was <0.05.
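
Predictor importance in multimodel inference is conventionally the sum of the Akaike weights of all candidate models that contain the predictor, where each model’s weight is proportional to exp(−ΔAIC/2). We assume dmetar::multimodel.inference follows this convention; the sketch takes precomputed information-criterion values for hypothetical candidate models rather than fitting them, and applies the 0.8 ‘Top predictor’ threshold described above.

```python
import math


def predictor_importance(models):
    """Sum of Akaike weights over candidate models containing each predictor.

    `models` is a list of (set_of_predictor_names, information_criterion) pairs.
    """
    best = min(ic for _, ic in models)
    raw = [math.exp(-(ic - best) / 2) for _, ic in models]
    total = sum(raw)
    weights = [r / total for r in raw]
    predictors = set().union(*(preds for preds, _ in models))
    return {
        p: sum(w for (preds, _), w in zip(models, weights) if p in preds)
        for p in predictors
    }


# Hypothetical candidate models with precomputed AIC values
candidate_models = [
    ({"weight_ratio"}, 100.0),
    ({"weight_ratio", "age_weeks"}, 101.0),
    ({"age_weeks"}, 110.0),
    (set(), 112.0),
]
importance = predictor_importance(candidate_models)
top_predictors = sorted(p for p, imp in importance.items() if imp > 0.8)
print(top_predictors)
```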

Statistical analysis was performed using R 4.0.2 for Mac (Harrer et al., 2019; R Core Development team, 2019) with packages dmetar (Harrer et al., 2019), meta (RRID:SCR_019055, [Schwarzer G, 2007]), and metafor (Viechtbauer, 2010). Graphs were also generated using GraphPad Prism (RRID:SCR_002798, v8.0 for Mac, GraphPad Software, La Jolla, California, USA).

Data availability

The raw dataset used for analysis, including references to individual studies, is available in Figure 1-source data 1 and deposited in the Dryad repository at https://doi.org/10.5061/dryad.pzgmsbcgc. The R code used for analysis is available in Source code 1. Source data files have been provided for Figures 2-8.

The following data sets were generated
    1. Mann JP
    (2020) Dryad Digital Repository
    Data from: Weight loss, insulin resistance, and study design confound results in a meta-analysis of animal models of fatty liver.
    https://doi.org/10.5061/dryad.pzgmsbcgc

References

  1. Harrer M, Cuijpers P, Furukawa TA, Ebert DD (2019). Doing Meta-Analysis in R: A Hands-on Guide. PROTECT Lab Erlangen. (Report)
  2. Higgins JPT, Green S (2011). Cochrane Handbook for Systematic Reviews of Interventions. The Cochrane Collaboration. (Book)
  3. R Core Development team (2019). A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. (Software)

Decision letter

  1. Joel K Elmquist
    Reviewing Editor; University of Texas Southwestern Medical Center, United States
  2. Eduardo Franco
    Senior Editor; McGill University, Canada
  3. Sarah McCann
    Reviewer; The Berlin Institute of Health, Germany

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

Acceptance summary:

Your systematic analysis of animal studies of fatty liver disease as they compare to human disease is timely. In particular, your studies will be of interest to colleagues in the pharmaceutical industry who are working to develop treatments for the growing problem of NAFLD.

Decision letter after peer review:

Thank you for submitting your article "Multiple drug classes show similar treatment effect sizes in animal models of fatty liver disease" for consideration by eLife. Your article has been reviewed by two peer reviewers, and the evaluation has been overseen by a Reviewing Editor and Eduardo Franco as the Senior Editor. The following individual involved in review of your submission has agreed to reveal their identity: Sarah McCann (Reviewer #2).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

As the editors have judged that your manuscript is of interest but, as described below, that additional experiments are required before it is published, we would like to draw your attention to changes in our revision policy that we have made in response to COVID-19 (https://elifesciences.org/articles/57162). First, because many researchers have temporarily lost access to their labs, we will give authors as much time as they need to submit revised manuscripts. We are also offering, if you choose, to post the manuscript to bioRxiv (if it is not already there) along with this decision letter and a formal designation that the manuscript is "in revision at eLife". Please let us know if you would like to pursue this option. (If your work is more suitable for medRxiv, you will need to post the preprint yourself, as the mechanisms for us to do so are still in development.)

Summary:

This study provides a comprehensive overview of the field of preclinical NAFLD research. These reviews are important for summarizing and evaluating current evidence and identifying gaps and areas for improvement for future research. It is clearly structured and follows widely accepted methods. The lack of a pre-specified statistical analysis plan and deviations from the protocol represent possible risks of bias in the review and the search is out of date. I'm unsure whether the relatively strong emphasis on the weak association between drug class and treatment effect is justified. Importantly, the data are discussed in relation to current issues in preclinical research around reproducibility and translation.

Essential revisions:

A very major concern is the overly broad inclusion and grouping criteria used to structure this analysis. They seem too broad for a biologically meaningful and productive meta-analysis. NAFLD is a heterogeneous mix of states (steatosis, steatohepatitis, cirrhosis). Liver triglyceride per se is not a robust measure of NAFLD. A reduction in liver triglyceride alone would likely not be sufficient for regulatory approval of a NAFLD drug. Were the studies that were pooled looking at liver triglyceride as their primary metric, or were they a mix, including other end points more relevant to NAFLD (inflammation, fibrosis)?

How were the drug classes chosen? There seem to be four types of classes, which are not equally valid groupings: 1) some drug classes include only a single compound (e.g. Vit E, curcumin, …); 2) others are groups with a common primary mechanism (e.g. DPP4 inhibitor, SGLT2 inhibitor, FXR agonist, etc.) but include different compounds, so off-target activities presumably vary; 3) others are poorly defined entities whose constituents one could argue should each be treated as separate single compounds (e.g. probiotics, protoberberine alkaloids, polyphenols); and 4) well-defined classes which can have individual members with very different biological activities (e.g. bile acids), again suggesting that more subgroups might be needed for biological sense.

A major determinant of liver triglyceride is body triglyceride. Why was body weight not included in the analysis? I do not think using % overcomes the massive differences in liver triglyceride due to massive differences in body weight.

'Model' includes both genetic models and diet models, finding only 10 categories. This seems like an overly small number of categories. There are many kinds of high-fat diets that have very different compositions and that give very different results.

The analyte in this study is % loss of liver triglyceride. How robust a metric is this for a meta-analysis? Liver triglyceride correlates with total body triglyceride and body weight; loss of enough body weight will cause an experiment to be stopped for humane reasons. So the % liver triglyceride would seem to me to be a potentially truncated/limited/bounded metric. Is this affecting the variation that is needed/used for the meta-analysis?

It is great to see that there was a published study protocol; however, there was no statistical analysis plan specified. There are also several deviations from the protocol that have not been addressed, e.g. lack of secondary outcomes, not addressing key research questions as planned, analysis of subgroups. The search is 2.5 years old and, while there are over 200 included studies representing a substantial body of work, the authors note this is a highly active field and thus it could be a substantial limitation if the data are out of date.

It is reported that there is weak evidence of difference between drug classes and p values are given (p=0.014 subgroup analysis; p=0.002 meta-reg) but as there is no pre-specified statistical analysis plan and no description in the Materials and methods, we don't know what the authors consider to be significant effects. The authors conclude quite strongly that there is limited difference between classes and describe overlapping CIs but e.g. the mean effect of the most effective class is ~4 times higher than the least effective, with no overlapping CIs. It would be useful to know when interpreting these results what a clinically relevant treatment effect would be. The wide CIs limit strong conclusions but as the authors note, the number of animals in each group is low and some classes have few comparisons.

I found the description that "The confidence intervals of 20 out of 21 drug classes overlapped" slightly confusing (Figure 2), as the thiazolidinedione CIs do not overlap with those of at least 3 other classes but do overlap with many others.

There is also substantial within-class heterogeneity for some drug classes. Could there be differences in efficacy between individual drugs not captured at the class level? How many individual drugs are represented by each class? (Post hoc) subgroup analysis of classes with sufficient comparisons might provide more nuanced information on treatment response and factors affecting efficacy.

While it is dangerous to over-interpret results, I would probably recommend softening the argument of limited differences slightly.

Does it make sense to look at drug dose as a potential source of heterogeneity across different drugs and classes? We would most likely expect individual drugs to have different effective doses.

The multivariable regression model explains ~50% of the heterogeneity in a subset of the studies but is not discussed anywhere in the manuscript (10 variables/71 studies?) Is there something about these studies that can provide information on the broader dataset or is this analysis of limited value?

As only drugs/classes that have been used in clinical trials have been included, it would be interesting to note any similarities/differences in results, e.g. GLP-1 agonists demonstrate a robust effect in the current data and the authors report that they demonstrated potential efficacy in clinical trials. Is there anything we can learn from examples of potentially successful translation?

https://doi.org/10.7554/eLife.56573.sa1

Author response

Essential revisions:

A very major concern is the overly broad inclusion and grouping criteria used to structure this analysis. They seem too broad for a biologically meaningful and productive meta-analysis. NAFLD is a heterogeneous mix of states (steatosis, steatohepatitis, cirrhosis). Liver triglyceride per se is not a robust measure of NAFLD. A reduction in liver triglyceride alone would likely not be sufficient for regulatory approval of a NAFLD drug. Were the studies that were pooled looking at liver triglyceride as their primary metric, or were they a mix, including other end points more relevant to NAFLD (inflammation, fibrosis)?

Thank you for this suggestion. We agree that liver triglyceride content, particularly in isolation, is not the optimum indicator of NAFLD severity. Therefore we have now added meta-analyses for histological outcomes: steatosis, lobular inflammation, ballooning, overall NAFLD Activity Score, and fibrosis stage. (We also collected data on portal inflammation severity but there was insufficient data for meta-analysis.) We consider this now to appropriately reflect the spectrum of disease in NAFLD.

The included studies rarely identified a primary outcome and they were difficult to categorise into, for example, those focusing on a ‘liver phenotype’ versus those focusing on the systemic ‘metabolic syndrome’. Therefore we believe that reporting both hepatic TG content as well as histology is the least biased method.

Reassuringly, we observed similar trends for multiple outcomes. We have focused the main conclusions of our study on those that were consistent across several meta-analyses and meta-regressions.

Though histological outcomes are more directly comparable to human data, we have retained hepatic TG as an outcome because it is the most widely reported measure.

How were the drug classes chosen? There seem to be four types of classes, which are not equally valid groupings: 1) some drug classes include only a single compound (e.g. Vit E, curcumin, …); 2) others are groups with a common primary mechanism (e.g. DPP4 inhibitor, SGLT2 inhibitor, FXR agonist, etc.) but include different compounds, so off-target activities presumably vary; 3) others are poorly defined entities whose constituents one could argue should each be treated as separate single compounds (e.g. probiotics, protoberberine alkaloids, polyphenols); and 4) well-defined classes which can have individual members with very different biological activities (e.g. bile acids), again suggesting that more subgroups might be needed for biological sense.

We appreciate that the grouping of drugs into classes is a simplification and it was challenging to be consistent as not all the drugs neatly fit into pharmacological classes. We had originally aimed to group drugs based on their mechanism of action. We have now refined the grouping, being more stringent to separate drugs where possible and where their mechanism is less well established. We have added detail to the Materials and methods to describe the process of grouping drugs into classes.

We have also run each meta-analysis with sub-grouping by individual drugs and see consistent trends for the principal findings.

We appreciate that there are some drug groups that remain broad, in particular “Probiotics (mix)” and “Omega-3 polyunsaturated fatty acids (mix)”. Where possible we have separated groups within them, specifically Lactobacillus / Bifidobacterium and eicosapentaenoic acid / docosahexaenoic acid. However we have retained these broader groups to reduce the number of studies being excluded.

We have also added a section to the limitations in the Discussion to acknowledge that this is a simplification and that the grouping is variably imprecise. We have also specifically highlighted that conclusions from less well-defined classes may need to be interpreted with caution.

A major determinant of liver triglyceride is body triglyceride. Why was body weight not included in the analysis? I do not think using % overcomes the massive differences in liver triglyceride due to massive differences in body weight.

Thank you for this suggestion and we apologise for not including this initially. Given that body weight often influences liver triglyceride via insulin resistance, we have now performed a series of analyses to address this. We felt it was important to include glucose (and insulin) because some models (particularly lipodystrophic models) may develop marked hepatic steatosis without becoming obese, while still being insulin resistant.

We have collected data for, and calculated, the difference in body weight, fasting glucose, and fasting insulin for all included studies, then included these as variables in meta-regression. We have found highly consistent associations across multiple outcomes where weight loss and lower insulin/glucose are associated with greater treatment response.
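The differences in body weight, fasting glucose and fasting insulin described above are between-group (intervention minus placebo) summaries. A minimal Python sketch of one such summary, the raw mean difference with its standard error, is shown here. It assumes each cohort reports a mean, SD and sample size per arm; the `GroupSummary` type, the `mean_difference` function and the example values are hypothetical and are not taken from the dataset or the authors' R code.

```python
import math
from dataclasses import dataclass


@dataclass
class GroupSummary:
    """Reported summary statistics for one study arm (hypothetical structure)."""
    mean: float
    sd: float
    n: int


def mean_difference(treated, control):
    """Return (treated-minus-control mean difference, its standard error).

    The SE uses the unpooled formula for two independent groups:
    sqrt(sd_t**2 / n_t + sd_c**2 / n_c).
    """
    md = treated.mean - control.mean
    se = math.sqrt(treated.sd ** 2 / treated.n + control.sd ** 2 / control.n)
    return md, se


# Hypothetical body weights (g) for one cohort: drug-treated vs placebo
md, se = mean_difference(GroupSummary(38.0, 3.0, 9), GroupSummary(42.0, 4.0, 8))
print(f"weight difference = {md:.1f} g (SE {se:.2f})")
```

A negative value means the treated animals weighed less than controls; the same function applies unchanged to fasting glucose or insulin.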

We noted that many drugs were associated with significant differences in weight, glucose, or insulin in animals, even though they do not cause such responses in humans (e.g. vitamin D or statins). Our interpretation is that the weight loss and change in insulin sensitivity seen in animals may be confounding results and contributing to the lack of translation from animals to humans.

We do appreciate that fasting insulin and glucose are technically not direct measures of insulin resistance; however, they are the most widely reported metrics. We have added a section to the limitations to acknowledge that other data (e.g. hyperinsulinaemic-euglycaemic clamps) would be more accurate.

'Model' includes both genetic models and diet models, finding only 10 categories. This seems like an overly small number of categories. There are many kinds of high-fat diets that have very different compositions and that give very different results.

We realise that this is one of several simplifications we have made in an attempt to synthesise the data. We performed a categorical grouping of models as a ‘high-level’ descriptor. This identified 137 different ‘core’ model categories. They are found in Column E (“Model_simple”) of Figure 1—source data 1 (i.e. the raw data used in the meta-analysis). These are highly variable, for example: “SREBP-1c transgenic overexpression + High-fat and high-fructose diet”, “ApoE knockout (ApoE -/-) + high-fat, high-cholesterol diet + T0901317 (LXR agonist)”, and “Pemt-/- + high-fat diet”.

In order for these Models to be included as variables in the meta-regression we required a minimum of 3 unique studies to use them. Due to variation and the high number of studies that used a few common models (e.g. “high-fat diet”, “Leptin deficiency (ob/ob)”), comparatively few different models were included in the meta-regression. We appreciate that our analysis will not have captured the full variation of Model and we have now addressed this as a limitation in the Discussion.
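The inclusion rule for model categories (a minimum of 3 unique studies per category) can be sketched as follows. This is a minimal Python illustration, assuming each cohort is recorded as a (study ID, model category) pair. The function name and example records are hypothetical, although the category labels echo those quoted above. Because the sketch counts distinct study IDs rather than cohorts, several cohorts from one paper count only once towards the threshold.

```python
from collections import defaultdict


def eligible_categories(cohorts, min_studies=3):
    """Return the model categories used by at least `min_studies` distinct studies.

    `cohorts` is an iterable of (study_id, model_category) pairs; repeated
    cohorts from the same study are counted once per category.
    """
    studies_per_model = defaultdict(set)
    for study_id, model in cohorts:
        studies_per_model[model].add(study_id)
    return {m for m, studies in studies_per_model.items() if len(studies) >= min_studies}


# Hypothetical cohort records (study S1 contributes two high-fat diet cohorts)
cohorts = [
    ("S1", "High-fat diet"), ("S1", "High-fat diet"), ("S2", "High-fat diet"),
    ("S3", "High-fat diet"), ("S4", "Leptin deficiency (ob/ob)"),
    ("S5", "Leptin deficiency (ob/ob)"), ("S6", "Pemt-/- + high-fat diet"),
]
print(sorted(eligible_categories(cohorts)))
```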

In addition, we have now added data on other dietary constituents (cholesterol, sucrose, and fructose/glucose), which have been included as variables in the meta-regression. Again, we appreciate that this is simplifying the complexity of dietary models however we now use these data to draw general conclusions that diet can affect treatment response. We have specifically commented that analyses of fat %kcal in isolation may be misleading because lower fat %kcal is more frequently coupled with added cholesterol (and/or fructose) than 60% kcal fat diets.

Whilst updating our analyses, we noted that many continuous variables used in the meta-regression were heavily skewed. We have therefore log-transformed them prior to analysis.
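The effect of a log transformation on a skewed continuous moderator can be illustrated as follows. This Python sketch computes a simple moment-based skewness before and after a natural-log transform. The variable (treatment duration in weeks) and its values are hypothetical. The sketch only handles strictly positive variables, because how variables that can be zero or negative were handled is not described in this section.

```python
import math
import statistics


def sample_skewness(values):
    """Moment-based skewness (g1): third central moment / second moment ** 1.5."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def log_transform(values):
    """Natural-log transform; raises if any value is not strictly positive."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(v) for v in values]


# Hypothetical, right-skewed treatment durations (weeks)
durations_weeks = [4, 6, 8, 8, 10, 12, 16, 24, 52]
print(round(sample_skewness(durations_weeks), 2),
      round(sample_skewness(log_transform(durations_weeks)), 2))
```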

The analyte in this study is % loss of liver triglyceride. How robust a metric is this for a meta-analysis? Liver triglyceride correlates with total body triglyceride and body weight; loss of enough body weight will cause an experiment to be stopped for humane reasons. So the % liver triglyceride would seem to me to be a potentially truncated/limited/bounded metric. Is this affecting the variation that is needed/used for the meta-analysis?

Thank you for highlighting this point. To our knowledge, the most systematic previous analysis of liver fat in animals was from the Hybrid Mouse Diversity Panel (Hui et al., 2015) who found >30 fold variation in hepatic TG across strains.

It is difficult to know whether studies were stopped early due to animal health (including weight loss). We have now included weight difference between placebo and intervention as a variable, which demonstrated a strong association with hepatic TG.

As described above, we have now added multiple histological outcome measures, therefore we have increased confidence in our findings given that we are no longer basing our conclusions on a single metric.

Whilst we did not formally record this, we can anecdotally report that fewer than five of all included studies reported specifics of harm to animals or protocol changes due to animal welfare.

We also note that the range of percentage change in hepatic TG is similar to that reported in another pre-clinical meta-regression analysis in a different field (Watzlawick et al., 2019). Therefore we believe that the dynamic range is appropriate for the analyses performed, especially in combination with the other metrics used.

It is great to see that there was a published study protocol; however, there was no statistical analysis plan specified. There are also several deviations from the protocol that have not been addressed, e.g. lack of secondary outcomes, not addressing key research questions as planned, analysis of subgroups. The search is 2.5 years old and, while there are over 200 included studies representing a substantial body of work, the authors note this is a highly active field and thus it could be a substantial limitation if the data are out of date.

We appreciate that the lack of a pre-specified statistical analysis plan is a potential risk of bias. We believe that through sharing our full raw data, code for analysis, the results of all analyses conducted (i.e. not selective reporting) we have minimised the bias introduced from this. We have also specifically commented on it in the Discussion to highlight it as a limitation.

In addition, we have improved the methodology we have used for multivariable meta-analysis to use an approach less open to bias, which considers all variables for importance. We now present results from multi-model inference to determine the top model, which is then subjected to a multiple permutation test to determine the fit of the final models.
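The permutation idea can be sketched for the simplest case of a single moderator. This is a simplified Python illustration, not the multimodel inference or permutation procedure the authors ran in R. It uses fixed inverse-variance weights (ignoring between-study variance), a single hypothetical moderator (weight difference) and made-up effect sizes. Shuffling the moderator across cohorts breaks any real association, so the share of shuffled slopes at least as extreme as the observed one estimates a p-value.

```python
import random


def weighted_slope(x, y, w):
    """Slope of a weighted least-squares fit of y on x (weights w, e.g. 1/variance)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def permutation_p_value(x, y, w, n_perm=2000, seed=1):
    """Two-sided permutation p-value for the weighted slope.

    Moderator values are shuffled across cohorts while each cohort's effect
    size and weight stay together.
    """
    rng = random.Random(seed)
    observed = abs(weighted_slope(x, y, w))
    x_perm = list(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(x_perm)
        if abs(weighted_slope(x_perm, y, w)) >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero
    return (extreme + 1) / (n_perm + 1)


# Hypothetical cohorts: weight difference (g) vs % change in hepatic TG
x = [-8.0, -6.0, -5.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [-55.0, -48.0, -40.0, -35.0, -20.0, -25.0, -10.0, -5.0, -8.0, 2.0]
w = [2.0, 1.0, 3.0, 1.0, 2.0, 1.0, 2.0, 3.0, 1.0, 2.0]
slope = weighted_slope(x, y, w)
p = permutation_p_value(x, y, w)
print(f"slope = {slope:.2f} % per g, permutation p = {p:.4f}")
```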

With this revised version of the study we have now been able to address several of our secondary outcomes regarding liver histology and insulin resistance/weight.

In the future we hope to build on this initial database to further study our secondary outcomes with a more qualitative approach, but we would consider that to be a separate project and anticipate that it would take substantial time. We trust that the revisions to the current study have made it a more complete analysis addressing one research question.

We have now updated our search to January 2019, which required screening a further 1118 articles (991 for full-text review). This, combined with the addition of studies reporting histology, has increased the number of included studies from 244 (414 cohorts) to 414 (603 cohorts). We appreciate that the search remains out of date; however, our interpretation is that we have now accumulated sufficient evidence to support the general conclusions we have reached, particularly given that we are focusing on observations that were replicated across multiple outcome measures.

It is reported that there is weak evidence of difference between drug classes and p values are given (p=0.014 subgroup analysis; p=0.002 meta-reg) but as there is no pre-specified statistical analysis plan and no description in the Materials and methods, we don't know what the authors consider to be significant effects. The authors conclude quite strongly that there is limited difference between classes and describe overlapping CIs but e.g. the mean effect of the most effective class is ~4 times higher than the least effective, with no overlapping CIs. It would be useful to know when interpreting these results what a clinically relevant treatment effect would be. The wide CIs limit strong conclusions but as the authors note, the number of animals in each group is low and some classes have few comparisons.

We acknowledge that it was not helpful to make non-specific statements about the magnitude of effects, such as ‘weak’ or ‘large’ without pre-specifying criteria. We have added to our Materials and methods to detail how all analyses were assessed for significance.

We have also tempered our discussion regarding the similarity of drug classes. We have commented where confidence intervals have overlapped with the 95% CI for the overall estimate and stated which drugs had deviated from the others based on their 95% CI.
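Comparing each class's 95% CI with that of the overall estimate can be sketched with a standard random-effects pooling. This Python sketch assumes the DerSimonian-Laird estimator of between-study variance and a normal-approximation 95% CI. The authors' R analysis may use a different estimator (for example REML), and all effect sizes and variances below are invented for illustration.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate with the DerSimonian-Laird tau^2 estimator.

    Returns (pooled effect, lower 95% CI, upper 95% CI, tau^2).
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, tau2


def intervals_overlap(a, b):
    """True if two (lower, upper) intervals share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical % changes in hepatic TG and their variances
overall = dersimonian_laird([-40.0, -35.0, -50.0, -20.0, -30.0, -45.0, -25.0, -60.0],
                            [25.0, 36.0, 49.0, 16.0, 25.0, 36.0, 16.0, 64.0])
class_a = dersimonian_laird([-50.0, -45.0, -60.0], [49.0, 36.0, 64.0])
print("overall:", [round(v, 1) for v in overall[:3]])
print("class A:", [round(v, 1) for v in class_a[:3]])
print("CIs overlap:", intervals_overlap(overall[1:3], class_a[1:3]))
```

CI overlap is an informal screen; it is not equivalent to a formal test of subgroup differences.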

There is a recently established benchmark for change in hepatic TG: a reduction of ≥30% on MRI-measured liver fat is linked to a significant improvement in histology. We have now included this comparison in our manuscript (Jayakumar et al., 2019; Loomba et al., 2020; Stine et al., 2020).
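A minimal sketch of the ≥30% benchmark comparison is shown below. It assumes the relative change is computed between group means (treated versus control), as for hepatic TG in animal studies. The human benchmark refers to MRI-measured liver fat, so applying the same threshold to rodent biochemical TG is an analogy rather than an equivalence. The function names and values are hypothetical.

```python
def percent_change(treated_mean, control_mean):
    """Relative change (%) of the treated group versus the control (placebo) group."""
    if control_mean == 0:
        raise ValueError("control mean must be non-zero")
    return 100.0 * (treated_mean - control_mean) / control_mean


def meets_benchmark(treated_mean, control_mean, threshold_pct=30.0):
    """True if the treated mean is at least `threshold_pct` percent below control."""
    return percent_change(treated_mean, control_mean) <= -threshold_pct


# Hypothetical hepatic TG (mg/g liver) for two cohorts sharing one control value
print(percent_change(78.0, 120.0), meets_benchmark(78.0, 120.0))
print(percent_change(96.0, 120.0), meets_benchmark(96.0, 120.0))
```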

We have also added a table (Supplementary file 1) to provide a narrative summary of the evidence for each drug class studied, with references to the key published RCTs.

I found the description that "The confidence intervals of 20 out of 21 drug classes overlapped" slightly confusing (Figure 2), as the thiazolidinedione CIs do not overlap with those of at least 3 other classes but do overlap with many others.

We apologise for not being clear. As described above, we have removed non-specific statements about overlapping drug classes and have provided further clarity on the criteria against which each analysis was assessed.

There is also substantial within-class heterogeneity for some drug classes. Could there be differences in efficacy between individual drugs not captured at the class level? How many individual drugs are represented by each class? (Post hoc) subgroup analysis of classes with sufficient comparisons might provide more nuanced information on treatment response and factors affecting efficacy.

Thank you for this suggestion. We have now performed a sub-analysis by individual drugs for each outcome metric.

We found that considerable heterogeneity remains even when analysing by individual drug, a trend which was replicated for multiple outcomes.

There is variation in effect size within drug classes where more than one drug contributed (e.g. statins); however, it is difficult to draw firm conclusions about the superiority of individual drugs within classes because their confidence intervals overlap. We have provided the full results for all analyses to allow readers to examine results for individual drugs that may be of interest.

While it is dangerous to over-interpret results, I would probably recommend softening the argument of limited differences slightly.

We acknowledge that our original phrasing was too bold. We have edited the language and our conclusions significantly in light of our new results.

Does it make sense to look at drug dose as a potential source of heterogeneity across different drugs and classes? We would most likely expect individual drugs to have different effective doses.

This is an important suggestion; we had also expected to observe an effect of drug dose, but we did not find one in our original submission. We have checked and re-analysed the data to ensure we were not missing an effect, excluding any studies that used unusual drug doses or units. We then scaled all drug doses within each individual drug to maximise power for meta-regression. With this methodology we still did not observe any effect using our standard minimum of 3 studies per drug for inclusion. We also re-ran the analysis requiring at least 10 studies per drug and still did not observe any effect.
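Scaling doses within each drug puts drugs with very different effective dose ranges onto a common scale before meta-regression. The exact scaling is not specified here, so this Python sketch assumes a z-score within each drug; dividing by each drug's median dose would be an equally plausible reading. It also assumes doses have already been converted to a common unit such as mg/kg/day. The drug names are real compounds, but the dose values are hypothetical.

```python
import statistics
from collections import defaultdict


def scale_doses_within_drug(records):
    """Z-score doses separately for each drug.

    `records` is a list of (drug, dose) pairs in a common unit. Returns the
    scaled doses in the same order. A drug with a single dose, or with
    identical doses, gets 0.0 because its SD is undefined or zero.
    """
    doses_by_drug = defaultdict(list)
    for drug, dose in records:
        doses_by_drug[drug].append(dose)
    stats = {}
    for drug, doses in doses_by_drug.items():
        sd = statistics.stdev(doses) if len(doses) > 1 else 0.0
        stats[drug] = (statistics.fmean(doses), sd)
    scaled = []
    for drug, dose in records:
        mean, sd = stats[drug]
        scaled.append((dose - mean) / sd if sd > 0 else 0.0)
    return scaled


# Hypothetical doses (mg/kg/day)
records = [("fenofibrate", 30.0), ("fenofibrate", 100.0), ("fenofibrate", 50.0),
           ("atorvastatin", 10.0), ("atorvastatin", 30.0)]
scaled = scale_doses_within_drug(records)
print([round(s, 2) for s in scaled])
```

The scaled dose would then enter the meta-regression as a single continuous moderator pooled across drugs.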

For example, this bubble plot shows scaled drug dose (x-axis) against mean difference in hepatic TG (y-axis) for fenofibrate, the drug with the largest treatment effect size and a suitable number of data points for meta-regression:

Author response image 1

We interpret this observation as a further explanation for the challenge of translation from animals to humans and have added a comment regarding this in the Discussion.

The multivariable regression model explains ~50% of the heterogeneity in a subset of the studies but is not discussed anywhere in the manuscript (10 variables/71 studies?) Is there something about these studies that can provide information on the broader dataset or is this analysis of limited value?

Apologies for not having discussed these results thoroughly. Following the updates we have described above, we have more robust multivariable meta-regression results and improved performance of models. Several models (particularly for NAS and Steatosis grade) account for >60% of variation, without including drug as a variable, and remain robust after a permutation test.
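One common way to express the proportion of between-study variation accounted for by a meta-regression model compares the estimated residual heterogeneity with and without moderators. It is assumed here for illustration, since the exact definition used is not stated in this section:

```latex
R^2 = \max\!\left(0,\; \frac{\hat{\tau}^2_{0} - \hat{\tau}^2_{M}}{\hat{\tau}^2_{0}}\right)
```

Here \(\hat{\tau}^2_{0}\) is the between-study variance from a random-effects model with no moderators, and \(\hat{\tau}^2_{M}\) is the residual between-study variance after adding the moderators of model \(M\). Under this definition, a model accounting for >60% of variation reduces the estimated heterogeneity by more than 60%.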

Whilst we could not replicate the same model accuracy across all outcomes, we have used multimodel inference to provide additional evidence for the importance of individual variables (e.g. fat %kcal, glucose difference). We now conclude these analyses illustrate how the combination of study design and weight loss / insulin resistance are central to determining treatment response in NAFLD.

As only drugs/classes that have been used in clinical trials have been included, it would be interesting to note any similarities/differences in results, e.g. GLP-1 agonists demonstrate a robust effect in the current data and the authors report that they demonstrated potential efficacy in clinical trials. Is there anything we can learn from examples of potentially successful translation?

This is a key question and we have now attempted to address it in more detail in the Discussion. The drugs that show good efficacy in humans (thiazolidinediones, GLP-1 agonists) do not stand out at the top of any outcome measure. Conversely, fibrates are consistently the drug class with the most efficacy in animals but do not show efficacy in humans.

Unfortunately, our interpretation of these findings is that there is no clearly discernible pattern.

https://doi.org/10.7554/eLife.56573.sa2

Article and author information

Author details

  1. Harriet Hunter

    School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Data curation, Formal analysis, Investigation, Writing - original draft, Writing - review and editing
    Contributed equally with
    Dana de Gracia Hahn, Amedine Duret and Yu Ri Im
    Competing interests
    No competing interests declared
  2. Dana de Gracia Hahn

    School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Data curation, Formal analysis, Investigation, Writing - original draft, Writing - review and editing
    Contributed equally with
    Harriet Hunter, Amedine Duret and Yu Ri Im
    Competing interests
    No competing interests declared
  3. Amedine Duret

    School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Data curation, Formal analysis, Methodology, Writing - original draft, Writing - review and editing
    Contributed equally with
    Harriet Hunter, Dana de Gracia Hahn and Yu Ri Im
    Competing interests
    No competing interests declared
  4. Yu Ri Im

    School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Data curation, Formal analysis, Investigation, Writing - original draft, Writing - review and editing
    Contributed equally with
    Harriet Hunter, Dana de Gracia Hahn and Amedine Duret
    Competing interests
    No competing interests declared
  5. Qinrong Cheah

    School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Data curation, Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
  6. Jiawen Dong

    School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Data curation, Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
  7. Madison Fairey

    School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Data curation, Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
  8. Clarissa Hjalmarsson

    School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Data curation, Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
  9. Alice Li

    School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Data curation, Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
  10. Hong Kai Lim

    School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Data curation, Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-7266-7790
  11. Lorcan McKeown

    School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Data curation, Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
  12. Claudia-Gabriela Mitrofan

    School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Data curation, Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
  13. Raunak Rao

    School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Data curation, Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-6954-575X
  14. Mrudula Utukuri

    School of Clinical Medicine, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Data curation, Investigation, Writing - review and editing
    Competing interests
    No competing interests declared
    ORCID: 0000-0003-1510-469X
  15. Ian A Rowe

    Leeds Institute for Medical Research & Leeds Institute for Data Analytics, University of Leeds, Leeds, United Kingdom
    Contribution
    Investigation, Methodology, Writing - review and editing
    Competing interests
    No competing interests declared
  16. Jake P Mann

    Institute of Metabolic Science, University of Cambridge, Cambridge, United Kingdom
    Contribution
    Conceptualization, Data curation, Formal analysis, Supervision, Funding acquisition, Investigation, Methodology, Writing - original draft, Project administration, Writing - review and editing
    For correspondence
    jm2032@cam.ac.uk
    Competing interests
    No competing interests declared
    ORCID: 0000-0002-4711-9215

Funding

Wellcome Trust (216329/Z/19/Z)

  • Jake P Mann

European Society for Paediatric Research (Young Investigator Start-Up Grant)

  • Jake P Mann

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Senior Editor

  1. Eduardo Franco, McGill University, Canada

Reviewing Editor

  1. Joel K Elmquist, University of Texas Southwestern Medical Center, United States

Reviewer

  1. Sarah McCann, The Berlin Institute of Health, Germany

Version history

  1. Received: March 3, 2020
  2. Accepted: October 15, 2020
  3. Accepted Manuscript published: October 16, 2020 (version 1)
  4. Version of Record published: November 6, 2020 (version 2)

Copyright

© 2020, Hunter et al.

This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.



Cite this article

Harriet Hunter, Dana de Gracia Hahn, Amedine Duret, Yu Ri Im, Qinrong Cheah, Jiawen Dong, Madison Fairey, Clarissa Hjalmarsson, Alice Li, Hong Kai Lim, Lorcan McKeown, Claudia-Gabriela Mitrofan, Raunak Rao, Mrudula Utukuri, Ian A Rowe, Jake P Mann (2020). Weight loss, insulin resistance, and study design confound results in a meta-analysis of animal models of fatty liver. eLife 9:e56573. https://doi.org/10.7554/eLife.56573
