Cook County Health and Hospital System, IL, USA; Midwestern University, IL, USA; Jinnah Sindh Medical University, Rafiqi H J Rd, Karachi Cantonment Karachi, Pakistan
aDepartment of Medicine, Cook County Health and Hospital System, IL, USA (Rohit Agrawal, Muhammad Majeed, Yazan Abu Omar, Mbachi, Chimezi); bCollege of Medicine, Midwestern University, IL, USA (Palak Patel); cJinnah Sindh Medical University, Rafiqi H J Rd, Karachi Cantonment Karachi, Pakistan (Shaheera Kamal); dDivision of Gastroenterology and Hepatology, Department of Medicine, Cook County Health and Hospital System, County, Chicago, IL, USA (Melchor Demetria, Seema Gandhi)
Background: In 2012, the American Association for the Study of Liver Diseases published practice guidelines for the management of patients with ascites caused by cirrhosis, using data from randomized controlled trials (RCTs) and observational studies. We reexamined the strength of these RCTs by calculating the fragility index (FI), a novel metric proposed for evaluating the robustness of RCTs.
Methods: We screened all RCTs referenced in the guidelines for specific criteria. We calculated the FI and fragility quotient (FQ), and analyzed the correlation between FI and several variables.
Results: Twenty-one RCTs were included. The median (25th, 75th) FI and FQ were 1 (interquartile range [IQR] 0.5-6) and 0.070 (IQR 0.008-0.166), respectively. For studies that reported the number of patients lost to follow up (12 RCTs), the median of patients lost was 2 (IQR 0-6.5). There was no significant correlation between FI and sample size (rs=0.357), P-value (rs=-0.299), number lost to follow up (rs=0.355), Science Citation Index (rs=0.347), year of publication (rs=-0.085), blinding (rpb=-0.18) or number of centers (rpb=0.10). However, a significant correlation was seen between FI and number needed to treat (rs=-0.549; P=0.015).
Conclusions: RCTs in the field of cirrhosis-related ascites are fragile. Of the 21 trials analyzed, 13 had an FI of 3 or below and these trials influenced 13 of the 49 recommendations in the guidelines. We recommend the incorporation of FI and FQ in addition to P-value to better understand the meaning of the results in gastroenterological studies.
Key words: Clinical practice guideline, gastroenterology, randomized clinical trial, biostatistics, fragility index
Ann Gastroenterol 2019; 32 (6): 642-649
Cirrhosis is the leading cause of ascites in the United States, accounting for around 85% of cases [1]. Ascites is one of the 3 major complications encountered in patients with cirrhosis and has implications such as increased hospital admissions and increased cost of care [2,3]. In addition, it is associated with significant morbidity and approximately 40% mortality within 5 years [3]. Therefore, there is a significant need for quality guidelines to assist physicians in managing ascites.
In 2012, the American Association for the Study of Liver Diseases (AASLD) published the “Management of adult patients with ascites caused by cirrhosis”, which included 49 recommendation statements based on multiple randomized trials, non-randomized trials, meta-analyses, observational studies and consensus opinions from a panel of experts [4]. Recommendations are considered strongest when based on data from randomized controlled trials (RCTs) and their subsequent meta-analyses.
Data are considered significant when the P-value is ≤0.05. However, P-values have been criticized for being extremely simplistic and not efficient in expressing the true significance of data [5-9]. The fragility index (FI) estimates the minimum number of events that would have to change to modify the results of a particular study from significant to nonsignificant [10]. The FI can thus help to assess the robustness of the trial results that form the foundation for these guideline recommendations. The use of the FI was first shown to be of value in a paper published in 2014, which reported a median FI of 8 for approximately 400 RCTs, and since then it has gained traction as a novel metric in several specialties, such as critical care medicine, pediatrics, urology, and spinal surgery [10-14]. In fact, an article in the journal Chest examined the strength of the trial outcomes used to create the 2018 CHEST guideline and expert panel report on antithrombotic therapy for venous thromboembolism disease [15]. Likewise, in an article published in JAMA Surgery, Tignanelli et al proposed the routine use of FI for trauma RCTs to assist physicians in making better decisions about trauma patients [16]. It has been proposed that the fragility quotient (FQ), which is FI divided by the total sample size, should be reported along with FI, so as to convert an absolute measure (FI) into a relative one (FQ) for better understanding of the fragility of trials [17].
FI and FQ are yet to be utilized in the field of gastroenterology. The aim of this study was to determine the FI and FQ of the trial outcomes used to create the recommendations for the management of ascites in cirrhotic patients outlined in the AASLD guidelines, in order to gauge the strength of these recommendations.
We screened all RCTs referenced in the 2012 AASLD guidelines on the management of adult patients with ascites due to cirrhosis. Randomized trials using a 1:1 allocation ratio, parallel 2-group designs, and those with at least one dichotomous outcome qualified for inclusion. Only statistically significant outcomes were further included in the final analysis.
The review of the guidelines was followed by a MEDLINE and PubMed database search to acquire the abstracts and full texts of eligible trials. Two independent investigators (RA and MM) screened the abstracts and full texts to identify eligible RCTs and extracted data using a pre-specified, piloted electronic data collection form. In case of any discrepancies, a third investigator (YA) was consulted to reach consensus.
The variables extracted from each RCT were the outcomes reported, sample size, sample size of each group, the event rates of outcomes in each group, P-value, number lost to follow up, year of publication, number of centers and the trial’s Science Citation Index (SCI). SCI is a tool for identifying a researcher’s publications and the number of times those publications have been cited by other authors [18]. When available, additional data on the type of blinding (unblinded, single-blind or double-blind), RCT type (placebo-controlled or active comparator), type of intervention (pharmaceutical, surgical or other), type of funding (government or private), and whether the intention-to-treat principle was employed or not were also collected. Primary outcomes were prioritized for the analysis; however, when these were not specified or were statistically nonsignificant, secondary or any other significant dichotomous outcomes were included. For each trial, the intervention effect was calculated and expressed as the number needed to treat (NNT).
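The NNT is conventionally the reciprocal of the absolute difference in event rates between the two arms. The short sketch below assumes that standard definition; the function name and the example counts are hypothetical and are not taken from any of the included trials.

```python
def number_needed_to_treat(events_control, n_control, events_treated, n_treated):
    """NNT as the reciprocal of the absolute risk difference (standard definition)."""
    risk_control = events_control / n_control
    risk_treated = events_treated / n_treated
    risk_difference = abs(risk_control - risk_treated)
    if risk_difference == 0:
        raise ValueError("NNT is undefined when both arms have the same event rate")
    return 1 / risk_difference


# Hypothetical example: 20/50 events in the control arm vs. 10/50 with treatment.
nnt = number_needed_to_treat(20, 50, 10, 50)
print(round(nnt, 1))
```

In practice the NNT is usually rounded up to the next whole patient before it is reported.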
FI represents the minimum number of patients whose status needs to be changed from a “nonevent” to an “event” to make the P-value nonsignificant (i.e., exceed 0.05). The lower the FI, the more fragile the trial data from an RCT [10]. The FQ, on the other hand, is a relative measure of fragility calculated as FI divided by the sample size of a given trial [17]. These can, however, only be applied to RCTs with dichotomous outcomes.
All statistical analyses were performed using SPSS version 23.0 (IBM, Chicago, IL, USA; 2012). Data from each trial were depicted in a 2-by-2 contingency table (Table 1) and the FI was calculated as demonstrated by Walsh et al [10]. Events were added to the group with fewer events and non-events were simultaneously subtracted from that group, keeping the patient population constant. Fisher’s exact test was then used to re-calculate the 2-sided P-value while iteratively adding events until the first time the P-value exceeded 0.05. The number of additional events required to reach a P-value >0.05 was considered as that trial’s FI. We determined the median FI among the identified trials. We analyzed the correlation between FI and sample size, P-value, number lost to follow up, NNT, year of publication, and SCI, expressed as a Spearman correlation coefficient (rs). Two trials, one with a high sample size and the other with a high FI, were excluded from the correlation analysis, as they would have skewed the analysis in a non-meaningful way, potentially creating false-positive relations. The point-biserial correlation coefficient (rpb) was used to express correlations between dichotomous variables (blinding vs. no blinding and single- vs. multicenter trials) and FI. Differences in FI between groups were assessed with the Mann-Whitney U test to account for the non-parametric distribution of data points. We considered P-values <0.05 to be statistically significant.
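The iterative procedure described above can be sketched in a few lines. This is a minimal illustration and not the SPSS workflow used in the study: the function names are hypothetical, the 2-sided Fisher's exact test is implemented directly from the hypergeometric distribution (summing all tables no more likely than the observed one, which is one common convention), and the example counts are invented.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2-by-2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def fragility_index(events_1, n_1, events_2, n_2, alpha=0.05):
    """Events added to the arm with fewer events until P first exceeds alpha."""
    arms = [[events_1, n_1], [events_2, n_2]]
    low = 0 if events_1 <= events_2 else 1  # arm with fewer events
    fi = 0
    while True:
        (e1, n1), (e2, n2) = arms
        p = fisher_two_sided(e1, n1 - e1, e2, n2 - e2)
        # Stop once nonsignificant, or if no non-events are left to convert.
        if p > alpha or arms[low][0] == arms[low][1]:
            return fi
        arms[low][0] += 1  # one non-event becomes an event; arm size unchanged
        fi += 1


def fragility_quotient(fi, total_sample_size):
    """FQ: the FI expressed relative to the total sample size."""
    return fi / total_sample_size


# Invented example: 1/10 events in one arm vs. 9/10 in the other.
fi = fragility_index(1, 10, 9, 10)
print(fi, fragility_quotient(fi, 20))
```

For this invented table the result stays significant after one and two added events and first exceeds 0.05 after the third, giving an FI of 3 and an FQ of 0.15. An FI of 0 arises when Fisher's exact test is already nonsignificant for the reported counts.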
Table 1 Calculation of the fragility index [10]
Of the 214 references used to develop the guidelines, 57 were RCTs. Twenty-one RCTs had a 1:1 parallel trial design, dichotomous outcomes and significant P-values. Fig. 1 demonstrates the flow of articles through the screening process.
Figure 1 Flow of articles through screening and reasons for exclusion
The median sample size was 80 (interquartile range [IQR] 60-106). Twelve trials (57%) reported the number of patients lost to follow up; among these, the median number of patients lost was 2 (IQR 0-6.5). The median number of citations was 335 (IQR 142-417). Nine trials (42%) were blinded, 5 trials (23%) were unblinded, while 7 trials (33%) did not report it. Nine trials (42%) were multicenter trials, 10 (47%) were single-center, while 2 trials (10%) did not report it. Twelve trials (57%) reported an intention-to-treat analysis. The median year of publication was 2000 (IQR 1994-2007), with 11 trials (52%) published before 2000.
The median (25th, 75th) FI was 1 (IQR 0.5-6), ranging from 0 to 17. A histogram showing the frequencies of FI scores is presented in Fig. 2. Six trials had an FI of 1 while 5 trials had an FI of 0. The median (25th, 75th) FQ was 0.07 (IQR 0.008-0.166). There was no significant correlation between FI and sample size (rs=0.357), P-value (rs=-0.299), number lost to follow up (rs=0.355), SCI (rs=0.347), year of publication (rs=-0.085), blinding (rpb=-0.18), or number of centers (rpb=-0.10). None of the P-values for these correlations were significant. Our analysis showed a significant correlation between FI and NNT (rs=-0.549; P=0.015). A scatter plot relating FI to sample size is presented in Fig. 3. Table 2 presents the complete FI/FQ and NNT data, along with other characteristics, for all 21 trials included.
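For readers unfamiliar with the coefficients reported above, the Spearman coefficient is the Pearson correlation computed on ranks, with tied values sharing their mean rank. The sketch below illustrates that calculation only; it is not the SPSS procedure used in the study, the helper names are hypothetical, and the sample sizes and FI values are invented.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented data: five trials' sample sizes and fragility indices.
sample_sizes = [40, 60, 80, 100, 120]
fragility_indices = [0, 1, 1, 3, 2]
print(round(spearman(sample_sizes, fragility_indices), 3))
```

The point-biserial coefficient is the Pearson coefficient with one variable coded as 0 or 1, so `pearson` applied to, for example, a blinded/unblinded indicator and the FI values would cover that case as well.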
Table 2 List of all included studies (N=21) and basic characteristics
Figure 2 Distribution of fragility index values from 21 trials. The median number of patients whose status would have to change from a non-event to an event to change a statistically significant result to a non-significant result was 1 (interquartile range 0.5-6)
Figure 3 Scatterplot of fragility index and sample size
When segregated by sample size, the median FI for the 11 trials (52%) that had a sample size of less than 80 was 1 (IQR 0-2), compared to a median FI of 4.5 (IQR 0.75-9.75; P=0.106) for trials with a sample size of 80 and above. For the 11 trials published in 2000 and before, the median FI was 1 (IQR 0-6), lower than the median FI of 2 (IQR 0.75-6.75; P=0.47) for trials published after 2000. Likewise, the median FI was 1 (IQR 0-2) for the 11 trials cited 334 times or less, compared to 4 (IQR 0.75-7.5; P=0.22) for studies with more than 334 citations. Trials with an NNT of 6 and below had a median FI of 1.5 (IQR 0.75-6; P=0.22), compared to 1 for those with an NNT greater than 6 (Table 3). However, none of these associations were found to be significant (Table 4).
Table 3 Median levels of the fragility index (FI) according to the characteristics of the randomized clinical trials (N=21 trials included)
Table 4 Correlation of trial characteristics with fragility index
Our investigation demonstrated a median (25th, 75th) FI of 1 and FQ of 0.070 for all the 21 trials, suggesting that only 1 inferior event on average in either arm would render their significant findings insignificant. The median number of patients lost to follow up reported in the 12 trials was 2. It is also worth mentioning that the FI was less than or equal to the number lost to follow up in 6 of the 21 trials. If data from one of these patients had been available, it could have swayed the final outcome of the study in a significant direction.
A combination of results from several studies influences specific recommendations, with some studies having a higher FI than others, but the FI for most of these studies was low. Recommendations 8, 9, 12, 24, 27, 33-37 and 39-41 are based on the RCTs included in our analysis. Recommendation 8, which suggests the use of baclofen, is based on an RCT with an FI of 9 and a number lost to follow up of 12. Likewise, recommendations 24, 27 and 33 hinge on RCTs with an FI lower than the number of patients lost to follow up. These numbers clearly bring into question the strength of the recommendations. On the other hand, our analysis showed that recommendations concerning the use of antibiotics for spontaneous bacterial peritonitis (SBP) in patients with gastrointestinal bleeding and previous episodes of SBP (Recommendations 34, 35) are supported by slightly stronger RCTs with FIs of 12 and 3, respectively, with both studies having lost no patients to follow up.
Narayan et al reported an FI of 3 (IQR 1-4.5), where 67.5% of the trials had a number of patients lost to follow up greater than the FI in an analysis of RCTs in the field of urology [11]. Likewise, a study examining RCTs in spine surgery reported an FI of 2, where 25% of the trials had a number of patients lost to follow up greater than the FI [14]. These results are in line with our findings. If a large number of patients are lost to follow up, the results should be interpreted with caution, as the data are likely to be very fragile. Although the trials we analyzed were of 2-by-2 factorial design and included only dichotomous outcomes, our finding still suggests that traditional statistical measures, such as the P-value and 95% confidence intervals, are not reliable in isolation for evaluating the effectiveness of interventions in RCTs [16].
Finally, we also found there was no correlation between FI and the other variables included in the analysis, except for NNT. This reconfirms that FI is an independent measure of robustness and is not affected by other parameters, some of which can affect the P-value. Studies in the fields of urology and spine surgery have, however, reported a positive correlation between FI and sample size, while other studies have demonstrated a weak positive correlation. A study in pediatrics published in 2017 reported no correlation between FI and sample size [11-15]. The relationships between FI and sample size, along with other parameters, are inconsistent among several studies and a concrete understanding is yet to be established. With the exception of one very large trial (N=6632 and FI=17) [22], the remaining trials [19-21,23-39] in our analysis had sample sizes in a narrow range (27-117) that does not allow any assessment of the possible relationship between the FI and sample size. Our findings of no correlation between FI and other parameters are either true or attributable to the limited number of trials and their sample sizes.
Our methodology included duplicative data extraction with 2 independent investigators. We had no limitations on including interventions and outcomes, considering either primary or secondary outcomes. Given the nature of FI analysis, we included only randomized trials with dichotomous outcomes and significant P-values, which restricts the generalization of our results to all the RCTs referenced in the 2012 guidelines. The median sample size of the RCTs in our investigation was 80, which could be the reason why a positive correlation could not be established between FI and other parameters. The research community is yet to establish a specific threshold for FI to establish its significance, as seen with a P-value of 0.05. Increased use of this metric is therefore needed to better understand the true nature and value of FI.
In conclusion, RCTs conducted in the field of cirrhosis-related ascites hinge on a very small number of superior events, which makes these studies fragile. The trials become weaker when large numbers of patients are lost to follow up and this number exceeds the FI, as seen in several of the studies we examined. Based on our investigation, the guidelines on ascites rest on highly fragile trial results, and this should be taken into account when making clinical decisions. When interpreting the outcomes of RCTs, FI should not be used in isolation, but rather coupled with other statistical measures, such as P-value, 95% confidence intervals and sample size, to determine the robustness of the trials. We recommend the incorporation of FI and FQ in gastroenterological trials to better understand the true strength of the data outcomes.
What is already known:
Recommendations from guidelines are considered strongest when based on data from randomized controlled trials (RCTs)
Data are defined as significant on the basis of a P-value, which has been criticized for being simplistic
Fragility index (FI) and fragility quotient (FQ) have been proposed as novel metrics to evaluate the robustness of RCTs
What the new findings are:
When evaluating the strength of a few recommendations of guidelines on the management of cirrhosis, we found that the RCTs hinge on a very small number of superior events that make these recommendations extremely fragile
FI/FQ should be used along with P-value when we evaluate the strength of these recommendations
Priyanka Agrawal, MPH, MScD, Research Associate in the Department of International Health, Johns Hopkins Bloomberg School of Public Health.
1. Runyon BA, Montano AA, Akriviadis EA, Antillon MR, Irving MA, McHutchison JG. The serum-ascites albumin gradient is superior to the exudate-transudate concept in the differential diagnosis of ascites. Ann Intern Med 1992;117:215-220.
2. Ginés P, Quintero E, Arroyo V, et al. Compensated cirrhosis:natural history and prognostic factors. Hepatology 1987;7:122-128.
3. Planas R, Montoliu S, Ballesté B, et al. Natural history of patients hospitalized for management of cirrhotic ascites. Clin Gastroenterol Hepatol 2006;4:1385-1394.
4. Runyon BA;AASLD. Introduction to the revised American Association for the Study of Liver Diseases Practice Guideline management of adult patients with ascites due to cirrhosis 2012. Hepatology 2013;57:1651-1653.
5. Breau RH, Dahm P, Fergusson DA, Hatala R. Understanding results. J Urol 2009;181:985-992.
6. Goodman SN. Toward evidence-based medical statistics. 1:The P value fallacy. Ann Intern Med 1999;130:995-1004.
7. McIlroy D. Seduced by a P-value. Anaesth Intensive Care 2014;42:551-554.
8. Rothman KJ. Significance questing. Ann Intern Med 1986;105:445-447.
9. Rozeboom WW. The fallacy of the null-hypothesis significance test. Psychol Bull 1960;57:416-428.
10. Walsh M, Srinathan SK, McAuley DF, et al. The statistical significance of randomized controlled trial results is frequently fragile:a case for a Fragility Index. J Clin Epidemiol 2014;67:622-628.
11. Narayan VM, Gandhi S, Chrouser K, Evaniew N, Dahm P. The fragility of statistically significant findings from randomised controlled trials in the urological literature. BJU Int 2018;122:160-166.
12. Ridgeon EE, Young PJ, Bellomo R, Mucchetti M, Lembo R, Landoni G. The fragility index in multicenter randomized controlled critical care trials. Crit Care Med 2016;44:1278-1284.
13. Matics TJ, Khan N, Jani P, et al. The fragility index in a cohort of pediatric randomized controlled trials. J Clin Med 2017;6.
14. Evaniew N, Files C, Smith C, et al. The fragility of statistically significant findings from randomized trials in spine surgery:a systematic survey. Spine J 2015;15:2188-2197.
15. Edwards E, Wayant C, Besas J, et al. How fragile are clinical trial outcomes that support the CHEST clinical practice guidelines for VTE?Chest 2018;154:512-520.
16. Tignanelli CJ, Napolitano LM. The fragility index in randomized clinical trials as a means of optimizing patient care. JAMA Surg 2019;154:74-79.
17. Ahmed W, Fowler RA, McCredie VA. Does sample size matter when interpreting the fragility index?Crit Care Med 2016;44:e1142-e1143.
18. Garfield E. The evolution of the Science Citation Index. Int Microbiol 2007;10:65-69.
19. Addolorato G, Leggio L, Ferrulli A, et al. Effectiveness and safety of baclofen for maintenance of alcohol abstinence in alcohol-dependent patients with liver cirrhosis:randomised, double-blind controlled study. Lancet 2007;370:1915-1922.
20. Garg H, Sarin SK, Kumar M, Garg V, Sharma BC, Kumar A. Tenofovir improves the outcome in patients with spontaneous reactivation of hepatitis B presenting as acute-on-chronic liver failure. Hepatology 2011;53:774-780.
21. Angeli P, Fasolato S, Mazza E, et al. Combined versus sequential diuretic treatment of ascites in non-azotaemic patients with cirrhosis:results of an open randomised clinical trial. Gut 2010;59:98-104.
22. Pitt B, Remme W, Zannad F, et al;Eplerenone Post-Acute Myocardial Infarction Heart Failure Efficacy and Survival Study Investigators. Eplerenone, a selective aldosterone blocker, in patients with left ventricular dysfunction after myocardial infarction. N Engl J Med 2003;348:1309-1321.
23. Ginés P, Arroyo V, Quintero E, et al. Comparison of paracentesis and diuretics in the treatment of cirrhotics with tense ascites. Results of a randomized study. Gastroenterology 1987;93:234-241.
24. Rössle M, Ochs A, Gülberg V, et al. A comparison of paracentesis and transjugular intrahepatic portosystemic shunting in patients with ascites. N Engl J Med 2000;342:1701-1707.
25. Lebrec D, Giuily N, Hadengue A, et al. Transjugular intrahepatic portosystemic shunts:comparison with paracentesis in patients with cirrhosis and refractory ascites:a randomized trial. French Group of Clinicians and a Group of Biologists. J Hepatol 1996;25:135-144.
26. Bureau C, Garcia-Pagan JC, Otal P, et al. Improved clinical outcome using polytetrafluoroethylene-coated stents for TIPS:results of a randomized study. Gastroenterology 2004;126:469-475.
27. Felisart J, Rimola A, Arroyo V, et al. Cefotaxime is more effective than is ampicillin-tobramycin in cirrhotics with severe infections. Hepatology 1985;5:457-462.
28. Sort P, Navasa M, Arroyo V, et al. Effect of intravenous albumin on renal impairment and mortality in patients with cirrhosis and spontaneous bacterial peritonitis. N Engl J Med 1999;341:403-409.
29. Soriano G, Guarner C, Teixidó M, et al. Selective intestinal decontamination prevents spontaneous bacterial peritonitis. Gastroenterology 1991;100:477-481.
30. Ginés P, Rimola A, Planas R, et al. Norfloxacin prevents spontaneous bacterial peritonitis recurrence in cirrhosis:results of a double-blind, placebo-controlled trial. Hepatology 1990;12:716-724.
31. Blaise M, Pateron D, Trinchet JC, Levacher S, Beaugrand M, Pourriat JL. Systemic antibiotic therapy prevents bacterial infection in cirrhotic patients with gastrointestinal hemorrhage. Hepatology 1994;20:34-38.
32. Fernández J, Ruiz del Arbol L, Gómez C, et al. Norfloxacin vs ceftriaxone in the prophylaxis of infections in patients with advanced cirrhosis and hemorrhage. Gastroenterology 2006;131:1049-1056;quiz 1285.
33. Singh N, Gayowski T, Yu VL, Wagener MM. Trimethoprim-sulfamethoxazole for the prevention of spontaneous bacterial peritonitis in cirrhosis:a randomized trial. Ann Intern Med 1995;122:595-598.
34. Rolachon A, Cordier L, Bacq Y, et al. Ciprofloxacin and long-term prevention of spontaneous bacterial peritonitis:results of a prospective controlled trial. Hepatology 1995;22:1171-1174.
35. Fernández J, Navasa M, Planas R, et al. Primary prophylaxis of spontaneous bacterial peritonitis delays hepatorenal syndrome and improves survival in cirrhosis. Gastroenterology 2007;133:818-824.
36. Tyagi P, Sharma P, Sharma BC, Puri AS, Kumar A, Sarin SK. Prevention of hepatorenal syndrome in patients with cirrhosis and ascites:a pilot randomized control trial between pentoxifylline and placebo. Eur J Gastroenterol Hepatol 2011;23:210-217.
37. Akriviadis E, Botla R, Briggs W, Han S, Reynolds T, Shakil O. Pentoxifylline improves short-term survival in severe acute alcoholic hepatitis:a double-blind, placebo-controlled trial. Gastroenterology 2000;119:1637-1648.
38. Sanyal AJ, Boyer T, Garcia-Tsao G, et al. A randomized, prospective, double-blind, placebo-controlled trial of terlipressin for type 1 hepatorenal syndrome. Gastroenterology 2008;134:1360-1368.
39. Martín-Llahí M, Pépin MN, Guevara M, et al. Terlipressin and albumin vs albumin in patients with cirrhosis and hepatorenal syndrome:a randomized study. Gastroenterology 2008;134:1352-1359.