by Zuzana Irsova, Pedro R. D. Bom, Tomas Havranek, and Heiko Rachinger
Meta-analysis upweights studies reporting lower standard errors and hence more precision. But in empirical practice, notably in observational research, precision is not given to the researcher. Precision must be estimated, and thus can be p-hacked to achieve statistical significance. Simulations show that a modest dose of spurious precision creates a formidable problem for inverse-variance weighting and bias-correction methods based on the funnel plot. Selection models fail to solve the problem, and the simple mean can dominate sophisticated estimators. Cures to publication bias may become worse than the disease. We introduce an approach that surmounts spuriousness: the Meta-Analysis Instrumental Variable Estimator (MAIVE).
The paper is available at meta-analysis.cz/maive. We provide a package for R (maive), which makes it easy to use the new method.
What is already known
In meta-analysis it's optimal to give more weight to more precise studies.
Inverse-variance weighting maximizes efficiency and may attenuate publication bias.
Inverse-variance weighting is used by all common estimators.
What is new
If reported precision exaggerates the real one, inverse-variance weighting creates a bias.
Bias in current methods due to spurious precision can exceed publication bias.
Spurious precision arises naturally in observational research via p-hacking.
Meta-Analysis Instrumental Variable Estimator (MAIVE) corrects for spuriousness.
Potential impact
Meta-analysts should use MAIVE if they suspect p-hacking.
The difference between MAIVE and unadjusted estimators can measure spuriousness.
MAIVE substantially improves the robustness of the current meta-analysis toolkit.
Inverse-variance weighting reigns in meta-analysis. [1] More precise studies, or rather those that seem more precise based on lower reported standard errors, get a greater weight, explicitly or implicitly. The weight is explicit in traditional summaries, such as the fixed-effect model (assuming a common effect) and the random-effects model (allowing for heterogeneity). [2,3] Both models work as weighted averages; in the random-effects model the weight is diluted by a heterogeneity term. The weight is also explicit in publication bias correction models based on the funnel plot. [4–12] In funnel-based models, reported precision is particularly important because the weighted average gets reinforced by assigning more importance to supposedly less biased (nominally more precise) studies. The weight is implicit in selection models estimated by maximum likelihood, [13–18] which often reduce to the random-effects model in the absence of publication bias.
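To make the weighting concrete, here is a minimal R sketch with made-up numbers (the effect sizes, standard errors, and the heterogeneity value are purely illustrative, not taken from any study):

```r
# Hypothetical reported effect sizes and standard errors from four studies.
est <- c(0.10, 0.25, 0.40, 0.05)
se  <- c(0.05, 0.10, 0.20, 0.02)

# Fixed-effect (common-effect) model: weights are the inverse reported variances.
w_fe <- 1 / se^2
fe   <- sum(w_fe * est) / sum(w_fe)

# Random-effects model: the weight is diluted by a heterogeneity term tau^2
# (tau^2 would normally be estimated, e.g. by DerSimonian-Laird; fixed here).
tau2 <- 0.01
w_re <- 1 / (se^2 + tau2)
re   <- sum(w_re * est) / sum(w_re)

c(fixed_effect = fe, random_effects = re)
```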
Reported precision turned spurious
The tacit assumption behind all these techniques is that the reported, nominal precision represents the true, underlying precision: the standard error, the inverse of precision, is given to the researcher by her data and methods; it's fixed and can't be manipulated, consciously or unconsciously. The assumption is plausible in experimental research, for which most meta-analysis methods were developed. But in observational research, where thousands of meta-analyses are produced each year, the derivation of the standard error is often a key part of the empirical exercise. Consider a regression analysis with longitudinal data: explaining the health outcomes of patients treated by different physicians and observed over several years. Individual observations aren't independent, and standard errors need to be clustered. [19] But how? At the level of physicians, patients, or years? Should one use double clustering [20] or perhaps wild bootstrap [21]? It's complicated, and with a different computation of confidence intervals the researcher will report different precision for the same estimated effect size.
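To see how the clustering choice alone moves reported precision, consider a small, purely illustrative simulation (the variable names physician, treatment, and outcome are ours, not from the paper); it uses the sandwich and lmtest packages:

```r
# Sketch: the same estimated effect, different standard errors depending on clustering.
library(sandwich)  # vcovCL: cluster-robust covariance matrices
library(lmtest)    # coeftest: coefficient tables with a user-supplied vcov

set.seed(1)
n_phys <- 40; n_pat <- 10
dat <- data.frame(physician = rep(seq_len(n_phys), each = n_pat))
dat$treatment <- rep(rbinom(n_phys, 1, 0.5), each = n_pat)  # varies at the physician level
phys_shock <- rnorm(n_phys)[dat$physician]                  # physician-level noise
dat$outcome <- 0.3 * dat$treatment + phys_shock + rnorm(nrow(dat))

m <- lm(outcome ~ treatment, data = dat)
coeftest(m)                                             # conventional (iid) standard error
coeftest(m, vcov = vcovCL(m, cluster = dat$physician))  # clustered by physician: larger SE
```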
Spurious precision can arise in many contexts other than longitudinal data analysis. Ordinary least squares, the workhorse of observational research, assumes homoskedastic residuals. The assumption is often violated, and in these cases researchers should use heteroskedasticity-robust standard errors, [22] typically larger than plain-vanilla standard errors. If researchers ignore heteroskedasticity, they report precise estimates, but the precision is spurious. Similar problems may arise due to nonstationarity in time series [23] and a myriad of other issues. When a study with exaggerated precision enters a meta-analysis, it gets too much weight because of inverse-variance weighting. If a meta-analyst spots the methodological problem, she can exclude the study or add a corresponding control. Either way, the weighting problem isn't properly addressed. And spotting misspecification is hard, because the tweak lifting reported precision can be hidden within a complex model.
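A similarly stylized sketch for heteroskedasticity, again with invented data; the robust standard error is typically the larger, honest one:

```r
# Sketch: ignoring heteroskedasticity makes the reported precision spuriously high.
library(sandwich); library(lmtest)

set.seed(2)
x <- runif(500)
y <- 1 + 0.5 * x + rnorm(500, sd = 0.2 + 2 * x)  # error variance grows with x

m <- lm(y ~ x)
coeftest(m)                                  # plain-vanilla standard error
coeftest(m, vcov = vcovHC(m, type = "HC1"))  # heteroskedasticity-robust standard error
```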
Various sources of spuriousness
Spurious precision can also arise due to cheating. For economics journals, quasi-experimental evidence shows that the introduction of obligatory data sharing substantially reduced the reported t-statistics. [24] Prior to the introduction of data sharing, some authors had probably cheated by manipulating data or results. Pütz and Bruns find hundreds of reporting errors in top economics journals; when they ask authors to explain the errors, the authors are four times more likely to admit a mistake in the standard error than in the estimated effect size. [67] But cheating, mistakes, and other issues that can affect the standard error independently of the estimated effect size aren't necessary to produce spurious precision. A realistic mechanism is p-hacking, in which the researcher adjusts the entire model to produce statistically significant results. After adjusting the model, both the effect size and the standard error change, and both can jointly contribute to statistical significance. Using Monte Carlo simulations, we examine the consequences of cheating and of the more realistic p-hacking behavior, of which spurious precision is a natural result.
Figure 1 gives intuition on the cheating/clustering/heteroskedasticity/nonstationarity simulation. For brevity we call it a cheating scenario. Researchers crave statistically significant estimates and to that end manipulate effect sizes or standard errors at will, but not both at the same time. The scenario is simplistic, and we start with it because it allows for a clean separation of selection on estimates (conventional in the literature) and selection on standard errors (our focus). The separation isn't so clean in the p-hacking scenario but can be mapped back to the cheating scenario. The mechanism in the left-hand panel of Figure 1 is analogous to the Lombard effect in psychoacoustics: [25,26] speakers increase their vocal effort in response to noise. Here researchers increase their selection effort in response to noise in data or methods, noise that produces imprecision and insignificance. When researchers cheat with effect sizes in this way, the results are consistent with funnel-based models of publication bias: funnel asymmetry arises, the most precise estimates remain close to the true effect, and inverse-variance weighting helps mitigate the bias—aside from improving the efficiency of the aggregate estimate, the original rationale for using the weights. [27,28]
Bias in inverse-variance weighting
The right-hand panel of Figure 1 paints a different picture. Here the mechanism is analogous to Taylor's law in ecology, [29] which originally described how population variance scales with population density across species: the variance can shrink together with the mean. When researchers achieve significance by lowering the standard error, we again observe funnel asymmetry. But this time no bias arises in the reported effect sizes: the black-filled circles and the hollow circles denote the same effect sizes; only precision changes. The simple unweighted mean of reported estimates is unbiased, and inverse-variance weighting paradoxically creates a downward bias. The bias increases when we use a correction based on the funnel plot: effectively, estimating the effect in a hypothetical, infinitely precise study, the intercept of a regression curve.
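A stylized sketch of this mechanism (not the paper's simulation design, just an illustration of why the weighted mean is pulled toward zero while the unweighted mean is not):

```r
# Panel-B intuition: insignificant results get their standard errors shrunk
# just enough to reach |z| = 1.96; the estimates themselves are untouched.
set.seed(3)
true_effect <- 0.2
se_true <- runif(500, 0.05, 0.50)
est     <- rnorm(500, mean = true_effect, sd = se_true)

se_rep <- se_true
hacked <- abs(est / se_true) < 1.96
se_rep[hacked] <- abs(est[hacked]) / 1.96    # spurious precision

c(unweighted       = mean(est),                                # roughly unbiased
  inverse_variance = sum(est / se_rep^2) / sum(1 / se_rep^2))  # pulled toward zero
```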
In practice, as noted, selection on estimates and selection on standard errors arise simultaneously. We generate this feature in the simulations by allowing researchers to replace control variables in a regression context, a mechanism that also gives rise to sizable heterogeneity. Control variables are correlated with the main regressor of interest (for example, a treatment variable), and their replacement affects both the estimated treatment effect and the corresponding precision. P-hacked estimates then move not strictly north or west, as in the figure, but northwest. Even spuriously large estimates can now be spuriously precise. The direction of the resulting bias due to inverse-variance weighting is unclear; our simulations suggest that an upward bias is plausible.
Current methods fail with spurious precision
Does any technique yield little bias and good coverage rates in the case of panel B of Figure 1, or at least with a small ratio of selection on standard errors relative to selection on estimates? We examine 7 current estimators: simple unweighted mean, fixed effects (weighted least squares, FE/WLS), [30] precision-effect test and precision-effect estimate with standard errors (PET-PEESE), [9] endogenous kink (EK), [11] weighted average of adequately powered estimates (WAAP), [10] the selection model by Andrews and Kasy, [17] and p-uniform∗ [18]. The first two are basic summary statistics, the next three are correction methods based on the funnel plot, and the last two are selection models. The choice of estimators is subjective, but the three funnel-based techniques are commonly used in observational research. [31–38] The two selection models are also used often [39–46] and represent the latest incarnations of models in the tradition of Hedges [13–16] and their simplifications [47–52].
The importance of reported precision for these estimators is summarized in Table 1. In most of them precision has two roles: weight and identification. Identification can be achieved through a meta-regression (where the standard error or a function thereof is included as a regressor), through a selection model, or through a combination of both, such as the EK model.
None of these 7 estimators works well with even a sprinkle of spurious precision. The simple unweighted mean, though plagued by publication bias, can be the best of them, but it is still not good. The reader might expect selection models to beat funnel-based models, because of the latter's heavier reliance on precision. Alas, this is generally not the case, and even selection models are often defeated by the simple mean when selection on standard errors is modest (a ratio of about 1:5 or more relative to selection on estimates). We propose a straightforward adjustment of funnel-based techniques, the meta-analysis instrumental variable estimator (MAIVE), which corrects most of the bias and restores valid coverage rates. MAIVE replaces, in all meta-analysis contexts, the reported variance with the portion of the reported variance that can be explained by the inverse of the sample size used in the primary study. We justify the idea by starting with a version of the Egger regression: [4]
α̂i = α0 + β·SEi² + vi,

where α̂i on the left-hand side denotes the effects estimated in primary studies and SEi their standard errors. This is the PEESE model due to Stanley and Doucouliagos, but for simplicity without additional inverse-variance weights—since the model searches for the effect conditional on maximum precision, it already features an implicit, built-in weight. In panel A of Figure 1, the quadratic regression would fit the data quite well, [9] and the estimated α0 would lie close to the mean underlying effect. In panel B, however, the regression fails to recover the underlying coefficients. It fails because it assumes a causal effect of the standard error on the estimate: a good description of panel A (Lombard effect), but not of panel B (Taylor's law). In panel B, the standard error sometimes depends on the estimated effect size and is thus correlated with the error term, vi. The resulting estimates of α0 (true effect) and β (intensity of selection) are biased.
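As a minimal illustration with made-up numbers, the unweighted PEESE regression is just an ordinary least squares fit of estimates on reported variances:

```r
# Unweighted PEESE on hypothetical data: the intercept approximates the effect
# of a hypothetical, infinitely precise study.
est <- c(0.45, 0.30, 0.28, 0.22, 0.20, 0.18)
se  <- c(0.20, 0.14, 0.12, 0.08, 0.05, 0.03)

peese <- lm(est ~ I(se^2))
coef(peese)["(Intercept)"]   # estimate of alpha_0
```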
Endogeneity problem in meta-regression
The problem is the correlation between SE and vi, which can arise for three reasons. First, selective reporting based on standard errors, which we simulate. Second, measurement error in SE. This issue was mentioned in 2005 by Tom Stanley, [53] who was the first to instrument the standard error in a meta-analysis context. Nevertheless, Stanley neither discussed the adjustment of weights nor pursued the idea further as a bias-correction estimator. We don't consider this source of correlation in the simulations. Third, the correlation can be caused by unobserved heterogeneity: some method choices affect both estimates and standard errors, and some standardized meta-analysis effects feature a mechanical correlation between the two quantities. [36] (A careless meta-analyst may also mix estimates measured in different units. [54]) Our p-hacking simulation captures this mechanism only partly, by allowing researchers to change control variables, which can affect both estimates and standard errors at the same time—a combination of panel A and panel B of Figure 1. In other words, we model only some of the mechanisms that give rise to spurious precision.
The statistical solution to the problem, often called endogeneity, is to find an instrument for the standard error. A valid instrument is correlated with the standard error, but not with the error term (and thus unrelated to the three sources of endogeneity mentioned above). While finding good instruments is often challenging, here the answer beckons. By definition, reported variance (SE2) is a linear function of the inverse of the sample size used in the primary study. The sample size is plausibly robust to selection, or at least it's more difficult to collect more data than to p-hack the standard error to achieve significance. The sample size isn't estimated, and so it doesn't suffer from measurement error. The sample size is typically not affected by changing methodology, certainly not by changing control variables. Some endogeneity may remain if researchers correctly expecting smaller effects design larger experiments. [41] But, at least in observational research, authors often use as much data as available from the start. Indeed, the sample size, unlike the standard error, is often given to the researcher: the very word data means things given.
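To see the first-stage logic in the simplest case, consider estimating a mean from n_i observations: the reported variance is then exactly linear in the inverse sample size (a sketch in generic notation, not the paper's formal derivation),

```latex
\widehat{SE}_i^{\,2} \;=\; \frac{\hat{\sigma}_i^{2}}{n_i} \;=\; \hat{\sigma}_i^{2}\cdot\frac{1}{n_i},
```

so regressing reported variances on 1/n isolates the component of reported precision driven by sample size alone; for regression coefficients the relation is only approximate, which is one more reason to instrument rather than to substitute.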
Meta-Analysis Instrumental Variable Estimator (MAIVE)
We regress the squared reported standard errors on the inverse sample size and plug the fitted values, instead of the reported variances, into the right-hand side of the equation above. This yields the baseline MAIVE estimator. For the baseline MAIVE we choose the instrumented version of PEESE without additional inverse-variance weights because it works well in simulations. The version with additional adjusted weights (again, using fitted values instead of reported precision) often performs similarly but is more complex, so we prefer the former, parsimonious solution. In principle, any funnel-based technique (and the funnel plot itself) can be adjusted by the procedure described above: just replace the standard error with the square root of the fitted values. With this adjustment, the fixed-effect, WAAP, and endogenous kink models typically defeat both the simple unweighted mean and selection models in the presence of spurious precision. MAIVE can be easily applied using our maive package for R.
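A minimal R sketch of this two-step logic, on hypothetical data and without the proper instrumental-variable inference that the maive package provides:

```r
# Minimal sketch of the baseline MAIVE logic (est, se, n are hypothetical;
# valid IV standard errors are omitted here for brevity).
est <- c(0.45, 0.30, 0.28, 0.22, 0.20, 0.18)
se  <- c(0.20, 0.14, 0.12, 0.08, 0.05, 0.03)
n   <- c(50, 120, 150, 400, 900, 2500)

stage1  <- lm(I(se^2) ~ I(1/n))   # first stage: reported variance on inverse sample size
var_fit <- fitted(stage1)         # instrumented (fitted) variances
stage2  <- lm(est ~ var_fit)      # PEESE with fitted variances, no inverse-variance weights
coef(stage2)["(Intercept)"]       # baseline MAIVE estimate of the mean effect
```

Under the same assumptions, a standard IV routine such as AER::ivreg(est ~ I(se^2) | I(1/n)) yields the identical point estimate together with instrumental-variable standard errors.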
Table 2 shows the variants of individual estimators we consider in simulations. We always start with the unadjusted, plain-vanilla variant. Where easily possible, we consider the adjustment of weights and identification devices separately. So, for PET-PEESE and EK we have 5 different flavors. Note that the separation is not straightforward for selection models, and we do not pursue it here. This is one of many low-hanging fruits (thesis topics?) that grow from the spurious precision project and await reaping in future research; we discuss more at the end of this column.
In Figure 2 below we report one set of simulation results: the case of the p-hacking scenario with a positive underlying effect size. In this scenario the authors of primary studies run regressions with two variables on the right-hand side and are interested in the slope coefficient on the first variable. Both variables belong to the correctly specified regression model. A meta-analyst collects the slope coefficients estimated for the first variable (e.g., treatment); no one is interested in the second variable (control). The vertical axis in Figure 2 measures the bias of meta-analysis estimators relative to the true value of the slope coefficient, the true treatment effect (α1 = 1). The horizontal axis measures the correlation between the regression variable of interest and a control variable that should be included—but can be replaced by some researchers with another, less relevant control, a practice that affects both reported estimates and their standard errors.
The higher the correlation, the more potential for p-hacking via the replacement of the control variable. If there is no correlation, removing or replacing the control won't systematically affect the main estimated parameter. With a positive correlation and a positive underlying value of the second slope coefficient, replacing the control variable with a less relevant proxy creates an upward omitted-variable bias. Importantly for our purposes, a higher correlation increases selection on standard errors more than proportionally compared to selection on estimates. With more p-hacking and relatively more selection on standard errors, the bias of standard meta-analysis estimators therefore grows with the correlation. Note that even a large correlation still corresponds to a relatively small ratio of selection on standard errors (spurious precision, Taylor's law) to selection on estimates (Lombard effect). In the paper we compute and tabulate this correspondence for different values of the true effect.
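A stylized sketch of the control-replacement mechanism with invented numbers (the paper's simulation design is richer and includes the selection step):

```r
# Replacing a relevant, correlated control with a weak proxy shifts both the
# treatment coefficient and its standard error (true treatment effect = 1).
set.seed(4)
n  <- 200
x2 <- rnorm(n)                               # relevant control
x1 <- 0.6 * x2 + sqrt(1 - 0.6^2) * rnorm(n)  # treatment, correlated with the control
y  <- 1 * x1 + 1 * x2 + rnorm(n)
proxy <- 0.3 * x2 + rnorm(n)                 # less relevant stand-in for x2

summary(lm(y ~ x1 + x2))$coefficients["x1", 1:2]     # correct specification
summary(lm(y ~ x1 + proxy))$coefficients["x1", 1:2]  # misspecified: biased estimate, different SE
```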
Simple mean beats complex models
Eventually, the bias of the classical, unadjusted techniques grows even larger than the bias of the simple unweighted mean. That is, with enough spuriousness, corrections for publication bias do more harm than good. MAIVE corrects most of the spuriousness bias (see panels B–G in Figure 2 and compare panel A, classical estimators, with panel H, the MAIVE versions of these estimators), and the MAIVE versions with adjusted or omitted weights work similarly well. MAIVE performs comparably to conventional estimators if spurious precision is negligible and dominates unadjusted estimators once spuriousness is non-negligible (as we show in the paper, from a ratio of about 1:10 of selection on standard errors to selection on estimates). In the paper we report the results of many more simulation scenarios, both cheating and p-hacking, for bias, MSE, and coverage rates, with qualitatively similar results. Even a modest dose of spurious precision makes inverse-variance weighting (explicit or implicit) unreliable and warrants a MAIVE treatment.
Why don't we simply replace the variance with the inverse sample size instead of instrumenting it? [36,55–57] While the replacement would also address spurious precision, the instrumental approach has many advantages, as discussed in Section 3 of the paper. One advantage is flexibility: the instrumental approach can incorporate other aspects of study design, besides sample size, that affect standard errors. The sample size rarely forms a perfect proxy for precision, and MAIVE can be extended by adding instruments to improve the fit. Moreover, the instrumental approach remains statistically valid even if, for some reason, the correlation between the reported variance and the inverse sample size is small.
Current methods adjusted to spuriousness
We don't argue that spurious precision is common. We argue that it can plausibly arise in observational research. Even in experimental settings, randomization can fail, [58] and authors often use regressions to control for pre-treatment covariates or make other adjustments [59] that can yield spurious precision. When it arises, a small dose can render the simple mean more reliable than sophisticated correction techniques. The Meta-Analysis Instrumental Variable Estimator (MAIVE) solves the problem by using inverse sample size as an instrument for reported variance. That is, we regress the reported squared standard errors on the inverse of the number of observations used in the primary study. The fitted values from this regression are then used instead of reported variance in the PEESE meta-regression. Standard weighted means, funnel plots, and funnel-based methods can be adjusted similarly to make them robust to spurious precision. The entire meta-analysis toolkit can be salvaged with this modification.
The instrumental approach has seven benefits over using sample size as a proxy for precision, as noted, and we explain them in the paper. There are at least two costs as well, both compared to the proxy approach and the classical one that relies on reported precision. First, MAIVE is more complex since it involves an additional regression and computation of fitted values and valid confidence intervals. But the instrumental approach is readily available in most statistical programs. We create the maive package for R, which makes estimation easy for meta-analysts unfamiliar with instrumental variables. Second, the additional regression makes MAIVE noisier compared to conventional techniques. When a meta-analyst is sure there can be absolutely no spurious precision in her data, using reported precision without instruments will yield unbiased and more efficient estimates. The lack of spurious precision can be tested approximately by employing the Hausman specification test: [60] if the coefficients estimated in MAIVE are far from those of an unadjusted PEESE, spurious precision is likely an issue.
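A rough sketch of such a Hausman-type comparison on the hypothetical data used earlier, via the AER package (the maive package reports a proper version of this test):

```r
# Hausman-type check: compare the coefficient on the variance term in PEESE
# with and without instrumenting (est, se, n are hypothetical, as above).
library(AER)
est <- c(0.45, 0.30, 0.28, 0.22, 0.20, 0.18)
se  <- c(0.20, 0.14, 0.12, 0.08, 0.05, 0.03)
n   <- c(50, 120, 150, 400, 900, 2500)

ols <- lm(est ~ I(se^2))
iv  <- ivreg(est ~ I(se^2) | I(1/n))

b_diff <- coef(iv)["I(se^2)"] - coef(ols)["I(se^2)"]
v_diff <- vcov(iv)["I(se^2)", "I(se^2)"] - vcov(ols)["I(se^2)", "I(se^2)"]
hausman_stat <- as.numeric(b_diff^2 / v_diff)  # compare with qchisq(0.95, df = 1);
hausman_stat                                   # v_diff can be negative in small samples
```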
Using MAIVE in practice
A discussion is in order regarding the application of MAIVE—pronounced, by the way, like the Irish name Maeve. The instrument is the overall sample size, not the degrees of freedom, because the latter depend on the choice of clustering units. We prefer the MAIVE version of PEESE without weights (after testing, with unweighted MAIVE-PET, whether the true effect is nonzero). This parsimonious specification intuitively fits both panels of Figure 1. The maive package allows for optional adjusted weights. Researchers may also choose a MAIVE version of another estimator, such as the endogenous kink. The package also runs the Hausman test, a rough indicator of spuriousness. Because PEESE is heteroskedastic by definition and we prefer not to use inverse-variance weights, the package produces heteroskedasticity-robust standard errors by default. When some studies report multiple estimates, standard errors in MAIVE—and in any meta-analysis estimator—should be clustered at the study level, again a default option. With fewer than 30 studies we recommend the wild bootstrap. [21] It's a good idea to include study-level dummies (econometric fixed effects) to filter out study-specific idiosyncrasies related to unobserved heterogeneity. The package also reports a robust F-statistic of the first-stage regression. If the F-statistic is below 10, the instrument is weak and MAIVE results should be treated with caution; researchers may then want to use confidence intervals robust to weak instruments.
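With a single instrument, the robust first-stage F-statistic is simply the squared robust t-statistic on the inverse sample size, which can be checked by hand; a sketch on the hypothetical data from above:

```r
# Robust first-stage F for a single instrument: squared robust t-statistic on 1/n.
library(sandwich)
se <- c(0.20, 0.14, 0.12, 0.08, 0.05, 0.03)
n  <- c(50, 120, 150, 400, 900, 2500)

stage1   <- lm(I(se^2) ~ I(1/n))
b        <- coef(stage1)["I(1/n)"]
v_robust <- vcovHC(stage1, type = "HC1")["I(1/n)", "I(1/n)"]
f_robust <- as.numeric(b^2 / v_robust)  # rule of thumb: treat F below 10 as a weak instrument
f_robust
```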
The reader will object that our simulation is unfair to the correction methods: they were designed to counter publication bias, whereas we simulate p-hacking. Individual estimates and standard errors get biased, which is why selection models don't work well here, even though they do not assume, as funnel-based methods do, that selection works only on estimates (the Lombard effect discussed earlier). The distinction between publication bias and p-hacking is clear in theory, but in practice the two are often observationally equivalent for the meta-analyst. (P-hacking likely predominates. [61]) As long as we believe our p-hacking environment is broadly realistic, we need a technique that corrects the resulting bias, and MAIVE is the only such technique. One can design p-hacking scenarios in which misspecifications make it almost impossible for meta-analysis methods to uncover the true mean. [58,62] If that is a realistic description of observational research, unconditional meta-analysis means are meaningless. [63] MAIVE can be extended to allow for observed heterogeneity and to deliver context-specific means via incorporation into Bayesian model averaging meta-regression approaches that address model uncertainty. [42–46]
Low-hanging fruit for future research
We leave many questions open regarding spurious precision. How common is it in practice? How does measurement error influence the relative performance of MAIVE? What happens when method heterogeneity explicitly affects both estimates and their precision? Does spurious precision help explain why meta-analyses often exaggerate the true effect compared to multi-lab pre-registered replications? [64–66] And how can selection models be correctly adjusted for spuriousness? The last is perhaps the most important question for future research, because many meta-analysts prefer selection models to funnel-based techniques. The adjustment of selection models isn't straightforward, since here precision plays two intertwined roles: identification and weighting. For identification we need the reported, nominal precision, which determines statistical significance. But for the weights we need the underlying, true precision. The maximum likelihood approach has to be modified to allow a different measure of precision for each role.
Bottom line
Spurious precision, while plausibly destructive, is surmounted by adjusting funnel-based methods.
References
Gurevitch J, Koricheva J, Nakagawa S, Stewart G. Meta-analysis and the science of research synthesis. Nature 2018; 555: 175–182.
Borenstein M, Hedges L, Higgins J, Rothstein H. A basic introduction to fixed-effect and random-effects models for meta-analysis. Research Synthesis Methods 2010; 1(2): 97–111.
Stanley TD, Doucouliagos H. Neither fixed nor random: weighted least squares meta-analysis. Statistics in Medicine 2015; 34(13): 2116-2127.
Egger M, Smith GD, Schneider M, Minder C. Bias in meta-analysis detected by a simple, graphical test. British Medical Journal 1997; 315(7109): 629–634.
Duval S, Tweedie R. Trim and fill: A simple funnel-plot–based method of testing and adjusting for publication bias in meta-analysis. Biometrics 2000; 56(2): 455–463.
Stanley TD. Meta-Regression Methods for Detecting and Estimating Empirical Effects in the Presence of Publication Selection. Oxford Bulletin of Economics and Statistics 2008; 70(1): 103–127.
Stanley TD, Jarrell SB, Doucouliagos H. Could It Be Better to Discard 90% of the Data? A Statistical Paradox. The American Statistician 2010; 64(1): 70–77.
Stanley TD, Doucouliagos H. Meta-regression analysis in economics and business. New York: Routledge. 2012.
Stanley TD, Doucouliagos H. Meta-regression approximations to reduce publication selection bias. Research Synthesis Methods 2014; 5(1): 60–78.
Ioannidis JP, Stanley TD, Doucouliagos H. The Power of Bias in Economics Research. The Economic Journal 2017; 127(605): F236–F265.
Bom PRD, Rachinger H. A kinked meta-regression model for publication bias correction. Research Synthesis Methods 2019; 10(4): 497–514.
Furukawa C. Publication Bias under Aggregation Frictions: Theory, Evidence, and a New Correction Method. MIT 2019; working paper.
Hedges L. Estimation of effect size under nonrandom sampling: The effect of censoring studies yielding statistically insignificant mean differences. Journal of Educational Statistics 1984; 9: 61–85.
Iyengar S, Greenhouse JB. Selection Models and the File Drawer Problem. Statistical Science 1988; 3(1): 109–117.
Hedges LV. Modeling Publication Selection Effects in Meta-Analysis. Statistical Science 1992; 7(2): 246–255.
Vevea J, Hedges LV. A general linear model for estimating effect size in the presence of publication bias. Psychometrika 1995; 60(3): 419–435.
Andrews I, Kasy M. Identification of and correction for publication bias. American Economic Review 2019; 109(8): 2766–2794.
van Aert RC, van Assen M. Correcting for publication bias in a meta-analysis with the p-uniform* method. Tilburg University & Utrecht University 2021; working paper.
Abadie A, Athey S, Imbens GW, Wooldridge JM. When Should You Adjust Standard Errors for Clustering? The Quarterly Journal of Economics 2022; 138(1): 1–35.
Cameron AC, Miller DL. A practitioner’s guide to cluster-robust inference. Journal of Human Resources 2015; 50(2): 317–372.
Roodman D, Nielsen MØ, MacKinnon JG, Webb MD. Fast and Wild: Bootstrap Inference in Stata Using Boottest. The Stata Journal 2019; 19(1): 4–60.
White H. A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica 1980; 48(4): 817–838.
Bom PRD, Ligthart JE. What Have We Learned from Three Decades of Research on the Productivity of Public Capital? Journal of Economic Surveys 2014; 28(5): 889–916.
Askarov Z, Doucouliagos A, Doucouliagos H, Stanley TD. The Significance of Data-Sharing Policy. Journal of the European Economic Association 2023; forthcoming.
Lane H, Tranel B. The Lombard Sign and the Role of Hearing in Speech. Journal of Speech and Hearing Research 1971; 14(4): 677–709.
McCloskey DN, Ziliak ST. What quantitative methods should we teach to graduate students? A comment on Swann’s 'Is precise econometrics an illusion'? The Journal of Economic Education 2019; 50(4): 356–361.
Hedges LV. A random effects model for effect sizes. Psychological Bulletin 1983; 93(2): 388–395.
Hedges LV, Olkin I. Statistical methods for meta-analysis. Orlando, FL: Academic Press. 1985.
Taylor LR. Aggregation, variance and the mean. Nature 1961;189(4766): 732–735.
Stanley TD, Doucouliagos H. Neither fixed nor random: weighted least squares meta-regression. Research Synthesis Methods 2017; 8(1): 19–42.
Havranek T, Stanley TD, Doucouliagos H, et al. Reporting Guidelines for Meta-Analysis in Economics. Journal of Economic Surveys 2020; 34(3): 469–475.
Ugur M, Awaworyi Churchill S, Luong H. What do we know about R&D spillovers and productivity? Meta-analysis evidence on heterogeneity and statistical power. Research Policy 2020; 49(1): 103866.
Xue X, Reed WR, Menclova A. Social capital and health: A meta-analysis. Journal of Health Economics 2020; 72(C): 102317.
Neisser C. The Elasticity of Taxable Income: A Meta-Regression Analysis. Economic Journal 2021; 131(640): 3365–3391.
Zigraiova D, Havranek T, Irsova Z, Novak J. How puzzling is the forward premium puzzle? A meta-analysis. European Economic Review 2021; 134(C): 103714.
Nakagawa S, Lagisz M, Jennions MD, et al. Methods for testing publication bias in ecological and evolutionary meta-analyses. Methods in Ecology and Evolution 2022; 13(1): 4–21.
Brown AL, Imai T, Vieider F, Camerer C. Meta-Analysis of Empirical Estimates of Loss-Aversion. Journal of Economic Literature 2023; forthcoming.
Heimberger P. Do Higher Public Debt Levels Reduce Economic Growth? Journal of Economic Surveys 2023; forthcoming.
Carter EC, Schonbrodt FD, Gervais WM, Hilgard J. Correcting for Bias in Psychology: A Comparison of Meta-Analytic Methods. Advances in Methods and Practices in Psychological Science 2019; 2(2): 115-144.
Brodeur A, Cook N, Heyes A. Methods Matter: P-Hacking and Causal Inference in Economics. American Economic Review 2020; 110(11): 3634–3660.
DellaVigna S, Linos E. RCTs to Scale: Comprehensive Evidence From Two Nudge Units. Econometrica 2022; 90(1): 81–116.
Gechert S, Havranek T, Irsova Z, Kolcunova D. Measuring Capital-Labor Substitution: The Importance of Method Choices and Publication Bias. Review of Economic Dynamics 2022; 45(C): 55–82.
Imai T, Rutter TA, Camerer CF. Meta-Analysis of Present-Bias Estimation Using Convex Time Budgets. The Economic Journal 2021; 131(636): 1788–1814.
Gechert S, Heimberger P. Do corporate tax cuts boost economic growth? European Economic Review 2022; 147(C): 104157.
Havranek T, Irsova Z, Laslopova L, Zeynalova O. Publication and Attenuation Biases in Measuring Skill Substitution. The Review of Economics and Statistics 2023; forthcoming.
Matousek J, Havranek T, Irsova Z. Individual discount rates: A meta-analysis of experimental evidence. Experimental Economics 2022; 25(1): 318–358.
Simonsohn U, Nelson LD, Simmons JP. P-curve: A key to the file-Drawer. Journal of Experimental Psychology: General 2014; 143(2): 534–547.
Simonsohn U, Nelson LD, Simmons JP. p-Curve and Effect Size: Correcting for Publication Bias Using Only Significant Results. Perspectives on Psychological Science 2014; 9(6): 666-681. PMID: 26186117.
van Assen M, van Aert RC, Wicherts JM. Meta-analysis using effect size distributions of only statistically significant studies. Psychological Methods 2015; 20(3): 293–309.
Simonsohn U, Simmons JP, Nelson LD. Better p-curves: Making p-curve analysis more robust to errors, fraud, and ambitious p-hacking, a Reply to Ulrich and Miller (2015). Journal of Experimental Psychology: General 2015; 144(6): 1146–1152.
van Aert RC, van Assen M. Bayesian evaluation of effect size after replicating an original study. PLoS ONE 2017; 12(4): e0175302.
van Aert RC, van Assen M. Examining reproducibility in psychology: A hybrid method for combining a statistically significant original study and a replication. Behavior Research Methods 2018; 50: 1515–1539.
Stanley TD. Beyond Publication Bias. Journal of Economic Surveys 2005; 19(3):309–345.
Kranz S, Pütz P. Methods Matter: p-Hacking and Publication Bias in Causal Analysis in Economics: Comment. American Economic Review 2022; 112(9): 3124–3136.
Sánchez-Meca J, Marín-Martínez F. Weighting by Inverse Variance or by Sample Size in Meta-Analysis: A Simulation Study. Educational and Psychological Measurement 1998; 58(2): 211–220.
Deeks JJ, Macaskill P, Irwig L. The performance of tests of publication bias and other sample size effects in systematic reviews. Journal of Clinical Epidemiology 2005; 58(9): 882–893.
Peters JL, Sutton AJ, Jones DR, Abrams KR, Rushton L. Comparison of Two Methods to Detect Publication Bias in Meta-analysis. JAMA 2006; 295(6): 676-680.
Bruns SB, Ioannidis JP. p-Curve and p-Hacking in Observational Research. PLoS ONE 2016; 11(2): e0149144.
Freedman DA. On regression adjustments to experimental data. Advances in Applied Mathematics 2008; 40(2): 180–193.
Hausman JA. Specification Tests in Econometrics. Econometrica 1978;46(6): 1251–1271.
Brodeur A, Carrell S, Figlio D, Lusher L. Unpacking p-hacking and publication bias. American Economic Review 2023; forthcoming.
Bruns SB. Meta-Regression Models and Observational Research. Oxford Bulletin of Economics and Statistics 2017; 79(5): 637–653.
Simonsohn U, Simmons J, Nelson LD. Above averaging in literature reviews. Nature Reviews Psychology 2022; 1: 551–552.
Kvarven A, Strømland E, Johannesson M. Comparing meta-analyses and preregistered multiple-laboratory replication projects. Nature Human Behaviour 2020; 4: 423–434.
Lewis M, Mathur MB, VanderWeele TJ, Frank MC. The puzzling relationship between multi-laboratory replications and meta-analyses of the published literature. Royal Society Open Science 2022; 9(2): 211499.
Stanley TD, Doucouliagos H, Ioannidis JPA. Retrospective median power, false positive meta-analysis and large-scale replication. Research Synthesis Methods 2022; 13(1): 88-108.
Pütz P, Bruns SB. The (Non-)Significance of Reporting Errors in Economics: Evidence from Three Top Journals. Journal of Economic Surveys 2021; 35(1): 348–373.
Dear Tom,
Thank you for your detailed comment and kind words! MAIVE is really based on your research, especially your seminal 2005 JoES paper -- which represents the first use of instrumental variables in meta-analysis. We should also cite your 2009 paper with Randy.
As you say, selection on standard errors (SEs) is likely to be much less common than selection on effect size estimates (Es). But even a tiny percentage of SE-selection can create troubles for current meta-analysis estimators, so MAIVE provides a robustness check. All that is needed on top of Es and SEs (which meta-analysts always collect) is the number of observations in primary studies (which meta-analysts collect almost always).
You are also right that in meta-regression…
Fantastic!
I thank Zuzana, Pedro, Tomas and Heiko for this important contribution to our ‘meta-toolbox’!
What an excellent paper, in every way. This use of instrumental variables is very clever and quite insightful. I am especially impressed by the p-hacking case and how omitted-variable bias is used to induce both types of pub’bias as well as some heterogeneity. Nice!
I admire IBHR’s discussion of ‘cheating.’ We all know it happens but tend to avoid speaking the f-word (‘fraud’) fearing that doing so will cause economists to dismiss our important meta-analytic findings. I still believe, perhaps naively, that fraud is relatively rare. Regardless, MAIVE addresses both this and less questionable research practices. Thus, we may embrace MAIVE without questioning our fello…