Meta-analysis: Meaning and Key Concepts

Gene Glass coined the term meta-analysis to describe an empirically based research method that synthesizes findings from numerous empirical studies. In short, a meta-analysis is a quantitative synthesis of the results obtained by many researchers on a field or topic of interest.

Meta-analysis had its beginnings in the social science literature, but its applicability extends to behavioral and physical science research and to any discipline where the findings of individual studies are too meager to test a theory. Meta-analysis can also address policy issues, and it has become a popular research methodology.

Meta-analysis is related to the review of related literature presented in research reports. What makes it different from an ordinary literature review is that it is more rigorous and exhaustive and requires the original empirical data or statistical summaries, such as means, standard deviations, and correlation coefficients.

While a literature review simply reports the results of a study as significant or not, meta-analysis requires statistical analysis of the original data from the studies being integrated. The real strength of meta-analysis lies in its ability to relate conditions that vary across studies to outcomes. For example, Gene Glass and Mary Smith conducted a meta-analysis of 375 psychotherapy outcome studies and calculated 833 effects. They found a mean effect size of .68, which indicates that the average treated group was about two-thirds of a standard deviation better than its control group. Furthermore, 88% of the effects were positive, showing that most treatment groups exceeded their respective control groups on all kinds of outcomes.

Quantitative Methods and Meta-analysis

Quantitative meta-analysis employs quantitative methodology similar to that used in the primary studies being integrated. Statistical significance tests and estimates of effect size provide the study-level summaries in quantitative integrative reviews. As pointed out by R. Rosenthal, the general relationship between tests of significance and effect size is given by the relation: the test statistic is the product of the size of the effect and the size of the study.
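
To make Rosenthal's relation concrete, here is a minimal Python sketch under an added assumption not in the text: two equal groups of size n, for which the t statistic reduces to d multiplied by the square root of n/2. The numbers are purely illustrative.

```python
import math

def t_from_effect(d, n_per_group):
    """Illustrate Rosenthal's relation: the test statistic is the
    product of effect size and a function of study size. For two
    equal groups of size n, t = d * sqrt(n / 2)."""
    return d * math.sqrt(n_per_group / 2)

# The same effect size yields a larger test statistic in a larger study.
for n in (10, 40, 160):
    print(f"d = 0.5, n per group = {n:3d}  ->  t ~= {t_from_effect(0.5, n):.2f}")
```

The point of the relation is that a significance test conflates effect size with sample size, which is why meta-analysis works with standardized effects rather than p-values alone.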

Effect size is determined by dividing the difference between the experimental and control group means by the standard deviation of the control group (the standard deviation being presumed to be unaffected by treatment). The result is similar to a Z score: a standardized measure of effect that makes results comparable across studies. The information from each study is thus expressed as the number of standard deviations by which the experimental group exceeds the control group.

Estimating effects is difficult when standard deviations and means are not available. One course of action is to write the authors and request these data. The alternative is to estimate effects from other reported statistics. A method of estimating effects, given the t value and the sample sizes of the control and experimental groups (assuming that the variance of the control group is unaffected by the treatment), is given by Rosenthal and Rubin: Δ = t(nE + nC) / (√df · √(nE·nC)), where nE and nC are the experimental and control group sizes and df = nE + nC − 2.
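
Both routes to an effect size can be sketched in a few lines of Python. The function names and sample figures below are invented for illustration; they are not from the text.

```python
import math

def glass_delta(mean_exp, mean_ctrl, sd_ctrl):
    """Glass's effect size: the group difference expressed in
    control-group standard deviation units."""
    return (mean_exp - mean_ctrl) / sd_ctrl

def delta_from_t(t, n_exp, n_ctrl):
    """Estimate the effect size from a reported t value and the two
    group sizes, following the conversion described in the text."""
    df = n_exp + n_ctrl - 2
    return t * (n_exp + n_ctrl) / (math.sqrt(df) * math.sqrt(n_exp * n_ctrl))

# When means and standard deviations are reported:
print(glass_delta(mean_exp=54.0, mean_ctrl=50.0, sd_ctrl=8.0))   # 0.5

# When only t and the sample sizes are reported:
print(delta_from_t(t=2.24, n_exp=40, n_ctrl=40))                 # ~0.51
```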

Effects may also be computed from reported correlation coefficients, but transformations are needed to produce correlation statistics that are comparable across studies.
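
As an illustration, two standard transformations (assumed here as examples; the text does not name specific ones) convert a correlation to a standardized mean difference and to Fisher's z, which places correlations from different studies on a common scale.

```python
import math

def d_from_r(r):
    """Convert a correlation coefficient to a standardized mean
    difference via the common conversion d = 2r / sqrt(1 - r**2)."""
    return 2 * r / math.sqrt(1 - r ** 2)

def fisher_z(r):
    """Fisher's z transformation, often used so that correlations
    from different studies can be averaged and compared."""
    return math.atanh(r)

print(d_from_r(0.30))    # ~0.63
print(fisher_z(0.30))    # ~0.31
```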

Other standard quantitative techniques used in meta-analysis include: traditional vote counting and statistical methods based on vote counts; methods for testing the statistical significance of combined results, including omnibus combined significance tests and Rosenthal’s fail-safe number; the combination of raw data where it is available; tests of variation among effect sizes, with analogues to ANOVA and regression analysis for effect sizes; and the use of conventional statistical methods such as ANOVA and regression analysis with effect sizes or correlations. Estimators of effect size may be adjusted for sources of bias, and correlations may be transformed to standardized mean differences.
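
As one concrete example of an omnibus combined significance test, the sketch below implements the Stouffer method, a standard choice assumed here for illustration; the p-values are invented.

```python
import math
from statistics import NormalDist

def stouffer_combined_p(p_values):
    """Omnibus combined significance test (Stouffer's method):
    convert each one-tailed p to a Z score, sum the Z scores,
    divide by sqrt(k), and convert back to a combined p."""
    nd = NormalDist()
    z_scores = [nd.inv_cdf(1 - p) for p in p_values]
    z_combined = sum(z_scores) / math.sqrt(len(z_scores))
    return 1 - nd.cdf(z_combined)

# Four studies, none individually decisive, jointly significant.
print(stouffer_combined_p([0.08, 0.12, 0.05, 0.20]))  # ~0.006
```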

R. Rosenthal provides clear explanations of how to conduct tests of differences among research results. These include methods for research results expressed as effect magnitudes as well as those expressed as p-values or significance levels (that is, omnibus procedures for testing differences among the results of three or more studies, as well as procedures for testing specific contrasts among research results), procedures for combining estimates, and standard errors for optimally weighted estimates.
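
The following sketch conveys the flavor of such procedures, assuming standardized mean differences with a common large-sample approximation for their standard errors. The helper names and figures are illustrative assumptions, not Rosenthal's own notation.

```python
import math

def se_d(d, n1, n2):
    """Approximate standard error of a standardized mean difference
    (a common large-sample formula)."""
    return math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))

def compare_two_effects(d1, se1, d2, se2):
    """Z test for the difference between two independent study results."""
    return (d1 - d2) / math.sqrt(se1 ** 2 + se2 ** 2)

def weighted_mean_effect(ds, ses):
    """Optimally (inverse-variance) weighted combined estimate and its
    standard error."""
    weights = [1 / se ** 2 for se in ses]
    mean = sum(w * d for w, d in zip(weights, ds)) / sum(weights)
    return mean, math.sqrt(1 / sum(weights))

se1, se2 = se_d(0.6, 30, 30), se_d(0.2, 50, 50)
print(compare_two_effects(0.6, se1, 0.2, se2))   # Z for the contrast
print(weighted_mean_effect([0.6, 0.2], [se1, se2]))
```

Weighting each study by the inverse of its variance gives more precise studies more influence, which is what makes the combined estimate "optimally weighted."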

It must be noted that research integration does not have to be solely quantitative (that is, the use of quantitative procedures such as tests of combined significance) or solely qualitative (that is, the use of purely narrative procedures), because it may be necessary to combine quantitative and qualitative information, such as narrative information in quantitative studies, case studies, expert judgment, and narrative research reviews.

H. Cooper delineates five stages in doing a meta-analysis, namely, 1) problem formulation (that is, deciding about what questions or hypotheses to address and what evidence needs to be included in the review), 2) data collection (that is, specification of procedures to be used in finding relevant evidence), 3) data evaluation (that is, deciding about which of the retrieved data should be included in the review), 4) analysis and interpretation (that is, selection of procedures for making inferences about the literature as a whole), and 5) public presentation (that is, deciding what information should be included in the report of the integrated review). On the other hand, R. Light and D. Pillemer give the following strategy in doing a meta-analysis: 1) formulation of the precise question, 2) exploration of available information, 3) selection of studies, 4) determination of the generality of conclusions, and 5) determination of the relationships between study characteristics and study outcomes.

H. Cooper suggests the following basic structure in writing the research report of a meta-analysis: 1) introduction, 2) methods, 3) results, and 4) discussion. These are actually the basic sections of primary research reports.

Validity, Reliability, and Other Issues

Threats to validity may arise from nonrepresentative sampling, from subjective decisions that can lead to procedural variations affecting the outcomes of the research review, and from the “file drawer” problem in combined significance testing. The file drawer problem concerns the effects of selective sampling in conducting an integrative review.

Studies that report larger effects or more statistically significant results are more likely to get published. If only these studies are sampled in an integrative review, the selective sampling will seriously distort the conclusions of the integrated review. Mary Smith, for example, reported that in a meta-analytic study of sex bias in counseling, published journal results differed from dissertation results: journal results showed bias (an average effect of .22) while dissertations showed the opposite (−.24). R. Rosenthal also noted that file drawers are filled with studies showing no significant differences. He provides a procedure for determining the number of null results that would be necessary to overturn a significant finding from a combined-significance test. If only a few unretrieved null results could reduce the combined significance test result to insignificance, then the file drawer threat must be seriously entertained as a rival hypothesis. If the number of null results required is implausibly large, the finding is robust against the file drawer threat.
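
A minimal sketch of Rosenthal's fail-safe number for a set of one-tailed p-values, following the logic just described: the combined Z is diluted by adding hypothetical studies with Z = 0 until it falls to the critical value. The p-values below are invented.

```python
from statistics import NormalDist

def fail_safe_n(p_values, alpha=0.05):
    """Rosenthal's fail-safe number: how many unretrieved null studies
    (averaging Z = 0) would be needed to pull the combined significance
    test below the alpha threshold."""
    nd = NormalDist()
    z_sum = sum(nd.inv_cdf(1 - p) for p in p_values)
    z_crit = nd.inv_cdf(1 - alpha)          # 1.645 for one-tailed .05
    k = len(p_values)
    return (z_sum ** 2) / (z_crit ** 2) - k

# Five significant studies; about 29 hidden null results would be
# needed to overturn the combined finding.
print(fail_safe_n([0.01, 0.03, 0.02, 0.04, 0.05]))
```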

Another problem confronting meta-analysts is the “apples and oranges” problem. This refers to the inadvertent comparison of studies that are not comparable. Gene Glass suggests including all research bearing on the topic of interest and carefully categorizing it, so that comparisons among the various categories will reveal important differences in quality, should they exist.

Experts differ in their opinions regarding what to include in a meta-analytic study. R. Light and M. Smith suggest strict criteria for the inclusion of research in a meta-analysis. Other scholars, such as Gene Glass, insist on including all relevant literature so that statistical analysis can inform decisions about the use of various classes of studies. V. Wilson and Putnam found a large and consistent difference between randomized and nonrandomized studies of pretest sensitization, which led them to ignore nonrandomized studies in further meta-analyses; the experimental and logical evidence for a pretest effect was lacking in the nonrandomized studies. On the other hand, M. Smith and G. Glass found no differences between randomized and nonrandomized psychotherapy outcome studies, and hence aggregated the two in their later syntheses.

Criticisms of Meta-analysis

R. Rosenthal gives six classes of criticisms of meta-analysis: those that concern sampling bias, the loss of information inherent in meta-analysis, heterogeneity of method or of study quality, problems of dependence between and within studies, the purported exaggeration of significance in meta-analysis, and the problem of determining the practical importance of effect size.