“For the past few years, there has been a growing crisis in social science, and in biology, medicine, and other statistics-dependent fields, that many claimed research findings are fragile, are unreliable, cannot be replicated, and do not generalize outside of the lab to real-world settings … Arguably the crisis is most pronounced within psychology …”
Thus begins an excellent paper by Andrew Gelman entitled, “The Connection Between Varying Treatment Effects and the Crisis of Unreplicable Research: A Bayesian Perspective.”
Gelman makes the point that we (as humans) have unrealistic expectations of what we are “supposed to” get out of the scientific process — namely, we want “facts” and “truth”.
(All “bold” is mine.)
… it is natural for people (including scientific researchers, including ourselves) to feel that the world should be real with clear objective properties … We feel that science should give us facts because that is how we want it to work, and we are unsatisfied when the world does not conform to our obsession with using modern tools for dicing up the world into individualized knowledge-objects (or, as we say, “stylized facts”).
But science cannot give us “truth” in the idealized way in which we want it. All we have are theories that have currently stood up to the tests we’ve thrown at them. But today’s champion can easily become tomorrow’s loser.
The problem is that the process of showing a theory or hypothesis to be false is harder in the social sciences because of the inherent complexity (literally: complexity theory) of human social interactions. Society is an emergent property of the humans that make it up.
Here are two points Gelman makes that are central to the problem any social scientist faces in practice (quote):
- Psychological and social processes show much more variability than the usual phenomena in the physical sciences. The strength of an iron bar might be calculable to a high degree of accuracy from a formula derived from first principles, but there will not be anything like the same precision when predicting prices, political opinions, the contact patterns of friends, and so on.
- Even to the extent that patterns can be discovered from social data (for example, the Yerkes-Dodson curve in psychology, the Phillips curve in macroeconomics, the seats-votes curve in politics, and various predictive models of consumer behavior), these patterns can and do change over time, and they can look different in different countries and for different groups of people. Social science even at its best is contingent in the sense that physical models are not.
Even in the hard sciences things aren’t so cut and dried. Falsifying a hypothesis is often quite tough, requiring multiple studies — after all, a study whose hypothesis is “your hypothesis is false” is itself subject to falsification! Popper discusses this problem at length (despite the prevailing — false — view that he didn’t!).
More from Gelman:
These concerns relate to the problem of nonreplication of social science research. Instead of focusing on ways to procure the certainty often attributed to the hard sciences, perhaps we as social scientists can address the nonreplication problem by changing what we want to get out of our research, accepting that we can gain knowledge without the certainty we might like, either for individual predictions or for the larger patterns we seek to discover. If effects are different in different places and at different times, then episodes of nonreplication are inevitable, even for very well-founded results.
That’s the kind of philosophical skepticism I can get behind!
It’s a profoundly arrogant thing to assume that we human primates are capable of knowing “The Truth”. Whether or not such a truth exists is beside the point: our methods for finding it are flawed either way.
Being humble about how secure we should be in our knowledge is the only intellectually honest way to go.
It’s also a good step towards more accurate statistics!
Once we realize that effects are contextually bound, a next step is to study how they vary.
Get Bayesian on that Biatch
Gelman suggests a more Bayesian approach:
Bayesian analysis uses the prior distribution, allowing the direct combination of different sources of information, which is particularly relevant for inferences about parameters that are not well estimated using data from a single study alone.
Bayes plays well with uncertainty: The posterior distribution represents uncertainty about any set of unknowns in a model.
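To make that concrete, here’s a minimal sketch of what “combining prior information with new data and getting uncertainty back” looks like, using a toy beta-binomial model. The numbers (prior, trial counts) are made up for illustration, not from Gelman’s paper:

```python
from scipy import stats

# Toy example: estimating a treatment's success rate.
# Hypothetical prior: earlier studies suggest roughly 30% success -> Beta(3, 7).
prior_a, prior_b = 3, 7

# A new (small) study: 7 successes out of 15 trials.
successes, trials = 7, 15

# Conjugate update: posterior is Beta(a + successes, b + failures).
post_a = prior_a + successes
post_b = prior_b + (trials - successes)
posterior = stats.beta(post_a, post_b)

print(f"posterior mean: {posterior.mean():.3f}")
lo, hi = posterior.ppf([0.025, 0.975])
print(f"95% credible interval: ({lo:.3f}, {hi:.3f})")
```

The point isn’t the arithmetic — it’s that the output is a whole distribution, so the uncertainty (the wide credible interval from a 15-person study) is front and center rather than collapsed into a single yes/no significance call.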
Here are some basic problems that need fixin’ (paraphrased):
- A statistical hypothesis is NOT the same as a scientific hypothesis — and rejecting the first does not (necessarily) negate the second.
- Measurement error always exists.
- All models have “error terms” — “the same measurement taken on two people will give two different results.”
- Varying effects are tacitly accepted because of the existence of these error terms in the models.
- This forces us to move away from an induction approach to science — trying to prove a general law from a set of individual examples/studies — to a more deductive approach.
Correlation vs Causation
In other words, the social sciences in question need better ways to detect REAL causation and not just over-hyped correlations.
Gelman gives an example of how common it is to use causal words to describe mere correlations:
The paper in question features a bunch of comparisons and p values, some of which were statistically significant, and then lots of stories. The problem is that there are so many different things that could be compared, and all one sees is some subset of the comparisons. Many of the reported effects seem much too large to be plausible. And there is a casual use of causal language (for example, the words influenced, effects, and induced) to describe correlations.
The point being:
In criticizing this study, we are not saying that its claims … are necessarily wrong. We are just saying that the evidence is not nearly as strong as the paper makes it out to be.
In summary, there are several reasons that Bayesian ideas are relevant to the current crisis of unreplicable findings in social science. First are the familiar benefits of prior information and hierarchical models that allow partial pooling of different data sources. Second, Bayesian approaches are compatible with large uncertainties, which in practice are inevitable when studying interactions. Interactions, in turn, are important because statistically significant but unreplicable results can be seen as arising from varying treatment effects and situation-dependent phenomena … Finally, hierarchical Bayesian analysis can handle structured data and multiple comparisons, allowing researchers to escape from the paradigm of the single data comparison or single p value being conclusive.
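The “partial pooling” idea in that summary can be sketched in a few lines. This is a back-of-the-envelope normal-normal shrinkage calculation with made-up study estimates and an assumed between-study standard deviation `tau` (in a full hierarchical analysis, `tau` would itself be estimated):

```python
import numpy as np

# Made-up effect estimates from 5 studies of the "same" phenomenon,
# with their standard errors (larger SE = noisier study).
estimates = np.array([0.80, 0.10, -0.30, 0.45, 0.25])
std_errs  = np.array([0.40, 0.15,  0.35, 0.20, 0.10])

tau = 0.15  # assumed between-study standard deviation (a modeling choice)

# Precision-weighted grand mean under the hierarchical model.
w = 1.0 / (std_errs**2 + tau**2)
grand_mean = np.sum(w * estimates) / np.sum(w)

# Partial pooling: each study's estimate shrinks toward the grand mean;
# noisier studies shrink more, precise studies keep more of their own signal.
shrinkage = std_errs**2 / (std_errs**2 + tau**2)
pooled = shrinkage * grand_mean + (1 - shrinkage) * estimates

for est, p in zip(estimates, pooled):
    print(f"raw {est:+.2f} -> partially pooled {p:+.2f}")
```

The outliers (like that implausibly large +0.80 from the noisiest study) get pulled hardest toward the pack — which is precisely the mechanism that tames “too large to be plausible” effects and multiple comparisons at the same time.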
Does this mean that social science is hopeless? Not at all. We can study large differences, we can gather large samples, and we can design studies to isolate real and persistent effects. In such settings, Bayesian inference can help us estimate interactions and make predictions that more fully account for uncertainty. In settings with weaker data and smaller samples that may be required to study rare but important phenomena, Bayesian methods can reduce the now-common pattern of researchers getting jerked around by noise patterns that happen to exceed the statistical significance threshold. We can move forward in social research by accepting uncertainty and embracing variation.
Now go lift something heavy,
ps. The pic at the top is of our member ED deadlifting in one of our T-Shirts that says, “My BMI is Bigger than Yours.” It’s a joke he used to say, so we made it a shirt. Given how idiotic BMI is as a measure of anything important, I thought it relevant.