Clarifying Scientific Concepts Part 6: Hypothesis

Hypothesizing and Confirmation

The term "hypothesis" is frequently bastardized. Confirmation, and its counterpart disconfirmation, are also incredibly misunderstood by the general public. A hypothesis is pretty much just a testable guess, normally derived from a theoretical framework. It is a specific, testable statement about what you expect to happen. It's a prediction about reality that you intend to check with evidence. For example, "If plants are given more light, they will grow faster"; this is a hypothesis, it can be wrong but can definitely be tested. It’s different from an axiom (assumed true), a conjecture (unproven mathematical guess), or a proposition (any statement that is true or false but not necessarily testable), in that it directly connected to the idea of testability and should have properties such as verifiability and falsifiability. This normally implies the phenomena referenced by the hypothesis is measurable, and is therefore directly or indirectly observable through empirical data. In other words, hypotheses should be operationalizable, not merely verbal statements. The hypothesis must be expressed in a level of precision necessary to implement some test, and if a hypothesis is not amendable to this, its not testable in a way that can discern its likeliness.

I'd like to follow this with a few caveats. Scientific practice is often messy and not defined by any single criterion, such as falsification. Very often, strict falsifiability is not feasible, and in practice this can restrict the applicability of certain testing procedures, such as statistical tests. As Richard McElreath writes in Statistical Rethinking:

Science is not described by the falsification standard, and Popper recognized that. In fact, deductive falsification is impossible in nearly every scientific context.

  • Hypotheses are not models. The relations among hypotheses and different kinds of models are complex. Many models correspond to the same hypothesis, and many hypotheses correspond to a single model. This makes strict falsification impossible.
  • Measurement matters. Even when we think the data falsify a model, another observer will debate our methods and measures. They don't trust the data. Sometimes they are right. (This is on top of issues such as false positives, false negatives, and observation error.)

In other words, the scientific method is not reducible to a statistical procedure. Statistical evidence is nevertheless an important part of the process, and statistical methods can relate hypotheses to data, but they are not sufficient on their own.

We will talk more about this later, but since modern science relies so heavily on statistical procedures, it's impossible to fully separate the concept of a hypothesis from statistical inference. Two concepts frequently occur in the context of reasoning about hypotheses: confirmation and disconfirmation. Remember that a hypothesis makes a prediction about something; in other words, if the hypothesis were true, we would expect to observe something implied by it. These observations are typically encapsulated by a probability distribution, and therefore described by likelihoods. Suppose we have some hypothesis H and we show that it entails some observation D. If we look for D and don't find it, we must conclude that H is false. Finding D, however, tells us nothing certain about H, because other hypotheses can also predict D. This is why we invoke the notion of likelihoods: if we observe D, we can't be certain that H explains D, but by comparing relative likelihoods, we can find that H is the most probable hypothesis relative to its alternatives. This type of reasoning is central to understanding how scientists reason under uncertainty.
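
Here is a small sketch of that comparison. The hypotheses (candidate coin biases) and the observation are invented for illustration; the point is only that several hypotheses can predict the same data, and likelihoods let us compare how expected the data are under each.

```python
# The same observation D can be predicted by several hypotheses;
# comparing likelihoods tells us which makes D least surprising.
# The coin biases and data below are made up for illustration.
from scipy.stats import binom

heads, flips = 62, 100          # observation D: 62 heads in 100 flips
hypotheses = {"fair coin (p=0.5)": 0.5,
              "mild bias (p=0.6)": 0.6,
              "heavy bias (p=0.8)": 0.8}

for name, p in hypotheses.items():
    likelihood = binom.pmf(heads, flips, p)   # P(D | H)
    print(f"{name}: P(D|H) = {likelihood:.4f}")

# None of these is "proven" by D; the mild-bias hypothesis simply
# makes the observation more expected than its rivals do.
```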

I'll briefly introduce the idea of Bayesian confirmation. The core idea is that a hypothesis gains credit when the evidence is more likely if the hypothesis is true than if it isn't. If seeing E is more expected under H than under "not-H," then E confirms H. The stronger the shift, the stronger the confirmation. (Formally, one standard measure of confirmation strength is the likelihood ratio P(E | H) / P(E | ¬H).) Evidence confirms H when P(E | H) > P(E | ¬H) and disconfirms H when P(E | H) < P(E | ¬H).
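
A small numerical sketch of this update, with all probabilities assumed for illustration:

```python
# Bayesian updating: the likelihood ratio P(E|H)/P(E|~H) determines
# how far the evidence shifts our credence in H.
# All probabilities here are assumed values for illustration.

prior_H = 0.3                 # P(H) before seeing the evidence
p_E_given_H     = 0.8         # P(E | H)
p_E_given_not_H = 0.2         # P(E | ~H)

# Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E)
p_E = p_E_given_H * prior_H + p_E_given_not_H * (1 - prior_H)
posterior_H = p_E_given_H * prior_H / p_E

likelihood_ratio = p_E_given_H / p_E_given_not_H
print(f"likelihood ratio = {likelihood_ratio:.1f}")       # 4.0 > 1: E confirms H
print(f"P(H) = {prior_H:.2f} -> P(H|E) = {posterior_H:.2f}")  # 0.30 -> 0.63
```

Note that the evidence does not make H certain; it moves the credence from 0.30 to about 0.63. The same evidence under a ratio below 1 would have pushed the credence down instead.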

Evidence doesn't "prove" a hypothesis; it shifts how credible it is. Many different stories can fit the same facts. What matters is which story makes those facts more expected than rival stories. The same observation can match multiple hypotheses; this is the idea of underdetermination. Analysis choices also matter: what you count, how you measure, which model you use, and when you stop collecting data can all tilt the result without changing the raw facts (the sketch after the list below illustrates the stopping problem). Instead of "proof," think of "support": how much the evidence tips the scales relative to alternatives, not in isolation. Vague hypotheses carry little weight; if almost anything you observe feels like confirmation, the statement rules nothing out. Specific predictions force real tests. Here are a few questions to ask yourself when evaluating a hypothesis:

  • What are the live alternatives? What else could explain this?
  • What did each hypothesis specifically predict, before seeing the data?
  • Would this result have surprised the rival more?
  • What would disconfirm your hypothesis?
  • Did we tune our analysis after seeing results?
  • Does this hold in new data or by a different method?
  • Would those alternatives have expected this result as much as your hypothesis does? If your hypothesis makes the result less surprising than the alternatives do, that's good support. If lots of stories would have predicted it, the support is mild at best.
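
As promised above, here is a small simulation of the stopping problem. The parameters are illustrative assumptions; the point is that with no real effect at all, testing after every batch of data and stopping at the first p < 0.05 produces "significant" results far more often than the nominal 5%.

```python
# Optional stopping inflates false positives. There is NO real effect
# (both groups share the same mean), yet testing after every batch and
# stopping at p < 0.05 rejects well above 5% of the time.
# All parameters are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, batch, max_batches, alpha = 2000, 10, 10, 0.05

false_positives = 0
for _ in range(n_sims):
    a = np.empty(0)
    b = np.empty(0)
    for _ in range(max_batches):
        a = np.concatenate([a, rng.normal(0, 1, batch)])
        b = np.concatenate([b, rng.normal(0, 1, batch)])
        _, p = stats.ttest_ind(a, b)
        if p < alpha:            # "significant" -> stop and report
            false_positives += 1
            break

print(f"false-positive rate: {false_positives / n_sims:.3f}")  # well above 0.05
```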

This implies a set of red flags: no explicit alternatives considered; vague hypotheses that predict almost anything; only in-sample success; no preregistration or out-of-sample tests; and heavy reliance on a single method or measure, with no replication by an independent approach. In science, evidence doesn't prove; it compares. A result raises or lowers our confidence depending on how expected it was under your hypothesis versus the alternatives. Vague claims make everything look like confirmation; specific, risky predictions make support meaningful. Real progress comes from putting rival ideas on the table, making predictions that could fail, and seeing which idea makes the world's surprises least surprising.

Evidence that matches the hypothesis is confirming evidence. But confirmation is graded, not all-or-nothing: some evidence slightly raises confidence, some strongly raises it. A single confirming datapoint means almost nothing unless compared against alternatives. Only seeking confirmation is dangerous. If you only look for confirming evidence, you will always find it, because most weak or vague hypotheses can "explain" almost anything after the fact. By searching only for whatever confirms the hypothesis, you have not learned anything; you've just collected agreeable anecdotes. Real confirmation involves three disciplines: predictive specificity, an active search for disconfirmation, and reliance on the total evidence rather than cherry-picked bits. A single confirming case is cheap; what matters is the entire pattern, including how often the prediction fails versus succeeds, whether alternative hypotheses explain the same data better, and whether you not only saw what you expected but also failed to see what would have refuted it. Confirmation raises your confidence in a hypothesis, but only when you've given disconfirmation a fair chance to happen. The total balance of evidence is what matters.
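
A final sketch of the "total balance of evidence" idea: if the pieces of evidence are conditionally independent given H, each one multiplies the odds by its likelihood ratio, so confirming and disconfirming observations add up on the log-odds scale. The likelihood ratios below are assumed for illustration.

```python
# Confirmation is graded and cumulative: some observations confirm
# (LR > 1), some disconfirm (LR < 1), and the total balance decides.
# Assumes conditionally independent evidence; LRs are illustrative.
import math

prior_odds = 1.0                      # P(H)/P(~H) = 1: start undecided
# P(E_i|H) / P(E_i|~H) for a sequence of observations:
likelihood_ratios = [3.0, 0.5, 2.0, 1.1, 0.8, 4.0]

log_odds = math.log(prior_odds)
for lr in likelihood_ratios:
    log_odds += math.log(lr)          # evidence adds up on the log-odds scale

posterior_odds = math.exp(log_odds)
posterior_prob = posterior_odds / (1 + posterior_odds)
print(f"posterior odds = {posterior_odds:.2f}, P(H|all E) = {posterior_prob:.2f}")
```

No single observation settles the matter here; two of them actively count against H, and the hypothesis ends up well supported only because the whole pattern favors it.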
