Clarifying Scientific Concepts Part 3: Evidence

Evidence, Empirical Evidence, and Scientific Underdetermination

It is actually quite difficult to define evidence. What distinguishes a detective who uses evidence from a scientist who uses "empirical evidence", derived from empirical research, when advancing a claim? Clearly, what counts as evidence in these domains is not entirely overlapping. In addition, there is a plethora of terms that are often used interchangeably with "evidence" but are conceptually distinct (data, facts, etc.), which muddies the waters. There are also related concepts, such as the burden of proof and admissibility, that frequently arise when discussions involve evidence. In some contexts these are formally established and institutionalized through rules and procedures, as is the case in law or debate. The word is also used as a modifier. Consider something like evidence-based policy or evidence-based medicine; to what extent does the word "evidence" impact how each discipline is carried out? What exactly is that modifier doing to the subsequent words? There is even a branch of epistemology called evidentialism, which is primarily concerned with the relationship between evidence, justification, and knowledge. Lastly, there are even attempts to construct frameworks that grade the quality of evidence, such as the Hierarchy of Evidence. Clearly, understanding how people use this term, in particular scientists, is of significant consequence. My main focus in this section is to characterize how scientists use and reason about evidence. But I also want to bridge the gap between these different senses of the term, so I'll introduce two authors who have had an impact on how I think about this concept.

In "Evidential Foundations of Probabilistic Reasoning", David Schum introduces the notion of a "Science of Evidence"; he recognizes the inherent plurality of the term and wants to abstract the notion across all disciplinary domains. Schum also recognizes the inherent uncertainty featured in all reasoning tasks based on evidence:

“… in any inference task our evidence is always incomplete, rarely conclusive, and often imprecise or vague; it comes from sources having any gradation of credibility. As a result, conclusions reached from evidence […] can only be probabilistic in nature.”

He also identifies, and is concerned with, structural features of evidence: how it all connects together within a network of inference. Schum doesn't pin "evidence" down with a single neat definition. He argues it's best understood functionally, by what it does in reasoning. For him, evidence is any item of information (a trace, record, testimony, measurement, etc.) that bears on a hypothesis; its value depends on (i) relevance (how it connects to the hypothesis) and (ii) credibility (how much you can trust the item or its source). The overall inferential force (or probative weight) of an item is a joint product of those two strands.

For Schum, evidence does not exist in a vacuum; it is relational, not free floating. An item isn’t “evidence” all by itself; it becomes evidence only relative to a specific hypothesis/probandum once you supply (and defend) an inference link from the item to that hypothesis. The link from an item to a hypothesis is licensed by background generalizations (“glue”) that are often implicit and need support; in other words, relevance must be argued. He distinguishes directly relevant evidence (bearing on the hypothesis) from ancillary (meta) evidence; material about the strength of that link (e.g., source credibility or whether the generalization really fits this case). “Evidence” is information put to work in support of a hypothesis, with its relevance and credibility argued. An item becomes evidence only when embedded in an argument that (1) states the hypothesis at issue, (2) shows relevance by supplying the generalization that links the item to that hypothesis, (3) supports credibility (often with ancillary evidence), and (4) assesses inferential force given how the item interacts with the rest of the mass of evidence.
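To make that four-part structure a bit more concrete, here is a small sketch of my own (the field names and the toy force calculation are my assumptions, not Schum's notation); the point is just that an item functions as evidence only once the hypothesis, the linking generalization, and the credibility support are all supplied:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class EvidenceItem:
    """A single item embedded in a Schum-style evidential argument (illustrative only)."""
    item: str                      # the trace, record, testimony, or measurement
    hypothesis: str                # the probandum the item is offered for
    generalization: str            # the "glue" licensing the relevance link
    ancillary: List[str] = field(default_factory=list)  # meta-evidence about credibility
    relevance: float = 0.0         # argued strength of the link, 0..1 (toy scale)
    credibility: float = 0.0       # trust in the item/source, 0..1 (toy scale)

    def inferential_force(self) -> float:
        # Toy stand-in for "force as a joint product of relevance and credibility".
        return self.relevance * self.credibility

# Example: a witness report only becomes evidence once the whole argument is supplied.
report = EvidenceItem(
    item="Witness W says she saw the suspect near the scene at 9pm",
    hypothesis="The suspect was at the scene at 9pm",
    generalization="People who honestly report seeing X usually did see X",
    ancillary=["W has good eyesight", "Lighting at the scene was adequate"],
    relevance=0.9,
    credibility=0.7,
)
print(report.inferential_force())  # ~0.63 on this toy scale
```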

Schum is explicit that a report about an event and the event's actual occurrence are not the same; you must infer from the report to the world, and that inference is always uncertain. This is true of domains involving measurement as well; our measure of the thing is not the thing itself. Hence his insistence that all evidential reasoning is, in the end, probabilistic due to this uncertainty. Relevance is the logical/inferential link from an evidential claim to the hypothesis, and credibility concerns source reliability. With testimony, he decomposes credibility into veracity, objectivity, and observational sensitivity (were they truthful, unbiased, and in a position to observe?), but credibility standards are used beyond the legal realm (consider a scientist questioning the mechanism by which data was collected). Much of what we call "evidence" is actually evidence about other evidence; material that bears on a witness's credibility or on the soundness of a measurement process. That ancillary layer is often what lets you evaluate the force of the directly relevant items. He often uses the likelihood ratio as a convenient gauge of an item's inferential force and shows structurally driven phenomena like inferential drag (links in a chain weaken force), redundancy, and synergy when items combine. But the broader point is structural: you can analyze evidential force even when precise frequencies are unavailable. The key point about structure is that basic configurations of evidence combinations have differing degrees of inferential force. Consider two witnesses claiming to report some event. Finding that one of them is wrong can weaken the credibility of the other witness if we also learn that the two collaborated before reporting the account. In other words, there is structural collapse, which leads to a non-linear reduction in inferential force (redundancy). The reverse can be true as well; one piece of information can amplify the inferential force of a collection of evidence (synergy). An item can also be clearly relevant (it bears on the hypothesis) and its source fully credible, yet still move the needle only a little. Fundamentally, the probabilistic assessment of evidence rests on these more primitive notions of relevance and credibility.
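To make the redundancy point tangible, here is a toy numerical sketch with made-up numbers and a crude discount factor of my own (not Schum's cascaded-inference treatment): two genuinely independent corroborating reports multiply their likelihood ratios, but once they share a source or the witnesses collaborated, the second report adds much less.

```python
# Toy sketch: how redundancy deflates the combined force of two "corroborating" reports.
# The 'overlap' discount is an assumption for illustration, not Schum's actual treatment.

def combine_independent(lr1: float, lr2: float) -> float:
    # Genuinely independent items: likelihood ratios multiply.
    return lr1 * lr2

def combine_with_overlap(lr1: float, lr2: float, overlap: float) -> float:
    # overlap in [0, 1]: how much of the second report merely repeats the first
    # (e.g., the witnesses collaborated or share a single primary source).
    return lr1 * (lr2 ** (1.0 - overlap))

lr_witness_a = 4.0   # made-up force of witness A's report
lr_witness_b = 4.0   # made-up force of witness B's report

print(combine_independent(lr_witness_a, lr_witness_b))        # 16.0  (naive multiplication)
print(combine_with_overlap(lr_witness_a, lr_witness_b, 0.8))  # ~5.3  (heavy redundancy)
print(combine_with_overlap(lr_witness_a, lr_witness_b, 0.0))  # 16.0  (no overlap)
```

The exponential discount is just one simple way to encode "the second item adds less the more it repeats the first"; Schum's own analysis works through the probability structure of the shared source rather than an ad hoc factor. Synergy is the mirror image: some combinations carry more force together than the items suggest individually.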

A bit more about the notion of inferential force; this is the diagnostic strength of an item given a stated hypothesis versus its rivals. As mentioned before, Schum often gauges this with the likelihood ratio (though this is not strictly necessary), scoring force in much the same way statisticians score the relative support for competing hypotheses with Bayes factors. An item can have a likelihood ratio near one, and hence little inferential force, despite being relevant and credible. Similarly, an item can have a very extreme likelihood ratio but carry little weight if its credibility and relevance are called into question. Here are just a few ways Schum describes how this can occur (a small numerical sketch follows the list):

  • Low diagnosticity: A careful, honest witness saw the suspect “in a dark hoodie”, a description that fits many people. Credibility is high, relevance is clear, but \(P(E\mid \neg H)\) is also high, so the likelihood ratio is close to one.
  • Chains = inferential drag: When E supports H only through several intermediate links (\(A \to B \to C \to H\)), each link’s uncertainty compounds, typically reducing net force (“inferential drag”).
  • Dependence & redundancy: Two “independent-looking” reports may trace back to the same primary source. The second then adds little; combining them yields less force than naïvely multiplying independent likelihood ratios. (Conversely, truly independent items can show synergy.)
  • Ancillary constraints on the Likelihood Ratio: Ancillary (meta) evidence about the measurement/testimony changes the likelihoods (e.g., false-positive rates, viewing conditions), which can push force up or down without altering surface relevance.
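Here is a minimal numerical sketch of the first two effects, using made-up probabilities and a crude link-reliability model of my own (not Schum's formulas): a low-diagnosticity item has a likelihood ratio near one no matter how credible the source, and routing a diagnostic item through a chain of shaky links drags its ratio back toward one.

```python
# Toy sketch: likelihood ratios as a gauge of inferential force.
# All probabilities are made-up numbers chosen only to show the qualitative effects.

def likelihood_ratio(p_e_given_h: float, p_e_given_not_h: float) -> float:
    """LR = P(E|H) / P(E|not H); values near 1 carry little inferential force."""
    return p_e_given_h / p_e_given_not_h

# Low diagnosticity: a credible "dark hoodie" report that fits many people.
print(likelihood_ratio(0.90, 0.60))   # ~1.5 -- barely moves the odds

# A diagnostic item, e.g., a rare matching trace.
print(likelihood_ratio(0.80, 0.01))   # ~80  -- strong force

# Inferential drag: the same diagnostic item reaches H only through uncertain links.
def dragged_lr(p_e_given_h: float, p_e_given_not_h: float, link_reliabilities: list) -> float:
    # Crude model: each shaky link mixes the likelihoods with an uninformative
    # baseline of 0.5 in proportion to its unreliability, pulling the ratio toward 1.
    p_h, p_not_h = p_e_given_h, p_e_given_not_h
    for r in link_reliabilities:
        p_h = r * p_h + (1 - r) * 0.5
        p_not_h = r * p_not_h + (1 - r) * 0.5
    return p_h / p_not_h

print(dragged_lr(0.80, 0.01, [0.8, 0.8, 0.8]))  # ~2.6 -- force shrinks along the chain
```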

Let's transition from Schum to another influential thinker on the topic of evidence. While Schum was a legal scholar working at the bridge between law and artificial intelligence, Peter Achinstein was a prolific philosopher who wrote extensively in philosophy of science. While philosophers don't determine how scientists actually reason in practice, I do find some of them to be quite illustrative and insightful. Achinstein attempts to characterize evidence in his book "The Concept of Evidence". Similar to Schum, he does not give a single definition of evidence. Instead, he distinguishes four concepts of evidence, making one ("potential evidence") the basis for the others:

  • Potential Evidence: a true statement e, together with true background b, is potential evidence for hypothesis h only if (i) e doesn’t entail h, and (ii) given e & b, it’s probable that there is an explanatory connection between e and h (Achinstein formalizes this with an “objective epistemic” probability >½; a compact restatement follows this list).
  • Veridical Evidence: Strong VE requires that (1) e is PE for h; (2) h is true; and (3) there is an explanatory connection between e’s truth and h’s truth. (He also discusses a weaker VE that drops (3), but argues scientists should want the strong form to avoid “misleading” evidence.)
  • ES-evidence (Epistemic Situation): e is true and anyone in a specified epistemic situation is justified in believing that e is (probably) VE for h.
  • Subjective evidence: at time t, agent X believes e is (probably) VE for h, and X’s reason for believing h (is true/probable) is that e is true.
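Roughly, and in notation of my own rather than Achinstein's, the potential-evidence conditions can be written compactly as:

\[
e \text{ is potential evidence for } h \text{ (given } b\text{)} \;\Longrightarrow\; e \nvdash h \quad\text{and}\quad p\big(\text{there is an explanatory connection between } e \text{ and } h \,\big|\, e \wedge b\big) > \tfrac{1}{2}.
\]

The probability here is Achinstein's "objective epistemic" probability rather than a subjective degree of belief, and since the bullet above says "only if", these are necessary conditions (hence the one-way arrow).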

Achinstein rejects mere "positive relevance" alone as adequate, grounding his account in the explanatory connection requirement. I think his account is similar to Schum's. For both, "evidence" is relative to a hypothesis. There also has to be some sort of linking glue: Achinstein's explanatory connection plays a role like Schum's generalizations that justify the relevance link from an item to a hypothesis. I think both agree that this link must be argued; it's not something obvious. The major difference between the two (arguably due to their professions) is that Achinstein's veridical-evidence requirement depends heavily on a strong notion of truth, while Schum allows that an item can be functional within an inferential task despite being mistaken or noisy, directly building in the notion of credibility. Achinstein also does not consider likelihood ratios or combination effects, because he does not have a notion of an inferential network like Schum's. Achinstein gives normative, truth-leaning definitions keyed to explanatory connection and objective epistemic probability (with "veridical evidence" as the scientific gold standard), whereas Schum gives a process/structure-first account where an item's status depends on argued relevance, assessed credibility, and the net inferential force it contributes within an evidence network.

Generally, I think Achinstein's account is weaker for practitioners, but I do like his explicit recognition of background context; it often contains the "ancillary items" recognized by Schum. His non-entailment condition also seems to capture Schum's idea of the non-deterministic nature of evidence: one piece of evidence can support multiple, often conflicting hypotheses. Achinstein asks: Is there (probably) a real explanatory connection, and is it (in the stronger notions) actually true? Schum asks: Have you argued the link (relevance), can you trust the item (credibility), and how much does it move the odds (force)? The main point is that most evidential links are uncertain and context-sensitive; they are essentially defeasible inferences. These defeasible links can be targeted with critical questioning, and those questions address the more basic considerations: relevance, credibility, and explanatory connection.

Since I always bring up Douglas Walton in my posts, I'll continue that tradition here. Walton tells us that evidence isn't a freestanding object; it's a move in a normative exchange that involves dialectical rules about what is and isn't an acceptable move. Walton makes the procedural part explicit: relevance, weight, and even admissibility are dialogue-governed. This connects his pragmatic approach to the broader topic of evidence: evidence is a premise (or set of premises) that gives presumptive, defeasible support to a claim within a regulated dialogue, and whose strength is assessed by asking the right critical questions. Walton's critical questions are concretizations of the examination of warrants and relevance in an inference task. The dialectical aspect of this is interesting, because these meta-norms of discussion can influence what is considered relevant and where the burden of proof lies. If you can answer the critical questions satisfactorily, the argument keeps its presumptive force, but it is always subject to revision because evidential reasoning is incremental and contextual.

Okay, now how does all of this connect to "Science"? I think to some extent, Schum captures the core logic. Scientists are essentially doing some variation of what Schum, Achinstein, and Walton are describing, but they most likely won't use the terminology introduced by these thinkers. Scientists don't always say "this is evidence"; they talk about data, results, signals, effects, fits to a model, p-values, confidence intervals, posterior distributions, replications. But under the hood they're doing recognizable evidential work:

  • They start from questions/hypotheses/models. Even in exploratory work, there’s at least a background model (“these genes might co-express,” “this detector should see X events”). That gives evidence something to be for. This maps nicely to Schum's "evidence is not free floating" concept and also captures relevance.
  • They produce data via instruments or observations. That’s the raw material, but nobody sensible treats raw data as already “evidence.” This collection step is often a source of critical questioning, mapping to Schum's notion of credibility.
  • They process/clean/model it. This is where measurement error, instrument calibration, and statistical assumptions come in.
  • They interpret it relative to rival explanations. “Does this pattern support model A over model B?” “Does this reject the null?” “Does this effect replicate?”
  • They document uncertainty (standard errors, likelihoods, Bayes factors, upper bounds).
  • They bring in meta/ancillary info (instrument logs, sample provenance, blinding procedures, preregistration, replication studies, peer review).
  • They value replication and reproducibility as communal credibility checks: “Can someone else’s instrument get the same item?” That is ancillary evidence writ large.

Scientists constantly argue for relevance, even if they don't explicitly use that word. For example, in an experiment someone might state “If the drug truly lowers blood pressure, the treatment group mean should be lower than the control mean.” That's the relevance link; they're appealing to a causal generalization based on a perhaps implicit understanding of the dynamics of that scenario. In theoretical domains, this relevance criterion comes explicitly from the theory (or is implied by it). The theory tells you what data to look for, and legitimizes the data as evidence within the theoretical framework. No matter the domain, Schum-style "generalizations" are used to connect an item to a hypothesis. Inferential force is assessed differently in each domain depending on its methods (the type of modeling), but it is fundamentally about comparing competing hypotheses. For example, in economics, establishing a causal connection by convincingly settling disputes about the directionality of an effect (\(X \to Y\) instead of \(Y \to X\)) carries massive inferential force (and might actually win you the Nobel Prize).
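To see the comparison-of-hypotheses framing in miniature, here is a toy sketch with invented numbers (not a real study): treat "the drug lowers blood pressure" and "the drug does nothing" as rival hypotheses, and score an observed treatment-vs-control difference in means with a simple normal likelihood ratio.

```python
# Toy sketch with made-up numbers: scoring an observed difference in mean blood
# pressure under two rival hypotheses via a normal likelihood ratio.
import math

def normal_pdf(x: float, mean: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Hypothetical study summary: observed difference (treatment mean - control mean)
observed_diff = -6.0   # mmHg
se_of_diff    = 2.5    # standard error of that difference

# Rival hypotheses about the true effect of the drug:
effect_under_h1 = -8.0  # H1: drug lowers BP by ~8 mmHg
effect_under_h0 = 0.0   # H0: drug has no effect

lr = (normal_pdf(observed_diff, effect_under_h1, se_of_diff)
      / normal_pdf(observed_diff, effect_under_h0, se_of_diff))

print(f"likelihood ratio (H1 vs H0): {lr:.1f}")  # ~13 with these invented numbers
# A ratio well above 1 says the observed difference is far more expected under H1
# than under H0 -- the relevance link made quantitative. Credibility questions
# (randomization, blinding, cuff calibration) attack the inputs, not this arithmetic.
```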

I'd like to add a few caveats, because obviously no single definition can encapsulate all scientific inquiry. In exploratory data analysis, systems biology, or ML-heavy science, people sometimes say “let’s see what the data say” before fixing a sharp hypothesis. Evidence, for Schum, is always evidence-for. Exploratory science often parks in a pre-evidential phase: it is producing candidates for evidence. It does not start with a hypothesis H; it determines which set of H's should be considered based on features of the raw information. This is called "Data Driven Science". Some actually argue this is more a subset of engineering than "pure science" (see the demarcation issue above). I don't care to make the distinction. Once the plausible set of H's is identified, it's pretty much business as usual after that.

Another potential issue (perhaps a bias from his particular discipline) is that Schum is very item- and argument-focused. Science has a social dimension to confirmation: independent labs, convergence of multiple methodologies (experiment, simulation, field data), the long-run track record of an instrument platform, etc. That “social corroboration” can be thought of as Schum-style ancillary evidence from independent sources, but in practice scientists sometimes treat it almost as a separate epistemic good (“multiple independent labs found this”... institutional trust). That's bigger and messier than one neat inference network. Nevertheless, Schum's inference networks could in principle encapsulate this, but in practice it is very likely computationally intractable, and the evidence landscape will probably shift dramatically before the analyst can finish computing inferential force.
