Clarifying Scientific Concepts
There is a lot of confusion, propagated by various media sources, around fundamental scientific concepts and terminology. Colloquial uses of terms such as "theory" or "hypothesis" tend to distort their scientific meaning. Scientific concepts can become trivialized as well: "your 'theory' is just as good as my 'theory'". I want to clarify some of this terminology because constant misuse simply confuses everyone, making it harder to distinguish between competing sources of information on social media platforms.
Science and Pseudo-Science
Demarcating science from non-science is quite difficult. There are obvious exemplars of pseudo-science and many prototypical examples of real science, but no set of necessary and sufficient conditions has been identified that can be used to sort any particular example into a clearly defined bucket. Nevertheless, there are common features shared across many disciplines we deem scientific. These attributes form clusters; examples on the peripheries become harder to classify because they share features with canonical examples of pseudo-science. If you are familiar with Wittgenstein's notion of family resemblance, then this might make sense to you. You can also think about it in terms of network clustering. In the diagram below, you can think of the edges between nodes as defining some common relation and distance as measuring some degree of closeness.
A two-dimensional view might look something like this:
In the middle of the large cluster we might consider a discipline like physics, and towards the periphery of the cluster we could consider something like psychology, sociology, and economics. The green cluster could represent a pseudo-scientific category containing things like intelligent design. The thing to note with both visualizations is that there isn't a global set of definable features that can distinguish any two disciplines. We cannot construct a list containing all of the essential features that could exclude non-sciences without running into problems. For example, a pillar of modern science is the experiment. Theoretical physics, however, typically does not conduct experiments. Does this mean we exclude it from being a science? That would be absurd. Similarly, the geological sciences typically do not engage in much a priori theorizing, yet a common practice of modern science is establishing some sort of theory to explain observations. Does this mean geology is not a science? That, too, would be absurd.
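To make the cluster idea slightly more concrete, here is a toy sketch in Python. The feature assignments are entirely made up for illustration; the point is only that "being scientific" can be thought of as overlap in a bundle of features rather than possession of a single defining property.

```python
# Hypothetical illustration of family resemblance: each discipline is described
# by a set of features, and similarity is simply the overlap between feature
# sets (Jaccard index). The feature assignments are invented, not a real
# classification of these fields.
disciplines = {
    "physics":            {"experiment", "math_models", "prediction", "peer_review", "measurement"},
    "psychology":         {"experiment", "prediction", "peer_review", "measurement"},
    "economics":          {"math_models", "prediction", "peer_review", "measurement"},
    "intelligent_design": {"narrative_explanation", "appeal_to_authority"},
}

def jaccard(a: set, b: set) -> float:
    """Overlap between two feature sets: 1.0 means identical, 0.0 means disjoint."""
    return len(a & b) / len(a | b)

names = list(disciplines)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        sim = jaccard(disciplines[a], disciplines[b])
        print(f"{a:>18} ~ {b:<18} similarity = {sim:.2f}")
```

Running this, the "canonical science" entries end up close to one another while the pseudo-scientific entry sits far from all of them, which is the clustering intuition in miniature.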
These examples immediately make it obvious that, when discussing what science is, we have to consider that "science" is a term referring to a broad class of related disciplines. Something can be more or less scientific, exemplifying the fact that "science" is somewhat of a graded concept. This implies that there are qualities or properties by which we can evaluate any particular knowledge claim as science or not. I've shifted away from classifying disciplines as a whole to individual claims because we might run into the same problem when attempting to classify an entire discipline as scientific. For example, there are many knowledge claims coming out of the psychological literature that do not exhibit properties we consider scientific. It's also possible that, for any given knowledge claim, some of the properties might not be exhibited. Nevertheless, this does not imply that all claims coming out of psychology are pseudo-scientific. This is also true of some of the more canonical examples of science. Therefore, we must consider the rate at which claims originating from any discipline exhibit scientific qualities. This will prevent us from a priori labeling knowledge claims as pseudo-scientific simply based on the discipline they come from.
Here is a list of some qualities that I think can be used when considering whether a claim is scientific. I do not claim this list to be exhaustive. Also, the order does not matter at this point; this is not to say that all qualities are of equal importance. I think that would be false.
- Makes predictions or retrodictions
- Is testable
- Is replicable
- Systematically records observations
- Capable of verifiability and validation
- Acknowledges the boundaries of its explanatory breadth
- Self corrective and reflective
- Maintains a fair degree of precision and clarity with its terminology
- Is falsifiable
- Has an empirical basis
- Is reproducible
- Is internally coherent and logically consistent
- Strives for impartiality and objectivity
- Has a sufficient degree of generalizability
- Can be subject to scrutiny within a broader community of peer review
- Uses concepts that can be measured or quantified
- Subject to revision in light of new findings
- Is transparent with its methodology
- Is rigorous
- Seeks disconfirmation along with confirmation
- Is communicable
- Seeks to provide causal explanations
- Highly critical during design and analysis phase
- Critically assess methods, assumptions, and interpretations of results
- Uses mathematical models and simulation methods
- Seeks simple and robust methods
- Leverages probabilistic reasoning and acknowledges the uncertainty of its conclusions
- Does not rely on authority unless it's a claim that is taken to be true within the community
- Considers alternative hypotheses
- Seeks convergent validity through multiple information sources and different methods
There are probably more I am not considering, but I think this is not a bad start. I am partial to mathematical modeling but acknowledge that there are disciplines, such as anthropology, that are scientific but might not emphasize mathematical modeling. Also, not all knowledge claims must come from a mathematical model. Nevertheless, scientific disciplines tend to use models because they help us check our assumptions against reality. Another consideration is that a claim might not be testable at this point in time, but future technological innovations may make it testable. The fact that something isn't immediately testable due to technical constraints does not make it unscientific. I would say that it's unscientific if, in principle, it cannot be tested: if there is no conceivable way to test the claim, then it is not testable. Again, these are qualities that claims should strive for if we want to consider them scientific.
Throughout the rest of this blog post I'll touch on some of these concepts. I just wanted to initially get this out of the way because many people are confused about which claims are genuinely scientific.
Theories and Scientific Theories
I am not going to focus on any particular theory. I just want to consider, in general, what it means to theorize in a scientific setting, how this activity differs from something like philosophical theorizing, and how both activities are quite different from how the public understands the term.
In the broadest sense of the term, a theory is a structured way of understanding, interpreting, or explaining phenomena. It provides a conceptual framework, a network of ideas that helps us make sense of observations, connect patterns, and predict or interpret outcomes. Theorizing is something humans do all the time; often when you are trying to explain something, you are assuming some underlying theory (although it's normally implicit and not fully structured). Theorizing, in the broadest sense, is any process of pattern finding, meaning making, or framework building. It's the creative and interpretive act of connecting ideas into a coherent picture, whether the "data" are experiments, emotions, social behaviors, or symbols.
In science, theorizing takes on specific methodological and epistemic constraints. Scientific theories must be testable, falsifiable, and consistent with empirical data. They are often formalized, expressed mathematically, and aimed at predictive power. So while all scientific theories are theories, not all theories are scientific. Science narrows the broader act of theorizing into a disciplined method: empirical, systematic, and verifiable. In philosophy, theorizing is often about conceptual analysis rather than empirical testing. Philosophical theories often deal with abstractions more removed from empirical reality; they are not connected to experimental methods but rather focus on logical entailment. They might deal with concepts like possibility and necessity. You might be eager to claim that science deals with these concepts as well. You'd be correct: certain scientific theories entail the possibility and impossibility of various empirical outcomes. Philosophical possibility is much broader, consisting of what is logically possible; in other words, its theories are "metaphysical". So you can think of scientific and philosophical theorizing as specialized, formalized subsets of the larger, more universal human capacity to theorize, just as poetry and mathematics are specialized ways of using language.
There are common components to all theories, regardless of how fleshed out the theoretical details are.
- Concepts: the basic building blocks of a theory; they name and define the phenomena being discussed. For example, "gravity" in physics, or "motivation" in psychology. Concepts are abstractions; they simplify reality so we can think systematically about it.
- Construct: A type of concept that has been deliberately defined for a specific theoretical purpose. Constructs often can’t be directly observed but are inferred (e.g. “intelligence,” “social capital,” “self-esteem”).
- Propositions: statements that express the relationships between concepts, how one thing affects or relates to another. In formal sciences, these are hypotheses; in philosophy or critical theory, they may be argumentative claims. A well-defined scientific theory generates testable hypotheses amenable to falsification.
- Assumptions: These are the underlying ideas or conditions taken for granted for the theory to work. For example, in Economics we often assume humans are rational decision-makers. Making assumptions explicit is key to understanding the scope and limits of a theory.
- Boundaries and Scope Conditions: This is the "where and when" of a theory, what domain or context it applies to. For example, a psychological theory may explain individual behavior, not group dynamics.
- Logical Structure: This is the theory's internal organization, how its pieces fit together coherently and systematically. A good theory has internal consistency and avoids contradictions.
- Empirical Linkages: This is how the theory connects to observation or experience. A theory entails certain observations; these are its predictions. In science, this means operational definitions and testability.
The process of theorizing itself also tends to move through a common sequence of stages:
- Observation or Problem Identification: It starts with noticing a phenomenon, inconsistency, or puzzle. "Something interesting is happening here — why?"
- Conceptualization: Identify key elements and name them. Define concepts clearly and delimit what you’re focusing on.
- Relationship Mapping: Propose how these elements relate. In science, this becomes hypotheses or models. In philosophy or social theory, this becomes conceptual arguments or dialectical relations.
- Integration and Abstraction: Bring multiple relationships together into a systematic framework. The theory begins to generalize — it becomes more than a list of observations.
- Validation or Evaluation: In science → testing with data, replication, falsification. In interpretive or critical theory → coherence, explanatory depth, ethical and practical adequacy.
- Refinement and Extension: Theories evolve as new evidence or perspectives emerge. This is the “living” nature of theory — it’s continuously reshaped.
I've been reading a lot from Paul Smaldino recently, and I think his description of theory is incredibly useful. Smaldino doesn't offer a single, neat "textbook" definition of theory in the way a philosophy-of-science treatise might, but across his writings we can reconstruct how he treats and uses theories. From his published work (on modeling, methodology, and philosophy of science), Smaldino's view of theory includes the following aspects:
- Decomposition into parts, properties, relationships, and dynamics: In “How to Build a Strong Theoretical Foundation,” Smaldino urges that to develop a theory of some phenomenon, one must decompose the system into relevant parts, specify the properties of those parts, articulate the relationships among them, and define how these can change over time. Thus, theory is not just a verbal or narrative statement, but a structural decomposition plus a specification of dynamics and interactions.
- Theories are tools (not “Truth”): Smaldino is explicit that there is (in his view) no one “true” theory; rather, theories are evaluated by how useful they are for understanding, prediction, generalizability, and refinement. In other words, theory is pragmatic: it is judged by its capacity to guide thinking, to generate falsifiable hypotheses, to clarify assumptions, and to integrate with empirical work.
- Verbal vs. formal theories / role of models: Smaldino repeatedly distinguishes verbal theories (narrative descriptions, “story-like”) from formal theories (mathematical or computational models). He argues that verbal theories are often vague, underdetermined, and thus resist strong testing or falsification. Formal models serve as instantiations of theory—they force explicit specification of assumptions, highlight omitted aspects, and allow rigorous exploration of consequences. In this view, a “good” theory is one that can be (or already is) translated into a formal model (or a family of models) that sharpen and test its claims.
- Iterative and reflexive process: Smaldino sees theory construction as iterative: empirical work should refine the theory, and theory should shape what empirical questions get asked. He warns against treating data merely as support for a verbal theory; rather, data should prompt refinement, specification, or rejection of theoretical assumptions. Also, theory-building is reflexive: one must be conscious of which assumptions are built in (implicitly or explicitly), what is omitted for simplicity, and the “violence” (i.e., distortion) done to reality in modeling.
- Theoretical foundation and training: Smaldino laments that many social scientists lack training in theory construction and formal modeling. In “How to Build a Strong Theoretical Foundation,” he argues for greater methodological and conceptual training so that theory is not just received (from canonical frameworks) but actively constructed. His emphasis is that theory is not peripheral—it is central. Without robust theory, methods (however sophisticated) may produce results without insight. (“Better methods can’t make up for mediocre theory.”)
Synthesizing these aspects: a theory is a deliberately constructed specification of (i) entities or components of a system, (ii) the properties and possible states of those components, (iii) the relationships and rules by which those components interact, and (iv) the temporal dynamics of how those states and relationships evolve. A strong theory is one that (a) can be formalized in mathematical or computational models, (b) offers testable predictions or counterfactuals, (c) is subject to empirical refinement, and (d) is judged not by an abstract "Truth" but by its utility in explaining, predicting, generalizing, and guiding further inquiry.
In his book “Modeling Social Behavior: Mathematical and Agent-Based Models of Social Dynamics and Cultural Evolution”, he defines theory as:
"... a set of assumptions upon which hypotheses derived from that theory must depend. Strong theories allow us to generate clear and falsifiable hypotheses."
Distinguishing it from a theoretical framework:
“A theoretical framework is a broad collection of related theories that all share a common set of core assumptions.”
Theories guide inquiry and the modeling process. A theory frames what phenomena we pay attention to, what questions we ask, and how we model:
“Each [model] decomposes a system in a particular way … What questions does your theory address? What parts do you need to include to answer those questions? … Is your model a satisfying representation of your theory?”
That is, a theory is more than just a verbal narrative: it's the background of assumptions that defines how one decomposes the phenomena, and from which hypotheses or models are generated. Formal models are instantiations or precise expressions of the theory, and are used as a way to stress test or refine the theory. There is a one-to-many relationship between theories and models; one theory can be expressed with many different models. This is what I take to be the scientific notion of theory, how I see it applied, and how I was trained to apply the term (within the context of economic theory).
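To make the distinction between a verbal theory and its formal instantiations concrete, here is a minimal sketch in Python. The toy verbal theory ("a behavior spreads when non-adopters interact with adopters") and all parameter values are my own illustrative assumptions, not an example from Smaldino's book; the point is simply that one verbal theory can be instantiated as more than one formal model.

```python
import random

# Verbal theory (toy example): "a behavior spreads when non-adopters interact
# with adopters." Two formal instantiations follow. Both depend on the same
# core assumptions (a transmission rate and an initial share of adopters),
# but decompose the system differently.

def mean_field_model(beta=0.3, adopters0=0.05, steps=50):
    """Aggregate instantiation: one state variable, deterministic update."""
    a = adopters0
    trajectory = [a]
    for _ in range(steps):
        a = a + beta * a * (1 - a)   # logistic-style adoption growth
        trajectory.append(a)
    return trajectory

def agent_based_model(beta=0.3, adopters0=0.05, n_agents=500, steps=50, seed=1):
    """Individual-level instantiation: discrete agents, random mixing."""
    rng = random.Random(seed)
    adopted = [rng.random() < adopters0 for _ in range(n_agents)]
    trajectory = [sum(adopted) / n_agents]
    for _ in range(steps):
        for i in range(n_agents):
            if not adopted[i]:
                partner = rng.randrange(n_agents)   # meet a random other agent
                if adopted[partner] and rng.random() < beta:
                    adopted[i] = True
        trajectory.append(sum(adopted) / n_agents)
    return trajectory

if __name__ == "__main__":
    print(f"mean-field final adoption:  {mean_field_model()[-1]:.2f}")
    print(f"agent-based final adoption: {agent_based_model()[-1]:.2f}")
```

Both models are expressions of the same background assumptions; comparing them is one way of checking which simplifications (here, ignoring individual heterogeneity) actually matter for the question being asked.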
Theoretical Virtues
What counts as a "good" theory? How do we compare two theories explaining the same data? Why is simplicity considered desirable? Theoretical virtues are the criteria by which we compare competing theories. In addition to simplicity, there are other common virtues such as elegance (symmetry), explanatory power (unifying phenomena under one framework), fruitfulness (good at generating testable predictions), and coherence (with itself and other theories). Scientists often invoke these when deciding between theories that fit data equally well.
The weight given to each theoretical virtue varies across fields and contexts. Empirical adequacy is typically non-negotiable. In practice, scientists do appeal to simplicity, elegance, and explanatory depth, even if they don't always articulate these as "philosophical criteria." Generally, theoretical scientists (e.g., theoretical physicists, cosmologists, or mathematicians) care more explicitly about theoretical virtues because their work often advances ahead of decisive empirical data. For example, a string theorist might emphasize mathematical beauty and unification, even though direct empirical tests might be lacking. Empiricists, on the other hand, tend to prioritize measurable success and predictive reliability. The line dividing the two is by no means sharp.
We will look at a paper by Michael Keas called "Systematizing the Theoretical Virtues". It provides a fairly comprehensive and structured account of the major theoretical virtues, and how they constitute a "logic of theory choice".
Evidential Virtues
- 1) Evidential accuracy: “A theory fits the empirical evidence well (regardless of causal claims).” Does the theory fit the data? This is the baseline virtue: the observable world looks the way the theory says it should. It’s neutral about causes; it’s just “getting the facts right.” Use it when comparing rivals that speak to the same dataset; watch for overfitting (a theory can “fit” because it has too much wiggle room). Evidential accuracy underwrites the other two evidential virtues: typically you assess causal adequacy and depth after you’ve seen solid fit.
- 2) Causal adequacy: “T’s causal factors plausibly produce the effects (evidence) in need of explanation.” Does the posited mechanism really have the oomph? Beyond fit, we ask whether the causes would in fact yield the observed effects (often many causes in interaction). Robustness analysis across heterogeneous models can support this by showing the same core causal structure yields the phenomenon across variations. Beware “dormant” causes that are merely named, not shown to operate at the required scale.
- 3) Explanatory depth: “Excels in causal history depth or in other depth measures such as the range of counterfactual questions that its law-like generalizations answer.” How far and how flexibly does the explanation reach? Depth comes in two flavors: (i) event-focused “how far back” causal history, and (ii) law-focused counterfactual range (how much would still hold under interventions or changed background conditions). It’s different from unification: depth concerns the same target system under varying conditions, not explaining more kinds of facts. Measure it by the breadth of stable “what-if” answers your laws support.
Coherential Virtues
- 4) Internal consistency: “T’s components are not contradictory.” No contradictions inside the theory. A minimal bar: if it derives P and ¬P, something must give. Subtle inconsistencies can hide in idealizations; don’t set the bar so high that all idealized modeling looks “inconsistent,” but don’t excuse genuine clashes as “just idealization,” either. Think formal coherence first, before aesthetic “niceness.”
- 5) Internal coherence: “Components are coordinated into an intuitively plausible whole… T lacks ad hoc hypotheses—components merely tacked on to solve isolated problems.” Parts hang together as an intuitively plausible whole (no ad hoc patches). Different from pure logic: a theory can be consistent yet obviously jury-rigged. Red flags: fixes that are untestable, explain nothing else, or sit awkwardly with the core principles. Use “negative” diagnosis (ad hocness) to pressure-test coherence.
- 6) Universal coherence: “T sits well with (or is not obviously contrary to) other warranted beliefs.” Fits with the rest of what we’re warranted to believe. This is external fit: harmony with well-established results and background commitments (including conservation principles, etc.). Clash here doesn’t instantly falsify, but it raises costs you must repay with exceptional evidential gains. Distinguish healthy tension (pushes progress) from outright conflict with robust knowledge.
Aesthetic Virtues
- 7) Beauty: “Evokes aesthetic pleasure in properly functioning and sufficiently informed persons.” The theory evokes aesthetic pleasure in appropriately situated observers. Beauty shows up as symmetry, aptness, “surprising inevitability,” etc. On Keas’s account, beauty may have extrinsic epistemic value (it can guide us toward other, more tightly connected virtues like simplicity and unification). Use with humility: beauty can inspire, but by itself it doesn’t guarantee truth.
- 8) Simplicity: “Explains the same facts as rivals, but with less theoretical content.” Same explananda, less theory. Think fewer entities (parsimony) and/or more concise principles (elegance). Practically, count independent parameters, primitive postulates, or distinct assumptions. Simplicity often correlates with better predictive performance in model selection, but it also interacts with coherence (ad hoc add-ons usually bloat a theory). A small model-selection sketch follows this group of virtues.
- 9) Unification: “Explains more kinds of facts than rivals with the same amount of theoretical content.” Same resources, more kinds of facts explained. Unification and simplicity are complementary “styles of informativeness”: simplicity reduces content for the same domain; unification expands domain for the same content. Use it to prefer frameworks that tie disparate phenomena together (Maxwell’s electrodynamics-light, plate tectonics, etc.). Keep distinct the diachronic notion (“consilience” gained over time) from this aesthetic one present at introduction.
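To illustrate the link between simplicity and predictive performance, here is a small hypothetical Python sketch using AIC, a standard model-selection score that rewards fit but penalizes extra parameters. The data and candidate models are made up; nothing here comes from Keas's paper.

```python
import numpy as np

# Fit polynomials of increasing degree to noisy data generated from a simple
# underlying line, and score each fit with AIC. Higher-degree fits reduce the
# residual error, but the parameter penalty typically leaves the overfit
# degree-9 model with the worst (highest) AIC.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y = 1.0 + 2.0 * x + rng.normal(scale=0.2, size=x.size)   # "truth": a line plus noise

n = x.size
for degree in (1, 3, 9):
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((y - np.polyval(coeffs, x)) ** 2)   # residual sum of squares
    k = degree + 1                                   # number of fitted parameters
    aic = n * np.log(rss / n) + 2 * k                # lower is better
    print(f"degree {degree}: RSS={rss:.3f}, AIC={aic:.1f}")
```

The simplest model that adequately fits the data tends to win, which is the pragmatic content of the simplicity virtue in a model-selection setting.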
Diachronic Virtues
- 10) Durability: “Has survived testing by successful prediction or plausible accommodation of new data.” Survives testing over time (prediction or plausible accommodation). Durability is not mere popularity or longevity; what matters is the record of tests survived. Prediction is often the gold standard; in historical sciences, repeated plausible accommodation of novel data also counts. A newborn theory can’t yet be “durable”; this virtue is inherently time-laden.
- 11) Fruitfulness: “Over time, generates additional discovery by means such as successful novel prediction, unification, and non ad hoc theoretical elaboration.” Generates further discovery (incl. novel prediction, non-ad hoc elaboration, added unification). If durability is conservation (passing tests), fruitfulness is innovation (creating new testable strands). Novel prediction here is genuinely new—wasn’t “built in” as a target during construction. Fruitfulness and durability interlock in mature research traditions (e.g., gravitational astronomy from Uranus’s anomaly to Neptune).
- 12) Applicability: “Used to guide successful action or to enhance technological control… higher when it enables outcomes otherwise not possible.” Guides successful action or control (science → technology, policy). Distinct from experimental control for testing; this is practical leverage (engineering, medicine, forecasting). It’s confirmatory and arrives only after earlier virtues are in place (you can’t apply what you haven’t yet credibly learned), so it is inherently diachronic.
Evidence and Empirical Evidence
What distinguishes a detective who uses evidence from the scientist who uses "empirical evidence" when engaged in empirical research?
Hypothesizing and Confirmation
The term "hypothesis" is frequently bastardized. Confirmation, and its counterpart disconfirmation, are also incredibly misunderstood by the general public. A hypothesis is pretty much just a testable guess, normally derived from a theoretical framework. It is a specific, testable statement about what you expect to happen. It's a prediction about reality that you intend to check with evidence. For example, "If plants are given more light, they will grow faster"; this is a hypothesis, it can be wrong but can definitely be tested. It’s different from an axiom (assumed true), a conjecture (unproven mathematical guess), or a proposition (any statement that is true or false but not necessarily testable), in that it directly connected to the idea of testability and should have properties such as verifiability and falsifiability. This normally implies the phenomena referenced by the hypothesis is measurable, and is therefore directly or indirectly observable through empirical data. In other words, hypotheses should be operationalizable, not merely verbal statements. The hypothesis must be expressed in a level of precision necessary to implement some test, and if a hypothesis is not amendable to this, its not testable in a way that can discern its likeliness.
I'd like to follow this with a few caveats. Scientific practice is often messy, and not defined by one thing such as falsification. Very often, strict falsifiability is not feasible. In practice, this might restrict the applicability of certain testing procedures, such as statistical testing. As Richard McElreath writes in Statistical Rethinking:
Science is not described by the falsification standard, and Popper recognized that. In fact, deductive falsification is impossible in nearly every scientific context.
(1) Hypotheses are not models. The relations among hypotheses and different kinds of models are complex. Many models correspond to the same hypothesis, and many hypotheses correspond to a single model. This makes strict falsification impossible.
(2) Measurement matters. Even when we think the data falsify a model, another observer will debate our methods and measures. They don’t trust the data. Sometimes they are right. (in addition to issues such as false positives and false negatives, observation error)
So in other words, the scientific method is not reducible to a statistical procedure. Statistical evidence is nevertheless an important feature of the process, and statistical methods can relate hypotheses to data, but they are not sufficient.
We will talk more about this later, but since modern science relies so heavily on procedures from statistics, it's impossible to fully separate the concept of a hypothesis from statistical inference. There are two concepts that frequently occur in the context of reasoning about hypotheses: confirmation and disconfirmation. Remember that a hypothesis makes a prediction about something; in other words, if the hypothesis were true, we would expect to observe something implied by that hypothesis. These observations are typically encapsulated by a probability distribution, and therefore are described by likelihoods. We have some hypothesis H, and we show that it entails some observation D. If we look for D and don't find it, we must conclude that H is false. However, finding D tells us nothing certain about H, because other hypotheses can also predict D. This is why we invoke the notion of likelihoods. If we observe D, we can't be certain that H explains D; but if we measure relative likelihoods, we can find that H is most probable relative to alternative hypotheses. This type of reasoning is central to understanding how scientists reason under uncertainty.
I'll briefly introduce the idea of Bayesian confirmation. The core idea is that a hypothesis wins credit when evidence was more likely if the hypothesis were true than if it weren't. If seeing E is more expected under H than under “not-H,” then E confirms H. The stronger the shift, the stronger the confirmation. (Formally: strength ≈ how big the ratio is between P(E|H) and P(E|¬H).) Evidence confirms H when P(E | H) > P(E | not-H) and disconfirms when P(E | H) < P(E | not-H).
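Here is a minimal numerical sketch of that idea in Python. The probabilities are arbitrary, made-up values chosen only to show how the likelihood ratio drives the update.

```python
# Bayesian confirmation in miniature: evidence E confirms hypothesis H when
# P(E|H) > P(E|not-H), and the size of that ratio governs how much the
# credibility of H shifts. All numbers below are illustrative assumptions.

def posterior(prior_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Update P(H) to P(H|E) with Bayes' rule."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    return p_e_given_h * prior_h / p_e

prior = 0.3                       # credibility of H before seeing E
p_e_h, p_e_not_h = 0.8, 0.2       # how expected E is under H vs. not-H

likelihood_ratio = p_e_h / p_e_not_h          # > 1 means E confirms H
post = posterior(prior, p_e_h, p_e_not_h)

print(f"likelihood ratio P(E|H)/P(E|not-H) = {likelihood_ratio:.1f}")
print(f"P(H) before seeing E: {prior:.2f}, after: {post:.2f}")
```

With these numbers the hypothesis goes from 0.30 to roughly 0.63: the evidence supports it, but it is nowhere near "proved", which is exactly the point of the next paragraph.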
Evidence doesn't "prove" a hypothesis; it shifts how credible it is. Many different stories can fit the same facts. What matters is which story makes those facts more expected than rival stories. The same observation can match multiple hypotheses; this is the idea of underdetermination. Also, analysis choices matter. What you count, how you measure, which model you use, and when you stop collecting data can all tilt the result without changing the raw facts. Instead of "proof", you should think of "support": how much the evidence tips the scales relative to alternatives, not in isolation. Vague hypotheses also carry little weight; if almost anything you observe feels like confirmation, the statement doesn't discriminate between alternatives. Specific predictions force real tests. Here are a few questions to ask yourself when evaluating a hypothesis:
- What are the live alternatives? What else could explain this? (List at least one.)
- What did each hypothesis specifically predict? (Before seeing the data.)
- Would this result have surprised the rival more? (If yes, support is stronger.)
- What would disconfirm your hypothesis? (Name a clear outcome.)
- Did we tune our analysis after seeing results? (If yes, be cautious.)
- Does this hold in new data or by a different method? (Consilience.)
- Would those alternatives have expected this result as much as your hypothesis does? If your hypothesis makes the result less surprising than the alternatives, that’s good support. If lots of stories would’ve predicted it, it’s mild at best.
Scientific Measurement
In terms of "what is a science", if a claim is not measurable in principle, I think this significantly reduces the ability to call it "scientific".
Data and Statistics
Simply put, this is also a cornerstone of modern science. We will look at how scientists model the Data Generating Process, how data is collected, and how data is what binds science to reality. Let's first look at The Data-Generating Process and Scientific Inference.
Scientific Research and Big Data
As a corollary to the prior section, data-driven methods are also becoming quite prolific in many domains. The general public is grossly incompetent when it comes to understanding the nuances of collection, storage, governance, processing, transmission, provenance, and utility of data for inquiry. And yet, this has been a massive pillar in many of the advances of the past few decades. People generally do not have a clue why big data is so valuable, what can be done with it, and to whom. They are unaware that their digital footprint can be used to yield a fairly accurate picture of their beliefs and preferences, which can then be used for predictive analytics. They are also unaware of the value it provides to scientific researchers.
Scientific Representation, Models in Science, and Mathematical Modeling
How do scientists represent the target system they are studying? There is quite a range of scientific models in application across all domains of science.
Computer Simulation
The advent and proliferation of computing, programming languages, and software has undoubtedly had a significant impact on the way science is carried out. Simulation modeling is now quite indispensable within the toolkit of the modern scientist. I would go so far as to say that you simply cannot do modern science without the aid of a computer in one form or another. This is true for the physical sciences and biological sciences as well as the social sciences; even non-traditional scientific disciplines like quantitative finance. In fact, most of my initial experience with this during grad school came through studying stochastic processes in financial engineering courses, in addition to Monte Carlo methods in Bayesian statistics and state space modeling in economics (as well as DSGE models). Since then, I've been interested in simulating social complexity via agent-based models. Most modeling cannot be done outside the context of computer simulation, which requires knowledge of algorithms, data structures, and computational complexity in order to implement your model. This is obviously a prolific aspect of science. So in this section, I want to describe the function of a simulation, how it augments the scientific toolkit, and various simulation methods, ones that I am more familiar with given my education and work experience.
When we simulate, we are simulating some process or system. This shows its generality, because we can represent just about anything as a system or process, which means we can describe the properties, components, relationships, behavior, dynamics, and architecture of just about any system computationally, allowing us to reason about the real system under discussion in a controlled setting. A simulation is an imitation of the dynamics of a real-world process or system over time. This computational representation is studied, like non-computational models, for a variety of tasks including: "what if" analysis, scenario analysis, intervention analysis, stress testing, modification, or pretty much anything else. The alternative approach to simulation is direct experimentation, which is infeasible in many situations. Simulations are often cheaper, faster, more easily replicated, safer, and more ethical. In many cases it's also just practically impossible to model a system mathematically with closed-form solutions; systems are often intractable and too complicated to solve. Approximations via simulation tend to be much more suitable for rapid experimentation. Like any model, a simulation is not assumption free; these assumptions are encapsulated in our formulation of the model. Simulation models allow us to modify our assumptions and test the implications.
These models are essential for engineering any system of significance. Consider the car you use: how did the engineers determine its reliability? They used simulation methods to guide the design process. How do airplanes have such high reliability? Engineers use simulations to understand how the plane will operate under a variety of scenarios, and this influences their design decisions. How does an airline ensure timely arrival of planes and coordinate thousands of daily trips? They use simulation methods, among other methods like optimization. How did researchers identify a vaccine so quickly during the COVID pandemic? This is multifaceted, and involves simulation at every step. Supercomputers like those at Lawrence Livermore National Laboratory were used for rapid drug discovery. Identifying an effective drug involves discovering a molecular structure. You can imagine the combinatorial explosiveness of the search space; doing this purely by gathering information from experiments is simply not feasible for rapid discovery. Supercomputing allows you to simulate the effectiveness of a proposed structure, narrowing down the search space for researchers and allowing them to identify an effective structure more quickly by concentrating on the most promising regions of that space. In addition, simulations were used for epidemic forecasting. Country-level microsimulations quantified how distancing, lockdowns, and closures could keep hospitals from being overwhelmed. Suppose you have normal capacity at a hospital, with limited ability to scale; massive stress on that system might overwhelm it, leading to excess deaths. Therefore, policymakers might want to know these counterfactual situations and adjust their policy accordingly. Closures were also determined based on simulations. Airflow models revealed how respiratory particles move indoors, guiding ventilation, filtration, and layout choices, and showing which facilities are likely locations for a massive outbreak, which subsequently impacts hospital stress. In each of these cases, simulations gave us usable answers while experiments and trials were still spinning up. This class of problem, shared among many complex problems, often involves systems of systems. Modeling and simulation allow researchers to understand how various systems interact; we can effectively integrate multiple models to understand how they all interact. This is something that is very difficult without computational resources. Supercomputers enabled rapid computational experimentation, which led to effective decision support. Put simply, computer simulation has a direct impact on the policy that affects your life.
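To give a flavor of how such counterfactuals are explored, here is a deliberately minimal SIR-style epidemic sketch in Python. It is not any agency's actual model; the parameters and the "halve the contact rate" scenario are illustrative assumptions only.

```python
# Toy discrete-time SIR model: compare peak infections under a baseline
# contact rate versus a counterfactual where distancing halves transmission.
# All parameter values are made up for demonstration.

def sir_peak_infected(beta: float, gamma: float = 0.1,
                      population: int = 1_000_000, days: int = 365) -> float:
    """Run the model for `days` steps and return the peak number infected."""
    s, i, r = population - 1.0, 1.0, 0.0
    peak = i
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return peak

baseline  = sir_peak_infected(beta=0.30)   # no intervention
distanced = sir_peak_infected(beta=0.15)   # contact rate halved

print(f"peak infected, baseline:   {baseline:,.0f}")
print(f"peak infected, distancing: {distanced:,.0f}")
```

Even in a model this crude, the peak load differs by a large factor between scenarios, which is exactly the kind of counterfactual comparison a policymaker would ask of a far more detailed microsimulation.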
There are essentially three binary dimensions along which simulations can be classified. Think of it as a grid, where each cell represents a combination of the various elements of a simulation. There are stochastic vs deterministic simulations, static vs dynamic simulations, and discrete vs continuous (in time or state) simulations. So you can have a discrete-time dynamic stochastic simulation, a continuous-time stochastic simulation, a deterministic dynamic discrete-event simulation, etc. Each of these dimensions represents a different aspect of the system under discussion. Stochastic systems have random components, dynamic systems are time dependent, and continuous systems are those whose state can take on a continuum of values. On the contrary, deterministic systems do not contain randomness, static representations do not depend on time, and discrete representations refer to systems whose states take on a finite or countable set of values. Each combination implies different sets of methods. It is entirely up to the researcher to decide how to model the system, but the decision is not arbitrary. Sometimes it is just easier to represent a system statically; this is often the case in economics. Introducing more moving parts makes the system harder to understand, so researchers must find a sweet spot between model complexity and granularity, and how well the model answers their questions. For example, in economics we have DSGE models that rely on the "representative agent". This is a sort of idealization about how people make decisions in an economy, imposed upon the entire collection of agents; the "representative agent" represents how everyone who is "rational" would make economic decisions. It assumes away any underlying network structure and heterogeneity. It idealizes the economic decision independently of other factors. This form allows us to have nice compact modeling formulations that are solvable or easy to reason about. But obviously, it does not have to be done this way. Agent-based models, on the contrary, allow the modeler to encode heterogeneity. We can then run simulations "from the ground up", and use these results to reason about a real-world economy. This also comes with its own set of costs and sacrifices. These models are harder to validate and make sense of. Therefore, decisions about how to represent systems depend upon these considerations.
What are the elements of a simulation model? Well, it depends on the type of model and the domain you're studying. This taxonomy will be biased towards discrete event simulations, but I think pretty much every simulation will implicitly refer to these elements. There are two objects of simulation:
- Entities: individual elements of the system that are being simulated and whose behavior is being explicitly tracked. Each entity can be individually identified;
- Resources: also individual elements of the system but they are not modelled individually. They are treated as countable items whose behavior is not tracked.
These decisions are made by the modeler, and depend on the system under discussion. How do we organize the entities and resources?
- Attributes: properties of objects (that is entities and resources). This is often used to control the behavior of the object. In a more comprehensive simulation, an attribute might be the type of features that distinguish entities.
- State: collection of variables necessary to describe the system at any time point. These fully characterize the system. For example, in a queuing system, it might be wait time.
- Queue: collection of entities or resources ordered in some logical fashion. This refers to how the entities are processed within the system.
- Event: instant of time where the state of the system changes. An event describes the possible ways the state can change, and locates the time in which that change took place.
- Activity: a time period of specified length which is known when it begins (although its length may be random). This may be specified in terms of a random distribution.
- Delay: duration of time of unspecified length, which is not known until it ends. This is not specified by the modeler ahead of time but is determined by the conditions of the system. Very often this is one of the desired outputs of a simulation.
- Clock: variable representing simulated time.
- Processes: a type of event that has start and end rules, including decision logic, policies, and control rules.
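To show how these elements fit together, here is a minimal discrete-event sketch in Python of a single-server queue. It is a toy example of my own; the arrival and service distributions are arbitrary choices.

```python
import heapq
import random

# Minimal discrete-event simulation using the elements above: entities
# (customers), a resource (one server, tracked only as busy/idle), a clock,
# a future event list, a queue, and a delay (waiting time, known only when
# it ends). Interarrival and service times are arbitrary exponentials.
rng = random.Random(7)
ARRIVAL, DEPARTURE = "arrival", "departure"

clock = 0.0
events = []            # future event list: (time, event_type)
queue = []             # waiting entities, recorded by their arrival time
server_busy = False
waits = []             # observed delays

heapq.heappush(events, (rng.expovariate(1.0), ARRIVAL))

while events and clock < 1_000:
    clock, kind = heapq.heappop(events)          # advance the clock to the next event
    if kind == ARRIVAL:
        heapq.heappush(events, (clock + rng.expovariate(1.0), ARRIVAL))   # next arrival
        if server_busy:
            queue.append(clock)                  # entity waits in the queue
        else:
            server_busy = True
            waits.append(0.0)
            heapq.heappush(events, (clock + rng.expovariate(1.25), DEPARTURE))
    else:  # departure: free the server or start serving the next waiting entity
        if queue:
            arrived = queue.pop(0)
            waits.append(clock - arrived)        # the delay is known only now
            heapq.heappush(events, (clock + rng.expovariate(1.25), DEPARTURE))
        else:
            server_busy = False

print(f"served {len(waits)} entities, mean wait {sum(waits) / len(waits):.2f}")
```

Every simulation framework dresses this up differently, but the core loop (pop the next event, update state, schedule downstream events) is the same.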
A simulation study then tends to proceed through a sequence of steps along the following lines:
1) Frame the decision and the system
2) Build the conceptual model:
- Entities and states: What things move or change (patients, packets, orders, molecules)? What states can they occupy (waiting, in service, recovered, failed)?
- Processes and rules: How do states change—by scheduled events (arrivals, service completions), by interactions (agent meetings), or by continuous flows (stock-and-flow)?
- Time treatment: Decide if you advance time by events (jump to next event; classic discrete-event), by fixed steps (∆t; good for differential equations or when events are dense), or hybrid (event-driven with sub-stepping for continuous parts).
- Resources and constraints: Servers, machines, beds, CPU cores, budgets. Specify capacities, calendars, and priorities.
- Randomness: Where uncertainty lives (interarrival times, service durations, agent behaviors, failure times) and how you’ll model it (distributions, correlations).
- Policies and controls: Schedules, routing rules, admission limits, pricing, triage—these become the levers for scenarios.
3) Input modeling: turn messy data into usable distributions
4) Choose a paradigm
- Discrete-event simulation (DES): Best for queuing, logistics, manufacturing, networks. You maintain an event calendar, a future event list, and process handlers that update state and schedule downstream events. You observe sharp changes at discrete times (arrivals, completions).
- Agent-based simulation (ABS): Best when micro-level behavior and interaction drive macro outcomes (epidemics, social systems, markets). Each agent carries rules; the system emerges from interactions. Often run with small time steps or event hooks.
- System dynamics (SD): Best for feedback-heavy, aggregate systems (stocks, flows, delays). You write coupled differential or difference equations and integrate in time.
- Monte Carlo (MC): Best for pure uncertainty propagation: sample inputs, evaluate a deterministic model, aggregate outputs. Often baked into other paradigms. (A small sketch of this appears after these steps.)
5) Implement a Minimal Version
6) Verification: prove you built the model you meant to build
7) Validation: prove the model is a good stand-in for reality
8) Experiment design: plan runs that answer the question
9) Randomness, variance, and confidence
10) Sensitivity, uncertainty, and robustness
11) Prepare results for presentation
12) Reproducibility and governance
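Tying together the Monte Carlo paradigm mentioned above with the later steps on randomness and confidence, here is a small hypothetical Python sketch: sample uncertain inputs, push them through a deterministic model, and report a replication-based confidence interval. The cost model and all distributions are invented for illustration.

```python
import random
import statistics

# Monte Carlo uncertainty propagation for a made-up project-cost model.
# Inputs are uncertain (triangular, normal, uniform); the model itself is
# deterministic; independent replications give a confidence interval.

def project_cost(labor_hours: float, rate: float, materials: float) -> float:
    return labor_hours * rate + materials

def one_replication(rng: random.Random, n_samples: int = 10_000) -> float:
    """Estimate the mean cost by sampling the uncertain inputs."""
    total = 0.0
    for _ in range(n_samples):
        hours = rng.triangular(800, 1500, 1000)    # low, high, mode
        rate = rng.gauss(95, 10)                   # dollars per hour
        materials = rng.uniform(40_000, 60_000)
        total += project_cost(hours, rate, materials)
    return total / n_samples

rng = random.Random(3)
estimates = [one_replication(rng) for _ in range(30)]   # independent replications
mean = statistics.mean(estimates)
half_width = 1.96 * statistics.stdev(estimates) / (len(estimates) ** 0.5)
print(f"estimated mean cost: ${mean:,.0f} +/- ${half_width:,.0f} (approx. 95% CI)")
```

The replication structure is the important part: reporting a single run's output without any statement of variability skips steps 9 and 10 above.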
Mechanisms in Science
The act of identifying mechanistic cause and effect relations.
Scientific Explanation
What does it mean when someone says "Science has explained something"?
Scientific Reduction
What is the role of reduction in explanation? When larger systems are explained in terms of something more fundamental, what exactly are we accomplishing?
Scientific Objectivity
Whether or not the practice of science can be truly objective is not the purpose of this section. Rather, I'd like to discuss various methods it uses to maintain alignment with that ideal, and how built-in mechanisms self-correct when deviations from the ideal occur.
Scientific Discovery
What constitutes a scientific discovery? With the constant barrage of "new discoveries" flooding the media, how do we make sense of what is going on?
Scientific Underdetermination, Fallibilism, and Uncertainty
Many people are interested in science because of a "debunking" attitude rather than one of genuine curiosity. Because of this, they expect some sort of infallibility, so they can use conclusions from science to beat their opponent in an argument. They then become frustrated when they find out something they held to be true should have been tentative at best. Science is always in the process of revising itself and really is not concerned with the platonic conception of truth.
Scientism
Can someone dogmatically adhere to science at the expense of other methods of inquiry? We will look at Six Signs of Scientism to answer this question. Susan Haack's central objective in Six Signs of Scientism is to demarcate scientism from legitimate science; not in the naïve sense of drawing a boundary around science proper (a move she explicitly critiques as itself scientistic), but rather to expose a cluster of intellectual temptations in contemporary culture that inflate the authority, epistemic reach, or rhetorical prestige of science beyond its proper bounds. Early on, she defines scientism as "a kind of over-enthusiastic and uncritically deferential attitude toward science, an inability to see or an unwillingness to acknowledge its fallibility, its limitations, and its potential dangers" (Haack, p. 76). Her task is not to attack science (she explicitly defends its value) but to identify when admiration becomes uncritical worship. She warns that scientism is not a single thesis but a family of symptoms: subtle, culturally normalized behaviors and linguistic patterns. Hence: six "signs." Each sign, she notes, is not definitive alone, but diagnostic when seen together.
Sign 1: Honorific use of "Science"
Sign 2: Using Scientific Trappings Decoratively
Sign 3: Obsession with Demarcation
Sign 4: The Quest for "The Scientific Method"
- There is no one “scientific method” used by all and only scientists (p. 89).
- This does not make scientific discovery miraculous; it makes it continuous with ordinary empirical inquiry, but amplified, refined, and disciplined by the distinctive helps science has developed (pp. 88–89).
Sign 5: Looking to Science for Answers Beyond its Scope
- Policy masquerading as science. Science can tell us the likely consequences of damming a river, changing tax codes, or modifying school governance; it cannot by itself adjudicate whether the ends are desirable, or what trade-offs are morally justifiable (p. 90). When researchers’ ethical/political convictions tilt their evidential judgment, or when normative conclusions are presented “as if they were scientific results,” we have scientism (p. 90).
- Empirical surveys as ethical verdicts. Haack analyzes a Lancet article advocating the “complete lives” principle for allocating scarce medical resources — giving priority to adolescents/young adults — and notes the authors cite surveys of what “most people think” as support (pp. 90–91). She underscores the category mistake: “most people think x is morally best” ≠ “x is morally best” (p. 91). Substituting measured preference for justification is a hallmark of scientism.
Sign 6: Denigrating the Non-Scientific
- Within inquiry: It is scientistic to assume empirical legal studies are inherently superior to interpretive legal scholarship (p. 92). Different questions demand different cognitive virtues and methods.
- Beyond inquiry: It is scientistic to assume that art, literature, music, craftsmanship, and tradition have lesser value simply because they are not avenues of empirical discovery (pp. 92–93).
Summarizing Scientism
Conclusion: The Richard Feynman Lectures
I've always found Feynman to be an excellent science communicator. So to wrap this up, lets have a look at his famous lecture on the scientific method:
Richard Feynman on Scientific Method (1964) | After noise reduction
Now, I'm going to discuss how we would look for a new law. In general, we look for a new law by the following process. First, we guess it.
Then we-- well, don't laugh. That's really true. Then we compute the consequences of the guess to see what-- if this is right, if this law that we guessed is right, we see what it would imply, and then we compare those computation results to nature. Or we say, compare to experiment or experience. Compare it directly with observation to see if it works.
If it disagrees with experiment, it's wrong. And that simple statement is the key to science. It doesn't make a difference how beautiful your guess is. It doesn't make a difference how smart you are, who made the guess, or what his name is, if it disagrees with experiment, it's wrong. That's all there is to it.
It's therefore not unscientific to take a guess, although many people who are not in science think it is. For instance, I had a conversation about flying saucers some years ago with laymen.
Because I'm scientific. I know all about flying saucers. So I said, I don't think there are flying saucers. So the other-- my antagonist said, is it impossible that there are flying saucers? Can you prove that it's impossible? I said, no, I can't prove it's impossible. It's just very unlikely.
That, they say, you are very unscientific. If you can't prove an impossible, then why-- how can you say it's likely, that it's unlikely? Well, that's the way-- that it is scientific. It is scientific only to say what's more likely and less likely, and not to be proving all the time possible and impossible.
To define what I mean, I finally said to them, listen, I mean that from my knowledge of the world that I see around me, I think that it is much more likely that the reports of flying saucers are the result of the known irrational characteristics of terrestrial intelligence, rather than the unknown rational effort of extraterrestrial intelligence.
It's just more likely, that's all. And it's a good guess. And we always try to guess the most likely explanation, keeping in the back of the mind the fact that if it doesn't work, then we must discuss the other possibilities.
There was, for instance, for a while a phenomenon we called superconductivity. It still is a phenomenon, which is that metals conduct electricity without resistance at low temperatures. And it was not at first obvious that this was a consequence of the known laws with these particles. But it turns out that it has been thought through carefully enough, and it's seen, in fact, to be a consequence of known laws.
There are other phenomena, such as extrasensory perception, which cannot be explained by this known knowledge of physics here. And it is interesting, however, that that phenomenon has not been well established, and--
--that we cannot guarantee that it's there. So if it could be demonstrated, of course, that would prove that the physics is incomplete. And therefore, it's extremely interesting to physicists whether it's right or wrong. And many, many experiments exist which show it doesn't work.
The same goes for astrological influences. If that were true, that the stars could affect the day that it was good to go to the dentist, then-- it's in America we have that kind of astrology-- then it would be wrong. The physics theory would be wrong, because there's no mechanism understandable in principle from these things that would make it go. And that's the reason that there's some skepticism among scientists with regard to those ideas.
Now, you see, of course, that with this method, we can disprove any definite theory. We have a definite theory, a real guess from which you can really compute consequences which could be compared to experiment, and in principle, we can get rid of any theory. You can always prove any definite theory wrong. Notice, however, we never prove it right.
Suppose that you invent a good guess, calculate the consequences, and discover every consequence that you calculate agrees with the experiment. Your theory is then right? No, it is simply not proved wrong. Because in the future, there could be a wider range of experiments, you compute a wider range of consequences, and you may discover, then, that the thing is wrong.
That's why laws like Newton's laws for the motion of planets last such a long time. He guessed the law of gravitation, calculated all kinds of consequences for the solar system and so on, compared them to experiment, and it took several hundred years before the slight error of the motion of Mercury was developed.
During all that time, the theory had failed to be proved wrong, and could be taken to be temporarily right. But it can never be proved right, because tomorrow's experiment may succeed in proving what you thought was right wrong. So we never are right. We can only be sure we're wrong. However, it's rather remarkable that we can last so long. I mean, have some idea which will last so long.
I must also point out to you that you cannot prove a vague theory wrong. If the guess that you make is poorly expressed and rather vague, and the method that you used for figuring out the consequences is rather a little vague-- you're not sure. You say, I think everything is because it's all due to [INAUDIBLE], and [INAUDIBLE] do this and that, more or less. So I can sort of explain how this works. Then you see that that theory is good, because it can't be proved wrong.
If the process of computing the consequences is indefinite, then with a little skill, any experimental result can be made to look like an expected consequence. You're probably familiar with that in other fields. For example, A hates his mother. The reason is, of course, because she didn't caress him or love him enough when he was a child. Actually, if you investigate, you find out that as a matter of fact, she did love him very much, and everything was all right. Well, then, it's because she was overindulgent when he was [INAUDIBLE]. So by having a vague theory--
--it's possible to get either result.
Now, wait. Now, the cure for this one is the following. It would be possible to say, if it were possible to state ahead of time how much love is not enough, and how much love is overindulgent exactly, and then there would be a perfectly legitimate theory against which you can make tests. It is usually said when this is pointed out how much love is and so on, oh, you're dealing with psychological matters, and things can't be defined so precisely. Yes, but then you can't claim to know anything about it.
Now, I want to concentrate for now on-- because I'm a theoretical physicist, and more delighted with this end of the problem-- as to what goes-- how do you make the guesses? Now, it's strictly, as I said before, not of any importance where the guess comes from. It's only important that it should agree with experiment, and that it should be as definite as possible.
But, you say, that is very simple. We set up a machine-- a great computing machine-- which has a random wheel in it that makes a succession of guesses. And each time it guesses a hypothesis about how nature should work, computes immediately the consequences, and makes a comparison to a list of experimental results it has at the other end. In other words, guessing is a dumb man's job.
Actually, it's quite the opposite, and I will try to explain why.
The first problem is how to start. You see how I start? I'll start with all the known principles. But the principles that are all known are inconsistent with each other, so something has to be removed. So we get a lot of letters from people. We're always getting letters from people who are insisting that we ought to make holes in our guesses as follows. You see, you make a hole to make room for a new guess.
Somebody says, do you know, people always say space is continuous. But how do you know when you get to a small enough dimension that there really are enough points in between? It isn't just a lot of dots separated by a little distance.
Or they say, you know those quantum mechanical amplitudes you told me about? They're so complicated and absurd. What makes you think those are right? Maybe they aren't right. I get a lot of letters with such content.
But I must say that such remarks are perfectly obvious and are perfectly clear to anybody who is working on this problem, and it doesn't do any good to point this out. The problem is not what might be wrong, but what might be substituted precisely in place of it. If you say anything precise, for example, in the case of a continuous space: suppose the precise proposition is that space really consists of a series of dots only, and the space between them doesn't mean anything, and the dots are in a cubic array, then we can prove that immediately is wrong. That doesn't work.
You see, the problem is not to make-- to change, or to say something might be wrong, but to replace it by something. And that is not so easy. As soon as any real definite idea is substituted, it becomes almost immediately apparent that it doesn't work.
Secondly, there's an infinite number of possibilities of these simple types. It's something like this. You're sitting, working very hard. You work for a long time trying to open a safe. And some Joe comes along who hasn't-- doesn't know anything about what you're doing or anything, except that you're trying to open a safe.
He says, you know, why don't you try the combination 10, 20, 30? Because you're busy. You tried a lot of things. Maybe you already tried 10, 20, 30. Maybe you know that the middle number is already 32 and not 20. Maybe you know that as a matter of fact, this is a five-digit combination. There we go.
So these letters don't do any good, and so please don't send me any letters trying to tell me how the thing is going to work. I read them to make sure--
--that I haven't already thought of that. But it takes too long to answer them, because they're usually in the class, try 10, 20, 30.