Why Large Language Models are Not and Will Not Become Artificial General Intelligence

I was originally intending to write a multi-post analysis on the topic of LLMs and AGI, specifically to show the fundamental problems of conflating the two or thinking it likely that the former will lead to the latter. However, it dawned on me that it would probably be more useful to simply list and describe the resources that have led me to this conclusion rather than regurgitate the arguments in a watered-down form. Therefore, this post will contain many references to what I see as expositions that point out some of the fundamental constraints of LLMs, and of AGI more broadly.

Artificial General Intelligence is the idea that an Intelligent Agent (usually assumed to be manifested in the form of a computer system based on the notion of a Turing Machine) can achieve human-like performance on any intellectual task. An agent is any sort of abstract actor that has the capability to act in a given environment; the concepts of agency and intentionality are deep philosophical topics that have been discussed for thousands of years, so I am not going to dive too deeply into them. In a technical sense of the term, agency refers to the intrinsic capacity to set goals, to form criteria for evaluating progress toward those goals, to construct a mental model of an environment, and to adapt that model in accordance with feedback from the environment. This is by no means comprehensive; different disciplines characterize agents differently conditional on the problem domain, such as Economics, where Agents are idealized buyers and sellers who are always trying to optimize some objective function, or Agent Based Models, where "Agents" can be thought of as autonomous objects interacting with others based on an exhaustive pre-set list of update rules. There is also the notion of a Software Agent, multi-agent systems, and many more. Within the context of AGI, an agent can be considered to have attained "general intelligence" if it is autonomous and can surpass humans on human tasks.

It should be obvious that this definition alone is almost entirely operational; it says nothing about whether these systems will develop consciousness or a "mind" (the Strong AI Hypothesis). From an engineering perspective, philosophical questions like these tend to be out of scope for actually developing the system, although they do arise when these systems seem to surpass expectations. Part of the reason for writing this post is to convince the reader that LLMs are not Strong AI in any sense of the term and that it is kind of silly to think statistical models of language are somehow fundamental to consciousness. I also don't really care to comment on the X-Risk (Existential Risk from Artificial General Intelligence) "debate" because it will become clear, based on the references I will provide, that AGI doomers fundamentally misunderstand and mistakenly anthropomorphize these systems in a way that leads to hysteria. The true risks come from injecting probabilistic sub-systems into increasingly interconnected and complex higher-level systems, which can lead to Cascading Failures; and since this environment would be characterized by a form of Knightian Uncertainty, we are now dealing with a type of Epistemic Risk that is fundamentally incalculable. In other words, X-Risk is really just another type of all-too-human problem. Anthropomorphizing autonomous systems really just distracts us from the real issue.

It should also be obvious that AGI presupposes a specific, very operational definition of Intelligence that isn't necessarily aligned with how it's conceptualized in Cognitive Science, Psychology, or the field of Animal Intelligence. AGI researchers seem to assume that intelligence can emerge outside of an evolutionary and embodied history, something I find to be highly unlikely.

Before we dive into some of the critiques, it is important to understand what Generative Pre-trained Transformers (the most ubiquitous LLM architecture at the moment) are actually doing under the hood. Stephen Wolfram wrote an article, What Is ChatGPT Doing … and Why Does It Work?, explaining the fundamentals of next-word prediction, concept embeddings, training, validation, and test data, and the basics of neural networks. You should at least skim that article before diving any further into this one. Also, familiarity with different Theories of Mind will be useful because philosophy of mind seems to recur in certain criticisms (see the list of Concepts in Philosophy of Mind).
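
To make the next-token idea concrete before getting to the critiques, here is a minimal toy sketch of the sampling step, not any particular model's implementation; the vocabulary, scores, and temperature below are invented for illustration. The model assigns a score to every token in its vocabulary, a softmax turns those scores into probabilities, and the next token is sampled from that distribution; generation is just this step repeated.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution over the vocabulary."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary and made-up scores a model might assign after the
# prompt "The cat sat on the" (all numbers here are invented).
vocab = ["mat", "dog", "moon", "chair"]
logits = [4.2, 1.1, 0.3, 2.7]

probs = softmax(logits, temperature=0.8)
next_token = random.choices(vocab, weights=probs, k=1)[0]

print({w: round(p, 3) for w, p in zip(vocab, probs)})
print("sampled next token:", next_token)
```

Nothing in this loop refers to what the tokens mean; that observation is what several of the critiques below turn on.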

Okay so here we go; below you can find the resources and some commentary.

  1. Dancing with pixies: strong artificial intelligence and panpsychism.
    • Abstract: The argument presented in this paper is not a direct attack or defence of the Chinese Room Argument (CRA), but relates to the premise at its heart, that syntax is not sufficient for semantics, via the closely associated propositions that semantics is not intrinsic to syntax and that syntax is not intrinsic to physics. However, in contrast to the CRA’s critique of the link between syntax and semantics, this paper will explore the associated link between syntax and physics. The main argument presented here is not significantly original – it is a simple reflection upon that originally given by Hilary Putnam (Putnam 1988) and criticised by David Chalmers and others: instead of seeking to justify Putnam’s claim that, “every open system implements every Finite State Automaton (FSA)”, and hence that psychological states of the brain cannot be functional states of a computer, I will seek to establish the weaker result that, over a finite time window every open system implements the trace of a particular FSA Q, as it executes program (p) on input (x). That this result leads to panpsychism is clear as, equating Q (p, x) to a specific Strong AI program that is claimed to instantiate phenomenal states as it executes, and following Putnam’s procedure, identical computational (and ex hypothesi phenomenal) states (ubiquitous little ‘pixies’) can be found in every open physical system.
    • My Commentary: Essentially the author is arguing against the Computational Theory of the Mind, and more broadly the Functionalist view of mind, claiming that if we consider the mind to be equivalent to brain function, and brain function to be equivalent to a Deterministic Finite Automaton implemented in the Turing computational model (fundamental to all digital systems), then we would have to accept Panpsychism, implying that there is "mind" in mundane objects such as rocks. His argument is a Reductio ad absurdum, showing that conceptualizing the mind as a finite state machine leads to unacceptable conclusions.
  2. “What's wrong with LLMs and what we should be building instead” - Tom Dietterich
    • Summary: In this talk, Thomas G. Dietterich discusses the fundamental problems with LLMs and potential solutions. He essentially recognizes the need for cognitive architectures and knowledge bases.
    • My Commentary: I think a cognitive approach different from Connectionism ought to be the framework for developing Artificial Intelligence. As mentioned above, LLMs are essentially massive statistical models that identify patterns within enormous data sets and use these patterns to predict the next likely token in a sequence. I do not think cognition and consciousness can be reduced to a statistical model. If we look at human intelligence, memory seems to be something fundamentally missing from language models. Human memory is distinct from database storage and ephemeral memory; it is largely constructed and associated with experiences in an embodied setting. If we take seriously the work of Elizabeth Loftus, C. R. Gallistel, or other memory researchers (or frameworks like the Atkinson-Shiffrin model), it becomes difficult to see how an LLM's token context can be considered genuine memory. This is just one example of cognitive functionality that neural networks do not emulate. Further, there are specialized sub-modules within our brain that seem to be coupled with other sub-systems. Artificial deep neural networks do not take Functional Integration into account and do not distinguish between different types of nodes and connections within the system. ANNs abstract away relevant ontological differences between different types of nodes (and their specializations) within the whole system.
  3. Gut microbial communities modulating brain development and function
    • Abstract: Mammalian brain development is initiated in utero and internal and external environmental signals can affect this process all the way until adulthood. Recent observations suggest that one such external cue is the indigenous microbiota which has been shown to affect developmental programming of the brain. This may have consequences for brain maturation and function that impact on cognitive functions later in life. This review discusses these recent findings from a developmental perspective.
    • My Commentary: This is very new research, but I've been highly motivated by recent discoveries from the NIH Human Microbiome Project. This stems from my position that cognition evolves from an environment and is embodied. Essentially, a given perturbation of human microbial composition can affect cognitive development. This would strongly imply that cognitive processes are coupled to environmental dynamics and other biological factors. Conceptualizing AGI or consciousness as distinct from an evolutionary history, sitting within some Platonic realm, ignores the situatedness of cognitive development. From the Human Microbiome Project, we know that certain microbes aid cellular functionality, meaning that our species has coevolved with organisms external to us. I think that understanding human cognition requires a deeper understanding of the microbes that support the flourishing of life. What I am arguing is that our bodies are in a Symbiotic relationship with microbial life; cognitive function and degradation must be highly dependent on maintaining this relationship. Several of the entries that follow further motivate this conclusion.
  4. Grow Smart and Die Young: Why Did Cephalopods Evolve Intelligence?
    • Abstract: The most influential views on the evolution of intelligence suggest that intelligence coevolves with slow life history in response to socioecological challenges; however, these conclusions are primarily focused on large-brained vertebrates. Cephalopod mollusks strongly challenge the most accepted hypotheses on the evolution of intelligence: cephalopods evolved complex brains and high behavioral flexibility together with fast life histories and in simple social environments. Surprisingly, the evolution of intelligence in cephalopods has been largely overlooked, thus leaving this evolutionary conundrum unsolved. Discussing differences and similarities between cephalopods and large-brained vertebrates, may shed light on fundamental aspects of the evolution of intelligence.
    • My Commentary: As I mentioned above, AGI researchers tend to think of Intelligence in a very operational sense, mainly because of the need to formalize and abstract an algorithm that can run on a digital computer. While there are fields of research dedicated to Biologically Inspired Computing that recognize alternative conceptions of intelligence, such as Swarm Intelligence, treating cognition Platonically, as something distinct from evolutionary history, seems far-fetched to me, as I alluded to in number 3. There is also a tendency to focus on humans as the paradigmatic example of intelligence, which can obfuscate the fact that there are many other types of intelligence among mammals. Cephalopods are a unique example in that their brain is literally distributed across their body. There is evidence suggesting that the evolution of cephalopod intelligence was a response to predation. I think human intelligence probably emerged in a similar way, and that you simply cannot decouple "the algorithm" from its path dependency.
  5. Free Agents: How Evolution Gave Us Free Will, by Kevin Mitchell (Trinity College Dublin)
    • Abstract: An evolutionary case for the existence of free will. Scientists are learning more and more about how brain activity controls behavior and how neural circuits weigh alternatives and initiate actions. As we probe ever deeper into the mechanics of decision making, many conclude that agency, or free will, is an illusion. In Free Agents, leading neuroscientist Kevin Mitchell presents a wealth of evidence to the contrary, arguing that we are not mere machines responding to physical forces but agents acting with purpose. Traversing billions of years of evolution, Mitchell tells the remarkable story of how living beings capable of choice emerged from lifeless matter. He explains how the emergence of nervous systems provided a means to learn about the world, granting sentient animals the capacity to model, predict, and simulate. Mitchell reveals how these faculties reached their peak in humans with our abilities to imagine and to introspect, to reason in the moment, and to shape our possible futures through the exercise of our individual agency. Mitchell's argument has important implications: for how we understand decision making, for how our individual agency can be enhanced or infringed, for how we think about collective agency in the face of global crises, and for how we consider the limitations and future of artificial intelligence. An astonishing journey of discovery, Free Agents offers a new framework for understanding how, across a billion years of Earth history, life evolved the power to choose and why this matters.
    • My Commentary: One of the first concepts you encounter with AGI is the idea of "Autonomous Agency". My basic contention is that we refer to these systems as "agents", but strictly speaking they do not have any agency, because agency is something that emerges from an evolutionary context. Kevin Mitchell's book is not explicitly attempting to undermine AGI research, but I think it is relevant to the question of whether agency is something purely computational.
  6. The Evolution of Agency: Behavioral Organization from Lizards to Humans
    • Abstract: A leading developmental psychologist proposes an evolutionary pathway to human psychological agency. Nature cannot build organisms biologically prepared for every contingency they might possibly encounter. Instead, Nature builds some organisms to function as feedback control systems that pursue goals, make informed behavioral decisions about how best to pursue those goals in the current situation, and then monitor behavioral execution for effectiveness. Nature builds psychological agents. In a bold new theoretical proposal, Michael Tomasello advances a typology of the main forms of psychological agency that emerged on the evolutionary pathway to human beings. Tomasello outlines four main types of psychological agency and describes them in evolutionary order of emergence. First was the goal-directed agency of ancient vertebrates, then came the intentional agency of ancient mammals, followed by the rational agency of ancient great apes, ending finally in the socially normative agency of ancient humans. Each new form of psychological organization represented increased complexity in the planning, decision-making, and executive control of behavior. Each also led to new types of experience of the environment and, in some cases, of the organism's own psychological functioning, leading ultimately to humans' experience of an objective and normative world that governs all of their thoughts and actions. Together, these proposals constitute a new theoretical framework that both broadens and deepens current approaches in evolutionary psychology.
    • My Comments: This book is similar to Kevin Mitchell's in that it postulates agency developing through evolutionary pressures. Tomasello outlines how the need for social cooperation and coordination led to evolutionary forces selecting for feedback systems that can interact intelligently with their environment to achieve meta-goals beyond survival. Mitchell also describes the origin of self-preserving feedback systems leading to agency. Again, this is not a direct subversion of AGI research, but I think it is relevant given the examples I listed above concerning the microbiome and cephalopods. It is also relevant because it raises a fundamental question: can intelligence and agency emerge in a vacuum? Much of learning is social; it seems a bit odd to me that AGI could simply come about absent interaction with other intelligent agents.
  7. Neural Networks are not Reasoning Systems
  8. Artificial Intelligence is stupid and causal reasoning won't fix it
    • Abstract: Artificial Neural Networks have reached Grandmaster and even super-human performance across a variety of games: from those involving perfect-information (such as Go) to those involving imperfect-information (such as Starcraft). Such technological developments from AI-labs have ushered concomitant applications across the world of business - where an AI brand tag is fast becoming ubiquitous. A corollary of such widespread commercial deployment is that when AI gets things wrong - an autonomous vehicle crashes; a chatbot exhibits racist behaviour; automated credit scoring processes discriminate on gender etc. - there are often significant financial, legal and brand consequences and the incident becomes major news. As Judea Pearl sees it, the underlying reason for such mistakes is that, 'all the impressive achievements of deep learning amount to just curve fitting'. The key, Judea Pearl suggests, is to replace reasoning by association with causal-reasoning - the ability to infer causes from observed phenomena. It is a point that was echoed by Gary Marcus and Ernest Davis in a recent piece for the New York Times: 'we need to stop building computer systems that merely get better and better at detecting statistical patterns in data sets - often using an approach known as Deep Learning - and start building computer systems that from the moment of their assembly innately grasp three basic concepts: time, space and causality'. In this paper, foregrounding what in 1949 Gilbert Ryle termed a category mistake, I will offer an alternative explanation for AI errors: it is not so much that AI machinery cannot grasp causality, but that AI machinery - qua computation - cannot understand anything at all.
    • My Comments: This is pretty much a similar argument to Bishop's argument in number 1. As we will see in the next critique, I agree with Bishop that AI systems don't actually understand anything.
  9. The Chinese Room Argument
    • Summary: Searle's thought experiment begins with this hypothetical premise: suppose that artificial intelligence research has succeeded in constructing a computer that behaves as if it understands Chinese. It takes Chinese characters as input and, by following the instructions of a computer program, produces other Chinese characters, which it presents as output. Suppose, says Searle, that this computer performs its task so convincingly that it comfortably passes the Turing test: it convinces a human Chinese speaker that the program is itself a live Chinese speaker. To all of the questions that the person asks, it makes appropriate responses, such that any Chinese speaker would be convinced that they are talking to another Chinese-speaking human being. The question Searle wants to answer is this: does the machine literally "understand" Chinese? Or is it merely simulating the ability to understand Chinese? Searle calls the first position "strong AI" and the latter "weak AI". He begins with three axioms:
      • (A1) "Programs are formal (syntactic)." A program uses syntax to manipulate symbols and pays no attention to the semantics of the symbols. It knows where to put the symbols and how to move them around, but it does not know what they stand for or what they mean. For the program, the symbols are just physical objects like any others.
      • (A2) "Minds have mental contents (semantics)." Unlike the symbols used by a program, our thoughts have meaning: they represent things and we know what it is they represent.
      • (A3) "Syntax by itself is neither constitutive of nor sufficient for semantics." This is what the Chinese room thought experiment is intended to prove: the Chinese room has syntax (because there is a man in there moving symbols around). The Chinese room has no semantics (because, according to Searle, there is no one or nothing in the room that understands what the symbols mean). Therefore, having syntax is not enough to generate semantics.
      • Searle posits that these lead directly to this conclusion: (C1) Programs are neither constitutive of nor sufficient for minds. This should follow without controversy from the first three: programs don't have semantics; programs have only syntax, and syntax is insufficient for semantics; every mind has semantics; therefore no programs are minds.
    • My Comments: Searle is distinguishing between semantics and syntax. Turing machines can only operate on arbitrarily specified syntactic input-output transformation rules. At the very core of a digital computer is machine language: the most fundamental set of instructions a computer can "understand", produced by compiling higher-level languages. The machine reads the instructions and implements a series of simple transformations on the data; it is a finite state machine, as we mentioned before. In the context of understanding Chinese: suppose I see a Chinese symbol and look it up in some registry containing a mapping to the corresponding English symbol. Would this imply that I actually understand Chinese? That is the point. At the core of every deep neural network is a finite state automaton operating on arbitrary input-output rules; it knows nothing about the semantic meaning of the symbols (a toy sketch of this kind of blind rule following appears after the list below). Simply put, I do not find any of the responses compelling and do not see how this problem can be overcome. This is not to say that AGI is impossible, though; perhaps another model of computation (such as analog computing) might be promising. The argument shows that Functionalism is false; this need not mean AGI is impossible, but for the time being we are completely reliant on this model of computation, so I don't see a major transition in the near future.
  10. Symbol Grounding Problem by Stevan Harnad
    • Abstract: There has been much discussion recently about the scope and limits of purely symbolic models of the mind and about the proper role of connectionism in cognitive modeling. This paper describes the symbol grounding problem: How can the semantic interpretation of a formal symbol system be made intrinsic to the system, rather than just parasitic on the meanings in our heads? How can the meanings of the meaningless symbol tokens, manipulated solely on the basis of their shapes, be grounded in anything but other meaningless symbols? The problem is analogous to trying to learn Chinese from a Chinese/Chinese dictionary alone. A candidate solution is sketched: Symbolic representations must be grounded bottom-up in nonsymbolic representations of two kinds: iconic representations, which are analogs of the proximal sensory projections of distal objects and events, and categorical representations, which are learned and innate feature-detectors that pick out the invariant features of object and event categories from their sensory projections. Elementary symbols are the names of these object and event categories, assigned on the basis of their categorical representations. Higher-order symbolic representations, grounded in these elementary symbols, consist of symbol strings describing category membership relations. Connectionism is one natural candidate for the mechanism that learns the invariant features underlying categorical representations, thereby connecting names to the proximal projections of the distal objects they stand for. In this way connectionism can be seen as a complementary component in a hybrid nonsymbolic/symbolic model of the mind, rather than a rival to purely symbolic modeling. Such a hybrid model would not have an autonomous symbolic module, however; the symbolic functions would emerge as an intrinsically dedicated symbol system as a consequence of the bottom-up grounding of categories' names in their sensory representations. Symbol manipulation would be governed not just by the arbitrary shapes of the symbol tokens, but by the nonarbitrary shapes of the icons and category invariants in which they are grounded.
    • My Comments: I see this as analogous to the Chinese Room Argument. It is very complicated when you think about it: how is it that our words (or any arbitrary symbols) become grounded in meaning? The notion of "reference" has deep philosophical roots going back to Frege and Peirce and, more recently, Saul Kripke. Is it possible for any symbol to have meaning on its own, or is it necessary for a mind to assign meaning within a communicative context? Is meaning inherently non-grounded? How is it that our minds can pick out referents associated with a symbol in the real world? The problem is profound because LLMs, and really any other statistical model of language such as Latent Dirichlet Allocation for topic modeling, do not actually understand the tokens they are processing. They are doing statistics on a corpus of text to find regularities for prediction. Fundamentally, the machine cannot understand what these symbols "mean" in a semantic sense. So is there an algorithm we could build that constructs semantic content? I see this problem as fundamentally unsolved, and unsolvable.
  11. Why Machines will Never Rule the World: Artificial Intelligence without Fear
    • Abstract: The book’s core argument is that an artificial intelligence that could equal or exceed human intelligence—sometimes called artificial general intelligence (AGI)—is for mathematical reasons impossible. It offers two specific reasons for this claim: Human intelligence is a capability of a complex dynamic system—the human brain and central nervous system. Systems of this sort cannot be modelled mathematically in a way that allows them to operate inside a computer. In supporting their claim, the authors, Jobst Landgrebe and Barry Smith, marshal evidence from mathematics, physics, computer science, philosophy, linguistics, and biology, setting up their book around three central questions: What are the essential marks of human intelligence? What is it that researchers try to do when they attempt to achieve "artificial intelligence" (AI)? And why, after more than 50 years, are our most common interactions with AI, for example with our bank’s computers, still so unsatisfactory?
    • My Comments: Landgrebe and Smith argue that the mathematics needed to model the human brain and nervous system, as a complex dynamic system, in a way that could run on a computer simply does not exist. This is an entire book, so I can't really summarize it quickly, but some of their arguments concern the limitations of stochastic modeling of complex systems, language modeling, and social interaction. Barry Smith also has excellent discussions of the notions of Dispositions and Capabilities, which are fundamental concepts for understanding autonomous systems.
  12. How Organisms Come to Know the World: Fundamental Limits on Artificial General Intelligence
    • Abstract: Artificial intelligence has made tremendous advances since its inception about seventy years ago. Self-driving cars, programs beating experts at complex games, and smart robots capable of assisting people that need care are just some among the successful examples of machine intelligence. This kind of progress might entice us to envision a society populated by autonomous robots capable of performing the same tasks humans do in the near future. This prospect seems limited only by the power and complexity of current computational devices, which is improving fast. However, there are several significant obstacles on this path. General intelligence involves situational reasoning, taking perspectives, choosing goals, and an ability to deal with ambiguous information. We observe that all of these characteristics are connected to the ability of identifying and exploiting new affordances—opportunities (or impediments) on the path of an agent to achieve its goals. A general example of an affordance is the use of an object in the hands of an agent. We show that it is impossible to predefine a list of such uses. Therefore, they cannot be treated algorithmically. This means that “AI agents” and organisms differ in their ability to leverage new affordances. Only organisms can do this. This implies that true AGI is not achievable in the current algorithmic frame of AI research. It also has important consequences for the theory of evolution. We argue that organismic agency is strictly required for truly open-ended evolution through radical emergence. We discuss the diverse ramifications of this argument, not only in AI research and evolution, but also for the philosophy of science.
  13. The Frame Problem
    • Summary: In the confined world of a robot, surroundings are not static. Many varying forces or actions can cause changes or modifications to them. The problem of getting a robot to adapt to these changes is the basis of the frame problem in artificial intelligence. Information in the knowledge base and the robot's conclusions combine to form the input for what the robot's subsequent action should be. A good selection from its facts can be made by discarding or ignoring irrelevant facts and getting rid of results that could have negative side effects (Dennett) (Fischler 304-5). A robot must introduce facts that are relevant to a particular moment. That is, a robot will examine its current situation, and then look up the facts that will be beneficial to choosing its subsequent action. The robot should also search for any changeable facts. It then examines these facts to determine if any of them have been changed during a previous examination. There are two basic types of change:
      • Relevant Change: inspect the changes made by an action
      • Irrelevant Change: do not inspect facts that are not related to the task at hand
      • Facts may be examined utilizing two levels:
        • Semantic Level: This level interprets what kind of information is being examined. Solutions should become obvious from assumptions about how an object should behave. There are proponents of a purely semantic approach who believe that correct information can be reached via meaning. However, this hypothesis has yet to be proven.
        • Syntactic Level: This level simply decides in which format the information should be inspected. That is, it forms solutions based on the structure and patterns of facts.
      • When inspecting the facts, various problems can occur:
        • Sometimes an implication can be missed.
        • Considering all facts and all their subsequent side effects is time-consuming.
        • Some facts are unnecessarily examined when they are unneeded.
    • My Comments: Other scholars have addressed this problem; for example, John Vervaeke has explored a solution he calls "Relevance Realization". This is a non-trivial topic; it became apparent to me when studying argumentation. How is it that we determine whether a counter-argument is relevant to a given topic? There really is no working definition of "relevance", and furthermore, we really don't understand the processes by which sentient agents explore a combinatorial search space and retain the salient features (what is relevant). I think the solution will inevitably require us to understand defeasible reasoning and heuristics at a deeper level, but no one has identified cognitive mechanisms that reduce the search space to something manageable. Consider the current state of LLMs: we need to train neural networks on as much information as possible for them to fine-tune their statistical generalization, yet humans have the ability to make intelligent choices without searching an exponentially large state space (a small illustration of this combinatorial blow-up appears after the list below). This is something I do not think can be addressed with current methods in AI research.
  14. Core knowledge: Elizabeth S. Spelke
    • Abstract: Complex cognitive skills such as reading and calculation and complex cognitive achievements such as formal science and mathematics may depend on a set of building block systems that emerge early in human ontogeny and phylogeny. These core knowledge systems show characteristic limits of domain and task specificity: Each serves to represent a particular class of entities for a particular set of purposes. By combining representations from these systems, however, human cognition may achieve extraordinary flexibility. Studies of cognition in human infants and in nonhuman primates therefore may contribute to understanding unique features of human knowledge.
    • My Comments: I think this is a deeply fundamental problem in AI research: human cognition seems to have innate functionalities that expedite the learning process. The mind is not a blank slate. We are potentially born with cognitive mechanisms, evolved over a long period of time, that allow us to perform specialized functions. This also suggests continuity between us and our closely related biological cousins, since many of these functions are shared among related mammals. It suggests, at the very least, that cognitive functions are the product of an evolutionary history. This is similar to Noam Chomsky's idea that we are born with an innate capacity for language: the innateness hypothesis holds that we are endowed with an innate blueprint rather than learning language purely inductively. I think this extends to Spelke's ideas as well; there are certain features that we have by default, so statistical models of the mind will fundamentally lack them.
    • 14b. Iris Berent: Language universals at birth
      • Abstract: The evolution of human languages is driven both by primitive biases present in the human sensorimotor systems and by cultural transmission among speakers. However, whether the design of the language faculty is further shaped by linguistic biological biases remains controversial. To address this question, we used near-infrared spectroscopy to examine whether the brain activity of neonates is sensitive to a putatively universal phonological constraint. Across languages, syllables like blif are preferred to both lbif and bdif. Newborn infants (2–5 d old) listening to these three types of syllables displayed distinct hemodynamic responses in temporal-perisylvian areas of their left hemisphere. Moreover, the oxyhemoglobin concentration changes elicited by a syllable type mirrored both the degree of its preference across languages and behavioral linguistic preferences documented experimentally in adulthood. These findings suggest that humans possess early, experience-independent, linguistic biases concerning syllable structure that shape language perception and acquisition.
      • My Comments: This is another example of what I am addressing. Assuming that neural networks can inductively come to understand language and semantics through brute force statistical learning seems contrary to what language researchers know about innate universal structures at birth. I don't think this should be controversial.
  15. Deep Learning: A Critical Appraisal by Gary Marcus
    • Summary: The author lists ten concerns for deep learning
      • Deep learning thus far is data hungry
      • Deep learning thus far is shallow and has limited capacity for transfer
      • Deep learning thus far has no natural way to deal with hierarchical structure
      • Deep learning thus far has struggled with open-ended inference
      • Deep learning thus far is not sufficiently transparent
      • Deep learning thus far has not been well integrated with prior knowledge
      • Deep learning thus far cannot inherently distinguish causation from correlation
      • Deep learning presumes a largely stable world, in ways that may be problematic
      • Deep learning thus far works well as an approximation, but its answers often cannot be fully trusted
      • Deep learning thus far is difficult to engineer with
    • My Comments: A lot of these are repeats, but the author sees them as critical impediments to reaching AGI through LLMs. Some of these I see as solvable, such as the ease of engineering (which recently has been awesome with all of the open-source tools) and the approximation issues. Some of the deeper critiques, such as the handling of hierarchical structure, seem to me fundamentally problematic for deep learning: because neural networks are essentially statistical models, there will always be room for error when it comes to basic concepts humans seem to grasp innately (such as recognizing the concept "on top of" without error). There are many other semantic relations, such as taxonomic relations, meronymies, contrastive relations, congruence relations, chaining relations, ranking, grading, degree relations, synonymy, homonymy, and hyponymy, or literally any other relation identified in lexical semantics, that require the ability to reason and use conceptual schemas.
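
As promised in my comments on number 9, here is a deliberately trivial sketch of the kind of blind rule following Searle has in mind. The rulebook below is invented for illustration; the program matches the shape of an input symbol against a table and returns whatever output the table prescribes, and no step in it depends on what any of the symbols mean.

```python
# A toy "Chinese Room": purely syntactic symbol manipulation.
# The rulebook is invented for illustration; the program would behave
# identically if the symbols were replaced by arbitrary squiggles.

RULEBOOK = {
    "你好": "你好，很高兴认识你",   # "hello" -> a canned greeting
    "谢谢": "不客气",              # "thank you" -> "you're welcome"
    "再见": "再见，祝你愉快",       # "goodbye" -> a canned farewell
}

def room(symbol: str) -> str:
    # Match the input's shape against the table and return the
    # prescribed output shape; no semantics enter at any point.
    return RULEBOOK.get(symbol, "对不起，我不明白")

if __name__ == "__main__":
    for message in ["你好", "谢谢", "天气怎么样"]:
        print(message, "->", room(message))
```

To an outside observer the exchange can look competent, yet the system understands nothing; the question is whether scaling the table up, or replacing it with a learned statistical model, changes that in kind rather than merely in degree.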
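
And to put a rough number on the search-space worry from number 13, here is a small illustration (the knowledge-base sizes are arbitrary): if an agent merely has to decide, for each fact it holds, whether that fact is relevant to the current action, the number of candidate relevance assignments doubles with every fact added.

```python
# Illustration of why exhaustively checking relevance does not scale.
# The knowledge-base sizes below are arbitrary.

for n_facts in [10, 20, 40, 80]:
    candidate_subsets = 2 ** n_facts   # each fact is either relevant or not
    print(f"{n_facts:3d} facts -> {candidate_subsets:,} possible relevance assignments")
```

Humans plainly do not evaluate anything like this space, which is the puzzle that relevance realization is meant to name.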

Okay, I think that should be enough for now. I will probably add to this post at some point, especially if I think these issues have been sufficiently addressed. My main thesis, I suppose, is that I see the current LLM hype as just that: hype. There are many amazing applications for GPT-4, some of which I am using, but the notion that we are anywhere close to AGI seems a bit absurd. However, I could be wrong. For all I know, there is some general learning algorithm that has yet to be released to the public. Maybe there is a single argument that can defeat all of the issues listed above.
