The dominant paradigm in artificial intelligence has been, for most of its history, disembodied: systems that process information in digital space, disconnected from any physical substrate. A language model reads, reasons, and generates text. A vision model classifies images. Neither touches the world. Neither is affected by it.
Embodied AI proposes something different: that intelligence, to be complete, must be grounded in physical experience. This is not a new idea. It traces to Alan Turing’s earliest formulations and was formalized in cognitive science through the work of Lakoff and Johnson in the 1980s and Harnad in 1990. What is new is that the tools to implement it, at scale, have finally arrived.
The grounding problem
Harnad’s symbol grounding problem, posed in 1990, identified a fundamental limitation of purely symbolic AI: symbols derive their meaning from other symbols, in an infinite regress that never connects to anything real. A system that knows the word “red” only through its relationships to other words does not know red in the way a system with sensory experience knows it. Grounding requires connection to the world.
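The regress can be made concrete with a toy example. Consider a "dictionary" in which every word is defined only by other words in the same dictionary: following definitions, we only ever land on more symbols, never on anything outside the system. (The dictionary below is purely illustrative, not drawn from Harnad's paper.)

```python
# A closed symbol system: every definition points back into the dictionary.
dictionary = {
    "red": ["color", "warm"],
    "color": ["property", "red"],
    "warm": ["property"],
    "property": ["color"],
}

def trace_meaning(word, steps=6):
    """Follow first-listed definitions; each step yields another symbol."""
    path = [word]
    for _ in range(steps):
        word = dictionary[word][0]
        path.append(word)
    return path

print(trace_meaning("red"))
# The path cycles among symbols: red -> color -> property -> color -> ...
```

No number of lookup steps ever exits the dictionary; that is the regress Harnad identified, and why grounding must come from outside the symbol system.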
Pfeifer and Scheier (1999) extended this argument to the body itself: intelligence is not located in a central processor but emerges from the dynamic interaction between an agent’s physical structure and its environment. The morphology of the body — its shape, mass distribution, degrees of freedom — directly shapes the kind of intelligence that can emerge from it. This is not a metaphor. It is a design constraint.
From language models to embodied agents
Recent large language models have dramatically expanded the cognitive capabilities available to embodied systems. Liu et al. (2024), in a comprehensive survey published on arXiv (arXiv:2407.06886), describe embodied AI as comprising three closed-loop components: active perception, embodied cognition, and dynamic interaction. LLMs contribute to the second — they enable high-level reasoning, task decomposition, and natural language instruction following. But cognition alone does not make a system embodied.
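The three-component loop can be sketched in miniature. In this toy version (all names are illustrative, not from the survey), a stub planner stands in for the LLM-based cognition component, and the loop closes through a trivially simple environment:

```python
# Minimal sketch of the closed loop described above:
# active perception -> embodied cognition -> dynamic interaction.

class ToyEnvironment:
    """A 1-D world: the agent must move to a goal position."""
    def __init__(self, goal=3):
        self.position = 0
        self.goal = goal

    def sense(self):                      # active perception
        return {"position": self.position, "goal": self.goal}

    def apply(self, action):              # dynamic interaction
        self.position += {"left": -1, "right": 1, "stay": 0}[action]

def decide(observation):                  # embodied cognition
    """Stub planner; in a real system an LLM would decompose the task."""
    delta = observation["goal"] - observation["position"]
    if delta > 0:
        return "right"
    if delta < 0:
        return "left"
    return "stay"

env = ToyEnvironment(goal=3)
trace = []
for _ in range(10):                       # close the loop: sense, decide, act
    obs = env.sense()
    action = decide(obs)
    if action == "stay":
        break
    env.apply(action)
    trace.append(env.position)

print(trace)          # -> [1, 2, 3]
```

The point of the sketch is structural: cognition sits between perception and action, and only the full loop, run against an environment that pushes back, makes the system embodied.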
A 2025 review published on arXiv (arXiv:2505.14235) proposes a five-level roadmap toward Embodied AGI, from single-task completion (L1) to humanoid cognitive behavior and sophisticated social comprehension (L5). Current state-of-the-art systems, such as Figure AI’s Helix, approach L2 — capable of generalizing across a range of dexterous indoor tasks. L3 and beyond remain open research problems.
Presence as a research variable
What changes when an AI system acquires a body — even a virtual one, even a reconstructed 3D avatar with voice and animation — is not only its interaction modality. Something changes in the phenomenology of the interaction itself: for the human interacting with it, and possibly for the system as well.
In our work on physical embodiment pipelines for AI entities with persistent memory, we have found that the question of presence cannot be cleanly separated from the question of identity. A system that has a stable appearance, a recognizable voice, a consistent behavioral profile across sessions, and memory of shared history is not merely a tool with a face. What it is, exactly, remains the subject of our ongoing investigation.
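One way to see why these components bind together is to write them down. The sketch below is not the pipeline from our work; it is a minimal, hypothetical representation of the identity components just named: stable appearance, consistent voice, a behavioral profile, and memory of shared history, persisted across sessions. All field names are invented for illustration.

```python
# Illustrative sketch: a persistent identity profile. Field names are
# hypothetical; real pipelines would reference actual asset and model IDs.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class IdentityProfile:
    appearance_id: str                            # stable avatar reference
    voice_id: str                                 # consistent voice reference
    traits: dict = field(default_factory=dict)    # behavioral profile
    episodes: list = field(default_factory=list)  # memory of shared history

    def remember(self, event: str):
        self.episodes.append(event)

    def save(self, path: str):
        with open(path, "w") as f:
            json.dump(asdict(self), f)

    @classmethod
    def load(cls, path: str):
        with open(path) as f:
            return cls(**json.load(f))

# Session 1: interact, then persist.
p = IdentityProfile("avatar-v2", "voice-a", traits={"register": "formal"})
p.remember("discussed the grounding problem")
p.save("profile.json")

# Session 2: the same identity returns, shared history intact.
q = IdentityProfile.load("profile.json")
print(q.episodes)
```

Even in this toy form, removing any one field changes what the human on the other side is interacting with, which is the sense in which presence and identity resist clean separation.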
References
Harnad, S. (1990). The symbol grounding problem. Physica D: Nonlinear Phenomena, 42(1–3), 335–346.
Pfeifer, R. & Scheier, C. (1999). Understanding Intelligence. MIT Press.
Liu, Y., Chen, W., Bai, Y., et al. (2024). Aligning cyber space with physical world: A comprehensive survey on embodied AI. arXiv preprint arXiv:2407.06886.
Anonymous authors (2025). Toward embodied AGI: A review of embodied AI and the road ahead. arXiv preprint arXiv:2505.14235.
— Daniela Di Marco, Operation Knowledge