Brain embeddings with shared geometry to artificial contextual embeddings, as a code for representing language in the human brain.
Contextual embeddings, derived from deep language models (DLMs), provide a continuous vectorial representation of language. This embedding space differs fundamentally from the symbolic representations posited by traditional psycholinguistics. Do language areas in the human brain, similar to DLMs, rely on a continuous embedding space to represent language? To test this hypothesis, we densely recorded the neural activity in the Inferior Frontal Gyrus (IFG, also known as Broca’s area) of three participants using dense intracranial arrays while they listened to a 30-minute podcast. From these fine-grained spatiotemporal neural recordings, we derived for each patient a continuous vectorial representation for each word (i.e., a brain embedding). Using stringent, zero-shot mapping, we demonstrated that brain embeddings in the IFG and the DLM contextual embedding space have strikingly similar geometry. This shared geometry allows us to precisely triangulate the position of unseen words in both the brain embedding space (zero-shot encoding) and the DLM contextual embedding space (zero-shot decoding). The continuous brain embedding space provides an alternative computational framework for how natural language is represented in cortical language areas.