Neural networks have been an inspiring source of physics problems for generations. The current revolution in artificial intelligence (LeCun et al., 2015; Minaee et al., 2024) has roots in classical work that appeared a bit over sixty years ago in Reviews of Modern Physics (Block, 1962; Block et al., 1962; Rosenblatt, 1961). The explicit effort to build network models grounded in statistical mechanics began in the 1970s (Cooper, 1973; Little, 1974; Little and Shaw, 1975, 1978) and received important stimuli in the early 1980s (Hopfield, 1982, 1984), making connections to then new ideas about spin glasses (Amit, 1989; Mézard et al., 1987). In the context of these models one could use statistical mechanics to address not just the dynamics of a network, but the way in which it learns from experience (Levin et al., 1990; Watkin et al., 1993). There is a path from this early work to current efforts of the theoretical physics community to understand the recent successes of machine learning (Carleo et al., 2019; Mehta et al., 2019; Roberts, 2021; Roberts and Yaida, 2022). In §II we provide a brief guide to this rich history, emphasizing points which seem especially relevant for recent developments connecting theory and experiment.
In the long history of physicists’ engagement with neural networks, it must be admitted that the search for tractable models often loosened the connection of theory to experiments on real brains. This problem became more urgent as methods became available to monitor, simultaneously, the electrical activity of tens, hundreds, and even thousands of neurons while animals engage in reasonably natural behaviors (§III). If we imagine a statistical mechanics for neural networks, these tools give us access to something like a Monte Carlo simulation of the microscopic degrees of freedom. This explosion of data calls out for new methods of analysis, and creates new opportunities for theory/experiment interaction.
Roughly twenty years ago it was suggested that maximum entropy methods could provide a very direct bridge from the new data on large numbers of neurons to explicit statistical physics models for these networks (§IV). In the simplest version of this approach, measurements of the mean activity and pairwise correlations among neurons yield an Ising spin glass model for patterns of activity in the network. Importantly, all the couplings in the Ising model are determined by the measured correlations, and one can proceed to make parameter-free predictions for higher order properties of the network. The surprise was that these predictions, at least in some cases, are extraordinarily successful (§§IV.C and V).
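The construction just described can be made concrete in a few lines. Below is a minimal sketch, using hypothetical synthetic data in place of a real recording and exact enumeration of states (feasible only for small numbers of neurons): fields and couplings are adjusted until the model reproduces the measured means and pairwise correlations, and the fitted model then makes a parameter-free prediction for a higher-order statistic, here the distribution of the number of simultaneously active neurons.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# Hypothetical data: binarized activity ("spike"/"silence") of N neurons
# in T time bins. A real recording would replace this synthetic array.
N, T = 5, 2000
data = (rng.random((N, T)) < 0.2).astype(float)

emp_mean = data.mean(axis=1)      # measured <sigma_i>
emp_corr = (data @ data.T) / T    # measured <sigma_i sigma_j>

# All 2^N activity patterns -- exact enumeration works for small N.
states = np.array(list(product([0, 1], repeat=N)), dtype=float)

def model_stats(h, J):
    """Exact moments of P(sigma) ~ exp(sum_i h_i s_i + sum_{i<j} J_ij s_i s_j)."""
    energy = states @ h + 0.5 * np.einsum("si,ij,sj->s", states, J, states)
    p = np.exp(energy - energy.max())
    p /= p.sum()
    return p @ states, (states.T * p) @ states, p

# Fit fields h and (symmetric) couplings J by gradient ascent on the
# log-likelihood; the gradient is the mismatch of measured vs model moments.
h, J = np.zeros(N), np.zeros((N, N))
for _ in range(5000):
    m, c, p = model_stats(h, J)
    h += 0.3 * (emp_mean - m)
    dJ = 0.3 * (emp_corr - c)
    np.fill_diagonal(dJ, 0.0)     # sigma_i^2 = sigma_i, so J_ii lives in h_i
    J += dJ

# Parameter-free prediction: distribution of the number K of neurons
# active in the same time bin, compared with the data.
m, c, p = model_stats(h, J)
k_of_state = states.sum(axis=1).astype(int)
pK_model = np.bincount(k_of_state, weights=p, minlength=N + 1)
pK_data = np.bincount(data.sum(axis=0).astype(int), minlength=N + 1) / T
fit_err = np.max(np.abs(emp_mean - m))
tv_dist = 0.5 * np.abs(pK_model - pK_data).sum()
```

For these independently generated data the fitted couplings are small and the predicted distribution of summed activity tracks the empirical one; the interesting physics in §§IV.C and V arises when real correlations make the two calculations differ from the independent model yet still agree with each other.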
The phenomenological success of the maximum entropy approach raised several questions. Should we expect this success to generalize or was there something special about the first examples? Does success tell us something about the underlying network? If the models are so accurate, perhaps we should take them seriously as statistical physics problems: where are real networks in the phase diagram of possible networks (§VI)? Can these models be given different interpretations, e.g. in terms of a smaller number of “latent variables” that are encoded by the network?
The relatively simple statistical physics models constructed via maximum entropy are in some cases more successful than complex models motivated by biological details. Why should simple models work? In condensed matter physics we often describe macroscopic, emergent phenomena using models that are much simpler than the underlying microscopic mechanisms. This works not because we are lucky but because the renormalization group (RG) tells us that in many cases there is only a small number of relevant operators, so that models simplify as we restrict our attention to longer length scales. Inspired by these ideas, there have been efforts to explicitly coarse–grain the patterns of activity in very large networks (§VII). The very first such efforts revealed surprisingly precise scaling behaviors, in some cases with exponents that are reproducible to the second decimal place. These initial results have now been confirmed in other systems.
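One widely used version of this coarse-graining can be sketched in a few lines. The example below is illustrative only, run on synthetic data with a shared latent input standing in for a real recording: at each step the two most correlated units are paired and their activities summed, and one tracks how the variance of the coarse-grained variables grows with cluster size. For independent units the variance grows linearly in the cluster size K (exponent 1); correlations push the effective exponent above 1, and in real networks the precision of such exponents is what suggests a nontrivial fixed point.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for recorded activity: N units partly driven by a
# shared latent signal, so correlations survive coarse-graining
# (hypothetical data, not any particular experiment).
N, T = 64, 4000
latent = rng.normal(size=T)
x = 0.5 * latent + rng.normal(size=(N, T))

def coarse_grain(x):
    """One coarse-graining step: greedily pair the most correlated
    remaining units and sum the activity of each pair."""
    c = np.corrcoef(x)
    np.fill_diagonal(c, -np.inf)          # never pair a unit with itself
    free = set(range(len(x)))
    pairs = []
    for flat in np.argsort(c, axis=None)[::-1]:
        i, j = divmod(flat, len(x))
        if i in free and j in free:
            free -= {i, j}
            pairs.append((i, j))
        if not free:
            break
    return np.array([x[i] + x[j] for i, j in pairs])

cluster_size, variance = [], []
while len(x) >= 2:
    cluster_size.append(N // len(x))      # K original units per cluster
    variance.append(x.var(axis=1).mean())
    x = coarse_grain(x)

# Effective scaling exponent: slope of log(variance) vs log(K).
slope = np.polyfit(np.log(cluster_size), np.log(variance), 1)[0]
```

With the shared latent input the fitted slope sits well above 1, whereas purely independent noise would give a slope of 1; this contrast is the kind of diagnostic the questions below probe, since a slope above 1 alone does not establish a fixed point.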
As with maximum entropy methods, the success of coarse–graining in uncovering interesting collective behaviors of real neurons raises several questions. The observation of scaling suggests that the dynamics of these networks is controlled by some nontrivial fixed point of the RG. But are these phenomenological analyses sufficient to identify fixed point behaviors in cases that we understand? Could the observed scaling behaviors emerge in some other way? Are these behaviors universal?
When physicists first wrote down statistical mechanics models for neural networks, it was not clear if these models should be taken as metaphors or if they should be taken seriously as theories of real brains. If forced to choose, most people would have voted for metaphors, since real brains surely are too complicated to be captured by the physicists’ drive for simplification. While it emphatically is too soon to claim that we have a theory of the brain, the progress that we review here makes clear that we can have the precise quantitative connections between theory and experiment that we have in the rest of physics. As experiments on the physics of living systems improve, we should ask more of our theories.
Finally, in case thinking about the brain is not sufficient motivation, networks of neurons provide a prototype of living systems with many degrees of freedom (Appendix A). Even a single protein molecule typically is composed of more than one hundred amino acids, and the structures and functions of these molecules emerge from interactions among these many more microscopic elements. At the next scale up, membrane patches and protein droplets self–organize in ways that most likely reflect phase separation. The identities and internal states of cells are determined by the expression levels of large numbers of genes that form an interacting regulatory network. In developing embryos, and in tissues more generally, the movements of individual cells organize into macroscopic flows. In populations of bacteria, swarms of insects, schools of fish, and flocks of birds we see collective movements and decision making. In all these examples—and, of course, in networks of neurons—what we recognize as the functional behavior of living systems is a macroscopic behavior that emerges from interactions of many components on a smaller scale.
In the inanimate world, statistical mechanics provides a powerful and predictive framework within which to understand emergent phenomena. It is an old dream of the physics community that we could have a statistical mechanics of emergent phenomena in the living world as well. We encourage the reader to think of what we review here as progress toward realizing this dream.