Introduction
1.1 Inspiration from Neuroscience
Neurons
McCulloch-Pitts Neuron Model
$$ n_{i}(t+1) = \Theta\left[\sum_{j}w_{ij}n_{j}(t)-\mu_{i}\right],\quad \Theta(x) = \begin{cases} 1, & x \geq 0 \\ 0, & \text{otherwise} \end{cases} $$
$w_{ij}$: the strength of the synapse connecting neuron $j$ to neuron $i$.
graded response: respond to their input in a continuous way
asynchronous updating
$$ n_{i}:= g\left( \sum_{j}w_{ij}n_{j}-\mu_{i} \right) $$
$n_{i}$: the state or activation of unit $i$.
$g(x)$: (nonlinear) activation function, gain function, transfer function, or squashing function
neuron $\rightarrow$ unit (processing element, neurode); synapse $\rightarrow$ weight.
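The threshold update rule above is easy to sketch in code. A minimal illustration, assuming NumPy; the weights and thresholds below are illustrative values, not taken from the text:

```python
import numpy as np

def mcculloch_pitts_step(n, w, mu):
    """One synchronous update: n_i <- Theta(sum_j w_ij * n_j - mu_i)."""
    return (w @ n - mu >= 0).astype(int)

# Toy network: unit 0 fires iff unit 1 was active (weight 1, threshold 0.5);
# unit 1 receives no input and has an unreachable threshold, so it goes quiet.
w = np.array([[0.0, 1.0],
              [0.0, 0.0]])
mu = np.array([0.5, 1.0])
print(mcculloch_pitts_step(np.array([0, 1]), w, mu))  # [1 0]
```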
Parallel Processing
The sheer number of units provides tolerance to errors and noise.
1.2 History
associative content-addressable memory (CAM)
The Hopfield Model
2.1 The Associative Memory Problem
Store a set of $p$ patterns $\xi_{i}^{\mu}$ in such a way that when presented with a new pattern $\zeta_{i}$, the network responds by producing whichever one of the stored patterns most closely resembles $\zeta_{i}$.
Here $\mu=1,2,\cdots,p$ indexes the patterns and $i=1,2,\cdots,N$ indexes the neurons.
Hamming distance:
$$ \sum_{i}[\xi_{i}^{\mu}(1-\zeta_{i}) + (1-\xi_{i}^{\mu})\zeta_{i}] $$
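For 0/1 patterns this formula simply counts disagreeing bits. A minimal check (the pattern values are illustrative):

```python
def hamming(xi, zeta):
    """Hamming distance between 0/1 patterns: sum_i [xi_i(1-zeta_i) + (1-xi_i)zeta_i]."""
    return sum(x * (1 - z) + (1 - x) * z for x, z in zip(xi, zeta))

print(hamming([1, 0, 1, 1], [1, 1, 0, 1]))  # 2: the patterns disagree on bits 1 and 2
```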
The problem: how do we construct $w_{ij}$ so that when a test pattern $n_{i}=\zeta_{i}$ is presented, the network outputs the stored pattern $n_{i}=\xi_{i}^{\mu_{0}}$, where $\mu_{0}$ minimizes the Hamming distance between $\zeta$ and $\xi^{\mu}$?
The $p$ stored patterns act as attractors in the configuration space.
2.2 The Model
Let $n_{i}=0,1$ record the state of unit $i$, and introduce $S_{i} = 2n_{i}-1 = \pm 1$ to indicate whether the unit is firing.
Take the activation function to be the sign function:
$$ S_{i} := \text{sgn}\left(\sum_{j}w_{ij}S_{j}-\theta_{i}\right),\quad \text{sgn}(x) = \begin{cases} 1, & x \geq 0 \\ -1, & x < 0 \end{cases} $$
$\theta_{i}$: threshold, with $\theta_{i} = 2\mu_{i}-\sum_{j}w_{ij}$ in terms of the earlier notation. For simplicity we may set $\theta_{i}=0$, so that
$$ S_{i} := \text{sgn}\left(\sum_{j}w_{ij}S_{j}\right) $$
Updating can be asynchronous or synchronous. Asynchronous updating can be realized in two ways:

- pick a node $i$ at random and update it;
- let each node decide independently, with some probability per unit time, whether to update.

The two methods are equivalent.
Asynchronous updating terminates when no $S_{i}$ can change. This places two requirements on $w_{ij}$: 1. the patterns are stored as stable fixed points; 2. the fixed points are robust against small perturbations.
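The asynchronous dynamics can be sketched as follows, assuming NumPy. The symmetric random weight matrix here is an illustrative choice (symmetry guarantees convergence, as the energy-function argument later shows); the terminal state is checked to be a fixed point:

```python
import numpy as np

def async_run(S, w, rng, max_sweeps=100):
    """Asynchronous dynamics: repeatedly pick units in random order and set
    S_i := sgn(sum_j w_ij S_j); stop when a full sweep changes no unit."""
    S = S.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(S)):
            s_new = 1 if w[i] @ S >= 0 else -1
            if s_new != S[i]:
                S[i], changed = s_new, True
        if not changed:
            break                          # fixed point: no S_i can change
    return S

rng = np.random.default_rng(0)
N = 8
A = rng.normal(size=(N, N))
w = (A + A.T) / 2                          # symmetric weights ensure convergence
np.fill_diagonal(w, 0.0)
S_final = async_run(rng.choice([-1, 1], size=N), w, rng)
# The terminal state satisfies sgn(sum_j w_ij S_j) == S_i for every unit.
print(all((1 if w[i] @ S_final >= 0 else -1) == S_final[i] for i in range(N)))  # True
```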
One Pattern
To store a single pattern $\{\xi_{i}\}$, the stability condition is
$$ \text{sgn}\left(\sum_{j}^{N}w_{ij}\xi_{j}\right) = \xi_{i},\quad \forall i $$
Take
$$ w_{ij} = \frac{1}{N}\xi_{i}\xi_{j} $$
Since $\xi_{j}^{2} = 1$, it is easy to verify that this storage prescription works.
Because the update uses the sign function, even if the initial state deviates from $\xi_{i}$, the network still converges to $\xi_{i}$ as long as the field $h_{i}=\sum_{j}w_{ij}S_{j}$ keeps the sign of $\xi_{i}$ (i.e. fewer than half the bits are wrong). This is what makes $\xi$ an attractor.
Under this activation function, attractors come in $\pm$ pairs: the reversed pattern $-\xi$ is stored along with $\xi$.
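A quick numerical check of the one-pattern prescription (sizes and seed are illustrative, assuming NumPy): corrupt fewer than half the bits and a single update restores the pattern, and the reversed pattern is equally stable.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
xi = rng.choice([-1, 1], size=N)
w = np.outer(xi, xi) / N                  # w_ij = xi_i xi_j / N

def step(S):
    return np.where(w @ S >= 0, 1, -1)    # S_i := sgn(sum_j w_ij S_j)

S = xi.copy()
S[:20] *= -1                              # corrupt 20 of 50 bits (< half)
print(np.array_equal(step(S), xi))        # True: one update restores the pattern
print(np.array_equal(step(-xi), -xi))     # True: -xi is the paired attractor
```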
Many Patterns
Hebb's rule: to store $p$ patterns in total, labeled by $\mu$, generalize the definition to
$$ w_{ij} = \frac{1}{N}\sum_{\mu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\mu} $$
The stability condition for the $\nu$-th pattern is $\text{sgn}(h_{i}^{\nu}) = \xi_{i}^{\nu}$, where the local field is
$$ h_{i}^{\nu} \equiv \sum_{j}w_{ij}\xi_{j}^{\nu} = \frac{1}{N}\sum_{j}\sum_{\mu}\xi_{i}^{\mu}\xi_{j}^{\mu}\xi_{j}^{\nu} = \xi_{i}^{\nu} + \frac{1}{N}\sum_{j}\sum_{\mu \neq \nu}\xi_{i}^{\mu}\xi_{j}^{\mu}\xi_{j}^{\nu} $$
When the second term (the crosstalk term) is small in magnitude (less than 1), or even zero, the stability condition is satisfied.
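A numerical sketch of this stability condition (sizes and seed illustrative, assuming NumPy): with $p \ll N$ the crosstalk stays small and every stored pattern is a fixed point of the update rule.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 200, 5
xi = rng.choice([-1, 1], size=(p, N))     # p random patterns; rows are xi^mu
w = (xi.T @ xi) / N                       # Hebb: w_ij = (1/N) sum_mu xi_i^mu xi_j^mu

# Check sgn(h_i^nu) == xi_i^nu for every pattern nu and every unit i.
stable = all(
    np.array_equal(np.where(w @ xi[nu] >= 0, 1, -1), xi[nu]) for nu in range(p)
)
print(stable)
```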
Storage Capacity
Multiplying the second term by $-\xi_{i}^{\nu}$ defines the quantity
$$ C_{i}^{\nu} \equiv -\xi_{i}^{\nu}\frac{1}{N}\sum_{j}\sum_{\mu \neq \nu}\xi_{i}^{\mu}\xi_{j}^{\mu}\xi_{j}^{\nu} $$
Then
$$ h_{i}^{\nu} = \xi_{i}^{\nu} - \frac{C_{i}^{\nu}}{\xi_{i}^{\nu}} = \xi_{i}^{\nu}(1 - C_{i}^{\nu}) $$
since $(\xi_{i}^{\nu})^{2}=1$. Hence when $C_{i}^{\nu}>1$, $h_{i}^{\nu}$ has the opposite sign to $\xi_{i}^{\nu}$ and bit $i$ flips.
Assume the stored bits $\xi_{j}^{\mu} = \pm 1$ are independent and equally probable, and introduce $X_{j\mu}=(\xi_{i}^{\nu}\xi_{i}^{\mu})\xi_{j}^{\nu}\xi_{j}^{\mu}$; then $E[X_{j\mu}]=0$ and $\text{Var}(X_{j\mu})=1$.
Therefore
$$ \begin{aligned} E[C_{i}^{\nu}] &=0 \\ \text{Var}(C_{i}^{\nu}) &= \frac{1}{N^{2}}\sum_{j}\sum_{\mu \neq \nu}\text{Var}[X_{j\mu}] = \frac{1}{N^{2}}\sum_{j}\sum_{\mu \neq \nu}1 = \frac{1}{N^{2}}N(p-1) \end{aligned} $$
so approximately $C_{i}^{\nu}\sim \mathcal{N}\left(0,\frac{p}{N}\right)$.
Define $P_{\text{error}} = \text{Prob}(C_{i}^{\nu}>1)$ as the bit-error probability; the largest $p$ with $P_{\text{error}} <\epsilon$, denoted $p_{\text{max}}$, is called the storage capacity.
$$ P_{\text{error}} = \frac{1}{\sqrt{2\pi}\sigma}\int_{1}^{\infty}e^{-x^{2}/2\sigma^{2}}\,\mathrm{d}x = \frac{1}{2}\left[1-\text{erf}\left(1/\sqrt{2\sigma^{2}}\right)\right] = \frac{1}{2}\left[1-\text{erf}\left(\sqrt{N/2p}\right)\right],\quad \sigma^{2}=\frac{p}{N} $$
error function: $$ \text{erf}(x) = \frac{2}{\sqrt{\pi}}\int_{0}^{x}e^{-u^{2}}\mathrm{d}u $$
Using the asymptotic expansion
$$ 1-\text{erf}(x) \sim \frac{e^{-x^{2}}}{\sqrt{\pi}\,x},\quad x\to\infty $$
we approximate
$$ \log{P_{\text{error}}} \approx -\log{2} - N/2p - \frac{1}{2}\log{\pi} - \frac{1}{2}\log{(N/2p)} $$
The error criterion $P_{\text{error}}<0.01/N$ then reduces, for large $N$, to $N/2p>\log N$, giving $p_{\text{max}} = N/(2\log N)$.
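The Gaussian estimate of $P_{\text{error}}$ above can be checked against direct sampling of the crosstalk sum (sizes, seed, and trial count are illustrative, assuming NumPy):

```python
import math
import numpy as np

def p_error_gaussian(N, p):
    """P(C > 1) for C ~ N(0, p/N): (1/2)[1 - erf(sqrt(N/2p))]."""
    return 0.5 * (1.0 - math.erf(math.sqrt(N / (2.0 * p))))

def p_error_sampled(N, p, trials=200000, seed=3):
    """Sample C = (1/N) * (sum of N*(p-1) independent +/-1 terms) directly."""
    rng = np.random.default_rng(seed)
    n_terms = N * (p - 1)
    k = rng.binomial(n_terms, 0.5, size=trials)   # number of +1 terms
    C = (2 * k - n_terms) / N
    return float(np.mean(C > 1))

N, p = 100, 30
print(p_error_gaussian(N, p), p_error_sampled(N, p))  # both near 0.03
```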
The Energy Function
Define the energy function
$$ H = -\frac{1}{2}\sum_{i}\sum_{j}w_{ij}S_{i}S_{j} $$
This is a Lyapunov function for the dynamics: it never increases.
An energy function exists when the weights are symmetric ($w_{ij}=w_{ji}$). In that case we may equivalently write, summing over distinct pairs,
$$ H = C - \sum_{(ij)}w_{ij}S_{i}S_{j} $$
Try updating node $i$ once:
$$ S_{i}^{\prime} = \text{sgn}\left(\sum_{j}w_{ij}S_{j}\right) $$
If $S_{i}^{\prime} = S_{i}$, the energy is unchanged; if $S_{i}^{\prime} = -S_{i}$, the energy changes by
$$ H^{\prime} - H = -\sum_{j\neq i}w_{ij}S_{i}^{\prime}S_{j} + \sum_{j\neq i}w_{ij}S_{i}S_{j} = 2S_{i}\sum_{j\neq i}w_{ij}S_{j} = 2S_{i}\sum_{j}w_{ij}S_{j} - 2w_{ii} $$
The first term is non-positive, because a flip means the field $\sum_{j}w_{ij}S_{j}$ had the opposite sign to the old $S_{i}$; and $w_{ii} = p/N>0$ under Hebb's rule, so $\Delta H<0$.
Setting $w_{ii}=0$ instead removes the self-coupling, which could otherwise stabilize both $S_{i}=+1$ and $S_{i}=-1$ at the same site.
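The Lyapunov property can be verified numerically (network size and seed illustrative, assuming NumPy): track $H$ along asynchronous updates and confirm it never increases.

```python
import numpy as np

def energy(S, w):
    return -0.5 * S @ w @ S               # H = -(1/2) sum_ij w_ij S_i S_j

rng = np.random.default_rng(4)
N, p = 60, 4
xi = rng.choice([-1, 1], size=(p, N))
w = (xi.T @ xi) / N                       # Hebb weights
np.fill_diagonal(w, 0.0)                  # w_ii = 0

S = rng.choice([-1, 1], size=N)
energies = [energy(S, w)]
for _ in range(3):                        # a few asynchronous sweeps
    for i in rng.permutation(N):
        S[i] = 1 if w[i] @ S >= 0 else -1
        energies.append(energy(S, w))
# H is non-increasing along the trajectory.
print(all(b <= a + 1e-12 for a, b in zip(energies, energies[1:])))  # True
```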
Starting from an Energy Function
Single pattern:
$$ H = -\frac{1}{2N}\left(\sum_{i}S_{i}\xi_{i}\right)^{2} $$
Many patterns:
$$ \begin{aligned} H &= - \frac{1}{2N}\sum_{\mu=1}^{p}\left(\sum_{i}S_{i}\xi_{i}^{\mu}\right)^{2} \\ &= -\frac{1}{2N}\sum_{\mu=1}^{p}\left(\sum_{i}S_{i}\xi_{i}^{\mu}\right)\left(\sum_{j}S_{j}\xi_{j}^{\mu}\right) = -\frac{1}{2}\sum_{ij}\left(\frac{1}{N}\sum_{\mu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\mu}\right)S_{i}S_{j} \end{aligned} $$
With Hebb's rule this is exactly the energy function defined above. This is the idea of finding $w_{ij}$ from the minimum-energy viewpoint.
Spurious States
Hebb's rule is designed for the retrieval states (the stored patterns).
Spurious (mixture) states are combinations of an odd number of retrieval states, e.g.
$$ \xi_{i}^{\text{mix}} = \text{sgn}\left(\pm \xi_{i}^{\mu_{1}} \pm \xi_{i}^{\mu_{2}} \pm \xi_{i}^{\mu_{3}}\right) $$
Taking $(+,+,+)$ as an example, the input field is
$$ h_{i}^{\text{mix}} = \frac{1}{N}\sum_{j\mu}\xi_{i}^{\mu}\xi_{j}^{\mu}\xi_{j}^{\text{mix}} = \frac{1}{2}\xi_{i}^{\mu_{1}} + \frac{1}{2}\xi_{i}^{\mu_{2}} + \frac{1}{2}\xi_{i}^{\mu_{3}} + \text{cross-terms} $$
At large $p$, spin glass states appear as well. These are likewise local minima of the energy function, and must later be identified and excluded by various methods.
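The mixture-state field can be checked numerically (sizes and seed illustrative, assuming NumPy): projecting $h^{\text{mix}}$ onto each of the three mixed patterns gives overlaps near $1/2$, as the formula predicts.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 2000
xi = rng.choice([-1, 1], size=(3, N))     # three stored patterns
w = (xi.T @ xi) / N                       # Hebb weights

xi_mix = np.sign(xi[0] + xi[1] + xi[2])   # (+,+,+) mixture; the sum is never 0
h_mix = w @ xi_mix
overlaps = [float(np.mean(xi[mu] * h_mix)) for mu in range(3)]
print(np.round(overlaps, 2))              # each entry close to 0.5
```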
2.3 Statistical Mechanics of Magnetic Systems
Magnetic field (local field):
$$ h_{i} = \sum_{j}w_{ij}S_{j} + h^{\text{ext}},\quad w_{ij} = w_{ji} $$
Energy:
$$ H = -\frac{1}{2}\sum_{ij}w_{ij}S_{i}S_{j} - h^{\text{ext}}\sum_{i}S_{i} $$
External magnetic field $\leftrightarrow$ threshold.
Finite Temperature Dynamics
Temperature: thermal fluctuations cause spins to flip.
Glauber dynamics:
$$ S_{i} := \begin{cases} +1 & \text{with probability } g(h_{i}) \\ -1 & \text{with probability } 1 - g(h_{i}) \end{cases},\quad g(h) = f_{\beta}(h)\equiv \frac{1}{1+e^{-2\beta h}},\quad \beta = \frac{1}{k_{B}T} $$
Note the property $1-f_{\beta}(h) = f_{\beta}(-h)$,
which lets us write the symmetric form
$$ \text{Prob}(S_{i}=\pm 1) = f_{\beta}(\pm h_{i}) = \frac{1}{1+e^{\mp 2\beta h_{i}}} $$
As $T\rightarrow 0$ ($\beta\rightarrow\infty$), $f_{\beta}(h)$ reduces to the step function $\Theta(h)$, so $S_{i}:=\text{sgn}(h_{i})$;
as $T\rightarrow \infty$ ($\beta\rightarrow 0$), $f_{\beta}(h)\rightarrow 1/2$, so $S_{i}$ takes the values $\pm 1$ with probability $1/2$ each, i.e. completely at random.
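Glauber dynamics for a single spin can be sketched as follows (field value and seed illustrative, assuming NumPy), checking the two temperature limits directly:

```python
import math
import numpy as np

def glauber_sample(h, beta, rng, n=1):
    """Draw S = +1 with probability f_beta(h) = 1/(1+exp(-2*beta*h)), else S = -1."""
    p_up = 1.0 / (1.0 + math.exp(-2.0 * beta * h))
    return np.where(rng.random(n) < p_up, 1, -1)

rng = np.random.default_rng(6)
# T -> 0 (beta large): the rule becomes deterministic, S := sgn(h).
print(glauber_sample(0.5, beta=50.0, rng=rng, n=5))       # [1 1 1 1 1]
# T -> infinity (beta = 0): fair coin flips, sample mean near 0.
s = glauber_sample(0.5, beta=0.0, rng=rng, n=10000)
print(abs(s.mean()) < 0.05)                               # True
```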
A Single Spin in Equilibrium
$$ \langle S\rangle = \text{Prob}(+1)\cdot (+1) + \text{Prob}(-1)\cdot (-1) = \frac{1}{1+e^{-2\beta h}} - \frac{1}{1+e^{2\beta h}} = \tanh(\beta h) $$
hyperbolic tangent function: $$ \tanh(x) = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}} $$
Mean Field Theory
Average field and average spin:
$$ \begin{aligned} \langle h_{i}\rangle &= \sum_{j}w_{ij}\langle S_{j}\rangle + h^{\text{ext}}\\ \langle S_{i}\rangle &= \tanh\left(\beta\langle h_{i}\rangle\right) = \tanh\left(\beta\sum_{j}w_{ij}\langle S_{j}\rangle + \beta h^{\text{ext}}\right) \end{aligned} $$
The Ferromagnet
In the ferromagnetic case $w_{ij}>0$ and all spins share the same average, $\langle S_{i}\rangle = \langle S\rangle$. Take
$$ w_{ij} = \frac{J}{N} $$
$$ \langle S\rangle = \tanh(\beta J\langle S\rangle) $$
At high temperature there is a single paramagnetic solution $\langle S\rangle = 0$; at low temperature ($T<T_{c}=J$) two additional ferromagnetic solutions appear (spontaneous magnetization).
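The self-consistency equation $\langle S\rangle = \tanh(\beta J\langle S\rangle)$ can be solved by fixed-point iteration. A minimal sketch (parameters illustrative, units with $k_{B}=1$):

```python
import math

def magnetization(beta, J=1.0, m0=0.9, iters=500):
    """Solve m = tanh(beta*J*m) by fixed-point iteration from a positive start."""
    m = m0
    for _ in range(iters):
        m = math.tanh(beta * J * m)
    return m

# Above T_c = J (beta*J < 1) only the paramagnetic solution m = 0 survives;
# below T_c (beta*J > 1) spontaneous magnetization appears.
print(magnetization(beta=0.5))   # ~0
print(magnetization(beta=2.0))   # ~0.957
```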
2.4 Stochastic Networks
Let the probabilities of a neuron firing/resting be
$$ \text{Prob}(S_{i} = \pm 1) = f_{\beta}(\pm h_{i}) = \frac{1}{1+e^{\mp 2\beta h_{i}}} $$
$f_{\beta}(h)$: the logistic function.
Introduce a pseudo-temperature $T$ via $\beta\equiv \frac{1}{T}$.
In the low-temperature limit the sigmoid reduces to a step function, and the stochastic rule reduces to the deterministic McCulloch-Pitts rule.
A symmetric $w_{ij}$ guarantees that equilibrium is reached, so time-independent ensemble averages exist.
Mean Field Theory
Consider the regime $p\ll N$. The mean field equations are:
$$ \langle S_{i}\rangle = \tanh{\left(\frac{\beta}{N}\sum_{j\mu}\xi_{i}^{\mu}\xi_{j}^{\mu}\langle S_{j}\rangle\right)}. $$
Assume $\langle S_{i}\rangle = m\xi_{i}^{\nu}$ (a macroscopic overlap with a single pattern):
$$ m\xi_{i}^{\nu} = \tanh{\left(\frac{\beta}{N}\sum_{j\mu}\xi_{i}^{\mu}\xi_{j}^{\mu}m\xi_{j}^{\nu}\right)}\Rightarrow m\xi_{i}^{\nu} = \tanh{(\beta m\xi_{i}^{\nu})}\Rightarrow m = \tanh{(\beta m)} $$
The retrieval states are stable for $T<T_{c} = 1$.
$$ m = \frac{\langle S_{i}\rangle}{\xi_{i}^{\nu}} = \text{Prob}(\text{bit }i\text{ is correct}) - \text{Prob}(\text{bit }i\text{ is incorrect}) $$
$$ \langle N_{\text{correct}}\rangle = \frac{1}{2}N(1+m) $$
The device is not perfect: reversed states and mixture states also exist.
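Below $T_{c}$, solving $m = \tanh(\beta m)$ by the same fixed-point iteration gives the retrieval overlap and hence the mean fraction of correct bits. A sketch (temperatures illustrative):

```python
import math

def retrieval_overlap(T, m0=0.9, iters=500):
    """Solve m = tanh(m/T) by fixed-point iteration (retrieval overlap)."""
    m = m0
    for _ in range(iters):
        m = math.tanh(m / T)
    return m

def frac_correct(T):
    """<N_correct>/N = (1 + m)/2 at temperature T."""
    return 0.5 * (1.0 + retrieval_overlap(T))

print(frac_correct(0.5))   # ~0.98: below T_c = 1 most bits are recalled correctly
print(frac_correct(2.0))   # 0.5: above T_c the overlap vanishes (chance level)
```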
2.5 Capacity of the Stochastic Network
Now consider $p\sim N$.
Define the load parameter $\alpha = \frac{p}{N}$.
Keep the crosstalk terms, and define the overlaps
$$ m_{\nu} = \frac{1}{N}\sum_{i}\xi_{i}^{\nu}\langle S_{i}\rangle $$
$$ r = \frac{1}{\alpha}\sum_{\nu\neq 1}m_{\nu}^{2} $$
Here $r$ is the mean-square overlap of the system configuration with the non-condensed patterns.
Capacity Calculation
The mean field equations then give:
$$ \begin{aligned} m_{\nu} &= \frac{1}{N}\sum_{i}\xi_{i}^{\nu}\tanh{\left(\beta\sum_{\mu}\xi_{i}^{\mu}m_{\mu}\right)} = \frac{1}{N}\sum_{i}\xi_{i}^{\nu}\xi_{i}^{1}\tanh{\left[ \beta\left( m_{1} + \xi_{i}^{\nu}\xi_{i}^{1}m_{\nu} + \sum_{\mu\neq 1,\nu}\xi_{i}^{\mu}\xi_{i}^{1}m_{\mu} \right) \right]}\\ &= \frac{1}{N}\sum_{i}\xi_{i}^{\nu}\xi_{i}^{1}\tanh{\left[ \beta\left( m_{1} + \sum_{\mu\neq 1,\nu}\xi_{i}^{\mu}\xi_{i}^{1}m_{\mu} \right) \right]} + \frac{\beta}{N}\sum_{i}\left\{ 1-\tanh^{2}\left[ \beta\left( m_{1} + \sum_{\mu\neq 1,\nu}\xi_{i}^{\mu}\xi_{i}^{1}m_{\mu} \right) \right] \right\}m_{\nu} \end{aligned} $$
where the second line follows from expanding to first order in the small quantity $m_{\nu}$, using $$ \frac{\mathrm{d}}{\mathrm{d}x}\tanh(x) = 1-\tanh^{2}(x) $$
By the central limit theorem, inside the site average $N^{-1}\sum_{i}$ the quantity $\sum_{\mu\neq 1,\nu}\xi_{i}^{\mu}\xi_{i}^{1}m_{\mu}$ can be treated as Gaussian noise with mean zero and variance $\alpha r$.