Introduction

1.1 Inspiration from Neuroscience

Neurons

McCulloch-Pitts Neuron Model

$$ n_{i}(t+1) = \Theta\left[\sum_{j}w_{ij}n_{j}(t)-\mu_{i}\right],\quad \Theta(x) = \begin{cases} 1, & x \geq 0 \\ 0, & \text{otherwise} \end{cases} $$

$w_{ij}$: the strength of the synapse connecting neuron $j$ to neuron $i$.

graded response: respond to their input in a continuous way

asynchronous updating

$$ n_{i}:= g\left( \sum_{j}w_{ij}n_{j}-\mu_{i} \right) $$

$n_{i}$: the state or activation of unit $i$.

$g(x)$: (nonlinear) activation function, gain function, transfer function, or squashing function

neuron $\rightarrow$ unit (processing element, neurode), synapse $\rightarrow$ weight.

Parallel Processing

The sheer number of units and connections makes the system tolerant of errors and noise.

1.2 History

associative content-addressable memory (CAM)

The Hopfield Model

2.1 The Associative Memory Problem

Store a set of $p$ patterns $\xi_{i}^{\mu}$ in such a way that when presented with a new pattern $\zeta_{i}$, the network responds by producing whichever one of the stored patterns most closely resembles $\zeta_{i}$.

Here $\mu=1,2,\cdots,p$ labels the patterns and $i=1,2,\cdots,N$ labels the units.

Hamming distance:

$$ \sum_{i}[\xi_{i}^{\mu}(1-\zeta_{i}) + (1-\xi_{i}^{\mu})\zeta_{i}] $$
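For $0/1$ units each bracketed term is $1$ exactly when the two bits differ, so the sum counts mismatched bits. A minimal sketch in Python (the example patterns are arbitrary):

```python
# Hamming distance between 0/1 patterns via
# sum_i [xi_i (1 - zeta_i) + (1 - xi_i) zeta_i]:
# each term equals 1 exactly when the two bits differ.
def hamming(xi, zeta):
    return sum(x * (1 - z) + (1 - x) * z for x, z in zip(xi, zeta))

xi   = [1, 0, 1, 1, 0]
zeta = [1, 1, 1, 0, 0]
print(hamming(xi, zeta))  # positions 2 and 4 differ -> 2
```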

The task: construct $w_{ij}$ such that, given the test pattern $n_{i}=\zeta_{i}$ as input, the network outputs the stored pattern $n_{i}=\xi_{i}^{\mu_{0}}$, where $\mu_{0}$ minimizes the Hamming distance between $\zeta$ and $\xi^{\mu}$.

The $p$ stored patterns act as attractors in the phase space of the network.

2.2 The Model

The states $n_{i}=0,1$ record whether unit $i$ is active; introduce the spin-like variable $S_{i} = 2n_{i}-1 = \pm 1$.

Take the activation function to be the sign function:

$$ S_{i} := \text{sgn}\left(\sum_{j}w_{ij}S_{j}-\theta_{i}\right),\quad \text{sgn}(x) = \begin{cases} 1, & x \geq 0 \\ -1, & x < 0 \end{cases} $$

$\theta_{i}$: threshold, with $\theta_{i} = 2\mu_{i}-\sum_{j}w_{ij}$. For simplicity set $\theta_{i}=0$, so that

$$ S_{i} := \text{sgn}\left(\sum_{j}w_{ij}S_{j}\right) $$

Asynchronous vs. synchronous updating. Two asynchronous schemes:

  1. Pick a unit $i$ at random and update it.
  2. Let each unit decide independently, with some probability, whether to update.

The two schemes are equivalent.

Asynchronous updating terminates when no $S_{i}$ would change. This places two requirements on $w_{ij}$: 1. the patterns are stored as stable fixed points; 2. the fixed points are robust against small perturbations.

One Pattern

To store a single pattern $\{\xi_{i}\}$, the stability condition is

$$ \text{sgn}\left(\sum_{j}^{N}w_{ij}\xi_{j}\right) = \xi_{i},\quad \forall i $$

$$ w_{ij} = \frac{1}{N}\xi_{i}\xi_{j} $$

Since $\xi_{j}^{2} = 1$, one checks directly that this prescription works: $\sum_{j}w_{ij}\xi_{j} = \frac{1}{N}\sum_{j}\xi_{i}\xi_{j}^{2} = \xi_{i}$.

Because the update uses the sign function, even if the initial state deviates from $\xi$, the network still converges to $\xi$ as long as each local field $h_{i}=\sum_{j}w_{ij}S_{j}$ keeps the sign of $\xi_{i}$. This is what makes $\xi$ an attractor.

Under this activation function attractors come in pairs: the reversed pattern $-\xi$ is stored along with $\xi$.
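A quick numerical check of the one-pattern rule in plain Python (the size $N$, the seed, and the 20 flipped bits are arbitrary choices):

```python
import random

random.seed(0)
N = 100
xi = [random.choice([-1, 1]) for _ in range(N)]

# One-pattern Hebb weights: w_ij = xi_i * xi_j / N
w = [[xi[i] * xi[j] / N for j in range(N)] for i in range(N)]

# Start from a corrupted copy of xi with 20 bits flipped.
S = xi[:]
for i in random.sample(range(N), 20):
    S[i] = -S[i]

# One sweep of S_i := sgn(sum_j w_ij S_j) restores xi: each local
# field equals xi_i times the overlap, and the overlap (100-40)/100 > 0.
S = [1 if sum(w[i][j] * S[j] for j in range(N)) >= 0 else -1 for i in range(N)]
print(S == xi)  # True
```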

Many Patterns

Hebb’s rule: to store $p$ patterns, labeled by $\mu$, extend the definition to

$$ w_{ij} = \frac{1}{N}\sum_{\mu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\mu} $$

Stability condition for pattern $\nu$: $\text{sgn}(h_{i}^{\nu}) = \xi_{i}^{\nu}$.

The local field (net input) is

$$ h_{i}^{\nu} \equiv \sum_{j}w_{ij}\xi_{j}^{\nu} = \frac{1}{N}\sum_{j}\sum_{\mu}\xi_{i}^{\mu}\xi_{j}^{\mu}\xi_{j}^{\nu} = \xi_{i}^{\nu} + \frac{1}{N}\sum_{j}\sum_{\mu \neq \nu}\xi_{i}^{\mu}\xi_{j}^{\mu}\xi_{j}^{\nu} $$

When the second term (the crosstalk term) is small in magnitude (less than 1), or vanishes, the stability condition is satisfied.
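Under the stated assumption of independent random $\pm 1$ patterns, the smallness of the crosstalk for $p\ll N$ is easy to check numerically (sizes are arbitrary):

```python
import random

random.seed(1)
N, p = 500, 5
xi = [[random.choice([-1, 1]) for _ in range(N)] for _ in range(p)]

nu = 0
# Overlaps of pattern nu with each stored pattern: m[nu] = 1, others O(1/sqrt(N)).
m = [sum(xi[mu][j] * xi[nu][j] for j in range(N)) / N for mu in range(p)]

# Local field h_i^nu = sum_mu xi^mu_i m_mu = xi^nu_i + crosstalk.
h = [sum(xi[mu][i] * m[mu] for mu in range(p)) for i in range(N)]

# The crosstalk magnitude stays well below 1, so every bit of
# pattern nu satisfies sgn(h_i^nu) == xi^nu_i.
print(max(abs(h[i] - xi[nu][i]) for i in range(N)) < 1)  # True
print(all(h[i] * xi[nu][i] > 0 for i in range(N)))       # True
```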

Storage Capacity

Multiply the second term by $-\xi_{i}^{\nu}$ to define

$$ C_{i}^{\nu} \equiv -\xi_{i}^{\nu}\frac{1}{N}\sum_{j}\sum_{\mu \neq \nu}\xi_{i}^{\mu}\xi_{j}^{\mu}\xi_{j}^{\nu} $$

Then

$$ h_{i}^{\nu} = \xi_{i}^{\nu} - \frac{C_{i}^{\nu}}{\xi_{i}^{\nu}} = \xi_{i}^{\nu}(1 - C_{i}^{\nu}) $$

since $(\xi_{i}^{\nu})^{2}=1$. Thus when $C_{i}^{\nu}>1$, $h_{i}^{\nu}$ has the opposite sign to $\xi_{i}^{\nu}$ and bit $i$ of pattern $\nu$ becomes unstable.

Assume the stored bits $\xi_{j}^{\mu} = \pm 1$ are independent with equal probability. Introducing $X_{j\mu}=(\xi_{i}^{\nu}\xi_{i}^{\mu})\xi_{j}^{\nu}\xi_{j}^{\mu}$, we have $E[X_{j\mu}]=0$ and $\text{Var}(X_{j\mu})=1$.

Therefore

$$ \begin{aligned} E[C_{i}^{\nu}] &=0 \\ \text{Var}(C_{i}^{\nu}) &= \frac{1}{N^{2}}\sum_{j}\sum_{\mu \neq \nu}\text{Var}[X_{j\mu}] = \frac{1}{N^{2}}\sum_{j}\sum_{\mu \neq \nu}1 = \frac{1}{N^{2}}N(p-1) \end{aligned} $$

So for large $N$ and $p$, approximately $C_{i}^{\nu}\sim \mathcal{N}\left(0,\frac{p}{N}\right)$.

Define the error probability $P_{\text{error}} = \text{Prob}(C_{i}^{\nu}>1)$. The largest $p$ for which $P_{\text{error}} <\epsilon$ is called $p_{\text{max}}$, the storage capacity.

$$ P_{\text{error}} = \frac{1}{\sqrt{2\pi}\sigma}\int_{1}^{\infty}e^{-x^{2}/2\sigma^{2}}\,\mathrm{d}x = \frac{1}{2}\left[1-\text{erf}\left(1/\sqrt{2\sigma^{2}}\right)\right] = \frac{1}{2}\left[1-\text{erf}\left(\sqrt{N/2p}\right)\right] $$

error function: $$ \text{erf}(x) = \frac{2}{\sqrt{\pi}}\int_{0}^{x}e^{-u^{2}}\mathrm{d}u $$

Using the asymptotic expansion

$$ 1-\text{erf}(x) \sim \frac{e^{-x^{2}}}{\sqrt{\pi}\,x}\quad (x\rightarrow\infty) $$

we approximate

$$ \log{P_{\text{error}}} \approx -\log{2} - N/2p - \frac{1}{2}\log{\pi} - \frac{1}{2}\log{(N/2p)} $$

The error criterion $P_{\text{error}}<0.01/N$ then reduces, keeping the dominant terms, to $N/2p>\log N$, which gives $p_{\text{max}} = N/2\log N$.
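These estimates are easy to evaluate with `math.erf` ($N$ is an example size; $p=0.138N$ is the often-quoted loading at which $P_{\text{error}}\approx 0.0036$):

```python
import math

def p_error(N, p):
    # P_error = (1/2) * [1 - erf(sqrt(N / (2*p)))]
    return 0.5 * (1.0 - math.erf(math.sqrt(N / (2.0 * p))))

N = 1000
print(p_error(N, int(0.138 * N)))              # ~0.0036 errors per bit
print(p_error(N, int(N / (2 * math.log(N)))))  # ~1e-4, far smaller
```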

The Energy Function

Define the energy function as

$$ H = -\frac{1}{2}\sum_{i}\sum_{j}w_{ij}S_{i}S_{j} $$

$H$ is a Lyapunov function of the dynamics.

The energy function exists when the weights are symmetric ($w_{ij}=w_{ji}$). It can then be written as

$$ H = C - \sum_{(ij)}w_{ij}S_{i}S_{j} $$

where $(ij)$ runs over distinct pairs and $C$ collects the constant diagonal terms.

Try updating unit $i$ once:

$$ S_{i}^{\prime} = \text{sgn}\left(\sum_{j}w_{ij}S_{j}\right) $$

If $S_{i}^{\prime} = S_{i}$, the energy is unchanged; if $S_{i}^{\prime} = -S_{i}$, the energy change is

$$ H^{\prime} - H = -\sum_{j\neq i}w_{ij}S_{i}^{\prime}S_{j} + \sum_{j\neq i}w_{ij}S_{i}S_{j} = 2S_{i}\sum_{j\neq i}w_{ij}S_{j} = 2S_{i}\sum_{j}w_{ij}S_{j} - 2w_{ii} $$

Since the update flipped the spin, $S_{i}\sum_{j}w_{ij}S_{j}\leq 0$; with $w_{ii} = p/N>0$ under Hebb’s rule, $\Delta H<0$.

By setting $w_{ii}=0$ instead, one ensures that $S_{i}=+1$ and $S_{i}=-1$ cannot both be stable.
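The monotone decrease of $H$ under asynchronous updates is easy to verify numerically (a sketch with Hebb weights, zero diagonal, and arbitrary sizes):

```python
import random

random.seed(2)
N, p = 60, 3
xi = [[random.choice([-1, 1]) for _ in range(N)] for _ in range(p)]

# Symmetric Hebb weights with zero diagonal (w_ii = 0).
w = [[0.0 if i == j else sum(xi[mu][i] * xi[mu][j] for mu in range(p)) / N
      for j in range(N)] for i in range(N)]

def energy(S):
    # H = -(1/2) sum_ij w_ij S_i S_j
    return -0.5 * sum(w[i][j] * S[i] * S[j] for i in range(N) for j in range(N))

S = [random.choice([-1, 1]) for _ in range(N)]
energies = [energy(S)]
for _ in range(200):  # asynchronous updates at randomly chosen units
    i = random.randrange(N)
    S[i] = 1 if sum(w[i][j] * S[j] for j in range(N)) >= 0 else -1
    energies.append(energy(S))

# H never increases along the trajectory.
print(all(b <= a + 1e-12 for a, b in zip(energies, energies[1:])))  # True
```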

Starting from an Energy Function

Single pattern:

$$ H = -\frac{1}{2N}\left(\sum_{i}S_{i}\xi_{i}\right)^{2} $$

Many patterns:

$$ \begin{aligned} H &= - \frac{1}{2N}\sum_{\mu=1}^{p}\left(\sum_{i}S_{i}\xi_{i}^{\mu}\right)^{2} \\ &= -\frac{1}{2N}\sum_{\mu=1}^{p}\left(\sum_{i}S_{i}\xi_{i}^{\mu}\right)\left(\sum_{j}S_{j}\xi_{j}^{\mu}\right) = -\frac{1}{2}\sum_{ij}\left(\frac{1}{N}\sum_{\mu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\mu}\right)S_{i}S_{j} \end{aligned} $$

With Hebb’s rule this reproduces exactly the energy function defined above. This is the idea of finding $w_{ij}$ from the minimum-energy viewpoint: write down an $H$ whose minima are the desired patterns and read off the weights.

Spurious States

Hebb’s rule is designed to make the retrieval states (the stored patterns) attractors.

Spurious states also arise: mixture states formed from combinations of an odd number of retrieval states, e.g.

$$ \xi_{i}^{\text{mix}} = \text{sgn}\left(\pm \xi_{i}^{\mu_{1}} \pm \xi_{i}^{\mu_{2}} \pm \xi_{i}^{\mu_{3}}\right) $$

Taking $(+,+,+)$ as an example, the local field is

$$ h_{i}^{\text{mix}} = \frac{1}{N}\sum_{j\mu}\xi_{i}^{\mu}\xi_{j}^{\mu}\xi_{j}^{\text{mix}} = \frac{1}{2}\xi_{i}^{\mu_{1}} + \frac{1}{2}\xi_{i}^{\mu_{2}} + \frac{1}{2}\xi_{i}^{\mu_{3}} + \text{cross-terms} $$

For large $p$, spin glass states also appear: additional local minima of the energy uncorrelated with any stored pattern. These must be identified and excluded later by various methods.
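A sketch of the $(+,+,+)$ mixture check at $T=0$ (sizes are arbitrary; for small $N$ the cross-terms can flip a few bits, so we only test that the mixture is almost exactly self-consistent):

```python
import random

random.seed(3)
N, p = 2000, 3
xi = [[random.choice([-1, 1]) for _ in range(N)] for _ in range(p)]

# (+,+,+) mixture of the three stored patterns.
mix = [1 if xi[0][i] + xi[1][i] + xi[2][i] >= 0 else -1 for i in range(N)]

# Overlaps with the mixture are all close to 1/2, so
# h_i = sum_mu xi^mu_i m_mu ~ (xi^1_i + xi^2_i + xi^3_i) / 2.
m = [sum(xi[mu][j] * mix[j] for j in range(N)) / N for mu in range(p)]
h = [sum(xi[mu][i] * m[mu] for mu in range(p)) for i in range(N)]

updated = [1 if hi >= 0 else -1 for hi in h]
frac = sum(u == s for u, s in zip(updated, mix)) / N
print(frac)  # close to 1: the mixture state is (almost) a fixed point
```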

2.3 Statistical Mechanics of Magnetic Systems

The magnetic field acting on spin $i$:

$$ h_{i} = \sum_{j}w_{ij}S_{j} + h^{\text{ext}},\quad w_{ij} = w_{ji} $$

The energy:

$$ H = -\frac{1}{2}\sum_{ij}w_{ij}S_{i}S_{j} - h^{\text{ext}}\sum_{i}S_{i} $$

external magnetic field $\leftrightarrow$ threshold.

Finite Temperature Dynamics

At finite temperature, thermal fluctuations flip spins.

Glauber dynamics:

$$ S_{i} := \begin{cases} +1 & \text{with probability } g(h_{i}) \\ -1 & \text{with probability } 1 - g(h_{i}) \end{cases},\quad g(h) = f_{\beta}(h)\equiv \frac{1}{1+e^{-2\beta h}},\quad \beta = \frac{1}{k_{B}T} $$

Note the property $1-f_{\beta}(h) = f_{\beta}(-h)$,

so the rule can be written in the symmetric form

$$ \text{Prob}(S_{i}=\pm 1) = f_{\beta}(\pm h_{i}) = \frac{1}{1+e^{\mp 2\beta h_{i}}} $$

As $T\rightarrow 0$ ($\beta\rightarrow\infty$), $f_{\beta}(h)$ reduces to the step function $\Theta(h)$, recovering the deterministic rule $S_{i}:=\text{sgn}(h_{i})$;

as $T\rightarrow \infty$ ($\beta\rightarrow 0$), $f_{\beta}(h)\rightarrow 1/2$, so $S_{i}$ takes $\pm 1$ with probability $1/2$ each: completely random.

A Single Spin in Equilibrium

$$ \langle S\rangle = \text{Prob}(+1)\cdot (+1) + \text{Prob}(-1)\cdot (-1) = \frac{1}{1+e^{-2\beta h}} - \frac{1}{1+e^{2\beta h}} = \tanh(\beta h) $$

hyperbolic tangent function: $$ \tanh(x) = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}} $$
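A small simulation of the Glauber rule reproduces this average (the field $h$, the inverse temperature $\beta$, and the sample count are arbitrary choices):

```python
import math
import random

random.seed(4)

def glauber_sample(h, beta, steps=200000):
    # S := +1 with prob f_beta(h) = 1 / (1 + exp(-2*beta*h)), else -1.
    f = 1.0 / (1.0 + math.exp(-2.0 * beta * h))
    total = 0
    for _ in range(steps):
        total += 1 if random.random() < f else -1
    return total / steps

beta, h = 0.8, 0.5
est = glauber_sample(h, beta)
print(est, math.tanh(beta * h))  # the two agree to within ~1e-2
```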

Mean Field Theory

Averages of the field and of the spin:

$$ \begin{aligned} \langle h_{i}\rangle &= \sum_{j}w_{ij}\langle S_{j}\rangle + h^{\text{ext}}\\ \langle S_{i}\rangle &= \tanh\left(\beta\langle h_{i}\rangle\right) = \tanh\left(\beta\sum_{j}w_{ij}\langle S_{j}\rangle + \beta h^{\text{ext}}\right) \end{aligned} $$

The Ferromagnet

For a ferromagnet $w_{ij}>0$, and with uniform couplings all averages are equal, $\langle S_{i}\rangle = \langle S\rangle$. Take

$$ w_{ij} = \frac{J}{N} $$

$$ \langle S\rangle = \tanh(\beta J\langle S\rangle) $$

At high temperature there is one paramagnetic solution $\langle S\rangle = 0$; at low temperature ($\beta J>1$, i.e. $T<J$) two additional ferromagnetic solutions appear (spontaneous magnetization $\pm m$).
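The mean field equation $\langle S\rangle = \tanh(\beta J\langle S\rangle)$ can be solved by fixed-point iteration, making the transition visible (with $J=1$ and example temperatures):

```python
import math

def magnetization(T, J=1.0, iters=1000):
    # Fixed-point iteration of m = tanh(beta*J*m), started from m > 0.
    beta = 1.0 / T
    m = 1.0
    for _ in range(iters):
        m = math.tanh(beta * J * m)
    return m

print(magnetization(2.0))  # T > J: iteration collapses to the paramagnetic m = 0
print(magnetization(0.5))  # T < J: spontaneous magnetization m ~ 0.957
```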

2.4 Stochastic Networks

Let the probabilities that a neuron fires or stays at rest be

$$ \text{Prob}(S_{i} = \pm 1) = f_{\beta}(\pm h_{i}) = \frac{1}{1+e^{\mp 2\beta h_{i}}} $$

$f_{\beta}(h)$: the logistic function.

Introduce the pseudo-temperature $T$ via $\beta\equiv \frac{1}{T}$.

In the low-temperature limit the sigmoid becomes a step function and the stochastic rule reduces to the deterministic McCulloch-Pitts rule.

A symmetric $w_{ij}$ ensures that the system reaches equilibrium, where ensemble averages are well defined and time-independent.

Mean Field Theory

$p\ll N$

The mean field equations:

$$ \langle S_{i}\rangle = \tanh{\left(\frac{\beta}{N}\sum_{j\mu}\xi_{i}^{\mu}\xi_{j}^{\mu}\langle S_{j}\rangle\right)}. $$

Make the ansatz $\langle S_{i}\rangle = m\xi_{i}^{\nu}$ (overlap $m$ with a single pattern $\nu$):

$$ m\xi_{i}^{\nu} = \tanh{\left(\frac{\beta}{N}\sum_{j\mu}\xi_{i}^{\mu}\xi_{j}^{\mu}m\xi_{j}^{\nu}\right)}\Rightarrow m\xi_{i}^{\nu} = \tanh{(\beta m\xi_{i}^{\nu})}\Rightarrow m = \tanh{(\beta m)} $$

The first step drops the crosstalk terms ($\mu\neq\nu$), valid for $p\ll N$; the last uses the oddness of $\tanh$ together with $\xi_{i}^{\nu}=\pm 1$.

The retrieval states are stable for $T<T_{c} = 1$.

$$ m = \frac{\langle S_{i}\rangle}{\xi_{i}^{\nu}} = \text{Prob}(\text{bit }i\text{ is correct}) - \text{Prob}(\text{bit }i\text{ is incorrect}) $$

$$ \langle N_{\text{correct}}\rangle = \frac{1}{2}N(1+m) $$

This is not a perfect memory device: reversed states and mixture states still exist.

2.5 Capacity of the Stochastic Network

Now consider $p\sim N$.

Define the load parameter: $\alpha = \frac{p}{N}$.

Retain the crosstalk terms: define the overlap with pattern $\nu$

$$ m_{\nu} = \frac{1}{N}\sum_{i}\xi_{i}^{\nu}\langle S_{i}\rangle $$

$$ r = \frac{1}{\alpha}\sum_{\nu\neq 1}m_{\nu}^{2} $$

which is the mean-square overlap of the configuration with the non-retrieved patterns ($\nu\neq 1$).

Capacity Calculation

The mean field equations:

$$ \begin{aligned} m_{\nu} &= \frac{1}{N}\sum_{i}\xi_{i}^{\nu}\tanh{\left(\beta\sum_{\mu}\xi_{i}^{\mu}m_{\mu}\right)} = \frac{1}{N}\sum_{i}\xi_{i}^{\nu}\xi_{i}^{1}\tanh{\left[ \beta\left( m_{1} + \xi_{i}^{\nu}\xi_{i}^{1}m_{\nu} + \sum_{\mu\neq 1,\nu}\xi_{i}^{\mu}\xi_{i}^{1}m_{\mu} \right) \right]}\\ &= \frac{1}{N}\sum_{i}\xi_{i}^{\nu}\xi_{i}^{1}\tanh{\left[ \beta\left( m_{1} + \sum_{\mu\neq 1,\nu}\xi_{i}^{\mu}\xi_{i}^{1}m_{\mu} \right) \right]} + \frac{\beta}{N}\sum_{i}\left\{ 1-\tanh^{2}\left[ \beta\left( m_{1} + \sum_{\mu\neq 1,\nu}\xi_{i}^{\mu}\xi_{i}^{1}m_{\mu} \right) \right] \right\}m_{\nu} \end{aligned} $$

The second term arises from a first-order Taylor expansion in the small overlaps $m_{\nu}$ ($\nu\neq 1$), using $$ \frac{\mathrm{d}}{\mathrm{d}x}\tanh(x) = 1-\tanh^{2}(x) $$

Central limit theorem: in evaluating $N^{-1}\sum_{i}$, the crosstalk sum $\sum_{\mu\neq 1,\nu}\xi_{i}^{\mu}\xi_{i}^{1}m_{\mu}$ inside the $\tanh$ is treated as Gaussian noise with variance $\alpha r$, so the site average becomes an average over this noise.