The Hopfield Model
partition function:
$$ Z = \text{Tr}_{S}\exp{(-\beta H\{S\})} $$
energy function:
$$ H_{0} = -\frac{1}{2N}\sum_{\mu=1}^{p}\left( \sum_{i}S_{i}\xi_{i}^{\mu} \right)^{2} + \frac{p}{2} $$
Energy function of the Hopfield network:
$$ H = -\frac{1}{2}\sum_{i,j}J_{ij}S_{i}S_{j} $$
Hebb’s rule:
$$ J_{ij} = \frac{1}{N}\sum_{\mu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\mu} $$
Substituting Hebb’s rule into the energy function:
$$ H = -\frac{1}{2}\sum_{i,j}\left(\frac{1}{N}\sum_{\mu=1}^{p}\xi_{i}^{\mu}\xi_{j}^{\mu}\right)S_{i}S_{j} = -\frac{1}{2N}\sum_{\mu=1}^{p}\left(\sum_{i}S_{i}\xi_{i}^{\mu}\right)^{2} $$
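As a quick sanity check of this substitution, the sketch below compares the two forms of the energy on a small random instance. The sizes `N = 8`, `p = 3`, the seed, and the helper names are illustrative choices, not part of the notes. The diagonal terms $i=j$ are kept in the double sum, as in the formula above.

```python
import random

def hebb_couplings(patterns):
    """J_ij = (1/N) * sum_mu xi_i^mu * xi_j^mu (diagonal terms included)."""
    n = len(patterns[0])
    return [[sum(xi[i] * xi[j] for xi in patterns) / n for j in range(n)]
            for i in range(n)]

def energy_from_couplings(J, s):
    """H = -(1/2) * sum_{i,j} J_ij S_i S_j."""
    n = len(s)
    return -0.5 * sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def energy_from_overlaps(patterns, s):
    """H = -(1/(2N)) * sum_mu (sum_i S_i xi_i^mu)^2."""
    n = len(s)
    overlaps = [sum(si * xi_i for si, xi_i in zip(s, xi)) for xi in patterns]
    return -sum(x * x for x in overlaps) / (2 * n)

rng = random.Random(0)  # fixed seed, arbitrary
N, p = 8, 3             # illustrative sizes
patterns = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(p)]
s = [rng.choice((-1, 1)) for _ in range(N)]

e_couplings = energy_from_couplings(hebb_couplings(patterns), s)
e_overlaps = energy_from_overlaps(patterns, s)
print(e_couplings, e_overlaps)  # the two values should agree
```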
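Assume the $\xi_{i}^{\mu}$ are independent, identically distributed random variables taking the values $\pm 1$ with equal probability. Define the random variable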
$$ X^{\mu} \equiv \sum_{i=1}^{N}S_{i}\xi_{i}^{\mu} $$
Then
$$ (X^{\mu})^{2} = \sum_{ij}S_{i}S_{j}\xi_{i}^{\mu}\xi_{j}^{\mu} $$
Its expectation over the patterns is
$$ \langle (X^{\mu})^{2}\rangle = \sum_{ij}S_{i}S_{j}\langle\xi_{i}^{\mu}\xi_{j}^{\mu}\rangle $$
- For $i\neq j$, $\xi_{i}^{\mu}$ and $\xi_{j}^{\mu}$ are independent, so $\langle \xi_{i}^{\mu}\xi_{j}^{\mu}\rangle = \langle\xi_{i}^{\mu}\rangle\langle\xi_{j}^{\mu}\rangle = 0$;
- For $i = j$, $\langle (\xi_{i}^{\mu})^{2}\rangle = 1$.
Combining the two cases gives $\langle\xi_{i}^{\mu}\xi_{j}^{\mu}\rangle = \delta_{ij}$. Hence, using $S_{i}^{2} = 1$,
$$ \langle (X^{\mu})^{2}\rangle = \sum_{ij}S_{i}S_{j}\delta_{ij} = N $$
Therefore
$$ \langle H\rangle = -\frac{1}{2N}\sum_{\mu=1}^{p}\langle (X^{\mu})^{2}\rangle = -\frac{1}{2N}\cdot p\cdot N = -\frac{p}{2} $$
Adding the compensating constant $\frac{p}{2}$ makes $\langle H_{0}\rangle = 0$.
Define $\alpha = \frac{p}{N}$.
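The pattern average above can be checked exactly for a small system by enumerating all $2^{N}$ equally likely patterns. This is only a sketch: the spin configuration `s` and the value `p = 4` are arbitrary illustrative choices.

```python
from itertools import product

def mean_square_overlap(s):
    """Exact average of (sum_i S_i xi_i)^2 over all xi in {-1, +1}^N."""
    n = len(s)
    total = 0
    for xi in product((-1, 1), repeat=n):
        x = sum(si * xi_i for si, xi_i in zip(s, xi))
        total += x * x
    return total / 2 ** n

s = [1, -1, -1, 1, 1, -1]   # any fixed spin configuration
N = len(s)
p = 4                       # number of stored patterns (illustrative)

mean_x2 = mean_square_overlap(s)     # expected to equal N
mean_H = -p * mean_x2 / (2 * N)      # p independent patterns contribute equally
mean_H0 = mean_H + p / 2             # shifted energy has zero mean
print(mean_x2, mean_H, mean_H0)
```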
Mean Field Theory for $\alpha=0$
Apply an external field $h^{\mu}\xi_{i}^{\mu}$; then
$$ H = H_{0} - \sum_{\mu}h^{\mu}\sum_{i}\xi_{i}^{\mu}S_{i} $$
partition function:
$$ Z = e^{-\beta p/2}\text{Tr}_{S}\exp{\left[ \frac{\beta}{2N}\sum_{\mu}{{\color{red}{\left(\sum_{i}S_{i}\xi_{i}^{\mu}\right)^{2}}}} + \beta\sum_{\mu}h^{\mu}\sum_{i}\xi_{i}^{\mu}S_{i} \right]} $$
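Because of the quadratic term, the trace cannot be factorized into a product of $N$ single-spin exponentials.
Trick (Gaussian integral, valid for $a > 0$):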
$$ \int_{-\infty}^{\infty}e^{-ax^{2}\pm bx}\mathrm{d}x = \sqrt{\frac{\pi}{a}}e^{\frac{b^{2}}{4a}} $$
Use this formula in reverse to turn the exponential of a square into an integral, by matching
$$ \frac{\beta}{2N}\left(\sum_{i}S_{i}\xi_{i}^{\mu}\right)^{2} = \frac{b^{2}}{4a} \Rightarrow \begin{cases} b &= \beta \sum_{i}S_{i}\xi_{i}^{\mu}\\ a &= \frac{\beta N}{2} \end{cases} $$
Introducing an auxiliary integration variable $m^{\mu}$ then gives
$$ \exp{\left[ \frac{\beta}{2N}\left(\sum_{i}S_{i}\xi_{i}^{\mu}\right)^{2} \right]} = \left(\frac{\beta N}{2\pi}\right)^{\frac{1}{2}}\int\mathrm{d}m^{\mu}\exp{\left[ -\frac{1}{2}\beta N(m^{\mu})^{2} + \beta m^{\mu}\sum_{i}\xi_{i}^{\mu}S_{i} \right]} $$
The linear term of the original exponent can now be merged into the integral for each $\mu$:
$$ Z = e^{-\beta p/2}\left(\frac{\beta N}{2\pi}\right)^{\frac{p}{2}} \times \text{Tr}_{S}\prod_{\mu}\int\mathrm{d}m^{\mu}\exp{\left[ -\frac{1}{2}\beta N(m^{\mu})^{2} + \beta (m^{\mu} + h^{\mu})\sum_{i}\xi_{i}^{\mu}S_{i} \right]} $$
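The Gaussian linearization used above can be verified numerically for a single $\mu$. The sketch below compares both sides of the identity with a trapezoidal rule. The values of `beta`, `N` and `X` (which stands for $\sum_{i}S_{i}\xi_{i}^{\mu}$), the integration window and the grid size are assumptions made only for this check.

```python
import math

def hs_integral(beta, n, x, half_width=6.0, steps=20001):
    """Trapezoidal estimate of
    sqrt(beta*n/(2*pi)) * int dm exp(-0.5*beta*n*m^2 + beta*m*x)
    over m in [-half_width, half_width]."""
    dm = 2.0 * half_width / (steps - 1)
    total = 0.0
    for k in range(steps):
        m = -half_width + k * dm
        weight = 0.5 if k in (0, steps - 1) else 1.0
        total += weight * math.exp(-0.5 * beta * n * m * m + beta * m * x)
    return math.sqrt(beta * n / (2.0 * math.pi)) * total * dm

beta, N, X = 1.5, 10, 4              # illustrative values
lhs = math.exp(beta * X ** 2 / (2 * N))
rhs = hs_integral(beta, N, X)
print(lhs, rhs)                      # the two numbers should agree closely
```

The integrand is a Gaussian centred at $m = X/N$ with width $1/\sqrt{\beta N}$, so a window of a few units is already far wider than needed here.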
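Collect $m^{\mu}, h^{\mu}, \xi_{i}^{\mu}$ into $p$-component vectors $\vec{m},\vec{h},\vec{\xi}_{i}$.
Note: here $\xi_{i}^{\mu}$ is the value of pattern $\mu$ at neuron (node) $i$, whereas $\vec{\xi}_{i}$ is the vector formed by the values of all $p$ patterns at neuron $i$. Because the patterns are assumed to be mutually independent, each component $\xi_{i}^{\mu}$ is a $\{\pm 1\}$ random variable, so $\vec{\xi}_{i}$ can take $2^{p}$ possible values.
We use the identities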
$$ \begin{aligned} \prod_{\mu}\int\mathrm{d}m^{\mu} &\equiv \int\mathrm{d}m^{1}\cdots\mathrm{d}m^{p} \equiv \int\mathrm{d}\vec{m}\\ \sum_{\mu}(m^{\mu})^{2} &\equiv \vec{m}^{2}\\ \sum_{\mu}(m^{\mu} + h^{\mu})\xi_{i}^{\mu} &\equiv (\vec{m} + \vec{h})\cdot\vec{\xi}_{i}\\ \text{Tr}_{S}(*) &\equiv \sum_{S_{1}=\pm 1}\sum_{S_{2}=\pm 1}\cdots\sum_{S_{N}=\pm 1}(*) \end{aligned} $$
Detailed derivation:
$$ \begin{aligned} Z &= e^{-\frac{\beta p}{2}} \left(\frac{\beta N}{2\pi}\right)^{\frac{p}{2}} \text{Tr}_{S}\prod_{\mu}\int\mathrm{d}m^{\mu}\exp{\left[ -\frac{1}{2}\beta N(m^{\mu})^{2} + \beta (m^{\mu} + h^{\mu})\sum_{i}\xi_{i}^{\mu}S_{i} \right]}\\ &= e^{-\frac{\beta p}{2}} \left(\frac{\beta N}{2\pi}\right)^{\frac{p}{2}} \text{Tr}_{S}\int\prod_{\mu}\mathrm{d}m^{\mu}\exp{\left\{\sum_{\mu}\left[ -\frac{1}{2}\beta N(m^{\mu})^{2} + \beta (m^{\mu} + h^{\mu})\sum_{i}\xi_{i}^{\mu}S_{i} \right]\right\}}\\ &= e^{-\frac{\beta p}{2}} \left(\frac{\beta N}{2\pi}\right)^{\frac{p}{2}} \text{Tr}_{S}\int\mathrm{d}\vec{m}\exp{\left[ -\frac{1}{2}\beta N \sum_{\mu}(m^{\mu})^{2} + \beta \sum_{i\mu}(m^{\mu}+h^{\mu})\xi_{i}^{\mu}S_{i} \right]}\\ &= e^{-\frac{\beta p}{2}} \left(\frac{\beta N}{2\pi}\right)^{\frac{p}{2}} \text{Tr}_{S}\int\mathrm{d}\vec{m}\exp{\left[ -\frac{1}{2}\beta N \vec{m}^{2} + \beta \sum_{i}(\vec{m} + \vec{h})\cdot\vec{\xi}_{i}S_{i} \right]}\\ &= e^{-\frac{\beta p}{2}} \left(\frac{\beta N}{2\pi}\right)^{\frac{p}{2}} \int\mathrm{d}\vec{m}e^{-\frac{1}{2}\beta N \vec{m}^{2}}\text{Tr}_{S}e^{\beta \sum_{i}(\vec{m} + \vec{h})\cdot\vec{\xi}_{i}S_{i}}\\ &= e^{-\frac{\beta p}{2}} \left(\frac{\beta N}{2\pi}\right)^{\frac{p}{2}} \int\mathrm{d}\vec{m}e^{-\frac{1}{2}\beta N \vec{m}^{2}}\prod_{i}\text{Tr}_{S_{i}}e^{\beta (\vec{m} + \vec{h})\cdot\vec{\xi}_{i}S_{i}}\\ &= e^{-\frac{\beta p}{2}} \left(\frac{\beta N}{2\pi}\right)^{\frac{p}{2}} \int\mathrm{d}\vec{m}e^{-\frac{1}{2}\beta N \vec{m}^{2}}\prod_{i}\{2\cosh{[\beta(\vec{m} + \vec{h})\cdot\vec{\xi}_{i}]}\}\\ &= e^{-\frac{\beta p}{2}} \left(\frac{\beta N}{2\pi}\right)^{\frac{p}{2}} \int\mathrm{d}\vec{m}e^{-\frac{1}{2}\beta N \vec{m}^{2}} \prod_{i}e^{\ln\{2\cosh{[\beta(\vec{m} + \vec{h})\cdot\vec{\xi}_{i}]}\}}\\ &= e^{-\frac{\beta p}{2}} \left(\frac{\beta N}{2\pi}\right)^{\frac{p}{2}} \int\mathrm{d}\vec{m}e^{-\frac{1}{2}\beta N \vec{m}^{2}} e^{\sum_{i}\ln\{2\cosh{[\beta(\vec{m} + \vec{h})\cdot\vec{\xi}_{i}]}\}}\\ &= e^{-\beta N 
{\color{red}{\frac{p}{2N}}}} \left(\frac{\beta N}{2\pi}\right)^{\frac{p}{2}} \int\mathrm{d}\vec{m} e^{-\beta N {\color{red}{\frac{\vec{m}^{2}}{2}}}} e^{-\beta N {\color{red}{\left(-\frac{1}{\beta N}\sum_{i}\ln{(2\cosh{[\beta(\vec{m} + \vec{h})\cdot\vec{\xi}_{i}]})}\right)}}}\\ &= \left(\frac{\beta N}{2\pi}\right)^{p/2}\int\mathrm{d}\vec{m}e^{-\beta N{\color{red}{f(\beta,\vec{m})}}} \end{aligned} $$
where
$$ f(\beta,\vec{m}) = \frac{1}{2}\alpha + \frac{1}{2}\vec{m}^{2} - \frac{1}{\beta N}\sum_{i}\ln{(2\cosh{[\beta(\vec{m} + \vec{h})\cdot\vec{\xi}_{i}]})} $$
$$ Z = \left(\frac{\beta N}{2\pi}\right)^{p/2}\int\mathrm{d}\vec{m}e^{-\beta Nf(\beta,\vec{m})} $$
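In the thermodynamic limit ($N\to\infty$) this integral can be estimated by the saddle-point (Laplace) method. Consider first its one-dimensional analogue: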
$$ \begin{aligned} I &= \sqrt{N}\int\mathrm{d}x e^{-Ng(x)}\\ &= \sqrt{N}\int\mathrm{d}x \exp\left\{-N\left[ g(x_{0}) + \frac{1}{2}g^{\prime\prime}(x_{0})(x - x_{0})^{2} + \cdots \right]\right\}, \quad \begin{cases}g^{\prime}(x_{0}) &= 0\\ g^{\prime\prime}(x_{0}) &> 0\end{cases}\\ &\approx \sqrt{N}e^{-Ng(x_{0})}\int\mathrm{d}x \exp\left\{-\frac{Ng^{\prime\prime}(x_{0})}{2}(x - x_{0})^{2}\right\}\\ &= \sqrt{N}e^{-Ng(x_{0})}\sqrt{\frac{2\pi}{Ng^{\prime\prime}(x_{0})}}\\ &= e^{-Ng(x_{0})}\sqrt{\frac{2\pi}{g^{\prime\prime}(x_{0})}} \end{aligned} $$
It follows that
$$ \begin{aligned} -\frac{1}{N}\ln{I} &= g(x_{0}) + \frac{1}{2N}\left[\ln{g^{\prime\prime}(x_{0})}-\ln{2\pi}\right]\\ &\overset{N\to\infty}{\longrightarrow} g(x_{0}) \end{aligned} $$
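That is, knowing the minimum of $g(x)$ is enough to estimate the integral $I$. Returning to the original $p$-dimensional integral, the same technique with $g(\cdot) = \beta f(\beta, \vec{m})$ gives, for $N\to\infty$ at fixed $p$,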
$$ -\frac{1}{N}\ln{Z} = \beta\underset{\vec{m}}{\min}f(\beta,\vec{m}) $$
The free energy is then
$$ F = -\frac{1}{\beta}\ln{Z} = N\underset{\vec{m}}{\min}f(\beta,\vec{m})\Rightarrow \frac{F}{N} = \underset{\vec{m}}{\min}f(\beta,\vec{m}) $$
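The one-dimensional saddle-point estimate above can be illustrated numerically. The sketch below uses $g(x) = \cosh(x - 1)$, an arbitrary test function chosen only because its minimum is known ($x_{0} = 1$, $g(x_{0}) = 1$, $g''(x_{0}) = 1$). The integration window and grid are likewise assumptions. The factor $e^{-Ng_{\min}}$ is pulled out by hand to avoid floating-point underflow.

```python
import math

def g(x):
    # Illustrative test function: minimum at x0 = 1, g(x0) = 1, g''(x0) = 1.
    return math.cosh(x - 1.0)

def minus_log_I_over_N(n, lo=-4.0, hi=6.0, steps=100001):
    """Return -(1/N) ln I for I = sqrt(N) * int dx exp(-N g(x)), by a Riemann sum."""
    dx = (hi - lo) / (steps - 1)
    g_values = [g(lo + k * dx) for k in range(steps)]
    g_min = min(g_values)
    # Factor out exp(-n * g_min) so the remaining integrand is of order one near x0.
    integral = sum(math.exp(-n * (gv - g_min)) for gv in g_values) * dx
    log_I = 0.5 * math.log(n) - n * g_min + math.log(integral)
    return -log_I / n

results = {n: minus_log_I_over_N(n) for n in (10, 100, 1000, 10000)}
for n, value in results.items():
    # Leading-order prediction: g(x0) + [ln g''(x0) - ln(2*pi)] / (2N)
    prediction = 1.0 - math.log(2.0 * math.pi) / (2.0 * n)
    print(n, value, prediction)
```

As $N$ grows, the printed values should approach $g(x_{0}) = 1$, with the $\mathcal{O}(1/N)$ correction matching the formula derived above.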
Next we look for the minimum of $f(\beta,\vec{m})$.
$$ f(\beta,\vec{m}) = \frac{1}{2}\alpha + \frac{1}{2}\vec{m}^{2} - \frac{1}{\beta}{\color{red}{\frac{1}{N}\sum_{i}}}\ln{(2\cosh{[\beta(\vec{m} + \vec{h})\cdot\vec{\xi}_{i}]})} $$
$$ \begin{aligned} \frac{\partial f}{\partial m^{\mu}} &= m^{\mu} - \frac{1}{N}\sum_{i}\xi_{i}^{\mu}\tanh{[\beta(\vec{m} + \vec{h})\cdot\vec{\xi}_{i}]} = 0\\ \Rightarrow m^{\mu} &= {\color{red}{\frac{1}{N}\sum_{i}}}\xi_{i}^{\mu}\tanh{[\beta(\vec{m} + \vec{h})\cdot\vec{\xi}_{i}]} = {\color{red}{\langle}} \xi^{\mu}\tanh{[\beta(\vec{m} + \vec{h})\cdot\vec{\xi}]}{\color{red}{\rangle}} \end{aligned} $$
Hint:
$$ (\vec{m} + \vec{h})\cdot\vec{\xi} = \sum_{\nu}(m^{\nu} + h^{\nu})\xi^{\nu} $$
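So when the average is taken, the prefactor $\xi^{\mu}$ implicitly refers to one fixed $\mu$, whereas the argument of $\tanh$ really does sum over all patterns $\nu$.
A perhaps more precise way to write this is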
$$ m^{\mu} = E_{\{\xi^{\nu}\}_{\nu=1}^{p}}\left\{\xi^{\mu}\tanh{\left[\beta\sum_{\nu}(m^{\nu}+h^{\nu})\xi^{\nu}\right]}\right\} $$
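Similarly, for $N\to\infty$ the site average $\frac{1}{N}\sum_{i}$ in $f$ can be written as an expectation over the $2^{p}$ values of $\vec{\xi}$: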
$$ f \overset{\alpha\rightarrow 0}{=} \frac{1}{2}\vec{m}^{2} -\frac{1}{\beta}\bigg\langle \ln\{2\cosh{[\beta(\vec{m} + \vec{h})\cdot\vec{\xi}]}\}\bigg\rangle $$
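The stationarity condition can be explored by simple fixed-point iteration, with the expectation taken exactly over the $2^{p}$ equally likely values of $\vec{\xi}$. The sketch below is one possible numerical approach, not a method prescribed by these notes. The values `p = 3`, `beta = 2.0`, the starting point and the iteration count are illustrative assumptions.

```python
import math
from itertools import product

def mean_field_update(m, h, beta):
    """One application of m^mu <- < xi^mu tanh(beta * (m + h) . xi) >."""
    p = len(m)
    new_m = [0.0] * p
    for xi in product((-1, 1), repeat=p):
        field = sum((m[nu] + h[nu]) * xi[nu] for nu in range(p))
        t = math.tanh(beta * field)
        for mu in range(p):
            new_m[mu] += xi[mu] * t
    return [value / 2 ** p for value in new_m]

def free_energy_density(m, h, beta):
    """f in the alpha -> 0 limit: m^2/2 - (1/beta) < ln 2cosh(beta * (m + h) . xi) >."""
    p = len(m)
    log_term = 0.0
    for xi in product((-1, 1), repeat=p):
        field = sum((m[nu] + h[nu]) * xi[nu] for nu in range(p))
        log_term += math.log(2.0 * math.cosh(beta * field))
    return 0.5 * sum(x * x for x in m) - log_term / (beta * 2 ** p)

def solve(m0, h, beta, iters=500):
    m = list(m0)
    for _ in range(iters):
        m = mean_field_update(m, h, beta)
    return m

beta = 2.0                      # T = 0.5 (illustrative)
h = [0.0, 0.0, 0.0]
m_star = solve([0.6, 0.1, 0.0], h, beta)
print(m_star)
print(free_energy_density(m_star, h, beta), free_energy_density([0.0] * 3, h, beta))
```

Starting with a dominant first component, the iteration is expected to settle on a solution of the form $(m, 0, 0)$, and its $f$ should lie below that of the trivial solution $\vec{m} = 0$.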
Note that $m^{\mu}$ was introduced as an auxiliary variable, but it turns out to have a concrete physical meaning.
$$ \begin{aligned} \frac{\partial F}{\partial h^{\mu}} &= \frac{\partial }{\partial h^{\mu}}\left[-\frac{1}{\beta}\ln{Z}\right] = -\frac{1}{\beta}\frac{1}{Z}\frac{\partial Z}{\partial h^{\mu}} = -\frac{1}{\beta Z}\frac{\partial}{\partial h^{\mu}} [\text{Tr}_{S}e^{-\beta H}]\\ &= -\frac{1}{\beta Z} \text{Tr}_{S}\left[\left(-\beta \frac{\partial H}{\partial h^{\mu}}\right) e^{-\beta H}\right]\\ &= \text{Tr}_{S}\left[\frac{e^{-\beta H}}{Z} \left(-\sum_{i}\xi_{i}^{\mu}S_{i}\right)\right]\\ &= -\sum_{i} \xi_{i}^{\mu}\langle S_{i}\rangle \end{aligned} $$
On the other hand,
$$ f \overset{\alpha\rightarrow 0}{=} \frac{1}{2}\vec{m}^{2} -\frac{1}{\beta}\bigg\langle \ln\{2\cosh{[\beta(\vec{m} + \vec{h})\cdot\vec{\xi}]}\}\bigg\rangle $$
$$ \begin{aligned} \frac{\partial f}{\partial h^{\mu}} = -\frac{1}{\beta} \bigg\langle \beta \xi^{\mu}\tanh{[\beta(\vec{m} + \vec{h})\cdot\vec{\xi}]} \bigg\rangle = -\bigg\langle \xi^{\mu}\tanh{[\beta(\vec{m} + \vec{h})\cdot\vec{\xi}]} \bigg\rangle = -m^{\mu} \end{aligned} $$
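We also have $F/N = \underset{\vec{m}}{\min}f(\beta,\vec{m})$, and at the minimum $\partial f/\partial m^{\mu} = 0$, so the implicit dependence of $\vec{m}$ on $\vec{h}$ does not contribute. Therefore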
$$ -\sum_{i}\xi_{i}^{\mu}\langle S_{i}\rangle = \frac{\partial F}{\partial h^{\mu}} = N\frac{\partial f}{\partial h^{\mu}} = -Nm^{\mu} $$
That is,
$$ m^{\mu} = \frac{1}{N}\sum_{i}\xi_{i}^{\mu}\langle S_{i}\rangle $$
So the $m^{\mu}$ that minimizes the free energy is the network's average overlap with pattern $\mu$.
At $h^{\mu} = 0$ the self-consistency equation reads
$$ m^{\mu} = \bigg\langle \xi^{\mu}\tanh{\left(\beta \vec{m} \cdot \vec{\xi}\right)} \bigg\rangle $$
- memory state: a solution $\vec{m}$ that is correlated with only one particular pattern $\mu$. For example, with $\vec{m} = (m,0,0,\cdots)$ the self-consistency equation reduces to
$$ m^{\mu} = \bigg\langle \xi^{\mu}\tanh{(\beta m \xi^{1})}\bigg\rangle = \langle \xi^{\mu}\xi^{1}\rangle \tanh{(\beta m)} = \delta_{\mu 1}\tanh{(\beta m)} $$
For $\mu = 1$ this gives
$$ m = \tanh{(\beta m)} $$
- spurious state: the simplest kind is the mixture state, i.e.
$$ \vec{m} = (\underbrace{m,m,\cdots,m}_{n},\underbrace{0,0,\cdots,0}_{p-n}) $$
There are $C_{p}^{n}$ ways to choose the $n$ patterns. The self-consistency equation then becomes
$$ \begin{aligned} m^{\mu} &= \bigg\langle \xi^{\mu}\tanh{\left(\beta m\sum_{\nu=1}^{n}\xi^{\nu}\right)}\bigg\rangle\\ m &= \frac{1}{n}\bigg\langle z\tanh{(\beta mz)}\bigg\rangle, \quad z = \sum_{\mu=1}^{n}\xi^{\mu} \end{aligned} $$
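Both the memory-state equation $m = \tanh(\beta m)$ (the case $n = 1$) and the symmetric mixture equation can be solved by iterating the scalar map $m \mapsto \frac{1}{n}\langle z\tanh(\beta m z)\rangle$. The sketch below does this with the average over $z$ taken exactly. The temperatures, the starting value `m0` and the iteration count are illustrative assumptions, and plain fixed-point iteration is just one convenient choice of solver.

```python
import math
from itertools import product

def symmetric_amplitude(n, beta, m0=0.5, iters=2000):
    """Iterate m <- (1/n) < z tanh(beta*m*z) >, with z = xi^1 + ... + xi^n."""
    z_values = [sum(xi) for xi in product((-1, 1), repeat=n)]
    m = m0
    for _ in range(iters):
        m = sum(z * math.tanh(beta * m * z) for z in z_values) / (n * len(z_values))
    return m

m_memory_cold = symmetric_amplitude(1, beta=2.0)        # memory state at T = 0.5
m_memory_hot = symmetric_amplitude(1, beta=1.0 / 1.5)   # memory state at T = 1.5
m_mix3 = symmetric_amplitude(3, beta=10.0)              # n = 3 mixture at T = 0.1

print(m_memory_cold, m_memory_hot, m_mix3)
```

For $n = 1$ a nonzero solution of $m = \tanh(\beta m)$ exists only for $T < 1$, so the high-temperature run should decay to zero. For $n = 3$ the equation reads $m = \frac{1}{4}\tanh(3\beta m) + \frac{1}{4}\tanh(\beta m)$, which tends to $m = \frac{1}{2}$ as $T\to 0$.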
Besides self-consistency, the solution must also minimize the free energy $F$ (equivalently $f$). This requires the matrix
$$ A_{\mu\nu} = \frac{\partial^{2}f}{\partial m^{\mu}\partial m^{\nu}} $$
to have only positive eigenvalues.
Near a stationary point $\vec{m}^{*}$,
$$ f(\vec{m}^{*}+\delta\vec{m}) = f(\vec{m}^{*}) + \frac{1}{2}\sum_{\mu\nu}\delta m^{\mu} A_{\mu\nu}\delta m^{\nu} + \mathcal{O}(\delta m^{3}),\quad A_{\mu\nu} = \frac{\partial^{2}f}{\partial m^{\mu}\partial m^{\nu}}\bigg|_{\vec{m}=\vec{m}^{*}} $$
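- If all eigenvalues of $A_{\mu\nu}$ are positive, the quadratic form is positive definite and the stationary point $\vec{m}^{*}$ is a local minimum;
- If $A_{\mu\nu}$ has a negative eigenvalue, there is a direction along which $f$ decreases, and the stationary point $\vec{m}^{*}$ is unstable.
Conclusion: $A_{\mu\nu}$ can be positive definite only when $n$ is odd, and the temperature $T = \frac{1}{\beta}$ must lie below a critical temperature $T_{c}$.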
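The stability criterion can be probed numerically. Differentiating the stationarity condition at $\vec{h} = 0$ gives $A_{\mu\nu} = \delta_{\mu\nu} - \beta\langle \xi^{\mu}\xi^{\nu}[1 - \tanh^{2}(\beta\vec{m}\cdot\vec{\xi})]\rangle$. The sketch below evaluates this for symmetric mixtures, restricted to the subspace of the $n$ mixed patterns (that is, assuming $p = n$). In that subspace $A$ has equal diagonal entries and equal off-diagonal entries, so its eigenvalues are $A_{11} + (n-1)A_{12}$ and $A_{11} - A_{12}$. The temperatures and iteration counts are illustrative assumptions.

```python
import math
from itertools import product

def symmetric_amplitude(n, beta, m0=0.5, iters=2000):
    """Solve m = (1/n) < z tanh(beta*m*z) > by fixed-point iteration."""
    z_values = [sum(xi) for xi in product((-1, 1), repeat=n)]
    m = m0
    for _ in range(iters):
        m = sum(z * math.tanh(beta * m * z) for z in z_values) / (n * len(z_values))
    return m

def hessian(n, beta, m):
    """A_{mu nu} = delta_{mu nu} - beta < xi^mu xi^nu (1 - tanh^2(beta*m*z)) >, with p = n."""
    patterns = list(product((-1, 1), repeat=n))
    A = [[1.0 if mu == nu else 0.0 for nu in range(n)] for mu in range(n)]
    for xi in patterns:
        sech2 = 1.0 - math.tanh(beta * m * sum(xi)) ** 2
        for mu in range(n):
            for nu in range(n):
                A[mu][nu] -= beta * xi[mu] * xi[nu] * sech2 / len(patterns)
    return A

def curvatures(n, beta):
    """Eigenvalues along (1,...,1) and along directions orthogonal to it."""
    A = hessian(n, beta, symmetric_amplitude(n, beta))
    return A[0][0] + (n - 1) * A[0][1], A[0][0] - A[0][1]

sym2, anti2 = curvatures(2, beta=2.0)    # n = 2 at T = 0.5
sym3, anti3 = curvatures(3, beta=10.0)   # n = 3 at T = 0.1
print(sym2, anti2)   # a negative value signals an unstable direction
print(sym3, anti3)
```

For $n = 2$ the eigenvalue along $(1,-1)$ works out analytically to $1 - \beta$, which is negative for every $T < 1$, so the even mixture is a saddle point. For $n = 3$ at low temperature both eigenvalues in this subspace come out positive. Directions along patterns outside the mixture are not covered by this check.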
- asymmetric mixture state (another kind of spurious state), for example
$$ \vec{m} = \left(\frac{1}{2},\frac{1}{2},\frac{1}{4},\frac{1}{4},\frac{1}{4},0,0,0\cdots\right) $$
Mean Field Theory for $\alpha\neq 0$
The goal is to compute the pattern average $\langle\ln{Z}\rangle$, using the replica method:
$$ \ln{Z} = \lim_{n\to 0}\frac{Z^{n}-1}{n} $$
In this way $\langle\ln{Z}\rangle$ can be obtained by computing $\langle Z^{n}\rangle$. One first takes $n$ to be an integer and only at the end takes the limit $n\to 0$.
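The identity behind the replica method is easy to check numerically on a toy ensemble. In the sketch below the three values in `z_samples` are made-up positive numbers standing in for $Z$ evaluated on different pattern realizations. Here $n$ is simply a small real number, whereas the actual replica calculation computes $\langle Z^{n}\rangle$ for integer $n$ and then continues to $n\to 0$.

```python
import math

def replica_estimate(z_samples, n):
    """(<Z^n> - 1) / n, which should approach <ln Z> as n -> 0."""
    mean_zn = sum(z ** n for z in z_samples) / len(z_samples)
    return (mean_zn - 1.0) / n

z_samples = [2.0, 5.0, 11.0]   # hypothetical values of Z, for illustration only
exact = sum(math.log(z) for z in z_samples) / len(z_samples)

estimates = {n: replica_estimate(z_samples, n) for n in (1.0, 0.1, 0.01, 1e-4, 1e-6)}
for n, value in estimates.items():
    print(n, value, exact)
```

The leading error is of order $\frac{n}{2}\langle(\ln Z)^{2}\rangle$, so the estimates should approach the exact average linearly in $n$.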