Mean field theory in physics and machine learning
- Ising model and its mean field approximation
- Deriving the mean field
- Connection to KL divergence
- Additional information
Ising model and its mean field approximation
An Ising model is a mathematical model that describes the physical behavior of magnets. It was instrumental in understanding phase transitions.
Consider an Ising model with $N$ spins, where the $i$-th spin is $S_i$. Each spin can take only one of two states: $+1$ or $-1$. $B$ is the external magnetic field and $J$ is the interaction strength between two neighboring spins. The total energy is:
$$E = \sum_i B S_i + J \sum_{\langle ij \rangle} S_i S_j \tag{1}$$

In the second term, the sum runs over nearest-neighbor pairs $\langle ij \rangle$. The first term is the interaction between the external field $B$ and the individual spins, which is easy to deal with. The second term is hard: it is the interaction between the spins, a product of two unknowns. In other words, the second term couples the spins.
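To make this concrete, here is a minimal Python sketch (my own illustration; the function name and lattice setup are assumptions, not from the original) that evaluates Eq. (1) for a spin configuration on a small square lattice with periodic boundaries:

```python
import numpy as np

def ising_energy(spins, B, J):
    """Total energy of Eq. (1) on a 2D square lattice with periodic boundaries.

    Uses this article's sign convention E = sum_i B*S_i + J*sum_<ij> S_i*S_j,
    counting each nearest-neighbor bond exactly once.
    spins: 2D numpy array of +1/-1 values.
    """
    field_term = B * spins.sum()
    # Count each bond once: pair every site with its right and down neighbor.
    bond_sum = (spins * np.roll(spins, -1, axis=0)).sum() \
             + (spins * np.roll(spins, -1, axis=1)).sum()
    return field_term + J * bond_sum

# Example: energy of a random 4x4 configuration.
rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(4, 4))
print(ising_energy(spins, B=0.1, J=1.0))
```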
To simplify the problem, we can imagine that each spin $S_i$ feels an average spin $\bar{S}$ coming from each of its neighbors, much like the external field $B$. With this replacement, the second term of Eq. (1) depends on a single spin instead of two; like the first term, it now involves only a single summation.
The total energy Eq. (1) becomes
$$E_{\mathrm{MF}} = \sum_i B S_i + J \sum_i z \bar{S} S_i = (B + zJ\bar{S})\sum_i S_i \tag{2}$$

We can define the mean field
$$\Delta B = zJ\bar{S} \tag{3}$$

where $z$ is the number of nearest neighbors (e.g. 4 for a square lattice).
Note that although the mean field energy Eq. (2) is simpler than Eq. (1), it is still a function of all the spins, i.e. $E_{\mathrm{MF}}(s_1, s_2, \ldots, s_N)$; the difference is that the energy is now a linear function of the spins.
Deriving the mean field
Gibbs-Bogoliubov-Feynman inequality
The mean field approximation is a way to simplify the partition function and make it amenable to analytical treatment. In this section, we will show that the mean field approximation yields a lower bound on the partition function. (The partition function is defined as the sum of the Boltzmann factors of all possible states. It is an important quantity in statistical mechanics; from it we can calculate many physical quantities, such as the average spin.)
The partition function of the Ising model is

$$Z = \sum_{s_1, s_2, \ldots, s_N} \exp[-\beta E(s_1, s_2, \ldots, s_N)]$$

The sum runs over all spin configurations, i.e. $s_1 = \pm 1$, $s_2 = \pm 1$, etc.
This partition function is hard to evaluate: the sum runs over $2^N$ configurations. We can try to simplify it to make it computationally tractable. One way to do so is to replace the energy $E$ with the simpler mean field energy $E_{\mathrm{MF}}$.
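For a tiny lattice we can still evaluate $Z$ by brute force, which provides a ground truth to compare the approximation against. A sketch (reusing `ising_energy` from the first section; the $2^N$ enumeration is exactly what becomes intractable for large $N$):

```python
import itertools
import numpy as np

def exact_partition_function(shape, B, J, beta):
    """Brute-force Z by summing exp(-beta*E) over all 2^N spin configurations."""
    N = shape[0] * shape[1]
    Z = 0.0
    for config in itertools.product([-1, 1], repeat=N):
        spins = np.array(config).reshape(shape)
        Z += np.exp(-beta * ising_energy(spins, B, J))
    return Z

# 3x3 lattice: 2^9 = 512 configurations, already painful to scale further.
print(exact_partition_function((3, 3), B=0.1, J=1.0, beta=0.5))
```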
Let $E = E_{\mathrm{MF}} + \Delta E$, where $\Delta E$ is the energy difference from the mean field energy $E_{\mathrm{MF}}$. The partition function can be rewritten as
$$Z = \sum \exp[-\beta(E_{\mathrm{MF}} + \Delta E)] = Z_{\mathrm{MF}} \sum \frac{\exp(-\beta E_{\mathrm{MF}})}{Z_{\mathrm{MF}}}\exp(-\beta\Delta E) = Z_{\mathrm{MF}}\,\langle \exp(-\beta \Delta E) \rangle_{\mathrm{MF}}$$

Here $Z_{\mathrm{MF}} = \sum \exp(-\beta E_{\mathrm{MF}})$ is the mean-field partition function and $\langle \cdot \rangle_{\mathrm{MF}}$ denotes an expectation value weighted by the mean field probabilities $\exp(-\beta E_{\mathrm{MF}})/Z_{\mathrm{MF}}$. The expectation value is still hard to evaluate. We can move the expectation operator into the exponent:
$$\langle \exp(-\beta\Delta E) \rangle = \left\langle \exp(-\beta\langle\Delta E\rangle)\,\exp\!\big(-\beta(\Delta E - \langle\Delta E\rangle)\big) \right\rangle \ge \exp(-\beta\langle\Delta E\rangle)$$

where we used:
- $e^x \ge 1 + x$ (this follows from a first-order Taylor expansion, or from Jensen's inequality)
- $\exp(-\beta\langle\Delta E\rangle)$ is just a number and is not subject to the average, so it can be pulled out of the expectation.
- The remaining factor satisfies $\langle \exp(-\beta(\Delta E - \langle\Delta E\rangle))\rangle \ge 1 - \beta\langle\Delta E - \langle\Delta E\rangle\rangle = 1$, because the deviation $\Delta E - \langle\Delta E\rangle$ averages to zero.
The result is the Gibbs-Bogoliubov-Feynman inequality:
$$Z \ge Z_{\mathrm{MF}}\exp(-\beta\langle\Delta E\rangle_{\mathrm{MF}})$$

We can use the right-hand side instead of $Z$. The trade-off is that we now have a lower bound rather than the actual partition function, which introduces a biased error. But this makes the quantity computable, so we take the trade.
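We can verify the inequality numerically on a lattice small enough for exact enumeration. The sketch below (again my own, reusing `ising_energy` from earlier) computes both sides for an arbitrary trial field $\Delta B$; the bound holds for any choice of $\Delta B$:

```python
import itertools
import numpy as np

def gbf_bound_check(shape, B, J, beta, dB):
    """Return (exact Z, lower bound Z_MF*exp(-beta*<dE>_MF)) by enumeration."""
    N = shape[0] * shape[1]
    Z = Z_mf = weighted_dE = 0.0
    for config in itertools.product([-1, 1], repeat=N):
        spins = np.array(config).reshape(shape)
        E = ising_energy(spins, B, J)
        E_mf = (B + dB) * spins.sum()     # mean field energy, Eq. (2)
        w = np.exp(-beta * E_mf)          # unnormalized mean field weight
        Z += np.exp(-beta * E)
        Z_mf += w
        weighted_dE += w * (E - E_mf)
    avg_dE = weighted_dE / Z_mf           # <dE>_MF
    return Z, Z_mf * np.exp(-beta * avg_dE)

Z, bound = gbf_bound_check((3, 3), B=0.1, J=1.0, beta=0.5, dB=0.3)
print(Z >= bound, Z, bound)               # True: Z never falls below the bound
```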
Explicit expression of the lower bound
We can further calculate $\langle\Delta E\rangle$ for the Ising model:

$$\Delta E = E - E_{\mathrm{MF}}$$

Recall
$$E = \sum_i B S_i + J \sum_{\langle ij \rangle} S_i S_j$$

The mean field energy is
$$E_{\mathrm{MF}} = (B + zJ\bar{S})\sum_i S_i$$

So

$$\langle E_{\mathrm{MF}} \rangle = N(B + \Delta B)\bar{S}$$

and

$$\langle E \rangle = \sum_i B\,\overline{S_i} + J\sum_{\langle ij \rangle} \overline{S_i}\;\overline{S_j} = N\left(B\bar{S} + \frac{J}{2} z \bar{S}^2\right)$$

The factor $1/2$ in the interaction term accounts for double counting: a lattice of $N$ spins with $z$ neighbors each has $Nz/2$ bonds. Note that we can write $\overline{S_i S_j} = \overline{S_i}\;\overline{S_j}$ because the mean field energy over which we average is a linear summation of the spins, so the spins are statistically independent. Therefore

$$\langle \Delta E \rangle = \langle E \rangle - \langle E_{\mathrm{MF}} \rangle = N\left(\frac{J}{2} z \bar{S}^2 - \Delta B\,\bar{S}\right)$$

Note that the mean spin $\bar{S}$ is a function of the mean field $\Delta B$, because the field is created by the spins self-consistently.
Explicitly, the lower bound on the partition function is:

$$Z \ge Z_{\mathrm{MF}}\exp(-\beta\langle\Delta E\rangle_{\mathrm{MF}}) = Z_{\mathrm{MF}}\exp\left[-\beta N\left(\frac{J}{2} z \bar{S}^2 - \Delta B\,\bar{S}\right)\right]$$

Note that both $Z_{\mathrm{MF}}$ and $\bar{S}$ depend on $\Delta B$.
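Because $E_{\mathrm{MF}}$ is a sum over independent spins, both $Z_{\mathrm{MF}}$ and $\bar{S}$ have closed forms: $Z_{\mathrm{MF}} = [2\cosh(\beta(B+\Delta B))]^N$ and, with this article's sign convention, $\bar{S} = -\tanh(\beta(B+\Delta B))$. A sketch (my own) that evaluates the lower bound without any enumeration:

```python
import numpy as np

def mf_lower_bound(N, z, B, J, beta, dB):
    """Closed-form lower bound Z_MF * exp(-beta*N*((J/2)*z*S_bar**2 - dB*S_bar)).

    E_MF is a sum of independent single-spin terms, so Z_MF factorizes.
    S_bar = -tanh(beta*(B + dB)) follows the article's sign convention
    (the minus sign disappears with the usual E = -B*sum(S) - J*sum(S*S)).
    """
    S_bar = -np.tanh(beta * (B + dB))
    Z_mf = (2.0 * np.cosh(beta * (B + dB)))**N
    avg_dE = N * (0.5 * J * z * S_bar**2 - dB * S_bar)
    return Z_mf * np.exp(-beta * avg_dE)

# Same parameters as the brute-force check above (3x3 periodic lattice, z = 4):
print(mf_lower_bound(N=9, z=4, B=0.1, J=1.0, beta=0.5, dB=0.3))
```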
Variational mean field
What is the physical interpretation of the mean field, Eq. (3)? In this section, we show that it is the field that minimizes the free energy. That makes sense, because otherwise it would not be the mean observed value!
The Helmholtz free energy is given by

$$H = -k_B T \ln Z$$

We can find the mean field $\Delta B$ by minimizing the free energy $H$. Instead of working with $H$ directly, we minimize its upper bound (which corresponds to the lower bound of the partition function $Z$):
$$H_{\mathrm{UB}} = -k_B T \ln\left[Z_{\mathrm{MF}}\exp\left(-\beta N\left(\frac{J}{2} z \bar{S}^2 - \Delta B\,\bar{S}\right)\right)\right] = -k_B T \ln Z_{\mathrm{MF}} + N\frac{J}{2} z \bar{S}^2 - N\Delta B\,\bar{S}$$

Minimizing the free energy upper bound with respect to the mean field $\Delta B$ (this is where the term "variational" comes from):
$$\frac{\partial H_{\mathrm{UB}}}{\partial \Delta B} = 0$$

Using
$$\frac{\partial Z_{\mathrm{MF}}}{\partial \Delta B} = -N\beta\bar{S}\,Z_{\mathrm{MF}}$$

it follows that
$$\frac{\partial H_{\mathrm{UB}}}{\partial \Delta B} = N\,\frac{\partial \bar{S}}{\partial \Delta B}\,(Jz\bar{S} - \Delta B) = 0$$

Therefore (assuming $\partial\bar{S}/\partial\Delta B \neq 0$), the mean field is
$$\Delta B = Jz\bar{S}$$

the same result as the mean field argument in the first section.
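Numerically, the self-consistent pair $(\bar{S}, \Delta B)$ can be found by fixed-point iteration. A minimal sketch (my own; the single-spin mean $\bar{S} = -\tanh(\beta(B + \Delta B))$ again follows this article's sign convention):

```python
import numpy as np

def solve_mean_field(z, B, J, beta, S0=0.1, damping=0.5, tol=1e-12, max_iter=100_000):
    """Iterate S_bar = -tanh(beta*(B + z*J*S_bar)) to a fixed point.

    Damping (mixing old and new iterates) keeps the iteration from
    oscillating when beta*z*J is large.
    Returns (S_bar, dB) with dB = z*J*S_bar, per the result just derived.
    """
    S = S0
    for _ in range(max_iter):
        S_new = -np.tanh(beta * (B + z * J * S))
        if abs(S_new - S) < tol:
            break
        S = (1 - damping) * S + damping * S_new
    return S, z * J * S

S_bar, dB = solve_mean_field(z=4, B=0.1, J=1.0, beta=0.5)
print(S_bar, dB)
```

Feeding the returned `dB` back into `mf_lower_bound` from the previous sketch evaluates the bound at the stationary point found above. With the more common convention $E = -B\sum_i S_i - J\sum_{\langle ij\rangle} S_i S_j$, the same iteration gives the familiar self-consistency equation $\bar{S} = \tanh(\beta(B + zJ\bar{S}))$, whose nonzero solutions at $B = 0$ below $k_B T_c = zJ$ are the mean-field signature of the ferromagnetic phase transition.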
Connection to KL divergence
We want to approximate the true probability distribution

$$p(s) = \frac{\exp(-\beta E(s))}{Z}$$

with the mean field distribution

$$q(s) = \frac{\exp(-\beta E_{\mathrm{MF}}(s))}{Z_{\mathrm{MF}}}$$

where $s = (s_1, s_2, \ldots, s_N)$ is a vector specifying the configuration of all spins.
A way to measure how well $q(s)$ approximates $p(s)$ is the Kullback-Leibler divergence:
$$\mathrm{KL}(q(s)\,\|\,p(s)) = \int ds\; q(s)\log\frac{q(s)}{p(s)}$$

(For discrete spins, the integral is shorthand for a sum over all $2^N$ configurations.)
Substituting $p(s)$ and $q(s)$ gives
$$\mathrm{KL}(q(s)\,\|\,p(s)) = \int ds\; \frac{\exp(-\beta E_{\mathrm{MF}})}{Z_{\mathrm{MF}}}\left(\beta(E - E_{\mathrm{MF}}) + \log\frac{Z}{Z_{\mathrm{MF}}}\right)$$
$Z$ and $Z_{\mathrm{MF}}$ do not depend on the integration variables and can be taken out of the integral:
$$\mathrm{KL}(q(s)\,\|\,p(s)) = \beta\langle E - E_{\mathrm{MF}}\rangle_{\mathrm{MF}} + \log\frac{Z}{Z_{\mathrm{MF}}}$$
Since the KL divergence is always $\ge 0$, this gives

$$\log\frac{Z}{Z_{\mathrm{MF}}} \ge -\beta\langle E - E_{\mathrm{MF}}\rangle_{\mathrm{MF}} \quad\Longrightarrow\quad Z \ge Z_{\mathrm{MF}}\exp(-\beta\langle\Delta E\rangle_{\mathrm{MF}})$$

which recovers the Gibbs-Bogoliubov-Feynman inequality. So one interpretation of the mean field approximation is that it finds a tractable distribution that closely approximates the true distribution $p(s)$. For this reason, variational approaches to Bayesian inference problems are sometimes called mean-field methods.
Additional information
Mean Field Theory Solution of the Ising Model is an excellent write-up of the mean field solution to the Ising model. Variational Inference: A Review for Statisticians is an excellent review of applications of variational methods.