This talk by Andrew Gordon Wilson (2019). Another talk by Andrew Gordon Wilson (2022).
The following notes are taken from his slides:
Why Bayesian deep learning?
- Model construction and understanding generalization
- Decision making
- Better point estimates. Marginalization integrates out the weights under the posterior, rather than optimizing for a single setting of them (see the code sketch after this list):
\begin{align*} P(y_*|x_*, y, X) &= \int P(y_*|x_*, w)\, p(w|y, X)\,\mathrm{d}w\\ &\approx \frac{1}{N_{samp}} \sum_i P(y_*|x_*, w_i), \quad w_i \sim p(w|y, X) \end{align*}
- Interpretability, incorporating expert knowledge
- Successful in the second wave of deep learning
- Neural networks are less mysterious under the lens of probability theory
- Bayesian neural networks, by averaging over the posterior, take into account the fact that wide basins of attraction generalize better.
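A minimal sketch of the Monte Carlo marginalization above, assuming we already have samples $w_i$ from the posterior $p(w|y, X)$. The logistic-regression likelihood and all names (`predict_proba`, `posterior_samples`, `x_star`) are placeholders for illustration, not from the talk:

```python
# Bayesian model averaging by Monte Carlo, assuming posterior samples w_1..w_N
# are already available (e.g. from MCMC or a variational approximation).
import numpy as np

def predict_proba(w, x):
    """Likelihood p(y = 1 | x, w) for a single weight vector w (toy model)."""
    return 1.0 / (1.0 + np.exp(-x @ w))

def bayesian_model_average(posterior_samples, x_star):
    """Approximate p(y_* | x_*, y, X) = E_{p(w | y, X)}[p(y_* | x_*, w)]
    by averaging the likelihood over posterior samples of w."""
    preds = [predict_proba(w, x_star) for w in posterior_samples]
    return np.mean(preds, axis=0)

# Toy usage: pretend these draws came from the posterior over a 3-d weight vector.
rng = np.random.default_rng(0)
posterior_samples = rng.normal(size=(100, 3))
x_star = np.array([1.0, -0.5, 2.0])
print(bayesian_model_average(posterior_samples, x_star))
```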
Why not?
- Computationally intractable. BUT all we care about are averages over the posterior; we don't need to keep all the samples to compute those.
- Involves a lot of moving parts.
Some practical stuff
- A more principled take on Fast Geometric Ensembling: stochastic weight averaging (SWA) allows us to compute an approximation in weight space from which we can sample (see the sketch after this list).
- Random low-dimensional subspace:
  - Run SGD with a high learning rate
  - Collect weight snapshots
  - Use the SWA solution (the snapshot average) as the center of the subspace
  - Find the first PCA components of the snapshot deviations from the SWA solution
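A sketch of the recipe above under assumed names: collect SGD snapshots, average them to get the SWA solution, take the top PCA directions of the deviations as the subspace, and sample weights in it. The `run_sgd_epoch` step is a stand-in for a real training loop, and the choice of 5 subspace dimensions is arbitrary:

```python
# SWA center + PCA subspace from SGD snapshots, then sampling in the subspace.
import numpy as np

def run_sgd_epoch(w, lr=0.1, rng=None):
    """Placeholder for one epoch of SGD with a high, constant learning rate."""
    rng = rng or np.random.default_rng()
    return w - lr * rng.normal(size=w.shape)  # stands in for real gradient steps

# 1) Run SGD with a high learning rate and collect snapshots w_1..w_K.
rng = np.random.default_rng(0)
w = rng.normal(size=1000)                      # flattened network weights
snapshots = []
for epoch in range(30):
    w = run_sgd_epoch(w, lr=0.1, rng=rng)
    snapshots.append(w.copy())
snapshots = np.stack(snapshots)                # shape (K, d)

# 2) The SWA solution is the snapshot average; it is the center of the subspace.
w_swa = snapshots.mean(axis=0)

# 3) PCA of the deviations from the SWA solution gives the subspace basis.
deviations = snapshots - w_swa                 # shape (K, d)
_, _, vt = np.linalg.svd(deviations, full_matrices=False)
k = 5
basis = vt[:k]                                 # top-k principal directions, shape (k, d)

# 4) Sample weights by drawing low-dimensional coordinates z and mapping back:
#    w = w_swa + z @ basis.
z = rng.normal(size=k)
w_sample = w_swa + z @ basis
```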
Are these approaches practical?
HMC works better than everything else out of the box.
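HMC is expensive for large networks, but as a reference for what it does, here is a minimal Hamiltonian Monte Carlo sketch on a toy log posterior (the Gaussian target and all names are illustrative, not from the talk):

```python
# Minimal HMC: sample momentum, simulate Hamiltonian dynamics with leapfrog,
# then Metropolis accept/reject. Toy standard-Gaussian posterior over weights.
import numpy as np

def log_post(w):
    return -0.5 * np.sum(w ** 2)

def grad_log_post(w):
    return -w

def hmc_step(w, step_size=0.1, n_leapfrog=20, rng=None):
    rng = rng or np.random.default_rng()
    p = rng.normal(size=w.shape)                 # momentum
    w_new, p_new = w.copy(), p.copy()
    # Leapfrog integration of the Hamiltonian dynamics.
    p_new += 0.5 * step_size * grad_log_post(w_new)
    for _ in range(n_leapfrog - 1):
        w_new += step_size * p_new
        p_new += step_size * grad_log_post(w_new)
    w_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_post(w_new)
    # Metropolis acceptance on the total (potential + kinetic) energy.
    current_h = -log_post(w) + 0.5 * np.sum(p ** 2)
    proposed_h = -log_post(w_new) + 0.5 * np.sum(p_new ** 2)
    if np.log(rng.uniform()) < current_h - proposed_h:
        return w_new
    return w

# Draw a few posterior samples of a 10-d weight vector.
rng = np.random.default_rng(0)
w = np.zeros(10)
samples = []
for _ in range(500):
    w = hmc_step(w, rng=rng)
    samples.append(w.copy())
```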