Let us consider a binary target $\boldsymbol{y}$ and its explanatory variables $X$ such that

\begin{equation*}
P\left(\boldsymbol{y} | X, \boldsymbol{\beta} \right) = \prod_{i=1}^N P\left(y_i | \boldsymbol{x}_i, \boldsymbol{\beta} \right)
\end{equation*}

We further assume that

\begin{align*}
y_i | \boldsymbol{x_i}, \boldsymbol{\beta} &\sim \operatorname{Bernoulli}\left(p_i\right)\\
p_i & = \sigma(\boldsymbol{x_i}^T\,\boldsymbol{\beta})\\
\sigma(x) &= \left(1 + \exp(-x)\right)^{-1}
\end{align*}
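
As a concrete example, we can simulate data from this model with NumPy; the coefficient values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
beta = np.array([0.5, -1.0])          # arbitrary "true" coefficients
X = rng.normal(size=(N, beta.size))   # design matrix, one observation per row

p = 1 / (1 + np.exp(-(X @ beta)))     # p_i = sigma(x_i^T beta)
y = rng.binomial(1, p)                # y_i | x_i, beta ~ Bernoulli(p_i)
```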

It can be shown that

\begin{equation*}
P\left(y_i | \boldsymbol{x_i}, \boldsymbol{\beta}\right) = \frac{\exp\left(y_i\: \boldsymbol{x}_i^T \boldsymbol{\beta}\right)}{1 + \exp\left(\boldsymbol{x}_i^T \boldsymbol{\beta}\right)}
\end{equation*}
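
A quick numerical check of this closed form against the Bernoulli pmf $p_i^{y_i}\,(1 - p_i)^{1 - y_i}$, over a grid of values of $\boldsymbol{x}_i^T\boldsymbol{\beta}$:

```python
import numpy as np

psi = np.linspace(-3, 3, 13)       # grid of values for x_i^T beta
sigma = 1 / (1 + np.exp(-psi))     # p_i = sigma(psi)

# Bernoulli pmf vs. the closed form exp(y * psi) / (1 + exp(psi))
max_diff = 0.0
for y in (0, 1):
    pmf = sigma**y * (1 - sigma) ** (1 - y)
    closed = np.exp(y * psi) / (1 + np.exp(psi))
    max_diff = max(max_diff, np.abs(pmf - closed).max())
```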

We assume a normal prior on each coefficient

\begin{equation*}
\beta_j \sim \operatorname{N}\left(\mu, \sigma^2\right)
\end{equation*}

So the posterior distribution

\begin{equation*}
P\left(\boldsymbol{\beta} | \boldsymbol{y}, X\right) \propto P\left(\boldsymbol{y} | X, \boldsymbol{\beta}\right)\, P\left(\boldsymbol{\beta}\right)
\end{equation*}

does not have a closed-form expression.

But behold! Polson et al. remind us of this neat result:

\begin{align*}
\frac{\left(e^u\right)^a}{\left(1 + e^u\right)^b} &= \frac{1}{2^b}\, e^{\kappa u}\,\int_0^\infty e^{-\frac{u^2}{2} \omega}\; p(\omega)\, \mathrm{d}\omega\\
\kappa &= a - \frac{b}{2}\\
p(\omega) &= \mathrm{PG}\left(\omega|b, 0\right)
\end{align*}

So if we set $u = \boldsymbol{x}_i^T\,\boldsymbol{\beta}$, $a = y_i$ and $b = 1$, so that $\kappa = y_i - \frac{1}{2}$, the following holds:

\begin{align*}
P\left(y_i | \boldsymbol{x_i}, \boldsymbol{\beta}\right) &= \frac{1}{2} \int_0^\infty \exp\left( \kappa\, \boldsymbol{x_i}^T\,\boldsymbol{\beta} - \frac{\left(\boldsymbol{x_i}^T\,\boldsymbol{\beta}\right)^2}{2}\, \omega\right)\;p(\omega)\; \mathrm{d}\omega\\
&= \frac{1}{2} \int_0^\infty \exp\left(\frac{\kappa^2}{2\omega}\right)\,\exp\left( -\frac{\omega}{2} \left( \boldsymbol{x_i}^T\,\boldsymbol{\beta} - \frac{\kappa}{\omega}\right)^2\right)\;p(\omega)\; \mathrm{d}\omega
\end{align*}
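
We can sanity-check the identity numerically for the Bernoulli case ($a = y_i$, $b = 1$). The sketch below approximates PG(1, 0) draws by truncating its infinite-convolution-of-gammas representation; the truncation level and the value of $\boldsymbol{x}_i^T\boldsymbol{\beta}$ are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pg_1_0(size, n_terms=200):
    # PG(1, 0) = (1 / (2 pi^2)) * sum_k g_k / (k - 1/2)^2 with g_k ~ Exp(1),
    # truncated at n_terms (an approximation)
    k = np.arange(1, n_terms + 1)
    g = rng.exponential(size=(size, n_terms))
    return (g / (k - 0.5) ** 2).sum(axis=1) / (2 * np.pi**2)

psi, y = 0.7, 1.0                   # arbitrary value of x_i^T beta and outcome
kappa = y - 0.5
omega = sample_pg_1_0(20_000)

# logistic likelihood vs. the Polya-Gamma integral, estimated by Monte Carlo
lhs = np.exp(y * psi) / (1 + np.exp(psi))
rhs = 0.5 * np.exp(kappa * psi) * np.exp(-omega * psi**2 / 2).mean()
```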

So if we condition on $\omega$ we find that $P\left(y_i | \boldsymbol{x_i}, \boldsymbol{\beta}, \omega\right) \propto \exp\left( -\frac{\omega}{2} \left( \boldsymbol{x_i}^T\,\boldsymbol{\beta} - \frac{\kappa}{\omega}\right)^2\right)$, and hence the conditional posterior of $\boldsymbol{\beta}$ is Gaussian (if $\omega$ is drawn from $\operatorname{PG}(1, 0)$).

TODO Simple example with a bimodal distribution

And a latent variable indicates which mode each sample belongs to. We can use the polyagamma library to draw the $\omega_i$.

Appendix

Likelihood

Let us derive the likelihood for the Bernoulli and negative binomial observation distributions. Assuming the observations are i.i.d., we can write:

\begin{equation*}
P\left(\boldsymbol{y} | X, \boldsymbol{\beta}\right) = \prod_{i=1}^N P\left(y_i | \boldsymbol{x_i}, \boldsymbol{\beta}\right)
\end{equation*}

Bernoulli observation distribution

In this case:

\begin{align*}
P\left(y_i | \boldsymbol{x_i}, \boldsymbol{\beta}\right) &= p_i^{\,1-y_i}\,\left(1 - p_i\right)^{y_i}\\
p_i &= \frac{\exp(f_i)}{1 + \exp(f_i)}\\
f_i &= - \boldsymbol{x}_i^T\,\boldsymbol{\beta}
\end{align*}
\begin{align*}
P\left(y_i | \boldsymbol{x_i}, \boldsymbol{\beta}\right) &= p_i^{\,1-y_i}\,\left(1 - p_i\right)^{y_i}\\
&= \left[ \frac{\exp(f_i)}{1 + \exp(f_i)}\right]^{\,1-y_i}\,\left[1 - \frac{\exp(f_i)}{1 + \exp(f_i)}\right]^{y_i}\\
&= \left[ \frac{\exp(f_i)}{1 + \exp(f_i)}\right]^{\,1-y_i}\,\left[\frac{1}{1 + \exp(f_i)}\right]^{y_i}\\
&= \frac{\left( \exp(f_i) \right)^{\,1-y_i}}{1 + \exp(f_i)}\\
&= \frac{\left( \exp(-f_i) \right)^{\,y_i}}{1 + \exp(-f_i)}\\
\end{align*}

So that:

\begin{equation*}
\mathcal{L}(\boldsymbol{\beta}) = \prod_{i=1}^N P\left(y_i | \boldsymbol{x_i}, \boldsymbol{\beta}\right) = \prod_{i=1}^N \frac{\left(\exp\left(\boldsymbol{x}_i^T \boldsymbol{\beta}\right)\right)^{y_i}}{1 + \exp\left(\boldsymbol{x}_i^T \boldsymbol{\beta}\right)}
\end{equation*}
import aesara.tensor as at
from aeppl import joint_logprob

srng = at.random.RandomStream(0)
X = at.matrix("X")
beta = at.vector("beta")

# p_i = sigma(x_i^T beta); X stores one observation per row
p = at.sigmoid(at.dot(X, beta))
Y_rv = srng.bernoulli(p)
y_vv = Y_rv.clone()

# log-likelihood of the observed values, via aeppl
loglikelihood = joint_logprob({Y_rv: y_vv})

This is equivalent to:

X = at.matrix("X")
beta = at.vector("beta")
y = at.vector("y")

f = at.dot(X, beta)
# log of the likelihood above: sum_i [y_i f_i - log(1 + exp(f_i))]
loglikelihood = at.sum(y * f - at.log1p(at.exp(f)))

Negative binomial observation distribution

If we denote the negative binomial's dispersion parameter by $r$ and the number of observations in experiment $i$ by $y_i$, then:

\begin{align*}
P\left(y_i | \boldsymbol{x_i}, \boldsymbol{\beta}\right) &\propto p_i^{\,r}\,\left(1 - p_i\right)^{y_i}\\
p_i &= \frac{\exp(f_i)}{1 + \exp(f_i)}\\
f_i &= - \boldsymbol{x}_i^T\,\boldsymbol{\beta}
\end{align*}

The previous calculation trivially applies to this as well:

\begin{align*}
P\left(y_i | \boldsymbol{x_i}, \boldsymbol{\beta}\right) &\propto p_i^{\,r}\,\left(1 - p_i\right)^{y_i}\\
&= \left[ \frac{\exp(f_i)}{1 + \exp(f_i)}\right]^{\,r}\,\left[1 - \frac{\exp(f_i)}{1 + \exp(f_i)}\right]^{y_i}\\
&= \left[ \frac{\exp(f_i)}{1 + \exp(f_i)}\right]^{\,r}\,\left[\frac{1}{1 + \exp(f_i)}\right]^{y_i}\\
&= \frac{\left( \exp(f_i) \right)^{\,r}}{\left(1 + \exp(f_i)\right)^{y_i+r}}\\
\end{align*}

So that:

\begin{equation*}
\mathcal{L}\left(\boldsymbol{\beta}\right) = \prod_{i=1}^N P\left(y_i | \boldsymbol{x_i}, \boldsymbol{\beta}\right) \propto \prod_{i=1}^N \frac{\left(\exp\left( - \boldsymbol{x}_i^T \boldsymbol{\beta}\right)\right)^{r}}{\left(1 + \exp\left(-\boldsymbol{x}_i^T \boldsymbol{\beta}\right)\right)^{y_i+r}}
\end{equation*}
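
As a sanity check, this kernel matches `scipy.stats.nbinom` once the binomial coefficient hidden by the proportionality is reinstated; the values of $r$, $y_i$ and $f_i$ below are arbitrary:

```python
import numpy as np
from scipy.special import comb
from scipy.stats import nbinom

r, y, f = 3, 5, -0.4                  # arbitrary dispersion, count, and f_i
p = np.exp(f) / (1 + np.exp(f))       # p_i = sigma(f_i)

# kernel from the derivation: exp(f)^r / (1 + exp(f))^(y + r) = p^r (1 - p)^y
kernel = np.exp(f) ** r / (1 + np.exp(f)) ** (y + r)

# multiplying by the binomial coefficient recovers the full pmf
pmf = comb(y + r - 1, y) * kernel
```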

References

Polson, N. G., Scott, J. G., & Windle, J. (2013). Bayesian inference for logistic models using Pólya-Gamma latent variables. Journal of the American Statistical Association, 108(504), 1339-1349.