Let us consider a binary target $\boldsymbol{y}$ and its explanatory variables $X$ such that

\begin{equation*}
P\left(\boldsymbol{y} | X, \boldsymbol{\beta} \right) = \prod_{i=1}^N P\left(y_i | \boldsymbol{x}_i, \boldsymbol{\beta} \right)
\end{equation*}

We further assume that

\begin{align*}
y_i | \boldsymbol{x_i}, \boldsymbol{\beta} &\sim \operatorname{Bernoulli}\left(p_i\right)\\
p_i & = \sigma(\boldsymbol{x_i}^T\,\boldsymbol{\beta})\\
\sigma(x) &= \left(1 + \exp(-x)\right)^{-1}
\end{align*}
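
As a concrete example, we can simulate data from this model with NumPy; the coefficient values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
beta = np.array([0.5, -1.0])          # arbitrary "true" coefficients
X = rng.normal(size=(N, beta.size))   # design matrix, one observation per row

p = 1 / (1 + np.exp(-(X @ beta)))     # p_i = sigma(x_i^T beta)
y = rng.binomial(1, p)                # y_i | x_i, beta ~ Bernoulli(p_i)
```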

It can be shown that

\begin{equation*}
P\left(y_i | \boldsymbol{x_i}, \boldsymbol{\beta}\right) = \frac{\exp\left(y_i\: \boldsymbol{x}_i^T \boldsymbol{\beta}\right)}{1 + \exp\left(\boldsymbol{x}_i^T \boldsymbol{\beta}\right)}
\end{equation*}
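
A quick numerical check of this closed form against the Bernoulli pmf $p_i^{y_i}\,(1 - p_i)^{1 - y_i}$, over a grid of values of $\boldsymbol{x}_i^T\boldsymbol{\beta}$:

```python
import numpy as np

psi = np.linspace(-3, 3, 13)       # grid of values for x_i^T beta
sigma = 1 / (1 + np.exp(-psi))     # p_i = sigma(psi)

# Bernoulli pmf vs. the closed form exp(y * psi) / (1 + exp(psi))
max_diff = 0.0
for y in (0, 1):
    pmf = sigma**y * (1 - sigma) ** (1 - y)
    closed = np.exp(y * psi) / (1 + np.exp(psi))
    max_diff = max(max_diff, np.abs(pmf - closed).max())
```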

We assume a normal prior on each coefficient

\begin{equation*}
\beta_j \sim \operatorname{N}\left(\mu, \sigma^2\right)
\end{equation*}

So the posterior distribution

\begin{equation*}
P\left(\boldsymbol{\beta} | \boldsymbol{y}, X\right) \propto P\left(\boldsymbol{y} | X, \boldsymbol{\beta}\right)\, P\left(\boldsymbol{\beta}\right)
\end{equation*}

does not have a closed-form expression.

But behold! Polson et al. remind us of this neat result:

\begin{align*}
\frac{\left(e^u\right)^a}{\left(1 + e^u\right)^b} &= \frac{1}{2^b}\, e^{\kappa u}\,\int_0^\infty e^{-\frac{u^2}{2} \omega}\; p(\omega)\, \mathrm{d}\omega\\
\kappa &= a - \frac{b}{2}\\
p(\omega) &= \mathrm{PG}\left(\omega|b, 0\right)
\end{align*}

So if we set $u = \boldsymbol{x}_i^T\,\boldsymbol{\beta}$, $a = y_i$ and $b = 1$, so that $\kappa = y_i - \frac{1}{2}$, the following holds:

\begin{align*}
P\left(y_i | \boldsymbol{x_i}, \boldsymbol{\beta}\right) &= \frac{1}{2} \int_0^\infty \exp\left( \kappa\, \boldsymbol{x_i}^T\,\boldsymbol{\beta} - \frac{\left(\boldsymbol{x_i}^T\,\boldsymbol{\beta}\right)^2}{2}\, \omega\right)\;p(\omega)\; \mathrm{d}\omega\\
&= \frac{1}{2} \int_0^\infty \exp\left(\frac{\kappa^2}{2\omega}\right)\,\exp\left( -\frac{\omega}{2} \left( \boldsymbol{x_i}^T\,\boldsymbol{\beta} - \frac{\kappa}{\omega}\right)^2\right)\;p(\omega)\; \mathrm{d}\omega
\end{align*}
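
We can sanity-check the identity numerically for the Bernoulli case ($a = y_i$, $b = 1$). The sketch below approximates PG(1, 0) draws by truncating its infinite-convolution-of-gammas representation; the truncation level and the value of $\boldsymbol{x}_i^T\boldsymbol{\beta}$ are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pg_1_0(size, n_terms=200):
    # PG(1, 0) = (1 / (2 pi^2)) * sum_k g_k / (k - 1/2)^2 with g_k ~ Exp(1),
    # truncated at n_terms (an approximation)
    k = np.arange(1, n_terms + 1)
    g = rng.exponential(size=(size, n_terms))
    return (g / (k - 0.5) ** 2).sum(axis=1) / (2 * np.pi**2)

psi, y = 0.7, 1.0                   # arbitrary value of x_i^T beta and outcome
kappa = y - 0.5
omega = sample_pg_1_0(20_000)

# logistic likelihood vs. the Polya-Gamma integral, estimated by Monte Carlo
lhs = np.exp(y * psi) / (1 + np.exp(psi))
rhs = 0.5 * np.exp(kappa * psi) * np.exp(-omega * psi**2 / 2).mean()
```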

So if we condition on $\omega$ we find that $P\left(y_i | \boldsymbol{x_i}, \boldsymbol{\beta}, \omega\right) \propto \exp\left( -\frac{\omega}{2} \left( \boldsymbol{x_i}^T\,\boldsymbol{\beta} - \frac{\kappa}{\omega}\right)^2\right)$, and hence the conditional posterior of $\boldsymbol{\beta}$ is Gaussian (if $\omega$ is drawn from $\operatorname{PG}(1, 0)$).

TODO Simple example with a bimodal distribution

And a latent variable indicates which mode each sample belongs to. We can use the polyagamma library to draw the $\omega_i$.

Appendix

Likelihood

Let us derive the likelihood for the Bernoulli and negative binomial observation distributions. Assuming the observations are i.i.d., we can write:

\begin{equation*}
P\left(\boldsymbol{y} | X, \boldsymbol{\beta}\right) = \prod_{i=1}^N P\left(y_i | \boldsymbol{x_i}, \boldsymbol{\beta}\right)
\end{equation*}

Bernoulli observation distribution

In this case:

\begin{align*}
P\left(y_i | \boldsymbol{x_i}, \boldsymbol{\beta}\right) &= p_i^{\,1-y_i}\,\left(1 - p_i\right)^{y_i}\\
p_i &= \frac{\exp(f_i)}{1 + \exp(f_i)}\\
f_i &= - \boldsymbol{x}_i^T\,\boldsymbol{\beta}
\end{align*}
\begin{align*}
P\left(y_i | \boldsymbol{x_i}, \boldsymbol{\beta}\right) &= p_i^{\,1-y_i}\,\left(1 - p_i\right)^{y_i}\\
&= \left[ \frac{\exp(f_i)}{1 + \exp(f_i)}\right]^{\,1-y_i}\,\left[1 - \frac{\exp(f_i)}{1 + \exp(f_i)}\right]^{y_i}\\
&= \left[ \frac{\exp(f_i)}{1 + \exp(f_i)}\right]^{\,1-y_i}\,\left[\frac{1}{1 + \exp(f_i)}\right]^{y_i}\\
&= \frac{\left( \exp(f_i) \right)^{\,1-y_i}}{1 + \exp(f_i)}\\
&= \frac{\left( \exp(-f_i) \right)^{\,y_i}}{1 + \exp(-f_i)}\\
\end{align*}

So that:

\begin{equation*}
\mathcal{L}(\boldsymbol{\beta}) = \prod_{i=1}^N P\left(y_i | \boldsymbol{x_i}, \boldsymbol{\beta}\right) = \prod_{i=1}^N \frac{\left(\exp\left(\boldsymbol{x}_i^T \boldsymbol{\beta}\right)\right)^{y_i}}{1 + \exp\left(\boldsymbol{x}_i^T \boldsymbol{\beta}\right)}
\end{equation*}
import aesara.tensor as at
from aeppl import joint_logprob

srng = at.random.RandomStream(0)
X = at.matrix("X")
beta = at.vector("beta")

# p_i = sigma(x_i^T beta); X stores one observation per row
p = at.sigmoid(at.dot(X, beta))
Y_rv = srng.bernoulli(p)
y_vv = Y_rv.clone()

# log-likelihood of the observed values, via aeppl
loglikelihood = joint_logprob({Y_rv: y_vv})

This is equivalent to:

X = at.matrix("X")
beta = at.vector("beta")
y = at.vector("y")

f = at.dot(X, beta)
# log of the likelihood above: sum_i [y_i f_i - log(1 + exp(f_i))]
loglikelihood = at.sum(y * f - at.log1p(at.exp(f)))

Negative binomial observation distribution

If we denote the negative binomial's dispersion parameter by $r$ and the number of observations in experiment $i$ by $y_i$, then:

\begin{align*}
P\left(y_i | \boldsymbol{x_i}, \boldsymbol{\beta}\right) &\propto p_i^{\,r}\,\left(1 - p_i\right)^{y_i}\\
p_i &= \frac{\exp(f_i)}{1 + \exp(f_i)}\\
f_i &= - \boldsymbol{x}_i^T\,\boldsymbol{\beta}
\end{align*}

The previous calculation trivially applies to this as well:

\begin{align*}
P\left(y_i | \boldsymbol{x_i}, \boldsymbol{\beta}\right) &\propto p_i^{\,r}\,\left(1 - p_i\right)^{y_i}\\
&= \left[ \frac{\exp(f_i)}{1 + \exp(f_i)}\right]^{\,r}\,\left[1 - \frac{\exp(f_i)}{1 + \exp(f_i)}\right]^{y_i}\\
&= \left[ \frac{\exp(f_i)}{1 + \exp(f_i)}\right]^{\,r}\,\left[\frac{1}{1 + \exp(f_i)}\right]^{y_i}\\
&= \frac{\left( \exp(f_i) \right)^{\,r}}{\left(1 + \exp(f_i)\right)^{y_i+r}}\\
\end{align*}

So that:

\begin{equation*}
\mathcal{L}\left(\boldsymbol{\beta}\right) = \prod_{i=1}^N P\left(y_i | \boldsymbol{x_i}, \boldsymbol{\beta}\right) \propto \prod_{i=1}^N \frac{\left(\exp\left( - \boldsymbol{x}_i^T \boldsymbol{\beta}\right)\right)^{r}}{\left(1 + \exp\left(-\boldsymbol{x}_i^T \boldsymbol{\beta}\right)\right)^{y_i+r}}
\end{equation*}
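
As a sanity check, this kernel matches `scipy.stats.nbinom` once the binomial coefficient hidden by the proportionality is reinstated; the values of $r$, $y_i$ and $f_i$ below are arbitrary:

```python
import numpy as np
from scipy.special import comb
from scipy.stats import nbinom

r, y, f = 3, 5, -0.4                  # arbitrary dispersion, count, and f_i
p = np.exp(f) / (1 + np.exp(f))       # p_i = sigma(f_i)

# kernel from the derivation: exp(f)^r / (1 + exp(f))^(y + r) = p^r (1 - p)^y
kernel = np.exp(f) ** r / (1 + np.exp(f)) ** (y + r)

# multiplying by the binomial coefficient recovers the full pmf
pmf = comb(y + r - 1, y) * kernel
```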

References

Polson, N. G., Scott, J. G., & Windle, J. (2013). Bayesian inference for logistic models using Pólya-Gamma latent variables. Journal of the American Statistical Association, 108(504), 1339-1349.