# Marginalize over discrete parameters

Marginalization is an operation that happens at the level of the probability density. A first option is to add a `marginalize` function that takes a log-density and a set of value variables to marginalize out, and rewrites the density into its marginalized counterpart:

```python
import aeppl

logprob, (y_vv, i_vv) = aeppl.joint_logprob(Y_rv, i_rv)
marginalized_logprob = aeppl.marginalize(logprob, i_vv)
```
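
When the support of the variable being marginalized out is finite, the rewrite that `marginalize` performs amounts to summing the joint density over that support, which is a log-sum-exp in log space. Below is a minimal sketch of this enumeration strategy using plain Aesara graph substitution; the name `marginalize_enumerate` and its signature are illustrative assumptions, not AePPL's API:

```python
import aesara
import aesara.tensor as at

def marginalize_enumerate(logprob, i_vv, support):
    """Marginalize the discrete value variable `i_vv` out of the
    log-density graph `logprob` by enumerating its finite `support`."""
    terms = at.stack(
        [
            aesara.clone_replace(logprob, {i_vv: at.constant(i, dtype=i_vv.dtype)})
            for i in support
        ]
    )
    # Numerically stable log-sum-exp of the enumerated terms
    m = at.max(terms)
    return m + at.log(at.sum(at.exp(terms - m)))
```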

Another solution is to add a `marginalize` keyword to `joint_logprob`:

```python
import aeppl

logprob, (y_vv,) = aeppl.joint_logprob(Y_rv, i_rv, marginalize=(i_rv,))
```
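
Either way, the marginalized variable no longer appears among the returned value variables, so compiling the density only requires the remaining inputs. A hypothetical continuation of the snippet above:

```python
import aesara

# `i_rv` was marginalized out, so the compiled density is a function
# of `y_vv` alone
logprob_fn = aesara.function([y_vv], logprob)
```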

What the `marginalize` keyword hides is a function that acts at the measure level in AePPL's intermediate representation. Since `joint_logprob` already builds this intermediate representation, it is a natural place to perform marginalization. In pseudo-code, the internals of `joint_logprob` would then look like the following:

```python
def joint_logprob(*rvs, to_marginalize=()):
    rvs_to_values = aeppl.internals.create_value_variables(rvs)
    measures = aeppl.internals.to_ir(rvs_to_values)
    marginalized_measures = aeppl.internals.marginalize(measures, to_marginalize)
    logdensity = aeppl.internals.disintegrate(marginalized_measures)
    # Only the value variables that were not marginalized out are returned
    values = tuple(v for rv, v in rvs_to_values.items() if rv not in to_marginalize)
    return logdensity, values
```

This makes me think that AePPL's intermediate representation should be a first-class citizen.

## Different examples of marginalization

### Switchpoint model

Consider the following example from the Stan documentation:

```python
import aesara
import aesara.tensor as at

srng = at.random.RandomStream(0)

# Early and late rates, and the number of time periods
r_e = at.scalar('r_e')
r_l = at.scalar('r_l')
T = at.iscalar('T')

e_rv = srng.exponential(r_e)
l_rv = srng.exponential(r_l)
# Switchpoint, uniform over {1, ..., T - 1}
s_rv = srng.integers(1, T)

# The Poisson rate switches from `e_rv` to `l_rv` after the switchpoint
t = at.arange(1, T)
rate = at.where(at.ge(s_rv, t), e_rv, l_rv)
D_rv = srng.poisson(rate)

# Draw from the prior predictive distribution
fn = aesara.function([r_e, r_l, T], D_rv)
print(fn(1., 3., 10))
```

Here we can marginalize over the discrete switchpoint `s_rv` to ease sampling, since the marginalized density is differentiable with respect to the remaining parameters, and recover the posterior distribution of the switchpoint afterwards using posterior predictive sampling.
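
Since `s_rv` has finite support `{1, ..., T - 1}`, the marginalized likelihood can be written down by enumerating every candidate switchpoint and log-sum-exp-ing the per-switchpoint Poisson log-likelihoods, weighted by the uniform prior. Here is a hand-written sketch of that computation in plain Aesara; the name `marginalized_loglik` is illustrative of the quantity a `marginalize` rewrite would be expected to produce:

```python
import aesara
import aesara.tensor as at

D = at.vector('D')    # observed counts for t = 1, ..., T - 1
e = at.scalar('e')    # early rate
l = at.scalar('l')    # late rate
T = at.iscalar('T')

t = at.arange(1, T)                       # time indices, shape (T - 1,)
s = at.arange(1, T).dimshuffle(0, 'x')    # candidate switchpoints, shape (T - 1, 1)
rate = at.where(at.ge(s, t), e, l)        # shape (T - 1, T - 1)

# Poisson log-pmf of the data, summed over time for each candidate
# switchpoint, plus the uniform log-prior over switchpoints
logpmf = D * at.log(rate) - rate - at.gammaln(D + 1)
per_s = at.sum(logpmf, axis=1) + at.log(1.0 / (T - 1))

# log p(D | e, l): numerically stable log-sum-exp over the switchpoint
m = at.max(per_s)
marginalized_loglik = m + at.log(at.sum(at.exp(per_s - m)))

loglik_fn = aesara.function([D, e, l, T], marginalized_loglik)
```

Sampling `e` and `l` against this density only involves continuous parameters; the posterior over the switchpoint can then be recovered, e.g. by evaluating the normalized `per_s` terms at posterior draws.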