# Predicting the results of French elections

Apologies: this is a very rough draft, written mainly for myself.

## Introduction

Dimensions:

• Elections $$e = 1, \dots, E$$
• Parties $$p = 1, \dots, P$$
• Pollsters $$h=1, \dots, H$$
• Time $$t = t_0-N, \dots, t_0$$

Latent variables:

• Intercept with polls $$\iota$$
• Latent popularity of party $$p$$ at day $$t$$ with polls: $$\mu_{p, t}$$
• House effect for the model with polls $$\alpha_k$$, where $$k=1, \dots, H$$
• Intercept with results and fundamentals $$\tilde{\iota}$$
• House effect for the model with results and fundamentals $$\tilde{\alpha}_k$$
• Poll bias $$\zeta_p$$: the systematic bias shared by every pollster for each party
• Latent popularity of party $$p$$ at day $$t$$ with fundamentals: $$\tilde{\mu}_{p, t}$$

We integrate two different models:

• A model that aggregates polls and tries to infer the "true" voting intentions
• A model that uses fundamental data to predict the results on election day

Both models are tied together: we relate the results to the "true" intentions at the time $$t_0$$ of the election, which are in turn connected to the intentions at earlier time steps.

We use Gaussian processes to model the time evolution of the different parameters. However:

• We use 1D Gaussian processes, one for each party, where we could instead use a multidimensional GP with a dense covariance matrix (and thus model the 'transfers' between parties);
• We use the squared exponential kernel, but the Ornstein-Uhlenbeck kernel should be better suited to a stochastic process like this one. We could also try a non-stationary kernel such as the Wiener kernel (I don't see why the distribution should be stationary here);
• The value of the parameter is the sum of three parameters modeled by GPs with different timescales. Can we do better than this?
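
To make the kernel choices above concrete, here is a minimal NumPy sketch of the three covariance functions mentioned. The function names, the length scale of 10 days and the 100-day grid are illustrative assumptions, not values taken from the model.

```python
import numpy as np

def squared_exponential(t, lengthscale=10.0, variance=1.0):
    """Squared exponential kernel: stationary, very smooth sample paths."""
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def ornstein_uhlenbeck(t, lengthscale=10.0, variance=1.0):
    """Ornstein-Uhlenbeck (exponential) kernel: stationary, rough sample paths."""
    d = np.abs(t[:, None] - t[None, :])
    return variance * np.exp(-d / lengthscale)

def wiener(t, variance=1.0):
    """Wiener kernel min(s, t): non-stationary, variance grows with time."""
    return variance * np.minimum(t[:, None], t[None, :])

# Hypothetical daily grid, days 1..100 (the Wiener kernel needs t >= 0).
t = np.arange(1.0, 101.0)
K_se = squared_exponential(t)
K_ou = ornstein_uhlenbeck(t)
K_w = wiener(t)
```

The diagonals show the difference: both stationary kernels have the same variance at every day, while the variance of the Wiener kernel grows linearly with time.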

## Intercepts

\begin{align*} \sigma_{\iota} &\sim \operatorname{HalfNormal}(0.5)\\ \iota_{e,p} &\sim \operatorname{ZeroSumNormal}(0, \sigma_{\iota}) \end{align*}

\begin{align*} \sigma_{\tilde{\iota}} &\sim \operatorname{HalfNormal}(0.5)\\ \tilde{\iota}_{p} &\sim \operatorname{ZeroSumNormal}(0, \sigma_{\tilde{\iota}}) \end{align*}
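
As a rough illustration of what these priors produce, the sketch below draws the two intercepts with NumPy. It assumes that the zero-sum constraint applies across parties and that a zero-sum normal draw can be built by centering ordinary normal draws; the helper `zero_sum_normal` and the sizes `E` and `P` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def zero_sum_normal(sigma, shape, rng):
    """Draw Normal(0, sigma) values and subtract the mean over the last axis,
    so that the values sum to zero along that axis (assumed: the party axis)."""
    raw = rng.normal(0.0, sigma, size=shape)
    return raw - raw.mean(axis=-1, keepdims=True)

E, P = 4, 6  # hypothetical numbers of elections and parties

# HalfNormal(0.5) draws, obtained as |Normal(0, 0.5)|.
sigma_iota = np.abs(rng.normal(0.0, 0.5))
sigma_iota_tilde = np.abs(rng.normal(0.0, 0.5))

iota = zero_sum_normal(sigma_iota, (E, P), rng)           # iota_{e,p}
iota_tilde = zero_sum_normal(sigma_iota_tilde, (P,), rng)  # iota-tilde_p
```

Centering slightly shrinks the marginal standard deviation of each component (by a factor of $$\sqrt{1 - 1/P}$$), which is worth keeping in mind when reading the scale parameters.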

## House effect

The systematic poll bias shared by every pollster for each political party:

$$\zeta_{p} \sim \operatorname{ZeroSumNormal}(0, 0.15)$$

The house effect per party:

$$\epsilon_{h,p} \sim \operatorname{ZeroSumNormal}(0, 0.15)$$

And the house effect per (election, party):

\begin{align*} \sigma_{\tilde{\epsilon}, h, p} &\sim \operatorname{HalfNormal}(0.15)\\ \tilde{\epsilon}_{h, p, e} &= \sigma_{\tilde{\epsilon}, h, p} \;\operatorname{ZeroSumNormal}(0, 1) \end{align*}

## Fundamental data

The idea is that elections are fairly simple to predict using fundamental data. Here we model the effect of unemployment $$\nu_u$$:

$$\nu_u \sim \operatorname{ZeroSumNormal}(0, 0.15)$$

## Time evolution

We model the time evolution of the parties' latent popularity with three Gaussian processes with different length scales, to capture the different time scales of the process.
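
A minimal sketch of this construction for a single party, assuming a squared exponential kernel and purely illustrative length scales (7, 30 and 120 days) and amplitudes; none of these values come from the model, and `sample_gp` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(1)

def se_kernel(t, lengthscale):
    """Squared exponential covariance matrix on the time grid t."""
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_gp(t, lengthscale, amplitude, rng, jitter=1e-6):
    """Draw one zero-mean GP path via a Cholesky factor of the covariance."""
    K = amplitude**2 * se_kernel(t, lengthscale) + jitter * np.eye(len(t))
    L = np.linalg.cholesky(K)
    return L @ rng.standard_normal(len(t))

t = np.arange(0.0, 200.0)  # hypothetical: 200 days before the election

# Assumed (length scale in days, amplitude) pairs: short, medium, long term.
components = [(7.0, 0.05), (30.0, 0.1), (120.0, 0.2)]
paths = [sample_gp(t, ls, amp, rng) for ls, amp in components]

# The latent popularity (before the softmax) is the sum of the three paths.
mu = np.sum(paths, axis=0)
```

The short length scale absorbs day-to-day noise, while the long one carries the slow drift over the whole campaign.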

## Combine the factors

### Poll aggregator

$$\lambda_{h, t, e, p} = \tilde{\iota}_{p} + \iota_{e,p} + \mu_{t,p} + \tilde{\mu}_{t,e,p} + \nu_{u} \; U_{t} + \zeta_{p} + \epsilon_{h,p} + \tilde{\epsilon}_{h,p,e}$$

We then define the vector $$\mathbf{p}_{h, t, e} = \left(p_{h,t,e,Green}, \dots, p_{h,t,e,Left}\right)$$ and write

$$\mathbf{p}_{h, t, e} = \operatorname{Softmax}(\lambda_{h, t, e})$$

The latent popularity is obtained by removing the house effects and poll biases:

$$\mathbf{p}^{latent}_{t,e} = \operatorname{Softmax}\left(\tilde{\iota}_{p} + \iota_{e,p} + \mu_{t,p} + \tilde{\mu}_{t,e,p} + \nu_{u} \; U_{t}\right)$$
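
The way the terms combine can be sketched in NumPy for a single election. All the arrays below are random placeholders with hypothetical sizes, not fitted values; the point is only the broadcasting over (pollster, day, party) and the softmax over parties.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def centered(x):
    """Subtract the mean over parties (last axis) to mimic a zero-sum draw."""
    return x - x.mean(axis=-1, keepdims=True)

rng = np.random.default_rng(2)
H, T, P = 3, 5, 4  # hypothetical: pollsters, days, parties

iota_tilde = centered(rng.normal(0, 0.5, P))     # party intercept
iota_e = centered(rng.normal(0, 0.5, P))         # (election, party) intercept
mu = rng.normal(0, 0.1, (T, P))                  # party time effect
mu_tilde = rng.normal(0, 0.1, (T, P))            # (election, party) time effect
nu_u = centered(rng.normal(0, 0.15, P))          # unemployment effect
U = rng.normal(0, 1, T)                          # standardized unemployment
zeta = centered(rng.normal(0, 0.15, P))          # bias shared by all pollsters
eps = centered(rng.normal(0, 0.15, (H, P)))      # house effect per party
eps_tilde = centered(rng.normal(0, 0.15, (H, P)))  # house effect, this election

# Terms that do not depend on the pollster: shape (T, P).
latent = iota_tilde + iota_e + mu + mu_tilde + U[:, None] * nu_u

# Add the poll bias and house effects: shape (H, T, P).
lam = latent[None, :, :] + zeta + eps[:, None, :] + eps_tilde[:, None, :]

p_polls = softmax(lam)      # what each pollster is expected to report
p_latent = softmax(latent)  # latent popularity, house effects removed
```

Every row of `p_polls` and `p_latent` is a vector of shares that sums to one over the parties.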

### Fundamentals model

$$\tilde{\mathbf{p}}_{e} = \operatorname{Softmax}\left(\tilde{\iota}_{p} + \iota_{e,p} + \mu_{t_0,p} + \tilde{\mu}_{t_0,e,p} + \nu_{u} \; U_{t_0}\right)$$

## Connect to poll results and election results

The concentration parameter:

$$\alpha \sim \operatorname{InverseGamma}(1000, 100)$$

We denote by $$n_{h, p, t, e}$$ the result (as a count of respondents) of a poll by pollster $$h$$ at time $$t$$ for party $$p$$, and by $$N_{t}$$ the number of respondents:

$$n_{h, p, t, e} \sim \operatorname{DirichletMultinomial}(\alpha\,p_{h,t,e,p}, N_{t})$$

We denote by $$r_{pe}$$ the result for party $$p$$ at election $$e$$ and by $$R_e$$ the number of voters, and we write

$$r_{pe} \sim \operatorname{DirichletMultinomial}\left(\alpha\;\tilde{p}_{pe}, R_{e}\right)$$
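
To see what this likelihood generates, here is a small NumPy sketch that simulates one poll by first drawing shares from a Dirichlet and then counts from a multinomial. The share vector, the concentration value and the sample size are made-up numbers, and `dirichlet_multinomial` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(3)

def dirichlet_multinomial(alpha, p, n, rng):
    """Draw counts from a Dirichlet-multinomial: shares ~ Dirichlet(alpha * p),
    then counts ~ Multinomial(n, shares)."""
    shares = rng.dirichlet(alpha * p)
    return rng.multinomial(n, shares)

# Hypothetical values, not drawn from the priors above.
p = np.array([0.25, 0.30, 0.20, 0.15, 0.10])  # expected shares per party
alpha = 100.0                                  # concentration
N = 1000                                       # number of respondents

counts = dirichlet_multinomial(alpha, p, N, rng)
```

The larger the concentration $$\alpha$$, the closer the simulated counts stay to a plain multinomial around $$p$$; smaller values add extra dispersion on top of the sampling noise.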

Q: What if there are several polls on the same day?