No one really know how to define what a session in a mobile app is, and it is indeed something that is hard to define. Here is a real inter-event time distribution for the iOS and Android version of the same mobile application:

We call the random variable that represents the inter-event time. We found and . Worse, . Is it possible to find a way to delineate sessions that gives consistant results across platforms?

The key is to realize that the above distributions are the results of different processes:

  • Actions within a session. We note the inter-action time
  • Leaving the session and coming back to the application at a later time. We note the inter-session time

It is known that the inter-event time between human actions such as responding to emails follows a Pareto distribution so we will assume that the inter-session time follows a Pareto distribution:

\begin{equation}
  T_{session} \sim \operatorname{Pareto}(\alpha)
\end{equation}

We will assume a distribution with shorter tails yet flexible for inter-actions times:

\begin{equation}
 T_{action} \sim \operatorname{Gamma}(k, \theta)
\end{equation}

So the inter-event time in the data is given by a mixture:

\begin{equation}
  P(T|k,\theta,\alpha) = \omega_{a}\, P(T_{action}|k,\theta) + \omega_{s}\, P(T_{session}|\alpha)
\end{equation}

The implementation in PyMC3 is straightforward:

import pymc3 as pm
import numpy as np
 
data = [insert, your, inter, event, time, data, here]
 
with pm.Model() as mixture_model:
 
    # Define priors
    BoundedNormal = pm.Bound(pm.Normal, lower=0.0)
    k = BoundedNormal('k', mu=20, sd=np.sqrt(data.var()), testval=20)
    theta = pm.HalfCauchy('theta', 5)
    alpha = pm.HalfNormal('alpha', 5)
 
    # Likelihood
    gamma = pm.Gamma.dist(mu=k, sd=theta)
    pareto = pm.Pareto.dist(alpha=alpha, m=0.1)
    w = pm.Dirichlet('weight', a=np.array([1, 1]))
    interval_obs = pm.Mixture('interval_obs', w=w, comp_dists=[gamma,pareto], observed=data)
 
    # Inference
    trace = pm.sample()
 

To delineate sessions we need to find a time scale such that if no actions has been performed since more than we consider the session to be over. We look for the maximum value of such that:

\begin{equation}
  P(T = \Delta t| inter-actions) > \epsilon\:P(T = \Delta t| inter-sessions)
\end{equation}

Where is a constant to dermine (we can probably use decision theory for that). Choosing we found the following posterior distribution for :

In other words, we will consider that a session is still active as long as . We can now take the users’ timeline of events, extract sessions using the threshold, and look at the distribution of session duration of all users. We find that for sessions > 10s the distributions for iOS and Android users match almost perfectly:

Of course there are details to iron out. The definition could be improved by discarding “noise” events (that lead to these very short inter-action times) but the main idea (using a mixture) is here.