# Shape semantics for random variables in Aesara

## Random variables in Aesara

Random variables are represented in Aesara with the `RandomVariable` operator, which corresponds to the following mathematical function:

\[ \operatorname{RandomVariable}: \Omega \times \Theta \to E \]

Where \(\Omega\) is the set of available RNG seeds and \(\Theta\) the parameter space. \(E\) is the state space, which roughly corresponds to the support of the corresponding distribution. Given a random seed and parameter values, the operator returns a **realization** of the random variable it represents, i.e. an element of \(E\).

The recommended way to use random variables in Aesara is via the `RandomStream` interface, which automatically seeds the random functions. Here, `srng.normal` defines a single, normally-distributed random variable. Executing the graph returns a random value drawn from the normal distribution:

```python
import aesara.tensor as at

srng = at.random.RandomStream(0)
x_rv = srng.normal(0, 1)
print(x_rv.eval())
```

```
1.4436909546981256
```

Here `srng.normal(0, 1)` returns a scalar, but the output of `RandomVariable` operators can have more complex shapes. The shape is essentially determined by three things:

- The *support shape* of the random variable;
- *Broadcasting rules* between the parameters of the random variable's distribution;
- The `size` parameter passed to the random function.

In this document we will explain the semantics of shapes in Aesara, and what these shapes represent.

## The support shape

The dimensionality of the parameter space and the sample space differ depending on the distribution. For instance, the normal distribution is parametrized by \(\mu \in \mathbb{R}\) and \(\sigma \in \mathbb{R}^+\) and the realizations of the corresponding random variables are scalars \(\in \mathbb{R}\).

```python
import aesara.tensor as at

srng = at.random.RandomStream(0)
mu = 0
sigma = 1
x_rv = srng.normal(mu, sigma)
sample = x_rv.eval()
print(f"sample value: {sample}")
print(f"support shape: {sample.shape}")
```

```
sample value: 1.4436909546981256
support shape: ()
```

We say that the support of `normal` is 0-dimensional and that the support shape is `()`. The Dirichlet distribution is slightly more complicated: it is parametrized by a vector \(\boldsymbol{\alpha} \in \mathbb{R}^k\) and its realizations are vectors in the k-unit simplex \(\operatorname{\Delta}^k\):

```python
import aesara.tensor as at

srng = at.random.RandomStream(0)
alpha = [1., 3., 4.]
x_rv = srng.dirichlet(alpha)
sample = x_rv.eval()
print(f"sample value: {sample}")
print(f"support shape: {sample.shape}")
```

```
sample value: [0.39086221 0.17265609 0.43648169]
support shape: (3,)
```

The support shape of `dirichlet` is `(k,)`, with `k` the length of its parameter \(\boldsymbol{\alpha}\). The multinomial is another interesting example because its parameters have different dimensionalities; it is parametrized by a probability vector \(\boldsymbol{p} \in \Delta^k\) and a number of trials \(n \in \mathbb{N}\), and returns a vector in \(\mathbb{N}^k\):

```python
import aesara.tensor as at

srng = at.random.RandomStream(0)
n = 10
p = [.1, .3, .6]
x_rv = srng.multinomial(n, p)
sample = x_rv.eval()
print(f"sample value: {sample}")
print(f"support shape: {sample.shape}")
```

```
sample value: [3 2 5]
support shape: (3,)
```

The support shape of `multinomial` is `(k,)`, with `k` the length of the probability vector \(\boldsymbol{p}\). Here we have only considered random variables with a 0- or 1-dimensional sample space, but it can obviously be more complicated. For instance, a random variable with a Wishart density is parametrized by a scale matrix \(\Sigma\) and its realizations are \(n \times n\) matrices; the corresponding support shape is thus `(n, n)`. All you have to remember is that *the support shape is the shape of a single element of the state space \(E\)*.
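To see what "shape of a single element" means concretely, we can draw one realization from each of these distributions with NumPy's `Generator` (a NumPy analogue of the Aesara examples above, used only to inspect shapes):

```python
import numpy as np

rng = np.random.default_rng(0)

# The shape of a single draw is the support shape of the distribution.
print(np.shape(rng.normal(0, 1)))                   # ()
print(np.shape(rng.dirichlet([1., 3., 4.])))        # (3,)
print(np.shape(rng.multinomial(10, [.1, .3, .6])))  # (3,)
```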

The support shape, and how it relates to the shape of the parameters, is made explicit in the documentation, where we attach a `signature` string to each random variable (these are gufunc-like signatures). For instance, for the previous examples we have:

- Normal: `(), () -> ()`
- Dirichlet: `(n) -> (n)`
- Multinomial: `(), (n) -> (n)`

So if you have doubts regarding the support shape of a distribution, check out the *signature*.
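To illustrate what these signatures encode, here is a minimal, hypothetical parser (not part of Aesara) that splits a gufunc-like signature string into input and output dimension tuples:

```python
import re

def parse_signature(signature):
    """Split a gufunc-like signature into input and output dimension tuples.

    A hypothetical helper for illustration only; Aesara has its own machinery.
    """
    inputs, outputs = signature.split("->")

    def dims(part):
        # Each "(...)" group becomes a tuple of named dimensions.
        return [tuple(d.strip() for d in group.split(",") if d.strip())
                for group in re.findall(r"\(([^)]*)\)", part)]

    return dims(inputs), dims(outputs)

# The multinomial signature: a scalar n and a vector p map to a vector.
print(parse_signature("(), (n) -> (n)"))  # ([(), ('n',)], [('n',)])
```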

## The batch shape

We just saw how to define a single random variable, and how the choice of distribution determined the *support shape*. In many realistic settings, however, we would like to define several independently distributed random variables at once. Aesara provides two mechanisms to do so: broadcasting the parameters, and the `size` argument of the `RandomVariable` operator. The shape induced by these mechanisms is called the *batch shape*.

### Batching by broadcasting

Say we want a sample from three independent, normally distributed, random variables with a mean of \(0\), \(3\) and \(5\) respectively. One (cumbersome) way to achieve this is:

```python
import aesara.tensor as at

srng = at.random.RandomStream()
rv_0 = srng.normal(0, 1)
rv_3 = srng.normal(3, 1)
rv_5 = srng.normal(5, 1)
rv = at.stack([rv_0, rv_3, rv_5])
sample = rv.eval()
print(f"sample value: {sample}")
print(f"sample shape: {sample.shape}")
```

```
sample value: [1.65040785 1.76749492 5.86773357]
sample shape: (3,)
```

To simplify this common operation, we can pass arrays as parameters to Aesara's `RandomVariable`, and the `Op` will use NumPy broadcasting rules to return an array of samples from independent random variables:

```python
import aesara.tensor as at
import numpy as np

srng = at.random.RandomStream(0)
mean = np.array([0, 3, 5])
rv = srng.normal(mean, 1)
sample = rv.eval()
print(f"sample values: {sample}")
print(f"sample shape: {sample.shape}")
```

```
sample values: [1.44369095 2.10405402 5.73595567]
sample shape: (3,)
```

In this case the *batch shape* is also `(3,)`; it is the shape of the tensor that contains samples from random variables that are independently distributed and whose distributions belong to the same family. In other words, `srng.normal(mean, 1)` implicitly represents 3 independent random variables.
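Aesara follows NumPy's broadcasting semantics here, so the same batching behavior can be reproduced with NumPy's `Generator` (a NumPy analogue, shown only as a sanity check):

```python
import numpy as np

rng = np.random.default_rng(0)

# The parameter array's shape becomes the batch shape: one draw per mean.
sample = rng.normal([0, 3, 5], 1)
print(sample.shape)  # (3,)
```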

We can also use an array for the standard deviation; standard broadcasting rules then apply to determine the batch shape. For instance, the following fails with a shape mismatch error because the `mean` and `sigma` arrays cannot be broadcast together:

```python
import aesara.tensor as at
import numpy as np

srng = at.random.RandomStream(0)
mean = np.array([0, 3, 5])
sigma = np.array([1, 2])
rv = srng.normal(mean, sigma)

try:
    rv.eval()
except ValueError as err:
    print(err)
```

```
shape mismatch: objects cannot be broadcast to a single shape
Apply node that caused the error: normal_rv{0, (0, 0), floatX, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7FDB97DFD200>), TensorConstant{[]}, TensorConstant{11}, TensorConstant{[0 3 5]}, TensorConstant{[1 2]})
Toposort index: 0
Inputs types: [RandomGeneratorType, TensorType(int64, (0,)), TensorType(int64, ()), TensorType(int64, (3,)), TensorType(int64, (2,))]
Inputs shapes: ['No shapes', (0,), (), (3,), (2,)]
Inputs strides: ['No strides', (8,), (), (8,), (8,)]
Inputs values: [Generator(PCG64) at 0x7FDB97DFD200, array([], dtype=int64), array(11), array([0, 3, 5]), array([1, 2])]
Outputs clients: [['output'], ['output']]

HINT: Re-running with most Aesara optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the Aesara flag 'optimizer=fast_compile'. If that does not work, Aesara optimizations can be disabled with 'optimizer=None'.
HINT: Use the Aesara flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.
```

In the following example they are broadcastable, and `np.broadcast(mean, sigma)` gives us the batch shape:

```python
import numpy as np

mean = np.array([0, 3, 5])
sigma = np.array([[1, 2, 7], [4, 2, 8]])
print(np.broadcast(mean, sigma).shape)
```

```
(2, 3)
```

Indeed:

```python
import aesara.tensor as at
import numpy as np

srng = at.random.RandomStream(0)
mean = np.array([0, 3, 5])
sigma = np.array([[1, 2, 7], [4, 2, 8]])
rv = srng.normal(mean, sigma)
sample = rv.eval()
print(f"sample values: {sample}")
print(f"batch shape: {sample.shape}")
```

```
sample values: [[ 1.44369095  1.20810805 10.15168969]
 [ 0.02350816  4.70676358  6.28758426]]
batch shape: (2, 3)
```
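As an aside, `np.broadcast_shapes` (available since NumPy 1.20) computes the same batch shape directly from shape tuples, without building any arrays:

```python
import numpy as np

# Equivalent to np.broadcast(mean, sigma).shape, but computed from tuples.
print(np.broadcast_shapes((3,), (2, 3)))  # (2, 3)
```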

The normal distribution is fairly simple since its support is 0-dimensional. Let us thus consider the more complex Dirichlet example:

```python
import aesara.tensor as at
import numpy as np

srng = at.random.RandomStream(0)
alpha = np.array([[1., 2., 4.], [3., 5., 7.]])
rv = srng.dirichlet(alpha)
sample = rv.eval()
print(f"sample values: {sample}")
print(f"sample shape: {sample.shape}")
```

```
sample values: [[0.42615878 0.09794332 0.4758979 ]
 [0.15408529 0.34781447 0.49810024]]
sample shape: (2, 3)
```

Which is equivalent to:

```python
import aesara.tensor as at
import numpy as np

srng = at.random.RandomStream(0)
rv1 = srng.dirichlet([1., 2., 4.])
rv2 = srng.dirichlet([3., 5., 7.])
rv = at.stack([rv1, rv2])
sample = rv.eval()
print(f"sample values: {sample}")
print(f"sample shape: {sample.shape}")
```

```
sample values: [[0.42615878 0.09794332 0.4758979 ]
 [0.27582652 0.02985376 0.69431972]]
sample shape: (2, 3)
```

I said *more complex*, but there is actually a very simple formula to determine the sample shape from the support and batch shapes. If `support_shape` and `batch_shape` represent the shape tuples, then:

\[ \operatorname{sample}_{shape} = \operatorname{batch}_{shape} + \operatorname{support}_{shape} \]
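Since shapes are plain tuples, this formula is literal tuple concatenation. For the Dirichlet example above, with two parameter vectors of length 3:

```python
# Shape tuples concatenate: batch dimensions come first, support dimensions last.
batch_shape = (2,)     # two alpha vectors
support_shape = (3,)   # each realization lives on the 3-simplex
sample_shape = batch_shape + support_shape
print(sample_shape)  # (2, 3)
```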

### Using the `size` argument to create identically distributed random variables

We also frequently need to define i.i.d. (independent and identically distributed) random variables. We could use the previous broadcasting rules to define 3 normally-distributed random variables with mean 0 and standard deviation 1:

```python
import aesara.tensor as at
import numpy as np

srng = at.random.RandomStream(0)
mean = np.zeros(3)
rv = srng.normal(mean, 1)
sample = rv.eval()
print(f"sample values: {sample}")
print(f"sample shape: {sample.shape}")
```

```
sample values: [ 1.44369095 -0.89594598  0.73595567]
sample shape: (3,)
```

But there is a shortcut: the `size` argument of the `RandomVariable` operator. In the following code, `size` allows us to define the same 3 random variables as above more concisely:

```python
import aesara.tensor as at

srng = at.random.RandomStream(0)
rv = srng.normal(0, 1, size=3)
sample = rv.eval()
print(f"sample values: {sample}")
print(f"sample shape: {sample.shape}")
```

```
sample values: [ 1.44369095 -0.89594598  0.73595567]
sample shape: (3,)
```

We can of course do the same thing with the Dirichlet distribution:

```python
import aesara.tensor as at

srng = at.random.RandomStream(0)
rv = srng.dirichlet([1, 3, 5], size=3)
sample = rv.eval()
print(f"sample values: {sample}")
print(f"sample shape: {sample.shape}")
```

```
sample values: [[0.34934376 0.15431609 0.49634016]
 [0.16080299 0.37886972 0.4603273 ]
 [0.21030357 0.42525361 0.36444282]]
sample shape: (3, 3)
```

In this simple case (no broadcasting), `size` corresponds to the *batch shape*. The term *batch* thus applies indistinctly to identically distributed and differently distributed random variables.

### Broadcasting and the `size` argument

The story with the `size` argument gets more complicated when the `RandomVariable` operator also needs to broadcast the distribution parameters. Luckily, despite the apparent complexity there is only one simple rule to remember: *the `size` argument of `RandomVariable` represents the batch shape*.

Let us illustrate this rule with an example where the parameters are broadcast and the `size` argument is used:

```python
import aesara.tensor as at
import numpy as np

srng = at.random.RandomStream(0)
mean = np.array([0, 3, 5])
sigma = np.array([1, 2, 3])
rv = srng.normal(mean, sigma, size=(2, 2, 3))
sample = rv.eval()
print(f"sample values: {sample}")
print(f"batch shape: {sample.shape}")
```

```
sample values: [[[1.44369095e+00 1.20810805e+00 7.20786701e+00]
  [5.87704041e-03 4.70676358e+00 5.48284410e+00]]

 [[8.19314690e-01 4.61131137e+00 5.65270195e+00]
  [9.70078743e-01 1.52177388e+00 6.78043377e+00]]]
batch shape: (2, 2, 3)
```

The batch shape here is given by `np.broadcast_shapes(np.broadcast(mean, sigma).shape, size)`. What we did here was create `(2, 2)` arrays of identically distributed random variables, one for each of the `3` parametrizations of the probability distribution.
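We can check this batch shape computation with NumPy alone:

```python
import numpy as np

mean = np.array([0, 3, 5])
sigma = np.array([1, 2, 3])
size = (2, 2, 3)

# The parameters broadcast to (3,), which then broadcasts against `size`.
batch_shape = np.broadcast_shapes(np.broadcast(mean, sigma).shape, size)
print(batch_shape)  # (2, 2, 3)
```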

To confirm that `size` only affects the batch shape, let us give an example with the Dirichlet `RandomVariable`:

```python
import aesara.tensor as at
import numpy as np

srng = at.random.RandomStream(0)
alpha = np.array([[1., 2., 4.], [3., 5., 7.]])
rv = srng.dirichlet(alpha, size=(5, 2))
sample = rv.eval()
print(f"sample values: {sample}")
print(f"sample shape: {sample.shape}")
```

```
sample values: [[[0.42615878 0.09794332 0.4758979 ]
  [0.15408529 0.34781447 0.49810024]]

 [[0.0019247  0.11699742 0.88107788]
  [0.18458045 0.34448051 0.47093904]]

 [[0.03488488 0.05031013 0.914805  ]
  [0.30760146 0.30766399 0.38473455]]

 [[0.11179322 0.44319822 0.44500856]
  [0.28814443 0.34356531 0.36829026]]

 [[0.21973253 0.19563397 0.5846335 ]
  [0.20072679 0.42110777 0.37816544]]]
sample shape: (5, 2, 3)
```

The batch shape is indeed given by `np.broadcast_shapes(alpha.shape[:-1], size)`: the last dimension of `alpha` is the support dimension, so only the leading `(2,)` takes part in the broadcast (and the support shape is `(3,)`).
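This too can be verified with plain NumPy; note that the last axis of `alpha` is the support axis, so only `alpha.shape[:-1]` participates in the batch-shape broadcast:

```python
import numpy as np

alpha = np.array([[1., 2., 4.], [3., 5., 7.]])
size = (5, 2)

# Strip the support dimension of alpha before broadcasting against `size`.
batch_shape = np.broadcast_shapes(alpha.shape[:-1], size)
support_shape = (3,)
print(batch_shape + support_shape)  # (5, 2, 3)
```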

## Summary

The shape of random tensors in Aesara is the composition of two semantically different pieces:

- The **support shape** corresponds to the shape of one element of the state space \(E\);
- The **batch shape** corresponds to the shape of the array of independent random variables \(X_i: \Omega \to E\), \(i \in \left\{ 1 \dots N\right\}\), that are implicitly created by the `RandomVariable` operator; these can be identically or differently distributed;
- There are two ways to create independent random variables: by broadcasting the parameter arrays, or by using the `size` argument of `RandomVariable`. When specified, the `size` argument must be the batch shape;
- The shape of the array returned by the `RandomVariable` operator is `sample_shape = batch_shape + support_shape`.