# Shape semantics for random variables in Aesara

## Random variables in Aesara

Random variables are represented in Aesara with the RandomVariable operator, which corresponds to the following mathematical function:

$\operatorname{RandomVariable}: \Omega \times \Theta \to E$

Where $$\Omega$$ is the set of available RNG seeds, $$\Theta$$ the parameter space and $$E$$ the state space, which roughly corresponds to the support of the corresponding distribution. Given a random seed and parameter values, the operator returns a realization of the random variable it represents, i.e. an element of $$E$$.

The recommended way to use random variables in Aesara is via the RandomStream interface, which automatically seeds the random functions. Here, srng.normal defines a single, normally-distributed, random variable. Executing the graph returns a random value drawn from the normal distribution:

import aesara.tensor as at

srng = at.random.RandomStream(0)
x_rv = srng.normal(0, 1)
print(x_rv.eval())

1.4436909546981256


Here srng.normal(0, 1) returns a scalar, but the outputs of RandomVariable operators can have more complex shapes. The shape is essentially determined by three things:

• The support shape of the random variable;
• Broadcasting rules between the parameters of the random variable's distribution;
• The size parameter passed to the random function.

In this document we will explain the semantics of shapes in Aesara, and what these shapes represent.

## The support shape

The dimensionality of the parameter space and the sample space differ depending on the distribution. For instance, the normal distribution is parametrized by $$\mu \in \mathbb{R}$$ and $$\sigma \in \mathbb{R}^+$$ and the realizations of the corresponding random variables are scalars $$\in \mathbb{R}$$.

import aesara.tensor as at

srng = at.random.RandomStream(0)

mu = 0
sigma = 1
x_rv = srng.normal(mu, sigma)
sample = x_rv.eval()

print(f"sample value: {sample}")
print(f"support shape: {sample.shape}")

sample value: 1.4436909546981256
support shape: ()


We say that the support of the normal distribution is 0-dimensional and that the support shape is (). The Dirichlet distribution is slightly more complicated: it is parametrized by a vector $$\boldsymbol{\alpha} \in \mathbb{R}^k$$ and its realizations are vectors in the unit $$k$$-simplex $$\operatorname{\Delta}^k$$:

import aesara.tensor as at

srng = at.random.RandomStream(0)

alpha = [1., 3., 4.]
x_rv = srng.dirichlet(alpha)
sample = x_rv.eval()

print(f"sample value: {sample}")
print(f"support shape: {sample.shape}")

sample value: [0.39086221 0.17265609 0.43648169]
support shape: (3,)


The support shape of dirichlet is (k,), with k the length of its parameter $$\alpha$$. The multinomial is another interesting example because its parameters have different dimensionalities; it is parametrized by a probability vector $$\boldsymbol{p} \in \Delta^k$$, a number of trials $$n \in \mathbb{N}$$ and returns a vector in $$\mathbb{N}^k$$:

import aesara.tensor as at

srng = at.random.RandomStream(0)

n = 10
p = [.1, .3, .6]
x_rv = srng.multinomial(n, p)
sample = x_rv.eval()

print(f"sample value: {sample}")
print(f"support shape: {sample.shape}")

sample value: [3 2 5]
support shape: (3,)


The support shape of multinomial is (k,) with k the length of the probability vector $$\boldsymbol{p}$$. So far we have only considered random variables with a 0- or 1-dimensional sample space, but it can obviously be more complicated. For instance, a random variable with a Wishart density takes its values among the symmetric positive definite $$n \times n$$ matrices, and the corresponding support shape is thus (n, n). All you have to remember is that the support shape is the shape of a single element of the state space $$E$$.

The support shape, and how it relates to the shape of the parameters, is made explicit in the documentation, where we attach a signature string to each random variable (these are gufunc-like signatures). For instance, for the previous examples we have:

• Normal: (), () -> ()
• Dirichlet: (n) -> (n)
• Multinomial: (), (n) -> (n)

So if you have doubts regarding the support shape of a distribution, check out the signature.
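To make the convention concrete, here is a small sketch of how the output side of such a signature encodes the number of support dimensions. The helper name and parsing logic are ours, not part of Aesara's API:

```python
def support_ndim(signature):
    """Return the number of core (support) dimensions of the output
    of a gufunc-like signature such as "(), (n) -> (n)"."""
    # Split the signature into its input and output parts
    _, output = signature.split("->")
    # Count the dimension names inside the output parentheses
    dims = output.strip().strip("()").strip()
    return 0 if not dims else len(dims.split(","))


print(support_ndim("(), () -> ()"))    # normal: 0-dimensional support
print(support_ndim("(n) -> (n)"))      # dirichlet: 1-dimensional support
print(support_ndim("(), (n) -> (n)"))  # multinomial: 1-dimensional support
```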

## The batch shape

We just saw how to define a single random variable, and how the choice of distribution determined the support shape. In many realistic settings, however, we would like to define several independently distributed random variables at once. Aesara provides two mechanisms to do so: broadcasting the parameters, and the size argument of the RandomVariable operator. The shape induced by these mechanisms is called the batch shape.

Say we want a sample from three independent, normally distributed, random variables with a mean of $$0$$, $$3$$ and $$5$$ respectively. One (cumbersome) way to achieve this is:

import aesara.tensor as at

srng = at.random.RandomStream()
rv_0 = srng.normal(0, 1)
rv_3 = srng.normal(3, 1)
rv_5 = srng.normal(5, 1)
rv = at.stack([rv_0, rv_3, rv_5])

sample = rv.eval()
print(f"sample value: {sample}")
print(f"sample shape: {sample.shape}")

sample value: [1.65040785 1.76749492 5.86773357]
sample shape: (3,)


To simplify this common operation, we can pass arrays as parameters to Aesara's RandomVariable, and the Op will use NumPy broadcasting rules to return an array of samples from independent random variables:

import aesara.tensor as at
import numpy as np

srng = at.random.RandomStream(0)

mean = np.array([0, 3, 5])
rv = srng.normal(mean, 1)

sample = rv.eval()
print(f"sample values: {sample}")
print(f"sample shape: {sample.shape}")

sample values: [1.44369095 2.10405402 5.73595567]
sample shape: (3,)


In this case the batch shape is also (3,): it is the shape of the tensor that contains samples from random variables that are independently distributed and whose distributions belong to the same family. In other words, srng.normal(mean, 1) implicitly represents 3 independent random variables.

We can also use an array for the standard deviation. Standard broadcasting rules apply to determine the batch shape. For instance, the following fails with a shape mismatch error because the mean and sigma arrays cannot be broadcast together:

import aesara.tensor as at
import numpy as np

srng = at.random.RandomStream(0)

mean = np.array([0, 3, 5])
sigma = np.array([1, 2])
rv = srng.normal(mean, sigma)

try:
    rv.eval()
except ValueError as err:
    print(err)

shape mismatch: objects cannot be broadcast to a single shape
Apply node that caused the error: normal_rv{0, (0, 0), floatX, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7FDB97DFD200>), TensorConstant{[]}, TensorConstant{11}, TensorConstant{[0 3 5]}, TensorConstant{[1 2]})
Toposort index: 0
Inputs types: [RandomGeneratorType, TensorType(int64, (0,)), TensorType(int64, ()), TensorType(int64, (3,)), TensorType(int64, (2,))]
Inputs shapes: ['No shapes', (0,), (), (3,), (2,)]
Inputs strides: ['No strides', (8,), (), (8,), (8,)]
Inputs values: [Generator(PCG64) at 0x7FDB97DFD200, array([], dtype=int64), array(11), array([0, 3, 5]), array([1, 2])]
Outputs clients: [['output'], ['output']]

HINT: Re-running with most Aesara optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the Aesara flag 'optimizer=fast_compile'. If that does not work, Aesara optimizations can be disabled with 'optimizer=None'.
HINT: Use the Aesara flag exception_verbosity=high for a debug print-out and storage map footprint of this Apply node.


In the following example the parameters are broadcastable, and np.broadcast(mean, sigma).shape gives us the batch shape:

import numpy as np

mean = np.array([0, 3, 5])
sigma = np.array([[1, 2, 7], [4, 2, 8]])
print(np.broadcast(mean, sigma).shape)

(2, 3)


Indeed:

import aesara.tensor as at
import numpy as np

srng = at.random.RandomStream(0)

mean = np.array([0, 3, 5])
sigma = np.array([[1, 2, 7], [4, 2, 8]])
rv = srng.normal(mean, sigma)

sample = rv.eval()
print(f"sample values: {sample}")
print(f"batch shape: {sample.shape}")

sample values: [[ 1.44369095  1.20810805 10.15168969]
[ 0.02350816  4.70676358  6.28758426]]
batch shape: (2, 3)


The normal distribution is fairly simple since its support is 0-dimensional. Let us thus consider the more complex Dirichlet example:

import aesara.tensor as at
import numpy as np

srng = at.random.RandomStream(0)

alpha = np.array([[1., 2., 4.], [3., 5., 7.]])
rv = srng.dirichlet(alpha)
sample = rv.eval()

print(f"sample values: {sample}")
print(f"sample shape: {sample.shape}")

sample values: [[0.42615878 0.09794332 0.4758979 ]
[0.15408529 0.34781447 0.49810024]]
sample shape: (2, 3)


Which is equivalent to:

import aesara.tensor as at
import numpy as np

srng = at.random.RandomStream(0)

rv1 = srng.dirichlet([1., 2., 4.])
rv2 = srng.dirichlet([3., 5., 7.])
rv = at.stack([rv1, rv2])
sample = rv.eval()

print(f"sample values: {sample}")
print(f"sample shape: {sample.shape}")

sample values: [[0.42615878 0.09794332 0.4758979 ]
[0.27582652 0.02985376 0.69431972]]
sample shape: (2, 3)


I said more complex, but there is actually a very simple formula to determine the sample shape from the support and batch shapes. If support_shape and batch_shape represent the corresponding shape tuples, then:

sample_shape = batch_shape + support_shape
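We can check this formula on the Dirichlet example above: the batch shape (2,) concatenated with the support shape (3,) gives the sample shape (2, 3). A minimal sketch in plain NumPy:

```python
import numpy as np

# Shapes from the Dirichlet example above
batch_shape = (2,)    # two independent parametrizations alpha
support_shape = (3,)  # each realization is a vector in the 3-simplex

# Shape tuples simply concatenate
sample_shape = batch_shape + support_shape
print(sample_shape)  # (2, 3)

# This matches the shape obtained by stacking two 3-vectors
samples = np.stack([np.ones(3), np.ones(3)])
print(samples.shape)  # (2, 3)
```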

### Using the size argument to create identically distributed random variables

We also frequently need to define i.i.d. (independent and identically distributed) random variables. We could use the previous broadcasting rules to define 3 normally-distributed random variables with mean 0 and standard deviation 1:

import aesara.tensor as at
import numpy as np

srng = at.random.RandomStream(0)

mean = np.zeros(3)
rv = srng.normal(mean, 1)

sample = rv.eval()
print(f"sample values: {sample}")
print(f"sample shape: {sample.shape}")

sample values: [ 1.44369095 -0.89594598  0.73595567]
sample shape: (3,)


But there is a shortcut: the size argument of the RandomVariable operator. In the following code, size allows us to define the same 3 random variables as above in a more concise way:

import aesara.tensor as at
import numpy as np

srng = at.random.RandomStream(0)

rv = srng.normal(0, 1, size=3)

sample = rv.eval()
print(f"sample values: {sample}")
print(f"sample shape: {sample.shape}")

sample values: [ 1.44369095 -0.89594598  0.73595567]
sample shape: (3,)


We can of course do the same thing with the Dirichlet distribution:

import aesara.tensor as at
import numpy as np

srng = at.random.RandomStream(0)

rv = srng.dirichlet([1, 3, 5], size=3)

sample = rv.eval()
print(f"sample values: {sample}")
print(f"sample shape: {sample.shape}")

sample values: [[0.34934376 0.15431609 0.49634016]
[0.16080299 0.37886972 0.4603273 ]
[0.21030357 0.42525361 0.36444282]]
sample shape: (3, 3)


In this simple case (no broadcasting), size corresponds to the batch shape. The batch shape thus covers both identically distributed and differently distributed random variables.

### Broadcasting and the size argument

The story with the size argument gets more complicated when the RandomVariable operator also needs to broadcast the distribution parameters. But luckily, despite the apparent complexity there is only one simple rule to remember: the size argument to RandomVariable represents the batch shape.

Let us illustrate this rule on an example where parameters are broadcasted and the size argument is used:

import aesara.tensor as at
import numpy as np

srng = at.random.RandomStream(0)

mean = np.array([0, 3, 5])
sigma = np.array([1, 2, 3])
rv = srng.normal(mean, sigma, size=(2, 2, 3))

sample = rv.eval()
print(f"sample values: {sample}")
print(f"batch shape: {sample.shape}")

sample values: [[[1.44369095e+00 1.20810805e+00 7.20786701e+00]
[5.87704041e-03 4.70676358e+00 5.48284410e+00]]

[[8.19314690e-01 4.61131137e+00 5.65270195e+00]
[9.70078743e-01 1.52177388e+00 6.78043377e+00]]]
batch shape: (2, 2, 3)


The batch shape here is given by np.broadcast_shapes(np.broadcast(mean, sigma).shape, size). What we did here was create (2, 2) arrays of identically distributed random variables, one for each of the 3 parametrizations of the probability distribution.
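We can reproduce this batch-shape computation with NumPy alone. This is a sketch of the rule stated above, not of Aesara's internal implementation:

```python
import numpy as np

mean = np.array([0, 3, 5])
sigma = np.array([1, 2, 3])
size = (2, 2, 3)

# The parameters first broadcast against each other...
params_shape = np.broadcast_shapes(mean.shape, sigma.shape)
print(params_shape)  # (3,)

# ...and the result broadcasts against size, which is the batch shape
batch_shape = np.broadcast_shapes(params_shape, size)
print(batch_shape)  # (2, 2, 3)
```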

To confirm that size only affects the batch shape, let us give an example with the Dirichlet RandomVariable:

import aesara.tensor as at
import numpy as np

srng = at.random.RandomStream(0)

alpha = np.array([[1., 2., 4.], [3., 5., 7.]])
rv = srng.dirichlet(alpha, size=(5,2))
sample = rv.eval()

print(f"sample values: {sample}")
print(f"sample shape: {sample.shape}")

sample values: [[[0.42615878 0.09794332 0.4758979 ]
[0.15408529 0.34781447 0.49810024]]

[[0.0019247  0.11699742 0.88107788]
[0.18458045 0.34448051 0.47093904]]

[[0.03488488 0.05031013 0.914805  ]
[0.30760146 0.30766399 0.38473455]]

[[0.11179322 0.44319822 0.44500856]
[0.28814443 0.34356531 0.36829026]]

[[0.21973253 0.19563397 0.5846335 ]
[0.20072679 0.42110777 0.37816544]]]
sample shape: (5, 2, 3)


The batch shape is indeed given by np.broadcast_shapes(alpha.shape[:-1], size): the last dimension of alpha belongs to the support shape (3,), so only the batch dimensions of the parameters broadcast against size.
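A quick NumPy check of this shape computation (again a sketch of the rule, not of Aesara internals):

```python
import numpy as np

alpha = np.array([[1., 2., 4.], [3., 5., 7.]])
size = (5, 2)
support_shape = (3,)  # the last dimension of alpha is the support

# Only the batch dimensions of alpha broadcast against size
batch_shape = np.broadcast_shapes(alpha.shape[:-1], size)
print(batch_shape)                  # (5, 2)
print(batch_shape + support_shape)  # (5, 2, 3)
```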

## Summary

The shape of random tensors in Aesara is the composition of two semantically different pieces:

• The support shape corresponds to the shape of one element in the state space $$E$$;
• The batch shape corresponds to the shape of the array of independent random variables $$X_i: \Omega \to E$$, $$i \in \left\{ 1 \dots N\right\}$$, implicitly created by the RandomVariable; these can be identically or differently distributed.
• There are two ways to create independent random variables: by broadcasting the parameter arrays, or by using the size argument of RandomVariable. When specified, the size argument is the batch shape.
• The shape of the array returned by the RandomVariable operator is sample_shape = batch_shape + support_shape.