API design

A little friction in your design is good.

In Python there are two ways to enforce this. The first is keyword-only arguments:

def function(a, *, new, this):
    pass

function(0, new=1, this=2)

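The second is namedtuples or dataclasses:

from dataclasses import dataclass
from typing import NamedTuple

# Option 1: an immutable named tuple
class Parameters(NamedTuple):
    a: int
    b: str

# Option 2: a dataclass (same name, shown as an alternative)
@dataclass
class Parameters:
    a: int
    b: str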

Avoid **kwargs at all costs

They are hard to document. Use dataclasses or namedtuples instead (similar to the Go way of doing things). This also puts the emphasis on your data structure, which is your interface with the outside world, and it gives you a way to control the data: you can add validation when a dataclass is initialized, for instance.
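A minimal sketch of what this can look like. The names SamplerConfig, num_samples, step_size and run are hypothetical and only serve the illustration; the validation lives in the dataclass's __post_init__ hook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplerConfig:
    """Hypothetical parameter object; the field names are illustrative."""

    num_samples: int
    step_size: float = 0.1

    def __post_init__(self):
        # Validate once, at construction, instead of in every function.
        if self.num_samples <= 0:
            raise ValueError("num_samples must be positive")
        if self.step_size <= 0:
            raise ValueError("step_size must be positive")


def run(config: SamplerConfig) -> str:
    # The signature documents itself: there are no hidden **kwargs to look up.
    return f"{config.num_samples} samples, step size {config.step_size}"
```

Calling run(SamplerConfig(num_samples=10)) works, while SamplerConfig(num_samples=0) fails immediately with a ValueError, before any other code sees the bad value.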

Difference between a framework and a library

  • A library is designed to do specific things
  • A library composes with other libraries (composability)
  • A library is modular
  • A framework already contains the base flow and architecture of the application you're building (Flask vs. an HTTP library)

Build the conceptually right thing first, the abstraction later

Naming is 90% of the work

Take it seriously.

Always be thinking "How much documentation am I going to have to write for this?"

This is linked to naming. Ideally the function name and the argument names should suffice.

Writing the function is only 50% of the work

This is the most frequent error juniors make. Once the code has been written, they consider their job done. But you need to keep rewriting the code until it makes sense to someone else.

Programming is writing

Humanities students make better programmers. Stop thinking of programming as a scientific discipline (I mean, it slightly is for machine learning). You are writing.

But mostly reading

Code is read more often than written.

OOP is usually the path to broken abstractions

When you create objects, you are populating a world. Bad APIs create a fantasy world, a collection of objects that have no bearing on reality.

Is Machine Learning really about creating objects? What's an algorithm? Can you eat it?

"There is an object X that does Y" vs. "When I take X and apply this transformation I get Z". The second is stateless: if I run it twice with the same input, it gives me the same result.

Sometimes combinations of transformations are useful and we give them a name: a new object or a new function.
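A small sketch of the "take X, apply a transformation, get Z" style, in plain Python. The functions center, scale and normalize are made up for the example: two stateless transformations, and a named combination of the two.

```python
def center(values):
    """Subtract the mean; returns a new list and never mutates the input."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def scale(values):
    """Divide by the largest absolute value (unchanged if all zeros)."""
    largest = max(abs(v) for v in values)
    if largest == 0:
        return list(values)
    return [v / largest for v in values]


def normalize(values):
    """A useful combination of transformations that earns its own name."""
    return scale(center(values))
```

There is no hidden state anywhere: calling normalize twice on the same list gives the same result, and the input list is left untouched.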

What are you preventing your users from doing?

We often think in terms of what our API allows users to do, and less in terms of what it prevents them from doing. In some cases creating a sandbox is good (framework), but sometimes it is frustrating (library).

You need to have trap doors, ways to escape your leaky abstractions

The Julia language is great with that. You can always act on the abstraction layer below.
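One way to sketch a trap door in Python is to keep every layer public, so that the convenient top-level function is only a thin loop over pieces the user can call directly. The names gradient, step and fit, and the toy model y = w * x, are assumptions made for the illustration.

```python
def gradient(w, xs, ys):
    """Lower layer: gradient of the mean squared error for y = w * x."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def step(w, xs, ys, learning_rate):
    """Middle layer: one gradient-descent update."""
    return w - learning_rate * gradient(w, xs, ys)


def fit(xs, ys, num_steps=100, learning_rate=0.1):
    """Top layer: the convenient default loop."""
    w = 0.0
    for _ in range(num_steps):
        w = step(w, xs, ys, learning_rate)
    return w
```

Someone who needs early stopping or a custom schedule does not have to fight fit: they can write their own loop around step, or drop down to gradient.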

A model is a combination of arrays and operations on arrays.

Write that first.
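As a rough sketch of what "write that first" can mean, here is a linear model with no class in sight: the state is a dictionary of arrays and the model is a function on arrays. The names init_params and predict, and the choice of a linear model, are assumptions for the example.

```python
import numpy as np


def init_params(num_features):
    # The model's entire state: two plain arrays (here all zeros).
    return {"weights": np.zeros(num_features), "bias": np.zeros(())}


def predict(params, inputs):
    # The model itself: one matrix-vector product and one addition.
    return inputs @ params["weights"] + params["bias"]
```

Any abstraction added later (a class, a layer object, a trainer) can wrap these two functions, and users can still reach the arrays underneath.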

API = abstraction. All abstractions suck.

Naming

There is a joke in CS that naming is the hardest thing in programming along with cache invalidation. Except it is not a joke. Naming is hard, and good names go a loooong way.

Why do politicians use vague terms? They allow people to project whatever they want onto them. Don't code like a politician. Be pedantic.

Favorite APIs

  • numpy (I am rarely surprised by functions I discover for the first time)
  • schedule
  • jax

Choose COMPOSITION OVER CONFIGURATION

Interchangeable elements should have the same signature
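A sketch of both ideas together, with hypothetical names (sgd, sign_sgd, minimize). The two update rules share one signature, so the caller swaps them by passing a different function rather than by setting a flag.

```python
def sgd(param, grad, learning_rate):
    """Plain gradient step."""
    return param - learning_rate * grad


def sign_sgd(param, grad, learning_rate):
    """Steps by the sign of the gradient only; same signature as sgd."""
    sign = (grad > 0) - (grad < 0)
    return param - learning_rate * sign


def minimize(grad_fn, update, param, learning_rate=0.1, num_steps=50):
    # The update rule is passed in as a function (composition), not
    # selected with a string flag such as method="sgd" (configuration).
    for _ in range(num_steps):
        param = update(param, grad_fn(param), learning_rate)
    return param
```

A user with a new update rule only has to respect the (param, grad, learning_rate) signature; minimize never needs to know about it.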

Please do not reinvent the wheel!

When in doubt, do as numpy does. Theano, for example, is hard to use partly because it uses different naming and type-casting conventions.

You are not going to do better than numpy, and even if you do, no one will adopt your conventions.

Compare import jax.numpy as jnp with import theano.tensor as tt.

JAX feels like the natural extension of numpy to which it adds:

  • JIT compilation: jax.jit(fn)
  • vectorized and parallel mapping: jax.vmap(fn) and jax.pmap(fn)
  • forward- and reverse-mode autodiff: jax.jacfwd(fn) and jax.grad(fn)

import jax.numpy as jnp

def function(x):
    y = jnp.exp(x)
    return y

Modular internals may not make a difference to end users, but they do to maintainers. Do not only think about the user!
