Named tensor notation
Personal summary of [Named Tensor Notation](https://arxiv.org/abs/2102.13196) (Chiang, Rush and Barak).
Named tensor notation gives names to the axes in vectors, matrices and tensors. These names are analogous to units in physics: if you keep track of the units while manipulating physical quantities, you’ll catch basic mistakes, and the computation will make more sense overall.
For example, instead of defining the parity check matrix of a code as $H \in \mathbb{F}_2^{m \times n}$, we can define it as $H \in \mathbb{F}_2^{\mathsf{check}[m] \times \mathsf{bit}[n]}$. We can then:

- access an individual parity check as $H_{\mathsf{check}(i)} \in \mathbb{F}_2^{\mathsf{bit}[n]}$,
- access the coefficients for a particular coordinate as $H_{\mathsf{bit}(j)} \in \mathbb{F}_2^{\mathsf{check}[m]}$,
- access one particular coefficient as $H_{\mathsf{check}(i), \mathsf{bit}(j)} \in \mathbb{F}_2$.
Since the axes have names, their order doesn’t matter: $H_{\mathsf{check}(i), \mathsf{bit}(j)}$ and $H_{\mathsf{bit}(j), \mathsf{check}(i)}$ have the same effect.
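A rough numpy sketch of named indexing (the `take` helper and the hand-carried `names` tuple are illustrative assumptions, not any particular library’s API):

```python
import numpy as np

# Track axis names alongside the array by hand; index axes by name.
H = np.random.randint(0, 2, size=(4, 7))  # H with axes (check, bit)
names = ("check", "bit")

def take(x, names, **idx):
    """Index axes by keyword name; argument order is irrelevant."""
    out, out_names = x, list(names)
    for name, i in idx.items():
        ax = out_names.index(name)
        out = np.take(out, i, axis=ax)
        out_names.pop(ax)
    return out, tuple(out_names)

check_row, _ = take(H, names, check=0)  # one parity check: axes (bit,)
coord_col, _ = take(H, names, bit=3)    # one coordinate: axes (check,)
a, _ = take(H, names, check=0, bit=3)   # a single coefficient
b, _ = take(H, names, bit=3, check=0)   # same coefficient, order swapped
assert a == b
```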
Element-wise operations
You can perform any operation element-wise. For example, given two matrices $A, B \in \mathbb{R}^{\mathsf{row}[m] \times \mathsf{col}[n]}$, their sum $A + B \in \mathbb{R}^{\mathsf{row}[m] \times \mathsf{col}[n]}$ is given by $(A + B)_{\mathsf{row}(i), \mathsf{col}(j)} = A_{\mathsf{row}(i), \mathsf{col}(j)} + B_{\mathsf{row}(i), \mathsf{col}(j)}$.

In general, the result will inherit all the axes present in either operand. For example, given a “column vector” $u \in \mathbb{R}^{\mathsf{row}[m]}$ and a matrix $A \in \mathbb{R}^{\mathsf{row}[m] \times \mathsf{col}[n]}$, the product $uA \in \mathbb{R}^{\mathsf{row}[m] \times \mathsf{col}[n]}$ broadcasts $u$ over the $\mathsf{col}$ axis: $(uA)_{\mathsf{row}(i), \mathsf{col}(j)} = u_{\mathsf{row}(i)} A_{\mathsf{row}(i), \mathsf{col}(j)}$.

To avoid confusion, we can denote element-wise multiplication explicitly by $\odot$, as in $u \odot A$.
Element-wise operations are commutative/associative whenever the underlying operations are.
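As a numpy illustration of the union-of-axes rule (the reshapes do by hand the alignment a named-tensor library would infer from the names):

```python
import numpy as np

u = np.array([1., 2., 3.])       # axes (row,): a "column vector"
v = np.array([10., 20.])         # axes (col,): a "row vector"
A = np.arange(6.).reshape(3, 2)  # axes (row, col)

uv = u[:, None] * v[None, :]     # u ⊙ v inherits both row and col
uA = u[:, None] * A              # u ⊙ A scales each row of A
assert uv.shape == uA.shape == (3, 2)
```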
Diagonal matrices
This sometimes allows you to use vectors instead of diagonal matrices. For example, if you want to scale the rows of a matrix $A \in \mathbb{R}^{\mathsf{row}[m] \times \mathsf{col}[n]}$ by factors $s \in \mathbb{R}^{\mathsf{row}[m]}$, you can write $s \odot A$ instead of $\operatorname{diag}(s) \, A$.
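In plain numpy the same trick reads as follows; broadcasting plays the role of the shared $\mathsf{row}$ axis:

```python
import numpy as np

A = np.arange(6.).reshape(2, 3)  # axes (row, col)
s = np.array([10., 100.])        # axes (row,)

# s ⊙ A scales the rows without materializing diag(s):
assert np.allclose(s[:, None] * A, np.diag(s) @ A)
```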
Reductions
Reductions “summarize” the information across one of the axes. For example, for a matrix $A \in \mathbb{R}^{\mathsf{row}[m] \times \mathsf{col}[n]}$:

- $\sum_{\mathsf{row}} A \in \mathbb{R}^{\mathsf{col}[n]}$ is a “row vector” containing the sum of each column,
- $\max_{\mathsf{col}} A \in \mathbb{R}^{\mathsf{row}[m]}$ is a “column vector” containing the maxima of each row,
- $\lVert A \rVert_{\mathsf{row}} \in \mathbb{R}^{\mathsf{col}[n]}$ is a “row vector” containing the ($2$-)norm of each column.
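A small numpy sketch, with a hand-maintained name→position map standing in for real named axes:

```python
import numpy as np

A = np.arange(6.).reshape(2, 3)
axis = {"row": 0, "col": 1}  # assumed name→position map for A

col_sums  = A.sum(axis=axis["row"])              # sum of each column
row_maxes = A.max(axis=axis["col"])              # max of each row
col_norms = np.linalg.norm(A, axis=axis["row"])  # 2-norm of each column
assert col_sums.shape == col_norms.shape == (3,) and row_maxes.shape == (2,)
```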
Contractions
Many linear algebra operations are contractions: an element-wise multiplication followed by a sum. We denote these using the shorthand $A \odot_{\mathsf{ax}} B := \sum_{\mathsf{ax}} A \odot B$.
Contractions generalize:

- the dot product: for $u, v \in \mathbb{R}^{\mathsf{i}[m]}$, we have $u \odot_{\mathsf{i}} v = u \cdot v$;
- the matrix-vector product: for $A \in \mathbb{R}^{\mathsf{i}[m] \times \mathsf{j}[n]}$ and $x \in \mathbb{R}^{\mathsf{j}[n]}$, we have $A \odot_{\mathsf{j}} x = Ax$;
- the matrix-matrix product: for $A \in \mathbb{R}^{\mathsf{i}[m] \times \mathsf{j}[n]}$ and $B \in \mathbb{R}^{\mathsf{j}[n] \times \mathsf{k}[p]}$, we have $A \odot_{\mathsf{j}} B = AB$.
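All of these reduce to `np.einsum` once axis names are mapped to subscript letters (an illustration of the correspondence, not the paper’s API):

```python
import numpy as np

A = np.random.randn(2, 3)  # axes (i, j)
B = np.random.randn(3, 4)  # axes (j, k)
x = np.random.randn(3)     # axes (j,)
u = np.random.randn(3)     # axes (j,)

dot = np.einsum("j,j->", u, x)      # u ⊙_j x : dot product
mv  = np.einsum("ij,j->i", A, x)    # A ⊙_j x : matrix-vector product
mm  = np.einsum("ij,jk->ik", A, B)  # A ⊙_j B : matrix-matrix product
assert np.allclose(mv, A @ x) and np.allclose(mm, A @ B)
```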
Contractions are great because:
- they make it clear what you’re summing over,
- they are commutative,
- they are (mostly) associative: you can turn $(A \odot_{\mathsf{i}} B) \odot_{\mathsf{j}} C$ into $A \odot_{\mathsf{i}} (B \odot_{\mathsf{j}} C)$ as long as $A$ doesn’t have $\mathsf{j}$ and $C$ doesn’t have $\mathsf{i}$;¹
- you don’t need to waste time thinking about whether you should take the transpose.
Limitations
Since contractions “type-check” that the axes being summed over are identical, they cannot represent operations like the trace of a matrix (which depends on its two axes having the same length). Because of this, they are a bit weaker than Einstein summation (e.g. `np.einsum`), which can sum over arbitrary pairs of axes.
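For instance, the trace is a one-liner in einsum precisely because a subscript may repeat within a single operand:

```python
import numpy as np

A = np.random.randn(4, 4)
# "ii" ties together the two axes of A, which a named contraction
# cannot express (the axes would need to share a name):
assert np.isclose(np.einsum("ii->", A), np.trace(A))
```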
Differentiation
Named tensor notation makes it easy to compute some derivatives which, if done in matrix form, would have required writing out the matrix products explicitly. We use the method of differentials.

Let $y = f(X)$ be a scalar-valued function of a tensor $X$ with axes $\mathcal{S}$. If we can write the differential as a contraction $dy = G \odot_{\mathcal{S}} dX$, where $G$ has the same axes as $X$, then $\frac{\partial y}{\partial X} = G$.
Matrix products
For example, consider a depth-two linear network with a single output: $y = u \odot_{\mathsf{hid}} (W \odot_{\mathsf{in}} x)$, where $u \in \mathbb{R}^{\mathsf{hid}[h]}$, $W \in \mathbb{R}^{\mathsf{hid}[h] \times \mathsf{in}[n]}$ and $x \in \mathbb{R}^{\mathsf{in}[n]}$. Differentiating with respect to $W$ gives $dy = u \odot_{\mathsf{hid}} (dW \odot_{\mathsf{in}} x) = (u \odot x) \odot_{\mathsf{hid}, \mathsf{in}} dW$, which means $\frac{\partial y}{\partial W} = u \odot x$: an outer product, obtained without writing a single transpose.
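A quick numerical sanity check of $\frac{\partial y}{\partial W} = u \odot x$ using finite differences (plain numpy; the axis names live only in the comments):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=5)       # axes (hid,)
W = rng.normal(size=(5, 3))  # axes (hid, in)
x = rng.normal(size=3)       # axes (in,)

y = lambda W: u @ (W @ x)    # depth-two linear network, scalar output
grad = np.outer(u, x)        # claimed derivative u ⊙ x

eps = 1e-6
num = np.zeros_like(W)
for idx in np.ndindex(*W.shape):
    dW = np.zeros_like(W)
    dW[idx] = eps
    num[idx] = (y(W + dW) - y(W - dW)) / (2 * eps)
assert np.allclose(num, grad, atol=1e-5)
```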
More generally, if we have matrices $W_1, \ldots, W_L$ with $W_\ell \in \mathbb{R}^{\mathsf{a}_\ell \times \mathsf{a}_{\ell-1}}$ and $y = W_L \odot_{\mathsf{a}_{L-1}} \cdots \odot_{\mathsf{a}_1} W_1 \odot_{\mathsf{a}_0} x$, then $\frac{\partial y}{\partial W_\ell}$ is just the contraction of everything to the left of $W_\ell$, element-wise multiplied with the contraction of everything to its right. Note that due to the existence of a distinct axis name $\mathsf{a}_\ell$ for each layer, the derivative automatically lands on $W_\ell$’s axes $\mathsf{a}_\ell \times \mathsf{a}_{\ell-1}$, again with no transposes to keep track of.
Renaming
Given a tensor $X$ with an axis $\mathsf{i}[n]$, we write $X_{\mathsf{i} \to \mathsf{j}}$ for the tensor with the same entries but with the axis $\mathsf{i}$ renamed to $\mathsf{j}$.

This is equivalent to multiplying by the corresponding identity matrix: $X_{\mathsf{i} \to \mathsf{j}} = X \odot_{\mathsf{i}} I$, where $I \in \mathbb{R}^{\mathsf{i}[n] \times \mathsf{j}[n]}$ has entries $I_{\mathsf{i}(a), \mathsf{j}(b)} = \mathbb{1}[a = b]$.
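The identity-matrix view is easy to check numerically; in einsum terms, the identity’s two subscripts are the old and new names:

```python
import numpy as np

X = np.random.randn(4, 2)  # axes (i, k)
I = np.eye(4)              # axes (i, j): identity relating i and j

renamed = np.einsum("ik,ij->jk", X, I)  # X ⊙_i I = X_{i→j}
assert np.allclose(renamed, X)          # same entries, new axis name
```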
Transpose
Renaming is mostly used to create a temporary independent copy of an axis that is present twice in a computation. In that use, renaming is analogous to taking the transpose, and we’ll² use the shorthand $X' := X_{\mathsf{i} \to \mathsf{i}'}$, where $\mathsf{i}'$ is a “primed” copy of the axis $\mathsf{i}$.
For example, say you have a matrix $X \in \mathbb{R}^{\mathsf{n}[n] \times \mathsf{d}[d]}$ containing $n$ data points of dimension $d$, and you want their pairwise inner products (the Gram matrix). It is tempting to write $X \odot_{\mathsf{d}} X$.

However, the two copies of the $\mathsf{n}$ axis get merged by the element-wise product, so $X \odot_{\mathsf{d}} X \in \mathbb{R}^{\mathsf{n}[n]}$ only contains the squared norm of each of the original vectors (i.e. we only got the diagonal of the Gram matrix). If we want the result to keep the two $\mathsf{n}$ axes separate, we have to rename one of them first: $X \odot_{\mathsf{d}} X_{\mathsf{n} \to \mathsf{n}'} \in \mathbb{R}^{\mathsf{n}[n] \times \mathsf{n}'[n]}$.
Similarly, we can express the second-moment matrix as $\frac{1}{n} X \odot_{\mathsf{n}} X_{\mathsf{d} \to \mathsf{d}'} \in \mathbb{R}^{\mathsf{d}[d] \times \mathsf{d}'[d]}$.

In the usual matrix notation, where the Gram matrix would be $X X^\top$ and the second-moment matrix $\frac{1}{n} X^\top X$, you have to remember which side the transpose goes on; with named axes, the two quantities are just contractions over different axes.
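In einsum terms the renamed copy is just a fresh subscript letter; a sketch of all three quantities:

```python
import numpy as np

X = np.random.randn(5, 3)  # axes (n, d)

sq_norms = np.einsum("nd,nd->n", X, X)   # X ⊙_d X: just the diagonal
gram     = np.einsum("nd,md->nm", X, X)  # X ⊙_d X_{n→n'} (m plays n')
second   = np.einsum("nd,ne->de", X, X) / X.shape[0]  # X ⊙_n X_{d→d'} / n
assert np.allclose(np.diag(gram), sq_norms)
```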
#to-write more general remarks about “square matrices”
- and how they’re inconvenient to work with in this notation
- but on the other hand maybe rightfully so?
- would be nice if you could allow duplicated axes for symmetric matrices? but you’d need notation for that anyway, to stop the values from just getting multiplied coordinatewise / the axes from getting merged
Powers
#to-write
- for a matrix $A \in \mathbb{R}^{\mathsf{i}[n] \times \mathsf{i}'[n]}$, can define its power $A^k$ in an unambiguous way
- but eigenvalues are still annoying to deal with? writing $A \odot_{\mathsf{i}'} v_{\mathsf{i} \to \mathsf{i}'} = \lambda v$ is still very distasteful…
- but maybe that goes to show how unnatural the whole concept of eigenvalues is? :P
- and it serves as a good reminder that you can’t just write $A - \lambda$ (since that would mean subtracting $\lambda$ from every entry, not just the diagonal)
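A two-line numpy reminder of that last pitfall:

```python
import numpy as np

A, lam = np.eye(3), 2.0
# A - lam subtracts λ from EVERY entry; A - λI is what's actually meant:
assert np.allclose(A - lam, A - lam * np.ones_like(A))
assert not np.allclose(A - lam, A - lam * np.eye(3))
```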
Fixing associativity
Suppose you have a product of the form $(A \odot_{\mathsf{i}} B) \odot_{\mathsf{j}} C$, where $C$ has the axis $\mathsf{i}$, so that you can’t directly reassociate. You can still do it by renaming $\mathsf{i}$ in $C$ out of the way first:

$$(A \odot_{\mathsf{i}} B) \odot_{\mathsf{j}} C = \Big( (A \odot_{\mathsf{i}} B) \odot_{\mathsf{j}} C_{\mathsf{i} \to \mathsf{i}'} \Big)_{\mathsf{i}' \to \mathsf{i}} = \Big( A \odot_{\mathsf{i}} (B \odot_{\mathsf{j}} C_{\mathsf{i} \to \mathsf{i}'}) \Big)_{\mathsf{i}' \to \mathsf{i}}$$

(where the second equality holds as long as $A$ doesn’t have $\mathsf{j}$).
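A numeric check of the fix, with einsum subscripts standing in for axis names (`k` plays the role of $\mathsf{i}'$):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=4)       # axes (i,)
B = rng.normal(size=(4, 3))  # axes (i, j)
C = rng.normal(size=(3, 4))  # axes (j, i): C has i too!

lhs = np.einsum("j,ji->i", np.einsum("i,ij->j", A, B), C)  # (A ⊙_i B) ⊙_j C
# Rename C's i to i' (subscript k), reassociate, then rename back:
rhs = np.einsum("i,ik->k", A, np.einsum("ij,jk->ik", B, C))
assert np.allclose(lhs, rhs)
```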
1. The former problem seems unsolvable, since the result should be summing over $A$’s $\mathsf{j}$, but after associating, $A$ is now sitting on the outside with $\mathsf{i}$ as the only reduction (and also it might be that $B \odot_{\mathsf{j}} C$ doesn’t have $\mathsf{j}$). The latter problem can be solved by renaming $\mathsf{i}$ in $C$ before associating. ↩
2. This shorthand is not part of the original specification. ↩