Adapted from lecture 12 of Ryan O’Donnell’s Fall 2017 graduate complexity theory class at CMU, lecture 12c of his Spring 2020 “CS theory toolkit” class (also at CMU), and sections 1.2-1.3 of the lecture notes for Emanuele Viola’s Fall 2017 “Special topics in complexity theory” class at Northeastern.

A distribution $D$ over $\{0,1\}^N$ is $k$-wise uniform if it looks uniform over every set of $k$ variables.¹ Similarly, a random hash function $h:\{0,1\}^n \to \{0,1\}$ is $k$-wise uniform if the hashes $h(x_1), \dots, h(x_k)$ of any $k$ distinct inputs are distributed uniformly. In fact (and this took me way too much time to realize), this is the same notion: a $k$-wise uniform hash function over $\{0,1\}^n$ corresponds to a $k$-wise uniform distribution over $\{0,1\}^{2^n}$: just interpret each of the $2^n$ outputs

$$h(0,\dots,0),\ \dots,\ h(1,\dots,1)$$

as one coordinate of the distribution. So we will use them interchangeably.
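
(For concreteness, here’s the identification in a few lines of Python; this is just an illustration, with names of my own choosing.)

```python
from itertools import product

def truth_table(h, n):
    """Flatten a hash function h on {0,1}^n into one sample of the
    corresponding distribution over {0,1}^(2^n)."""
    return tuple(h(x) for x in product((0, 1), repeat=n))

# Example: the all-zeros function becomes the all-zeros coordinate vector.
print(truth_table(lambda x: 0, n=3))  # (0, 0, 0, 0, 0, 0, 0, 0)
```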

This is equivalent to fooling tests of degree $\le k$. Indeed,

  • if $p$ has degree $\le k$, then we can decompose $\mathbb{E}_{x \sim D}[p(x)]$ into the expectations of its monomials, and those must be fooled since they each depend on at most $k$ variables;
  • conversely, if $\Pr_{x \sim D}[x_S = v] \ne 2^{-k}$ for some set $S$ of $k$ variables and some assignment $v \in \{0,1\}^S$, then the polynomial $p(x) := \mathbf{1}[x_S = v]$ has degree $k$ and isn’t fooled (a brute-force check of this condition is sketched below).
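
For a sanity check, here’s a brute-force Python sketch of this condition (the function name and setup are my own); it verifies $\Pr_{x \sim D}[x_S = v] = 2^{-|S|}$ for every set $S$ of at most $k$ coordinates, and runs on the $1$-wise uniform distribution built in the next section:

```python
from itertools import combinations, product

def is_k_wise_uniform(dist, N, k):
    """dist maps length-N bit tuples to probabilities. Checks that
    Pr[x_S = v] = 2^-|S| for every set S of at most k coordinates."""
    for size in range(1, k + 1):
        for S in combinations(range(N), size):
            for v in product((0, 1), repeat=size):
                p = sum(pr for x, pr in dist.items()
                        if all(x[i] == vi for i, vi in zip(S, v)))
                if abs(p - 2 ** -size) > 1e-9:
                    return False
    return True

# Uniform over the all-zeros and all-ones strings (see the next section):
D = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
print(is_k_wise_uniform(D, N=3, k=1))  # True
print(is_k_wise_uniform(D, N=3, k=2))  # False: coordinates are perfectly correlated
```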

In this note, we’ll try to build k-wise uniform distributions and hash functions that can be efficiently computed from a short random seed.

1-wise uniformity

The case $k=1$ is trivial: we just need to make sure that individually, each coordinate of the distribution is $0$ or $1$ with probability $1/2$ each. So we can let $D$ be uniform over the two strings $(0,\dots,0)$ and $(1,\dots,1)$. In terms of hash functions, we can let $h$ be the all-zeroes or all-ones function with probability $1/2$ each.

This requires only 1 bit of randomness: the seed length is 1.

Pairwise uniformity

When $k=2$, in addition to making sure $h(x)=1$ with probability $1/2$ for each $x$, we also need to make sure that the outputs $h(x), h(y)$ of two distinct points $x \ne y$ are equal as often as they’re unequal. In particular, $h(x)$ and $h(y)$ shouldn’t always have the same value.

In the abstract, how would you try to design a hash function $h$ that avoids pairwise collisions? One attempt might be to do some kind of “parity check”, say, XOR all the bits together. But whatever parity check you come up with, there will be some strings $x \ne y$ that nevertheless map to the same value.

On the other hand, taking a random parity check seems more promising: that is, we’ll draw a random “parity check” string $r \in \{0,1\}^n$, then hash $x$ to

$$h(x) := r \cdot x = r_1 x_1 \oplus \dots \oplus r_n x_n.$$

If $x \ne y$, then their hashes will differ with probability $1/2$: indeed, the “difference” is $h(x) \oplus h(y) = r \cdot (x \oplus y)$, and if you take a random parity check of a nonzero string, it will be $1$ with probability $1/2$.

This doesn’t give us pairwise uniformity, though: the issue is that $h(0^n)$ is always $0$. Luckily, this is the only issue, because for every $x$ that’s not the all-zeros string, we do have $h(x)=1$ with probability $1/2$. In fact, this means we already have a pairwise-uniform distribution over $N := 2^n - 1$ coordinates given by $\{h(x)\}_{x \in \{0,1\}^n \setminus \{0^n\}}$.

To get pairwise uniformity, it’s enough to avoid the all-zeros string by first mapping each $x$ to $(x_1, \dots, x_n, 1) \in \{0,1\}^{n+1}$, then taking a random parity check for $r \in \{0,1\}^{n+1}$. This is equivalent to drawing a random string $r \in \{0,1\}^n$ and a random bit $b \in \{0,1\}$, and mapping $x$ to

$$h(x) := (r \cdot x) \oplus b = r_1 x_1 \oplus \dots \oplus r_n x_n \oplus b.$$

This requires seed length $n+1$ for hash functions, and $\log(N+1)$ for distributions. And it’s very easy to compute. :)
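
Here’s a minimal Python sketch of this hash family (names are mine), with an empirical check that the pair of hashes of two fixed distinct inputs is close to uniform over $\{0,1\}^2$ across random seeds:

```python
import random
from itertools import product

def sample_hash(n):
    """Draw a random seed (r, b) and return the function x -> (r.x) xor b."""
    r = [random.randrange(2) for _ in range(n)]
    b = random.randrange(2)
    def h(x):
        acc = b
        for ri, xi in zip(r, x):
            acc ^= ri & xi
        return acc
    return h

n = 4
x, y = (0, 0, 0, 0), (1, 0, 1, 1)
counts = {hv: 0 for hv in product((0, 1), repeat=2)}
for _ in range(100_000):
    h = sample_hash(n)
    counts[h(x), h(y)] += 1
print(counts)  # each of the 4 outcomes should show up ~25,000 times
```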

Dual distance

It’s fairly natural to try and achieve $k$-wise uniformity by making the distribution $D$ uniform over a linear code $C \subseteq \mathbb{F}_2^N$. Indeed, a linear code $C$ is uniform over a set $S$ of coordinates if and only if it is “complete over $S$”, i.e. $C|_S = \{0,1\}^S$. In particular, this means that $D$ will be $k$-wise uniform as long as any $k$ columns in the generator matrix of $C$ are linearly independent.

For example, the distribution with $N := 2^n - 1$ coordinates we described in the previous section is (uniform over) the Hadamard code: when $n=3$, it is generated by the matrix $G$ whose columns are all the nonzero vectors $x \in \{0,1\}^3$

$$G := \begin{pmatrix} 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{pmatrix},$$

with $C := \{rG \mid r \in \{0,1\}^3\}$. It’s pairwise uniform because any two distinct nonzero vectors $x, y \in \{0,1\}^3$ are linearly independent.

The condition “any $k$ columns in $G$ are linearly independent” can be rephrased as “no nonzero vector of Hamming weight $\le k$ is perpendicular to all rows of $G$”: such a vector would give the coefficients of a linear combination of $\le k$ columns that sums to zero. Since the vectors perpendicular to all rows of $G$ form the dual code $C^\perp$,² this is also equivalent to the dual code having distance $> k$ (we call this distance the dual distance).

This means that to find a small $k$-wise uniform distribution $D$, it’s enough to find a large linear code $C$ with distance $> k$: the parity checks of $C$ will be the generators of $C^\perp$, so their number is the seed length needed. In the example above, $C$ is the Hamming code, which has distance $> 2$ and has the $3$ parity checks indicated by $G$.
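
To make this concrete, here’s a small Python check (my own illustration) that the Hadamard code from above is pairwise uniform, by verifying completeness on every pair of coordinates:

```python
from itertools import combinations, product

n = 3
cols = [c for c in product((0, 1), repeat=n) if any(c)]  # N = 2^n - 1 nonzero columns
N = len(cols)

# The 2^n codewords rG, one coordinate <r, c> mod 2 per column c.
code = [tuple(sum(ri * ci for ri, ci in zip(r, c)) % 2 for c in cols)
        for r in product((0, 1), repeat=n)]

# Any two distinct nonzero columns are linearly independent over F_2, so
# every pair of coordinates takes each of the 4 values equally often.
for i, j in combinations(range(N), 2):
    pairs = [(w[i], w[j]) for w in code]
    assert all(pairs.count(v) == len(code) // 4
               for v in product((0, 1), repeat=2))
print(f"pairwise uniform on N = {N} coordinates with seed length {n}")
```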

General case: Reed-Solomon codes

The Hamming bound tells us that a code of distance $> k$ can only cover a $N^{-\Omega(k)}$ fraction of $\mathbb{F}_2^N$. So even in the best case, $C$ would need $\Omega(k \log N)$ parity checks, and thus the seed length would be at least $\Omega(k \log N)$.
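
Unpacking that computation (this is my own back-of-the-envelope version, say for $k \le \sqrt{N}$ so that $\log_2 \binom{N}{\lfloor k/2 \rfloor} = \Omega(k \log N)$): the Hamming bound packs disjoint balls of radius $\lfloor k/2 \rfloor$ around the codewords, giving

$$|C| \le \frac{2^N}{\binom{N}{\lfloor k/2 \rfloor}} \qquad \Longrightarrow \qquad \#\{\text{parity checks}\} = N - \log_2 |C| \ge \log_2 \binom{N}{\lfloor k/2 \rfloor} = \Omega(k \log N).$$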

Embarrassingly, I don’t know how to match the Hamming bound in general over $\mathbb{F}_2$, but I do know how to do this over larger fields such as $\mathbb{F}_{2^n}$. And fortunately, if we have a $k$-wise uniform distribution over $(\mathbb{F}_{2^n})^m$, we still get a $k$-wise uniform distribution over $\{0,1\}^{nm}$ by interpreting the elements of $\mathbb{F}_{2^n}$ as strings in $\{0,1\}^n$ and concatenating them together.

Specifically, Reed-Solomon codes are great at achieving dual distance. Pick a field $\mathbb{F}$ of size $m$, and fix $m$ distinct elements $f_1, \dots, f_m \in \mathbb{F}$ (the “evaluation points”). Let $\mathbb{F}[X]_{<k}$ be the set of all polynomials $p(x) = a_0 + a_1 x + \dots + a_{k-1} x^{k-1}$ of degree $< k$. Consider the code $C$ where each codeword consists of the values of some polynomial $p$ at the evaluation points $f_1, \dots, f_m$:

$$C := \{(p(f_1), \dots, p(f_m)) \mid p \in \mathbb{F}[X]_{<k}\}.$$

Polynomials of degree $< k$ are uniquely determined by their values at $k$ distinct points; since there are exactly $|\mathbb{F}|^k$ of them, restricting to any set of $k$ evaluation points is a bijection onto $\mathbb{F}^k$. So $C$ is complete on every set of $k$ points, and therefore the uniform distribution on $C$ is $k$-wise uniform.

Each coefficient $a_i$ takes $\log |\mathbb{F}|$ bits of randomness, so by setting $m := N/\log N$ and $\mathbb{F} := \mathbb{F}_{2^{\log N}}$, we get a $k$-wise uniform distribution over $\{0,1\}^N$ with seed length $O(k \log N)$ as hoped, and it’s very easy to compute.
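
Here’s a minimal Python sketch of the construction in a toy setting (my own names and parameters): $\mathbb{F} = \mathbb{F}_{16}$ represented as $4$-bit integers with multiplication mod the irreducible $x^4 + x + 1$, so $m = 4$ evaluation points give $N = 16$ output bits.

```python
import random

T, MODULUS = 4, 0b10011  # GF(2^4), field polynomial x^4 + x + 1

def gf_mul(a, b):
    """Multiply two field elements: carry-less product, reduced mod MODULUS."""
    res = 0
    for _ in range(T):
        if b & 1:
            res ^= a
        b >>= 1
        a <<= 1
        if a & (1 << T):
            a ^= MODULUS
    return res

def gf_eval(coeffs, x):
    """Evaluate a polynomial (coefficients in increasing degree) at x, via Horner."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc

def sample_k_wise(k, m):
    """One sample of the k-wise uniform distribution over {0,1}^(m*T):
    a random polynomial of degree < k evaluated at m distinct points,
    each value spelled out as T bits."""
    coeffs = [random.randrange(1 << T) for _ in range(k)]  # the (k*T)-bit seed
    bits = []
    for point in range(m):  # evaluation points 0, 1, ..., m-1 (needs m <= 2^T)
        value = gf_eval(coeffs, point)
        bits.extend((value >> i) & 1 for i in range(T))
    return bits

print(sample_k_wise(k=3, m=4))  # 16 output bits from a 12-bit seed
```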

Note: If you really care, you can get seed length $\frac{k}{2} \log N + O(1)$ from BCH codes, and you can generalize that to $\frac{k}{2} \log_q N + O(1)$ if both the seeds and the outputs are over $\mathbb{F}_q$ instead of $\{0,1\}$.

Lower bounds

We seemed to get the best possible seed length we could have gotten from linear codes, but maybe there was some more efficient way to do things, with a seed length $s$ much smaller than $k \log N$?

Let’s start with a very basic observation. Consider the matrix $M \in \{0,1\}^{2^s \times N}$ whose rows are all possible outcomes of the distribution. If the distribution is pairwise uniform, then at the very least, no two columns should be completely identical: otherwise those two coordinates would be perfectly correlated. There are only $2^{2^s}$ possible columns, so we get $N \le 2^{2^s}$, i.e. $s \ge \log \log N$.

But we have a stronger guarantee than this: for any two columns $v, w \in \{0,1\}^{2^s}$, we should have that $v_i = w_i$ as often as $v_i \ne w_i$. If we switch the domain from $\{0,1\}$ to $\{\pm 1\}$, this means their inner product $\langle v, w \rangle$ is $0$: $v$ and $w$ are orthogonal. A space of dimension $2^s$ cannot have more than $2^s$ pairwise orthogonal vectors, so we have $N \le 2^s$, i.e. $s \ge \log N$.
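
As a quick numerical check (again my own illustration), the outcome matrix of the pairwise-uniform family $h(x) = (r \cdot x) \oplus b$ from earlier, with seed length $s = n + 1$, does have pairwise orthogonal $\pm 1$ columns:

```python
from itertools import product

n = 3
inputs = list(product((0, 1), repeat=n))  # N = 2^n coordinates
rows = []
for r in product((0, 1), repeat=n):       # all 2^(n+1) seeds (r, b)
    for b in (0, 1):
        bits = [(sum(ri * xi for ri, xi in zip(r, x)) + b) % 2 for x in inputs]
        rows.append([1 - 2 * bit for bit in bits])  # map {0,1} -> {+1,-1}

for i in range(len(inputs)):
    for j in range(i + 1, len(inputs)):
        assert sum(row[i] * row[j] for row in rows) == 0
print(f"{len(inputs)} pairwise orthogonal columns in dimension {len(rows)}")
```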

More generally, if the distribution is $k$-wise uniform, then for any $k$ columns $v^{(1)}, \dots, v^{(k)}$, the $k$-tuple $(v^{(1)}_i, \dots, v^{(k)}_i)$ should take every value in $\{\pm 1\}^k$ equally often. If we split them into two groups (assume $k$ is even), then the products $v^{(1)}_i \cdots v^{(k/2)}_i$ and $v^{(k/2+1)}_i \cdots v^{(k)}_i$ should be equal as often as they’re unequal. This means that the element-wise products $v^{(1)} \odot \cdots \odot v^{(k/2)}$ and $v^{(k/2+1)} \odot \cdots \odot v^{(k)}$ are orthogonal. There are $\binom{N}{k/2}$ such products of $k/2$ columns, so by the same argument³ we get $\binom{N}{k/2} \le 2^s$, i.e. $s \ge \frac{k}{2} \log(N/k)$.

  1. i.e. for every $S \in \binom{[N]}{k}$, the distribution of $x_S$ for $x \sim D$ is uniform over $\{0,1\}^S$ 

  2. The dual code $C^\perp$ is the set of all vectors perpendicular to $C$: all possible “parity checks” of $C$. This is also the code whose generator matrix is $C$’s parity check matrix and vice versa. 

  3. There is a slight catch, which is that for the argument to go through, we also need that $v^{(1)} \odot \cdots \odot v^{(k/2)}$ is orthogonal to $v^{(k/2+1)} \odot \cdots \odot v^{(k)}$ when there is partial overlap between $v^{(1)}, \dots, v^{(k/2)}$ and $v^{(k/2+1)}, \dots, v^{(k)}$ (i.e. some of those are the same columns in $M$). But this is not an issue: if say $v^{(j)} = v^{(j')}$ for $j \le k/2 < j'$, then in the inner product $\langle v^{(1)} \odot \cdots \odot v^{(k/2)},\ v^{(k/2+1)} \odot \cdots \odot v^{(k)} \rangle = \sum_i v^{(1)}_i \cdots v^{(k)}_i$, the corresponding factors $v^{(j)}_i$ and $v^{(j')}_i$ will cancel out (their product is always $1$), so you can just get rid of both $j$ and $j'$ and apply the same argument to products of fewer than $k/2$ columns, which must also be orthogonal.