Szemerédi’s regularity lemma tells us that we can split a graph into O(1) equal parts V=V1VM such that the subgraphs between (almost) each pair of parts (Vi,Vj) are “homogenous”: the edge densities between any two large subsets AVi and BVj are all roughly the same.

In other words, for each such pair (Vi,Vj), it’s a bit as if each edge eVi×Vj were included at random with some fixed probability d(Vi,Vj), the edge density between Vi and Vj. If the edges were actually drawn at random in this way (and ignoring the fact that this is only true for almost all pairs), then we could say lots of cool things about the graph!

For example, we could easily get an estimate on the number of triangles: for every triple (Vi,Vj,Vk) of parts (not necessarily distinct), the fraction of triples (u,v,w)Vi×Vj×Vk that are triangles would be very close to the product of the densities d(Vi,Vj)d(Vi,Vk)d(Vj,Vk). And in particular, whenever this product of densities is 0, you’d get Ω(n3) triangles, because the parts all have size Ω(n). So either the graph has no triangles at all, or it has lots of them; there’s no in-between!

The graph counting lemma and the reduced graph realize those two dreams.

Graph counting lemma

We’ll first show that regularity allows us to find many triangles, then generalize to counting the occurrences of any constant-size graph.

Finding triangles

Above, we argued that if the edges between three parts X,Y,Z were drawn at random, then there should be roughly d(X,Y)d(X,Z)d(Y,Z)|X||Y||Z| triangles. Let’s show that we get roughly that many triangles as long as (X,Y), (X,Z) and (Y,Z) are ϵ-regular: as long as the densities d(X,Y),d(X,Z) are 2ϵ, then we have

(12ϵ)(d(X,Y)ϵ)(d(X,Z)ϵ)(d(Y,Z)ϵ)|X||Y||Z|

triangles.

We’ll first start by finding a bunch of wedges starting from X: vertices xX that have edges (x,y) and (x,z) for many yY and zZ. Then we’ll complete those wedges to triangles by finding edges (y,z) for many of those wedges.

Formally, at most an ϵ fraction of vertices xX can be linked to less than a d(X,Y)ϵ fraction of Y (and similarly for Z): otherwise, those poorly-connected vertices x would form a large subset XX for which the density d(X,Y) is abnormally low. By a union bound, this means that a 12ϵ fraction of vertices xX are connected to both a d(X,Y)ϵ fraction of Y and a d(X,Z)ϵ fraction of Z.

Let’s name those fractions Yx and Zx. Since d(X,Y),d(X,Z)2ϵ these fractions Yx,Zx are big enough for regularity between Y and Z to apply, so the density between Yx,Zx is at least d(Y,Z)ϵ, and each edge between Yx and Zx gives us a triangle. So in total, we have

(12ϵ)|X|\# good x(d(X,Y)ϵ)|Y||Yx|(d(X,Z)ϵ)|Z||Zx|(d(Y,Z)ϵ)d(Yx,Zx)

triangles.

General case

Informally, for any fixed graph H, the number of copies of H within G that “lie within the regular parts” is roughly the number we’d expect given the densities between the parts.

Lemma

Let H be a graph over [k] and let ϵ>0. Let V1,,Vk be a partition such that (Vi,Vj) is ϵ-regular whenever (i,j) is an edge in H. Then, the number of labeled copies1 of H is

((i,j)E(H)d(Vi,Vj)±ϵe(H))i=1k|Vk|.

Note: the factor e(H) in the error comes from the fact that the ϵ-regularity is applied e(H) times.

Proof

TODO (it was a beautiful telescoping sum)

Reduced graph

TODO: rewrite

For any ϵ, Szemerédi’s regularity lemma can split any graph into Oϵ(1) equal parts V=V1,,VK such that for all but an ϵ fraction of the pairs (i,j)[K]2, (Vi,Vj) is ϵ-regular: the density between large enough subsets of them is roughly what we would expect it to be.

The idea of the “reduced graph” is to make a graph over the parts [K]: we will connect (i,j) iff (Vi,Vj) is ϵ-regular and the density d(Vi,Vj) is above some threshold α.

Applications

This technique allows us (among other things) to reprove some known results very easily.

Theorem (Erdős-Stone-Simonovits)

Any H-free graph over n vertices has at most (11χ(H)1+o(1))n22 edges.2

Proof (sketch)

Let t:=|H|. Let H be the complete χ(H)-bipartite graph with parts of size t. Clearly, H contains H. We will show that if G has too many edges, it must contain a copy of H.

Suppose G has (11χ(H)1+2α)n22 edges. Apply Szemerédi’s regularity lemma to G with small enough ϵ (to be set later). Consider the reduced graph R with threshold α. As long as ϵ is small enough as a function of α, this reduced graph will have at least (11χ(H)1+Ω(1))K22 edges. So by Turán’s theorem, R must contain a copy of Kχ(H). Then, apply the graph counting lemma to the corresponding parts of G to find Ωϵ(nχ(H)t) copies of H in G.3 We only needed one.

  1. This means copies are ordered, and I think that technically, mapping two nodes of H to the same node in G is allowed (but this is usually not a problem since such mappings are comparatively rare). 

  2. Here, χ(H) is the coloring number of H, i.e. the least k such that it can be embedded into a k-partite graph. 

  3. For this to work, we needed ϵ to be enough to survivec the e(H) factor blow up coming from the e(H) applications of regularity in the proof of the counting lemma.