Reed-Solomon erasure code recovery in n*log^2(n) time with FFTs

With Fast Fourier transforms it’s possible to convert a set of evaluations of a polynomial over a prime field at a set of specific points, P(1), P(r), P(r^2)P(r^{{2^k}-1}) (where r^{2^k} = 1) into the polynomial’s coefficients in n * log(n) time. This is used extensively and is key to the efficiency of STARKs and most other general-purpose ZKP schemes. Polynomial interpolation (and its inverse, multi-point evaluation) is also used extensively in erasure coding, which is useful for data recovery and in blockchains for data availability checking.

Unfortunately, the FFT interpolation algorithm works only if you have all evaluations at P(r^i) for 0 \le i \le 2^k - 1. However, it turns out that you can make a somewhat more complex algorithm to also achieve polylog complexity for interpolating a polynomial (ie. the operation needed to recover the original data from an erasure code) in those cases where some of these evaluations are missing. Here’s how you do it.

Erasure code recovery in O(n*log^2(n)) time

Let d[0] ... d[2^k - 1] be your data, substituting all unknown points with 0. Let Z_{r, S} be the minimal polynomial that evaluates to 0 at all points r^k for k \in S. Let E(x) (think E = error) be the polynomial that evaluates to your data with erasures (ie. E(r^i) = d[i]), and let P(x) be the polynomial that evaluates to original data. Let I be the set of indices representing the missing points.

First of all, note that D * Z_{r,I} = E * Z_{r,I}. This is because D and E agree on all points outside of I, and Z_{r,I} forces the evaluation on points inside I to zero. Hence, by computing d[i] * Z_{r,I}(r^i) = (E * Z_{r,I})(r^i), and interpolating E * Z_{r,I}, we get D * Z_{r,I}.

Now, how do we extract D? Just evaluating pointwise and dividing won’t work, as we already know that at least for some points (in fact, the points we’re trying to recover!) (D * Z_{r,I})(x) = Z_{r,I}(x) = 0, and we won’t know what to put in place of 0 / 0. Instead, we do a trick: we generate a random k, and compute Q1(x) = (D * Z_{r,I})(k * x) (this can be done by multiplying the ith coefficient by k^{-i}). We also compute Q2(x) = Z_I(k * x) in the same way. Now, we compute Q3 = Q1 / Q2 (note: Q3(x) = D(k * x)), and then from there we multiply the ith coefficient by k^i to get back D(x), and evaluate D(x) to get the missing points.

Altogether, outside of the calculation of Z_{r,I} this requires six FFTs: one to calculate the evaluations of Z_{r,I}, one to interpolate E * Z_{r,I}, two to evaluate Q1 and Q2, one to interpolate Q3, and one to evaluate D. The bulk of the complexity, unfortunately, is in a seemingly comparatively easy task: calculating the Z_{r,I} polynomial.

Calculating Z in O(n*log^2(n)) time

The one remaining hard part is: how do we generate Z_{r,S} in n*polylog(n) time? Here, we use a recursive algorithm modeled on the FFT itself. For a sufficiently small S, we can compute (x - r^{s_0}) * (x - r^{s_1}) ... explicitly. For anything larger, we do the following. Split up S into two sets:

S_{even} = {\frac{x}{2}\ for\ x \in S\ if\ S\ mod\ 2 = 0}
S_{odd} = {\frac{x-1}{2}\ for\ x \in S\ if\ S\ mod\ 2 = 1}

Now, recursively compute L = Z_{r^2, S_{even}} and R = Z_{r^2, S_{odd}}. Note that L evaluates to zero at all points (r^2)^{\frac{s}{2}} = r^s for any even s \in S, and R evaluates to zero at all points (r^2)^{\frac{s-1}{2}} = r^{s-1} for any odd s \in S. We compute R'(x) = R(x * r) using the method we already described above. We then use FFT multiplication (use two FFTs to evaluate both at 1, r, r^2 ... r^{-1}, multiply the evaluations at each point, and interpolate back the result) to compute L * R', which evaluates to zero at all the desired points.

In one special case (where S is the entire set of possible indices), FFT multiplication fails and returns zero; we watch for this special case and in that case return the known correct polynomial, P(x) = x^{2^k} - 1.


Here’s the code that implements this (tests here):

Questions:

  • Can this be easily translated into binary fields?
  • The above algorithm calculates Z_{r,S} in time O(n * log^2(n)). Is there a O(n * log(n)) time way to do it?
  • If not, are there ways to achieve constant factor reductions by cutting down the number of FFTs per recursive step from 3 to 2 or even 1?
  • Does this actually work correctly in 100% of cases?
  • It’s very possible that all of this was already invented by some guy in the 1980s, and more efficiently. Was it?
3 Likes

Concerning efficient erasure coding, have you checked out fountain codes
– Luby codes and then Raptor codes? Here is Nick Johnson’s short tutorial on it. They have O(n) coding complexity, so you can encode + decode 1MB in 0.2s on a garden variety machine running an implementation like this one rather than the 21s encoding you reported from C++. If you bring in RNG from { the hash of the data being erasure coded, your favorite replacement to RANDAO }, the erasure coding can be Merklized. There are patents on this owned by Qualcomm but at least some of the oldest are expiring.

For STARKs, will try the “Calculating Z” technique and report back.

For STARKs, you want the evaluation of Z to be very concise so that you can do it inside the verifier; you can often find mathematical structures that do this for you. For example, in the multiplicative subgroups that I used for the MIMC example, you’re taking the subgroup 1, g, g^2g^{2^k-2}, where Z = (x^{2^k} - 1) / (x - g^{2^k - 1}). For erasure coding it’s more difficult because you don’t have this option, since it’s assumed the adversary could be omitting a portion of the data that will make computing Z for it not so simple.

I understand this this as E being the polynomial that interpolates to the known evaluations. I think P(x) is later called D. Isn’t (due to the low-degreeness) actually E=D?

I would write it down in a slightly different way: Say F is the interpolation of the data d[i] where d[i] is availabile, and 0 where i\in I (i.e. d[i] is not available). Then F can be easily computed from the available data using FFT, by substituting 0 in all the positions where the data is available.

Now F = D \cdot Z_{r,I}. Afterwards, use the trick to described to recover D from this equation.

Aah I think the idea that I meant to say is that E evaluates to the data but putting 0 in place of all missing points, and D is the actual data.

I was thinking about this problem today and I came across this interesting algorithm to compute Z_{r,S}. It is unfortunately less efficient (O(n^2)) except in some special cases, but maybe someone can think of a trick to make it efficient so want to document it here.

We can easily compute a polynomial multiple of Z_{r,S}, by taking the vector that is zero in the positions of S and has a random non-zero value in the other positions, and take its Fourier transform. Let’s do this for two different random assignments and call the resulting polynomials P(X) and Q(X). Then we know that with overwhelming probability that Z_{r,S}(X) = \mathrm{gcd}(P(X), Q(X)).

So we can easily get the desired polynomial by computing a gcd. The Euclidean algorithm can do this and will reduce the degree of the polynomials by 1 at each step, and each step involves O(n) field operations, so in the general case this algorithm uses O(n^2) steps. However, the Euclidean terminates once the degree of the gcd is reached, so should the number of zeroes be close to the total size of the domain, it can be much faster.

There are O(n\log^2{n})-time variants of the Euclidean algorithm, if this is what you need.

For example, see Fast Euclidean Algorithm, by von zur Gathen, Joachim and Gerhard, Jurgen, in Modern Computer Algebra, 2013.

Later edit: In the past, I’ve used libntl’s fast EEA implementation. See here.

That’s very cool, I found some info on the algorithm here: https://planetmath.org/fasteuclideanalgorithm
So this gives the algorithm the same asymptotic complexity as the one suggested by @vbuterin (module a \log(\log n) factor that I’m not sure applies for finite fields; if it does it would also apply to @vbuterin’s algorithm as it would be for all polynomial multiplications via FFT.
So, interesting question is which one is concretely more efficient.

Yes, that article is incorrect about the extra \log\log n factor for polynomial multiplication.

The time to multiply two polynomials over a finite field is M(n) = n \log n field operations (via FFT).

(On a related note, the extra \log \log n factor does come up only for n-bit integer multiplication. However, a recent breakthrough showed an O(n\log{n}) algorithm for integer multiplication. But it is not concretely fast. In practice, one still uses Schonhage-Strassen, which takes O(n\log{n}\log{\log{n}}).)

An update on this – I coded both ideas here in python. In short, while both have the same asymptotics, Vitalik’s approach is ca. 30x faster when constants are factored in.

The quest for an O(n \log n) or better algorithm is still open.