Accumulators, scalability of UTXO blockchains, and data availability

Related Stack Exchange question: https://crypto.stackexchange.com/questions/53213/accumulators-with-batch-additions (maybe one of you posted it)

You make a good point @aniemerg. (Apologies for the late reply; I’ve been mulling over this in search of a convincing solution :slight_smile: )

I have a few ideas to address the problem. The first three ideas can be combined for varying sophistication and effectiveness.

  1. Identifiable aborts: Basically the batcher can take whoever seems online right now and attempt to organise a corresponding batched transaction. So long as aborts are identifiable (this is trivial for a third-party batcher; not so clear in the MPC case), the batcher can simply retry without those who aborted.
  2. Dropout tolerance: Consider the following slightly more general template scheme, which allows for a small number of dropouts. Instead of containing transactions that must all confirm together, the template contains “candidate” transactions for which the corresponding candidate spender can drop out. The template is shared with auxiliary information so that the spenders can compute the witnesses for the transaction recipients if given the subset of transactions that eventually confirmed. The batcher makes a best-effort attempt to gather signatures (asynchronously) until the dropout rate is reasonable, then builds the new accumulator value, and also adds \log_2{{a}\choose{b}} bits of information to the final transaction, where a is the total number of candidate transactions and b is the number of confirmed transactions. Note that this scheme is workable if b is close to a. For example, if a=250 and b=240 (dropout rate of 4%), then that’s only 58 additional bits (see the sketch after this list).
  3. Sub-batches: If several transactions are controlled by the same spender then those transactions can form an atomic “sub-batch”. (That is, either all corresponding transactions are dropped, or none.) This allows a in the above idea to be the number of candidate spenders rather than the number of candidate transactions.
  4. Delayed witnesses: The following idea might work well for micro-transactions. The batcher is a third party that does batching-as-a-service with a scheduled batch, say, every 5 minutes. Spenders that want a transaction in the next batch send it to the batcher and get a signed receipt for inclusion in the next batch. Now the batcher is making the promise that 1) the transaction will get included in the next batch, and that 2) he will deliver the corresponding recipient witnesses once the batched transaction has gone through. (If the spender can’t guarantee delivery of the recipient witnesses to the recipient, the transaction would be simultaneously spent and “uncashable”, which is why I suggest limiting this to micro-transactions.) These two promises can be backed with a collateralised contract that would heavily penalise the batcher in case of bad behaviour, or at least guarantee witness data availability with onchain challenges. Over time the batcher may be able to use his reputation as a trustworthy service provider as additional collateral.
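Here is a quick sanity check of the \log_2{{a}\choose{b}} overhead from idea 2 (just a sketch using Python’s math.comb, with the example numbers above):

from math import comb, log2

a, b = 250, 240                   # candidate vs confirmed transactions (4% dropout)
extra_bits = log2(comb(a, b))     # bits needed to identify which subset confirmed
print(round(extra_bits))          # ~58, as claimed above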

Another data structure that can be used as an accumulator is the Bloom filter. Although Bloom filters are probabilistic data structures, the parameters of a filter can be selected to force the false positive rate to be exponentially low.

Arguably RSA accumulators are also probabilistic, since they use probabilistic primes. A good thing about Bloom filters is that they have fixed-time additions and removals.

Here is a Wiki page on bloom filters

If you use a counting Bloom filter, then it will support removal, and the hash of the Bloom filter can be included in the header. The Bloom filter itself will then become the witness. Bloom filters can be compressed for storage or network communication (for example, as explained here: http://www.eecs.harvard.edu/~michaelm/NEWWORK/postscripts/cbf2.pdf). So another option is to include a compressed Bloom filter in the header, and then the element itself becomes the witness.

An approximate formula for false positives is

p \approx \left(1 - e^{-km/n}\right)^k

where k is the number of hashes in the filter, m is the number of elements, and n is the size of the filter in bins.

It is a tradeoff of computation vs witness size. The more hashes you are willing to do during witness verification, the smaller the filter size.
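As a rough illustration of that tradeoff (a sketch using the approximate formula above, with m elements, n bins and k hashes; the numbers are arbitrary):

from math import exp, log

def false_positive_rate(k, m, n):
    # approximate false positive rate quoted above
    return (1 - exp(-k * m / n)) ** k

m = 1_000_000                        # elements
for k in (4, 8, 16):
    n = int(k * m / log(2))          # bins chosen so roughly half of them end up set
    print(k, n, false_positive_rate(k, m, n))   # rate ~ 2^-k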

It would be interesting to understand how a Bloom filter performs vs a Merkle tree vs an RSA accumulator …

Bloom filters are widely used in the routing tables of network routers; arguably, if RSA accumulators were faster, the Cisco guys would use them.

I think what we can do is agree on a particular realistic real-life benchmark such as number of elements, insertions, deletions etc. and then benchmark different approaches.


I looked at bloom filters before. The problem is that for the false positive rate to be low enough to be cryptographically secure, the size of the bloom filter basically needs to be almost the same as the size of a plain concatenated list of the hashes of the elements.

You can prove this info-theoretically. Informally: suppose that there is a list of 2^64 elements of which you are including N. From a Bloom filter with 160-bit security, it’s possible to recover which N elements you’ve included with probability ~1 - 2^{-32}. Hence, that Bloom filter basically contains the info, and the size of the info is roughly 64 * N bits; and so the Bloom filter must contain at least 64 * N bits.
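To put numbers on it (a sketch; it uses the standard result that an optimal Bloom filter needs about 1.44 \cdot \log_2(1/p) bits per element for false positive rate p):

from math import log, log2

p = 2 ** -160                                    # "160-bit security" false positive rate
bloom_bits_per_element = log2(1 / p) / log(2)    # ~1.44 * 160 ~ 231 bits
hash_list_bits_per_element = 160                 # plain concatenated list of 160-bit hashes
id_list_bits_per_element = 64                    # or just the 64-bit element IDs

print(bloom_bits_per_element, hash_list_bits_per_element, id_list_bits_per_element)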


Actual accumulators get around this by requiring witness data.

Interesting …

I found a paper that discusses using Bloom filters as cryptographic accumulators

If you try determining which elements are in the set by brute-forcing over the 2^64 candidate elements, you will recover the true elements of the set, but also lots of false positives, and there will be no way for you to distinguish the good guys from the bad guys.

False. The bloom filter has 160-bit security, meaning the false positive rate is 1 in 2^160. Hence, the chance that any of the 2^64 elements that are not part of the N will show up as a positive is 1 in 2^96 (oh wow, it’s even less probable than I thought, I guess I double-counted somewhere).

Now I understand that your argument is totally correct - sorry …

I think the mistake I made was relying on the formula for the false positives from Wikipedia; it is only accurate for filters which are underloaded, i.e. when the number of bins is much larger than k times the number of elements …

It looks like the paper referred to above is doing a bit more than just using a Bloom filter; they cite the Nyberg accumulator, which I need to understand …

Here is some rephrasing of the batching impossibility result for deletions, additions, and updates, to make it a bit simpler to understand. I am using an argument similar to the one Vitalik used above.

Let A be the value of the accumulator, and let its size in bits be L_{acc}. Let the size of a typical witness in bits be L_{wit}. Let N be the number of elements in the set, and let W[i], where 0 \le i < N, be the set of all witnesses.

Let us suppose that the accumulator is cryptographically secure to prove both participation and non-participation.

Then, it is clear that for large N one has

L_{wit} \sim O(\log_2 N)

This comes from the fact that, for a fixed accumulator value A, the witnesses W[i] encode all N elements of the set, and the dimension of the encoding space needs to be at least N.

Now consider the case where m distinct elements are added to the accumulator. Then the new set W' can be described as the set W plus the delta layer w. Using the same argument as above, the size of w in bits L_{delta} should be at least

L_{delta} \sim O(m \log_2 N)

This comes again from the fact that with everything else fixed, the delta layer w can be used to encode the entire batch set of m distinct elements. Computing the delta layer is equivalent to computing the updated set W' from W.
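To make the counting step explicit: the delta layer must at least identify which m of the (roughly N) possible elements were touched, so

L_{delta} \ge \log_2{{N}\choose{m}} \ge m \log_2\frac{N}{m} \sim O(m \log_2 N) \text{ for } m \ll N.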

Since the delta layer is linear in m, the amount of computation needed to derive the delta layer must also be at least linear in m. This is essentially Camacho’s proof.

For large N it does not really matter whether one is talking about additions or deletions. It is also true for unique updates (if each update in a set of m updates is touching a different element).

Regarding delegated solutions for data availability: http://arxiv.org/abs/1712.04417v2
Question: why force the N-party trusted setup for the RSA modulus?
Is there some theoretical or philosophical argument against the trustless setup I proposed earlier, and is that why it got deleted?

Is there some theoretical or philosophical argument against the trustless setup I proposed earlier, and is that why it got deleted?

Sorry, which trustless setup was this?

I’m not aware of a trustless setup for RSA that doesn’t generate numbers hundreds of times larger than what a trusted setup can do.

Let n be the size of the modulus. We have A and B with respective private inputs i_a = (p_a, q_a) and i_b = (p_b, q_b), trying to compute
f(i_a, i_b) = (p_a + p_b) \times (q_a + q_b) = N
Let M' be the product of the first primes such that M' = 2^\frac{n}{2-1} (just an efficiency improvement).
We first coordinate the selection of a reasonable p_b without leaking p_a:

  • A chooses random p'_a \in \mathbb{Z}^*_{M'}, p_a \in \mathbb{Z}_{M'}
  • B chooses random p'_b \in \mathbb{Z}^*_{M'}, \alpha_b \in \mathbb{Z}_{M'}
  • A and B perform a two-way ANDOS buy-sell, with A selling (p'_a, p_a), B selling (p'_b, \alpha_b), and f being f((p'_a, p_a), (p'_b, \alpha_b)) = p'_a \times p'_b - p_a - \alpha_b \mod M'
  • B gets \beta and computes p_b = \beta + \alpha_b \mod M'

With knowledge of p_b = p'_a \times p'_b - p_a \mod M', one can only deduce that (p_a + p_b) is in \mathbb{Z}^*_{M'}.
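As a sanity check of the blinding step (a toy computation, reading p_b = \beta + \alpha_b as in the last bullet; M' here is just a tiny product of the first primes):

import random
from math import gcd

Mp = 2 * 3 * 5 * 7 * 11 * 13                        # toy M'
units = [x for x in range(1, Mp) if gcd(x, Mp) == 1]

pa_prime, pb_prime = random.choice(units), random.choice(units)
pa, alpha_b = random.randrange(Mp), random.randrange(Mp)

beta = (pa_prime * pb_prime - pa - alpha_b) % Mp    # what B obtains from the ANDOS step
pb = (beta + alpha_b) % Mp                          # B's share

assert (pa + pb) % Mp == (pa_prime * pb_prime) % Mp
assert gcd(pa + pb, Mp) == 1                        # p_a + p_b has no small prime factors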

Trivially perform B&F division test, then repeat for q and test the output N.

ANDOS protocols scale to N parties, so no worries about PKCS-compatible moduli.

I didn’t get an email notification for your post. It probably wasn’t successfully posted (as opposed to getting deleted).

I’m a little bit confused by the trustless setup you are proposing. Would you mind editing your post to add details and clean up the exposition? (I think there are several typos that make things hard for me to follow.) Below are specific questions.

Is this actually a trustless setup, or are you proposing an N-party trusted setup?

What do you mean by “first primes”? Also, do you really mean M'=2^\frac{n}{2-1}=2^n?

Would you mind pointing to a specific construction/paper? I’m new to All-or-Nothing Disclosure of Secrets protocols.

Is this the same f as defined above, i.e. f(i_a, i_b) = (p_a + p_b) \times (q_a + q_b)?

What is \beta?

What is the B&F division test?

Sorry, I didn’t mean to accuse anyone; most probably a mistake then.

no, it was a mistake:
Let M be the smallest product of the first primes greater than 2^n, and as an efficiency improvement we can assume M' \approx 2^{n/2-1}.

In general: https://link.springer.com/content/pdf/10.1007%2F3-540-47721-7_17.pdf
A specific construction: https://link.springer.com/content/pdf/10.1007%2F3-540-49649-1_28.pdf

No, this one is for the ANDOS selling protocol.

See the paper above; it’s the “merchandise” you obtained without letting the other party know what you chose.

The paper above has the test in Section 3.
Note: the protocol above involves a third party overall, but not for the primality test.

For all these accumulators, every UTXO needs to have its witness or proof updated when a new block is added. This means that every coin holder needs to keep track of the blockchain at all times, which is not practical at all.

Is there any good solution to this issue? One way I can see is for miners to provide a witness service for normal users, but that’s not an ideal solution.
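To make the burden concrete, for an RSA accumulator the per-block maintenance each holder has to do looks roughly like this (a sketch; N is the modulus and the invariant is pow(w_x, x, N) == A):

def update_witness(w_x, new_elements, N):
    # when elements y are accumulated (A becomes A^y mod N), the holder of x
    # must raise its witness by the same y to keep pow(w_x, x, N) == A
    for y in new_elements:
        w_x = pow(w_x, y, N)
    return w_x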


Is there an implementation available for the RSA accumulators described here?

An interesting possibility for RSA accumulators, or even bilinear accumulators, is to make the proof for STARKs much smaller by reducing the size and complexity of the proof for each item from \Theta(\log n) to \Theta(1). It doesn’t affect the number of proofs required, only the size of the proofs.

In this case we don’t necessarily care about the trusted-third-party problem that arises if the accumulator is updated and referenced by multiple people. All that is necessary is to prove that specific members exist within a set in a way which is computationally binding: modifying the set to meet a malicious goal, or forging one proof, is as difficult as forging all proofs or finding a collision in the underlying hashing algorithm.

E.g. with a Merkle tree, the entropy that selects which items from the computation trace need to be verified is taken from the tree root; if any one item is changed, the set of randomly chosen items will change. The proof of each item must be consistent with every other item, that is, their paths share common hashes, and the modification of any one item requires re-computation of the whole tree, which in turn changes the items which are selected (making this computationally intractable to game).

Trusted setup doesn’t matter when one person is providing a set of their own data to be verified by another person; if there were a Merkle tree which required the creator to be trusted to set up the parameters, it wouldn’t affect the computationally binding property of pseudo-random item selection.

So the accumulator and witnesses, for a set of items S, are:

P, Q, g = 61, 53, 3    # toy primes and a public base (real P, Q are large and kept secret)
S = [5, 7, 11, 13]     # items to accumulate (primes)
N = P * Q
A = g                  # start from the public base g rather than 1 (1 would make every value trivially 1)
W = []
for x in S:
    # every existing witness absorbs the new item; the pre-update
    # accumulator value becomes the new item's witness
    W = [pow(w, x, N) for w in W] + [A]
    A = pow(A, x, N)

Note that for every subsequent item added, all previous witnesses need updating, so that for any given (Witness, Value) pair you can do assert pow(Witness, Value, N) == A. This makes the complexity of creating the accumulator about n \log n, with only 2n items stored; the computational overhead is greater than that of a Merkle tree, but the memory overhead is the same (or possibly greater, given the larger size of a witness compared to the size of each hash)?
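For instance, the consistency check over the whole set from the snippet above would be:

# every stored (witness, value) pair must reproduce the final accumulator
for value, witness in zip(S, W):
    assert pow(witness, value, N) == A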

Then to select the items:

import hashlib

def HASH(x):
    # any agreed-upon hash-to-integer function, e.g. SHA-256 of the value
    return int.from_bytes(hashlib.sha256(str(x).encode()).digest(), 'big')

n_samples = 16                # number of pseudo-randomly selected spot checks
X = []                        # deterministically chosen samples of the computation trace polynomial
B = A
for _ in range(n_samples):
    B = HASH(B)
    i = B % len(S)            # index into the item set (not mod N)
    X.append((S[i], W[i]))

# (A, N, X): accumulator, modulus and pseudo-randomly selected samples

Is there a mechanism, if you’re creating an accumulator from a fixed-size batch of items, to reduce the computational cost of updating all previous witnesses at each step without increasing the memory required? (E.g. you can use a cheaper multiply operation and then a single pow at the end, but the aggregate cost is about the same.)

Ah… Hold on…

What this doesn’t do is guarantee that any item is the randomly selected item within the set. It has no ordering, and without guaranteed ordering there is no security, as you can just provide any items you want as proof, as long as the set contains them.

To add ordering you’d require another level of structure, and this… I’m not sure how to achieve.

For accumulators to be more usable they would need strong ordering, with verifiability of ordering using only the witnesses. This would give them similar qualities to a Merkle tree, but is that possible with less data than Merkle tree proofs, e.g. (\log n) \cdot m \cdot 32 where m is the number of paths, versus (\log_2 N) \cdot m where N is the modulus, assuming only \log_2 N bits are required for each witness?


This post, RSA Accumulators for Plasma Cash history reduction, describes how we can build both proofs of membership and proofs of non-membership with RSA accumulators, but the stored elements have to be primes or powers of a prime to avoid interaction.
I think that with this we could store an arbitrary number of ordered values (also non-primes) by adding p_1^{x_1}, ..., p_n^{x_n}, with p_1, …, p_n being the first n primes and x_1, …, x_n being the values to store.
By proving membership of p_i^{x_i} and non-membership of p_i^{x_i+1} we show that x_i is stored at the i-th position in the set.
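A small sketch of that encoding (the primes and values are just an example; A = pow(g, exponent, N) would then be the accumulator from the linked post):

first_primes = [2, 3, 5, 7, 11]     # p_1 ... p_n
values = [3, 7, 2, 5, 1]            # x_1 ... x_n, the ordered values to store

exponent = 1
for p, x in zip(first_primes, values):
    exponent *= p ** x              # accumulate p_i^{x_i}

# To show that x_i sits at position i:
#   prove membership of p_i ** x_i            (it divides the accumulated exponent)
#   prove non-membership of p_i ** (x_i + 1)  (it does not)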


I’m a bit late on finding this thread, but have you considered using the new RSA batching techniques to handle this problem? @gakonst has a good simplified writeup on it.

The idea is to replace the need for making batched update data available with a zero-knowledge proof that shows that a sender of a transaction knows the discrete log of their batched exponent, without needing to send the exponent of course.
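For reference, the core building block (a proof of knowledge of exponent in the Boneh–Bünz–Fisch style that the writeup covers) looks roughly like this. This is only a sketch: hash_to_prime is a stand-in Fiat–Shamir challenge, and soundness caveats (e.g. the PoKE* variant with a fixed base) are glossed over.

import hashlib
from sympy import nextprime   # convenient way to get a prime challenge

def hash_to_prime(*args):
    # Fiat–Shamir style challenge: hash the transcript and take the next prime
    h = int.from_bytes(hashlib.sha256(repr(args).encode()).digest(), 'big')
    return nextprime(h)

def poke_prove(u, x, w, N):
    # prover knows x with w = u^x mod N but never sends x itself
    l = hash_to_prime(u, w)
    q, r = divmod(x, l)
    return pow(u, q, N), r            # proof is (Q, r) with r < l

def poke_verify(u, w, N, Q, r):
    l = hash_to_prime(u, w)
    return (pow(Q, l, N) * pow(u, r, N)) % N == w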

@vbuterin suggested an n-party setup where each party provides a prime to create a modulus N = p_1 * p_2 * ... * p_n. I thought it would be better to have each party provide a large semiprime s_i = p_{i1} * p_{i2} as its own modulus, and use each modulus as its own accumulator, running in parallel. Here’s my reasoning:

Pros:

  • No multiparty computation ceremony is needed to multiply the numbers without anyone learning them.

  • Proofs can be verified in parallel.

  • Furthermore, updating an accumulator/verifying a witness requires mod N exponentiation, which requires us to multiply N-bit numbers O(\log a) times (using the repeated squaring method). Using the Karatsuba algorithm, this takes O(\log a)(2048n)^{1.58} time for a product of n 2048 bit primes, but only O(\log a) \cdot n (4096)^{1.58} for n products of pairs of 2048-bit primes, so the latter method is faster by a factor of (n^{0.58})/2^{1.58}. The speedup is (n^{1.58})/2^{1.58} if the multiplications are parallel, and potentially more if the decrease in size of the multiplicands allows us to take advantage of hardware speedups for multiplication.

  • Since the semiprimes are just standard RSA-2048 keys, they don’t have to be created specifically for the purpose of this accumulator. They can come from a variety of sources. For example:

    • we can use famous semiprimes like those from the RSA factoring challenge.

    • We can use expired public keys from defunct organizations as suggested in Bunz’s talk

    • We can generate several large random “nothing-up-my-sleeve” numbers and hope they have large prime factors. As Theorem 1 of the paper on page 262 of this document linked by @JustinDrake proves, a 6144-bit (768-byte) random number has a 0.5 * (\ln^2(3/2) + O(1/(6144\ln2))) \approx 0.08 probability of having at least two prime factors larger than 2048 bits (not sure, but it may be possible to improve this guarantee by squeezing). As Justin says, this paper sees the benefit of choosing multiple small numbers over one big number, but as far as I can tell, it misses the fact that it is not necessary to multiply these to get an accumulator. Indeed, efficiency was the main problem with this scheme, so getting the (n^{0.58})/2^{1.58} speedup may be important for this approach.

    • Trusted individuals in the cryptocurrency community can supply public keys they generate.

  • (Minor point) The single-prime approach is insecure if n-1 parties are dishonest, since n-1 colluding parties can factor their primes out of the modulus to find the last prime; the semiprime setup is secure unless all n parties are dishonest.

Cons:

  • Obviously, the size of the accumulator and witnesses is doubled.

Are there any problems with this scheme I’ve missed?
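For what it’s worth, here is a minimal sketch of what I have in mind (toy semiprimes stand in for the parties’ RSA-2048 moduli; elements are primes as usual):

moduli = [3233, 3127, 2773]            # toy stand-ins for each party's semiprime s_i
g = 2                                  # agreed-upon base
accumulators = [g % m for m in moduli]

def add_element(accumulators, x):
    # adding a (prime) element updates every per-modulus accumulator in lockstep
    return [pow(a, x, m) for a, m in zip(accumulators, moduli)]

def verify_membership(witnesses, accumulators, x):
    # accept only if the proof verifies against every modulus, so forging
    # a membership proof requires breaking all n semiprimes
    return all(pow(w, x, m) == a
               for w, a, m in zip(witnesses, accumulators, moduli))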