Latest Casper Basics. Tear it apart

Using hypertarget and hyperref for technical terms may also be useful: place the hypertarget where a term is defined, and make later occurrences of the term hyperlinks back to that hypertarget.
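For example, a minimal sketch of that pattern with hyperref (the label name and surrounding text are illustrative, not from the paper):

    % Where the term is defined, anchor it with a hypertarget:
    A \hypertarget{def:supermajority-link}{\emph{supermajority link}} is an
    ordered pair of checkpoints $(a, b)$ ...

    % Elsewhere in the paper, link later uses of the term back to it:
    ... as long as there is a \hyperlink{def:supermajority-link}{supermajority
    link} from a justified checkpoint.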


I think that’s good feedback. Keeping readability foremost in mind when writing and editing is essential, and using more meaningful variable names should be part of that. However, I think that using certain Greek letters that conventionally carry a particular meaning, e.g. μ for the mean and σ for the standard deviation, is acceptable, and in fact, because of their widespread usage, preferable. Also, single-character variables are fairly conventional in maths, so I don’t think it’s a big deal.

I’m coming back to this now and picking up where I left off in Section 4.

Initial Section 4 comments -

  • I’d suggest starting 4.1 and 4.2 with a brief definition of each attack type. You could also add them by expanding the intro to Section 4.

  • There’s a hanging phrase after the Figure 5 description. I can’t tell if it’s a segment of more complete text that’s supposed to be there, or a remnant of text that was intended to be removed.

@jamesay1 very thorough edits. Thank you. I’ve incorporated most of them.

I’ve incorporated all of your edits into: https://arxiv.org/abs/1710.09437v2

Some of the things I have kept as they are, but they may still be changed in v3.


@vbuterin: Two questions.

(1) For dynamic validator sets, do we want to change the definition of justification such that a checkpoint c is justified if and only if the votes forming the supermajority link to c are: (1) included in c's blockchain, and (2) included before the child of c—i.e., before block number h(c) * 100? This would make the adjusted definition of justification analogous to the adjusted definition of finalization.

If so, it seems we should add a new bullet for justification for dynamic validator sets.
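To make the proposed rule concrete, here is a rough Python sketch (the names are illustrative, and the deadline is taken verbatim from question (1) above):

    EPOCH_LENGTH = 100  # the paper places checkpoints every 100 blocks

    def link_votes_on_time(vote_inclusion_blocks, c_height):
        # The votes forming the supermajority link to checkpoint c must be
        # included in c's own blockchain (assumed checked elsewhere) and
        # before block number h(c) * 100, per the proposed definition.
        deadline = c_height * EPOCH_LENGTH
        return all(block < deadline for block in vote_inclusion_blocks)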


(2) In the paper, we have:

Due to network delays, it’s possible that clients will disagree whether a given piece of slashing evidence was submitted into a given chain “on time” or as having accepted it too late.

I want to expand this a little. When we say “too late”, do we mean whether the evidence transaction was submitted before or after a particular vote? I suppose I’d like more clarity on what exactly the rules are for an evidence transaction. Specifically, if I see a slashing evidence transaction, but only on chain A, and the violators are still building on chain B, do I still count it, or is slashing evidence considered “local” to its chain? I personally lean towards it being global.

Also, once I see a slashing evidence transaction, what precise actions am I, a validator, to take?


@virgil the updated link (v2) is not working for me, post @jamesray1 edits. One unrelated comment/question below. Take it with a grain or two of salt; hope it makes sense :slight_smile: Good work to the team, great to see this progress.

Section 1.1

Because proof of stake security is based on the size of the penalty, which can be set to greatly exceed the gains from the mining reward, proof of stake provides strictly stronger security incentives than proof of work.

Curious here: I understand that the PoS penalties can vastly outweigh mining rewards, but this does not inherently eliminate the potential incentive for a 51% attack. Later, in the conclusion, user-activated soft forks are discussed briefly in reference to this possibility.

The problems that Casper does not wholly solve, particularly related to 51% attacks, can still be corrected using user activated soft forks. Future developments will undoubtedly improve Casper’s security and reduce the need for user-activated soft forks.

I recommend at least another sentence or two dedicated to this topic, to clarify this hole, if you deem it important. It’d be good to know the appropriate approach validators would take in this version of Casper to mitigate 51% attack risk.

Perhaps helpful: in Peercoin’s paper, the paragraph below helped me understand how PoS alleviated some of the dependence on 51% honest nodes in a PoW system. Does this hold true in Casper? If so, it could be worth calling out by name. Just my two cents; hope it’s helpful.

This design alleviates some of the concerns of Bitcoin’s 51% assumption, where the system is only considered secure when good nodes control at least 51% of network mining power. First the cost of controlling significant stake might be higher than the cost of acquiring significant mining power, thus raising the cost of attack for such powerful entities. Also attacker’s coin age is consumed during the attack, which may render it more difficult for the attacker to continue preventing transactions from entering main chain.

Here’s v2 until it’s available on arxiv (it takes a few days before arxiv.org updates).

v2.pdf (1.1 MB)

Regarding 51% attacks—PoS doesn’t stop 51% attacks, it merely limits their damage. In generic PoW, 51% attacks can cause a finality reversion. In the current Casper system, a PoW 51% attack can only cause a liveness failure.


Thanks @virgil for incorporating most of my edits, I appreciate that.

Your proposed definition for justification seems reasonable. If a checkpoint has a supermajority link then those votes need to be in that checkpoint’s chain, although I can’t recall why they should be before the child of that checkpoint.

I looked at v2 and the following changes haven’t been made (I found the paper on Github so I intend to make PRs for these changes, probably tomorrow):

  • it still has “an PoS-based” rather than “a PoS-based”.

  • Wayback links haven’t been included. These are helpful to have in case any links break, since readers will then see that there is an alternative permanent link. While they can always be added later if a link breaks, now that the pages are saved to the Wayback Machine, doing it that way relies on someone pointing the broken link out to the editors; it can be helpful to be proactive. However, I also understand that it is not conventional to use archive links, especially while the original link is still active. Perhaps it would suffice to include a note=“This has been archived on the Wayback Machine.”

  • In Figure 5, “After withdrawal, attack fabricates a conflicting chain.” should read: “After withdrawal, the attackers (the supermajority of validators) fabricate a conflicting chain.”

  • This sentence in para. 4 of Section 4.2 is not complete and doesn’t make sense: “If the validators are split into two subsets, with subset V_A voting on chain A and subset V_B voting on chain B.” Maybe it should run on into the next sentence, like this: “…on chain B, then on chain A, V_B’s deposits will leak, and vice versa…”. However, this will lead to a long sentence, so I suggest:
    “If the validators are split into two subsets, with subset V_A voting on chain A and subset V_B voting on chain B, then on chain A, V_B’s deposits will leak, and vice versa, leading to each subset having a supermajority on its respective chain. Thus, this will allow two conflicting checkpoints…”
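To illustrate the leak dynamic in that suggested sentence, here is a toy Python model (the per-epoch leak rate below is an arbitrary illustrative number, not the paper’s actual schedule):

    LEAK_RATE = 0.01  # illustrative: fraction of a non-voter's deposit lost per epoch

    def epochs_until_supermajority(stake_a, stake_b):
        # On chain A, subset V_B never votes, so its deposits leak away until
        # V_A alone holds at least 2/3 of the remaining total stake.
        epochs = 0
        while stake_a < (2 / 3) * (stake_a + stake_b):
            stake_b *= 1 - LEAK_RATE
            epochs += 1
        return epochs

    print(epochs_until_supermajority(50.0, 50.0))  # 69 epochs for a 50/50 split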


I think the paper needs to add a definition of “broadcast” and explain how long votes are stored.

  1. How exactly does the broadcast work? Is it some kind of mesh network with fixed links? If it is a mesh network, how much power do malicious nodes have to delay or censor messages? If my node is connected to nodes that are all malicious, can my node end up having a totally different world view?

  2. When a node receives votes, are these votes stored indefinitely by the node or discarded? If votes are stored, my understanding is that the vote sets stored by different nodes will not be identical, due to network delays. Since broadcast messages may be lost, one can imagine a situation where a particular node misses a particular checkpoint because a message was lost or withheld. Is the broadcast algorithm eventually consistent?

  3. If a new node (or a new validator) joins the network and starts downloading the blockchain, it will also need to download votes - correct? What mechanism is used to sync a new node with the current state of the network (votes, checkpoints, etc.)?

  4. If a validator crashes and comes back online, it needs to sync up the votes and checkpoints that were created while it was offline - how does this happen?

  5. It seems that the number of validators needs to be somehow limited, since if all nodes become validators, then having 25,000 validators broadcast their messages will be very expensive network-wise. What mechanism will be used to limit the number of validators - is it going to be a minimum deposit or something else?

  6. What bounties will validators receive for validating? It seems that if validating provides a low-risk ROI (say, a 7% annual return on deposit), then pretty much any existing node would want to become a validator - why not? Then we are coming back to all nodes potentially becoming validators.


The “partially synchronous” model implies that a broadcasted message will reach all honest nodes within some time D, except there may be periods where it doesn’t and during which there are no guarantees, and during these periods it is ok if liveness fails but not ok if safety fails. Within these constraints, we assume that all broadcasts from all honest nodes reach all other honest nodes. Making sure that happens is the task of a separate layer; in general, gossip networking and consensus protocol are separate abstractions and should not mix.

It seems that the number of validators needs to be somehow limited, since if all nodes become validators, then having 25,000 validators broadcast their messages will be very expensive network-wise. What mechanism will be used to limit the number of validators - is it going to be a minimum deposit or something else?

Yes. There are a couple of options: one is a fixed minimum of 1000 ETH; the other is to set the minimum deposit size to be equal to the current number of active validators (we expect a natural equilibrium of ~500-2000 validators to arise from this). Note that because a message only gets sent once every ~12 minutes, having this many validators is okay.
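As a toy illustration of the second option (the demand distribution below is entirely made up):

    import random

    def equilibrium_validator_count(willing_deposits):
        # Largest n such that at least n prospective validators are each
        # willing to deposit at least n ETH, i.e. the fixed point when the
        # minimum deposit equals the current number of active validators.
        deposits = sorted(willing_deposits, reverse=True)
        n = 0
        while n < len(deposits) and deposits[n] >= n + 1:
            n += 1
        return n

    # e.g. 5,000 prospective validators, each willing to stake 1-3,000 ETH
    stakes = [random.uniform(1, 3000) for _ in range(5000)]
    print(equilibrium_validator_count(stakes))  # typically ~1,900 for this toy input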


Vitalik - thank you,

Other questions

  1. The paper says at the end that the algorithm is a strict improvement on PoW … I guess once you introduce the “FOLLOW THE CHAIN …” rule, this statement is not formally true anymore, because there could be situations where PoW says one thing and “FOLLOW THE CHAIN …” says another.

  2. Another thing which may be a bit problematic is large fluctuations in deposits. You may end up having Wall Street dumping money into validators, since validating will provide a low-risk ROI. As a result, there may be supernodes with vast voting power. Have you thought about limiting deposit size? You could introduce a rule where a deposit that is more than 10 times the average deposit in the network does not count for voting purposes.

Have you thought about limiting deposit size? You could introduce a rule where a deposit that is more than 10 times the average deposit in the network does not count for voting purposes.

That’s not possible. The Wall Street guy will just split his funds between N different accounts.


That’s true :slight_smile: There is another type of solution one can use: you can have a random set of validators which changes all the time (in this scenario, all nodes in the network serve as potential validators).

For each epoch, you announce a random seed R, and every node can find out whether it is a validator by computing a deterministic ECDSA signature on the seed R and checking whether the hash of the signature is less than the inclusion constant multiplied by the node’s deposit:

HASH(ECDSA_SIGN(R)) < M * DEPOSIT

where M is a positive constant tuned so that, statistically, about 100 nodes are selected. In this way the node knows it is a validator, but others can’t find out until the node announces it, since they do not have the node’s private key.

Essentially, at any given moment in time, 100 random nodes are validators, and the validator set is constantly changing. IMHO the advantage of this system is that, since validators change all the time, an attacker has very little time to compromise validators once they have been selected. In this system, splitting supernodes into smaller nodes makes sense too: even though, as you suggested, the Wall Street guys can split a large node into smaller nodes, only one of these nodes will be selected into the random validator set at any given moment.
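Here’s a hedged sketch of that selection rule (a keyed hash stands in for the deterministic ECDSA signature purely for illustration; a real scheme would use the signature, or a VRF, so that the selection is publicly verifiable once announced):

    import hashlib

    M = 0.001  # illustrative inclusion constant, tuned for the target set size

    def is_selected(secret_key: bytes, seed: bytes, deposit: float) -> bool:
        # Stand-in for HASH(ECDSA_SIGN(R)): a value only this node can
        # compute, since it depends on the node's secret key.
        digest = hashlib.sha256(secret_key + seed).digest()
        draw = int.from_bytes(digest, "big") / 2**256  # uniform in [0, 1)
        # Selected iff the draw falls below M * DEPOSIT, as in the rule above.
        return draw < M * deposit

    # A node with a 100 ETH deposit checks itself against epoch seed R:
    print(is_selected(b"node-secret", b"epoch-seed-R", 100.0))  # True with probability 0.1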


The paper defines a supermajority link as a link which 2/3 of validators voted on. It then proves safety under the assumption that fewer than 1/3 of validators violate a slashing condition. I was wondering if these numbers are arbitrary, so I did a little handwavy calculation:

Let’s denote the supermajority threshold by M and the fraction of honest validators by h. For a given M, we want to find the smallest h necessary to prevent two conflicting supermajority links (e.g., with the same target number) from both being justified. In the worst case, the honest validators distribute their votes 50/50 between the two links, while the dishonest fraction 1 - h votes on both. So the total vote on each link is h/2 + (1 - h) = 1 - h/2. We’re looking for the security bound, so we set M = 1 - h/2 and solve for h = 2(1 - M).

Some example cases to consider:

  • If M = 2/3 then h = 2/3 (that’s the case that is used in the paper)
  • If M = 1/2 then h = 1 (so all validators have to be honest)
  • If M = 1 then h = 0 (almost no one has to be honest if we require all to agree)
  • If M = 60% (some random number), then h = 80%

So if I’m not missing something (which is entirely possible), the chosen M and h really do seem arbitrary. I’d change that, or at least mention it in the paper, since:

  • having another security parameter to tune is nice
  • the relationship between h and M is not immediately obvious (it is not M + h = 1 as one might think if one sees the numbers 2/3 and 1/3), so readers might misunderstand the proofs

The purpose of the 2/3 choice is to maximize the fault tolerance against both liveness attacks (1-M go offline) and safety faults (2M-1 equivocate). max(1-M, 2M-1) is minimized at M = 2/3, with fault tolerance 1/3 against both kinds of attacks.
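A throwaway numeric check of that claim:

    def worst_attack_threshold(M):
        # Per the reply above: 1 - M voters going offline breaks liveness,
        # and 2M - 1 equivocating voters can break safety.
        return max(1 - M, 2 * M - 1)

    best = min((m / 1000 for m in range(500, 1001)), key=worst_attack_threshold)
    print(best, worst_attack_threshold(best))  # ≈ 2/3 and ≈ 1/3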


Sorry for my potential ignorance, but I think this sentence could be misleading:

A checkpoint c is called finalized if it is justified and there is a supermajority link c → c' where c' is a direct child of c.

I think this is misleading because if c’ is another checkpoint, it will necessarily be at least 100 blocks away, no?

I am looking at the source code of Casper as it is now, and it seems to be prone to a “selfish validator” issue, described below.

It seems that in the current code for vote(), all validators get the same reward for voting, no matter the vote, as long as the vote was made during the epoch and did not violate the slashing conditions.

In this case, a selfish validator can skip doing any work and just piggyback on other validators by waiting until close to the end of the epoch and voting the way most of the validators voted.

I think this is very similar to Filecoin and other storage systems - you need Proof-of-Storage, otherwise people will just pretend they store things.

So it seems one needs something like Proof-of-Validation-Work, to show that a validator actually performed useful work and did not just selfishly piggyback on the work of others.

Validators are paid for their work, so IMHO there must be a proof that this work was actually done …

One could theoretically pay validators that vote earlier than others, but it seems that selfish validators could then just front-run honest validators …

Any ideas on how Proof-of-Validation-Work could be implemented …?

Here is one way to fix this vulnerability - make it two stages:

  1. Make validators first submit encrypted votes.

  2. After voting is over, make them reveal the decryptions …

I think there are multiple cryptographic ways in which this could be done, for instance submitting hashes and then revealing the hash pre-images …

This would make things way more secure, although a bit more expensive in terms of how much you pay per vote, since one would not be able to use bit arrays anymore …
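A minimal sketch of the hash-commitment variant (all names here are illustrative):

    import hashlib
    import os

    def commit(vote: bytes):
        # Stage 1: publish H(vote || salt); the random salt blinds the vote
        # so other validators cannot guess it and copy it before the reveal.
        salt = os.urandom(32)
        return hashlib.sha256(vote + salt).digest(), salt

    def verify_reveal(commitment: bytes, vote: bytes, salt: bytes) -> bool:
        # Stage 2: after voting closes, the validator reveals (vote, salt)
        # and anyone can check the pair against the earlier commitment.
        return hashlib.sha256(vote + salt).digest() == commitment

    c, salt = commit(b"vote: source=0xabc target=0xdef")
    assert verify_reveal(c, b"vote: source=0xabc target=0xdef", salt)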

Sorry for my potential ignorance, but I think this sentence could be misleading:

A checkpoint c is called finalized if it is justified and there is a supermajority link c → c’ where c’ is a direct child of c.
I think this is misleading because if c’ is another checkpoint, it will necessarily be at least 100 blocks away, no?

That’s correct. I don’t see it as misleading. It just means that a checkpoint won’t be finalized until at least 100 blocks later, when a supermajority link is formed from a child checkpoint and the checkpoint is justified.

Yes, but if they wait too long then there’s a risk that their vote will not be included in time. Also, the risk from voting early is tiny: as long as you vote after enough blocks that the PoW has basically settled on a given checkpoint, you’re pretty much guaranteed to get in. Realistically, I don’t expect a noticeable difference in returns between the different strategies, and would actually expect early voting to be safer.