Sharding phase 1 spec (RETIRED)

I need to figure out how to implement this in Rust, where a struct accepts an address as an argument. Cross-posting this at https://gitter.im/Drops-of-Diamond/Lobby?at=5ab485e45f188ccc15e8c909, Rust-specific discussion can be had there.

AIUI this is saying that address is an int128 type, but in current implementations it is a binary data of length 160 bits.

I’ll just define this like so for now:

	struct CollatorPool {
		collator_pool_len: int128,
			// size of the collator pool
		collator_pool: [Address; collator_pool_len], 
			// array of active collator addresses
		empty_slots_stack_depth: int128,
		empty_slots_stack: [int128; empty_slots_stack_depth],	
			// stack of empty collator slot indices
		empty_slots_stack_top: int128,		// top index of the stack
	}

What is the depth of empty_slots_stack? Is it 1024 words = 1024 *32 bytes like the current stack depth?

Or should empty_slots_stack: int128[int128] actually be written in Rust like empty_slots_stack: HashMap<H128, H128>?

Why make this an arbitrary byte array? Is that for abstraction reasons for custom signature schemes like Lamport signatures, ECDSA, etc.? This is actually meant to be a fixed byte array but the syntax is wrong, see my comment below.

The int128 is the collator’s id in the array, not the address itself.

1 Like

OK, thanks for clarifying. In that case I’ll need to figure out how to write collator_pool: address[int128] in Rust. I’ll ask around.

I think collator_pool: address[int128] is meant to be used like CollatorPool.collator_pool[int128], returning the address of the collator, i.e. you find the address (as a custom type) of a collator by accessing it’s ID in the collator_pool. But clarification would be good as it’s not very clear. Alternatively it may mean or be intended to be used like CollatorPool.collator_pool[address}, which will return the collator’s IDas an int128 if the collator exists, or produce an error otherwise. In any case it’s not very clear, and others on Stack Overflow couldn’t tell either. Alternatively this could be a hash map, which seems to be the most plausible. I’ve used collator_pool: HashMap<address, H128>. (I’ll release the full code soon, but H128 is a type from the crate ethereum_types.)

1 Like

OK thanks, so address is the value type and H128 is the key type so swapping them around we have collator_pool: HashMap<H128, Address>. @JustinDrake it would be helpful if you could clarify in the spec that this is a map in Vyper, which corresponds to an associative array.

It would be good to get a spec for the sharding p2p network. I guess that’s an R&D project.

I think I still need an answer to this, but I guess we have to wait until phase 2 when the EVM is defined?

Why does the empty slots stack need to have any limits at all?

1 Like

Well, I was under the impression that it would have a limit like the current stack depth in the EVM of 1024 words. Thus if you have any stack I was thinking it would utilize the EVM stack.

While I’m not an expert on stacks, doing some quick reading gives the following. Note the second paragraph on the depth from Wikipedia here, I have given the previous one for more context.

With stack machines, in contrast, results can be stored in one of two ways. Firstly, results can be stored using a temporary variable in memory. Storing and subsequent retrievals cost additional instructions and additional data cache cycles. Doing this is only a win if the subexpression computation costs more in time than fetching from memory, which in most stack CPUs, almost always is the case. It is never worthwhile for simple variables and pointer fetches, because those already have the same cost of one data cache cycle per access. It is only marginally worthwhile for expressions like X+1. These simpler expressions make up the majority of redundant, optimizable expressions in programs written in non-concatenative languages. An optimizing compiler can only win on redundancies that the programmer could have avoided in the source code.

The second way leaves a computed value on the data stack, duplicating it as needed. This uses operations to copy stack entries. The stack must be depth shallow enough for the CPU’s available copy instructions. Hand-written stack code often uses this approach, and achieves speeds like general-purpose register machines.[4][8][9] Unfortunately, algorithms for optimal “stack scheduling” are not in wide use by programming languages.

But we’re talking about a stack variable here, so I guess the stack based virtual machine is not as relevant. Since memory and virtual memory can be unlimited, a stack variable can also be. But I’m not sure.

Having a limited stack can prevent consuming all memory and crashing a system.

https://www.linuxquestions.org/questions/programming-9/why-the-stack-size-limit-878108/

But the empty slots stack is a data structure in permanent storage, which can be much much bigger.

2 Likes

OK, fair enough, I see your point.

I’m implementing this over at https://github.com/Drops-of-Diamond/Diamond-drops. Feedback is appreciated. I’ll probably push commits at the end of each day, but I’ve updated the repo before submitting this comment.

I think register_collator()should take the address of the collator as an argument, and also msg.value and msg.sender (unless these are all public and within the namespace). I’ve imported Address from ethereum_types, but I don’t think there’s any support for msg.value and msg.sender. That makes things difficult. I would either have to build a Solidity or Vyper-like interpreter that supports Rust, or build it in one of these and then convert it to Rust. I will have to look into the latter, for the former it’d be better to use wasm. Continuing, to update the collator_registry you need to pass the address of the collator into it, and so since this is used in register_collator(), you will also need to pass the address of the collator into it.

This is a bit vague for a spec (even though it is a draft), it would be better to be more specific. When is collator_pool_len going to change—I assume it would be incremented when there are no empty slots in the stack? Should we pop() an empty collator slot from empty_slots_stack_top and then .insert(empty_slots_stack_top, Address)? where Address is the Address of the collator that has just been registered? I have implemented my interpretation.

So the _valueType of a pair in the above map will be the collator slot index, but what is the _keyType? Perhaps it would be simpler to have this as a vector rather than a hashmap.

2 Likes

I think register_collator()should take the address of the collator as an argument

You should be able to see msg.value and msg.sender, there’s no need to replicate that in register_collator’s arguments.

I think the method should return an uint, the index in the collator_pool the collator is, instead of a bool.

This is a bit vague for a spec (even though it is a draft), it would be better to be more specific. When is collator_pool_len going to change—I assume it would be incremented when there are no empty slots in the stack? Should we pop() an empty collator slot from empty_slots_stack_top and then .insert(empty_slots_stack_top, Address)? where Address is the Address of the collator that has just been registered? I have implemented my interpretation.

In my opinion collator_pool_len is the length of the collator pool, so when collators leave the pool, this is decremented, when collators enter the pool, this is incremented. The empty_slots_stack should be transparent to the collator’s pool length.

1 Like