Sharding phase 1 spec (RETIRED)

I’ll just define this like so for now:

	struct CollatorPool {
		collator_pool_len: int128,
			// size of the collator pool
		collator_pool: [Address; collator_pool_len],
			// array of active collator addresses
		empty_slots_stack_depth: int128,
		empty_slots_stack: [int128; empty_slots_stack_depth],
			// stack of empty collator slot indices
		empty_slots_stack_top: int128,
			// top index of the stack
	}
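		// (note: [T; runtime_length] is placeholder syntax, not valid Rust;
		// see the discussion below for how these fields end up being typed)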

What is the depth of empty_slots_stack? Is it 1024 words (1024 × 32 bytes), like the current EVM stack depth?

Or should empty_slots_stack: int128[int128] actually be written in Rust as empty_slots_stack: HashMap<H128, H128>?

Why make this an arbitrary byte array? Is that for abstraction reasons, to support custom signature schemes like Lamport signatures, ECDSA, etc.? Edit: this is actually meant to be a fixed-size byte array, but the syntax is wrong; see my comment below.

The int128 is the collator’s id in the array, not the address itself.

OK, thanks for clarifying. In that case I’ll need to figure out how to write collator_pool: address[int128] in Rust. I’ll ask around.

I think collator_pool: address[int128] is meant to be used like CollatorPool.collator_pool[int128], returning the address of the collator, i.e. you find the address (as a custom type) of a collator by accessing its ID in the collator_pool. But clarification would be good, as it’s not very clear. Alternatively, it may be intended to be used like CollatorPool.collator_pool[address], which would return the collator’s ID as an int128 if the collator exists, or produce an error otherwise. In any case it’s not very clear, and others on Stack Overflow couldn’t tell either. Alternatively, this could be a hash map, which seems the most plausible. I’ve used collator_pool: HashMap<Address, H128>. (I’ll release the full code soon, but H128 is a type from the ethereum_types crate.)

OK thanks, so the address is the value type and H128 (the collator ID) is the key type; swapping them around gives collator_pool: HashMap<H128, Address>. @JustinDrake it would be helpful if you could clarify in the spec that this is a map in Vyper, which corresponds to an associative array.
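For reference, a minimal Rust sketch of just this field under that reading (my interpretation, assuming the ethereum_types crate; this isn’t code from the spec):

	use std::collections::HashMap;
	use ethereum_types::{Address, H128};

	struct CollatorPool {
		// Collator ID (key) -> collator address (value).
		collator_pool: HashMap<H128, Address>,
		// ... other fields as in the draft struct above
	}

	// Usage: look up a collator's address by its ID.
	// let addr: Option<&Address> = pool.collator_pool.get(&collator_id);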

It would be good to get a spec for the sharding p2p network. I guess that’s an R&D project.

I think I still need an answer to this, but I guess we have to wait until phase 2 when the EVM is defined?

Why does the empty slots stack need to have any limits at all?

Well, I was under the impression that it would have a limit like the current EVM stack depth of 1024 words. I was thinking that any stack would utilize the EVM stack.

While I’m not an expert on stacks, some quick reading gives the following two paragraphs from Wikipedia here; note the second paragraph, on depth (the first is included for context).

With stack machines, in contrast, results can be stored in one of two ways. Firstly, results can be stored using a temporary variable in memory. Storing and subsequent retrievals cost additional instructions and additional data cache cycles. Doing this is only a win if the subexpression computation costs more in time than fetching from memory, which in most stack CPUs, almost always is the case. It is never worthwhile for simple variables and pointer fetches, because those already have the same cost of one data cache cycle per access. It is only marginally worthwhile for expressions like X+1. These simpler expressions make up the majority of redundant, optimizable expressions in programs written in non-concatenative languages. An optimizing compiler can only win on redundancies that the programmer could have avoided in the source code.

The second way leaves a computed value on the data stack, duplicating it as needed. This uses operations to copy stack entries. The stack must be shallow enough for the CPU’s available copy instructions. Hand-written stack code often uses this approach, and achieves speeds like general-purpose register machines.[4][8][9] Unfortunately, algorithms for optimal “stack scheduling” are not in wide use by programming languages.

But we’re talking about a stack variable here, so I guess the stack-based virtual machine is not as relevant. Since memory and virtual memory can be unlimited, a stack variable can also be unlimited. But I’m not sure.

Having a limited stack can prevent consuming all memory and crashing a system.

https://www.linuxquestions.org/questions/programming-9/why-the-stack-size-limit-878108/

But the empty slots stack is a data structure in permanent storage, which can be much much bigger.

OK, fair enough, I see your point.

I’m implementing this over at https://github.com/Drops-of-Diamond/Diamond-drops. Feedback is appreciated. I’ll probably push commits at the end of each day, but I’ve updated the repo before submitting this comment.

I think register_collator() should take the address of the collator as an argument, and also msg.value and msg.sender (unless these are all public and within the namespace). I’ve imported Address from ethereum_types, but I don’t think there’s any support for msg.value and msg.sender, which makes things difficult. I would either have to build a Solidity- or Vyper-like interpreter that supports Rust, or build the contract in one of those languages and then convert it to Rust. I’ll have to look into the latter; for the former it would be better to use Wasm. Continuing: to update the collator_registry you need to pass the collator’s address into it, and since this is used in register_collator(), you will also need to pass the collator’s address into register_collator().

This is a bit vague for a spec (even though it is a draft); it would be better to be more specific. When is collator_pool_len going to change? I assume it would be incremented when there are no empty slots in the stack. Should we pop() the empty slot index at empty_slots_stack_top and then .insert(empty_slots_stack_top, address), where address is the address of the collator that has just been registered? I have implemented my interpretation.

So the _valueType of a pair in the above map will be the collator slot index, but what is the _keyType? Perhaps it would be simpler to have this as a vector rather than a hashmap.

I think register_collator() should take the address of the collator as an argument

You should be able to see msg.value and msg.sender; there’s no need to replicate them in register_collator’s arguments.
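One way to mirror this on the Rust side, as a sketch, is to pass an explicit message context (the Msg struct and its field names are hypothetical, not from the spec; CollatorPool is the struct sketched earlier):

	use ethereum_types::{Address, U256};

	// Hypothetical stand-in for the EVM's msg object.
	struct Msg {
		sender: Address, // msg.sender
		value: U256,     // msg.value
	}

	// register_collator reads the sender and value from the context,
	// so they don't need to be replicated as extra arguments.
	// (The draft has it return a bool; see the next point.)
	fn register_collator(pool: &mut CollatorPool, msg: &Msg) -> bool {
		// validate msg.value against the deposit size, then register msg.sender
		unimplemented!()
	}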

I think the method should return a uint (the index of the collator in the collator_pool) instead of a bool.

This is a bit vague for a spec (even though it is a draft); it would be better to be more specific. When is collator_pool_len going to change? I assume it would be incremented when there are no empty slots in the stack. Should we pop() the empty slot index at empty_slots_stack_top and then .insert(empty_slots_stack_top, address), where address is the address of the collator that has just been registered? I have implemented my interpretation.

In my opinion, collator_pool_len is the length of the collator pool: it is decremented when collators leave the pool and incremented when collators enter it. The empty_slots_stack should be transparent to the collator pool’s length.

Yeah, but you need to get these parameters from the EVM, and I can either use Solidity, Vyper, or Hera with a rustc binding. There’s also https://github.com/brson/mir2wasm/, which is experimental and plans to compile Rust to WebAssembly via rustc and Rust MIR. For the latter you’d need to make it compile to eWasm only, not all of Wasm, and then compile eWasm to EVM bytecode; the latter can already be done, AIUI.

That makes sense, except that I still don’t see why you would have the empty_slots_stack. However, another possible interpretation is that if a collator leaves after being registered, its index stays, but its address is updated to empty; the index associated with this now-empty address is then pushed onto the empty_slots_stack. Then, when a collator registers and enters, you pop the index off the empty_slots_stack, pass it to the collator registry, and update the value associated with that index to the collator’s address.

My implementation doesn’t bother with separate variables for collator_pool_len and empty_slots_stack_top; you can just use the pop() and push() methods. I don’t see the advantage of having separate variables.
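For what it’s worth, a minimal sketch of that interpretation, assuming HashMap<H128, Address> for the pool and a Vec for the stack, so that len() and the Vec’s implicit top replace the separate counters (deregister_collator is my name for the leaving path):

	use std::collections::HashMap;
	use ethereum_types::{Address, H128};

	struct CollatorPool {
		collator_pool: HashMap<H128, Address>, // collator ID -> address
		empty_slots_stack: Vec<H128>,          // indices freed by departing collators
	}

	impl CollatorPool {
		// Register a collator, reusing a freed slot index if one exists.
		// Returns the collator's index, as suggested above.
		fn register_collator(&mut self, collator: Address) -> H128 {
			let index = match self.empty_slots_stack.pop() {
				// Reuse a slot emptied by a departing collator.
				Some(index) => index,
				// No empty slots: append a fresh index. When the stack is
				// empty, collator_pool.len() equals the high-water mark of
				// allocated slots, so it is the next unused index.
				None => H128::from_low_u64_be(self.collator_pool.len() as u64),
			};
			self.collator_pool.insert(index, collator);
			index
		}

		// Leaving path: remove the slot's entry (so collator_pool.len()
		// tracks active collators, per the pool-length semantics above)
		// and push its index for reuse.
		fn deregister_collator(&mut self, index: H128) {
			self.collator_pool.remove(&index);
			self.empty_slots_stack.push(index);
		}
	}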

I think it should be made clear that the actual SMC code will be written once, likely in Vyper, and published to the main chain, and then everyone can call that contract; there’s no reason to make a custom reimplementation. It’s like Casper FFG in this way. So the spec doesn’t need to be precise at the implementation level, only at the interface level.
