OK, I think I cracked how to do it nonce-based.
Basically, every shard remembers a “last nonce sent” and “last nonce received” for every other shard. To send a receipt from shard A to shard B, if shard[A].last_nonce_sent_to[B] = n
you need to provide a receipt from shard B proving that shard[B].last_nonce_received_from[A] >= n-2
. So the sent and received branches have to be almost perfectly in sync, and if someone skips receiving then the next person to send has to cover for them.
I think this design is simpler and lets us get rid of bitfields.