The main concern I have with this kind of approach is concrete efficiency. First of all, if implemented naively, allowing arbitrary synchronous cross-shard transactions, even with delayed execution, slows the entire system down to a rate no faster than that of a single-chain system. To see why, consider a case where transactions with the following shard pairs are submitted and included at the same block height: (1, 99), (2, 99), (3, 99) … (98, 99).
How does one evaluate the state transition function of shard 83? Well, one can start off by evaluating the transactions up to some point in the block. Then one reaches the transaction (83, 99). However, executing this transaction also depends on the pre-state of shard 99, which depends on transactions (1, 99) … (82, 99). Hence, nodes executing shard 83 also need to execute those 82 transactions (along with whatever each of them depends on in turn). The result is that every node on every shard needs to process, on average, half of shard 99. If we suppose that similar sequences (1, x) … (99, x) exist for all other x, then everyone needs to process most of the content of every shard. Scalability is lost.
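The dependency blow-up can be checked with a small simulation. This is a minimal sketch under assumed simplifications: the transactions at one height are given as a list of shard tuples in a single canonical order, each shard processes the transactions touching it in that order, intra-shard transactions are omitted, and the helper names `predecessors` and `required_txs` are hypothetical. A transaction depends on the previous transaction on each shard it touches, and the work needed to compute a shard's post-state is the transitive closure of those dependencies.

```python
from typing import Dict, List, Set, Tuple


def predecessors(txs: List[Tuple[int, ...]]) -> List[List[int]]:
    """For each tx, the index of the previous tx on every shard it touches."""
    last_on_shard: Dict[int, int] = {}
    preds: List[List[int]] = []
    for i, shards in enumerate(txs):
        preds.append([last_on_shard[s] for s in shards if s in last_on_shard])
        for s in shards:
            last_on_shard[s] = i
    return preds


def required_txs(txs: List[Tuple[int, ...]], shard: int) -> Set[int]:
    """Indices of all txs a node must execute to compute `shard`'s post-state."""
    preds = predecessors(txs)
    # Start from every tx touching the shard, then follow pre-state dependencies.
    stack = [i for i, shards in enumerate(txs) if shard in shards]
    seen: Set[int] = set()
    while stack:
        i = stack.pop()
        if i not in seen:
            seen.add(i)
            stack.extend(preds[i])
    return seen


# The example from the text: (1, 99), (2, 99), ..., (98, 99) at one height.
txs = [(k, 99) for k in range(1, 99)]
work = [len(required_txs(txs, k)) for k in range(1, 99)]
print(len(required_txs(txs, 83)))  # 83
print(sum(work) / len(work))       # 49.5
```

Computing shard 83 pulls in 83 transactions (its own plus the 82 before it on shard 99), and the average over shards 1 … 98 is 49.5, about half of shard 99's 98 transactions.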
One solution is to pair up shards at every block height and keep reshuffling the pairs, so that synchronous transactions can only happen within a pair. However, this means that a synchronous transaction would on average need to wait 50 blocks to get included (and extending the scheme to larger sets would mean that a cross-3-shard transaction would need to wait ~5000 blocks, etc.).
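The post does not say how the pairs are reshuffled. Assuming a deterministic round-robin schedule (the "circle method" from tournament scheduling, an illustrative choice rather than anything specified here), every pair of the n shards meets exactly once every n - 1 blocks, which yields the ~50-block average for n = 100. The function names and the 100-shard count are assumptions for illustration.

```python
from typing import List, Tuple

Pairing = List[Tuple[int, int]]


def round_robin_pairings(n: int) -> List[Pairing]:
    """Circle-method schedule: n - 1 rounds in which every pair of shards meets once."""
    if n % 2:
        raise ValueError("n must be even")
    order = list(range(n))
    rounds: List[Pairing] = []
    for _ in range(n - 1):
        rounds.append([(order[i], order[n - 1 - i]) for i in range(n // 2)])
        # Keep shard order[0] fixed and rotate the rest by one position.
        order = [order[0], order[-1]] + order[1:-1]
    return rounds


def average_wait(n: int, a: int, b: int) -> float:
    """Mean blocks until a tx for shards (a, b), arriving uniformly in the cycle, is included."""
    rounds = round_robin_pairings(n)
    cycle = len(rounds)
    meets = [r for r, pairs in enumerate(rounds) if (a, b) in pairs or (b, a) in pairs]
    # A tx seen at block t can be included at block t + 1 at the earliest.
    waits = [min((m - t - 1) % cycle + 1 for m in meets) for t in range(cycle)]
    return sum(waits) / cycle


print(average_wait(100, 3, 42))  # 50.0
```

With a fresh random matching every block instead, a given pair would meet with probability 1/(n - 1) per block, so the expected wait would be about 99 blocks; the fixed rotation is what keeps the average near n/2.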
The other solution I can think of is to require synchronous cross-shard transactions to specify an access list, and to require the access lists of all cross-shard txs touching the same shard in a given block to be disjoint. Intra-shard txs would be executed after the cross-shard txs. This would allow an execution process in which executing any block on any shard requires knowing the state roots of the previous block on all shards: one fetches the appropriate Merkle proofs for cross-shard data, executes all of the cross-shard transactions, then executes the intra-shard transactions. This is workable, though it is vulnerable to reorgs of any shard, and it contradicts the preference I have heard some developers express for not separating state execution from block consensus.
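A minimal sketch of the access-list scheme, under stated assumptions: state is a flat map from hypothetical (shard id, key) pairs to integers, `prev_state` stands in for values that would really be fetched with Merkle proofs against the previous block's state roots, and `CrossShardTx`, `access_lists_disjoint` and `execute_shard_block` are illustrative names, not an existing API. Because every cross-shard tx reads only previous-block values and the access lists are pairwise disjoint, each shard can apply its share of the cross-shard writes without waiting for any other shard's current-block execution, and then run its intra-shard txs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Key = Tuple[int, str]  # hypothetical addressing: (shard id, storage key)
State = Dict[Key, int]


@dataclass(frozen=True)
class CrossShardTx:
    access_list: FrozenSet[Key]
    # Maps the values of the access list to new values for (some of) those keys.
    apply: Callable[[State], State]


def access_lists_disjoint(txs: List[CrossShardTx]) -> bool:
    seen: Set[Key] = set()
    for tx in txs:
        if seen & tx.access_list:
            return False
        seen |= tx.access_list
    return True


def execute_shard_block(
    shard: int,
    prev_state: State,
    cross_txs: List[CrossShardTx],
    intra_txs: List[Callable[[State], State]],
) -> State:
    """Compute `shard`'s post-state: cross-shard txs first, then intra-shard txs."""
    if not access_lists_disjoint(cross_txs):
        raise ValueError("cross-shard access lists overlap")
    local = {k: v for k, v in prev_state.items() if k[0] == shard}
    for tx in cross_txs:
        if all(k[0] != shard for k in tx.access_list):
            continue  # this tx does not touch our shard
        # Stand-in for reading previous-block values via Merkle proofs;
        # missing keys default to 0 (an assumption of this sketch).
        inputs = {k: prev_state.get(k, 0) for k in tx.access_list}
        writes = tx.apply(inputs)
        if not set(writes) <= tx.access_list:
            raise ValueError("tx wrote outside its access list")
        # Keep only the writes that land on this shard.
        local.update({k: v for k, v in writes.items() if k[0] == shard})
    for tx in intra_txs:
        local = tx(local)
    return local
```

For example, a transfer whose access list is {(1, "alice"), (2, "bob")} would be executed by both shard 1 and shard 2 from the same previous-block values, each keeping only the writes to its own keys; a second cross-shard tx in the same block that also touches (1, "alice") would be rejected.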