Agree that setting H(0, 0) = 0 is an optimization!
Another thing I am thinking about is, is there an ultra-simple hack that can allow us to avoid having to do a hash call for H(x, 0) or H(0, x) as well? Unfortunately it seems like you can’t do it at least with the same domain (32 byte values) as the hash function, because if the function is invertible, then defining H^{-1}_{0L}(x) such that H_{0L}^{-1}(x) = y implies H(0, y) = x (and similarly for H_{0R}^{-1} for the 0 in the right position), then given any a = H(b, c), a = H(0, H_{0L}^{-1}(a)) = H(H_{0R}^{-1}(a), 0).
If there isn’t I don’t think that’s a big deal; hash functions are quite cheap and fast these days, so doing 256 hashes instead of 32 isn’t too big a deal, especially since it’s only needed for writes and not reads (and for multi-branch updates the work is parallelizable!), but something really clean and simple would be nice.