Thanks for the comments.
On the first point in 1.: not accounting for bid shading is one of the reasons I called this method an approximation. On the other point: yes, in the particular case of EIP-1559 a discontinuity design like the one you described may be more accurate. But I realized that case is actually a bad example of what I was trying to convey. I wanted to suggest that comparing surplus/welfare can be useful in general, whenever we make a system change that does not change the type of mechanism entirely, as we did with EIP-1559 (e.g., changing the learning rate of the update rule or making the rule multidimensional). In that case, even if the surplus estimate is biased for the legacy transactions, if you are willing to assume the same bias pre- and post-change, then the delta surplus will be approximately correct. There are other nuances, and it could be interesting to do a proper estimation exercise to assess how good this approximation is. One idea is to use a price discontinuity, similar to https://www.nber.org/system/files/working_papers/w22627/w22627.pdf
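To spell out the bias-cancellation argument above, here is a minimal sketch. The notation is introduced here only for illustration: $\hat{S}$ is the estimated surplus, $S$ the true surplus, and $b$ the estimation bias (e.g., from ignoring bid shading), each measured before ("pre") and after ("post") the change.

$$
\hat{S}_{\text{pre}} = S_{\text{pre}} + b_{\text{pre}}, \qquad \hat{S}_{\text{post}} = S_{\text{post}} + b_{\text{post}}
$$

$$
\Delta\hat{S} = \hat{S}_{\text{post}} - \hat{S}_{\text{pre}} = (S_{\text{post}} - S_{\text{pre}}) + (b_{\text{post}} - b_{\text{pre}}) = \Delta S + (b_{\text{post}} - b_{\text{pre}})
$$

So if $b_{\text{post}} \approx b_{\text{pre}}$, then $\Delta\hat{S} \approx \Delta S$: the error in the delta is the change in the bias, not its level. This is why the assumption is more plausible when the type of mechanism stays the same.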
On point 2.: there are some transactions for which it is fine to assume private values (e.g., USDC transfers). The ones related to trading are more a case of interdependent values, for which the analysis is more challenging. This actually shows up in many places in blockchain-related auctions, and we need to advance the research on it.