Only traces require an archive node. Everything else (block header, transactions, receipts) is available with a full node (block history, transaction history, receipt history).
The single most important goal, from the start, was that things worked on small machines.
Requiring an archive node defeats that goal, because archive nodes don’t run on small machines.
I think your model is generally good, but I’m now quite concerned about the inability to disable the trace requirement. You would probably catch 99% of appearances without traces (just by looking at headers, transactions and receipts), and the disk requirement of a full node is about an order of magnitude lower than that of an archive node. While I often complain about the size of Ethereum full nodes and the fact that most users cannot run them, requiring an archive node essentially forces you into having a dedicated server.
The use case I’m most interested in is the ability for a user to get their transaction history (appearance history would be even better) without needing to outsource to a third party like Etherscan or run their own multi-terabyte servers with indexes. I feel like your proposed solution here is really close to providing that, but only if traces can be either disabled or flagged in the index so they can be ignored by the vast majority of people who don’t have Ethereum archive nodes.
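As a sketch of what "flagged in the index" could look like: each entry records where the appearance was found, so a reader without an archive node can skip trace-derived entries. The `Source` and `Appearance` types below are hypothetical and only illustrate the idea; they are not your index format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class Source(Enum):
    HEADER = "header"
    TRANSACTION = "transaction"
    RECEIPT = "receipt"
    TRACE = "trace"  # can only be re-derived with an archive node


@dataclass(frozen=True)
class Appearance:
    address: str
    block_number: int
    tx_index: int
    source: Source


def usable_on_full_node(entries: Iterable[Appearance]) -> List[Appearance]:
    """Keep only the entries that a full node can look up and verify itself."""
    return [e for e in entries if e.source is not Source.TRACE]
```

With a flag like this, the same published index could serve both audiences: archive node operators read everything, everyone else filters.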
Note: I am not a fan of the solution of “just use a third party hosted archive node” as that is a point of centralization that I think we should try to avoid, and I’m also not a fan of “just buy a 4TB drive to store Ethereum state history on” as that puts the solution out of reach of essentially all consumers (even consumers with high end computers).
The above is all doubly true if you want the index to be built and maintained within existing clients. If it only works with an archive node, I think that is a non-starter: the assumption is that almost no one runs an archive node, so the index wouldn’t be useful to most client users.
Do you have any data on what percentage of appearances would be missed if you dropped the transaction tracing? I would be curious to see that data with “app” contracts filtered out (e.g., ignore Uniswap internal stuff). It would be even cooler if we could somehow figure out how to filter out bots (e.g., MEV bots). My guess is that once you filter out apps and bots, the number of addresses that appear only in traces, and not in the transaction body or events, is vanishingly small and not worth the archive node requirement.
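The measurement I have in mind could be computed roughly like this, assuming you can export appearance records as `(address, block, tx_index, from_trace)` tuples and supply your own set of app and bot addresses to ignore. Both the record shape and the `ignored` set are assumptions of mine for illustration.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# (address, block_number, tx_index, found_only_via_trace_scan)
Record = Tuple[str, int, int, bool]


def trace_only_fraction(
    records: Iterable[Record],
    ignored: FrozenSet[str] = frozenset(),
) -> float:
    """Fraction of distinct appearances that would be lost without traces."""
    seen_all: Set[Tuple[str, int, int]] = set()
    seen_without_traces: Set[Tuple[str, int, int]] = set()
    for address, block, tx_index, from_trace in records:
        if address in ignored:  # e.g. known app contracts or MEV bots
            continue
        key = (address, block, tx_index)
        seen_all.add(key)
        if not from_trace:
            seen_without_traces.add(key)
    if not seen_all:
        return 0.0
    # An appearance is "missed" only if no non-trace source also found it.
    return len(seen_all - seen_without_traces) / len(seen_all)
```

Running it once with an empty `ignored` set and once with apps and bots filtered would show how much of the trace-only tail belongs to ordinary users.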