Sure. Before I answer the more specific points below, I’ll note that “data availability” is a bit like “decentralisation” in that it can be a somewhat nebulous term. (It is not an objective thing like validity, it takes qualitatively different forms, and it can vary along quantitative continuums.) I will focus specifically on real-time data availability, which is public access to freshly produced mining data (blocks or collations) via network gossip. Real-time data availability emerges from miner incentives, and is present in both the sharded and non-sharded contexts.
Whenever miner data is created (be it a block or a collation) that data is published to a public gossip feed. In the non-sharded context there’s a single real-time gossip feed, whereas in the sharded context there is one real-time gossip feed per shard. In short, the witnesses get gossiped. A validator cannot build on top of a collation header published on the main shard for which it does not have the corresponding collation body.
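To make the last rule concrete, here is a minimal sketch, not the actual client logic. The names `CollationHeader`, `ShardFeed`, and `can_build_on` are hypothetical, and SHA-256 stands in for whatever hash the real protocol uses. It only shows the idea that a validator refuses to extend a header until the matching body has arrived over that shard's gossip feed.

```python
import hashlib
from dataclasses import dataclass, field


def hash_body(body: bytes) -> str:
    # Stand-in hash for illustration only (assumption, not the spec's hash).
    return hashlib.sha256(body).hexdigest()


@dataclass(frozen=True)
class CollationHeader:
    # Hypothetical, stripped-down header: just enough to link to a body.
    shard_id: int
    parent_hash: str
    body_hash: str


@dataclass
class ShardFeed:
    """Collation bodies received on one shard's gossip feed, keyed by hash."""
    bodies: dict = field(default_factory=dict)

    def on_gossip(self, body: bytes) -> None:
        # Record a freshly gossiped body so headers can be matched against it.
        self.bodies[hash_body(body)] = body


def can_build_on(header: CollationHeader, feed: ShardFeed) -> bool:
    # A validator only extends a header whose body it actually holds.
    return header.body_hash in feed.bodies
```

For example, a header whose body has not yet been gossiped is rejected by `can_build_on`, and becomes acceptable as soon as `feed.on_gossip(body)` has been called with the matching body.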
Thanks to real-time data availability, anyone who cares about specific witnesses can maintain those witnesses. There are no restrictions as to where storage happens, and there are many setups (such as storage markets) that make sense. To illustrate, consider the witnesses for Kitty #417398:
- The owner of #417398 is incentivised to maintain witnesses to maintain effective ownership of the cat.
- The developers and maintainers of cryptokitties.co are incentivised to maintain witnesses, e.g. to feed their for-profit marketplace and to avoid angry users with unspendable kitties.
- Miners are incentivised to maintain witnesses to unlock mining fees by “filling in” transactions with missing/stale witnesses (cf. this post).
- Other participants (volunteers who want to support Ethereum pro bono, academics, archive.org, etc.) may also maintain witnesses.
The point is not for everyone to have access to all witnesses at any time. That’s overkill. It is sufficient for those who care about specific witnesses to have access to them when they need them, and real-time data availability unlocks the possibility for that.
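As a rough illustration of selective maintenance, here is a sketch of a party that listens to the gossip feed and keeps only the witnesses it cares about. The `WitnessKeeper` class, the string keys, and the shape of a gossip update (a mapping from key to witness bytes) are all assumptions made for the example, not part of any spec.

```python
from dataclasses import dataclass, field


@dataclass
class WitnessKeeper:
    """Stores witnesses only for the keys this party cares about."""
    watched: set
    store: dict = field(default_factory=dict)

    def on_gossip(self, updates: dict) -> None:
        # `updates` is a hypothetical mapping of key -> fresh witness bytes.
        for key, witness in updates.items():
            if key in self.watched:
                # Overwrite any stale witness with the fresh one.
                self.store[key] = witness
            # Witnesses for unwatched keys are simply ignored.
```

The owner of Kitty #417398 would run this with `watched={"kitty:417398"}`, whereas a marketplace or archive would watch a much larger set. Nobody needs to store everything.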
The same thing works in the stateless client paradigm. Real-time data availability allows user Y to monitor for malicious activity and react.
That’s not how sharding works (see rough spec here). Basically, every validator works on all shards. The validators form a pool, and individual validators are given the right to create blocks on random shards. Every “period” the validators are “shuffled”. Stateless clients allow the period to be kept very short, and the randomisation process to have a “lookahead” of just a few minutes. So to compromise a given shard, an attacker would need to compromise enough validators from the whole validator pool, and to compromise individual validators at a few minutes’ notice.
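The shuffling idea can be sketched as follows. This is an illustration under assumptions, not the sampling from the rough spec: the function name `proposer_for`, the use of SHA-256, and the way the seed is mixed with the period and shard number are all hypothetical. What it shows is that the assignment for a given period and shard is drawn from the whole validator pool and changes every period.

```python
import hashlib
from typing import Sequence


def proposer_for(seed: bytes, period: int, shard_id: int,
                 validators: Sequence[str]) -> str:
    """Pick one validator from the whole pool for a (period, shard) pair.

    `seed` stands for randomness that only becomes known shortly before
    the period starts (the "lookahead"); how it is produced is out of scope.
    """
    digest = hashlib.sha256(
        seed + period.to_bytes(8, "big") + shard_id.to_bytes(8, "big")
    ).digest()
    # Reduce the digest to an index; the modulo bias is negligible here.
    return validators[int.from_bytes(digest, "big") % len(validators)]
```

Because the index depends on the seed, the period, and the shard, an attacker who wants to control a particular shard cannot pick validators in advance. They would have to control a large share of the pool, or corrupt whichever validator is selected within the short lookahead window.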