For injection defects like SQL injection (SQLi), the proper remediation is neither input validation nor sanitization but parameterized queries. For HTML injection (XSS), however, it is necessary to sanitize (or, more specifically, to escape) the user-controlled data. Escaping is context-dependent: if the same data is used in multiple places, each place may require a different escaping. There are five HTML contexts, and there’s simply no way to universally escape the data. (At that second link, notice especially the text in bold.)
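To make the context dependence concrete, here is a minimal sketch using only the Python standard library. The `nickname` value and the three contexts shown are illustrative choices, not an exhaustive list, and the JSON encoding shown is assumed to target a standalone script or JSON response rather than an inline `<script>` block, which needs additional care.

```python
import html
import json
from urllib.parse import quote

nickname = "Joe's B&B"

# HTML element content or a quoted attribute value: HTML-escape.
# html.escape() escapes &, <, > and (by default) both quote characters.
as_html = html.escape(nickname)

# URL query component: percent-encode everything that is not unreserved.
as_url = quote(nickname, safe="")

# JavaScript string literal in a standalone script or JSON response: JSON-encode.
as_js = json.dumps(nickname)

print(as_html)  # Joe&#x27;s B&amp;B
print(as_url)   # Joe%27s%20B%26B
print(as_js)    # "Joe's B&B"
```

The same input produces three different outputs, and none of them is correct in either of the other two contexts.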
We could imagine a pathological case in which a single input value is used multiple times, but only ever in one HTML context. In that case, we could conceivably escape at the source of the data. If the data were ever used in a different context, however, it would have to be re-encoded. More importantly, even in this imagined case, the regression risk would simply be unacceptable.
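The re-encoding problem can be sketched as follows. This assumes, hypothetically, that the value was HTML-escaped once at the source and is later needed in a URL; the variable names are illustrative only.

```python
import html
from urllib.parse import quote

raw = "Joe's B&B"
stored = html.escape(raw)  # hypothetical: escaped once, at the source

# Percent-encoding the pre-escaped value encodes the HTML entities,
# not the nickname itself.
wrong = quote(stored, safe="")

# To get the right result, the HTML escaping has to be undone first
# and the raw value encoded for the new context.
right = quote(html.unescape(stored), safe="")

print(wrong)  # Joe%26%23x27%3Bs%20B%26amp%3BB
print(right)  # Joe%27s%20B%26B
```

Every consumer in a different context has to know that the value was escaped, and how, before it can use it safely.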
Let’s say you have a user whose nickname is “Joe’s B&B.” It’s determined that an account management page is subject to XSS because the nickname is output without encoding. In addition to the security defect, this page may also appear broken to this user if the browser treats “&B” as the start of an HTML character reference and omits the second B from the rendered text. However, this is just a minor annoyance.
It’s very common for every page that displays user information to use the same model class. Escaping the data when populating the model would fix the XSS on the account management page. But what regressions would be introduced? A reasonable authorization measure for an account update might verify that the nickname of the logged-in user matches the nickname of the account being updated. Now that we’ve escaped the nickname, however, the comparison will fail for this user (and for anyone else whose nickname contains a character that escaping changes), because the nickname in the database is the unescaped version. This is a fairly simple example, but it illustrates the point quite well.
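The regression can be reproduced in a few lines. Everything here is a hypothetical stand-in for the application described above: `DB` plays the role of the database, `UserModel` the shared model class, and `may_update` the authorization check. The last function sketches the alternative of keeping the model raw and escaping at the point of output.

```python
import html
from dataclasses import dataclass

# Hypothetical database: account id -> stored (unescaped) nickname.
DB = {42: "Joe's B&B"}

@dataclass
class UserModel:
    nickname: str

def load_user_escaped(account_id: int) -> UserModel:
    # The tempting fix: escape while populating the shared model.
    return UserModel(nickname=html.escape(DB[account_id]))

def load_user(account_id: int) -> UserModel:
    # Alternative: keep the model raw.
    return UserModel(nickname=DB[account_id])

def may_update(user: UserModel, account_id: int) -> bool:
    # Authorization check from the example: nicknames must match.
    return user.nickname == DB[account_id]

def render_nickname(user: UserModel) -> str:
    # Escape at the point of output, for the HTML context in use.
    return f"<span>{html.escape(user.nickname)}</span>"

print(may_update(load_user_escaped(42), 42))  # False
print(may_update(load_user(42), 42))          # True
print(render_nickname(load_user(42)))         # <span>Joe&#x27;s B&amp;B</span>
```

With the raw model, the authorization check still passes and the page output is escaped where it is rendered.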