Software Integrity

 

Mitigate XSS: Why input validation is bogus

Ask any security guy/gal about how to best mitigate cross-site scripting (XSS) and what is the answer? It’s some variation on validating input. Look at my own writings about this topic and what will you find? Variations on the input validation theme. Input validation is a great solution for new applications, but it’s a horrible choice for existing applications.

Why this change of heart? Well, this is something that’s been coming for a quite a while. I’ve become more and more disillusioned with input validation. Let’s start with some basics.

The first few reasons are well written about. Black-lists, whether syntactic or semantic, suffer the problems of black-lists: they can only look for known bad data. Not only that, but they often prevent good data from being input.

White-lists are then given as the answer. But, I’ve been had a nagging suspicion that syntactic white-lists are useless except for a small number of highly structured types. What’s worried me is that when writing several language compilers I’ve always had cases where I had to write syntax rules that were more lenient than the language allowed and then sort out the problem in semantic analysis. This implementation restriction was due to the limitations of my parsers and my parsers were a lot more sophisticated then the reg-exp parsers in many validation frameworks. So, syntactic white-lists have to let “bad data” through; not a fool proof solution.

Semantic white-lists? Well, these are great for enumerations, but is all input enumerations?

Don’t even get me started about GET versus POST.

Where does this leave us? Well, it leaves us with writing guidance that looks like a patch-work quilt. You use semantic white-lists here, but not there. You use syntactic white-lists for some kinds of data, but they will let bad data through.

So, here’s the nail in the output validation coffin. Let’s say you had a sizeable application with lots of fields of different types. What are you going to do? Take a swing that you can figure how the right type of input validation? What if you get it wrong? Well, if you get input validation wrong you just broke some percentage of your existing installed base. How large a percentage – you don’t know.

Why not just go with an output encoding solution? Yes, it seems like you’re snipping off the leaves of the tree. Yes, you need to build some way to tag output so that your test team can test that you’ve snipped off all of the leaves.

When I think about this problem as managing risk, Am I worried about the N% of the XSS bugs I’ve not fixed or the N% of the input fields I’ve now broken? I dunno about you, but I’d rather not break something that worked and have to “roll back” that fix – thus [re-]opening up an XSS vulnerability.