An application obtains data from various trusted and untrusted sources during its workflow. Data from every source must be validated so that only properly formed data enters the application workflow. If not properly validated, malformed input can lead to attacks such as SQL injection and cross-site scripting (XSS). Always conduct input validation on the server, even if client-side validation is also present, because client-side checks can be bypassed.
Syntactic and semantic validation
It’s important to validate data both syntactically and semantically. Syntactic validation checks that the input has the expected form: structure, data type, and length. Semantic validation checks that the data is correct according to business logic (e.g., rejecting a negative price).
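
The two kinds of checks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name validate_price, the length limit, and the price format (up to seven integer digits and an optional two-digit fraction) are hypothetical and not taken from any particular application.

```python
import re
from decimal import Decimal

# Hypothetical format rule: optional minus sign, up to 7 integer digits,
# and an optional fraction of 1 or 2 digits.
PRICE_PATTERN = re.compile(r"-?[0-9]{1,7}(\.[0-9]{1,2})?")


def validate_price(raw):
    """Return the price as a Decimal, or raise ValueError if it is invalid."""
    # Syntactic checks: data type, length, and structure.
    if not isinstance(raw, str) or len(raw) > 11:
        raise ValueError("price has the wrong type or length")
    if PRICE_PATTERN.fullmatch(raw) is None:
        raise ValueError("price is not a well-formed decimal number")

    price = Decimal(raw)

    # Semantic check: the (assumed) business rule that a price is never negative.
    if price < 0:
        raise ValueError("price must not be negative")
    return price
```

With these assumptions, "19.99" is accepted, "abc" fails the syntactic check, and "-5.00" is well formed but fails the semantic check.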
Blacklisting and whitelisting
The two primary approaches for performing input validation are blacklisting and whitelisting. Blacklisting involves detecting dangerous characters and patterns in the input (e.g., an apostrophe character or the <script> tag) and filtering them out. Because a blacklist cannot account for all attack vectors, these checks are relatively easy to bypass. The exclude list also needs to be updated every time a new attack vector is discovered, which is hard to manage. As such, blacklisting should not be used for validating data.
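
To make the bypass problem concrete, here is a deliberately naive blacklist filter in Python. The exclude list and the function name blacklist_filter are hypothetical, chosen only to mirror the apostrophe and <script> examples above.

```python
import re

# Hypothetical exclude list: the literal <script> tags and the apostrophe.
BLACKLIST = re.compile(r"<script>|</script>|'")


def blacklist_filter(value):
    """Naive filter that strips blacklisted patterns in a single pass."""
    return BLACKLIST.sub("", value)


# The obvious payload is neutralized.
print(blacklist_filter("<script>alert(1)</script>"))

# Nesting the tag inside itself rebuilds it once the inner copy is removed.
print(blacklist_filter("<scr<script>ipt>alert(1)</scr</script>ipt>"))

# A vector that is not on the list passes through unchanged.
print(blacklist_filter("<img src=x onerror=alert(1)>"))
```

The second call returns the original <script> payload intact, and the third is never touched at all, which is why each newly discovered vector forces another update to the list.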
Whitelisting involves defining a set of approved characters and patterns, and implementing controls that verify the input contains only characters and patterns from that set. If the input contains anything else, it is rejected. This is a stronger approach for input validation and is generally performed using regular expressions.
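
A minimal sketch of whitelist validation with a regular expression follows. The field (a username), its approved character set, and the length limits are assumptions made for the example; a real application would define its own rules per field.

```python
import re

# Hypothetical approved set for a username: 3 to 20 ASCII letters,
# digits, or underscores.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,20}")


def is_valid_username(value):
    """Return True only if the whole input consists of approved characters."""
    # fullmatch requires the pattern to cover the entire string, so any
    # extra character (including a trailing newline) causes rejection.
    return USERNAME_PATTERN.fullmatch(value) is not None
```

Under these rules "alice_01" is accepted, while "o'brien" and "<script>" are rejected without either of them having to be listed as dangerous. Using fullmatch rather than match or search is a deliberate choice here: a pattern that matches only part of the input would let unapproved characters through.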