Introduction to variable validation
In this first chapter you will learn about one of the cornerstones of web security: variable validation.
Validating a variable means checking that its value is acceptable and safe to use in your application.
In other words, it answers two questions:
- Is the value of the variable acceptable?
- Is using this variable secure for your application?
Although the concept is fairly simple, variable validation is one of the most important ideas in web security, so it is worth understanding correctly.
Before looking at some practical validation techniques, let's see why variable validation is needed in the first place and which variables must be validated.
Which variables must be validated?
Not every variable needs to be validated, because not all variables are created equal.
For example, let's say you have an $age variable that holds a person's age in years. When you assign it a literal value directly in your own code,
you are sure that its value is just what you expect: a positive integer number representing the age of a person. No problems here.
But what if the value comes from the "age" request parameter instead?
In that case, the value of $age is set by a remote user who is submitting an HTML form.
You still expect that value to be a positive integer number, but in this case you cannot be sure it really is.
In fact, the remote user could set the value of the "age" request parameter to anything: a negative number, a text string or an empty value.
Therefore, before you can use the $age variable in your application, you need to make sure that its value is valid.
This is what variable validation is about.
Of course, what "valid" means depends on the context.
In this case, $age represents a person's age, so a valid value is a positive integer number inside a reasonable range, let's say between 1 and 120.
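To make the contrast concrete, here is a minimal sketch of the two assignments discussed above. The variable names $trustedAge and $untrustedAge and the literal 25 are illustrative assumptions, not part of the original example; only the "age" request parameter comes from the text.

```php
<?php
// Trusted: the value is written by you, in your own code.
// (25 is an arbitrary illustrative value.)
$trustedAge = 25;

// Untrusted: the value comes from the "age" request parameter.
// The ?? operator yields null when the parameter was not sent at all.
$untrustedAge = $_POST['age'] ?? null;

// The first value is always exactly what you wrote.
var_dump($trustedAge);

// The second could be "42", "-3", "abc", "", an array, or NULL:
// you cannot know until you validate it.
var_dump($untrustedAge);
```

Note that nothing in the second assignment fails or warns when the value is nonsense. PHP will happily store whatever the remote user sent, which is why the check has to be explicit.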
When you cannot trust your variables: the "untrusted sources"
Let's look at the exact reason why you need to validate the $age variable.
The root of the problem is that $age contains a value from the request string, which is a data source you cannot trust.
Indeed, there is nothing you can do to prevent remote users from sending invalid values, including malicious values specifically crafted to attack your website.
(Note that front-end validation can easily be bypassed and must never be relied upon.)
Generally speaking, an untrusted source is a data source that could potentially provide invalid or even harmful data.
Untrusted sources include:
- The request string (GET and POST data)
- Cookies (data inside $_COOKIE)
- Uploaded files (data inside $_FILES)
- Local files accessible to other users
- Remote files downloaded via HTTP or FTP
- Unverified included files
- Database data shared with other apps or users
- ... and more
Whenever you set a variable with a value from one of these sources, that variable must be validated.
You can translate the above list of untrusted sources into a practical set of variables that you must always validate:
- Everything that comes from the request string, including HTML form submissions, AJAX calls and any other front-end request
- Cookies, even secure ones
- Any file uploaded by remote users (you have to validate both the file name and the file content)
- Local files (for example, opened with file() or fopen()), unless they are part of your application
- Files and resources retrieved from remote servers (over HTTP, FTP, email, etc.)
- Included PHP scripts unless they are part of your application
- Database data, unless it's been created by you or already validated
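As a rough illustration of where some of these untrusted values enter a PHP script, the sketch below reads one value from each of the request-related sources in the list. The parameter names ("q", "comment", "theme", "avatar") and the variable names are hypothetical and only serve the example.

```php
<?php
// Every variable below is filled from an untrusted source,
// so each one must be validated before it is used.

// Request string (GET and POST data)
$search  = $_GET['q'] ?? null;
$comment = $_POST['comment'] ?? null;

// Cookies: sent by the browser, so the user can alter them freely
$theme = $_COOKIE['theme'] ?? null;

// Uploaded files: both the name and the content are user-controlled
$uploadName = $_FILES['avatar']['name'] ?? null;
$uploadPath = $_FILES['avatar']['tmp_name'] ?? null;

// Collect them so the untrusted values are easy to spot in one place.
$untrusted = compact('search', 'comment', 'theme', 'uploadName', 'uploadPath');
var_dump($untrusted);
```

Grouping raw input like this is only one possible convention. The point is simply that none of these values should reach the rest of the application before it has gone through validation.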
Now, let's see a simple example of how the validation process works.
The validation process
In practice, it's quite simple: you apply a series of validation checks to the variable you want to validate.
For example, to validate the $age variable, you need to check that:
- The "age" request array element exists inside $_POST (that is, the form field was actually submitted)
- $age is a numeric string
- Its value is a positive integer between 1 and 120
If that seems easy to you... it's because it is! Here's how you can do it:
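One way to write these three checks is sketched below. The helper name validate_age() is hypothetical, and ctype_digit() is just one possible choice for the "numeric string" check (it assumes PHP's ctype extension, which is enabled by default). Treat it as a sketch under those assumptions rather than a definitive implementation.

```php
<?php
/**
 * Hypothetical helper: returns the validated age as an int,
 * or null when any of the three checks fails.
 */
function validate_age(array $request): ?int
{
    // Check 1: the "age" element must exist in the request array.
    if (!isset($request['age'])) {
        return null;
    }

    $value = $request['age'];

    // Check 2: the value must be a string made only of digits.
    // This rejects "-5", "3.7", "abc", "" and array values.
    if (!is_string($value) || !ctype_digit($value)) {
        return null;
    }

    // Check 3: the integer value must stay between 1 and 120.
    $age = (int) $value;
    if ($age < 1 || $age > 120) {
        return null;
    }

    return $age;
}

$age = validate_age($_POST);

if ($age === null) {
    echo "Invalid age\n";
} else {
    echo "Valid age: $age\n";
}
```

With this sketch, validate_age(['age' => '25']) returns 25, while values such as '0', '121', '-5', 'abc' or a missing element all return null. A digits-only check also accepts leading zeros, so '007' is treated as 7; whether that is acceptable depends on your application.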
This is just a simple example.
There are different checks and techniques you can implement as part of a full validation process. Usually, you will need to use more than one check for a validation process to be complete.
The validation you just did on $age, for example, required three checks:
- a check on the existence of the request element,
- a check on the type of the request element (integer number),
- and a check on the limits within which the value is supposed to stay (between 1 and 120).
Often, you will need to apply even stricter checks.
But let's take one step at a time. In the next lesson you are going to learn how to check the type of a variable.
- Some data sources are "untrusted", like the request string.
- If you set a variable from one of these sources, you must validate it before you can use it in your application.
- Front-end validation is never enough. Only back-end validation can be relied upon.
- A complete variable validation process requires a series of validation checks.