In this first chapter, you are going to learn about one of the cornerstones of web security: variable validation.
When you validate a variable, you are checking that its value is acceptable and safe to use in your application.
In other words, you are answering two questions:
- Is the value of the variable acceptable?
- Can I use this variable in my application without risks?
Variable validation is one of the most important web security concepts, so it is worth understanding properly.
Before looking at some practical validation techniques, let's see why variable validation is needed in the first place and which variables must be validated.
Which variables must be validated?
Not every variable needs to be validated, because not all variables are created equal.
For example, let's say that you have a $age variable that represents the age of a person (in years):
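A minimal sketch of such an assignment (the value 20 is only an illustrative assumption):

```php
<?php
// The value is written directly in the code, so no external source can change it.
$age = 20;
```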
In this case, you are sure that its value is just what you expect: a positive integer number representing the age of a person.
You are setting the value explicitly, so there are no doubts about its validity.
But what about the next one?
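A sketch of the kind of assignment meant here, assuming the form is sent with the POST method and the field is named "age":

```php
<?php
// The value comes straight from the request, so it can be anything the user sends.
// The ?? operator falls back to null when the "age" element is missing.
$age = $_POST['age'] ?? null;
```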
Here, the value of $age is set by a remote user who is submitting an HTML form.
You still expect its value to be a positive integer number, but in this case you cannot be sure it really is.
In fact, the remote user can set the value of the "age" request parameter to anything: a negative number, a text string or an empty value.
Therefore, before you can use the $age variable in your application, you must make sure that its value is valid.
This is what variable validation is about.
Of course, what "valid" means depends on the context.
In this case, where $age represents a person's age, valid values are positive integers within a reasonable range (for example, between 1 and 120).
The untrusted data sources
The reason you need to validate the $age variable is that it contains a value from the request string.
The request string is a data source you cannot trust.
There is nothing you can do to prevent remote users from sending invalid values, including malicious values specifically crafted to attack your website.
(Note that front-end validation can easily be bypassed, so it cannot be relied upon.)
This leads us to the concept of an untrusted data source.
An untrusted data source is a data source that can provide invalid or even harmful data.
Untrusted data sources include:
- The request string (GET and POST data, which you find in the $_GET, $_POST and $_REQUEST arrays)
- Cookies (data inside $_COOKIE)
- Uploaded files (data inside $_FILES)
- Local files accessible to other users
- Remote files downloaded via HTTP or FTP
- Unverified included files
- Database data shared with other apps or users
Every time you set a variable from one of these data sources, you must validate the variable.
Lesson Key Point: YOU MUST VALIDATE ALL VARIABLES SET FROM UNTRUSTED DATA SOURCES.
In practice, the above list of untrusted data sources covers the following data:
- Cookies, even HTTPS ones.
- Any file uploaded by remote users (you have to validate both the file name and the file content).
- Local files (including files opened with file() or fopen()), unless they are part of your application.
- Files and resources retrieved from remote sources (over HTTP or FTP, from emails, etc.).
- Included PHP scripts, unless they are part of your application.
- Database data, unless it's been created by you or already validated.
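To illustrate the rule with a source other than the request string, here is a minimal sketch that checks a cookie value against a list of accepted values. The cookie name "lang", the accepted language codes and the function name are all assumptions made for illustration.

```php
<?php
/**
 * Hypothetical helper: returns the language code stored in the "lang" cookie,
 * or null if the cookie is missing or holds an unexpected value.
 */
function validate_lang_cookie(array $cookies): ?string
{
    // Assumed list of accepted values; anything else is rejected.
    $allowed = ['en', 'it', 'fr'];

    $value = $cookies['lang'] ?? null;

    // Strict comparison: only an exact string match with an allowed value passes.
    if (is_string($value) && in_array($value, $allowed, true)) {
        return $value;
    }

    return null;
}

// Fall back to a default when the cookie is missing or invalid.
$lang = validate_lang_cookie($_COOKIE) ?? 'en';
```

Passing the array as an argument, instead of reading $_COOKIE inside the function, keeps the check easy to test with hand-made input.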
Now, let's see a simple example of how the validation process works.
The validation process
How does the validation process work in practice?
Basically, you need to apply one or more validation checks to the variable you need to validate.
For example, to validate the $age variable, you need to check that:
- The "age" request array element exists inside $_POST (this means that the remote user has filled in and submitted the form field).
- $age is a numeric string.
- Its value is a positive integer between 1 and 120.
So, here is the PHP code implementation of these checks (don't worry, we will go through the details later):
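A minimal sketch of these three checks might look like the following. The function name validate_age is hypothetical, and the "numeric string" check is done here with ctype_digit(), which only accepts strings made of digits; this is a choice made for the sketch and is stricter than is_numeric(), which would also accept values such as "1.5" or "1e2".

```php
<?php
/**
 * Hypothetical helper: returns the validated age as an integer,
 * or null if any of the checks fails.
 */
function validate_age(array $post): ?int
{
    // Check 1: the "age" element must exist in the request array.
    if (!isset($post['age'])) {
        return null;
    }

    $raw = $post['age'];

    // Check 2: it must be a string of 1 to 3 digits.
    // This rejects arrays, "", "abc", "-5" and "1.5".
    if (!is_string($raw) || !ctype_digit($raw) || strlen($raw) > 3) {
        return null;
    }

    // Check 3: the integer value must be between 1 and 120.
    $age = (int) $raw;
    if ($age < 1 || $age > 120) {
        return null;
    }

    return $age;
}

// $age is either a safe integer or null (validation failed).
$age = validate_age($_POST);
```

The length limit in the second check keeps very long digit strings from reaching the integer conversion at all.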
This is just a simple example.
There are different checks and techniques you can implement as part of a full validation process. Usually, you will need more than one check for the validation process to be complete.
For instance, the validation you just did on $age required three checks:
- a check on the existence of the request element,
- a check on the type of the request element (integer number),
- and a check on the limits within which the value must be (between 1 and 120).
Often, you will need to apply even stricter checks.
But let's take one step at a time.
- Some data sources are "untrusted", like the request string.
- If you set a variable value from one of these sources, you must validate it before you can use it.
- Front-end validation is not reliable. Only back-end validation is secure.
- A complete variable validation process requires a series of validation checks.