File name validation



A secure file upload process begins with file name validation.

Let's see an example right away.

Let's say that you have this upload form:

And this PHP script that handles the file uploaded through the form:


The uploaded file is saved inside the local directory /home/uploads/, and the saved file keeps the original file name.
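A minimal sketch of a handler like the one described, assuming the form field is named "userfile" (hypothetical, since the original form is not shown). The helper unsafeDestination() is also hypothetical; it isolates the problem, which is that the destination path is built directly from the client-supplied name.

```php
<?php
// Sketch of the UNSAFE handler described above.
// Assumptions: field name "userfile", target directory /home/uploads/.
const UPLOAD_DIR = '/home/uploads/';

// UNSAFE: the client-supplied name is appended to the directory with no validation.
function unsafeDestination(string $clientName): string
{
    return UPLOAD_DIR . $clientName;
}

if (isset($_FILES['userfile']) && $_FILES['userfile']['error'] === UPLOAD_ERR_OK) {
    // The saved file keeps whatever name the browser sent.
    move_uploaded_file(
        $_FILES['userfile']['tmp_name'],
        unsafeDestination($_FILES['userfile']['name'])
    );
}
```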

However, this code does not validate the file name, which is untrusted input that the remote user can set to any value.

In fact, a malicious user can put invalid or harmful characters in the name without actually renaming the file: the browser's developer tools are enough to change the name that gets sent.

PHP's upload handler automatically performs some basic sanitization on the file name. For example, it strips any leading path components, so directory traversal sequences like "../" are removed.

However, other dangerous strings, such as a name consisting only of "..", are not filtered out.
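The sketch below approximates that behavior with basename(), which keeps only the part of a path after the last "/". This is an assumption for illustration: PHP's internal handling is similar in spirit but is not literally this function, and approximateUploadName() is a hypothetical helper.

```php
<?php
// Rough approximation (assumption) of PHP's built-in name sanitization:
// only the last path component of the client-supplied name survives.
function approximateUploadName(string $clientName): string
{
    return basename($clientName);
}

foreach (['../../etc/passwd', 'uploads/../shell.php', '..', 'photo.jpg'] as $clientName) {
    // Traversal prefixes disappear, but a bare ".." passes through unchanged.
    echo var_export($clientName, true), ' => ',
        var_export(approximateUploadName($clientName), true), PHP_EOL;
}
```

The traversal prefix in "../../etc/passwd" is dropped, leaving "passwd", while a name that is just ".." comes through untouched, which is why your own validation is still needed.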

Moreover, you probably want to allow only a limited set of characters for file names.

The name validation process consists of the following steps:

  1. Name length check.
  2. Character and name filtering.
  3. Extension check (which will be covered in the next lesson).

Let's see how to implement them.




Name length check

The file name must meet the following criteria:

  • The name must be long enough (at least 1 character).
  • The name length must not exceed the file system limits.
  • The name length must not exceed the database column limit, if the name is saved in a database.
  • The name length must not exceed a maximum value that you choose.


A maximum value of 32 or 64 characters is usually a good choice.

Modern file systems accept long file names, and database text columns typically allow 255 characters or more (far more than a reasonable file name length). Still, check explicitly that none of these limits is exceeded.

As a precaution, you should trim() the file name to remove whitespace (and NUL) characters from its beginning and end. If you want to be even stricter, you can reject the uploaded file if the trimmed name differs from the original name.
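A minimal sketch of both options, using a hypothetical helper normalizeName(): it returns the trimmed name, or null when strict mode is on and trimming changed anything. trim() with no second argument strips spaces, tabs, newlines, carriage returns, NUL bytes and vertical tabs.

```php
<?php
// Returns the trimmed name, or null if $strict is true and the name had
// leading/trailing "blank" characters (the upload should then be rejected).
function normalizeName(string $name, bool $strict = false): ?string
{
    // Default trim() set: " ", "\t", "\n", "\r", "\0", "\x0B".
    $trimmed = trim($name);

    if ($strict && $trimmed !== $name) {
        return null;
    }

    return $trimmed;
}
```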

This is how to check the file name length:
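A minimal sketch, assuming a 64-character cap, a 255-byte file system limit and a 255-character database column; all four constants and the helper isValidNameLength() are hypothetical values to adapt to your setup.

```php
<?php
// Assumed limits: adjust them to your own setup.
const MIN_NAME_LENGTH = 1;    // at least one character
const MAX_NAME_LENGTH = 64;   // arbitrary cap chosen by you
const FS_NAME_LIMIT = 255;    // typical file system limit, in bytes
const DB_NAME_LIMIT = 255;    // size of the (hypothetical) database column

function isValidNameLength(string $name): bool
{
    // The effective maximum is the smallest of all the limits.
    $max = min(MAX_NAME_LENGTH, FS_NAME_LIMIT, DB_NAME_LIMIT);

    // strlen() counts bytes: that is what file system limits measure, and it is
    // never smaller than the character count a database column limit measures.
    $length = strlen($name);

    return $length >= MIN_NAME_LENGTH && $length <= $max;
}
```

Run the check on the trimmed name, for example isValidNameLength(trim($name)), so that surrounding blanks do not count toward the limits.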




Character and name filtering

The next step is to filter out invalid characters.

This is one of the few cases where you can apply a whitelist-based filter: you can define a list of accepted characters and filter out everything else.

If an invalid character is found, you can either:

  • reject the uploaded file, or
  • simply strip the invalid character from the name.


If you decide to remove the invalid characters but keep the file, remember to check the name length again, because the name will be shorter than the original (it may even be empty).

One way to apply this filter is with a regular expression.

For example, here is how to accept only letters (lowercase and uppercase) and the characters ".", "_" and "-":
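A minimal sketch using preg_match() with an anchored character class; hasOnlyAllowedChars() is a hypothetical helper name. The D modifier makes "$" match only at the very end of the string, so a trailing newline cannot slip through.

```php
<?php
// Whitelist check: letters plus ".", "_" and "-", nothing else.
// "+" requires at least one character; /D stops "$" from matching before a final "\n".
function hasOnlyAllowedChars(string $name): bool
{
    return preg_match('/^[A-Za-z._-]+$/D', $name) === 1;
}
```

If this returns false, reject the upload.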


You can also use a find-and-replace approach. This is the better choice if you want to remove the invalid characters rather than reject the file:
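A minimal sketch using preg_replace() with a negated character class (one possible find-and-replace technique); stripInvalidChars() and the 64-character cap are assumptions. It re-checks the length afterwards, as recommended above, because stripping can leave the name too short or empty.

```php
<?php
const MAX_NAME_LENGTH = 64; // assumed cap

// Removes every byte outside the whitelist, then re-validates the length.
// Returns the cleaned name, or null if the result is empty or too long.
function stripInvalidChars(string $name): ?string
{
    $clean = preg_replace('/[^A-Za-z._-]/', '', $name);
    $length = strlen($clean);

    if ($length < 1 || $length > MAX_NAME_LENGTH) {
        return null;
    }

    return $clean;
}
```

Without the u modifier the pattern works on bytes, so multi-byte characters such as "é" are removed entirely: "résumé.pdf" becomes "rsum.pdf".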


You can apply stricter text filters as well. For instance, you can require the file name to begin with a letter.
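A minimal sketch of that stricter rule, with a hypothetical helper isStrictName(): the first character must be a letter, and the remaining ones follow the same whitelist as before. This also rules out names starting with ".", "_" or "-".

```php
<?php
// First character: a letter. Following characters: letters, ".", "_" or "-".
function isStrictName(string $name): bool
{
    return preg_match('/^[A-Za-z][A-Za-z._-]*$/D', $name) === 1;
}
```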

The dot character (.) requires special attention and we will cover it in the next lesson.




In some cases, you may want to prevent specific names from being used.

For example, if you are saving the file inside the web server root (but you should avoid doing that, as explained later), then the name .htaccess should not be used.

You can prevent specific names from being used with a blacklist-based filter:
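A minimal sketch; the list contents and isForbiddenName() are assumptions ("." and ".." are included because, as noted earlier, PHP does not filter them). The comparison is case-insensitive as a precaution, assuming the files could end up on a case-insensitive file system where ".HTACCESS" and ".htaccess" are the same file.

```php
<?php
// Hypothetical blacklist: extend it with any name that is special in your setup.
const FORBIDDEN_NAMES = ['.', '..', '.htaccess', '.htpasswd'];

function isForbiddenName(string $name): bool
{
    // Lowercase first so ".HTAccess" is caught too; strict comparison in in_array().
    return in_array(strtolower($name), FORBIDDEN_NAMES, true);
}
```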


Note:

You must also check that a file with the same name does not already exist in the upload directory; otherwise, the existing file will be overwritten.

This case is explained in the "Forced file name" lesson.
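A minimal existence check, assuming the name has already passed the previous filters; destinationIsFree() is a hypothetical helper. Checking and then writing are two separate steps, so this only detects a collision at the moment of the check.

```php
<?php
// Returns true if no file or directory with this name exists in $uploadDir.
function destinationIsFree(string $uploadDir, string $name): bool
{
    $path = rtrim($uploadDir, '/') . '/' . $name;

    return !file_exists($path);
}
```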


Lesson takeaways

  • The file name must be validated to avoid harmful characters and names.
  • The name length must be between a minimum and a maximum value.
  • You can filter out invalid characters using a whitelist-based filter.
  • You can filter out invalid names using a blacklist-based filter.


