File name validation



A secure file upload process begins with validating the file name.

Let's see an example right away.

Suppose you have this upload form:

And this PHP script that handles the file uploaded through the form:
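As a minimal sketch of such a handler (not the lesson's original listing): the form field name user_file and the helper destinationFor() are assumptions made for illustration; only the /home/uploads/ directory comes from the lesson.

```php
<?php
// Hypothetical field name: assumes a form such as
// <form method="post" enctype="multipart/form-data">
//   <input type="file" name="user_file">
// </form>
const UPLOAD_DIR = '/home/uploads/';

// Builds the destination path from the client-supplied name.
// No validation is done here, which is exactly the problem.
function destinationFor(array $file): string
{
    return UPLOAD_DIR . $file['name'];
}

if (isset($_FILES['user_file']) && $_FILES['user_file']['error'] === UPLOAD_ERR_OK) {
    $destination = destinationFor($_FILES['user_file']);
    // move_uploaded_file() only accepts files received through an HTTP upload.
    move_uploaded_file($_FILES['user_file']['tmp_name'], $destination);
}
```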


The uploaded file is saved inside the local directory /home/uploads/. The name of the saved file is the same as the original file.

However, the code above does not validate the file name, which is untrusted input that the remote user fully controls.

In fact, a malicious user could include invalid or harmful characters in the name without even renaming the file on disk: it's enough to change the name sent in the request using the browser developer tools.

PHP automatically performs some basic sanitization on the uploaded file name. For example, it strips directory components, which defeats directory traversal attempts like "../".

However, many dangerous names (such as "..") are not filtered out.

Moreover, you probably want to allow only a limited set of characters for file names.

The name validation process consists of the following steps:

  1. Name length check
  2. Characters and names filtering
  3. Extension check (which will be covered in the next lesson)

Let's see how to implement them.




Name length check

You must check the uploaded file name length to make sure that:

  • The name is long enough (at least 1 character)
  • The name length does not exceed the file system limits
  • The name length does not exceed the database limits (if the name is going to be saved in a database)
  • The name length is no longer than an arbitrary maximum value set by you


In practice, enforcing your own maximum value is usually all you need to do. Most modern file systems accept names of around 255 bytes or characters, and database text columns typically hold at least 255 characters, far more than a reasonable file name length. Still, make sure that the maximum you choose stays within all of those limits.

A maximum value of 32 or 64 characters is usually a good choice.

As a precaution, you should trim() the file name to remove whitespace characters from the beginning and the end of the name.

This is how to check the file name length:
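A minimal sketch of such a check follows. The constants and the function name isNameLengthValid() are hypothetical, and the 64-character maximum is just one of the values suggested above. The length is measured in bytes with strlen(), which is an assumption that suits file system limits; multibyte names may need a different measure.

```php
<?php
// Hypothetical limits: adjust them to your file system and database constraints.
const MIN_NAME_LENGTH = 1;
const MAX_NAME_LENGTH = 64;

// Returns true when the trimmed name length is within the allowed range.
function isNameLengthValid(string $name): bool
{
    // trim() removes whitespace from both ends; strlen() counts bytes.
    $length = strlen(trim($name));
    return $length >= MIN_NAME_LENGTH && $length <= MAX_NAME_LENGTH;
}
```

For example, isNameLengthValid('photo.jpg') returns true, while a name made only of spaces is rejected because it is empty after trimming.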


Pro Tip

You can also reject the uploaded file if the trimmed file name does not match the original file name.
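One possible way to express this stricter rule, using a hypothetical helper named isAlreadyTrimmed():

```php
<?php
// Returns true when the name has no leading or trailing whitespace,
// i.e. when trim() leaves it unchanged.
function isAlreadyTrimmed(string $name): bool
{
    return trim($name) === $name;
}
```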




Characters and names filtering

The next step is to filter out invalid characters.

This is one of the few cases where you can apply a whitelist-based filter.

In fact, you can define a list of accepted characters and filter out everything else.

If an invalid character is found, you can either:

  • reject the uploaded file, or
  • simply strip the invalid character from the name.


If you decide to remove the invalid characters but keep the file, remember to check that the name length is still valid (if all name characters are invalid, the name will be empty).

To apply this filter, you can either use a regular expression, like this:
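A minimal sketch, assuming a whitelist of ASCII letters, digits, underscore, hyphen and dot. The pattern and the function name hasOnlyAllowedChars() are illustrative choices, not a recommended final list.

```php
<?php
// Hypothetical whitelist: ASCII letters, digits, underscore, dot and hyphen.
// \A and \z anchor the whole string; unlike $, \z does not match
// before a trailing newline.
const ALLOWED_NAME_PATTERN = '/\A[A-Za-z0-9_.\-]+\z/';

// Returns true when every character of the name is in the whitelist.
function hasOnlyAllowedChars(string $name): bool
{
    return preg_match(ALLOWED_NAME_PATTERN, $name) === 1;
}
```

This variant rejects the file as soon as one invalid character is found, which matches the first of the two options listed above.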

Or a simpler find/replace solution, which is easier if you want to remove the invalid characters:
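A sketch of the stripping approach without regular expressions. The whitelist string and the function name stripInvalidChars() are assumptions, and the loop works byte by byte, so multibyte characters are removed entirely.

```php
<?php
// Hypothetical whitelist of accepted characters.
const ALLOWED_CHARS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-.';

// Returns a copy of $name containing only the whitelisted bytes.
function stripInvalidChars(string $name): string
{
    $clean = '';
    for ($i = 0, $n = strlen($name); $i < $n; $i++) {
        // Keep the byte only if it appears in the whitelist.
        if (strpos(ALLOWED_CHARS, $name[$i]) !== false) {
            $clean .= $name[$i];
        }
    }
    return $clean;
}
```

Because the result can be shorter than the input, or even empty, the length check has to be repeated on the cleaned name.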


You can also apply stricter character filters. For example, you could limit the number of non-alphanumeric characters in the name, or prevent names beginning or ending with such characters.
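As an illustration of such stricter rules, the sketch below requires the name to start and end with an ASCII letter or digit and caps the number of other characters. The limit of 3 and the function name passesStricterRules() are arbitrary assumptions.

```php
<?php
// Hypothetical limit on non-alphanumeric characters per name.
const MAX_SPECIAL_CHARS = 3;

function passesStricterRules(string $name): bool
{
    // The first and last characters must be ASCII letters or digits.
    // The optional group lets single-character names pass.
    if (preg_match('/\A[A-Za-z0-9](?:.*[A-Za-z0-9])?\z/s', $name) !== 1) {
        return false;
    }
    // preg_match_all() returns the number of non-alphanumeric characters found.
    $specialCount = preg_match_all('/[^A-Za-z0-9]/', $name);
    return $specialCount !== false && $specialCount <= MAX_SPECIAL_CHARS;
}
```

A check like this is meant to run in addition to the character whitelist, not as a replacement for it.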

The dot character (.) requires special attention and we will cover it in the next lesson.




In some cases, you may want to prevent specific names from being used.

For example, if you are saving the file inside the web server root (which, as explained later, you should avoid whenever possible), the name .htaccess should not be used.

You can prevent specific names from being used with a blacklist-based filter:
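A minimal sketch of such a filter. Only .htaccess comes from the lesson; the other entries, the constant and the function name isForbiddenName() are assumptions added for illustration.

```php
<?php
// Hypothetical blacklist; extend it to match your environment.
const FORBIDDEN_NAMES = ['.htaccess', '.htpasswd', '.', '..'];

// Returns true when the name is on the blacklist.
function isForbiddenName(string $name): bool
{
    // Compare in lowercase, since some file systems ignore case.
    return in_array(strtolower($name), FORBIDDEN_NAMES, true);
}
```

With this list, isForbiddenName('.HTACCESS') returns true and isForbiddenName('photo.jpg') returns false.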

Note:

You must also check that a file with the same name does not exist in the upload directory, otherwise the current file will be overwritten.

This case is explained in the "Forced file name" lesson.


Lesson takeaways

  • The file name must be validated to avoid harmful characters and names.
  • The name length must be between a minimum and a maximum value.
  • You can filter out invalid characters using a whitelist-based filter.
  • You can filter out invalid names using a blacklist-based filter.


