Security Issues in Perl Scripts
By Jordan Dimov (jdimov@cigital.com)

Introduction

A programming language, by design, does not normally constitute a security risk; it is with the programmer that the risk is introduced. Almost every language has certain flaws that may facilitate to some extent the creation of insecure software, but the overall security of a piece of software still depends largely on the knowledge, understanding, and security consciousness of the authors. Perl has its share of security "gotchas", and most Perl programmers are aware of none of them.

In this article, we will look at some of the most widely misused and overlooked features of Perl. We'll see how their incorrect use can threaten the security of the systems on which the scripts run, as well as the security of their users. We will show how such weaknesses can be exploited and how to fix or avoid them.

Basic user input vulnerabilities

One big source of security problems in Perl scripts is improperly validated (or unvalidated) user input. Any time your program might take input from an untrusted user, even indirectly, you should be cautious. For example, if you are writing CGI scripts in Perl, expect that malicious users will send you bogus input.

If trusted and used without validation, improper user input to such applications can cause many things to go wrong. The most common and obvious mistake is executing other programs with user provided arguments, without proper validation.

The system() and exec() functions

Perl is famous for its use as a "glue" language: it does an excellent job of calling other programs to do the work for it, carefully coordinating their activities by collecting the output of one program, reformatting it in a particular manner and passing it as input to some other program so everything runs smoothly. As the Perl slogan tells us, there is more than one way to do this.

One way to execute an external program or a system command is by calling the exec() function. When Perl encounters an exec() statement, it looks at the arguments that exec() was invoked with, then starts a new process executing the specified command. Perl never returns control to the original process that called exec().

Another similar function is system(), which acts very much like exec(). The only major difference is that Perl first forks off a child from the parent process. The child executes the command supplied to system(). The parent process waits until the child is done running, and then proceeds with the rest of the program. We will discuss the system() call in greater detail below, but most of the discussion applies to exec() just as well.

The argument given to system() is a list: the first element on the list is the name of the program to be executed and the rest of the elements are passed on as arguments to this program. However, system() behaves differently if there is only one parameter. When that is the case, Perl scans the parameter to see if it contains any shell metacharacters. If it does, then it needs those characters to be interpreted by a shell. Therefore, Perl will spawn a command shell (often the Bourne shell) to do the work. Otherwise, Perl will break up the string into words, and call the more efficient C library call execvp(), which does not understand special shell characters.

Now suppose we have a CGI form that asks for a username, and shows some file containing statistics for that user. We might use a system() call to invoke 'cat' for that purpose like this:

     system ("cat /usr/stats/$username"); 

and the $username came from the form:

     $username = param ("username");

The user fills in the form, with username = jdimov for example, then submits it. Perl doesn't find any meta-characters in the string "cat /usr/stats/jdimov" so it calls execvp(), which runs "cat" and then returns to our script. This script might look harmless, but it can actually be exploited by a malicious attacker. The problem is that by using special characters in the 'username' field on the form, an attacker can execute any command through the shell. For example, let's say the attacker were to send the string "jdimov; cat /etc/passwd". Perl recognizes the semicolon as a meta-character and passes this to the shell:

     cat /usr/stats/jdimov; cat /etc/passwd

The attacker gets both the dummy stats file and the password file. If the attacker is feeling destructive, he could just send "; rm -rf /*".

We mentioned earlier that system() takes a list of parameters and executes the first element as a command, passing it the rest of the elements as arguments. So we change our script a little so that only the program we want gets executed:

     system ("cat", "/usr/stats/$username");

Since we specify each argument to the program separately, a shell will never get invoked. Therefore, sending ";rm -rf /*" will not work, because the attack string will be interpreted as a filename only.

This approach is much better than the one argument version, since it avoids use of a shell, but there are still potential pitfalls. In particular, we need to worry about whether the value of $username could ever be used to exploit weaknesses of the program that is being executed (in this case "cat"). For example, an attacker could still exploit our rewritten version of the code to show the system password file by setting $username to the string "../../etc/passwd".
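
As a minimal sketch of how the traversal problem above might be guarded against, the helper below refuses any user name that could step outside /usr/stats before the list form of system() is ever called. The names stats_path and show_stats, and the policy of rejecting path separators, are our own assumptions for illustration and not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: map a user name to its stats file, or return
# nothing if the name could escape the /usr/stats directory.
sub stats_path {
    my ($username) = @_;
    return unless defined $username && length $username;
    return if $username =~ m{[/\0]};       # no path separators or NUL bytes
    return if $username =~ /\A\.\.?\z/;    # not "." or ".."
    return "/usr/stats/$username";
}

# Show the file only when the name passed the check. The list form of
# system() keeps the shell out of the picture.
sub show_stats {
    my ($username) = @_;
    my $path = stats_path($username)
        or return 0;
    return system("cat", $path) == 0;
}
```

With this in place, a request for "../../etc/passwd" is turned away before "cat" runs, while a name such as "jdimov" maps to /usr/stats/jdimov as before.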

Many other things can go wrong, depending on the program. For example, some applications interpret special character sequences as requests for executing a shell command. One common problem is that some versions of the Unix "mail" utility will execute a shell command when they see the ~! escape sequence in particular contexts. Thus, user input containing "~!rm -rf *" on a blank line in a message body may cause trouble under certain circumstances.

As far as security is concerned, everything stated above with regard to the system() function applies to exec() too.

The open() function

The open() function in Perl is used to open files. In its most common form, it is used in the following way:

     open (FILEHANDLE, "filename");

Used like this, "filename" is opened in read-only mode. If "filename" is prefixed with the ">" sign, it is opened for output, overwriting the file if it already exists. If it is prefixed with ">>" it is opened for appending. The prefix "<" opens the file for input, but this is also the default mode if no prefix is used. Some problems of using unvalidated user input as part of the filename should already be obvious. For example, the backward directory traversal trick works just as well here.

There are other worries. Let's modify our script to use open() instead of "cat". We would have something like:

     open (STATFILE, "/usr/stats/$username");

and then some code to read from the file and show it. The Perl documentation tells us that:

If the filename begins with "|", the filename is interpreted as a command to which output is to be piped, and if the filename ends with a "|", the filename is interpreted as a command which pipes output to us.

The user can then run any command under the /usr/stats directory, just by postfixing a '|'. Backwards directory traversal can allow the user to execute any program on the system.

One way to work around this problem is to always explicitly specify that you want the file open for input by prefixing it with the '<' sign:

     open (STATFILE, "</usr/stats/$username");
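
Perl 5.6 and later also offer a three-argument form of open() in which the mode is passed separately from the file name. The sketch below assumes such a Perl is available; the function name read_file is hypothetical. It does nothing about directory traversal, which still has to be handled by validating the name.

```perl
use strict;
use warnings;

# Read a whole file using the three-argument open(). Because the mode
# ("<") is a separate argument, characters such as "|" or ">" in $path
# are treated as part of the file name, never as instructions to open().
sub read_file {
    my ($path) = @_;
    open(my $fh, '<', $path)
        or return;              # caller decides how to report the failure
    local $/;                   # slurp mode: read everything at once
    my $contents = <$fh>;
    close($fh);
    return $contents;
}
```

Called with a string such as "echo pwned|", this version simply fails to find a file of that name instead of running a command.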

Sometimes we do want to invoke an external program. For example, let's say that we want to change our script so it reads the old plain-text file /usr/stats/username, but passes it through an HTML filter before showing it to the user. Let's say we have a handy utility sitting around just for this purpose. One approach is to do something like this:

     open (HTML, "/usr/bin/txt2html /usr/stats/$username|");
     print while <HTML>;

Unfortunately, this still goes through the shell. However, we can use an alternate form of the open() call that will avoid spawning a shell:

     open (HTML, "-|") 
       or exec ("/usr/bin/txt2html", "/usr/stats/$username");
     print while <HTML>;

When we open a pipe to "-", either for reading ("-|") or for writing ("|-"), Perl forks the current process and returns the PID of the child process to the parent and 0 to the child. The "or" operator is used to decide whether we are in the parent or child process. If we're in the parent (the return value of open() is nonzero) we continue with the print() statement. Otherwise, we're the child, so we execute the txt2html program, using the safe version of exec() with more than one argument to avoid passing anything through the shell. What happens is that the child process prints the output that txt2html produces to its STDOUT and then dies quietly (remember exec() never returns), while in the meantime the parent process reads the results from the HTML filehandle, which is connected to the child's STDOUT. The very same technique can be used for piping output to an external program:

     open (PROGRAM, "|-") 
       or exec ("/usr/bin/progname", "$userinput");
     print PROGRAM "This is piped to /usr/bin/progname";

These forms of open() should always be preferred to a direct piped open() when pipes are needed, since they don't go through the shell.
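
The forking open() can be wrapped in a small routine that also handles the case where fork fails, in which case open() returns undef rather than 0. This is only a sketch under our own assumptions: the name run_and_read is hypothetical, and the child exits through POSIX::_exit if exec() fails so that it never falls back into the parent's code.

```perl
use strict;
use warnings;
use POSIX ();

# Run a program without a shell and return its output lines.
sub run_and_read {
    my (@command) = @_;
    my $pid = open(my $out, "-|");       # forks; child's STDOUT feeds $out
    die "cannot fork: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: replace ourselves with the program. The indirect-object
        # form of exec() never uses a shell, even for a single argument.
        exec { $command[0] } @command;
        warn "cannot exec $command[0]: $!\n";
        POSIX::_exit(127);               # leave without running parent code
    }
    my @lines = <$out>;                  # parent: read what the child printed
    close($out);
    return @lines;
}
```

A call such as run_and_read("/usr/bin/txt2html", "/usr/stats/$username") would then hand the file name to txt2html as a single argument, whatever characters it contains.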

Now suppose that we converted the statistics files into nicely formatted HTML pages, and for convenience decided to store them in the same directory as the Perl script that shows them. Then our open() statement might look like this:

     open (STATFILE, "<$username.html");

When the user passes username=jdimov from the form, the script shows jdimov.html. There is still the possibility of attack here. Unlike C and C++, Perl does not use a null byte to terminate strings. Thus the string "jdimov\0blah" is interpreted as just "jdimov" in most C library calls, but remains "jdimov\0blah" in Perl. The problem arises when Perl passes a string containing a null to something that has been written in C. The UNIX kernel and most UNIX shells are pure C. Perl itself is written primarily in C. What happens when the user calls our script like this: "statscript.pl?username=jdimov%00"? Our script passes the string "jdimov\0.html" to the corresponding system call in order to open it, but since those system calls are coded in C and expect null-terminated strings, they ignore the .html part. The result? The script will just show the file "jdimov" if it exists. It probably doesn't, and even if it does, no big deal. But what if we call the script with:

     statscript.pl?username=statscript.pl%00

If the script is in the same directory as our html files, then we can use this input to trick the poor script into showing us all its guts. This may not be too much of a security threat in this case, but it certainly can be for other programs, since it allows an attacker to analyze the source for other exploitable weaknesses.
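
One way to close this hole is to refuse such input before it ever reaches open(). The following sketch uses a hypothetical helper, page_for, and assumes that user names consist only of ASCII letters, digits and underscores; that policy is ours, not something the original script defines.

```perl
use strict;
use warnings;

# Build the page name for a user, refusing names that contain a NUL
# byte or anything else outside a conservative set of characters.
sub page_for {
    my ($username) = @_;
    return unless defined $username;
    return if $username =~ /\0/;                      # embedded NUL (explicit for clarity)
    return unless $username =~ /\A[A-Za-z0-9_]+\z/;   # also excludes "." and "/"
    return "$username.html";
}
```

The second check already excludes the null byte; the first is kept only to make the intent obvious to the reader of the code.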

Backticks

In Perl, yet another way to read the output of an external program is to enclose the command in backticks. So if we wanted to store the contents of our stats file in the scalar $stats, we could do something like:

     $stats = `cat /usr/stats/$username`;

This does go through the shell. Any script that puts user input inside a pair of backticks is exposed to all the security problems that we discussed earlier.

There are a couple of different ways to try to make the shell not interpret possible meta-characters, but the safest thing to do is to not use backticks. Instead, open a pipe from a forked child process and execute the external program there, as we did at the end of the previous section with open().
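
Perl 5.8 and later on Unix-like systems also accept the command as a list directly in a piped open(), which gives a compact replacement for backticks. The sketch below assumes such a Perl; the function name capture is hypothetical. Note that the list should contain the program and at least one argument, since with a single string Perl may still consult the shell.

```perl
use strict;
use warnings;

# Capture a command's output as one string, like backticks, but with
# the command given as a list so that no shell is involved.
sub capture {
    my (@command) = @_;
    open(my $out, '-|', @command)
        or die "cannot run $command[0]: $!";
    local $/;                    # slurp mode
    my $text = <$out>;
    close($out);
    return defined $text ? $text : '';
}
```

The earlier backtick example could then be written as $stats = capture("cat", "/usr/stats/$username"), still subject to validating $username against directory traversal.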

The eval() and the /e regex modifier

The eval() function can execute a block of Perl code at runtime, returning the value of the last evaluated statement. This kind of functionality is often used for things such as configuration files, which can be written as Perl code. Unless you absolutely trust the source of code to be passed to eval(), do not do things like eval $userinput. This also applies to the /e modifier of the substitution operator, which makes Perl evaluate the replacement part as Perl code before using it.
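
When the configuration data cannot be fully trusted, one alternative to eval() is to parse it as plain data. The sketch below assumes a simple, hypothetical "key = value" format with "#" comments; the function name parse_config is ours. Nothing in the input is ever executed.

```perl
use strict;
use warnings;

# Parse "key = value" lines into a hash instead of eval()-ing them.
sub parse_config {
    my ($text) = @_;
    my %config;
    for my $line (split /\n/, $text) {
        next if $line =~ /\A\s*(?:#.*)?\z/;           # skip blanks and comments
        if ($line =~ /\A\s*(\w+)\s*=\s*(.*?)\s*\z/) {
            $config{$1} = $2;                         # value is kept as a string
        } else {
            die "malformed config line: $line\n";
        }
    }
    return \%config;
}
```

A value that happens to look like Perl code is stored as an ordinary string, so it has no effect unless the program later chooses to act on it.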

Filtering User Input

One common approach to solving most of the problems we've been discussing in this section is to filter out unwanted meta-characters and other problematic data. For example, we could filter out all periods to avoid backwards directory traversal. Similarly, we can fail whenever we see invalid characters.

This strategy is called "black-listing". The philosophy is that if something is not explicitly forbidden, then it must be okay. A better strategy is "white-listing", which states that if something is not explicitly allowed, then it must be forbidden.

The most significant problem with a black-list is that it's very hard to keep it complete and updated. You may forget to filter out a certain character, or your program may have to switch to a different shell with a different set of meta-characters.

Instead of filtering out unwanted meta-characters and other dangerous input, filter in only the input that is legitimate. The following snippet, for example, will refuse to execute a security critical operation if the user input contains anything except letters, digits, underscores, hyphens, dots, or @ signs (characters that may be found in a user's email address):

     unless ($useraddress =~ /^([-\@\w.]+)$/) {
       print "Security error.\n";
       exit (1);
     }

The basic idea is not to try to compile a list of special values to guard against but rather to come up with a list of values that are safe to accept. The choice of acceptable input values will, of course, vary from one application to another. Acceptable values should be chosen in such a way as to minimize their damage-causing potential.
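
The same white-list can be packaged as a function that returns the accepted value or nothing at all. This is a sketch; the name clean_address is hypothetical. It anchors the pattern with \A and \z rather than ^ and $, because $ also matches just before a trailing newline and would let one slip through.

```perl
use strict;
use warnings;

# Return the address if it consists solely of allowed characters,
# otherwise return nothing. The caller receives the captured copy ($1).
sub clean_address {
    my ($input) = @_;
    return unless defined $input;
    return $1 if $input =~ /\A([-\@\w.]+)\z/;
    return;
}
```

A caller would then write something like: my $address = clean_address($useraddress) or die "Security error.\n";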

Avoiding the shell

Of course, you should also strive to avoid shells as much as possible. White-listing, however, is more broadly applicable than shell avoidance: if you call an editor that has special escape sequences, for example, you can make sure those sequences are not permitted in the input.

Often, you can avoid using external programs to perform a function by using an existing Perl module. The Comprehensive Perl Archive Network (CPAN, www.cpan.org) is a huge resource of tested functional modules for almost anything that a standard UNIX toolset can do. While it may take a little more work to include a module and call it instead of calling an external program, the modular approach is in general far more secure and often a lot more flexible. Just to illustrate the point, using Net::SMTP instead of exec()'ing "sendmail -t" can save you the trouble of going through the shell and can prevent your users from exploiting known vulnerabilities in the 'sendmail' agent.

Other sources of security problems

Insecure Environmental Variables

User input is indeed the chief source of security problems with Perl programs, but there are other factors that should be considered when writing secure Perl code. One commonly exploited weakness of scripts run from the shell or by a web server is insecure environmental variables, most commonly the PATH variable. When you invoke an external application or utility from within your code by name only, without its full path, you put at risk the security of your whole program and of the system that it's running on. Say you have a system() call like this:

     system ("txt2html", "/usr/stats/jdimov");

For this call to work, you assume that the txt2html file is in a directory listed in the PATH variable. But if an attacker alters your PATH to point to some other, malicious program with the same name, your system's security is no longer guaranteed.

In order to prevent things like this from happening, every program that needs to be even remotely security conscious should start with something like:

     #!/usr/bin/perl -wT
     require 5.001;
     use strict;
     $ENV{PATH} = join ':' => split (" ", << '__EOPATH__');
       /usr/bin
       /bin
       /maybe/something/else
     __EOPATH__

If the program relies on other environmental variables, they should also be explicitly redefined before being used.

Another dangerous variable (this one is more Perl-specific) is the @INC array variable, which is a lot like PATH except that it specifies where Perl should look for modules to be included in the program. The problem with @INC is pretty much the same as that of PATH: someone might point your Perl to a module that has the same name and does about the same thing as the module you expect, but it also does something subversive in the background. Therefore, @INC should not be trusted any more than PATH and should be completely redefined before including any external modules.

setuid scripts

Normally a Perl program runs with the privileges of the user who executed it. By making a script setuid, its effective user ID can be set to one that has access to resources to which the actual user does not (viz., to the owner ID of the file containing the program). The passwd program for example uses setuid to acquire writing permission to the system password file, thus allowing users to change their own passwords. Since programs that are executed via a CGI interface run with the privileges of the user who runs the web server (usually this is user 'nobody', who has very limited privileges), CGI programmers are often tempted to use the setuid technique to let their scripts perform tricks that they otherwise couldn't. This can be useful, but it can also be very dangerous. For one thing, if an attacker finds a way to exploit a weakness in the script, they won't only gain access to the system, but they will also have it with the privileges of the effective UID of that script (often the 'root' UID).

To avoid this, Perl programs should set the effective UID and GID to the real UID and GID of the process before any file manipulations:

     $> = $<;  # set effective user ID to real UID.
     $) = $(;  # set effective group ID to real GID.

and CGI scripts should always run with the lowest possible privilege.

Beware that just being careful in what you do inside your setuid script doesn't always solve the problem. Some operating systems have bugs in the kernel that make setuid scripts inherently insecure. For this, and other reasons, Perl automatically switches to a special security mode (taint mode) when it runs setuid or setgid scripts. We will discuss taint mode in our next article.

rand()

Generating random numbers on deterministic machines is a nontrivial problem. In security critical applications, random numbers are used intensely for many important tasks ranging from password generation to cryptography. For such purposes, it is vital that the generated numbers are as close to truly random as possible, making it difficult (but never impossible) for an attacker to predict future numbers generated by the algorithm. The Perl rand() function simply calls the corresponding rand(3) function from the standard C library. This routine is not very secure. The C rand() function generates a sequence of pseudorandom numbers based on some initial value called the seed. Given the same seed, two different instances of a program utilizing rand() will produce the same random values. In many implementations of C, and in all versions of Perl before 5.004, if a seed is not explicitly specified, it is computed from the current value of the system timer, which is anything but random. Given some of the values produced by rand() at a given point and a sufficient amount of time, any self-respecting cracker can accurately predict the sequence of numbers that rand() will generate next, thus obtaining key knowledge necessary to compromise a system.

One (partial) solution to the rand() problem is to use one of the built-in random number generators on Linux systems: /dev/random and /dev/urandom. Those are better sources of randomness than the standard library rand() function, but like anything else, they have their own imperfections. The difference between the two devices is that /dev/random stops supplying random numbers when its entropy pool runs out of randomness while /dev/urandom uses cryptography to generate new numbers when the entropy pool runs out. Another solution is to use a secure implementation of one of the more complicated cryptographic random number generators such as Yarrow.
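
Reading from one of these devices takes only a few lines of Perl. The sketch below assumes a system that provides /dev/urandom; the function names random_bytes and random_token are hypothetical.

```perl
use strict;
use warnings;

# Read $count random bytes from the kernel's generator.
sub random_bytes {
    my ($count) = @_;
    open(my $rng, '<', '/dev/urandom')
        or die "cannot open /dev/urandom: $!";
    binmode($rng);                       # raw bytes, no layer translation
    my $bytes = '';
    while (length($bytes) < $count) {
        # Append to $bytes until we have as many bytes as requested.
        my $got = read($rng, $bytes, $count - length($bytes), length($bytes));
        die "read from /dev/urandom failed: $!" unless $got;
    }
    close($rng);
    return $bytes;
}

# Hex-encode 16 random bytes, for instance for use as a session token.
sub random_token {
    return unpack('H*', random_bytes(16));
}
```

The loop is there because read() may return fewer bytes than requested; the token is simply a printable form of the raw bytes.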

Race Conditions

Race conditions (together with buffer overflows) are a favorite of seasoned crackers. Consider the following code:

     unless (-e "/tmp/a_temporary_file") {
       open (FH, ">/tmp/a_temporary_file");
     }

At first glance this is a very legitimate piece of code that doesn't seem capable of causing any harm. We check to see whether the temporary file exists, and if it doesn't we tell Perl to create it and open it for writing. The problem here is that we assume that our -e check is still correct at the time we open the file. Of course, Perl wouldn't lie to us about a file's existence, but unlikely as it might seem, it is entirely possible that the status of our file has changed between the time we check for it and the time we open it for writing. Suppose that the temporary file does not exist. Suppose also that a knowledgeable attacker, familiar with the workings of our program, executed the following command right after we did our existence check:

     ln -s /etc/an_important_config_file /tmp/a_temporary_file

Now everything we do to the temporary file actually gets done to that important config file of ours. Since we believe that the temp file does not exist (that's what our -e check told us), we go ahead and open it for writing. As a result, our config file gets erased. Not very pleasant. And if the attacker knows what they are doing, this might even be fatal.

Situations like this, where an attacker can race in and change something to cause us trouble between two actions of our program are known as race conditions. In this particular case we have a TOCTOU (Time-Of-Check-Time-Of-Use) race condition. There are several other similar types of race conditions. Such imperfections in a program are very easy to overlook even by experienced programmers, and are being actively exploited. There is no easy omni-powerful solution to this problem. Often the best approach is to use atomic operations when the possibility of race conditions exists. This means using only one system call to do a check for a file and to create that file at the same time, without giving the processor the opportunity to switch to another process in between. This is not always possible though. Another thing we could do in our example would be to use sysopen() and specify a write-only mode, without setting the truncate flag:

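     use Fcntl;   # defines O_WRONLY, O_CREAT, etc.

     unless (-e "/tmp/a_temporary_file") {
       #open (FH, ">/tmp/a_temporary_file");
       # O_CREAT creates the file if it is missing; no O_TRUNC is given
       sysopen (FH, "/tmp/a_temporary_file", O_WRONLY|O_CREAT);
     }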

This way even if our filename does get forged, we won't kill the file when we open it for writing.

Note: the module Fcntl must be included in order for that sysopen() call to work, because this is where the constants O_RDONLY, O_WRONLY, O_CREAT, etc. are defined.
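
Where the operating system supports it, the check and the creation can be combined into the single atomic operation mentioned above by adding the O_EXCL flag to O_CREAT. The following is a sketch under that assumption; the function name create_new_file and the 0600 permission mode are our own choices.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# Create $path for writing only if it does not already exist. The check
# and the creation happen in one system call, and with O_EXCL the call
# fails if $path is a symbolic link, so nothing can slip in between.
sub create_new_file {
    my ($path) = @_;
    sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL, 0600)
        or return;
    return $fh;
}
```

If create_new_file() returns nothing, the file (or a link with that name) was already there and the program should pick another name or give up. The File::Temp module shipped with later versions of Perl packages the same idea together with hard-to-guess file names.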

Buffer Overflows and Perl

In general, Perl scripts are not susceptible to buffer overflows because Perl dynamically extends its data structures when needed. Perl keeps track of the size and allocated length of every string. Before each time a string is being written into, Perl ensures that enough space is available, and allocates more space for that string if necessary.

There are, however, a few known buffer overflow conditions in some older implementations of Perl. Notably, version 5.003 can be exploited with buffer overflows. All versions of suidperl (a program designed to work around race conditions in setuid scripts for some kernels) built from distributions of Perl earlier than 5.004 are exploitable through buffer overflows (CERT Advisory CA-97.17).

Conclusion

In our follow-on article, we will spend some time getting acquainted with the security features that Perl has to offer, particularly Perl's "taint mode", and we'll try to identify some problems that can slip through this tightened security if we are not careful. In studying those aspects of Perl and looking at some characteristic examples, our goal will be to develop an intuition that will help us recognize security problems in Perl scripts at first glance and avoid making similar mistakes in our programs.


Resources

Rain Forest Puppy, Perl CGI problems, Phrack Magazine, Vol. 9, Issue 55, File 07.

The World Wide Web Security FAQ. Chapter 7 -- Safe Scripting in Perl. http://www.w3c.org/Security/Faq/wwwsf5.html

The Perl Security man page.

CGI Programming with Perl, 2nd Edition. O'Reilly and Associates. July 2000.

The ITS4 Software Security Scanner. http://www.cigital.com/its4/

The SANS institute's list of top-ten most-critical internet security threats. http://www.sans.org/topten.htm

Matt Bishop, Michael Dilger. Checking for Race Conditions in File Accesses. Computing Systems 9(2), Spring 1996, pp. 131-152.