Outside of a dog, a book is a man's best friend. Inside a dog it's too dark to read. | |
Groucho Marx |
Let's get a bit more serious and examine the assembly program from In the language of evil with readelf(1), part of the binutils package. This tool is huge. Here we use only one option, -l.
Command: src/readelf/evil_magic.sh
#!/bin/sh
strip ${TMP}/evil_magic/nasm
ls -l ${TMP}/evil_magic/nasm
readelf -l ${TMP}/evil_magic/nasm |
Output: out/redhat-linux-i386/readelf/evil_magic
-rwxr-xr-x 1 alba anonymou 476 Jun 30 00:06 tmp/redhat-linux-i386/evil_magic/nasm
Elf file type is EXEC (Executable file)
Entry point 0x8048080
There are 1 program headers, starting at offset 52
Program Header:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
LOAD 0x000000 0x08048000 0x08048000 0x00097 0x00097 R E 0x1000
Section to Segment mapping:
Segment Sections...
00 .text |
Nice to see the entry point we retrieved through od(1) again. Program layout is a simplified variation of Sort of an answer. The value of FileSiz includes ELF header and program header. The size of this overhead is:
So effective code size is:overhead = Entry point - VirtAddr = 0x8048080 - 0x8048000 = 0x80 = 128 bytes
This matches with the disassembly listing. However, the ratio of file size to effective code deserves the title "Bloat", with capital B.code size = FileSiz - overhead = 0x97 - 0x80 = 0x17 = 23 bytes
Only 5 percent of the file actually do something useful!code size / file size = 23 / 476 = 0.048
Anyway, we see that even for trivial examples the code is surrounded by lots of other stuff. Let's zoom in on our target.
Command: src/readelf/segments.sh
#!/bin/sh
ls -l /bin/bash
readelf -l /bin/bash |
Output: out/redhat-linux-i386/readelf/segments
-rwxr-xr-x 1 root root 519964 Jul 9 2001 /bin/bash
Elf file type is EXEC (Executable file)
Entry point 0x8059380
There are 6 program headers, starting at offset 52
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000034 0x08048034 0x08048034 0x000c0 0x000c0 R E 0x4
INTERP 0x0000f4 0x080480f4 0x080480f4 0x00013 0x00013 R 0x1
[Requesting program interpreter: /lib/ld-linux.so.2]
LOAD 0x000000 0x08048000 0x08048000 0x79273 0x79273 R E 0x1000
LOAD 0x079280 0x080c2280 0x080c2280 0x057e0 0x09bd0 RW 0x1000
DYNAMIC 0x07e980 0x080c7980 0x080c7980 0x000e0 0x000e0 RW 0x4
NOTE 0x000108 0x08048108 0x08048108 0x00020 0x00020 R 0x4
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.got .rel.bss .rel.plt .init .plt .text .fini .rodata
03 .data .eh_frame .ctors .dtors .got .dynamic .bss
04 .dynamic
05 .note.ABI-tag |
Looks intimidating. But then the ELF specification says that only segments of type "LOAD" are considered for execution. Since the flags of the first one are R E, meaning "read & execute", we know that it must be the code segment. The other one has RW, meaning "read & write", so it must be the data segment.
MemSiz is larger than FileSiz in the data segment. Just like with mmap(2) excessive bytes are defined to be initialized with 0. The linker takes advantages of that by grouping all variables that should be initialized to zero at the end. Note that the last section of segment 3 (counting starts with 0) is called .bss, the traditional name for this kind of area.
The mapping for segment 2 looks even more complex. But I would guess that .rodata means "read-only data" and .text contains productive code, as opposed to the administrative stuff in the other sections. LSB [1] has a good overview of section names. [2]
The distance between the two LOAD segments is interesting:
VirtAddr[2] - VirtAddr[1] - FileSiz[1] = 0x80c2280 - 0x8048000 - 0x79273 = 0x100d = 4109 bytes
Only 13 bytes (0xd) would be needed to align the first LOAD segment up to the alignment of 0x1000. For some reason at least one complete page lies between code segment and data segment. This would be easy target for a tiny virus. So lets check out whether this is a unique phenomenon.
The following Perl script reads a list of file names from stdin, one file name per line. readelf -l is called for every file, and the output is parsed for lines starting with the word LOAD. The end of the segment is calculated and stored for the next line. If the distance from the start of a LOAD segment to the end of the previous one is less than 0x1000, a message is written.
Source: src/scanner/dist.pl
#!/usr/bin/perl -w
use strict;
my $min = 0xFFFFFFFF;
my $max = 0;
my $detected = 0;
LOOP: while(my $filename = <>)
{
chomp $filename; $filename =~ s/^\s*//;
next LOOP if ( ! -e $filename );
open(ELF, "readelf -l $filename 2>&1 |") || die "$1 ($filename)";
my $nrLoad = 0;
my $end = 0;
LINE: while(my $line = <ELF>)
{
chomp $line;
next LINE if (!($line =~ m/^\s*LOAD\s*/));
$nrLoad++;
my @number = split / +/, $line;
my $virtaddr = hex($number[3]);
my $filesiz = hex($number[5]);
if ($end != 0)
{
my $dist = $virtaddr - $end;
if ($dist < 0x1000)
{
printf "%-46s virtaddr=0x%08x dist=0x%08x\n",
$filename, $virtaddr, $dist;
$detected++;
}
$max = $dist if ($dist > $max);
$min = $dist if ($dist < $min);
}
$end = $virtaddr + $filesiz;
}
close ELF;
printf("%-46s has %d LOAD segments.\n", $filename, $nrLoad)
if ($nrLoad != 2);
}
printf "%4d files; %4d detected; min=0x%08x; max=0x%08x\n",
$., $detected, $min, $max; |
You will now experience a time warp. I introduced above script as a means to verify the existence of an exploitable peculiarity. But this also makes it a tool to check whether the peculiarity is already exploited. In other words, it is a scanner. For viruses we yet have to develop.
Of course the naive implementation through parsing the output of readelf(1) significantly limits performance. But use of file(1) as a fast file-type filter will lower noise ("has 0 LOAD segments") and duration to acceptable regions.
The first test is with typical places like /bin. Only the last line of output is shown.
Command: src/scanner/big.sh
#!/bin/sh
dst=$1
scanner=${2:-src/scanner/dist.pl}
find /bin /sbin /usr/bin /usr/sbin /usr/lib -type f -print0 \
| xargs -r0 file \
| sed -ne 's/: *ELF 32-bit [LM]SB executable,.*//p' \
| ( ${scanner} 2>&1 ) \
| tee ${dst}.full \
| tail -1 \
> ${dst} |
Output: out/redhat-linux-i386/scanner/dist_big
1475 files; 20 detected; min=0x00000020; max=0x0000101f |
Strange result. On my original installation of RedHat 7.2 all files could be infected through this method. Since the second quarter of 2002 some (but not all) executables of updated packages are immune. On a similar installation of RedHat 7.3 this scanner detects a few hundred files. Either someone is actively closing a gap. Or I caught an infection …
Anyway, there are still enough interesting files left. So on to all infected executables created from the sources of this document.
Command: src/scanner/small.sh
#!/bin/sh
dst=$1
scanner=${2:-src/scanner/dist.pl}
find ${TMP} -type f -name '*_infected' -print0 \
| xargs -r0 file \
| sed -ne 's/: *ELF 32-bit [LM]SB executable,.*//p' \
| ( ${scanner} 2>&1 ) \
| tee ${dst}.full \
| tail -1 \
> ${dst} |
Again only the last line of output is shown. It's enough to see that all infected files are detected.
Output: out/redhat-linux-i386/scanner/dist_small
35 files; 35 detected; min=0xf7f385a0; max=0x00001010 |
You may have heard that Linux is a difficult target for malware [3] because there are so many different distributions. Well, they all use basically the same compiler, producing the same idiosyncrasies. This allows us to cheat in big style.
Insert our code between code segment and data segment.
Modify inserted code to jump to original entry point afterwards.
Change entry point to start of our code.
Modify program header
Include increased amount of code in entry of code segment.
Move all following entries down the file.
Modify section header
Include trailing code in last section of code segment (should be .rodata).
Move all following sections down the file.
This setup has two big problems, however.
Code size is limited to 0x1000 bytes. Manageable with assembly. Tough luck for C.
Infected executables will be detected by above Perl script . But you know that already.
Since all executables in /bin and /usr/bin follow the same layout, a heuristic scanner can easily spot deviations. A "perfect infection", resulting in a executable indistinguishable from the real thing, is far from sight. But then there are bigger issues an innocent virus seeking a warm nest in the wild would face.
For example RPM-based distributions maintain a checksum database. Verifying a single file, a complete package, or even all installed packages takes just one command.
If you know what you are looking for:
rpm --verify -f /bin/sh |
For dedicated people with enough time to read the output:
/bin/nice -n 19 rpm --verify --all |
A possible counter attack is to patch the database after infection. This is distribution dependent and requires root permissions. And it won't help against people who have the checksums offline. That type of precaution is commonly called "Intrusion Detection System". There is a FAQ [4] hosted by SANS Institute [5] And a brief introduction to products is "Talisker's Intrusion Detection System List". [6] tripwire [7] has achieved a big name, but there is other free software: snort, [8] aide, [9] and lids [10]
Another possibility is to hide the original (uninfected) executable on the file system, and patch the kernel via an inserted module to fake calculation of the checksum. A common name for this concept is "Loadable Kernel Module (LKM) Rootkits". [11] And if the kernel is compiled without module-support, there is still direct access to /dev/kmem to install a kernel-patch…
On this road lies madness.
[1] | |
[2] | http://www.linuxbase.org//spec/gLSB/gLSB/specialsections.html |
[3] | |
[4] | |
[5] | |
[6] | |
[7] | |
[8] | |
[9] | |
[10] | |
[11] |