Anti-Intrusion Detection System (IDS) tactics were one of the original key features of my whisker web scanner. The goal of any anti-IDS tactic is to mutate a request so much that the ID systems will get confused, but the web server will still be able to understand it, hence the subtitle "just how bad can we ruin a good thing?" This paper is aimed at explaining the thought process and implementation behind various anti-IDS tactics whisker uses to avoid web scan detection. While I specifically have ID systems in mind, this also applies to monitors, sniffers, log parsers and anything else trying to interpret web traffic and/or requests. The methods, analysis and theories presented within this document can also be applied to other protocols and concepts--however, HTTP is my focus due to the implementation of whisker.
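It is important to understand the components of an HTTP request. As defined by RFC 1945: [ GET(1) /cgi-bin/some.cgi(2) HTTP/1.0(3) ](4) Here (1) is the method, (2) is the URI, (3) is the version, and (4) is the request as a whole.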
Raw: Also referred to as 'packet grep' style ID systems, they typically just scan the unprocessed raw data for key strings. The benefit of this method is pure speed. I use the term 'raw' not in a derogatory manner, but rather to identify that these ID systems usually deal with the raw data directly, rather than interpreting the protocols they are monitoring. In all honesty, I personally prefer this type of IDS approach. Example ID systems of this type would be Dragon and Snort.
GET /phfiles/phonefiles.txt HTTP/1.0
So the ID vendors should be cautious when shortening a signature. They can assume the "/cgi-bin/" part, but this causes problems if the CGI is not located in /cgi-bin. If they leave this out, however, the signature becomes more prone to false alarms. So it's a balancing act. Many vendors have decided to keep the "/cgi-bin/" style notation*, with the notable exception of Snort. But either way, there are still problems. (*from what I can tell, anyway; many ID systems are closed source)
On initial testing of whisker, many ID systems failed because they assumed requests would use the GET method--they were looking for signatures of the following style: GET /cgi-bin/some.cgi The trick here is that whisker didn't use the GET method; it used HEAD by default. Whisker was sending HEAD /cgi-bin/some.cgi and the ID systems were all missing the scans, flat out.
Accurate coding of the signature would not include the request method.
Granted, the attacker may have to use GET later to actually exploit the
CGI; however, it is often possible to still use HEAD and POST in the exploitation,
depending on how the CGI was coded. On some platforms, the method is even
ignored, so it becomes a moot point.
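The difference between the two signature styles can be sketched in a few lines. This is only an illustration: the function names and the sample signature are hypothetical and do not come from any real ID system.

```python
# Hypothetical raw-IDS checks; neither reflects a specific product.
SIGNATURE_URI = b"/cgi-bin/some.cgi"

def match_with_method(packet: bytes) -> bool:
    """Naive signature that hard-codes the GET method."""
    return b"GET " + SIGNATURE_URI in packet

def match_uri_only(packet: bytes) -> bool:
    """Method-agnostic signature: looks for the URI alone."""
    return SIGNATURE_URI in packet

head_scan = b"HEAD /cgi-bin/some.cgi HTTP/1.0\r\n\r\n"
print(match_with_method(head_scan))  # False
print(match_uri_only(head_scan))     # True
```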
The classic trick with URL encoding is to encode
the URI with its escaped equivalent. The HTTP protocol specifies that
arbitrary binary characters can be passed within the URI by using %xx notation,
where 'xx' is the hex value of the character. In theory, the raw ID systems
would fall prey to this, since the signature "cgi-bin" does not match the
string "%63%67%69%2d%62%69%6e". Also, in theory, the smart ID systems would
be able to plow past this, since they would decode the string similar to
a web server before actually checking for a signature. In reality, nowadays
all worthwhile ID systems decode encoded URIs, so this tactic is becoming
obsolete. This was implemented in whisker v1.0+ as the -I option, and as
the -I 1 option in v1.3.
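As a small sketch of the encoding and of the decoding step a smart ID system would perform, assuming ASCII input (hex_encode is a hypothetical helper, not whisker's code; unquote is Python's standard decoder):

```python
from urllib.parse import unquote

def hex_encode(text: str) -> str:
    """Encode every character as %xx (ASCII input assumed)."""
    return "".join(f"%{ord(ch):02x}" for ch in text)

encoded = hex_encode("cgi-bin")
print(encoded)               # %63%67%69%2d%62%69%6e
print("cgi-bin" in encoded)  # False: a raw string match misses it
print(unquote(encoded))      # cgi-bin: decoding first restores the match
```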
In an effort to break up a string, the classic double
slash method replaced every single '/' with '//'. This resulted in checks
for "/cgi-bin/some.cgi" not matching "//cgi-bin//some.cgi". However, most
ID systems (smart and raw) are aware of this trick and all derivatives
of the trick using multiple (3+) slashes. Smart ID systems tend to correctly
interpret this (by logically combining all slashes into one); raw ID systems
vary by emulating smart ID systems (combining them), or just reporting
multiple slashes and moving along. This method is basically obsolete and
not implemented in whisker, in favor of self-referencing directories (see
below).
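The 'logically combining all slashes into one' step might look like the following. It is a minimal sketch with a hypothetical function name, not the normalization routine of any particular ID system.

```python
import re

def collapse_slashes(uri: str) -> str:
    """Replace every run of two or more slashes with a single slash."""
    return re.sub(r"/{2,}", "/", uri)

print(collapse_slashes("//cgi-bin//some.cgi"))     # /cgi-bin/some.cgi
print(collapse_slashes("///cgi-bin////some.cgi"))  # /cgi-bin/some.cgi
```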
Another classic trick is to break apart a signature such as "/cgi-bin/some.cgi" by using reverse traversal directory tricks: GET /cgi-bin/blahblah/../some.cgi HTTP/1.0 which equates to "/cgi-bin/some.cgi" once the directory
traversal has been accounted for. However, like URI encoding, this trick
is old and well known. Most smart ID systems account for this (it's a core
feature of what makes them 'smart'), and raw ID systems usually alert on the
fact that the request contains "/../". For all intents and purposes, this
tactic is becoming obsolete as well. It has not been implemented in whisker,
in favor of self-referencing directories.
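Both reactions can be sketched briefly. The function names are hypothetical, and Python's posixpath.normpath merely stands in for whatever path resolution a given smart ID system performs.

```python
import posixpath

def smart_resolve(uri_path: str) -> str:
    """Smart approach: resolve the traversal before matching."""
    return posixpath.normpath(uri_path)

def raw_alert(uri_path: str) -> bool:
    """Raw approach: simply flag any request containing '/../'."""
    return "/../" in uri_path

request_path = "/cgi-bin/blahblah/../some.cgi"
print(smart_resolve(request_path))  # /cgi-bin/some.cgi
print(raw_alert(request_path))      # True
```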
A newer trick in the 'directory games' category is the self-referencing directory. While '..' means the parent directory, '.' means the current directory. So "c:\temp\.\.\.\.\.\" is equivalent to "c:\temp\" ("/tmp/./././././" being "/tmp/" for you Unix folk). In an effort to stop the raw ID systems from matching signatures like "/cgi-bin/phf", we can change the string to "/./cgi-bin/./phf". That means raw ID systems have three options:
The premature request ending tactic is specifically aimed at the smart ID systems. In an effort to save precious time and processing power (remember, the faster you scan packets, the more traffic you can view in real-time), smart ID systems may choose to implement an agreeable approach to detecting a scan: check only the request, and throw away extra client-submitted data. A typical request looks like: GET /some.file HTTP/1.0\r\n
There is no point in a smart IDS scanning the headers (although some do, which means they're using hybrid smart/raw tactics to balance speed with efficiency). The ID system can stop looking after the "HTTP/1.0\r\n". But they must be careful if they do. Imagine the following submission: GET /%20HTTP/1.0%0d%0aHeader:%20/../../cgi-bin/some.cgi HTTP/1.0\r\n\r\n This translates to: GET / HTTP/1.0\r\nHeader: /../../cgi-bin/some.cgi HTTP/1.0\r\n\r\n Or, if you will: GET / HTTP/1.0\r\n
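Which is a valid request! Assuming the IDS decodes the encoding first, it will stop scanning at our fake 'premature' ending, rather than the real one. The proper approach is to extract the portion of the request you wish to examine first, and only then decode the encoded characters.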
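The ordering problem can be sketched as follows. This is a minimal illustration, not any ID system's actual parser; both function names are hypothetical, and the second one assumes the request line has the usual method, URI and version fields separated by spaces.

```python
from urllib.parse import unquote

RAW = "GET /%20HTTP/1.0%0d%0aHeader:%20/../../cgi-bin/some.cgi HTTP/1.0\r\n\r\n"

def uri_decode_first(raw: str) -> str:
    """Flawed order: decode everything, then cut at the first CRLF."""
    first_line = unquote(raw).split("\r\n", 1)[0]
    return first_line.split(" ")[1]

def uri_extract_first(raw: str) -> str:
    """Safer order: cut at the real CRLF, then decode only the URI."""
    first_line = raw.split("\r\n", 1)[0]
    _method, rest = first_line.split(" ", 1)
    uri, _version = rest.rsplit(" ", 1)
    return unquote(uri)

print(uri_decode_first(RAW))                          # /
print("/cgi-bin/some.cgi" in uri_extract_first(RAW))  # True
```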
Going further into the design of smart ID systems, you have the issues of parameters, which are submitted with dynamic content. Parameters to a page typically look like: somepage.php?name=rfp&prog=whisker&enemy=IDS&param=... Obviously the data in the parameters need not be scanned (if you're only looking for particular file requests). Again, in an effort to save time and processing power, a smart IDS can stop processing once the '?' is reached, which indicates the rest of the data are parameters. Well, like the premature request end tactic, we can fake this anomaly as well: GET /index.htm%3fparam=/../cgi-bin/some.cgi HTTP/1.0 This translates to: GET /index.htm?param=/../cgi-bin/some.cgi HTTP/1.0 Again, this is a valid request. The proper method
of parsing is similar to the method I mentioned earlier--extract the portion
you wish to examine before decoding the encoded characters. This
tactic is implemented in whisker v1.3 as -I 5.
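A short sketch of the two parsing orders, using the request above (the function names are hypothetical and only illustrate the ordering, not a real IDS parser):

```python
from urllib.parse import unquote

URI = "/index.htm%3fparam=/../cgi-bin/some.cgi"

def path_decode_first(uri: str) -> str:
    """Flawed order: decode, then drop everything after the first '?'."""
    return unquote(uri).split("?", 1)[0]

def path_split_first(uri: str) -> str:
    """Order described above: cut at a literal '?', then decode."""
    return unquote(uri.split("?", 1)[0])

print(path_decode_first(URI))  # /index.htm
print(path_split_first(URI))   # /index.htm?param=/../cgi-bin/some.cgi
```

With the first order the scanned string no longer contains "/cgi-bin/some.cgi"; with the second it still does.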
As I mentioned, a smart ID system could feasibly extract the URI of a request, possibly chop off the parameters, and then scan only within the leftover string. According to the HTTP RFC, a v1.0 request looks like: Method <space> URI <space> HTTP/ Version CRLF CRLF The key is that HTTP calls for spaces to separate the three components, and that the components appear in the specified order. This means it's easy to extract specific portions of the request--you merely need to use the spaces as separators, and adjust accordingly. Interestingly enough, Apache 1.3.6 and newer (and perhaps earlier versions; I have not traced the history of this 'feature') allow you to specify a slightly different syntax: Method <tab> URI <tab> HTTP/ Version CRLF CRLF This will ruin any processing dependent on the 'assumed' RFC format of a request. Even more specifically, there are ID systems that implement minimal signatures that depend on the trailing space for matching. For example, matching "/phf" could lead to many false positives, but "/phf " (notice the trailing space) helps assure that the final requested page is closer to the actual 'phf', and not just starting with the letters 'phf'. Also keep in mind HTTP v0.9 syntax, which is simply: GET <space> URI CRLF This means that ID systems depending on having three parameters may be confused by v0.9 requests; however, v0.9 only provides the GET method, and returns no headers--making automatic processing by CGI scanners much more difficult. Whisker v1.3 currently handles the tab separation
(-I 6). Whisker does not currently use any sort of v0.9 requesting
by default; however, you can code a script to implement this fairly easily.
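To illustrate why the tab-separated form breaks space-based processing, here is a small sketch. The function names are hypothetical, and splitting on any whitespace is only an assumption about how a parser could tolerate the syntax that Apache accepts.

```python
def split_on_spaces(request_line: str) -> list:
    """Parser that assumes the RFC format: single spaces as separators."""
    return request_line.split(" ")

def split_on_whitespace(request_line: str) -> list:
    """More tolerant parser: any run of spaces or tabs separates fields."""
    return request_line.split()

tabbed = "GET\t/cgi-bin/phf\tHTTP/1.0"
print(len(split_on_spaces(tabbed)))  # 1: no URI can be extracted
print(split_on_whitespace(tabbed))   # ['GET', '/cgi-bin/phf', 'HTTP/1.0']
print("/phf " in tabbed)             # False: trailing-space signature misses
```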
An optimization of some raw ID systems is to only look within the first xx bytes of the request. Generally this works well, since the first line of the request needs to contain the URI. However, we can exploit this by submitting a request along the lines of: GET /rfprfp<lots of characters>rfprfp/../cgi-bin/some.cgi HTTP/1.0 The key is to include enough characters to move the
rest of the submitted request outside the scope of the ID systems' scan
limit. However, this tactic is very noisy in the web server logs,
especially when you are submitting 1-2K worth of random characters per
request. Whisker, by default, will submit 1-2K of random characters
when the -I 4 option is specified. The actual amount submitted is controlled
by the XXIDSMode4Limit variable.
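A rough sketch of the padding idea follows. The helper names, the 512-byte scan limit and the default padding range are assumptions for illustration; the range simply mirrors the "1-2K" figure above and is not whisker's actual code.

```python
import random
import string

def pad_uri(target: str, min_pad: int = 1024, max_pad: int = 2048, rng=None) -> str:
    """Prefix the target with a long junk directory, then step back out of it."""
    rng = rng or random.Random()
    length = rng.randint(min_pad, max_pad)
    junk = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return "/" + junk + "/.." + target

def limited_scan(request: str, signature: str, limit: int = 512) -> bool:
    """Hypothetical raw IDS that inspects only the first `limit` bytes."""
    return signature in request[:limit]

request = "GET " + pad_uri("/cgi-bin/some.cgi") + " HTTP/1.0"
print(limited_scan(request, "/cgi-bin/some.cgi"))  # False
print("/cgi-bin/some.cgi" in request)              # True
```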
Everyone has heard the story that Microsoft separates
directories using '\' simply because Unix uses '/'. However, if you notice
in the HTTP RFC, the syntax calls for '/'. That means Microsoft, with all
their ingenuity, lost the battle and must silently convert from '/' to
'\' internally in IIS (as well as all other DOS/Windows based web servers).
Interestingly enough, we can still use '\' in our requests, since they
are still valid as directory separators--this means on DOS/Windows platforms,
we could use requests such as "/cgi-bin\some.cgi", which will not match
a typical "/cgi-bin/some.cgi" signature. Note that the first character
of a URI must still be a '/', and not a '\'. This is tactic -I 8.
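As a sketch of the substitution and of the normalization an ID system would need before matching (both function names are hypothetical):

```python
def to_backslashes(uri: str) -> str:
    """Keep the leading '/', turn every later '/' into a backslash."""
    return uri[:1] + uri[1:].replace("/", "\\")

def normalize_separators(uri: str) -> str:
    """IDS-side counterpart: treat a backslash as '/' before matching."""
    return uri.replace("\\", "/")

mutated = to_backslashes("/cgi-bin/some.cgi")
print(mutated)                           # /cgi-bin\some.cgi
print("/cgi-bin/some.cgi" in mutated)    # False
print(normalize_separators(mutated))     # /cgi-bin/some.cgi
```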
Many C string libraries use the NULL character to denote the end of the string. While I doubt most ID systems use these libraries (they are typically too slow for these high-speed applications), the practice of using NULLs to denote the end of strings is still quite common. We can use this to our advantage with the following type of request: GET%00 /cgi-bin/some.cgi HTTP/1.0 The theoretical flow of this tactic goes:
NOTE: Apache will not process any request that contains '%00' or '%2f'. However, this method has been found to work with IIS. All others are untested. Remember, the web server still has to see it as a valid request for it to be usable. Use the -I 0 option to invoke this tactic in whisker.
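One way to picture the effect, under the assumption that the ID system decodes the request and then handles the buffer with NULL-terminated string logic (c_string_view is a hypothetical helper that imitates that logic):

```python
from urllib.parse import unquote_to_bytes

def c_string_view(buf: bytes) -> bytes:
    """What NULL-terminated string routines would see of the buffer."""
    return buf.split(b"\x00", 1)[0]

request = b"GET%00 /cgi-bin/some.cgi HTTP/1.0"
decoded = unquote_to_bytes(request)
print(c_string_view(decoded))                          # b'GET'
print(b"/cgi-bin/some.cgi" in c_string_view(decoded))  # False
```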
The DOS/Windows filesystem has a unique characteristic
that Unix doesn't: filenames are case insensitive. This means requests
for "index.htm", "INDEX.HTM" and "Index.Htm" are all the same. In our case,
the signature "/cgi-bin/some.cgi" does not literally match "/CGI-BIN/SOME.CGI".
In an optimal environment we should mix the case randomly throughout; however,
whisker v1.3 implements this simply by capitalizing all characters when the
-I 7 option is used.
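Both variants, plus the case-insensitive comparison an ID system would need, can be sketched like this. The function names are hypothetical; mix_case shows the random mixing described as optimal, not what whisker v1.3 does.

```python
import random

def upper_all(uri: str) -> str:
    """The simple variant: capitalize every character."""
    return uri.upper()

def mix_case(uri: str, rng=None) -> str:
    """Randomly pick upper or lower case for each character."""
    rng = rng or random.Random()
    return "".join(ch.upper() if rng.random() < 0.5 else ch.lower() for ch in uri)

def case_insensitive_match(uri: str, signature: str) -> bool:
    """IDS-side counterpart: compare after lowercasing both sides."""
    return signature.lower() in uri.lower()

print(upper_all("/cgi-bin/some.cgi"))  # /CGI-BIN/SOME.CGI
print(case_insensitive_match(mix_case("/cgi-bin/some.cgi"), "/cgi-bin/some.cgi"))  # True
```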
Session splicing is the only network-level anti-ID system tactic in whisker at the moment. Many raw ID systems, as well as some smart ones, only scan for a particular signature within the current packet--signatures are not split up and checked across multiple packets. Whisker exploits this by sending parts of the request in different packets. Note that this is not fragmentation; it is just multiple packets for the data. For example, the request "GET / HTTP/1.0" may be split across multiple packets to be "GE", "T ", "/", " H", "T", "TP", "/1", ".0". The current implementation in whisker (invoked with -I 9) will result in 1-3 characters in each packet, depending on your system and network speed. The proper defense to this tactic is session reassembly;
however, to reassemble a session, you must understand the protocol and
its definition of a 'session'. Therefore, by implementing session reassembly,
you have incurred a large overhead in interpreting the protocol.
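The splitting and the two detection strategies can be sketched without touching the network. The function names are hypothetical, and the sketch only models the data in each packet; it does not send anything.

```python
import random

def splice(data: bytes, max_chunk: int = 3, rng=None) -> list:
    """Split data into consecutive chunks of 1..max_chunk bytes."""
    rng = rng or random.Random()
    chunks, pos = [], 0
    while pos < len(data):
        size = rng.randint(1, max_chunk)
        chunks.append(data[pos:pos + size])
        pos += size
    return chunks

def per_packet_match(chunks: list, signature: bytes) -> bool:
    """IDS that checks each packet's data on its own."""
    return any(signature in chunk for chunk in chunks)

def reassembled_match(chunks: list, signature: bytes) -> bool:
    """IDS that reassembles the session before checking."""
    return signature in b"".join(chunks)

request = b"GET /cgi-bin/some.cgi HTTP/1.0\r\n\r\n"
chunks = splice(request)
print(per_packet_match(chunks, b"/cgi-bin/some.cgi"))   # False
print(reassembled_match(chunks, b"/cgi-bin/some.cgi"))  # True
```

Whether each chunk really ends up in its own packet depends on the sending TCP stack and timing, which is outside the scope of this sketch.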
That basically completes the overview of anti-IDS tactics used in whisker. Starting in version 1.3 you can use the -I option to invoke the anti-IDS features. Multiple tactics can be used together by specifying multiple types, such as: whisker.pl -h www.server.com -I 124 This will invoke tactics 1, 2 and 4 to be used in
conjunction with each other. Note that particular combinations may not
work well together and have not been tested--use your best judgement.
Whisker is available for download from www.wiretrip.net/rfp/. Current version is 1.3 (12/24/99).