How to run userland code from the kernel on Windows – Version 2.0

Helix

Introduction

Two years ago, Thierry F. wrote an article on this blog about a technique that allows a driver to inject a DLL into a process (https://thisissecurity.net/2014/04/08/how-to-run-userland-code-from-the-kernel-on-windows/). It was based on reverse engineering the PEB.KernelCallbackTable field, which is untyped and completely undocumented.

As you may have learned from that article, this opaque pointer leads to a large table of pointers related to User32.dll. This means that a process that does not load User32.dll will not have its PEB.KernelCallbackTable field initialized.

Furthermore, because this is completely undocumented by Microsoft, it is not supported at all. The pointer to User32!__ClientLoadLibrary has been located in PEB.KernelCallbackTable since at least Windows XP, but its index in the table changes with each new version of Windows.

What if I told you that there is a somewhat better documented way to do the same thing, one that doesn’t rely on any reverse engineering or assembly code at all? And what if I added that this DLL is invisible to any API (GetModuleHandle, for example) and even to the !peb or !dlls commands in WinDbg? Is that enough to tickle your curiosity?

Just a note before we start: none of the code you’ll see here is bulletproof or production-ready. It is only a sample, a proof of concept meant to illustrate the technique. I am aware that there are some loopholes that need to be fixed, but consider them an exercise left to the reader. You can find all the code discussed in this article in this repository: https://github.com/stormshield/Beholder-Win32
Note that the code in this article is also stripped of some details (such as support for injecting into WoW64 processes) for readability, but the GitHub repository holds the full code.

Peeping through the keyhole

When a program loads a DLL, a cascade of events happens to this DLL. Roughly:

  • A handle is opened on this DLL
  • A section of the mapped size of the DLL is created
  • The DLL is mapped into this section
  • The loader does some internal work, such as taking care of some alignments, sections’ rights, etc.
  • Eventually, the DLL’s entry point (DllMain) is called

Again, this is a very rough description; the goal here is not to dive into every detail of this process. Anyway, what prevents a driver from doing exactly this? Opening a handle can easily be done with ZwCreateFile, and mapping the DLL with ZwCreateSection/ZwMapViewOfSection. That leaves two problematic steps: the job done by the loader and the execution of a function in this DLL. Fortunately, we can ask the Windows kernel to do that for us.

Before diving into the kernel code, let’s talk about the final objective here: the DLL. This DLL must NOT have any dependency. The loader will not resolve them for us (at least with the technique described here) and it could be complicated (and even unsafe since we are talking about userland space) to do it manually. The loader will also not fill the IAT (Import Address Table) of the DLL, so it’s a nice plus if the DLL can run without any imports. Obviously, for that kind of situation, kernel32!LoadLibrary and kernel32!GetProcAddress are our best friends (well, you will need to find those manually though).

I also promised there would be no assembly, so it must be done fully in C. Now let’s take a look at the kernel code.

Dive into the rabbit hole

To inject a DLL into every process, we need to be in each process’ context. For that, nothing beats a notification callback. Let’s kill two birds with one stone and set up the LoadImage notification callback (PsSetLoadImageNotifyRoutine). Why this one? Because it is guaranteed to be called every time a process is created and, as the cherry on the cake, it gives you the addresses of some interesting DLLs, such as kernel32.dll. I suggest calling the function that injects the DLL into the current process when this callback notifies you that kernel32 is being mapped.

In this function, we’ll retrieve a handle on the current process:

Status = ObOpenObjectByPointer(PsGetCurrentProcess(),
                               OBJ_KERNEL_HANDLE,
                               NULL,
                               STANDARD_RIGHTS_READ,
                               NULL,
                               KernelMode,
                               &ProcessHandle);
if (!NT_SUCCESS(Status))
    return Status;

Now, we’re going to create and map the section in the current process memory:

InitializeObjectAttributes(&ObjectAttributes, NULL, OBJ_KERNEL_HANDLE, NULL, NULL);
Status = ZwCreateSection(&DllSectionHandle,
                         SECTION_MAP_READ | SECTION_MAP_EXECUTE | SECTION_QUERY,
                         &ObjectAttributes,
                         NULL,
                         PAGE_EXECUTE_READ,
                         SEC_IMAGE,
                         gDllHandle);
if (!NT_SUCCESS(Status))
{
    // cleanup
    return Status;
}

Status = ZwMapViewOfSection(DllSectionHandle,
                            ProcessHandle,
                            &DllMappingAddress,
                            0,
                            0,
                            NULL,
                            &ViewSize,
                            ViewUnmap,
                            0,
                            PAGE_EXECUTE_READ);
if (!NT_SUCCESS(Status))
{
    // cleanup
    return Status;
}

We’re mapping the DLL with read and execute rights only, which prevents userland from modifying the DLL in any way. We also set the SEC_IMAGE flag, which tells the kernel to map the file as an executable image. This means that it performs all the required fixups and alignment. But it will not resolve the imports of this DLL!

Now that our DLL is mapped, we can easily call a function of our DLL. But first, we may need to pass this function some parameters for it to run properly. To do that, we first allocate some userland memory that will hold those parameters:

InitializeObjectAttributes(&ObjectAttributes, NULL, OBJ_KERNEL_HANDLE, NULL, NULL);
MappingSize.QuadPart = PAGE_SIZE;
Status = ZwCreateSection(&InputSectionHandle,
                         SECTION_MAP_READ | SECTION_QUERY,
                         &ObjectAttributes,
                         &MappingSize,
                         PAGE_READONLY,
                         SEC_COMMIT | SEC_NO_CHANGE,
                         NULL);
if (!NT_SUCCESS(Status))
{
    // cleanup
    return Status;
}

InputMappingAddress = NULL;
ViewSize = PAGE_SIZE;
Status = ZwMapViewOfSection(InputSectionHandle,
                            ProcessHandle,
                            &InputMappingAddress,
                            0,
                            PAGE_SIZE,
                            0,
                            &ViewSize,
                            ViewUnmap,
                            0,
                            PAGE_READONLY);
if (!NT_SUCCESS(Status))
{
    //cleanup
    return Status;
}

This page will only have the read right. At the very least, that will prevent userland from tampering with your input parameters. To rule out modification even further, I set the SEC_NO_CHANGE flag. Now, even VirtualProtect or NtMapViewOfSection will fail if it targets our page. Note that you can also call MmSecureVirtualMemory on top of that. If you need to return some output from your DLL, I suggest allocating another page with the write right so that the input parameters remain unmodified no matter what.

Okay, one last step: we need to set up the input parameters of our DLL. Just create a custom structure with the input parameters and fill it. Here, I just need some information about kernel32.dll. An MDL is a nice way to get around the PAGE_READONLY right set previously on the input parameters. It allows us to map the same page into the kernel address space with both read and write rights enabled.

ParamMDL = IoAllocateMdl(InputMappingAddress, PAGE_SIZE, FALSE, FALSE, NULL);
if (ParamMDL == NULL)
{
    // cleanup
    return STATUS_UNSUCCESSFUL;
}

__try
{
    MmProbeAndLockPages(ParamMDL, UserMode, IoReadAccess);
}
__except (EXCEPTION_EXECUTE_HANDLER)
{
    // cleanup
    return STATUS_UNSUCCESSFUL;
}
SystemAddress = MmGetSystemAddressForMdlSafe(ParamMDL, NormalPagePriority);
if (SystemAddress == NULL)
{
    // cleanup
    return STATUS_UNSUCCESSFUL;
}

RtlZeroMemory(SystemAddress, PAGE_SIZE);
DllParam = (PDLL_PARAMS)SystemAddress;

DllParam->Kernel32Address = Kernel32Address;
DllParam->Kernel32Size = Kernel32Size;

MmUnlockPages(ParamMDL);
IoFreeMdl(ParamMDL); 

Last piece of the puzzle: the execution of our DLL. To make this DLL as small as possible, I didn’t define any exports, only an entry point fixed at offset 0x1000. Just use RtlCreateUserThread and you’re good to go. I suggest retrieving the entry point’s offset dynamically by parsing the PE header, though.

Status = RtlCreateUserThreadPtr(ProcessHandle,
                                NULL,
                                FALSE,
                                0,
                                0,
                                0,
                                (PUCHAR)DllMappingAddress + 0x1000,
                                InputMappingAddress,
                                &ThreadHandle,
                                &ClientID);
if (!NT_SUCCESS(Status))
{
    // cleanup
    return Status;
}
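
As a minimal sketch of that suggestion, the hard-coded 0x1000 could be replaced by the AddressOfEntryPoint field read from the PE headers of the mapped image. The helper names below (read_u32, get_entry_point_rva) are hypothetical and are not taken from the repository; the code works on a plain byte buffer so that it can be tried in userland first, and it assumes the headers are readable at the start of the mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value without assuming alignment. */
static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Returns the entry point RVA of a mapped PE image, or 0 when the headers
 * do not look valid. Hypothetical helper, for illustration only.
 */
uint32_t get_entry_point_rva(const uint8_t *image, size_t size)
{
    uint32_t pe_offset;

    /* DOS header: "MZ" magic, e_lfanew stored at offset 0x3C. */
    if (image == NULL || size < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return 0;

    pe_offset = read_u32(image + 0x3C);

    /* We need the signature (4 bytes), the file header (20 bytes) and
       the first 20 bytes of the optional header. */
    if (pe_offset > size || size - pe_offset < 4 + 20 + 20)
        return 0;
    if (memcmp(image + pe_offset, "PE\0\0", 4) != 0)
        return 0;

    /* AddressOfEntryPoint sits 16 bytes into the optional header,
       for both PE32 and PE32+ images. */
    return read_u32(image + pe_offset + 4 + 20 + 16);
}
```

The returned RVA would then be added to DllMappingAddress in place of the constant; a return value of 0 should be treated as a failure rather than as a valid offset.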

RtlCreateUserThread has one major drawback: it is not exported by the kernel on Windows 7, although it is exported on XP and again from Windows 8 onward. So, if you need to support Windows 7, you will have to find another solution. I may or may not have something up my sleeve that I will reveal a bit later about this situation 😉

I’m the one who knocks

Now, a thread has been created, even before the main thread of the process has reached the entry point of the program. This thread will not run right away, since the process is in the middle of some DLL initialization. So I cannot wait for my new thread to finish, as that would deadlock the current process (and, in the end, the workstation). Yet I need to know when my thread is done in order to clean up the sections, read the output left by my userland code, etc. If only there were some kind of notification where I could register a custom callback that would be called whenever a thread has finished…

Oh wait!
Yup, you guessed it! PsSetCreateThreadNotifyRoutine will be our friend here. You just need to save a context when you create the thread with RtlCreateUserThread; then, in the callback registered through PsSetCreateThreadNotifyRoutine, look for any terminating thread that matches the CLIENT_ID received as the output parameter of RtlCreateUserThread. You’ll end up with something like this:

PINJECT_CONTEXT SearchForContext(__in HANDLE ProcessID, __in HANDLE ThreadID)
{
    PLIST_ENTRY     CurrentElement = NULL;
    PCTX_LIST       CurrentContext = NULL;
    PINJECT_CONTEXT FoundContext   = NULL;

    // ERESOURCE locks require normal kernel APC delivery to be disabled
    KeEnterCriticalRegion();
    ExAcquireResourceExclusiveLite(&ContextListLock, TRUE);

    for (CurrentElement = ContextHeadList.Flink;
         CurrentElement != &ContextHeadList;
         CurrentElement = CurrentElement->Flink)
    {
        CurrentContext = (PCTX_LIST)CONTAINING_RECORD(CurrentElement,
                                                      CTX_LIST,
                                                      ListEntry);
        if (CurrentContext->InjectContext == NULL)
            continue;
        if (CurrentContext->InjectContext->ClientID.UniqueProcess == ProcessID &&
            CurrentContext->InjectContext->ClientID.UniqueThread == ThreadID)
        {
            // Match found: unlink the entry and hand the context to the caller
            FoundContext = CurrentContext->InjectContext;
            RemoveEntryList(CurrentElement);
            break;
        }
    }

    ExReleaseResourceLite(&ContextListLock);
    KeLeaveCriticalRegion();

    // NULL when no entry matches this process/thread pair
    return FoundContext;
}

VOID ThreadNotification(__in HANDLE ProcessID,
                        __in HANDLE ThreadID,
                        __in BOOLEAN Create)
{
    PINJECT_CONTEXT InjectContext = NULL;

    if (Create == TRUE)
        return;
        
    InjectContext = SearchForContext(ProcessID, ThreadID);
    if (InjectContext == NULL)
        return;

    // cleanup

    ExFreePoolWithTag(InjectContext, 'ewom');
}

You will notice that the thread created by RtlCreateUserThread can finish even before the process has started! In some situations, this can be very useful. Note that you can also call PsGetThreadExitStatus to retrieve the exit code of your thread.

There is no spoon

Now, the final step: the DLL itself. It must be compiled in a specific way if you want it without any IAT.

First, remove all dependencies and default libraries (the /NODEFAULTLIB linker option).

Then, disable the security checks that require the CRT. This means no /GS and no /RTC.

Finally, set your entry point to whatever function you want (the /ENTRY linker option).

And voilà! You’re all set. You can obviously remove some more options to make the DLL smaller, but that’s up to you. It is possible to get a DLL smaller than 4KB that is still able to retrieve LdrLoadDll or GetProcAddress by itself. The function designated as the entry point of your DLL will have this prototype:

INT main(PDLL_PARAMS DllParams)
{
    if (DllParams == NULL ||
        DllParams->Kernel32Address == NULL ||
        DllParams->Kernel32Size == 0)
        return 0;

    // do whatever you want here

    return 1;
}

PDLL_PARAMS is a pointer to the structure I defined earlier. It is passed as the 8th parameter of RtlCreateUserThread (InputMappingAddress in the call above). Yup, you can share any kind of data blob between your driver and your DLL. Now, you’re free to parse kernel32 in order to search for LoadLibrary and GetProcAddress, all using plain C code. Even better, you don’t have to worry about things like relocations or relative addresses. For example, using strlen or strcmp on a constant string, as below, works like a charm, all thanks to the DLL format and the loader fixing everything for us. You can even use globals without any issue.

if (strlen(CurrentFunctionName) == (sizeof("LdrLoadDll") - 1) &&
    !strcmp(CurrentFunctionName, "LdrLoadDll"))
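
To make the "parse kernel32" step more concrete, here is a minimal sketch of a by-name lookup in the export directory of a mapped image. It is not the code from the repository: find_export_rva, rd16 and rd32 are hypothetical names, the image is assumed to be valid (no bounds checks), and forwarded exports are ignored. The offsets come from the PE format: NumberOfNames, AddressOfFunctions, AddressOfNames and AddressOfNameOrdinals sit at offsets 24, 28, 32 and 36 of the export directory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Returns the RVA of the export called `name` in the image mapped at `base`,
 * or 0 if it is not found. Hypothetical helper: the image is assumed valid.
 */
uint32_t find_export_rva(const uint8_t *base, const char *name)
{
    const uint8_t *nt  = base + rd32(base + 0x3C); /* e_lfanew          */
    const uint8_t *opt = nt + 4 + 20;              /* optional header   */
    /* Data directories start at offset 96 (PE32) or 112 (PE32+, magic 0x20B). */
    const uint8_t *dir = opt + (rd16(opt) == 0x20B ? 112 : 96);
    const uint8_t *exp;
    uint32_t count, functions, names, ordinals, i;

    if (rd32(dir) == 0)
        return 0;                                  /* no export directory */

    exp       = base + rd32(dir);
    count     = rd32(exp + 24);                    /* NumberOfNames         */
    functions = rd32(exp + 28);                    /* AddressOfFunctions    */
    names     = rd32(exp + 32);                    /* AddressOfNames        */
    ordinals  = rd32(exp + 36);                    /* AddressOfNameOrdinals */

    for (i = 0; i < count; i++) {
        const char *current = (const char *)(base + rd32(base + names + 4 * i));
        if (strcmp(current, name) == 0) {
            /* The ordinal table maps the name index to a function index. */
            uint16_t index = rd16(base + ordinals + 2 * i);
            return rd32(base + functions + 4 * index);
        }
    }
    return 0;
}
```

In the DLL, `base` would be DllParams->Kernel32Address, and the returned RVA added to that base gives the function pointer.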

And, last but not least, if you want to use functions like strcmp, strlen, etc. as in the example above but don’t want any imports (which is recommended here), set the /Oi option in your DLL’s project. You can find the list of usable intrinsic functions here: https://msdn.microsoft.com/en-us/library/tzkfha43.aspx

The cake is a lie

I must confess, I lied a bit in the introduction. We are not really loading a DLL, but using the structure of a DLL to wrap up some code. This allows us to create a file-backed section and map it into the userland memory of the current process while using documented functions. Even the loader helps us (a little bit at least) in our journey!

For me, this technique is more reliable than the one using KeUserModeCallback(). It even works like a charm with both 32-bit and 64-bit processes: you just need to recompile for the desired architecture. WoW64 processes can be injected too, but take care of your pointer sizes when filling the input parameters, since your driver will be 64-bit and your DLL will be 32-bit. The code above might need some adjustments here and there, but nothing dramatic. The only major drawback is that it won’t work on Windows 7, since RtlCreateUserThread is not exported there. In the end, it’s not perfect, but it’s pretty close. In fact, it may even be the best technique to inject code from a driver you’ve ever seen.
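
The pointer-size caveat for WoW64 can be illustrated with a small sketch. The article does not show the definition of DLL_PARAMS, so the two layouts below (DLL_PARAMS64, DLL_PARAMS32) and the helper fill_params32 are assumptions made for illustration, using fixed-width fields so that the 64-bit driver and the 32-bit DLL agree on the layout.

```c
#include <assert.h>
#include <stdint.h>

/* Layout as seen by a 64-bit DLL (assumed field types). */
typedef struct _DLL_PARAMS64 {
    uint64_t Kernel32Address;
    uint64_t Kernel32Size;
} DLL_PARAMS64;

/* Layout as seen by a 32-bit DLL running in a WoW64 process (assumed). */
typedef struct _DLL_PARAMS32 {
    uint32_t Kernel32Address;
    uint32_t Kernel32Size;
} DLL_PARAMS32;

/*
 * Fills the 32-bit layout from values held by the 64-bit driver.
 * Returns 1 on success, 0 if a value does not fit in 32 bits.
 */
int fill_params32(DLL_PARAMS32 *out, uint64_t address, uint64_t size)
{
    if (address > UINT32_MAX || size > UINT32_MAX)
        return 0;

    out->Kernel32Address = (uint32_t)address;
    out->Kernel32Size    = (uint32_t)size;
    return 1;
}
```

The driver would pick the layout according to the bitness of the target process before writing through the MDL mapping; writing the 64-bit layout into the page of a 32-bit DLL would make it read the wrong fields.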

How to run userland code from the kernel on Windows

Introduction

Before Windows NT 4.0, the graphical part of the Windows subsystem was implemented completely in userland. Starting from NT 4.0, Microsoft decided to move a large part of the Window Manager and the Graphics Device Interface to kernel mode, in the Win32k.sys component. However, part of the implementation is still present in userland, and the kernel component needs to call back user-mode code. To do so, Microsoft implemented a ‘reverse’ system call, allowing the kernel to call userland code. The whole process has already been discussed and explained in previous articles, so we will not detail it again. Please refer to Tarjei Mandt’s white paper, which contains a comprehensive description of the mechanism.

In this post, we detail how Windows (from Windows XP to Windows 8) uses this mechanism to load modules in running processes. Understanding the mechanism may allow you to use it for your own purposes, in particular as a way to inject custom DLLs into processes while running code in the kernel portion of the Windows operating system. A recent article published by @zer0mem uses the ‘reverse’ system call mechanism to execute code in user mode from kernel code. This post offers an alternative approach if you are free to drop a Windows binary file on the file system.


User-mode callbacks walkthrough

General mechanism

The function that makes it possible to call user code from the kernel is exported by the Windows kernel under the name KeUserModeCallback. Its prototype is:

NTSTATUS KeUserModeCallback (
    IN  ULONG   ApiNumber,
    IN  PVOID   InputBuffer,
    IN  ULONG   InputLength,
    OUT PVOID  *OutputBuffer,
    OUT PULONG  OutputLength
);

Since this function is not present in any WDK header, you have to retrieve it dynamically with the help of an MmGetSystemRoutineAddress call.

What this function basically does is copy these parameters onto the userland stack and return to userland code (in the ntdll!KiUserCallbackDispatcher function).

Callable functions are identified by the ApiNumber parameter. This is a zero-based index in an array accessible through the KernelCallbackTable field of the Process Environment Block.

This field is initialized when the user32 module is loaded in the process (before initialization, the field is NULL). The initialization takes place in the function user32!UserClientDllInitialize (the entry point of the user32 DLL) and basically makes the KernelCallbackTable field point to the non-exported user32!apfnDispatch symbol.

kd> dt nt!_PEB @$peb
+0x000 InheritedAddressSpace : 0 ''
+0x001 ReadImageFileExecOptions : 0 ''
+0x002 BeingDebugged : 0 ''
+0x003 SpareBool : 0 ''
+0x004 Mutant : 0xffffffff Void
+0x008 ImageBaseAddress : 0x00400000 Void
+0x00c Ldr : 0x00251e90 _PEB_LDR_DATA
+0x010 ProcessParameters : 0x00020000 _RTL_USER_PROCESS_PARAMETERS
+0x014 SubSystemData : (null)
+0x018 ProcessHeap : 0x00150000 Void
+0x01c FastPebLock : 0x7c990620 _RTL_CRITICAL_SECTION
+0x020 FastPebLockRoutine : 0x7c911000 Void
+0x024 FastPebUnlockRoutine : 0x7c9110e0 Void
+0x028 EnvironmentUpdateCount : 1
+0x02c KernelCallbackTable : 0x7e392970 Void
+0x030 SystemReserved : [1] 0
+0x034 AtlThunkSListPtr32 : 0
...

This table contains function pointers to various userland callable functions, all of them located in the user32 module. The contents (thus the index of the functions) and the length of the table depend on the operating system version.

Here is an example (truncated) displaying a function table in the Windows XP SP3 32 bits process:

kd> dps 0x7e392970 L0n98
7e392970 7e3a7f3c USER32!__fnCOPYDATA
7e392974 7e3d87b3 USER32!__fnCOPYGLOBALDATA
...
7e392a38 7e3d8eb9 USER32!__ClientCopyDDEIn1
7e392a3c 7e3d8efb USER32!__ClientCopyDDEIn2
7e392a40 7e3d8f5e USER32!__ClientCopyDDEOut1
7e392a44 7e3d8f2d USER32!__ClientCopyDDEOut2
7e392a48 7e3aeb09 USER32!__ClientCopyImage
7e392a4c 7e3d8f92 USER32!__ClientEventCallback
7e392a50 7e3b19f6 USER32!__ClientFindMnemChar
7e392a54 7e3a28f3 USER32!__ClientFontSweep
7e392a58 7e3d8e4c USER32!__ClientFreeDDEHandle
7e392a5c 7e3a82ff USER32!__ClientFreeLibrary
7e392a60 7e39f4b2 USER32!__ClientGetCharsetInfo
7e392a64 7e3d8e83 USER32!__ClientGetDDEFlags
7e392a68 7e3d8fdc USER32!__ClientGetDDEHookData
7e392a6c 7e3cf9f5 USER32!__ClientGetListboxString
7e392a70 7e39ec46 USER32!__ClientGetMessageMPH
7e392a74 7e3a16eb USER32!__ClientLoadImage
7e392a78 7e3a8023 USER32!__ClientLoadLibrary
7e392a7c 7e3aec03 USER32!__ClientLoadMenu
7e392a80 7e39ee0d USER32!__ClientLoadLocalT1Fonts
7e392a84 7e3a09e4 USER32!__ClientLoadRemoteT1Fonts
7e392a88 7e3d907b USER32!__ClientPSMTextOut
7e392a8c 7e3d90d1 USER32!__ClientLpkDrawTextEx
7e392a90 7e3d9135 USER32!__ClientExtTextOutW
7e392a94 7e3d919a USER32!__ClientGetTextExtentPointW
7e392a98 7e3d9019 USER32!__ClientCharToWchar
7e392a9c 7e39ed14 USER32!__ClientAddFontResourceW
7e392aa0 7e39a13e USER32!__ClientThreadSetup
7e392aa4 7e3d9253 USER32!__ClientDeliverUserApc
7e392aa8 7e3d91f1 USER32!__ClientNoMemoryPopup
7e392aac 7e3aa740 USER32!__ClientMonitorEnumProc
7e392ab0 7e3d944a USER32!__ClientCallWinEventProc
7e392ab4 7e3d8e15 USER32!__ClientWaitMessageExMPH
7e392ab8 7e3acf8e USER32!__ClientWOWGetProcModule
7e392abc 7e3d948d USER32!__ClientWOWTask16SchedNotify
7e392ac0 7e3d9266 USER32!__ClientImmLoadLayout
7e392ac4 7e3d92c2 USER32!__ClientImmProcessKey
...
7e392af4 7e3d950c USER32!__fnOUTLPSCROLLBARINFO

Conditions for calling KeUserModeCallback

Before calling KeUserModeCallback, you must first check that the KernelCallbackTable field of the Process Environment Block is not NULL (KeUserModeCallback will not do this for you). This field is at offset 0x2c on a 32-bit system and 0x58 on a 64-bit system (from Windows XP to Windows 8). Failing to do so will eventually lead to a BSOD.
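
As an illustration of that check, the sketch below reads the KernelCallbackTable field from a raw copy of a PEB using the two offsets quoted above. The function name read_kernel_callback_table and the idea of working on a byte copy are assumptions made to keep the example testable outside the kernel; a driver would read the field from the PEB of the target process instead.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets quoted in this article (Windows XP to Windows 8). */
#define PEB32_KERNEL_CALLBACK_TABLE 0x2c
#define PEB64_KERNEL_CALLBACK_TABLE 0x58

/*
 * Reads the KernelCallbackTable pointer from a raw little-endian copy of a
 * PEB. A result of 0 means the field is NULL (user32 not loaded), in which
 * case KeUserModeCallback must not be called.
 */
uint64_t read_kernel_callback_table(const uint8_t *peb, int is_64bit)
{
    uint32_t offset = is_64bit ? PEB64_KERNEL_CALLBACK_TABLE
                               : PEB32_KERNEL_CALLBACK_TABLE;
    uint32_t width  = is_64bit ? 8 : 4;   /* pointer size in bytes */
    uint64_t value  = 0;
    uint32_t i;

    for (i = 0; i < width; i++)
        value |= (uint64_t)peb[offset + i] << (8 * i);

    return value;
}
```

With the Windows XP dump shown earlier, the four bytes at offset 0x2c hold 0x7e392970, the address of the callback table.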

On Windows XP, the operating system does not place any condition on the state of the current thread for calling the KeUserModeCallback function, so it is safe calling the function whenever you want.

Starting from Windows Vista, things are different. The KeUserModeCallback function checks for the presence of the CallOutActive flag in the Flags field of the current _KTHREAD structure (this flag is set at least by the nt!KeExpandKernelStackAndCalloutEx function). If it is present, the operating system issues a bugcheck with the undocumented code 0x107.

On Windows 8, the Microsoft developers added even more constraints that must be met for the call to succeed.

The first check performed by Windows 8 ensures that the current thread runs at PASSIVE_LEVEL. If not, the operating system issues a bugcheck with code 0x4A (IRQL_GT_ZERO_AT_SYSTEM_SERVICE).

Then, the operating system checks whether APCs are enabled. If not, the operating system issues a bugcheck with code 1 (APC_INDEX_MISMATCH).

Finally, the operating system checks the CallbackNestingLevel field of the current thread. If this value reaches 32, the function fails with code 0xC00000FD (STATUS_STACK_OVERFLOW). This field is updated by KeUserModeCallback to record the number of nested calls to user-mode callbacks.


User-mode callback for loading a library

Among the interesting functions, we can notice the user32!__ClientLoadLibrary function pointer.

This functionality is natively used by win32k.sys to inject uxtheme.dll into running processes, allowing the operating system to apply visual styles to applications.

This operation is twofold. First, it effectively loads the module in the process memory, as if it were loaded by userland code. Then, a function called ThemeInitApiHook is invoked giving uxtheme.dll a chance to provide alternate implementations for various functions used by user32. We will not dive into the details of how this initialization function is called and what the patched functions are used for. We will just try to describe the parameters needed to load a module without calling any specific initialization function.

Function index

The first parameter requested is the ApiNumber. The value for the ‘load library’ feature depends on the operating system version.

[Table: ApiNumber of the ‘load library’ callback for each Windows version (image not available)]
From now on, the index does not depend on the operating system flavor (32 bits or 64 bits).
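
When the index for a given system is unknown, it can be derived from a debugger dump of the table: it is the distance between the slot and the table base, divided by the pointer size. The helper below (callback_index is a hypothetical name) is only a sketch of that arithmetic, applied to the addresses visible in the dumps of this article.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Zero-based index of a slot in the callback table, given the table base,
 * the address of the slot and the pointer size of the process (4 or 8).
 */
uint32_t callback_index(uint64_t table_base, uint64_t slot_address,
                        uint32_t pointer_size)
{
    return (uint32_t)((slot_address - table_base) / pointer_size);
}
```

For the Windows XP SP3 32-bit dump above, the table starts at 0x7e392970 and USER32!__ClientLoadLibrary is stored at 0x7e392a78, which gives (0x7e392a78 - 0x7e392970) / 4 = 66. For the wow64win table shown later in this article, wow64win!whcbClientLoadLibrary is stored at 0x73e51718 and the table starts at 0x73e51510, which gives 0x208 / 8 = 65 on that system.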

Input buffer

The second and third parameters of the function are the input buffer and its associated length, in bytes.

The input buffer for the ‘load library’ feature is described by the following structure:

typedef struct _USERHOOK
{
    DWORD          dwBufferSize;
    DWORD          dwAdditionalData;
    DWORD          dwFixupsCount;
    LPVOID         pbFree;
    DWORD          offCbkPtrs;
    DWORD          bFixed;
    UNICODE_STRING lpDLLPath;
    union
    {
        DWORD          lpfnNotify;         /* Windows XP */
        UNICODE_STRING lpInitFunctionName; /* Windows Vista and later */
    };
    DWORD          offCbk[2];
} _USERHOOK_s;

This structure is a specialization of a more general mechanism that exists in the win32k.sys driver for user-mode callbacks: it is composed of a fixed header (from dwBufferSize to bFixed) followed by variable-length data (starting from lpDLLPath).

dwBufferSize contains the length of the whole buffer, including the variable-length data.
dwAdditionalData contains the length of the variable-length data.
pbFree is a pointer to the end of the variable-length data.

We will not go into the implementation details of how this dynamic buffer is allocated and the previous fields are used by the Windows routines. We just have to mimic the way the buffer is filled in in order to call KeUserModeCallback.

Note: you can have a look at the win32k!AllocCallbackMessage and win32k!CaptureCallbackData functions called by win32k!ClientLoadLibrary if you want to understand how this structure is allocated and updated.

Relocatable buffer

The buffer supplied by the caller of KeUserModeCallback resides in kernel memory. The buffer must eventually reside in the user memory of the process (it is copied onto the userland stack) in order to be handled by the userland functions.

In order to make the buffer location-independent, the Windows developers implemented a simple mechanism consisting of ‘fix-ups’. If bFixed is FALSE, each pointer field contains not an address but an offset relative to the beginning of the structure.

Let’s take for example the buffer passed to the user32!__ClientLoadLibrary on a Windows XP 32 bits:

kd> db ef5a78e0 L68
ef5a78e0 68 00 00 00 40 00 00 00-01 00 00 00 48 79 5a ef h...@.......HyZ.
ef5a78f0 24 00 00 00 00 00 00 00-3e 00 40 00 28 00 00 00 $.......>.@.(...
ef5a7900 40 9e 00 00 1c 00 00 00-43 00 3a 00 5c 00 57 00 @.......C.:.\.W.
ef5a7910 49 00 4e 00 44 00 4f 00-57 00 53 00 5c 00 73 00 I.N.D.O.W.S.\.s.
ef5a7920 79 00 73 00 74 00 65 00-6d 00 33 00 32 00 5c 00 y.s.t.e.m.3.2.\.
ef5a7930 75 00 78 00 74 00 68 00-65 00 6d 00 65 00 2e 00 u.x.t.h.e.m.e...
ef5a7940 64 00 6c 00 6c 00 00 00 d.l.l...
dwBufferSize: 0x68
dwAdditionalData: 0x40
dwFixupsCount: 1
pbFree: 0xef5a7948
offCbkPtrs: 0x24 -> 0xef5a7904
bFixed : FALSE
lpDLLPath : (Length: 0x3e, MaximumLength: 0x40, Buffer: 0x28)

The buffer contains 1 fix-up (dwFixupsCount = 1). The array containing this fix-up is at offset 0x24 from the beginning of the structure (thus residing at address 0xef5a7904). The first and only element of this array is the offset of the value to fix: it is the UNICODE_STRING buffer (value 0x28 at offset 0x1c). After being fixed, the buffer points to the real memory address (0xef5a78e0+ 0x28 = 0xef5a7908).

This resolution is performed by the FixupCallbackPointers function of user32, after the buffer has been copied onto the userland stack.

The code for this function looks like this:

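void FixupCallbackPointers(_USERHOOK_s *pData)
{
    LPDWORD offsetPointers;
    DWORD   fixup;

    /* Array of offsets, each one locating a pointer field to relocate */
    offsetPointers = (LPDWORD)((LPBYTE)pData + pData->offCbkPtrs);
    for (fixup = 0; fixup < pData->dwFixupsCount; fixup++)
    {
        /* Turn the offset stored in the field into an absolute address */
        *(ULONG_PTR *)((LPBYTE)pData + *offsetPointers) += (ULONG_PTR)pData;
        offsetPointers++;
    }
}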
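
The fix-up pass can be replayed on the Windows XP example above with a small portable sketch. It is not the user32 code: apply_fixups32 is a hypothetical name, the buffer is handled as raw little-endian bytes, and the base address is passed explicitly so that the 32-bit arithmetic also works on a 64-bit host. The field offsets (dwFixupsCount at 0x08, offCbkPtrs at 0x10) are those of the 32-bit layout described in this article.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void wr32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/*
 * Applies the fix-ups of a 32-bit callback buffer.
 * `buffer` is a local copy of the data; `base` is the address at which
 * the buffer is considered to live.
 */
void apply_fixups32(uint8_t *buffer, uint32_t base)
{
    uint32_t count   = rd32(buffer + 0x08); /* dwFixupsCount */
    uint32_t offsets = rd32(buffer + 0x10); /* offCbkPtrs    */
    uint32_t i;

    for (i = 0; i < count; i++) {
        /* Each array element is the offset of a pointer field to relocate. */
        uint32_t field = rd32(buffer + offsets + 4 * i);
        wr32(buffer + field, rd32(buffer + field) + base);
    }
}
```

Fed with the header values of the dump (one fix-up, array at 0x24 containing 0x1c, field value 0x28) and the base 0xef5a78e0, the field at offset 0x1c becomes 0xef5a7908, the value computed in the text above.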

Load library-specific parameters

The first parameter in the dynamic part of the input buffer given to KeUserModeCallback is the name of the module to load; it is specified in the lpDLLPath field of the structure. The module is eventually loaded by the call to the kernel32!LoadLibraryExW function.

The second parameter passed in the buffer describes a function to call once the library is loaded and depends on the operating system version.

On Windows XP, the field (lpfnNotify) is an offset relative to the loaded module of the function to call. Starting from Windows Vista, the field (lpInitFunctionName) is the name of the function to call; this function must be exported because it is retrieved with the help of GetProcAddress.

To skip the initialization function call, simply specify a 0 value for lpfnNotify on Windows XP or specify no relocation for the function name (dwFixupsCount = 1 and offCbk[1] = 0) starting from Windows Vista.

Output buffer

On output, KeUserModeCallback fills the OutputBuffer and OutputLength parameters with the results of the call if it succeeds.

For the load library case, the contents of the whole output buffer have not been investigated. However, the beginning of the output buffer matches the following structure:

typedef struct _LOAD_OUTPUT
{
    LPVOID lpBaseAddress;
    /* remaining fields not investigated */
} _LOAD_OUTPUT_s;

The lpBaseAddress field contains the base address of the loaded module.


What about Wow64?

What we described so far is relevant for 32-bit processes on 32-bit operating systems and 64-bit processes on 64-bit systems. But what about 32-bit processes on 64-bit systems?

The good news is that it works the same way from the kernel point of view, so what we explained is still relevant.

In a Wow64 process, the first change is that the KernelCallbackTable field of the Process Environment Block now points to wow64win module functions:

kd> dps 0x00000000`73e51510 L0n105
00000000`73e51510 00000000`73e82894 wow64win!whcbfnCOPYDATA
00000000`73e51518 00000000`73e82a28 wow64win!whcbfnCOPYGLOBALDATA
...
00000000`73e516a0 00000000`73e87dc8 wow64win!whcbClientCopyDDEIn1
00000000`73e516a8 00000000`73e87f78 wow64win!whcbClientCopyDDEIn2
00000000`73e516b0 00000000`73e880b8 wow64win!whcbClientCopyDDEOut1
00000000`73e516b8 00000000`73e88280 wow64win!whcbClientCopyDDEOut2
00000000`73e516c0 00000000`73e883c0 wow64win!whcbClientCopyImage
00000000`73e516c8 00000000`73e884e8 wow64win!whcbClientEventCallback
00000000`73e516d0 00000000`73e8862c wow64win!whcbClientFindMnemChar
00000000`73e516d8 00000000`73e8878c wow64win!whcbClientFreeDDEHandle
00000000`73e516e0 00000000`73e888a4 wow64win!whcbClientFreeLibrary
00000000`73e516e8 00000000`73e889b4 wow64win!whcbClientGetCharsetInfo
00000000`73e516f0 00000000`73e88aec wow64win!whcbClientGetDDEFlags
00000000`73e516f8 00000000`73e88c04 wow64win!whcbClientGetDDEHookData
00000000`73e51700 00000000`73e88d6c wow64win!whcbClientGetListboxString
00000000`73e51708 00000000`73e88f14 wow64win!whcbClientGetMessageMPH
00000000`73e51710 00000000`73e89088 wow64win!whcbClientLoadImage
00000000`73e51718 00000000`73e8920c wow64win!whcbClientLoadLibrary
00000000`73e51720 00000000`73e89370 wow64win!whcbClientLoadMenu
00000000`73e51728 00000000`73e894c4 wow64win!whcbClientLoadLocalT1Fonts
00000000`73e51730 00000000`73e895ac wow64win!whcbClientPSMTextOut
00000000`73e51738 00000000`73e89718 wow64win!whcbClientLpkDrawTextEx
...
00000000`73e51850 00000000`73e8c6a8 wow64win!whcbfnINPGESTURENOTIFYSTRUCT

These functions perform an extra marshalling step between the 64-bit and 32-bit structures.

Regarding the load library functionality, the wow64win!whcbClientLoadLibrary function first calls wow64win!FixupCaptureBuf64 which resolves the relative offsets.

The original buffer contains the raw data as received by the kernel:

0:000> db 00000000006fdde8 L00000000000000c8
00000000`006fdde8 c8 00 00 00 70 00 00 00-02 00 00 00 00 00 00 00 ....p...........
00000000`006fddf8 f0 d0 e1 02 80 f8 ff ff-48 00 00 00 00 00 00 00 ........H.......
00000000`006fde08 00 00 00 00 00 00 00 00-3e 00 40 00 00 00 00 00 ........>.@.....
00000000`006fde18 58 00 00 00 00 00 00 00-20 00 22 00 00 00 00 00 X....... .".....
00000000`006fde28 98 00 00 00 00 00 00 00-30 00 00 00 40 00 00 00 ........0...@...
00000000`006fde38 00 00 00 00 00 00 00 00-43 00 3a 00 5c 00 57 00 ........C.:.\.W.
00000000`006fde48 69 00 6e 00 64 00 6f 00-77 00 73 00 5c 00 73 00 i.n.d.o.w.s.\.s.
00000000`006fde58 79 00 73 00 74 00 65 00-6d 00 33 00 32 00 5c 00 y.s.t.e.m.3.2.\.
00000000`006fde68 75 00 78 00 74 00 68 00-65 00 6d 00 65 00 2e 00 u.x.t.h.e.m.e...
00000000`006fde78 64 00 6c 00 6c 00 00 00-54 00 68 00 65 00 6d 00 d.l.l...T.h.e.m.
00000000`006fde88 65 00 49 00 6e 00 69 00-74 00 41 00 70 00 69 00 e.I.n.i.t.A.p.i.
00000000`006fde98 48 00 6f 00 6f 00 6b 00-00 00 00 00 00 00 00 00 H.o.o.k.........
00000000`006fdea8 00 00 00 00 00 00 00 00 ........

In the original buffer, 2 fix-ups are declared. The pseudo-pointers are 64 bits long (the values 0x58 and 0x98, at offsets 0x30 and 0x40 in the previous dump).

wow64win!FixupCaptureBuf64 replaces the relative offsets with absolute addresses. Since all relocations are performed, it sets the number of fix-ups to ‘0’.

0:000> db 00000000006fdde8 Lc8
00000000`006fdde8 c8 00 00 00 70 00 00 00-00 00 00 00 00 00 00 00 ....p...........
00000000`006fddf8 f0 d0 e1 02 80 f8 ff ff-48 00 00 00 00 00 00 00 ........H.......
00000000`006fde08 00 00 00 00 00 00 00 00-3e 00 40 00 00 00 00 00 ........>.@.....
00000000`006fde18 40 de 6f 00 00 00 00 00-20 00 22 00 00 00 00 00 @.o..... .".....
00000000`006fde28 80 de 6f 00 00 00 00 00-30 00 00 00 40 00 00 00 ..o.....0...@...
00000000`006fde38 00 00 00 00 00 00 00 00-43 00 3a 00 5c 00 57 00 ........C.:.\.W.
00000000`006fde48 69 00 6e 00 64 00 6f 00-77 00 73 00 5c 00 73 00 i.n.d.o.w.s.\.s.
00000000`006fde58 79 00 73 00 74 00 65 00-6d 00 33 00 32 00 5c 00 y.s.t.e.m.3.2.\.
00000000`006fde68 75 00 78 00 74 00 68 00-65 00 6d 00 65 00 2e 00 u.x.t.h.e.m.e...
00000000`006fde78 64 00 6c 00 6c 00 00 00-54 00 68 00 65 00 6d 00 d.l.l...T.h.e.m.
00000000`006fde88 65 00 49 00 6e 00 69 00-74 00 41 00 70 00 69 00 e.I.n.i.t.A.p.i.
00000000`006fde98 48 00 6f 00 6f 00 6b 00-00 00 00 00 00 00 00 00 H.o.o.k.........
00000000`006fdea8 00 00 00 00 00 00 00 00 ........

The second step builds another buffer containing only the static part of the structure with a layout matching the 32-bit code expectations.

0:000> db 6fdd20 L28
00000000`006fdd20 c8 00 00 00 70 00 00 00-00 00 00 00 f0 d0 e1 02 ....p...........
00000000`006fdd30 48 00 00 00 00 00 00 00-3e 00 40 00 40 de 6f 00 H.......>.@.@.o.
00000000`006fdd40 20 00 22 00 80 de 6f 00 ."...o.

The control is then passed to the user32!__ClientLoadLibrary function that performs the operations as if it were running on a 32-bit operating system.

Since the loading of the specified module is performed as if it were called by the userland process, the standard restrictions and behaviors are applicable. In particular, DLL redirection is in effect and loading of a DLL in c:\Windows\System32 will be automatically redirected to c:\Windows\SysWOW64.


Conclusions

It is possible to use the KeUserModeCallback function to load a custom library into processes, provided they use the user32 module. In practice, nearly all end-user applications use this module, so it should not be a strong constraint. Since this function and the associated parameters are not documented, this functionality is subject to change in future versions (even though it has not changed much during the past 15 years).

If you are interested in investigating other methods of executing user-mode code from the kernel, you can also have a look at the six-part series of articles published by Nynaeve.