

Aditya K Sood, 19th of August, 2009
http://www.secniche.org
http://zeroknock.blogspot.com
This paper sheds light on the prerequisites for performing efficient user mode heap analysis. It derives the internal concepts needed to analyze user mode heaps properly, irrespective of component dependencies. Inline user mode heap analysis requires detailed, component level knowledge of system functionality. Dumps provide ample information about the state of a system at the time of a crash. The main task is to scrutinize and dissect the memory structures in order to reveal how the software behaved when it crashed. Efficient analysis of heap dumps reveals the real cause of a crash. If the crash is due to a persistent vulnerability in the software or system, the analysis also helps determine whether the underlying bug is exploitable.
Acknowledgements: I would like to thank Mr. Dmitry Vostokov for sharing his patterns and for his valuable comments, which helped in completing this paper.
Nature of Process and Heap
Whenever a process is created and loaded into main memory, a heap is allocated to it. A process can have one or more heaps. A process also contains a number of call stacks, and the code running on each call stack allocates memory from the heap. The major goal of dumping user mode heaps is to find the allocation pattern used for memory, i.e. to analyze the real memory statistics. In this technique, the emphasis is not on generating stack traces for various function calls but on finding the allocated memory structures. Before going further, let's analyze the architecture that divides memory between kernel mode and user mode.
This architecture gives an overview of memory objects and their implementation details. The generic points are listed below:
1. The architecture is system specific, depending on whether a 32 bit or a 64 bit operating system is used. The address space varies with the operating system version. 32 bit systems have 4GB of virtual address space, whereas 64 bit systems can have up to 16TB. The address space layout is mostly treated as an architectural constraint.
2. By default, the virtual address space is divided equally between user mode and kernel mode. Administrators can alter this split to suit application requirements. For example, 32 bit systems by default give 2GB of address space to user mode and 2GB to kernel mode, but this can be changed to 3GB for user mode and 1GB for kernel mode. User mode has to run a number of applications, and the virtual address space each one needs varies with the size of the application.
3. Increasing the user mode address space is the preferred way of easing memory constraints. Generally, system code has standard base addresses defined for a number of system functions, and these internal functions are always located at the same base address. This predictability is one reason such alterations are made. ASLR was introduced to randomize these addresses, including those in user mode, which reduces the extent to which internal structures can be exploited. Virtual memory thus gives applications a flexible memory space in which to run robustly.
4. Logical addresses are mapped to real addresses by the system hardware in cooperation with the operating system. As soon as an application is loaded into memory, its logical address space is divided into fixed size chunks called pages. Virtual memory is backed by secondary memory, whereas primary memory is hardware specific.
5. Pages are continuously mapped between primary and secondary memory while an application runs, and the whole process is dynamic. When an application references certain virtual addresses, the corresponding pages are loaded into main memory, whereas pages that are not referenced remain in secondary memory. The active pages of virtual memory used by an application are called its working set. The execution of any application running as a process depends on its working set. The performance of heavy applications depends on the hardware too: heavy memory operations put a considerable load on the I/O mechanism of the system. The real memory defines the number of bits that are associated with a memory address.

The points above outline the component dependencies and the working behavior of memory with respect to the operating system.
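The arithmetic behind these points can be sketched in a few lines of C. This is only an illustration under stated assumptions: the function names are hypothetical, the 4 KB page size is an assumption typical of x86 systems, and 16TB is taken as 2^44 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u   /* assumption: 4 KB pages, typical on x86 */

/* Size in bytes of an address space that uses `bits` address bits. */
uint64_t address_space_bytes(unsigned bits)
{
    assert(bits < 64);          /* 1 << 64 would overflow a uint64_t */
    return (uint64_t)1 << bits;
}

/* Kernel mode share (in GB) of a 4GB space once user mode gets `user_gb`. */
unsigned kernel_share_gb(unsigned user_gb)
{
    assert(user_gb <= 4);
    return 4u - user_gb;
}

/* Split a 32 bit virtual address into its page number and page offset. */
uint32_t page_number(uint32_t va) { return va / PAGE_SIZE_BYTES; }
uint32_t page_offset(uint32_t va) { return va % PAGE_SIZE_BYTES; }
```

For instance, address_space_bytes(32) gives the 4GB of a 32 bit system, kernel_share_gb(3) gives the 1GB left to kernel mode in the 3GB/1GB split, and the address 0x00401234 falls in page 0x401 at offset 0x234.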
Global Flags – System Wide Dump Image Settings
System wide debugging, tracing and dump analysis depend on the configuration of global flags. These are the standard flags that define how the operating system behaves when crashes or errors occur. Many processes are active simultaneously, and in a complex environment system software is prone to hard crashes. To avoid this situation, or to handle rogue conditions successfully, the system should be configured with instant debugging checks. Setting the global flags properly results in effective debugging and dump analysis.
Global flags can also be set per image. Every image can have its own set of global flags, because different images correspond to different processes, and each needs to be handled according to its functionality. This makes debugging selective: a single image can be debugged by setting global flags for it alone, and that image is then handled according to the flags defined for it. The Windows debugger provides the bang command [!gflag], which lists the flags currently set system wide. Basically, the NtGlobalFlag value is queried for the overall information. The definitive steps to follow for core analysis are:
1. Debugging checks have to be enabled with additional global flags, because certain flags are not enabled by default. This creates an intrinsic problem when debugging analysis has to be done. It is commonly observed that analyzing kernel level dumps is hard. The real cause is not the operating system but debugging parameters that are not configured appropriately. As a result, the information in the dumps cannot be dissected efficiently, which makes the work hard for the reverse engineer. It is therefore advisable to understand the purpose of each flag and the parameters it requires.
[Figure: snapshot of different Global Flags settings]

2. Always specify the image to be debugged uniquely. The working behavior of the process should be studied before debugging begins. This means the reverse engineer should analyze how its threads work, which shows whether the threads spend their time in user mode or kernel mode. Try to find the interdependencies by performing cross functional analysis. This helps in choosing the global flags to set.
3. Registry settings play a critical role in robust debugging. Setting global flags has a direct impact on the system registry, which is why altering the registry in this manner is considered very critical. A simple mistake or wrong configuration can lead to irrecoverable losses, because it affects the kernel state directly. This is why messages such as a BSOD or "HAL missing or corrupt" appear when something goes wrong at the kernel level. The registry should therefore be modified carefully.
4. When global flags are set for HEAP operations, they apply to user mode. The equivalent structure in kernel mode is the POOL, so flags configured for pool operations apply to kernel level operations.
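The heap versus pool distinction in the last step can be illustrated by decoding a global flag value. The sketch below is a minimal example, not the debugger's implementation: it covers only a handful of the documented flag bits, and the table name, the decode_gflags function and the "user"/"kernel" labels are assumptions made for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* A few documented global flag bits; the mode column is this sketch's label. */
struct gflag { uint32_t bit; const char *abbrev; const char *mode; };

const struct gflag GFLAG_TABLE[] = {
    { 0x00000010u, "htc", "user"   },  /* enable heap tail checking */
    { 0x00000020u, "hfc", "user"   },  /* enable heap free checking */
    { 0x00000040u, "hpc", "user"   },  /* enable heap parameter checking */
    { 0x00000080u, "hvc", "user"   },  /* enable heap validation on call */
    { 0x00000400u, "ptg", "kernel" },  /* enable pool tagging */
    { 0x00000800u, "htg", "user"   },  /* enable heap tagging */
    { 0x00001000u, "ust", "user"   },  /* create user mode stack trace database */
    { 0x00002000u, "kst", "kernel" },  /* create kernel mode stack trace database */
    { 0x02000000u, "hpa", "user"   },  /* enable page heap */
};

/* Write the abbreviations of the set flags that belong to `mode` into `out`,
   separated by spaces. Returns the number of flags written. */
int decode_gflags(uint32_t value, const char *mode, char *out, size_t out_size)
{
    size_t used = 0;
    int count = 0;

    if (out_size == 0)
        return 0;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof GFLAG_TABLE / sizeof GFLAG_TABLE[0]; i++) {
        const struct gflag *g = &GFLAG_TABLE[i];
        if ((value & g->bit) == 0 || strcmp(g->mode, mode) != 0)
            continue;
        int n = snprintf(out + used, out_size - used, "%s%s",
                         count ? " " : "", g->abbrev);
        if (n < 0 || (size_t)n >= out_size - used) {
            out[used] = '\0';          /* drop the partial write and stop */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Decoding the value 0x70 for "user" yields the three heap checks htc, hfc and hpc, while 0x1400 splits into ptg on the kernel side and ust on the user side.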
GFLAGS is used to create the User Mode Stack Trace Database. This sets Windows properties so that stack traces are captured for analyzing different heaps.
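As a rough sketch of what this setting amounts to, the per-image flags are stored as a GlobalFlag value under the Image File Execution Options registry key, and the stack trace database corresponds to the ust bit (0x1000). The helper names below are hypothetical, and the code only builds the key path and the flag value; it does not touch the registry. In practice the gflags tool itself is normally used, for example gflags /i notepad.exe +ust.

```c
#include <stdint.h>
#include <stdio.h>

#define FLG_USER_STACK_TRACE_DB 0x00001000u   /* gflags abbreviation: ust */

/* HKLM-relative key that holds one subkey per image name. */
#define IFEO_KEY \
    "SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\Image File Execution Options\\"

/* Build the key path for one image (hypothetical helper).
   Returns 0 on success, -1 if the buffer is too small. */
int image_flags_key(const char *image, char *buf, size_t buf_size)
{
    int n = snprintf(buf, buf_size, "%s%s", IFEO_KEY, image);
    return (n < 0 || (size_t)n >= buf_size) ? -1 : 0;
}

/* Turn the user mode stack trace database on in a GlobalFlag value. */
uint32_t enable_ust(uint32_t global_flag)
{
    return global_flag | FLG_USER_STACK_TRACE_DB;
}

/* Report whether the user mode stack trace database bit is set. */
int ust_enabled(uint32_t global_flag)
{
    return (global_flag & FLG_USER_STACK_TRACE_DB) != 0;
}
```

Starting from an image that already has the three basic heap checks enabled (0x70), enable_ust gives 0x1070, which is the value that would be written to GlobalFlag for that image.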
Pseudo Registers
A pseudo register, as the name suggests, is not exactly what it seems. It is not a hardware register, but it works like one, i.e. it can be used wherever a hardware register can. It helps the analyst query the debugger for specific values.
Example:
@ERR is a predefined pseudo register. It is placed in the watch window. Its initial value is 0, and it reflects the code returned by the GetLastError() function. So, when an analyst steps through the debugged code and a fault occurs, the value changes accordingly.
Let’s look into one example:
HFILE hFile = OpenFile( lpFileName, lpReOpenBuff, uStyle );
The snippet above calls OpenFile(). If a breakpoint is set with a condition on @ERR, the debugger checks the value of the pseudo register whenever the breakpoint is reached and breaks only if the value matches. If the call is made for a file that does not exist, @ERR becomes 2 (ERROR_FILE_NOT_FOUND): the system cannot find the file specified, so no valid file handle is returned. This makes pseudo registers useful for directly checking the outcome of individual functions. They are reliable for conditional debugging at the module level and, more generally, effective for scrutinizing the return status of a module.
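#include <windows.h>
#include <psapi.h>      /* EnumProcessModules, GetModuleBaseName; link with Psapi.lib */
#include <stdio.h>

void PrintProcessNameAndID( DWORD processID )
{
    char szProcessName[MAX_PATH] = "unknown";
    HMODULE hMod;
    DWORD cbNeeded;

    /* On failure the last-error code (visible as @ERR) holds the reason. */
    HANDLE hProcess = OpenProcess( PROCESS_QUERY_INFORMATION | PROCESS_VM_READ,
                                   FALSE, processID );
    if ( NULL == hProcess )
        return;

    /* The first module returned is the executable itself. */
    if ( EnumProcessModules( hProcess, &hMod, sizeof(hMod), &cbNeeded ) )
    {
        GetModuleBaseNameA( hProcess, hMod, szProcessName, sizeof(szProcessName) );
        printf( "%s (Process ID: %u)\n", szProcessName, processID );
    }

    /* Close the handle on every path, including when enumeration fails. */
    CloseHandle( hProcess );
}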
Now, we select a breakpoint and set its condition on the @ERR register to 2, i.e. @ERR==2, or to any other GetLastError() value. When the breakpoint is reached, the condition is checked against the current @ERR value. If the specified error value matches, the debugger breaks the execution flow there and displays the register state. If the @ERR value does not match, the debugger does not break into the application, even if some other error has occurred.
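The matching logic can be imitated outside a debugger. The following is a portable analogue only, not the debugger's mechanism: it uses the C runtime's errno in place of the Windows last-error value, with ENOENT playing the role of error 2 (ERROR_FILE_NOT_FOUND). Both function names are hypothetical.

```c
#include <errno.h>
#include <stdio.h>

/* Stand-in for the debugger's check: break only when the last error
   equals the watched value (the "@ERR==2" style of condition). */
int should_break(int last_error, int watched_error)
{
    return last_error == watched_error;
}

/* Try to open a file for reading.
   Returns 0 on success or the errno value describing the failure. */
int open_and_get_error(const char *path)
{
    FILE *fp;

    errno = 0;
    fp = fopen(path, "r");
    if (fp == NULL)
        return errno;
    fclose(fp);
    return 0;
}
```

Opening a path that does not exist makes open_and_get_error return ENOENT, so should_break(err, ENOENT) is true, whereas a failure with any other code, or a successful open, leaves the condition false and execution would continue.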
The list of other pseudo registers is mentioned below:
@TIB = Thread information block for the current thread; necessary because the debugger doesn't handle the "FS:0" format
@CLK = Undocumented clock register; usable only in the Watch window
@EAX, @EBX, @ECX, @EDX, @ESI, @EDI, @EIP, @ESP, @EBP, @EFL = Intel CPU registers
@CS, @DS, @ES, @SS, @FS, @GS = Intel CPU segment registers
@ST0, @ST1, @ST2, @ST3, @ST4, @ST5, @ST6, @ST7 = Intel CPU floating-point registers
All these registers play a crucial role in the user mode dump analysis process.
Practical User Mode Heap Constructs
The facts stated above crystallize the important points to consider while analyzing user mode heaps. The procedure can be implemented as follows:
During heap analysis, the analyst captures pointer related information by actively debugging the process. There is no restriction on the number of heaps to be analyzed; rather, all the heaps created by the process are scrutinized in detail.
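Once the blocks of a heap have been recorded, the memory statistics can be summarized with a few counters. The sketch below assumes a simplified, hypothetical record for each block (its size and whether it is busy or free); it does not reproduce the real Windows heap structures, whose details would normally be obtained through debugger commands such as !heap.

```c
#include <stddef.h>

/* Hypothetical record for one heap block, as noted by the analyst. */
struct heap_block {
    size_t size;   /* block size in bytes */
    int    busy;   /* 1 = allocated, 0 = free */
};

/* Allocation statistics for one heap. */
struct heap_stats {
    size_t busy_blocks, busy_bytes;
    size_t free_blocks, free_bytes;
    size_t largest_free;            /* largest single free block in bytes */
};

/* Walk the recorded blocks once and accumulate the statistics. */
struct heap_stats summarize_heap(const struct heap_block *blocks, size_t count)
{
    struct heap_stats s = { 0, 0, 0, 0, 0 };

    for (size_t i = 0; i < count; i++) {
        if (blocks[i].busy) {
            s.busy_blocks++;
            s.busy_bytes += blocks[i].size;
        } else {
            s.free_blocks++;
            s.free_bytes += blocks[i].size;
            if (blocks[i].size > s.largest_free)
                s.largest_free = blocks[i].size;
        }
    }
    return s;
}
```

Running the same summary over every heap of the process, and comparing the results between two dumps, is one simple way to see which heap is growing and how fragmented its free space is.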
The information mentioned below is the most critical in analyzing user mode heaps:
Before doing a generic analysis, some steps should be followed to optimize the analysis:
Tools in Practice
To implement this strategy, the preferred tools, apart from OllyDbg and IDA Pro, are those designed specifically for heap analysis:
The above mentioned tools are used effectively to trace.
Conclusion
Analyzing user mode heaps effectively requires a structured, hierarchical approach to extracting a specific set of information from memory dumps. If the information is scrutinized with well defined methods, the analysis produces effective results. So, in order to examine user mode heap dumps critically, the artifacts should be clearly understood and applied with the dependencies between the different components in mind.
About the Author
Aditya K Sood works as a Senior Security Researcher at Vulnerability Research Labs, COSEINC. He is also the founder of SecNiche Security, an independent security research arena for cutting edge research. He has more than 6 years of experience in the security world. He holds a BE and an MS in Cyber Law and Information Security. He is an active speaker at conferences such as EuSecwest, XCON, Troopers, XKungfoo, OWASP, Clubhack, and CERT-IN. He has written for journals including Hakin9, BCS, Usenix and Elsevier. His work has been quoted by eWeek, SCMagazine and ZDNet. He has given a number of advisories to leading companies.
[8] http://support.microsoft.com/kb/268343
[9] http://perfinsp.sourceforge.net/hdump.html