

Aditya K Sood, 8th of December, 2009
http://www.secniche.org
http://zeroknock.blogspot.com
This paper discusses effective steps and techniques for analyzing software defects at the thread level. The primary aim is to trace the culprit thread in a system, that is, the thread that causes damage because of a defect inherent in its software. The paper presents methodical steps for performing the analysis from both reverse engineering and security perspectives. Software defects are often the outcome of design-level inefficiencies that trickle down to the lowest level and have serious repercussions for system stability and resource utilization. The problem can be traced and resolved by following hierarchical steps that lead to appropriate solutions through debugging the bad threads in a process.
Overview
Threads are second-level structures used in process execution: they run as dynamic entities under a process. Whenever a new process is created in a system, one or more threads are initialized. To understand the real cause of infection in a process, one has to follow the working procedure of its threads. Primary-level debugging is required to gain access to the information a reverse engineer needs, and there are certain factors a reverse engineer has to keep in mind while debugging objects. This is imperative because a hierarchical model is required to perform an active analysis. The model comprises factors that must be taken into consideration before dissecting threads. An error propagates from the top to the bottom, and its impact grows as it moves downwards. In process dissection analysis it is therefore crucial to find the malware thread that is causing the problems.
The steps and techniques presented in this paper set out the benchmarks on which the analysis should be performed. These are the crucial techniques to take into account while debugging programs with software defects. The steps are applied to threads, which primarily consist of the information mentioned below:
A thread shares some data with its peer threads (all the other threads in this particular process). The data that it shares are:
Thread Maturity Check
Matured threads are process-specific threads that execute during the process run in a system. The term refers to the threads that run to completion without obstruction in the context of the running process, independent of its execution state. These threads are called directly or cross-referenced. Every user-level thread is associated with a Thread Environment Block (TEB) address.
Threads are considered matured because, when debugging of a process is initiated, they provide the complete trace of the process function. Such a trace fully describes the working state of the process with respect to the import and export functions that are structured for a particular task and a specific thread. Immature threads are termed as such because they may face some kind of unexpected blocking during their execution, for example as a result of a deadlock. A thread that is prevented from executing is said to be blocked. A thread may be blocked because
A question may arise about whether a thread completes execution once it has been notified or resumed. The distinction rests on a fine line: debugging is a dynamic activity that varies over time, and the analysis is tied to the moment at which a specific snapshot of the debugged process is examined.
So it is better to check thread maturity in a process before performing the analysis.
   0  Id: 3f4.448 Suspend: 1 Teb: 7ffdf000 Unfrozen
      Start: wscntfy+0x27f2 (010027f2)
      Priority: 0  Priority class: 32  Affinity: 1
.  1  Id: 3f4.8ec Suspend: 1 Teb: 7ffde000 Frozen
      Start: ntdll!DbgUiRemoteBreakin (7c95077b)
      Priority: 0  Priority class: 32  Affinity: 1
The threads listed above are in different states of execution.
The generic commands that can be used to play around are mentioned below:
They work in both invasive and non-invasive debugging modes.
Exceptions: Active Thread Analysis
Whenever a process is being debugged, exactly one thread is the active (current) thread. This is because the state of that thread has to be retained while the process is in debug mode. A careful analysis of the active thread is required to understand the context in which it became active, since it is typically the thread that caused the exception. When the debugger encounters that exception, the process state is dissected rigorously and the states of the other threads are held. It is always advisable to look into the exception thread, because it helps in understanding the flow of module calls at the time the exception occurred.
Example:
FAULTING_IP:
KERNEL32!SetErrorMode+14b
77e6c427 8a08            mov     cl,byte ptr [eax]
EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress: 77e6c427 (KERNEL32!SetErrorMode+0x0000014b)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 00000000
   Parameter[1]: 087deadc
Attempt to read from address 087deadc
The .exr debugger command displays the contents of an exception record; .exr -1 shows the most recent exception.
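As a rough illustration of how such a record can be read, the sketch below parses the exception text shown above in Python. The helper name parse_exception_record and its regular expressions are assumptions made for this example and are not part of any debugger API. For an access violation (c0000005), the first parameter encodes the access type (0 for a read, 1 for a write, 8 for a DEP execute violation) and the second holds the inaccessible address.

```python
import re

# Sample text copied from the exception record shown above.
RECORD = """\
ExceptionAddress: 77e6c427 (KERNEL32!SetErrorMode+0x0000014b)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 00000000
   Parameter[1]: 087deadc
"""

# Meaning of Parameter[0] when the exception is an access violation.
ACCESS_TYPES = {0: "read", 1: "write", 8: "execute (DEP)"}


def parse_exception_record(text):
    """Hypothetical helper: pull the key fields out of .exr-style text."""
    code = int(re.search(r"ExceptionCode:\s*([0-9a-fA-F]+)", text).group(1), 16)
    address = int(re.search(r"ExceptionAddress:\s*([0-9a-fA-F]+)", text).group(1), 16)
    params = [int(p, 16) for p in re.findall(r"Parameter\[\d+\]:\s*([0-9a-fA-F]+)", text)]
    record = {"code": code, "address": address, "params": params}
    # Only access violations carry the (access type, target address) pair.
    if code == 0xC0000005 and len(params) >= 2:
        record["access"] = ACCESS_TYPES.get(params[0], "unknown")
        record["target"] = params[1]
    return record


rec = parse_exception_record(RECORD)
print(f"{rec['access']} of {rec['target']:08x} at {rec['address']:08x}")
```

For the record above this reports a read of 087deadc by the instruction at 77e6c427, which agrees with the debugger's own summary line.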
Thread State Check
The TEB structure provides information about the thread environment. During debugging, threads are held in states that define whether they can execute with respect to the process. Most threads alternate between the Frozen and Unfrozen states. The names clearly indicate whether a thread is functional or stalled at the time debugging is initiated. Checking thread state is also critical from a reverse engineer's point of view. Sometimes a culprit thread causes a memory leak, adversely affecting the system's functionality. It is better to analyze only unfrozen threads when tracing the malware thread; there is no need to race through every single thread. An effective reverse engineer checks threads on the basis of their behavior, and the debugger is well equipped for this: the ~f and ~u commands freeze and unfreeze a thread.
0:001> ~1f
0:001> ~
   0  Id: 3f4.448 Suspend: 1 Teb: 7ffdf000 Unfrozen
.  1  Id: 3f4.8ec Suspend: 1 Teb: 7ffde000 Frozen
0:001> ~1u
0:001> ~
   0  Id: 3f4.448 Suspend: 1 Teb: 7ffdf000 Unfrozen
.  1  Id: 3f4.8ec Suspend: 1 Teb: 7ffde000 Frozen
0:001> ~1u
0:001> ~
   0  Id: 3f4.448 Suspend: 1 Teb: 7ffdf000 Unfrozen
.  1  Id: 3f4.8ec Suspend: 1 Teb: 7ffde000 Frozen
0:001> |
.  0    id: 3f4 attach  name: C:\WINDOWS\system32\wscntfy.exe
A thread's state can be changed directly or indirectly.
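The idea of narrowing the analysis to unfrozen threads can be sketched outside the debugger as well. The Python fragment below is a minimal illustration, assuming the thread listing has been saved as text in the layout shown above; the names parse_threads and THREAD_LINE are hypothetical and exist only for this example.

```python
import re

# Thread listing in the layout printed by the ~ command (values from the session above).
LISTING = """\
   0  Id: 3f4.448 Suspend: 1 Teb: 7ffdf000 Unfrozen
.  1  Id: 3f4.8ec Suspend: 1 Teb: 7ffde000 Frozen
"""

THREAD_LINE = re.compile(
    r"^(?P<marker>[.#]?)\s*(?P<index>\d+)\s+"
    r"Id:\s*(?P<pid>[0-9a-f]+)\.(?P<tid>[0-9a-f]+)\s+"
    r"Suspend:\s*(?P<suspend>\d+)\s+"
    r"Teb:\s*(?P<teb>[0-9a-f]+)\s+"
    r"(?P<state>Unfrozen|Frozen)"
)


def parse_threads(listing):
    """Hypothetical helper: turn a ~ listing into a list of thread records."""
    threads = []
    for line in listing.splitlines():
        match = THREAD_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that are not thread header lines
        threads.append({
            "index": int(match["index"]),
            "current": match["marker"] == ".",  # the dot marks the active thread
            "pid": int(match["pid"], 16),
            "tid": int(match["tid"], 16),
            "suspend": int(match["suspend"]),
            "teb": int(match["teb"], 16),
            "state": match["state"],
        })
    return threads


threads = parse_threads(LISTING)
unfrozen = [t["index"] for t in threads if t["state"] == "Unfrozen"]
print("threads worth inspecting first:", unfrozen)
```

Keeping the parsed records as plain dictionaries makes it easy to add further filters, for example on the suspend count or on the TEB address.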
Thread Entry Checks
Entry point addresses are important in their own right. These addresses define where a thread begins to play its part in the active process. Suppose there are N threads running with different module calls. Every thread has an entry point address, so with N threads one can find up to N different entry point addresses. Functions can be imported or exported; either way, an entry point address results. It is critical to understand the thread entry address, because it provides information about the thread's entry into the process thread pool.
Example:
HANDLE CreateThread(
    LPSECURITY_ATTRIBUTES  lpThreadAttributes,
    SIZE_T                 dwStackSize,
    LPTHREAD_START_ROUTINE lpStartAddress,
    LPVOID                 lpParameter,
    DWORD                  dwCreationFlags,
    LPDWORD                lpThreadId
);
HANDLE CreateRemoteThread(
    HANDLE                 hProcess,
    LPSECURITY_ATTRIBUTES  lpThreadAttributes,
    SIZE_T                 dwStackSize,
    LPTHREAD_START_ROUTINE lpStartAddress,
    LPVOID                 lpParameter,
    DWORD                  dwCreationFlags,
    LPDWORD                lpThreadId
);
Threads are generated by calling the APIs stated above; in both, the lpStartAddress parameter is the thread's entry point. From the debugging point of view, knowing this entry point is crucial. Once a thread is created, it is ready to enter the process space. Whether or not it is already functional in the virtual memory of that process is immaterial; what matters more is knowing the address of its entry, because the working state of a process changes after a new thread enters. Entry checks are therefore necessary for understanding the direction in which the process flows.
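In the debugger, the entry point appears in the Start: field of the verbose thread listing. The sketch below is a minimal illustration, assuming the field has the two shapes seen in this paper (module+offset and module!symbol); the function name describe_entry is hypothetical. When the start is given as module+offset, subtracting the offset from the absolute address yields the implied module base.

```python
import re

# Start: fields taken from the thread listings in this paper.
START_FIELDS = [
    "wscntfy+0x27f2 (010027f2)",
    "ntdll!DbgUiRemoteBreakin (7c95077b)",
    "googletalk+0x15719a (0055719a)",
    "kernel32!BaseThreadStartThunk (7c810659)",
]

START = re.compile(
    r"^(?P<module>[^!+\s]+)"
    r"(?:!(?P<symbol>[^+\s]+))?"
    r"(?:\+0x(?P<offset>[0-9a-f]+))?"
    r"\s+\((?P<address>[0-9a-f]+)\)$",
    re.IGNORECASE,
)


def describe_entry(field):
    """Hypothetical helper: split a Start: field into its parts."""
    match = START.match(field.strip())
    if match is None:
        raise ValueError(f"unrecognized Start field: {field!r}")
    address = int(match["address"], 16)
    offset = int(match["offset"], 16) if match["offset"] else None
    # An offset with no symbol is relative to the module base itself.
    base = address - offset if offset is not None and match["symbol"] is None else None
    return {
        "module": match["module"],
        "symbol": match["symbol"],
        "address": address,
        "module_base": base,
    }


for field in START_FIELDS:
    entry = describe_entry(field)
    base = f"{entry['module_base']:08x}" if entry["module_base"] is not None else "n/a"
    print(f"{entry['module']:<12} entry={entry['address']:08x} base={base}")
```

Grouping threads by the module that owns their entry point is a quick way to separate threads started by the application itself from those started by system libraries.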
Commanding Thread Control
Commanding a specific thread is part of the reverse engineer's art. It refers to the practice of dissecting a thread by running several commands against it at one point in time. The structural components should be analyzed step by step and in detail in order to build a firm understanding of the system. Running multiple commands in the context of a specific thread is therefore an effective operation from a debugging perspective: it saves time and adds flexibility. In the session below, the ~1e prefix executes the semicolon-separated commands in the context of thread 1. Let's see:
0:001> ~1e ; p ; | ; kd ; k
.  0    id: 3f4 attach  name: C:\WINDOWS\system32\wscntfy.exe
007cffc8  f7d71cec
007cffcc  7c9507a8 ntdll!DbgUiRemoteBreakin+0x2d
007cffd0  00000005
007cffd4  00000004
007cffd8  00000001
007cffdc  007cffd0
007cffe0  00000000
007cffe4  ffffffff
007cffe8  7c90ee18 ntdll!_except_handler3
007cffec  7c9507c8 ntdll!`string'+0x7c
007cfff0  00000000
007cfff4  00000000
007cfff8  00000000
007cfffc  00000000
ChildEBP RetAddr
007cffc8 7c9507a8 ntdll!DbgBreakPoint+0x1
007cfff4 00000000 ntdll!DbgUiRemoteBreakin+0x2d
0:001> ~1e ; | ; kb ; .formats 7c9507a8
.  0    id: 3f4 attach  name: C:\WINDOWS\system32\wscntfy.exe
ChildEBP RetAddr  Args to Child
007cffc8 7c9507a8 00000005 00000004 00000001 ntdll!DbgBreakPoint+0x1
007cfff4 00000000 00000000 00000000 00000000 ntdll!DbgUiRemoteBreakin+0x2d
Evaluate expression:
  Hex:     7c9507a8
  Decimal: 2090141608
  Octal:   17445203650
  Binary:  01111100 10010101 00000111 10101000
  Chars:   |...
  Time:    Wed Mar 26 03:53:28 203
  Float:   low 6.19046e+036 high 0
  Double:  1.03267e-314
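To make the .formats output above less opaque, the following sketch recomputes several of its rows for the same return address in Python. It is only an approximation written for this example: the function name formats is hypothetical, the value is assumed to be an unsigned 32-bit quantity, and the Time, Float and Double rows are left out.

```python
def formats(value):
    """Hypothetical helper: recompute a few rows of .formats for a 32-bit value."""
    value &= 0xFFFFFFFF                 # assumption: treat the input as unsigned 32-bit
    raw = value.to_bytes(4, "big")      # most significant byte first, as printed above
    return {
        "Hex": f"{value:08x}",
        "Decimal": str(value),
        "Octal": f"{value:011o}",
        "Binary": " ".join(f"{byte:08b}" for byte in raw),
        # Printable ASCII bytes are shown as characters, everything else as a dot.
        "Chars": "".join(chr(byte) if 0x20 <= byte < 0x7F else "." for byte in raw),
    }


for name, text in formats(0x7C9507A8).items():
    print(f"{name + ':':<9}{text}")
```

The rows produced for 7c9507a8 match the debugger output above, which confirms that the Chars row is simply the four bytes of the address read as ASCII.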
Analyzing the (LEC) Last Error Check in Threads
Looking at the errors generated within a specific thread is an expedient technique. When the debugger lists threads, exactly one of them, the active thread, is marked with a dot (.). The !gle extension command reports the value that GetLastError would return for the thread, together with the last NTSTATUS value, and so reveals any error left behind in the thread. This is done to check which function exited abnormally in the thread. If the functions in a thread returned successfully, the possibility of an error is small and the state of the thread will be normal. Let's see:
0:009> ~.
. 9 Id: 4e4.bc Suspend: 1 Teb: 7ffdd000 Unfrozen
Priority: 0 Priority class: 32 Affinity: 1
0:009> !gle
LastErrorValue: (Win32) 0 (0) - The operation completed successfully.
LastStatusValue: (NTSTATUS) 0 - STATUS_WAIT_0
So no last error is reported for this thread. The debugger numbers threads from zero, so thread 9 in the output is the tenth thread in the process.
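When many threads have to be checked, the same reading can be automated over saved debugger output. The sketch below is a minimal illustration, assuming the two-line !gle layout shown above with values printed in hexadecimal; the helper name parse_gle is hypothetical.

```python
import re

# Output of !gle copied from the session above.
GLE_OUTPUT = """\
LastErrorValue: (Win32) 0 (0) - The operation completed successfully.
LastStatusValue: (NTSTATUS) 0 - STATUS_WAIT_0
"""


def parse_gle(text):
    """Hypothetical helper: extract the two numeric values from !gle text."""
    error = re.search(r"LastErrorValue:\s*\(Win32\)\s*(?:0x)?([0-9a-fA-F]+)", text)
    status = re.search(r"LastStatusValue:\s*\(NTSTATUS\)\s*(?:0x)?([0-9a-fA-F]+)", text)
    if error is None or status is None:
        raise ValueError("text does not look like !gle output")
    return {
        "last_error": int(error.group(1), 16),
        "last_status": int(status.group(1), 16),
    }


result = parse_gle(GLE_OUTPUT)
if result["last_error"] == 0 and result["last_status"] == 0:
    print("no pending error in this thread")
else:
    print(f"error {result['last_error']:#x}, status {result['last_status']:#x}")
```

A nonzero value is only a hint: the last-error slot is per thread and is overwritten by later API calls, so it should be read at the point where the thread stopped.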
Unwinding Stacks at Thread Level
Stack unwinding here means dissecting the stack based on various conditions in a program. The question is whether it is prudent to unwind the stack based on certain parameters while skipping others. Debugging heavy code is always a hard nut to crack when the number of cross references between modules is high; the difficulty revolves around these structural interdependencies. Debugging without adequate symbols inhibits the debugger's efficiency, because the program can only be dissected according to the symbols loaded in the debugger. A stack is always processed for a single thread, and every thread has its own stack, so what happens when threads are present in large numbers? Sometimes duplicate entries are found while debugging. Whether execution stops at a breakpoint (bp) or runs directly (g), it depends intrinsically on return addresses. Even manipulated threads possess a valid return address, but execution depends on whether the function actually returns. It has been observed that a hang can occur while applying a breakpoint, mainly when a working thread is entangled somewhere or a function has not returned. Backtracing the stack (k) is good practice.
Processing and debugging depend on time, complexity and resource utilization. The task becomes hard if ambiguity arises in any of these three parameters. So, in order to trap a bug in the required time, stack unwinding with conditional criteria has to be devised while debugging.
Let’s say:
Tn is a set of thread numbers
Tn = {t1, t2, t3, t4, …, tn}
Xn is a set of complexity numbers
Xn = {x1, x2, x3, x4, …, xn}
A simple one-to-one mapping is
Xn = {x1, x2, x3, …, xn} -> Tn = {t1, t2, t3, …, tn}
Condition: interdependency, i.e. the mapping becomes one-to-many or many-to-one.
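The distinction between these mapping types can be made concrete with a small sketch. The Python fragment below classifies a list of (complexity, thread) pairs; the function name classify_mapping and the sample labels are assumptions introduced purely for illustration.

```python
from collections import defaultdict


def classify_mapping(pairs):
    """Hypothetical helper: classify (complexity, thread) pairs by mapping type."""
    threads_per_x = defaultdict(set)
    xs_per_thread = defaultdict(set)
    for x, t in pairs:
        threads_per_x[x].add(t)
        xs_per_thread[t].add(x)
    # Fan-out: one complexity item touches several threads.
    fan_out = any(len(ts) > 1 for ts in threads_per_x.values())
    # Fan-in: one thread is touched by several complexity items.
    fan_in = any(len(xs) > 1 for xs in xs_per_thread.values())
    if fan_out and fan_in:
        return "many-to-many"
    if fan_out:
        return "one-to-many"
    if fan_in:
        return "many-to-one"
    return "one-to-one"


print(classify_mapping([("x1", "t1"), ("x2", "t2")]))  # independent threads
print(classify_mapping([("x1", "t1"), ("x1", "t2")]))  # x1 is shared by two threads
print(classify_mapping([("x1", "t1"), ("x2", "t1")]))  # t1 carries two complexities
```

Anything other than one-to-one signals that a defect in one place can surface in several threads, or that one thread is exposed to several sources of complexity.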
Interdependency of this kind raises complexity to its peak. A reverse engineer should therefore try as far as possible to sidestep the issue and focus on traversing the code regardless of the ingrained complexity. Looking at specific parts of the code, rather than at every aspect of it, strengthens the analysis. It is also advisable to extract as much information as possible from the code in order to gain an interim understanding of the process functions encountered. Let's look at a general example of traversing the stack efficiently, which makes register checks easy for a given thread at a given point in time.
I attached a debugger to the Google Talk process and found 7 threads running in it:
0:006> ~
   0  Id: 7d4.174 Suspend: 1 Teb: 7ffdd000 Unfrozen
   1  Id: 7d4.7a4 Suspend: 1 Teb: 7ffdc000 Unfrozen
   2  Id: 7d4.1b8 Suspend: 1 Teb: 7ffda000 Unfrozen
   3  Id: 7d4.2d8 Suspend: 1 Teb: 7ffd8000 Unfrozen
   4  Id: 7d4.7fc Suspend: 1 Teb: 7ffd7000 Unfrozen
   5  Id: 7d4.2bc Suspend: 1 Teb: 7ffd6000 Unfrozen
   6  Id: 7d4.2dc Suspend: 1 Teb: 7ffdb000 Unfrozen
Let's run the [~*kv] command to produce some unfiltered and complex output (symbols were deliberately not loaded, to simulate their unavailability):
0:006> ~*kv
   0  Id: 7d4.174 Suspend: 1 Teb: 7ffdd000 Unfrozen
ChildEBP RetAddr  Args to Child
0012e838 7e4191be 7e4191f1 0012fb20 00000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])
*** WARNING: Unable to verify checksum for C:\Program Files\Google\Google Talk\googletalk.exe
*** ERROR: Module load completed but symbols could not be loaded for C:\Program Files\Google\Google Talk\googletalk.exe
0012e858 0040271e 0012fb20 00000000 00000000 USER32!NtUserGetMessage+0xc
WARNING: Stack unwind information not available. Following frames may be wrong.
00000000 00000000 00000000 00000000 00000000 googletalk+0x271e
   1  Id: 7d4.7a4 Suspend: 1 Teb: 7ffdc000 Unfrozen
ChildEBP RetAddr  Args to Child
0150fe18 7c90e399 77e76703 0000018c 0150ff70 ntdll!KiFastSystemCallRet (FPO: [0,0,0])
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\WINDOWS\system32\RPCRT4.dll
0150fe1c 77e76703 0000018c 0150ff70 00000000 ntdll!NtReplyWaitReceivePortEx+0xc (FPO: [5,0,0])
WARNING: Stack unwind information not available. Following frames may be wrong.
0150ff80 77e76c22 0150ffa8 77e76a3b 00153118 RPCRT4!I_RpcBCacheFree+0xcb
0150ff88 77e76a3b 00153118 0012e6b0 0012e22c RPCRT4!I_RpcBCacheFree+0x5ea
0150ffa8 77e76c0a 00156b40 0150ffec 7c80b683 RPCRT4!I_RpcBCacheFree+0x403
0150ffb4 7c80b683 0016f370 0012e6b0 0012e22c RPCRT4!I_RpcBCacheFree+0x5d2
0150ffec 00000000 77e76bf0 0016f370 00000000 kernel32!BaseThreadStart+0x37 (FPO: [Non-Fpo])
Again, the output is too cluttered to work through comfortably. To reduce complexity and unwind the stack in the best possible way, the [!uniqstack] extension is called directly to filter the output into a better layout:
0:006> !uniqstack
Processing 7 threads please wait
.  0  Id: 7d4.174 Suspend: 1 Teb: 7ffdd000 Unfrozen
      Start: googletalk+0x15719a (0055719a)
      Priority: 0  Priority class: 32  Affinity: 1
ChildEBP RetAddr
0012e838 7e4191be ntdll!KiFastSystemCallRet
0012e858 0040271e USER32!NtUserGetMessage+0xc
WARNING: Stack unwind information not available. Following frames may be wrong.
00000000 00000000 googletalk+0x271e
.  1  Id: 7d4.7a4 Suspend: 1 Teb: 7ffdc000 Unfrozen
      Start: kernel32!BaseThreadStartThunk (7c810659)
      Priority: 0  Priority class: 32  Affinity: 1
ChildEBP RetAddr
0150fe18 7c90e399 ntdll!KiFastSystemCallRet
0150fe1c 77e76703 ntdll!NtReplyWaitReceivePortEx+0xc
WARNING: Stack unwind information not available. Following frames may be wrong.
0150ff80 77e76c22 RPCRT4!I_RpcBCacheFree+0xcb
0150ff88 77e76a3b RPCRT4!I_RpcBCacheFree+0x5ea
0150ffa8 77e76c0a RPCRT4!I_RpcBCacheFree+0x403
0150ffb4 7c80b683 RPCRT4!I_RpcBCacheFree+0x5d2
0150ffec 00000000 kernel32!BaseThreadStart+0x37
The structured output is much easier to traverse, and module names with the desired return addresses can be extracted easily. A conditional, filtered output is thus obtained by skipping duplicate entries. One can then use the set thread command [~s] to switch to a thread and do register checks.
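The filtering idea can be sketched in a few lines of Python. This is not how the debugger extension is implemented; it is a minimal illustration, in the spirit of !uniqstack, that groups threads whose call stacks are identical. The function name unique_stacks is hypothetical, the frames for threads 0 and 1 are taken from the output above, and thread 2 is an invented duplicate added only to show the grouping.

```python
def unique_stacks(stacks):
    """Hypothetical helper: group thread indexes by identical call stack."""
    groups = {}
    for thread, frames in stacks.items():
        # A tuple of frame names serves as the key for one distinct stack.
        groups.setdefault(tuple(frames), []).append(thread)
    return [(threads, list(frames)) for frames, threads in groups.items()]


RPC_STACK = [
    "ntdll!KiFastSystemCallRet",
    "ntdll!NtReplyWaitReceivePortEx+0xc",
    "RPCRT4!I_RpcBCacheFree+0xcb",
    "RPCRT4!I_RpcBCacheFree+0x5ea",
    "RPCRT4!I_RpcBCacheFree+0x403",
    "RPCRT4!I_RpcBCacheFree+0x5d2",
    "kernel32!BaseThreadStart+0x37",
]

STACKS = {
    0: ["ntdll!KiFastSystemCallRet", "USER32!NtUserGetMessage+0xc", "googletalk+0x271e"],
    1: RPC_STACK,
    2: list(RPC_STACK),  # hypothetical duplicate, not present in the output above
}

for threads, frames in unique_stacks(STACKS):
    print(f"threads {threads}: {len(frames)} frames, top frame {frames[0]}")
```

Only one representative of each group then needs to be selected with [~s] for register checks, which is where the time saving comes from.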
Conclusion
The aim of this paper is to set out a detailed methodology for analyzing all sorts of abnormal thread behavior. A holistic approach is always useful for understanding the overall picture of thread analysis and the direction to follow when analyzing threads that cause exceptions and other flaws. It is a sound approach for functional and real-time analysis of threads.
About the Author
Aditya K Sood is an independent security consultant and the founder of SecNiche Security. He has worked in the security domain for COSEINC and KPMG. He has been an active speaker at conferences such as RSA (US 2010), TRISC, EuSecwest, XCON, Troopers, OWASP AppSec, FOSS, CERT-IN etc. He has written content for HITB Ezine, Hakin9, Usenix Login, Elsevier Journals and Debugged! MZ/PE.