Holistic Approach to Analysis of Defective Threads

Aditya K Sood, 8th of December, 2009
http://www.secniche.org
http://zeroknock.blogspot.com

This paper discusses effective steps and techniques for analyzing software defects at the thread level. The primary aim is to trace the culprit thread in a system, that is, the thread that causes potential damage because of a defect inherent in its software. The paper presents methodical steps for performing a diversified analysis from both reverse engineering and security perspectives. Software defects are the outcome of design-level inefficiency that trickles down to the lowest level and has serious repercussions for system stability and resource utilization. The problem can be traced and resolved by following hierarchical steps to design appropriate solutions and by debugging bad threads in the processes.

Overview

Threads are second-level structures used in process execution: semantically, they run as dynamic entities under a process. Whenever a new process is created in a system, one or more threads are initialized. To understand the real cause of a fault or infection in a process, one has to follow the working procedure of its threads. Primary-level debugging is required to obtain the information a reverse engineer needs, and there are certain factors a reverse engineer has to keep in mind while debugging objects. This is imperative because active analysis requires a hierarchical model, comprised of ingrained factors that must be considered before dissecting threads. An error propagates from the top to the bottom, and its impact grows as it moves downward. When dissecting a process, it is therefore crucial to find the culprit thread that is causing problems.

The steps and techniques presented in this paper enumerate benchmarks against which the analysis should be performed. These are the crucial techniques to take into account while debugging programs with software defects. The steps are applied to threads, each of which primarily consists of the following information:

  • Thread ID
  • Program Counter
  • Register Set
  • Stack

A thread shares some data with its peer threads (all the other threads in this particular process). The data that it shares are:

  • Code Section (dynamic code that is executed)
  • Data Section (storage of variables and definitions)
  • Any operating system resources available to the process
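The split between per-thread and shared state can be illustrated with a minimal Python sketch. Python is used purely for illustration, and the names shared_log, worker and run_workers are hypothetical, not part of the original analysis. Each thread has its own identifier and its own local variables (its stack), while all threads append to the same process-wide list.

```python
import threading

shared_log = []                 # process-wide data: visible to every thread
log_lock = threading.Lock()     # protects the shared list


def worker(n, barrier):
    local_total = sum(range(n))        # lives on this thread's own stack
    thread_id = threading.get_ident()  # per-thread identifier
    barrier.wait()                     # keep all threads alive at the same time
    with log_lock:
        shared_log.append((thread_id, n, local_total))


def run_workers(count=4):
    """Start `count` threads and return their records sorted by n."""
    shared_log.clear()
    barrier = threading.Barrier(count)
    threads = [threading.Thread(target=worker, args=(n, barrier))
               for n in range(1, count + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(shared_log, key=lambda entry: entry[1])


if __name__ == "__main__":
    for thread_id, n, total in run_workers():
        print(f"thread {thread_id}: n={n} local_total={total}")
```

The barrier is there only so that every thread is alive at the same moment, which guarantees that the identifiers collected are distinct.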

Thread Maturity Check

Matured threads are process-specific threads that execute during the process run in a system. The term refers to the threads that execute completely, without any obstruction, in the context of the running process and independent of its execution state. These threads are called directly or cross-referenced. Every user-level thread is associated with a Thread Environment Block (TEB) address.

The threads are considered matured because, when debugging of a process is initiated, they provide a complete trace of the process function. Such a trace shows the working state of the process with respect to the imported and exported functions that are structured for a particular task and specific thread. Immature threads are so termed because they may face some kind of unexpected blocking during their execution, for example as a result of a deadlock. A thread that is prevented from executing is said to be blocked. A thread may be blocked because:

  1. It has been put to sleep for some amount of time.
  2. It has been suspended with a call to suspend() and remains blocked until it receives a resume message.
  3. It has been suspended by a call to wait() and becomes runnable again on a notify message.

A question may arise as to whether a thread runs to completion once it has been notified or resumed. The distinction is a fine one: debugging is a dynamic activity that varies with the passage of time, and the analysis reflects only the moment at which a specific snapshot of the debugged process is examined.
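The dependence on the moment of the snapshot can be sketched in Python with a wait/notify pair. This is an illustrative analogy using Python's threading.Condition, not the mechanism of any particular debugger, and snapshot_demo is a hypothetical name. The same thread appears blocked in one snapshot and finished in the next.

```python
import threading


def snapshot_demo():
    """Return (alive_before_notify, alive_after_join) for one worker thread."""
    cond = threading.Condition()
    state = {"ready": False}
    entered = threading.Event()

    def worker():
        with cond:
            entered.set()              # signal: about to block in wait()
            while not state["ready"]:
                cond.wait()            # blocked until notified

    t = threading.Thread(target=worker)
    t.start()
    entered.wait()

    with cond:                         # acquired only once the worker is inside wait()
        alive_before = t.is_alive()    # snapshot 1: thread exists but is blocked
        state["ready"] = True
        cond.notify()

    t.join()
    alive_after = t.is_alive()         # snapshot 2: thread has run to completion
    return alive_before, alive_after


if __name__ == "__main__":
    print(snapshot_demo())
```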

It is therefore better to check thread maturity in a process before performing the analysis.

   0  Id: 3f4.448 Suspend: 1 Teb: 7ffdf000 Unfrozen
 Start: wscntfy+0x27f2 (010027f2) Priority: 0  Priority class: 32  Affinity: 1

.  1  Id: 3f4.8ec Suspend: 1 Teb: 7ffde000 Frozen
 Start: ntdll!DbgUiRemoteBreakin (7c95077b) Priority: 0  Priority class: 32  Affinity: 1

The threads listed above are in different states of execution: thread 0 is unfrozen and thread 1 is frozen.

The generic commands for changing these states are:

  • ~f freezes a thread (for example, ~1f freezes thread 1)
  • ~u unfreezes a thread (for example, ~1u unfreezes thread 1)

These commands work in both invasive and non-invasive debugging sessions.

Exceptions: Active Thread Analysis

Whenever a process is debugged, one unique thread remains active. This is because the state of that thread has to be retained while the process is in debug mode, so in any process being debugged one specific thread is always in the active state. A skillful analysis of the active thread is required to understand the context in which it was set as active. Typically, this is the thread that caused an exception. When the debugger encounters that exception, the process state is dissected rigorously, and the states of the various threads are held as they were. It is always advisable to look into the exception thread, because doing so helps in understanding the flow of module calls at the time the exception occurred.

Example:

FAULTING_IP: 
KERNEL32!SetErrorMode+14b
77e6c427 8a08            mov     cl,byte ptr [eax]

EXCEPTION_RECORD:  ffffffff -- (.exr 0xffffffffffffffff)
ExceptionAddress:77e6c427 (KERNEL32!SetErrorMode+0x0000014b)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 00000000
   Parameter[1]: 087deadc
Attempt to read from address 087deadc

The .exr debugger command displays the contents of an exception record; the address -1 (shown above as 0xffffffffffffffff) selects the most recent exception.
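The two parameters of the access violation record above can be decoded mechanically. The sketch below assumes the usual Windows convention for access violations (first parameter 0 for a read, 1 for a write, 8 for an execute violation; second parameter the faulting address). The function name describe_access_violation is hypothetical.

```python
STATUS_ACCESS_VIOLATION = 0xC0000005

# First exception parameter: the kind of access that faulted.
ACCESS_TYPES = {
    0: "read from",
    1: "write to",
    8: "execute non-executable",
}


def describe_access_violation(exception_code, parameters):
    """Return a one-line description of an access-violation exception record."""
    if exception_code != STATUS_ACCESS_VIOLATION:
        raise ValueError("not an access violation")
    if len(parameters) < 2:
        raise ValueError("expected two exception parameters")
    access = ACCESS_TYPES.get(parameters[0], "perform unknown access on")
    return f"Attempt to {access} address {parameters[1]:08x}"


if __name__ == "__main__":
    # Values taken from the exception record shown above.
    print(describe_access_violation(0xC0000005, [0x00000000, 0x087DEADC]))
```

Applied to the record above, it yields the same summary line that the debugger prints.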

Thread State Check

The TEB structure provides information about the thread environment. During debugging, threads are suspended in states that define the execution status of each thread with respect to the process. Most threads alternate between the Frozen and Unfrozen states. The names clearly indicate whether a thread is functional or stalled at the time debugging is initiated. Checking thread state is also critical from a reverse engineer's point of view. Sometimes a culprit thread leaks memory and thereby adversely affects the system's functionality. It is better to analyze only the unfrozen threads when tracing the culprit thread; there is no need to race through every single thread. An effective reverse engineer selects threads to check on the basis of their behavior, and the debugger is well equipped for this.

0:001> ~1f

0:001> ~
  0  Id: 3f4.448 Suspend: 1 Teb: 7ffdf000 Unfrozen
. 1  Id: 3f4.8ec Suspend: 1 Teb: 7ffde000 Frozen  

0:001> ~1u

0:001> ~
  0  Id: 3f4.448 Suspend: 1 Teb: 7ffdf000 Unfrozen
. 1  Id: 3f4.8ec Suspend: 1 Teb: 7ffde000 Frozen  

0:001> ~1u

0:001> ~
  0  Id: 3f4.448 Suspend: 1 Teb: 7ffdf000 Unfrozen
. 1  Id: 3f4.8ec Suspend: 1 Teb: 7ffde000 Frozen  

0:001> |
. 0  id: 3f4       attach name: C:\WINDOWS\system32\wscntfy.exe

A thread's state can thus be changed directly or indirectly.
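When a process has many threads, the listing can be filtered by script. The following Python sketch parses lines in the format shown above and keeps only the unfrozen threads. The regular expression is an assumption based solely on the sample output in this paper, and the helper names parse_threads and unfrozen are hypothetical.

```python
import re

# Matches lines such as ". 1  Id: 3f4.8ec Suspend: 1 Teb: 7ffde000 Frozen".
THREAD_LINE = re.compile(
    r"^\s*(?P<marker>[.#])?\s*(?P<index>\d+)\s+"
    r"Id:\s*(?P<pid>[0-9a-f]+)\.(?P<tid>[0-9a-f]+)\s+"
    r"Suspend:\s*(?P<suspend>\d+)\s+"
    r"Teb:\s*(?P<teb>[0-9a-f]+)\s+"
    r"(?P<state>Frozen|Unfrozen)",
    re.IGNORECASE,
)


def parse_threads(listing):
    """Turn a thread listing into a list of dictionaries, one per thread."""
    threads = []
    for line in listing.splitlines():
        m = THREAD_LINE.match(line)
        if m:
            threads.append({
                "index": int(m.group("index")),
                "current": m.group("marker") == ".",
                "pid": int(m.group("pid"), 16),
                "tid": int(m.group("tid"), 16),
                "suspend": int(m.group("suspend")),
                "teb": int(m.group("teb"), 16),
                "frozen": m.group("state").lower() == "frozen",
            })
    return threads


def unfrozen(threads):
    """Keep only the threads that are not frozen."""
    return [t for t in threads if not t["frozen"]]


SAMPLE = """\
  0  Id: 3f4.448 Suspend: 1 Teb: 7ffdf000 Unfrozen
. 1  Id: 3f4.8ec Suspend: 1 Teb: 7ffde000 Frozen
"""

if __name__ == "__main__":
    for t in unfrozen(parse_threads(SAMPLE)):
        print(f"thread {t['index']} (tid {t['tid']:x}) is unfrozen")
```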

Thread Entry Checks

Entry point addresses are important in their own context: they define where a thread begins to play its part in the active process. Suppose there are N threads running with different module calls. Every thread has an entry point address, so with N threads one can find up to N different entry point addresses. Functions can be imported or exported; either way, execution ends up at an entry point address. It is critical to understand the thread entry address, because it provides information about the thread's entry into the process thread pool.

Example:

HANDLE CreateThread(
   LPSECURITY_ATTRIBUTES lpThreadAttributes,
   SIZE_T dwStackSize,
   LPTHREAD_START_ROUTINE lpStartAddress,
   LPVOID lpParameter,
   DWORD dwCreationFlags,
   LPDWORD lpThreadId);

HANDLE CreateRemoteThread(
   HANDLE hProcess,
   LPSECURITY_ATTRIBUTES lpThreadAttributes,
   SIZE_T dwStackSize,
   LPTHREAD_START_ROUTINE lpStartAddress,
   LPVOID lpParameter,
   DWORD dwCreationFlags,
   LPDWORD lpThreadId);

Threads are generated by calling the APIs stated above. From a debugging point of view, knowing the entry point of a thread is crucial. Once a thread has been created, it is ready to enter the process space. Whether it is yet functional in the virtual memory of that process is not material; what matters more is knowing the entry address, because the working state of a process changes after a new thread enters it. Entry checks are therefore necessary to understand the direction of flow of the process.
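The entry address that the debugger reports on a thread's "Start:" line can be extracted with a short script. The Python sketch below parses the Start lines that appear in this paper. It assumes that when no symbol is available the debugger prints module+offset, with the offset measured from the module's load address, so the base can be recovered by subtraction. The helper names parse_start and module_base are hypothetical.

```python
import re

# Matches "Start: wscntfy+0x27f2 (010027f2)" and "Start: ntdll!DbgUiRemoteBreakin (7c95077b)".
START_LINE = re.compile(
    r"Start:\s*(?P<module>\w+)"
    r"(?:!(?P<symbol>[^\s+]+))?"
    r"(?:\+0x(?P<offset>[0-9a-f]+))?"
    r"\s+\((?P<address>[0-9a-f]+)\)",
    re.IGNORECASE,
)


def parse_start(line):
    """Extract module, symbol, offset and absolute address from a Start line."""
    m = START_LINE.search(line)
    if m is None:
        raise ValueError("no Start: field found")
    offset = m.group("offset")
    return {
        "module": m.group("module"),
        "symbol": m.group("symbol"),
        "offset": int(offset, 16) if offset is not None else None,
        "address": int(m.group("address"), 16),
    }


def module_base(entry):
    """Recover the module base, but only for the module+offset form."""
    if entry["symbol"] is None and entry["offset"] is not None:
        return entry["address"] - entry["offset"]
    return None  # offset (if any) is relative to a symbol, not the module


if __name__ == "__main__":
    entry = parse_start(" Start: wscntfy+0x27f2 (010027f2) Priority: 0")
    print(f"{entry['module']} entry at {entry['address']:08x}, "
          f"base {module_base(entry):08x}")
```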

Commanding Thread Control

Commanding a specific thread is part of the art of reverse engineering. It refers to the modus operandi by which a thread is dissected by running several commands against it at a single point in time. The structural components should be analyzed step by step and in detail in order to build a firm hold over the system. Running multiple commands in the context of a specific thread is therefore an effective operation from a debugging perspective: it saves time and adds flexibility. For example:

0:001> ~1e ; p ; | ; kd ; k
.  0   id: 3f4       attach name: C:\WINDOWS\system32\wscntfy.exe
007cffc8  f7d71cec
007cffcc  7c9507a8 ntdll!DbgUiRemoteBreakin+0x2d
007cffd0  00000005
007cffd4  00000004
007cffd8  00000001
007cffdc  007cffd0
007cffe0  00000000
007cffe4  ffffffff
007cffe8  7c90ee18 ntdll!_except_handler3
007cffec  7c9507c8 ntdll!`string'+0x7c
007cfff0  00000000
007cfff4  00000000
007cfff8  00000000
007cfffc  00000000
ChildEBP RetAddr  
007cffc8 7c9507a8 ntdll!DbgBreakPoint+0x1
007cfff4 00000000 ntdll!DbgUiRemoteBreakin+0x2d

0:001> ~1e ; | ; kb ; .formats 7c9507a8
.  0   id: 3f4       attach name: C:\WINDOWS\system32\wscntfy.exe
ChildEBP RetAddr  Args to Child              
007cffc8 7c9507a8 00000005 00000004 00000001 ntdll!DbgBreakPoint+0x1
007cfff4 00000000 00000000 00000000 00000000 ntdll!DbgUiRemoteBreakin+0x2d
Evaluate expression:
Hex:     7c9507a8
Decimal: 2090141608
Octal:   17445203650
Binary:  01111100 10010101 00000111 10101000
Chars:   |...
Time:    Wed Mar 26 03:53:28 203
Float:   low 6.19046e+036 high 0
Double:  1.03267e-314
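The conversions printed by .formats for the return address 7c9507a8 can be checked independently. The Python sketch below reproduces several of them for a 32-bit value. It only loosely imitates the debugger output, the function name formats is hypothetical, and the time is rendered in UTC, whereas the debugger output above appears to use the local time zone of the machine.

```python
import struct
from datetime import datetime, timezone


def formats(value):
    """Render a 32-bit value in several notations, loosely like .formats."""
    raw = value.to_bytes(4, "big")
    return {
        "hex": f"{value:08x}",
        "decimal": str(value),
        "octal": f"{value:011o}",
        "binary": " ".join(f"{b:08b}" for b in raw),
        # Printable ASCII is kept, everything else becomes a dot.
        "chars": "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw),
        # Interpret the value as seconds since 1970-01-01 (UTC).
        "time_utc": datetime.fromtimestamp(value, tz=timezone.utc)
                            .strftime("%Y-%m-%d %H:%M:%S"),
        # Interpret the same 32 bits as an IEEE 754 single-precision float.
        "float": struct.unpack(">f", raw)[0],
    }


if __name__ == "__main__":
    for name, rendered in formats(0x7C9507A8).items():
        print(f"{name:9} {rendered}")
```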

Analyzing the Last Error Check (LEC) in Threads

Looking at the errors generated within a specific thread is an expedient technique. When the debugger shows a list of threads, exactly one active thread is marked with a dot (.). The bang command !gle reports the last error value for the thread, as a call to GetLastError would, and so reflects any error left behind in it. This check shows whether a function has quit abnormally in the thread. If the functions in a thread returned successfully, the possibility of an error is minute and the state of the thread will be normal. For example:

0:009> ~.
.  9  Id: 4e4.bc Suspend: 1 Teb: 7ffdd000 Unfrozen
      Priority: 0  Priority class: 32  Affinity: 1

0:009> !gle
LastErrorValue: (Win32) 0 (0) - The operation completed successfully.
LastStatusValue: (NTSTATUS) 0 - STATUS_WAIT_0

So no last error is present in this thread. The debugger shows it as thread 9, but because thread numbering is zero-based it is the tenth thread in the list.
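When the check is repeated across many threads, the !gle output can be reduced to a numeric code by script. The following Python sketch parses the LastErrorValue line in the format shown above. The pattern is an assumption derived from that sample, and the helper names last_error and ordinal_position are hypothetical.

```python
import re

# Matches "LastErrorValue: (Win32) 0 (0) - The operation completed successfully."
GLE_LINE = re.compile(
    r"LastErrorValue:\s*\(Win32\)\s*(?:0x)?[0-9a-fA-F]+\s*\((\d+)\)\s*-\s*(.*)"
)


def last_error(gle_output):
    """Return (decimal error code, message) from !gle output."""
    m = GLE_LINE.search(gle_output)
    if m is None:
        raise ValueError("no LastErrorValue line found")
    return int(m.group(1)), m.group(2).strip()


def ordinal_position(thread_index):
    """Convert the debugger's zero-based thread index to a one-based position."""
    return thread_index + 1


SAMPLE = """\
LastErrorValue: (Win32) 0 (0) - The operation completed successfully.
LastStatusValue: (NTSTATUS) 0 - STATUS_WAIT_0
"""

if __name__ == "__main__":
    code, message = last_error(SAMPLE)
    status = "no error" if code == 0 else "error"
    print(f"thread index 9 is thread number {ordinal_position(9)}: {status} ({message})")
```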

Unwinding Stacks at Thread Level

Stack unwinding is the practice of dissecting the stack on the basis of various conditions in a program. The question is whether it is prudent to unwind the stack according to certain parameters while bypassing others. Debugging heavy code is always a hard nut to crack when the number of cross-structural references between modules is high; the concept revolves around these structural interdependencies. Debugging without adequate symbols somewhat inhibits the debugger's efficiency, because the program can be dissected only according to the symbols loaded in the debugger. A stack is always processed for a single thread, and every thread has one stack. What if threads are present in large numbers? Sometimes duplicate entries can be found while debugging. Execution control, whether by setting breakpoints (bp) or by running directly (g), depends intrinsically on return addresses. Even manipulated threads possess a valid return address, but execution depends on whether the function actually returns. It has been observed that a hang can occur while applying a breakpoint, mainly when a working thread is entangled somewhere or a function has not returned. Backtracing the stack (k) is good practice.

Processing and debugging depend on time, complexity and resource utilization, and the work becomes harder if ambiguity arises in any of these three parameters. So, in order to trap a bug within the required time, stack unwinding with conditional entities has to be devised while debugging.

Let’s say:

Tn is a set of thread numbers

Tn = {t1, t2, t3, t4, …, tn}

Xn is a set of complexity numbers

Xn = {x1, x2, x3, x4, …, xn}

A simple one to one mapping is

Xn{x1, x2, x3, …, xn} ---------> Tn{t1, t2, t3, …, tn}

Condition: Interdependency i.e. mapping one to many / many to one.
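The mapping condition above can be made concrete with a small Python sketch that classifies a set of (complexity, thread) pairs. The pair representation and the function name classify_mapping are hypothetical and serve only to illustrate the distinction between one-to-one and interdependent mappings.

```python
from collections import defaultdict


def classify_mapping(pairs):
    """Classify (complexity, thread) pairs by the shape of the relation."""
    threads_per_x = defaultdict(set)
    xs_per_thread = defaultdict(set)
    for x, t in pairs:
        threads_per_x[x].add(t)
        xs_per_thread[t].add(x)

    # One complexity factor touching several threads.
    one_to_many = any(len(ts) > 1 for ts in threads_per_x.values())
    # Several complexity factors converging on one thread.
    many_to_one = any(len(xs) > 1 for xs in xs_per_thread.values())

    if one_to_many and many_to_one:
        return "many-to-many"
    if one_to_many:
        return "one-to-many"
    if many_to_one:
        return "many-to-one"
    return "one-to-one"


if __name__ == "__main__":
    print(classify_mapping([("x1", "t1"), ("x2", "t2")]))
    print(classify_mapping([("x1", "t1"), ("x1", "t2")]))
    print(classify_mapping([("x1", "t1"), ("x2", "t1")]))
```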

Interdependency of this kind raises the complexity to its peak. A reverse engineer should do the utmost to sidestep this issue and focus on traversing the code irrespective of its ingrained complexity. Looking at specific parts of the code, rather than at every perspective, strengthens the analysis. It is also advisable to extract as much information as possible from the code in order to gain an interim understanding of the process functions encountered. The general example below traverses the stack in an efficient manner, which makes register checks easy for a given thread at a given point in time.

I attached a debugger to the Google Talk process and found 7 threads running in it:

0:006> ~
0  Id: 7d4.174 Suspend: 1 Teb: 7ffdd000 Unfrozen
1  Id: 7d4.7a4 Suspend: 1 Teb: 7ffdc000 Unfrozen
2  Id: 7d4.1b8 Suspend: 1 Teb: 7ffda000 Unfrozen
3  Id: 7d4.2d8 Suspend: 1 Teb: 7ffd8000 Unfrozen
4  Id: 7d4.7fc Suspend: 1 Teb: 7ffd7000 Unfrozen
5  Id: 7d4.2bc Suspend: 1 Teb: 7ffd6000 Unfrozen
6  Id: 7d4.2dc Suspend: 1 Teb: 7ffdb000 Unfrozen

Running the [~*kv] command produces unfiltered and complex output (symbols were deliberately not loaded, to simulate their unavailability):

0:006> ~*kv

   0  Id: 7d4.174 Suspend: 1 Teb: 7ffdd000 Unfrozen
ChildEBP RetAddr  Args to Child              
0012e838 7e4191be 7e4191f1 0012fb20 00000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])
*** WARNING: Unable to verify checksum for 
C:\Program Files\Google\Google Talk\googletalk.exe
*** ERROR: Module load completed but symbols could not be loaded for 
C:\Program Files\Google\Google Talk\googletalk.exe
0012e858 0040271e 0012fb20 00000000 00000000 USER32!NtUserGetMessage+0xc
WARNING: Stack unwind information not available. Following frames may be wrong.
00000000 00000000 00000000 00000000 00000000 googletalk+0x271e

   1  Id: 7d4.7a4 Suspend: 1 Teb: 7ffdc000 Unfrozen
ChildEBP RetAddr  Args to Child              
0150fe18 7c90e399 77e76703 0000018c 0150ff70 ntdll!KiFastSystemCallRet (FPO: [0,0,0])
*** ERROR: Symbol file could not be found.  Defaulted to export symbols for 
C:\WINDOWS\system32\RPCRT4.dll
0150fe1c 77e76703 0000018c 0150ff70 00000000 ntdll!NtReplyWaitReceivePortEx+0xc (FPO: [5,0,0])
WARNING: Stack unwind information not available. Following frames may be wrong.
0150ff80 77e76c22 0150ffa8 77e76a3b 00153118 RPCRT4!I_RpcBCacheFree+0xcb
0150ff88 77e76a3b 00153118 0012e6b0 0012e22c RPCRT4!I_RpcBCacheFree+0x5ea
0150ffa8 77e76c0a 00156b40 0150ffec 7c80b683 RPCRT4!I_RpcBCacheFree+0x403
0150ffb4 7c80b683 0016f370 0012e6b0 0012e22c RPCRT4!I_RpcBCacheFree+0x5d2
0150ffec 00000000 77e76bf0 0016f370 00000000 kernel32!BaseThreadStart+0x37 (FPO: [Non-Fpo])

The output is too cluttered to work through. To reduce the complexity and unwind the stack in the best possible way, the [!uniqstack] extension is called directly to filter the output into a better layout:

0:006> !uniqstack
Processing 7 threads please wait
.  0  Id: 7d4.174 Suspend: 1 Teb: 7ffdd000 Unfrozen
Start: googletalk+0x15719a (0055719a) Priority: 0  Priority class: 32  Affinity: 1
ChildEBP RetAddr  
0012e838 7e4191be ntdll!KiFastSystemCallRet
0012e858 0040271e USER32!NtUserGetMessage+0xc
WARNING: Stack unwind information not available. Following frames may be wrong.
00000000 00000000 googletalk+0x271e

.  1  Id: 7d4.7a4 Suspend: 1 Teb: 7ffdc000 Unfrozen
 Start: kernel32!BaseThreadStartThunk (7c810659) Priority: 0  Priority class: 32  Affinity: 1
ChildEBP RetAddr  
0150fe18 7c90e399 ntdll!KiFastSystemCallRet
0150fe1c 77e76703 ntdll!NtReplyWaitReceivePortEx+0xc
WARNING: Stack unwind information not available. Following frames may be wrong.
0150ff80 77e76c22 RPCRT4!I_RpcBCacheFree+0xcb
0150ff88 77e76a3b RPCRT4!I_RpcBCacheFree+0x5ea
0150ffa8 77e76c0a RPCRT4!I_RpcBCacheFree+0x403
0150ffb4 7c80b683 RPCRT4!I_RpcBCacheFree+0x5d2
0150ffec 00000000 kernel32!BaseThreadStart+0x37

The structured output is much easier to traverse, and module names with the desired return addresses can be extracted easily. A conditional, filtered output is thus obtained by skipping duplicate entries. One can then use the set thread command [~s] to perform register checks.
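The idea of skipping duplicate stacks can be sketched in a few lines of Python. This is an illustration of grouping threads by identical call stacks, not a reimplementation of the extension. The frames for threads 0 and 1 are taken from the output above, while thread 2 is a hypothetical duplicate added to show the grouping, and unique_stacks is a hypothetical name.

```python
def unique_stacks(stacks):
    """Group thread indexes by identical call stack (a tuple of frame names)."""
    groups = {}
    for thread_index, frames in stacks.items():
        groups.setdefault(tuple(frames), []).append(thread_index)
    return groups


GUI_STACK = [
    "ntdll!KiFastSystemCallRet",
    "USER32!NtUserGetMessage+0xc",
    "googletalk+0x271e",
]
RPC_STACK = [
    "ntdll!KiFastSystemCallRet",
    "ntdll!NtReplyWaitReceivePortEx+0xc",
    "RPCRT4!I_RpcBCacheFree+0xcb",
]

# Thread 2 is hypothetical: a second worker parked in the same RPC wait.
STACKS = {0: GUI_STACK, 1: RPC_STACK, 2: list(RPC_STACK)}

if __name__ == "__main__":
    for frames, thread_indexes in unique_stacks(STACKS).items():
        print(f"threads {thread_indexes} share a stack ending in {frames[-1]}")
```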

Conclusion

The aim of this paper is to enumerate a detailed methodology for analyzing all sorts of errant thread behavior. A holistic approach is useful for understanding the overall picture of thread analysis and the direction to follow when analyzing threads that cause exceptions and other flaws. It is a sound approach for functional and real-time analysis of threads.

About the Author

Aditya K Sood is an Independent Security Consultant and Founder of SecNiche Security. He has worked in the security domain for COSEINC and KPMG. He has been an active speaker at conferences such as RSA (US 2010), TRISC, EuSecwest, XCON, Troopers, OWASP AppSec, FOSS and CERT-IN, among others. He has written content for HITB Ezine, Hakin9, Usenix Login, Elsevier Journals and Debugged! MZ/PE.