

Jason Copenhaver, 2nd of August, 2011
Winner of Tell Your Windows Debugging Story Annual Competition
jcopenhaver[0x40]accusoft[0x2E]com
Our company has an internal library that allows teams in our company to package DLLs as resources inside a single DLL. This means our customers only have to worry about distributing a single file, and we are sure that the DLL we deliver to the customer always runs with the DLLs that we tested with. One of the teams using this library was running into a problem where Visual Studio 2008 was crashing when they were debugging their 64-bit product. It’s never good when you are doing something so wrong that your debugger crashes. Needing to figure out what was going on I launched Visual Studio 2008 (VS), attached a copy of WinDbg and then repeated the steps in VS to make it crash.
The crash occurred exactly as reported in the defect ticket.
(564.1298): Break instruction exception - code 80000003 (first chance) eax=00000000 ebx=00000000 ecx=7712e1fb edx=030bc5e9 esi=00270000 edi=08c11030 eip=77100004 esp=030bc838 ebp=030bc8b4 iopl=0 nv up ei pl nz na pe nc cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000206 ntdll!DbgBreakPoint: 77100004 cc int 3
A quick look at the stack and I had my first suspect.
0:001> k ChildEBP RetAddr 030bc834 7719bd9e ntdll!DbgBreakPoint 030bc8b4 7719ca95 ntdll!RtlReportCriticalFailure+0x2e 030bc8c4 7719cb83 ntdll!RtlpReportHeapFailure+0x21 030bc8f8 77143498 ntdll!RtlpLogHeapFailure+0xa1 030bc924 764a3f8b ntdll!RtlFreeHeap+0x60 030bc938 72a03c1b kernel32!HeapFree+0x14 030bc984 6a113a14 MSVCR90!free+0xcd [f:\dd\vctools\crt_bld\self_x86\crt\src\free.c @ 110] ← ← NatDbgDE (the VS debugger)is probably freeing something it shouldn’t be. 030bc9a0 6a0ecc9d NatDbgDE!LoadPdata+0x142 030bcc3c 6a0bd3ee NatDbgDE!LoadCVFile+0x1a4 030bd094 6a0bd52d NatDbgDE!AfterPDBOpen+0xef 030bdd10 6a0ba2ce NatDbgDE!OLStart+0x11e 030bdd4c 6a0ba36e NatDbgDE!LoadOmfForReal+0x23 030bdd64 6a0ba3d9 NatDbgDE!LoadSymbols+0x43 030bdd7c 6a0ba499 NatDbgDE!OLLoadOmf+0x55 030bddac 6a0b9dc1 NatDbgDE!SHLoadDll+0xd5 030bddfc 6a0b9cfa NatDbgDE!CSymbolHandlerX::SHLoadDll+0x5a 030bde94 6a0b9685 NatDbgDE!CModule::Load+0x1a1 030bdefc 6a0ab717 NatDbgDE!CNativeProcess::NotifyModLoad+0xc8 030be03c 6a0ab682 NatDbgDE!EngineCallback+0xb3 030be068 6a0b959d NatDbgDE!EMCallBackDB+0x4c
The Microsoft debugging team has put a lot of work into the heuristics based !analyze command in WinDbg so I always like to see what it thinks of a crash before I go too far.
0:001> !analyze -v
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
FAULTING_IP:
ntdll!DbgBreakPoint+0
77100004 cc int 3
EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)
.exr 0xffffffffffffffff
ExceptionAddress: 77100004 (ntdll!DbgBreakPoint)
ExceptionCode: 80000003 (Break instruction exception)
ExceptionFlags: 00000000
NumberParameters: 1
Parameter[0]: 00000000
FAULTING_THREAD: 00001298
PROCESS_NAME: devenv.exe
ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION} Breakpoint A breakpoint has been reached.
EXCEPTION_CODE: (HRESULT) 0x80000003 (2147483651) - One or more arguments are invalid
EXCEPTION_PARAMETER1: 00000000
NTGLOBALFLAG: 400
APPLICATION_VERIFIER_FLAGS: 0
…
FOLLOWUP_IP:
NatDbgDE!LoadPdata+142
6a113a14 59 pop ecx
SYMBOL_STACK_INDEX: 3
SYMBOL_NAME: NatDbgDE!LoadPdata+142
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: NatDbgDE
IMAGE_NAME: NatDbgDE.dll
DEBUG_FLR_IMAGE_TIMESTAMP: 488f2cca
FAILURE_BUCKET_ID: ACTIONABLE_HEAP_CORRUPTION_heap_failure_block_not_busy_80000003_
NatDbgDE.dll!LoadPdata
BUCKET_ID: APPLICATION_FAULT_ACTIONABLE_HEAP_CORRUPTION_heap_failure_block_not_busy_
DOUBLE_FREE_NatDbgDE!LoadPdata+142
...
Followup: MachineOwner
---------
Well look at that “APPLICATION_FAULT_ACTIONABLE_HEAP_CORRUPTION_heap_failure_block_not_busy_DOUBLE_FREE_NatDbgDE!LoadPdata+142” the CRT is nice enough to point us to the code that did a double free. Looking at the stack trace with function names like SHLoadDll, AfterPdbOpen, and LoadPdata led me to believe this code was trying to parse a DLL. The .pdata section of the PE32+ format contains a table of IMAGE_RUNTIME_FUNCTION_ENTRY structures. This describes the Begin and End address of each function and the location of the stack unwind information needed to handle exceptions. Now all we have to do is understand NatDbgDE!LoadPdata well enough to see why it was double freeing some memory in this case.
Looking at the disassembly we can see that it allocates a buffer (.text:45BC394B), saves it to the stack (.text:45BC3952) and then passes it to a function call (.text:45BC3963) to read in the .pdata section.
.text:45BC391A mov eax, edi ; copy size to eax .text:45BC391C div ecx .text:45BC391E mov dword ptr [ebx+8], 1 .text:45BC3925 mov ecx, esi .text:45BC3927 mov [ebp+arg_0], eax ; store size on stack ... .text:45BC394B call ds:__imp__malloc ; allocate buffer .text:45BC3951 pop ecx .text:45BC3952 mov [ebp+arg_10], eax ; save buffer .text:45BC3955 test eax, eax .text:45BC3957 jnz short loc_45BC395D ... .text:45BC395D mov edx, [esi] .text:45BC395F push edi .text:45BC3960 push eax ; pass buffer .text:45BC3961 mov ecx, esi .text:45BC3963 call dword ptr [edx+10h] ; fill in the buffer
It then clears edi (.text:45BC396C), save it on the stack (.text:45BC3979), followed by inspecting the buffer that our .pdata section was copied into (.text:45BC3984).
.text:45BC396C xor edi, edi ; clear edi .text:45BC396E call ?SHGetPlatform@@YG?AW4_MPT@@PAUHPDS__@@@Z .text:45BC3973 dec eax .text:45BC3974 jz short loc_45BC39A0 .text:45BC3976 dec eax .text:45BC3977 jnz short loc_45BC3942 .text:45BC3979 cmp [ebp+arg_0], edi ; save edi .text:45BC397C jbe short loc_45BC39E4 .text:45BC397E mov ecx, [ebp+arg_10] ; get our buffer .text:45BC3981 add ecx, 8 ; buffer+8 .text:45BC3984 .text:45BC3984 loc_45BC3984: .text:45BC3984 cmp dword ptr [ecx-8], 0 ; Is BeginAddress 0 ? .text:45BC3988 jz short loc_45BC39C5 ; get out of here
If the first value of the .pdata section, the BeginAddress, is 0 it jumps to some cleanup code.
.text:45BC39C5 cmp edi, [ebp+arg_0] ; compare current edi .text:45BC39C8 jnb short loc_45BC39E4 ; with buffer size .text:45BC39CA mov [ebp+arg_0], edi ; save edi .text:45BC39CD imul edi, [ebp+var_4] ; edi*sizeof(FuncEntry) .text:45BC39D1 push edi .text:45BC39D2 push [ebp+arg_10] .text:45BC39D5 call ds:__imp__realloc ; realloc our buffer .text:45BC39DB pop ecx .text:45BC39DC pop ecx .text:45BC39DD test eax, eax .text:45BC39DF jz short loc_45BC3A0B
This last bit of code is where the error occurs. It reallocs the buffer to edi*sizeof(IMAGE_RUNTIME_FUNCTION_ENTRY) but since it errored out on the very first entry edi is still 0. The important part from the realloc documentation says “If size is zero, then the block pointed to by memblock is freed; the return value is NULL, and memblock is left pointing at a freed block.” So this code has just freed our buffer and returned NULL, so the test of eax at .text:45BC39DD will be zero and the code jumps to .text:45BC3A0B.
.text:45BC3A0B push [ebp+arg_10] ; our original buffer .text:45BC3A0E call ds:__imp__free ; LoadPdata+142 fault .text:45BC3A14 pop ecx
Which immediately calls free and it faults.
Tracing through the NatDbgDE!LoadPdata function we see that a double free can occur if we attempt to parse a .pdata section that starts with 0. Then the question becomes why do we have a .pdata section that starts with 0 and what can we do about it?
When we unpack the embedded DLL we write a temporary DLL to disk with just enough information to trick the OS into allocating the correct amount of memory and then load that DLL. This gets us a correct chunk of memory that we can then copy the real contents of the DLL into, doing all of the fix-ups, and even some function level hooks. Loading an actual DLL allows us to get useful DLL_THREAD_ATTACH, DLL_THREAD_DETACH, etc. messages from the OS. When building this temporary DLL we start from zero’d memory and copy only the parts that the OS needs to do the load. So we create a .pdata section that we’ve now learned is invalid for the debugger, but that is not invalid for the OS loader. We considered removing the .pdata section from the temporary DLL but then the OS can’t handle exceptions properly as it has no exception unwind information. The simple fix was to make sure we wrote at least one IMAGE_RUNTIME_FUNCTION_ENTRY to the .pdata section and that made Visual Studio happy. I’ve also added realloc size checks to my code review checklist to ensure this type of problem doesn’t happen in my code.