Reverse-Engineering a Windows PE Binary

I’m excited, guys! I just made my first real foray into the guts of an executable file by reverse-engineering it with the Interactive Disassembler (IDA) from Hex-Rays. I’m using the freeware version of IDA, which means I don’t get any tech support, so I’ve mostly had to figure out how to do everything myself (although I did get some cursory help from the how-to page on the Hex-Rays website). So far I’ve learned how to navigate all the different modes and windows in IDA, which had me stumped before. I’m also gradually piecing together a picture of how Windows PE executables are built, and what they look like at the nitty-gritty level.

Now to be fair, I have disassembled binaries before. I tried out the DEBUG command in DOS a few times, as well as the otool command in macOS, and I even did a couple of test runs on Windows PE files with IDA just to make sure I knew how to disassemble stuff. However, in all those cases, I never went beyond the disassembly stage. All I did was run the disassemble command with no intention of actually understanding the output. This is the first time I’ve really looked at the assembly dump and made a serious effort at figuring out what was going on. Fortunately I’ve gotten fairly good at reading code thanks to all the time I spent studying the WordPress source code as well as the source code for the GNU Core Utilities. Reading assembly code is the next difficulty setting after reading high-level code.

I figured I’d start simple, so I wrote a basic “Hello World” program in C, compiled it with Open Watcom, and ran the result through IDA. Immediately I ran into a major obstacle to reverse-engineering Windows PE files: the sheer complexity of these programs under the hood. It may seem simple when all you have is printf( "Hello, World!\n" );, but it turns out that the C Standard Library function sits on top of a multitude of low-level WinAPI calls. And that’s just the beginning, because there’s also all the register saving and restoring and all the pushing and popping that the program has to do whenever it calls a function or subroutine. I still haven’t figured it all out yet.

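So yeah, basically the import list tells you a lot about how the C library was linked into the program. With a toolchain that links the C runtime dynamically (as gcc typically does), the resulting program just calls the C library functions in a shared library, so printf itself shows up as an external symbol. Open Watcom, on the other hand, links its own C runtime statically into the executable, so what shows up as external symbols are the WinAPI functions that the runtime calls to do its job. The major tip-off in this case was the .idata section of the assembly dump: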

.idata:00407000 ; Section 2. (virtual address 00007000)
.idata:00407000 ; Virtual size                  : 000003B6 (    950.)
.idata:00407000 ; Section size in file          : 00000400 (   1024.)
.idata:00407000 ; Offset to raw data for section: 00005E00
.idata:00407000 ; Flags C0000040: Data Readable Writable
.idata:00407000 ; Alignment     : default
.idata:00407000 ; ===========================================================================
.idata:00407000 ; Segment type: Externs
.idata:00407000 ; _idata
.idata:004070C8 ;
.idata:004070C8 ; Imports from USER32.DLL
.idata:004070C8 ;
.idata:004070C8 ; LPSTR __stdcall CharUpperA(LPSTR lpsz)
.idata:004070C8                 extrn CharUpperA:dword  ; CODE XREF: sub_406090+19↑p
.idata:004070C8                                         ; DATA XREF: sub_406090+19↑r ...
.idata:004070D0 ;
.idata:004070D0 ; Imports from KERNEL32.DLL
.idata:004070D0 ;
.idata:004070D0 ; BOOL __stdcall CloseHandle(HANDLE hObject)
.idata:004070D0                 extrn CloseHandle:dword ; CODE XREF: sub_402FA0+31↑p
.idata:004070D0                                         ; sub_405380+18↑p
.idata:004070D0                                         ; DATA XREF: ...
.idata:004070D4 ; HANDLE __stdcall CreateEventA(LPSECURITY_ATTRIBUTES lpEventAttributes, BOOL bManualReset, BOOL bInitialState, LPCSTR lpName)
.idata:004070D4                 extrn CreateEventA:dword
.idata:004070D4                                         ; CODE XREF: sub_402EC0+11↑p
.idata:004070D4                                         ; DATA XREF: sub_402EC0+11↑r ...
.idata:004070D8 ; HANDLE __stdcall CreateFileA(LPCSTR lpFileName, DWORD dwDesiredAccess, DWORD dwShareMode, LPSECURITY_ATTRIBUTES lpSecurityAttributes, DWORD dwCreationDisposition, DWORD dwFlagsAndAttributes, HANDLE hTemplateFile)
.idata:004070D8                 extrn CreateFileA:dword ; CODE XREF: sub_404ED0+3E↑p
.idata:004070D8                                         ; sub_404ED0+62↑p
.idata:004070D8                                         ; DATA XREF: ...
.idata:004070DC ; void __stdcall __noreturn ExitProcess(UINT uExitCode)
.idata:004070DC                 extrn ExitProcess:dword ; CODE XREF: sub_401120+772↑p
.idata:004070DC                                         ; sub_403D80+77↑p
.idata:004070DC                                         ; DATA XREF: ...
.idata:004070E0 ; BOOL __stdcall FlushFileBuffers(HANDLE hFile)
.idata:004070E0                 extrn FlushFileBuffers:dword
.idata:004070E0                                         ; CODE XREF: sub_4044E0+17↑p
.idata:004070E0                                         ; DATA XREF: sub_4044E0+17↑r ...
.idata:004070E4 ; UINT __stdcall GetACP()
.idata:004070E4                 extrn GetACP:dword      ; CODE XREF: sub_405467+E↑p
.idata:004070E4                                         ; DATA XREF: sub_405467+E↑r ...
.idata:004070E8 ; BOOL __stdcall GetCPInfo(UINT CodePage, LPCPINFO lpCPInfo)
.idata:004070E8                 extrn GetCPInfo:dword   ; CODE XREF: sub_405467+8A↑p
.idata:004070E8                                         ; DATA XREF: sub_405467+8A↑r ...
.idata:004070EC ; LPSTR __stdcall GetCommandLineA()
.idata:004070EC                 extrn GetCommandLineA:dword
.idata:004070EC                                         ; CODE XREF: sub_401570+E9↑p
.idata:004070EC                                         ; DATA XREF: sub_401570+E9↑r ...
.idata:004070F0 ; LPWSTR __stdcall GetCommandLineW()
.idata:004070F0                 extrn GetCommandLineW:dword
.idata:004070F0                                         ; CODE XREF: sub_401570+14D↑p
.idata:004070F0                                         ; DATA XREF: sub_401570+14D↑r ...
.idata:004070F4 ; DWORD __stdcall GetCurrentThreadId()
.idata:004070F4                 extrn GetCurrentThreadId:dword
.idata:004070F4                                         ; CODE XREF: sub_402A40+1B↑p
.idata:004070F4                                         ; DATA XREF: sub_402A40+1B↑r ...
.idata:004070F8 ; DWORD __stdcall GetFileType(HANDLE hFile)
.idata:004070F8                 extrn GetFileType:dword ; CODE XREF: sub_4053E0+8↑p
.idata:004070F8                                         ; DATA XREF: sub_4053E0+8↑r ...
.idata:004070FC ; DWORD __stdcall GetLastError()
.idata:004070FC                 extrn GetLastError:dword
.idata:004070FC                                         ; CODE XREF: sub_404530+70↑p
.idata:004070FC                                         ; sub_404530:loc_4045D8↑p ...
.idata:00407100 ; DWORD __stdcall GetModuleFileNameA(HMODULE hModule, LPSTR lpFilename, DWORD nSize)
.idata:00407100                 extrn GetModuleFileNameA:dword
.idata:00407100                                         ; CODE XREF: sub_401570+C3↑p
.idata:00407100                                         ; sub_401570+1E6↑p ...
.idata:00407104 ; DWORD __stdcall GetModuleFileNameW(HMODULE hModule, LPWSTR lpFilename, DWORD nSize)
.idata:00407104                 extrn GetModuleFileNameW:dword
.idata:00407104                                         ; CODE XREF: sub_403060+31↑p
.idata:00407104                                         ; DATA XREF: sub_403060+31↑r ...
.idata:00407108 ; HMODULE __stdcall GetModuleHandleA(LPCSTR lpModuleName)
.idata:00407108                 extrn GetModuleHandleA:dword
.idata:00407108                                         ; CODE XREF: sub_401570+98↑p
.idata:00407108                                         ; sub_401790+4F↑p ...
.idata:0040710C ; UINT __stdcall GetOEMCP()
.idata:0040710C                 extrn GetOEMCP:dword    ; CODE XREF: sub_405467+1D↑p
.idata:0040710C                                         ; sub_405467+AB↑p
.idata:0040710C                                         ; DATA XREF: ...
.idata:00407110 ; FARPROC __stdcall GetProcAddress(HMODULE hModule, LPCSTR lpProcName)
.idata:00407110                 extrn GetProcAddress:dword
.idata:00407110                                         ; CODE XREF: sub_401570+9F↑p
.idata:00407110                                         ; sub_401790+56↑p ...
.idata:00407114 ; HANDLE __stdcall GetStdHandle(DWORD nStdHandle)
.idata:00407114                 extrn GetStdHandle:dword
.idata:00407114                                         ; CODE XREF: sub_401120+1F↑p
.idata:00407114                                         ; sub_401120+43↑p ...
.idata:00407118 ; DWORD __stdcall GetVersion()
.idata:00407118                 extrn GetVersion:dword  ; CODE XREF: sub_401570+1E↑p
.idata:00407118                                         ; sub_403060+A↑p
.idata:00407118                                         ; DATA XREF: ...
.idata:0040711C ; HMODULE __stdcall LoadLibraryA(LPCSTR lpLibFileName)
.idata:0040711C                 extrn LoadLibraryA:dword
.idata:0040711C                                         ; CODE XREF: sub_403840+8↑p
.idata:0040711C                                         ; DATA XREF: sub_403840+8↑r ...
.idata:00407120 ; int __stdcall MultiByteToWideChar(UINT CodePage, DWORD dwFlags, LPCSTR lpMultiByteStr, int cbMultiByte, LPWSTR lpWideCharStr, int cchWideChar)
.idata:00407120                 extrn MultiByteToWideChar:dword
.idata:00407120                                         ; CODE XREF: sub_403060+56↑p
.idata:00407120                                         ; sub_406620+73↑p
.idata:00407120                                         ; DATA XREF: ...
.idata:00407124 ; BOOL __stdcall SetConsoleCtrlHandler(PHANDLER_ROUTINE HandlerRoutine, BOOL Add)
.idata:00407124                 extrn SetConsoleCtrlHandler:dword
.idata:00407124                                         ; CODE XREF: sub_4050F0+1D↑p
.idata:00407124                                         ; sub_405130+23↑p
.idata:00407124                                         ; DATA XREF: ...
.idata:00407128 ; BOOL __stdcall SetEnvironmentVariableA(LPCSTR lpName, LPCSTR lpValue)
.idata:00407128                 extrn SetEnvironmentVariableA:dword
.idata:00407128                                         ; CODE XREF: sub_405E60+C0↑p
.idata:00407128                                         ; DATA XREF: sub_405E60+C0↑r ...
.idata:0040712C ; DWORD __stdcall SetFilePointer(HANDLE hFile, LONG lDistanceToMove, PLONG lpDistanceToMoveHigh, DWORD dwMoveMethod)
.idata:0040712C                 extrn SetFilePointer:dword
.idata:0040712C                                         ; CODE XREF: sub_4044A0+17↑p
.idata:0040712C                                         ; sub_404530+4D↑p
.idata:0040712C                                         ; DATA XREF: ...
.idata:00407130 ; BOOL __stdcall SetStdHandle(DWORD nStdHandle, HANDLE hHandle)
.idata:00407130                 extrn SetStdHandle:dword
.idata:00407130                                         ; CODE XREF: sub_402E70:loc_402E84↑p
.idata:00407130                                         ; DATA XREF: sub_402E70:loc_402E84↑r ...
.idata:00407134 ; LPTOP_LEVEL_EXCEPTION_FILTER __stdcall SetUnhandledExceptionFilter(LPTOP_LEVEL_EXCEPTION_FILTER lpTopLevelExceptionFilter)
.idata:00407134                 extrn SetUnhandledExceptionFilter:dword
.idata:00407134                                         ; CODE XREF: sub_403D40+7↑p
.idata:00407134                                         ; sub_403D40+15↑p
.idata:00407134                                         ; DATA XREF: ...
.idata:00407138 ; LONG __stdcall UnhandledExceptionFilter(struct _EXCEPTION_POINTERS *ExceptionInfo)
.idata:00407138                 extrn UnhandledExceptionFilter:dword
.idata:00407138                                         ; CODE XREF: sub_403D80+66↑p
.idata:00407138                                         ; DATA XREF: sub_403D80+66↑r ...
.idata:0040713C ; LPVOID __stdcall VirtualAlloc(LPVOID lpAddress, SIZE_T dwSize, DWORD flAllocationType, DWORD flProtect)
.idata:0040713C                 extrn VirtualAlloc:dword
.idata:0040713C                                         ; CODE XREF: sub_404A10+E↑p
.idata:0040713C                                         ; DATA XREF: sub_404A10+E↑r ...
.idata:00407140 ; BOOL __stdcall VirtualFree(LPVOID lpAddress, SIZE_T dwSize, DWORD dwFreeType)
.idata:00407140                 extrn VirtualFree:dword ; CODE XREF: sub_4055F0+A↑p
.idata:00407140                                         ; DATA XREF: sub_4055F0+A↑r ...
.idata:00407144 ; SIZE_T __stdcall VirtualQuery(LPCVOID lpAddress, PMEMORY_BASIC_INFORMATION lpBuffer, SIZE_T dwLength)
.idata:00407144                 extrn VirtualQuery:dword
.idata:00407144                                         ; CODE XREF: sub_404000+16↑p
.idata:00407144                                         ; DATA XREF: sub_404000+16↑r ...
.idata:00407148 ; int __stdcall WideCharToMultiByte(UINT CodePage, DWORD dwFlags, LPCWSTR lpWideCharStr, int cchWideChar, LPSTR lpMultiByteStr, int cbMultiByte, LPCSTR lpDefaultChar, LPBOOL lpUsedDefaultChar)
.idata:00407148                 extrn WideCharToMultiByte:dword
.idata:00407148                                         ; CODE XREF: sub_404240+29↑p
.idata:00407148                                         ; DATA XREF: sub_404240+29↑r ...
.idata:0040714C ; BOOL __stdcall WriteFile(HANDLE hFile, LPCVOID lpBuffer, DWORD nNumberOfBytesToWrite, LPDWORD lpNumberOfBytesWritten, LPOVERLAPPED lpOverlapped)
.idata:0040714C                 extrn WriteFile:dword   ; CODE XREF: sub_401120+26↑p
.idata:0040714C                                         ; sub_401120+4A↑p ...

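.idata is the section of a Windows PE binary that holds the import tables, that is, the list of all symbols the executable imports from external files. In this case the executable imports functions from two DLL files: USER32.DLL and KERNEL32.DLL. Strictly speaking these are Windows API functions rather than system calls: they run in user mode inside the DLLs and make the actual system calls on the program’s behalf as needed. Not all of them are there for printf() either. Some, like GetCommandLineA and ExitProcess, most likely belong to the C runtime’s startup and shutdown code. The rest handle things like creating and writing to file streams, flushing the output buffer (because printf() is a buffered function), and reporting any errors if necessary. As you can see, a lot of work goes on behind the scenes with any Windows PE program.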
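The section header that IDA prints at the top of the .idata listing can be checked by hand. Here is a rough Python sketch that decodes the flags value C0000040 and redoes the address and size arithmetic. The flag bits are the standard section characteristics from the PE format, but the image base of 0x400000 and the file alignment of 0x200 are assumptions on my part (they are the usual defaults, and they agree with the numbers in the listing). The function names are made up for illustration.

```python
# Section characteristic bits from the PE/COFF format (only the ones needed here).
SECTION_FLAGS = {
    0x00000020: "CODE",
    0x00000040: "INITIALIZED_DATA",
    0x00000080: "UNINITIALIZED_DATA",
    0x20000000: "EXECUTE",
    0x40000000: "READ",
    0x80000000: "WRITE",
}


def decode_flags(value):
    """Return the names of the known flag bits that are set in value."""
    return [name for bit, name in sorted(SECTION_FLAGS.items()) if value & bit]


def align_up(size, alignment):
    """Round size up to the next multiple of alignment (a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


IMAGE_BASE = 0x400000    # assumed: the usual default for a 32-bit EXE
FILE_ALIGNMENT = 0x200   # assumed: the usual default, consistent with the listing

flags = decode_flags(0xC0000040)                 # "Flags C0000040"
load_address = IMAGE_BASE + 0x7000               # "virtual address 00007000"
size_in_file = align_up(0x3B6, FILE_ALIGNMENT)   # "Virtual size 000003B6"

print(flags)              # ['INITIALIZED_DATA', 'READ', 'WRITE']
print(hex(load_address))  # 0x407000
print(hex(size_in_file))  # 0x400
```

The results line up with what IDA reports: the section is readable, writable data, it starts at 00407000 in memory, and its 950 bytes are padded out to 1024 bytes in the file.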

The IDA application does a lot more than just disassemble binaries. It also has a lot of features that aid in the understanding of the assembly code. One of the ways it does this is by providing a graph mode where it maps the entire program with arrows that indicate jumps. You can quickly switch between text mode and graph mode by hitting the spacebar.

Screenshot of the Interactive Disassembler (IDA)

You can also view the disassembly in proximity mode, which is like graph mode except that whereas graph mode diagrams branches in control flow logic, proximity mode diagrams procedure calls. It is so called because it shows all the callers and callees of the currently selected subroutine. The proximity browser allows you to expand different branches of the program one at a time to get a more complete picture. In the screenshot below, the currently selected subroutine is highlighted in green and calls to external procedures (the imported API functions) are highlighted in magenta.

Screenshot of IDA's proximity browser
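The caller and callee relationships that the proximity browser draws are the same ones recorded in the CODE XREF comments of the text listing. As a minimal sketch (not how IDA does it internally), the Python below scrapes a few lines copied from the .idata dump and builds both directions of the mapping. The function names are hypothetical, and I am assuming that the trailing "p" on a cross-reference marks a procedure call while "r" marks a data read.

```python
import re
from collections import defaultdict

# A few lines copied from the .idata listing. A trailing "..." means IDA
# truncated the cross-reference list, so the result may be incomplete.
LISTING = """\
.idata:004070D0                 extrn CloseHandle:dword ; CODE XREF: sub_402FA0+31↑p
.idata:004070D0                                         ; sub_405380+18↑p
.idata:004070D0                                         ; DATA XREF: ...
.idata:00407114                 extrn GetStdHandle:dword
.idata:00407114                                         ; CODE XREF: sub_401120+1F↑p
.idata:00407114                                         ; sub_401120+43↑p ...
.idata:0040714C                 extrn WriteFile:dword   ; CODE XREF: sub_401120+26↑p
.idata:0040714C                                         ; sub_401120+4A↑p ...
"""

IMPORT_RE = re.compile(r"extrn (\w+):dword")
# Match "sub_XXXXXX" followed by an offset or label and the call marker.
CALL_XREF_RE = re.compile(r"(sub_[0-9A-Fa-f]+)\S*↑p")


def callers_by_import(listing):
    """Map each imported function to the subroutines that call it."""
    callers = defaultdict(set)
    current = None
    for line in listing.splitlines():
        match = IMPORT_RE.search(line)
        if match:
            current = match.group(1)
        if current is None:
            continue
        for sub in CALL_XREF_RE.findall(line):
            callers[current].add(sub)
    return dict(callers)


def callees_by_sub(callers):
    """Invert the mapping: subroutine -> imported functions it calls."""
    callees = defaultdict(set)
    for name, subs in callers.items():
        for sub in subs:
            callees[sub].add(name)
    return dict(callees)


callers = callers_by_import(LISTING)
callees = callees_by_sub(callers)
print(sorted(callers["CloseHandle"]))  # ['sub_402FA0', 'sub_405380']
print(sorted(callees["sub_401120"]))   # ['GetStdHandle', 'WriteFile']
```

Even on this tiny sample you can see sub_401120 calling both GetStdHandle and WriteFile, which is the kind of relationship the proximity browser shows as a pair of arrows.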

As you can see, this program is rather convoluted. And that’s why reverse-engineering is such hard work, whether you’re working with Windows PE or any other format. Often the resulting assembly code is basically unintelligible to all but the most well-trained hackers.

There are a couple of other observations I have been able to make in examining this assembly dump. First, we can see that a large share of the instructions consist of pushing operands onto the stack in preparation for a procedure call. This is the argument setup at the call site. (It is not to be confused with the function prologue, which is the code at the top of the called function that sets up its stack frame.)

AUTO:00401134                 push    0               ; lpOverlapped
AUTO:00401136                 lea     eax, [esp+0Ch+NumberOfBytesWritten]
AUTO:0040113A                 push    eax             ; lpNumberOfBytesWritten
AUTO:0040113B                 push    edx             ; nNumberOfBytesToWrite
AUTO:0040113C                 push    esi             ; lpBuffer
AUTO:0040113D                 push    0FFFFFFF4h      ; nStdHandle
AUTO:0040113F                 call    ds:GetStdHandle

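Second, when we look at the argument setup for the API calls, we can see that it consists of pushing immediate operands as well as register contents. This means that the 32-bit Windows calling convention used here (__stdcall, as the prototypes in the .idata listing show) uses the stack to hold all the parameters of a call, pushed from right to left. This is less a Windows-versus-Linux difference than a 32-bit-versus-64-bit one: 32-bit Linux code also passes function arguments on the stack by default, whereas the 64-bit calling conventions on both Linux and Windows use registers for the first few parameters and the stack for any remaining.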
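To convince myself that the pushes in the listing really line up with the WriteFile prototype, I find it helps to write the convention out. The Python below is only a model of __stdcall on 32-bit x86 (parameters pushed last to first, one 4-byte stack slot each, callee removes them on return), and the helper names are made up. It also reinterprets the immediate 0FFFFFFF4h as a signed number, since the standard handle IDs in the Windows headers are small negative values.

```python
import struct


def stdcall_push_order(params):
    """Return the parameters in the order the caller pushes them (last first)."""
    return list(reversed(params))


def stdcall_cleanup_bytes(params, slot_size=4):
    """Bytes the callee removes from the stack on return, one slot per parameter."""
    return slot_size * len(params)


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit value."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


# Parameter names as they appear in the WriteFile prototype in the listing.
WRITEFILE_PARAMS = [
    "hFile",
    "lpBuffer",
    "nNumberOfBytesToWrite",
    "lpNumberOfBytesWritten",
    "lpOverlapped",
]

# Standard handle IDs accepted by GetStdHandle.
STD_HANDLES = {
    -10: "STD_INPUT_HANDLE",
    -11: "STD_OUTPUT_HANDLE",
    -12: "STD_ERROR_HANDLE",
}

push_order = stdcall_push_order(WRITEFILE_PARAMS)
cleanup = stdcall_cleanup_bytes(WRITEFILE_PARAMS)
handle_id = as_signed32(0xFFFFFFF4)

print(push_order)
print(cleanup)                            # 20
print(handle_id, STD_HANDLES[handle_id])  # -12 STD_ERROR_HANDLE
```

The first four entries of the push order (lpOverlapped, lpNumberOfBytesWritten, nNumberOfBytesToWrite, lpBuffer) match the first four pushes in the listing. The last one, hFile, is presumably the value that GetStdHandle returns, pushed right before the call to WriteFile that the cross-references place at sub_401120+26.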

I have also noticed that all of the assembler directives in the assembly dump are those of the Microsoft Macro Assembler (MASM). This is very handy, as the book I learned x86 assembly language programming from is based on MASM. So I can use that book as a companion to my reverse-engineering adventures in IDA, along with the online help and some books on reverse-engineering.

In closing, I will state that reverse-engineering of executable files is something I’ve been meaning to get into for a really long time, but for some reason I never took the necessary steps until now. I think I was probably intimidated by the unwieldy nature of all the assembly dumps I had seen in the past, and this program was no exception. But I am now at a time in my life where I don’t have anything else going on, so I can devote all my time and energy to learning reverse-engineering and deciphering the Windows PE format. I believe being able to not only disassemble binaries but also read the assembly dumps and use them to modify the program to one’s liking is a rite of passage for every budding hacker.

