Uses for Reverse Engineering: Debugging C Code with a Disassembler

One thing I like about C programming for the DOS prompt is that when there’s a runtime error, a whole collection of debugging information gets dumped to the console. This can be very useful in diagnosing errors, unlike on the Unix command line, where all you get is a not-so-descriptive message like “Segmentation fault”. I was testing a C program I was writing when I got a runtime error with the following debug info:

[Screenshot: debugging info indicating a null pointer reference]

I noticed two things here: 1. The pointer value was zero, which significantly narrowed down what the exact problem was, and 2. I could look at the exact address of the exception in a disassembler like IDA to get further information. A reverse engineering approach might be a more intelligent way of going about it than my traditional method of inserting printf() statements at various points to try to pinpoint the exact line the error occurred on. With this in mind, I fired up IDA and disassembled my buggy C program…

[Screenshot: reverse-engineering a C program to find bugs]

As I suggested above, there are two bits of debugging information in the console dump that we can use to our advantage here: the value of the pointer and the address of the instruction where the exception occurred. Since the value of the pointer is zero, we know that the segfault occurred because we tried to dereference a null pointer. Obviously zero is an invalid address. The pointer was null either because it was never set to begin with, or because it was reset at some point during execution. Looking at the instruction at the given address will provide insight into what exactly went wrong. So I used the “Jump to address” command in IDA to find the desired instruction:

[Screenshot: reverse-engineering a C program to find bugs]

This brings us to the following line in the assembly code (address highlighted in yellow):

[Screenshot: reverse-engineering a C program to find bugs]

For further clarity, here is the code of the disassembled subroutine in a textual format:


.code:00401D90 sub_401D90      proc near               ; CODE XREF: sub_401010+B7↑p
.code:00401D90                 push    ecx
.code:00401D91                 push    edi
.code:00401D92                 mov     edi, eax
.code:00401D94                 sub     ecx, ecx
.code:00401D96                 dec     ecx
.code:00401D97                 xor     eax, eax
.code:00401D99                 repne scasb
.code:00401D9B                 not     ecx
.code:00401D9D                 dec     ecx
.code:00401D9E                 mov     eax, ecx
.code:00401DA0                 pop     edi
.code:00401DA1                 pop     ecx
.code:00401DA2                 retn
.code:00401DA2 sub_401D90      endp
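
For anyone who doesn't read x86 every day, here is a rough C rendering of the routine above. It is my own sketch, not something IDA produced: the function name `sub_401D90_in_c` and the local variable names are made up, and a C pointer stands in for the EDI register.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C rendering of sub_401D90, one statement per instruction.
   The push/pop pairs only save and restore registers, so they are omitted. */
size_t sub_401D90_in_c(const char *s)
{
    const unsigned char *edi = (const unsigned char *)s; /* mov edi, eax */
    uint32_t ecx = 0;                                    /* sub ecx, ecx */
    ecx--;                                               /* dec ecx -> 0xFFFFFFFF */
    unsigned char al = 0;                                /* xor eax, eax */

    /* repne scasb: compare AL with the byte at EDI, advance EDI,
       decrement ECX, and stop on a match or when ECX reaches zero. */
    while (ecx != 0) {
        unsigned char byte = *edi++;  /* this read faults when s is NULL */
        ecx--;
        if (byte == al)
            break;
    }

    ecx = ~ecx;   /* not ecx: bytes scanned, including the matching byte */
    ecx--;        /* dec ecx: leave the matching byte out of the count */
    return ecx;   /* mov eax, ecx */
}
```

Calling `sub_401D90_in_c("abc")` returns 3, while passing a null pointer crashes on the first read inside the loop, which is the instruction the exception address pointed at.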

The culprit instruction here is REPNE SCASB. In the x86 architecture, the REPNE prefix repeats the string instruction that follows it until either ECX reaches zero or the ZF flag in the EFLAGS register is set. SCAS compares the array element at the address in EDI with the value in the accumulator, sets ZF accordingly, and advances EDI. The B at the end means we are scanning a byte array, so the comparison is against AL, the low byte of EAX. When combined, REPNE and SCASB scan a byte array (probably a string) until reaching the byte value stored in AL. Looking back at the debug info in the first screenshot, we can see that this value is zero, which is what the XOR EAX, EAX two instructions earlier sets it to. So essentially, this instruction scans a string until it reaches a null character, the end of the string. Because EDI holds the pointer being scanned (copied from EAX at the top of the routine), a null pointer makes the very first comparison fault. There is indeed a function in C that does this, and it’s also a function that I used in my program. That function is strlen(). So I need to look more closely at the macro I wrote that uses that function:


// Read a single line from the file
#define read_line( buf, size, fp )\
        if( buf ) free( buf );\
        size = 64;\
        buf = (char *) malloc( size );\
        fgets( buf, size, fp );\
        while( buf[strlen( buf ) - 1] != '\n' ){\
                size <<= 1;\
                buf = (char *) realloc( buf, size );\
                fgets( buf + (size >> 1) - 1, size >> 1, fp );\
        }\
        size = 64

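It turns out the condition in the while loop was always evaluating to true, so the size variable kept being left-shifted until the 1 bit was pushed off the end, and realloc() was asked for a block of zero size. The C standard does not guarantee what realloc() returns for a size of zero; many implementations free the block and return a null pointer, which is what happened here. (realloc() also returns a null pointer when it fails, which can happen with one of the very large sizes before the shift ever reaches zero.) Either way, buf ends up null and the next strlen() call dereferences it. So the problem here is that the last character in the string is never the newline.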
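
To get a feel for how quickly the size runs off the end, here is a small sketch that counts the doublings. It assumes the size variable is 32 bits wide, which this post doesn't actually show, and it uses an unsigned type because overflowing a signed int with a left shift is undefined behavior in C. The helper name `doublings_until_zero` is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: count how many doublings it takes for a 32-bit
   size that starts at `start` to wrap around to zero. */
int doublings_until_zero(uint32_t start)
{
    int shifts = 0;
    while (start != 0) {
        start <<= 1;   /* same operation as "size <<= 1" in the macro */
        shifts++;
    }
    return shifts;
}
```

Under that assumption, `doublings_until_zero(64)` returns 26, so a loop that never sees a newline needs only 26 passes to go from 64 bytes to a zero-sized request.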

Then it hit me: If the last line in a text file isn’t terminated by a newline (which is sometimes the case), fgets() has nothing more to read, the buffer never gains a newline, and the above condition always evaluates to true. The size then overflows to zero, which in turn leads to the null pointer dereference. So what I need to do is add a quick function to check if the end of the file has been reached. Here’s the function I wrote:


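#include <stdbool.h>
#include <stdio.h>

// Used in case the last line of the
// file is not newline-terminated.
// Peeks at the next character and puts it back.
bool is_eof( FILE *fp ){
        int c;
        bool end;
        end = ((c = fgetc( fp )) == EOF );
        ungetc( c, fp );   // has no effect when c is EOF
        return end;
}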
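
The important property of a check like this is that it only peeks: the character it reads has to go back onto the stream so the next fgets() still sees it. Here is a small sketch that exercises that property on a temporary file. The names `at_eof` and `peek_is_non_destructive` are made up for this example, and `at_eof` is just the same logic as above, renamed so the sketch stands alone.

```c
#include <stdbool.h>
#include <stdio.h>

/* Same idea as is_eof(): read one character and push it back. */
static bool at_eof(FILE *fp)
{
    int c = fgetc(fp);
    if (c == EOF)
        return true;
    ungetc(c, fp);   /* put the byte back so the next read sees it */
    return false;
}

/* Hypothetical check: returns true if peeking never consumes input. */
bool peek_is_non_destructive(void)
{
    FILE *fp = tmpfile();
    if (!fp)
        return false;

    fputs("ab", fp);   /* two characters, no trailing newline */
    rewind(fp);

    bool ok = !at_eof(fp) && fgetc(fp) == 'a'
           && !at_eof(fp) && fgetc(fp) == 'b'
           && at_eof(fp);

    fclose(fp);
    return ok;
}
```

Note that fgetc() also returns EOF on a read error, so a check written this way treats an error the same as the end of the file.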

And here is the modified version of my line-reading macro:


// Read a single line from the file
#define read_line( buf, size, fp )\
        if( buf ) free( buf );\
        size = 64;\
        buf = (char *) malloc( size );\
        fgets( buf, size, fp );\
        while( buf[strlen( buf ) - 1] != '\n' && !is_eof( fp ) ){\
                size <<= 1;\
                buf = (char *) realloc( buf, size );\
                fgets( buf + (size >> 1) - 1, size >> 1, fp );\
        }\
        size = 64
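
The macro still has a few sharp edges: it never checks what fgets() or realloc() return, and if the very first fgets() fails (an empty file, say), strlen() runs on an uninitialized buffer. One possible way around that, sketched here as a plain function rather than as what my program actually uses, is to let the return value of fgets() drive the loop. The name `read_line_alloc` and its return convention are made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function version of read_line. Returns a malloc'd line
   (the caller frees it), or NULL at end of file or on allocation failure. */
char *read_line_alloc(FILE *fp)
{
    size_t size = 64;
    size_t len = 0;
    char *buf = malloc(size);
    if (!buf)
        return NULL;

    /* fgets() returns NULL once nothing more can be read, which ends the
       loop even when the last line has no trailing newline. */
    while (fgets(buf + len, (int)(size - len), fp)) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n')
            return buf;                  /* got a complete line */

        if (len + 1 == size) {           /* buffer full: double it */
            char *tmp = realloc(buf, size * 2);
            if (!tmp) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            size *= 2;
        }
    }

    if (len == 0) {                      /* nothing read at all */
        free(buf);
        return NULL;
    }
    return buf;                          /* final line without a newline */
}
```

Because the loop stops as soon as fgets() reports that nothing more can be read, the size can never be doubled past the data actually in the file, and no separate end-of-file check is needed.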

This part of the program now runs without any problems. It’s just a quick demonstration of the power of reverse engineering in diagnosing C programming bugs that can otherwise be very elusive. I think this is how I’m going to be doing all my debugging from now on. 🙂
