One thing I like about C programming for the DOS prompt is that when there’s a run-time error, a whole collection of debugging information gets dumped to the console. This can be very useful in diagnosing errors, as opposed to the Unix command line, where all you get is a not-so-descriptive message like “Segmentation fault”. I was testing a C program I was writing when I got a run-time error with the following debug info:
I noticed two things here: 1. The pointer value was zero, which significantly narrowed down what the exact problem was, and 2. I could look at the exact address of the exception in a disassembler like IDA to get further information. A reverse engineering approach might be a more intelligent way of going about it than my traditional method of inserting printf() statements at various points to try to pinpoint the exact line the error occurred on. With this in mind, I fired up IDA and disassembled my buggy C program…
As I suggested previously, there are two bits of debugging information from the previous console dump that we can use to our advantage here. They are the value of the pointer and the address of the instruction where the exception occurred. Since the value of the pointer is zero, we know that the segfault occurred because we tried to dereference a null pointer. Obviously zero is an invalid address. The pointer was null either because it was not set to begin with, or because it was unset at some point in the execution. Looking at the instruction at the given address will provide insight into what exactly went wrong. So I’m using the “Jump to address” command in IDA to find the desired instruction:
This brings us to the following line in the assembly code (address highlighted in yellow):
For further clarity, here is the code of the disassembled subroutine in a textual format:
.code:00401D90 sub_401D90 proc near ; CODE XREF: sub_401010+B7↑p
.code:00401D90 push ecx
.code:00401D91 push edi
.code:00401D92 mov edi, eax
.code:00401D94 sub ecx, ecx
.code:00401D96 dec ecx
.code:00401D97 xor eax, eax
.code:00401D99 repne scasb
.code:00401D9B not ecx
.code:00401D9D dec ecx
.code:00401D9E mov eax, ecx
.code:00401DA0 pop edi
.code:00401DA1 pop ecx
.code:00401DA2 retn
.code:00401DA2 sub_401D90 endp
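To make the listing easier to follow, here is a rough C transliteration with one statement per instruction. It is only a sketch: the function name scan_length and the local variable names are made up for illustration, and it assumes 32-bit registers as in the disassembly.

```c
#include <stdint.h>

/* Hypothetical C rendering of sub_401D90; all names are illustrative. */
static uint32_t scan_length(const char *s)
{
    const unsigned char *edi = (const unsigned char *)s; /* mov edi, eax */
    uint32_t ecx = 0;                                    /* sub ecx, ecx */
    ecx--;                                               /* dec ecx (now 0xFFFFFFFF) */
    const unsigned char al = 0;                          /* xor eax, eax */

    /* repne scasb: compare AL with the byte at EDI, advance EDI,
       decrement ECX, and stop on a match or when ECX reaches zero */
    while (ecx != 0) {
        unsigned char byte = *edi++; /* this read faults when s is NULL */
        ecx--;
        if (byte == al)
            break;
    }

    ecx = ~ecx;                                          /* not ecx */
    ecx--;                                               /* dec ecx */
    return ecx;                                          /* mov eax, ecx */
}
```

Calling scan_length("abc") walks four bytes (three letters plus the terminator) and returns 3.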
The culprit instruction here is REPNE SCASB. In the x86 architecture, the REPNE prefix repeats the instruction immediately after it until either ECX reaches zero or the ZF flag in the EFLAGS register is set. SCAS scans an array, comparing each element to the value in the accumulator and setting ZF accordingly. The B at the end means we are scanning a byte array, so the comparison uses AL (the low byte of EAX) and each element is read from the address in EDI. When combined, the REPNE and SCASB instructions scan a byte array (probably a string) until reaching the byte value stored in AL. Looking back at the debug info in the first screenshot, we can see that this value is zero, which the XOR EAX, EAX just before the scan guarantees. So essentially, this instruction is scanning a string until it reaches a null character, the end of the string. There is indeed a function in C that does this, and it’s also a function that I used in my program. That function is strlen(). So I need to look more closely at the macro I wrote that uses that function:
// Read a single line from the file
#define read_line( buf, size, fp )\
    if( buf ) free( buf );\
    size = 64;\
    buf = (char *) malloc( size );\
    fgets( buf, size, fp );\
    while( buf[strlen( buf ) - 1] != '\n' ){\
        size <<= 1;\
        buf = (char *) realloc( buf, size );\
        fgets( buf + (size >> 1) - 1, size >> 1, fp );\
    }\
    size = 64
It turns out the condition in the while loop was always evaluating to true, causing the size variable to be left-shifted until the 1 bit was pushed off the end, so realloc() ended up being called with a size of zero. What realloc() does with a zero size is implementation-defined, and many C libraries free the block and return a null pointer, so buf ends up null and the next strlen( buf ) dereferences it. So the problem here is that the last character in the string is never the newline.
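The wrap-around is easy to reproduce in isolation. The sketch below assumes size behaves like a 32-bit unsigned integer (the original declaration isn’t shown, and shifting a signed int into its sign bit would be undefined behaviour); the helper name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: counts how many doublings it takes for a
   32-bit size starting at `start` to wrap around to zero. */
static unsigned doublings_until_zero(uint32_t start)
{
    uint32_t size = start;
    unsigned count = 0;

    while (size != 0) {
        size <<= 1; /* same doubling step as in the macro */
        count++;
    }
    return count;
}
```

Starting from 64, doublings_until_zero returns 26, since 64 is 2^6 and 2^32 no longer fits in 32 bits.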
Then it hit me: If the last line in a text file isn’t terminated by a newline (which is sometimes the case), the above condition will always evaluate to true, because once the file is exhausted fgets() returns NULL without touching the buffer. The size then overflows to zero, which in turn leads to a null pointer dereference. So what I need to do is add a quick function to check if the end of the file has been reached. Here’s the function I wrote:
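// Used in case the last line of the
// file is not newline-terminated
// (bool requires <stdbool.h> in C99 and later)
bool is_eof( FILE *fp ){
    int c;
    bool end;
    // Peek at the next character without consuming it
    end = ((c = fgetc( fp )) == EOF );
    // ungetc() with EOF is a no-op, so this is safe at end of file
    ungetc( c, fp );
    return end;
}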
And here is the modified version of my line-reading macro:
// Read a single line from the file
#define read_line( buf, size, fp )\
    if( buf ) free( buf );\
    size = 64;\
    buf = (char *) malloc( size );\
    fgets( buf, size, fp );\
    while( buf[strlen( buf ) - 1] != '\n' && !is_eof( fp ) ){\
        size <<= 1;\
        buf = (char *) realloc( buf, size );\
        fgets( buf + (size >> 1) - 1, size >> 1, fp );\
    }\
    size = 64
This part of the program now runs without any problems. This was just a quick demonstration of the power of reverse engineering in diagnosing C programming bugs that can otherwise be very elusive. I think this is how I’m going to be doing all my debugging from now on. 🙂
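For comparison, here is a minimal sketch of the same line reader written as a function instead of a macro. The name read_line_alloc and its return convention are assumptions made up for illustration, not part of the original program. Rather than peeking for end of file, it checks the return value of fgets() and of realloc(), which is another way to stop the loop when the last line has no newline.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one line of any length into a freshly
   allocated buffer. Returns NULL at end of file or on allocation
   failure; the caller frees the result. */
static char *read_line_alloc(FILE *fp)
{
    size_t size = 64;
    size_t len = 0;
    char *buf = malloc(size);
    if (buf == NULL)
        return NULL;

    for (;;) {
        size_t room = size - len;
        int chunk = room > INT_MAX ? INT_MAX : (int)room;

        /* fgets() returns NULL at EOF or on error, which ends the
           loop even if the last line has no trailing newline */
        if (fgets(buf + len, chunk, fp) == NULL)
            break;

        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n')
            return buf; /* complete line */

        if (len + 1 >= size) { /* buffer full: double it */
            if (size > SIZE_MAX / 2) { /* refuse to wrap around */
                free(buf);
                return NULL;
            }
            char *bigger = realloc(buf, size * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            size *= 2;
        }
    }

    if (len == 0) { /* nothing was read: end of file */
        free(buf);
        return NULL;
    }
    return buf; /* final line without a newline */
}
```

A caller would loop on read_line_alloc( fp ) until it returns NULL, freeing each line after use. Keeping the grown pointer in a temporary means a failed realloc() never overwrites buf with null.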