Extracting Text Files from an Archive – With a Hex Editor

I started taking a cyber-security course on Coursera, and one of the assigned projects involves working on a virtual machine, which is imported from an .ova archive file downloaded from the course website. Since I plan to distribute my own homebrewed VMs as .ova files in the future, I wanted to figure out how to archive a VirtualBox VM myself, so I started researching the format on Wikipedia. In the process I learned that these archives often contain text files with important metadata, and that one can glean a lot of interesting information about the VM from them. One file in particular is an XML file toward the beginning of the archive, which I wanted to look at for myself.

Of course, the most obvious thing to do is to simply open the archive in a text editor, a hex editor, or some other file viewer and see what’s inside. But this approach leaves a lot to be desired. Extracting the text of the XML file into its own separate file would let me view it with full syntax highlighting in Vim and parse the XML elements as desired. To do this, I would need to perform a binary copy, much like the dd program does, only with offsets and lengths given in bytes rather than blocks, so that I could specify the exact start position and length of the file. Fortunately I had just the tool for this: a program I cooked up in C a couple of years ago that I call microdd. Its code looks like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <errno.h>

int main( int argc, char **argv ){
        char infile [FILENAME_MAX] = "";
        char outfile[FILENAME_MAX] = "";
        size_t bc = 0;    // Number of bytes to copy (if zero, copy entire file)
        off_t istart = 0; // starting position in input file
        off_t ostart = 0; // starting position in output file

        // Parse the name=value switches.
        for( int i = 1; i < argc; i++ ){
                char *name = strtok( argv[i], "=" );
                char *value = strtok( NULL, "" );
                if( !name || !value ){
                        fprintf( stderr, "%s: %s: switch needs a value.\n", argv[0], argv[i] );
                        continue;
                }
                if( !strcmp( name, "if" ) ) snprintf( infile, sizeof infile, "%s", value );
                else if( !strcmp( name, "of" ) ) snprintf( outfile, sizeof outfile, "%s", value );
                else if( !strcmp( name, "bc" ) ) bc = strtoull( value, NULL, 10 );
                else if( !strcmp( name, "istart" ) ) istart = strtoll( value, NULL, 10 );
                else if( !strcmp( name, "ostart" ) ) ostart = strtoll( value, NULL, 10 );
                else fprintf( stderr, "%s: %s: switch not recognized.\n", argv[0], name );
        }

        // This section is for determining when the end of a regular file has been reached.
        // Processing for a given file will be skipped if the file was not specified.
        struct stat istat, ostat;
        int ihave = 0, ohave = 0; // set once the corresponding stat() has succeeded
        if( infile[0] ){
                if( stat( infile, &istat ) != 0 ){
                        fprintf( stderr, "%s: %s: %s\n", argv[0], infile, strerror( errno ) );
                        return 1;
                }
                ihave = 1;
        }
        if( outfile[0] && stat( outfile, &ostat ) == 0 ) ohave = 1; // outfile may not exist yet
        if( !bc ){
                if( ihave && S_ISREG( istat.st_mode ) ){
                        if( istat.st_size > istart ) bc = istat.st_size - istart;
                }
                else if( ohave && S_ISREG( ostat.st_mode ) ){
                        if( ostat.st_size > ostart ) bc = ostat.st_size - ostart;
                }
        }

        // And now for the actual copying...
        int ip = STDIN_FILENO, op = STDOUT_FILENO; // defaults when if= or of= is omitted
        if( infile[0] ){
                ip = open( infile, O_RDONLY );
                if( ip < 0 || lseek( ip, istart, SEEK_SET ) < 0 ){ perror( infile ); return 1; }
        }
        if( outfile[0] ){
                op = open( outfile, O_WRONLY | O_CREAT, 0644 );
                if( op < 0 || lseek( op, ostart, SEEK_SET ) < 0 ){ perror( outfile ); return 1; }
        }
        char c;
        for( size_t i = 0; !bc || i < bc; i++ ){
                if( read( ip, &c, 1 ) != 1 ) break; // end of input or read error
                if( write( op, &c, 1 ) != 1 ){ perror( "write" ); return 1; }
        }
        close( ip );
        close( op );
        return 0;
}
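
Copying one byte per read/write call keeps the code short, but it costs two system calls for every byte. One possible variation, sketched here with stdio buffering rather than raw file descriptors, copies the same kind of byte range in chunks. The name copy_range is hypothetical and this is not how microdd itself works; it is only meant to show that the "seek, then copy N bytes" idea does not depend on the one-byte loop.

```c
#include <stdio.h>

/* Hypothetical helper, not part of microdd.
 * Copies up to `count` bytes from `in`, starting at byte offset `start`,
 * to the current position of `out`.
 * Returns the number of bytes copied (less than `count` if the input ends
 * early), or -1 on a seek or write error. */
long copy_range( FILE *in, FILE *out, long start, long count ){
        char buf[4096];
        long copied = 0;

        if( fseek( in, start, SEEK_SET ) != 0 ) return -1;
        while( copied < count ){
                size_t want = sizeof buf;
                if( (long)want > count - copied ) want = (size_t)( count - copied );
                size_t got = fread( buf, 1, want, in );
                if( got == 0 ) break;                     /* end of input or read error */
                if( fwrite( buf, 1, got, out ) != got ) return -1;
                copied += (long)got;
        }
        return copied;
}
```

Called with two open streams, copy_range( in, out, 512, 13625 ) would cover the same range as istart=512 bc=13625.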

In case the synopsis of this program isn’t clear from the code, I will provide a screenshot of the man page for microdd (I redacted my two aliases for OPSEC purposes):


In the following two screenshots, I have the .ova archive open in a hex editor, specifically the View subprogram in Midnight Commander running in hex mode. In the screenshots I am locating the start and end bytes, respectively, of the XML file. They are highlighted in green. The position of the byte is shown at the top in hexadecimal format.
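
A side note on where such offsets come from. As far as I can tell from the format description, an .ova file is an ordinary tar archive, and in tar every member is preceded by a 512-byte header whose size field (12 bytes at offset 124) holds the member’s length as an octal number in ASCII. The sketch below is not part of microdd: tar_entry_size is a hypothetical helper name, and it assumes a classic ustar-style header without the GNU base-256 extension used for very large files. It only shows how the byte count could be read from the header instead of hunting for the end byte by eye.

```c
#include <stddef.h>

/* Hypothetical helper, not part of microdd.
 * Reads the size field of a 512-byte tar header: 12 bytes at offset 124,
 * an octal number in ASCII, padded with leading zeros or spaces and
 * terminated by a NUL or a space.
 * Returns the size in bytes, or -1 if the field is empty or malformed. */
long long tar_entry_size( const unsigned char header[512] ){
        const unsigned char *field = header + 124;
        long long size = 0;
        size_t i = 0;
        int digits = 0;

        while( i < 12 && field[i] == ' ' ) i++;   /* skip leading padding */
        for( ; i < 12 && field[i] >= '0' && field[i] <= '7'; i++ ){
                size = size * 8 + ( field[i] - '0' );
                digits++;
        }
        /* Anything other than a terminator after the digits is unexpected. */
        if( i < 12 && field[i] != '\0' && field[i] != ' ' ) return -1;
        return digits ? size : -1;
}
```

With the first 512 bytes of the archive read into a buffer, the first member’s data would start at offset 512 and run for tar_entry_size( buffer ) bytes.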



The command I run to extract the XML code is as follows:

$ microdd if=mooc-vm3.ova of=mooc-vm3.xml istart=512 bc=13625

I had to convert the hexadecimal numbers to decimal using a programmer’s calculator (I really just used the Windows calculator in programming mode). In the future I may modify this program so that it can take hex input directly.
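
Taking hex input directly would not need much. Here is a minimal sketch, assuming the switch values were parsed with strtoll and base 0: with base 0, strtoll accepts a 0x prefix for hexadecimal, a leading 0 for octal, and plain decimal otherwise. The name parse_offset is hypothetical; microdd has no such function as posted.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper, not part of microdd as posted.
 * Parses a non-negative offset or byte count written in decimal ("512"),
 * hexadecimal ("0x200"), or octal ("01000").
 * Returns the value, or -1 if the text is empty, negative, out of range,
 * or followed by trailing junk. */
long long parse_offset( const char *text ){
        char *end;
        errno = 0;
        long long value = strtoll( text, &end, 0 );  /* base 0: detect the prefix */
        if( end == text || *end != '\0' ) return -1; /* no digits, or junk after them */
        if( errno == ERANGE || value < 0 ) return -1;
        return value;
}
```

With something like this in place of the decimal-only parsing, istart=0x200 bc=0x3539 would mean the same thing as istart=512 bc=13625. One thing to watch for: a leading zero selects octal, so istart=010 would be read as 8, not 10.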

Opening the XML file in Vim, we see that it has indeed been successfully extracted from the archive.


Now if only I knew how to read this shit…

