Implementing the DoD Data Sanitation Algorithm in C

Most cyber security buffs are familiar with the Gutmann secure erase algorithm – a 35-pass wiping algorithm designed to completely obliterate sensitive information stored on a magnetic medium. The rationale behind this algorithm is that simply erasing a file and overwriting it with zeros is not enough to completely destroy its contents. Since the position and strength of the head vary slightly from one write to the next, a single overwrite of a bit will not necessarily flip all of its magnetic domains. Traces of the bit's original value are left behind, and these can be detected through magnetic force microscopy to reconstruct the original contents of the file even after overwriting. Thus a 35-pass secure erase algorithm is used to ensure that the original data is completely destroyed.

What is not widely known, even in the cyber security community, is that the Gutmann algorithm is complete overkill for modern drives, and is now considered obsolete even by its creator. Apart from the random passes at the beginning and end, each of the 35 passes performed by the Gutmann algorithm writes a pattern designed to target a particular type of drive or drive encoding. Since these drives are no longer used, there is no longer any point in using these wiping patterns. Furthermore, running 35 passes over an entire 500 GB drive takes a very long time and puts a lot of unnecessary wear on the drive. Just think about how many writes the drive head has to perform doing 35 passes on a drive that is much bigger than the drives the Gutmann algorithm was designed for. Yet the Gutmann algorithm is still a de facto industry standard for sanitation of magnetic drives, and is in fact the algorithm used by the popular srm program to this day.

DoD 5220.22 is a group of similar standards for data sanitation developed by the US Department of Defense. Although there are many variations, the best-known one involves three overwrites: first with a certain character (all ones, all zeros, a user-supplied character, or something else), second with that character's complement (every bit inverted, i.e. a bitwise NOT, which is the same as XORing the character with all ones), and third with a pattern of random characters. Although current recommendations from NIST (see Appendix A of NIST SP 800-88) suggest using data sanitation methods specifically tailored to the cyber security needs of the organization, the DoD algorithm, or some variation of it, is usually sufficient for generic use cases.
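
To make the three passes concrete, here is a minimal sketch of the byte pattern each pass writes, assuming a single fill character c. The names fill_pass() and enum pass are hypothetical, chosen only for illustration. The point is that the complement pass writes ~c, which is the same as c ^ 0xFF: for c = 0x35 the complement is 0xCA.

```c
#include <stdlib.h>

/* Hypothetical illustration of the three DoD-style passes described above.
 * Not taken from any standard; names are chosen for this sketch only. */
enum pass { PASS_CHAR, PASS_COMPLEMENT, PASS_RANDOM };

/* Fill buf[0..len) with the pattern written by pass p, given fill byte c. */
static void fill_pass( unsigned char *buf, size_t len, enum pass p, unsigned char c ){
        for( size_t k = 0; k < len; k++ ){
                switch( p ){
                case PASS_CHAR:
                        buf[k] = c;                           /* pass 1: the character */
                        break;
                case PASS_COMPLEMENT:
                        buf[k] = (unsigned char)~c;           /* pass 2: every bit inverted */
                        break;
                case PASS_RANDOM:
                        buf[k] = (unsigned char)( rand() % 256 ); /* pass 3: random bytes */
                        break;
                }
        }
}
```

rand() is used here only because the program below uses it for its random pass; it is not a cryptographic generator.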

I have written a program in C that implements a variation of the DoD algorithm. This is a variation that I came up with; you will not find it in any standard. I will show the code here, and then explain what all the parts do…


 1 /*******************************************
 2  * srm-dod v. 1.0                          *
 3  * Description: Implementation of the DoD  *
 4  * secure erase algorithm with a couple of *
 5  * modifications for extra obscurity       *
 6  * Author: Michael Warren                  *
 7  * License: Michael Warren FSL             *
 8  * Date: January 8 & 17 2019               *
 9  *******************************************/
10 #define _POSIX_C_SOURCE 200809L // expose fileno() and fsync()
11 #include <stdio.h>
12 #include <stdlib.h>
13 #include <errno.h>
14 #include <string.h>
15 #include <time.h>
16 #include <unistd.h>  // fsync() (POSIX)
17 int main( int argc, char **argv ){
18         FILE *fp;
19         for( int i = 1; i < argc; i++ ){
20                 if( !(fp = fopen( argv[i], "r+" )) ){
21                         // Error handling section:
22                         char errorstring[strlen( argv[0] ) + strlen( argv[i] ) + 3];
23                         sprintf( errorstring, "%s: %s", argv[0], argv[i] );
24                         perror( errorstring );
25                         exit( errno );
26                 }
27                 // Determine file length:
28                 fseek( fp, 0, SEEK_END );
29                 long file_length = ftell( fp );
30                 rewind( fp );
31 
32                 // Proceed with DoD algorithm (fsync() pushes each pass to disk):
33                 srand( time( NULL ) );
34                 unsigned char c = rand() % 256;
35                 for( long j = 0; j < file_length; j++ ){
36                         fputc( c, fp );
37                 }
38                 fflush( fp ); fsync( fileno( fp ) ); rewind( fp );
39                 c = ~c;
40                 for( long j = 0; j < file_length; j++ ){
41                         fputc( c, fp );
42                 }
43                 fflush( fp ); fsync( fileno( fp ) ); rewind( fp );
44                 for( long j = 0; j < file_length; j++ ){
45                         fputc( rand() % 256, fp );
46                 }
47                 fflush( fp ); fsync( fileno( fp ) ); rewind( fp );
48 
49                 // Two extra zero passes to conceal the
50                 // fact that data was securely erased:
51                 for( long j = 0; j < file_length; j++ ){
52                         fputc( '\0', fp );
53                 }
54                 fflush( fp ); fsync( fileno( fp ) ); rewind( fp );
55                 for( long j = 0; j < file_length; j++ ){
56                         fputc( '\0', fp );
57                 }
58                 fflush( fp ); fsync( fileno( fp ) );
59                 // Truncate:
60                 fclose( fp );
61                 fp = fopen( argv[i], "w" );  // "w" truncates to zero length
62                 if( !fp ){ perror( argv[i] ); exit( errno ); }
63 
64                 // Change filename and delete (POSIX allows this while fp is open):
65                 char newname[17] = { 0 };  // 16 random letters + '\0'
66                 for( int j = 0; j < 16; j++ ){
67                         newname[j] = rand() % 26 + 'A';
68                 }
69                 if( rename( argv[i], newname ) ){
70                         int err = errno;  // save before fprintf() can change it
71                         fprintf( stderr, "%s: Error renaming file %s: %s\n", argv[0], argv[i], strerror( err ) );
72                         exit( err );
73                 }
74                 remove( newname );
75                 fclose( fp );
76         }
77         return 0;
78 }

The program starts by reading a list of filenames from the argument vector. For each filename, it opens the corresponding file using the "r+" flag, which opens an existing file for both reading and writing without truncating it, so the original bytes are still in place to be overwritten. It then has an error-handling section (lines 22-25) in case there was an error in opening the file. This section simply builds an error string from the program name and the filename, prints it with perror() (which appends a description of the error), and then exits with an exit status equal to the error number so users of the program can do the proper post-mortem debugging.
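
As a minimal sketch of the same open-and-report pattern, the hypothetical helper open_for_overwrite() below opens an existing file with "r+" and, on failure, prints a "program: file: reason" message to stderr. Because "r+" neither creates nor truncates the file, writing a single byte replaces only that byte and leaves the rest of the contents in place.

```c
#include <stdio.h>
#include <string.h>
#include <errno.h>

/* Hypothetical helper: open an existing file for in-place overwriting.
 * "r+" neither creates nor truncates, so the original bytes remain to be
 * overwritten; it returns NULL (with errno set) if the file is missing. */
static FILE *open_for_overwrite( const char *prog, const char *path ){
        FILE *fp = fopen( path, "r+" );
        if( !fp ){
                int err = errno;  /* fprintf() may change errno */
                fprintf( stderr, "%s: %s: %s\n", prog, path, strerror( err ) );
                errno = err;      /* let the caller use it, e.g. as an exit status */
        }
        return fp;
}
```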

Lines 28-30 determine the length of the file to be erased, which is essential because every pass has to overwrite exactly that many bytes. I used a trick I learned from studying the source code of the wc program from the GNU Coreutils (you can learn a lot about programming by reading other people's code) – jumping to the end of the file and then reading the current position.
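
A slightly more defensive version of the seek-to-end trick, sketched as a hypothetical file_size() helper, checks the return values that the program ignores: fseek() returns nonzero on failure and ftell() returns -1L. It assumes the file size fits in a long; where long is 32 bits (as on Windows), files over 2 GiB would need POSIX fseeko()/ftello() or stat() instead.

```c
#include <stdio.h>

/* Hypothetical helper: return the size of the stream in bytes, or -1 on
 * failure, and leave the position back at the start of the file. */
static long file_size( FILE *fp ){
        if( fseek( fp, 0, SEEK_END ) != 0 ) return -1;
        long len = ftell( fp );  /* -1L on error, with errno set */
        rewind( fp );
        return len;
}
```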

Lines 33-57 are the actual meat of the program: the wiping of the file according to the DoD algorithm. My variation actually performs five passes: first a randomly generated character, then its complement, then random characters, followed by two passes of zeros. These two additional overwrites are not so much to hide the data itself as they are to hide the fact that data has been wiped, which I believe is very useful in throwing people off your trail. Then there's also the fact that you should always be suspicious of anything developed by the US government, as it has been known to deliberately push protocols with weakened security so it can more effectively spy on people (the NSA-backed Dual_EC_DRBG random number generator comes to mind). Knowing this, I added a couple of extra passes for good measure. Five passes is still only 1/7 of the Gutmann algorithm's 35, so there's significantly less strain on your hard drive components.
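
One subtlety behind the multi-pass idea: fputc() only writes into stdio's buffer, and even after fflush() the data may sit in the operating system's page cache, where later passes overwrite it in memory and the final truncation can discard it before anything reaches the disk. The sketch below, a hypothetical overwrite_pass() helper assuming a POSIX system for fsync(), performs one pass and forces it out to the storage device before returning.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fileno() and fsync() */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write len copies of byte b from the start of fp,
 * then flush stdio's buffer (fflush) and the kernel's cache (fsync) so the
 * pass reaches the device before the next one starts.
 * Returns 0 on success, -1 on error. */
static int overwrite_pass( FILE *fp, long len, unsigned char b ){
        rewind( fp );
        for( long j = 0; j < len; j++ ){
                if( fputc( b, fp ) == EOF ) return -1;
        }
        if( fflush( fp ) != 0 ) return -1;
        return fsync( fileno( fp ) );
}
```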

Lines 60-62 truncate the file. This part is fairly simple. All you have to do to truncate a file in C is open it with the "w" flag.
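
The same truncation step as a small hypothetical helper with error checking might look like the sketch below. It sticks to standard C; on POSIX systems, truncate( path, 0 ) from <unistd.h> achieves the same result without going through stdio.

```c
#include <stdio.h>

/* Hypothetical helper: truncate the file at path to zero length by opening
 * it with "w", as the program does. Like fopen( ..., "w" ), this creates the
 * file if it does not exist. Returns 0 on success, -1 on failure. */
static int truncate_to_zero( const char *path ){
        FILE *fp = fopen( path, "w" );
        if( !fp ) return -1;  /* errno is set by fopen() */
        return fclose( fp ) == 0 ? 0 : -1;
}
```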

Lines 65-73 rename the file with a random string of capital letters (which is simpler than determining which characters will cause problems if included in a filename). Again there is an error-handling section in lines 70-72. Line 74 finally unlinks the file and line 75 closes the file stream.
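
One caveat about the rename step: newname is a relative name, so rename() resolves it against the current working directory rather than the directory the file lives in. For an argument like /mnt/usb/secret.txt, the file is moved into the current directory, and if that is on a different filesystem, rename() fails (POSIX reports EXDEV). A minimal sketch of a fix, assuming '/' as the path separator, builds the random name next to the original file; random_sibling_name() and NAME_LEN are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

#define NAME_LEN ((size_t)16)  /* number of random letters; an arbitrary choice */

/* Hypothetical helper: write "<directory part of path><NAME_LEN random
 * capitals>" into out, so rename() keeps the file in its own directory.
 * Assumes '/' separators. Returns 0 on success, -1 if out is too small. */
static int random_sibling_name( const char *path, char *out, size_t outsz ){
        const char *slash = strrchr( path, '/' );
        size_t dirlen = slash ? (size_t)( slash - path ) + 1 : 0;  /* keep the '/' */
        if( dirlen + NAME_LEN + 1 > outsz ) return -1;
        memcpy( out, path, dirlen );
        for( size_t k = 0; k < NAME_LEN; k++ ){
                out[dirlen + k] = (char)( 'A' + rand() % 26 );
        }
        out[dirlen + NAME_LEN] = '\0';
        return 0;
}
```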

I have tested each part of my DoD program individually and they all work. The thing about a program such as this one is that you can't test it all at once, because once the file has been deleted there's no way of checking whether the wiping was successful. I had a problem with the code where it was returning an error whenever I tried to overwrite anything. I realized I had used the wrong flag for fopen(). I wouldn't have caught this error if I had simply tested the program by running it and seeing if it segfaulted or whatever. I would have effectively written an rm clone with no security. This illustrates the importance of testing each part of your program separately, which is especially crucial if you're doing something related to cyber security.
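
One way to test a single pass in isolation is to stop before the truncate, rename and delete steps and read the file back. The hypothetical helper below returns 1 if a file has the expected length and every byte equals the pass pattern. Note that it checks what the filesystem reports, not what is physically on the platter.

```c
#include <stdio.h>

/* Hypothetical test helper: returns 1 if the file at path is exactly
 * expected_len bytes long and every byte equals b, otherwise 0 (including
 * when the file cannot be opened). */
static int file_is_filled_with( const char *path, unsigned char b, long expected_len ){
        FILE *fp = fopen( path, "rb" );
        if( !fp ) return 0;
        long n = 0;
        int ch;
        while( (ch = fgetc( fp )) != EOF ){
                if( ch != b ){
                        fclose( fp );
                        return 0;  /* found a byte the pass did not overwrite */
                }
                n++;
        }
        fclose( fp );
        return n == expected_len;
}
```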

I have also ported my secure erase program to Windows, and perhaps I will do another article in the future detailing this process. I have created a complete data sanitation system for Windows that allows me to recursively wipe entire directories using a DOS batch file that I wrote and the DoD program as a back-end, and that will make for a fairly juicy story/tutorial for the future of this blog. For now, I think I have said enough. So until next time, farewell.

4 thoughts on “Implementing the DoD Data Sanitation Algorithm in C”

  1. hello man, i just read your code, and i can't believe that i'm capable of coding something like that. i started reading your post and i was like "yeah, DoD Data Sanitation Algorithm, i have no idea of the topic, or what i'm about to read but whatever, it's written in C so maybe i can steal some knowledge from it", and in fact i did. anyway my question is how do you make your code look like that? i'm posting mine in a lame gray area, and it looks boring compared to yours… is it a paid feature? good wordpress skills?

    1. It’s actually not a paid feature. I’m using a free plan. What I did was I used Vim’s TOhtml plugin to convert the syntax highlighting for the code to an HTML format. I then did some editing to remove extra newlines and whatnot, then opened the HTML page in my browser and copied and pasted it into the HTML for my blog entry between two pre tags. The new blog editor doesn’t allow you to edit HTML directly, so you have to use the legacy editor and switch to HTML mode. You can find the legacy editor by clicking on the gear icon at the top of the page. Hope that helps. Feel free to ask me if you have any difficulty or need further clarification on any steps.

      1. wow, thanks for the reply, i will try it soon with a new post, i will ask you if i feel lost in the process. i came up with a semi solution, which is putting the tag , but i can't stand a white background for code, it makes my eyes hurt.
