Obfuscated Unix Scripting with sed

Okay, so I had something else planned for this Friday night, but the project I wanted to share is taking a lot longer than expected, so I’m pushing the publication date back a few days while I get all the code working and create all the visuals in SVG. In the meantime, I thought I’d keep my readers entertained with a weird scripting language that I used to have endless fun with back in my early days of coding (what could be termed my “larval stage” according to the Jargon File). The language I’m talking about is, of course, sed.

sed is an ancient scripting language from the old days of Unix, back when it was still owned by AT&T Bell Labs. It was developed as a text processing language for automating small changes to text files to allow them to work with different programs and on different versions of Unix. Much like the other so-called “little languages” of Unix, the goal of sed was to do one thing and do one thing well, in keeping with the general Unix philosophy.

sed also resembles other Unix little languages in the terseness and lack of clarity in its syntax. To illustrate, here is a “Hello World” program in sed:


#!/usr/bin/sed -f

1{
        s/^.*$/Hello, World!/
        q
}

This “Hello World” program looks so weird because sed lacks the standard input and output functions common in most other programming languages. Instead, it uses the substitute command s, which replaces the portion of the current line that matches the regular expression (in this case the entire line). The address 1 in front of the braces restricts the block to the first line of the input. After changing that line to the text “Hello, World!”, the program quits using the q command, so that it doesn’t repeat the substitution for subsequent lines. Note that sed only runs its script on lines of input, so the program needs at least one input line before it prints anything. In effect, this script does the same thing as a standard “Hello World” program in any normal language, but it does it in a way that is not even remotely intuitive.
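
If you want to try this without saving a script file, here is a minimal sketch that passes the same commands inline with -e. The function name hello_world is made up for illustration, and echo is only there to supply the single (empty) input line.

```shell
# hello_world: hypothetical wrapper name, not part of the original script.
# `echo` feeds sed one empty line so that the address 1 has something to match.
hello_world() {
    echo | sed -e '1{' -e 's/^.*$/Hello, World!/' -e 'q' -e '}'
}

hello_world    # prints: Hello, World!

# With no input at all, the script never runs and nothing is printed.
sed -e '1{' -e 's/^.*$/Hello, World!/' -e 'q' -e '}' < /dev/null
```

Splitting the block across several -e options keeps the closing brace on its own script line, which should be accepted by sed implementations that are picky about what may follow q.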

sed is very effective and efficient for certain text processing tasks. For example, here is an implementation of the Unix head command in sed:


#!/usr/bin/sed -f

10q

Note that sed echoes every line of the input file to standard output by default, so the script doesn’t need to print anything explicitly. It only has to quit after echoing the 10th line.
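
To make that default echo visible, here is a small sketch comparing it with the explicit form, where -n switches the automatic printing off and p prints the selected lines. Both function names and the optional line-count argument are my own additions for illustration.

```shell
# Hypothetical helpers; the argument is the number of lines (default 10).

# Relies on sed's automatic printing, then quits after line N.
head_autoprint() { sed "${1:-10}q"; }

# Turns automatic printing off with -n and prints lines 1..N explicitly.
# Unlike the q version, this keeps reading the input to the end.
head_explicit() { sed -n "1,${1:-10}p"; }

printf '%s\n' a b c d e | head_autoprint 2    # prints a, then b
printf '%s\n' a b c d e | head_explicit 2     # prints a, then b
```

The q version is the better head replacement on large files, since it stops reading as soon as it has printed the last line it needs.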

The next script is my token nontrivial sed script for this article, nontrivial in the sense that it both 1. does something useful, and 2. actually took some effort to write. This script counts the number of non-comment, non-whitespace lines in a C source file, effectively counting the number of “true” lines of code in the file so you can get a more accurate idea of how complex the program is. It doesn’t output the number directly, but instead strips all comment lines and whitespace lines from the file, allowing the output to be piped into wc -l.


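#!/usr/bin/sed -f
# Pipe into wc -l

# Delete lines that hold only a // comment.
/^[[:space:]]*\/\//d
# Delete blank and whitespace-only lines (including a stray \r).
/^[[:space:]]*$/d
# On a line that opens a /* comment, append lines until */ shows up,
# then delete the whole comment. Checking before N keeps a one-line
# /* ... */ comment from swallowing the line after it.
/^[[:space:]]*\/\*/{
        :start
        /\*\//!{
                N
                bstart
        }
        d
}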

This script demonstrates how it is possible to write a form of while loop in sed. The b command is an unconditional branch to a given label. Line numbers and pattern-matching addresses serve as the conditionals of sed, so combining an address with b gives you the equivalent of a while loop. The N command stands for Next: it appends the next line of input to the pattern space (the buffer holding the text sed is currently working on), separated by a newline. The loop keeps adding lines to the pattern space until it contains the end of a multi-line comment, at which point the d command deletes everything collected so far.
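
If you want to try the loop without saving a script file, here is a minimal sketch that wraps the rules in a shell function and does the wc -l step for you. The name count_loc is my own invention, and I’m assuming a sed that understands POSIX character classes like [[:space:]]. This variant checks for */ before pulling in another line, so a one-line /* ... */ comment doesn’t swallow the line after it.

```shell
# count_loc [FILE...]: hypothetical helper; reads stdin when no file is given.
count_loc() {
    sed -e '/^[[:space:]]*\/\//d' \
        -e '/^[[:space:]]*$/d' \
        -e '/^[[:space:]]*\/\*/{' \
        -e ':start' \
        -e '/\*\//!{' \
        -e 'N' \
        -e 'bstart' \
        -e '}' \
        -e 'd' \
        -e '}' "$@" | wc -l | tr -d ' '   # tr strips the padding some wc versions add
}

# Sample input: 4 real lines of code among comments and a blank line.
count_loc <<'EOF'
// header comment
#include <stdio.h>

/* multi
   line */
int main(void) {
    /* single */
    return 0;
}
EOF
# prints 4
```

Like the script it is based on, this only recognizes comments that start a line, so code followed by a trailing comment still counts as one line, and anything after a closing */ on the same line is dropped along with the comment.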

The last script I will be sharing here is an “AI program” that plays 20 Questions with the user. Basically, you ask it a series of questions, and it will answer either “Yes.” or “No.” to each one, and after 20 questions you guess what the AI is thinking of and it tells you if you guessed right. Of course the program is a joke and there’s no way to win: the answer depends only on whether the last letter before the question mark is a vowel or a consonant, and whatever you type on line 21 is replaced with the losing message. 😛


#!/usr/bin/sed -f

s/^.*[AEIOUaeiou]?*\r*\n*$/Yes./
s/^.*[QWRTYPSDFGHJKLZXCVBNMqwrtypsdfghjklzxcvbnm]?*\r*\n*$/No./
21{
        s/^.*$/Wrong! You lose!/
        q
}

Here’s a sample session talking to this “AI” (I pared it down to four questions for brevity’s sake):


$ sed -f 20q.sed
Is it blue?
Yes.
Is it larger than a breadbox?
No.
Is it smaller than a breadbox?
No.
Does it go Wiki Wiki Wiki?
Yes.
It's a blue breadbox that goes Wiki Wiki Wiki!
Wrong! You lose!
$
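
The session above is easier to reproduce if you feed the questions in through a pipe. Here is a rough sketch that passes the rules inline with -e. The function name twenty_questions and its optional argument (the line number of the final guess, 21 by default) are assumptions added for illustration, and I left out the \r*\n* padding, which should not matter for ordinary Unix line endings.

```shell
# twenty_questions [GUESS_LINE]: hypothetical wrapper around the same rules.
# In a basic regular expression ? is a literal character, so ?* means
# "zero or more question marks" after the final letter.
twenty_questions() {
    sed -e 's/^.*[AEIOUaeiou]?*$/Yes./' \
        -e 's/^.*[QWRTYPSDFGHJKLZXCVBNMqwrtypsdfghjklzxcvbnm]?*$/No./' \
        -e "${1:-21}{" \
        -e 's/^.*$/Wrong! You lose!/' \
        -e 'q' \
        -e '}'
}

printf '%s\n' 'Is it blue?' 'Is it larger than a breadbox?' 'It is a blue breadbox!' |
    twenty_questions 3
# prints:
# Yes.
# No.
# Wrong! You lose!
```

The first question ends in “e?” and the second in “x?”, which is all the script looks at. The third line is the guess, so it is overwritten no matter what it says.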

Note: If you’re a man of culture, you may have recognized that this script is a reference to a certain geeky web comic from the early 2000s. Whoever can guess the comic first wins the prize. 😛
