Obfuscated Unix Scripting with dc

So I realized that I liked the concept behind Obfuscated Unix Scripting with sed and I thought I’d make a series. There are a number of scripting languages native to Unix, and many of them qualify as being what you would call “obfuscated”. One such language is the classic Unix calculator known as dc. This program is a treasure trove of idiosyncrasies and comes with a lot of features that are virtually unique to that language.

The name dc is an acronym that stands for “Desk Calculator”. dc is one of the oldest programs in Unix/Linux, going all the way back to the days of Research Unix at AT&T. As its name suggests, it is a calculator that you run from the Unix command line, a predecessor to the newer calculator program known as bc (Basic Calculator).

dc is a reverse Polish notation (RPN) calculator, meaning that the operands come first and the operator goes at the end (e.g. 2 3 + as opposed to 2 + 3). This notation will be familiar to anyone who has used an old HP graphing calculator. There’s infix notation, which is what most people are used to. Then there’s Polish notation, where the operator comes first. This is most common in Lisp dialects. RPN is simply the reverse of Polish notation.

dc is RPN because everything it does is based on stack operations. Every command either pushes something onto the stack or pops the top value or values off the stack and does something with them. For example the sequence 2 3 + is actually three commands: 2 and 3 to push the values 2 and 3 onto the stack respectively and + to pop the two values off the stack, add them, and push the result.

Let’s see how this would work on the Unix command line. If we wanted to add two numbers with the bc calculator, we would use a command like this:

$ echo "2 + 3" | bc

The pipeline used for dc is fairly similar except that the command string is different:

$ echo "2 3 + p" | dc

Notice that there is an additional p at the end of the dc command string. This is a command to print the value on the top of the stack (there are two printing commands in dc – P pops the string off the stack when printing it while p leaves the stack unchanged). If we only do the addition, the result is we have a stack holding the sum of two numbers, but in order to actually do anything with this sum, we need an additional print command.

According to the Seventh Edition Unix Programmer’s Manual, bc is actually a frontend for dc. It takes mathematical expressions written in familiar infix notation and compiles them into code that is then interpreted by dc.

Let’s take a look at some more complex code written in the dc language. Here is a script for calculating the length of the hypotenuse of a triangle using the Pythagorean theorem:

#!/usr/bin/dc -f

5 k
[Adjacent? ] P
? d *
[Opposite? ] P
? d *
+ v p

In this script, the ? takes the two side lengths as input from the user and pushes them onto the stack. The d (double) command copies the top element and pushes the copy onto the stack. After this, the two identical numbers are popped and multiplied. At the end of the script the two results are added and then the v command takes the square root. The k command is used to set the precision for the result, and the brackets are used for grouping, either for a string to be printed or for a macro (in this case the former).

dc may not be readily intuitive, but it makes up for that in simplicity and elegance. To illustrate the elegance of dc and of stack-based programming in general, here is an equivalent program in C:

 1 #include <stdio.h>
 2 #include <math.h>
 4 int mainint argc, char **argv ){
 5         int precision = 5;
 6         int adj;
 7         int opp;
 8         float hyp;
 9         int buf;
10         printf"Adjacent? " );
11         scanf"%d", &adj );
12         printf"Opposite?" );
13         scanf"%d", &opp );
14         hyp = sqrt(adj * adj + opp * opp);
15         hyp *= precision;
16         buf = hyp;
17         hyp = (float) buf / precision;
18         printf"%d\n", hyp );
19         return 0;
20 }

As you can see, this program is much longer than the dc alternative. Hell, even if I wrote the script in Python or PHP, it would still be much larger than the equivalent dc code, and on top of that it would be much slower since it would be using a parse tree rather than a stack for calculations.

Now for another fun fact: dc is a free-form language, meaning whitespace is completely ignored. The only time you need to separate commands with whitespace is when you’re separating adjacent numbers from one another. If I wanted to, I could have written the above program as follows:

5k[Adjacent? ]P?d*[Opposite? ]P?d*+vp

Of course you wouldn’t want to actually write it like this as it’s practically unreadable. I choose to separate commands by spaces and group related commands together on the same line.

Because it’s meant to be a calculator first and foremost, dc provides several commands to enhance numerical computations. The following six commands are especially useful:

k – pop the stack and use the popped value as the precision
i – pop the stack and use it to set the input radix (base)
o – pop the stack and use it to set the output radix
K – push the current precision on the stack
I – push the current input radix on the stack
O – push the current output radix on the stack

These commands allow dc to use bases like octal, hex, binary, and in fact any conceivable base that there is a notation for. You could have dc print its output in trinary (base 3) for example.

In addition to this, dc is also an arbitrary precision calculator. It can work with numbers of any conceivable size and is not limited by a specific bit width. You can feed it numbers with hundreds or even thousands of digits and it will do the calculations just fine.

Here’s a script that illustrates the practical value of the base-setting commands:

#!/usr/bin/dc -f

[Input base? ] P
? i
[Output base? ] P
? o
[Number? ] P
? p

This script prompts the user for an input base, output base, and number to convert, and converts from the input base to the output base. No fancy calculations necessary, just get the number from the user and echo it back, and because we’ve set the two radixes to different values, the conversion happens automatically without even needing any additional steps. You now have a universal base conversion utility for Unix and it only took six lines of code to implement.

dc is fully Turing-complete, meaning it provides for conditionals, loops, and non-transient variables. You can store variables long-term using registers. There are 256 possible registers, and these are labeled using any one-byte character. This character can be anything – printable, whitespace, control character, or Extended ASCII character – as long as it’s only one byte in length.

For working with registers, dc has four load/store commands. These are as follows:

sR – pop the stack and store the value in register R
lR – load the value in register R onto the stack
SR – pop the main stack and store the value on the top of R‘s stack
LR – pop register R‘s stack and load the value into the main stack

Note that a register can be treated as either a single value or an auxiliary stack, depending on which load/store command is used. Also note that the l command doesn’t do anything to the value in the register, it simply copies it to the main stack.

The final two features that really make dc useful are macros and conditionals. A macro is any string consisting of valid dc commands. This string is pushed onto the stack, then popped into a register, where it can be executed at any time. Loops in dc are implemented using recursive macros. Conditionals pop the top two items on the main stack and compare them, then decide whether or not to execute a given macro. By combining recursive macros with conditionals, we can have loops that actually terminate.

Here’s a somewhat trivial example. This script takes a number N from the user and then prints the numbers 1 through N in hexadecimal.

#!/usr/bin/dc -f

16 o
[li li li 1 - si 0 !>m]
? si
0 0
=m st st f

This script makes use of conditionals. Each conditional in dc is based on a specific numerical comparison operator. They are as follows:

>R – expand R if the top element is greater than the lower element
!>R – expand R if the top element is less than or equal to the lower element
=R – expand R if the top two elements are equal
!=R – expand R if the top two elements are not equal

For a less trivial example of dc’s Turing-completeness, the following script computes the factorial of a number:

 1 #!/usr/bin/dc -f
 3 [ ln 1 - sn
 4   ln la * sa
 5   ln 1 !=f ] sf
 7 [Number? ] P
 8 ? sn
 9 ln sa
10 lf x
11 la p

This script uses two registers to keep track of the values we need: There is a number register n which initially stores the number whose factorial is being computed and then continuously decrements down to 1 with each recursive call to the macro. Then there is an accumulator register a which is multiplied by the number in the first register with each iteration. In order to use either of these numbers, they must first be loaded onto the main stack, which is why this script makes frequent use of load/store commands. The macro f is the actual algorithm that computes the factorial.

Incidentally, dc is particularly well-suited to computing factorials due to its arbitrary precision arithmetic. For any other language, a factorial will start overflowing once it gets to about 22!, but dc computes 1000! perfectly fine:

$ dc -f factorial.dc
Number? 1000

So yeah, that’s yet another neat scripting language from the depths of Unix history. Hope you liked my little exploration of this language. See ya!


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s