Generalized Password Cracking, Part 1: A Description of the PCL Password Cracking Language

Disclaimer: The present series of posts contains information on how to exploit security vulnerabilities in passwords. It is intended for educational and research purposes only. Neither the password cracking tools described in this series nor any of the exploits developed with these tools are to be used for gaining unauthorized access to accounts or other computer resources in the real world. Don’t hack anyone else’s account, computer, device, or network without their express permission. I am not responsible for any illegal hacking activities you decide to engage in using this information, nor am I responsible for any damages that may result from such activities.


There are a few different algorithms for attacking password hashes: brute-force attacks, dictionary attacks, attacks consisting of a dictionary attack followed by a brute-force attack, attacks consisting of concatenations of multiple dictionary words, or of dictionary words with digits or special characters. What if there were a way to generalize all of these exploits, and use a scripting language to automate any conceivable password attack you could think of? In this project, I hope to provide such a scripting language.

The scripting language I have designed is called PCL or Pickle. PCL stands for Password Cracking Language, and Pickle is just a catchy play on that acronym. (I’m using the name PCL to refer to the language and Pickle to refer to the program that interprets the language, though they are both pronounced the same.) PCL is a special-purpose high-level programming language that incorporates ideas from several earlier Unix languages, including bash, awk, GNU Make, troff, and Vimscript. Its goal is to be as small and simple as possible (because then it’ll be easier to implement) while still providing the full functionality needed to automate just about any practical password attack you can think of.

My idea for PCL started off as a simple pattern-based password cracker that would combine different password attacks into a sort of hybrid attack based on a pattern given at the command line. I then thought, what if you could put multiple patterns in a batch file and have it search all of them sequentially? Then I thought, why not go a step further and create an entire scripting language for password cracking? So that’s what I am now doing.

Of course before I release my invention onto the world so people can use it to wreak havoc on social media, I first need an adversary to test the pattern attacks against. I searched DuckDuckGo for some leaked password hashes and found this database on the HaveIBeenPwned website. It contains hashes of all passwords from all documented data breaches, and you can download a full text file of all the password hashes in either SHA-1 or NTLM format (the file is several gigabytes in size). I figured that was good enough for now. I also found some wordlists for dictionary attacks on this page. Armed with these files as well as a text editor and compiler, I now had all the resources I needed to get started on my password cracking adventure.

I’ve designed my password cracking system so that it allows you to specify options like what hashing algorithm to use and whether to use salts. These values are held in environment variables that are maintained by the interpreter. There are also options to account for password databases that have multiple fields (for example the database I downloaded has two fields – one for the password hash and one for the total number of times that password appeared in the data breaches). You can specify characters for a field separator and record separator and also specify what field the hashes are in. The field separator string given by the user is directly copied into the second argument of strtok() used in the C code.

Here’s a synopsis of pcl, which is the command for running Pickle:


pcl [-f script] [-p pattern [wordlists]] [-r range] [-s] [-h hash] [-o
output-file] [--sep=char] [--field=number] input-file

-f: Specify a batch file or script to execute commands from
-h: Specify the hashing algorithm to use, default is "md5"
    Possible values: md4, md5, sha0, sha1, sha256, ntlm, bcrypt, desx, bf, mod
-o: Specify a file to write the cracked passwords to
-p: Specify a pattern to search with zero or more wordlists
-q: Quiet mode, don't print any output unless instructed to by a print or
    regdump statement; by default Pickle will print all matches it finds to
    standard output
-r: Specify a range of records in the input file to crack
-s: Use salts
-v: Verbose mode; show continuous progress on searches
-vv: Verbose mode, and also pause after finding each match
--color: Color-code standard output messages
--field=num: Specify which field contains the hash if the input file has
             multiple fields
--sep=chars: Specify field separator characters for an input file with multiple
             fields

PCL uses a syntax for search patterns that is loosely modeled on the format strings used by C and awk, using the same slash delimiters that awk uses for pattern matching. These patterns can either be passed to Pickle at the command line, or fed to it via a batch file or script. An example pattern would be something like /%f%f%d/, which searches all passwords consisting of two dictionary words and then a digit. So a password like “spacelightning1” would be cracked using this attack. If you use multiple patterns one after another, you can crack large numbers of passwords. In fact given the right patterns to search, Pickle can theoretically crack any password with sufficiently low entropy in a reasonable amount of time.

Here’s a complete listing of all the metacharacters and escape sequences used in PCL:


Metacharacters for pattern strings:

%%	Literal % character
%?	Wildcard, can stand for any single character
%a	Alphabetic character
%A	Non-alphabetic character
%c	Control character
%C	Non-control character (control characters are skipped by default)
%d	Digit
%D	Non-digit character
%F	Dictionary file, capitalize first letters
%f	Dictionary file, don't capitalize first letters
%N	Non-alphabetical character
%n	Non-alphabetical, non-whitespace character
%P	Sorted precomputed table (must appear by itself)
%p	Nonsorted precomputed table (must appear by itself)
%s	Special character
%S	Non-special character
%U	Uppercase alphabetic character
%u	Lowercase alphabetic character
%w	Whitespace character
%W	Non-whitespace character


Escape sequences for pattern strings:

\"	Literal quote character
\/	Literal slash character
\\	Literal backslash character
\000	Octal specification for a byte
\n	Newline
\r	Carriage return
\t	Horizontal tab
\uXXXX	Hex specification for a UTF-8 encoded character
\xXX	Hex specification for a byte

Because I want to simplify the interpreter’s implementation as much as possible, I have not provided facilities for arbitrary user-defined variables. There are three types of variables in PCL and each one is referenced using a distinct syntax: There are parameter variables, which are immutable and are denoted by nonnegative integers indicating where they appear in the parameter list. These can be parameters of either the script itself or a macro within the script. In addition to this there is a small predefined set of registers, inspired by a similar feature in the troff text formatter. Some of these can be changed by the user while others are used by Pickle during the password cracking procedure to log information for statistical purposes and are not user-modifiable. Users cannot declare their own registers. Registers are denoted by parenthesis surrounding the register name. The third kind of variable is counters. These are positive integers that only exist within the scope of a for loop and increment over a certain range of values specified in the for statement.

A complete list of the registers used by PCL is as follows:


Name:		Type:		Default:	Description:
(salt)		Boolean		false		Use salts
(hash)		Enum		md5		Hashing algorithm to use
(fs)		String		"\t"		Field separator characters
(rs)		String		"\n"		Record separator character
(nf)		Number		1		Field containing hashes
(matches)	Number array	{}		Number of matches for each search
(times)		Number array	{}		Time in seconds for each search
(patterns)	String array	{}		Pattern used for each search

For simplicity’s sake, I have decided to enforce a rule that each line of a script corresponds to a single command. There are no semicolons or curly brackets in PCL, so you can’t put multiple commands on the same line. You can however break a single command into multiple lines by putting a backslash at the end of each non-terminating line. All consecutive backslashed lines will be joined together by the tokenizer when it builds the token list.

There are two kinds of commands in PCL: pattern commands and keyword commands. Pattern commands are what do the actual password cracking. They start with a search pattern delimited by slashes, which is followed by an optional list of lookup table or wordlist files to use and an optional range of indices within the password database to run the pattern attack against (by default Pickle will attempt to crack every password hash in the database). Keyword commands start with one of a small number of reserved keywords and may include additional code after those tokens. Possible keyword commands used by PCL are the following:


reg		Set a register value
for		Begin a for loop
endfor		End a for loop
while		Begin a while loop
endwhile	End a while loop
if		Begin an if statement
else		Begin the else clause of an if statement
endif		End an if statement
def		Begin a macro definition
endef		End a macro definition
expand		Expand a macro
goto		Go to a label within the program
print		Print a string or value
regdump		Dump all registers for debugging purposes
exit		Exit the program

Control flow constructs include if/else clauses, for loops, while loops, and goto statements. PCL’s control flow constructs are modeled on the control flow constructs in Vimscript. In trying to think of some sample scripts for practical password attacks, I couldn’t find much use for while and goto. They’re mostly there to make the language Turing-complete.

The interpreter takes a modular approach to parsing a PCL script, dividing the process into five discrete steps: 1. Divide the program into tokens using a pushdown transducer (the tokenizer); 2. Use a collection of finite acceptors (lexers) along with a single pushdown stack to validate each token and determine its token type; 3. Use a Turing machine whose input alphabet is the collection of token types to validate the phrase structure syntax of the program token-by-token (as opposed to character-by-character, which would be a lot more tedious as well as unnecessary); 4. Expand all macros and macro parameters; and 5. Reorganize the tokens into a tree structure which can then be traversed by the interpreter when running the program.

PCL’s system for variables, macros, and functions is based heavily on languages like bash and GNU Make. Macros can have parameters, which within the body of a macro definition are denoted by double dollar signs rather than single dollar signs. Macros are referenced in the program using the expand command. This is a command to the preprocessor to replace that code with the code in the body of the macro. Macros are never included in the syntax tree used during execution.

PCL has a small set of predefined functions. Functions in PCL are modeled after functions in Make, and in fact have the exact same syntax. They are essentially variables with arguments, and they can be used any place a variable can be used. All functions in PCL have no side effects; they simply take parameters and produce a value based on those parameters. Internally they are translated directly into C functions by the interpreter. In fact some of them are exact analogues of functions from libc. The current specification for PCL does not allow for user-defined functions (again for simplicity), although I may consider adding this feature in a later version.

The functions defined by the current PCL specification are as follows:


PCL function:	C prototype:				Description:
$(count)	int pcl_count( void * );		Array length
$(exec)		char *pcl_exec( char * );		Output of shell command
$(max)		int pcl_max( int * )			Maximum value in numerical array
$(mean)		int pcl_mean( int * )			Mean value in numeric array
$(median)	int pcl_median( int * )			Median value in numeric array
$(min)		int pcl_min( int * )			Minimum value in numerical array
$(mode)		int pcl_mode( int * )			Mode of numeric array
$(strlen)	int pcl_strlen( char * );		String length
$(strtok)	char *pcl_strtok( char *, char *, int )	Tokenize first argument with second argument as a separator and return the token given by the third argument
$(substr)	char *pcl_substr( char *, int, int )	Substring starting at index given by second argument with length given by third argument
$(sum)		int sum( int * )			Sum of values in numeric array

Now that I’ve described my password cracking language a bit and clarified how it’s supposed to work, I think this is a good place to post the complete language specification for PCL, in EBNF notation:


program> ::= <statement><br>,{<statement><br>}
<br> ::= "\n",{"\n"} | "\r\n",{"\r\n"}
<statement> ::= <pattern-command> | <print-statement> <reg-statement> | <if-statement> | <for-loop> | <while-loop> | <macro-definition> | <macro-expansion> | <registry-dump> | <exit-statement> | <goto-statement> | <label>
<reg-statement> ::= "reg",<register>,<value>
<print-statement> ::= "print",<string-literal>
<if-statement> ::= "if",<condition><br>,<program>,["else"<br>,<program>],"endif"
<while-loop> ::= "while",<condition><br>,<program>,"endwhile"
<for-loop> ::= "for",<counter>,"in",<register> | <range-list><br>,<program>,"endfor"
<macro-definition> ::= "def",<token><br>,<program>,"endef"
<macro-expansion> ::= "expand",<token>,{<value>}
<registry-dump> ::= "regdump"
<exit-statement> ::= "exit",[<number-literal>]
<goto-statement> ::= "goto",<token>
<label> ::= ":"<token>
<condition> ::= ["not"],<condition>,{<boolean-operator>,["not"],<condition>} | <comparison> | "("<condition>")"
<boolean-operator> ::= "and" | "or" | "nand" | "nor" | "xor" | "xnor"
<comparison> ::= "$"<variable>,<comparison-operator>,<value>
<variable> ::= <register> | <counter> | <parameter>
<comparison-operator> ::= "=" | "<>" | "<" | ">" | "<=" | ">="
<value> ::= <number-literal> | <string-literal> | <boolean-literal> | <enum-literal> | <variable-value>
<counter> ::= <letter>
<parameter> ::= <number-literal> | "$"<number-literal> | "("<number-literal>")" | "$("<number-literal>")"
<register> ::= "("<register-name>")" | "("<register-array>")"["["<number>"]"]
<register-name> ::= "hash" | "salt" | "fs" | "rs" | "nf" | "matches" | "times" | "patterns"
<register-array> ::= "matches" | "times" | "patterns"
<number-literal> ::= <digit>{<digit>}
<string-literal> ::= "\""{<character> | <variable-value>}"\""
<boolean-literal> ::= "true" | "false"
<enum-literal> ::= <token>
<variable-value> ::= "$"<variable-or-function>
<variable-or-function> ::= <variable> | <function-call>
<token> ::= <letter>{<letter> | <digit>}
<character> ::= <letter> | <digit> | <special> | <escape>
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
<letter> ::= "A"..."Z" | "a"..."z"
<special> ::= "~" | "!" | "@" | etc.
<escape> ::= "\\n" | "\\r" | "\\t" | "\\\"" | "\\/" | "\\\\" | "\\#" | "\\"<octal><octal><octal> | "\\x"<hex><hex> | "\\u"<hex><hex><hex><hex>
<octal> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7"
<hex> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" | "a" | "b" | "c" | "d" | "e" | "f"
<function-call> ::= "(",<function-name>,<value>,{<value>},")"
<function-name> ::= "count" | "exec" | "max" | "mean" | "median" | "min" | "mode" | "strlen" | "strtok" | "substr" | "sum"
<range-list> ::= <range-item>{","<range-item>}
<range-item> ::= <number-literal> | <range> | <variable-value> | <hash>
<hash> ::= "{"{<hex>}"}"
<range> ::= <number-literal>"-"<number-literal>
<pattern-command> ::= <pattern-string>,[<range-list>],{<filename>}
<filename> ::= <string-literal>
<pattern-string> ::= "/"<placeholder>{<placeholder>}"/"
<placeholder> ::= <character> | <metacharacter> | "$"<variable>
<metacharacter> ::= "%"[<number-value>]<meta-character>
<number-value> ::= <number-literal> | <counter>
<meta-character> ::= "%" | "?" | "A" | "a" | "C" | "c" | "D" | "d" | "F" | "f" | "N" | "n" | "P" | "p" | "S" | "s" | "U" | "u" | "W" | "w"

Of course it would be silly for me to design a new programming language without being sure that its features are actually useful in the real world. So as a proof-of-concept demonstration, and to clarify how this language should be used in practice, I have typed up three example PCL scripts. These scripts automate pattern searches that attack the first 100 hashes in the password database and exit as soon as a certain number of passwords have been successfully cracked. Often it’s preferable to do an incomplete attack and just crack some of the passwords and call it a day (especially considering that there may be very high-entropy passwords that are literally impossible to crack) rather than to tie up system resources for days or weeks on end trying to crack every single password in the list.

The following script automates a brute-force attack that searches all passwords up to 8 characters in length (8 characters is about the entropy limit for a password that can be feasibly cracked using this method):


 1 #!/usr/local/bin/pcl -f
 2 
 3 reg (hash) sha1
 4 reg (fs) ":"
 5 reg (nf) 1
 6 reg (salt) true
 7 
 8 for i in 1-8
 9   /%$iC/ 1-100
10   if $(sum $(matches)) = 100
11     exit 0
12   endif
13 endfor

This next script does a dictionary search before doing a brute-force search. This is another standard method of cracking passwords, as it can find a lot of weak passwords very quickly before going into the more thorough but also more time-consuming brute-force procedure:


 1 #!/usr/local/bin/pcl -f
 2 
 3 reg (hash) sha1
 4 reg (fs) ":"
 5 reg (nf) 1
 6 reg (salt) true
 7 
 8 def exit_when_done
 9   if $(sum $(matches)) >= $$1
10   exit 0
11 endef
12 
13 /%f/ "dictionary.txt" 1-100
14 expand exit_when_done 50
15 
16 for i in 1-8
17   /%$iC/ 1-100
18   expand exit_when_done 50
19 endfor

The last script implements a somewhat novel attack that’s supposed to work for modern password policies, which often require at least one capital letter, lowercase letter, digit, and symbol for each password. I honestly have no idea how long this scan would take, but it’s bound to crack some passwords that standard methods like brute-force and dictionary attacks won’t:


 1 #!/usr/local/bin/pcl -f
 2 
 3 reg (hash) sha1
 4 reg (fs) ":"
 5 reg (nf) 1
 6 reg (salt) true
 7 
 8 for i in "%F","%f"
 9   for j in "%d","%s"
10     for k in "%d","%s"
11       /$i$j$k/ "dictionary.txt" 1-100
12       if $(sum $(matches)) >= 20
13         exit 0
14       endif
15     endfor
16   endfor
17 endfor

Obviously this attack won’t crack every password in the database, but it’ll definitely crack some of them, as people tend to choose the weakest passwords they can get away with. Under one of these modern password policies, a single capitalized word followed by one digit and one symbol should be a very common password choice. A dictionary attack using a wordlist that includes 1337-speak versions of all dictionary words is also likely to crack a lot of passwords selected under these policies.

Note that all of these scripts have salts turned on. This tells Pickle to look for a salt next to every password hash and salt the password generated by the current iteration of a pattern before hashing it. Salts are the main reason we can’t just use pre-computed tables for everything (although that is an option that PCL provides). A pre-computed table lookup is by far the fastest way to crack passwords, but such an attack is easily foiled using just some basic hash-scrambling measures. In most practical examples where we’re cracking leaked password hashes, we will have to deal with salts, which is basically the entire reason a tool like Pickle is necessary.

So that’s basically the PCL password cracking language in a nutshell. The next step would be to implement some traditional password attacks, including brute-force search, dictionary search, and pre-computed table lookup, so I can combine elements of these attacks in the pattern searches used by the Pickle tool. Keep in mind I haven’t implemented any interpreter or password cracker functionality at the moment. All I have done is designed a language specification as well as mentally worked out most of the important implementation details. Anyway, that’s all for now. See you next time.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s