Introduction to GPU Password Cracking: Owning the LinkedIn Password Dump

This blog was written by Martin Bos, Senior Principal Security Consultant – TrustedSec

Unless you’ve been living under a rock for the past few months, you have probably heard that the dump from the 2012 LinkedIn hack has been released. TrustedSec was able to acquire a copy of the list and use it for research purposes. Our friends over at Korelogic have already posted an excellent analysis of the list showing the most common words, patterns, and other statistics, so we are not going to rehash that information. Instead, the LinkedIn list offers an opportunity for us at TrustedSec to share our password recovery methodology step by step and show how we attack large password breach lists. The passwords gained from these types of breaches are very valuable to us on penetration tests because people often reuse passwords across work and social media. Our hope is that by now everyone on this list has reset their password and is no longer using the one they used for LinkedIn in 2012. Since we cannot be sure, we have no plans to share the list, so please don’t ask.
The list we received contained 167,370,909 entries in unsalted SHA1 format. It contains a large number of duplicate hashes, which are valuable for statistical analysis but are not needed to go over cracking methodology. After removing all of the duplicates and blank lines, we were left with 117,205,871 unique hashes to crack.
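The de-duplication step can be illustrated with a short Python sketch. This is only an illustration under assumptions: the function name and the sample hashes are made up for the example, and for a file with well over a hundred million lines an on-disk tool such as sort -u is likely a better fit than an in-memory counter.

```python
from collections import Counter

def dedupe_hashes(lines):
    """Return (unique hashes in first-seen order, Counter of occurrences)."""
    counts = Counter()
    for line in lines:
        h = line.strip().lower()   # normalise case so A1.. and a1.. match
        if h:                      # skip blank lines
            counts[h] += 1
    return list(counts), counts

# Hypothetical sample input: one duplicate (in upper case) and one blank line.
sample = [
    "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8\n",
    "\n",
    "5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8\n",
    "7c4a8d09ca3762af61e59520943dc26494f8941b\n",
]
unique, counts = dedupe_hashes(sample)
print(len(unique), counts.most_common(1))
```

Keeping the counter around preserves the duplicate counts for statistics, while the unique list is what goes to the cracker.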

At TrustedSec, we have a large password cracking server that was provided to us by Jeremi Gosney and the fine folks over at Sagitta. It is more than capable of loading all 117 million hashes at one time. However, because not everyone has a box of this size, I decided to split the list into more manageable chunks. You can decide what a manageable number of lines is based on your hardware specifications.

I split my list into chunks of 10 million lines, but you may want to use fewer.

root@kracker:~/LINKEDIN_WORKING/cudaHashcat/HASHFILES# split -dl 10000000 --additional-suffix=.txt linkedin.hash link
root@kracker:~/LINKEDIN_WORKING/cudaHashcat/HASHFILES# ls
link00.txt  link00.txt.new  link01.txt  link02.txt  link03.txt  link04.txt  link05.txt  link06.txt  link07.txt  link08.txt  link09.txt  link10.txt  link11.txt  linkedin.hash

This gave me 12 hash files to work with (link00.txt through link11.txt). My plan was to run all of the basic password attacks against each of these lists to get the low-hanging fruit out of the way, then recombine the hash lists and start the more advanced password recovery tactics. This post is also meant to be a tutorial on how to use Cudahashcat, so I will try to showcase each of the attack modes even though it may not be totally necessary.
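For readers who want to reason about chunk sizes before splitting, here is a minimal Python sketch of the same idea. The helper names are hypothetical and the code only mimics what split does with line counts; it does not write any files.

```python
from itertools import islice

def chunk_count(total_lines, lines_per_chunk):
    """Number of output files, using ceiling division on integers."""
    return -(-total_lines // lines_per_chunk)

def chunks(lines, lines_per_chunk):
    """Yield successive lists of at most lines_per_chunk lines."""
    it = iter(lines)
    while True:
        block = list(islice(it, lines_per_chunk))
        if not block:
            return
        yield block

# How many files does a given chunk size produce for the unique hash count?
print(chunk_count(117_205_871, 10_000_000))
# Toy run: 25 "lines" in chunks of 10.
print([len(block) for block in chunks(range(25), 10)])
```

The last chunk is simply whatever is left over, so it will usually be smaller than the others.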

The first thing to identify in any hash list is the type of hash. We know the list is in unsalted SHA1 format, so we need to find the mode for that type of hash. Executing Cudahashcat with the -h flag will show us all of our options.

Hash types:

900 = MD4
0 = MD5
5100 = Half MD5
100 = SHA1
10800 = SHA-384
1400 = SHA-256
1700 = SHA-512
5000 = SHA-3(Keccak)
10100 = SipHash
6000 = RipeMD160
6100 = Whirlpool
6900 = GOST R 34.11-94
11700 = GOST R 34.11-2012 (Streebog) 256-bit
11800 = GOST R 34.11-2012 (Streebog) 512-bit

You can see here that SHA1 is mode 100 so that is what we are going to be using for the entire exercise.
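If you are unsure whether a list really is unsalted SHA1, a quick sanity check is to confirm that each line is 40 hex characters and that a known weak password hashes to a value you can look for in the list. The sketch below uses Python's standard hashlib; the helper names are my own for illustration.

```python
import hashlib
import re

SHA1_HEX = re.compile(r"^[0-9a-fA-F]{40}$")

def looks_like_sha1(line):
    """True if the line is exactly 40 hexadecimal characters."""
    return bool(SHA1_HEX.match(line.strip()))

def sha1_hex(password):
    """Unsalted SHA1 of a password, as a lowercase hex string."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

digest = sha1_hex("password")
print(digest)
print(looks_like_sha1(digest))
```

Because there is no salt, the same password always produces the same digest, which is why duplicates show up in the list and why precomputed guesses work so well.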

The next thing we want to determine is the attack mode we want to start with.

Attack modes:
0 = Straight
1 = Combination
3 = Brute-force
6 = Hybrid dict + mask
7 = Hybrid mask + dict

To begin with, I always do a brute force of 1-6 characters to get the party started. Cudahashcat uses what are referred to as password masks, where a placeholder represents each character set.

?l = abcdefghijklmnopqrstuvwxyz
?u = ABCDEFGHIJKLMNOPQRSTUVWXYZ
?d = 0123456789
?s =  !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
?a = ?l?u?d?s
?b = 0x00 - 0xff

This will be particularly useful later on when we want to tailor our attack a little more, but to get started we are going to use the ?a mask to represent all of the characters.

root@kracker:~/LINKEDIN_WORKING/cudaHashcat# ./cudaHashcat64.bin -m 100 --remove HASHFILES/link01.txt -o linkedin.cracked -i -a 3  ?a?a?a?a?a?a

The flags I am using for this attack are as follows:

-m 100 = The hash mode. Remember, it was SHA1.
--remove = Removes each hash from the hash file once it has been recovered.
-o = The output file for your cracked hashes.
-i = Increment mode, so the cracking will start at 1 character and move up in increments to the number of ?a you defined on the command line.
-a 3 = Our attack mode. We are using brute force for this example.
?a?a?a?a?a?a = The mask. Its length is the maximum length we want to brute force up to. In this example it is 6, but you can use 5, 7, 8, or whatever you want.
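To get a feel for how quickly the work grows with mask length, the keyspace can be computed directly from the character set sizes listed earlier (26, 26, 10, 33, and 95 for ?a). The following is a rough sketch with hypothetical helper names; it only understands the built-in placeholders and ignores custom charsets.

```python
CHARSET_SIZES = {"l": 26, "u": 26, "d": 10, "s": 33, "a": 95, "b": 256}

def mask_keyspace(mask):
    """Number of candidates for a mask made only of built-in ?x placeholders."""
    if len(mask) % 2 or any(c != "?" for c in mask[0::2]):
        raise ValueError("sketch only handles ?l ?u ?d ?s ?a ?b placeholders")
    total = 1
    for code in mask[1::2]:
        total *= CHARSET_SIZES[code]
    return total

def incremental_keyspace(mask):
    """Total candidates when -i walks from 1 position up to the full mask."""
    positions = len(mask) // 2
    return sum(mask_keyspace(mask[: 2 * n]) for n in range(1, positions + 1))

print(mask_keyspace("?a" * 6))
print(incremental_keyspace("?a" * 6))
```

Each extra ?a multiplies the work by 95, which is why 6 characters is a comfortable starting point and each step beyond it is noticeably more expensive.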

I will use this attack against each of my 12 hash lists. You could easily write a quick for loop in bash to iterate through each of the files.

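#!/bin/bash
# Run from the cudaHashcat directory. The mask is quoted so the shell does not treat ? as a glob.
for file in HASHFILES/link*.txt
do
./cudaHashcat64.bin -m 100 --remove "$file" -o linkedin.cracked -i -a 3 '?a?a?a?a?a?a'
done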

Next, let’s take a look at a simple wordlist attack. This is attack mode 0, but since it is the default attack we don’t have to specify an -a flag on the command line. The attack is very simple: it hashes each word in a wordlist and checks whether the result matches one of our hashes. Obviously, this attack is only as good as your wordlist collection. At the end of the article, I will link to some resources for downloading wordlists to get you started. I will begin with the rockyou.txt wordlist.

root@kracker:~/LINKEDIN_WORKING/cudaHashcat# ./cudaHashcat64.bin -m 100 --remove HASHFILES/link01.txt -o linkedin.cracked  /wordlists/rockyou.txt
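Conceptually, the straight attack boils down to hashing each word and looking the digest up in the set of target hashes. The sketch below shows that idea on the CPU with Python's hashlib; it is an illustration only, the function name is hypothetical, and it says nothing about how the GPU kernels are actually implemented.

```python
import hashlib

def straight_attack(target_hashes, words):
    """Return {hash: password} for every word whose SHA1 is in target_hashes."""
    remaining = set(h.lower() for h in target_hashes)
    cracked = {}
    for word in words:
        digest = hashlib.sha1(word.encode("utf-8")).hexdigest()
        if digest in remaining:
            cracked[digest] = word
            remaining.discard(digest)  # same idea as --remove
    return cracked

targets = ["5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8"]
print(straight_attack(targets, ["letmein", "password", "123456"]))
```

The set lookup is what makes a large unsalted list cheap to attack: one hash computation per candidate is checked against every target at once.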

One really cool thing about Cudahashcat is that it accepts more than one wordlist at a time, so instead of having one giant list, you can keep multiple smaller lists in a directory and pass them all with a * (the shell expands it to every file in that directory).

root@kracker:~/LINKEDIN_WORKING/cudaHashcat# ./cudaHashcat64.bin -m 100 --remove HASHFILES/link01.txt -o linkedin.cracked  /wordlists/*

Once again, you can just use a quick for loop to iterate through all 12 of your hash files.

#!/bin/bash
# Run from the cudaHashcat directory so the HASHFILES/ glob resolves.
for file in HASHFILES/link*.txt
do
./cudaHashcat64.bin -m 100 --remove "$file" -o linkedin.cracked /wordlists/*
done

Once you have burned through all of your wordlists, it’s time to add some rules. Cudahashcat has rule files that contain one rule per line. For a thorough breakdown of the rule-based attack, you can see the Hashcat Wiki. For the most part, all of the effective rules have been written already and are included with Cudahashcat. In order to use a rule file, we specify -r on the command line followed by the path to the rule file.

./cudaHashcat64.bin -m 100 --remove HASHFILES/link01.txt -o linkedin.cracked -r rules/best64.rule /wordlists/*
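To make the idea of a rule concrete, here is a toy interpreter for a handful of rule functions documented on the Hashcat Wiki (: for no-op, l, u, c, r, d, $X to append a character, and ^X to prepend one). It is a sketch covering only that small subset, and the Python function name is my own; real rule files such as best64.rule use many more functions.

```python
SIMPLE_OPS = {
    ":": lambda w: w,         # do nothing
    "l": str.lower,           # lowercase all letters
    "u": str.upper,           # uppercase all letters
    "c": str.capitalize,      # capitalize first letter, lowercase the rest
    "r": lambda w: w[::-1],   # reverse the word
    "d": lambda w: w + w,     # duplicate the word
}

def apply_rule(rule, word):
    """Apply one rule line (a sequence of functions) to one word."""
    i = 0
    while i < len(rule):
        op = rule[i]
        if op == " ":                       # spaces separate functions
            i += 1
        elif op in SIMPLE_OPS:
            word = SIMPLE_OPS[op](word)
            i += 1
        elif op in "$^" and i + 1 < len(rule):
            ch = rule[i + 1]                # the character to append/prepend
            word = word + ch if op == "$" else ch + word
            i += 2
        else:
            raise ValueError(f"unsupported rule function: {op!r}")
    return word

for rule in [":", "c", "c $1", "$1 $2 $3", "^! u"]:
    print(rule, "->", apply_rule(rule, "password"))
```

Every rule line turns each dictionary word into one more candidate, so a rule file with N rules multiplies the size of the wordlist attack by roughly N.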

The next attack we want to look at is the Hybrid Attack. This is a combination of the dictionary attack and the mask attack.

6 = Hybrid dict + mask
7 = Hybrid mask + dict

We choose either -a 6 or -a 7 depending on what we want to do. The 6 attack appends the mask we define and the 7 attack prepends the mask.

root@kracker:~/LINKEDIN_WORKING/cudaHashcat# ./cudaHashcat64.bin -m 100 --remove HASHFILES/link01.txt -o linkedin.cracked -i -a 6  /wordlists/rockyou.txt  ?a?a?a?a

We can once again use the increment flag to start at 1 character space and move up to 4 character spaces which we define with ?a?a?a?a. You can see below in the output that we have a wordlist as the left input and a mask as the right input.

Session.Name...: cudaHashcat
Status.........: Running
Input.Left.....: File (/wordlists/rockyou.txt)
Input.Right....: Mask (?a) [1]
Hash.Target....: File (HASHFILES/link10.txt)
Hash.Type......: SHA1
Time.Started...: Wed Jun 15 08:08:37 2016 (8 secs)
Time.Estimated.: 0 secs
Speed.GPU.#1...: 36307.1 kH/s
Speed.GPU.#2...: 31949.1 kH/s
Speed.GPU.#3...: 20723.3 kH/s
Speed.GPU.#4...: 22585.2 kH/s
Speed.GPU.#5...: 24069.9 kH/s
Speed.GPU.#6...: 23748.2 kH/s
Speed.GPU.#7...: 27342.7 kH/s
Speed.GPU.#8...: 22934.5 kH/s
Speed.GPU.#*...:   209.7 MH/s
Recovered......: 82753/2924027 (2.83%) Digests, 0/1 (0.00%) Salts
Recovered/Time.: CUR:N/A,N/A,N/A AVG:588157.69,35289464.00,846947072.00 (Min,Hour,Day)
Progress.......: 1362613120/1362613120 (100.00%)
Rejected.......: 151905/1362613120 (0.01%)
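The way the two hybrid modes build candidates can be sketched in a few lines of Python. This is an illustration under assumptions: the helper names are hypothetical, only the built-in ?l ?u ?d ?s ?a placeholders are handled, and the real tool generates candidates on the GPU in a different order.

```python
from itertools import product
import string

CHARSETS = {
    "l": string.ascii_lowercase,
    "u": string.ascii_uppercase,
    "d": string.digits,
    "s": " " + string.punctuation,   # space plus the 32 ASCII specials
}
CHARSETS["a"] = CHARSETS["l"] + CHARSETS["u"] + CHARSETS["d"] + CHARSETS["s"]

def expand_mask(mask):
    """Yield every string matched by a mask of built-in ?x placeholders."""
    pools = [CHARSETS[code] for code in mask[1::2]]
    for combo in product(*pools):
        yield "".join(combo)

def hybrid(words, mask, mode):
    """Mode 6 appends the mask to each word, mode 7 prepends it."""
    for word in words:
        for filler in expand_mask(mask):
            yield word + filler if mode == 6 else filler + word

print(list(hybrid(["summer"], "?d", 6))[:3])
print(list(hybrid(["summer"], "?d", 7))[:3])
```

The number of candidates is the wordlist size times the mask keyspace, so ?a?a?a?a on a large wordlist is a much heavier job than a short digit mask.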

Likewise, we can run it the other direction as well and see what shakes out.

root@kracker:~/LINKEDIN_WORKING/cudaHashcat# ./cudaHashcat64.bin -m 100 --remove HASHFILES/link01.txt -o linkedin.cracked -i -a 7 ?a?a?a?a  /wordlists/rockyou.txt

You can see in the output below that this type of attack is very effective at catching passwords with seemingly random special characters and digits in the middle of the words.

f7d5b2c833ef067bf3d5764e3dd28c1b97c94385:style&zo
cab5a9547e82cb7bf7c86f32df74f7da69d527c0:16633p/s
2d56ee1f63a4bb297bc79f76367c10acef0b1155:ling71.t
da6e3d2b461a9710f7b7d505e0346438629ad286:mc333x7s
f76d5db19a723711295b7ceb1942fcc852a0d3eb:ayudame*+8

At this point, we have pretty much exhausted all of the easy stuff. Let’s check and see how many we have cracked.

root@kracker:~/LINKEDIN_WORKING/cudaHashcat# wc -l linkedin.cracked
61033579 linkedin.cracked

With 61,033,579 of the 117,205,871 unique hashes recovered, we are at about 52% cracked. This is pretty normal for this stage of the attack. At this point, it’s usually prudent to do a little analysis of the list and see what the most common patterns are. We can recycle these patterns back into Cudahashcat and hopefully crack some more passwords.

There are several tools out there to analyze wordlists but the one I like the best is called PACK and is available at The Sprawl.

The first thing we need to do is extract just the passwords from our Cudahashcat output file. Using -f 2- rather than -f 2 keeps passwords that themselves contain a colon intact.

cut -d : -f 2- linkedin.cracked > linkedin.analyze

Next, I like to use the statsgen.py tool to get the most common masks.

[*] Advanced Masks:
[+]          ?l?l?l?l?l?l?l?l: 05% (1859090)
[+]              ?l?l?l?l?l?l: 04% (1411462)
[+]          ?l?l?l?l?l?l?d?d: 04% (1365203)
[+]            ?l?l?l?l?l?l?l: 04% (1344455)
[+]          ?d?d?d?d?d?d?d?d: 04% (1340848)
[+]        ?l?l?l?l?l?l?l?l?l: 03% (1087191)
[+]      ?d?d?d?d?d?d?d?d?d?d: 03% (1036375)
[+]          ?l?l?l?l?d?d?d?d: 03% (987014)
[+]            ?d?d?d?d?d?d?d: 02% (776811)
[+]      ?l?l?l?l?l?l?l?l?l?l: 02% (769990)
[+]        ?l?l?l?l?l?l?l?d?d: 02% (730153)
[+]              ?d?d?d?d?d?d: 02% (686769)
[+]        ?l?l?l?l?l?d?d?d?d: 02% (671888)
[+]            ?l?l?l?l?l?d?d: 01% (636132)
[+]            ?l?l?l?d?d?d?d: 01% (572769)
[+]          ?l?l?l?l?l?d?d?d: 01% (546086)
[+]          ?l?l?l?l?l?l?l?d: 01% (542754)
[+]      ?l?l?l?l?l?l?d?d?d?d: 01% (535142)
[+]      ?l?l?l?l?l?l?l?l?d?d: 01% (505372)
[+]              ?l?l?d?d?d?d: 01% (464088)
[+]              ?l?l?l?l?d?d: 01% (452142)
[+]    ?l?l?l?l?l?l?l?l?l?l?l: 01% (431507)
[+]        ?l?l?l?l?l?l?d?d?d: 01% (418555)
[+]            ?l?l?l?l?l?l?d: 01% (349865)
[+]        ?l?l?l?l?l?l?l?l?d: 01% (345546)
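The mapping from a password to its mask is simple enough to sketch by hand. The code below is not PACK's implementation, just a minimal approximation with hypothetical function names; treating anything outside printable ASCII as ?b is an assumption on my part.

```python
from collections import Counter
import string

def password_to_mask(password):
    """Translate each character into its hashcat charset placeholder."""
    parts = []
    for ch in password:
        if ch in string.ascii_lowercase:
            parts.append("?l")
        elif ch in string.ascii_uppercase:
            parts.append("?u")
        elif ch in string.digits:
            parts.append("?d")
        elif ch == " " or ch in string.punctuation:
            parts.append("?s")
        else:
            parts.append("?b")   # assumption: everything else counts as a raw byte
    return "".join(parts)

def top_masks(passwords, n):
    """Return the n most common masks with their counts."""
    return Counter(password_to_mask(p) for p in passwords).most_common(n)

sample = ["password", "monkey12", "dragon99", "style&zo"]
print(password_to_mask("style&zo"))
print(top_masks(sample, 2))
```

Counting masks this way over tens of millions of cracked passwords is what produces a ranking like the one above.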

We add the list of masks to a file and give it the .hcmask extension. We can now use the mask file with Cudahashcat and the tool will iterate through each of the masks one by one.

root@kracker:~/LINKEDIN_WORKING/cudaHashcat# ./cudaHashcat64.bin -m 100 --remove HASHFILES/link01.txt -o linkedin.cracked -a 3 linkedin.hcmask

Another useful attack we can use is to get the most common masks from other password breaches or wordlists that we have already cracked. In this example, we will generate a list of masks from the rockyou.txt wordlist.

root@kracker:~/LINKEDIN_WORKING/cudaHashcat/tools/PACK# python statsgen.py /wordlists/rockyou.txt -o rockyou.masks

Using the mask file we output from the statsgen tool, we can tailor our attack with the maskgen tool, which optimizes the mask file based on occurrence, complexity, or optindex, along with a target cracking time. I will make one mask file for each mode. The target time is defined in seconds, so we decide how long we want to run our attack and the tool selects as many masks as fit into that time frame.

python maskgen.py rockyou.masks --targettime 3600 --optindex -q -o rockyou-optindex.hcmask
python maskgen.py rockyou.masks --targettime 3600 --complexity -q -o rockyou-complexity.hcmask
python maskgen.py rockyou.masks --targettime 3600 --occurrence -q -o rockyou-occurrence.hcmask
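The idea of fitting masks into a time budget can be sketched as a greedy selection. To be clear, this is my own simplified model and not maskgen's actual code: the function names, the guesses-per-second figure, and the exact meaning I give to the three sort orders are all assumptions made for illustration.

```python
SIZES = {"l": 26, "u": 26, "d": 10, "s": 33, "a": 95, "b": 256}

def keyspace(mask):
    """Candidates needed to exhaust a mask of built-in ?x placeholders."""
    total = 1
    for code in mask[1::2]:
        total *= SIZES[code]
    return total

def pick_masks(mask_counts, target_seconds, guesses_per_second, sort_by="occurrence"):
    """Greedily keep masks, best first, until the time budget is spent."""
    orderings = {
        "occurrence": lambda m: -mask_counts[m],            # most common first
        "complexity": lambda m: keyspace(m),                # cheapest first
        "optindex": lambda m: keyspace(m) / mask_counts[m], # cheapest per hit first
    }
    budget = target_seconds * guesses_per_second
    chosen = []
    for mask in sorted(mask_counts, key=orderings[sort_by]):
        cost = keyspace(mask)
        if cost <= budget:      # skip masks that no longer fit
            chosen.append(mask)
            budget -= cost
    return chosen

# Hypothetical mask statistics: mask -> number of passwords matching it.
counts = {"?l?l?l?l": 50, "?d?d?d?d": 40, "?l?l?l?l?l?l": 100}
print(pick_masks(counts, 1, 1_000_000, "occurrence"))
print(pick_masks(counts, 1, 1_000_000, "complexity"))
```

With a one-second budget the six-letter mask is too expensive to include no matter how common it is, which is the trade-off the target time controls.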

Now that I have three mask files I can combine them into one file and remove any duplicates. You also have to be sure to remove the occurrence number at the end of the mask line. Let’s also remove any masks that are 6 characters or shorter since we already did a brute force for anything 6 characters or smaller.

root@kracker:~/LINKEDIN_WORKING/cudaHashcat/tools/PACK# cat rockyou-optindex.hcmask rockyou-complexity.hcmask rockyou-occurrence.hcmask | cut -d , -f 1 | sed -r '/^.{0,12}$/d' | sort -u > rockyou.hcmask
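For anyone who finds the pipeline hard to read, the same clean-up can be written out in Python. This is a sketch with a hypothetical function name; it assumes each input line is either a bare mask or a mask followed by a comma and a count.

```python
def merge_mask_lines(lines, min_positions=7):
    """Strip counts, drop masks shorter than min_positions, and de-duplicate."""
    masks = set()
    for line in lines:
        mask = line.strip().split(",", 1)[0]   # drop ",occurrence" if present
        if len(mask) // 2 >= min_positions:    # each position is two characters
            masks.add(mask)
    return sorted(masks)

sample_lines = [
    "?l?l?l?l?l?l,100\n",        # 6 positions: already covered by brute force
    "?l?l?l?l?l?l?l,50\n",
    "?l?l?l?l?l?l?l,50\n",       # duplicate from another mask file
    "?d?d?d?d?d?d?d?d\n",
]
print(merge_mask_lines(sample_lines))
```

The 12 in the sed expression and the 7 here express the same cutoff, since every mask position takes two characters.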

Then I use the newly created mask files to attack the LinkedIn list again.

root@kracker:~/LINKEDIN_WORKING/cudaHashcat# ./cudaHashcat64.bin -m 100 --remove HASHFILES/link01.txt -o linkedin.cracked -a 3 rockyou.hcmask

At this point, we have probably cracked enough hashes that we can combine the remaining hashes in the 12 lists into one.

root@kracker:~/LINKEDIN_WORKING/cudaHashcat/HASHFILES# cat *.txt | sort -u > linkedin.remaining

Now we can run our newly created rockyou.hcmask file against the remaining hashes in the list.

root@kracker:~/LINKEDIN_WORKING/cudaHashcat# ./cudaHashcat64.bin -m 100 --remove HASHFILES/linkedin.remaining -o linkedin.cracked  -a 3 rockyou.hcmask

At this point, I have one more type of attack I would like to show, called the combinator attack. Just as the hybrid attack uses a dictionary on one side and a mask on the other, the combinator attack uses a dictionary on both sides. This is a very effective attack for recovering long passwords that may have otherwise been missed.

Here is an example taken from the Hashcat Wiki.

If our dictionary contains the words:

pass
12345
omg
Test

Hashcat creates the following password candidates:

passpass
pass12345
passomg
passTest
12345pass
1234512345
12345omg
12345Test
omgpass
omg12345
omgomg
omgTest
Testpass
Test12345
Testomg
TestTest

Additionally, we can also add a single rule to either side of the dictionary.

-j = Single rule applied to each word on the left dictionary
-k = Single rule applied to each word on the right dictionary

If we wanted to add a hyphen between the two words and a ! at the end, we would use the following two rules.

-j '$-'
-k '$!'

Which would give us this:

pass-pass!
pass-12345!
pass-omg!
pass-Test!
12345-pass!
12345-12345!
12345-omg!
12345-Test!
omg-pass!
omg-12345!
omg-omg!
omg-Test!
Test-pass!
Test-12345!
Test-omg!
Test-Test!
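The candidate generation for the combinator attack, including the effect of a single append rule on each side, can be sketched with itertools.product. The function name and the suffix parameters are my own shorthand: they stand in for -j and -k only in the simple case where each rule appends one string.

```python
from itertools import product

def combinator(left, right, left_suffix="", right_suffix=""):
    """Yield left+right pairs; the suffixes emulate single '$X' append rules."""
    for a, b in product(left, right):
        yield a + left_suffix + b + right_suffix

words = ["pass", "12345", "omg", "Test"]
plain = list(combinator(words, words))
ruled = list(combinator(words, words, "-", "!"))
print(len(plain), plain[:4])
print(ruled[:4])
```

Four words on each side give sixteen candidates, so the keyspace is always the size of the left list times the size of the right list.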

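This attack is most effective with smaller wordlists, because the number of candidates is the product of the two dictionary sizes and a run can take an extremely long time. For this example, I am just going to use a simple English wordlist with 394,748 words in it, which already works out to roughly 156 billion candidates when combined with itself. Feel free to mix it up and use different dictionaries on each side.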

root@kracker:~/LINKEDIN_WORKING/cudaHashcat# ./cudaHashcat64.bin -m 100 --remove HASHFILES/linkedin.remaining -o linkedin.cracked  -a 1 /wordlists/english.txt /wordlists/english.txt

After I let that run I will do it a few more times with various rules. Here is an example with the rules we mentioned above.

root@kracker:~/LINKEDIN_WORKING/cudaHashcat# ./cudaHashcat64.bin -m 100 --remove HASHFILES/linkedin.remaining -o linkedin.cracked -j '$-' -k '$!' -a 1 /wordlists/english.txt /wordlists/english.txt

The last part of this attack is using the expander to create a bigger dictionary out of our already cracked list. The expander tool can be found inside the hashcat-utils download.

Here is an example from the Hashcat wiki on how the expander works.

$ echo pass1 | ./expander.bin  | sort -u
1
1p
1pa
1pas
1pass
a
as
ass
ass1
ass1p
p
pa
pas
pass
pass1
s
s1
s1p
s1pa
s1pas
ss
ss1
ss1p
ss1pa
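Judging from the example output, the expander emits every substring of the word, including ones that wrap around from the end back to the start. The sketch below reproduces that behavior in Python. It is inferred from the output rather than from the tool's source, the function name is hypothetical, and the max_len parameter is an assumption since the real tool may cap substring length differently.

```python
def expand(word, max_len=None):
    """Return the sorted set of substrings of word, allowing wrap-around."""
    n = len(word)
    if max_len is None:
        max_len = n
    doubled = word + word              # lets a slice run past the end and wrap
    out = set()
    for start in range(n):
        for length in range(1, min(max_len, n) + 1):
            out.add(doubled[start:start + length])
    return sorted(out)

pieces = expand("pass1")
print(len(pieces))
print(pieces)
```

Every input word turns into many fragments, which is why the expanded dictionary grows so quickly and why this works best on small wordlists.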

Let’s try expanding the passwords we have already cracked. Using the words you have already cracked from the list can improve your chances of cracking more of it, because passwords from the same company will often follow patterns and themes. Normally we would be using much smaller dictionaries for this because it’s not often we have a hash list with 117 million hashes in it. This attack may take too long to be realistic here, but I just wanted to show an example. It works exceptionally well with smaller wordlists.

root@kracker:~/LINKEDIN_WORKING/cudaHashcat# cat linkedin.cracked | cut -d : -f 2- > linkedin.dic
root@kracker:~/LINKEDIN_WORKING/cudaHashcat/tools/hashcat-utils# ./expander.bin < ../../linkedin.dic | sort -u > ../../linkedin.exp

Then we use the expanded dictionary on both sides of the combinator attack.

root@kracker:~/LINKEDIN_WORKING/cudaHashcat# ./cudaHashcat64.bin -m 100 --remove HASHFILES/linkedin.remaining -o linkedin.cracked  -a 1 linkedin.exp linkedin.exp

Another effective attack method I use is to combine all the rule files into one big file and then use that with a dictionary file made from my already cracked hashes. This is very effective because it simply adds more patterns to already existing patterns. For this attack I will use a set of rules created by the folks over at Korelogic for the Crack Me if You Can Contest. They are available here.

root@kracker:~/LINKEDIN_WORKING/cudaHashcat# cat linkedin.cracked | cut -d : -f 2- | sort -u > linkedin.dic
root@kracker:~/LINKEDIN_WORKING/cudaHashcat# cat rules/KoreLogic/*.rules | sort -u > rules/KoreLogicBigRule.rule
root@kracker:~/LINKEDIN_WORKING/cudaHashcat# ./cudaHashcat64.bin -m 100 --remove HASHFILES/linkedin.remaining -o linkedin.cracked -r rules/KoreLogicBigRule.rule linkedin.dic

This, of course, can take a very long time, so it should be one of your last-ditch efforts to recover a password.

At this point, I have cracked about 85% of the LinkedIn list and I am pretty happy with the results. I will probably continue to modify these attacks we talked about in this article with different masks and wordlists and try to get more passwords.

A nice collection of wordlists to get you started is available here:
https://github.com/danielmiessler/SecLists/tree/master/Passwords
