2010-12-14

My quick analysis of the leaked Gawker passwords

I just got an email from Gawker Media stating that login details for their sites (Lifehacker, Gizmodo, etc.) had been compromised and that a database of 1.3M usernames and passwords was being distributed via BitTorrent. Naturally, I went and found the database and downloaded it. I extracted the passwords from the file that had already been cracked, removed duplicates, and counted how many times each one occurs. You can download the list at the end of this post.
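The extraction step comes down to a frequency count. Here is a minimal Python sketch of it, assuming the cracked passwords have already been pulled out of the dump into a list of strings (the dump's layout isn't described here). `count_passwords` and `write_tsv` are hypothetical helper names for illustration, not the code actually used for this post.

```python
import csv
import io
from collections import Counter


def count_passwords(cracked):
    """Hypothetical helper: collapse a list of cracked passwords into
    (password, count) pairs, most common first. Passwords stay strings,
    so "000000" keeps its leading zeroes."""
    # most_common() sorts by count, descending; ties keep first-seen order.
    return Counter(cracked).most_common()


def write_tsv(rows, out):
    """Hypothetical helper: write rank, count and password as tab-separated values."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Rank", "Count", "Password"])
    for rank, (password, count) in enumerate(rows, start=1):
        writer.writerow([rank, count, password])


if __name__ == "__main__":
    sample = ["123456", "password", "123456", "000000", "123456", "password"]
    buf = io.StringIO()
    write_tsv(count_passwords(sample), buf)
    print(buf.getvalue())
```

Ranks here are simply positions in the sorted list, so two passwords with the same count (like ranks 16 and 17 in the table below) get consecutive ranks rather than a shared one.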

I have written before about the dangers of using one password on multiple accounts. When I worked at a company where I had access to a massive password database, I was shocked to discover how many people use really weak passwords -- like a first name, a number like 123456, or the word "password".

The leaked Gawker data contains the following explanatory text (along with a ton of leaked private chat logs between Gawker executives, and other juicy stuff):


  After gaining access to gawkers MySQL database we stumble upon a huge
  table containing ~1,500,000 users. After a few days of dumping we
  decided that 1.3 million was enough.


  Gawker uses a really outdated hashing algorithm known as DES (Data Encryption Standard).
  Because DES has a maximum of 8chars using a password like "abcdefgh1234" only the
  first 8 characters "abcdefgh" are encrypted and stored in the database. If your
  password is longer than 8 characters you only need to enter the first 8 characters
  to log in! 


  YA DONT SAY!! :D?


  Because of this we were only able to recover the first 8 characters of someones password!
  If the password is 8 characters long there's a good chance that it migt be longer
  than 8 characters! But still, there's 1000's of people using 1 - 8 character passwords
  for us to have some fun with! 


  We managed to crack ~200,000 hashes, if you want the rest of them cracking
  DO IT YOUR ****ING SELF! >:3 (censored)

So ~200,000 of the 1.3M hashes were cracked (precisely, 188,281 hashes were cracked, producing 91,688 unique passwords). Cracking doesn't reverse a hash; it works by hashing candidate passwords and checking for matching hashes. I assume that the cracked passwords are somewhat representative of the rest of the database.
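The guess-and-compare idea, combined with the 8-character truncation described in the leaked notes, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reproduction of Gawker's scheme: real DES crypt(3) hashing isn't implemented here, so the hypothetical `legacy_hash` uses unsalted SHA-256 over the first 8 characters purely to mimic the truncation, and `dictionary_attack` is likewise a hypothetical name.

```python
import hashlib


def legacy_hash(password):
    """Hypothetical stand-in for an 8-character-limited scheme: only the
    first 8 characters affect the result. Unsalted SHA-256, NOT real DES crypt."""
    return hashlib.sha256(password[:8].encode("utf-8")).hexdigest()


def dictionary_attack(leaked_hashes, wordlist):
    """Hypothetical helper: return {hash: recovered_password} for every
    leaked hash that equals the hash of some wordlist candidate."""
    # Hash each guess once; every leaked hash equal to it is cracked at once.
    lookup = {}
    for guess in wordlist:
        lookup.setdefault(legacy_hash(guess), guess)
    return {h: lookup[h] for h in leaked_hashes if h in lookup}


if __name__ == "__main__":
    leaked = [legacy_hash(pw) for pw in ["123456", "abcdefgh1234", "Xq7#pL9!vR2"]]
    cracked = dictionary_attack(leaked, ["password", "123456", "abcdefgh"])
    for pw in cracked.values():
        print(pw)
```

"abcdefgh1234" falls to the guess "abcdefgh" because only its first 8 characters were hashed, which matches the leaked note's point that only the first 8 characters could be recovered. The long random password survives because its first 8 characters aren't in the wordlist.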

I ran some basic statistics on the password database because I was interested in seeing the distribution of password usage.  Here is a plot of the usage count (out of 189k cracked passwords total) for the top 50 passwords:


Here is the same plot with a log Y axis and with the rank of all cracked passwords shown on the X axis:



Basically the top 5 or so passwords are used by a ridiculously high proportion of users, and the top few thousand passwords are very common and therefore very easy to guess using a dictionary attack.
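One way to quantify this is the share of cracked accounts covered by the k most common passwords. A minimal sketch using the rank 1-5 counts from the table below and the 188,281 cracked hashes as the denominator (treating each cracked hash as one account, which is an assumption); `top_k_share` is a hypothetical helper name.

```python
def top_k_share(counts, k, total):
    """Hypothetical helper: fraction of all cracked accounts whose password
    is among the k most common. `counts` must be sorted in descending order."""
    return sum(counts[:k]) / total


if __name__ == "__main__":
    # Counts for ranks 1-5: 123456, password, 12345678, lifehack, qwerty.
    top5 = [3057, 1955, 1119, 661, 418]
    share = top_k_share(top5, 5, 188281)
    print(f"top 5 passwords cover {share:.2%} of cracked accounts")
    # prints: top 5 passwords cover 3.83% of cracked accounts
```

Put differently, 5 passwords out of 91,688 unique ones account for 7,210 of the cracked accounts, which is why an attacker's first handful of guesses pays off so well.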

Here are the top 50 passwords, with their rank and count out of 189k:


Rank    Count    Password
   1     3057    123456
   2     1955    password
   3     1119    12345678
   4      661    lifehack
   5      418    qwerty
   6      333    abc123
   7      311    111111
   8      300    monkey
   9      273    consumer
  10      253    12345
  11      247    letmein
  12      241    trustno1
  13      233    dragon
  14      213    baseball
  15      208    superman
  16      202    iloveyou
  17      202    1234567
  18      199    gizmodo
  19      196    sunshine
  20      194    1234
  21      187    princess
  22      184    starwars
  23      179    whatever
  24      175    shadow
  25      158    000000
  26      157    cheese
  27      156    123123
  28      149    nintendo
  29      149    football
  30      148    computer
  31      141    ****you (censored)
  32      135    654321
  33      134    blahblah
  34      132    passw0rd
  35      132    master
  36      126    soccer
  37      124    michael
  38      120    666666
  39      118    jennifer
  40      115    gawker
  41      114    Password
  42      114    jordan
  43      113    pokemon
  44      113    pepper
  45      113    michelle
  46      113    killer
  47      111    welcome
  48      111    batman
  49      109    kotaku
  50      109    internet


This gives some insight into the password-setting habits (and, if you read further down the list, the mentality) of a large proportion of the Internet-using population.

A lot of people use numerical passwords -- here are the top 50 numerical passwords, listed with their ranks in the overall list. Check out the password at rank 221:


Rank   Count    Password
   1    3057    123456
   3    1119    12345678
   7     311    111111
  10     253    12345
  17     202    1234567
  20     194    1234
  25     158    000000
  27     156    123123
  32     135    654321
  38     120    666666
  74      82    123321
 100      72    123
 101      72    121212
 137      63    159753
 163      56    88888888
 164      56    11235813
 186      53    7777777
 202      50    555555
 221      48    8675309
 236      47    98765432
 237      47    11111111
 243      46    696969
 253      45    112233
 267      43    00000000
 272      42    1111
 286      41    123654
 318      39    222222
 344      37    131313
 430      32    0000
 501      29    987654
 502      29    55555
 537      28    12341234
 608      25    102030
 643      24    147258
 645      24    101010
 684      23    888888
 685      23    159357
 741      22    789456
 742      22    11223344
 743      22    007007
 799      21    12312312
 869      20    99999999
 871      20    147852
 872      20    1212
 873      20    11111
 874      20    09876543
1105      17    0123456
1218      16    151515
1219      16    123789
1221      16    112358
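The numeric table is the overall ranking filtered down to all-digit passwords, with each password keeping its overall rank. A sketch, assuming the ranked rows are already in memory as (rank, count, password) tuples with the password as a string; `numeric_passwords` is a hypothetical helper name. The check uses a regular expression for ASCII digits rather than converting to integers, since `int("000000")` would collapse to 0.

```python
import re

# ASCII digits only; str.isdigit() would also accept characters like "²".
DIGITS = re.compile(r"[0-9]+")


def numeric_passwords(ranked, limit=50):
    """Hypothetical helper: keep only all-digit passwords from
    (rank, count, password) rows, preserving each row's overall rank."""
    out = []
    for rank, count, password in ranked:
        # Compare as text so "000000" and "007007" keep their leading zeroes.
        if DIGITS.fullmatch(password):
            out.append((rank, count, password))
            if len(out) == limit:
                break
    return out


if __name__ == "__main__":
    ranked = [(1, 3057, "123456"), (2, 1955, "password"),
              (3, 1119, "12345678"), (25, 158, "000000")]
    for row in numeric_passwords(ranked):
        print(*row, sep="\t")
```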


DOWNLOAD LINK: Curious to see the passwords of all 189,000 users? Here's the whole password list with counts for each password. It's a .tsv file (tab-separated values) that you can load into a spreadsheet or text editor. (This file doesn't contain names or usernames, just the password info. If you want usernames, you'll have to go get them yourself.)
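Spreadsheets may turn a password like 000000 into the number 0, so reading the file programmatically as text is safer. A minimal sketch with Python's `csv` module; the column order of the downloadable file isn't stated here, so `count_col` and `password_col` are assumed positions, and `load_counts` is a hypothetical helper name.

```python
import csv
import io


def load_counts(lines, count_col=0, password_col=1):
    """Hypothetical helper: read tab-separated rows into {password: count}.
    The column positions are assumptions; adjust them to match the file."""
    counts = {}
    # QUOTE_NONE: quote characters inside passwords are treated as literal text.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) <= max(count_col, password_col):
            continue  # skip blank or short lines
        if not row[count_col].isdigit():
            continue  # skip a header row, if the file has one
        # The password stays a str, so "000000" is never turned into 0.
        counts[row[password_col]] = int(row[count_col])
    return counts


if __name__ == "__main__":
    sample = io.StringIO("3057\t123456\n158\t000000\n24\t147258\n")
    counts = load_counts(sample)
    print(counts["000000"])
```

To check a single password against the list, look it up in the returned dictionary with `in`.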

UPDATES:

  1. Ranks and accidentally stripped leading zeroes fixed.
  2. Highlighting one of my replies to a comment: if you have to even ask if your password is on this list, it's probably not secure enough! 

6 comments:

  1. In the numeric table, why is the password of "0" counted three separate times? Are they actually varying amounts of zeroes? ("00" vs. "000")

  2. Sweet. I just searched and the password I use on every single site I ever use isn't on the list!

    SCORE! ;)

    (Well... I do have a password that I use for totally insecure sites - and it isn't on the list... which also explains why I could never remember my login for gawker sites)

  3. Evan: Good catch. The password list passed through Gnumeric at one point, and it stripped leading zeroes. Will fix and update the post and the downloadable list.

    Canyoneering: This only represents a small percentage of the total number of passwords -- though it's probably representative of common passwords. However if you have to even ask if your password is used by somebody else, it's probably not secure enough!

  4. Mine was unique... but decoded. DOH!

    I went through and changed all my passwords EXCEPT the gawker ones yesterday. If someone really wants to post random comments or what have you ... well, go to town, I guess?

  5. Any justification on why the passwords that were cracked are representative of the ones that weren't? That is...quite an assumption.

  6. eve11: Good question. The justification is simply that a good hashing algorithm distributes hashed data relatively evenly through the hash key space, so there is no reason to expect the cracked subset of hashes to be unrepresentative of the rest of the dataset. I have to imagine that they cracked the hashes in order of prevalence, which doesn't affect the results or conclusions. This attack worked on such a large scale because the hashes were not salted.

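The last reply attributes the scale of the attack to unsalted hashes. The general effect is easy to demonstrate: without a salt, every account with the same password stores the same hash, so one correct guess cracks all of those accounts at once. A minimal sketch using `hashlib.pbkdf2_hmac` from Python's standard library with a random per-user salt; PBKDF2 is a swapped-in technique for illustration (not what Gawker used), the iteration count is an arbitrary illustrative value, and `unsalted`, `salted` and `verify` are hypothetical helper names.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def unsalted(password):
    """Hypothetical helper: same password -> same digest for every user."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def salted(password, salt=None):
    """Hypothetical helper: return (salt, digest). A fresh random salt makes
    equal passwords produce different stored values."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify(password, salt, digest):
    """Re-derive with the stored salt and compare in constant time."""
    return hmac.compare_digest(salted(password, salt)[1], digest)


if __name__ == "__main__":
    users = ["123456", "123456", "123456"]
    print(len({unsalted(pw) for pw in users}))   # 1 distinct stored hash
    print(len({salted(pw)[1] for pw in users}))  # almost certainly 3
```

With per-user salts, an attacker has to redo the hashing for each account instead of testing every account with a single pass over the wordlist.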