2010-12-31

Why we all need to learn Chinese

I had to write a short essay about an emerging issue in US-China relations and what I think should be done about it. I figured I'd post it here too. Short story: we all need to learn Chinese.

--

The next 10 years will see the beginning of the end of the US's "free lunch". As the standard of living in China continues to improve, and as economic reform and access to information continue to spur growth, wages and prices will rise. That will raise the cost of goods manufactured in China, and much of the increase will be passed on to the American consumer. For some industries, the higher cost of consumer goods may make continuing to import from China unsustainable. Unfortunately, the US has already lost (or never developed) the ability to manufacture certain goods and materials in quantity, and has long relied on cheap manufacturing in Chinese factories. Chinese economic growth is therefore likely to cause tensions between China and the US.

Meanwhile, China has started investing heavily in outsourcing cheap manufacturing to Africa and other developing regions, so it is likely to emerge as the next super-consumer country. With an emerging middle class, much greater purchasing power than the US, and trillions of dollars of US debt in hand, China's rise will likely drag the US into economic doldrums.

The traditional business and economic approaches to this problem will of course all be pursued (investing in emerging Chinese markets, exporting Western brands to China and/or developing multinational business conglomerates). However, I think that to truly stay relevant, the US needs to teach Chinese language and culture to every school student, the way that every Chinese school student is taught English language and culture. The US government also needs to set up an extensive network of student exchange programs with China and other Chinese-speaking countries. By exposing school children to Chinese language and culture, the next generation of business leaders, political leaders, scientists and engineers will be able to work alongside their Chinese counterparts, rather than simply competing against them while the economies of scale turn in China's favor.

2010-12-14

My quick analysis of the leaked Gawker passwords

I just got an email from Gawker Media stating that user login details for Lifehacker, Gizmodo, etc. had been compromised and that a database of 1.3M usernames and passwords was being distributed via BitTorrent. Naturally I went and found the database and downloaded it. I extracted the subset of passwords that had already been cracked, then deduplicated them and counted how many times each one was used. You can download the list at the end of this post.
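For anyone who wants to repeat the deduplicate-and-count step, here is a minimal sketch in Python. It is not the script I actually ran; the function name `count_passwords` and the sample list are made up for illustration, and it assumes the cracked passwords have already been pulled out into a plain list of strings.

```python
from collections import Counter

def count_passwords(passwords):
    """Return (password, count) pairs, most frequently used first."""
    counts = Counter(passwords)
    # Sort by descending count, then alphabetically so ties come out in a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Made-up sample standing in for the cracked-password column of the dump.
sample = ["123456", "password", "123456", "qwerty", "password", "123456"]

for password, count in count_passwords(sample):
    print(f"{count}\t{password}")
```

Keeping every password as a string (never converting to a number) matters here, because a password like 000000 would otherwise lose its leading zeroes.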

I have written about the dangers of using one password on multiple accounts before, and when I used to work at a company where I had access to a massive password database, I was shocked to discover how many people use really weak passwords -- like a first name or a number like 123456, or the word "password".

The leaked Gawker data contains the following explanatory text (along with a ton of leaked private chat logs between Gawker executives, and other juicy stuff):


  After gaining access to gawkers MySQL database we stumble upon a huge
  table containing ~1,500,000 users. After a few days of dumping we
  decided that 1.3 million was enough.


  Gawker uses a really outdated hashing algorithm known as DES (Data Encryption Standard).
  Because DES has a maximum of 8chars using a password like "abcdefgh1234" only the
  first 8 characters "abcdefgh" are encrypted and stored in the database. If your
  password is longer than 8 characters you only need to enter the first 8 characters
  to log in! 


  YA DONT SAY!! :D?


  Because of this we were only able to recover the first 8 characters of someones password!
  If the password is 8 characters long there's a good chance that it migt be longer
  than 8 characters! But still, there's 1000's of people using 1 - 8 character passwords
  for us to have some fun with! 


  We managed to crack ~200,000 hashes, if you want the rest of them cracking
  DO IT YOUR ****ING SELF! >:3 (censored)

So ~200,000 of the 1.3M hashes were cracked (the exact figures: 188,281 hashes cracked, producing 91,688 unique passwords). Strictly speaking a hash can't be "de-hashed"; cracking means guessing candidate passwords until one produces the same hash. I assume that the 189k passwords that were cracked are somewhat representative of the rest of the database.
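The 8-character limit described in the quoted text is easy to illustrate. The sketch below assumes Gawker used the traditional DES-based Unix crypt(3) scheme, which builds its key from the low 7 bits of each of the first 8 characters and ignores the rest. It does not compute a real hash and it leaves out the salt; the function names are my own, purely for illustration.

```python
def des_crypt_key_material(password: str) -> bytes:
    """Return the only part of a password that traditional DES crypt looks at.

    Assumption: classic crypt(3) behaviour, i.e. the first 8 bytes,
    low 7 bits of each, zero-padded if the password is shorter.
    """
    raw = password.encode("utf-8")[:8].ljust(8, b"\0")
    return bytes(b & 0x7F for b in raw)

def same_login(a: str, b: str) -> bool:
    """True if two passwords would be indistinguishable to such a scheme."""
    return des_crypt_key_material(a) == des_crypt_key_material(b)

print(same_login("abcdefgh1234", "abcdefgh"))  # characters past the 8th are ignored
print(same_login("abcdefg", "abcdefgh"))       # differences within the first 8 still count
```

This is also why the crackers could only ever recover the first 8 characters: nothing beyond that was stored in any form.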

I ran some basic statistics on the password database because I was interested in seeing the distribution of password usage.  Here is a plot of the usage count (out of 189k cracked passwords total) for the top 50 passwords:


Here is the same plot with a log Y axis and with the rank of all cracked passwords shown on the X axis:



Basically the top 5 or so passwords are used by a ridiculously high proportion of users, and the top few thousand passwords are very common and therefore very easy to guess using a dictionary attack.
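To put a rough number on that, here is a small sketch that computes what share of the cracked accounts an attacker would cover by trying only the most common passwords. The counts are the top five from this post and the total is the 188,281 cracked hashes mentioned above; the names `TOP_COUNTS`, `TOTAL_CRACKED` and `coverage` are just for illustration.

```python
from itertools import accumulate

TOTAL_CRACKED = 188281                      # cracked hashes, from the figures in this post
TOP_COUNTS = [3057, 1955, 1119, 661, 418]   # counts for the five most common passwords

def coverage(counts, total):
    """Cumulative fraction of accounts covered after trying the top 1, 2, ... passwords."""
    return [running / total for running in accumulate(counts)]

for n, share in enumerate(coverage(TOP_COUNTS, TOTAL_CRACKED), start=1):
    print(f"top {n}: {share:.2%}")
```

Five guesses cover 7,210 of the cracked accounts, or about 3.8%. Feeding in the counts for the top few thousand passwords would show how quickly a dictionary attack adds up.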

Here are the top 50 passwords, with their rank and count out of 189k:


Rank    Count    Password
   1     3057    123456
   2     1955    password
   3     1119    12345678
   4      661    lifehack
   5      418    qwerty
   6      333    abc123
   7      311    111111
   8      300    monkey
   9      273    consumer
  10      253    12345
  11      247    letmein
  12      241    trustno1
  13      233    dragon
  14      213    baseball
  15      208    superman
  16      202    iloveyou
  17      202    1234567
  18      199    gizmodo
  19      196    sunshine
  20      194    1234
  21      187    princess
  22      184    starwars
  23      179    whatever
  24      175    shadow
  25      158    000000
  26      157    cheese
  27      156    123123
  28      149    nintendo
  29      149    football
  30      148    computer
  31      141    ****you (censored)
  32      135    654321
  33      134    blahblah
  34      132    passw0rd
  35      132    master
  36      126    soccer
  37      124    michael
  38      120    666666
  39      118    jennifer
  40      115    gawker
  41      114    Password
  42      114    jordan
  43      113    pokemon
  44      113    pepper
  45      113    michelle
  46      113    killer
  47      111    welcome
  48      111    batman
  49      109    kotaku
  50      109    internet


This gives an insight into the password-setting habits (and, if you read through more of the list, the mentality) of a large proportion of the Internet-using population.

A lot of people use numerical passwords -- here are the top 50 numerical passwords. Check out the password at rank 221:


Rank   Count    Password
   1    3057    123456
   3    1119    12345678
   7     311    111111
  10     253    12345
  17     202    1234567
  20     194    1234
  25     158    000000
  27     156    123123
  32     135    654321
  38     120    666666
  74      82    123321
 100      72    123
 101      72    121212
 137      63    159753
 163      56    88888888
 164      56    11235813
 186      53    7777777
 202      50    555555
 221      48    8675309
 236      47    98765432
 237      47    11111111
 243      46    696969
 253      45    112233
 267      43    00000000
 272      42    1111
 286      41    123654
 318      39    222222
 344      37    131313
 430      32    0000
 501      29    987654
 502      29    55555
 537      28    12341234
 608      25    102030
 643      24    147258
 645      24    101010
 684      23    888888
 685      23    159357
 741      22    789456
 742      22    11223344
 743      22    007007
 799      21    12312312
 869      20    99999999
 871      20    147852
 872      20    1212
 873      20    11111
 874      20    09876543
1105      17    0123456
1218      16    151515
1219      16    123789
1221      16    112358
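A numeric-only list like this can be pulled out of the overall ranking with a simple filter. The sketch below is not the script I used; it assumes rows of (rank, count, password) with the password kept as a string, and the function name `numeric_only` is made up. Treating passwords as numbers at any point would turn 000000 into 0, which is one way leading zeroes can get stripped.

```python
def numeric_only(ranked_rows):
    """Keep (rank, count, password) rows whose password consists only of ASCII digits."""
    return [
        row for row in ranked_rows
        # isascii() rules out non-ASCII digit characters that isdigit() would accept.
        if row[2].isascii() and row[2].isdigit()
    ]

# A few rows copied from the overall ranking in this post.
rows = [
    (1, 3057, "123456"),
    (2, 1955, "password"),
    (3, 1119, "12345678"),
    (6, 333, "abc123"),
    (25, 158, "000000"),
]

for rank, count, password in numeric_only(rows):
    print(f"{rank:4d}  {count:6d}    {password}")
```

The overall rank is carried along unchanged, which is why the Rank column in the numeric list has gaps.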


DOWNLOAD LINK: Curious to see the passwords of all 189,000 users? Here's the whole password list with counts for each password. It's a .tsv (tab-separated values) file that you can load into a spreadsheet or text editor. (This file doesn't contain names or usernames, just the password info. If you want usernames you'll have to go get them yourself.)
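If you would rather load the file with a script than a spreadsheet, something like the following should work. Treat it as a sketch: it assumes each line is a count, a tab, then the password, with no header row, so check the actual column order in the file first. The function name `load_counts` and the in-memory sample are hypothetical.

```python
import csv
import io

def load_counts(handle):
    """Read 'count<TAB>password' lines into a list of (password, count) tuples."""
    # QUOTE_NONE stops the csv module from treating quote characters in passwords specially.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(row[1], int(row[0])) for row in reader if len(row) == 2]

# In-memory stand-in for the real file; with a real file, pass open(path, newline="").
sample_tsv = io.StringIO('3057\t123456\n158\t000000\n2\tsay"hi\n')

for password, count in load_counts(sample_tsv):
    print(count, password)
```

Only the count is converted to an integer. The password column stays a string, so leading zeroes survive.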

UPDATES:

  1. Ranks and accidentally-stripped leading zeroes fixed.
  2. Highlighting one of my replies to a comment: if you have to even ask if your password is on this list, it's probably not secure enough!