Hash your passwords!

At my place of work we use a certain program. It’s a multi-user client/server program, with authenticated user accounts. Recently, while I was logging in, something unexpected happened. After providing my credentials and hitting enter, I realized I’d inadvertently typed one extra character at the end of my password. I expected to have to retype it, but to my surprise the program granted me access. To be sure, I logged out and tried it again, this time purposely appending an extra character, and again I was granted access. A few more experiments revealed that my password would be accepted with any number of characters added at the end, whereas any other change was properly rejected.

This peculiarity baffles me; I can’t fathom why anyone would program a password check in such a way. But leaving that aside, what are the implications of this behaviour? At a minimum, the program must know the length of my password. Without that information it couldn’t know how many characters to ignore, were I to type too many. (Well, actually that isn’t strictly true: there is a way to do it without knowing the length of my password, but it’s a silly trick.) Now, there are basically two ways in which the program could know the length of my password. The most direct and obvious way is to store my password in its database and thus be able to simply count the characters. Alternatively, it could store only the length as a number but not the password itself.
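To make the first possibility concrete, here is a minimal Python sketch of a check that would behave the way I observed. It is purely illustrative: I haven’t seen the program’s source, and `stored_password` and `check_password` are made-up names.

```python
# Hypothetical sketch only; the real program's code is unknown.
# The password is kept as plaintext, so its length is trivially available.
stored_password = "hunter2"  # made-up example value


def check_password(typed: str) -> bool:
    # Compare only as many characters as the stored password has,
    # silently ignoring anything typed beyond that.
    return typed[:len(stored_password)] == stored_password


print(check_password("hunter2"))     # exact password
print(check_password("hunter2xyz"))  # extra characters at the end
print(check_password("hunter"))      # too short
```

With a check like this, the exact password and any longer string that starts with it are accepted, while anything else is rejected.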

But how could the program possibly not be storing my password? How can it check whether the right password has been supplied, if it doesn’t know what the right password is? By using a cryptographic technique known as hashing. A hash function takes plaintext (original, readable text) of arbitrary length and converts it into a scrambled, unreadable value of fixed length, which is called the hash. The important property of this function, which differentiates it from encryption, is that the reverse operation can’t be done. Encrypted text can be decrypted by anyone holding the key, but there is no key that turns a hash back into the plaintext. This means the function is “one-way” only: you can turn any plaintext into a hash, but given a hash it’s practically impossible to reconstruct the plaintext.
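A quick way to see the fixed-length property is to hash two inputs of very different sizes. The sketch below uses SHA-256 from Python’s standard `hashlib` module. That choice is an assumption made for illustration only, and the helper name `sha256_hex` is made up.

```python
import hashlib


def sha256_hex(text: str) -> str:
    # Hash the UTF-8 bytes of the text and return the result as hex digits.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_hash = sha256_hex("a")
long_hash = sha256_hex("a much longer piece of plaintext " * 100)

# Both hashes have the same length, whatever the input size.
print(len(short_hash), len(long_hash))  # prints: 64 64

# The same input always gives the same hash; different inputs differ.
print(sha256_hex("a") == short_hash)
print(short_hash == long_hash)
```

Note that `hashlib` offers no function for going the other way, from hash back to plaintext.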

Using this scheme then, instead of storing the password as plaintext, the program could store just the hash. To authenticate a user, the program need only hash whatever the user has typed in, and compare the result against what’s in the database. Thus, without even knowing what the password actually is, it could ensure that the correct one was given. Because all hashes produced by a particular function are the same size, even the length of the original password is obscured. That’s why if the program were using hashing to avoid storing the password itself, it would have to store the length of the password separately. Unless, that is, it used the silly trick I mentioned—which I’ll not reveal here, as it should be fairly trivial to figure out (feel free to ask in the comments if you’re stuck though).
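The hash-and-compare scheme, together with the separately stored length, could be sketched roughly as follows. This is a guess at how such a program might work, not its actual code. SHA-256 and the names `hash_password`, `record` and `authenticate` are all assumptions made for the example.

```python
import hashlib
import hmac


def hash_password(password: str) -> bytes:
    # SHA-256 is used here only as a stand-in hash function.
    return hashlib.sha256(password.encode("utf-8")).digest()


# What the database would hold for one user: the hash and the length,
# but never the password itself. "hunter2" is a made-up example value.
record = {"hash": hash_password("hunter2"), "length": len("hunter2")}


def authenticate(typed: str, record: dict) -> bool:
    # Discard anything typed beyond the stored length, then hash what
    # remains and compare it against the stored hash.
    truncated = typed[:record["length"]]
    return hmac.compare_digest(hash_password(truncated), record["hash"])


print(authenticate("hunter2", record))     # exact password
print(authenticate("hunter2!!!", record))  # extra characters at the end
print(authenticate("Hunter2", record))     # any other change
```

Dropping the `length` field and the truncation step gives the ordinary scheme, in which extra characters produce a different hash and are rejected.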

So what is the program actually doing? Well, the observed behaviour is most straightforwardly explained by the storing of plaintext passwords. In fact I jumped to that conclusion when I discovered the peculiarity, and it was only after analyzing the situation in depth for this post that I realized it wasn’t quite certain. But anyway, what does it matter whether it’s storing plaintext or hashed passwords? In a word, security.

Consider what would happen if an attacker were to compromise the database and obtain read access. If the passwords are stored in plaintext, then it’s a disaster; but if only hashes are stored, then the passwords themselves are still (for the moment) secret. Even under normal circumstances there is a benefit: hashed passwords are also secret to the administrators of the system. Users commonly use the same password for many services, so they are better protected if they don’t have to risk placing undue trust in admins; and if something does go wrong, an admin is under less suspicion if they could not possibly have known a user’s password.

This is a very basic security technique, but the knowledge of it seems depressingly uncommon in the software engineering field. A major part of the blame lies with educational institutions; I was not taught this in college, having instead to learn it on my own (fortunately not the hard way). Hashing is only the beginning though, as there are still ways an attacker could defeat such a system. The second line of defense is to add salt, which I’ll cover in a future post.
