Tuesday, November 13, 2018

Comprehensive Password Security

We are doing passwords all wrong.  Requiring excessively complex passwords that are impossible to remember is absurd, especially given how much easier it is to ensure password security through other means.  Requiring numbers or special characters is far less effective than requiring a few additional characters.  Users are far more likely to write down passwords that are difficult to remember in places that are easy to find.  When users forget passwords frequently, and they have to be reset, it sets a precedent that makes it easier for someone else to fraudulently reset the password.  In general, our current guidelines for password security are a disaster.

I am going to give you a warning now: This post is going to be math heavy.  I am not going to try to explain how all of the math works, but I will try to explain what numbers mean, to make comparisons more meaningful.

To start with, you should really read this comic: https://xkcd.com/936/  According to some sources, some of the math in there is wrong, but one thing is definitely right: The second password, composed of four common English words, is more secure than the first one, which uses a bunch of numbers and special characters that make it harder to remember.

The calculations used in the above comic measure password security by calculating how much entropy the password has.  Entropy, measured in bits, is the base-2 logarithm of the number of possible passwords of that length that could be built from the character classes used.  Common character classes used in passwords include letters (these can be separated into an uppercase and a lowercase class), numbers, and "special characters", which are anything that can be typed on a keyboard that is not in the other classes.  Space tends to be ignored, but it could be grouped into special characters.  To convert the entropy value into the actual number of possible combinations, you raise 2 to the power of the entropy.  So, if the entropy of a password is 28 bits, the number of possible combinations is 2^28 = ~268 million.  On average, a brute force attack should find your password after trying only about half of the possible combinations, in this case 134 million, which a modern computer can do quickly.  We won't bother looking at actual numbers of combinations from here on out, but keep in mind that adding one bit of entropy doubles the difficulty of cracking a password.
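The conversion from bits to guesses is easy to check for yourself.  Here is a minimal Python sketch; the function names are made up for illustration, and it assumes every possible password is equally likely.

```python
def combinations(entropy_bits):
    """Number of equally likely passwords for a given entropy in bits."""
    return 2 ** entropy_bits


def average_guesses(entropy_bits):
    """Expected brute force guesses: about half of the search space."""
    return combinations(entropy_bits) / 2


print(combinations(28))                     # 268435456
print(average_guesses(28))                  # 134217728.0
print(combinations(29) / combinations(28))  # 2.0, one extra bit doubles the work
```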

Most accounts require a password at least 8 characters long, with letters, numbers, and special characters.  Assuming the password is just a randomly selected combination of all characters that appear on a typical keyboard, that is 95 possible characters, which gives an entropy of about 6.57 bits per character, or about 52.6 bits for 8 characters.  So let's use 52 as a minimum acceptable security level, since that is approximately what most web sites require.
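If you want to see where the 95 comes from, the sketch below builds the character set from Python's string module and computes the entropy of a random 8 character password.  It assumes a US keyboard layout with the space counted as a character, and it assumes every character is picked uniformly at random; the names KEYBOARD and random_password_entropy are my own.

```python
import math
import string

# Printable ASCII characters on a typical US keyboard, counting the space.
KEYBOARD = string.ascii_letters + string.digits + string.punctuation + " "


def random_password_entropy(length, alphabet_size):
    """Entropy in bits of a password whose characters are chosen uniformly."""
    return length * math.log2(alphabet_size)


print(len(KEYBOARD))  # 95
print(round(math.log2(len(KEYBOARD)), 2), "bits per character")
print(round(random_password_entropy(8, len(KEYBOARD)), 1), "bits for 8 characters")
```

The result lands just above the baseline of 52 used in this post.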

The xkcd comic suggests that a password of four common words, in all lower case, has an entropy of 44 bits.  When known words are used in a password, we cannot just calculate entropy per character.  Each word must be treated as a single unit, because a word has different entropy (typically less) than the same number of random characters.  If a cracking tool tests common words before going to brute force (most do nowadays), it will break your password faster than a pure brute force search would.  English has around 3,000 common words.  That's close to 11.5 bits of entropy per word (you can verify this by calculating 2^11.5, which is about 2,900).  Four words thus have about 46 bits of entropy (which makes xkcd's estimate right on, if he rounded down to 11 bits per word).  The average English word is about 5 characters long, so a four word password is around 20 characters long.  That might seem like a lot, but memorizing a password of four common words is a lot easier than memorizing eight random characters.  An entropy of 46 bits, however, is well under 52 bits, which is our minimum reasonable limit (a 6 bit difference means the 46 bit password is 64 times less secure).  Four common English words are not enough for good security.
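The per-word calculation works the same way as the per-character one, just with the size of the word list in place of the size of the character set.  A rough sketch, assuming the words are picked at random (the function name is made up, and 2,048 is simply the list size that gives exactly 11 bits per word):

```python
import math


def passphrase_entropy(num_words, wordlist_size):
    """Entropy in bits of num_words picked uniformly from a word list."""
    return num_words * math.log2(wordlist_size)


print(passphrase_entropy(4, 2048))            # 44.0
print(round(passphrase_entropy(4, 3000), 1))  # 46.2
print(2 ** (52 - 46))                         # 64, the gap to the baseline
```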

The Second Edition of the Oxford English Dictionary contains around 180,000 words that are still in use (as well as 47,000 obsolete words, which I will leave out for convenience, but which might actually be good to use, since a dictionary attack is unlikely to include them).  That is around 17.5 bits of entropy per word.  A password composed of four randomly selected words from the 180k words that are still in use listed in this dictionary would have a total entropy of 70.  That's 18 more bits (2^18 = 262k times more secure) than our baseline of 52.  It will probably be a little bit harder to memorize than the common words, but it should still be pretty easy, and it is still only around 20 characters.  Without capital letters, numbers, or special characters, this password is five orders of magnitude more secure than the baseline.
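Those numbers only hold if the words really are selected at random, and people are bad at that, so it is safer to let a computer pick.  Below is a minimal sketch using Python's secrets module.  SAMPLE_WORDS is a tiny hypothetical stand-in for a real dictionary file, and pick_words is a name I made up.

```python
import secrets

# Hypothetical stand-in; a real word list would have many thousands of entries.
SAMPLE_WORDS = ["anchor", "breeze", "cobalt", "dynamo",
                "ember", "fjord", "glacier", "harbor"]


def pick_words(wordlist, count=4):
    """Pick count words uniformly at random using a secure random source."""
    return [secrets.choice(wordlist) for _ in range(count)]


# Join without spaces, as in the length estimates in this post.
print("".join(pick_words(SAMPLE_WORDS)))
```

With only 8 words to choose from, this sample gives just 3 bits per word; the point is the selection method, not the list.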

We can do better though.  There are a few reasons words are so good for passwords.  They are all based on the fact that words are easier to remember than random character strings.  First, pronounceable words are easier to remember, and second, the meanings of words make them easier to remember.  These happen in different parts of the brain.  Pronunciation of a word will be stored in areas of the brain associated with decoding the meaning of language heard, as well as areas associated with speaking.  Meaning will be stored in regions associated with abstract thought.  These will form connections that make the password much easier to remember.  If we make up pronounceable words and then give them meanings though, we eliminate the possibility of a dictionary attack, falling back to brute force, where we add up the entropy of each character.  Now it depends on the length of the password again, and we don't have this issue where a long word has no more entropy than a short one.

Before we do this, we should look at the entropy for various sets of character classes.  The letter class (uppercase and lowercase) has 52 possibilities, for 5.7 bits of entropy per character.  Letters and numbers together (62 possibilities) have about 5.95 bits.  Letters, numbers, and special characters (everything that is on a modern keyboard, 95 possibilities) have about 6.57 bits.  Notice that adding numbers only increases the entropy by about 0.25 bits per character (a 19% increase in possibilities).  Adding special characters on top of that only increases entropy by about 0.62 bits per character (a 53% increase in possibilities).  For an 8 character password, adding numbers increases entropy by a total of about 2.0 bits, and adding special characters increases it by about 4.9 bits.  Adding a single letter (5.7 bits) increases overall security by more than either of these.  Adding both numbers and special characters to an 8 character password increases overall entropy by about 7.0 bits, which is better than adding a single character but far worse than adding two.  In short, adding length to a password makes a far bigger difference than requiring more character classes.  Requiring numbers and special characters makes only a very small difference.  The primary value in this is making it harder for users to make terribly insecure passwords, by discouraging the use of single common words.  The key for getting secure passwords is all about length, not character classes.  For an additional character class to add even 1 bit of entropy per character, it has to double the total number of possible characters.  Adding length, on the other hand, multiplies the number of possible passwords by the full size of the character set with every extra character.
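You can reproduce the comparison between character classes with a short script.  This is only a sketch under the same assumption as the rest of the post, namely that each character is chosen uniformly from the whole set; CLASS_SIZES and the function names are made up for illustration.

```python
import math

CLASS_SIZES = [
    ("letters", 52),
    ("letters + digits", 62),
    ("letters + digits + special", 95),
]


def bits_per_char(alphabet_size):
    """Entropy in bits contributed by one uniformly chosen character."""
    return math.log2(alphabet_size)


def percent_more_choices(old_size, new_size):
    """Percentage growth in possible characters when the set is widened."""
    return (new_size / old_size - 1) * 100


previous = None
for name, size in CLASS_SIZES:
    line = f"{name:28s} {size:3d} chars  {bits_per_char(size):.2f} bits/char"
    if previous is not None:
        line += f"  (+{percent_more_choices(previous, size):.0f}% choices)"
    print(line)
    previous = size

# Widening the set for an 8 character password versus adding letters.
gain_from_classes = 8 * (bits_per_char(95) - bits_per_char(52))
gain_from_one_letter = bits_per_char(52)
print(f"digits + special on 8 chars: +{gain_from_classes:.1f} bits")
print(f"one extra letter:            +{gain_from_one_letter:.1f} bits")
print(f"two extra letters:           +{2 * gain_from_one_letter:.1f} bits")
```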

Now, let's say you make a password of four made up words, all lowercase, without spaces, where the length comes out to 20 characters.  Because the words are made up, a dictionary search won't work, so the attacker has to use brute force.  All lower case gives 26 possibilities per character, so each character adds about 4.7 bits of entropy.  This password has a total of 94 bits of entropy.  That is extremely secure compared to the baseline of 52.  Because these words are pronounceable, they are easy to remember.  If you give them meanings, that will make them even easier to remember.  If you throw in a number, so that an attacker has to search 36 possible characters per position instead of 26, the entropy rises to about 5.2 bits per character, or roughly 103 bits in total, a gain of about 9 bits.  Throwing in a special character or two as well widens the set to 69 characters (about 6.1 bits each), giving you a total entropy of roughly 122 bits, at the cost of having a couple of characters that make it harder to memorize.  (Honestly, I am still questioning the value of adding the numbers and special characters, because the 28 bits they add at best are the same as what you get from adding about 6 more lowercase characters.)  Capitalizing the first character of each word also has a big effect: mixing upper and lower case doubles the number of possibilities per character, increasing the entropy by 1 bit per character.  That gives you a total entropy of 114 bits, without adding numbers or special characters.
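To try this on a password of your own, the sketch below guesses the character set from the classes that appear in the password and multiplies by the length.  Treat the result as an upper bound: it assumes an attacker has to search every position across the whole set, and it assumes the 95 character keyboard set used earlier (26 + 26 + 10 + 33).  The function names and the made up words are hypothetical.

```python
import math
import string


def alphabet_size(password):
    """Size of the character set implied by the classes present."""
    size = 0
    if any(c in string.ascii_lowercase for c in password):
        size += 26
    if any(c in string.ascii_uppercase for c in password):
        size += 26
    if any(c in string.digits for c in password):
        size += 10
    if any(c in string.punctuation or c == " " for c in password):
        size += 33
    return size


def brute_force_entropy(password):
    """Upper bound on entropy in bits under a pure brute force model."""
    size = alphabet_size(password)
    return len(password) * math.log2(size) if size else 0.0


words = ["vorlat", "misun", "kebra", "dofe"]  # made up, 20 letters in total
lower = "".join(words)
capitalized = "".join(w.capitalize() for w in words)

print(lower, round(brute_force_entropy(lower), 1))              # 94.0
print(capitalized, round(brute_force_entropy(capitalized), 1))  # 114.0
```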

Good password security is not about using numbers and special characters in your passwords.  It is purely about length.  If you avoid using combinations of characters that are coherent in any common language (i.e., don't use words that exist in common languages), you can still create easily memorable passwords that are long enough to be extremely secure.  Just 20 lowercase characters has an entropy of 94, which is 42 bits stronger, or 2^42 = 4.4 trillion times more secure, than the minimum security required by most sites.  The only problem you may run into is that some sites, using poor security practices themselves, limit password length to 16 characters (75.2 bits for all lowercase made up words and 91.2 bits if you include some uppercase letters).  Even with this limitation, using made up words is massively superior to short passwords that are too random to be easy to memorize.  In fact, with only lowercase letters, you only need 12 characters of made up words to have a password of equivalent strength to an 8 character password including lower case, upper case, numbers, and special characters, and with both lower and upper case, you only need 10 characters to beat it.
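The lengths needed to match the 8 character baseline can be worked out directly.  A small sketch, assuming randomly chosen characters and the 95 character keyboard set from earlier; min_length is a name I made up.

```python
import math


def min_length(target_bits, alphabet_size):
    """Smallest length whose entropy reaches target_bits for this set."""
    return math.ceil(target_bits / math.log2(alphabet_size))


# Entropy of 8 random characters drawn from the full keyboard set.
baseline = 8 * math.log2(95)

print(min_length(baseline, 26))  # 12, lowercase only
print(min_length(baseline, 52))  # 10, lowercase and uppercase
print(2 ** 42)                   # 4398046511104, about 4.4 trillion
```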

The takeaway here is that the best passwords are made of multiple made up words that are pronounceable and have meanings.  Passwords like this, with at least 16 characters, will be significantly stronger than the weakest password most sites allow, and on top of that they will even be easy to memorize, and all of this is true even if you use only lowercase English letters.
