Why do we need separate cryptographic hashing and encryption methods? Isn't one enough? And why are hashes sometimes salted? The simple answer to the second question is “no.” Hashing and encryption cannot replace each other because the properties that make each good for its own tasks make it bad for the other’s tasks. By looking at what each is used for, and at how salting compensates for a necessary weakness, we can better understand how to use them correctly to make a company more secure against technical attacks.
Encryption: Your friendly neighborhood Fcvqrezna
Encryption is the process of taking plain text that anyone can read and turning it into text that is apparently random. Crucially, though, encryption is designed so that it can be reversed (ie decrypted) with the right knowledge (either the private key or the shared password). The value/process used to encrypt and decrypt could be exactly the same (these are called symmetric ciphers, eg ROT-13, which replaces each letter of the alphabet with the letter 13 places after it, wrapping around so that A comes after Z) or the value/process could be completely different for encryption and decryption (eg public-key cryptography). In both cases, the encryption and decryption processes must be deterministic (ie the same inputs, including the key, produce the same outputs), or else it wouldn't be possible to reverse the encryption. The goal of encrypting something is to make sure that only authorized people (ie people who have the decryption key) can see the plain text. In the case of the web, that means your browser and the website to which you're connecting.
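As a small illustration of a symmetric, deterministic cipher, here is a sketch of ROT-13 in Python. The function name rot13 is just a label chosen for this example, and ROT-13 itself offers no real security; it only shows that the same process both encrypts and decrypts.

```python
import string


def rot13(text: str) -> str:
    """Shift each ASCII letter 13 places, wrapping around from Z back to A."""
    lower = string.ascii_lowercase
    upper = string.ascii_uppercase
    # Map every letter to the letter 13 places later; other characters pass through.
    table = str.maketrans(
        lower + upper,
        lower[13:] + lower[:13] + upper[13:] + upper[:13],
    )
    return text.translate(table)


secret = rot13("Hello, World")
print(secret)         # Uryyb, Jbeyq
print(rot13(secret))  # Hello, World
```

Because the alphabet has 26 letters, applying the 13-place shift twice returns the original text, which is why one function serves as both the encryption and the decryption step.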
Hashing: Not your father’s encryption method: 01C803C138CAD5A4AD75307260B2328E8
A cryptographic hash function is a process that takes plain text and turns it into an apparently random chain of numbers and letters, called a digest. Unlike encryption, cryptographic hashes are designed to be as hard to reverse as possible, but like encryption, they need to be deterministic. A good cryptographic hash has three important theoretical properties and one important practical one:
- Pre-image resistance - Pre-image resistance just means that it is difficult to find a plaintext that hashes to a given hash value (yes, it's ok for a hash function to hash two different values to the same hash value. Hold that thought, though). Essentially, pre-image resistance makes finding an answer to the following question hard: "Hey, I have a hash value X. What string hashes to it?"
- Second pre-image resistance - Second pre-image resistance means that if you have a given plaintext, it is hard to find another plaintext string that has the same hash value as the given plaintext. This is part of why it's ok for two messages to hash to the same value. Essentially, second pre-image resistance makes finding an answer to the following question hard: "Hey, I have plaintext Y. What's another plaintext that has the same hash value as Y?"
- Collision resistance - Collision resistance means that it is difficult to find any two different plaintexts that have the same hash value. Having this property means that the hash function also has second pre-image resistance (nb: having second pre-image resistance does NOT mean that a hash function has collision resistance). Collision resistance makes finding an answer to the following question hard: "Can I find any two different plaintexts that have the same hash value?"
- Computation time - A good hash function needs to be computationally simple enough that computing the hash value of something once is cheap, but computationally complex enough that attempting a brute-force reversal of the hash function is not economically justified. This is very much a “Goldilocks” problem, which is complicated by the fact that “too simple” and “too complex” are constantly shifting due to advances in hardware.
These properties make it infeasible for anyone to modify or replace the plaintext without changing its hash value as well. This is cryptographically useful, because it means we can be really confident that two strings with identical hashes are actually identical without needing to compare the plaintexts directly, and really confident that a message whose hash still matches hasn’t been altered.
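To make the determinism and tamper-detection points concrete, here is a minimal sketch using SHA-256 from Python's standard hashlib module. SHA-256 is chosen only as a familiar example, and the helper name digest and the sample messages are made up for this illustration.

```python
import hashlib


def digest(message: str) -> str:
    """Return the SHA-256 digest of a message as a hex string."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


original = "Pay Alice $100"
tampered = "Pay Alice $900"

# Deterministic: hashing the same text twice gives the same digest.
print(digest(original) == digest(original))  # True

# Changing a single character produces a different digest.
print(digest(original) == digest(tampered))  # False

# The digest has a fixed length regardless of the input size.
print(len(digest(original)))  # 64
```

Comparing digests like this is how a recipient can check that a message matches the one the sender hashed, without the two sides comparing the full plaintexts.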
Salting: More than just a seasoning + 5EF4BE199297
The biggest weakness of a good hash or encryption function comes from the fact that it's deterministic. Attackers can exploit this determinism to make inferences about the plaintext having only seen multiple ciphertexts. Because the output of a hash or encryption function is always the same for the same input (the password/key is part of the input of an encryption function), when it comes to password hashes, it's possible to precompute a table of text values and their hashes, or in certain cases to determine that two accounts share the same password. When it comes to encrypting data, the same plaintext encrypted with the same key will have the same ciphertext.
In this penguin example, the problem is especially egregious and obvious because the image was encrypted using a block cipher in a mode that encrypts each block separately from all the other blocks (this is known as ECB mode). The image was broken up into blocks, and because the ciphertext for two identical plaintext blocks is the same, we can see repeated values within the encrypted image.*
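The password-hash side of the same determinism problem can be sketched in a few lines. Everything here is hypothetical: the user names, the passwords, and the tiny list of "common passwords" are invented for the example, and SHA-256 without a salt is used precisely because it shows the weakness.

```python
import hashlib


def unsalted_hash(password: str) -> str:
    """Hash a password with no salt (the weak approach being illustrated)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical stored credentials: user name -> unsalted password hash.
stored = {
    "alice": unsalted_hash("hunter2"),
    "bob": unsalted_hash("correct horse"),
    "carol": unsalted_hash("hunter2"),
}

# The attacker precomputes hashes of likely passwords once, ahead of time.
common_passwords = ["123456", "password", "hunter2", "letmein"]
precomputed = {unsalted_hash(p): p for p in common_passwords}

# Cracking is then just a dictionary lookup per stolen hash.
cracked = {user: precomputed[h] for user, h in stored.items() if h in precomputed}
print(cracked)  # {'alice': 'hunter2', 'carol': 'hunter2'}

# Identical passwords are visible even without cracking anything.
print(stored["alice"] == stored["carol"])  # True
```

The precomputed table costs the attacker nothing extra per site or per account, which is the reuse that the rest of this section sets out to prevent.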
This is where a salt comes in for hashes and where IVs and chaining come in for ciphers (that will have to be addressed in another blog post). A salt is just some known bit of plain text that is added before or after the plain text that needs to be hashed. This renders precomputed hash tables ('rainbow tables' are a version of this) useless: thanks to the salt, an attacker has to compute a table specifically for your stored hashes. But if you use one salt for everything (this is called a static salt), then the attacker can reuse their work across different accounts on your site, and they can still tell if two accounts share a password. The solution to this is to use a dynamic salt, ie a salt that is different for every single user. One of the simplest examples of this would be hashing <random number><password> instead of <password>. Since each salt is unique, the attacker must calculate a hash table for each account that they want to crack. When a good hash function is used, calculating this is infeasible because the computational cost is too large.
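Here is a minimal sketch of per-user (dynamic) salting. Instead of the literal <random number><password> concatenation, it uses Python's hashlib.pbkdf2_hmac, which takes the salt as a separate argument and repeats the hash many times to raise the cost of guessing. The function names, the 16-byte salt length, and the iteration count are assumptions made for this example, not recommendations from this post.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative work factor, chosen only for this sketch


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)  # unique per user, stored next to the digest
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected)


alice_salt, alice_hash = hash_password("hunter2")
carol_salt, carol_hash = hash_password("hunter2")

# Same password, different salts: the stored digests no longer match.
print(alice_hash == carol_hash)

print(verify_password("hunter2", alice_salt, alice_hash))  # True
print(verify_password("wrong", alice_salt, alice_hash))    # False
```

The salt is not a secret and is stored alongside the digest. Its job is only to force the attacker to redo the guessing work separately for every account.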
Ultimately, all three of these are necessary for your company’s cryptographic security. Each prevents different attacks, and when used in conjunction, they make your digital environment more secure. While most companies don’t need to hire their own cryptographer, understanding where cryptographic technologies are strong and weak will help you evaluate security products for your company.
* With a stream cipher, we can't see the repeats within a single plaintext and ciphertext encryption stream, but the same issue applies across multiple streams. You should never ever reuse the same keystream for a stream cipher (ie the same key with the same starting state), even for different plaintexts, because attackers can use this and other properties to recover both plaintexts.
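The keystream-reuse problem in this footnote can be sketched with plain XOR. This is a toy model, not a real stream cipher: random bytes from os.urandom stand in for the keystream a cipher would generate, and the two messages are invented for the example.

```python
import os


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings together."""
    return bytes(x ^ y for x, y in zip(a, b))


# Stand-in for the keystream a stream cipher would produce from one key.
keystream = os.urandom(16)

p1 = b"attack at dawn!!"
p2 = b"retreat at noon!"

# The same keystream is (wrongly) reused for both messages.
c1 = xor_bytes(p1, keystream)
c2 = xor_bytes(p2, keystream)

# The keystream cancels out: the attacker learns p1 XOR p2 from ciphertexts alone.
leak = xor_bytes(c1, c2)
print(leak == xor_bytes(p1, p2))  # True

# If one plaintext is known or guessed, the other falls out immediately.
print(xor_bytes(leak, p1))  # b'retreat at noon!'
```

Even when neither plaintext is known, the XOR of two natural-language messages has enough structure that attackers can often work backwards to both.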