Keep your data secure from prying eyes: An encryption primer
Do you have sensitive data on your Web site? Do you send confidential e-mail you want to keep private? We tell you about the types of encryption and how they work to your advantage
Encryption can be used to maintain confidentiality and protect data from illegal snooping, but it can also be used to prove the authenticity of a message's originator. We'll show you how and present some tips for selecting encryption systems. We'll also show you why digital signatures are becoming so important for electronic transactions. (3,200 words, including a glossary of common key algorithms)
Are you setting up a system for electronic commerce? Do you have sensitive data that must be kept confidential? Perhaps you merely want to ensure that the data isn't altered by unknown, unauthorized persons or applications. Or perhaps you're regularly exchanging information with others over the Internet, such as by e-mail or the Web, and you want to protect that information against illegal changes, snooping, or misrepresentation. If so, then you need to know about encryption.
Communications and transaction security starts with authentication and encryption. Encryption, or encoding data into an unreadable form to ensure privacy, is probably the first use for cryptography you would think of to secure your server. But new uses for authentication of individuals or computers, such as in Web-based transactions, have increased the utility of cryptography. Digital signatures, which can be generated quickly and bind a document or message to the owner of a particular key, are also proving useful for authenticating messages.
For encryption to work properly, both the sender and receiver have to know what rule, or cipher, was used to transform the original information into its coded form, often called cipher text. A simple cipher might be to shift all characters in a message by a fixed number of positions in the alphabet, say 13. As long as I know that's what you did to your message, I can shift each character of the message I received from you back by 13 positions to extract the original text. (This kind of cipher is called the Caesar cipher after Julius Caesar, who was noted for using it to communicate with his field commanders.)
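The shift cipher described above can be sketched in a few lines of Python. This is only an illustration: the function name `shift_cipher` and the sample message are made up for this example, and the sketch handles only the 26 letters A to Z.

```python
import string

ALPHABET = string.ascii_uppercase  # the 26 letters A..Z

def shift_cipher(text: str, shift: int) -> str:
    """Shift each letter by `shift` positions, wrapping around the alphabet."""
    result = []
    for ch in text.upper():
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)  # leave spaces and punctuation untouched
    return "".join(result)

cipher_text = shift_cipher("ATTACK AT DAWN", 13)   # sender shifts forward by 13
plain_text = shift_cipher(cipher_text, -13)        # receiver shifts back by 13
print(cipher_text)
print(plain_text)
```

Anyone who knows the rule and the shift value can undo it, which is why both sides must agree on them in advance.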
Encryption is based on two components: an algorithm and a key. A cryptographic algorithm is a mathematical function that takes intelligible information (plain text) as input and changes it into unintelligible cipher text. In order to encrypt the plain text, most algorithms use a key as input in conjunction with an encryption formula. Both the key and the function used are crucial to the encryption -- the same key used in two different encryption functions will produce two different results, and two keys used with the same function also produce two different results. The number of possible keys each algorithm can support depends on the number of bits in the key.
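As a rough sketch of the algorithm-plus-key idea, the toy cipher below XORs each byte of the message with a repeating key. It is not a secure cipher and is not taken from any product; `xor_cipher` and the sample keys are assumptions made for illustration. It shows that the same function with two different keys gives two different results, and how the number of possible keys grows with the number of bits.

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy cipher: XOR each byte with the repeating key (illustration only)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

message = b"plain text"
cipher_a = xor_cipher(message, b"\x2a")   # same function, first key
cipher_b = xor_cipher(message, b"\x7f")   # same function, second key

print(cipher_a != cipher_b)                     # different keys, different cipher text
print(xor_cipher(cipher_a, b"\x2a") == message)  # the right key recovers the plain text

# The key space doubles with every added bit.
for bits in (8, 40, 56, 128):
    print(f"{bits}-bit key: {2 ** bits:,} possible keys")
```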
The difficulty of cracking an encrypted message is a function of the key length. For example, an 8-bit key allows for only 256 possible keys (2^8). A computer that simply tries each possible key in turn, decrypting the message to see whether the result makes sense, would find the correct key quickly. Now try the same thing with a computer guessing 1 million keys every second for a 100-bit key (which equates to searching 2^100 keys). Centuries would not be nearly enough: at that rate, trying every key would take on the order of 10^16 years.
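The arithmetic behind these figures is easy to check. The sketch below first breaks a toy 8-bit key by trying all 256 values, then estimates the worst-case search time for a 100-bit key at 1 million guesses per second. The single-byte XOR cipher and the names `xor_byte`, `crack_8bit`, and `worst_case_years` are assumptions for illustration, and "makes sense" is approximated by looking for a known word.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def xor_byte(data: bytes, key: int) -> bytes:
    """Toy cipher with an 8-bit key: XOR every byte with the key."""
    return bytes(b ^ key for b in data)

def crack_8bit(cipher_text: bytes, known_word: bytes) -> int:
    """Try all 2^8 keys and return the first one that yields sensible text."""
    for key in range(2 ** 8):
        if known_word in xor_byte(cipher_text, key):
            return key
    raise ValueError("no key produced sensible text")

def worst_case_years(key_bits: int, keys_per_second: float) -> float:
    """Time needed to try every key of the given length."""
    return 2 ** key_bits / keys_per_second / SECONDS_PER_YEAR

secret = xor_byte(b"meet me at noon", 0xA7)
print(crack_8bit(secret, b"meet"))                # recovers the key, 167 (0xA7)
print(f"{worst_case_years(100, 1e6):.1e} years")  # roughly 4.0e+16 years
```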
An encryption algorithm is considered secure if its security depends only on the length of its key. Why? If security depended on the secrecy of the algorithm, the inaccessibility of the cipher text or plain text, or anything else, those items could be derived from publications, deduced by pattern analysis of messages, or collected in other ways, such as traffic monitoring. They then could be used to decrypt your communications. However, knowing that a key is n bits long only gives an attacker an idea of how much time he'd have to spend to break the code.
The key to encryption: Knowing which is which
The oldest form of key-based cryptography is called secret-key or symmetric encryption. In this scheme, both the sender and recipient possess the same key, which means that both parties can encrypt and decrypt data with the key. This presents some drawbacks: A shared secret key must be agreed upon by both parties. If you have n correspondents then you have to keep track of n secret keys, one for each of your correspondents. If you use the same key for more than one correspondent, then each correspondent will be able to read the other's mail.
Another problem of symmetric encryption schemes is that the authenticity of a message's originator or recipient cannot be proved. Since both possess the same key, either of them can create and encrypt a message and claim that the other person sent it. This built-in ambiguity about who authored a message means that neither party can prove who actually sent it, so a sender can always deny (repudiate) having done so. Public-key cryptography, which makes use of asymmetric encryption algorithms, resolves this non-repudiation problem.
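The ambiguity can be seen in a short sketch. It uses a keyed hash (HMAC, from Python's standard `hmac` module) as a stand-in for any secret-key operation; that choice, the shared key, and the sample message are assumptions for illustration, not something the schemes above require.

```python
import hashlib
import hmac

shared_key = b"key known to both Tim and Ann"  # hypothetical shared secret

def tag(message: bytes, key: bytes) -> bytes:
    """Compute a keyed tag that only holders of the key can produce."""
    return hmac.new(key, message, hashlib.sha256).digest()

message = b"I owe you $100"
tag_made_by_tim = tag(message, shared_key)
tag_made_by_ann = tag(message, shared_key)

# The two tags are identical, so the tag alone cannot show which of them wrote it.
print(hmac.compare_digest(tag_made_by_tim, tag_made_by_ann))  # True
```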
Public-key cryptography is based on the concept of a key pair. Each half of the pair (one key) can encrypt information so that only the other half (the other key) can decrypt it. One part of the key pair, the private key, is known only by the designated owner; the other part, the public key, is published widely but is still associated with the owner.
Key pairs have a unique feature -- data encrypted with one key can be decrypted with the other key in the pair. In other words, it makes no difference whether you use the private key or the public key to encrypt a message; whoever holds the other key can decrypt it.
These keys can be used in different ways to provide message confidentiality and to prove the authenticity of a message's originator. In the first case, you'd use the recipient's public key to encrypt a message; in the other, you'd use your private key to encrypt a message. For example, in order to create a confidential message, Tim would first acquire Ann's public key. Then he uses her public key to encrypt the message and sends her the encrypted message. Since the message was encrypted with Ann's public key, only someone with Ann's private key (and we presume only Ann has that) can decrypt the message.
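Both uses of a key pair can be illustrated with textbook RSA and deliberately tiny primes. This is a teaching sketch only: the numbers are far too small to be secure, real systems add padding, and the names `apply_key`, `confidential`, and `signed` are made up for this example. It needs Python 3.8 or later for the modular inverse.

```python
# Ann's toy key pair, built from two tiny primes (illustration only).
p, q = 61, 53
n = p * q                  # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent: (e, n) is the public key
d = pow(e, -1, phi)        # private exponent: (d, n) is the private key

def apply_key(value: int, exponent: int) -> int:
    """Encrypt or decrypt a small number with one half of the key pair."""
    return pow(value, exponent, n)

message = 65  # a message encoded as a number smaller than n

# Confidentiality: Tim encrypts with Ann's public key, Ann decrypts privately.
confidential = apply_key(message, e)
print(apply_key(confidential, d) == message)

# Authenticity: the owner encrypts with the private key, anyone checks with the public key.
signed = apply_key(message, d)
print(apply_key(signed, e) == message)
```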
Although encrypting a message with one part of a public key pair isn't very different from using secret-key encryption, public-key systems offer some advantages. For instance, the public key of your key pair can be readily distributed (on a server or via e-mail, for example) without fear that this compromises your use of your private key. You don't have to send a copy of your public key to all your correspondents; they can get it from a key server maintained by your company or from what are called certificate authorities (we'll get to them in a moment).
Another advantage of public-key cryptography is that it allows you to authenticate a message's originator. The basic idea is this: You are the only person who can encrypt something with your private key. If someone can use your public key to decrypt the message, then the message must have come from you. Thus, your use of your private key on an electronic document is similar to your signing a paper document.
But using public-key cryptographic algorithms to encrypt messages is computationally slow, so cryptographers have come up with a way to generate a short, unique representation of your message called a message digest that can be encrypted and then used as your digital signature.
One-way hash functions are popular, fast cryptographic algorithms for generating message digests. A one-way hash function doesn't use a key; it's simply a formula that converts a message of any length into a single fixed-length string of digits called a message digest. For example, if I were using a 16-byte hash function, any text I process with that hash function would produce 16 bytes of output, such as CBBV235ndsAG3D67. The important thing to remember is that different messages should produce different, random-looking message digests, and that it should be impractical to work backward from a digest to a message that produces it. Now encrypt that message digest with your private key and you've got a digital signature.
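A quick way to see the fixed-length property is to hash inputs of very different sizes. The sketch below uses MD5 from Python's standard `hashlib` module because it is a 16-byte hash function like the one in the example; the sample messages are invented for illustration.

```python
import hashlib

short_digest = hashlib.md5(b"Hi").digest()
long_digest = hashlib.md5(b"A much longer message. " * 1000).digest()

# Both digests are 16 bytes, whatever the length of the input.
print(len(short_digest), len(long_digest))  # 16 16

# A one-character change to the message produces a different digest.
print(hashlib.md5(b"Pay Ann $10").hexdigest())
print(hashlib.md5(b"Pay Ann $11").hexdigest())
```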
As an example, let's have the sender, Tim, calculate a message digest for his message, encrypt it with his private key, and send that digital signature along with the plain-text message to Ann. After Ann uses Tim's public key to decrypt the digital signature, she has a copy of the message digest that Tim calculated. Since only Tim's private key could have produced a signature that decrypts correctly with Tim's public key, this step authenticates the originator. Ann then uses the same hash function (which was agreed upon beforehand) to calculate her own message digest of Tim's plain-text message. If her calculated value and the one Tim sent her are the same, then she can be assured that the message came from Tim and has not been altered since he signed it.
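The exchange between Tim and Ann can be sketched as follows, again with textbook RSA and tiny primes. Everything here is a simplification for illustration: the key is far too small to be secure, the digest is reduced to fit the toy key (real systems pad it instead), and the function names are invented. Python 3.8 or later is assumed.

```python
import hashlib

# Tim's toy key pair (tiny primes, illustration only).
p, q, e = 61, 53, 17
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))  # Tim's private exponent

def digest_as_int(message: bytes) -> int:
    """Hash the message, then shrink the digest so it fits the toy key."""
    return int.from_bytes(hashlib.md5(message).digest(), "big") % n

def sign(message: bytes) -> int:
    """Tim: encrypt the message digest with the private key."""
    return pow(digest_as_int(message), d, n)

def verify(message: bytes, signature: int) -> bool:
    """Ann: decrypt the signature with the public key and compare digests."""
    return pow(signature, e, n) == digest_as_int(message)

message = b"Pay Ann $10"
signature = sign(message)
print(verify(message, signature))            # the genuine message checks out
print(verify(message, (signature + 1) % n))  # a forged signature does not
```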
The one problem with this approach is that a copy of the plain text is sent as part of the message and therefore Tim's message is not protected from snooping. Although it further complicates matters, a standard approach is to use a symmetric algorithm with a secret key to encrypt the plain text of the message. The computational intensity of public-key encryption makes it unsuitable for encrypting the entire message.
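A minimal sketch of that combined approach: a random secret key encrypts the message body quickly, and only that short key is encrypted with the recipient's public key. The stream cipher built from SHA-256, the byte-by-byte key wrapping, and the names `seal` and `open_sealed` are all assumptions chosen to keep the example self-contained; they are not how any particular product does it.

```python
import hashlib
import secrets

# Ann's toy RSA key pair (tiny primes, illustration only; Python 3.8+).
P, Q, E = 61, 53, 17
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))

def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR the data with a key-derived byte stream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def seal(plain_text: bytes):
    """Sender: encrypt the body with a fresh secret key, wrap that key with RSA."""
    session_key = secrets.token_bytes(16)
    body = keystream_xor(plain_text, session_key)
    wrapped_key = [pow(b, E, N) for b in session_key]  # one byte at a time (toy)
    return wrapped_key, body

def open_sealed(wrapped_key, body: bytes) -> bytes:
    """Recipient: unwrap the secret key with the private key, then decrypt."""
    session_key = bytes(pow(c, D, N) for c in wrapped_key)
    return keystream_xor(body, session_key)

wrapped, cipher_body = seal(b"Meet me at noon")
print(open_sealed(wrapped, cipher_body))
```

The slow public-key operation touches only the 16-byte secret key, while the fast symmetric cipher handles the message itself, however long it is.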
No one encryption system is ideal for all situations. The following table illustrates some of the advantages and disadvantages of each type of encryption.
|Advantages and Disadvantages of Cryptographic Systems|
Add to this the differences in key lengths and algorithms and it can be difficult to select the appropriate algorithm to use. The general rule of thumb is: First determine how sensitive your data is and for how long it will be sensitive and have to be protected. Once you've figured that out, select an encryption algorithm and key length that will take longer to break than the length of time for which your data will be sensitive.
One of the best discussions of key lengths and the efforts required to break a key is found in Chapter 7 of Applied Cryptography by Bruce Schneier (Second Edition, John Wiley & Sons, 1996). The following is a condensation of his table estimating the cost of building a computer in 1995 to crack symmetric keys and the time required to crack certain length keys.
|Length of Key in Bits|
|Cost||40||56||64||80||128|
|$100,000||2 secs||35 hours||1 year||70,000 years||10^19 years|
|$1 million||.2 secs||3.5 hours||37 days||7,000 years||10^18 years|
|$100 million||2 msecs||2 minutes||9 hours||70 years||10^16 years|
|$1 billion||.2 msecs||13 secs||1 hour||7 years||10^15 years|
|$100 billion||2 microsecs||.1 secs||32 secs||24 days||10^13 years|
Remember that this is not a static situation either. Computing power keeps going up as costs fall (Moore's law), so it'll get easier to break larger keys in the future. These estimates are for brute-force attacks, i.e., guessing every possible key. There are other methods for cracking keys, depending on the ciphers used (that's what keeps cryptanalysts employed), but estimates for brute-force attacks are commonly cited as a measure of the strength of an encryption method.
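Two pieces of arithmetic sit behind these remarks, sketched below. First, each additional key bit doubles the brute-force time; adjacent entries in the $1 million row of the table (3.5 hours and 37 days) differ by a factor of about 256, which corresponds to 8 extra bits. Second, if computing power per dollar doubles at a fixed interval, a key that is out of reach today becomes affordable later. The 18-month doubling period is an assumption (one common reading of Moore's law), and the function names are invented for illustration.

```python
import math

def scale_time(base_time: float, extra_bits: int) -> float:
    """Each additional key bit doubles the brute-force search time."""
    return base_time * 2 ** extra_bits

def years_until_affordable(years_today: float, target_years: float,
                           doubling_period_years: float = 1.5) -> float:
    """Years to wait until the same budget cracks the key in `target_years`.

    Assumes computing power per dollar doubles every `doubling_period_years`.
    """
    doublings = math.log2(years_today / target_years)
    return max(0.0, doublings * doubling_period_years)

# 3.5 hours with 8 more key bits, expressed in days (about 37).
print(scale_time(3.5, 8) / 24)

# A key that takes 7,000 years today: when would it take one year instead?
print(years_until_affordable(7_000, 1))
```

Under that assumed doubling rate, a 7,000-year job shrinks to a one-year job in roughly two decades, which is why the time your data must stay secret matters when choosing a key length.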
Secret- and public-key ciphers use different key lengths, so the above table cannot be used for setting all of your security requirements. Schneier has a table comparing the two systems for similar resistance to brute-force attacks.
|Secret-key Key Length||Public-key Key Length|
|56 bits||384 bits|
|64 bits||512 bits|
|80 bits||768 bits|
|112 bits||1792 bits|
|128 bits||2304 bits|
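When it comes to selecting software and/or hardware for your purposes, recall that more than one encryption system might be used in the product -- that's a common practice because of the different computational requirements for secret- and public-key algorithms. For example, PGP (Pretty Good Privacy) uses RSA, IDEA, and MD5 together: MD5 computes the message digest, RSA encrypts that digest to form the digital signature and also encrypts a one-time IDEA key for the recipient, and IDEA encrypts the message itself.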
Why use digital certificates?
One of the key pieces for securing electronic transactions is the digital certificate, a foolproof way of identifying any party. (Of course, as the axiom says, nothing is foolproof in the hands of a fool, but as of today, the technology is sound.) The digital certificate acts like an electronic version of a driver's license -- by being an accepted method for distributing your public key, bound to your name, it provides you with a way to prove your identity. Every digital certificate is unique, just as a driver's license is unique. But, instead of "showing" your digital certificate upon request, you use the private key that matches the public key it contains to generate digital signatures for your electronic documents.
Digital certificates, which are issued by certificate authorities such as VeriSign, GTE CyberTrust, and Nortel, include the holder's name, the name of the certificate authority (CA), a public key for cryptographic use, and a time limit for the use of the certificate, frequently six months to a year.
A digital certificate can come in one of four classes that indicate to what degree the holder has been verified. Class 1 is the easiest to get and includes the fewest checks on the user's background; only his or her name and e-mail address are verified. For Class 2, the issuing authority checks a driver's license, social security number, and date of birth. Users applying for a Class 3 certificate can expect the issuing authority to perform a credit check using a service such as Equifax, in addition to the information required for a Class 2 certificate. The fourth class includes information about the individual's position within an organization, but the verification requirements for Class 4 certificates have not yet been finalized.
Certificate authorities (CAs) also have the responsibility to maintain and make available a Certificate Revocation List, or CRL, that lets users know which certificates are no longer valid. The CRL doesn't include expired certificates, since each certificate has a built-in expiration date. Certificates can be revoked because they were lost or stolen, or because an employee left the company, for example. One problem here is that CRLs are issued periodically, often every day or so, and there's no guarantee that a certificate hasn't been revoked since the list was issued.
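The two checks described here, expiration and revocation, can be sketched with a simple record. The `Certificate` fields mirror the items listed earlier (holder, issuing CA, public key, time limit) plus a serial number; the field names, the serial numbers, and the `is_usable` function are hypothetical and do not reflect any real certificate format.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Certificate:
    serial: int        # hypothetical identifier assigned by the CA
    holder: str
    issuer: str        # name of the certificate authority
    public_key: int
    not_after: date    # built-in expiration date

def is_usable(cert: Certificate, revoked_serials: set, today: date) -> bool:
    """A certificate is usable if it has not expired and is not on the CRL."""
    if today > cert.not_after:
        return False  # expired certificates never appear on the CRL
    return cert.serial not in revoked_serials

cert = Certificate(1001, "Tim", "Example CA", 17, date(1997, 12, 31))
crl = {1002, 1007}  # serials revoked as of the last published list

print(is_usable(cert, crl, date(1997, 6, 1)))           # valid and not revoked
print(is_usable(cert, crl | {1001}, date(1997, 6, 1)))  # revoked
print(is_usable(cert, crl, date(1998, 1, 15)))          # expired
```

The check is only as fresh as the copy of the CRL it is given, which is the timing gap noted above.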
In addition to commercial CAs, corporations can become a certificate authority by purchasing a certificate server from a vendor that has been certified by a CA. Such arrangements are useful when a number of employees need to be issued digital certificates for doing business, either within the company or with other companies. With systems becoming available for tying computer access to digital certificates as a refinement of access control lists, corporate-maintained certificate servers will also become more important.
A recipient of correspondence that's been encrypted or signed with your private key might want to verify who you are and check the validity of the key (it might have been stolen, for example). To do that, he'd ask the issuing certificate authority to verify that they issued a digital certificate to you and that the certificate has not been revoked.
Depending on the hierarchy of certificate authorities and the recipient's paranoia, the validation request might be followed through various layers of CAs to a top-level government agency. Much of this infrastructure barely exists yet and isn't designed for a large number of transactions, such as would be expected for electronic commerce. To address this, the U.S. government is trying to set up the Public Key Infrastructure for certificate authorities to help standardize the infrastructure and procedures for distributing and verifying certificates.
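Following a validation request up through layers of CAs amounts to walking a chain of issuers until a trusted top-level authority is reached. The sketch below shows only that walk; the CA names, the `issued_by` mapping, and the function `chain_to_root` are hypothetical, and a real check would also verify each certificate's signature and revocation status along the way.

```python
def chain_to_root(name: str, issued_by: dict, trusted_roots: set) -> list:
    """Return the list of names from `name` up to a trusted root CA."""
    chain = [name]
    seen = {name}
    while chain[-1] not in trusted_roots:
        issuer = issued_by.get(chain[-1])
        if issuer is None or issuer in seen:
            raise ValueError("chain does not reach a trusted root")
        chain.append(issuer)
        seen.add(issuer)
    return chain

# Hypothetical hierarchy: who issued whose certificate.
issued_by = {
    "Tim": "Corporate CA",
    "Corporate CA": "Commercial CA",
    "Commercial CA": "Top-level CA",
}

print(chain_to_root("Tim", issued_by, {"Top-level CA"}))
```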
Big Brother is watching you
Perhaps the biggest problem in using encryption systems is governmental restrictions. The U.S. government, among others, imposes restrictions on the key lengths that can be used in products that are sold in other countries. The 40-bit restriction was relaxed somewhat (to 56 bits) in November 1996, but the U.S. is still seeking to impose key escrow requirements on encrypted products shipped overseas during the next two years.
In key escrow, a trusted third party maintains a special key for the encryption system and is supposed to surrender that key when shown "just cause" by governmental agencies, like the FBI. IBM has been promoting the Security Alliance and is developing its SecureWay framework to assist in key management and escrow. Hewlett-Packard has proposed its International Cryptography Framework (ICF) to address the issue of exportability. It's a little early to tell which, if any, key escrow systems will be widely implemented -- some developers would like to see no escrow system at all. Stay tuned.
Glossary of common key algorithms
DES (Data Encryption Standard): A block cipher created by IBM and endorsed by the U.S. government in 1977. Uses a 56-bit key and operates on blocks of 64 bits. Relatively fast and used to encrypt large amounts of data at one time.
Triple DES: Based on DES. Encrypts a block of data three times with three different keys. Being proposed as an alternative to DES, since it's been said that the potential of easily and quickly cracking DES is increasing every day.
RC2 and RC4: Designed by Ron Rivest (the R in RSA Data Security Inc.). Variable key size ciphers for very fast bulk encryption. A bit faster than DES, the two algorithms can be made more secure by selecting a longer key size. RC2 is a block cipher and can be used in place of DES. RC4 is a stream cipher and is as much as 10 times faster than DES.
IDEA (International Data Encryption Algorithm): Created in 1991, it was designed to be efficient to compute in software. Offers very strong encryption using a 128-bit key.
RSA: Named after Rivest, Shamir, and Adleman, its designers. Public-key algorithm that supports a variable key length as well as a variable block size for the text to be encrypted. The plain-text block must be smaller than the key length. Common key length is 512 bits.
Diffie-Hellman: The oldest public-key cryptosystem still in use. Does not support either encryption or digital signatures. System is designed to allow two individuals to agree on a shared key, even though they only exchange messages in public.
DSA: Digital Signature Algorithm, developed by NIST (National Institute of Standards and Technology) based on what's called the El Gamal algorithm. The signature scheme uses the same sort of keys as Diffie-Hellman and can create signatures faster than RSA. Being pushed by NIST as DSS, the Digital Signature Standard, although its acceptance is far from assured.
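To make the Diffie-Hellman entry concrete, here is a toy key agreement with very small numbers. The prime 23, the base 5, and the private values 4 and 3 are chosen only so the arithmetic can be followed by hand; real use requires numbers hundreds of digits long. The function names are invented for this sketch.

```python
# Public parameters that both parties (and any eavesdropper) know.
PRIME = 23
BASE = 5

def public_value(private: int) -> int:
    """The value each party sends in the open."""
    return pow(BASE, private, PRIME)

def shared_key(their_public: int, my_private: int) -> int:
    """The key each party computes from the other's public value."""
    return pow(their_public, my_private, PRIME)

tim_private, ann_private = 4, 3        # kept secret by each party
tim_public = public_value(tim_private)  # 5^4 mod 23 = 4
ann_public = public_value(ann_private)  # 5^3 mod 23 = 10

# Only the public values cross the network, yet both sides reach the same key.
print(shared_key(ann_public, tim_private))  # 18
print(shared_key(tim_public, ann_private))  # 18
```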
About the author
Dave Kosiur, Ph.D., (firstname.lastname@example.org) is an independent networking consultant and freelance writer. He has published two books on networking, including The Macworld Networking Bible (IDG Books), which won a Computer Press Association award in 1995. His latest book, on business-to-business electronic commerce, will be published by Microsoft Press this April. He's now concentrating on electronic commerce, e-mail, and security issues, as well as the World Wide Web (isn't everyone?). Reach Dave at email@example.com.