Data Encryption in the Cloud

(This is the streamlined transcript of a community talk I prepared for some school students.)


Hello friends,

Thank you for the opportunity to talk today about a fascinating technical topic – data encryption.

Data, in common terms, is information: information about something or someone. In today’s topic, data means the information processed and stored in computers, on the internet, and in telecommunication systems. Computer data, as per Oxford Languages, is the quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.

Not surprisingly, with more and more things moving online and into the cloud, the volume of computer data has grown enormously. Think about all the social media content uploaded, all the websites, online maps, online banking, video conferences, emails, the assignments you wrote that are stored in Google Drive, and a lot more. Data encryption is the practice of making such data unrecognisable to unintended parties, in order to protect the confidentiality and integrity of the data.

Imagine that, in ancient times, someone wanted to send a secret to a recipient. The sheepskin parchment the secret was written on could be read by anyone along the way. To protect the secret, the sender decided to scramble every letter in every word: take the position of the letter in the alphabet, go forward four places, multiply the position number by three (the alphabet wraps around whenever you go past z), then go backward seven places, and use the letter at that position. This way, the letter a becomes h, p becomes a, l becomes o, and e becomes t. So if the secret were apple, the sender would write it as haaot. Anyone other than the intended recipient who got hold of the text would not learn the secret. This is a simple example of encryption. The encrypted content is called ciphertext.

To restore the content (to decrypt it), the recipient needs to know how the content was encrypted, and this has to be communicated separately from the sender to the recipient. This piece of information is called the encryption key. For example, ‘+4, x3, -7’ is the key to encrypt and decrypt the ciphertext in our example.
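For those who like to tinker, here is a minimal Python sketch of the parchment cipher and its key. The function names are my own, it assumes lowercase letters a to z only, and it needs Python 3.8 or later for the `pow(mul, -1, 26)` call, which finds the number that undoes "multiply by three" once the alphabet wraps around.

```python
import string

ALPHABET = string.ascii_lowercase  # 'a'..'z', counted as positions 1..26

# The key from the talk: forward 4, multiply by 3, back 7.
KEY = (4, 3, 7)


def encrypt_letter(letter, key=KEY):
    add, mul, sub = key
    pos = ALPHABET.index(letter) + 1          # a=1, b=2, ..., z=26
    new_pos = ((pos + add) * mul - sub) % 26  # wrap around the alphabet
    return ALPHABET[new_pos - 1]              # a result of 0 wraps to z


def decrypt_letter(letter, key=KEY):
    add, mul, sub = key
    pos = ALPHABET.index(letter) + 1
    inv = pow(mul, -1, 26)                    # undoes the multiplication mod 26
    old_pos = ((pos + sub) * inv - add) % 26  # run the steps in reverse order
    return ALPHABET[old_pos - 1]


def encrypt(word, key=KEY):
    return "".join(encrypt_letter(ch, key) for ch in word)


def decrypt(word, key=KEY):
    return "".join(decrypt_letter(ch, key) for ch in word)


secret = encrypt("apple")
print(secret, "->", decrypt(secret))
```

Notice that the same three numbers are needed in both directions: whoever holds the key can both scramble and unscramble.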

In the real world, the encryption keys would be a lot more complex than the simple example given above, so the encryption is a lot stronger, making it too difficult for people without the key to crack it. 

The example still shows two characteristics of this kind of encryption and its keys: the key cannot simply be stored or transported together with the encrypted data, and the same key is used to both encrypt and decrypt. Such keys are called symmetric keys.

You may have started to wonder: for this to work, the encryption key needs to be transported from the sender to the recipient, so the key itself might be intercepted by anyone along the way, who can then use it to decrypt any subsequent ciphertext. Of course, you can encrypt the key using another key (this practice is called envelope encryption), but then the same situation repeats – another key needs to be sent.

This is the reason another method of cryptography was invented by mathematicians – Public Key Cryptography. It goes this way: mathematical algorithms can be used to generate key pairs, i.e., two different keys that correspond to each other. Key 2 in a key pair can be used, and only key 2 can be used, to decrypt ciphertext that was encrypted by key 1 in that pair. Even key 1 itself cannot be used to decrypt.

As long as I keep key 2 to myself, not disclosing it to anyone, I can advertise key 1 to the whole open world: send messages to me using this key to encrypt. I do not need to worry about the bad guys or whoever out there getting hold of this key – go ahead, help yourselves. But the only one in the whole world that can decrypt ciphertext that is encrypted by this key is the one who possesses key 2.

Fittingly, key 1 is called the public key and key 2 the private key, which is how Public Key Cryptography got its name. Such cryptography is also called asymmetric encryption (it uses asymmetric keys). There is no need to worry about transporting a secret key from the sender to the recipient – public keys are assumed to be known to anyone.
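To make the key pair idea concrete, here is a toy sketch in Python in the style of the RSA algorithm, using the tiny textbook primes 61 and 53. The numbers and function names are chosen for illustration only: real keys are built from primes hundreds of digits long, and real systems add padding steps that are left out here.

```python
# Toy public key cryptography with tiny textbook numbers (RSA-style).
# Illustration only: far too small to be secure.
p, q = 61, 53                 # two secret primes
n = p * q                     # 3233, part of both keys
phi = (p - 1) * (q - 1)       # 3120, needed only while making the keys

e = 17                        # public exponent  -> public key is (e, n)
d = pow(e, -1, phi)           # private exponent -> private key is (d, n)


def encrypt(message, public_key):
    exponent, modulus = public_key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, private_key):
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)


public_key, private_key = (e, n), (d, n)
message = 65                  # the message must be a number smaller than n
ciphertext = encrypt(message, public_key)
print(ciphertext, decrypt(ciphertext, private_key))
```

Anyone may be given `public_key`; only the holder of `private_key` can turn the ciphertext back into 65.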


Now, cloud computing has been revolutionising the digital and online world, processing volumes of data that were unthinkable before this era. Logically, it would need to deal with data encryption on a scale unthinkable before. 

Though it is called cloud computing, modern clouds offer a comprehensive range of services, from computing instances to various kinds of data storage, databases, machine learning, AI, Internet of Things, media and many others. For example, AWS (Amazon Web Services) has been a pioneer and a major cloud service provider. AWS offers more than 200 fully featured services, and most of them, if not all, process data. Encrypting data at rest (when data is stored in a computer system) and data in transit (when data is being transported from one system to another) is a big deal.

Data encryption and cryptography are among the strengths of AWS. The powerhouse of managing cryptography in AWS is a service called KMS.

AWS Key Management Service (KMS) is the AWS service that gives centralised control over cryptographic keys, integrating with the other AWS services that process data. Encryption and decryption are naturally a primary function of KMS. It does this using the envelope encryption that was mentioned earlier.

The large volume of data in the cloud is typically encrypted using data keys. Data keys are mostly symmetric cryptographic keys. (There are cases where asymmetric keys are used for data encryption, but what we discuss here are the most common scenarios.) Symmetric keys are used because they encrypt and decrypt much faster than asymmetric keys. This is important as we are talking about gigabytes, terabytes or petabytes of data here – speed is essential when dealing with large data volumes. KMS then uses and manages a separate layer of keys, known as KMS keys, to encrypt and decrypt the data keys (which in turn encrypt the data). By doing so, each encrypted data key can be stored together with the data that its plaintext form was used on, which keeps things conveniently organised, and the data remains secure. KMS takes care of the KMS keys needed to encrypt and decrypt the huge number of data keys.
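The envelope idea can be sketched in a few lines of Python. This is only a model under loud assumptions: `ToyKeyService`, `generate_data_key`, `decrypt_data_key` and `toy_cipher` are hypothetical names of my own, not the AWS API, and `toy_cipher` is a simple stand-in for a real cipher such as AES that must never be used to protect real data.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 keystream; applying it twice restores the data.
    A toy stand-in for a real symmetric cipher, not secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyKeyService:
    """Hypothetical stand-in for a key management service."""

    def __init__(self):
        self._master_key = secrets.token_bytes(32)  # never handed out

    def generate_data_key(self):
        plaintext_key = secrets.token_bytes(32)
        encrypted_key = toy_cipher(self._master_key, plaintext_key)
        return plaintext_key, encrypted_key

    def decrypt_data_key(self, encrypted_key: bytes) -> bytes:
        return toy_cipher(self._master_key, encrypted_key)


def encrypt_document(service, document: bytes) -> dict:
    data_key, encrypted_key = service.generate_data_key()
    ciphertext = toy_cipher(data_key, document)
    # Only the encrypted data key is stored next to the ciphertext;
    # the plaintext data key is discarded when this function returns.
    return {"ciphertext": ciphertext, "encrypted_key": encrypted_key}


def decrypt_document(service, envelope: dict) -> bytes:
    data_key = service.decrypt_data_key(envelope["encrypted_key"])
    return toy_cipher(data_key, envelope["ciphertext"])


service = ToyKeyService()
envelope = encrypt_document(service, b"my school assignment")
print(decrypt_document(service, envelope))
```

The stored envelope contains nothing that is useful on its own: without asking the key service to unwrap the data key, the ciphertext stays unreadable.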

Let me give you an idea of how secure KMS is, which is fundamentally important: if KMS keys were compromised, then data keys would be compromised, and so would the confidentiality and integrity of the data.

No one in the world, including all the people in AWS, is able to obtain plaintext keys from AWS KMS. Plaintext means in a form that is unencrypted or decrypted. This assurance comes from the way KMS is designed. Even within KMS, plaintext keys are never written to any data storage, and of course they never leave KMS. Unencrypted KMS keys are only used in the volatile memory of the highly specialised hardware security modules that KMS uses, and only for the brief moment needed for an encryption or decryption operation.

These hardware security modules are called Cloud HSMs on AWS. Cloud HSMs enable FIPS 140-2 certified cryptographic operations. Cloud HSMs can ensure this as they are continually validated under the U.S. National Institute of Standards and Technology (NIST) Federal Information Processing Standards (FIPS) 140-2 Cryptographic Module Validation Program (CMVP). This official program publishes validation information for all cryptographic modules that have been tested and validated under it. Each Cryptographic and Security Testing Laboratory (CSTL) is an independent laboratory accredited by NVLAP. CSTLs verify that each module meets a set of testable cryptographic and security requirements, with each CSTL submission reviewed and validated by CMVP. The FIPS cryptographic standard currently in service is 140-2, but CMVP has begun validating cryptographic modules to FIPS 140-3, which will eventually be used in services in the coming years.

Besides encryption, AWS KMS also helps with digital signing – digitally verifying that data claiming to come from a sender is indeed from that sender. Imagine someone is sending a message to another party; in many cases it is crucial for that party to be able to verify that the message is indeed coming from that sender, not from someone pretending to be them, and has not been altered by another party on the way.

Just like encryption, digital signing is also enabled by the public key cryptography mechanism. Remember the public and private key pairs? They can be used in the other direction: whoever possesses the private key can use it to digitally sign the data they send out. Then people with the public key of that key pair can use it to verify that the digitally signed data was indeed from the person or computer that possesses the private key. Anyone who does not have the private key would not be able to produce a signature that passes verification by the public key of that key pair.
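Signing can be sketched with the same kind of toy arithmetic. The sketch below is an assumption-laden illustration, not how any real service does it: the key pair is built from two well-known Mersenne primes, the function names are my own, and real signature schemes use much larger keys plus padding steps that are omitted here.

```python
import hashlib

# Toy RSA-style key pair built from two well-known Mersenne primes.
# Illustration only: still far too small to be secure.
p, q = 2**61 - 1, 2**89 - 1
n = p * q
e = 65537                              # public exponent, shared with everyone
d = pow(e, -1, (p - 1) * (q - 1))      # private exponent, kept secret


def digest(message: bytes) -> int:
    # Shrink the message to a fixed-size number smaller than n.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes) -> int:
    # Only the holder of the private exponent d can compute this.
    return pow(digest(message), d, n)


def verify(message: bytes, signature: int) -> bool:
    # Anyone with the public values (e, n) can check the signature.
    return pow(signature, e, n) == digest(message)


message = b"Meet me at noon"
signature = sign(message)
print(verify(message, signature))             # genuine message
print(verify(b"Meet me at nine", signature))  # altered message
```

The first check succeeds and the second fails, because the signature only matches the exact message that was signed with the private key.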


I hope you have found today’s discussion on data encryption in the AWS cloud not totally boring, and that it might even have been interesting to some. I also secretly wish that you may have renewed respect for all the branches of mathematics which have made our world as splendid as it is. Lastly, I encourage everyone to develop a broad interest in the world that we are in. Many thanks.

                                                                                                                                        -- Simon Wang

 
