In this blog post, you will learn how to store sensitive information in a database the right way. This involves learning about data encryption keys, key encryption keys, key derivation functions, Advanced Encryption Standard (AES) 256-bit symmetric key encryption, and more, all while following software engineering best practices. The particular database used in this blog post is Elasticsearch; however, the same cryptographic concepts apply to any relational or NoSQL database.
This is Part 1 of a two-part series. Part 1 covers the cryptographic concepts and how to store the encrypted data in Elasticsearch or any other database. Part 2 covers how to fetch and decrypt the stored data.
Prerequisite Cryptographic Concepts
Before diving into the code, the following are the prerequisite cryptographic concepts you must be aware of to follow along with this blog post:
Advanced Encryption Standard (AES) 256-bit – AES is a symmetric key block cipher algorithm for encrypting and decrypting data with a single key (Bernstein & Cobb, 2021).
Data Encryption Key (DEK) – The DEK is a cryptographic key used to encrypt and decrypt the data (“Data Encryption Key”, 2014).
Key Encryption Key (KEK) – The KEK is the cryptographic key used to encrypt and decrypt the DEK (“Key-encryption-key (KEK) – glossary”, n.d.).
Key Derivation Function (KDF) – “A KDF is a cryptographic algorithm designed to generate a secure secret key from a single key value” (“What Are Key Derivation Functions?”, 2023). In a Password Based KDF (PBKDF), the KDF takes a password, salt, difficulty level, and key size to generate a secret key.
Cryptographic Salt – A salt in cryptography is a random value used to make the hash of a password unique by appending or prepending the salt to the password before computing the hash (“What is a cryptographic salt?”, n.d.).
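To make the KDF and salt definitions concrete, here is a minimal sketch. It assumes hashlib.scrypt from Python's standard library as a stand-in (the code later in this post uses the separate scrypt package), and the password and the derive_key helper are hypothetical. It shows that the same password and salt always derive the same key, while a fresh salt yields a different key.

```python
import hashlib
import secrets

def derive_key(password: str, salt: bytes) -> bytes:
    # Password-based KDF: password + salt + cost parameters + key size -> key.
    return hashlib.scrypt(
        password.encode("utf-8"), salt=salt, n=2**12, r=8, p=1, dklen=32
    )

salt_a = secrets.token_bytes(32)
salt_b = secrets.token_bytes(32)

key_1 = derive_key("example-password", salt_a)
key_2 = derive_key("example-password", salt_a)
key_3 = derive_key("example-password", salt_b)

print(len(key_1))      # 32 bytes, i.e. a 256-bit key
print(key_1 == key_2)  # True: same password and same salt
print(key_1 == key_3)  # False with overwhelming probability: different salt
```

This is why the salt has to be stored next to the encrypted data: without it, the same key cannot be derived again from the password.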
How to Securely Store Sensitive Information in a Database like Elasticsearch
Securely storing the sensitive information in a database can be done in eight steps (“How to encypt sensitive data in database of a web app?”, n.d.):
1. Generate the DEK, which is used for encrypting sensitive data.
2. Encrypt the sensitive data with the DEK (from step 1).
3. Base 64 encode the encrypted data (from step 2).
4. Generate the KEK with the user password and random salt, which is used for encrypting the DEK (from step 1).
5. Encrypt the DEK (from step 1) with the KEK (from step 4).
6. Base 64 encode the encrypted DEK (from step 5).
7. Base 64 encode the KEK salt (random salt from step 4).
8. Index the document that contains the base 64 encoded encrypted DEK (from step 6), KEK salt (from step 7), encrypted data (from step 3), and any other necessary attributes in Elasticsearch or another database.
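The Base 64 steps exist because ciphertext, keys, and salts are raw bytes, and raw bytes cannot be placed directly into a JSON document. The following sketch uses only the standard library; the random bytes stand in for an encrypted DEK, and the field name is illustrative.

```python
import base64
import json
import secrets

raw = secrets.token_bytes(32)  # stands in for ciphertext, a key, or a salt

# Raw bytes are rejected by the JSON encoder.
try:
    json.dumps({"dek": raw})
    serializable = True
except TypeError:
    serializable = False

# Base 64 turns the bytes into ASCII text that JSON can carry.
encoded = base64.b64encode(raw).decode("ascii")
document = json.dumps({"dek": encoded})

# Decoding the stored text recovers the original bytes exactly.
decoded = base64.b64decode(json.loads(document)["dek"])

print(serializable)    # False
print(decoded == raw)  # True
```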
It’s also important to note how the KEK is handled when decryption occurs, which the next blog post discusses in more detail. The KEK can either be regenerated from the user's password when the user logs in and kept in the user session, or it can be stored. If it is stored, it must live outside the database that contains the encrypted DEK and encrypted data, for example in a key management system; otherwise, anyone who obtains the database also obtains the key needed to decrypt it.
A Proof of Concept
The Proof of Concept (PoC) in this blog post stores user documents that contain a username, hashed password, and an encrypted Social Security Number (SSN) in Elasticsearch in an index called user.
Designing the Proof of Concept
A logically cohesive class called Crypto will contain class methods that perform various cryptographic operations, such as generating random bytes, encrypting data, deriving keys, and hashing passwords (Note: If you don’t know what cohesion is, then you can read my blog post on The 7 Types of Cohesion You Need To Know To Be The Best Software Engineer).
A model class called User with the properties id, username, password, and SSN will be created when fetching and storing users into the Elasticsearch database.
A Data Access Object (DAO) is used to interface with the Elasticsearch database when storing a user model object as a document in Elasticsearch and retrieving the user document and marshaling it into a user model object.
A singleton class called ElasticsearchDb is used to initialize the Elasticsearch client object, which provides the database connection to Elasticsearch.
The Code for the Proof of Concept with Explanation
The Crypto Class
First, let’s go over the code for the Crypto class, which is stored in crypto.py:
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes
import scrypt
import bcrypt

class Crypto:
    # AES Key Size is 32 bytes or 256 bits.
    AES_KEY_SIZE = 32
    # Salt length is 32 bytes.
    SALT_LENGTH = 32

    # Generates a cryptographically secure random sequence of bytes.
    @classmethod
    def random_bytes(cls, key_size):
        return get_random_bytes(key_size)

    # Encrypts data given an AES key.
    @classmethod
    def encrypt(cls, data, key):
        cipher = AES.new(key, AES.MODE_EAX)
        ciphertext, tag = cipher.encrypt_and_digest(data)
        nonce = cipher.nonce
        return (ciphertext, tag, nonce)

    # KDF generating a key given a password and salt.
    @classmethod
    def key_derivation_function(cls, password, salt):
        # argument list being passed into scrypt hash function:
        # arg0 - password
        # arg1 - salt
        # arg2 - N, the CPU/memory cost (work factor)
        # arg3 - r, the block size
        # arg4 - p, the parallelization factor
        # arg5 - buflen, the derived key size in bytes
        return scrypt.hash(password, salt, 2**12, 2**3, 1, 32)

    # Hash a Password
    @classmethod
    def hash_password(cls, password):
        password_bytes = bytes(password, 'ascii')
        salt = bcrypt.gensalt()
        hash = bcrypt.hashpw(password_bytes, salt)
        return hash
In order for this class to run properly you must install the scrypt, bcrypt, and pycryptodome packages using pip. The following pip command will install the required packages for this class:
$ pip install scrypt bcrypt pycryptodome
Let’s investigate each method of this class in more detail.
First, let’s examine the random_bytes method, which has the following implementation:
    # Generates a cryptographically secure random sequence of bytes.
    @classmethod
    def random_bytes(cls, key_size):
        return get_random_bytes(key_size)
This method calls the get_random_bytes function, a cryptographically secure random number generator that returns key_size random bytes (“What is the difference between Pycrypto’s Random.get_random_bytes and a simple random byte generator?”, n.d.).
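The difference between a cryptographically secure generator and a simple random byte generator can be sketched with the standard library alone. The example assumes the secrets module as a comparable secure source (it is not what the Crypto class uses) and contrasts it with the random module, whose output replays exactly once the seed is known.

```python
import random
import secrets

def insecure_bytes(seed: int, size: int) -> bytes:
    # random is a deterministic PRNG: the seed fully determines the output.
    random.seed(seed)
    return bytes(random.getrandbits(8) for _ in range(size))

first = insecure_bytes(1234, 32)
second = insecure_bytes(1234, 32)
print(first == second)  # True: predictable, so unsuitable for keys

# secrets draws from the operating system's secure random source.
key_a = secrets.token_bytes(32)
key_b = secrets.token_bytes(32)
print(len(key_a))       # 32
print(key_a == key_b)   # False with overwhelming probability
```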
Next, the following is the implementation of the encrypt method:
    # Encrypts data given an AES key.
    @classmethod
    def encrypt(cls, data, key):
        cipher = AES.new(key, AES.MODE_EAX)
        ciphertext, tag = cipher.encrypt_and_digest(data)
        nonce = cipher.nonce
        return (ciphertext, tag, nonce)
The first line of this method constructs an AES cipher object from the symmetric key and the AES mode of operation, which is MODE_EAX in this case (“AES encryption & decryption in Python: Implementation, modes & key management”, n.d.). EAX stands for encrypt-then-authenticate-then-translate, which means it performs authenticated encryption (“Choice of authenticated encryption mode for whole messages”, n.d.).
The second line invokes the encrypt_and_digest method with the data to encrypt. It returns the ciphertext and an authentication tag, also known as the Message Authentication Code (MAC), which is used to verify that the data is authentic and has not been tampered with (Dennis, 2023; “What is a Message Authentication Code (MAC)?”, n.d.).
The third line reads the nonce from the cipher object and assigns it to a variable. A nonce is an arbitrary number used once per encryption, which ensures that encrypting the same plaintext with the same key does not produce the same ciphertext each time (Hinch, 2023). Because no nonce is passed to AES.new, PyCryptodome generates a random one.
The fourth line returns the ciphertext, tag, and nonce as a tuple.
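The role of the authentication tag can be illustrated without any third-party package. The sketch below swaps in HMAC-SHA256 from the standard library, which is a different MAC construction from the one EAX computes internally; it is used here only to show the general idea that a tag verifies for the original bytes and fails for modified ones. The is_authentic helper and the message are hypothetical.

```python
import hashlib
import hmac
import secrets

key = secrets.token_bytes(32)
message = b"ciphertext bytes go here"

# The sender computes a tag over the message with the secret key.
tag = hmac.new(key, message, hashlib.sha256).digest()

def is_authentic(key: bytes, message: bytes, tag: bytes) -> bool:
    # The receiver recomputes the tag and compares in constant time.
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

# Flip a single bit of the first byte to simulate tampering.
tampered = bytes([message[0] ^ 0x01]) + message[1:]

print(is_authentic(key, message, tag))   # True
print(is_authentic(key, tampered, tag))  # False
```

With EAX, the equivalent check happens during decryption, which is why the tag and nonce must be stored alongside the ciphertext.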
Next, let’s examine the code for the key_derivation_function method, which is the following:
    # KDF generating a key given a password and salt.
    @classmethod
    def key_derivation_function(cls, password, salt):
        # argument list being passed into scrypt hash function:
        # arg0 - password
        # arg1 - salt
        # arg2 - N, the CPU/memory cost (work factor)
        # arg3 - r, the block size
        # arg4 - p, the parallelization factor
        # arg5 - buflen, the derived key size in bytes
        return scrypt.hash(password, salt, 2**12, 2**3, 1, 32)
The key_derivation_function method runs scrypt, a deliberately slow and memory-hard hash, to derive a 32-byte (256-bit) symmetric key that serves as the AES KEK. In other words, the KDF being used here is a secure PBKDF. For more information about the arguments being passed to the scrypt.hash function, you can check out the following article: https://blog.boot.dev/cryptography/very-basic-intro-to-the-scrypt-hash/.
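As a rough guide to what these parameters cost, scrypt's main working buffer needs about 128 * N * r bytes. The sketch below works through that arithmetic for the values used above and then derives a key with the same parameters. It assumes hashlib.scrypt from the standard library in place of the scrypt package, and the password is a placeholder.

```python
import hashlib
import secrets

N = 2**12     # CPU/memory cost; must be a power of two
R = 2**3      # block size
P = 1         # parallelization factor
KEY_LEN = 32  # derived key size in bytes (256 bits)

# Approximate memory needed for one derivation.
approx_memory_bytes = 128 * N * R
approx_memory_mib = approx_memory_bytes // (1024 * 1024)
print(approx_memory_mib)  # 4 (MiB)

salt = secrets.token_bytes(32)
kek = hashlib.scrypt(b"example-password", salt=salt, n=N, r=R, p=P, dklen=KEY_LEN)
print(len(kek))  # 32
```

Raising N makes every password guess more expensive for an attacker, at the price of slower logins and more memory per derivation.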
Next, let’s examine the code for the hash_password method:
    # Hash a Password
    @classmethod
    def hash_password(cls, password):
        password_bytes = bytes(password, 'ascii')
        salt = bcrypt.gensalt()
        hash = bcrypt.hashpw(password_bytes, salt)
        return hash
The hash_password method uses bcrypt, another secure and deliberately slow hash function, which stores the hashed password and the salt together in the same string. Making each hash slow to compute increases security because it makes cracking the password by brute force infeasible: every guess costs the attacker the same expensive computation (Grigutytė, 2023).
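To see how the salt and hash share one string, the following sketch splits a bcrypt hash into its parts. It uses the sample password hash from the example document shown later in this post; parse_bcrypt_hash is a hypothetical helper written for illustration, not part of the bcrypt package.

```python
SAMPLE_HASH = "$2b$12$eS8ObWbYlcjtPBIB51JvluNnU5VSLC42HuEBpPu6.Yvj/8JVxyyjm"

def parse_bcrypt_hash(value: str) -> dict:
    # Layout: $<version>$<cost>$<22-character salt><31-character hash>
    _, version, cost, rest = value.split("$")
    return {
        "version": version,
        "cost": int(cost),
        "rounds": 2 ** int(cost),  # the cost is a base-2 logarithm
        "salt": rest[:22],
        "hash": rest[22:],
    }

parts = parse_bcrypt_hash(SAMPLE_HASH)
print(parts["version"], parts["cost"], parts["rounds"])  # 2b 12 4096
print(parts["salt"])  # eS8ObWbYlcjtPBIB51Jvlu
print(parts["hash"])  # NnU5VSLC42HuEBpPu6.Yvj/8JVxyyjm
```

Because the cost and salt travel with the hash, verifying a password later needs only this one string and the candidate password.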
The ElasticsearchDb Class
The following is the code for the ElasticsearchDb class, which is stored in elasticsearchdb.py:
from elasticsearch import Elasticsearch
import os

class ElasticsearchDb:
    def __new__(cls, host, port, protocol="http", ca_certs=None, username=None, password=None):
        if not hasattr(cls, 'instance'):
            cls.instance = super(ElasticsearchDb, cls).__new__(cls)
        return cls.instance

    def __init__(self, host, port, protocol="http", ca_certs=None, username=None, password=None):
        if protocol is None:
            raise Exception("Protocol cannot be None.")
        if host is None:
            raise Exception("Host cannot be None.")
        if port is None:
            raise Exception("Port cannot be None.")
        protocol = protocol.lower()
        if protocol not in ["http", "https"]:
            raise Exception("Invalid protocol: it must be http or https.")
        kwargs = {}
        if username is not None and password is not None:
            kwargs["basic_auth"] = (username, password)
        if ca_certs is not None:
            kwargs["ca_certs"] = ca_certs
        self.__es = Elasticsearch(str(protocol) + "://" + str(host) + ":" + str(port), **kwargs)

    @property
    def elasticsearch(self) -> Elasticsearch:
        return self.__es
The above class follows the Singleton Object Oriented Programming (OOP) design pattern, which means that only one instance of the object can be instantiated during execution of the application.
For the above code to execute correctly, it needs the elasticsearch dependency, which can be installed with pip as follows:
$ pip install elasticsearch
The following code ensures that the class is a Singleton (“Singleton pattern in python – A complete guide”, 2020):
    def __new__(cls, host, port, protocol="http", ca_certs=None, username=None, password=None):
        if not hasattr(cls, 'instance'):
            cls.instance = super(ElasticsearchDb, cls).__new__(cls)
        return cls.instance
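The behavior of this pattern can be checked in isolation. The sketch below uses a hypothetical Connection class in place of ElasticsearchDb so that it runs without an Elasticsearch cluster; the URLs are placeholders.

```python
class Connection:
    """Hypothetical stand-in for ElasticsearchDb, with no external dependency."""

    init_calls = 0

    def __new__(cls, url):
        # Create the single instance on first use and reuse it afterwards.
        if not hasattr(cls, "instance"):
            cls.instance = super().__new__(cls)
        return cls.instance

    def __init__(self, url):
        # Python still calls __init__ on every construction.
        Connection.init_calls += 1
        self.url = url

first = Connection("https://localhost:9200")
second = Connection("https://example.invalid:9200")

print(first is second)        # True: both names refer to one object
print(Connection.init_calls)  # 2: __init__ ran twice
print(first.url)              # https://example.invalid:9200
```

One thing this sketch makes visible is that __new__ guarantees a single object, but __init__ runs again on each call and overwrites its state. If one-time initialization matters, a guard flag checked inside __init__ is one possible approach.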
The following code sets the __es attribute of the ElasticsearchDb class, which is a connection to an Elasticsearch cluster:
    def __init__(self, host, port, protocol="http", ca_certs=None, username=None, password=None):
        if protocol is None:
            raise Exception("Protocol cannot be None.")
        if host is None:
            raise Exception("Host cannot be None.")
        if port is None:
            raise Exception("Port cannot be None.")
        protocol = protocol.lower()
        if protocol not in ["http", "https"]:
            raise Exception("Invalid protocol: it must be http or https.")
        kwargs = {}
        if username is not None and password is not None:
            kwargs["basic_auth"] = (username, password)
        if ca_certs is not None:
            kwargs["ca_certs"] = ca_certs
        self.__es = Elasticsearch(str(protocol) + "://" + str(host) + ":" + str(port), **kwargs)
The host and port arguments represent the host of the Elasticsearch cluster and REST port it is running on. The remaining keyword arguments are for configuring the security of the connection, such as HTTPS, certificate authority certificates, and basic authentication. You can learn more about the significance of these parameters by reading my previous blog post on Securely and Programmatically Accessing Elasticsearch with curl and Python.
The User Model Class
The following is the code for the user model class stored in model.py:
class User:
    def __init__(self, id, username, password, ssn):
        self.__id = id
        self.__username = username
        self.__password = password
        self.__ssn = ssn

    @property
    def id(self) -> str:
        return self.__id

    @property
    def username(self) -> str:
        return self.__username

    @property
    def password(self) -> str:
        return self.__password

    @property
    def ssn(self) -> str:
        return self.__ssn
This class is largely self-explanatory. The User model object encapsulates the information about a user: an id, username, password, and SSN. The class exposes each value through a read-only property.
The UserDao Class
The following is the code for the UserDao class stored in dao.py:
from model import User
from crypto import Crypto
import base64
from elasticsearchdb import ElasticsearchDb

class UserDao:
    def __init__(self, es):
        self.__es = es

    def insert_user(self, user: User):
        # 1. Generate the Data Encryption Key - used for encrypting user data.
        data_encryption_key = Crypto.random_bytes(Crypto.AES_KEY_SIZE)
        # 2. Encrypt the sensitive user data.
        encrypted_user_ssn, ssn_tag, ssn_nonce = Crypto.encrypt(bytes(user.ssn, "ascii"), data_encryption_key)
        # 3. Base 64 encode the encrypted user data and its MAC and nonce.
        b64_encrypted_user_ssn = base64.b64encode(encrypted_user_ssn).decode("ascii")
        b64_ssn_tag = base64.b64encode(ssn_tag).decode("ascii")
        b64_ssn_nonce = base64.b64encode(ssn_nonce).decode("ascii")
        # 4. Generate the Key Encryption Key - used for encrypting the data encryption key.
        kek_salt = Crypto.random_bytes(Crypto.SALT_LENGTH)
        key_encryption_key = Crypto.key_derivation_function(user.password, kek_salt)
        # 5. Encrypt the Data Encryption Key with the Key Encryption Key.
        encrypted_dek, dek_tag, dek_nonce = Crypto.encrypt(data_encryption_key, key_encryption_key)
        # 6. Base 64 encode the encrypted Data Encryption Key and its MAC and nonce.
        b64_encrypted_dek = base64.b64encode(encrypted_dek).decode("ascii")
        b64_dek_tag = base64.b64encode(dek_tag).decode("ascii")
        b64_dek_nonce = base64.b64encode(dek_nonce).decode("ascii")
        # 7. Hash the password.
        password_hash = Crypto.hash_password(user.password).decode("ascii")
        # 8. Base 64 encode the KEK salt.
        b64_kek_salt = base64.b64encode(kek_salt).decode("ascii")
        # 9. Store the document in Elasticsearch.
        resp = self.__es.index(index="user", id=user.id, document={
            "username": user.username,
            "password": password_hash,
            "ssn": b64_encrypted_user_ssn,
            "ssn_tag": b64_ssn_tag,
            "ssn_nonce": b64_ssn_nonce,
            "dek": b64_encrypted_dek,
            "dek_tag": b64_dek_tag,
            "dek_nonce": b64_dek_nonce,
            "kek_salt": b64_kek_salt
        })
        return resp
If you read the above code along with its comments, it’s largely self-explanatory. Compared with the eight steps listed earlier, the code adds one step: hashing the password with bcrypt (step 7 in the comments). One important thing that has not been mentioned yet is that the MAC and nonce are stored in the user document because both are needed later to decrypt the encrypted data and the encrypted DEK (Note: the SSN has its own MAC and nonce, and the DEK has its own separate MAC and nonce).
Running the Proof of Concept
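The following code ties the classes above together and stores a user document whose sensitive field is the SSN (Note: The SSN used in this blog post is fake). In addition to the packages installed earlier, it imports load_dotenv from the python-dotenv package, which loads ELASTIC_PASSWORD from a .env file into the environment: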
from model import User
from dao import UserDao
from elasticsearchdb import ElasticsearchDb
from dotenv import load_dotenv
import os

def get_user_dao():
    es_password = os.environ.get("ELASTIC_PASSWORD")
    esdb = ElasticsearchDb("localhost", 9200,
        protocol="https",
        # Expand "~" explicitly so the path resolves to the home directory.
        ca_certs=os.path.expanduser("~/elasticsearch-8.11.2/config/certs/http_ca.crt"),
        username="elastic",
        password=es_password
    )
    # Read the Elasticsearch client through the property.
    es = esdb.elasticsearch
    userDao = UserDao(es)
    return userDao

def register_new_user(uid, username, password, ssn):
    userDao = get_user_dao()
    u = User(uid, username, password, ssn)
    resp = userDao.insert_user(u)
    return resp

def main():
    load_dotenv()
    es_response = register_new_user(1, "gary.drocella", "passw0rd", "000-00-0000")  # insert user with fake SSN
    print(es_response)

if __name__ == "__main__":
    main()
Now, when you query for the user document stored on the Elasticsearch cluster, this is what a sample user document looks like:
{
  "_index": "user",
  "_id": "1",
  "_version": 5,
  "_seq_no": 14,
  "_primary_term": 5,
  "found": true,
  "_source": {
    "username": "gary.drocella",
    "password": "$2b$12$eS8ObWbYlcjtPBIB51JvluNnU5VSLC42HuEBpPu6.Yvj/8JVxyyjm",
    "ssn": "LLJCF1lrwSYpw+s=",
    "ssn_tag": "+oXWPSSjkQHx2P7FvRTitw==",
    "ssn_nonce": "WWFI+RGUEiXTNGqMgSkAAw==",
    "dek": "49ffEKw8Dut23QvGsyhI5QpMlyGboC81FR5FRKvjnlg=",
    "dek_tag": "XV3JA+E88Akfk+cL4jp+mg==",
    "dek_nonce": "e0rfaYUk5FDsV1ok0C5wrA==",
    "kek_salt": "ZlC9fQ1u4xowftR8FUINQzuJc1pO1hbtGb9WG6qWR+E="
  }
}
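As a quick sanity check on the sample document, the stored fields can be Base 64 decoded and their sizes compared with what the code should have produced. The sketch below copies the values from the document above and uses only the standard library.

```python
import base64

# Encrypted fields copied from the sample document's _source.
source = {
    "ssn": "LLJCF1lrwSYpw+s=",
    "ssn_tag": "+oXWPSSjkQHx2P7FvRTitw==",
    "ssn_nonce": "WWFI+RGUEiXTNGqMgSkAAw==",
    "dek": "49ffEKw8Dut23QvGsyhI5QpMlyGboC81FR5FRKvjnlg=",
    "dek_tag": "XV3JA+E88Akfk+cL4jp+mg==",
    "dek_nonce": "e0rfaYUk5FDsV1ok0C5wrA==",
    "kek_salt": "ZlC9fQ1u4xowftR8FUINQzuJc1pO1hbtGb9WG6qWR+E=",
}

# Decode each field and record its length in bytes.
sizes = {field: len(base64.b64decode(value)) for field, value in source.items()}

for field, size in sizes.items():
    print(field, size)
```

The encrypted DEK and the KEK salt both decode to 32 bytes, matching AES_KEY_SIZE and SALT_LENGTH, and each tag and nonce decodes to 16 bytes. The encrypted SSN decodes to 11 bytes, the same length as the plaintext "000-00-0000", so the ciphertext hides the digits but not the length of the value.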
This concludes Part 1 of this blog post. If you found this blog post helpful, then share, subscribe to my blog, and buy me a coffee! Part 2 of this two-part blog post can be found here.
References
AES encryption & decryption in Python: Implementation, modes & key management. (n.d.). Onboardbase.com. Retrieved January 4, 2024, from https://onboardbase.com/blog/aes-encryption-decryption/
AES-GCM authenticated encryption. (2023, January 1). Cryptosys.net. https://www.cryptosys.net/pki/manpki/pki_aesgcmauthencryption.html
Bernstein, C., & Cobb, M. (2021, September 24). Advanced Encryption Standard (AES). Security; TechTarget. https://www.techtarget.com/searchsecurity/definition/Advanced-Encryption-Standard
Choice of authenticated encryption mode for whole messages. (n.d.). Cryptography Stack Exchange. Retrieved January 4, 2024, from https://crypto.stackexchange.com/questions/18860/choice-of-authenticated-encryption-mode-for-whole-messages
Data Encryption Key. (2014). Techopedia.com. https://www.techopedia.com/definition/5660/data-encryption-key-dek
Dennis, Y. (2023, February 21). PyCryptodome: Secure your data with ease. Geek Culture. https://medium.com/geekculture/pycryptodome-secure-your-data-with-ease-4d70817fae7
Grigutytė, M. (2023, June 16). What is bcrypt and how does it work? NordVPN. https://nordvpn.com/blog/what-is-bcrypt/
Hinch, D. (2023, October 3). Understanding Nonces and Their Use in AES-GCM. Linkedin.com. https://www.linkedin.com/pulse/understanding-nonces-use-aes-gcm-derek-hinch/
How to encypt sensitive data in database of a web app? (n.d.). Information Security Stack Exchange. Retrieved January 4, 2024, from https://security.stackexchange.com/questions/166286/how-to-encypt-sensitive-data-in-database-of-a-web-app/166288
Key-encryption-key (KEK) – glossary. (n.d.). Nist.gov. Retrieved January 4, 2024, from https://csrc.nist.gov/glossary/term/key_encryption_key
Singleton pattern in python – A complete guide. (2020, October 30). GeeksforGeeks. https://www.geeksforgeeks.org/singleton-pattern-in-python-a-complete-guide/
What Are Key Derivation Functions? (2023, June 22). Baeldung.com. https://www.baeldung.com/cs/kdf-cryptography
What is a cryptographic “salt”? (n.d.). Cryptography Stack Exchange. Retrieved January 4, 2024, from https://crypto.stackexchange.com/questions/1776/what-is-a-cryptographic-salt
What is a Message Authentication Code (MAC)? (n.d.). Fortinet. Retrieved January 4, 2024, from https://www.fortinet.com/resources/cyberglossary/message-authentication-code
What is the difference between Pycrypto’s Random.get_random_bytes and a simple random byte generator? (n.d.). Stack Overflow. Retrieved January 4, 2024, from https://stackoverflow.com/questions/22395478/what-is-the-difference-between-pycryptos-random-get-random-bytes-and-a-simple-r