How to Pythonically Store Sensitive and Encrypted Information in a Database For a Web Application the Right Way – Part 1

In this blog post, you will learn how to store sensitive information in a database the right way. This involves learning about data encryption keys, key encryption keys, key derivation functions, Advanced Encryption Standard (AES) 256-bit symmetric key encryption, and more all while using best software engineering practices. The particular database used in this blog post is Elasticsearch; however, the same cryptographic concepts used in this blog post apply to any relational database or NoSQL database.

This is Part 1 of a two-part series. Part 1 goes into detail about the cryptographic concepts being used and how to store the encrypted data in Elasticsearch or any other database. Part 2 will go over how to fetch and decrypt the stored data.

Prerequisite Cryptographic Concepts

Before diving into the code, the following are the prerequisite cryptographic concepts you must be aware of to follow along with this blog post:

Advanced Encryption Standard (AES) 256-bit – AES is a symmetric key block cipher algorithm for encrypting and decrypting data with a single key (Bernstein & Cobb, 2021).

Data Encryption Key (DEK) – The DEK is a cryptographic key used to encrypt and decrypt the data (“Data Encryption Key”, 2014).

Key Encryption Key (KEK) – The KEK is the cryptographic key used to encrypt and decrypt the DEK (“Key-encryption-key (KEK) – glossary”, n.d.).

Key Derivation Function (KDF) – “A KDF is a cryptographic algorithm designed to generate a secure secret key from a single key value” (“What Are Key Derivation Functions?”, 2023). In a Password-Based KDF (PBKDF), the function takes a password, a salt, a cost (difficulty) setting, and the desired key size, and produces a secret key.

Cryptographic Salt – A salt in cryptography is a random value used to make the hash of a password unique by appending or prepending the salt to the password before computing the hash (“What is a cryptographic salt?”, n.d.).
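
To make the KDF and salt definitions concrete, here is a minimal sketch that uses the standard library's hashlib.scrypt rather than the scrypt package used later in this post. The derive_key helper name and the sample password are illustrative assumptions. It shows that the same password and salt always produce the same key, while a different salt produces a different key.

```python
import hashlib
import secrets


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte (256-bit) key from a password and salt with scrypt."""
    return hashlib.scrypt(
        password.encode("utf-8"),
        salt=salt,
        n=2**12,   # CPU/memory cost
        r=8,       # block size
        p=1,       # parallelization
        dklen=32,  # length of the derived key in bytes
    )


salt_a = secrets.token_bytes(32)
salt_b = secrets.token_bytes(32)

key_a = derive_key("passw0rd", salt_a)
key_a_again = derive_key("passw0rd", salt_a)
key_b = derive_key("passw0rd", salt_b)

print(len(key_a))            # 32
print(key_a == key_a_again)  # True: same password and salt, same key
print(key_a == key_b)        # False: a new salt gives an unrelated key
```

Because the salt is needed to regenerate the key, it has to be stored alongside the encrypted data; it is not a secret.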

How to Securely Store Sensitive Information in a Database like Elasticsearch

Securely storing the sensitive information in a database can be done in eight steps (“How to encypt sensitive data in database of a web app?”, n.d.):

  1. Generate the DEK, which is used for encrypting sensitive data.
  2. Encrypt the sensitive data with the DEK (from step 1).
  3. Base 64 encode the encrypted data (from step 2).
  4. Generate the KEK with the user password and random salt, which is used for encrypting the DEK (from step 1).
  5. Encrypt the DEK (from step 1) with the KEK (from step 4).
  6. Base 64 encode the encrypted DEK (from step 5).
  7. Base 64 encode the KEK salt (random salt from step 4).
  8. Index the document that contains the base 64 encoded encrypted DEK (from step 6), KEK salt (from step 7), encrypted data (from step 3), and any other necessary attributes in Elasticsearch or other database.
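
The base 64 steps exist because a JSON document cannot carry raw bytes. The following sketch illustrates steps 3, 6, 7, and 8 only, under the assumption that random placeholder bytes stand in for the real encrypted SSN, encrypted DEK, and salt. The b64_text helper is a hypothetical name, not part of the code in this post.

```python
import base64
import json
import secrets


def b64_text(raw: bytes) -> str:
    """Base 64 encode raw bytes and return them as an ASCII string."""
    return base64.b64encode(raw).decode("ascii")


# Placeholder bytes; in the real flow these come from AES and the KDF salt.
encrypted_ssn = secrets.token_bytes(11)
encrypted_dek = secrets.token_bytes(32)
kek_salt = secrets.token_bytes(32)

document = {
    "ssn": b64_text(encrypted_ssn),
    "dek": b64_text(encrypted_dek),
    "kek_salt": b64_text(kek_salt),
}

# The document now serializes cleanly and the bytes survive a round trip.
payload = json.dumps(document)
restored = json.loads(payload)

print(base64.b64decode(restored["dek"]) == encrypted_dek)  # True
```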

It’s also important to note how the KEK is handled when decryption occurs, which is discussed in more detail in the next blog post. The KEK can either be regenerated when the user logs in and kept in the user session, or it can be stored. If it is stored, it must be kept outside the database containing the encrypted DEK and encrypted data to increase security. For example, the KEK can be stored in a key management system.

A Proof of Concept

The Proof of Concept (PoC) in this blog post will be storing user documents that contain a username, hashed password, and an encrypted Social Security Number (SSN) in Elasticsearch in an index called user.

Designing the Proof of Concept

A logically cohesive class called Crypto will contain class methods that perform various cryptographic algorithms, such as encrypting data, decrypting data, and so on (Note: If you don’t know what cohesion is, then you can read my blog post on The 7 Types of Cohesion You Need To Know To Be The Best Software Engineer).

A model class called User with the properties id, username, password, and SSN will be created when fetching and storing users into the Elasticsearch database.

A Data Access Object (DAO) is used to interface with the Elasticsearch database when storing a user model object as a document in Elasticsearch and retrieving the user document and marshaling it into a user model object.

A singleton class called ElasticsearchDb wraps the Elasticsearch client object, which is used to get a database connection to Elasticsearch, so that the application shares a single instance.

The Code for the Proof of Concept with Explanation

The Crypto Class

First, let’s go over the code for the Crypto class, which is stored in crypto.py:


from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes

import scrypt
import bcrypt

class Crypto:
    # AES Key Size is 32 bytes or 256 bits.
    AES_KEY_SIZE = 32
    
    # Salt length is 32 bytes.
    SALT_LENGTH = 32

    # Generates a Cryptographically secure random sequence of bytes.
    @classmethod
    def random_bytes(cls, key_size):
        return get_random_bytes(key_size)

    # Encrypts data given an AES key.
    @classmethod
    def encrypt(cls, data, key):
        cipher = AES.new(key, AES.MODE_EAX)
        ciphertext, tag = cipher.encrypt_and_digest(data)
        nonce = cipher.nonce

        return (ciphertext,tag,nonce)

    # KDF generating a key given a password and salt.
    @classmethod
    def key_derivation_function(cls, password, salt):
        # Arguments passed to scrypt.hash, in order:
        # arg0 - password
        # arg1 - salt
        # arg2 - N, the CPU/memory cost parameter
        # arg3 - r, the block size
        # arg4 - p, the parallelization parameter
        # arg5 - length of the derived key in bytes

        return scrypt.hash(password, salt, 2**12, 2**3, 1, 32)
    
    # Hash a Password
    @classmethod
    def hash_password(cls, password):
        password_bytes = bytes(password, 'ascii')
        salt = bcrypt.gensalt()

        hash = bcrypt.hashpw(password_bytes, salt)

        return hash
    

In order for this class to run properly you must install the scrypt, bcrypt, and pycryptodome packages using pip. The following pip command will install the required packages for this class:

$ pip install scrypt bcrypt pycryptodome

Let’s investigate each method of this class in more detail.

First, let’s examine the random_bytes method, which has the following implementation:

    # Generates a Cryptographically secure random sequence of bytes.
    @classmethod
    def random_bytes(cls, key_size):
        return get_random_bytes(key_size)

This method calls the get_random_bytes function, which draws key_size random bytes from a cryptographically secure random number generator (“What is the difference between Pycrypto’s Random.get_random_bytes and a simple random byte generator?”, n.d.).
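
As a rough illustration of why the generator must be cryptographically secure, the sketch below contrasts Python's seeded random module with the standard library's secrets module. Both are stand-ins chosen for illustration, and predictable_bytes is a hypothetical helper; the post's own code uses get_random_bytes from PyCryptodome.

```python
import random
import secrets


def predictable_bytes(seed: int, size: int) -> bytes:
    """Generate bytes from a seeded, non-cryptographic generator."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(size))


# Anyone who learns or guesses the seed can reproduce the same "key".
first = predictable_bytes(1234, 32)
second = predictable_bytes(1234, 32)
print(first == second)  # True

# secrets draws from the operating system's secure random source instead.
dek = secrets.token_bytes(32)
print(len(dek))  # 32
```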

Next, the following is the implementation of the encrypt method:

    # Encrypts data given an AES key.
    @classmethod
    def encrypt(cls, data, key):
        cipher = AES.new(key, AES.MODE_EAX)
        ciphertext, tag = cipher.encrypt_and_digest(data)
        nonce = cipher.nonce

        return (ciphertext,tag,nonce)

The first line of this method constructs an AES cipher object from the symmetric key and the mode of AES encryption, which is MODE_EAX in this case (“AES encryption & decryption in Python: Implementation, modes & key management”, n.d.). EAX stands for encrypt-then-authenticate-then-translate, which means it performs authenticated encryption (“Choice of authenticated encryption mode for whole messages”, n.d.).

The second line invokes the encrypt_and_digest method with the data to encrypt. It returns the cipher text and the authentication tag, also known as the Message Authentication Code (MAC), which is used to verify that the data is authentic and has not been tampered with (Dennis, 2023; “What is a Message Authentication Code (MAC)?”, n.d.).

The third line stores the cipher’s nonce in a variable. The nonce is an arbitrary number that ensures the same plaintext does not produce the same cipher text each time it is encrypted (Hinch, 2023).

The fourth line returns the cipher text, tag, and nonce as a tuple.
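
To illustrate what an authentication tag is for, here is a small sketch using HMAC-SHA256 from the standard library. This is a stand-in for illustration only: EAX computes its tag with a different construction, and the compute_tag and verify_tag helpers are hypothetical names. The idea carries over, though: any change to the protected bytes makes the tag check fail.

```python
import hashlib
import hmac
import secrets

mac_key = secrets.token_bytes(32)
message = b"example ciphertext bytes"  # placeholder for real cipher text


def compute_tag(key: bytes, data: bytes) -> bytes:
    """Compute a MAC over the data with the given key."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verify_tag(key: bytes, data: bytes, tag: bytes) -> bool:
    """Recompute the MAC and compare it in constant time."""
    return hmac.compare_digest(compute_tag(key, data), tag)


tag = compute_tag(mac_key, message)

# Flip a single bit in the first byte to simulate tampering.
tampered = bytes([message[0] ^ 0x01]) + message[1:]

print(verify_tag(mac_key, message, tag))   # True
print(verify_tag(mac_key, tampered, tag))  # False
```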

Next, let’s examine the code for the key_derivation_function, which is the following:

    # KDF generating a key given a password and salt.
    @classmethod
    def key_derivation_function(cls, password, salt):
        # Arguments passed to scrypt.hash, in order:
        # arg0 - password
        # arg1 - salt
        # arg2 - N, the CPU/memory cost parameter
        # arg3 - r, the block size
        # arg4 - p, the parallelization parameter
        # arg5 - length of the derived key in bytes

        return scrypt.hash(password, salt, 2**12, 2**3, 1, 32)

The key_derivation_function performs a deliberately slow, memory-intensive hash to generate a secure symmetric AES KEK. In particular, the KDF being used here, scrypt, is a secure PBKDF. For more information about the arguments being passed to the scrypt.hash function, you can check out the following article: https://blog.boot.dev/cryptography/very-basic-intro-to-the-scrypt-hash/.

Next, let’s examine the code for the hash_password function:

    # Hash a Password
    @classmethod
    def hash_password(cls, password):
        password_bytes = bytes(password, 'ascii')
        salt = bcrypt.gensalt()

        hash = bcrypt.hashpw(password_bytes, salt)

        return hash

This hash_password method implementation uses bcrypt. The bcrypt hash function is another secure and deliberately time-consuming hash function, and it stores the hashed password and salt in the same string. Increasing the time it takes to compute a single hash increases security because it makes cracking the password through brute force infeasible (Grigutytė, 2023).
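
The idea of keeping the cost setting, salt, and hash together in one string can be sketched with the standard library. The example below is an assumption-laden stand-in, not bcrypt: it uses PBKDF2-HMAC-SHA256, a made-up "iterations$salt$hash" format, and hypothetical helper names. It shows how a stored string carries everything needed to verify a login attempt later.

```python
import base64
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # illustrative cost setting, not a recommendation


def hash_password(password: str) -> str:
    """Hash a password and pack the cost, salt, and hash into one string."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=32
    )
    salt_b64 = base64.b64encode(salt).decode("ascii")
    digest_b64 = base64.b64encode(digest).decode("ascii")
    return f"{ITERATIONS}${salt_b64}${digest_b64}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash from the stored cost and salt, then compare."""
    iterations_text, salt_b64, digest_b64 = stored.split("$")
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, int(iterations_text),
        dklen=len(expected),
    )
    return hmac.compare_digest(candidate, expected)


stored = hash_password("passw0rd")
print(verify_password("passw0rd", stored))  # True
print(verify_password("password", stored))  # False
```

With bcrypt itself, the equivalent check is done by passing the candidate password and the stored hash to bcrypt.checkpw.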

The ElasticsearchDb Class

The following is the code for the ElasticsearchDb class, which is stored in elasticsearchdb.py:

from elasticsearch import Elasticsearch
import os

class ElasticsearchDb:
    def __new__(cls, host, port, protocol="http", ca_certs=None, username=None, password=None):
        if not hasattr(cls, 'instance'):
            cls.instance = super(ElasticsearchDb, cls).__new__(cls)

        return cls.instance

    def __init__(self, host, port, protocol="http", ca_certs=None, username=None, password=None):
        if protocol is None:
            raise Exception("Protocol cannot be None.")

        if host is None:
            raise Exception ("Host cannot be None.")
        
        if port is None:
            raise Exception("Port cannot be None.")
        
        protocol = protocol.lower()

        if protocol not in ["http", "https"]:
            raise Exception("Invalid protocol: it must be http or https.")

        kwargs = {}

        if username is not None and password is not None:
            kwargs["basic_auth"] = (username, password)

        if ca_certs is not None:
            kwargs["ca_certs"] = ca_certs
        
        self.__es = Elasticsearch(str(protocol) + "://" + str(host) + ":" + str(port), **kwargs)

    
    @property
    def elasticsearch(self) -> Elasticsearch:
        return self.__es

The above class follows the Singleton Object Oriented Programming (OOP) design pattern, which means that only one instance of the object can be instantiated during execution of the application.

For the above code to execute correctly, it needs the elasticsearch dependency, which can be installed with pip as follows:

$ pip install elasticsearch

The following code ensures that the class is a Singleton (“Singleton pattern in python – A complete guide”, 2020):

    def __new__(cls, host, port, protocol="http", ca_certs=None, username=None, password=None):
        if not hasattr(cls, 'instance'):
            cls.instance = super(ElasticsearchDb, cls).__new__(cls)

        return cls.instance
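
One detail of this pattern is worth seeing in isolation. The sketch below uses a hypothetical Connection class, not the ElasticsearchDb class itself, to show that __new__ returns the same object every time, yet Python still runs __init__ on every call. GuardedConnection shows one possible way to skip repeated initialization, assuming that is the behavior you want.

```python
class Connection:
    init_calls = 0

    def __new__(cls, url):
        # Create the instance only the first time the class is called.
        if not hasattr(cls, "instance"):
            cls.instance = super().__new__(cls)
        return cls.instance

    def __init__(self, url):
        # Runs on every call, even though the object is reused.
        Connection.init_calls += 1
        self.url = url


class GuardedConnection:
    def __new__(cls, url):
        if not hasattr(cls, "instance"):
            cls.instance = super().__new__(cls)
        return cls.instance

    def __init__(self, url):
        # Skip re-initialization once the shared instance is set up.
        if getattr(self, "_initialized", False):
            return
        self._initialized = True
        self.url = url


first = Connection("http://localhost:9200")
second = Connection("http://localhost:9201")
print(first is second)        # True
print(Connection.init_calls)  # 2
print(first.url)              # http://localhost:9201

guarded_first = GuardedConnection("http://localhost:9200")
guarded_second = GuardedConnection("http://localhost:9201")
print(guarded_second.url)     # http://localhost:9200
```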

The following code sets the __es attribute of the ElasticsearchDb class, which is a connection to an Elasticsearch cluster:

    def __init__(self, host, port, protocol="http", ca_certs=None, username=None, password=None):
        if protocol is None:
            raise Exception("Protocol cannot be None.")

        if host is None:
            raise Exception ("Host cannot be None.")
        
        if port is None:
            raise Exception("Port cannot be None.")
        
        protocol = protocol.lower()

        if protocol not in ["http", "https"]:
            raise Exception("Invalid protocol: it must be http or https.")

        kwargs = {}

        if username is not None and password is not None:
            kwargs["basic_auth"] = (username, password)

        if ca_certs is not None:
            kwargs["ca_certs"] = ca_certs
        
        self.__es = Elasticsearch(str(protocol) + "://" + str(host) + ":" + str(port), **kwargs)

The host and port arguments represent the host of the Elasticsearch cluster and REST port it is running on. The remaining keyword arguments are for configuring the security of the connection, such as HTTPS, certificate authority certificates, and basic authentication. You can learn more about the significance of these parameters by reading my previous blog post on Securely and Programmatically Accessing Elasticsearch with curl and Python.

The User Model Class

The following is the code for the user model class stored in model.py:

class User:
    def __init__(self, id, username, password, ssn):
        self.__id = id
        self.__username = username
        self.__password = password
        self.__ssn = ssn
    
    @property
    def id(self) -> str:
        return self.__id

    @property
    def username(self) -> str:
        return self.__username

    @property
    def password(self) -> str:
        return self.__password

    @property
    def ssn(self) -> str:
        return self.__ssn

This class is pretty self-explanatory. The User model object encapsulates all the information about a user: an id, username, password, and SSN. The class exposes each value through a read-only property.
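
As a quick illustration of what the properties provide, the sketch below uses a trimmed-down, hypothetical ReadOnlyUser class that follows the same pattern. It shows that a property without a setter rejects assignment, and that the double-underscore attributes are only name-mangled, not truly hidden.

```python
class ReadOnlyUser:
    def __init__(self, username, ssn):
        self.__username = username
        self.__ssn = ssn

    @property
    def username(self) -> str:
        return self.__username

    @property
    def ssn(self) -> str:
        return self.__ssn


user = ReadOnlyUser("gary.drocella", "000-00-0000")

# A property with no setter raises AttributeError on assignment.
try:
    user.ssn = "111-11-1111"
    assignment_blocked = False
except AttributeError:
    assignment_blocked = True

print(assignment_blocked)       # True
print(user.ssn)                 # 000-00-0000

# Name mangling stores the value under _ReadOnlyUser__ssn.
print(user._ReadOnlyUser__ssn)  # 000-00-0000
```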

The UserDao Class

The following is the code for the UserDao class stored in dao.py:

from model import User
from crypto import Crypto

import base64
from elasticsearchdb import ElasticsearchDb

class UserDao:

    def __init__(self, es):
        self.__es = es

    def insert_user(self, user:User):
        # 1. Generate the Data Encryption Key - used for encrypting user data.
        data_encryption_key = Crypto.random_bytes(Crypto.AES_KEY_SIZE)

        # 2. Encrypt the sensitive user data.
        encrypted_user_ssn, ssn_tag, ssn_nonce = Crypto.encrypt(bytes(user.ssn, "ascii"), data_encryption_key)

        # 3. Base 64 Encode the sensitive and encrypted user data and the MAC and nonce.
        b64_encrypted_user_ssn = base64.b64encode(encrypted_user_ssn).decode("ascii")
        b64_ssn_tag = base64.b64encode(ssn_tag).decode("ascii")
        b64_ssn_nonce = base64.b64encode(ssn_nonce).decode("ascii")

        # 4. Generate the Key Encryption Key - used for encrypting the data encryption key.
        kek_salt = Crypto.random_bytes(Crypto.SALT_LENGTH)
        key_encryption_key = Crypto.key_derivation_function(user.password, kek_salt)

        # 5. Encrypt the Data Encryption Key with the Key Encryption Key
        encrypted_dek, dek_tag, dek_nonce = Crypto.encrypt(data_encryption_key, key_encryption_key)

        # 6. Base 64 Encode The Data Encryption Key and the MAC and nonce
        b64_encrypted_dek = base64.b64encode(encrypted_dek).decode("ascii")
        b64_dek_tag = base64.b64encode(dek_tag).decode("ascii")
        b64_dek_nonce = base64.b64encode(dek_nonce).decode("ascii")

        # 7. Hash the password
        password_hash = Crypto.hash_password(user.password).decode("ascii")

        # 8. Base 64 Encode the KEK salt        
        b64_kek_salt = base64.b64encode(kek_salt).decode("ascii")

        # 9. Store the document in Elasticsearch.
        resp = self.__es.index(index="user", id=user.id, document={
            "username" : user.username,
            "password" : password_hash,
            "ssn" : b64_encrypted_user_ssn,
            "ssn_tag": b64_ssn_tag,
            "ssn_nonce" : b64_ssn_nonce,
            "dek" : b64_encrypted_dek,
            "dek_tag" : b64_dek_tag,
            "dek_nonce" : b64_dek_nonce,
            "kek_salt" : b64_kek_salt
        })

        return resp

If you read the above code along with its comments, it’s pretty self-explanatory. One important thing that has not been mentioned yet is that the MAC and nonce are stored in the user document because they are needed for decrypting the sensitive encrypted data and the encrypted DEK (Note: the SSN has its own MAC and nonce, and the DEK has its own separate MAC and nonce).

Running the Proof of Concept

The following is some code that will run the aforementioned code and store a user document with sensitive information, which is the SSN (Note: The SSN used in this blog post is fake):

from model import User
from dao import UserDao
from elasticsearchdb import ElasticsearchDb

from dotenv import load_dotenv
import os

def get_user_dao():    
    es_password = os.environ.get("ELASTIC_PASSWORD")

    esdb = ElasticsearchDb("localhost", 9200, 
        protocol="https", 
        # Expand "~" explicitly because Python does not expand it in file paths.
        ca_certs=os.path.expanduser("~/elasticsearch-8.11.2/config/certs/http_ca.crt"),
        username="elastic",
        password=es_password
    )

    es = esdb.elasticsearch
    userDao = UserDao(es)
    return userDao

def register_new_user(uid, username, password, ssn):
    userDao = get_user_dao()
    u = User(uid, username, password, ssn)
    resp = userDao.insert_user(u)
    
    return resp

def main():
    load_dotenv()
    es_response = register_new_user(1, "gary.drocella", "passw0rd", "000-00-0000") # insert user with fake SSN
    print(es_response)

if __name__ == "__main__":
    main()

Now, when you query for the user document stored on the Elasticsearch cluster, this is what a sample user document looks like:

{
  "_index": "user",
  "_id": "1",
  "_version": 5,
  "_seq_no": 14,
  "_primary_term": 5,
  "found": true,
  "_source": {
    "username": "gary.drocella",
    "password": "$2b$12$eS8ObWbYlcjtPBIB51JvluNnU5VSLC42HuEBpPu6.Yvj/8JVxyyjm",
    "ssn": "LLJCF1lrwSYpw+s=",
    "ssn_tag": "+oXWPSSjkQHx2P7FvRTitw==",
    "ssn_nonce": "WWFI+RGUEiXTNGqMgSkAAw==",
    "dek": "49ffEKw8Dut23QvGsyhI5QpMlyGboC81FR5FRKvjnlg=",
    "dek_tag": "XV3JA+E88Akfk+cL4jp+mg==",
    "dek_nonce": "e0rfaYUk5FDsV1ok0C5wrA==",
    "kek_salt": "ZlC9fQ1u4xowftR8FUINQzuJc1pO1hbtGb9WG6qWR+E="
  }
}
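
As a sanity check on the sample document, the following sketch base 64 decodes each stored field and reports its length in bytes. The field values are copied from the sample above; the source and sizes variable names are illustrative.

```python
import base64

# Field values copied from the sample user document above.
source = {
    "ssn": "LLJCF1lrwSYpw+s=",
    "ssn_tag": "+oXWPSSjkQHx2P7FvRTitw==",
    "ssn_nonce": "WWFI+RGUEiXTNGqMgSkAAw==",
    "dek": "49ffEKw8Dut23QvGsyhI5QpMlyGboC81FR5FRKvjnlg=",
    "dek_tag": "XV3JA+E88Akfk+cL4jp+mg==",
    "dek_nonce": "e0rfaYUk5FDsV1ok0C5wrA==",
    "kek_salt": "ZlC9fQ1u4xowftR8FUINQzuJc1pO1hbtGb9WG6qWR+E=",
}

sizes = {field: len(base64.b64decode(value)) for field, value in source.items()}

for field, size in sizes.items():
    print(field, size)
# ssn 11
# ssn_tag 16
# ssn_nonce 16
# dek 32
# dek_tag 16
# dek_nonce 16
# kek_salt 32
```

The encrypted SSN is 11 bytes, the same length as the plaintext "000-00-0000", while the DEK and KEK salt are 32 bytes each, matching AES_KEY_SIZE and SALT_LENGTH in the Crypto class. Each tag and nonce is 16 bytes.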

This concludes Part 1 of this blog post. If you found this blog post helpful, then share, subscribe to my blog, and buy me a coffee! Part 2 of this two-part blog post can be found here.

References

AES encryption & decryption in Python: Implementation, modes & key management. (n.d.). Onboardbase.com. Retrieved January 4, 2024, from https://onboardbase.com/blog/aes-encryption-decryption/

AES-GCM authenticated encryption. (2023, January 1). Cryptosys.net. https://www.cryptosys.net/pki/manpki/pki_aesgcmauthencryption.html

Bernstein, C., & Cobb, M. (2021, September 24). Advanced Encryption Standard (AES). Security; TechTarget. https://www.techtarget.com/searchsecurity/definition/Advanced-Encryption-Standard

Choice of authenticated encryption mode for whole messages. (n.d.). Cryptography Stack Exchange. Retrieved January 4, 2024, from https://crypto.stackexchange.com/questions/18860/choice-of-authenticated-encryption-mode-for-whole-messages

Data Encryption Key. (2014). Techopedia.com. https://www.techopedia.com/definition/5660/data-encryption-key-dek

Dennis, Y. (2023, February 21). PyCryptodome: Secure your data with ease. Geek Culture. https://medium.com/geekculture/pycryptodome-secure-your-data-with-ease-4d70817fae7

Grigutytė, M. (2023, June 16). What is bcrypt and how does it work? NordVPN. https://nordvpn.com/blog/what-is-bcrypt/

Hinch, D. (2023, October 3). Understanding Nonces and Their Use in AES-GCM. Linkedin.com. https://www.linkedin.com/pulse/understanding-nonces-use-aes-gcm-derek-hinch/

How to encypt sensitive data in database of a web app? (n.d.). Information Security Stack Exchange. Retrieved January 4, 2024, from https://security.stackexchange.com/questions/166286/how-to-encypt-sensitive-data-in-database-of-a-web-app/166288

Key-encryption-key (KEK) – glossary. (n.d.). Nist.gov. Retrieved January 4, 2024, from https://csrc.nist.gov/glossary/term/key_encryption_key

Singleton pattern in python – A complete guide. (2020, October 30). GeeksforGeeks. https://www.geeksforgeeks.org/singleton-pattern-in-python-a-complete-guide/

What Are Key Derivation Functions? (2023, June 22). Baeldung.com. https://www.baeldung.com/cs/kdf-cryptography

What is a cryptographic “salt”? (n.d.). Cryptography Stack Exchange. Retrieved January 4, 2024, from https://crypto.stackexchange.com/questions/1776/what-is-a-cryptographic-salt

What is a Message Authentication Code (MAC)? (n.d.). Fortinet. Retrieved January 4, 2024, from https://www.fortinet.com/resources/cyberglossary/message-authentication-code

What is the difference between Pycrypto’s Random.get_random_bytes and a simple random byte generator? (n.d.). Stack Overflow. Retrieved January 4, 2024, from https://stackoverflow.com/questions/22395478/what-is-the-difference-between-pycryptos-random-get-random-bytes-and-a-simple-r