Encrypted Databases: a Private Low-Level Storage Model
We propose a model for database encryption such that user data remains private if the storage server peeks, and is tamper-evident if the storage server pokes, yet can be incrementally updated on a per-row basis. The only harm a storage server can do is deny data availability, which can be mitigated by using multiple independent servers as redundant backup storage.
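The availability remedy mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: `StorageServer`, `replicated_put` and `replicated_get` are hypothetical names, servers are simulated in memory, and the `verify` callback stands for whatever integrity check the protocol ends up using. A server that goes down, or serves a tampered entry, only forces the client to try the next replica.

```python
from typing import Callable, Dict, List, Optional


class StorageServer:
    """Hypothetical untrusted key-value server that may become unreachable."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        self.data[key] = value

    def get(self, key: bytes) -> bytes:
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.data[key]


def replicated_put(servers: List[StorageServer], key: bytes, value: bytes) -> int:
    """Write to every reachable server and return how many accepted the write."""
    accepted = 0
    for server in servers:
        try:
            server.put(key, value)
            accepted += 1
        except ConnectionError:
            pass  # a down server costs availability, not privacy or integrity
    return accepted


def replicated_get(
    servers: List[StorageServer],
    key: bytes,
    verify: Callable[[bytes, bytes], bool],
) -> Optional[bytes]:
    """Return the first copy that is reachable and passes verification."""
    for server in servers:
        try:
            value = server.get(key)
        except (ConnectionError, KeyError):
            continue  # unreachable, or this replica lacks the key
        if verify(key, value):
            return value
        # verification failed: treat this copy as tampered and try the next one
    return None
```

For instance, after `replicated_put([a, b], b"k", b"ct")`, setting `a.online = False` still lets `replicated_get([a, b], b"k", verify)` return `b"ct"` from `b`.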
With the encryption protocol described below, a high-level database, which can have an arbitrarily complex schema with many tables, relations, indexes, etc., is implemented on top of a low-level database, which can be a simple binary key-value store with fixed-size keys.
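One plausible way to bridge the two levels is to map each high-level row identifier to an opaque fixed-size key with a keyed hash. The sketch below is an assumption, not the protocol's actual mapping: `low_level_key`, `FixedKeyStore`, the 32-byte `KEY_SIZE` and the secret `index_key` are all hypothetical, and keyed BLAKE2b from Python's standard library stands in for HF. Because the key is secret, the server sees uniformly random-looking keys and cannot tell which table or row they belong to.

```python
import hashlib
import struct
from typing import Dict, Optional

KEY_SIZE = 32  # assumed fixed key length, in bytes, of the low-level store


def low_level_key(index_key: bytes, table: str, primary_key: bytes) -> bytes:
    """Map a high-level (table, primary key) pair to an opaque fixed-size key.

    Keyed BLAKE2b stands in for HF; index_key is a hypothetical secret derived
    from the user's passphrase, so the server cannot recompute the mapping.
    """
    h = hashlib.blake2b(digest_size=KEY_SIZE, key=index_key)
    for part in (table.encode("utf-8"), primary_key):
        # Length-prefix each component so ("ab", b"c") and ("a", b"bc") differ.
        h.update(struct.pack(">I", len(part)))
        h.update(part)
    return h.digest()


class FixedKeyStore:
    """Minimal in-memory stand-in for the low-level binary key-value store."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        if len(key) != KEY_SIZE:
            raise ValueError(f"keys must be exactly {KEY_SIZE} bytes")
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)
```

Index entries of the high-level database could be mapped the same way, for example under a hypothetical table name such as `"users_by_email"`.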
Row-Level Database Encryption
We want data to be encrypted at the level of every “row”, so that the database can be updated incrementally: each transaction only needs to modify the entries at stake (and the corresponding index entries), and no others. In other words, a local change in the high-level database, as expressed in the abstract data model that users care about, leads to a local change in the low-level data held in the encrypted data store.
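A minimal sketch of row-level encryption, showing that updating one row rewrites exactly one store entry. Since Python's standard library has no ChaCha20 or AES, this uses a toy substitute for SE: HMAC-SHA256 as a pseudorandom function in counter mode. The names `encrypt_row`, `decrypt_row` and `NONCE_SIZE` are hypothetical, and the readable store keys in the demo are placeholders for opaque fixed-size keys. The sketch provides confidentiality only; CTR ciphertexts are malleable, so integrity has to come from a separate mechanism.

```python
import hashlib
import hmac
import os

NONCE_SIZE = 16  # assumed nonce length in bytes


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256(key, nonce || counter) for counter = 0, 1, ..."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_row(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt one row independently; output is nonce || ciphertext."""
    nonce = os.urandom(NONCE_SIZE)  # fresh per write, so keystreams never repeat
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt_row(key: bytes, blob: bytes) -> bytes:
    """Invert encrypt_row. No integrity check is performed here."""
    nonce, body = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


# Demo: updating one row rewrites exactly one store entry.
row_key = os.urandom(32)
store = {
    b"row-1": encrypt_row(row_key, b"alice,30"),
    b"row-2": encrypt_row(row_key, b"bob,25"),
}
before = dict(store)
store[b"row-1"] = encrypt_row(row_key, b"alice,31")
changed = [k for k in store if store[k] != before[k]]
# changed == [b"row-1"]; the ciphertext of row-2 is untouched
```

Note that the ciphertext length equals the plaintext length plus the nonce, which is why row sizes remain visible to the storage server.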
Attack Model
- The user trusts the code the local client is running.
- Each user alone knows their passphrase, which is only ever used on their trusted local client.
- The database is only usable if you know (one of) the users’ passphrases.
- An attacker who steals the encrypted database content can learn nothing about the plain text content except the number of rows and the distribution of row data sizes.
- An attacker who can watch database updates can learn nothing about the plain text content except the size of updates, which keys have been mutated (and thus correspond to mutable data), and which keys have been added, modified or removed in the same update.
- An attacker who can tamper with the data cannot do so without the tampering becoming evident when the user tries to use the part of the data that was tampered with.
- The data can thus be stored on remote servers that are not trusted to remain uncompromised. The user only needs to trust that at least one of his locally stored or remotely served copies is up-to-date.
- If a user resumes his activity from an out-of-date copy of his data, he may indeed lose recent updates, but his subsequent activity will not expose him to differential replay attacks between the multiple histories that branch from the resume point.
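The tamper-evidence requirement can be illustrated with a per-entry authentication tag that binds each encrypted blob to the key it is stored under. This is a sketch under assumptions, not the protocol's construction: HMAC-SHA256 from Python's standard library stands in for whatever the protocol builds from HF and AS, and `seal`, `unseal` and `TamperError` are hypothetical names. A modified blob, or a valid blob moved to another key, fails verification on read. Rolling an entry back to an older valid value under the same key is not detected here; that matches the attack model, where the user trusts that at least one copy is up-to-date.

```python
import hashlib
import hmac

TAG_SIZE = 32  # HMAC-SHA256 output length in bytes


class TamperError(Exception):
    """Raised when a stored entry fails authentication."""


def _tag(mac_key: bytes, low_key: bytes, blob: bytes) -> bytes:
    # low_key is assumed fixed-size, so low_key || blob is an unambiguous encoding.
    return hmac.new(mac_key, low_key + blob, hashlib.sha256).digest()


def seal(mac_key: bytes, low_key: bytes, blob: bytes) -> bytes:
    """Append a tag binding the encrypted blob to the key it is stored under."""
    return blob + _tag(mac_key, low_key, blob)


def unseal(mac_key: bytes, low_key: bytes, sealed: bytes) -> bytes:
    """Return the blob if authentic; raise TamperError otherwise."""
    if len(sealed) < TAG_SIZE:
        raise TamperError("entry too short")
    blob, tag = sealed[:-TAG_SIZE], sealed[-TAG_SIZE:]
    # Constant-time comparison avoids leaking how many tag bytes matched.
    if not hmac.compare_digest(tag, _tag(mac_key, low_key, blob)):
        raise TamperError("entry was modified or moved")
    return blob
```

Checking the tag only when an entry is read matches the requirement that tampering becomes evident when the user uses the affected part of the data.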
Cryptographic Primitives
The protocol assumes the following cryptographic primitives:
- KD, a password-based key derivation function.
  Consider enough iterations of PBKDF2-HMAC-SHA256, or a memory-hard function such as scrypt or Argon2id, etc.
- SE, a symmetric cipher used in CTR (counter) mode.
  Consider ChaCha20, AES-256, etc.
- HF, a cryptographic hash function.
  Consider BLAKE3, etc.
- AS, a public-key (asymmetric) signature scheme.
  Consider ECDSA over secp256k1, etc.
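As a hedged illustration of how KD and HF might combine, the sketch below derives a master key from a passphrase with PBKDF2-HMAC-SHA256 (available in Python's `hashlib`) and then splits it into domain-separated subkeys with keyed BLAKE2b, substituted for BLAKE3, which the standard library lacks. The function names, the 600,000-iteration default, the purpose labels and the idea of storing the salt in the clear next to the database are all assumptions for illustration; the iteration count should be tuned to the target hardware. SE and AS are not shown, since the standard library offers neither.

```python
import hashlib
import os

SALT_SIZE = 16  # assumed salt length in bytes


def derive_master_key(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """KD: PBKDF2-HMAC-SHA256; the default iteration count is an assumption."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


def derive_subkey(master_key: bytes, purpose: str) -> bytes:
    """Domain-separated 32-byte subkey via keyed BLAKE2b (stand-in for BLAKE3)."""
    return hashlib.blake2b(
        purpose.encode("utf-8"), key=master_key, digest_size=32
    ).digest()


# Usage: a random salt, presumably stored unencrypted alongside the database.
salt = os.urandom(SALT_SIZE)
master = derive_master_key("correct horse battery staple", salt)
enc_key = derive_subkey(master, "row-encryption")
mac_key = derive_subkey(master, "row-authentication")
index_key = derive_subkey(master, "key-mapping")
```

Deriving separate subkeys per purpose means a key used for encryption is never reused for authentication or key mapping.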