The title may sound as a paper title, rather than a blogpost, because it was originally an idea for such, but I’m unlikely to find the time to put a proper paper about it, so here it is – a blogpost.
Blockchain has been touted as the ultimate integrity guarantee – if you “have blockchain”, nobody can tamper with your data. Of course, reality is more complicated, and even in the most distributed of ledgers, there are known attacks. But most organizations that are experimenting with blockchain, rely on a private network, sometimes having themselves as the sole owner of the infrastructure, and sometimes sharing it with just a few partners.
The point of having the technology in the first place is to guarantee that once collected, data cannot be tampered with. So let’s review how that works in practice.
First, we have two define two terms – “tamper-resistant” (sometimes referred to as tamper-free) and “tamper-evident”. “Tamper-resistant” means nobody can ever tamper with the data and the state of the data structure is always guaranteed to be without any modifications. “Tamper-evident”, on the other hand, means that a data structure can be validated for integrity violations, and it will be known that there have been modifications (alterations, deletions or back-dating of entries). Therefore, with tamper-evident structures you can prove that the data is intact, but if it’s not intact, you can’t know the original state. It’s still a very important property, as the ability to prove that data is not tampered with is crucial for compliance and legal aspects.
Blockchain is usually built ontop of several main cryptographic primitives: cryptographic hashes, hash chains, Merkle trees, cryptographic timestamps and digital signatures. They all play a role in the integrity guarantees, but the most important ones are the Merkle tree (with all of its variations, like a Patricia Merkle tree) and the hash chain. The original bitcoin paper describes a blockchain to be a hash chain, based on the roots of multiple Merkle trees (which form a single block). Some blockchains rely on a single, ever-growing merkle tree, but let’s not get into particular implementation details.
In all cases, blockchains are considered tamper-resistant because their significantly distributed in a way that enough number of members have a copy of the data. If some node modifies that data, e.g. 5 blocks in the past, it has to prove to everyone else that this is the correct merkle root for that block. You have to have more than 50% of the network capacity in order to do that (and it’s more complicated than just having them), but it’s still possible. In a way, tamper resistance = tamper evidence + distributed data.
But many of the practical applications of blockchain rely on private networks, serving one or several entities. They are often based on proof of authority, which means whoever has access to a set of private keys, controls what the network agree on. So let’s review the two cases:
- Multiple owners – in case of multiple node owners, several of them can collude to rewrite the chain. The collusion can be based on mutual business interest (e.g. in a supply chain, several members may team up against the producer to report distorted data), or can be based on security compromise (e.g. multiple members are hacked by the same group). In that case, the remaining node owners can have a backup of the original data, but finding out whether the rest were malicious or the changes were legitimate part of the business logic would require a complicated investigation.
- Single owner – a single owner can have a nice Merkle tree or hash chain, but an admin with access to the underlying data store can regenerate the whole chain and it will look legitimate, while in reality it will be tampered with. Splitting access between multiple admins is one approach (or giving them access to separate nodes, none of whom has access to a majority), but they often drink beer together and collusion is again possible. But more importantly – you can’t prove to a 3rd party that your own employees haven’t colluded under orders from management in order to cover some tracks to present a better picture to a regulator.
In the case of a single owner, you don’t even have a tamper-evident structure – the chain can be fully rewritten and nobody will understand that. In case of multiple owners, it depends on the implementation. There will be a record of the modification at the non-colluding party, but proving which side “cheated” would be next to impossible. Tamper-evidence is only partially achieved, because you can’t prove whose data was modified and whose data hasn’t (you only know that one of the copies has tampered data).
In order to achieve tamper-evident structure with both scenarios is to use anchoring. Checkpoints of the data need to be anchored externally, so that there is a clear record of what has been the state of the chain at different points in time. Before blockchain, the recommended approach was to print it in newspapers (e.g. as an ad) and because it has a large enough circulation, nobody can collect all newspapers and modify the published checkpoint hash. This published hash would be either a root of the Merkle tree, or the latest hash in a hash chain. An ever-growing Merkle tree would allow consistency and inclusion proofs to be validated.
When we have electronic distribution of data, we can use public blockchains to regularly anchor our internal ones, in order to achieve proper tamper-evident data. We, at LogSentinel, for example, do exactly that – we allow publishing the latest Merkle root and the latest hash chain to Ethereum. Then even if those with access to the underlying datastore manage to modify and regenerate the entire chain/tree, there will be no match with the publicly advertised values.
How to store data on publish blockchains is a separate topic. In case of Ethereum, you can put any payload within a transaction, so you can put that hash in low-value transactions between two own addresses (or self-transactions). You can use smart-contracts as well, but that’s not necessary. For Bitcoin, you can use OP_RETURN. Other implementations may have different approaches to storing data within transactions.
If we want to achieve tamper-resistance, we just need to have several copies of the data, all subject to tamper-evidence guarantees. Just as in a public network. But what a public network gives is is a layer, which we can trust with providing us with the necessary piece for achieving local tamper evidence. Of course, going to hardware, it’s easier to have write-only storage (WORM, write once, ready many). The problem with it, is that it’s expensive and that you can’t reuse it. It’s not so much applicable to use-cases that require short-lived data that requires tamper-resistance.
So in summary, in order to have proper integrity guarantees and the ability to prove that the data in a single-owner or multi-owner private blockchains hasn’t been tampered with, we have to send publicly the latest hash of whatever structure we are using (chain or tree). If not, we are only complicating our lives by integrating a complex piece of technology without getting the real benefit it can bring – proving the integrity of our data.
The post Integrity Guarantees of Blockchains In Case of Single Owner Or Colluding Owners appeared first on Bozho's tech blog.