Merkle Tree: Implementation in java and its real world applications

In this article, we are going to learn about a fascinating data structure called Merkle tree and its applications.

Merkle Tree
Hash Function
Implementation in Java
Applications in real world

Merkle Tree

Merkle tree is a tree data structure with leaf nodes and non leaf nodes. It also known as Hash tree. The reason behind it is it only stores the hashes in its nodes instead of data. In its leaf nodes, it will store the hash of the data. Non leaf nodes contain the hash of its children.

But, What is a hash? How do we generate it?

Hash Functions

A hash or hash-value or a message digest is an array of fixed size random characters generated when a message or data is passed through an a Mathematical algorithm.

These mathematical algorithms are one way functions meaning, we can generate the from input but not vice-versa. It has to be deterministic, meaning the input should always maps to same output regardless of the number of times it passed through it.

Y(I) = O, where

Y = Our hash function

I = Input

O = Output.

Even a small change in the input produces produces completely output. This property is also known as avalanche effect.

Y(I') = O'

These mathematical algorithms with the above properties are also known as Hash Functions.

Good Hash Functions also never generate same hash for two different messages. This property is known as collision resistance. These all properties of hash algorithm lets us finger print (uniquely identify) the message. This is known as Content based addressing.

There is one last property for the good hash functions. They should be very fast. They should provide the output in reasonable time.

Example

In the following example, I provided my blog name as input.

See, now I have added a single extra character and the resulting hash is completely different from the previous one.

You can check this for yourself here. Play around with this tool.

There are many different hash functions. Notable standards are SHA, SHA-2, SHA-3. They are again subdivided based on the digest sizes.

For example, SHA-256 generates 256-bit (32 byte) hash. So, there are 2^256 combinations which is approximately equal to the number of atoms in this universe. So, it is very hard to guess and also very hard to brute force it.

Here is an interesting video on 2^256 bit security from my favourite YouTube channel.

Remember, hashing is not equal to encryption. When we encrypt, we can decrypt the data. But with hashing, we don't do encryption just a unique mapping.

There are two types of encryption & decryption techniques. Read more about them below.

Let's Implement a merkle tree in Java.

Merkle Tree Implementation in Java

In this example implementation, We are going to implement binary merkle tree. As the first step, let's define the node. Like a regular tree, it has a data field to store the hash and left and right pointers to point to left child and right child of the binary tree.

We going to use the below dependency for the java implementations of the cryptography algorithms. However, we are going to use keccak 256 for our implementation.

Keccak 256 is part of SHA 3 (Secure Hash Algorithm) standard.

We are going to implement a complete binary tree. In case of odd nodes, we will consider the odd node twice to generate the hash of its parent.

Output

After generating the output, we are going to print the hashes in level order by doing a level order traversal.

The following is the output for the above example.

Now, let's change the s in Doctor strange to capital letter. Like Doctor Strange.

Attaching the input below for more clarity.

We will get complete different output for that data block. So, it means, its parent hash changes and all its ancestors hash value changes in the tree like below.

Applications

In general, a merkle tree separates data validation from the data itself, as a result it requires very little memory space - not the entire data only tree of hashes.

Merkle tree also provide a means to validate the integrity of the data with the help of hashes stored in it.

It also requires very less bandwidth to transmit over a network as it contains only fixed size hashes not the entire data.

Some of its applications are

Version control system like Git or mercurial where merkle trees of two different commits can be used to tell which all files changed.
In NoSQL databases like Apache Cassandra or Amazon dynamo DB, where the replicas needs to be consistent with each other. Here, one replica would send the merkle tree of its data and another replica contructs a merkle tree of its own data and it will find the difference between two replicas data. Then it will request only the chunks which changed. This communication of merkle tress saves a lot of network bandwidth.
In Peer to peer cyptocurrency networks like ethereum, bitcoin - it will calculate the root of all tranasaction hashes in a block. With this, there is no need to download the entire block to verify the transactions.

These are different applications of merkle tree in a brief manner. I will explain the applications of merkle tree in detailed in another post.

Keep Experimenting 🔎

Keep Learning 🚀

Pranay Bathini

Contents