In this article, we are going to learn about a fascinating data structure called Merkle tree and its applications.
Contents
- Merkle Tree
- Hash Function
- Implementation in Java
- Applications in real world
Merkle Tree
Merkle tree is a tree data structure with leaf nodes and non leaf nodes. It
also known as Hash tree. The reason behind it is it only stores the hashes
in its nodes instead of data. In its leaf nodes, it will store the hash of
the data. Non leaf nodes contain the hash of its children.
But, What is a hash? How do we generate it?
Hash Functions
A hash or hash-value or a message digest is an array of fixed size
random characters generated when a message or data is passed through an a Mathematical
algorithm.
These mathematical algorithms are one way functions meaning, we can
generate the from input but not vice-versa. It has to be deterministic,
meaning the input should always maps to same output regardless of the number
of times it passed through it.
Y(I) = O, where
Y = Our hash function
I = Input
O = Output.
Even a small change in the input produces produces completely output. This
property is also known as avalanche effect.
Y(I') = O'
These mathematical algorithms with the above properties are also known as
Hash Functions.
Good Hash Functions also never generate same hash for two different
messages. This property is known as collision resistance. These all
properties of hash algorithm lets us finger print (uniquely identify) the
message. This is known as Content based addressing.
There is one last property for the good hash functions. They should be very
fast. They should provide the output in reasonable time.
Example
In the following example, I provided my blog name as input.
See, now I have added a single extra character and the resulting hash is
completely different from the previous one.
You can check this for yourself
here. Play around with this tool.
There are many different hash functions. Notable standards are SHA,
SHA-2, SHA-3. They are again subdivided based on the digest sizes.
For example, SHA-256 generates 256-bit (32 byte) hash. So, there are 2^256
combinations which is approximately equal to the number of atoms in this
universe. So, it is very hard to guess and also very hard to brute
force it.
Here is an interesting video on 2^256 bit security from my favourite
YouTube channel.
Remember, hashing is not equal to encryption. When we encrypt, we can
decrypt the data. But with hashing, we don't do encryption just a unique
mapping.
There are two types of encryption & decryption techniques. Read more
about them below.
Let's Implement a merkle tree in Java.
Merkle Tree Implementation in Java
In this example implementation, We are going to implement binary merkle
tree. As the first step, let's define the node. Like a regular tree, it
has a data field to store the hash and left and right pointers to
point to left child and right child of the binary tree.
We going to use the below dependency for the java implementations of the
cryptography algorithms. However, we are going to use
keccak 256
for our implementation.
Keccak 256 is part of SHA 3 (Secure Hash Algorithm) standard.
We are going to implement a complete binary tree. In case of odd nodes, we
will consider the odd node twice to generate the hash of its
parent.
Output
After generating the output, we are going to print the hashes in level
order by doing a level order traversal.
The following is the output for the above example.
Now, let's change the s in Doctor strange to capital letter. Like
Doctor Strange.
Attaching the input below for more clarity.
We will get complete different output for that data block. So, it means,
its parent hash changes and all its ancestors hash value changes in the
tree like below.
Applications
In general, a merkle tree separates data validation from the data
itself, as a result it requires very little memory space - not the
entire data only tree of hashes.
Merkle tree also provide a means to validate the integrity of the data with the help of hashes stored in it.
It also requires very less bandwidth to transmit over a network as it contains only fixed size hashes not the entire data.
Some of its applications are
- Version control system like Git or mercurial where merkle trees of two different commits can be used to tell which all files changed.
- In NoSQL databases like Apache Cassandra or Amazon dynamo DB, where the replicas needs to be consistent with each other. Here, one replica would send the merkle tree of its data and another replica contructs a merkle tree of its own data and it will find the difference between two replicas data. Then it will request only the chunks which changed. This communication of merkle tress saves a lot of network bandwidth.
- In Peer to peer cyptocurrency networks like ethereum, bitcoin - it will calculate the root of all tranasaction hashes in a block. With this, there is no need to download the entire block to verify the transactions.
These are different applications of merkle tree in a brief manner. I will
explain the applications of merkle tree in detailed in another
post.
Keep Experimenting 🔎
Keep Learning 🚀
Post a Comment