Cointime

Understanding How Merkle Trees Benefit Blockchains

Validated Individual Expert

"Merkle tree" is a term you will come across as soon as you start reading more technically advanced material on blockchain structures. Context is rarely provided, though, because Merkle trees are seldom the central point of discussion, and when writers do attempt to explain them, convoluted language quickly gets in the way of understanding. And I’m being nice: frankly, summary descriptions of the topic from even the most respected sources are horrible for beginners. They jump straight into cryptographic hash functions, leaf nodes, and other unfamiliar terms, which defeats the purpose for many readers.

I hope to change this state of affairs by providing an overview of the topic in simple terms and with appropriate context. Read this, meet the term head-on, and move on to more interesting topics, which may very well leverage the benefits that Merkle trees provide.

This article focuses on Merkle trees as they apply to the Bitcoin network and is structured as follows:

  • Context on Blockchain Structure. An introduction to “block” transactions and block headers is provided for context on blockchain structure, which helps in later understanding the specific benefits that Merkle trees provide.
  • Merkle Trees. Merkle trees and their component parts are introduced and explained.
  • Benefits of Merkle Trees. A few important benefits that Merkle trees provide blockchains are explored.
Image Credit: Cryptopedia

“Block” Transactions

First, at a very high level, let’s take a step back and think about how blockchains process transactions. Take the Bitcoin network as an example: every transaction that occurs on the network is recorded in a “block.” If I wanted to send bitcoin to a counterparty, the transaction would be included in a block so that the network can approve it, and only then is the payment confirmed. Each block typically contains hundreds or thousands of transactions like this one from different people. Once a block is “mined” and accepted by the network, it is linked to the previously accepted block (in a chain of data), and all transactions within that block are confirmed. Each block records when it was created as well as the particular transactions included within it, and the blocks are linked securely together so that no block can be altered or moved out of order in the existing chain. Hence the immutable nature of a ‘blockchain.’

  Image Credit: Medium publication


Block Headers

Every block that is accepted on the Bitcoin network can be identified by a unique hash, which is just a 64-character identifier (or name) for that block made up of the digits 0-9 and the letters a-f. The characters look random, but they are simply the hexadecimal form of a 256-bit number computed from the block’s contents. For example, 0000000000000000000590fc0f3eba193a278534220b2b37e9849e1a770ca959 is the hash for block #700,000 on the Bitcoin network (remember, blocks occur in a sequential chain, so this is the 700,000th block after genesis block #0). If you search for a specific hash on explorers like Blockstream Explorer, details regarding that particular block are provided.

The hash for a block is created by “hashing” what is known as the ‘block header’ for that block. Hashing in this context simply means taking data as input (i.e., the block header), scrambling it, and spitting out a fixed-length output (i.e., the 64-character block hash) that is, for all practical purposes, unique to that input. Bitcoin miners all over the world compete against each other, hashing slightly varied versions of the block header countless times until one of them finds a hash that falls below the target set by Bitcoin’s proof-of-work rules. As of writing, the reward for the first miner who finds such a hash is 6.25 BTC, after which the process starts again for the next block.
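To make the "hash it countless times" idea concrete, here is a toy sketch of that search in Python. It is a simplification under stated assumptions, not Bitcoin's actual mining code: the payload is a made-up byte string rather than a real block header, and the difficulty check is "the hash starts with four zero characters" rather than a comparison against the network's numeric target. The function name `toy_mine` is my own.

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """Double SHA-256, the hash Bitcoin applies to block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def toy_mine(payload: bytes, zero_hex_digits: int = 4):
    """Try nonces 0, 1, 2, ... until the hash starts with enough zero characters.

    Illustrative only: real miners compare the hash, read as a number,
    against a network-wide target instead of counting leading zeros.
    """
    prefix = "0" * zero_hex_digits
    nonce = 0
    while True:
        candidate = payload + nonce.to_bytes(4, "little")
        digest = sha256d(candidate).hex()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

# Hypothetical payload standing in for a block header.
nonce, digest = toy_mine(b"toy block header")
print("winning nonce:", nonce)
print("hash:", digest)
```

Each extra zero character demanded makes the search roughly 16 times longer, which gives a feel for why real mining takes so much computing power.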

Simple enough. But what exactly is the block header, which importantly serves as the input used to create each unique block hash? Think of the block header as a summary of all the important items relating to each block (mining detail, transactions included within it, etc.). This summary comes in the form of a data field and contains six items:

  • Version number
  • Previous block’s hash
  • Merkle root of the transactions included in the block
  • Timestamp
  • Mining difficulty value
  • Proof of work nonce

This article will not cover each of these items in detail, but what is great is that each of these defining items can be succinctly captured and strung together in a message for purposes of hashing the block header. See below for an example in what is known as hexadecimal format (the format itself is not important for our purposes).

  Bitcoin block header for block #645,536. Image Credit: Medium publication


The individual colors in the above message correspond to each of the six items in the block header noted previously (in order). Once hashed and accepted by the network, these inputs ultimately define the block as well as subsequent blocks that come after (given this block’s resulting hash will be an input for the next block’s header and so on).
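For readers who like to see the pieces in code, the following Python sketch strings the six header items together and hashes the result. It rests on a few assumptions that are mine rather than the article's: the field values are those of genesis block #0 as I recall them from the public chain (not block #645,536 shown above), and the helper name `build_header` is hypothetical. The byte-order details (little-endian integers, hashes stored in reverse of how explorers display them) follow Bitcoin's serialization convention.

```python
import hashlib
import struct

def sha256d(data: bytes) -> bytes:
    """Double SHA-256, the hash Bitcoin applies to block headers."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def build_header(version, prev_hash_hex, merkle_root_hex, timestamp, bits, nonce) -> bytes:
    """Serialize the six header items, in order, into one 80-byte message."""
    return (
        struct.pack("<I", version)                 # 1. version number (4 bytes)
        + bytes.fromhex(prev_hash_hex)[::-1]       # 2. previous block's hash (32 bytes)
        + bytes.fromhex(merkle_root_hex)[::-1]     # 3. Merkle root (32 bytes)
        + struct.pack("<III", timestamp, bits, nonce)  # 4-6. time, difficulty, nonce
    )

# Assumed values for genesis block #0; it has no predecessor, so the
# previous-hash field is all zeros.
header = build_header(
    version=1,
    prev_hash_hex="00" * 32,
    merkle_root_hex="4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    timestamp=1231006505,
    bits=0x1D00FFFF,
    nonce=2083236893,
)

# Explorers show the hash with its bytes reversed, hence the [::-1].
block_hash = sha256d(header)[::-1].hex()
print(len(header), "bytes")
print(header.hex())
print(block_hash)
```

The header is always 80 bytes no matter how many transactions the block holds, because the transactions enter only through the 32-byte Merkle root.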

Merkle Trees

A Merkle tree is a data structure that allows us to summarize a large data set in an efficient manner. As you may have guessed, as applied to blockchains, Merkle trees are used to summarize a large number of blockchain transactions (such as the ones occurring within each Bitcoin block) in a data-efficient way.

This ‘summary’ is accomplished once again through hashing. Take the diagram below, which represents a basic Merkle tree structure.

Image Credit: Bitpanda

As applied to a Bitcoin block, the leaf nodes at the bottom of this structure represent the transaction IDs (or hashes) of every individual transaction included within a block. This layer is essentially the data dump. Every node above the various leaf nodes represents a hash of the pair beneath it. Note that a “node” here is just referring to an individual part of the larger data structure.

As explained before, you can think of hashing as a way to convert data inputs of any size into unique outputs of fixed size. See below purely as an illustration. No matter the size or slight variation in the input on the left-hand side, a hash function will yield a totally different output of fixed size on the right.
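The same behavior can be reproduced in a few lines of Python using SHA-256, the hash function that Bitcoin is built on. The sample strings are made up for illustration; the point is only that inputs of very different lengths, or inputs that differ by a single letter, produce outputs of the same size that look nothing alike.

```python
import hashlib

# Made-up sample inputs: one short, one long, one long with a one-letter change.
inputs = [
    "Fox",
    "The red fox jumps over the blue dog",
    "The red fox jumps ouer the blue dog",
]

digests = [hashlib.sha256(text.encode("utf-8")).hexdigest() for text in inputs]

for text, digest in zip(inputs, digests):
    print(f"{len(text):>2} chars in -> {len(digest)} hex chars out: {digest}")
```

Every output is 64 hexadecimal characters long, whatever the input size.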

  Image Credit: Wikipedia


A similar process applies with Merkle trees. Each leaf is paired with another and hashed to create a layer of hashed nodes above it. These second-layer nodes in the tree do not contain transaction IDs, but rather store the hash of the two leaf nodes below them which they now represent. This paired hashing process is continued up the tree until only one node remains — which is called the Merkle root.

The Merkle root represents the hashes of all the individual nodes and transactions beneath it. If a single detail in any of the transactions / leaves changes, so does the Merkle root (recall what happened when we changed the inputs in the hashing example above). If the transactions (or leaves) remain identical but are listed in a different order at the bottom of the tree, the Merkle root will also change (since this again affects the hash inputs for some of the non-leaf nodes). As a result, the Merkle root serves as cryptographic proof of which transactions are in a block, and which order they are in.
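The paired hashing described above can be sketched in Python as follows. Treat it as a minimal illustration rather than Bitcoin's reference implementation: the leaves are made-up stand-ins for transaction IDs, and the function name `merkle_root` is my own. One detail the description above does not cover is what happens when a layer has an odd number of nodes; Bitcoin's convention is to pair the last node with a copy of itself, and the sketch does the same.

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """Double SHA-256, the hash Bitcoin uses for Merkle tree nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(leaves):
    """Hash pairs of nodes, layer by layer, until a single root remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd layer: pair the last node with itself
        level = [sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical stand-ins for five transaction IDs.
txids = [sha256d(f"tx {i}".encode()) for i in range(5)]

root = merkle_root(txids)
# Change one transaction, leaving the rest untouched.
tampered = merkle_root([txids[0], sha256d(b"tx 1 (altered)")] + txids[2:])
# Same transactions, but with the first two swapped.
reordered = merkle_root([txids[1], txids[0]] + txids[2:])

print("root:     ", root.hex())
print("tampered: ", tampered.hex())
print("reordered:", reordered.hex())
```

All three roots differ, which is the property the paragraph above relies on: the root commits both to the contents of the transactions and to their order.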

Let’s next see how this summary of data benefits blockchains.

How Merkle Trees Benefit Blockchains

As you may recall, one of the items included in each block header is the Merkle root of the transactions in that block:

Image Credit: Ethereum Foundation blog

As previously explained, the Merkle root of a block is a digital fingerprint of the entire set of transactions included in that block. Now you may ask: why not just hash the 1,000 or so transactions in a block into a single string and get the value of the root this way? This process would avoid the complicated tree structure altogether and still represent all of the transactions in a single hash. As you will see, the answer concerns data size.

Verifying Blockchain Data

Consider the consequences if we were to replace Merkle roots with a single hash of all the transactions in a block. If you ever wanted to confirm the integrity of a transaction (e.g., that it actually occurred in block X at position Y), the single-hash approach would require you to know every transaction ID in that block, since the only way to check the hash is to recompute it from the full list. That is a lot of data to store and transmit across the network just to authenticate one transaction.

With Merkle trees, the verification process is more efficient and does not require downloading the data of all transactions that occurred in a block. By just knowing the transaction ID (or leaf) in question, the Merkle root, and the “branch” consisting of the sibling hash at each level on the path from the leaf to the root, a transaction’s inclusion can be verified. Because each layer of the tree is half the width of the one below it, a block with about 1,000 transactions needs only around 10 hashes in its branch. These Merkle “proofs” allow for the efficient authentication of a small amount of data (like a single transaction in a block) within large databases of potentially unbounded size. In other words, Merkle trees and proofs provide a much quicker and simpler test of whether a particular transaction is included within a block.
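Here is a rough Python sketch of building and checking such a proof. It is an illustration under assumptions, not a wallet-grade implementation: the 1,000 leaves are made-up stand-ins for transaction IDs, the function names are my own, and an odd-sized layer is handled by pairing the last node with a copy of itself, which is Bitcoin's convention.

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """Double SHA-256, the hash Bitcoin uses for Merkle tree nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_levels(leaves):
    """Return every layer of the tree, from the leaves up to the root."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        level = list(levels[-1])
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd layer: pair the last node with itself
        levels.append([sha256d(level[i] + level[i + 1]) for i in range(0, len(level), 2)])
    return levels

def merkle_proof(leaves, index):
    """Collect the sibling hash at each layer on the path from a leaf to the root."""
    proof = []
    for level in merkle_levels(leaves)[:-1]:  # the root layer has no sibling
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        proof.append(level[index ^ 1])  # neighbor in the same pair
        index //= 2                     # position of the parent one layer up
    return proof

def verify(leaf, index, proof, root):
    """Recompute the path to the root using only the leaf and its branch."""
    node = leaf
    for sibling in proof:
        # Odd positions are right-hand children, so the sibling goes first.
        node = sha256d(sibling + node) if index % 2 else sha256d(node + sibling)
        index //= 2
    return node == root

# Hypothetical block with 1,000 transactions.
txids = [sha256d(f"tx {i}".encode()) for i in range(1000)]
root = merkle_levels(txids)[-1][0]

position = 617
proof = merkle_proof(txids, position)
print("hashes in proof:", len(proof))
print("proof size:", 32 * len(proof), "bytes vs", 32 * len(txids), "bytes for all IDs")
print("valid:", verify(txids[position], position, proof, root))
print("forged:", verify(sha256d(b"not in block"), position, proof, root))
```

For 1,000 leaves the proof contains 10 hashes, or 320 bytes, compared with 32,000 bytes for the full list of transaction IDs. The verifier never touches the other 999 transactions.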

Image Credit: Radix Blog

Simplified Payment Verification / Light Clients

Let’s extend the verification benefits introduced in the previous section to a more familiar concept — peer-to-peer transactions of crypto among people like me and you. Without the benefits that Merkle proofs provide in terms of verifying individual transactions, it would be extremely difficult for us to send and receive crypto from our personal wallets over our phones and computers.

Satoshi Nakamoto actually introduced this benefit in Section 8 of his Bitcoin whitepaper, and described the concept as “simplified payment verification” (SPV).

When a sender sends bitcoin to a recipient, this is a transaction between “nodes” (i.e., computers that are connected to the Bitcoin network that serve as pillars in a properly functioning network). Each of our crypto wallets is therefore a node. In order for a wallet to verify and process a payment transaction such as this, it is necessary to have certain knowledge of the entirety of the Bitcoin network. For example, one must be able to link the transaction to the particular block in the network it is included in. Otherwise, verification cannot occur.

As to the knowledge required to run a node, what are known as “full” nodes store complete copies of the Bitcoin blockchain on their devices, which obviously requires a very large amount of storage space. However, there are also what are known as “light” nodes (or light clients). Instead of storing complete copies of the chain, light clients only download copies of the block headers, which, as we know, contain the Merkle root of a block’s transactions rather than the transactions themselves. Knowing what we do about Merkle roots, it should be clear that light clients are also able to verify individual payments in a network. They accomplish this by obtaining the Merkle proof that links a particular transaction to the block it is timestamped in.

“A [SPV] user only needs to keep a copy of the block headers of the longest proof-of-work chain, which he can get by querying network nodes until he’s convinced he has the longest chain, and obtain the Merkle branch linking the transaction to the block it’s timestamped in.” — Satoshi Nakamoto, Section 8 of the Bitcoin whitepaper

How much storage space is saved by SPV and the use of light clients, which are made possible by Merkle trees? As of writing, a wallet could store every block header for the entire Bitcoin blockchain in around 61 MB (at 80 bytes per header and around 765,000 blocks in the chain). Contrast this with the hundreds of gigabytes (1 GB is equal to 1,000 MB) that would be required to store the entire chain if SPV were not being used.
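The arithmetic behind the 61 MB figure is short enough to check directly. The sketch below uses only the two numbers quoted above (80 bytes per header, roughly 765,000 blocks); the block count is an approximation as of writing, so the result is a ballpark rather than an exact size.

```python
HEADER_BYTES = 80        # size of one block header
BLOCK_COUNT = 765_000    # approximate chain length as of writing

total_bytes = HEADER_BYTES * BLOCK_COUNT
total_mb = total_bytes / 1_000_000  # 1 MB = 1,000,000 bytes

print(f"{total_bytes:,} bytes = {total_mb:.1f} MB")  # 61,200,000 bytes = 61.2 MB
```

Since a new block arrives roughly every ten minutes, the header chain grows by only a few megabytes per year.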

Imagine having to store hundreds of GB on your phone just to send and receive bitcoin using a mobile wallet. The efficiencies and practical applications made possible by Merkle trees become obvious.

Image Credit: Cointelegraph

Conclusion

Having read this article, it should be clear that Merkle trees are a key technical component of Bitcoin’s structure and partly enable people around the world to send, receive, and verify transactions with crypto wallets that can be easily run on personal computers or smartphones. The efficiencies that Merkle trees provide in data storage, transfer, and verification are profound, and the structure used in Bitcoin has influenced numerous other public networks that have come after. Some, like Ethereum, have since taken the concept even further, although that is a discussion for another day.
