Goyalayus

Notes, essays, and fragments from the edge of understanding.

Blockchain Internals for Developers

August 15, 2025

Original Substack post

So let’s start with what exact problem Bitcoin is solving in this world.

Bitcoin is a decentralized monetary network where rules are enforced by code and a distributed set of nodes, not by any single institution.

But what is the problem with money being controlled by a central authority? Central control does help to

  1. Identify fraud

  2. Control inflation

But the same power that lets an authority identify fraud and freeze the fraudsters' accounts also lets it freeze the accounts of people protesting against it.

And the power to manage the money supply, which is meant to keep inflation under control, also lets the US government print money to bail out insiders (the 2008 crash) and pay off its loans.

How do we create Decentralized money?

First, we have to understand what can be considered as money. For any object to be money, it should have three properties

  1. it should be scarce,

  2. it should be easily transferable

  3. it should have inherent value

Seashells were used as money historically because they were scarce, easily transferable, and had value as they could be used as jewelry. Once people figured out how to collect a lot of seashells from the ocean, they lost their value as money. This is the risk with printing dollars: the more of them exist, the less each one means.

Gold is considered money because it has all three properties.

The dollar has an artificial value because it is backed by the US government.

Why Decentralized

The issue with a centralized database is that the central authority can change the values inside it and no one else can verify the change. We can only trust it.

With decentralization, many people hold a copy of the same database, so to change anything an attacker would have to change it on most of those copies. That is the reason a blockchain is so hard to hack and is considered very secure.

So enough with the theory; let's see how a blockchain is implemented in practice.

Basic flow of blockchain

A blockchain is nothing but a database of all the transactions that have happened, with the twist that the database is stored on many computers owned by different people instead of in one central place.

These people are called miners. Whenever you do a transaction, they verify your transaction, add your transaction to their computer database, and tell other miners to add the transaction to their databases.

Now, one basic question that should come to your mind is what if the miner verifying your transaction behaves maliciously and modifies the transaction to send your money to their personal account.

These are some challenges we will be discussing one by one. Don’t worry if everything feels very vague as of now; we will soon make everything concrete.

_Block_chain

Let's forget about the chain for now and just focus on the block.

A block is an object (in computer science terms) that stores some data, which consists of

  1. Block Number: An ID number to identify the block (e.g., Block #1, Block #2).

  2. Timestamp: The exact time the block was created.

  3. Data: The actual information we want to store. This could be anything – medical records, transaction details, a contract, or just the words "Hello, world!".

  4. Hash: A fingerprint of the block's contents. This is the most critical part, and it is explained next.

The Concept of a "Hash"

A hash function is a mathematical process that takes any input data, no matter how large or small, and converts it into a fixed-length string of characters that is, for all practical purposes, unique to that input. This output string is called the hash.

For this blog, we will use a famous and real hash function called SHA-256 (Secure Hash Algorithm 256-bit). You don't need to know the math behind it, but you MUST understand its properties:

  1. Deterministic: If you put the exact same data into the function, you will always get the exact same hash out.

    1. SHA256("Hello") -> 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969

    2. If you run SHA256("Hello") a million times, you will get that same hash a million times.

  2. Extreme Sensitivity: If you change even a single tiny bit of the input data, the output hash will change completely and unpredictably.

  3. One-Way Function: It is incredibly easy to calculate a hash from data. However, it is practically impossible to take a hash and figure out what the original data was.
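Here is a small sketch of the first two properties using Python's standard hashlib module. The helper name sha256_hex is just something made up for this post, not part of any library.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a UTF-8 string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

h1 = sha256_hex("Hello")
h2 = sha256_hex("Hello")   # the exact same input again
h3 = sha256_hex("Hellp")   # one character changed

print(h1)
print(h1 == h2)            # deterministic: same input, same hash
print(h3)                  # looks nothing like h1
print(len(h1), len(sha256_hex("a much longer input " * 1000)))  # always 64 hex chars
```

There is no code for the third property on purpose: there is no known practical way to write the reverse function.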

Creating Our First Block

  • Instantiate the Block: We decide to create a new block with the number 1 and the data "Alice pays Bob $10".

    my_first_block = NEW Block(block_number: 1, data: "Alice pays Bob $10")

  • Initialization: Inside the block, the initialize function runs:

    • block_number is set to 1.

    • data is set to "Alice pays Bob $10".

    • timestamp is set to the current time, let's say 2023-10-27-10:00:00.

  • Hash Calculation: Now, the most important step. The calculate_this_blocks_hash() function is called.

    • It combines the block's information into a single piece of text: "12023-10-27-10:00:00Alice pays Bob $10"

    • It feeds this entire string into the SHA-256 function: SHA256("12023-10-27-10:00:00Alice pays Bob $10")

    • This produces a unique hash. Let's imagine it's: 0a2b1c3d... (just an example).

    • This hash, 0a2b1c3d..., is now stored as the hash property of my_first_block

Our first block is now complete and sealed!

If Eve changes the data to "Eve pays Bob $100", the hash input changes, producing a completely different hash instead of 0a2b1c3d…. The hash stored in the block no longer matches its contents, which makes the tampering evident.
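The walkthrough above can be sketched in Python roughly like this. The class layout and the hash_input helper are assumptions made for illustration; only the field names and calculate_this_blocks_hash come from the steps above, and the hash printed will be a real SHA-256 value rather than the made-up 0a2b1c3d....

```python
import hashlib
from datetime import datetime

class Block:
    def __init__(self, block_number, data, timestamp=None):
        self.block_number = block_number
        self.data = data
        # Use the supplied timestamp, or the current time if none is given.
        self.timestamp = timestamp or datetime.now().strftime("%Y-%m-%d-%H:%M:%S")
        self.hash = self.calculate_this_blocks_hash()

    def hash_input(self):
        # Combine the block's information into a single piece of text.
        return f"{self.block_number}{self.timestamp}{self.data}"

    def calculate_this_blocks_hash(self):
        return hashlib.sha256(self.hash_input().encode("utf-8")).hexdigest()

my_first_block = Block(1, "Alice pays Bob $10", timestamp="2023-10-27-10:00:00")
original_input = my_first_block.hash_input()
sealed_hash = my_first_block.hash
print(original_input)
print(sealed_hash)

# Eve edits the data but leaves the stored hash alone.
my_first_block.data = "Eve pays Bob $100"
still_matches = my_first_block.calculate_this_blocks_hash() == my_first_block.hash
print(still_matches)  # False: the stored hash no longer fits the contents
```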

But what if she also tampers with the stored hash and replaces it with the hash of the modified block? After all, SHA-256 is a publicly available algorithm.

The Solution: Linking Blocks with Hashes. Every new block must also contain the hash of the block that came before it, and that previous hash is part of the input when the new block's own hash is calculated. So if you change the hash of block #99, the previous hash stored in block #100 no longer matches and block #100 becomes invalid; you would have to recalculate it too.

So if Eve wants to change the data of block #2, and there are a total of 100 blocks in the chain, she would also have to change the hash of all the blocks till #100.
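To make the linking concrete, here is a minimal sketch of a three-block chain. It leaves out the timestamp for brevity, and the function names and the all-zero "previous hash" for the first block are assumptions for this example only.

```python
import hashlib

GENESIS_PREVIOUS_HASH = "0" * 64  # placeholder, since block #1 has no parent

def block_hash(number, data, previous_hash):
    text = f"{number}{data}{previous_hash}"
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def build_chain(entries):
    chain, previous_hash = [], GENESIS_PREVIOUS_HASH
    for number, data in enumerate(entries, start=1):
        current = block_hash(number, data, previous_hash)
        chain.append({"number": number, "data": data,
                      "previous_hash": previous_hash, "hash": current})
        previous_hash = current
    return chain

def first_invalid_block(chain):
    """Return the number of the first block that fails a check, or None."""
    previous_hash = GENESIS_PREVIOUS_HASH
    for block in chain:
        recomputed = block_hash(block["number"], block["data"], block["previous_hash"])
        if block["previous_hash"] != previous_hash or block["hash"] != recomputed:
            return block["number"]
        previous_hash = block["hash"]
    return None

chain = build_chain(["Alice pays Bob $10", "Bob pays Carol $4", "Carol pays Dan $1"])
print(first_invalid_block(chain))  # None: everything links up

# Eve edits block #2 and even recomputes block #2's own hash...
chain[1]["data"] = "Bob pays Eve $4"
chain[1]["hash"] = block_hash(2, chain[1]["data"], chain[1]["previous_hash"])
print(first_invalid_block(chain))  # 3: block #3 still points at the old hash
```

Eve's edit passes the check on block #2 itself, but the damage just moves one block forward, which is exactly why she would have to redo every later block.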

But that's not the full solution. SHA-256 takes only a tiny fraction of a second to run, so she could easily recalculate the hashes of 100 blocks. What could be the solution to this?

Make it harder to calculate the hash

We will introduce an artificial rule: A hash is only considered "valid" if it meets a specific, arbitrary condition.

The most common condition is this: The hash must start with a certain number of leading zeros.

For example, we could declare that for a block to be valid, its hash must start with four zeros, like 0000a8fde....

Why does this make it difficult? Remember the properties of a hash function:

  • You cannot predict the output.

  • A tiny change to the input completely changes the output.

This means there is no shortcut to finding a hash that starts with 0000. The only way is by brute-force trial and error: you have to keep trying different inputs until you get lucky and find one that produces the desired hash.

This process of searching for a valid hash is called "Proof of Work".

But what can we change in the input to get a new hash? We can't change the block number, the data, or the previous hash, because that would be tampering!

So, we add one more piece of data to our block. It's a special number whose only purpose is to be changed during the mining process. It's called a Nonce, which stands for "Number used once."

The mining process now looks like this:

  1. Combine all the block's data (block_number, timestamp, data, previous_hash) with a nonce starting at 0.

  2. Calculate the hash of this combined information.

  3. Check the hash. Does it start with the required number of zeros (e.g., 0000)?

  4. If no: Increment the nonce by 1 and go back to step 2.

  5. Repeat, potentially for millions or billions of attempts, until a hash that satisfies the rule is found.

  6. If yes: Success! The block is "sealed." The nonce that resulted in the correct hash is now locked into the block, proving that the work was done.
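The loop above fits in a few lines of Python. This is only a sketch: the function name mine, the way the fields are glued together, and the all-zero previous hash are assumptions for the example. With four leading zeros it usually needs tens of thousands of attempts, which takes well under a second on a laptop.

```python
import hashlib

def mine(block_number, timestamp, data, previous_hash, difficulty=4):
    """Try nonces 0, 1, 2, ... until the hash starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        text = f"{block_number}{timestamp}{data}{previous_hash}{nonce}"
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest      # sealed: this nonce proves the work
        nonce += 1                    # no luck, try the next nonce

nonce, digest = mine(1, "2023-10-27-10:00:00", "Alice pays Bob $10", "0" * 64)
print(nonce, digest)
```

Each extra required zero multiplies the expected number of attempts by 16, because every hex character has 16 possible values.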


One natural question that may come to mind:

Okay, it's practically impossible to change previous blocks of the blockchain, but what if the miner tries to change the data of the latest block? He could just write a transaction on your behalf that pays himself.

Proving Identity without a Password

In a centralized system like your bank, you prove your identity with a username and password. But in a decentralized system with no central server to store passwords, how do you prove you own your funds?

1. The Solution: A Pair of Cryptographic Keys

Every user on the network generates a key pair: a Private Key and a Public Key.

  • Private Key: This is a very large, secret number. Think of it as the ultimate password. It must be kept absolutely secret by its owner. The private key has one critical function: to create digital signatures.

  • Public Key: This is another very large number that is mathematically derived from your private key. You can share your public key with anyone. It's safe. The public key has two functions:

    1. To act as your "address." People send money to your public key.

    2. To verify a digital signature.

The Unbreakable Rule: It is easy to calculate the public key if you have the private key. It is computationally infeasible to figure out the private key just by looking at the public key. This one-way relationship is the foundation of all user security on the blockchain.

2. How a Real Transaction is Made

Let's revisit "Alice pays Bob," but this time with the correct mechanics.

  1. Alice's Intent: Alice wants to send 2 BTC to Bob. She constructs a transaction message that says, in essence: "2 BTC to Bob's public key from my public address"

  2. Creating the Signature: Alice then takes this exact transaction message and uses her private key to sign it. This process creates a unique string of characters called a digital signature. This signature is a cryptographic proof that the owner of the private key has seen and approved that specific message.

  3. Broadcasting the Package: Alice broadcasts a package to the network containing three things:

    • The transaction message ("2 BTC to Bob's public key from my public address").

    • Her public key (so everyone knows who is claiming to send the funds).

    • The digital signature she just created.

3. How Your "Fake Transaction" Attack Fails

Now, let's put you in the shoes of the attacker, Eve. You are running your custom software, "EveNode." You want to add a fake transaction to the block you are mining: "Alice pays Eve 10 BTC."

Here's why you will fail:

  1. Constructing the Fake Transaction: You create the transaction message just fine. ("Alice pays Eve 10 BTC.")

  2. The Missing Piece: Now, you must provide a digital signature to prove that Alice authorized this. But to create that signature, you need Alice's Private Key.

  3. The Failure: You don't have her private key. It is secret. You cannot guess it. You cannot reverse-engineer it from her public key. It is impossible for you to generate the correct digital signature for this transaction.

The Network's Verification Process:

When you mine a block containing your fake transaction and broadcast it, every honest node will perform these checks on your transaction:

  • Check 1 (UTXO Check): "Does the 10 BTC that this transaction claims to spend actually exist as an unspent transaction output (UTXO)?" (Let's say yes).

  • Check 2 (Signature Check): "Let's verify the signature." The node takes the three pieces of information—the transaction message, Alice's public key (which is listed as the owner of the UTXO), and the signature you provided. It runs a mathematical verification function.

This function will return FALSE. It will cryptographically prove that the signature provided was not created by Alice's private key.

The instant your transaction fails this verification, the entire block you mined is considered invalid by the entire network. They will discard it, and you will have wasted all the electricity and time you spent mining it.
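To see sign and verify in action, here is a deliberately insecure toy. Bitcoin itself uses ECDSA over the secp256k1 curve, which is not in Python's standard library, so this sketch swaps in textbook RSA built from two well-known Mersenne primes. Every name and number below is an assumption for illustration; never use keys like these for anything real.

```python
import hashlib

# Toy key pair (textbook RSA, NOT what Bitcoin uses and NOT secure).
P, Q = 2**89 - 1, 2**127 - 1            # two known Mersenne primes
N = P * Q                               # public modulus
E = 65537                               # public exponent: (N, E) is Alice's public key
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent: Alice's private key (Python 3.8+)

def digest_as_int(message: str) -> int:
    raw = hashlib.sha256(message.encode("utf-8")).digest()
    return int.from_bytes(raw, "big") % N

def sign(message: str, private_exponent: int) -> int:
    return pow(digest_as_int(message), private_exponent, N)

def verify(message: str, signature: int) -> bool:
    # Needs only the public values N and E.
    return pow(signature, E, N) == digest_as_int(message)

real_message = "2 BTC to Bob's public key from my public address"
alice_signature = sign(real_message, D)
print(verify(real_message, alice_signature))        # True

# Eve has no private key. She can reuse an old signature or guess one.
fake_message = "Alice pays Eve 10 BTC"
print(verify(fake_message, alice_signature))        # False
print(verify(fake_message, 123456789))              # False
```

The shape is the same as in the real thing: signing needs the private key, verifying needs only the public key, and a signature is tied to one exact message.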

Putting the pieces so far together, a transaction and a block can be sketched as Python dataclasses (the field types are illustrative):

from dataclasses import dataclass

# Data structure to represent a Transaction
@dataclass
class Transaction:
    sender_address: str       # Public key or wallet address of sender
    receiver_address: str     # Public key or wallet address of receiver
    amount: int               # Amount of tokens/coins transferred
    timestamp: float          # When the transaction occurred
    signature: str            # Digital signature (ensures authenticity)

# Data structure to represent a Block
@dataclass
class Block:
    index: int                # Position in the blockchain
    timestamp: float          # When the block was created
    transactions: list        # List of Transaction objects
    previous_hash: str        # Hash of the previous block
    nonce: int                # Number used for Proof-of-Work
    hash: str                 # Current block's cryptographic hash

The Decentralized Network and Reaching Consensus

1. The Problem: Centralization is a Weakness

Imagine our blockchain, with all its Proof-of-Work security, is stored on a single server at "Blockchain Corp."

  • Single Point of Failure: If that server crashes, is hacked, or catches fire, the entire blockchain is gone.

  • Censorship and Control: The owner of Blockchain Corp. could decide to refuse certain transactions or even alter the history (though it would be very difficult, as we learned). We are forced to trust this single entity.

The goal is to create a system that requires no trust in any single person or company.

2. The Solution: A Distributed, Peer-to-Peer (P2P) Network

Instead of one central server, a blockchain operates on a network of thousands of independent computers, often called nodes or peers.

The Golden Rule: Every node on the network keeps its own, identical copy of the entire blockchain.

Think of it not as a single book owned by a library, but as thousands of people all owning an identical copy of the same book. When a new page is written, everyone must agree on what it says and add that exact same page to their own copy.

This shared, synchronized database is called a distributed ledger.

3. The Life of a New Block: From Creation to Acceptance

Here is the process that answers the question, "Who gets to add the next block?"

Step 1: Broadcast Transactions

  • When someone wants to add new information to the blockchain (e.g., "Alice pays Bob $10"), they create a "transaction" and broadcast it to the network.

  • This transaction flies around the network from node to node.

Step 2: The Mining Race Begins

  • Each "miner" on the network (a node that chooses to participate in mining) listens for these new transactions.

  • They gather up a bunch of these unconfirmed transactions into a "candidate block."

  • They then start the Proof-of-Work. They try trillions of nonce values to find a hash for their candidate block that starts with the required number of zeros.

This is the crucial part: MANY miners are all working on this at the same time, in a global competition. They are all racing to be the first one to find a valid hash for the next block.

Step 3: A Winner is Found

  • By pure chance and computational effort, one miner somewhere in the world will find a valid hash first. Let's call her Miner Z.

  • Miner Z has "won" the race for this block.

Step 4: The Winner Broadcasts the New Block

  • Miner Z immediately broadcasts her newly solved block (which includes the transactions, the timestamp, the previous hash, and the magic nonce she found) to all the other nodes on the network.

Step 5: Verification and Consensus

  • Every other node that receives this new block from Miner Z does not trust it blindly. They perform a quick and easy verification checklist:

    1. Is the previous_hash in this new block correct? Does it match the hash of the last block in my copy of the chain?

    2. Is the Proof-of-Work valid? If I take all the data in this new block (including the nonce Miner Z found) and run it through the SHA256 function myself, does the resulting hash actually start with the required number of zeros?

  • This verification is very fast, unlike the mining which was very slow.
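The two checks in Step 5 could look roughly like this. The dictionary layout, the DIFFICULTY constant, and the function names are assumptions for the sketch; a real node also validates every transaction inside the block.

```python
import hashlib

DIFFICULTY = 4  # required leading zeros (assumed value for this example)

def block_hash(block):
    text = (f"{block['index']}{block['timestamp']}{block['data']}"
            f"{block['previous_hash']}{block['nonce']}")
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def is_acceptable(new_block, my_chain_tip):
    # Check 1: does the new block build on the last block in my copy?
    links_to_my_tip = new_block["previous_hash"] == my_chain_tip["hash"]
    # Check 2: one hash is enough to confirm the Proof-of-Work.
    recomputed = block_hash(new_block)
    work_is_valid = (recomputed == new_block["hash"]
                     and recomputed.startswith("0" * DIFFICULTY))
    return links_to_my_tip and work_is_valid

# Miner Z's side: the slow part (many hashes).
tip = {"hash": "0000" + "ab" * 30}
candidate = {"index": 2, "timestamp": "2023-10-27-10:10:00",
             "data": "Alice pays Bob $10", "previous_hash": tip["hash"], "nonce": 0}
while not block_hash(candidate).startswith("0" * DIFFICULTY):
    candidate["nonce"] += 1
candidate["hash"] = block_hash(candidate)

# Every other node's side: the fast part (one hash).
print(is_acceptable(candidate, tip))                 # True

tampered = dict(candidate, data="Alice pays Eve $10")
print(is_acceptable(tampered, tip))                  # False
```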

Step 6: Acceptance and Moving On

  • If the block passes verification, the nodes do two things:

    1. They add the new block to the end of their own copy of the chain. The chain is now one block longer.

    2. They immediately stop working on their own candidate block for that height and begin a new race to find the next block, using the hash of Miner Z's block as the new previous_hash.

This process is how the network reaches consensus (agreement) without a central authority. The "truth" is not what one person says it is; the "truth" is the longest chain that has been validated by everyone.

Incentives

1. The Problem: The Cost of Security

The Proof-of-Work that secures the network is incredibly expensive. It requires:

  • Specialized Hardware: Powerful computers (ASICs or GPUs) that can perform hashing operations at immense speeds.

  • Massive Electricity Consumption: Running this hardware 24/7 consumes a significant amount of power.

No individual or company would voluntarily take on these costs out of sheer goodwill. The security of our decentralized system relies on motivating a large and competitive group of people to do this work.

2. The Solution: Economic Rewards

The protocol solves this by building economic incentives directly into the rules of the network. The miner who successfully mines a block (by being the first to find the valid hash) receives a reward for their effort.

This reward consists of two parts:

A. The Block Reward (The Creation of New Currency)

This is the most ingenious part of the system. When a miner creates a new valid block, the protocol gives them permission to include a very special, unique transaction in that block. This is called the Coinbase Transaction.

  • What it is: The coinbase transaction is the very first transaction in any block. It has no sender. It magically mints a brand-new, fixed amount of the network's native currency and assigns it to the miner's own address.

  • Example (Bitcoin): When Bitcoin started, the block reward was 50 BTC. The first miner to find Block #1 was allowed to include a transaction that said, "Create 50 new BTC and give them to Miner X's address."

  • The Key Insight: This is how a cryptocurrency like Bitcoin is born. The currency is a direct byproduct of the security process. The act of creating new coins is inextricably linked to the act of expending computational power to secure the network.

B. Transaction Fees

When users broadcast transactions (like "Alice pays Bob $10"), they can voluntarily attach a small fee. This fee is like a "tip" for the miners.

  • Miners are incentivized to include transactions with higher fees in their candidate blocks because if they win the race, they get to keep all the fees from the transactions they included.

  • This creates a marketplace for block space. If the network is busy, users who want their transactions processed quickly will offer higher fees.

So, the total reward for a miner is: Total Reward = Block Reward + Sum of all Transaction Fees.

3. The Self-Sustaining Economic Loop

This system creates a powerful, self-reinforcing cycle:

  1. Miners are motivated by the reward (new coins + fees).

  2. This motivation drives them to spend money on hardware and electricity to compete.

  3. This massive, global competition (Proof-of-Work) makes the blockchain incredibly secure and difficult to attack.

  4. The high security gives the network integrity and makes the coins valuable and trustworthy.

  5. The value of the coins gives miners a strong incentive to continue mining (back to step 1).

4. A Final Piece of Genius: Difficulty Adjustment

One last problem: What happens if the currency becomes very valuable and thousands of new, powerful miners join the network? They would find the valid hash much faster. If blocks are supposed to be found every 10 minutes, now they might be found every 1 minute. This would be unstable.

The protocol has a solution for this: Difficulty Adjustment.

  • The network's code automatically checks how long it took to find the last, say, 2,016 blocks.

  • If they were found too fast (e.g., an average of 9 minutes instead of 10), the protocol increases the difficulty. It does this by requiring more leading zeros for a hash to be valid (e.g., changing the rule from 0000... to 00000...). This makes it harder to find a hash, bringing the block time back to 10 minutes.

  • If blocks were found too slowly (e.g., 12 minutes), the protocol decreases the difficulty, requiring fewer zeros.

This keeps the pace of new blocks entering the system remarkably consistent, no matter how many miners are competing.
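One detail worth knowing: counting leading zeros is a simplification. Real Bitcoin compares the hash against a numeric target (a lower target means harder mining), which allows much finer adjustments than adding or removing a whole zero, and it limits each adjustment to a factor of 4 in either direction. Under those assumptions, a rough sketch of the calculation, with made-up names and a made-up starting target:

```python
EXPECTED_BLOCK_SECONDS = 10 * 60
RETARGET_INTERVAL = 2016                      # blocks between adjustments
EXPECTED_TIMESPAN = EXPECTED_BLOCK_SECONDS * RETARGET_INTERVAL

def retarget(old_target: int, actual_timespan: int) -> int:
    """Scale the target by how long the last interval really took."""
    # Clamp so one adjustment never changes the target by more than 4x.
    clamped = max(EXPECTED_TIMESPAN // 4, min(actual_timespan, EXPECTED_TIMESPAN * 4))
    return old_target * clamped // EXPECTED_TIMESPAN

old_target = 1_000_000                        # illustrative number, not a real target

# Blocks came every 9 minutes on average: target shrinks, mining gets harder.
print(retarget(old_target, RETARGET_INTERVAL * 9 * 60))    # 900000

# Blocks came every 12 minutes on average: target grows, mining gets easier.
print(retarget(old_target, RETARGET_INTERVAL * 12 * 60))   # 1200000
```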

Programmed Scarcity

The creator of Bitcoin, Satoshi Nakamoto, wanted to create a system that mimicked the mining of a precious resource like gold: easy to mine at first, and progressively harder over time until the resource is exhausted.

To achieve this, a simple rule was programmed directly into the protocol:

The block reward is cut in half after a specific number of blocks have been mined.

  • In Bitcoin, this event (the "halving") happens every 210,000 blocks. Since a block is mined roughly every 10 minutes, this works out to be approximately every four years.

  • The reward keeps halving until it becomes zero around the year 2140. This ensures that there will only ever be a maximum supply of approximately 21 million Bitcoin, making it a provably scarce digital asset.
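You can check the 21 million figure yourself. The sketch below counts in satoshis (1 BTC = 100,000,000 satoshis) and halves the reward with an integer right shift, which is why the reward eventually hits exactly zero instead of shrinking forever. The function names are assumptions for the example.

```python
SATOSHIS_PER_BTC = 100_000_000
INITIAL_REWARD = 50 * SATOSHIS_PER_BTC
HALVING_INTERVAL = 210_000

def block_reward(height: int) -> int:
    """Block reward in satoshis at a given block height."""
    halvings = height // HALVING_INTERVAL
    return INITIAL_REWARD >> halvings          # integer halving, rounds down

def total_supply() -> int:
    """Sum the reward over every era until it reaches zero."""
    total, era = 0, 0
    while (reward := INITIAL_REWARD >> era) > 0:
        total += reward * HALVING_INTERVAL
        era += 1
    return total

print(block_reward(0) / SATOSHIS_PER_BTC)          # 50.0
print(block_reward(210_000) / SATOSHIS_PER_BTC)    # 25.0
print(block_reward(420_000) / SATOSHIS_PER_BTC)    # 12.5
print(total_supply() / SATOSHIS_PER_BTC)           # just under 21 million
```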


The Longest Chain Rule

Imagine our global, decentralized network. Due to the natural delays in information traveling around the world (network latency), it is not just possible, but guaranteed that two miners will occasionally solve a block at almost the exact same time.

Let's set up the scenario:

  • The last block everyone agrees on is Block #499,999.

  • Miner A (in Canada) and Miner B (in Australia) are both racing to find Block #500,000.

  • By sheer luck, they both find a valid hash for their respective candidate blocks at the same moment.

Now, we have a problem.

  • Miner A broadcasts his Block #500,000-A to the network. Nodes in North America and Europe see it first.

  • Miner B broadcasts her Block #500,000-B to the network. Nodes in Asia and Australia see it first.

The network is now in a state of temporary disagreement. There are two competing, valid versions of the blockchain. This is called a temporary fork.

  • Team A's Chain: ... -> 499,999 -> 500,000-A

  • Team B's Chain: ... -> 499,999 -> 500,000-B

Which one is the "real" Block #500,000? Without a central authority to decide, the network needs an automatic, objective, and trustless way to resolve this tie.

2. The Solution: "The Longest Chain is the Truth"

The rule that every node follows to resolve this is simple and powerful: Continue to work on whichever chain you saw first, but if you ever see a longer valid chain, immediately discard your current work and switch to that longer chain.

The "length" of a chain is not just the number of blocks, but the total accumulated Proof-of-Work. A longer chain represents more computational effort, more energy spent, and more security. Therefore, it is considered the "truth."

3. Walkthrough: Resolving the Fork

Let's see how our fork from Step 1 gets resolved.

  1. The Split: The network is temporarily split. Miners in "Team A" start trying to mine Block #500,001 on top of Block #500,000-A. Miners in "Team B" start trying to mine Block #500,001 on top of Block #500,000-B.

  2. The Race to Extend: It is now a race to see which team finds the next block first. It's a 50/50 chance (assuming hash power is evenly split, which it rarely is).

  3. A Winner Emerges: Let's say a miner on "Team B" gets lucky and finds Block #500,001-B and adds it to her chain. Her chain now looks like: ... -> 499,999 -> 500,000-B -> 500,001-B This chain has a length of 500,002 blocks (if we start from block 0).

  4. Broadcasting the Longer Chain: This miner immediately broadcasts her new, longer chain to the entire network.

  5. Reaching Consensus:

    • The nodes that were on "Team A" now receive this new chain. They run their verification checks. Everything is valid.

    • They compare it to their current chain (... -> 500,000-A), which has a length of 500,001 blocks.

    • They see that the new chain is longer. Following the protocol's core rule, they must switch.

    • They discard Block #500,000-A. It is now an "orphan block"—a valid but ultimately rejected block. Any transactions that were in 500,000-A but not in 500,000-B go back into the pool of unconfirmed transactions to be mined later.

    • All nodes on the network now agree that the chain ending in 500,001-B is the one true history. The fork is resolved, and the entire network is back in consensus, working to find Block #500,002.
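The rule the nodes followed in this walkthrough is small enough to write down. In this sketch every block carries a "work" number, all set to 1 for simplicity, and a single entry stands in for the shared history up to Block #499,999. The names and the data layout are assumptions for illustration, and both chains are assumed to have already passed verification.

```python
def total_work(chain):
    return sum(block["work"] for block in chain)

def choose_chain(current, candidate):
    """Stay on the chain seen first unless the other has strictly more work."""
    return candidate if total_work(candidate) > total_work(current) else current

shared_history = [{"id": "499,999", "work": 1}]
team_a = shared_history + [{"id": "500,000-A", "work": 1}]
team_b = shared_history + [{"id": "500,000-B", "work": 1}]

# A tie: a Team A node that saw 500,000-A first keeps working on it.
print(choose_chain(team_a, team_b)[-1]["id"])           # 500,000-A

# Team B finds the next block; now its chain has more accumulated work.
team_b_extended = team_b + [{"id": "500,001-B", "work": 1}]
print(choose_chain(team_a, team_b_extended)[-1]["id"])  # 500,001-B
```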


The 51% Attack

A 51% attack is a brute-force exploitation of the longest valid chain rule.

Who is the Attacker?

A 51% attacker is not a typical hacker trying to find a bug in the code. The attacker is a single entity—a person, a government, or a collaborating group of miners (a "cartel")—that has managed to gain control of more than 50% of the entire network's total hashing power.

  • If the total network hash rate is 100 Exahashes/second, the attacker needs to control more than 50 Exahashes/second of it (51, say).

  • This means, on average, they will solve blocks (win the mining race) faster than the entire rest of the world combined.

Let's call our attacker Mallory.

3. The Attack Scenario: The "Double-Spend" Heist

The most common goal of a 51% attack is to perform a double-spend: spending the same coins twice. Here is how Mallory would do it, step-by-step.

1. The Public Transaction (The Bait)

  1. Mallory Makes a Purchase: Mallory finds a merchant, Bob, who is selling a luxury car for 100 BTC.

  2. Mallory Broadcasts the Payment: Mallory creates and broadcasts a valid transaction: Send 100 BTC from Mallory's address to Bob's address.

  3. The Transaction is Confirmed: This transaction is picked up by the honest network and included in a block, let's call it Block #500,000. The honest chain now looks like ... -> 499,999 -> 500,000 (contains Mallory's payment to Bob).

  4. Bob Waits for Confirmations: Bob is a savvy merchant. He doesn't ship the car immediately. He waits for a few more blocks to be mined on top of Block #500,000. When the honest chain is ... -> 500,000 -> 500,001 -> ... -> 500,006, he feels confident the transaction is final. He ships the car to Mallory.

2. The Secret Attack (The Alternate Reality)

  1. Mallory Starts a Secret Chain: At the same time that Step 3 was happening, Mallory used her massive >50% hash power to start mining her own private version of the blockchain. She starts her secret chain from Block #499,999 (the block before her payment to Bob).

  2. Mallory Re-writes History: On her secret chain, Mallory mines a fraudulent version of Block #500,000. Let's call it Block #500,000'. In this block, she does not include the transaction that pays Bob. Instead, she includes a different transaction that sends the same 100 BTC back to another address she controls.

  3. The Hash Rate Race:

    • The honest network, with <50% of the power, is slowly finding blocks: 500,001, 500,002, 500,003...

    • Mallory, with >50% of the power, is finding blocks on her secret chain faster: 500,001', 500,002', 500,003', 500,004'...

3. The Ambush (The Reorganization)

  1. Mallory's Chain Becomes Longer: Eventually, Mallory's secret chain will overtake the honest chain in length. For example, the honest chain might be at Block #500,006, but Mallory's secret chain is now at Block #500,007'.

  2. Mallory Broadcasts Her Chain: Mallory reveals her longer chain to the entire network.

  3. The Network Switches Allegiance: Honest nodes now see two competing histories. Their programming's single directive is to follow the longest chain. They see Mallory's chain is longer, so they discard the honest chain (from #500,000 to #500,006) and adopt Mallory's chain as the one true history. This event is called a chain reorganization or reorg.
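Why does Mallory need a majority, and why did Bob wait for confirmations? The Bitcoin whitepaper models the race as a gambler's ruin problem: an attacker with a share q of the hash power who is z blocks behind ever catches up with probability (q/p)^z when q is less than p = 1 - q, and with certainty otherwise. A tiny sketch of that simple formula (the function name is made up, and this is the basic catch-up estimate, not a full simulation):

```python
def catch_up_probability(attacker_share: float, blocks_behind: int) -> float:
    """Chance that an attacker who is `blocks_behind` ever overtakes the honest chain."""
    q = attacker_share
    p = 1.0 - q                      # honest share of the hash power
    if q >= p:
        return 1.0                   # with half or more, catching up is only a matter of time
    return (q / p) ** blocks_behind

for share in (0.10, 0.30, 0.45, 0.51):
    print(share, catch_up_probability(share, 6))
```

With a minority of the hash power, each extra confirmation Bob waits for shrinks the attacker's chances exponentially. With a majority, waiting does not save him; it only makes the attack take longer.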

4. The Aftermath

  • For Mallory: She has the car and she has her 100 BTC back (because the transaction that paid Bob has been erased from the official history). She has successfully double-spent her coins.

  • For Bob: The 100 BTC payment he received has vanished from the blockchain. He has lost the money and the car.

  • For the Network: Confidence in the blockchain is shattered.

5. What a 51% Attacker CAN and CANNOT Do

This is a very important distinction to make.

An attacker CAN:

  • Reverse their own recent transactions (as shown in the double-spend example).

  • Prevent specific new transactions from being confirmed by refusing to include them in the blocks they mine.

  • Disrupt the network by mining empty blocks, slowing it down.

An attacker CANNOT:

  • Steal coins from someone else's wallet. To do that, they would need that person's private key. The rules of digital signatures (see "Proving Identity without a Password" above) still apply.

  • Change the fundamental rules of the network, such as increasing the block reward or creating coins out of thin air. Any such block would be rejected as invalid by all the honest nodes.

  • Reverse very old transactions. The amount of computational work required to rebuild the chain from a very old block would be unimaginably vast and expensive.

6. Why is a 51% Attack so Difficult on a Major Blockchain?

  1. Astronomical Cost: For a network like Bitcoin, the amount of hashing power is immense. An attacker would need to acquire billions of dollars worth of specialized mining hardware (ASICs) and have access to city-sized amounts of electricity to achieve 51% control.

  2. Economic Disincentive: The attack itself is a form of economic suicide. If Mallory spends billions to attack Bitcoin, the moment the world finds out, confidence in Bitcoin would collapse, and its price would plummet. The 100 BTC she stole would become worthless, and the value of her multi-billion dollar mining operation would be wiped out. It is far more profitable to use that hashing power to mine honestly and collect the block rewards.


The Last Concept: The Merkle Tree

Suppose you are running a Bitcoin wallet on your phone (the software you use to buy and sell Bitcoin), and somebody sends 3 BTC to your account. How do you verify it?

Because Bitcoin is a decentralized system, verifying it yourself would mean keeping the whole blockchain on your phone. As of now the whole blockchain is more than 500 GB, so you clearly cannot download it onto your phone.

So you run a light SPV (Simplified Payment Verification) node instead of a full node (the kind that miners run).

The only difference between an SPV light node and a full node is that, instead of the full "Data" field of each block, the light node stores only a single Merkle root hash.

The Merkle Tree

Imagine a block contains 2,000 transactions. We need a way to prove that all 2,000 of these transactions are present and have not been tampered with.

  • The Naive Approach: We could concatenate all 2,000 transactions into one giant string of text and then hash that entire string. This would create a single hash representing all the data.

    • The Problem with this approach: What if a lightweight (SPV) node only wants to verify if one specific transaction is in the block? With this naive approach, to prove it, they would need to be given all 1,999 other transactions to recalculate the giant hash. This is incredibly inefficient and defeats the purpose of a lightweight node.

We need a method that is both tamper-evident and allows for efficient, selective verification.

The Solution: The Merkle Tree (or Hash Tree)

A Merkle Tree is a way of building a "pyramid" of hashes, starting from individual transactions at the bottom and ending with a single hash at the top. This single hash at the very top is the Merkle Root.

This structure was invented by Ralph Merkle in 1979, long before blockchain, and it provides elegant solutions to our problems.

3. How a Merkle Tree is Built: A Step-by-Step Walkthrough

Let's build a simple Merkle Tree for a block with just four transactions: T_A, T_B, T_C, and T_D.

Step 1: Hash each individual transaction. This is the bottom layer of our pyramid (the "leaves" of the tree).

  • Hash(T_A) -> H_A

  • Hash(T_B) -> H_B

  • Hash(T_C) -> H_C

  • Hash(T_D) -> H_D

Our tree now looks like this:

    [      ?      ]  <-- The Merkle Root we want to find
        /   \
  [  ?  ]   [  ?  ]
   /   \     /   \
[ H_A ] [ H_B ] [ H_C ] [ H_D ]

Step 2: Pair up the hashes and hash the pairs. We go up one level. We concatenate the hashes of adjacent leaves and hash the result.

  • Hash(H_A + H_B) -> H_AB

  • Hash(H_C + H_D) -> H_CD

Our tree now looks like this:

    [      ?      ]  <-- The Merkle Root
        /   \
  [   H_AB   ] [   H_CD   ]
   /     \     /     \
[ H_A ] [ H_B ] [ H_C ] [ H_D ]

Step 3: Repeat until only one hash remains. We repeat the process for the next level up.

  • Hash(H_AB + H_CD) -> H_ABCD

This final hash, H_ABCD, is our Merkle Root.

Our complete tree:

         [H_ABCD]  <-- THE MERKLE ROOT
        /        \
  [   H_AB   ] [   H_CD   ]
   /     \     /     \
[ H_A ] [ H_B ] [ H_C ] [ H_D ]

(A quick note: If there is an odd number of hashes at any level, the last hash is simply duplicated and hashed with itself to make the number even.)

This final Merkle Root is the only summary of the transactions that gets included in the block header. It acts as a single, secure, and compact fingerprint for all 2,000 transactions.
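The three steps, plus the odd-number rule, can be sketched like this. One simplification to flag: Bitcoin hashes twice with SHA-256 at each step, while this example uses a single SHA-256 to match the Hash() notation above. The function names are assumptions, and the list must contain at least one transaction.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(transactions):
    # Step 1: hash each transaction to get the leaves.
    level = [h(tx.encode("utf-8")) for tx in transactions]
    # Steps 2 and 3: pair up and hash until one hash remains.
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])          # odd count: duplicate the last hash
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

root = merkle_root(["T_A", "T_B", "T_C", "T_D"])
print(root.hex())

# Changing any single transaction changes the root.
print(merkle_root(["T_A", "T_B", "T_C", "T_X"]).hex() == root.hex())  # False
```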

So now your mobile SPV light client sends a request to a full node that exposes the blockchain, asking whether the recent transaction is present and, if so, in which block.

Suppose it tells you block #500. How do you verify this? Do you simply trust it? No.

Taking the example above, suppose the transaction was T_D. You can compute H_D yourself, and you already have H_ABCD (remember, the Merkle root is stored in the SPV light node).

So you need H_C and H_AB, and you request these from the full node.

With them you compute H_CD = Hash(H_C + H_D) and then Hash(H_AB + H_CD), and confirm that the result matches the H_ABCD stored in your light node.
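Here is that check as a sketch. The proof is just the list of sibling hashes from the bottom up, each tagged with whether it sits on the left. As before, this uses a single SHA-256 where Bitcoin uses a double one, and the function names and proof format are assumptions for illustration.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_proof(tx: str, proof, expected_root: bytes) -> bool:
    """Rebuild the path from one transaction up to the root."""
    current = h(tx.encode("utf-8"))
    for sibling, sibling_is_left in proof:
        current = h(sibling + current) if sibling_is_left else h(current + sibling)
    return current == expected_root

# What the full node knows (the whole tree).
H_A, H_B, H_C, H_D = (h(t.encode("utf-8")) for t in ["T_A", "T_B", "T_C", "T_D"])
H_AB, H_CD = h(H_A + H_B), h(H_C + H_D)
H_ABCD = h(H_AB + H_CD)

# What the light node has: T_D, the root H_ABCD, and the two hashes it asked for.
proof_for_T_D = [(H_C, True), (H_AB, True)]     # both siblings sit on the left
print(verify_proof("T_D", proof_for_T_D, H_ABCD))   # True
print(verify_proof("T_X", proof_for_T_D, H_ABCD))   # False: not in this block
```

For four transactions the proof is two hashes; for 2,000 it is only about eleven, because the tree's height grows with the logarithm of the number of transactions.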


This completes all the major concepts of blockchain.

Next: a deeper Ethereum note, with the spelling fixed this time.