How to use Ethereum Proofs
In this guide Infura’s VP of Engineering, Kris Shinn, deep dives into Ethereum proofs and how to use them.
The web3 community often discusses the impact of centralized node providers on the users they serve. As an example, US-based web3 companies were required to block transactions from being sent to the TornadoCash contract in late 2022. This sparked a debate about how far this could go. Could blocks be censored? Could node providers block wholesale access to the contract? While transaction censorship is a complex topic that extends well beyond a technical blog post, this post will focus on data censorship. The root of the problem comes down to trust assumptions.
Can we trust that a node provider is serving the right data? Can we trust that the state of the blockchain is correct? Ethereum is a technology rooted in a trust but verify principle. It uses incentives to influence good behavior where higher levels of trust are needed. Ethereum was built to be a publicly accessible database with mechanisms built in to verify its state. We will explore some of those topics from a technical perspective here.
We recently came across this blog post that sounds the alarm that the state database around TornadoCash was getting censored. While it is true that OFAC regulations required US-based companies to block transactions from being sent to the TornadoCash contracts, read access is unaffected. Censored read access would indeed be alarming, because node providers would be censoring public information. It turns out that the statement made in the original post is incorrect, but you don’t have to trust me. We can prove it!
I have to admit preparing to write this blog post was fun. It involved developing a deeper understanding of Ethereum Name Service (ENS), a lot of playing with cast (one of my new favorite developer tools), and taking a deep dive into understanding proofs in Ethereum. We’ll take a similar journey in this blog post. Ready? Let’s go!
Special thanks to our principal engineer, Ryan Schneider, who helped out a great deal to find out what was happening and helped develop some of the content in this article.
A Quick Exploration of Ethereum ENS
The above mentioned blog post inspired us to dig in further. I know Infura does not censor read access or alter data. I had to try it for myself:
ETH_RPC_URL=https://mainnet.infura.io/v3/<apiKey> cast call 0x226159d592E2b063810a10Ebf6dcbADA94Ed68b8 "contenthash(bytes32 node)" tornadocash.eth
Result:
0x00000000000000000000000000000000000000000000000000000000000000200000000000000000000000000000000000000000000000000000000000000000
Recreating his method both on Infura and on my locally hosted node produced the same answer as the author described: a null response. Infura wasn’t the problem. Something was happening here, but I didn’t understand enough about how ENS worked in order to understand what I was seeing.
I started where any engineer would start: the docs. The architecture of ENS has three main concepts:
namehash: ENS doesn’t use the raw strings of the domain names to store them on the blockchain. For a number of great reasons you can read about in the docs, they use a hashing scheme to store values on the blockchain.
registry: The main entrypoint to resolving an ENS domain is in the Registry. This contract stores the domain’s owner, the resolver address, and the TTL.
resolver: A contract that does the actual resolution of a namehash to the target value.
Additionally, ENS supports multiple resolver types defined for different use cases. Here are a few:
- The method `addr(bytes32)` resolves a name to an Ethereum address.
- The method `contenthash(bytes32)` provides a better-defined system for mapping names to network and content addresses.
- The method `abi(bytes32)` provides a mechanism for storing ABI definitions in ENS, for easy lookup of contract interfaces by callers.
A quick aside: using a command line tool like `cast`, it is easy to look up ENS domains.
ETH_RPC_URL=https://mainnet.infura.io/v3/<apiKey> cast resolve-name tornadocash.eth
Under the covers, `resolve-name` is converting `tornadocash.eth` into its namehash and doing an eth_call on `addr(bytes32)` to get the value of the target address. If you perform this call, you get a null return value for tornadocash.eth, which is the correct value for the `addr(bytes32)` call.
So back to our experiment, we tried to replicate the contenthash(bytes32) call and got the same answer. Since we are doing a straight eth_call (using `cast call` in this case), the tool is not translating the domain name to a namehash for us, so we fix that first:
ETH_RPC_URL=https://mainnet.infura.io/v3/<apiKey> cast call 0x226159d592E2b063810a10Ebf6dcbADA94Ed68b8 "contenthash(bytes32 node)" $(cast namehash tornadocash.eth)
Result:
0x00000000000000000000000000000000000000000000000000000000000000200000000000000000000000000000000000000000000000000000000000000000
This still looks like the result is getting censored. The ENS documentation says:
The majority of the contracts use a default resolver which is set at the time of registration. However, the default resolver has changed a few times to add new functionalities (eg: coin type). To find out all the resolver addresses, you have to get the resolver address through ENSRegistry.
The original author uses a hardcoded resolver address of: `0x22…8b8`. Taking a step back, let’s validate that the resolver contract is correct using the ENS Registry. The contract address of the ENS registry is published here on their website as 0x00000000000C2E074eC69A0dFb2997BA6C7d2e1e. Admittedly, I am trusting that the ENS website is publishing legitimate information. But since I can also verify that on etherscan, I’m convinced that the registry contract address is correct.
ETH_RPC_URL=https://mainnet.infura.io/v3/<apiKey> cast call 0x00000000000C2E074eC69A0dFb2997BA6C7d2e1e "resolver(bytes32)" $(cast namehash tornadocash.eth)
Result:
0x0000000000000000000000004976fb03c32e5b8cfe2b6ccb31c09ba78ebaba41
Using this call we got a different resolver contract: 0x4976fb03C32e5B8cfe2b6cCB31c09Ba78EBaBa41
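The raw result is a 32-byte ABI word with the address in its low-order 20 bytes. As a minimal sketch of that conversion (the helper name `wordToAddress` is made up for illustration, and it returns lowercase hex rather than the checksummed form):

```typescript
// Hypothetical helper: extract an address from a 32-byte ABI-encoded word.
function wordToAddress(word: string): string {
  const hex = word.replace(/^0x/, '');
  if (!/^[0-9a-fA-F]{64}$/.test(hex)) {
    throw new Error('expected a 32-byte hex word');
  }
  // A well-formed address word has 12 bytes of zero padding on the left.
  if (!/^0{24}$/.test(hex.slice(0, 24))) {
    throw new Error('upper 12 bytes are not zero');
  }
  return '0x' + hex.slice(24).toLowerCase();
}

// The word returned by the registry call: 12 zero bytes, then the address.
const resolverWord =
  '0x' + '0'.repeat(24) + '4976fb03c32e5b8cfe2b6ccb31c09ba78ebaba41';
console.log(wordToAddress(resolverWord));
```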
Eureka! There is a different resolver contract listed with the ENS registry. When we call the contenthash method against this address:
cast call 0x4976fb03C32e5B8cfe2b6cCB31c09Ba78EBaBa41 "contenthash(bytes32)" $(cast namehash tornadocash.eth)
Now we got the correctly encoded contenthash as the return value.
0x00000000000000000000000000000000000000000000000000000000000000200000000000000000000000000000000000000000000000000000000000000026e30101701220d422ef6e800db34f50101daa4ea6b04365ab44b49bf58c00b54c1067befb73700000000000000000000000000000000000000000000000000000
Running this result through cast once more:
cast --abi-decode "contenthash(bytes32)(bytes)" 0x00000000000000000000000000000000000000000000000000000000000000200000000000000000000000000000000000000000000000000000000000000026e30101701220d422ef6e800db34f50101daa4ea6b04365ab44b49bf58c00b54c1067befb73700000000000000000000000000000000000000000000000000000
Result:
0xe30101701220d422ef6e800db34f50101daa4ea6b04365ab44b49bf58c00b54c1067befb7370
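The decoding step can also be sketched by hand. The ABI encoding of a single `bytes` return value is an offset word, a length word, and then the payload padded to a multiple of 32 bytes. The function name `decodeBytesReturn` is hypothetical, and the sketch only handles this one return shape:

```typescript
// Decode the ABI encoding of a single dynamic `bytes` return value.
function decodeBytesReturn(data: string): string {
  const hex = data.replace(/^0x/, '');
  // First word: byte offset of the length word (0x20 for a lone return value).
  const offset = parseInt(hex.slice(0, 64), 16);
  const lengthStart = offset * 2;
  // Length word: number of payload bytes that follow.
  const length = parseInt(hex.slice(lengthStart, lengthStart + 64), 16);
  const payloadStart = lengthStart + 64;
  const payload = hex.slice(payloadStart, payloadStart + length * 2);
  if (payload.length !== length * 2) {
    throw new Error('encoded data is shorter than its declared length');
  }
  return '0x' + payload;
}

// Encode a number as a left-padded 32-byte word.
const word = (n: number): string => n.toString(16).padStart(64, '0');

// Rebuild the resolver's response from the decoded contenthash shown above.
const contenthash =
  'e30101701220d422ef6e800db34f50101daa4ea6b04365ab44b49bf58c00b54c1067befb7370';
const encoded =
  '0x' + word(0x20) + word(contenthash.length / 2) + contenthash.padEnd(128, '0');

// The "null" response from the first experiment: offset 0x20, length 0.
const emptyEncoded = '0x' + word(0x20) + word(0);

console.log(decodeBytesReturn(encoded));
console.log(decodeBytesReturn(emptyEncoded));
```

The second call shows why the earlier response looked censored: it is simply the encoding of an empty byte string.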
With a deeper understanding of how ENS works and some playing with `cast`, we were able to find the true value of the tornadocash.eth contenthash. As we saw above, `contenthash(bytes32)` allows you to store and access content addresses. You can validate the target by decoding the value and interrogating the IPFS DAG to find that this is the tornadocash website. While the IPFS exploration is beyond the scope of this article, let us know if you want a part 2 and we can explore that part!
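As a small first step toward that, the decoded value can be turned into an IPFS content identifier. The sketch below assumes the EIP-1577 layout, in which an IPFS contenthash is the `ipfs-ns` namespace prefix (bytes `0xe3 0x01`) followed by a binary CIDv1. The function names are ours, and only this one case is handled:

```typescript
const BASE32_ALPHABET = 'abcdefghijklmnopqrstuvwxyz234567';

function hexToByteArray(hex: string): number[] {
  const clean = hex.replace(/^0x/, '');
  const out: number[] = [];
  for (let i = 0; i < clean.length; i += 2) {
    out.push(parseInt(clean.slice(i, i + 2), 16));
  }
  return out;
}

// RFC 4648 base32, lowercase and without padding.
function base32(bytes: number[]): string {
  let bits = 0;
  let value = 0;
  let out = '';
  for (const b of bytes) {
    value = (value << 8) | b;
    bits += 8;
    while (bits >= 5) {
      out += BASE32_ALPHABET[(value >>> (bits - 5)) & 31];
      bits -= 5;
    }
    value &= (1 << bits) - 1; // keep only the bits not yet emitted
  }
  if (bits > 0) out += BASE32_ALPHABET[(value << (5 - bits)) & 31];
  return out;
}

// Assumed layout (EIP-1577): 0xe3 0x01 namespace prefix, then a CIDv1.
function contenthashToCid(contenthashHex: string): string {
  const bytes = hexToByteArray(contenthashHex);
  if (bytes[0] !== 0xe3 || bytes[1] !== 0x01) {
    throw new Error('not an ipfs-ns contenthash');
  }
  const cid = bytes.slice(2);
  if (cid[0] !== 0x01) throw new Error('expected a CIDv1');
  // "b" is the multibase prefix for lowercase base32.
  return 'b' + base32(cid);
}

const cid = contenthashToCid(
  '0xe30101701220d422ef6e800db34f50101daa4ea6b04365ab44b49bf58c00b54c1067befb7370'
);
console.log(cid);
```

The resulting string is what an IPFS gateway or client would accept as the content address.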
So, it turns out that the appearance of censorship was just a misunderstanding of the tools and ENS contracts. Coming back to the idea of censorship, how can we trust what the contract returned is the canonical ENS resolver and not some sort of elaborate honeypot? The great thing about Ethereum is that we can do just that. In the next section, we will explore using proofs to validate the state of the blockchain.
Ethereum Proofs explained
There is a lot of material published about the benefits of blockchains such as Ethereum, one of them being that you don’t need to build the entire state database to prove something is correct. The Ethereum state database is implemented as a modified Merkle Patricia trie. One of the useful properties of Merkle Patricia tries is that you can use Merkle proofs as a computationally efficient means to prove a number of different assertions about the blockchain, including account state and storage values.
There is some basic information to review to make sure we understand what’s going on. Users send transactions to the blockchain to update the database state. There are 3 main Merkle tries updated when a transaction is included:
- The receipt trie (representing the transaction that updated the state)
- The state trie (tracking changes to the accounts)
- The storage trie (a per-account trie that tracks contract state changes)
When you send a transaction that performs a write to the blockchain, the storage and state tries change. This change is reflected in the value of the `stateRoot` in the block header. When you call a JSON-RPC method such as `eth_getBlockByNumber`, you are seeing the block header with cryptographic hashes representing the state of the blockchain at a given point in time. For the purposes of this article, the important bit to remember is that the `stateRoot` is a cryptographic hash that you can use to validate that everything is correct and untampered. It is the root of the state trie.
How to use Ethereum Proofs
Since the main difference between the source article and the path we just took to validate the ENS target ended up being a different resolver address, this seemed like a good place to start an exploration of proofs. If we can prove that the address resolver we used is the correct contract and not returning an illegitimate copy of the contract under an attacker’s (or censor’s) control, we can trust the value of the contenthash(bytes32) call that gets returned. For the sake of brevity, I’m only going to step through proving the registry. The same exercise can be done with the contenthash method on Resolver contract as well, but it’s largely the same method and is left as an exercise to the reader.
We will use the method `eth_getProof` to pull the Merkle proofs and verify correctness. In order to use `eth_getProof` we will need the contract address, the storage slot, and the block reference. If you reference the ENS contract, you can see that the values are stored in a mapping. Mapping storage slots are one of the more difficult storage indexes to calculate. For those that are curious, you can find details on storage mapping here. One method to find the storage slot is by using `eth_createAccessList`, which reports the storage slots a call touches. So we start with the registry contract `0x00000000000C2E074eC69A0dFb2997BA6C7d2e1e`. You can do this through cast via:
cast access-list --from <EOA that has eth for gas> 0x00000000000C2E074eC69A0dFb2997BA6C7d2e1e "resolver(bytes32)" $(cast namehash tornadocash.eth)
At the time of this writing Infura does not yet support `eth_createAccessList`** so I needed to run this on a local node. However, this exploration makes the utility of the method pretty clear, so we are looking to add it in the future. The key values that get returned are:
[
  "0xf8b3ca70e07afc9d3c9a4f37fd6adccac81587450545d4b161205b35bf9b1ecd",
  "0xf8b3ca70e07afc9d3c9a4f37fd6adccac81587450545d4b161205b35bf9b1ece"
]
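These keys can also be derived by hand instead of through an access list. For a Solidity mapping, the slot of an entry is the Keccak-256 hash of the 32-byte key followed by the 32-byte index of the mapping’s own slot, and a struct’s fields occupy consecutive slots from there. The sketch below assumes that the registry’s `records` mapping is the contract’s first state variable (slot 0) and that the name is plain ASCII. It carries its own compact Keccak-256 purely so that it runs without dependencies; in real code a vetted library is the better choice. All function names are ours. If those assumptions hold, the two printed values should match the keys returned by the access list.

```typescript
// Compact Keccak-256 (Ethereum's variant, padding byte 0x01) on BigInt lanes.
const MASK64 = (1n << 64n) - 1n;
const rol64 = (a: bigint, n: number): bigint => {
  const s = BigInt(n % 64);
  return ((a << s) | (a >> (64n - s))) & MASK64;
};

function keccakF1600(state: Uint8Array): void {
  const view = new DataView(state.buffer, state.byteOffset, 200);
  // lanes[x][y] is the little-endian 64-bit word at byte 8 * (x + 5 * y).
  const lanes: bigint[][] = [];
  for (let x = 0; x < 5; x++) {
    lanes.push([]);
    for (let y = 0; y < 5; y++) lanes[x].push(view.getBigUint64(8 * (x + 5 * y), true));
  }
  let lfsr = 1;
  for (let round = 0; round < 24; round++) {
    // theta: mix each column's parity into the neighbouring columns
    const c = lanes.map((col) => col.reduce((acc, v) => acc ^ v, 0n));
    for (let x = 0; x < 5; x++) {
      const d = c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1);
      for (let y = 0; y < 5; y++) lanes[x][y] ^= d;
    }
    // rho and pi: rotate each lane and move it to its new position
    let px = 1;
    let py = 0;
    let current = lanes[px][py];
    for (let t = 0; t < 24; t++) {
      [px, py] = [py, (2 * px + 3 * py) % 5];
      const next = lanes[px][py];
      lanes[px][py] = rol64(current, ((t + 1) * (t + 2)) / 2);
      current = next;
    }
    // chi: non-linear step along each row
    for (let row = 0; row < 5; row++) {
      const r = lanes.map((col) => col[row]);
      for (let i = 0; i < 5; i++) lanes[i][row] = r[i] ^ (~r[(i + 1) % 5] & r[(i + 2) % 5]);
    }
    // iota: round constant generated by an 8-bit LFSR
    for (let j = 0; j < 7; j++) {
      lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256;
      if (lfsr & 2) lanes[0][0] ^= 1n << ((1n << BigInt(j)) - 1n);
    }
  }
  for (let x = 0; x < 5; x++) {
    for (let y = 0; y < 5; y++) view.setBigUint64(8 * (x + 5 * y), lanes[x][y], true);
  }
}

function keccak256(input: Uint8Array): Uint8Array {
  const rate = 136; // block size in bytes for a 256-bit digest
  const state = new Uint8Array(200);
  let offset = 0;
  let blockSize = 0;
  while (offset < input.length) {
    blockSize = Math.min(input.length - offset, rate);
    for (let i = 0; i < blockSize; i++) state[i] ^= input[offset + i];
    offset += blockSize;
    if (blockSize === rate) {
      keccakF1600(state);
      blockSize = 0;
    }
  }
  state[blockSize] ^= 0x01; // Keccak padding (standard SHA-3 uses 0x06)
  state[rate - 1] ^= 0x80;
  keccakF1600(state);
  return state.slice(0, 32);
}

const toHex = (b: Uint8Array): string =>
  '0x' + Array.from(b, (v) => v.toString(16).padStart(2, '0')).join('');
const concat = (a: Uint8Array, b: Uint8Array): Uint8Array => {
  const out = new Uint8Array(a.length + b.length);
  out.set(a);
  out.set(b, a.length);
  return out;
};
// ASCII only: real ENS names also need UTS-46 normalization first.
const ascii = (s: string): Uint8Array =>
  Uint8Array.from(Array.from(s, (ch) => ch.charCodeAt(0)));

// ENS namehash: fold the labels from right to left.
function namehash(name: string): Uint8Array {
  let node: Uint8Array = new Uint8Array(32);
  if (name === '') return node;
  for (const label of name.split('.').reverse()) {
    node = keccak256(concat(node, keccak256(ascii(label))));
  }
  return node;
}

// Slot of mapping[key] when the mapping itself sits at `slotIndex` (< 256).
function mappingSlot(key: Uint8Array, slotIndex: number): Uint8Array {
  const slot = new Uint8Array(32);
  slot[31] = slotIndex;
  return keccak256(concat(key, slot));
}

// Assumption: `records` is at slot 0; the struct's second slot follows the first.
const ownerSlot = toHex(mappingSlot(namehash('tornadocash.eth'), 0));
const resolverSlot = '0x' + (BigInt(ownerSlot) + 1n).toString(16).padStart(64, '0');
console.log(ownerSlot);
console.log(resolverSlot);
```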
Why are there 2 storage slots that get returned? Examining the ENS Registry contract we can see that the data structure stored for a given node is:
struct Record {
    address owner;
    address resolver;
    uint64 ttl;
}
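Solidity lays a struct’s fields out in consecutive storage slots starting at the base slot computed for the mapping key, packing adjacent fields into one 32-byte slot where they fit. Here `owner` takes the first slot, and `resolver` (20 bytes) and `ttl` (8 bytes) fit together in the second, so the record spans 2 slots. Those are the 2 keys returned above: the first holds the owner and the second holds the resolver.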
With those 2 values we can now call `eth_getProof` and process the results.
cast proof --rpc-url https://mainnet.infura.io/v3/<apiKey> 0x00000000000C2E074eC69A0dFb2997BA6C7d2e1e 0xf8b3ca70e07afc9d3c9a4f37fd6adccac81587450545d4b161205b35bf9b1ecd 0xf8b3ca70e07afc9d3c9a4f37fd6adccac81587450545d4b161205b35bf9b1ece
The method returns the following object:
{
  address: Address
  balance: uint256
  codeHash: string
  nonce: uint256
  storageHash: string
  accountProof: string[]
  storageProof: [
    { key: string, proof: string[], value: string },
    …
  ]
}
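Before feeding a response into a trie library, it can help to pin its shape down in types. The sketch below is one way to do that, with field names taken from EIP-1186 (which calls the list of storage proofs `storageProof`). The guard function `checkProofShape` and the sample values are made up for illustration and do not come from a real response:

```typescript
// Every value in the response arrives as a 0x-prefixed hex string.
interface StorageProofEntry {
  key: string;
  value: string;
  proof: string[];
}

interface GetProofResult {
  address: string;
  balance: string;
  codeHash: string;
  nonce: string;
  storageHash: string;
  accountProof: string[];
  storageProof: StorageProofEntry[];
}

const isHex = (v: unknown): v is string =>
  typeof v === 'string' && /^0x[0-9a-fA-F]*$/.test(v);

// Hypothetical guard: throws if the response is not shaped as expected.
function checkProofShape(result: GetProofResult, requestedKeys: string[]): void {
  const scalars = [
    result.address,
    result.balance,
    result.codeHash,
    result.nonce,
    result.storageHash,
  ];
  if (!scalars.every(isHex)) throw new Error('non-hex scalar field');
  if (result.accountProof.length === 0 || !result.accountProof.every(isHex)) {
    throw new Error('accountProof must be a non-empty list of hex nodes');
  }
  if (result.storageProof.length !== requestedKeys.length) {
    throw new Error('expected one storageProof entry per requested key');
  }
  for (const entry of result.storageProof) {
    if (!isHex(entry.value) || !entry.proof.every(isHex)) {
      throw new Error('malformed storageProof entry');
    }
  }
}

// Placeholder data only; a real response carries full RLP-encoded trie nodes.
const sampleResult: GetProofResult = {
  address: '0x00000000000c2e074ec69a0dfb2997ba6c7d2e1e',
  balance: '0x0',
  codeHash: '0x' + '11'.repeat(32),
  nonce: '0x1',
  storageHash: '0x' + '22'.repeat(32),
  accountProof: ['0xf851', '0xf8518080'],
  storageProof: [{ key: '0x01', value: '0x2a', proof: ['0xe2'] }],
};
checkProofShape(sampleResult, ['0x01']);
```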
So let’s take a look at what we have now. We have some data that tells us what address we are proving, some attributes of the account (nonce, balance, storageHash, and code hash), and some Merkle proofs that we can use to validate the account and validate the storage trie. What do all of these mean and how do we use them? Now we get to the fun part.
In order to make use of these proofs, we need an implementation of the Merkle Patricia trie. Implementing the code for an Ethereum-compatible Merkle trie is non-trivial to get right, so we will use the ethereumjs implementation. Proving correctness will consist of recreating a Merkle trie from the proof values, validating against the stateRoot, and validating the result the trie produces when fetching a key against it. For the remainder of this article we’ll switch to TypeScript so that we can play with Merkle tries.
Merkle Proofs with EthereumJS
We’ll be using the following libraries to build the Merkle tries and perform some data transformations where necessary. All of the code that is used in this blog post is available through this Replit template.
import { Trie } from '@ethereumjs/trie'
import { toBuffer, bufferToHex } from '@ethereumjs/util'
import * as ethers from 'ethers'
First, to validate the account against the current block header, we fetch the latest block and reference the stateRoot value. Using your Infura API key, we can use the JsonRpcProvider of ethers to manage the connection to the network and fetch the data we need through standard JSON-RPC calls.
// Registry address and storage keys gathered in the previous section
const ENS_REGISTRY = '0x00000000000C2E074eC69A0dFb2997BA6C7d2e1e';
const RESOLVER_KEYS = [
  '0xf8b3ca70e07afc9d3c9a4f37fd6adccac81587450545d4b161205b35bf9b1ecd',
  '0xf8b3ca70e07afc9d3c9a4f37fd6adccac81587450545d4b161205b35bf9b1ece',
];
const rpc = new ethers.providers.JsonRpcProvider(
  'https://mainnet.infura.io/v3/<api-key>',
  'mainnet'
);
const latestBlockNumber = await rpc.send('eth_blockNumber', []);
const { stateRoot } = await rpc.send('eth_getBlockByNumber', [latestBlockNumber, false]);
// Get the proof for the same block the stateRoot came from
const proof = await rpc.send('eth_getProof', [
  ENS_REGISTRY,
  RESOLVER_KEYS,
  latestBlockNumber,
]);
We create a Merkle trie with the block’s stateRoot value as the trie root and then populate it with a call to `trie.fromProof()`. If the proof values are incorrect or a trie cannot be constructed from them, the library will throw an exception here.
const trie = new Trie({root: toBuffer(stateRoot), useKeyHashing: true})
await trie.fromProof(proof.accountProof.map((p:string) => toBuffer(p)))
Note: the Merkle Patricia trie implementation requires input and output to be converted to buffers (rather than using the raw string data). The utility functions `toBuffer` and `bufferToHex` in `@ethereumjs/util` can be used to convert values back and forth.
In the above example, the library will throw an exception if the proof is invalid. Additionally, if we try to access a value not included in this trie, it will also throw an exception.
When we then attempt to get the trie’s value by the key (the original address), we get an RLP-encoded array of `[ nonce, balance, storageHash, codeHash ]` which can be validated against the values returned by `eth_getProof`. All of this is validated against the stateRoot hash so we know all of this is correct.
const val = await trie.get(toBuffer(ENS_REGISTRY), true)
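To make the contents of that value concrete, here is a dependency-free sketch of decoding an RLP-encoded account into its four fields (nonce, balance, storage root, code hash). In practice a library decoder such as the one in ethers does this job; the minimal decoder, the function names, and the sample bytes below are ours and purely illustrative:

```typescript
type Rlp = string | Rlp[]; // byte strings are returned as 0x-prefixed hex

const hexToBytes = (hex: string): number[] =>
  (hex.replace(/^0x/, '').match(/../g) ?? []).map((b) => parseInt(b, 16));

// Minimal RLP decoder: returns the item at `start` and the index after it.
function rlpDecode(bytes: number[], start = 0): { item: Rlp; end: number } {
  const prefix = bytes[start];
  const readLen = (offset: number, n: number): number =>
    bytes.slice(offset, offset + n).reduce((acc, b) => acc * 256 + b, 0);
  const toHexString = (from: number, to: number): string =>
    '0x' + bytes.slice(from, to).map((b) => b.toString(16).padStart(2, '0')).join('');

  // A single byte below 0x80 encodes itself.
  if (prefix < 0x80) return { item: toHexString(start, start + 1), end: start + 1 };
  if (prefix < 0xc0) {
    // Byte string: short form (up to 55 bytes) or long form with a length prefix.
    const lenOfLen = prefix > 0xb7 ? prefix - 0xb7 : 0;
    const len = lenOfLen ? readLen(start + 1, lenOfLen) : prefix - 0x80;
    const from = start + 1 + lenOfLen;
    return { item: toHexString(from, from + len), end: from + len };
  }
  // List: same short/long split, then decode the children in sequence.
  const lenOfLen = prefix > 0xf7 ? prefix - 0xf7 : 0;
  const len = lenOfLen ? readLen(start + 1, lenOfLen) : prefix - 0xc0;
  let pos = start + 1 + lenOfLen;
  const end = pos + len;
  const items: Rlp[] = [];
  while (pos < end) {
    const child = rlpDecode(bytes, pos);
    items.push(child.item);
    pos = child.end;
  }
  return { item: items, end };
}

function decodeAccount(rlpHex: string) {
  const { item } = rlpDecode(hexToBytes(rlpHex));
  if (!Array.isArray(item) || item.length !== 4) {
    throw new Error('expected a list of four account fields');
  }
  const [nonce, balance, storageHash, codeHash] = item as string[];
  return { nonce, balance, storageHash, codeHash };
}

// Fabricated sample: nonce 1, zero balance, placeholder 32-byte hashes.
const sampleAccount = '0xf8440180' + 'a0' + 'aa'.repeat(32) + 'a0' + 'bb'.repeat(32);
const account = decodeAccount(sampleAccount);
console.log(account);
```

With a real proof, the decoded `storageHash` and `codeHash` are the values to compare against the fields of the same name in the `eth_getProof` response.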
So, what did we just prove? We proved that the Ethereum account belonging to the registry contract is correct against the stateRoot calculated for the latest block. Of the information secured in the stateRoot, the storageHash represents the root of a trie that stores all of the values in a contract. To prove the storage value, we can now construct an additional trie using that storageHash as the root.
const storageTrie = new Trie({root: toBuffer(proof.storageHash), useKeyHashing: true})
As in the account validation, we call `fromProof()`, this time on the storage trie and with each entry of `storageProof`.
for (let i = 0; i < RESOLVER_KEYS.length; i++) {
  const proofBuffer = proof.storageProof[i].proof.map((p: string) =>
    toBuffer(p)
  );
  // fromProof is async, so wait for the nodes to be stored before reading
  await storageTrie.fromProof(proofBuffer);
  // Examine the records of the resolver
  const storageVal = await storageTrie.get(toBuffer(RESOLVER_KEYS[i]));
  if (storageVal == null) {
    console.log("Nothing returned");
  } else {
    if (i === 0) {
      // The storage of resolvers is a record. The first slot holds the owner
      console.log(`Owner: ${ethers.utils.RLP.decode(bufferToHex(storageVal))}`);
    } else if (i === 1) {
      // The second slot holds the resolver contract (shared with the TTL when one is set)
      console.log(
        `Resolver: ${ethers.utils.RLP.decode(bufferToHex(storageVal))}`
      );
    } else {
      console.log("Field unknown");
    }
  }
}
We can then use trie.get() using the storage slot index and we get 2 values (one for each slot): the owner (verified as tornadocash on etherscan) and the resolver contract.
If any of the information was incorrect (the merkle proofs didn’t produce a valid trie or the keys we were performing the get on did not exist), the library would have thrown an exception.
Understanding the properties of Merkle tries, Merkle proofs, and how to use them provides a deeper understanding of what is going on in the blockchain. These methods are used extensively in things like light clients that function without needing to hold a full copy of the database. While the facts of the original article were not completely accurate, the need to be able to understand and verify the information you are getting back from a node provider is a good point.
We fully support the article’s proposal of running a validating light client like Helios on top of Infura or using `eth_getProof` in more manual ways to prove correctness of your requests. As a core value, Infura does not censor or alter data on the blockchain. Public chain data is just that: public and should be represented unadulterated.
We all stand on the shoulders of giants. Special thanks to the developers of the Foundry toolset who developed cast. Also, to the developers at ethereumjs for maintaining a most excellent set of modules for working on Ethereum in JavaScript.
** Infura doesn’t currently support `eth_createAccessList` because we felt there were more urgent methods to enable first, such as trace methods, and the benefit of the method was not well understood. Now that light clients like Helios require the use of the method, we are evaluating when we can add it.