Portal Network Design Requirements

Piper Merriam · Tags: project, p2p, storage, protocol, peer-to-peer

The design of the Portal Network has grown organically over the last few years. What originated as an idea in my head has now been written down in the form of our specifications and codified in multiple different client implementations. Today, I am taking a stab at writing down some of the overall protocol design requirements that have emerged during this process.

At its core, the Portal Network is a specialized storage network for Ethereum's data. It is a distributed peer-to-peer network in which all nodes participate in storing and serving data. The ongoing health of the network depends on our protocol designs following these rules.

Provable Data Only

All data stored in the network must be cryptographically anchored so that it can be proven to be correct. For example, a block header is addressed by its block hash; upon retrieval, the header can be proven to be the correct data by computing keccak(header) and verifying that it matches the expected hash.
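As a minimal sketch of that check (the function name and the RLP-encoded input are assumptions for illustration, not the spec's API), verification is just re-hashing the retrieved bytes and comparing against the hash the data was requested by:

```python
# Minimal sketch: a header is content-addressed by its block hash, so
# verification is re-hashing the retrieved bytes and comparing.
from eth_hash.auto import keccak  # keccak-256, as used by Ethereum


def verify_header(expected_block_hash: bytes, rlp_encoded_header: bytes) -> bool:
    # If the hashes match, the payload is provably the header we asked for.
    return keccak(rlp_encoded_header) == expected_block_hash
```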

This requirement ensures that the network can only be used to store the intended data, which protects it against both spam and certain categories of denial of service attacks. Without this requirement, the network could be abused by people attempting to store other types of data, or attacked by people trying to fill up the network's storage capacity with garbage data.

Canonical Data Only

All data stored in the network must be canonically anchored. This requirement is similar to the previous "Provable Data Only" rule, but it is a bit more specific: it requires all data to be not only provable, but also canonical. An example of data that is provable but not canonical is the header of a non-canonical block; such blocks are often referred to as ommers (the gender-neutral term for uncles).

An example of this is our use of double batched merkle log accumulators. These allow us to easily prove an individual block header is part of the canonical chain of headers. This also leads us to our next requirement...

Easy To Prove

Some things are provable, but not easy to prove. An example of this is proving that any individual block header is canonical. The naive approach to proving that a header is part of the canonical chain is to trace its parent chain backwards in time, either to genesis or to another historical header that is trusted. In practice, this is an arduous process that requires downloading and verifying very large amounts of information.

The Portal Network solves this problem by borrowing from the Beacon chain specifications and using double batched merkle log accumulators. These accumulators allow users to verify that a header is canonical by checking a small merkle proof instead of needing to verify the entire historical chain of hashes.
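To give a rough sense of the shape of that check (the actual accumulator in the Portal specs has its own encoding and hashing rules; the hash choice and function names here are illustrative assumptions), verifying a Merkle branch only requires hashing a handful of sibling nodes:

```python
# Illustrative sketch of a binary Merkle branch check (not the exact Portal
# accumulator format): recompute the root from a leaf plus its sibling hashes.
from hashlib import sha256


def verify_merkle_branch(leaf: bytes, branch: list[bytes], index: int, root: bytes) -> bool:
    node = leaf
    for sibling in branch:
        if index % 2 == 0:
            node = sha256(node + sibling).digest()
        else:
            node = sha256(sibling + node).digest()
        index //= 2
    return node == root
```

The proof size grows only with the logarithm of the number of leaves, which is what makes the check cheap relative to replaying the full header chain.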

In general, this principle must apply to all data in our networks.

Amplification Attacks

As we look at the various proof requirements, one "trap" that is easy to fall into is the introduction of what is referred to as an "amplification attack". Suppose that we were to introduce a new type of data B which, in order to prove its canonicalness, requires a node to fetch a different piece of data A from the network. If the relative sizes are such that B is smaller than A, then we will also have introduced an attack vector.

A bad actor can send out an invalid payload for B. The receiving node would then need to fetch A from the network in an attempt to verify the B payload it received. The verification will ultimately fail for B, but the damage to the network occurs because the bad actor has been able to send a small payload which causes the recipients to send out larger payloads. This allows someone to execute a denial of service attack on our network by sending out many small payloads which in turn cause the receiving nodes to send out even larger payloads, effectively amplifying the bandwidth consumed by the attack.
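As a back-of-the-envelope sketch (the numbers are hypothetical, chosen only to illustrate the ratio), the amplification factor is roughly the size of the data the victims must fetch divided by the size of the forged payload the attacker sends:

```python
# Hypothetical illustration: the attacker pays size_of_b bytes of upload,
# while each victim pays roughly size_of_a bytes of download to (fail to)
# verify the forged payload.
def amplification_factor(size_of_a: int, size_of_b: int) -> float:
    return size_of_a / size_of_b


# e.g. a 100-byte forged B that forces a 10 kB fetch of A amplifies ~100x
print(amplification_factor(10_000, 100))  # 100.0
```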

Evenly Distributed Data

The Portal Network is a fancy DHT. This DHT has a 256-bit address space, and all content within the Portal Network lives at a certain location in this address space. Data in the Portal Network is required to be evenly distributed across this address space. This is how we ensure that individual nodes in the network are able to effectively choose how much data they are willing to store for the network.
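As a sketch of what "location in the address space" means (the exact content id derivation is defined per sub-network in the Portal specs; the hash choice and function names here are assumptions for illustration), content keys are hashed into 256-bit ids and nodes measure their responsibility by XOR distance:

```python
# Illustrative sketch: content keys hash to 256-bit content ids, and a node
# stores content whose id is "close" to its own node id under the XOR metric.
import hashlib


def content_id(content_key: bytes) -> int:
    return int.from_bytes(hashlib.sha256(content_key).digest(), "big")


def distance(node_id: int, cid: int) -> int:
    return node_id ^ cid
```

Because the hash output is uniformly distributed, hashing the content key is what spreads data evenly across the address space.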

If our addressing scheme does not evenly distribute the data, then we would end up with areas of the address space where there is significantly more data than in other areas of the network. We refer to such areas as "hot spots". Nodes in the DHT that find themselves very close to these hot spots would end up being responsible for a disproportionately large amount of data. And since individual node addresses in the DHT are uniformly and randomly distributed, this would mean that data in this region would be replicated less and be more likely to be discarded from those nodes' storage as more and more data gets assigned to that area of the network.

Continuing to build and design Portal

The framework that we've built thus far in Portal has the potential to transform Ethereum's data and peer-to-peer ecosystems. We've only just begun to scratch the surface of what is possible. As our networks mature in the coming years, you should expect to see these design principles applied to things like Data Availability Sampling and blobs, more areas of Beacon chain data, Verkle tries, and possibly L2 infrastructure as well.

Portal Website
Portal Specs