
# Measuring Miner Decentralization in Proof-of-Work Blockchains

Sishan Long (Cornell Tech; Cornell University; IC3), Soumya Basu (Cornell University; IC3), and Emin Gün Sirer (Ava Labs)
###### Abstract.

Proof of work cryptocurrencies began with the promise of a more egalitarian future: a decentralized monetary system with no powerful entities in charge. Although this vision is far from realized, these cryptocurrencies are still touted as much more decentralized than traditional centralized systems. It is well understood that cryptocurrencies are centralized, but the underlying causes remain unclear. This work aims to address this gap by examining some of the forces behind mining centralization.

The internals of cryptocurrency mining are opaque and difficult to study, because doing so has traditionally required forming relationships with miners, who are typically reticent to share internal information about their competitive advantages. This work takes a different approach, combining large-scale statistical techniques with publicly available blockchain data to answer previously intractable questions. Our analysis rests on the simple observation that some miners can use their hashpower more efficiently because of their position in the network. By teasing out that effect, we de-bias the mining power distribution to obtain a more accurate estimate. Using the de-biased mining power distribution, we can answer questions about the network position of miners in each cryptocurrency network. Finally, we highlight some unusual mining behaviors that we observed during the course of this study.

## 1. Introduction

Proof of work cryptocurrencies are touted as a trustless, censorship-resistant form of money. The key value proposition is that no individual or small group of entities has control over the entire currency. In particular, there must be many non-colluding miners processing transactions, to prevent any small group of them from enacting policies that may break censorship resistance. If there are only a few miners, then these systems are no different from an inefficient version of a centralized database.

The degree of mining centralization has been previously studied (adem18, ; meiklejohn, ; giniblogpost, ), and some of its underlying causes are understood. For example, individual miners form mining pools and share resources to reduce the variance of their block rewards. Another source of mining centralization is the efforts miners spend to be well positioned in the mining network, including making manual connections to many nodes, multiple points of presence, and even peering relationships between large miners to relay newly mined blocks more quickly between them. While these behaviors are well-known, the degree to which these behaviors affect mining centralization is not well understood.

Understanding the effect of these behaviors requires making inferences about the mining network. However, direct measurements of mining networks are extremely difficult to conduct, since miners are privacy-oriented. Indirect measurements to infer network properties are also often difficult and expensive (Miller2015DiscoveringB, ). This paper takes a different approach, applying large-scale statistical techniques to readily available blockchain data to extract insights about the mining network. Blockchain data offers the benefits of direct measurement, as the fields of a block are filled in by the miner of that block, without the difficulty of deploying a measurement probe and convincing miners to run it.

The crux of our approach rests on the following simple but powerful observation. Per the Nakamoto consensus rules, a miner transmits each newly mined block to all other miners in the network. However, transmitting a freshly mined block takes time, and how much time depends on the miner's position in the network. The miner of a new block therefore gains an advantage over all other miners, since it receives its own block first; we call this the previous block advantage. By looking at the blocks mined immediately after a particular miner's blocks, we can infer how well connected that miner is to all other miners in the network. A particular advantage of this approach is that it relies only on publicly available blockchain data, which makes it feasible for us to study and compare six different cryptocurrency networks, the largest such study to date.

We proceed with our measurement study in three distinct phases. The first phase consists of preliminary measurements and inferences. We confirm that the hypothesized previous block advantage indeed exists in the publicly available blockchain data. Once it is confirmed, we note that the mining power distribution is biased by the previous block advantage, and we present a less biased mining power estimate. An accurate mining power distribution is useful both for the community at large, to detect when a miner is amassing a large amount of hashpower, and for miners, to see how much they are underutilizing their hashpower.

Then, our study moves on to answering detailed questions about the mining network and its block propagation characteristics. Understanding block propagation is key for developers, so that they can focus their effort on the actual bottlenecks and make informed decisions about key protocol parameters such as the block size. Such decisions are very difficult, as some cryptocurrencies, such as Bitcoin (BTC) and Ethereum (ETH), have much faster networks than others with larger block size limits, such as Bitcoin Cash (BCH) or Bitcoin SV (BSV). The block size debate is extremely contentious and is one of the reasons behind multiple chain splits. Previous studies (adem18, ; onscalingdecentralized, ) have proposed setting the block size by measuring the bandwidth of the entire peer-to-peer network.

Our work takes an alternate approach to the block size question by looking at the previous block advantage. Larger blocks propagate through the network more slowly, so a miner's position in the network confers more of an advantage with larger blocks than with smaller ones. However, the block size at which this extra advantage appears depends on the network capacity. We call this threshold the safe envelope of a cryptocurrency, formally defined as the maximum block size that gives the same previous block advantage as an empty block. We estimate this threshold for all cryptocurrencies and present a comparative analysis showing the effect of the Ethereum network upgrades in December 2019, which helped decrease the previous block advantage.

Finally, we examine some unusual miner behaviors in the mining network, namely the formation of mining cartels and active chain switching. We search for and find examples of miner cartelization in ZCash and Litecoin, where mining pools are only nominally separate entities and function as a single larger mining pool. Cartels are dangerous because they provide the illusion of decentralization rather than true decentralization. We also examine one case of active chain switching, in which a miner switches frequently between chains to maximize the returns on its mining power. Due to space constraints, we summarize the high-level takeaways about these unusual behaviors and defer the detailed discussion to the appendix.

Our work illustrates how much information is contained in the blockchain itself and how that information can be leveraged to answer otherwise intractable questions about the mining network. Such techniques have a significant advantage over conventional measurement techniques: they do not require extensive, custom-built measurement probes, which makes larger-scale studies feasible. Through these techniques, this work extracts a more accurate estimate of the mining power distribution, provides guidance to developers on how to measure the effects of an upgrade and set important protocol parameters, and finds examples of unusual mining behavior.

## 2. Background

While cryptocurrencies today have drastically different designs (algorand, ; thundercore, ; ava, ), our study focuses on the subset that uses proof-of-work-based Nakamoto consensus (nakamoto2008bitcoin, ; garay2015bitcoin, ; ethereum, ; zerocoin, ). In this section, we cover the aspects of these cryptocurrencies most pertinent to our study.

### 2.1. Proof of Work Cryptocurrencies

Cryptocurrencies sequence transactions so that all honest participants can agree on the history of transactions. This agreement is reached in the presence of Byzantine actors in the network, as long as the Byzantine actors are sufficiently weak. In particular, Nakamoto consensus assumes that Byzantine actors are limited to using at most a quarter of the total compute resources in the network (selfishmining, ).

Transactions are sequenced in batches, called blocks. These blocks are ordered by miners who append each batch to the tail of the blockchain so that all blocks are consistently ordered. To append a block, miners must first decide on its contents and then solve a crypto puzzle in order to prove that they have spent a large amount of computing resources. Our study relies on two pieces of information that are attached to each block: the identity of the miner and the timestamp. Miner identities are self-reported by the miners and are not enforced by the protocol. Miners gain influence the more blocks they mine, so, like prior work (adem18, ; Nayak2015StubbornMG, ; wang2015exploring, ), we assume that miner identity information is accurate.

Occasionally, two miners produce blocks for the same position in the blockchain. Such events, called forks, occur when one miner fails to hear of a recent block and mines a block on top of a stale chain. Some forks are inevitable in Nakamoto consensus, and most cryptocurrencies resolve forks through the longest chain rule, under which the longest chain of blocks is considered valid. Most cryptocurrencies drop forked blocks, as they are no longer part of the agreed-upon history of events. Ethereum, on the other hand, keeps these forked blocks and rewards their miners using the GHOST (ghost, ) protocol, though the reward for a forked block is significantly smaller. How often forks occur depends on how long blocks take to propagate and on the block interval, which is the average time between consecutive blocks. Table 1 presents summary statistics on how large each system's blocks are and the corresponding effective throughput of the chain.

### 2.2. Miner Behaviors

Miners are rewarded when their blocks are incorporated into the blockchain. As a result, miners innovate to find novel ways to transmit blocks to and from all other miners. Miners, especially large miners, often connect directly to each other to exchange block information and optimistically start mining on top of a newly received block without fully validating its contents. These relationships are also not necessarily symmetric: a small miner may be willing to optimistically mine on a block from a larger miner while the reverse may not be true. These relationships are typically forged privately between miners and are not openly visible, which makes inferring the structure of the mining network, and hence answering questions about mining centralization, extremely difficult.

Miner behavior also becomes more complex when considering the modern cryptocurrency ecosystem. When Bitcoin (nakamoto2008bitcoin, ) was first released, it was the only proof of work cryptocurrency in existence, and miners consisted of home users mining on their laptops. Modern miners use specialized hardware to mine most large cryptocurrencies and have a choice in which cryptocurrency to mine. Some coins, such as Bitcoin (BTC), Bitcoin Cash (BCH), and Bitcoin SV (BSV), share the same crypto puzzle so specialized hardware built for one coin can be redirected fairly easily to another coin. Thus, miners may choose to switch between coins every so often depending on how profitable it is to mine each one.

## 3. Measurement Data

Our analysis relies on readily available blockchain data and uses large-scale statistical analysis to extract insights. In a given cryptocurrency, we extract the following data from each block: (1) the block height, which is determined by counting the number of blocks preceding it in the blockchain, (2) the block size, which is the number of bytes in the block, (3) the block timestamp, which is the time at which the block was mined, and (4) the identity of the miner who mined the block. This information is readily available on any block explorer for a cryptocurrency. We collected data on Bitcoin (BTC), Bitcoin Cash (BCH), Bitcoin SV (BSV), Ethereum (ETH) and Litecoin (LTC) from BlockChair (https://blockchair.com/). ZCash (ZEC) data was obtained from the ZChain (https://api.zcha.in/) block explorer. Summary statistics about the block data used in our study are presented in Table 2.

Our target measurement period is August 2018 for most coins and August 2019 for BSV, as BSV did not yet exist in August 2018. As our statistical techniques require large amounts of data, we extend the measurement period slightly before and after that month. We keep the period as short as possible because we assume that mining hashpower is relatively stable over it, so that the differences we observe can be attributed to the network.

Two of the fields we extract from the blockchain are potentially inaccurate: the miner identities and the block timestamps, both of which are set by the miner of the block. Similar to prior work, we believe the miner identities are accurate, as large miners are able to negotiate preferable peering relationships and otherwise influence the network, so it is in their best interest to claim their own blocks (adem18, ; Nayak2015StubbornMG, ; wang2015exploring, ). We validated the block timestamps by checking that the timestamp field was close to the time at which the block explorer received the block. As an additional validation step, we ran full nodes on some networks, such as Bitcoin (BTC) and Ethereum (ETH), and verified that blocks were received at times close to their timestamps.

To sanitize our data, we drop certain types of blocks. We drop blocks from smaller miners, which we define as those that mined less than $1\%$ of the total number of blocks in our sample, because our analysis requires a large amount of data and small miners have simply not produced enough blocks. We also drop unclaimed blocks. These blocks are typically mined by smaller miners, so we do not expect that ignoring them will affect the soundness of our analysis. The one exception is Section 4.1, where we use all blocks to compare the relative decentralization of all cryptocurrencies. The number of blocks dropped in each cryptocurrency during sanitization is noted in Table 2. Note that dropping blocks does not introduce error into our measurement, as our inferences are only about the miners included in our sample.
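The sanitization rule above can be sketched in Python. This is a minimal sketch, not the authors' pipeline: it assumes the sample is given in chain order as a list of self-reported miner labels, with `None` standing for an unclaimed block, and the function names `large_miners` and `sanitize` are hypothetical. Dropped blocks are kept as `None` placeholders so that chain order is preserved for later steps that look at consecutive blocks; that choice is also an assumption.

```python
from collections import Counter
from typing import List, Optional, Sequence, Set

def large_miners(miners: Sequence[Optional[str]], threshold: float = 0.01) -> Set[str]:
    """Return miners that claimed at least `threshold` of all blocks in the sample.

    Unclaimed blocks (None) count toward the total but belong to no miner.
    Assumes a non-empty sample.
    """
    total = len(miners)
    counts = Counter(m for m in miners if m is not None)
    return {m for m, c in counts.items() if c / total >= threshold}

def sanitize(miners: Sequence[Optional[str]], threshold: float = 0.01) -> List[Optional[str]]:
    """Replace blocks of small miners and unclaimed blocks with None, keeping chain order."""
    keep = large_miners(miners, threshold)
    return [m if m in keep else None for m in miners]

# Hypothetical sample: 200 blocks, one of them unclaimed.
sample = ["A"] * 150 + ["B"] * 48 + ["C"] + [None]
print(sorted(large_miners(sample)))  # prints ['A', 'B']; C mined 0.5% of blocks
```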

## 4. The Previous Block Advantage

When a miner mines a block, it receives the most recent copy of the blockchain first and obtains an advantage in the race to mine the next block, a phenomenon we call the previous block advantage. The previous block advantage encourages mining centralization as miners may choose to coalesce into larger miners or form peering relationships in order to gain a more advantageous position in the network. Thus, it is crucial to understand and quantify the mechanics of the previous block advantage.
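To make the mechanism concrete, here is a toy Monte Carlo simulation in Python. It assumes the simple model formalized in Section 4.4: each miner has a fixed hashpower share, and the miner of the previous block gets an extra relative weight `alpha` when racing for the next block. The function name and parameter values are hypothetical and illustrative only; they do not come from our data.

```python
import random
from typing import List, Sequence

def simulate_chain(power: Sequence[float], alpha: Sequence[float],
                   num_blocks: int, seed: int = 0) -> List[int]:
    """Return the index of the miner of each block in a simulated chain."""
    rng = random.Random(seed)
    miners = list(range(len(power)))
    chain = [rng.choices(miners, weights=power)[0]]
    for _ in range(num_blocks - 1):
        weights = list(power)
        weights[chain[-1]] += alpha[chain[-1]]  # head start for the previous block's miner
        chain.append(rng.choices(miners, weights=weights)[0])  # choices() normalizes weights
    return chain

chain = simulate_chain([0.5, 0.3, 0.2], [0.2, 0.0, 0.0], 100_000)
overall = chain.count(0) / len(chain)
followers = [nxt for prev, nxt in zip(chain, chain[1:]) if prev == 0]
after_own = followers.count(0) / len(followers)
print(f"share of all blocks: {overall:.3f}, share after own block: {after_own:.3f}")
```

Under this model, miner 0 holds exactly half of the hashpower, yet it should win roughly 0.7/1.2 ≈ 0.583 of the blocks that follow its own and roughly 0.6/1.1 ≈ 0.545 of all blocks, so both observed shares overstate its true share.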

To understand the previous block advantage and its effect on decentralization, we first need to quantify what we mean by decentralization. We use a metric based on the Shannon entropy (shannon, ) to compare how decentralized the mining power is between different cryptocurrencies. Then, we verify whether or not the previous block advantage exists in our dataset and introduce a metric to quantify how prevalent it is across cryptocurrencies. Finally, we eliminate the effect of the previous block advantage on the mining power distribution in order to get a more accurate estimate of the miner’s true hashpower. This allows us to examine the state of the mining network more carefully in subsequent sections.

### 4.1. Mining Power Distribution

The mining power distribution of each coin is the proportion of hashpower owned by each miner on that coin. A miner with enough hashpower can gain control of the cryptocurrency, censor transactions, and ruin its value proposition. To measure decentralization, we find the number $n$ such that a uniform distribution over $n$ miners has the same randomness as the mining power distribution. The larger $n$ is, the more decentralized the mining power distribution is. To compute $n$, we raise two to the power of the Shannon entropy (shannon, ), measured in bits, in the same way that the effective anonymity set size is computed in anonymous communication systems (danezis, ).

Blocks that were not claimed by any miner introduce a layer of uncertainty into this computation. To handle these cases, we first assume that unclaimed blocks do not belong to miners that have claimed other blocks as their own. Then, the pessimistic case is that all unclaimed blocks were mined by a single miner, which gives us a lower bound on $n$. Similarly, to get an upper bound on $n$, we treat all unclaimed blocks as mined by distinct miners. We present the lower and upper bounds of $n$ for all coins in our study in Table 3.
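The computation of $n$ and its bounds can be sketched in Python. This is a minimal sketch, assuming block counts serve as the proportions of the mining power distribution; the function names are hypothetical. The lower bound lumps all unclaimed blocks into one extra miner, and the upper bound gives each unclaimed block its own miner, matching the two cases described above.

```python
import math
from typing import Dict, Sequence, Tuple

def effective_miners(block_counts: Sequence[int]) -> float:
    """2 ** (Shannon entropy in bits) of the empirical mining power distribution."""
    total = sum(block_counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in block_counts if c > 0)
    return 2 ** entropy

def effective_miner_bounds(claimed: Dict[str, int], unclaimed: int) -> Tuple[float, float]:
    """Lower and upper bounds on n under the two treatments of unclaimed blocks."""
    counts = list(claimed.values())
    lower = effective_miners(counts + [unclaimed])      # all unclaimed blocks from one miner
    upper = effective_miners(counts + [1] * unclaimed)  # each unclaimed block from a distinct miner
    return lower, upper

print(effective_miners([25, 25, 25, 25]))  # 4.0: four equally powerful miners
print(effective_miner_bounds({"A": 2, "B": 2}, unclaimed=2))  # roughly (3.0, 3.78)
```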

Results. Similar to previous studies (adem18, ), we observe that none of these coins is very decentralized. A consensus protocol with fourteen equally powerful, independent members would be at least as decentralized as the most decentralized cryptocurrency, ZEC. However, even within this narrow range, there are substantial differences in decentralization across cryptocurrencies. The most centralized coins, LTC and BSV, have the equivalent of four to five equally powerful miners controlling them. This is two to three times more centralized than the most decentralized coin, ZEC, which has the equivalent of about twelve to fourteen such miners.

### 4.2. Existence of Previous Block Advantage

We now ask whether the hypothesized previous block advantage exists in our dataset. To answer this, we compute the hashpower of a miner $M$ in two different ways. First, we use the proportion of all blocks in the sample period that were mined by $M$. Second, we use the proportion of blocks mined by $M$ among only the blocks mined immediately after a block by $M$. If the previous block advantage does not exist, then both proportions should be equal, which is our null hypothesis. We run a two-proportion z-test for each miner in the six cryptocurrencies we studied. Note that the two samples are independent because the miner of any particular block on the chain is chosen independently at random from the set of all miners.
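The test described above can be sketched in Python. This is a minimal sketch using the standard pooled two-proportion z-test; the function names are hypothetical, and since the text does not say whether the test is one- or two-sided, the sketch reports a two-sided p-value, computed with `math.erfc`.

```python
import math
from typing import Hashable, Sequence, Tuple

def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # pooled proportion is 0 or 1: no evidence either way
    z = (p2 - p1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

def previous_block_test(miners: Sequence[Hashable], m: Hashable) -> Tuple[float, float]:
    """Compare m's share of all blocks with its share of blocks right after its own.

    Assumes m mined at least one block that has a successor in the sample.
    """
    x1, n1 = sum(1 for b in miners if b == m), len(miners)
    followers = [nxt for prev, nxt in zip(miners, miners[1:]) if prev == m]
    x2, n2 = sum(1 for b in followers if b == m), len(followers)
    return two_proportion_z_test(x1, n1, x2, n2)

z, p = two_proportion_z_test(30, 100, 50, 100)
print(z > 0, p < 0.05)  # a 30% vs 50% share over 100 blocks each is significant at 5%
```

A positive `z` from `previous_block_test` means the miner wins a larger share of blocks after its own block than overall, which is the direction the previous block advantage predicts.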

Table 4 shows the p-values from the two-proportion z-test, with statistically significant results shaded. As our data is fairly noisy, we are more likely to make a type II error, so we use a significance level of $5\%$. Note that a Bonferroni correction (bonferroni, ) is not required, as each of our proportions is drawn from an independent set of blocks, so the result of one z-test has no effect on the results of the others.

Results. The previous block advantage is present across many small and large miners, with $49\%$ of miners across all cryptocurrencies obtaining a statistically significant advantage. If we consider just the largest four miners in each cryptocurrency, $62.5\%$ have the previous block advantage, which shows the prevalence of this advantage amongst larger miners.

Some cryptocurrencies have a stronger previous block advantage than others. For example, Bitcoin (BTC) and ZCash (ZEC) have very few advantaged miners, which may be due to the design of the corresponding network layers. Bitcoin has multiple relay networks (adem18, ; fibre, ) that help blocks propagate faster and alleviate the previous block advantage. ZCash's mining distribution is the most decentralized of any cryptocurrency we studied, which may explain ZEC's low previous block advantage. In contrast, in Bitcoin Cash (BCH), Bitcoin SV (BSV), and Ethereum (ETH), most, if not all, of the top miners benefit from a statistically significant previous block advantage.

The previous block advantage in ETH seems to extend to almost every miner, including smaller ones. This supports the existence of centralization pressure in ETH due to the previous block advantage. The likely explanation for the prevalence of the previous block advantage in ETH is its comparatively short block interval, which yields many more blocks over the same period and makes it easier for our statistical tests to detect the advantage. The failure to detect it in other cryptocurrencies is likely due to their longer block intervals, which result in smaller sample sizes. None of the other metrics measuring the prevalence of the previous block advantage show that Ethereum faces particularly strong centralization pressure. As a result, we believe that the previous block advantage is inherent to proof of work mining.

Note that some standard mining practices, such as spy mining, in which a mining pool contributes some of its mining power to another pool to obtain block headers more quickly, will not increase the previous block advantage. These practices reduce the time it takes for blocks to reach the miner from the rest of the network, which reduces the previous block advantage.

Caveats. Our data, especially for smaller miners, is fairly noisy, since they have mined comparatively few blocks. Statistical inference is therefore harder for them: the previous block advantage a small miner obtains would have to be quite large to be detected. Nevertheless, our results strongly suggest that the previous block advantage is real and present in our dataset.

Succession Matrix. Since we know that the previous block advantage exists, we systematically study these network effects using the succession matrix of a cryptocurrency. To build the succession matrix, index each known miner in the cryptocurrency as $M_{1},M_{2},\dots M_{n}$ where $M_{i}$ has more hashpower than $M_{j}$ if $i>j$ based on the estimate from the full blockchain. $S_{ij}$ is computed as the proportion of blocks that $M_{j}$ has mined when considering only the blocks immediately following blocks mined by miner $M_{i}$. This matrix allows us to look at the previous block advantage that miners obtain when mining after each other as well as themselves. We include the succession matrices for all studied cryptocurrencies in Figure 6 located in Appendix B.
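Building the succession matrix from a chain of miner labels can be sketched in Python. This is a minimal sketch: the function name is hypothetical, the caller supplies the miner ordering (for example, ranked by estimated hashpower), and skipping any consecutive pair that touches a dropped or unclaimed block (marked `None`) is our assumption, since the text does not say how such pairs are handled.

```python
from typing import Hashable, List, Optional, Sequence

def succession_matrix(miners: Sequence[Optional[Hashable]],
                      order: Sequence[Hashable]) -> List[List[float]]:
    """S[i][j]: fraction of blocks following a block by order[i] that order[j] mined.

    Pairs where either block belongs to a miner outside `order` are skipped,
    so each non-empty row sums to 1; empty rows are all zeros.
    """
    index = {m: k for k, m in enumerate(order)}
    n = len(order)
    counts = [[0] * n for _ in range(n)]
    for prev, nxt in zip(miners, miners[1:]):
        if prev in index and nxt in index:
            counts[index[prev]][index[nxt]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# A is followed by A once and by B twice; B is followed by A once and by B once.
S = succession_matrix(["A", "A", "B", "A", "B", "B"], order=["A", "B"])
print(S)
```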

### 4.3. Measuring the Previous Block Advantage

A metric to quantify the prevalence of the previous block advantage is important as it allows us to compare how prevalent the advantage is across cryptocurrencies as well as across the same cryptocurrency over time. To reason about how to construct such a metric, we start with the succession matrix. If miner $M_{j}$ gets a large previous block advantage when some miner $M_{i}$ mines the previous block, then we would expect the entry $S_{ij}$ to be large relative to the mining power of $M_{j}$. Thus, our first step to construct this metric is to normalize the succession matrix to highlight how large the previous block advantage is.

The normalized succession matrix $N$ is the matrix where $N_{ij}=\frac{S_{ij}}{m_{j}}$, where $m_{j}$ is the mining power of miner $M_{j}$. If $N_{ij}$ is larger than one, then $M_{j}$ has an advantage when mining after $M_{i}$ mines a block; if $N_{ij}$ is smaller than one, $M_{j}$ is at a disadvantage. We include the normalized succession matrix for all studied cryptocurrencies in Figure 7 located in Appendix B.

A first attempt at such a metric might simply be to compare $N_{ii}$ to $1$ and add them up across all miners, i.e. $\sum_{i=1}^{n}(N_{ii}-1)$ where $n$ is the number of miners. However, that metric treats small and large miners equally, which inaccurately describes the prevalence of the previous block advantage. A miner gets the previous block advantage every time they mine a block, so the more hashpower a miner has, the more heavily they should be weighted. With that intuition, if we let $P_{i}$ be the fraction of hashpower that $M_{i}$ controls, our final distance metric is $D=\sum_{i=1}^{n}(N_{ii}-1)P_{i}$. A system with no net previous block advantage will have $D\leq 0$.
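The normalization and the distance metric $D$ can be sketched in Python. This is a minimal sketch with hypothetical function names; it uses a single vector `power` for both $m_{j}$ and $P_{i}$, reading both as the same per-miner hashpower estimate.

```python
from typing import List, Sequence

def normalize_succession(S: Sequence[Sequence[float]],
                         power: Sequence[float]) -> List[List[float]]:
    """N[i][j] = S[i][j] / power[j]; entries above 1 indicate an advantage."""
    return [[s_ij / power[j] for j, s_ij in enumerate(row)] for row in S]

def distance_metric(S: Sequence[Sequence[float]], power: Sequence[float]) -> float:
    """D = sum_i (N[i][i] - 1) * P_i, weighting each miner by its hashpower."""
    N = normalize_succession(S, power)
    return sum((N[i][i] - 1.0) * p for i, p in enumerate(power))

S = [[0.6, 0.4],
     [0.5, 0.5]]
print(distance_metric(S, [0.5, 0.5]))  # about 0.1
```

Here miner 0 wins 0.6 of the blocks after its own despite a 0.5 share, so $N_{00}=1.2$, $N_{11}=1.0$, and $D=(1.2-1)\cdot 0.5+(1.0-1)\cdot 0.5=0.1$.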

Results. Table 5 shows the resulting distance metrics for all cryptocurrencies in our study. BTC miners have a very small previous block advantage compared to other cryptocurrencies, which is likely due to the small block size and the presence of multiple fast relay networks that improve block propagation. Interestingly, ETH has a fairly low previous block advantage metric even though most ETH miners have a statistically significant previous block advantage. This lends credence to the theory that the previous block advantage was more easily detectable in ETH due to the smaller block interval. Finally, we see that outliers have a large effect on these metrics. While many BCH miners do not obtain a significant previous block advantage, our metric is large for BCH due to the large previous block advantage obtained by BTC.TOP, which is one of BCH’s largest miners.

On further investigation, we found that this outlier arises because BTC.TOP actively switches chains between BCH and BTC. As a result, BTC.TOP incurs a previous block disadvantage of sorts: when BTC.TOP has not mined the previous block, it is less likely to be mining on BCH and is consequently less likely to mine the next block. This behavior violates our assumption that the hashpower of each miner remains constant throughout the measurement period. BTC.TOP is not as influential a miner in BTC as it is in BCH and consequently does not skew our conclusions for BTC as much. We provide further evidence for BTC.TOP's behavior in Section A.2.

### 4.4. Correcting for Previous Block Advantage

The proportion of blocks belonging to a miner on the main chain is typically used to estimate their hashpower (adem18, ; blockchaininfo, ). However, the previous block advantage skews these estimates as larger miners have a larger advantage. Understanding the hashpower distribution of each cryptocurrency is critical as miners who get too powerful can be put under pressure to scale back their operations, such as when a miner obtained 51% of the hashpower in Bitcoin in July 2014 (ghashissue2014, ). In this section, we show how to correct for this effect in order to more accurately estimate the mining power distribution.

Denote the $n$ miners of a cryptocurrency as $M_{1},M_{2},\cdots,M_{n}$ and the true, unbiased proportion of the mining power owned by miner $M_{i}$ is $O_{i}$. Note that $\sum_{i=1}^{n}O_{i}=1$, and that $O_{i}$ is the probability that $M_{i}$ has mined a randomly selected block in the blockchain.

Let $\alpha_{i}$ be the relative amount by which $M_{i}$'s hashpower is increased due to its previous block advantage. Thus, when $M_{i}$ has mined the previous block, it has a relative probability of $O_{i}+\alpha_{i}$ of mining the next block, while any other miner $M_{j}$ has a relative probability of just $O_{j}$. To get absolute probabilities, we renormalize by dividing by the total, $1+\alpha_{i}$. Thus, the probability that $M_{i}$ mines a block immediately after mining the previous block is $\frac{O_{i}+\alpha_{i}}{1+\alpha_{i}}$. This is also the diagonal entry $S_{ii}$ in the succession matrix.

We now find an expression for the fraction of blocks mined by $M_{i}$ on the main chain in terms of $O_{i}$ and $\alpha_{i}$. Since $M_{i}$ has $O_{i}$ of the hashpower, $M_{i}$ has a probability of $O_{i}$ of mining a particular block. However, $O_{i}$ of the time, $M_{i}$ gets an additional $\alpha_{i}$ probability of mining the next block. Thus, the relative probability of $M_{i}$ mining any block on the main chain is $O_{i}(1+\alpha_{i})$. To renormalize, we divide this relative probability by $\sum_{j=1}^{n}O_{j}(1+\alpha_{j})=1+\sum_{j=1}^{n}O_{j}\alpha_{j}$. Thus, the proportion of blocks mined by $M_{i}$ on the main chain is $\frac{O_{i}(1+\alpha_{i})}{1+\sum_{j=1}^{n}O_{j}\alpha_{j}}$.

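This is the proportion of blocks mined by $M_{i}$ on the blockchain, i.e., the biased mining power estimate. Together with the diagonal entries $S_{ii}$, these expressions give a system of $2n$ equations, which we can solve for the unknowns $O_{i}$ and $\alpha_{i}$, as desired. Note that the system is not linear, because of the products $O_{j}\alpha_{j}$ in the normalizing term.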
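The text does not specify how this system is solved. The following Python sketch uses a simple fixed-point iteration of our own choosing, with hypothetical function names. From $S_{ii}=\frac{O_{i}+\alpha_{i}}{1+\alpha_{i}}$ we get $1+\alpha_{i}=\frac{1-O_{i}}{1-S_{ii}}$; substituting into the main-chain proportion $B_{i}=\frac{O_{i}(1+\alpha_{i})}{1+\sum_{j}O_{j}\alpha_{j}}$ shows that $O_{i}$ is proportional to $\frac{B_{i}(1-S_{ii})}{1-O_{i}}$, which we iterate with renormalization so that $\sum_{i}O_{i}=1$. We have not analyzed convergence in general; on the synthetic three-miner example below, generated from the forward model with made-up parameters, the iteration converges.

```python
from typing import List, Sequence, Tuple

def biased_estimates(true_power: Sequence[float],
                     alpha: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Forward model: (main-chain shares B_i, diagonal S_ii) from (O_i, alpha_i)."""
    z = 1.0 + sum(o * a for o, a in zip(true_power, alpha))
    b = [o * (1.0 + a) / z for o, a in zip(true_power, alpha)]
    s_diag = [(o + a) / (1.0 + a) for o, a in zip(true_power, alpha)]
    return b, s_diag

def correct_mining_power(b: Sequence[float], s_diag: Sequence[float],
                         tol: float = 1e-13,
                         max_iter: int = 10_000) -> Tuple[List[float], List[float]]:
    """Recover (O_i, alpha_i) from observed B_i and S_ii by fixed-point iteration."""
    o = list(b)  # start from the biased estimate
    for _ in range(max_iter):
        # 1 + alpha_i = (1 - O_i) / (1 - S_ii), so O_i is proportional to B_i (1 - S_ii) / (1 - O_i)
        raw = [bi * (1.0 - si) / (1.0 - oi) for bi, si, oi in zip(b, s_diag, o)]
        total = sum(raw)
        new_o = [r / total for r in raw]
        converged = max(abs(x - y) for x, y in zip(new_o, o)) < tol
        o = new_o
        if converged:
            break
    alpha = [(si - oi) / (1.0 - si) for si, oi in zip(s_diag, o)]
    return o, alpha

b, s_diag = biased_estimates([0.5, 0.3, 0.2], [0.1, 0.05, 0.0])
o, alpha = correct_mining_power(b, s_diag)
print([round(x, 4) for x in b])  # biased: the largest miner is overestimated
print([round(x, 4) for x in o])  # corrected estimate
```

In this toy example the largest miner's main-chain share is about 0.516 even though its true share is 0.5, mirroring the overestimation of large miners described in the results.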

Results. Table 6 shows the biased and corrected estimates for the mining power distribution. As we expect, since large miners have a bigger previous block advantage, their mining power was overestimated when using solely the proportion of blocks on the chain. Using the modified mining power distribution, we recompute the normalized succession matrix. Figure 1 shows all normalized succession matrices and the distance metrics introduced in Section 4.3 are shown in Table 5. The recomputed values indicate that there is a larger previous block advantage than we initially believed.

Small discrepancies in the mining power distribution matter immensely to individual miners: underutilizing their hashpower by even a tiny amount might equate to tens of thousands of dollars in lost revenue. An accurate estimate of a miner's hashpower and of how advantaged it is in the network helps miners understand how to improve their mining operations.

Validation. Because forked blocks are discarded and forgotten in most cryptocurrencies, this corrected mining power distribution cannot be validated for them. However, Ethereum's forked blocks, called uncle blocks, are stored and rewarded in accordance with the GHOST (ghost, ) protocol. Uncle blocks can be stored on the chain much later than when they are mined, which eliminates the previous block advantage, since the propagation of uncle blocks can be much slower than that of regular blocks. Therefore, including uncle blocks when computing the mining power distribution also eliminates the previous block advantage. We include a detailed comparison of these two distributions in Table 8 in Appendix B. The high-level takeaway is that our technique provides a more accurate estimate of the mining power distribution than simply using blocks on the main chain, with the remaining discrepancies between the estimates attributable to noise.

## 5. Block Propagation

Understanding block propagation is key to resolving many core debates in cryptocurrencies, from setting parameters like the block size to deciding which parts of the codebase need to be optimized for maximum impact. Using our techniques, we can provide insights on how blocks propagate through the network and help cryptocurrencies make data-driven decisions.

In this section, we first estimate the latency of block propagation in each network using the succession matrix. Then, we introduce and discuss the safe envelope of a cryptocurrency, in order to show how the previous block advantage changes with block size and how to evaluate whether a change to the network, such as a network upgrade, actually achieved the desired effect.

### 5.1. Latency of Block Propagation

Block propagation speed affects many aspects of decentralization, such as the prevalence of the previous block advantage and how often forks occur. We can use the succession matrix to estimate how long blocks take to travel from one miner to all other miners in each cryptocurrency.

We can use a simplified model of block propagation to compute the average latency, in which a block mined by miner $M_{i}$ takes $L$ minutes to propagate to all other miners. If the average block interval is $t$, then every time $M_{i}$ mines a block, it effectively increases its share of the mining power by a relative factor of $\frac{L}{t}$. This increase can also be computed from the succession matrix $S$: if $M_{i}$ mines after itself $S_{ii}$ of the time and has $O_{i}$ fraction of the hashpower, its advantage is $\frac{S_{ii}-O_{i}}{O_{i}}$. To find the average latency $\bar{L}$, we take the hashpower-weighted average of this advantage over all $n$ miners:

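 $\frac{\bar{L}}{t}=\sum_{i=1}^{n}\frac{S_{ii}-O_{i}}{O_{i}}O_{i}=\sum_{i=1}^{n}\left(S_{ii}-O_{i}\right)=\left(\sum_{i=1}^{n}S_{ii}\right)-1$

Since the hashpower fractions sum to one ($\sum_{i=1}^{n}O_{i}=1$), the average latency is $\bar{L}=\left(\left(\sum_{i=1}^{n}S_{ii}\right)-1\right)t$.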
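As a minimal sketch of this estimator, assume $S$ is stored row-wise so that row $i$ gives the distribution of the next block's miner given that $M_{i}$ mined the previous block, and that $O$ sums to one. Both this layout and the function name are our assumptions.

```python
import numpy as np

def average_latency(S, O, t):
    """Estimate the average block propagation latency, in the units of t.

    S: n x n succession matrix; S[i, i] is the fraction of M_i's blocks
       that are immediately followed by another block from M_i.
    O: length-n vector of hashpower fractions (assumed to sum to 1).
    t: average block interval.
    """
    S = np.asarray(S, dtype=float)
    O = np.asarray(O, dtype=float)
    advantage = (np.diag(S) - O) / O          # per-miner advantage (S_ii - O_i) / O_i
    weighted = float(np.sum(advantage * O))   # hashpower-weighted average = trace(S) - 1
    return weighted * t

# Two equal miners that each follow themselves 55% of the time, 10-minute blocks:
print(average_latency([[0.55, 0.45], [0.45, 0.55]], [0.5, 0.5], 10.0))  # ≈ 1.0 minute
```

Because the estimate subtracts two noisy quantities, it can come out slightly negative when the true latency is near zero, which is what happens for BTC below.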

Results. Table 7 presents the latency of each cryptocurrency in minutes, along with its $95\%$ confidence interval. The average latency of BTC is particularly small: the midpoint of its confidence interval is slightly negative, which can only be due to measurement noise. Most other coins have a block latency of about $10$ to $30$ percent of their average block interval. Thus, many coins would benefit from improving their block propagation. BCH’s latency estimate is unnaturally high; we discuss the reasons below.

Sanity Check. To check our latency results, we use the orphan rate, which can be approximated as $1-e^{-\tau/T}$, where $\tau$ is the average latency between miners and $T$ is the block interval (Rizun2016, ). Prior measurement work in the Bitcoin (BTC) and Ethereum (ETH) networks found that Bitcoin's orphan rate is extremely small and that Ethereum’s fluctuates between 6% and 10% (adem18, ). Plugging our latency estimates from Table 7 into this approximation, the orphan rate for BTC is very close to zero, in line with prior work. For ETH, the 95% confidence interval for the orphan rate is between 4% and 15%, also in line with prior measurements.
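The approximation can be applied to each endpoint of a latency confidence interval, since it is non-decreasing in $\tau$. A minimal sketch follows; the function names are hypothetical, and clamping negative latency estimates to zero is our choice, motivated by BTC's slightly negative midpoint.

```python
import math

def orphan_rate(tau, T):
    """Approximate orphan rate 1 - exp(-tau / T).

    tau: average latency between miners; T: block interval, in the same units.
    Negative latency estimates (possible from noise) are clamped to zero.
    """
    tau = max(tau, 0.0)
    return 1.0 - math.exp(-tau / T)

def orphan_rate_interval(tau_low, tau_high, T):
    """Map a latency confidence interval to an orphan-rate interval.

    Valid because orphan_rate is non-decreasing in tau.
    """
    return orphan_rate(tau_low, T), orphan_rate(tau_high, T)
```

For example, with hypothetical values $\tau = 1.5$ s and $T = 15$ s, the approximation gives $1-e^{-0.1}\approx 9.5\%$.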

Caveats. The largest caveat for our measurement is the accuracy of the timestamps recorded in each block. Since timestamps are set by the miner, they can be inaccurate for benign reasons, such as clock skew, or manipulated by malicious miners. To validate the timestamp data, we compared the timestamps with the receive times reported by block explorers and checked the timestamps of some blocks on our own Bitcoin and Ethereum full nodes.

Our latency measurement is sensitive to outliers, which is why BCH’s latency number is quite high at 2.68 minutes. The outlier in the BCH data is the inflated previous block advantage that BTC.TOP has due to its unusual mining behavior.

### 5.2. Safe Envelopes

How the block size affects decentralization has been a critical debate in cryptocurrencies and is the root cause of some chain splits. To address this question, we use the previous block advantage metric from Section 4.3 and compute it separately for blocks grouped by size.
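One way to group blocks by size is sketched below. The Section 4.3 metric is not reproduced here; as a hypothetical stand-in, the sketch measures, per size bucket, how much more often a block is followed by a block from the same miner than hashpower alone would predict. The input format, bucket edges, and function name are all assumptions; an edge of `0` can be used to give blocks with no payload their own baseline bucket if sizes are measured as payload bytes.

```python
from bisect import bisect_left
from collections import Counter, defaultdict

def advantage_by_size(chain, edges):
    """Per-bucket excess self-succession (a stand-in for the real metric).

    chain: list of (miner, size_bytes) in chain order.
    edges: sorted upper bounds of the size buckets, in bytes; blocks larger
           than the last edge fall into a final overflow bucket.
    Returns {bucket_index: observed / expected - 1}.
    """
    counts = Counter(m for m, _ in chain)
    share = {m: c / len(chain) for m, c in counts.items()}  # hashpower proxy O_i
    observed = defaultdict(int)    # pairs where the same miner mined the next block
    expected = defaultdict(float)  # expected repeats if there were no advantage
    for (prev_miner, prev_size), (next_miner, _) in zip(chain, chain[1:]):
        b = bisect_left(edges, prev_size)  # bucket of the earlier block
        observed[b] += prev_miner == next_miner
        expected[b] += share[prev_miner]
    return {b: observed[b] / expected[b] - 1 for b in sorted(expected)}
```

Bucketing by the size of the earlier block reflects the intuition that it is that block's propagation time which gives its miner a head start on the next one.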

We expect the previous block advantage to be small for small blocks, since small blocks take less time to propagate, and to grow as the block size increases. We define the safe envelope of a cryptocurrency as the last block size range that does not have a significantly higher previous block advantage than blocks with no payload. Beyond the safe envelope, increasing the block size directly encourages miners to centralize, because it increases the previous block advantage.
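Given per-bucket confidence intervals, the edge of the safe envelope can be located roughly as follows. This sketch assumes the envelope is contiguous from the smallest bucket and treats "significantly higher" as the bucket's lower confidence bound exceeding the baseline's upper bound. That conservative test, the input format, and the function name are our assumptions, not the paper's stated procedure.

```python
def safe_envelope_end(buckets):
    """Locate the end of the safe envelope.

    buckets: list of (label, ci_low, ci_high) ordered by block size, where
             buckets[0] is the no-payload baseline.
    Returns (last_safe_label, first_unsafe_label), meaning the envelope ends
    somewhere between the two, or None if no bucket is significantly higher
    (the envelope extends beyond the observed block sizes).
    """
    _, _, base_high = buckets[0]
    for i in range(1, len(buckets)):
        label, low, _ = buckets[i]
        if low > base_high:  # non-overlapping intervals: significantly higher
            return buckets[i - 1][0], label
    return None
```

Returning a pair of labels mirrors how the results are reported, as a range such as "between 32 KB and 40 KB", rather than a single cutoff.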

Results. Figure 2 presents the previous block advantage for different block sizes, with error bars representing the 95% confidence interval for each size. We see that Bitcoin (BTC), Bitcoin Cash (BCH) and Litecoin (LTC) have a safe envelope that extends beyond the block sizes we observed. In Bitcoin (BTC), this is likely because the maximum block size is too small for the network. However, Litecoin and Bitcoin Cash blocks have not yet reached their maximum block size, so the lack of a defined safe envelope stems from underutilization, not from the block size limit. This means that BTC can safely increase its block size, while LTC and BCH users are free to utilize the chain more without risking mining centralization.

On the other hand, we see that ZCash, Ethereum, and Bitcoin SV all have a defined safe envelope. This means that they all operate at a throughput that encourages some mining centralization. ZCash has a safe envelope that ends somewhere between 10 KB and 50 KB while Ethereum’s safe envelope ends somewhere between 32 KB and 40 KB. Bitcoin SV’s safe envelope ends somewhere after 10 KB. These are extremely coarse-grained estimates, which are limited by the number of blocks in our sample.

The safe envelope also lets us analyze the effects of significant events. As a concrete example, in December 2018, Ethereum released a new client that improved block propagation time on the Ethereum network.

Figure 3 compares two Ethereum snapshots: one from just before the deployment and one from after the deployment, when most users had upgraded. Interestingly, improving the network layer does not always improve the safe envelope. The safe envelope in 2018 ended between 24 KB and 32 KB, while in 2019 it actually shrank, ending somewhere between 16 KB and 24 KB. However, the magnitude of our distance metric decreased for almost all block sizes. We therefore conclude that the network upgrade improved decentralization in the system as a whole, even though the safe envelope shrank.

## 6. Unusual Miner Behaviors

Using our analysis techniques, we have observed some miner cartels in ZEC, BSV and LTC. Additionally, we have observed BTC.TOP switching between BCH and BTC. We discuss how we reached those conclusions and the methodology behind them in Appendix A.

## 7. Related Work

Cryptocurrency measurement has been an active research area for many years, starting with the work by Decker et al. (DeckerW13, ), which connected to many peers in the Bitcoin network and measured block propagation characteristics. Since then, traditional cryptocurrency measurement studies have typically chosen a few questions to ask, built a corresponding probe, and then inferred properties of the peer to peer network from the measurements made by that probe. AddressProbe (Miller2015DiscoveringB, ) discovered peer to peer links in Bitcoin to analyze the topology of the entire network, find connected components, and find high degree nodes. The Falcon Relay Network and BMS, a blockchain measurement system (adem18, ), were used to measure the network properties of nodes in the Bitcoin and Ethereum peer to peer networks. TxProbe (SSCJAAB18, ) cleverly used orphan transactions to take snapshots of Bitcoin’s testnet. NodeFinder (Kim2018MeasuringEN, ) measured propagation characteristics of the Ethereum peer to peer network. Other measurement studies have focused primarily on Bitcoin (AGVS2014, ; Pappalardo2018BlockchainII, ; AJ2015, ) or Ethereum (Kiffer2018AnalyzingEC, ) through direct measurement.

However, while these techniques are powerful in their own right, they have a few limitations. Prior techniques rely on a snapshot of the network during a particular time period; it is impossible to go back in time and answer historical queries. It is also very difficult to capture the effects of significant events, such as an Ethereum network upgrade, since the measurement probe must be running at the time of the event. Additionally, measurement probes require significant manpower to build, run, and maintain. Finally, measurement probes are fundamentally limited in the questions they can answer, as they can only be run on nodes controlled by the researcher. Thus, insights such as the structure of the mining network are very difficult to obtain with a traditional measurement probe.

Our large scale statistical analysis relies on infrastructure that is already built and maintained: the codebase of the cryptocurrency itself. Leveraging this data sidesteps all of the problems of building custom measurement probes. The codebase runs continuously on every node in the network, which allows us to collect information about the entire network at all times. However, our technique requires a significant amount of data to draw meaningful conclusions, and its estimates are more coarse-grained than those of a measurement probe.

Prior work also corroborates our conclusions. For example, we are not the first paper to note the centralization of the mining distribution (adem18, ). Other works (AGVS2014, ; AJ2015, ) show that recent events in Bitcoin reveal the limits of its decentralized nature and that Bitcoin is slowly becoming centralized. Our analysis of block propagation also suggests that Bitcoin is underutilizing its bandwidth and that the network can use more bandwidth without compromising decentralization, a conclusion that prior measurement probes have also reached (adem18, ; onscalingdecentralized, ).

While many measurement studies have focused on the largest cryptocurrencies, Bitcoin and Ethereum, some prior work has examined properties of other cryptocurrencies as well. For example, one study performed a comparative financial and statistical analysis of Bitcoin, Litecoin, Ripple and Ethereum (Sapuric2017LedgerDW, ). Others have compared Bitcoin, Ethereum and Ripple on their architecture, scripting language and other crucial properties (Mauri2018ACA, ). There have also been in-depth studies comparing different consensus mechanisms (Bonneau2015SoKRP, ; Bonneau2015perspectivesOB, ). However, to our knowledge, ours is the first measurement study to cover six different cryptocurrencies.

In our work, we used the safe envelope to determine whether it is safe to increase the block size in each cryptocurrency. However, other, more effective ways of scaling cryptocurrencies that optimize every part of the cryptocurrency stack have been proposed as well. Starting at the bottom, many optimizations have been suggested for the peer to peer network that make block and transaction propagation more efficient (NaumenkosErlay2019paper, ; graphene, ). In the consensus layer, many protocol changes have been proposed for Bitcoin that keep the same security guarantees as Nakamoto consensus while increasing the throughput of the system (eyal2016bitcoin, ; byzcoin, ). Finally, other proposals move transactions off the blockchain altogether and use the blockchain only as a conflict resolution mechanism (lightning, ; plasma, ). For all of these proposals, however, measuring the safe envelope is critically important to provide empirical evidence of whether the new technology actually made a meaningful change to the underlying system.

## 8. Conclusion

This paper rests on a single, crucial observation: mining a block gives the corresponding miner a slight but measurable advantage when mining the next block. Leveraging this insight is extremely powerful and allows us to answer previously intractable questions about the internals of the network that connects miners. This, in turn, allows us to provide actionable intelligence to miners and developers about where they should spend their efforts to achieve their objectives. With these techniques, we computed a more accurate mining power distribution, answered questions about block propagation, including how block size affects the previous block advantage, and observed interesting miner behaviors such as miner cartelization and active chain switching.

## References

• (1) Azouvi, S., Maller, M., Meiklejohn, S.: Egalitarian society or benevolent dictatorship: The state of cryptocurrency governance. In: International Conference on Financial Cryptography and Data Security. pp. 127–143. Springer (2018)
• (2) Beikverdi, A., Song, J.: Trend of centralization in bitcoin’s distributed network. In: 2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD). pp. 1–6. IEEE (2015)
• (3) Bonneau, J., Miller, A., Clark, J., Narayanan, A., Kroll, J.A., Felten, E.W.: Perspectives on bitcoin and second-generation cryptocurrencies (2015)
• (4) Bonneau, J., Miller, A., Clark, J., Narayanan, A., Kroll, J.A., Felten, E.W.: SoK: Research perspectives and challenges for bitcoin and cryptocurrencies. 2015 IEEE Symposium on Security and Privacy pp. 104–121 (2015)
• (5) Community, E.: A next-generation smart contract and decentralized application platform (2017), https://github.com/ethereum/wiki/wiki/White-Paper
• (6) Corallo, M.: Fibre: Fast internet bitcoin relay engine (2017)
• (7) Croman, K., Decker, C., Eyal, I., Gencer, A.E., Juels, A., Kosba, A., Miller, A., Saxena, P., Shi, E., Sirer, E.G., et al.: On scaling decentralized blockchains. In: International Conference on Financial Cryptography and Data Security. pp. 106–125. Springer (2016)
• (8) Decker, C., Wattenhofer, R.: Information propagation in the bitcoin network. In: 13th IEEE International Conference on Peer-to-Peer Computing, IEEE P2P 2013, Trento, Italy, September 9-11, 2013, Proceedings. pp. 1–10 (2013). https://doi.org/10.1109/P2P.2013.6688704
• (9) Delgado-Segura, S., Bakshi, S., Pérez-Solà, C., Litton, J., Pachulski, A., Miller, A., Bhattacharjee, B.: Txprobe: Discovering bitcoin’s network topology using orphan transactions. In: International Conference on Financial Cryptography and Data Security. pp. 550–566. Springer (2019)
• (10) Eyal, I., Gencer, A.E., Sirer, E.G., Van Renesse, R.: Bitcoin-ng: A scalable blockchain protocol. In: 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16). pp. 45–59 (2016)
• (11) Eyal, I., Sirer, E.G.: Majority is not enough: bitcoin mining is vulnerable. Commun. ACM 61(7), 95–102 (2018). https://doi.org/10.1145/3212998, http://doi.acm.org/10.1145/3212998
• (12) Garay, J., Kiayias, A., Leonardos, N.: The bitcoin backbone protocol: Analysis and applications. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques. pp. 281–310. Springer (2015)
• (13) Gencer, A.E., Basu, S., Eyal, I., Van Renesse, R., Sirer, E.G.: Decentralization in bitcoin and ethereum networks. In: International Conference on Financial Cryptography and Data Security. pp. 439–457. Springer, Berlin, Heidelberg. (2018)
• (14) Gervais, A., Karame, G.O., Capkun, V., Capkun, S.: Is bitcoin a decentralized currency? IEEE security & privacy 12(3), 54–60 (2014)
• (15) Gilad, Y., Hemo, R., Micali, S., Vlachos, G., Zeldovich, N.: Algorand: Scaling byzantine agreements for cryptocurrencies. In: Proceedings of the 26th Symposium on Operating Systems Principles. pp. 51–68. ACM (2017)
• (16) Kiffer, L., Levin, D., Mislove, A.: Analyzing ethereum’s contract topology. In: IMC (2018)
• (17) Kim, S.K., Ma, Z., Murali, S., Mason, J., Miller, A., Bailey, M.: Measuring ethereum network peers. In: Proceedings of the Internet Measurement Conference 2018. pp. 91–104 (2018)
• (18) Kogias, E.K., Jovanovic, P., Gailly, N., Khoffi, I., Gasser, L., Ford, B.: Enhancing bitcoin security and performance with strong consistency via collective signing. In: 25th USENIX Security Symposium (USENIX Security 16). pp. 279–296 (2016)
• (19) Lin, J.: Divergence measures based on the shannon entropy. IEEE Transactions on Information theory 37(1), 145–151 (1991)
• (20) Matonis, J.: The bitcoin mining arms race: Ghash.io and the 51% issue (retrieved June, 2020), https://www.coindesk.com/bitcoin-mining-detente-ghash-io-51-issue
• (21) Mauri, L., Cimato, S., Damiani, E.: A comparative analysis of current cryptocurrencies. In: ICISSP (2018)
• (22) Miers, I., Garman, C., Green, M., Rubin, A.D.: Zerocoin: Anonymous distributed e-cash from bitcoin. In: 2013 IEEE Symposium on Security and Privacy. pp. 397–411. IEEE (2013)
• (23) Miller, A., Litton, J., Pachulski, A., Gupta, N., Levin, D., Spring, N., Bhattacharjee, B.: Discovering bitcoin’s public topology and influential nodes (2015)
• (24) Nakamoto, S., et al.: Bitcoin: A peer-to-peer electronic cash system (2008)
• (25) Naumenko, G., Maxwell, G., Wuille, P., Fedorova, A., Beschastnikh, I.: Erlay: Efficient transaction relay for bitcoin. In: Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. pp. 817–831 (2019)
• (26) Nayak, K., Kumar, S., Miller, A., Shi, E.: Stubborn mining: Generalizing selfish mining and combining with an eclipse attack. 2016 IEEE European Symposium on Security and Privacy pp. 305–320 (2015)
• (27) Ozisik, A.P., Andresen, G., Bissias, G., Houmansadr, A., Levine, B.: Graphene: A new protocol for block propagation using set reconciliation. In: Data Privacy Management, Cryptocurrencies and Blockchain Technology, pp. 420–428. Springer (2017)
• (28) Pappalardo, G., di Matteo, T., Caldarelli, G., Aste, T.: Blockchain inefficiency in the bitcoin peers network. EPJ Data Science 7, 1–13 (2018)
• (29) Pass, R., Shi, E.: Thundercore (2018)
• (30) Poon, J., Buterin, V.: Plasma: Scalable autonomous smart contracts. White paper pp. 1–47 (2017)
• (31) Poon, J., Dryja, T.: The bitcoin lightning network: Scalable off-chain instant payments (2016)
• (32) Rizun, P.R.: Subchains: A technique to scale bitcoin and improve the user experience. Ledger 1, 38–52 (Dec 2016)
• (33) Rocket, T.: Scalable and probabilistic leaderless bft consensus through metastability (2019)
• (34) Rosenthal, D.: Gini coefficients of cryptocurrencies (2018), https://blog.dshr.org/2018/10/gini-coefficients-of-cryptocurrencies.html
• (35) Sapuric, S.: Ledger do we trust? A comparative analysis of cryptocurrencies (2017)
• (36) Serjantov, A., Danezis, G.: Towards an information theoretic metric for anonymity. In: International Workshop on Privacy Enhancing Technologies. pp. 41–53. Springer (2002)
• (37) Sompolinsky, Y., Zohar, A.: Secure high-rate transaction processing in bitcoin. In: International Conference on Financial Cryptography and Data Security. pp. 507–527. Springer (2015)
• (38) Team, B.I.: Blockchain info
• (39) Team, B.: A search engine for blockchains (retrieved Dec, 2018), https://blockchair.com/
• (40) Team, Z.: A zcash block explorer (retrieved Dec, 2018), https://api.zcha.in/
• (41) Wang, L., Liu, Y.: Exploring miner evolution in bitcoin network. In: International Conference on Passive and Active Network Measurement. pp. 290–302. Springer (2015)
• (42) Weisstein, E.W.: Bonferroni correction (2004)

## Appendix A Unusual Miner Behaviors

We now present two interesting miner behaviors present in our dataset. First, we discuss miner cartelization when two miners are interconnected and essentially function as a single miner. Next, we discuss the behavior of BTC.TOP and active chain switching.

### A.1. Miner Cartelization

Miner cartels are groups of miners that are nominally separate but act like a single entity, separate from the rest of the mining network. Such miners preferentially send their blocks to each other and may have dedicated links between each other. When this happens, miners in the cartel receive blocks from each other before everyone else does. Miner cartels are dangerous because they make the cryptocurrency appear more decentralized than it actually is. Detecting cartels allows the community to understand whether the cryptocurrency is becoming too centralized and perhaps to take action.

Miners in a cartel receive a previous block advantage anytime another miner in the same cartel mines a block. Thus, we would expect to see very high values for miners in a cartel in the normalized succession matrix. To highlight cartel miners more clearly, we modify the normalized succession matrix, $N$, as follows.

Suppose we have two miners, $M_{i}$ and $M_{j}$, that belong to the same cartel. Then $M_{i}$ gets an advantage whenever $M_{j}$ mines a block, and vice versa. Consequently, $N_{ij}$ and $N_{ji}$ should both be much larger than one, so we can identify likely cartel pairs by computing $\frac{N_{ij}+N_{ji}}{2}-1$ for each pair of miners. High positive values imply that $M_{i}$ and $M_{j}$ may be in the same cartel. We collect these values in a pairs matrix, $P$, where $P_{ij}=P_{ji}=\frac{N_{ij}+N_{ji}}{2}-1$. If there were no miner cartels, we would expect every element of $P$ to be very close to $0$.
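The pairs matrix is a one-line transformation of $N$. Below is a minimal sketch with NumPy, assuming $N$ is given as a square array; the `threshold` used to flag pairs is a hypothetical parameter, since the text identifies cartels by inspecting the matrix rather than by a fixed cutoff.

```python
import numpy as np

def pairs_matrix(N):
    """P[i, j] = (N[i, j] + N[j, i]) / 2 - 1 for a normalized succession matrix N."""
    N = np.asarray(N, dtype=float)
    return (N + N.T) / 2.0 - 1.0

def likely_cartels(P, names, threshold):
    """List miner pairs (i < j) whose pairs-matrix value exceeds threshold.

    The diagonal (P[i, i] = N[i, i] - 1, a miner's advantage after its own
    blocks) is skipped, since a cartel involves two distinct miners.
    """
    n = len(names)
    return [(names[i], names[j], float(P[i, j]))
            for i in range(n) for j in range(i + 1, n)
            if P[i, j] > threshold]
```

Because $P$ is symmetric by construction, scanning only the upper triangle reports each candidate pair once.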

Results. Figure 4 shows the pairs matrix only for the cryptocurrencies that are noteworthy in some way. In ZEC, we see that miner0 and miner5 are likely part of the same cartel. Additionally, all BSV miners receive some previous block advantage, but no explicit cartels have formed. In LTC, there appears to be another likely cartel between LTC.TOP and litecoinpool.org, although it is not as extreme as the one in ZEC.

Miner centralization, while a problem, can be detected, and users can base decisions on it, such as selling off their tokens if a coin is too centralized. Miner cartels, however, are more insidious, as they give the illusion of decentralization while hiding a system that is much more centralized in nature. Our techniques make cartels easy to detect, so that users can base decisions on them as well.

### A.2. Active Chain Switching

In Nakamoto consensus, miners are rewarded every time they mine a block in the cryptocurrency they are mining. To remain profitable, miners use specialized hardware optimized for the proof of work function of their target cryptocurrency. Because several cryptocurrencies share the same proof of work function, the same hardware can mine any of them. For example, in our study, a Bitcoin (BTC) miner can easily switch to Bitcoin Cash (BCH) or Bitcoin SV (BSV). Thus, miners are incentivized to switch between cryptocurrencies depending on prices at the time.

Our analysis assumes that the network hashrate is relatively consistent across our measurement period, so miners that frequently switch chains violate this assumption. In particular, we assume that the blocks immediately succeeding a miner’s blocks form a representative subsample of all blocks on the chain. When a miner is actively chain switching, the fact that one of its blocks appears on the chain implies that it is currently mining this cryptocurrency, which makes it more likely to mine the next block as well. This shows up in our dataset as an exceedingly large previous block advantage.
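A back-of-the-envelope model makes this inflation concrete. Under our own simplifying assumptions (a miner holds a share $p$ of blocks while it mines this chain and mines it for a fraction $f$ of the measurement period; difficulty keeps the block rate constant; activity periods are long enough to ignore boundary effects; and the miner has no genuine network advantage), we get $O_{i}=fp$ and $S_{ii}\approx p$, so the apparent advantage is $\frac{S_{ii}-O_{i}}{O_{i}}=\frac{1-f}{f}$. The function name is hypothetical.

```python
def apparent_advantage(p, f):
    """Previous block advantage an on/off miner appears to have from switching alone.

    p: the miner's share of blocks while it is mining this chain.
    f: fraction of the measurement period during which it mines this chain.
    Assumes a constant block rate, long activity periods (boundary effects
    ignored), and no genuine network advantage.
    """
    overall_share = f * p   # O_i: fraction of all blocks the miner produces
    self_succession = p     # S_ii: having just mined, it is active, so it mines next with prob. p
    return (self_succession - overall_share) / overall_share  # equals (1 - f) / f
```

A miner that mines this chain half of the time ($f=0.5$) therefore appears to have a 100% previous block advantage from switching alone, regardless of $p$.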

A potential example of such a miner is BTC.TOP in BCH, which may be actively switching between BTC and BCH. Looking at BTC.TOP’s blocks in Figure 5, we see that few blocks were mined on BTC while BTC.TOP was mining BCH, and vice versa. However, this alternation does not fully explain the large gaps in BTC.TOP’s mining on BCH, because we do not see a correspondingly large previous block advantage for BTC.TOP in BTC. In general, strategic mining behavior that affects when a miner’s blocks appear is detrimental to our analysis and is a limitation of our techniques. However, this kind of behavior appears to be uncommon in the cryptocurrencies we have studied.

## Appendix B Figures and Tables

In this section, we include more detailed figures and tables to support the claims made in the main body of the paper.
