STIP Retroactive Analysis – Sequencer Revenue
TL;DR
Our findings show that 43% of Arbitrum’s revenue between November 2023 and the Dencun upgrade was attributable to the STIP, with $15.2M recouped from sequencer revenue against the $85.2M spent. Although the STIP had a positive short-term impact on market presence, its long-term effectiveness remains uncertain. The program likely helped maintain Arbitrum’s prominence and market share amid intensifying competition from other L2s, influencing, among other things, protocols’ decisions about where to launch. However, the $60M net loss signals the need for better-structured future incentive programs.
Context and Goals
Starting November 2023, Arbitrum launched the Short-Term Incentive Program (STIP), distributing millions of ARB tokens to various protocols to boost user engagement. This initiative allocated different amounts to a wide range of protocols across multiple sectors. Previously, Blockworks Research examined several protocols within specific verticals — perp DEXs, spot DEXs, and yield aggregators — to measure the STIP’s impact on key metrics.
In this analysis, we aim to assess the STIP’s overall effect on the Arbitrum network by examining its impact on sequencer revenue. The primary goal of the STIP was to attract more users to the recipient protocols and the broader ecosystem, fostering growth and activity. An increase in sequencer revenue would indicate a successful incentive program, where the costs of ARB incentives are at least partially offset by sequencer revenue in ETH.
Due to the Dencun upgrade, which significantly reduced fees across all L2s, the expected costs and revenue underwent considerable changes. Consequently, we set March 13, 2024, as the cutoff date for our analysis. Most protocols completed their distribution by March 29, 2024, ensuring that our cutoff includes the bulk of the incentive distribution period. The analysis period starts from November 2, 2023, when the first protocol began distributing its allocation.
Results
Below is the monthly sequencer revenue for the major L2 networks considered in our analysis. One can see the increase in Arbitrum’s sequencer revenue dominance during the STIP.
Zooming in to daily revenue, smoothed with a 30-day moving average, gives a more detailed view of its evolution over time, as shown in the following chart.
Given that most L2 networks have not been active as long as Arbitrum, we used data starting from August 1, 2023, which allowed us to include more networks in our modeling of Arbitrum’s revenue; Blast was excluded. The result of the synthetic control compared to Arbitrum’s own revenue is shown in the chart below.
By comparing the synthetic control, which represents the expected sequencer revenue for Arbitrum without the STIP, to the actual sequencer revenue observed during the same period, we can determine the STIP’s impact.
To evaluate the total cumulative impact, we calculate the area under the curve, which totals $15.2M. Comparing this to the total revenue of $35.1M during the period, we conclude that 43% of the revenue is attributable to the STIP. However, the total spent on the STIP was 71M ARB, equating to $85.2M at an average price of $1.2 per ARB.
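For reference, a quick restatement of the arithmetic behind these figures (no new data, just the numbers above):

```latex
\underbrace{71\,\text{M ARB} \times \$1.2/\text{ARB}}_{\text{STIP cost}} = \$85.2\text{M},
\qquad
\underbrace{\frac{\$15.2\text{M}}{\$35.1\text{M}}}_{\text{STIP-attributable share of revenue}} \approx 43\%
```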
Main Takeaways
The analysis concluded that 43% of Arbitrum’s revenue between November 2023 and the Dencun upgrade was attributable to the STIP. However, while $85.2M was spent on the STIP, only $15.2M was directly recouped through sequencer revenue. It’s important to note that the primary goal of the Arbitrum STIP was to foster the ecosystem; even though the immediate revenue did not cover the cost of the STIP, there may be other long-lasting positive effects on the network. Still, the $60M loss is considerable and underscores the need for the DAO to better prepare such programs in the future.
Given the significant traction other L2s have gained since earlier this year, Arbitrum’s competitive landscape has become more intense. The STIP likely helped maintain Arbitrum’s prominence and market share, particularly by influencing new protocols’ decisions on which network to launch and, more generally, where to focus their efforts. Although the STIP boosted Arbitrum’s market presence in the short term, its ability to drive long-term sustainable growth and expand market share is still unclear. Several common themes emerged in both the previous STIP retroactive analyses and our operational analysis, with individual protocols’ success under the program attributed to various factors, especially the incentive mechanism used. These insights should inform future programs to minimize losses and maximize the effectiveness of ARB spending.
Additionally, evaluating persistent metrics like the number of new users who remained active in the ecosystem after onboarding through the STIP would offer valuable insights for future research.
Methodology
TL;DR: We employed an analytical approach known as the Synthetic Control (SC) method. The SC method is a statistical technique utilized to estimate causal effects resulting from binary treatments within observational panel (longitudinal) data. Regarded as a groundbreaking innovation in policy evaluation, this method has garnered significant attention in multiple fields. At its core, the SC method creates an artificial control group by aggregating untreated units in a manner that replicates the characteristics of the treated units before the intervention (treatment). This synthetic control serves as the counterfactual for a treatment unit, with the treatment effect estimate being the disparity between the observed outcome in the post-treatment period and that of the synthetic control. In the context of our analysis, this model incorporates market dynamics by leveraging data from other protocols (untreated units). Thus, changes in market conditions are expected to manifest in the metrics of other protocols, thereby inherently accounting for these external trends and allowing us to explore whether the reactions of the protocols in the analysis differ post-STIP implementation.
To achieve the described goals, we turned to causal inference. Knowing that “association is not causation”, the study of causal inference is concerned with techniques that try to figure out when association can be treated as causation. The classic notation of causal analysis revolves around a certain treatment, which doesn’t need to be related to the medical field, but rather is a generalized term used to denote an intervention whose effect we want to study. We denote the treatment intake for unit i by $D_i$, which is 1 if unit i received the treatment and 0 otherwise. There is also an observed outcome variable $Y_i$ for unit i. This is our variable of interest, i.e., we want to understand the influence of the treatment on this outcome. The fundamental problem of causal inference is that one can never observe the same unit both with and without treatment, so we express this in terms of potential outcomes: we are interested in what would have happened had a given treatment been taken (or not). It is common to call the potential outcome that happened the factual, and the one that didn’t happen the counterfactual. We will use the following notation:
- $Y_{0i}$: the potential outcome for unit i without treatment
- $Y_{1i}$: the potential outcome for the same unit i with the treatment
With these potential outcomes, we define the individual treatment effect as $\tau_i = Y_{1i} - Y_{0i}$. Because of the fundamental problem of causal inference, we will never actually know the individual treatment effect, because only one of the potential outcomes is observed.
One technique used to tackle this is Difference-in-Differences (or diff-in-diff). It is commonly used to analyze the effect of macro interventions, such as the effect of immigration on unemployment or of law changes on crime rates, but also the impact of marketing campaigns on user engagement. There is always a period before and after the intervention, and the goal is to extract the impact of the intervention from the general trend. Let $Y_D(T)$ be the potential outcome for treatment status D in period T (0 for pre-intervention and 1 for post-intervention). Ideally, we would be able to observe the counterfactual and estimate the effect of the intervention as $E[Y_1(1) - Y_0(1) \mid D=1]$: the outcome in the post-intervention period with the treatment minus the outcome in the same period without the treatment. Naturally, $E[Y_0(1) \mid D=1]$ is counterfactual, so it can’t be measured. If we instead take a simple before-and-after comparison, $E[Y(1) \mid D=1] - E[Y(0) \mid D=1]$, we can’t really say anything about the effect of the intervention, because other external trends could be affecting that outcome.
The idea of diff-in-diff is to compare the treated group with an untreated group that didn’t get the intervention, replacing the missing counterfactual as follows: $E[Y_0(1) \mid D=1] = E[Y_0(0) \mid D=1] + \big(E[Y_0(1) \mid D=0] - E[Y_0(0) \mid D=0]\big)$. We take the treated unit before the intervention and add a trend component to it, estimated using the control group, $E[Y_0(1) \mid D=0] - E[Y_0(0) \mid D=0]$. We are basically saying that the treated unit after the intervention, had it not been treated, would look like the treated unit before the treatment plus a growth factor that is the same as the growth of the control.
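Combining the observed before/after change of the treated group with that of the control group gives the standard diff-in-diff estimator, which nets out the shared trend:

```latex
\widehat{ATET} \;=\; \underbrace{\big(E[Y(1)\mid D=1] - E[Y(0)\mid D=1]\big)}_{\text{change in the treated group}}
\;-\; \underbrace{\big(E[Y(1)\mid D=0] - E[Y(0)\mid D=0]\big)}_{\text{change in the control group}}
```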
An important thing to note here is that this method assumes that the trends in the treatment and control are the same. If the growth trend from the treated unit is different from the trend of the control unit, diff-in-diff will be biased. So, instead of trying to find a single untreated unit that is very similar to the treated, we can forge our own as a combination of multiple untreated units, creating a synthetic control.
That is the intuitive idea behind using synthetic control for causal inference. Assume we have $J+1$ units and that unit 1 is affected by an intervention. Units $j = 2, \dots, J+1$ are a collection of untreated units that we will refer to as the “donor pool”. Our data spans $T$ time periods, with $T_0$ periods before the intervention. For each unit $j$ and each time $t$, we observe the outcome $Y_{jt}$. We define $Y^N_{jt}$ as the potential outcome without intervention and $Y^I_{jt}$ as the potential outcome with intervention. Then, the effect for the treated unit at time $t$, for $t > T_0$, is defined as $\tau_{1t} = Y^I_{1t} - Y^N_{1t}$. Here $Y^I_{1t}$ is factual, but $Y^N_{1t}$ is not. The challenge lies in estimating $Y^N_{1t}$.
Source: 15 - Synthetic Control — Causal Inference for the Brave and True
Since the treatment effect is defined for each period, it doesn’t need to be instantaneous; it can accumulate or dissipate. The problem of estimating the treatment effect boils down to estimating what would have happened to the outcome of the treated unit, had it not been treated.
The most straightforward approach is to consider that a combination of units in the donor pool may approximate the characteristics of the treated unit better than any untreated unit alone. So we define the synthetic control as a weighted average of the units in the donor pool. Given weights $W = (w_2, \dots, w_{J+1})$, the synthetic control estimate of $Y^N_{1t}$ is $\hat{Y}^N_{1t} = \sum_{j=2}^{J+1} w_j Y_{jt}$.
We can estimate the optimal weights with OLS like in any typical linear regression. We can minimize the square distance between the weighted average of the units in the donor pool and the treated unit for the pre-intervention period. Hence, creating a “fake” unit that resembles the treated unit before the intervention, so we can see how it would behave in the post-intervention period.
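As a minimal illustration of this step in Python (illustrative names and data shapes, not the exact code used in this analysis), the unconstrained weights can be obtained with a least-squares fit on the pre-intervention period:

```python
import numpy as np

def ols_weights(Y_donors_pre: np.ndarray, y_treated_pre: np.ndarray) -> np.ndarray:
    """Unconstrained synthetic-control weights via ordinary least squares.

    Y_donors_pre: (T0, J) donor-pool outcomes in the pre-intervention period.
    y_treated_pre: (T0,) treated unit's outcomes in the same period.
    """
    # Minimizes ||y_treated_pre - Y_donors_pre @ w||^2 with no constraints on w,
    # so the resulting weights may be negative or larger than one (extrapolation risk).
    w, *_ = np.linalg.lstsq(Y_donors_pre, y_treated_pre, rcond=None)
    return w
```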
In the context of our analysis, this means that we can include all other L2 protocols that did not take part in the STIP in our donor pool and estimate a “fake”, synthetic, control L2 protocol that follows the trend of Arbitrum in the period before receiving the STIP. As mentioned before, the metric of interest chosen for this analysis was sequencer revenue and, in particular, we calculated the 30-day moving average to smooth the data. Then we can compare the behavior of our synthetic control with the factual and estimate the impact of the STIP by taking the difference. We are essentially comparing what would have happened, had the protocol not received the STIP with what actually happened.
However, regression sometimes leads to extrapolation, i.e., values outside the range of our initial data that may not make sense in our context. This happened when estimating our synthetic control, so we constrained the model to perform only interpolation. This means we restrict the weights to be positive and to sum to one, so that the synthetic control is a convex combination of the units in the donor pool. The treated unit is thus projected onto the convex hull defined by the untreated units. As a consequence, there probably won’t be a perfect match for the treated unit in the pre-intervention period, and the weights can be sparse, since the wall of the convex hull will sometimes be defined by only a few units. This works well because we don’t want to overfit the data. It is understood that we will never know with certainty what would have happened without the intervention, only that under these assumptions we can draw statistical conclusions.
Formalizing the interpolation constraint, the synthetic control is still defined in the same way, $\hat{Y}^N_{1t} = \sum_{j=2}^{J+1} w_j Y_{jt}$, but now we use the weights $W^* = (w_2^*, \dots, w_{J+1}^*)$ that minimize the squared distance between the weighted average of the units in the donor pool and the treated unit over the pre-intervention period, $\sum_{t=1}^{T_0} \big(Y_{1t} - \sum_{j=2}^{J+1} w_j Y_{jt}\big)^2$, subject to the restriction that the $w_j$ are positive and sum to one.
We get the optimal weights using quadratic programming optimization with the described constraints on the pre-STIP period and then use these weights to calculate the synthetic control for the total duration of time we are interested in. We initialized the optimization for each analysis with different starting weight vectors to avoid introducing bias in the model and getting stuck in local minima. We selected the one that minimized the square difference in the pre-intervention period.
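Below is a minimal sketch of this optimization, assuming scipy’s SLSQP solver as a stand-in for the quadratic program and using illustrative names rather than the code actually used:

```python
import numpy as np
from scipy.optimize import minimize

def synthetic_control_weights(Y_donors_pre: np.ndarray,
                              y_treated_pre: np.ndarray,
                              n_starts: int = 20,
                              seed: int = 0) -> np.ndarray:
    """Interpolation-only synthetic-control weights: each weight is non-negative
    and the weights sum to one, chosen to minimize the pre-intervention squared
    error between the donor combination and the treated unit."""
    rng = np.random.default_rng(seed)
    n_donors = Y_donors_pre.shape[1]

    def loss(w: np.ndarray) -> float:
        return float(np.sum((y_treated_pre - Y_donors_pre @ w) ** 2))

    constraints = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * n_donors

    # Several random starting points on the simplex to avoid local minima,
    # keeping the solution with the lowest pre-intervention error.
    best_w, best_loss = None, np.inf
    for _ in range(n_starts):
        w0 = rng.dirichlet(np.ones(n_donors))
        res = minimize(loss, w0, method="SLSQP", bounds=bounds, constraints=constraints)
        if res.fun < best_loss:
            best_w, best_loss = res.x, res.fun
    return best_w
```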
Below is the resulting chart for Arbitrum, showing the factual sequencer revenue observed in Arbitrum and the synthetic control.
With the synthetic control, we can then estimate the effect of the STIP as the gap between the factual revenue and the synthetic control, $\hat{\tau}_{1t} = Y_{1t} - \hat{Y}^N_{1t}$.
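A minimal sketch of this step under the same assumptions (hypothetical names; with daily observations, the area under the gap curve reduces to a simple sum of daily gaps):

```python
import numpy as np

def stip_impact(y_actual: np.ndarray,
                y_synthetic: np.ndarray,
                post_mask: np.ndarray) -> tuple[np.ndarray, float]:
    """Per-period effect estimate (the gap) and its cumulative sum over the
    post-intervention period.

    y_actual / y_synthetic: daily sequencer revenue (30-day MA), factual vs. synthetic.
    post_mask: boolean array marking days on or after the start of the STIP.
    """
    gap = y_actual - y_synthetic              # tau_hat for every period
    cumulative = float(gap[post_mask].sum())  # daily data, so the AUC is a simple sum
    return gap, cumulative
```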
To understand whether the result is statistically significant and not just an artifact of randomness, we use the idea of Fisher’s Exact Test. We permute the treated and control units exhaustively: for each unit, we pretend it is the treated one while the others form the control. We thus create one synthetic control and one effect estimate for each protocol, pretending that the STIP was given to that protocol, to calculate the estimated impact of a treatment that didn’t happen. If the impact on the protocol of interest is sufficiently large compared to these fake treatments (“placebos”), we can say our result is statistically significant and there is indeed an observable impact of the STIP on Arbitrum’s sequencer revenue. The idea is that if there was no STIP in the other protocols and we used the same model to pretend that there was, we wouldn’t see any impact.
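A sketch of the placebo procedure under the same assumptions, reusing the hypothetical synthetic_control_weights helper from the optimization sketch above:

```python
import numpy as np

def placebo_effects(Y: np.ndarray, pre_len: int) -> np.ndarray:
    """Cumulative post-period effect estimate for every unit, each time pretending
    that unit was the treated one and the remaining units form the donor pool.

    Y: (T, N) outcome matrix with one column per L2 network.
    pre_len: number of pre-intervention periods (T0).
    """
    T, N = Y.shape
    effects = np.zeros(N)
    for j in range(N):
        donors = np.delete(Y, j, axis=1)            # everyone else is the donor pool
        w = synthetic_control_weights(donors[:pre_len], Y[:pre_len, j])
        gap = Y[:, j] - donors @ w                  # effect estimate for unit j
        effects[j] = gap[pre_len:].sum()            # cumulative post-period effect
    return effects  # compare the treated unit's value against this placebo distribution
```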
References
Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.
Aayush Agrawal - Causal inference with Synthetic Control using Python and SparseSC
01 - Introduction To Causality — Causal Inference for the Brave and True