ARDC Research Deliverables

Short-form Case Study – GMX

As requested by the DAO advocate for the ARDC, L2BEAT, Blockworks Research has begun conducting case studies on STIP recipients. Since perpetual futures-related projects, as categorized in the dashboard created by OpenBlock Labs, accounted for 27.45M ARB (~39% of the total STIP incentives), the first case study is based on GMX, the project receiving the largest STIP allocation at 12M ARB. This case study focuses on the grantee’s application and reporting structure, incentive mechanisms utilized, and the sustainability of activity induced by incentives.

This is part of a wider, in-depth analysis of the overall STIP process.


Short-form Case Study – JOJO

The second case study presented here concerns JOJO, another perp DEX that was given a grant allocation of up to 200K ARB, making it the smallest recipient within the perpetuals category. Similar to the first case study, the research conducted focuses on the grantee’s application and reporting structure, incentive mechanisms utilized, and the sustainability of activity induced by incentives.


STIP-Bridge – Support Material for the Community

To help community members and delegates form a holistic view of STIP-Bridge applicants that aren’t automatically required to go through a challenge vote on Snapshot (i.e., projects that have applied for a STIP-Bridge allocation before the initial deadline of May 3rd, 2024), Blockworks Research, acting as one of the Research Members for the ARDC, is sharing a workbook that includes qualitative data on the aforementioned applicants’ utilized incentive structures, operational approaches, reporting standards, notable protocol changes, etc. This data has primarily been collected from protocols’ original STIP applications, bi-weekly updates, final reports, and STIP-Bridge addendums. We’ve also included some relevant commentary, as well as any possible red flags and minimal/open-to-interpretation rule deviations we’ve encountered. This qualitative data is meant to be used as a supplementary tool to performance data (see OpenBlock Labs’ STIP dashboard), ideally assisting the community in forming opinions on STIP-Bridge applicants’ performance during and after the STIP.

Moreover, as instructed by the DAO Advocate, L2BEAT, the workbook also includes a sheet with summaries of our findings, accompanied by a color code indicating whether the community might want to further inspect a protocol’s performance, incentive mechanisms, or program-related operations.


STIP-Bridge (Extended Deadline Applicants) – Support Material for the Community

Building on our earlier STIP-Bridge work, Blockworks Research is releasing another workbook that includes Bridge applicants who applied after the initial deadline of May 3, 2024, and have thus been automatically put up for a challenge vote on Snapshot. The data sources and methodology used are similar to those in our previous analysis posted above.


STIP Retroactive Analysis – Perp DEX Volume

The below research report is also available in document format here.

TL;DR

In H2 2023, Arbitrum launched the Short-Term Incentive Program (STIP) by distributing millions of ARB tokens to various protocols to drive user engagement. This report focuses on how the STIP impacted trading volumes in the perp DEX vertical, specifically examining the performance of Vertex, Gains Network, GMX, MUX Protocol, Vela Exchange, Perennial, and Jojo Exchange. By employing the Synthetic Control (SC) causal inference method to create a “synthetic” control group, we aimed to isolate the STIP’s effect from broader market trends. For an in-depth explanation of the utilized inference method and data, see the Methodology and Annex sections at the end of this report.

Our analysis yielded varied results: Vertex saw a significant 70% of its total trading volume in the analyzed period attributed to STIP, while GMX saw 43%. MUX Protocol also benefited, with 15% of its volume linked to STIP incentives. In contrast, our model predicts that Gains Network experienced 5% less volume than if there had been no STIP, and Vela Exchange showed no statistically significant impact. These outcomes seem to highlight that mainly utilizing traditional fee rebates, as done by Vertex, GMX, and MUX, was more effective in driving volume growth than the gamified incentives used by Gains Network and Vela Exchange.

Our methodology and detailed results underscore the complexities of measuring such interventions in a volatile market, stressing the importance of comparative analysis to understand the true impact of incentive programs like STIP.

Context and Goals

In H2 2023, Arbitrum initiated a significant undertaking by distributing millions of ARB tokens to protocols as part of the Short-Term Incentive Program (STIP), aiming to spur user engagement. This program allocated varying amounts to diverse protocols across different verticals. Our objective is to gauge the efficacy of these recipient protocols in leveraging their STIP allocations to boost the usage of their products. The challenge lies in accurately gauging the impact of the STIP amidst a backdrop of various factors, including broader market conditions.

This report pertains to the perp DEX vertical in particular. In this vertical, the STIP recipients were GMX, Gains Network, Vertex, MUX Protocol, Vela Exchange, Perennial, and Jojo Exchange. The following table summarizes the amount of ARB tokens received and when they were used by each protocol.

To assess the impact of the STIP on perp DEX protocols, daily trading volume serves as a crucial metric. Despite varying approaches to liquidity—such as synthetic AMMs versus orderbook liquidity—the true measure of a protocol’s success lies in the volume traded on its platform.

We used a Causal Inference method called Synthetic Control (SC) to analyze our data. This technique helps us understand the effects of a specific event by comparing our variable of interest to a “synthetic” control group. Here’s a short breakdown:

  • Purpose: SC estimates the impact of a particular event or intervention using data over time.
  • How It Works: It creates a fake control group by combining data from similar but unaffected groups. This synthetic group mirrors the affected group before the event.
  • Why It Matters: By comparing the real outcomes with this synthetic control, we can see the isolated effect of the event.

In our analysis, we use data from other protocols to account for market trends. This way, we can better understand how protocols react to changes, like the implementation of the STIP, by comparing their performance against these market-influenced synthetic controls. The results pertain to the period from the start of each protocol’s use of the STIP until March 1st. Therefore, this analysis centers on the effectiveness of various incentive structures, rather than the sustainability of activity. The final date was chosen to keep the statistical significance level at 90%, as further described in the Methodology section.

Jojo Exchange was excluded from the analysis due to insufficient reliable data and an apparent pivot to Base. Blockworks Research has conducted a separate case study on Jojo, available on the governance forum. Perennial was also excluded because its v2 launch coincided with the start of the STIP program, and the methodology we employed requires a comparable dataset from before the STIP’s implementation to accurately gauge its impact.

Results

Vertex

The total estimated impact of the STIP from November 8th, 2023 until March 1st, 2024 on Vertex’s daily volume is $32.8B. Vertex’s total volume in this period was $46.9B, so according to our analysis, 70% of the total volume can be attributed to the STIP. Since Vertex received a total of 3M ARB, valued at around $3.6M (at $1.20 per ARB), this means that the STIP created $9,111.67 in volume per dollar spent over the 115 days analyzed.
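For clarity, the per-dollar figures quoted in this and the following subsections reduce to simple arithmetic over the impact estimate, the grant size, and the assumed $1.20 ARB price. Below is a minimal sketch of that calculation (the function name is ours, not from the analysis code), using the Vertex figures above; the small difference from the quoted $9,111.67 comes from rounding of the inputs.

```python
# Reproduction of the report's per-protocol arithmetic (function name ours).
def stip_attribution(total_volume, estimated_impact, arb_received, arb_price=1.20):
    spend = arb_received * arb_price                   # dollar value of the grant
    attributed_share = estimated_impact / total_volume
    volume_per_dollar = estimated_impact / spend
    return attributed_share, volume_per_dollar

share, per_dollar = stip_attribution(
    total_volume=46.9e9,      # $46.9B traded in the analyzed period
    estimated_impact=32.8e9,  # $32.8B attributed to the STIP
    arb_received=3_000_000,   # 3M ARB
)
print(f"{share:.0%}, ${per_dollar:,.2f} per dollar")  # -> 70%, $9,111.11 per dollar
```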

Gains Network

The total estimated impact of the STIP from December 29th, 2023 until March 1st, 2024 on Gains Network’s daily volume is minus $0.3B. Gains Network’s total volume in this period was $7.1B, so according to our analysis, a loss of 5% of total volume can be attributed to the STIP. Since Gains Network received a total of 4.5M ARB, valued at around $5.4M (at $1.20 per ARB), this means that the STIP caused a loss of $63.52 in volume per dollar spent over the 64 days analyzed.

GMX

The total estimated impact of the STIP from November 8th, 2023 until March 1st, 2024 on GMX’s daily volume is $10.5B. GMX’s total volume in this period was $24.4B, so according to our analysis, 43% of the total volume can be attributed to the STIP. Since GMX received a total of 12M ARB, valued at around $14.4M (at $1.20 per ARB), this means that the STIP created $731.67 in volume per dollar spent over the 115 days analyzed.

MUX Protocol

The total estimated impact of the STIP from November 16th, 2023 until March 1st, 2024 on MUX Protocol’s daily volume is $2.2B. MUX Protocol’s total volume in this period was $15.2B, so according to our analysis, 15% of the total volume can be attributed to the STIP. Since MUX Protocol received a total of 6M ARB, valued at around $7.2M (at $1.20 per ARB), this means that the STIP created $308.33 in volume per dollar spent over the 107 days analyzed.

Vela Exchange

The total estimated impact of the STIP from December 27th, 2023 until March 1st, 2024 on Vela Exchange’s daily volume was minus $456M, but this impact was not deemed statistically significant. So effectively, the analysis did not find the STIP to have any significant impact, positive or negative, on Vela Exchange’s daily volume. In this context, it means that the observed impact of the STIP on the protocol’s daily volume could simply be due to random fluctuations rather than a real effect.

Main Takeaways

Our analysis, conducted with a 90% significance level, produced interesting results for the impact of the STIP on Vertex, Gains Network, GMX, and MUX Protocol. The analysis deemed the impact on Vela Exchange to not be statistically significant, which means that we couldn’t confidently say that the STIP caused a noticeable change in the protocol’s daily volume, but rather the variations we see could just be due to market behavior. A further explanation of how this and all results were derived can be found in the Methodology and Annex sections.

To interpret and understand the results, it is important to have an overview of the incentive mechanisms utilized by the different protocols.

GMX utilized their STIP ARB incentives by focusing primarily on maximizing TVL to ensure adequate liquidity, which is crucial for providing a good trading experience given their AMM design. Additionally, they also reduced trading fees on the decentralized perpetual exchange to levels comparable to the VIP tiers of leading centralized exchanges. Traders on GMX v2 benefited from a rebate of up to 75% on open and close fees, thanks to the STIP incentives, attracting users with a minimal entry fee of 0.015%. In total, 4,984,768.84 ARB was distributed as trading incentives, with the remaining incentives including liquidity and grants incentives. To further boost engagement, GMX ran a two-week trading competition to attract new traders to the V2 platform, though they do not plan to use bridge incentives for future competitions.
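As an illustration of how such a rebate shapes trader costs, here is a hedged sketch: the 0.06% base fee is an assumption chosen so that a 75% rebate lands on the 0.015% entry fee mentioned above; the report does not state GMX’s exact fee schedule.

```python
# Illustrative only: how a 75% open/close fee rebate yields the quoted 0.015%
# entry fee. The 0.06% base fee is an assumption for this sketch.

BASE_FEE = 0.0006  # assumed 0.06% open/close fee
REBATE = 0.75      # up to 75% rebated in ARB under the STIP

def effective_fee(notional, base_fee=BASE_FEE, rebate=REBATE):
    paid = notional * base_fee   # fee charged at open/close
    refunded = paid * rebate     # ARB rebate funded by STIP incentives
    return paid - refunded       # trader's net cost

print(effective_fee(100_000))  # ~15 -> $15 on a $100k position, i.e. 0.015%
```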

Vertex focused their STIP ARB incentives on KPIs such as monthly trading volume, monthly active users, on-chain activity, and TVL. Their first round of incentives targeted two main areas: trading rewards and Elixir LP Pools. In total, 3 million ARB was allocated across 16 weekly epochs, with 2.55 million ARB dedicated to Vertex trading incentives and 450,000 ARB to Elixir liquidity incentives. Additionally, Vertex matched the STIP with a rewards program, offering dual incentives for trading with approximately 10 million VRTX tokens allocated to each epoch. The data indicates that providing trader rebates significantly boosts on-chain usage of perpetuals.

During the STIP period, Gains Network implemented markedly different incentive streams through a points system, mainly rewarding traders for behaviors such as fees paid, absolute PnL, loyalty, and relative PnL. These rewards were distributed weekly, with different allocations for each category. While this gamification attracted engagement, it also led to sybil attempts, especially in the relative PnL category, where actors tried to game the system with delta-neutral positions to extract ARB from the reward pools. Consequently, the relative PnL category was dropped during the program. For the STIP campaign, Gains Network allocated 85% of incentives to trading and 15% to LP incentives, distributing a total of 3.825 million ARB in trading incentives. Additionally, Gains Network provided a partial match of 65,000 GNS tokens to LP incentives to further boost the incentive program.

MUX Protocol offered rebates of up to 100% on open and close fees for all integrated protocols on the MUX aggregator. This strategy aimed to aggressively onboard more traders to Arbitrum. The total STIP amount was used in the Rebate Program, where traders who opened and closed positions through the MUX Aggregator received weekly ARB token rebates for fees incurred on MUX, GMX V1, GMX V2, and Gains positions on Arbitrum.

Vela Exchange ran a gamified trading competition, the Grand Prix. Throughout the Grand Prix, users competed in five themed rounds, each offering new challenges and opportunities to earn credits, the event’s currency. Liquidity providers and yield farmers benefited from limited-time events where their contributions to VLP minting earned them greater credit multipliers. The grant breakdown for the STIP campaign included 150,000 ARB for multi-chain and fiat onboarding, 500,000 ARB for developing social features and trading leagues, and 350,000 ARB for VLP vault rewards. To prevent wash trading, incentives in the trading leagues were capped based on fees earned and focused on PnL, with volume playing a secondary role.

Vertex saw its daily volume impacted the most of the perp DEXs analyzed: our analysis attributes 70% of the project’s total volume to the STIP, despite Vertex receiving a relatively low amount of ARB tokens. Vertex is followed by GMX, with 43% of total volume attributed to the STIP. GMX, however, received the largest ARB allocation of any protocol, so its added volume per dollar spent is naturally smaller. The negative impact of the STIP on Gains Network could be attributed to its incentive mechanism relying on social trading competitions rather than fee rebates and direct rewards. Both Gains Network and Vela Exchange implemented gamified points systems instead of traditional trading rewards and fee rebates; according to our analysis, these strategies were less effective at boosting volume beyond the general market trend.

Having said that, it’s essential to acknowledge the limitations inherent in our models, which are only as reliable as the data available. Numerous factors can drastically influence outcomes, making it challenging to isolate the effects of a single intervention; this is especially true in the crypto industry, where such confounders loom disproportionately large. Other relevant secondary factors possibly contributing to the differing results among perp DEXs include traders’ mercenary activity and the cannibalization of trading volume.

Given these complexities, our results should be interpreted comparatively rather than absolutely. The SC methodology was uniformly applied across all protocols, allowing us to gauge the relative efficacy of the STIP allocation.

Appendix

Methodology

TL;DR: We employed an analytical approach known as the Synthetic Control (SC) method. The SC method is a statistical technique utilized to estimate causal effects resulting from binary treatments within observational panel (longitudinal) data. Regarded as a groundbreaking innovation in policy evaluation, this method has garnered significant attention in multiple fields. At its core, the SC method creates an artificial control group by aggregating untreated units in a manner that replicates the characteristics of the treated units before the intervention (treatment). This synthetic control serves as the counterfactual for a treatment unit, with the treatment effect estimate being the disparity between the observed outcome in the post-treatment period and that of the synthetic control. In the context of our analysis, this model incorporates market dynamics by leveraging data from other protocols (untreated units). Thus, changes in market conditions are expected to manifest in the metrics of other protocols, thereby inherently accounting for these external trends and allowing us to explore whether the reactions of the protocols in the analysis differ post-STIP implementation.

To achieve the described goals, we turned to causal inference. Knowing that “association is not causation”, the study of causal inference revolves around techniques that try to establish when association is causation. The classic notation of causality analysis centers on a treatment $D$, which need not be medical: it is a generalized term for an intervention whose effect we want to study. We denote the treatment intake for unit $i$ as $D_i$, which is 1 if unit $i$ received the treatment and 0 otherwise. There is also an outcome $Y_i$, the observed outcome variable for unit $i$. This is our variable of interest, i.e., we want to understand what the influence of the treatment on this outcome was. The fundamental problem of causal inference is that one can never observe the same unit both with and without treatment, so we express this in terms of potential outcomes: what would have happened had a given treatment been taken. It is common to call the potential outcome that happened the factual, and the one that didn’t happen, the counterfactual. We will use the following notation:

$Y_{0i}$ — the potential outcome for unit $i$ without treatment;

$Y_{1i}$ — the potential outcome for the same unit $i$ with treatment.

With these potential outcomes, we define the individual treatment effect to be $\tau_i = Y_{1i} - Y_{0i}$. Because of the fundamental problem of causal inference, we will never actually know the individual treatment effect, because only one of the potential outcomes is observed.

One technique used to tackle this is Difference-in-Differences (or diff-in-diff). It is commonly used to analyze the effect of macro interventions, such as the effect of immigration on unemployment or the effect of law changes on crime rates, but also the impact of marketing campaigns on user engagement. There is always a period before and after the intervention, and the goal is to extract the impact of the intervention from a general trend. Let $Y_D(T)$ be the potential outcome for treatment status $D$ in period $T$ (0 for pre-intervention and 1 for post-intervention). Ideally, we would have the ability to observe the counterfactual and estimate the effect of an intervention as $E[Y_1(1) - Y_0(1) \mid D = 1]$, the causal effect being the outcome in the post-intervention period in the case of treatment minus the outcome in the same period in the case of no treatment. Naturally, $E[Y_0(1) \mid D = 1]$ is counterfactual, so it can’t be measured. If we take a before-and-after comparison, $E[Y(1) \mid D = 1] - E[Y(0) \mid D = 1]$, we can’t really say anything about the effect of the intervention, because there could be other external trends affecting that outcome.

The idea of diff-in-diff is to compare the treated group with an untreated group that didn’t get the intervention, replacing the missing counterfactual as follows: $E[Y_0(1) \mid D = 1] \approx E[Y(0) \mid D = 1] + \big(E[Y(1) \mid D = 0] - E[Y(0) \mid D = 0]\big)$. We take the treated unit before the intervention and add a trend component to it, estimated using the control group. We are basically saying that the treated unit after the intervention, had it not been treated, would look like the treated unit before the treatment plus a growth factor that is the same as the growth of the control.
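To make this concrete with hypothetical numbers: if the treated unit sat at 100 before the intervention, and the control grew from 80 to 120 over the same window, diff-in-diff estimates the counterfactual as $100 + (120 - 80) = 140$; any observed post-intervention outcome above 140 is then attributed to the intervention.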

An important thing to note here is that this method assumes that the trends in the treatment and control are the same. If the growth trend from the treated unit is different from the trend of the control unit, diff-in-diff will be biased. So, instead of trying to find a single untreated unit that is very similar to the treated, we can forge our own as a combination of multiple untreated units, creating a synthetic control.

That is the intuitive idea behind using synthetic control for causal inference. Assume we have $J + 1$ units and unit 1 is affected by an intervention. Units $2, \ldots, J + 1$ are a collection of untreated units that we will refer to as the “donor pool”. Our data spans $T$ time periods, with $T_0$ periods before the intervention. For each unit $j$ and each time $t$, we observe the outcome $Y_{jt}$. We define $Y^N_{jt}$ as the potential outcome without intervention and $Y^I_{jt}$ as the potential outcome with intervention. Then the effect for the treated unit at time $t$, for $t > T_0$, is defined as $\tau_{1t} = Y^I_{1t} - Y^N_{1t}$. Here $Y^I_{1t}$ is factual, but $Y^N_{1t}$ is not. The challenge lies in estimating $Y^N_{1t}$.

Source: 15 - Synthetic Control — Causal Inference for the Brave and True

Since the treatment effect is defined for each period, it doesn’t need to be instantaneous; it can accumulate or dissipate. The problem of estimating the treatment effect boils down to estimating what would have happened to the outcome of the treated unit, had it not been treated.

The most straightforward approach is to consider that a combination of units in the donor pool may approximate the characteristics of the treated unit better than any untreated unit alone. So we define the synthetic control as a weighted average of the units in the donor pool. Given weights $W = (w_2, \ldots, w_{J+1})$, the synthetic control estimate of $Y^N_{1t}$ is $\hat{Y}^N_{1t} = \sum_{j=2}^{J+1} w_j Y_{jt}$.

We can estimate the optimal weights with OLS, as in any typical linear regression, by minimizing the squared distance between the weighted average of the units in the donor pool and the treated unit over the pre-intervention period. This creates a “fake” unit that resembles the treated unit before the intervention, so we can see how it would have behaved in the post-intervention period.
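A minimal sketch of this unconstrained fit, assuming the pre-intervention outcomes are arranged with one donor per column (all variable names are ours):

```python
import numpy as np

# Unconstrained (OLS) synthetic-control fit.
# Y_pre: (T0 x J) matrix of pre-intervention outcomes, one donor per column;
# y_pre: length-T0 vector for the treated protocol.

def ols_weights(Y_pre: np.ndarray, y_pre: np.ndarray) -> np.ndarray:
    # least-squares weights minimizing ||y_pre - Y_pre @ w||^2
    w, *_ = np.linalg.lstsq(Y_pre, y_pre, rcond=None)
    return w

# The synthetic series over the full horizon is then Y_full @ ols_weights(...)
```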

In the context of our analysis, this means that we can include all other perp DEX protocols that did not receive the STIP in our donor pool and estimate a “fake”, synthetic control perp DEX protocol that follows the trend of any particular protocol we want to study in the period before it received the STIP. As mentioned before, the metric of interest chosen for this analysis was daily volume; in particular, we calculated the 7-day moving average to smooth the data. We can then compare the behavior of our synthetic control with the factual and estimate the impact of the STIP by taking the difference. We are essentially comparing what would have happened had the protocol not received the STIP with what actually happened.

However, regression sometimes leads to extrapolation, i.e., values that fall outside the range of our initial data and may not make sense in our context. This happened when estimating our synthetic control, so we constrained the model to do only interpolation. That is, we restrict the weights to be non-negative and to sum to one, so that the synthetic control is a convex combination of the units in the donor pool. The treated unit is thus projected onto the convex hull defined by the untreated units. This means that there probably won’t be a perfect match for the treated unit in the pre-intervention period, and that the solution can be sparse, as a face of the convex hull will sometimes be defined by only a few units. This works well because we don’t want to overfit the data. It is understood that we will never know with certainty what would have happened without the intervention, only that under these assumptions we can draw statistical conclusions.

Formalizing the interpolation: the synthetic control is still defined in the same way, $\hat{Y}^N_{1t} = \sum_{j=2}^{J+1} w_j Y_{jt}$, but now we use the weights $W^* = (w_2^*, \ldots, w_{J+1}^*)$ that minimize the squared distance between the weighted average of the units in the donor pool and the treated unit over the pre-intervention period, $\sum_{t=1}^{T_0} \big(Y_{1t} - \sum_{j=2}^{J+1} w_j Y_{jt}\big)^2$, subject to the restriction that the $w_j$ are non-negative and sum to one.

We get the optimal weights using quadratic programming optimization with the described constraints on the pre-STIP period and then use these weights to calculate the synthetic control for the total duration of time we are interested in. We initialized the optimization for each analysis with different starting weight vectors to avoid introducing bias in the model and getting stuck in local minima. We selected the one that minimized the square difference in the pre-intervention period.
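The following sketch shows one way the constrained fit with random restarts could be implemented; it is an illustration of the procedure described above under our own naming assumptions, not the analysis code itself.

```python
import numpy as np
from scipy.optimize import minimize

# Interpolation-constrained fit with random restarts (variable names and the
# restart count are our assumptions).

def synth_weights(Y_pre, y_pre, n_restarts=10, seed=0):
    rng = np.random.default_rng(seed)
    J = Y_pre.shape[1]

    def loss(w):
        return np.sum((y_pre - Y_pre @ w) ** 2)

    best = None
    for _ in range(n_restarts):
        w0 = rng.dirichlet(np.ones(J))  # random start on the weight simplex
        res = minimize(
            loss, w0, method="SLSQP",
            bounds=[(0.0, 1.0)] * J,                                     # w_j >= 0
            constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],  # sum to 1
        )
        if best is None or res.fun < best.fun:
            best = res
    return best.x
```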

As an example, below is the resulting chart for Vertex, showing the factual daily volume observed in Vertex and the synthetic control.

With the synthetic control, we can then estimate the effect of the STIP as the gap between the factual protocol daily volume and the synthetic control, $\hat{\tau}_{1t} = Y_{1t} - \hat{Y}^N_{1t}$.

To understand whether the result is statistically significant and not just a possible result we got due to randomness, we use the idea of Fisher’s Exact Test. We permute the treated and control units exhaustively by, for each unit, pretending it is the treated one while the others are the control. We create one synthetic control and effect estimates for each protocol, pretending that the STIP was given to another protocol to calculate the estimated impact for this treatment that didn’t happen. If the impact in the protocol of interest is sufficiently larger when compared to the other fake treatments (“placebos”), we can say our result is statistically significant and there is indeed an observable impact of the STIP on the protocol’s daily volume. The idea is that if there was no STIP in the other protocols and we used the same model to pretend that there was, we wouldn’t see any impact.
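Continuing the sketch above (and reusing the hypothetical `synth_weights` helper), the placebo loop could look like this:

```python
import numpy as np

# Placebo ("Fisher-style") test sketch. `panel` maps protocol name -> daily
# outcome array of length T; T0 is the number of pre-intervention days.

def placebo_effects(panel, T0):
    effects = {}
    for unit, y in panel.items():
        donors = [v for name, v in panel.items() if name != unit]
        Y = np.column_stack(donors)
        w = synth_weights(Y[:T0], y[:T0])    # fit on the pre-period only
        effects[unit] = y[T0:] - Y[T0:] @ w  # post-period gap for this "treatment"
    return effects

# The treated protocol's gap is judged significant when it is more extreme
# than (nearly) all of the placebo gaps.
```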

It is expected that the variance after the intervention will be higher than the variance before the intervention since the synthetic control is designed to minimize the difference in the pre-intervention period. This can be seen in the chart below. Some protocols don’t fit well at all even in the pre-intervention period when no convex combination matches them, so they were removed from the analysis by setting a threshold for pre-intervention error.

With this test, we see that if we pretend the STIP was given to another protocol, we would almost never get an effect so extreme as the one we got with Vertex. For the other perp DEXs in the STIP, this was not always the case, especially after March 2024. For that reason, to maintain statistical significance at 90%, we restricted the analysis to the impact observed until March 1st.

References

Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.

Aayush Agrawal - Causal inference with Synthetic Control using Python and SparseSC

01 - Introduction To Causality — Causal Inference for the Brave and True

Annex

The Annex describes the data used and shows intermediary charts used in the analysis.

The data gathered to evaluate the effect of the STIP on each protocol’s daily volume consists of daily volume series from multiple protocols over several months. There is a balance to strike between having a long enough timeline of historical data and having enough protocols to compare against. For instance, GMX launched before 2022, but we chose to use only data from 2023 to allow for a larger donor pool. Protocols that also received the STIP were dropped from the analysis. The 7-day moving average was used to smooth out the time series.
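As a small illustration of the smoothing step, assuming the volume series live in a pandas DataFrame with one column per protocol (the frame below is hypothetical):

```python
import pandas as pd

# Hypothetical daily-volume frame: one column per protocol, indexed by day.
df = pd.DataFrame(
    {"protocol_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]},
    index=pd.date_range("2023-01-01", periods=8, freq="D"),
)

# The 7-day moving average used throughout to smooth the series.
df_smoothed = df.rolling(window=7).mean().dropna()
```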

Vertex

Protocols used in the donor pool: ApeX Protocol (ethereum), APX Finance (bsc), Fulcrom (cronos), GooseFX (solana), Hyperliquid (hyperliquid), IPOR (ethereum), Level Finance (bsc), Morphex (fantom), MUX Protocol (bsc), MUX Protocol (avax), MUX Protocol (optimism), PancakeSwap Perps (bsc), Polynomial Trade (optimism), SpaceDex (bsc), dYdX, GMX (avax).

Gains Network

Protocols used: ApeX Protocol (ethereum), APX Finance (bsc), Fulcrom (cronos), Gains Network (polygon), GooseFX (solana), Hyperliquid (hyperliquid), IPOR (ethereum), KTX.Finance (bsc), Level Finance (bsc), Morphex (fantom), MUX Protocol (bsc), MUX Protocol (avax), MUX Protocol (optimism), PancakeSwap Perps (bsc), Polynomial Trade (optimism), SpaceDex (bsc), HoldStation DeFutures (era), dYdX, GMX (avax).

GMX

Protocols used in the donor pool: ApeX Protocol (ethereum), APX Finance (bsc), Fulcrom (cronos), GooseFX (solana), Hyperliquid (hyperliquid), IPOR (ethereum), Level Finance (bsc), Morphex (fantom), MUX Protocol (bsc), MUX Protocol (avax), MUX Protocol (optimism), PancakeSwap Perps (bsc), Polynomial Trade (optimism), SpaceDex (bsc), dYdX, GMX (avax).

MUX Protocol

Protocols used in the donor pool: ApeX Protocol (ethereum), APX Finance (bsc), Fulcrom (cronos), GooseFX (solana), Hyperliquid (hyperliquid), IPOR (ethereum), Level Finance (bsc), Morphex (fantom), MUX Protocol (bsc), MUX Protocol (avax), MUX Protocol (optimism), PancakeSwap Perps (bsc), Polynomial Trade (optimism), SpaceDex (bsc), dYdX, GMX (avax).

Vela Exchange

Protocols used in the donor pool: Aevo (ethereum), ApeX Protocol (ethereum), APX Finance (bsc), Based Markets (base), Beamex (moonbeam), BLEX (arbitrum), Drift (solana), Fulcrom (cronos), Gains Network (polygon), HMX (arbitrum), GooseFX (solana), Hyperliquid (hyperliquid), ImmortalX (celo), KiloEx (bsc), KTX.Finance (bsc), IPOR (ethereum), Level Finance (bsc), Level Finance (arbitrum), Morphex (fantom), MUX Protocol (bsc), MUX Protocol (avax), MUX Protocol (optimism), PancakeSwap Perps (bsc), Pinnako (era), Polynomial Trade (optimism), Synthetix (optimism), UniDex (optimism), SpaceDex (bsc), UniDex (era), UniDex (arbitrum), UniDex (fantom), UrDEX Finance (arbitrum), Vela Exchange (arbitrum), HoldStation DeFutures (era), dYdX, GMX (avax).


STIP Retroactive Analysis – Spot DEX TVL

The below research report is also available in document format here.

TL;DR

In H2 2023, Arbitrum launched the Short-Term Incentive Program (STIP) by distributing millions of ARB tokens to various protocols to drive user engagement. This report focuses on how the STIP impacted TVL in the spot DEX vertical, specifically examining the performance of Balancer, Camelot, Ramses, Trader Joe and WOOFi. By employing the Synthetic Control (SC) causal inference method to create a “synthetic” control group, we aimed to isolate the STIP’s effect from broader market trends. For each protocol the analysis focuses on the median TVL in the period from the first day of the STIP to two weeks after the STIP ended, in an effort to include at least two weeks of persistence in the analysis.

Our analysis yielded varied results: WOOFi saw a significant 62.5% of its median TVL in the analyzed period attributed to STIP, while Camelot saw 37.1%. Balancer and Trader Joe also benefited, both with around 12% of their TVL linked to STIP incentives. During the STIP period and the two weeks following its conclusion, the added TVL per dollar spent on incentives was approximately $12 for both Balancer and Camelot, $7 for WOOFi ($25 when considering only the incentives given directly to LPs), and $2 for Trader Joe. Our model showed no statistically significant impact for Ramses.

Spot DEXs on Arbitrum focus on enhancing liquidity for native and multi-chain projects, helping them bootstrap and build liquidity sustainably. Protocols achieve this through liquidity incentives, using either activity-based formulas or more traditional methods for allocation. Our analysis showed that different incentive distribution methods had a similar impact on TVL across protocols like Balancer and Camelot. However, Trader Joe’s strategy was less effective due to shorter incentivization periods, as also identified by the team.

WOOFi’s results varied depending on whether we considered total incentives or only those directed to liquidity providers. While it underperformed in added TVL per dollar spent compared to Camelot and Balancer, it excelled in liquidity-specific incentives. Additionally, incentives for other activities like swaps may indirectly boost TVL. Note that a price manipulation attack on March 5th may have dented user confidence in WOOFi during this period.

Ramses showed a substantial increase in TVL following the STIP, suggesting a possible delayed positive impact despite a lack of immediate statistical significance.

Our methodology and detailed results underscore the complexities of measuring such interventions in a volatile market, stressing the importance of comparative analysis to understand the true impact of incentive programs like STIP.

Context and Goals

In H2 2023, Arbitrum initiated a significant undertaking by distributing millions of ARB tokens to protocols as part of the Short-Term Incentive Program (STIP), aiming to spur user engagement. This program allocated varying amounts to diverse protocols across different verticals. Our objective is to gauge the efficacy of these recipient protocols in leveraging their STIP allocations to boost the usage of their products. The challenge lies in accurately gauging the impact of the STIP amidst a backdrop of various factors, including broader market conditions.

This report pertains to the spot DEX vertical in particular. In this vertical, the STIP recipients were Balancer, Camelot, Ramses, Trader Joe and WOOFi. The following table summarizes the amount of ARB tokens received and when they were used by each protocol.

To assess the impact of the STIP on spot DEX protocols, TVL is a crucial metric. While trading volume is also important for evaluating a DEX’s performance, the primary focus of these AMMs within the STIP was on enhancing and sustaining liquidity for both new and established projects in the ecosystem. The goal was to ensure that liquidity was readily available to improve efficiency and reduce slippage. Most of these projects used all the incentives allocated to them to attract liquidity providers, making TVL the most directly impacted metric. However, a separate analysis would be valuable to understand how this increase in TVL translated into further activities, such as trading volume on the DEX, for a more comprehensive understanding. Throughout the report, the 7-day moving average (MA) TVL was used, so any mention of TVL should be understood as the 7-day MA TVL.

We used a Causal Inference method called Synthetic Control (SC) to analyze our data. This technique helps us understand the effects of a specific event by comparing our variable of interest to a “synthetic” control group. Here’s a short breakdown:

  • Purpose: SC estimates the impact of a particular event or intervention using data over time.
  • How It Works: It creates a fake control group by combining data from similar but unaffected groups. This synthetic group mirrors the affected group before the event.
  • Why It Matters: By comparing the real outcomes with this synthetic control, we can see the isolated effect of the event.

In our analysis, we use data from other protocols to account for market trends. This way, we can better understand how protocols react to changes, like the implementation of the STIP, by comparing their performance against these market-influenced synthetic controls. The results pertain to the period from the start of each protocol’s use of the STIP until two weeks after the STIP had ended.

Results

Balancer

Balancer launched its v1 in early 2020 and has been live on Arbitrum since Q3 2021. Balancer’s KPIs included TVL, daily protocol fees, and volume, all of which increased during the STIP. The totality of the received funds was allocated to liquidity providers through an incentive system developed for the STIP. The grant aimed to boost economic activity on Arbitrum by creating an autonomous mechanism for distributing ARB incentives to enhance Balancer liquidity across the network. This incentive program distributed 41,142.65 ARB per week based on veBAL voting for pools on Arbitrum. The vote weight per pool was multiplied by a boost factor, and ARB was then distributed to all pools based on their relative boosted weight. Pools were capped at 10% of the total weekly ARB, except for ETH-based LSD stableswap pools, which were capped at 20%.
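A hedged sketch of this weekly distribution logic is given below. The boost factors and the handling of capped excess (here redistributed pro rata in a single pass) are our assumptions; the report does not spell out either detail.

```python
# Sketch of the weekly veBAL-based ARB distribution described above.

WEEKLY_ARB = 41_142.65

def allocate(votes, boosts, caps):
    """votes/boosts/caps: dicts keyed by pool name; caps as fractions (0.10 / 0.20)."""
    boosted = {p: votes[p] * boosts[p] for p in votes}
    total = sum(boosted.values())
    shares = {p: w / total for p, w in boosted.items()}
    capped = {p: min(s, caps[p]) for p, s in shares.items()}  # apply per-pool caps
    scale = 1.0 / sum(capped.values())                        # single-pass redistribution
    return {p: WEEKLY_ARB * s * scale for p, s in capped.items()}
```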

Balancer’s TVL increased from approximately $111M on November 2, 2023, when the STIP started, to a peak of $193M in March 2024. By the end of the STIP on March 22, the TVL was at $156M. One week later, on March 29, the TVL had decreased to $145M. Two weeks later, on April 5, it was $135M, and by May 5, it had further dropped to $90M.

Overall, there was a 41% increase in TVL when comparing the periods before and after the STIP. Comparing the start of the STIP to one week after its end, there was a 31% increase, and a 21% increase two weeks after the STIP ended.

The first chart below compares Balancer’s TVL with the modeled synthetic control. The second chart highlights the impact of the STIP by showing the difference between Balancer’s TVL and the synthetic control. For more details, see the Methodology section.

The median impact of the STIP on Balancer’s TVL, from its start on November 2, 2023, to its end on March 22, 2024, was $17.3M. Including the TVL two weeks after the STIP concluded, the median impact was $17.6M. Balancer received a total of 1.2M ARB, valued at approximately $1.44M (at $1.20 per ARB). This indicates that the STIP generated an average of $12.27 in TVL per dollar spent during its duration and the following two weeks. These results were gathered with 90% statistical significance.
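As with the perp DEX report, the per-dollar figure reduces to simple arithmetic, here over the median daily gap between factual TVL and the synthetic control. A minimal sketch with the Balancer figures (function name ours; the small difference from the quoted $12.27 comes from rounding of the inputs):

```python
import numpy as np

# Median-gap arithmetic behind the TVL-per-dollar figures.
def median_impact_per_dollar(gap_series, arb_received, arb_price=1.20):
    return np.median(gap_series) / (arb_received * arb_price)

# Balancer: ~$17.6M median daily gap on a 1.2M ARB grant
print(median_impact_per_dollar([17.6e6], 1_200_000))  # ~12.22 ($ TVL per $ spent)
```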

Camelot

Camelot’s KPIs for the STIP included TVL, volume, and fees. Their incentive allocation strategy prioritized LP returns by incentivizing more than 75 different pools from various Arbitrum protocols. The team focused on promoting liquidity in a diverse mix of pools, including Arbitrum OGs, smaller protocols, newcomers, and established projects from other ecosystems. The ARB distribution followed the same logic used for their own GRAIL emissions, aimed at ensuring a consistent and strategic approach to incentivization.

Camelot’s TVL increased from approximately $82M on November 14, 2023, when the STIP began, to a peak of $150M in March 2024. By the end of the STIP on March 29, the TVL was $136M. One week later, on April 5, it was $131.6M. Two weeks after the STIP ended, on April 12, the TVL stood at $126M, and one month later, it was $105.5M.

Overall, there was a 66% increase in TVL comparing the periods before and after the STIP. Comparing the start of the STIP to one week after its end, there was a 60% increase, and a 53% increase two weeks after the STIP concluded.

The median impact of the STIP on Camelot’s TVL from its start on November 14, 2023, to its end on March 29, 2024, was $42.7M. When including the TVL two weeks after the STIP ended, the impact increased to $44M. Camelot received a total of 3.09M ARB, of which 65,450 ARB were returned, resulting in a final total of 3,024,550 ARB, valued at approximately $3.63M (at $1.20 per ARB). This indicates that the STIP generated an average of $12.12 in TVL per dollar spent during the STIP and the two weeks following it. These results were obtained with 95% statistical significance.

Ramses

Ramses distributed its total incentives to liquidity providers across various pools using a two-part strategy. Fifty percent of the incentives were allocated based on fees generated in the previous epoch, while the remaining fifty percent were distributed at the team’s discretion, targeting protocols that needed bootstrapping and lacked sufficient liquidity support from other sources. This approach aimed to strengthen the overall Arbitrum ecosystem by balancing support for established and emerging projects.

Ramses’ TVL increased from approximately $8.2M on December 27, 2023, when the STIP began, to over $25M at its peak in June 2024. When the STIP ended on March 20, 2024, the TVL was $10.7M. One week later, on March 27, the TVL had dropped to $8.1M. However, two weeks after the STIP concluded, on April 3, the TVL had risen to $10.9M, and one month later, on April 20, it reached $12.5M. This represents a 30% increase in TVL from the period before the STIP to the period after it. Comparing the start of the STIP with the TVL one week after its end shows virtually no change, while there was a 33% increase two weeks after the STIP ended.

The STIP’s impact on Ramses was not statistically significant, which means we cannot conclude that the observed results were caused by the STIP rather than occurring by chance. As a result, no clear conclusions about the magnitude of the STIP’s effect on Ramses can be drawn with this analysis.

Trader Joe

In the STIP Addendum, Trader Joe explained that while the main goal of the grant was to incentivize long-tail assets (builders) within the Arbitrum ecosystem, it quickly became a cat-and-mouse game due to intense yield competition and high demand for liquidity. The protocol found that spreading efforts across a wide range of protocols and using a rotating incentive program with concentrated rewards over short periods was not the most effective approach for allocating incentives.

Trader Joe’s TVL increased from approximately $27.5M on November 4, 2023, when the STIP started, to nearly $50M at its peak in March 2024. By the end of the STIP on March 29, 2024, TVL had reached $42.4M. One week later, on April 5, TVL was $41.4M, and two weeks after the STIP ended, on April 12, it had decreased to $37.5M. One month after the STIP concluded, on April 29, the TVL was $28.3M. This represents a 54% increase in TVL from the period before the STIP to the period after it. Compared to the start of the STIP, TVL showed a 51% increase one week after the STIP ended and a 37% increase two weeks after the STIP concluded.

The median impact of the STIP on Trader Joe’s TVL from the start date on November 4, 2023, to the end date on March 29, 2024, was $3.7M. When also accounting for the TVL two weeks after the STIP ended, the total impact increased to $3.9M. Trader Joe received a total of 1.51M ARB, which was valued at approximately $1.81M (based on a $1.20 per ARB rate). This implies that the STIP generated an average of $2.31 in TVL for every dollar spent on incentives during the STIP period and the subsequent two weeks. These results are significant at an 85% confidence level.

WOOFi

WOOFi is the DeFi arm of the WOO ecosystem, functioning as a DEX that bridges the liquidity of the WOO X centralized exchange on-chain. Unlike the other protocols in this analysis, WOOFi did not allocate the total received ARB directly to liquidity incentives. WOOFi’s KPIs focused on several metrics: WOOFi Earn TVL, WOOFi Stake TVL, monthly swap volume, monthly perps volume, and the number of Arbitrum inbound cross-chain swaps. According to the Grant Information here and the STIP Addendum here, the ARB allocation was divided as follows: 30% to WOOFi Earn, 20% to WOOFi Pro, 15% to Arbitrum-inbound cross-chain swaps, 15% to WOOFi Stake, 10% to WOOFi Swap, and 10% to Quests & Cross-Protocol Integration. Additionally, approximately 65k ARB remained unused and was returned to the DAO.

To ensure consistency with the other analyzed protocols, this report focuses exclusively on WOOFi Earn’s TVL. Therefore, whenever WOOFi’s TVL is mentioned, it specifically refers to WOOFi Earn’s TVL. It is relevant to note that WOOFi Earn functions primarily as a yield aggregator product. While analyzing WOOFi Earn in isolation might also warrant a comparison to other yield aggregator protocols rather than spot DEXs, the core of WOOFi’s business is its spot DEX. The Earn feature was designed to support liquidity for on-chain swaps, which is why it has been included in this analysis.

WOOFi’s TVL grew from approximately $4.5M on December 26, 2023, when the STIP began, to around $15.4M at its peak in February 2024. By the end of the STIP on March 29, 2024, the TVL was $12.6M. One week later, on April 5, the TVL was $12.3M, and two weeks after the STIP ended, on April 12, it dropped to $9.5M. By one month after the STIP concluded, on April 29, the TVL had decreased to $5.9M.

This represents a 181% increase in TVL from before the STIP to after it. Comparing the start of the STIP to the TVL one week after its end shows a 175% increase, while two weeks after the STIP ended, the increase was 111%.

The median impact of the STIP on WOOFi’s TVL from its start on December 26, 2023, to its end on March 29, 2024, was $8.8M. When accounting for the TVL two weeks after the STIP concluded, the total median impact was $8.3M. WOOFi received a total of 1M ARB, of which approximately 65,000 ARB remained unused, resulting in 935,000 ARB valued at about $1.12M (at $1.20 per ARB). This means the STIP generated an average of $7.40 in TVL for every dollar spent on ARB incentives during the program and the following two weeks. These results were gathered with 90% statistical significance.

Main Takeaways

Our analysis produced interesting results for the impact of the STIP on Balancer, Camelot, Trader Joe and WOOFi. The analysis deemed the impact on Ramses to not be statistically significant, which means that we couldn’t confidently say that STIP caused a noticeable change in the protocol’s TVL, but rather the variations we see could potentially just be due to market behavior. A further explanation of how this and all results were derived can be found in the Methodology section.

Spot DEXs serve a fundamentally different purpose than other protocols such as perp DEXs, where successful incentive campaigns primarily focus on boosting trading volume and fees. For a spot DEX on Arbitrum, the goal is to enhance and support the liquidity of both native Arbitrum projects and multi-chain projects aligned with Arbitrum by increasing liquidity for their tokens. This approach helps new, growing, and established projects on Arbitrum bootstrap and build liquidity in a sustainable and capital-efficient manner.

The protocols we analyzed aimed to achieve this goal mainly by offering liquidity incentives to their providers. Some protocols developed activity-based formulas to determine a relative allocation between pools, while others employed more traditional methods, such as evaluating allocations on a weekly or biweekly basis and fixing them for the next period.

Our analysis showed that different methods of distributing incentives across various pools did not seem to substantially impact the effectiveness of the STIP. The ability to generate TVL appeared similar across protocols. For instance, both Balancer and Camelot demonstrated comparable added TVL per dollar of ARB spent. However, Trader Joe’s strategy was deemed less effective due to the short duration of incentivization in specific pools, as also identified by the team. Protocols can learn from this experience when designing future incentive programs.

WOOFi presents a different case with results that vary substantially depending on whether we consider the total incentives or just those allocated directly to WOOFi’s liquidity providers. Compared to Camelot and Balancer, WOOFi underperforms in terms of added TVL per dollar spent on incentives. However, it excels when focusing solely on liquidity incentives. This disparity makes direct comparisons challenging, but it suggests that incentives for other activities on the platform may indirectly boost TVL and could be beneficial. For example, offering incentives for swaps can increase trading volume, which in turn raises yields in those pools and attracts more TVL. It’s also worth noting that on March 5th, WOOFi experienced a price manipulation attack resulting in an $8.75M loss from its synthetic proactive market making (sPMM). Although WOOFi Earn was not directly affected, this incident likely shook user confidence in the protocol for a period following the attack.

The exact impact on Ramses couldn’t be assessed due to a lack of statistical significance. However, it’s noteworthy that TVL increased substantially in the months following the STIP. This may suggest that, while the immediate impact of the STIP couldn’t be determined during the analysis period, it may have had a delayed positive effect.

Lastly, it’s essential to acknowledge the limitations inherent in our models, which are only as reliable as the data available. Numerous factors can drastically influence outcomes, making it challenging to isolate the effects of a single intervention; this is especially true in the crypto industry, where such confounders loom disproportionately large.

Given these complexities, our results should be interpreted comparatively rather than absolutely. The SC methodology was uniformly applied across all protocols, allowing us to gauge the relative efficacy of the STIP allocation.

Appendix

Methodology

TL;DR: We employed an analytical approach known as the Synthetic Control (SC) method. The SC method is a statistical technique utilized to estimate causal effects resulting from binary treatments within observational panel (longitudinal) data. Regarded as a groundbreaking innovation in policy evaluation, this method has garnered significant attention in multiple fields. At its core, the SC method creates an artificial control group by aggregating untreated units in a manner that replicates the characteristics of the treated units before the intervention (treatment). This synthetic control serves as the counterfactual for a treatment unit, with the treatment effect estimate being the disparity between the observed outcome in the post-treatment period and that of the synthetic control. In the context of our analysis, this model incorporates market dynamics by leveraging data from other protocols (untreated units). Thus, changes in market conditions are expected to manifest in the metrics of other protocols, thereby inherently accounting for these external trends and allowing us to explore whether the reactions of the protocols in the analysis differ post-STIP implementation.

To achieve the described goals, we turned to causal inference. Knowing that “association is not causation”, the study of causal inference revolves around techniques that try to establish when association is causation. The classic notation of causality analysis centers on a treatment $D$, which need not be medical: it is a generalized term for an intervention whose effect we want to study. We denote the treatment intake for unit $i$ as $D_i$, which is 1 if unit $i$ received the treatment and 0 otherwise. There is also an outcome $Y_i$, the observed outcome variable for unit $i$. This is our variable of interest, i.e., we want to understand what the influence of the treatment on this outcome was. The fundamental problem of causal inference is that one can never observe the same unit both with and without treatment, so we express this in terms of potential outcomes: what would have happened had a given treatment been taken. It is common to call the potential outcome that happened the factual, and the one that didn’t happen, the counterfactual. We will use the following notation:

$Y_{0i}$ — the potential outcome for unit $i$ without treatment;

$Y_{1i}$ — the potential outcome for the same unit $i$ with treatment.

With these potential outcomes, we define the individual treatment effect to be $\tau_i = Y_{1i} - Y_{0i}$. Because of the fundamental problem of causal inference, we will never actually know the individual treatment effect, because only one of the potential outcomes is observed.

One technique used to tackle this is Difference-in-Differences (or diff-in-diff). It is commonly used to analyze the effect of macro interventions, such as the effect of immigration on unemployment or the effect of law changes on crime rates, but also the impact of marketing campaigns on user engagement. There is always a period before and after the intervention, and the goal is to extract the impact of the intervention from a general trend. Let $Y_D(T)$ be the potential outcome for treatment status $D$ in period $T$ (0 for pre-intervention and 1 for post-intervention). Ideally, we would have the ability to observe the counterfactual and estimate the effect of an intervention as $E[Y_1(1) - Y_0(1) \mid D = 1]$, the causal effect being the outcome in the post-intervention period in the case of treatment minus the outcome in the same period in the case of no treatment. Naturally, $E[Y_0(1) \mid D = 1]$ is counterfactual, so it can’t be measured. If we take a before-and-after comparison, $E[Y(1) \mid D = 1] - E[Y(0) \mid D = 1]$, we can’t really say anything about the effect of the intervention, because there could be other external trends affecting that outcome.

The idea of diff-in-diff is to compare the treated group with an untreated group that didn’t get the intervention, replacing the missing counterfactual as follows: $E[Y_0(1) \mid D = 1] \approx E[Y(0) \mid D = 1] + \big(E[Y(1) \mid D = 0] - E[Y(0) \mid D = 0]\big)$. We take the treated unit before the intervention and add a trend component to it, estimated using the control group. We are basically saying that the treated unit after the intervention, had it not been treated, would look like the treated unit before the treatment plus a growth factor that is the same as the growth of the control.

An important thing to note here is that this method assumes that the trends in the treatment and control are the same. If the growth trend from the treated unit is different from the trend of the control unit, diff-in-diff will be biased. So, instead of trying to find a single untreated unit that is very similar to the treated, we can forge our own as a combination of multiple untreated units, creating a synthetic control.

That is the intuitive idea behind using synthetic control for causal inference. Assume we have $J + 1$ units and unit 1 is affected by an intervention. Units $2, \ldots, J + 1$ are a collection of untreated units that we will refer to as the “donor pool”. Our data spans $T$ time periods, with $T_0$ periods before the intervention. For each unit $j$ and each time $t$, we observe the outcome $Y_{jt}$. We define $Y^N_{jt}$ as the potential outcome without intervention and $Y^I_{jt}$ as the potential outcome with intervention. Then the effect for the treated unit at time $t$, for $t > T_0$, is defined as $\tau_{1t} = Y^I_{1t} - Y^N_{1t}$. Here $Y^I_{1t}$ is factual, but $Y^N_{1t}$ is not. The challenge lies in estimating $Y^N_{1t}$.

Source: 15 - Synthetic Control — Causal Inference for the Brave and True

Since the treatment effect is defined for each period, it doesn’t need to be instantaneous; it can accumulate or dissipate. The problem of estimating the treatment effect boils down to estimating what would have happened to the outcome of the treated unit, had it not been treated.

The most straightforward approach is to consider that a combination of units in the donor pool may approximate the characteristics of the treated unit better than any untreated unit alone. So we define the synthetic control as a weighted average of the units in the donor pool. Given weights $W = (w_2, \ldots, w_{J+1})$, the synthetic control estimate of $Y^N_{1t}$ is $\hat{Y}^N_{1t} = \sum_{j=2}^{J+1} w_j Y_{jt}$.

We can estimate the optimal weights with OLS, as in any typical linear regression, by minimizing the squared distance between the weighted average of the units in the donor pool and the treated unit over the pre-intervention period. This creates a “fake” unit that resembles the treated unit before the intervention, so we can see how it would have behaved in the post-intervention period.

In the context of our analysis, this means that we can include all other spot DEX protocols that did not receive the STIP in our donor pool and estimate a “fake”, synthetic control spot DEX protocol that follows the trend of any particular protocol we want to study in the period before it received the STIP. As mentioned before, the metric of interest chosen for this analysis was TVL; in particular, we calculated the 7-day moving average to smooth the data. We can then compare the behavior of our synthetic control with the factual and estimate the impact of the STIP by taking the difference. We are essentially comparing what would have happened had the protocol not received the STIP with what actually happened.

However, regression sometimes leads to extrapolation, i.e., values that fall outside the range of our initial data and may not make sense in our context. This happened when estimating our synthetic control, so we constrained the model to do only interpolation. That is, we restrict the weights to be non-negative and to sum to one, so that the synthetic control is a convex combination of the units in the donor pool. The treated unit is thus projected onto the convex hull defined by the untreated units. This means that there probably won’t be a perfect match for the treated unit in the pre-intervention period, and that the solution can be sparse, as a face of the convex hull will sometimes be defined by only a few units. This works well because we don’t want to overfit the data. It is understood that we will never know with certainty what would have happened without the intervention, only that under these assumptions we can draw statistical conclusions.

Formalizing interpolation, the synthetic control is still defined in the same way, by $\hat{Y}^N_{1t} = \sum_{j=2}^{J+1} w_j Y_{jt}$. But now we use the weights $w^* = (w^*_2, \dots, w^*_{J+1})$ that minimize the squared distance between the weighted average of the donor-pool units and the treated unit over the pre-intervention period, $\sum_{t=1}^{T_0} \big( Y_{1t} - \sum_{j=2}^{J+1} w_j Y_{jt} \big)^2$, subject to the restriction that the $w_j$ are positive and sum to one.

We obtain the optimal weights via quadratic programming with the described constraints on the pre-STIP period and then use these weights to calculate the synthetic control over the full period of interest. For each analysis, we initialized the optimization with several different starting weight vectors to avoid introducing bias and getting stuck in local minima, selecting the solution that minimized the squared difference over the pre-intervention period.
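
To make the estimation concrete, below is a minimal sketch of this constrained, multi-start optimization in Python, assuming numpy and scipy are available; the helper name and data layout are ours rather than the exact code used for this analysis:

```python
import numpy as np
from scipy.optimize import minimize

def fit_synthetic_control(y_treated_pre, Y_donors_pre, n_starts=20, seed=0):
    """Fit convex synthetic-control weights on the pre-intervention period.

    y_treated_pre: shape (T0,), treated protocol's 7-day MA TVL pre-STIP
    Y_donors_pre:  shape (T0, J), donor-pool protocols' TVL over the same days
    """
    rng = np.random.default_rng(seed)
    J = Y_donors_pre.shape[1]

    def loss(w):
        # Squared distance between the treated unit and the weighted donors
        return np.sum((y_treated_pre - Y_donors_pre @ w) ** 2)

    bounds = [(0.0, 1.0)] * J                                   # w_j >= 0
    constraints = {"type": "eq", "fun": lambda w: w.sum() - 1}  # sum to one

    best = None
    for _ in range(n_starts):
        w0 = rng.dirichlet(np.ones(J))  # random start on the simplex
        res = minimize(loss, w0, method="SLSQP",
                       bounds=bounds, constraints=constraints)
        if best is None or res.fun < best.fun:
            best = res
    return best.x  # weights defining the synthetic control
```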

As an example, below is the resulting chart for Camelot, showing its factual TVL alongside the synthetic control.

With the synthetic control in hand, we can estimate the effect of the STIP as the gap between the factual protocol TVL and the synthetic control, $\hat{\tau}_{1t} = Y_{1t} - \hat{Y}^N_{1t}$.

To understand whether the result is statistically significant, and not something we could have obtained by chance, we use the idea behind Fisher's Exact Test. We permute the treated and control units exhaustively: for each unit, we pretend it is the treated one while the others act as controls, creating one synthetic control and effect estimate per protocol as if the STIP had been given to that protocol. If the estimated impact for the protocol of interest is sufficiently large compared to these fake treatments ("placebos"), we can say our result is statistically significant and there is indeed an observable impact of the STIP on the protocol's TVL. The idea is that since the other protocols did not receive the STIP, applying the same model to them as if they had should show no impact.
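
Below is a sketch of this placebo procedure, reusing the hypothetical fit_synthetic_control helper from the previous sketch (again illustrative rather than the exact code behind this report):

```python
import numpy as np

def placebo_effects(Y, T0):
    """Pretend each unit was treated, fit a synthetic control on the rest,
    and return every post-period effect path for comparison.

    Y: shape (T, J+1), one column of 7-day MA TVL per protocol
    """
    effects = {}
    for j in range(Y.shape[1]):
        donors = np.delete(Y, j, axis=1)                  # all other units
        w = fit_synthetic_control(Y[:T0, j], donors[:T0])
        effects[j] = Y[T0:, j] - donors[T0:] @ w          # estimated gap
    return effects

# If the actually treated protocol's gap is large relative to the placebo
# gaps, the estimated STIP impact is unlikely to be driven by chance.
```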

References

Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.

Aayush Agrawal - Causal inference with Synthetic Control using Python and SparseSC

01 - Introduction To Causality — Causal Inference for the Brave and True

3 Likes

STIP Analysis of Operations and Incentive Mechanisms

The below research report is also available in document format here.

Introduction

The following analysis presents an overview of the final STIP fund allocation across different incentive mechanisms, focusing on any differences in growth trends and their sustainability for perp DEX and spot DEX protocols, while comparing activity on Arbitrum against other relevant ecosystems. Moreover, this analysis covers broader themes and recurring developments that have emerged through STIP applications, updates, recipients’ performance, and discussions with recipient teams.

TL;DR

  • Following changes made to incentive allocations throughout the STIP, the most popular high-level mechanisms utilized were standard liquidity incentives (~30% of total allocation), fee rebates (~25%), trading/points/usage-based programs (~12%), liquidity incentives for native token(s) outside the platform (~8%), and liquidity incentives with optional/required long-term/perpetual capital locking (~8%).
  • Most spot and perp DEX STIP recipients’ top-line metrics fell notably after the STIP ended and are currently around levels seen in September 2023. There are a few outperformers, generally younger, differentiated protocols that outgrew the market during the STIP and have successfully managed to maintain activity and capital in the long term.
  • Excluding the outperformers, the absolute change in TVL/Volume achieved per ARB utilized to increase these metrics directly has widely converged between protocols in the same verticals.
  • Overall, during the STIP, Arbitrum’s market share growth across major blockchains peaked at ~0% for TVL, ~5% for spot volume, ~12% for perp volume, and ~0% for loans outstanding. The market shares are currently at around September 2023 values, except for TVL, which is down from ~6% to ~4%.

Key Insights – The STIP’s Impact

Until now, we've examined how protocols within the perp DEX and spot DEX verticals have performed relative to each other within the same verticals. This offers insights into, e.g., high-level differences in the effectiveness of different incentive mechanisms and growth themes for protocols of different maturities and sizes. However, it doesn't allow us to say anything about the STIP's effectiveness in maintaining or growing usage in aggregate. For example, suppose market conditions had drastically deteriorated during the STIP due to a systematic shock, causing all projects' metrics to decrease throughout the program. Naively looking only at the performance of Arbitrum protocols would lead us to conclude that the program was a failure, when in fact the total market may have shrunk while the protocols' share of it increased, which most community members would likely consider a success.

All else being equal, it seems that the STIP successfully catalyzed a notable increase in Arbitrum's DeFi market share across all major blockchains at the beginning of the incentive program. More specifically, the ARB incentives for perp and spot DEXs, the two largest allocations, appear to have been large enough to meaningfully capture more of the total activity during the incentive period. However, activity began reverting in the latter half of the program, with Arbitrum's market shares for spot volume, perp volume, and loans outstanding currently hovering around September 2023 levels. Based on the below graph, incentives were enough to sustain Arbitrum's TVL market share at a steady ~6% for most of the program, but the ecosystem began losing capital relative to the rest of the market in the middle of February and currently holds a ~4% market share.

Source: Artemis. Note: Major Blockchains include, where relevant, Aevo, Aptos, Avalanche C-Chain, Base, BNB Chain, Blast, dYdX, Ethereum, Fantom, Gnosis Chain, Hyperliquid, Near, Optimism, Osmosis, Polygon PoS, Scroll, Solana, StarkNet, Sui, zkSync Era

When comparing Arbitrum’s figures against Ethereum, Optimism, and Base, the trends are almost identical to those shown above, except for perp volume, which has sustained notably better. One major driver for this has likely been the successful bootstrapping of one of the newer perp DEXs on Arbitrum while protocols on the other blockchains have stagnated. This showcases Arbitrum’s strength within the perps vertical compared to the other major blockchains in the Ethereum ecosystem.

Source: Artemis

To summarize, all of the analyzed protocols saw their top-line metrics increase during the STIP, but in the months following the program's end, figures trended back toward September 2023 values. There was some variability in how much capital/volume each protocol had managed to capture per ARB spent by the end of the STIP, but in the long term, these multiples tended to converge to a tight range. There are a few exceptions to this: protocols on the younger side that generally offer differentiated products. These protocols have successfully reached notably higher "steady states" compared to the beginning of the program, with incentives likely amplifying market penetration driven by intrinsic factors and, on a few occasions, leading to more robust collaboration between the outperformers and other Arbitrum protocols, creating additional synergies. Although the data isn't shown in this report, the money market vertical generally exhibited similar trends to the perp and spot DEX verticals.

While a handful of protocols outperformed by sustainably growing activity, the overall increases in Arbitrum’s market shares across TVL and the major DeFi categories have largely reverted. In other words, the STIP doesn’t seem to have led to sustainable market capture in aggregate. However, there are other indirect returns that could be considered as well. For example:

  • Incentives are an implicit avenue for the DAO to convert ARB in the treasury to ETH through the sequencer margin as activity increases during incentive programs. However, some might argue that this isn't the most effective way to diversify the treasury.
  • If designed correctly, incentives could theoretically increase existing users’ loyalty and goodwill to Arbitrum. This is quite intangible and difficult to measure.
  • Incentives might attract new builders and protocols to the ecosystem. Although this is difficult to measure over such a short time period, there are tangible examples of protocols migrating to the ecosystem, with Kwenta and Curve Lending both launching on Arbitrum with onboarding incentives. However, through our conversations with several protocols, it has also become clear that the incentive application process might be too complex for many smaller teams, and some projects have decided not to even consider launching on Arbitrum because they feel that receiving funds requires too much politics.
  • Intuitively, as incentives clearly increase activity, protocols benefit from earning more revenue, and it makes sense for Arbitrum protocols to benefit since they are an integral part of the ecosystem. However, the feedback we've received from some teams is that because meeting KPIs is such an important factor in being considered a successful STIP recipient, some projects have had to cut their native fees to a minimum. It seems fair to say that in such cases growth isn't sustainable, and it might also hurt other protocols within the same vertical, since they have to match a fee structure that isn't profitable.

Lastly, it could be argued that had the STIP not happened, Arbitrum’s market shares across different metrics would be worse than what they are now. Such a counterfactual analysis carries many complexities and requires subjective interpretation, meaning that it isn’t possible to come to a result that can be considered the objective truth. Nevertheless, Blockworks Research has released two analyses employing the Synthetic Control causal inference method, which strives to compare what the performance of perp DEX and spot DEX STIP recipients has been during the program against what the performance would have been had the STIP not taken place.

Operational Observations

Throughout our analysis, certain wider-reaching themes and recurring developments have presented themselves, which are covered below. To begin with, protocols whose incentive programs ended notably earlier than others (i.e., in February or early March) generally experienced vast capital/user outflows relative to similar STIP recipients whose programs ran until the latter half of March 2024. This makes sense rationally: the cost of capital within the Arbitrum ecosystem rises during the STIP because protocols provide boosted yields and lowered fees through incentives. When one protocol stops allocating incentives, users and capital move to other protocols that are still providing heightened returns and lower expenses.

As such, allocating incentives to a concentrated group of protocols or tiering incentives across protocols within the same vertical might lead to unwanted outcomes. It’s likely that the protocols distributing incentives end up largely capturing capital and users from other projects within the ecosystem that aren’t distributing incentives, meaning that activity mostly rotates around from protocol to protocol within Arbitrum instead of bringing in new usage from foreign ecosystems.

Related to this, resulting yields across liquidity provision opportunities varied notably across protocols, even within the same verticals and similar pools. Naturally, some opportunities are riskier than others because of, for example, inherently different mechanisms or less battle-tested smart contracts, and should thus theoretically offer higher yields to compensate users for the additional risk. Having said that, if protocols aim to minimize capturing capital and users from other similar projects within the ecosystem and instead bring in new users from foreign ecosystems, it might make sense to benchmark yields against what similar protocols and products outside of Arbitrum are offering and apply slightly higher target yield intervals for STIP participants. As previously discussed, projects must naturally have the freedom to fine-tune their incentive distributions depending on, e.g., market conditions, the need to bootstrap new products, and protocol-specific needs, but especially for medium- and large-sized projects that aren't bootstrapping new pools or products, it might be sensible to set targets.

The point here is that if similar opportunities within the ecosystem offer similar returns somewhat constantly, users within the ecosystem are disincentivized to change protocols purely based on returns. Theoretically, if growth within a vertical were to stagnate at certain yield thresholds, this implies that existing usage is exhausted at those levels and the prevalent market conditions, while the marginal new user requires higher returns to migrate to the ecosystem. In this case, it would be sensible to increase the specific vertical’s/product type’s threshold yield. Rationally, incentives shouldn’t sustainably expand existing users’ usage behavior or willingness to put capital at risk, meaning that it might make sense to add some sustainability-related KPIs to the program and more heavily prioritize structures that are likely to get new users to migrate to the ecosystem and create long-term activity.

It is worth pointing out that some standardization is already taking place, with notable perp DEX incentive recipients agreeing to rebate a maximum of 75% of trading fees. On the topic of standardization and program structure clarity, it's exceptionally difficult to track the total, final amount of ARB received by protocols involved in partnership allocations. Some projects have received notably more tokens than their initial allocation viewed in isolation. Specifically, if partnership allocations aren't adjusted for, comparisons of STIP recipients' historical performance could yield skewed results.

Many protocols missed several bi-weekly reports or didn't post them at all, and around 35% of all STIP recipients didn't post a final report. In the bi-weekly reports, only a handful of projects discussed protocol-related events and developments that might have explained materialized growth. More commonly, projects would only present some high-level KPIs and how they planned to use incentives across different allocation buckets in the coming two weeks, meaning it was sometimes difficult to understand drastic changes in figures purely from information provided on the forum. It might be worth considering decreasing the reporting frequency to a monthly cadence, or even lower, so that protocols have more time to prepare their reports, can cover more relevant information, and can direct most of their focus toward growing their products.

Protocols rarely offered rigorous justification for the amount of incentives they requested when applying for the STIP. Rather, final allocations were generally the result of back-and-forth between protocols and the community, often landing on something akin to "we feel like this ask is too big/small". To make allocation requests more quantifiable and comparable across medium- and large-sized applicants, metrics such as TVL, volume, and users could be normalized relative to the requested ARB allocation, creating a high-level metric for comparing projects within the same vertical more easily. There are naturally additional factors that should be considered when deciding allocations, but such multiples could be an efficient way to sanity-check protocols' requests relative to each other, as sketched below.
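
To illustrate the kind of sanity check we have in mind, the sketch below computes such ask multiples within a vertical; the protocols and figures are entirely hypothetical:

```python
import pandas as pd

# Hypothetical applicants within one vertical; all figures are illustrative.
apps = pd.DataFrame({
    "protocol": ["Perp A", "Perp B", "Perp C"],
    "tvl_usd": [50e6, 120e6, 8e6],
    "avg_daily_volume_usd": [20e6, 75e6, 3e6],
    "requested_arb": [1.5e6, 3.0e6, 0.4e6],
})

# Normalize current metrics by the size of the ask to compare requests.
apps["tvl_per_arb_requested"] = apps["tvl_usd"] / apps["requested_arb"]
apps["volume_per_arb_requested"] = apps["avg_daily_volume_usd"] / apps["requested_arb"]

# Outliers on these multiples would warrant a closer look before allocation.
print(apps.sort_values("tvl_per_arb_requested", ascending=False))
```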

The initially planned incentive period for the original STIP was ~3 months. This ended up being somewhat longer for Round 1 recipients, and somewhat shorter for Round 2 recipients because of the reasons mentioned at the beginning of this report. The average distribution period for relevant STIP Round 1 protocols that weren’t hacked and applied to the STIP-Bridge was 129 days, while the average distribution period for relevant STIP Round 2 protocols that weren’t hacked and applied to the STIP-Bridge was 83 days.

Some projects, even those that received incentives as part of Round 1, had a notable amount of their ARB allocation left as the STIP deadline approached. From a protocol's perspective, it is naturally more beneficial to utilize as much of the incentive allocation as possible, since protocols don't really gain anything from returning funds at the end of the period. As such, some protocols that had been more conservative in allocating ARB throughout the program arbitrarily cranked up their incentive distribution at the end of the period, which again theoretically affected the cost-of-capital structure within the ecosystem.

Finally, some projects that were eligible for incentives had planned to allocate ARB to products that hadn't yet launched or to leverage incentive distribution mechanisms that hadn't yet been put in place when the applications were submitted. Many of these projects then didn't manage to launch the product or distribution mechanism, generally leading to the ARB allocation being redirected to some other incentive bucket instead of being returned to the DAO. It might be sensible to require that a to-be-incentivized product has been live for X days before a protocol can request incentives for it. Rysk has already set a great example here, withdrawing from the STIP-Bridge since the project is currently working on its v2 upgrade. Somewhat relatedly, certain protocols built on top of another protocol's product incentivized usage while the underlying product was being wound down to be replaced by a newer version. Directing incentives to a product that will be discontinued in the near term might not be the highest-ROI opportunity for the DAO.

Final Allocation of STIP Incentives

50M ARB was initially earmarked for the STIP. However, following higher-than-expected demand for incentives by protocols and to distribute tokens across a larger number of projects, the initial allocation (Round 1) was accompanied by a Round 2 (a.k.a. the STIP Backfund). Round 2 distributed capital to all approved but not funded projects connected to the initial allocation, amounting to ~21M ARB. In other words, a total of ~71M ARB was allocated to the overall STIP program.

Most protocols funded through Round 1 began distributing incentives in early-to-mid November 2023, while Round 2 protocols generally initiated their incentive programs at the end of December 2023 and throughout January 2024. Initially, both programs were to end by January 31, 2024, but due to backfunded protocols receiving their streams with a delay, the timelines for both programs were extended to March 29, 2024.

Source: Arbitrum Forum, Arbiscan, Blockworks Research Analysis. Note: The data excludes protocols that interrupted their distribution during the STIP, have allowed users to earn ARB rewards after March 29, 2024 (note: protocols that have allowed users to collect rewards earned during the STIP after the deadline are included), are labeled as infrastructure, and have migrated from Arbitrum. The number of ARB also excludes incentives originating from protocols’ balance sheets.

During the program, nearly all projects modified their initially proposed incentive allocations, with some even implementing completely new, unmentioned mechanisms. Modifying allocations between disclosed incentive buckets should be expected as protocols need flexibility in the way they allocate incentives as their programs progress, depending on factors such as market conditions, bootstrapping needs, and perceived effectiveness. However, it was somewhat surprising to see several protocols introduce completely new incentive buckets that were not disclosed in any way in the original incentive applications or initial bi-weekly updates.

To gauge how much ARB has been utilized across different end goals, we've divided the distributions into four high-level groups depending on the type of user activity they primarily target. Proprietary TVL refers to incentive mechanisms that directly encouraged users to deposit liquidity into the protocol distributing the incentives. Partner TVL means that the protocol distributing incentives allocated ARB to another project's liquidity pools. The Volume category includes incentive mechanisms that directly encouraged users to move more volume through the distributing protocol's platform. Mechanisms that don't directly target any of these metrics fall into a Miscellaneous group.

Utilized incentive mechanisms have also been classified into different categories based on their high-level characteristics:

  • Standard liquidity incentives: ARB distributions and amounts across pools were decided by the protocol teams, and receiving rewards required nothing beyond providing liquidity.
  • Liquidity incentives, allocation across pools activity based: similar to the above, but the allocation structure and amounts were decided by a predetermined formula instead of being controlled by a team on a week-by-week basis.
  • Liquidity incentives with optional/required long-term/perpetual capital locking: users could earn more rewards by locking capital with the distributing protocol or were required to do so to be eligible for rewards.
  • Liquidity incentives through integrated partner protocols: the distributing protocol allocated liquidity incentives to another protocol, and an increase in the latter's TVL also directly increased the allocating protocol's TVL.
  • Liquidity incentives requiring native token staking/LPing: a user had to acquire and either stake or LP the distributing protocol's native token to be eligible for rewards.
  • Liquidity incentives for native token(s) outside platform: distributing protocols allocated incentives to another protocol's liquidity pools, which didn't directly increase the distributor's TVL. An example is the distributing protocol incentivizing a spot DEX's LPs that provided liquidity for the distributor's governance token.
  • Incentives to proprietary infra/partnership protocols for discretionary use: mechanisms that, e.g., allocated ARB to projects to cover the costs of integrating with the distributing protocol, or allocated ARB to the distributing protocol's partners, who could freely decide how to distribute the rewards to their users.

Following the changes made throughout the STIP, the most popular mechanisms were standard liquidity incentives (~30% of total allocation), fee rebates (~25% of total allocation), trading/points/usage-based programs (~12% of total allocation), liquidity incentives for native token(s) outside platform (~8% of total allocation), and liquidity incentives with optional/required long-term/perpetual capital locking (~8% of total allocation). In aggregate, ~45% of ARB distributed was directly connected to increasing proprietary TVL, ~38% to increasing volume, ~9% to miscellaneous end goals, and ~8% to increasing partner TVL.

Source: Arbitrum Forum, Arbiscan, Blockworks Research Analysis. Note: The data excludes protocols that interrupted their distribution during the STIP, have allowed users to earn ARB rewards after March 29, 2024 (note: protocols that have allowed users to collect rewards earned during the STIP after the deadline are included), are labeled as infrastructure, and have migrated from Arbitrum. The number of ARB also excludes incentives originating from protocols’ balance sheets.

STIP Recipient Performance

We’ve chosen to focus this analysis on perp and spot DEXs as these groups were the two largest verticals to receive incentives at ~38% and ~15% of the total allocation, respectively. Moreover, it’s no secret that DeFi activity is one of Arbitrum’s main competitive strengths, with DeFi-related protocols historically accounting for over 25% of the blockchain’s sequencer revenue.

To gauge the sustainability of activity and the stickiness of capital on a relative basis across different incentive mechanisms and verticals, the following sections present several normalized charts in which figures have been standardized to September 2023 beginning-of-month values. It's important to note that it's naturally easier for a smaller protocol to grow by, e.g., 2x than for a well-established protocol. However, the idea behind normalizing performance is to make the sustainability of activity comparable across protocols. In addition, we have strived to analyze the relative effectiveness, impact, and perhaps fairness of the incentive distributions by normalizing absolute changes in performance metrics by the ARB utilized to directly increase the relevant metrics.
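
As an illustration of these two normalizations (a sketch assuming daily per-protocol data in pandas; the function names and data layout are ours):

```python
import pandas as pd

def normalize_to_baseline(metrics: pd.DataFrame,
                          baseline_date: str = "2023-09-01") -> pd.DataFrame:
    """Index each protocol's daily metric to its September 2023
    beginning-of-month value (1.0 = the September 1, 2023 level)."""
    return metrics.div(metrics.loc[baseline_date])

def change_per_arb(metric: pd.Series, arb_spent: float,
                   start: str, end: str) -> float:
    """Absolute change in a metric over a window, per ARB directly
    utilized to grow that metric."""
    return (metric.loc[end] - metric.loc[start]) / arb_spent
```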

The analyzed protocols are displayed in the following format: the relevant incentive mechanism(s) utilized; the size of the ARB allocation received (Small: < 1M ARB; Medium: 1M–2M ARB; Large: > 2M ARB); and the round through which incentives were received. To avoid distorting this analysis, we've only considered protocols that distributed incentives continuously for over two months, haven't been hacked since the beginning of September 2023, and were operational before the beginning of September 2023. A few protocols were also excluded because reliable performance data wasn't readily available.

Perp DEXs

TVL

Source: DefiLlama & Dune

Within the perp DEX vertical, one protocol outperformed the others when looking at normalized TVL figures. This protocol is on the younger side compared to the peer group, and as mentioned earlier, it's naturally easier to reach large relative growth numbers when initial figures are smaller. That is not to say the result isn't impressive, especially given that TVL growth continued after the program ended and has now essentially stabilized. However, this growth has most likely been driven mainly by factors intrinsic to the protocol, such as finding product-market fit, business development efforts, native liquidity incentives, and bootstrapping market makers. Nevertheless, we consider this a great example of where incentives can be beneficial, with ARB tokens likely having amplified growth that the protocol has managed to maintain post-STIP. One possible downside to consider is that some capital might have migrated from other Arbitrum-aligned perp DEXs, but increased competition on the supply side is generally beneficial for end users.

Source: DefiLlama & Dune

TVL development for the rest of the protocols within the perp DEX vertical follows a somewhat unified pattern. In general, liquidity began increasing around the time each protocol’s incentive program commenced, remained elevated during the program, but started decreasing as incentives drew to an end. It also seems that Round 2 protocols have been at a disadvantage, with liquidity trending downward until their incentive programs were initiated.

For half of the group, liquidity has dropped notably below levels where it was when the incentive program began, while for the other half, liquidity is slightly higher than what it was when the programs were initiated. Theoretically, each protocol has a baseline liquidity level that it can attract, depending on yields offered, perceived riskiness of returns, as well as LPs’ opportunity cost. On a high level, yields go down when incentives end since some trading volume is bound to migrate and immediate ARB rewards to LPs taper, inducing some LPs to move to other sources of yield that they perceive to be better.

Interestingly, the two protocols that used ARB to directly incentivize LPing on their platforms didn't see drawdowns in their TVLs as drastic as those experienced by the two protocols that used no incentives for proprietary TVL. Another factor to consider is that market conditions drastically improved during the STIP and asset prices shot up. The impact of this on perp DEXs depends on protocol design, as some perp DEXs mainly rely on stablecoins for liquidity, while others' TVLs consist mostly of volatile crypto assets. In ETH terms, every perp DEX's TVL is down on a normalized basis, excluding the outperformer mentioned earlier.

Source: DefiLlama, CoinGecko, Dune

Source: DefiLlama, CoinGecko, Dune

Three of the five perp DEXs analyzed used incentives to directly increase proprietary liquidity. Looking at TVL figures at the beginning of September 2023, normalized by the ARB allocated for increasing proprietary liquidity throughout the program, shows notable variation in how much ARB was used as liquidity incentives relative to TVL levels. If liquidity were purely driven by direct ARB incentives, projects with smaller starting TVL-per-ARB multiples should see their multiples expand more throughout the program than projects with larger starting multiples.

Source: DefiLlama, Arbitrum Forum, Arbiscan, Blockworks Research Analysis

The best performer reached ~$135 of additional 30D-MA liquidity per ARB spent compared to the beginning of its incentive program. TVL continued climbing after the program ended until the middle of April, and the metric stabilized at ~$175 at the end of June 2024. Excluding the best performer leaves a small sample size, but the long-term TVL growth per ARB spent across the other two protocols is quite similar, even though one utilized notably larger liquidity incentives relative to its TVL than the other and the utilized incentive mechanisms were quite different. This might indicate that as long as teams allocate incentives sensibly, the underlying mechanism doesn't significantly matter, since the return LPs require is similar across protocols. Simply put, opportunities with similar risk profiles should also have similar costs of capital.

Source: DefiLlama, Arbitrum Forum, Arbiscan, Blockworks Research Analysis

Volume

The volume trends for perp DEX STIP recipients follow the TVL trends quite closely, with the same outperformer managing to grow during its incentive program and maintain the increased activity after the program ended. Meanwhile, the rest of the protocols experienced notable volume uptrends coinciding with the beginning of their incentive programs, but this trend began reverting in early April, converging toward September 2023 values in June 2024. As with TVL, Round 2 recipients' figures lagged behind Round 1 recipients' but began catching up in the latter half of the STIP.

Source: DefiLlama & Dune

Source: DefiLlama, CoinGecko, Dune

The outperformer utilized the least ARB to directly incentivize trading volume relative to its volume at the beginning of September 2023, at 1 ARB per ~$21 of volume. This indicates that the outperformance in growth didn't derive from an outsized allocation of incentives relative to volume when compared against other perp DEXs. For the rest of the perp DEXs, there is some spread in the volume-per-ARB-utilized multiple, ranging between ~$6 and ~$12.

Source: DefiLlama, Dune, Arbitrum Forum, Arbiscan, Blockworks Research Analysis

Source: DefiLlama, Dune, Arbitrum Forum, Arbiscan, Blockworks Research Analysis

As with TVL, there is some variability across absolute volume growth achieved at the end of the STIP per ARB directly used to incentivize activity, with increases for the incentive periods ranging between ~$10 and ~$39, excluding the outperformer which achieved volume growth of ~$93 per ARB spent. However, these figures began converging in the months following the STIP’s conclusion. At the end of June 2024, the absolute changes in volume compared to the beginning of each perp DEX’s incentive program per ARB spent were ~$6, ~$1, ~-$2, and ~-$21, excluding the outperformer, for which the figure was at ~$55. In other words, projects have generally achieved similar returns when looking at a longer time period.

Source: DefiLlama, Dune, Arbitrum Forum, Arbiscan, Blockworks Research Analysis

Spot DEXs

TVL

Within the spot DEX vertical, there was one outperformer in terms of relative TVL growth, and this growth has been sustained post-incentives. Similarly to the outperformer within the perp DEX vertical, this protocol is on the smaller/younger side, meaning that looking only at relative growth can be misleading. Nevertheless, incentives have helped the protocol achieve a sustainable market share increase, although they are unlikely to be the main driver of the outperformance.

For the rest of the protocols, all of which are Round 1 recipients, relative TVL growth moved in tandem until the end of March 2024, when one spot DEX lost notably more liquidity than the rest of the group. This protocol is the only spot DEX that had a lower TVL at the end of June 2024 than at the beginning of September 2023. Compared to perp DEXs, this vertical's USD-denominated TVL is even more heavily driven by volatile crypto asset prices. As crypto prices generally increased in Q4 '23 and Q1 '24, USD-denominated TVL would have increased even with no asset inflows to spot DEXs. Looking at TVL denominated in ETH, only the outperformer's figures are up from September 2023 values.

Source: DefiLlama & Token Terminal

Source: DefiLlama, CoinGecko, Token Terminal

All of the analyzed spot DEXs used all of their incentives to directly increase proprietary liquidity. The below two graphs are great examples of why it's helpful to normalize high-level metrics by the ARB allocation size. As mentioned earlier, one spot DEX outperformed when it comes to relative TVL growth. However, the same protocol also received the largest ARB allocation relative to its TVL at the beginning of September 2023, with one ARB allocated per ~$4 of liquidity. In comparison, the other spot DEXs received one ARB per ~$19–$71 of liquidity.

Source: DefiLlama, Token Terminal, Arbitrum Forum, Arbiscan, Blockworks Research Analysis

Performance relative to allocation size varied widely between protocols at the end of the STIP, with Round 1 recipients having reached notably stronger results. However, three months later, all except one protocol's figures had converged to similar levels at ~$13, ~$17, and ~$20, while the underperformer was at ~-$25. In other words, the long-term benefit of one ARB spent has been quite similar across most spot DEXs. It's worth noting that the underperformer utilized a predetermined formula based on factors other than just fees generated across pools, which might have been gameable and attracted mercenary capital, possibly explaining the drastic drawdown once the STIP ended. This isn't something we can claim to be objectively true based solely on the data presented; in contrast, though, the other spot DEX that allocated LP incentives based on a predetermined formula did so purely based on fees generated. The below graph also exemplifies why examining only growth figures achieved during the incentive program could be misleading. While one spot DEX outperformed when looking at values from the end of March, its TVL performance per ARB spent has actually been the weakest in the long term.

Source: DefiLlama, Token Terminal, Arbitrum Forum, Arbiscan, Blockworks Research Analysis

Volume

As mentioned earlier, no spot DEXs directly incentivized traders with ARB. Despite this, all protocols’ volume figures grew notably during the program, with two projects having outperformed and largely maintained volume after the program ended. The two other protocols also saw a clear uplift in volume during the program, generally coinciding with wider market trends, but the drawdown post-STIP has been more drastic on a relative basis than for the two outperformers, both of which are newer protocols. It should be noted that the active liquidity management protocol Gamma was exploited on January 4, 2024, which led to abnormally high volumes on that day for a few protocols analyzed here.

Source: DefiLlama, Token Terminal

Source: DefiLlama, Token Terminal

Again, market conditions in Q4 '23 and Q1 '24 were favorable for spot DEX volume, and relative growth should be expected for all projects when looking at USD-denominated values. Having said that, instead of simply benefiting from increased asset prices and more volatility, the two outperformers have likely also grown by increasing their market penetration, exceeding the market's expansion. Looking at ETH-denominated volume, the two outperformers have largely maintained their relative growth, while volume for the two other projects has returned to September 2023 levels.

Source: DefiLlama, CoinGecko, Token Terminal

Source: DefiLlama, CoinGecko, Token Terminal

2 Likes

STIP Retroactive Analysis – Yield Aggregators TVL

The below research report is also available in document format here.

TL;DR

In H2 2023, Arbitrum launched the Short-Term Incentive Program (STIP) by distributing millions of ARB tokens to various protocols to drive user engagement. This report focuses on how the STIP impacted TVL in the yield aggregator vertical, specifically examining the performance of Gamma, Jones DAO, Solv Protocol, Stella, and Umami. By employing the Synthetic Control (SC) causal inference method to create a “synthetic” control group, we aimed to isolate the STIP’s effect from broader market trends. For each protocol, the analysis focuses on the median TVL in the period from the first day of the STIP to two weeks after the STIP ended, in an effort to include at least two weeks of persistence in the analysis.

Yield aggregator protocols utilized their STIP allocations to incentivize depositors, leading to positive impacts on their TVL. Our analysis yielded varied results: Solv Protocol, Gamma, and Jones DAO experienced increases in their median TVL directly attributable to the STIP of $18.6M, $16.5M, and $12.6M, respectively, from the start of the STIP to two weeks after its conclusion. Stella and Umami also benefited, with $2.4M and $1M in additional TVL, respectively, linked to STIP incentives. During the STIP period and the two weeks following its conclusion, the added TVL per dollar spent on incentives was approximately $18 for Gamma, $5 for Jones DAO, $103 for Solv Protocol, $11 for Stella, and $1 for Umami Finance.

While the STIP positively impacted all protocols when considering the median TVL of the corresponding period, a different picture emerges when looking at growth from the program's start to two weeks after its end. In this "before and after" comparison, the STIP slightly negatively impacted TVL growth for Solv Protocol (-8%) and had a minimal positive impact for Stella and Jones DAO (1% each), even though Stella saw a large TVL increase overall. In contrast, Umami experienced the most significant growth due to the STIP, with a 120% increase (largely an indirect effect of GMX's grant), followed by Gamma with a 45% increase.

It becomes clear that sustained growth, or "stickiness", is not correlated with each protocol's success during the program. This is further underscored by comparing the growth with the TVL one month after the STIP, where all protocols experienced a much smaller increase. In every instance, the boost from incentives proved to be somewhat temporary, with only a small portion of the growth remaining a month after the STIP concluded.

The analysis indicates that direct incentives to LPs can yield substantial TVL growth and efficient use of funds. Flexibility in strategy also proved beneficial, as Umami Finance’s switch to direct ARB emissions significantly boosted their TVL. Stella’s balanced approach of splitting incentives between strategies and lending pools also led to a notable increase. Overall, the findings suggest that smaller protocols have more room for rapid growth when given substantial incentives, while larger protocols, like Solv Protocol, might benefit from a more proportional allocation to maximize efficiency. Protocols should reward users directly, uniformly over time, and transparently for providing liquidity while maintaining flexibility to adapt to feedback.

Our methodology and detailed results underscore the complexities of measuring such interventions in a volatile market, stressing the importance of comparative analysis to understand the true impact of incentive programs like STIP.

Context and Goals

In H2 2023, Arbitrum initiated a significant undertaking by distributing millions of ARB tokens to protocols as part of the Short-Term Incentive Program (STIP), aiming to spur user engagement. This program allocated varying amounts to diverse protocols across different verticals. Our objective is to gauge the efficacy of these recipient protocols in leveraging their STIP allocations to boost the usage of their products. The challenge lies in accurately gauging the impact of the STIP amidst a backdrop of various factors, including broader market conditions.

This report pertains to the yield aggregator vertical in particular. In this vertical, the STIP recipients were Gamma, Jones DAO, Solv Protocol, Stella and Umami. Stake DAO faced some KYC issues, which significantly delayed the start of the program. As a result, Stake DAO was only able to distribute incentives for three weeks concurrently with other protocols, and its distribution is still ongoing. The following table summarizes the amount of ARB tokens received and when they were used by each protocol.

For yield aggregators, TVL is a highly relevant metric. These protocols aim to facilitate “deploy and forget” strategies that minimize user interaction, making metrics like transactions or fees less pertinent. Throughout the report, the 7-day moving average (MA) TVL was used, so any mention of TVL should be understood as the 7-day MA TVL.
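
As a minimal illustration of that smoothing (a sketch assuming a daily TVL series indexed by date):

```python
import pandas as pd

def seven_day_ma(tvl: pd.Series) -> pd.Series:
    """Smooth a daily TVL series; all TVL figures in this report are this 7-day MA."""
    return tvl.rolling(window=7).mean()
```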

We used a Causal Inference method called Synthetic Control (SC) to analyze our data. This technique helps us understand the effects of a specific event by comparing our variable of interest to a “synthetic” control group. Here’s a short breakdown:

  • Purpose: SC estimates the impact of a particular event or intervention using data over time.
  • How It Works: It creates a fake control group by combining data from similar but unaffected groups. This synthetic group mirrors the affected group before the event.
  • Why It Matters: By comparing the real outcomes with this synthetic control, we can see the isolated effect of the event.

In our analysis, we use data from other protocols to account for market trends. This way, we can better understand how protocols react to changes, like the implementation of the STIP, by comparing their performance against these market-influenced synthetic controls. The results pertain to the period from the start of each protocol’s use of the STIP until two weeks after the STIP had ended.
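
Putting the pieces together, the reported median impact corresponds to a computation along these lines (a sketch; the variable and function names are ours, not the exact analysis code):

```python
import pandas as pd

def median_stip_impact(tvl: pd.Series, synthetic: pd.Series,
                       stip_start: str, stip_end: str) -> float:
    """Median daily gap between factual and synthetic 7-day MA TVL,
    from the STIP's start until two weeks after its end."""
    end = pd.Timestamp(stip_end) + pd.Timedelta(weeks=2)
    gap = (tvl - synthetic).loc[pd.Timestamp(stip_start):end]
    return gap.median()
```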

Results

Gamma

Gamma, a protocol specializing in active liquidity management and market-making strategies, offers non-custodial, automated, active concentrated liquidity management services. Gamma is live on fourteen networks, including Arbitrum, where it launched on November 1, 2022.

The STIP aimed to distribute ARB tokens to liquidity providers (LPs) who participated in qualified Gamma vaults. These vaults were built on the liquidity pools of six supported AMMs: Uniswap V3, Sushiswap V3, Ramses, Camelot, Zyberswap, and Pancakeswap. 100% of the incentives were allocated to LPs.

The primary objective of the program was to enhance liquidity on the Arbitrum network by deploying incentives on three native AMMs (Ramses, Camelot, and Zyberswap) and three non-native AMMs (Uniswap, Sushiswap, and Pancakeswap). Gamma engaged in discussions with partner AMMs to identify suitable pools to incentivize, focusing on under-capitalized pools based on their analysis of trading activity on the Arbitrum network.

Gamma’s methodology for selecting pairs considered various factors. It prioritized pairs native to Arbitrum, which typically required more liquidity and were under-capitalized. Critical infrastructure pairs, such as WETH, ARB, WBTC, and stablecoins, were also a focus, given their regular use by most users. Additionally, Gamma considered pairs that aligned with the strengths of the AMM they were on, while avoiding pairs already incentivized by other parties or overcapitalized ones. The incentive structure was designed to ensure that AMMs did not work against each other inefficiently. No grant-matching funds were provided.

Gamma’s TVL grew from approximately $17.7M on November 15, 2023, when the STIP began, to around $43.4M at its peak in late January 2024. By the end of the STIP on March 20, 2024, the TVL was $38.0M. One week later, on March 27, the TVL was $35.3M, and two weeks after the STIP ended, on April 3, it had dropped to $33.4M. One month after the STIP concluded, on April 17, the TVL had decreased to $20.5M.

This represents a 114.2% increase in TVL from the start of the STIP to its end. Comparing the start of the STIP to the TVL one week after its end shows a 99.1% increase, while two weeks after the STIP ended, the increase was 88.4%.

It’s worth noting that on January 4, 2024, Gamma temporarily halted deposits into their vaults due to an issue affecting four of the stable and LST vaults. Following OpenZeppelin’s investigation and their confirmation that Gamma’s mitigation was effective, deposits resumed on January 23rd.

The median impact of the STIP on Gamma’s TVL, from its start on November 15, 2023, to its end on March 20, 2024, was $16.2M. Including the TVL two weeks after the STIP concluded, the impact was $16.5M. Gamma received a total of 750k ARB, valued at approximately $900k (at $1.20 per ARB). This indicates that the STIP generated an average of $18.3 in TVL per dollar spent during its duration and the following two weeks. These results are statistically significant at the 95% level.
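
For reference, the per-dollar figure is simply the median TVL impact (including the two post-STIP weeks) divided by the dollar value of the grant:

$$
\frac{\text{median TVL impact}}{\text{grant value in USD}} = \frac{\$16.5\text{M}}{750{,}000 \times \$1.20} = \frac{\$16.5\text{M}}{\$0.9\text{M}} \approx \$18.3
$$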

Jones DAO

Jones DAO, a yield, strategy, and liquidity protocol, offers vaults that provide easy access to different strategies, aiming to enhance liquidity and capital efficiency for DeFi through yield-bearing tokens. In the STIP, Jones DAO requested 2 million ARB tokens to be allocated as follows: 82.5% for user incentives in current vaults and 17.5% for user incentives in future vaults.

A significant portion of these incentives was allocated to GLP-related products, while GMX focused on V2 growth in its program. Since future products were not released in time, all the incentives were ultimately distributed to existing products. Jones DAO’s execution strategy aimed to distribute 100% of the ARB allocation directly to Jones Vault users and the lending strategies built upon these vaults.

The distribution of ARB tokens was designed to align with the current yield distribution methods of Jones strategies. For instance, if a vault distributed yield weekly, ARB tokens would also be distributed weekly. Conversely, if yield distribution was constant, ARB tokens would be streamed continuously.

Jones DAO concluded that reducing the relative percentage of rewards per category and focusing more on integrations within the Arbitrum ecosystem, rather than solely on native farms, could enhance capital efficiency.

Jones DAO’s TVL grew from approximately $15.6M on November 28, 2023, when the STIP began, to around $30.7M at its peak in late January 2024. By the end of the STIP on March 29, 2024, the TVL was $25.2M. One week later, on April 5, the TVL was $18.2M, and two weeks after the STIP ended, on April 12, it had dropped to $14.4M. One month after the STIP concluded, on April 29, the TVL had decreased to $12.8M.

This represents a 61.8% increase in TVL from the start of the STIP to its end. Comparing the start of the STIP to the TVL one week after its end shows a 16.5% increase, while two weeks after the STIP ended, there was a 7.6% decrease.

The median impact of the STIP on Jones DAO’s TVL, from its start on November 28, 2023, to its end on March 29, 2024, was $12.8M. Including the TVL two weeks after the STIP concluded, the impact was $12M. Jones DAO received a total of 2M ARB, valued at approximately $2.4M (at $1.20 per ARB). This indicates that the STIP generated an average of $5.25 in TVL per dollar spent during its duration and the following two weeks. These results are statistically significant at the 95% level.

Solv Protocol

Solv Protocol has launched innovative products like Vesting Vouchers, Bond Vouchers, and Fund SFT for on-chain funds. Solv V3 offers a transparent platform where global institutions and retail investors can access a variety of trusted crypto investments. It also supports fund managers in raising capital and establishing on-chain credibility.

In the STIP, Solv Protocol planned to issue multiple DeFi market-making funds designed to provide users with consistent and appealing returns in a controlled environment with relatively low risk, exemplified by the open-end GMX fund with a $20,000,000 capacity. To enhance yield returns and bootstrap token governance, Solv incorporated a plan for token emissions, with the addition of ARB tokens aiming to attract high-value Arbitrum users.

100% of the allocated ARB (150,000 ARB) was designated as extra incentives for fund products on Solv Arbitrum, split equally between Offchain/RWA funds and Onchain Delta Neutral Strategy funds. The ARB incentives were proportionally allocated among users based on their cumulative daily holdings across vaults on Arbitrum. These incentives were airdropped directly to Solv vault investors after the completion of each of the three distribution epochs.

Additionally, Solv Protocol confirmed its commitment to grant matching with future token issuance.

Solv Protocol’s TVL grew from approximately $71.3M on January 1, 2024, when the STIP began, to around $122.6M at its peak in March 2024. By the end of the STIP on March 29, 2024, the TVL was $106.1M. One week later, on April 5, the TVL was $105.2M, and two weeks after the STIP ended, on April 12, it was $105.9M. One month after the STIP concluded, on April 26, the TVL had decreased to $89.7M.

This represents a 48.8% increase in TVL from the start of the STIP to its end. Comparing the start of the STIP to the TVL one week after its end shows a 47.5% increase, while two weeks after the STIP ended, the increase held steady at around 48.5%.

The median impact of the STIP on Solv Protocol’s TVL, from its start on January 1, 2024, to its end on March 29, 2024, was $25.1M. Including the TVL two weeks after the STIP concluded, the impact was $18.6M. Solv Protocol received a total of 150k ARB, valued at approximately $180k (at $1.20 per ARB). This indicates that the STIP generated an average of $103.5 in TVL per dollar spent during its duration and the following two weeks. These results are statistically significant at the 90% level.

Stella

Stella, a leveraged yield farming protocol on Arbitrum, offers 0% cost to borrow and enables leveraged strategies on yield sources like Uniswap DEXs, TraderJoe liquidity book, and the Pendle PT pools.

The protocol is divided into two parts: Stella Strategies (leveraged strategies) and Stella Lend (lending pools). A total of 186,000 ARB tokens were allocated as incentives, distributed between these two parts as follows:

  • Approximately 66,000 ARB tokens were designated for Stella Strategies, providing an additional 20% yield for profitable leveraged positions. This incentive was exclusive to profitable positions to prevent sybil attacks and encourage good behavior.
  • Around 120,000 ARB tokens were allocated to Stella Lending pools. The exact incentive amount for each pool was determined dynamically, based on what the team deemed appropriate.

Stella aimed to stimulate both the strategy and lending sides, initiating a positive feedback loop for growth.

According to Stella, this experience highlighted the need to adjust protocol mechanics to make lending more attractive, as the borrow capacity was consistently maxed out while lending liquidity lagged. To address this, Stella implemented an “airdrop points sharing” system where lenders earned 50% of points from EigenLayer and LRT, enhancing the appeal of lending.

The ARB incentives were distributed innovatively. For Stella Strategies, the incentives were auto-deposited into the ARB lending pool on Stella with a linear vesting period of 30 days, preventing immediate dumping and allowing leverage users to earn additional lending yields over this period. This approach also helped bootstrap liquidity in the ARB lending pool, benefiting both the lending and leveraged farming sides.

Stella’s TVL grew from approximately $2.3M on November 3, 2023, when the STIP began, to around $9.3M at its peak in March 2024. By the end of the STIP on March 29, 2024, the TVL was $7.2M. One week later, on April 5, the TVL was $6.0M, and two weeks after the STIP ended, on April 12, it had dropped to $5.1M. One month after the STIP concluded, on April 29, the TVL had decreased to $3.8M.

This represents a 211.3% increase in TVL from the start of the STIP to its end. Comparing the start of the STIP to the TVL one week after its end shows a 158.9% increase, while two weeks after the STIP ended, the increase was 120.5%.

The median impact of the STIP on Stella’s TVL, from its start on November 3, 2023, to its end on March 29, 2024, was $2.5M. Including the TVL two weeks after the STIP concluded, the impact was $2.4M. Stella received a total of 186k ARB, valued at approximately $223k (at $1.20 per ARB). This indicates that the STIP generated an average of $10.8 in TVL per dollar spent during its duration and the following two weeks. These results are statistically significant at the 95% level.

Umami Finance

Umami Finance implemented an oARB emissions tool, inspired by Dolomite’s system, to distribute their STIP allocation. This tool allowed users to stake their Umami vault receipt tokens for oARB emissions, which were continuously emitted and could be vested on a first-come, first-served basis. The emitted oARB could be staked for a duration of up to four months, in weekly increments, paired with an equal amount of ARB. After the vesting period, users could obtain the underlying ARB at a discounted price, with the discount increasing by 2.5% per additional week staked, subject to change based on feedback. However, if the ARB rewards pool was depleted, all remaining oARB tokens would expire worthless. The staked ARB paired with oARB was deposited back into Umami to improve capital efficiency for farmers.

The implementation of oARB emissions faced challenges, particularly with the availability of ARB from vesting contracts. While early depositors initially enjoyed high returns, issues arose towards the end of the period as more users opted for the non-ETH investment 40-week option, necessitating a tapering of emissions. Looking ahead, Umami Finance decided to adopt a direct incentive approach with dynamic incentives, instead of the oARB incentive mechanism.

Umami Finance also received a 100,000 ARB grant from GMX, which allowed it to delay ARB yield from the STIP and run direct ARB emissions with the GMX grant for 45 days. 702,775 ARB was distributed through the scaling GLP vaults via the oARB emissions program, which concluded on January 26th. After this, the remaining 47,225 STIP tokens, together with GMX’s 100k ARB, were allocated to GM Vault direct emissions.

Umami’s TVL grew from approximately $3.3M on November 13, 2023, when the STIP began, to around $11.2M at its peak in February 2024. By the end of the STIP on March 29, 2024, the TVL was $10.5M. One week later, on April 5, the TVL was $11.1M, and two weeks after the STIP ended, on April 12, it had dipped to $10.6M. One month after the STIP concluded, on April 29, the TVL stood at $10.7M.

This represents a 220.8% increase in TVL from the start of the STIP to its end. Comparing the start of the STIP to the TVL one week after its end shows a 238.6% increase, while two weeks after the STIP ended, the increase was 223.7%.

The median impact of the STIP on Umami’s TVL, from its start on November 13, 2023, to its end on March 29, 2024, was $726.7k. Including the TVL two weeks after the STIP concluded, the impact was $1.2M. Umami Finance received a total of 750k ARB, valued at approximately $900k (at $1.20 per ARB). This indicates that the STIP generated an average of $1.2 in TVL per dollar spent during its duration and the following two weeks. These results are statistically significant at the 95% level.

Main Takeaways

Our analysis produced interesting results for the impact of the STIP on Gamma, Jones DAO, Solv Protocol, Stella, and Umami. The table below summarizes these results. A further explanation of how these and all other results were derived can be found in the Methodology section.

A summary of the key differentiators in incentive allocation is shown in the table below.

All yield aggregator protocols used the STIP allocation to provide direct incentives to their depositors, resulting in a positive impact on their TVL. Notably, these protocols vary significantly in the products they offer and their operational methods. While diversity exists in other DeFi verticals as well, it is particularly pronounced among yield aggregators. For instance, although classified as yield aggregators within the context of STIP and the Arbitrum DAO, entities like DefiLlama categorize them differently, including them in various verticals such as yield protocols, liquidity managers, RWA, and leveraged farming.

Smaller protocols like Stella (starting at $2.3M) and Umami ($3.3M) saw the largest TVL increases from the start of the STIP to their respective peak TVLs. However, Umami maintained its TVL gains in a much steadier fashion even two weeks after the STIP: over this period, the STIP-attributable TVL growth was 120% for Umami, while Stella saw only a 1% attributable increase. Despite this, with Umami’s grant size being four times that of Stella, the efficiency of the STIP in terms of median TVL added per dollar was much higher for Stella, at $10.83 compared to Umami’s $1.20. This median TVL considers the entire duration of the STIP plus two weeks post-STIP.

Gamma’s TVL growth from start to peak was the third largest, further supporting the idea that smaller protocols benefit proportionally more from the program. Nearly all of Gamma’s growth, 116%, can be attributed to the STIP, compared to its total growth of 145%. Even considering the growth from start to two weeks after the STIP’s end, the 45% increase attributed to the STIP is significant. Gamma’s median added TVL attributed to the STIP was $18.33 per dollar spent, higher than both Stella and Umami due to its efficient allocation.

Jones DAO started with a TVL similar to Gamma but experienced less growth from start to peak, with a 55% increase attributed to the STIP during this period. Notably, there was a decrease in TVL when comparing two weeks post-STIP to the pre-STIP period. However, the STIP still had a positive impact, albeit small, of 1%. The median TVL added by the STIP was $12.6M, resulting in $5.25 TVL per dollar, which is on the lower end for this group. The team concluded that focusing more on integrations within the Arbitrum ecosystem, rather than solely on native farms, could enhance capital efficiency.

Solv Protocol began the STIP with nearly five times the TVL of Jones DAO or Gamma, yet still saw a comparable increase of 48.6% during the STIP period and two weeks after. Although the absolute median TVL added was the largest, it wasn’t substantially higher than Gamma or Jones DAO when considering the higher starting point. Our analysis reveals an 8% TVL decrease attributed to the STIP in the period after two weeks and only a 2% increase at its peak TVL. This suggests that while the STIP initially boosted growth, it stagnated towards the end. This stagnation might be due to the uneven distribution method over three epochs, with the first batch of incentives released only on March 5, 2024, after three months of user participation. The two remaining batches were all distributed within March. Despite this, Solv Protocol achieved a median of $103.47 added TVL per dollar spent. Notably, Solv’s allocation was the smallest at 150k ARB, similar to Stella’s 186k ARB and much smaller than Jones DAO’s 2M ARB. A more proportional allocation might have enabled the protocol to perform better.

Direct incentives to LPs seem effective: Gamma, which allocated 100% of incentives directly to LPs, saw a significant TVL impact and a good TVL per dollar spent ratio. Diversified incentive allocation can also be beneficial: Stella, which split incentives between strategies and lending pools, achieved the highest TVL increase and a high STIP impact. This indicates that a balanced approach targeting different aspects of the protocol can yield positive results.

Flexibility and adaptability are crucial: Umami Finance’s experience demonstrates that adjusting strategies, such as switching from oARB emissions to direct ARB emissions, can lead to significantly better results. They saw the highest impact on their TVL after this change, indicating that direct emissions are much better received than strategies involving lock-ups and vesting, like oARB. The substantial increase in TVL attributed to the STIP two weeks after its end, which includes the GMX grant, highlights this point. However, because Umami’s initial allocation was quite large, especially compared to similarly sized protocols like Stella, the efficiency measured by added TVL per dollar was relatively low.

The analysis suggests that smaller protocols might have more room for rapid growth when given significant incentives. However, while smaller protocols showed higher percentage overall growth, larger protocols often saw more significant absolute increases in TVL per dollar spent. This is an important distinction when evaluating the impact of incentives. A more proportional allocation of incentives might have enhanced the efficiency of distribution, as evidenced by Solv Protocol’s high added TVL per dollar compared to Umami’s lower efficiency.

Overall, the most successful approaches appear to be those that reward users for providing liquidity directly, transparently, and uniformly over time, while maintaining the flexibility to adapt to changing market conditions or user behaviors. Protocols that can balance simplicity with strategic allocation across their ecosystem seem to achieve the best results in terms of TVL growth and efficient use of incentives.

It’s also interesting to note that most protocols’ TVL, normalized by ARB allocation, converges to values between $5 and $18 per ARB spent. Solv Protocol is an outlier due to its disproportionate allocation relative to its size. This indicates that, while the liquidity benefit of one ARB spent varies significantly throughout the program, the long-term benefit remains fairly consistent across most yield aggregators.

Methodology

TLDR: We employed an analytical approach known as the Synthetic Control (SC) method. The SC method is a statistical technique utilized to estimate causal effects resulting from binary treatments within observational panel (longitudinal) data. Regarded as a groundbreaking innovation in policy evaluation, this method has garnered significant attention in multiple fields. At its core, the SC method creates an artificial control group by aggregating untreated units in a manner that replicates the characteristics of the treated units before the intervention (treatment). This synthetic control serves as the counterfactual for a treatment unit, with the treatment effect estimate being the disparity between the observed outcome in the post-treatment period and that of the synthetic control. In the context of our analysis, this model incorporates market dynamics by leveraging data from other protocols (untreated units). Thus, changes in market conditions are expected to manifest in the metrics of other protocols, thereby inherently accounting for these external trends and allowing us to explore whether the reactions of the protocols in the analysis differ post-STIP implementation.

To achieve the described goals, we turned to causal inference. Knowing that “association is not causation”, the study of causal inference lies in techniques that try to figure out how to make association be causation. The classic notation of causality analysis revolves around a certain treatment $T$, which doesn’t need to be related to the medical field; rather, it is a generalized term used to denote an intervention whose effect we want to study. We typically consider the treatment intake $T_i$ for unit $i$, which is 1 if unit $i$ received the treatment and 0 otherwise. Typically there is also $Y_i$, the observed outcome variable for unit $i$. This is our variable of interest, i.e., we want to understand what the influence of the treatment on this outcome was. The fundamental problem of causal inference is that one can never observe the same unit with and without treatment, so we express this in terms of potential outcomes. We are interested in what would have happened had some treatment been taken. It is common to call the potential outcome that happened the factual, and the one that didn’t happen, the counterfactual. We will use the following notation:

- $Y_{0i}$: the potential outcome for unit $i$ without treatment

- $Y_{1i}$: the potential outcome for the same unit $i$ with the treatment.

With these potential outcomes, we define the individual treatment effect to be $Y_{1i} - Y_{0i}$. Because of the fundamental problem of causal inference, we will actually never know the individual treatment effect, because only one of the potential outcomes is observed.

One technique used to tackle this is Difference-in-Differences (or diff-in-diff). It is commonly used to analyze the effect of macro interventions, such as the effect of immigration on unemployment or the effect of law changes on crime rates, but also the impact of marketing campaigns on user engagement. There is always a period before and after the intervention, and the goal is to extract the impact of the intervention from a general trend. Let $Y_D(T)$ be the potential outcome for treatment $D$ in period $T$ (0 for pre-intervention and 1 for post-intervention). Ideally, we would have the ability to observe the counterfactual and estimate the effect of an intervention as $E[Y_1(1) - Y_0(1) \mid D = 1]$: the causal effect is the outcome in the post-intervention period in the case of treatment minus the outcome in the same period in the case of no treatment. Naturally, $E[Y_0(1) \mid D = 1]$ is counterfactual, so it can’t be measured. If we instead take a simple before-and-after comparison, $E[Y(1) \mid D = 1] - E[Y(0) \mid D = 1]$, we can’t really say anything about the effect of the intervention because there could be other external trends affecting that outcome.

The idea of diff-in-diff is to compare the treated group with an untreated group that didn’t get the intervention, replacing the missing counterfactual as follows: $E[Y_0(1) \mid D=1] = E[Y(0) \mid D=1] + \big(E[Y(1) \mid D=0] - E[Y(0) \mid D=0]\big)$. We take the treated unit before the intervention and add a trend component to it, which is estimated using the control group. We are basically saying that the treated unit after the intervention, had it not been treated, would look like the treated unit before the treatment plus a growth factor that is the same as the growth of the control.

An important thing to note here is that this method assumes that the trends in the treatment and control are the same. If the growth trend from the treated unit is different from the trend of the control unit, diff-in-diff will be biased. So, instead of trying to find a single untreated unit that is very similar to the treated, we can forge our own as a combination of multiple untreated units, creating a synthetic control.

That is the intuitive idea behind using synthetic control for causal inference. Assume we have $N+1$ units, and unit 1 is affected by an intervention. Units $2, \dots, N+1$ are a collection of untreated units that we will refer to as the “donor pool”. Our data spans $T$ time periods, with $T_0$ periods before the intervention. For each unit $j$ and each time $t$, we observe the outcome $Y_{jt}$. We define $Y^N_{jt}$ as the potential outcome without intervention and $Y^I_{jt}$ as the potential outcome with intervention. Then, the effect for the treated unit at time $t$, for $t > T_0$, is defined as $\tau_{1t} = Y^I_{1t} - Y^N_{1t}$. Here $Y^I_{1t}$ is factual but $Y^N_{1t}$ is not. The challenge lies in estimating $Y^N_{1t}$.

Source: 15 - Synthetic Control — Causal Inference for the Brave and True

Since the treatment effect is defined for each period, it doesn’t need to be instantaneous, it can accumulate or dissipate. The problem of estimating the treatment effect boils down to the problem of estimating what would have happened to the outcome of the treated unit, had it not been treated.

The most straightforward approach is to consider that a combination of units in the donor pool may approximate the characteristics of the treated unit better than any untreated unit alone. So we define the synthetic control as a weighted average of the units in the donor pool. Given weights $W = (w_2, \dots, w_{N+1})$, the synthetic control estimate of $Y^N_{1t}$ is $\hat{Y}^N_{1t} = \sum_{j=2}^{N+1} w_j Y_{jt}$.

We can estimate the optimal weights with OLS, as in any typical linear regression, by minimizing the squared distance between the weighted average of the units in the donor pool and the treated unit over the pre-intervention period. This creates a “fake” unit that resembles the treated unit before the intervention, so we can see how it would behave in the post-intervention period.

In the context of our analysis, this means that we can include all other yield aggregator protocols that did not receive the STIP in our donor pool and estimate a “fake”, synthetic, control yield aggregator protocol that follows the trend of any particular one we want to study in the period before receiving the STIP. As mentioned before, the metric of interest chosen for this analysis was TVL and, in particular, we calculated the 7-day moving average to smooth the data. Then we can compare the behavior of our synthetic control with the factual and estimate the impact of the STIP by taking the difference. We are essentially comparing what would have happened, had the protocol not received the STIP, with what actually happened.

However, regression sometimes leads to extrapolation, i.e., values that are outside the range of our initial data and that may not make sense in our context. This happened when estimating our synthetic control, so we constrained the model to do only interpolation. This means we restrict the weights to be positive and to sum to one, so that the synthetic control is a convex combination of the units in the donor pool. Hence, the treated unit is projected onto the convex hull defined by the untreated units. This means that there probably won’t be a perfect match of the treated unit in the pre-intervention period and that the weights can be sparse, as the wall of the convex hull will sometimes be defined by only a few units. This works well because we don’t want to overfit the data. It is understood that we will never be able to know with certainty what would have happened without the intervention, just that under the assumptions we can make statistical conclusions.

Formalizing interpolation, the synthetic control is still defined in the same way, $\hat{Y}^N_{1t} = \sum_{j=2}^{N+1} w_j Y_{jt}$, but now we use the weights $W = (w_2, \dots, w_{N+1})$ that minimize the squared distance between the weighted average of the units in the donor pool and the treated unit over the pre-intervention period, $\sum_{t=1}^{T_0} \big(Y_{1t} - \sum_{j=2}^{N+1} w_j Y_{jt}\big)^2$, subject to the restriction that the $w_j$ are positive and sum to one.

We get the optimal weights using quadratic programming optimization with the described constraints on the pre-STIP period and then use these weights to calculate the synthetic control for the total duration of time we are interested in. We initialized the optimization for each analysis with different starting weight vectors to avoid introducing bias in the model and getting stuck in local minima. We selected the one that minimized the square difference in the pre-intervention period.
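For illustration, here is a minimal sketch of this constrained optimization. The data shapes, function names, and the use of SciPy’s SLSQP solver are assumptions for exposition rather than the exact implementation used in this report:

```python
# Minimal sketch of the constrained weight optimization described above,
# assuming a (T0 x N) matrix `donors_pre` of donor-pool TVLs and a
# length-T0 vector `treated_pre` for the treated protocol, both over the
# pre-STIP period. Names are illustrative.

import numpy as np
from scipy.optimize import minimize

def fit_synth_weights(treated_pre: np.ndarray, donors_pre: np.ndarray,
                      n_starts: int = 10, seed: int = 0) -> np.ndarray:
    """Weights w >= 0 with sum(w) = 1, minimizing pre-period squared error."""
    rng = np.random.default_rng(seed)
    n = donors_pre.shape[1]

    def loss(w):
        return np.sum((treated_pre - donors_pre @ w) ** 2)

    constraints = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * n

    best = None
    for _ in range(n_starts):           # several random restarts to avoid
        w0 = rng.dirichlet(np.ones(n))  # getting stuck in local minima
        res = minimize(loss, w0, bounds=bounds, constraints=constraints,
                       method="SLSQP")
        if best is None or res.fun < best.fun:
            best = res
    return best.x

# The synthetic control over the full window is then `donors_all @ w`,
# and the estimated STIP effect path is `treated_all - donors_all @ w`.
```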

As an example, below is the resulting chart for Gamma, showing the factual TVL observed in Gamma and the synthetic control.

With the synthetic control, we can then estimate the effect of the STIP as the gap between the factual protocol TVL and the synthetic control, $\hat{\tau}_{1t} = Y_{1t} - \hat{Y}^N_{1t}$.

To understand whether the result is statistically significant and not just a possible result we got due to randomness, we use the idea of Fisher’s Exact Test. We permute the treated and control units exhaustively by, for each unit, pretending it is the treated one while the others are the control. We create one synthetic control and effect estimates for each protocol, pretending that the STIP was given to another protocol to calculate the estimated impact for this treatment that didn’t happen. If the impact in the protocol of interest is sufficiently larger when compared to the other fake treatments (“placebos”), we can say our result is statistically significant and there is indeed an observable impact of the STIP on the protocol’s TVL. The idea is that if there was no STIP in the other protocols and we used the same model to pretend that there was, we wouldn’t see any impact.
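A sketch of this placebo exercise, reusing the hypothetical fit_synth_weights helper from the previous snippet (again, names and data shapes are assumed for illustration):

```python
# Sketch of the placebo (permutation) test described above. `panel` is
# assumed to be a dict mapping protocol name -> TVL series (np.ndarray
# of length T), with `t0` pre-intervention periods.

import numpy as np

def placebo_effects(panel: dict, t0: int) -> dict:
    """Estimate an effect path for every unit, pretending in turn that
    it was the one receiving the STIP while the rest form the donors."""
    effects = {}
    for unit, series in panel.items():
        donors = np.column_stack([s for name, s in panel.items()
                                  if name != unit])
        w = fit_synth_weights(series[:t0], donors[:t0])
        effects[unit] = series - donors @ w   # gap vs. synthetic control
    return effects

# If the post-STIP gap for the actually treated protocol is large
# relative to the placebo gaps of the donor units, the estimated effect
# is unlikely to be noise.
```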

References

Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.

Aayush Agrawal - Causal inference with Synthetic Control using Python and SparseSC

01 - Introduction To Causality — Causal Inference for the Brave and True

2 Likes

STIP Retroactive Analysis – Sequencer Revenue

The below research report is also available in document format here.

TL;DR

Our findings show that 43% of Arbitrum’s revenue between November 2023 and the Dencun upgrade was attributable to the STIP, with $15.2M recouped from sequencer revenue against the $85.2M spent. Although the STIP had a positive short-term impact on market presence, its long-term effectiveness remains uncertain. The program likely helped maintain Arbitrum’s prominence and market share amidst intensifying competition from other L2s, influencing protocol launch decisions, amongst others. However, the roughly $70M net loss ($85.2M spent less $15.2M recouped) signals the need for better-structured future incentive programs.

Context and Goals

Starting November 2023, Arbitrum launched the Short-Term Incentive Program (STIP), distributing millions of ARB tokens to various protocols to boost user engagement. This initiative allocated different amounts to a wide range of protocols across multiple sectors. Previously, Blockworks Research examined several protocols within specific verticals — perp DEXs, spot DEXs, and yield aggregators — to measure the STIP’s impact on key metrics.

In this analysis, we aim to assess the STIP’s overall effect on the Arbitrum network by examining its impact on sequencer revenue. The primary goal of the STIP was to attract more users to the recipient protocols and the broader ecosystem, fostering growth and activity. An increase in sequencer revenue would indicate a successful incentive program, where the costs of ARB incentives are at least partially offset by sequencer revenue in ETH.

Due to the Dencun upgrade, which significantly reduced fees across all L2s, the expected costs and revenue underwent considerable changes. Consequently, we set March 13, 2024, as the cutoff date for our analysis. Most protocols completed their distribution by March 29, 2024, ensuring that our cutoff includes the bulk of the incentive distribution period. The analysis period starts from November 2, 2023, when the first protocol began distributing its allocation.

Results

Below is the monthly sequencer revenue for the major L2 networks considered in our analysis. One can see the increase in Arbitrum’s sequencer revenue dominance during the STIP.

Zooming in to daily revenue, smoothed with a 30-day moving average, gives a more detailed view of its evolution over time, as shown in the following chart.

Given that most L2 protocols have not been active as long as Arbitrum, we aimed to include more protocols in our modeling of Arbitrum’s revenue by using data starting from August 1, 2023, excluding Blast. The result of the synthetic control compared to Arbitrum’s own revenue is shown in the chart below.

By comparing the synthetic control, which represents the expected sequencer revenue for Arbitrum without the STIP, to the actual sequencer revenue observed during the same period, we can determine the STIP’s impact.

To evaluate the total cumulative impact, we calculate the area under the curve, which totals $15.2M. Comparing this to the total revenue of $35.1M during the period, we conclude that 43% of the revenue is attributable to the STIP. However, the total spent on the STIP was 71M ARB, equating to $85.2M at an average price of $1.2 per ARB.
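The attribution share and the program’s net cost follow directly from these figures:

$$\frac{\$15.2\text{M attributed revenue}}{\$35.1\text{M total revenue}} \approx 43\%, \qquad 71\text{M ARB} \times \$1.2 - \$15.2\text{M} \approx \$70\text{M net cost}$$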

Main Takeaways

The analysis concluded that 43% of Arbitrum’s revenue between November 2023 and the Dencun upgrade was attributable to the STIP. However, while $85.2M was spent on the STIP, only $15.2M was directly recouped through sequencer revenue. It’s important to note that the primary goal of the Arbitrum STIP was to foster the ecosystem. Even though the immediate revenue did not cover the cost of the STIP, there may be other long-lasting positive effects on the network. However, the net loss of roughly $70M is considerable and signals the need for the DAO to better prepare such programs in the future.

Given the significant traction other L2s have gained since earlier this year, Arbitrum’s competitive landscape has become more intense. The STIP likely helped maintain Arbitrum’s prominence and market share, particularly influencing new protocols’ decisions on which network to launch and generally where to focus their efforts. Although the STIP boosted Arbitrum’s market presence in the short term, its ability to drive long-term sustainable growth and expand market share is still unclear. Several common themes emerged in both the previous STIP retroactive analysis and our operational analysis, attributing the program’s individual success amongst protocols to various factors, especially the incentive mechanism used. These insights should inform future programs to minimize losses and maximize the effectiveness of ARB spending.

Additionally, evaluating persistent metrics like the number of new users who remained active in the ecosystem after onboarding through the STIP would offer valuable insights for future research.

Methodology

TLDR: We employed an analytical approach known as the Synthetic Control (SC) method. The SC method is a statistical technique utilized to estimate causal effects resulting from binary treatments within observational panel (longitudinal) data. Regarded as a groundbreaking innovation in policy evaluation, this method has garnered significant attention in multiple fields. At its core, the SC method creates an artificial control group by aggregating untreated units in a manner that replicates the characteristics of the treated units before the intervention (treatment). This synthetic control serves as the counterfactual for a treatment unit, with the treatment effect estimate being the disparity between the observed outcome in the post-treatment period and that of the synthetic control. In the context of our analysis, this model incorporates market dynamics by leveraging data from other protocols (untreated units). Thus, changes in market conditions are expected to manifest in the metrics of other protocols, thereby inherently accounting for these external trends and allowing us to explore whether the reactions of the protocols in the analysis differ post-STIP implementation.

To achieve the described goals, we turned to causal inference. Knowing that “association is not causation”, the study of causal inference lies in techniques that try to figure out how to make association be causation. The classic notation of causality analysis revolves around a certain treatment $T$, which doesn’t need to be related to the medical field; rather, it is a generalized term used to denote an intervention whose effect we want to study. We typically consider the treatment intake $T_i$ for unit $i$, which is 1 if unit $i$ received the treatment and 0 otherwise. Typically there is also $Y_i$, the observed outcome variable for unit $i$. This is our variable of interest, i.e., we want to understand what the influence of the treatment on this outcome was. The fundamental problem of causal inference is that one can never observe the same unit with and without treatment, so we express this in terms of potential outcomes. We are interested in what would have happened had some treatment been taken. It is common to call the potential outcome that happened the factual, and the one that didn’t happen, the counterfactual. We will use the following notation:

- $Y_{0i}$: the potential outcome for unit $i$ without treatment

- $Y_{1i}$: the potential outcome for the same unit $i$ with the treatment.

With these potential outcomes, we define the individual treatment effect to be $Y_{1i} - Y_{0i}$. Because of the fundamental problem of causal inference, we will actually never know the individual treatment effect, because only one of the potential outcomes is observed.

One technique used to tackle this is Difference-in-Differences (or diff-in-diff). It is commonly used to analyze the effect of macro interventions, such as the effect of immigration on unemployment or the effect of law changes on crime rates, but also the impact of marketing campaigns on user engagement. There is always a period before and after the intervention, and the goal is to extract the impact of the intervention from a general trend. Let $Y_D(T)$ be the potential outcome for treatment $D$ in period $T$ (0 for pre-intervention and 1 for post-intervention). Ideally, we would have the ability to observe the counterfactual and estimate the effect of an intervention as $E[Y_1(1) - Y_0(1) \mid D = 1]$: the causal effect is the outcome in the post-intervention period in the case of treatment minus the outcome in the same period in the case of no treatment. Naturally, $E[Y_0(1) \mid D = 1]$ is counterfactual, so it can’t be measured. If we instead take a simple before-and-after comparison, $E[Y(1) \mid D = 1] - E[Y(0) \mid D = 1]$, we can’t really say anything about the effect of the intervention because there could be other external trends affecting that outcome.

The idea of diff-in-diff is to compare the treated group with an untreated group that didn’t get the intervention, replacing the missing counterfactual as follows: $E[Y_0(1) \mid D=1] = E[Y(0) \mid D=1] + \big(E[Y(1) \mid D=0] - E[Y(0) \mid D=0]\big)$. We take the treated unit before the intervention and add a trend component to it, which is estimated using the control group. We are basically saying that the treated unit after the intervention, had it not been treated, would look like the treated unit before the treatment plus a growth factor that is the same as the growth of the control.

An important thing to note here is that this method assumes that the trends in the treatment and control are the same. If the growth trend from the treated unit is different from the trend of the control unit, diff-in-diff will be biased. So, instead of trying to find a single untreated unit that is very similar to the treated, we can forge our own as a combination of multiple untreated units, creating a synthetic control.

That is the intuitive idea behind using synthetic control for causal inference. Assume we have $N+1$ units, and unit 1 is affected by an intervention. Units $2, \dots, N+1$ are a collection of untreated units that we will refer to as the “donor pool”. Our data spans $T$ time periods, with $T_0$ periods before the intervention. For each unit $j$ and each time $t$, we observe the outcome $Y_{jt}$. We define $Y^N_{jt}$ as the potential outcome without intervention and $Y^I_{jt}$ as the potential outcome with intervention. Then, the effect for the treated unit at time $t$, for $t > T_0$, is defined as $\tau_{1t} = Y^I_{1t} - Y^N_{1t}$. Here $Y^I_{1t}$ is factual but $Y^N_{1t}$ is not. The challenge lies in estimating $Y^N_{1t}$.

Source: 15 - Synthetic Control — Causal Inference for the Brave and True

Since the treatment effect is defined for each period, it doesn’t need to be instantaneous, it can accumulate or dissipate. The problem of estimating the treatment effect boils down to the problem of estimating what would have happened to the outcome of the treated unit, had it not been treated.

The most straightforward approach is to consider that a combination of units in the donor pool may approximate the characteristics of the treated unit better than any untreated unit alone. So we define the synthetic control as a weighted average of the units in the donor pool. Given weights $W = (w_2, \dots, w_{N+1})$, the synthetic control estimate of $Y^N_{1t}$ is $\hat{Y}^N_{1t} = \sum_{j=2}^{N+1} w_j Y_{jt}$.

We can estimate the optimal weights with OLS, as in any typical linear regression, by minimizing the squared distance between the weighted average of the units in the donor pool and the treated unit over the pre-intervention period. This creates a “fake” unit that resembles the treated unit before the intervention, so we can see how it would behave in the post-intervention period.

In the context of our analysis, this means that we can include all other L2 protocols that did not take part in the STIP in our donor pool and estimate a “fake”, synthetic, control L2 protocol that follows the trend of Arbitrum in the period before receiving the STIP. As mentioned before, the metric of interest chosen for this analysis was sequencer revenue and, in particular, we calculated the 30-day moving average to smooth the data. Then we can compare the behavior of our synthetic control with the factual and estimate the impact of the STIP by taking the difference. We are essentially comparing what would have happened, had the protocol not received the STIP, with what actually happened.

However, regression sometimes leads to extrapolation, i.e., values that are outside the range of our initial data and that may not make sense in our context. This happened when estimating our synthetic control, so we constrained the model to do only interpolation. This means we restrict the weights to be positive and to sum to one, so that the synthetic control is a convex combination of the units in the donor pool. Hence, the treated unit is projected onto the convex hull defined by the untreated units. This means that there probably won’t be a perfect match of the treated unit in the pre-intervention period and that the weights can be sparse, as the wall of the convex hull will sometimes be defined by only a few units. This works well because we don’t want to overfit the data. It is understood that we will never be able to know with certainty what would have happened without the intervention, just that under the assumptions we can make statistical conclusions.

Formalizing interpolation, the synthetic control is still defined in the same way, $\hat{Y}^N_{1t} = \sum_{j=2}^{N+1} w_j Y_{jt}$, but now we use the weights $W = (w_2, \dots, w_{N+1})$ that minimize the squared distance between the weighted average of the units in the donor pool and the treated unit over the pre-intervention period, $\sum_{t=1}^{T_0} \big(Y_{1t} - \sum_{j=2}^{N+1} w_j Y_{jt}\big)^2$, subject to the restriction that the $w_j$ are positive and sum to one.

We get the optimal weights using quadratic programming optimization with the described constraints on the pre-STIP period and then use these weights to calculate the synthetic control for the total duration of time we are interested in. We initialized the optimization for each analysis with different starting weight vectors to avoid introducing bias in the model and getting stuck in local minima. We selected the one that minimized the square difference in the pre-intervention period.

Below is the resulting chart for Arbitrum, showing the factual sequencer revenue observed in Arbitrum and the synthetic control.

With the synthetic control, we can then estimate the effect of the STIP as the gap between the factual revenue and the synthetic control, $\hat{\tau}_{1t} = Y_{1t} - \hat{Y}^N_{1t}$.

To understand whether the result is statistically significant and not just a possible result we got due to randomness, we use the idea of Fisher’s Exact Test. We permute the treated and control units exhaustively by, for each unit, pretending it is the treated one while the others are the control. We create one synthetic control and effect estimates for each protocol, pretending that the STIP was given to another protocol to calculate the estimated impact for this treatment that didn’t happen. If the impact in the protocol of interest is sufficiently larger when compared to the other fake treatments (“placebos”), we can say our result is statistically significant and there is indeed an observable impact of the STIP on Arbitrum’s sequencer revenue. The idea is that if there was no STIP in the other protocols and we used the same model to pretend that there was, we wouldn’t see any impact.

References

Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.

Aayush Agrawal - Causal inference with Synthetic Control using Python and SparseSC

01 - Introduction To Causality — Causal Inference for the Brave and True

1 Like

TBV Research

The below report is available in document format here

Introduction

On June 3, 2024, cupojoseph proposed a pilot stage to fund R&D for implementing and operating a treasury-backed vault (TBV). He suggests that leveraging a TBV to issue a stablecoin against ARB (and other treasury assets) in the vault could be a more effective funding mechanism for grants and other initiatives, noting that using ARB to issue stablecoins could relieve ARB selling pressure.

After back-and-forth discourse amongst the community, both in favor of and against a DAO-owned TBV, members of the DAO (Entropy and L2BEAT) expressed that funding R&D for this proposal would be redundant given the ARDC. Hence, Blockworks Research and Chaos Labs were assigned the task of outlining the operational structure and performing a risk analysis of operating a TBV.

Summary

In collaboration with Chaos Labs, we highlight operational requirements and suggest that R&D, an oversight committee, and a risk management committee are essential pieces for effectively operating a CDP or TBV. We also consider the operational setup and compare the high-level trade-offs between a simple framework and a more complex one.

Alongside the operational setup, we provide a risk analysis on leveraging ARB in a DAO-owned CDP, offer recommendations for how to properly set the relevant parameters and manage the risk, and share our recommendation and interpretations.

We ultimately conclude that operating and managing a CDP or TBV using ARB is highly capital inefficient (especially taking into consideration the DAO’s past spending) and is not worth the risk/reward. This is due to the highly volatile nature of ARB, the lack of available liquidity, and the sheer amount of capital required to adequately support the DAO’s historical spending behaviors. However, it is worth revisiting leveraging ARB in a debt position once market conditions and ARB liquidity have considerably matured.

That said, this preliminary work is sufficient for the DAO to gauge interest in this direction, and with an updated perspective on the nature of operating a CDP, we encourage members of the community and delegates to carefully review and open further discussions. If necessary, further research can, for example, outline the risks associated with managing a TBV, specifically, at different scales.

Overview of CDPs and TBVs

For educational purposes, here is an overview of existing protocols that the DAO could utilize. While some of these protocols are not active in the Arbitrum ecosystem, or are active in select parts, we recommend reviewing these CDP mechanisms to gain a more thorough understanding.

Maker DAO and DAI

Maker’s main product is DAI, a stablecoin pegged to the U.S. Dollar. DAI’s supply is managed through a set of credit-manager smart contracts on Ethereum, creating a mechanism by which actors are incentivized to adjust the supply curve to keep the peg at $1 as market conditions and DAI demand fluctuate. Additionally, Maker allows users to mint DAI through Maker Vaults, where users lock up their assets as collateral to borrow DAI, repaying the accrued fees at a later date.

Maker uses several risk parameters for each collateral type:

  • Liquidation ratio: the minimum ratio of collateral value to debt per Vault (usually overcollateralized).
  • Debt ceiling: the max amount of DAI generated for a collateral type.
  • Stability fee: the fee that accrues on debt in a vault on an annual basis (e.g., 5% per year)
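To make these parameters concrete, here is a minimal, hypothetical sketch of how they interact; this is not Maker’s actual contract logic, and all names and numbers are illustrative:

```python
# Hypothetical sketch of how Maker-style risk parameters interact.
# Not Maker's actual contract logic; names and numbers are illustrative.

from dataclasses import dataclass

@dataclass
class CollateralType:
    liquidation_ratio: float  # e.g., 1.50 = 150% minimum collateralization
    debt_ceiling: float       # max DAI that can be generated against it
    stability_fee: float      # annual fee accrued on vault debt

@dataclass
class Vault:
    collateral_value: float   # USD value of locked collateral
    debt: float               # DAI drawn against the vault

def is_liquidatable(vault: Vault, ctype: CollateralType) -> bool:
    # A vault becomes unsafe once collateral value falls below
    # debt * liquidation ratio.
    return vault.collateral_value < vault.debt * ctype.liquidation_ratio

def can_draw(vault: Vault, ctype: CollateralType, amount: float,
             total_debt_for_type: float) -> bool:
    # New DAI can be drawn only while the collateral type's debt ceiling
    # is respected and the vault stays sufficiently overcollateralized.
    if total_debt_for_type + amount > ctype.debt_ceiling:
        return False
    new_debt = vault.debt + amount
    return vault.collateral_value >= new_debt * ctype.liquidation_ratio

ctype = CollateralType(liquidation_ratio=1.50, debt_ceiling=1_000_000,
                       stability_fee=0.05)
vault = Vault(collateral_value=14_000, debt=10_000)
print(is_liquidatable(vault, ctype))  # True: 14,000 < 10,000 * 1.5
```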

Maker’s mechanisms to maintain solvency are:

  • DAI issuance is influenced by stability fees, DAI savings rate, debt ceiling adjustments, and the Peg Stability Module.
  • Peg Stability Module: The Peg Stability Module (PSM) maintains the stability of DAI by enabling users to swap collateral tokens directly for DAI. Users can swap other stablecoins for DAI, which helps keep DAI fixed at $1. Importantly, the volume of swaps the PSM can offer is limited by the debt ceiling of the deposited collateral. Thus, once that debt ceiling is crossed, the PSM can no longer trade in stablecoins for DAI.
  • Liquidation: The closer a vault’s collateral value falls toward its outstanding debt, the more risk the vault takes on. The system therefore liquidates overly risky vaults, with liquidations performed through a gradual Dutch auction.
  • MKR Mint/Burn: If bad debt hits a certain threshold, the Flapper/Flopper smart contracts burn, mint, and sell MKR in order to recapitalize the MakerDAO protocol in times of insolvency.

All interactions with vaults are done through the DssCdpManager (CDP Manager) contract. Vaults may also be transferred between owners using this same contract.

Curve and crvUSD

While crvUSD is not live on Arbitrum as of now, Curve is a potential option should crvUSD expand to the Arbitrum network.

In a typical hard liquidation, when a borrower’s loan becomes undercollateralized, their position is liquidated and the collateral converted all at once. Curve’s lending-liquidating AMM algorithm (LLAMMA) is instead able to soft liquidate: it gradually converts the collateral into stablecoins as the value of the collateral decreases, preserving some of the borrower’s original capital.

Curve only offers overcollateralized loans, with the loan-to-value ratio dependent on the risk of the collateral. Thus, depending on the collateral, there exists a set of bands defining the liquidation range within which soft liquidations can occur. Unlike other AMMs, the price offered by Curve’s LLAMMA does not rely on the balance of assets in the pool, but rather on an external price oracle. As a result, if the oracle price increases or decreases by 1%, the LLAMMA price will move by at least 1% in the same direction.

Curve’s stablecoin, crvUSD, is pegged through a basket of fiat-backed stablecoins, with stability pools and Peg Keepers to further uphold the price at $1. The stability pools for crvUSD hold the largest fiat-backed stablecoins: USDC, USDT, USDP, and TUSD. The Peg Keepers are similar to the AMO concept used by Frax Finance. They are smart contracts that perform algorithmic market operations to keep the price of crvUSD at its peg. More specifically, Peg Keepers are assigned to stability pools and can perform market operations under certain circumstances. For example, only if the price of crvUSD is above $1 in a stability pool can the Peg Keeper mint and deposit crvUSD into the pool. In this scenario, adding single-sided crvUSD when the price is above $1 puts downward pressure on the price, moving it back toward the target. The reverse applies when the price of crvUSD is below the $1 threshold.
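A simplified sketch of the Peg Keeper behavior described above (illustrative only, not Curve’s actual implementation or parameters):

```python
# Simplified sketch of the peg-keeper behavior described above.
# Illustrative only; not Curve's actual implementation or parameters.

def peg_keeper_action(crvusd_price: float, debt: float,
                      debt_ceiling: float) -> str:
    """Decide a market operation for one stability pool."""
    if crvusd_price > 1.0 and debt < debt_ceiling:
        # Mint crvUSD and deposit it single-sided: extra supply in the
        # pool pushes the price back down toward $1.
        return "mint_and_deposit"
    if crvusd_price < 1.0 and debt > 0:
        # Withdraw previously deposited crvUSD and burn it: reduced
        # supply pushes the price back up toward $1.
        return "withdraw_and_burn"
    return "no_op"

assert peg_keeper_action(1.01, debt=0, debt_ceiling=1_000_000) == "mint_and_deposit"
assert peg_keeper_action(0.99, debt=500_000, debt_ceiling=1_000_000) == "withdraw_and_burn"
```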

Finally, the other important detail to understand about Curve’s lending mechanism is its variable interest rate. Borrowers pay a variable interest rate to the protocol based on their total outstanding debt. The rate depends on the current crvUSD price, the shape of the interest rate curve set by risk parameters, and the activity of the Peg Keepers (debt ratio and target debt).

Aave and GHO

Aave’s GHO stablecoin is a CDP-style protocol similar to Curve’s crvUSD. It is a system of lending pools where users’ deposits are funneled into a liquidity pool that borrowers may access when looking for a loan. Facilitators (such as Aave itself) can mint and burn GHO tokens; each is assigned a ‘bucket’ with a capacity representing the maximum amount of GHO it can generate. The total available supply of GHO is calculated from the capacities of all Facilitators, ensuring the system remains overcollateralized. New Facilitators are approved by the Aave DAO to ensure they operate within parameters.
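The bucket accounting can be pictured with a small, hypothetical sketch (not Aave’s actual GHO contracts; the facilitator name and numbers are made up):

```python
# Hypothetical sketch of the Facilitator "bucket" accounting described
# above. Not Aave's actual GHO contracts; names and numbers are made up.

buckets = {
    "example_facilitator": {"capacity": 35_000_000, "level": 20_000_000},
}

def can_mint(facilitator: str, amount: float) -> bool:
    """A Facilitator may mint GHO only up to its bucket capacity."""
    bucket = buckets[facilitator]
    return bucket["level"] + amount <= bucket["capacity"]

def total_capacity() -> float:
    """Max GHO supply is bounded by the sum of all bucket capacities."""
    return sum(b["capacity"] for b in buckets.values())

print(can_mint("example_facilitator", 10_000_000))  # True: 30M <= 35M
print(total_capacity())                             # 35,000,000
```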

To maintain its peg and system stability, GHO has stabilization mechanisms controlled by the Aave DAO, which manages borrow rates, collateral requirements, and other parameters for minting and burning GHO. There is also a discount model that gives discounted borrowing rates to stkAAVE holders.

Furthermore, Aave offers other features for risk management: isolation mode, siloed assets, DAO-elected permissioned roles, and traditional features like supply and borrow caps. Isolation mode limits exposure to risky assets: assets in isolation mode can only be used to borrow certain types of stablecoins.

Open Dollar and OD

Open Dollar is a CDP stablecoin protocol built around non-fungible vaults, in which users stake collateral and borrow Open Dollar’s OD stablecoin. As an Arbitrum-native protocol, Open Dollar strives to make OD Arbitrum’s native stablecoin.

Similar to Aave and Maker, Open Dollar is also controlled by a governance entity via the ODG token. The OD stablecoin is overcollateralized and uses a managed-float regime similar to Maker’s: the OD/USD exchange rate is determined by supply and demand, and the protocol restabilizes it through either devaluation or revaluation. Notably, it accepts fewer forms of collateral (only ARB and ETH variants) compared to Maker and Aave.

Some of the listed CDP protocols have been battle-tested across deployments, making them worth considering for exploration. Additionally, one of the listed protocols serves as an alternative mentioned in the original proposal.

These are very brief overviews of each protocol, to give the DAO a sense for the type of protocol it would use to leverage ARB in a TBV. Further research work would be required to investigate which protocol is most suitable for the DAO. This would likely include working closely with candidates for the relevant committees outlined below.

In order to effectively manage a CDP or TBV, the DAO would have to fund the relevant committees/roles responsible for tasks such as developing the necessary set of smart contracts for facilitating a debt position and maintaining the communication, health, and security of the DAO’s position. These responsibilities primarily include (but are not limited to) an R&D committee, an oversight committee, and a risk committee.

Relevant Committees

In this section, we’ll outline roles and responsibilities that are necessary for effectively managing a CDP/TBV. While we are not suggesting the DAO needs to create three separate committees, it is important to note the essential parts of executing this initiative, even if it means one entity is capable of serving two roles.

R&D: Oversee the strategic and technical development of the CDP/TBV, and work closely with the DAO, other committees, or protocols to ensure the efficacy of the CDP/TBV’s strategy, management, and implementation. This includes and is not limited to:

  • Customizing relevant smart contracts to fit the needs and standards of the DAO.
  • Conducting security audits of smart contracts and the overall system
  • Maintaining technical implementation and pushing updates when necessary
  • Ensuring compatibility with DAO operations (e.g. outlining operational procedures for using a safety buffer to top up debt positions with treasury ARB)
  • Creating a dashboard or other visual representation for viewing important metrics, such as monitoring collateral and transaction history
  • Closely collaborating with other committees or protocols to lead further development of necessary contracts, tools, and strategies that enhance the position
  • Performing due diligence and reviewing applications that request funding from the treasury

Oversight Committee: The backbone of treasury operations for approving, sending, and receiving funds to and from the treasury and the CDP/TBV, overseeing the day-to-day operations of the treasury, implementing best security practices, and executing strategic decisions for capital in the treasury. This includes and is not limited to:

  • Implementing best security practices between the DAO’s treasury and CDP/TBV management such as procedures for responding to emergency incidents or unforeseen vulnerabilities
  • Delivering operational procedures for maintaining accountability (e.g. perform monthly/quarterly performance reviews)
  • Maintaining consistent communication with the risk committee, the DAO, and relevant protocols regarding adjustments to relevant parameters of the DAO’s position and other activities related to the debt position
  • Pre-approving transactions and privileges for the CDP/TBV manager
  • Relaying accurate and timely reporting of the TBV to the rest of the community on a weekly/monthly basis.
  • Maintaining transparent dashboards that display the DAO’s position in real-time
  • Collaborating with R&D committee for issuing stablecoins from the TBV
  • Monitoring debt and repayment from funded projects and enforcing penalties if needed

Risk Management: Manage the CDP/TBV position and set risk parameters to ensure solvency. In addition to the responsibilities below, in order to achieve incentive alignment between the DAO and the risk committee, the DAO and relevant committees would have to determine what responsibilities and/or privileges are delegated to the risk committee. It is possible the functions described here fall within the scope of the R&D committee and/or a treasury manager’s role. Regardless, since a treasury management role has not been formalized, we felt it necessary to describe the risk portion of the vault. This includes and is not limited to:

  • Clear communication on how the stablecoin works and its benefits
  • Conducting ongoing CDP/TBV research, identifying potential risks, developing risk assessment frameworks and models of the economic tradeoff space, recommending the optimal strategy/implementation, and providing regular analysis and updates on the state of the market and the position
  • Working closely with existing protocols and their risk teams to determine parameters for leveraging ARB in a CDP/TBV
  • Continuously monitoring market conditions, protocol health, and ARB volatility/liquidity to gauge adjustments to the CDP/TBV and maintain solvency
  • Updating over-collateralization ratios, liquidation triggers, interest rates, and other relevant parameters and operations according to the Oversight and R&D committees’ recommendations
  • Ensuring liquidation mechanisms are operational and emergency shutdown procedures are effective
  • Closely communicating with other committees, the DAO, and relevant protocols
  • Sending and receiving funds to and from the DAO’s treasury
  • Around-the-clock maintenance and management

Operational Structure

Taking inspiration from other DAOs, and depending on the scale of the position, each committee is necessary for effectively managing a CDP or TBV; however, the operational setup can differ. For example, the CDP could be managed through a multisig maintained by each committee. This is a simple setup in which multiple parties sign transactions, and it requires fewer interactions between the DAO and the position. However, requiring multiple signatures to approve transactions decreases the autonomy of the risk manager to act swiftly if needed.

The framework we recommend is inspired by Karpatkey’s treasury management proposal as well as Lido’s treasury management committee proposed by Steakhouse Financial. In Karpatkey’s proposal, the DAO retains full custody of the vault’s position and can withdraw and deposit through a governance process. In this setup, a trusted, experienced entity is appointed to manage the CDP or TBV, giving that entity more autonomy. To restrict a malicious manager from carrying out an attack and to incentivize performance, the setup also involves an oversight committee that establishes pre-approved transactions and is generally responsible for ensuring the vault manager is achieving their mandate. Both proposals also utilize onchain tools, essentially creating an environment tailored to their respective strategies and use cases, albeit with added complexity.

While this operational structure can be considered more complex and costly than simply deploying a multisig, it clearly defines roles, allows for each party to specialize on their respective tasks, and grants more autonomy for each party to operate efficiently.

Lastly, the DAO would have to work with relevant committees to establish an amount that is allocated to a safety module, which the vault manager can draw from when needed.

We’ve outlined the high-level tradeoffs between a very basic structure and more professional structures. If further progress is made toward implementing a position, potential candidates that have track records servicing these roles and operational structures in practice can provide their frameworks. This would allow the DAO to compare specific details about the operational structures of experienced candidates.

Additional work

Primarily, if the DAO is borrowing stablecoins against ARB and distributing those funds to teams, those teams would presumably sell the stablecoins to fund their respective initiatives. With this in mind, additional work for the DAO entails (1) developing a strategy for paying down the stablecoin debt.

Let’s say, a viable strategy is using sequencer revenues. Therefore, the DAO or its relevant committees could (2) establish relevant KPIs for funded projects, such as generating a certain amount of sequencer revenues. Intuitively, this model creates a framework for how to approach funding projects, because it is in the DAO’s interest to only fund projects that can generate sequencer revenues and thus pay down the vault’s debt.

Other work may involve:

  • Performing more research and another risk analysis on the potential stablecoin protocols used
  • Reaching out to existing protocols and collaborating on setting up a position specifically for the DAO
  • Ensuring that the DAO has robust governance mechanisms in place to make adjustments and respond to market changes
  • Creating and maintaining governance policies for handling extreme market conditions and emergencies
  • Implementing insurance funds or hedging strategies to cover potential losses
  • Exploring the introduction of less volatile assets to the TBV, such as ETH or wstETH

Risk Analysis

View the full risk analysis here.

In conjunction with Chaos Labs, we analyzed ARB and its market capitalization, trading volume, historical volatility, and historical liquidity to derive key parameters and thresholds for using ARB in a CDP. We also analyzed the ARB/USD oracle because price updates are crucial for maintaining a healthy vault.

We recommend an initial liquidation threshold (i.e., the point at which a CDP loan becomes liquidatable) at 66% of the dollar value of the ARB supplied. We also recommend a 10% liquidation bonus to incentivize liquidators, helping avoid undercollateralized positions and ensuring that the CDP protocol avoids bad debt.

We recommend an initial loan of 25% of the value of ARB supplied to limit the liability of the DAO to supply more ARB collateral should the value drop. We also provide a more aggressive approach of taking an initial loan of 44% of the value of ARB supplied. In either case, we would need to set processes in place to manage the increased risk of getting the loan liquidated, and at a much less favorable ARB price in the latter approach.

We recommend an initial debt ceiling of $12M. This means that the DAO could borrow up to $12M safely against a minimum of $48M in ARB collateral at the outset.
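As a sanity check on these recommendations, a short sketch using only the parameters above shows how far ARB could fall before the position becomes liquidatable (all names are illustrative; this is not a protocol implementation):

```python
# Sanity-check sketch using only the parameters recommended above.
# Names are illustrative; this is not a protocol implementation.

LIQ_THRESHOLD = 0.66          # loan liquidatable at 66% of collateral value
INITIAL_LTV = 0.25            # conservative initial loan-to-value

collateral_usd = 48_000_000   # minimum $48M in ARB collateral
debt_usd = collateral_usd * INITIAL_LTV   # $12M borrowed (the debt ceiling)

# Collateral value at which the debt hits the liquidation threshold:
liq_collateral = debt_usd / LIQ_THRESHOLD

max_drawdown = 1 - liq_collateral / collateral_usd
print(f"Borrow ${debt_usd/1e6:.0f}M against ${collateral_usd/1e6:.0f}M of ARB")
print(f"Liquidatable once collateral falls to ${liq_collateral/1e6:.1f}M")
print(f"ARB can drop ~{max_drawdown:.0%} before liquidation")  # ~62%
```

Under these settings, the collateral can lose roughly 62% of its value before the loan crosses the 66% threshold, which illustrates why the 25% initial loan is the conservative option relative to the 44% alternative.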

We also recommend the DAO establish a ring-fenced safety buffer in ARB to cover any price drops in excess of what is forecast. More work is needed here to arrive at how much the DAO would need to allocate to a safety buffer fund. However, it is worth noting that the DAO should weigh the opportunity costs of deploying a safety buffer (i.e. ARB sitting in a safety buffer, in addition to ARB as collateral in a vault, is not being used for other initiatives).

Recommendations and Interpretations

For now, the ARDC covers parts of the R&D committee’s scope (it does not develop or manage smart contracts), but given that the current term ends in September, the DAO will need to revisit funding another ARDC term or discuss contracting an entity that serves a similar role or is solely responsible for servicing a CDP or TBV.

As for the other roles, the oversight committee is the organization responsible for overseeing the manager of the vault, while the risk management committee is the entity responsible for managing the risks of the TBV, including updating parameters and topping up the debt position when necessary. Note: it is possible the risk management committee is also the treasury manager.

In terms of committee assignments, and therefore initializing a CDP or TBV, we think the DAO should first take other initiatives into consideration to avoid introducing operational redundancies. For example, the Arbitrum Ventures Initiative (AVI) is a work in progress that would establish a fund structure for the DAO, within which an oversight committee is appointed and verticals for allocating funds to specific strategies, such as operating a treasury or a TBV, are established. There is also an open proposal by Karpatkey, as well as discussions about creating an Arbitrum Strategic Treasury Management Group and a DAO Oversight Committee. In short, we believe structuring the broader framework and dedicating attention to these conversations takes precedence, because they would establish a formal foundation on top of which the DAO can install a TBV. They would provide clarity on who is servicing which roles and what additional committees (if any) are necessary.

If the DAO wishes to proceed toward a TBV, we recommend opening applications for participants who want to serve on the committees. This would allow candidates to outline their previous experience and frameworks. The proposal should also outline the scale of the initiative and define relevant KPIs to measure performance and ROI. From there, the DAO can narrow its focus on which protocol to use for further R&D.

In terms of the risks, given the aforementioned recommendations, it is clear that borrowing stablecoins against an ARB position is extremely risky and would not satisfy a majority of the DAO’s annual spend.

From 2023 through 2024, the DAO spent 433M ARB, about 25x the recommended initial loan amount. Using the conservative settings from the analysis and applying the previous year’s spending, the DAO would have to seed the position with roughly 1.6B ARB (~$1.08B) to fund past initiatives. This is approximately half of the ARB in the treasury today, and the DAO would only be utilizing 25% of that value. At this scale, the capital inefficiency of using ARB in a CDP or TBV becomes clear.

Even at a much lower annualized spend by the DAO, we still view this as an inefficient use of capital, especially considering the opportunity cost of funding other projects that don’t carry as much inherent risk. In other words, we do not feel confident the reward for leveraging ARB clearly outweighs the risks (and associated costs) of operationalizing a DAO-owned CDP or TBV.

In terms of next steps, as mentioned above, we believe establishing a broader framework (perhaps related to the AVI), or making progress on installing a legitimate treasury management framework, takes precedence. Once this structure is set, then the DAO and the relevant committees will be in a better position to reconsider the prospects of activating a treasury-owned CDP.

Final Thoughts

All in all, while operating a TBV is worth considering, especially as the DAO and underlying ecosystem mature, it introduces another plane of risks and costs that feels unnecessary given where the protocol is in its life cycle. Instead, we feel confident the DAO should focus on growth and prioritize initiatives that directly expand the use cases for the Arbitrum stack. In the grand scheme of rollups, it is still very early, and it is not clear what fundamentals drive the performance of rollup ecosystems and their underlying governance tokens.

For Arbitrum, there are a myriad of developments and catalysts on the horizon (such as Timeboost, decentralized Timeboost, blob bidding strategies, economic policies for L3s, ecosystem funds, and more) that should provide more context for identifying strong business models and, in turn, performing the economic analysis needed to sustain revenue and growth. Moreover, while the ARB price is an important feature of the DAO and protocol, it is a byproduct of success, not the driver, and we do not think initiatives directly targeting ARB price action properly prioritize the protocol’s fundamental value drivers at this moment.

3 Likes

Blockworks Research ARDC Deliverable Summaries

Short-form Case Study – GMX

  • Conclusions: The study suggests that future incentive programs should require stronger justifications, growth driver analyses, and consistent standards for handling conflicts of interest.

Short-form Case Study – JOJO

  • Conclusions: The case highlights the need for continuous monitoring, stricter grantee accountability, and better-defined incentive program structures to avoid unrealistic promises and ensure adherence to rules.

STIP-Bridge – Support Material for the Community

  • Conclusions:
    • Some protocols had high sybil ratios, indicating potential misuse of incentives.
    • Multiple projects displayed questionable practices in how they utilized ARB incentives.
    • Many protocols saw sharp declines in usage post-incentives, and reporting quality varied, with some projects failing to provide transparent updates.
    • Several projects are adjusting their incentive mechanisms for future rounds, aiming for more targeted and sustainable growth, while others have continued with similar structures despite mixed results.
    • Identified misuse of funds per STIP rules by Synapse, which eventually returned 750k ARB to the DAO in addition to withdrawing its 950k ARB STIP-Bridge request. This ultimately saved the DAO 1.7M ARB (~$1.4M at the time).
    • Identified misuse of STIP funds by Furucombo, which led to the Arbitrum Foundation seeking to ban the team from receiving any future incentives.

STIP-Bridge (Extended Deadline Applicants) – Support Material for the Community

  • Additional support material for the community to use when reviewing STIP-Bridge applicants who applied after the initial deadline of May 3, 2024 and have thus been automatically put up for a challenge vote on Snapshot.

STIP Retroactive Analysis – Perp DEX Volume

  • Conclusions:
    • STIP impacted trading volumes differently across perpetual DEXs. Vertex and GMX saw significant volume boosts, with Vertex benefiting most per dollar spent.
    • The analysis suggests that traditional fee rebates, as used by Vertex, GMX, and MUX Protocol, were more effective than gamified incentives employed by Gains Network and Vela.

STIP Retroactive Analysis – Spot DEX TVL

  • Conclusions:
    • STIP in H2 2023 significantly impacted TVL in spot DEXs, with WOOFi and Camelot seeing the highest gains.
    • Balancer and Trader Joe also benefited but with a more modest impact.
    • WOOFi excelled in liquidity-specific incentives despite its underperformance in overall TVL impact.

STIP Analysis of Operations and Incentive Mechanisms

  • Conclusions:
    • STIP was likely a reason why Arbitrum’s market share (with respect to certain metrics) did not diminish as new competitors emerged.
    • Major mechanisms included liquidity incentives (~30%) and fee rebates (~25%).
    • Most protocols saw initial growth during STIP, but metrics often reverted to pre-STIP levels afterward.
    • STIP initially boosted Arbitrum’s market share in most major DeFi metrics, but gains largely reverted post-program.
    • The market ultimately gravitated toward a baseline cost of capital, meaning future incentives should be calibrated against ecosystem-wide yield targets (see the sketch after this list).
    • Protocol incentives should be a function of vertical-specific metrics rather than being open-ended.
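
As a hypothetical illustration of what targeting an ecosystem-wide yield could look like, the sketch below sizes an annual incentive budget as the top-up between an assumed organic yield and an assumed target; every input is a made-up placeholder rather than a figure from the report.

```python
# Hypothetical sizing of an incentive budget against an ecosystem-wide yield
# target, per the "baseline cost of capital" conclusion above. All inputs
# are illustrative assumptions, not figures from the report.

tvl_usd = 2_500_000_000   # hypothetical TVL the program aims to incentivize
target_yield = 0.08       # hypothetical ecosystem-wide yield target (8% APR)
organic_yield = 0.05      # hypothetical baseline yield from fees alone (5% APR)

# Incentives only need to close the gap between organic yield and the target.
annual_budget_usd = tvl_usd * (target_yield - organic_yield)

print(f"Annual incentive budget: ${annual_budget_usd / 1e6:.0f}M")  # $75M here
```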

STIP Retroactive Analysis – Yield Aggregators TVL

  • Conclusions:
    • Targeting LPs directly and uniformly can be effective in boosting TVL.
    • Flexibility in incentive strategies, as shown by Umami’s pivot to direct ARB emissions, can enhance outcomes.
    • A more proportional distribution of incentives may lead to higher efficiency, as evidenced by Solv Protocol’s success.

STIP Retroactive Analysis – Sequencer Revenue

  • Conclusions:
    • Statistically attributed 43% of Arbitrum’s revenue between November 2023 and the Dencun upgrade to STIP, amounting to $15.2M in sequencer revenue against $85.2M spent on incentives (see the arithmetic sketch below).
    • Despite its positive short-term impact, the program resulted in a $60M net loss, highlighting the need for more effective future incentive programs and greater attention to long-term sustainability and growth.
    • Future programs should focus on better-structured incentives and persistent user metrics to maximize the return on ARB spent.
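
For readers who want to sanity-check these headline figures, here is a minimal back-of-the-envelope sketch. Only the 43% attribution share and the two dollar amounts come from the summary above; the derived quantities are simple arithmetic, not additional findings from the report.

```python
# Back-of-the-envelope check on the STIP sequencer-revenue figures.
# Inputs are the three numbers from the summary above; outputs are derived.

attributed_share = 0.43      # share of sequencer revenue attributed to STIP
attributed_revenue = 15.2e6  # USD of sequencer revenue attributed to STIP
incentive_spend = 85.2e6     # USD value of ARB distributed via STIP

# Implied total sequencer revenue over the window (Nov 2023 to Dencun).
total_revenue = attributed_revenue / attributed_share

# Revenue generated per dollar of incentives, before any other costs.
revenue_per_dollar = attributed_revenue / incentive_spend

print(f"Implied total sequencer revenue: ${total_revenue / 1e6:.1f}M")  # ~$35.3M
print(f"Revenue per $1 of incentives:    ${revenue_per_dollar:.2f}")    # ~$0.18
```
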
3 Likes
  1. Am I right in understanding that the overall conclusion is that STIP was a failed program?

  2. Should the sequencer get as much profit as it spent on incentives?