ARDC Research Deliverables

Short-form Case Study – GMX

As requested by the DAO advocate for the ARDC, L2BEAT, Blockworks Research has begun conducting case studies on STIP recipients. Since perpetual futures-related projects, as categorized in the dashboard created by OpenBlock Labs, accounted for 27.45M ARB (~39% of the total STIP incentives), the first case study is based on GMX, the project receiving the largest STIP allocation at 12M ARB. This case study focuses on the grantee’s application and reporting structure, incentive mechanisms utilized, and the sustainability of activity induced by incentives.

This is part of a wider, in-depth analysis of the overall STIP process.


Short-form Case Study – JOJO

The second case study presented here concerns JOJO, another perp DEX that was given a grant allocation of up to 200K ARB, making it the smallest recipient within the perpetuals category. Similar to the first case study, the research conducted focuses on the grantee’s application and reporting structure, incentive mechanisms utilized, and the sustainability of activity induced by incentives.


STIP-Bridge – Support Material for the Community

To help community members and delegates form a holistic view of STIP-Bridge applicants that aren’t automatically required to go through a challenge vote on Snapshot (i.e., projects that have applied for a STIP-Bridge allocation before the initial deadline of May 3rd, 2024), Blockworks Research, acting as one of the Research Members for the ARDC, is sharing a workbook that includes qualitative data on the aforementioned applicants’ utilized incentive structures, operational approaches, reporting standards, notable protocol changes, etc. This data has primarily been collected from protocols’ original STIP applications, bi-weekly updates, final reports, and STIP-Bridge addendums. We’ve also included some relevant commentary, as well as any possible red flags and minimal/open-to-interpretation rule deviations we’ve encountered. This qualitative data is meant to be used as a supplementary tool to performance data (see OpenBlock Labs’ STIP dashboard), ideally assisting the community in forming opinions on STIP-Bridge applicants’ performance during and after the STIP.

Moreover, as instructed by the DAO Advocate, L2BEAT, the workbook also includes a sheet with summaries of our findings, accompanied by a color code indicating whether the community might want to further inspect a protocol’s performance, incentive mechanisms, or program-related operations.


STIP-Bridge (Extended Deadline Applicants) – Support Material for the Community

Building on our earlier STIP-Bridge work, Blockworks Research is releasing another workbook that includes Bridge applicants who applied after the initial deadline of May 3, 2024 and have thus been automatically put up for a challenge vote on Snapshot. The data sources and methodology used are similar to those in our previous analysis posted above.


STIP Retroactive Analysis – Perp DEX Volume

The research report below is also available in document format here.

TL;DR

In H2 2023, Arbitrum launched the Short-Term Incentive Program (STIP) by distributing millions of ARB tokens to various protocols to drive user engagement. This report focuses on how the STIP impacted trading volumes in the perp DEX vertical, specifically examining the performance of Vertex, Gains Network, GMX, MUX Protocol, Vela Exchange, Perennial, and Jojo Exchange. By employing the Synthetic Control (SC) causal inference method to create a “synthetic” control group, we aimed to isolate the STIP’s effect from broader market trends. For an in-depth explanation of the utilized inference method and data, see the Methodology and Annex sections at the end of this report.

Our analysis yielded varied results: Vertex saw a significant 70% of its total trading volume in the analyzed period attributed to the STIP, while GMX saw 43%. MUX Protocol also benefited, with 15% of its volume linked to STIP incentives. In contrast, our model predicts that Gains Network experienced 5% less volume than it would have without the STIP, and Vela Exchange showed no statistically significant impact. These outcomes seem to highlight that mainly utilizing traditional fee rebates, as done by Vertex, GMX, and MUX, was more effective in driving volume growth than the gamified incentives used by Gains Network and Vela Exchange.

Our methodology and detailed results underscore the complexities of measuring such interventions in a volatile market, stressing the importance of comparative analysis to understand the true impact of incentive programs like STIP.

Context and Goals

In H2 2023, Arbitrum initiated a significant undertaking by distributing millions of ARB tokens to protocols as part of the Short-Term Incentive Program (STIP), aiming to spur user engagement. This program allocated varying amounts to diverse protocols across different verticals. Our objective is to gauge the efficacy of these recipient protocols in leveraging their STIP allocations to boost the usage of their products. The challenge lies in accurately gauging the impact of the STIP amidst a backdrop of various factors, including broader market conditions.

This report pertains to the perp DEX vertical in particular. In this vertical, the STIP recipients were GMX, Gains Network, Vertex, MUX Protocol, Vela Exchange, Perennial, and Jojo Exchange. The following table summarizes the amount of ARB tokens received and when they were used by each protocol.

To assess the impact of the STIP on perp DEX protocols, daily trading volume serves as a crucial metric. Despite varying approaches to liquidity—such as synthetic AMMs versus orderbook liquidity—the true measure of a protocol’s success lies in the volume traded on its platform.

We used a Causal Inference method called Synthetic Control (SC) to analyze our data. This technique helps us understand the effects of a specific event by comparing our variable of interest to a “synthetic” control group. Here’s a short breakdown:

  • Purpose: SC estimates the impact of a particular event or intervention using data over time.
  • How It Works: It creates a fake control group by combining data from similar but unaffected groups. This synthetic group mirrors the affected group before the event.
  • Why It Matters: By comparing the real outcomes with this synthetic control, we can see the isolated effect of the event.

In our analysis, we use data from other protocols to account for market trends. This way, we can better understand how protocols react to changes, like the implementation of the STIP, by comparing their performance against these market-influenced synthetic controls. The results pertain to the period from the start of each protocol’s use of the STIP until March 1st. Therefore, this analysis centers on the effectiveness of various incentive structures, rather than the sustainability of activity. The final date was chosen to keep the statistical significance level at 90%, as further described in the Methodology section.

Jojo Exchange was excluded from the analysis due to insufficient reliable data and an apparent pivot to Base. Blockworks Research has conducted a separate case study on Jojo, available on the governance forum. Perennial has also been excluded due to its v2 launch coinciding with the start of the STIP program, and the methodology we employed requires a comparable dataset from before the STIP’s implementation to accurately gauge its impact.

Results

Vertex

The total estimated impact of the STIP from November 8th, 2023 until March 1st, 2024 on Vertex’s daily volume is $32.8B. Vertex’s total volume in this period was $46.9B, so according to our analysis, 70% of the total volume can be attributed to the STIP. Since Vertex received a total of 3M ARB, valued at around $3.6M (at $1.20 per ARB), this means that the STIP created $9,111.67 in volume per dollar spent over the 115 days analyzed.
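The per-dollar figure follows directly from dividing the attributed volume by the dollar value of the incentives. A minimal sketch in Python, using the rounded figures quoted above (so the result matches the report’s $9,111.67 only up to rounding, since the report uses unrounded impact estimates):

```python
# Volume attributed to the STIP per dollar of incentives, using the rounded
# figures quoted above (illustrative; the report uses unrounded estimates).
arb_received = 3_000_000
arb_price = 1.2                       # assumed $1.20 per ARB, as in the report
spend_usd = arb_received * arb_price  # ~$3.6M
attributed_volume = 32.8e9            # estimated STIP impact on volume, $32.8B
volume_per_dollar = attributed_volume / spend_usd  # ~$9,111 per dollar spent
```

The same calculation, applied to each protocol's attributed volume and ARB allocation, yields the per-dollar figures reported in the sections below.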

Gains Network

The total estimated impact of the STIP from December 29th, 2023 until March 1st, 2024 on Gains Network’s daily volume is minus $0.3B. Gains Network’s total volume in this period was $7.1B, so according to our analysis, a loss of 5% of total volume can be attributed to the STIP. Since Gains Network received a total of 4.5M ARB, valued at around $5.4M (at $1.20 per ARB), this means that the STIP caused a loss of $63.52 in volume per dollar spent over the 64 days analyzed.

GMX

The total estimated impact of the STIP from November 8th, 2023 until March 1st, 2024 on GMX’s daily volume is $10.5B. GMX’s total volume in this period was $24.4B, so according to our analysis, 43% of the total volume can be attributed to the STIP. Since GMX received a total of 12M ARB, valued at around $14.4M (at $1.20 per ARB), this means that the STIP created $731.67 in volume per dollar spent over the 115 days analyzed.

MUX Protocol

The total estimated impact of the STIP from November 16th, 2023 until March 1st, 2024 on MUX Protocol’s daily volume is $2.2B. MUX Protocol’s total volume in this period was $15.2B, so according to our analysis, 15% of the total volume can be attributed to the STIP. Since MUX Protocol received a total of 6M ARB, valued at around $7.2M (at $1.20 per ARB), this means that the STIP created $308.33 in volume per dollar spent over the 107 days analyzed.

Vela Exchange

The total estimated impact of the STIP from December 27th, 2023 until March 1st, 2024 on Vela Exchange’s daily volume was minus $456M, but this impact was not deemed statistically significant. So effectively, the analysis did not find the STIP to have any significant impact, positive or negative, on Vela Exchange’s daily volume. In this context, it means that the observed impact of the STIP on the protocol’s daily volume could simply be due to random fluctuations rather than a real effect.

Main Takeaways

Our analysis, conducted with a 90% significance level, produced interesting results for the impact of the STIP on Vertex, Gains Network, GMX, and MUX Protocol. The analysis deemed the impact on Vela Exchange to not be statistically significant, which means that we couldn’t confidently say that the STIP caused a noticeable change in the protocol’s daily volume, but rather the variations we see could just be due to market behavior. A further explanation of how this and all results were derived can be found in the Methodology and Annex sections.

To interpret and understand the results, it is important to have an overview of the incentive mechanisms utilized by the different protocols.

GMX utilized their STIP ARB incentives by focusing primarily on maximizing TVL to ensure adequate liquidity, which is crucial for providing a good trading experience given their AMM design. Additionally, they also reduced trading fees on the decentralized perpetual exchange to levels comparable to the VIP tiers of leading centralized exchanges. Traders on GMX v2 benefited from a rebate of up to 75% on open and close fees, thanks to the STIP incentives, attracting users with a minimal entry fee of 0.015%. In total, 4,984,768.84 ARB was distributed as trading incentives, with the remaining incentives including liquidity and grants incentives. To further boost engagement, GMX ran a two-week trading competition to attract new traders to the V2 platform, though they do not plan to use bridge incentives for future competitions.

Vertex focused their STIP ARB incentives on KPIs such as monthly trading volume, monthly active users, on-chain activity, and TVL. Their first round of incentives targeted two main areas: trading rewards and Elixir LP Pools. In total, 3 million ARB was allocated across 16 weekly epochs, with 2.55 million ARB dedicated to Vertex trading incentives and 450,000 ARB to Elixir liquidity incentives. Additionally, Vertex matched the STIP with a rewards program, offering dual incentives for trading with approximately 10 million VRTX tokens allocated to each epoch. The data indicates that providing trader rebates significantly boosts on-chain usage of perpetuals.

During the STIP period, Gains Network implemented quite different incentive streams through a points system, mainly rewarding traders for behaviors such as fees paid, absolute PnL, loyalty, and relative PnL. These rewards were distributed weekly, with different allocations for each category. While this gamification attracted engagement, it also led to sybil attempts, especially in the relative PnL category, where actors tried to game the system with delta-neutral positions to extract ARB from the reward pools. Consequently, the relative PnL category was dropped during the program. For the STIP campaign, Gains Network allocated 85% of incentives to trading and 15% to LP incentives, distributing a total of 3.825 million tokens in trading incentives. Additionally, Gains Network provided a partial match of 65,000 GNS tokens to LP incentives, to further boost the incentive program.

MUX Protocol offered rebates of up to 100% on open and close fees for all integrated protocols on the MUX aggregator. This strategy aimed to aggressively onboard more traders to Arbitrum. The total STIP amount was used in the Rebate Program, where traders who opened and closed positions through the MUX Aggregator received weekly ARB token rebates for fees incurred on MUX, GMX V1, GMX V2, and Gains positions on Arbitrum.

Vela Exchange ran a gamified trading competition, the Grand Prix. Throughout the Grand Prix, users competed in five themed rounds, each offering new challenges and opportunities to earn credits, the event’s currency. Liquidity providers and yield farmers benefited from limited-time events where their contributions to VLP minting earned them greater credit multipliers. The grant breakdown for the STIP campaign included 150,000 ARB for multi-chain and fiat onboarding, 500,000 ARB for developing social features and trading leagues, and 350,000 ARB for VLP vault rewards. To prevent wash trading, incentives in the trading leagues were capped based on fees earned and focused on PnL, with volume playing a secondary role.

Vertex saw its daily volume impacted the most compared to the other perp DEXs analyzed. Our analysis attributes 70% of the project’s total volume to the STIP, despite Vertex having received a relatively low amount of ARB tokens. Vertex is followed by GMX, where 43% of the total volume was attributed to the STIP. GMX, however, received the largest ARB allocation of any protocol, so the added volume per dollar spent is naturally smaller. The negative impact of the STIP on Gains Network could be attributed to its incentive mechanism having involved social trading competitions instead of fee rebates and direct rewards. Both Gains Network and Vela Exchange implemented gamified points systems instead of traditional trading rewards and fee rebates; according to our analysis, these strategies were less effective in boosting volume beyond the general market trend.

Having said that, it’s essential to acknowledge the limitations inherent in our models, which are only as reliable as the data available. Numerous factors can drastically influence outcomes, making it challenging to isolate the effects of a single intervention. This is particularly true in the crypto industry, where such confounding factors can be disproportionately large. Other relevant secondary factors possibly contributing to the differing results among perp DEXs include traders’ mercenary activity and the cannibalization of trading volume.

Given these complexities, our results should be interpreted comparatively rather than absolutely. The SC methodology was uniformly applied across all protocols, allowing us to gauge the relative efficacy of the STIP allocation.

Appendix

Methodology

TL;DR: We employed an analytical approach known as the Synthetic Control (SC) method. The SC method is a statistical technique utilized to estimate causal effects resulting from binary treatments within observational panel (longitudinal) data. Regarded as a groundbreaking innovation in policy evaluation, this method has garnered significant attention in multiple fields. At its core, the SC method creates an artificial control group by aggregating untreated units in a manner that replicates the characteristics of the treated units before the intervention (treatment). This synthetic control serves as the counterfactual for a treatment unit, with the treatment effect estimate being the disparity between the observed outcome in the post-treatment period and that of the synthetic control. In the context of our analysis, this model incorporates market dynamics by leveraging data from other protocols (untreated units). Thus, changes in market conditions are expected to manifest in the metrics of other protocols, thereby inherently accounting for these external trends and allowing us to explore whether the reactions of the protocols in the analysis differ post-STIP implementation.

To achieve the described goals, we turned to causal inference. Knowing that “association is not causation”, the study of causal inference revolves around techniques that try to establish when association can be interpreted as causation. The classic notation of causality analysis revolves around a certain treatment, which doesn’t need to be related to the medical field, but rather is a generalized term used to denote an intervention whose effect we want to study. We typically consider the treatment intake for unit i, denoted D_i, which is 1 if unit i received the treatment and 0 otherwise. Typically there is also Y_i, the observed outcome variable for unit i. This is our variable of interest, i.e., we want to understand what the influence of the treatment on this outcome was. The fundamental problem of causal inference is that one can never observe the same unit with and without treatment, so we express this in terms of potential outcomes. We are interested in what would have happened had the treatment (not) been taken. It is common to call the potential outcome that happened the factual, and the one that didn’t happen the counterfactual. We will use the following notation:

- Y_i(0): the potential outcome for unit i without treatment

- Y_i(1): the potential outcome for the same unit i with the treatment.

With these potential outcomes, we define the individual treatment effect to be τ_i = Y_i(1) − Y_i(0). Because of the fundamental problem of causal inference, we will never actually know the individual treatment effect, because only one of the potential outcomes is observed.

One technique used to tackle this is Difference-in-Differences (or diff-in-diff). It is commonly used to analyze the effect of macro interventions, such as the effect of immigration on unemployment or of law changes on crime rates, but also the impact of marketing campaigns on user engagement. There is always a period before and after the intervention, and the goal is to extract the impact of the intervention from a general trend. Let Y_D(T) be the potential outcome for treatment status D in period T (0 for pre-intervention and 1 for post-intervention). Ideally, we would have the ability to observe the counterfactual and estimate the effect of an intervention as E[Y_1(1) − Y_0(1) | D=1], the causal effect being the outcome in the post-intervention period in the case of treatment minus the outcome in the same period in the case of no treatment. Naturally, Y_0(1) is counterfactual for the treated group, so it can’t be measured. If we take a simple before-and-after comparison, E[Y(1) − Y(0) | D=1], we can’t really say anything about the effect of the intervention because there could be other external trends affecting that outcome.

The idea of diff-in-diff is to compare the treated group with an untreated group that didn’t get the intervention, replacing the missing counterfactual as such: E[Y_0(1) | D=1] ≈ E[Y(0) | D=1] + (E[Y(1) | D=0] − E[Y(0) | D=0]). We take the treated unit before the intervention and add a trend component to it, which is estimated using the control: E[Y(1) | D=0] − E[Y(0) | D=0]. We are basically saying that the treated unit after the intervention, had it not been treated, would look like the treated unit before the treatment plus a growth factor that is the same as the growth of the control.

An important thing to note here is that this method assumes that the trends in the treatment and control are the same. If the growth trend from the treated unit is different from the trend of the control unit, diff-in-diff will be biased. So, instead of trying to find a single untreated unit that is very similar to the treated, we can forge our own as a combination of multiple untreated units, creating a synthetic control.
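As a toy illustration of the diff-in-diff estimator just described, with made-up numbers rather than report data:

```python
# Toy diff-in-diff sketch (illustrative numbers only).
# The counterfactual for the treated unit post-intervention is its
# pre-intervention level plus the control group's trend.
y_treated_pre, y_treated_post = 100.0, 180.0
y_control_pre, y_control_post = 90.0, 120.0

counterfactual = y_treated_pre + (y_control_post - y_control_pre)  # 100 + 30 = 130
did_estimate = y_treated_post - counterfactual                     # 180 - 130 = 50
```

If the control's trend matched what the treated unit would have experienced without the intervention, the estimated effect here is 50; if the trends differ, as discussed above, this estimate is biased.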

That is the intuitive idea behind using synthetic control for causal inference. Assume we have J+1 units and unit 1 is affected by an intervention. Units 2, …, J+1 are a collection of untreated units that we will refer to as the “donor pool”. Our data spans T time periods, with T_0 periods before the intervention. For each unit j and each time t, we observe the outcome Y_jt. We define Y^N_jt as the potential outcome without intervention and Y^I_jt as the potential outcome with intervention. Then, the effect for the treated unit at time t, for t > T_0, is defined as τ_1t = Y^I_1t − Y^N_1t. Here Y^I_1t = Y_1t is factual, but Y^N_1t is not. The challenge lies in estimating Y^N_1t.

Source: 15 - Synthetic Control — Causal Inference for the Brave and True

Since the treatment effect is defined for each period, it doesn’t need to be instantaneous, it can accumulate or dissipate. The problem of estimating the treatment effect boils down to the problem of estimating what would have happened to the outcome of the treated unit, had it not been treated.

The most straightforward approach is to consider that a combination of units in the donor pool may approximate the characteristics of the treated unit better than any untreated unit alone. So we define the synthetic control as a weighted average of the units in the control pool: given weights W = (w_2, …, w_{J+1}), the synthetic control estimate of Y^N_1t is Ŷ^N_1t = Σ_{j=2..J+1} w_j Y_jt.

We can estimate the optimal weights with OLS like in any typical linear regression. We can minimize the square distance between the weighted average of the units in the donor pool and the treated unit for the pre-intervention period. Hence, creating a “fake” unit that resembles the treated unit before the intervention, so we can see how it would behave in the post-intervention period.
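A minimal sketch of this unconstrained version, fitting the weights by least squares on synthetic toy data (numbers and dimensions are illustrative, not the report’s dataset):

```python
import numpy as np

# Unconstrained synthetic-control weights via ordinary least squares
# (toy data, illustrative only).
rng = np.random.default_rng(0)
donors_pre = rng.normal(size=(30, 4))    # 30 pre-intervention days, 4 donor units
true_w = np.array([0.5, 0.3, 0.2, 0.0])  # combination generating the treated unit
treated_pre = donors_pre @ true_w        # treated unit's pre-intervention outcomes

# Minimize the squared pre-period distance between the treated unit and the
# weighted donor-pool average.
w_hat, *_ = np.linalg.lstsq(donors_pre, treated_pre, rcond=None)
```

Because the toy treated unit is an exact combination of the donors, least squares recovers the generating weights; with real data the fit is approximate and, as discussed next, unconstrained weights can extrapolate.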

In the context of our analysis, this means that we can include all other perp DEX protocols that did not receive the STIP in our donor pool and estimate a “fake”, synthetic, control perp DEX protocol that follows the trend of any particular one we want to study in the period before receiving the STIP. As mentioned before, the metric of interest chosen for this analysis was daily volume and, in particular, we calculated the 7-day moving average to smooth the data. Then we can compare the behavior of our synthetic control with the factual and estimate the impact of the STIP by taking the difference. We are essentially comparing what would have happened, had the protocol not received the STIP with what actually happened.

However, sometimes regression leads to extrapolation, i.e., values that are outside the range of our initial data and may not make sense in our context. This happened when estimating our synthetic control, so we constrained the model to do only interpolation. This means we restrict the weights to be positive and to sum to one, so that the synthetic control is a convex combination of the units in the donor pool. Hence, the treated unit is projected onto the convex hull defined by the untreated units. This means that there probably won’t be a perfect match of the treated unit in the pre-intervention period and that the solution can be sparse, as a face of the convex hull will sometimes be defined by only a few units. This works well because we don’t want to overfit the data. It is understood that we will never be able to know with certainty what would have happened without the intervention, just that under these assumptions we can draw statistical conclusions.

Formalizing interpolation, the synthetic control is still defined in the same way, as Ŷ^N_1t = Σ_{j=2..J+1} w_j Y_jt. But now we use the weights W* = (w_2*, …, w_{J+1}*) that minimize the squared distance between the weighted average of the units in the donor pool and the treated unit over the pre-intervention period t = 1, …, T_0, subject to the restriction that the w_j are positive and sum to one.

We get the optimal weights using quadratic programming optimization with the described constraints on the pre-STIP period and then use these weights to calculate the synthetic control for the total duration of time we are interested in. We initialized the optimization for each analysis with different starting weight vectors to avoid introducing bias in the model and getting stuck in local minima. We selected the one that minimized the square difference in the pre-intervention period.
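A sketch of this constrained fit under the stated simplex restriction, using scipy’s SLSQP solver with a few random starting weight vectors (toy data; the report’s actual solver configuration and dataset may differ):

```python
import numpy as np
from scipy.optimize import minimize

# Toy donor-pool data: 40 pre-STIP days, 5 untreated protocols (synthetic numbers).
rng = np.random.default_rng(1)
donors_pre = rng.normal(size=(40, 5))
true_w = np.array([0.6, 0.4, 0.0, 0.0, 0.0])
treated_pre = donors_pre @ true_w  # treated unit is an exact convex combination here

def loss(w):
    # Squared pre-intervention distance between treated unit and weighted donors.
    return np.sum((treated_pre - donors_pre @ w) ** 2)

n = donors_pre.shape[1]
best = None
# Several random starting vectors, as described above, to guard against the
# optimizer getting stuck (the problem itself is a convex QP).
for seed in range(3):
    x0 = np.random.default_rng(seed).dirichlet(np.ones(n))
    res = minimize(
        loss, x0,
        bounds=[(0.0, 1.0)] * n,                              # weights in [0, 1]
        constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},  # sum to one
        method="SLSQP",
    )
    if best is None or res.fun < best.fun:
        best = res
w_hat = best.x  # non-negative weights summing to one
```

The fitted weights are then held fixed and applied over the full analysis window, pre- and post-STIP, to produce the synthetic control series.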

As an example, below is the resulting chart for Vertex, showing the factual daily volume observed in Vertex and the synthetic control.

With the synthetic control, we can then estimate the effect of the STIP as the gap between the factual protocol daily volume and the synthetic control: τ̂_1t = Y_1t − Σ_{j=2..J+1} w_j* Y_jt.
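Concretely, the estimated daily effect and the cumulative impact reduce to an element-wise difference and a sum (toy series, not report data):

```python
import numpy as np

# Toy post-period series, in $B (illustrative only).
factual = np.array([10.0, 12.0, 15.0])    # observed daily volume
synthetic = np.array([9.0, 10.0, 11.0])   # synthetic-control prediction

daily_effect = factual - synthetic        # estimated STIP impact per day
total_impact = daily_effect.sum()         # cumulative impact over the window
```

Summing the daily gaps over each protocol's analysis window is what yields the total impact figures (e.g., $32.8B for Vertex) in the Results section.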

To understand whether the result is statistically significant and not just an artifact of randomness, we use the idea behind Fisher’s exact test. We exhaustively permute the treated and control units: for each unit, we pretend it is the treated one while the others act as controls. We create one synthetic control and one effect estimate per protocol, pretending the STIP was given to that protocol, to calculate the estimated impact of this treatment that didn’t happen. If the impact on the protocol of interest is sufficiently large compared to the other fake treatments (“placebos”), we can say our result is statistically significant and there is indeed an observable impact of the STIP on the protocol’s daily volume. The idea is that if we use the same model to pretend that protocols which didn’t receive the STIP did, we shouldn’t see any impact.
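The placebo logic can be sketched as follows, with illustrative effect estimates rather than report data; the pseudo p-value is the share of all units (placebos plus the treated one) whose estimated effect is at least as extreme as the treated unit’s:

```python
import numpy as np

# Fisher-style placebo inference (illustrative numbers, not report data).
# Each placebo effect is the estimated "impact" obtained by pretending an
# untreated protocol received the STIP; the treated effect is the real estimate.
treated_effect = 5.0
placebo_effects = np.array([0.3, -0.8, 1.1, 0.5, -0.2, 0.9, -1.4, 0.1, 0.6])

all_effects = np.append(placebo_effects, treated_effect)
# Share of units with an effect at least as extreme as the treated unit's.
p_value = np.mean(np.abs(all_effects) >= abs(treated_effect))
significant_at_90 = p_value <= 0.10
```

Here the treated effect is more extreme than every placebo, giving a pseudo p-value of 1/10 and significance at the 90% level, mirroring the Vertex case described below.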

It is expected that the variance after the intervention will be higher than the variance before the intervention since the synthetic control is designed to minimize the difference in the pre-intervention period. This can be seen in the chart below. Some protocols don’t fit well at all even in the pre-intervention period when no convex combination matches them, so they were removed from the analysis by setting a threshold for pre-intervention error.

With this test, we see that if we pretend the STIP was given to another protocol, we would almost never get an effect so extreme as the one we got with Vertex. For the other perp DEXs in the STIP, this was not always the case, especially after March 2024. For that reason, to maintain statistical significance at 90%, we restricted the analysis to the impact observed until March 1st.

References

Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.

Aayush Agrawal - Causal inference with Synthetic Control using Python and SparseSC

01 - Introduction To Causality — Causal Inference for the Brave and True

Annex

The Annex describes the data used and shows intermediary charts used in the analysis.

The data gathered to evaluate the effect of the STIP on each protocol’s daily volume includes data on daily volume from multiple protocols across a few months. There is a balance between having a long enough timeline of historical data and enough protocols to compare with. For instance, GMX was launched before 2022 but we chose to use only data from 2023 to allow for a larger donor pool. Protocols that also received the STIP were dropped from the analysis. The 7-day moving average was used to smooth out the time series.
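The smoothing step can be sketched with pandas, assuming a simple trailing 7-day window (series values are illustrative):

```python
import pandas as pd

# 7-day trailing moving average of daily volume (illustrative series).
daily = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
ma7 = daily.rolling(window=7).mean()
# The first 6 entries are NaN; from the 7th day on, each value is the
# average of the most recent 7 observations.
```

A trailing window avoids look-ahead: each smoothed value depends only on data available up to that day.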

Vertex

Protocols used in the donor pool: ApeX Protocol (ethereum), APX Finance (bsc), Fulcrom (cronos), GooseFX (solana), Hyperliquid (hyperliquid), IPOR (ethereum), Level Finance (bsc), Morphex (fantom), MUX Protocol (bsc), MUX Protocol (avax), MUX Protocol (optimism), PancakeSwap Perps (bsc), Polynomial Trade (optimism), SpaceDex (bsc), dYdX, GMX (avax).

Gains Network

Protocols used: ApeX Protocol (ethereum), APX Finance (bsc), Fulcrom (cronos), Gains Network (polygon), GooseFX (solana), Hyperliquid (hyperliquid), IPOR (ethereum), KTX.Finance (bsc), Level Finance (bsc), Morphex (fantom), MUX Protocol (bsc), MUX Protocol (avax), MUX Protocol (optimism), PancakeSwap Perps (bsc), Polynomial Trade (optimism), SpaceDex (bsc), HoldStation DeFutures (era), dYdX, GMX (avax).

GMX

Protocols used in the donor pool: ApeX Protocol (ethereum), APX Finance (bsc), Fulcrom (cronos), GooseFX (solana), Hyperliquid (hyperliquid), IPOR (ethereum), Level Finance (bsc), Morphex (fantom), MUX Protocol (bsc), MUX Protocol (avax), MUX Protocol (optimism), PancakeSwap Perps (bsc), Polynomial Trade (optimism), SpaceDex (bsc), dYdX, GMX (avax).

MUX Protocol

Protocols used in the donor pool: ApeX Protocol (ethereum), APX Finance (bsc), Fulcrom (cronos), GooseFX (solana), Hyperliquid (hyperliquid), IPOR (ethereum), Level Finance (bsc), Morphex (fantom), MUX Protocol (bsc), MUX Protocol (avax), MUX Protocol (optimism), PancakeSwap Perps (bsc), Polynomial Trade (optimism), SpaceDex (bsc), dYdX, GMX (avax).

Vela Exchange

Protocols used in the donor pool: Aevo (ethereum), ApeX Protocol (ethereum), APX Finance (bsc), Based Markets (base), Beamex (moonbeam), BLEX (arbitrum), Drift (solana), Fulcrom (cronos), Gains Network (polygon), HMX (arbitrum), GooseFX (solana), Hyperliquid (hyperliquid), ImmortalX (celo), KiloEx (bsc), KTX.Finance (bsc), IPOR (ethereum), Level Finance (bsc), Level Finance (arbitrum), Morphex (fantom), MUX Protocol (bsc), MUX Protocol (avax), MUX Protocol (optimism), PancakeSwap Perps (bsc), Pinnako (era), Polynomial Trade (optimism), Synthetix (optimism), UniDex (optimism), SpaceDex (bsc), UniDex (era), UniDex (arbitrum), UniDex (fantom), UrDEX Finance (arbitrum), Vela Exchange (arbitrum), HoldStation DeFutures (era), dYdX, GMX (avax).


STIP Retroactive Analysis – Spot DEX TVL

The research report below is also available in document format here.

TL;DR

In H2 2023, Arbitrum launched the Short-Term Incentive Program (STIP) by distributing millions of ARB tokens to various protocols to drive user engagement. This report focuses on how the STIP impacted TVL in the spot DEX vertical, specifically examining the performance of Balancer, Camelot, Ramses, Trader Joe, and WOOFi. By employing the Synthetic Control (SC) causal inference method to create a “synthetic” control group, we aimed to isolate the STIP’s effect from broader market trends. For each protocol, the analysis focuses on the median TVL in the period from the first day of the STIP to two weeks after the STIP ended, in an effort to include at least two weeks of persistence in the analysis.

Our analysis yielded varied results: WOOFi saw a significant 62.5% of its median TVL in the analyzed period attributed to the STIP, while Camelot saw 37.1%. Balancer and Trader Joe also benefited, both with around 12% of their TVL linked to STIP incentives. During the STIP period and the two weeks following its conclusion, the added TVL per dollar spent on incentives was approximately $12 for both Balancer and Camelot, $7 for WOOFi ($25 when considering only the incentives given directly to LPs), and $2 for Trader Joe. Our model showed no statistically significant impact for Ramses.

Spot DEXs on Arbitrum focus on enhancing liquidity for native and multi-chain projects, helping them bootstrap and build liquidity sustainably. Protocols achieve this through liquidity incentives, using either activity-based formulas or more traditional methods for allocation. Our analysis showed that different incentive distribution methods had a similar impact on TVL across protocols like Balancer and Camelot. However, Trader Joe’s strategy was less effective due to shorter incentivization periods, as also identified by the team.

WOOFi’s results varied depending on whether we considered total incentives or only those directed to liquidity providers. While it underperformed Camelot and Balancer in added TVL per dollar of total incentives, it excelled when counting liquidity-specific incentives alone. Additionally, incentives for other activities, like swaps, may indirectly boost TVL. Note that a price manipulation attack on March 5th may have affected confidence in WOOFi.

Ramses showed a substantial increase in TVL following the STIP, suggesting a possible delayed positive impact despite a lack of immediate statistical significance.

Our methodology and detailed results underscore the complexities of measuring such interventions in a volatile market, stressing the importance of comparative analysis to understand the true impact of incentive programs like STIP.

Context and Goals

In H2 2023, Arbitrum initiated a significant undertaking by distributing millions of ARB tokens to protocols as part of the Short-Term Incentive Program (STIP), aiming to spur user engagement. This program allocated varying amounts to diverse protocols across different verticals. Our objective is to gauge the efficacy of these recipient protocols in leveraging their STIP allocations to boost the usage of their products. The challenge lies in accurately measuring the impact of the STIP amidst a backdrop of various factors, including broader market conditions.

This report pertains to the spot DEX vertical in particular. In this vertical, the STIP recipients were Balancer, Camelot, Ramses, Trader Joe and WOOFi. The following table summarizes the amount of ARB tokens received and when they were used by each protocol.

To assess the impact of the STIP on spot DEX protocols, TVL is a crucial metric. While trading volume is also important for evaluating a DEX’s performance, the primary focus of these AMMs within the STIP was on enhancing and sustaining liquidity for both new and established projects in the ecosystem. The goal was to ensure that liquidity was readily available to improve efficiency and reduce slippage. Most of these projects used all the incentives allocated to them to attract liquidity providers, making TVL the most directly impacted metric. However, a separate analysis would be valuable to understand how this increase in TVL translated into further activities, such as trading volume on the DEX, for a more comprehensive understanding. Throughout the report, the 7-day moving average (MA) TVL was used, so any mention of TVL should be understood as the 7-day MA TVL.
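The report doesn’t specify exactly how the 7-day MA was computed; as an illustration, a trailing-window smoothing of a daily TVL series might look like the following (the function name and the edge handling for the first few days are our assumptions):

```python
import numpy as np

def moving_average(tvl, window=7):
    """Trailing moving average of a daily TVL series.

    The first window-1 points average over the available history only;
    this edge handling is our assumption, not necessarily the report's.
    """
    tvl = np.asarray(tvl, dtype=float)
    out = np.empty_like(tvl)
    for i in range(len(tvl)):
        out[i] = tvl[max(0, i - window + 1): i + 1].mean()
    return out
```

For example, `moving_average([1, 2, 3, 4, 5, 6, 7])[-1]` is simply the mean of the last seven daily values, 4.0.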

We used a Causal Inference method called Synthetic Control (SC) to analyze our data. This technique helps us understand the effects of a specific event by comparing our variable of interest to a “synthetic” control group. Here’s a short breakdown:

  • Purpose: SC estimates the impact of a particular event or intervention using data over time.
  • How It Works: It creates a fake control group by combining data from similar but unaffected groups. This synthetic group mirrors the affected group before the event.
  • Why It Matters: By comparing the real outcomes with this synthetic control, we can see the isolated effect of the event.

In our analysis, we use data from other protocols to account for market trends. This way, we can better understand how protocols react to changes, like the implementation of the STIP, by comparing their performance against these market-influenced synthetic controls. The results pertain to the period from the start of each protocol’s use of the STIP until two weeks after the STIP had ended.

Results

Balancer

Balancer launched its v1 in early 2020 and has been live on Arbitrum since Q3 2021. Balancer’s KPIs included TVL, daily protocol fees, and volume, all of which increased during the STIP. The totality of the received funds was allocated to liquidity providers through an incentive system developed for the STIP. The grant aimed to boost economic activity on Arbitrum by creating an autonomous mechanism for distributing ARB incentives to enhance Balancer liquidity across the network. This incentive program distributed 41,142.65 ARB per week based on veBAL voting for pools on Arbitrum. The vote weight per pool was multiplied by a boost factor, and ARB was then distributed to all pools based on their relative boosted weight. Pools were capped at 10% of the total weekly ARB, except for ETH-based LSD stableswap pools, which were capped at 20%.
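The boosted-weight distribution described above can be sketched in a few lines. This is our simplified reading, not Balancer’s actual implementation; in particular, the report doesn’t say how excess above a pool’s cap was handled, so this sketch simply clips it:

```python
def weekly_arb_allocation(vote_weights, boosts, is_lsd_stable, total_arb=41_142.65):
    """Simplified sketch of the STIP distribution described above.

    vote_weights: veBAL vote weight per pool; boosts: boost factor per pool;
    is_lsd_stable: whether the pool is an ETH-based LSD stableswap (20% cap
    instead of 10%). Cap handling (clipping, no redistribution of the
    excess) is our assumption.
    """
    boosted = [v * b for v, b in zip(vote_weights, boosts)]
    total = sum(boosted)
    alloc = []
    for w, lsd in zip(boosted, is_lsd_stable):
        cap = (0.20 if lsd else 0.10) * total_arb
        alloc.append(min(total_arb * w / total, cap))
    return alloc
```

For instance, a pool commanding 95% of the boosted vote weight would still only receive 10% of the weekly ARB under the cap, while a 5% pool receives its full pro-rata share.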

Balancer’s TVL increased from approximately $111M on November 2, 2023, when the STIP started, to a peak of $193M in March 2024. By the end of the STIP on March 22, the TVL was at $156M. One week later, on March 29, the TVL had decreased to $145M. Two weeks later, on April 5, it was $135M, and by May 5, it had further dropped to $90M.

Overall, there was a 41% increase in TVL when comparing the periods before and after the STIP. Comparing the start of the STIP to one week after its end, there was a 31% increase, and a 21% increase two weeks after the STIP ended.

The first chart below compares Balancer’s TVL with the modeled synthetic control. The second chart highlights the impact of the STIP by showing the difference between Balancer’s TVL and the synthetic control. For more details, see the Methodology section.

The median impact of the STIP on Balancer’s TVL, from its start on November 2, 2023, to its end on March 22, 2024, was $17.3M. Including the TVL two weeks after the STIP concluded, the median impact was $17.6M. Balancer received a total of 1.2M ARB, valued at approximately $1.44M (at $1.20 per ARB). This indicates that the STIP generated an average of $12.27 in TVL per dollar spent during its duration and the following two weeks. These results are statistically significant at the 90% confidence level.

Camelot

Camelot’s KPIs for the STIP included TVL, volume, and fees. Their incentive allocation strategy prioritized LP returns by incentivizing more than 75 different pools from various Arbitrum protocols. The team focused on promoting liquidity in a diverse mix of pools, including Arbitrum OGs, smaller protocols, newcomers, and established projects from other ecosystems. The ARB distribution followed the same logic used for their own GRAIL emissions, aimed at ensuring a consistent and strategic approach to incentivization.

Camelot’s TVL increased from approximately $82M on November 14, 2023, when the STIP began, to a peak of $150M in March 2024. By the end of the STIP on March 29, the TVL was $136M. One week later, on April 5, it was $131.6M. Two weeks after the STIP ended, on April 12, the TVL stood at $126M, and one month later, it was $105.5M.

Overall, there was a 66% increase in TVL comparing the periods before and after the STIP. Comparing the start of the STIP to one week after its end, there was a 60% increase, and a 53% increase two weeks after the STIP concluded.

The median impact of the STIP on Camelot’s TVL from its start on November 14, 2023, to its end on March 29, 2024, was $42.7M. When including the TVL two weeks after the STIP ended, the impact increased to $44M. Camelot received a total of 3.09M ARB, of which 65,450 ARB were returned, resulting in a final total of 3,024,550 ARB, valued at approximately $3.63M (at $1.20 per ARB). This indicates that the STIP generated an average of $12.12 in TVL per dollar spent during the STIP and the two weeks following it. These results are statistically significant at the 95% confidence level.
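As a sanity check, the added-TVL-per-dollar multiple can be reproduced directly from the figures in this section (variable names are ours):

```python
# Reproducing Camelot's added-TVL-per-dollar figure from this section
arb_received = 3_090_000        # total ARB granted
arb_returned = 65_450           # ARB returned to the DAO
arb_price_usd = 1.20            # ARB price assumed throughout the report
median_impact_usd = 44_000_000  # median TVL impact incl. two weeks post-STIP

spent_usd = (arb_received - arb_returned) * arb_price_usd   # ~ $3.63M
tvl_per_dollar = median_impact_usd / spent_usd              # ~ 12.12
```

The same arithmetic applies to the other protocols, substituting each one’s grant size and median impact.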

Ramses

Ramses distributed its total incentives to liquidity providers across various pools using a two-part strategy. Fifty percent of the incentives were allocated based on fees generated in the previous epoch, while the remaining fifty percent were distributed at the team’s discretion, targeting protocols that needed bootstrapping and lacked sufficient liquidity support from other sources. This approach aimed to strengthen the overall Arbitrum ecosystem by balancing support for established and emerging projects.

Ramses’ TVL increased from approximately $8.2M on December 27, 2023, when the STIP began, to over $25M at its peak in June 2024. When the STIP ended on March 20, 2024, the TVL was $10.7M. One week later, on March 27, the TVL had dropped to $8.1M. However, two weeks after the STIP concluded, on April 3, the TVL had risen to $10.9M, and one month later, on April 20, it reached $12.5M. This represents a 30% increase in TVL from the period before the STIP to the period after it. Comparing the start of the STIP with the TVL one week after its end shows virtually no change, while there was a 33% increase two weeks after the STIP ended.

The STIP’s impact on Ramses was not statistically significant, which means we cannot conclude that the observed results were caused by the STIP rather than occurring by chance. As a result, no clear conclusions about the magnitude of the STIP’s effect on Ramses can be drawn with this analysis.

Trader Joe

In the STIP Addendum, Trader Joe explained that while the main goal of the grant was to incentivize long-tail assets (builders) within the Arbitrum ecosystem, it quickly became a cat-and-mouse game due to intense yield competition and high demand for liquidity. The protocol found that spreading efforts across a wide range of protocols and using a rotating incentive program with concentrated rewards over short periods was not the most effective approach for allocating incentives.

Trader Joe’s TVL increased from approximately $27.5M on November 4, 2023, when the STIP started, to nearly $50M at its peak in March 2024. By the end of the STIP on March 29, 2024, TVL had reached $42.4M. One week later, on April 5, TVL was $41.4M, and two weeks after the STIP ended, on April 12, it had decreased to $37.5M. One month after the STIP concluded, on April 29, the TVL was $28.3M. This represents a 54% increase in TVL from the period before the STIP to the period after it. Compared to the start of the STIP, TVL showed a 51% increase one week after the STIP ended and a 37% increase two weeks after the STIP concluded.

The median impact of the STIP on Trader Joe’s TVL from the start date on November 4, 2023, to the end date on March 29, 2024, was $3.7M. When also accounting for the TVL two weeks after the STIP ended, the total impact increased to $3.9M. Trader Joe received a total of 1.51M ARB, which was valued at approximately $1.81M (based on a $1.20 per ARB rate). This implies that the STIP generated an average of $2.31 in TVL for every dollar spent on incentives during the STIP period and the subsequent two weeks. These results are significant at an 85% confidence level.

WOOFi

WOOFi is the DeFi arm of the WOO ecosystem, functioning as a DEX that bridges the liquidity of the WOO X centralized exchange on-chain. Unlike the other protocols in this analysis, WOOFi did not allocate the total received ARB directly to liquidity incentives. WOOFi’s KPIs focused on several metrics: WOOFi Earn TVL, WOOFi Stake TVL, monthly swap volume, monthly perps volume, and the number of Arbitrum inbound cross-chain swaps. According to the Grant Information here and the STIP Addendum here, the ARB allocation was divided as follows: 30% to WOOFi Earn, 20% to WOOFi Pro, 15% to Arbitrum-inbound cross-chain swaps, 15% to WOOFi Stake, 10% to WOOFi Swap, and 10% to Quests & Cross-Protocol Integration. Additionally, approximately 65k ARB remained unused and was returned to the DAO.

To ensure consistency with the other analyzed protocols, this report focuses exclusively on WOOFi Earn’s TVL. Therefore, whenever WOOFi’s TVL is mentioned, it specifically refers to WOOFi Earn’s TVL. It is relevant to note that WOOFi Earn functions primarily as a yield aggregator product. While analyzing WOOFi Earn in isolation might also warrant a comparison to other yield aggregator protocols rather than spot DEXs, the core of WOOFi’s business is its spot DEX. The Earn feature was designed to support liquidity for on-chain swaps, which is why it has been included in this analysis.

WOOFi’s TVL grew from approximately $4.5M on December 26, 2023, when the STIP began, to around $15.4M at its peak in February 2024. By the end of the STIP on March 29, 2024, the TVL was $12.6M. One week later, on April 5, the TVL was $12.3M, and two weeks after the STIP ended, on April 12, it dropped to $9.5M. By one month after the STIP concluded, on April 29, the TVL had decreased to $5.9M.

This represents a 181% increase in TVL from before the STIP to after it. Comparing the start of the STIP to the TVL one week after its end shows a 175% increase, while two weeks after the STIP ended, the increase was 111%.

The median impact of the STIP on WOOFi’s TVL from its start on December 26, 2023, to its end on March 29, 2024, was $8.8M. When accounting for the TVL two weeks after the STIP concluded, the total median impact was $8.3M. WOOFi received a total of 1M ARB, of which approximately 65,000 ARB remained unused, resulting in 935,000 ARB valued at about $1.12M (at $1.20 per ARB). This means the STIP generated an average of $7.40 in TVL for every dollar spent on ARB incentives during the program and the following two weeks. These results are statistically significant at the 90% confidence level.

Main Takeaways

Our analysis produced interesting results for the impact of the STIP on Balancer, Camelot, Trader Joe, and WOOFi. The analysis deemed the impact on Ramses not statistically significant, meaning we couldn’t confidently say that the STIP caused a noticeable change in the protocol’s TVL; the variations we observed could simply be due to market behavior. A further explanation of how this and all results were derived can be found in the Methodology section.

Spot DEXs serve a fundamentally different purpose than other protocols such as perp DEXs, where successful incentive campaigns primarily focus on boosting trading volume and fees. For a spot DEX on Arbitrum, the goal is to enhance and support the liquidity of both native Arbitrum projects and multi-chain projects aligned with Arbitrum by increasing liquidity for their tokens. This approach helps new, growing, and established projects on Arbitrum bootstrap and build liquidity in a sustainable and capital-efficient manner.

The protocols we analyzed aimed to achieve this goal mainly by offering liquidity incentives to their providers. Some protocols developed activity-based formulas to determine a relative allocation between pools, while others employed more traditional methods, such as evaluating allocations on a weekly or biweekly basis and fixing them for the next period.

Our analysis showed that different methods of distributing incentives across various pools did not seem to substantially impact the effectiveness of the STIP. The ability to generate TVL appeared similar across protocols. For instance, both Balancer and Camelot demonstrated comparable added TVL per dollar of ARB spent. However, Trader Joe’s strategy was deemed less effective due to the short duration of incentivization in specific pools, as also identified by the team. Protocols can learn from this experience when designing future incentive programs.

WOOFi presents a different case with results that vary substantially depending on whether we consider the total incentives or just those allocated directly to WOOFi’s liquidity providers. Compared to Camelot and Balancer, WOOFi underperforms in terms of added TVL per dollar spent on incentives. However, it excels when focusing solely on liquidity incentives. This disparity makes direct comparisons challenging, but it suggests that incentives for other activities on the platform may indirectly boost TVL and could be beneficial. For example, offering incentives for swaps can increase trading volume, which in turn raises yields in those pools and attracts more TVL. It’s also worth noting that on March 5th, WOOFi experienced a price manipulation attack resulting in an $8.75M loss from its synthetic proactive market making (sPMM). Although WOOFi Earn was not directly affected, this incident likely shook user confidence in the protocol for a period following the attack.

The exact impact on Ramses couldn’t be assessed due to a lack of statistical significance. However, it’s noteworthy that TVL increased substantially in the months following the STIP. This may suggest that, while the immediate impact of the STIP couldn’t be determined during the analysis period, it may have had a delayed positive effect.

Lastly, it’s essential to acknowledge the limitations inherent in our models, which are only as reliable as the data available. Numerous factors can drastically influence outcomes, making it challenging to isolate the effects of a single intervention. This is particularly true in the crypto industry, where volatility amplifies these effects.

Given these complexities, our results should be interpreted comparatively rather than absolutely. The SC methodology was uniformly applied across all protocols, allowing us to gauge the relative efficacy of the STIP allocation.

Appendix

Methodology

TLDR: We employed an analytical approach known as the Synthetic Control (SC) method. The SC method is a statistical technique utilized to estimate causal effects resulting from binary treatments within observational panel (longitudinal) data. Regarded as a groundbreaking innovation in policy evaluation, this method has garnered significant attention in multiple fields. At its core, the SC method creates an artificial control group by aggregating untreated units in a manner that replicates the characteristics of the treated units before the intervention (treatment). This synthetic control serves as the counterfactual for a treatment unit, with the treatment effect estimate being the disparity between the observed outcome in the post-treatment period and that of the synthetic control. In the context of our analysis, this model incorporates market dynamics by leveraging data from other protocols (untreated units). Thus, changes in market conditions are expected to manifest in the metrics of other protocols, thereby inherently accounting for these external trends and allowing us to explore whether the reactions of the protocols in the analysis differ post-STIP implementation.

To achieve the described goals, we turned to causal inference. Knowing that “association is not causation”, the study of causal inference lies in techniques that try to figure out how to make association be causation. The classic notation of causality analysis revolves around a certain treatment $D$, which doesn’t need to be related to the medical field, but rather is a generalized term used to denote an intervention whose effect we want to study. We typically consider the treatment intake for unit $i$, $D_i$, which is 1 if unit $i$ received the treatment and 0 otherwise. Typically there is also $Y_i$, the observed outcome variable for unit $i$. This is our variable of interest, i.e., we want to understand what the influence of the treatment on this outcome was. The fundamental problem of causal inference is that one can never observe the same unit with and without treatment, so we express this in terms of potential outcomes. We are interested in what would have happened had some treatment been taken. It is common to call the potential outcome that happened the factual, and the one that didn’t happen the counterfactual. We will use the following notation:

- $Y_{0i}$: the potential outcome for unit $i$ without treatment

- $Y_{1i}$: the potential outcome for the same unit $i$ with the treatment.

With these potential outcomes, we define the individual treatment effect to be $\tau_i = Y_{1i} - Y_{0i}$. Because of the fundamental problem of causal inference, we will actually never know the individual treatment effect, because only one of the potential outcomes is observed.

One technique used to tackle this is Difference-in-Differences (or diff-in-diff). It is commonly used to analyze the effect of macro interventions, such as the effect of immigration on unemployment or the effect of law changes on crime rates, but also the impact of marketing campaigns on user engagement. There is always a period before and after the intervention, and the goal is to extract the impact of the intervention from a general trend. Let $Y_D(T)$ be the potential outcome for treatment $D$ in period $T$ (0 for pre-intervention and 1 for post-intervention). Ideally, we would have the ability to observe the counterfactual and estimate the effect of an intervention as $E[Y_1(1) \mid D=1] - E[Y_0(1) \mid D=1]$, the causal effect being the outcome in the period post-intervention in the case of a treatment minus the outcome in the same period in the case of no treatment. Naturally, $E[Y_0(1) \mid D=1]$ is counterfactual, so it can’t be measured. If we take a before-and-after comparison, $E[Y(1) \mid D=1] - E[Y(0) \mid D=1]$, we can’t really say anything about the effect of the intervention because there could be other external trends affecting that outcome.

The idea of diff-in-diff is to compare the treated group with an untreated group that didn’t get the intervention, replacing the missing counterfactual as such: $E[Y_0(1) \mid D=1] = E[Y(0) \mid D=1] + \big(E[Y(1) \mid D=0] - E[Y(0) \mid D=0]\big)$. We take the treated unit before the intervention and add a trend component to it, which is estimated using the control, $E[Y(1) \mid D=0] - E[Y(0) \mid D=0]$. We are basically saying that the treated unit after the intervention, had it not been treated, would look like the treated unit before the treatment plus a growth factor that is the same as the growth of the control.
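Numerically, the diff-in-diff estimate reduces to two subtractions. A minimal sketch with hypothetical TVL figures of our own choosing:

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    # Estimated treatment effect: the treated unit's change minus the
    # control's change, i.e., growth beyond the common trend
    return (treated_post - treated_pre) - (control_post - control_pre)

# Hypothetical example: a treated DEX's TVL goes 100 -> 160, while the
# control's trend alone would have implied 100 -> 140
effect = diff_in_diff(100, 160, 100, 140)
```

Here the estimated effect is 20: of the 60-unit increase, 40 units are attributed to the common market trend.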

An important thing to note here is that this method assumes that the trends in the treatment and control are the same. If the growth trend from the treated unit is different from the trend of the control unit, diff-in-diff will be biased. So, instead of trying to find a single untreated unit that is very similar to the treated, we can forge our own as a combination of multiple untreated units, creating a synthetic control.

That is the intuitive idea behind using synthetic control for causal inference. Assume we have $J+1$ units and unit 1 is affected by an intervention. Units $2, \ldots, J+1$ are a collection of untreated units that we will refer to as the “donor pool”. Our data spans $T$ time periods, with $T_0$ periods before the intervention. For each unit $j$ and each time $t$, we observe the outcome $Y_{jt}$. We define $Y^N_{jt}$ as the potential outcome without intervention and $Y^I_{jt}$ as the potential outcome with intervention. Then, the effect for the treated unit at time $t$, for $t > T_0$, is defined as $\tau_{1t} = Y^I_{1t} - Y^N_{1t}$. Here $Y^I_{1t}$ is factual but $Y^N_{1t}$ is not. The challenge lies in estimating $Y^N_{1t}$.

Source: 15 - Synthetic Control — Causal Inference for the Brave and True

Since the treatment effect is defined for each period, it doesn’t need to be instantaneous; it can accumulate or dissipate. The problem of estimating the treatment effect boils down to the problem of estimating what would have happened to the outcome of the treated unit, had it not been treated.

The most straightforward approach is to consider that a combination of units in the donor pool may approximate the characteristics of the treated unit better than any untreated unit alone. So we define the synthetic control as a weighted average of the units in the donor pool. Given weights $w_2, \ldots, w_{J+1}$, the synthetic control estimate of $Y^N_{1t}$ is $\hat{Y}^N_{1t} = \sum_{j=2}^{J+1} w_j Y_{jt}$.

We can estimate the optimal weights with OLS like in any typical linear regression. We can minimize the square distance between the weighted average of the units in the donor pool and the treated unit for the pre-intervention period. Hence, creating a “fake” unit that resembles the treated unit before the intervention, so we can see how it would behave in the post-intervention period.
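Assuming daily pre-STIP observations stacked in a matrix, the unconstrained weights can be obtained with ordinary least squares (a sketch; the function name and data layout are our assumptions):

```python
import numpy as np

def ols_weights(y_pre, X_pre):
    # y_pre: (T0,) treated unit's pre-intervention outcomes
    # X_pre: (T0, J) donor-pool outcomes over the same pre-period
    # Unconstrained least squares: weights may be negative or exceed 1,
    # i.e., the fitted combination can extrapolate outside the donor
    # pool's observed range.
    w, *_ = np.linalg.lstsq(X_pre, y_pre, rcond=None)
    return w
```

This extrapolation risk is exactly what motivates the interpolation constraint discussed in the next paragraphs.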

In the context of our analysis, this means that we can include all other spot DEX protocols that did not receive the STIP in our donor pool and estimate a “fake”, synthetic, control spot DEX protocol that follows the trend of any particular one we want to study in the period before receiving the STIP. As mentioned before, the metric of interest chosen for this analysis was TVL and, in particular, we calculated the 7-day moving average to smooth the data. Then we can compare the behavior of our synthetic control with the factual and estimate the impact of the STIP by taking the difference. We are essentially comparing what would have happened, had the protocol not received the STIP with what actually happened.

However, sometimes regression leads to extrapolation, i.e., values that are outside the range of our initial data and may not make sense in our context. This happened when estimating our synthetic control, so we constrained the model to do only interpolation. This means we restrict the weights to be positive and sum to one, so that the synthetic control is a convex combination of the units in the donor pool. Hence, the treated unit is projected onto the convex hull defined by the untreated units. This means that there probably won’t be a perfect match for the treated unit in the pre-intervention period, and that the solution can be sparse, as a face of the convex hull will sometimes be defined by only a few units. This works well because we don’t want to overfit the data. It is understood that we will never be able to know with certainty what would have happened without the intervention, just that under the stated assumptions we can draw statistical conclusions.

Formalizing interpolation, the synthetic control is still defined in the same way, $\hat{Y}^N_{1t} = \sum_{j=2}^{J+1} w_j Y_{jt}$. But now we use the weights $w^*$ that minimize the square distance between the weighted average of the units in the donor pool and the treated unit over the pre-intervention period, $\sum_{t=1}^{T_0} \big( Y_{1t} - \sum_{j=2}^{J+1} w_j Y_{jt} \big)^2$, subject to the restriction that all $w_j$ are positive and sum to one.

We get the optimal weights using quadratic programming optimization with the described constraints on the pre-STIP period and then use these weights to calculate the synthetic control for the total duration of time we are interested in. We initialized the optimization for each analysis with different starting weight vectors to avoid introducing bias in the model and getting stuck in local minima. We selected the one that minimized the square difference in the pre-intervention period.

As an example, below is the resulting chart for Camelot, showing the factual TVL observed in Camelot and the synthetic control.

With the synthetic control, we can then estimate the effect of the STIP as the gap between the factual protocol TVL and the synthetic control, $\hat{\tau}_{1t} = Y_{1t} - \hat{Y}^N_{1t}$.

To understand whether the result is statistically significant and not just an artifact of randomness, we use the idea of Fisher’s Exact Test. We permute the treated and control units exhaustively: for each unit, we pretend it is the treated one while the others form the control. We create one synthetic control and effect estimate for each protocol, pretending that the STIP was given to that protocol, to calculate the estimated impact of a treatment that didn’t happen. If the impact on the protocol of interest is sufficiently large compared to these fake treatments (“placebos”), we can say our result is statistically significant and there is indeed an observable impact of the STIP on the protocol’s TVL. The idea is that since the other protocols received no STIP, applying the same model while pretending they did should show no impact.
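The permutation logic can be sketched as follows (a simplified version using unconstrained least-squares fits and our own function names; the report’s actual estimator uses the constrained weights described above):

```python
import numpy as np

def gap_series(y, X, pre):
    # Fit least-squares weights on the pre-period only, then return the
    # post-period gap between the unit and its synthetic counterpart
    w, *_ = np.linalg.lstsq(X[:pre], y[:pre], rcond=None)
    return (y - X @ w)[pre:]

def placebo_p_value(panel, treated_idx, pre):
    # Fisher-style permutation: pretend each unit was treated in turn and
    # rank the real treated unit's mean absolute gap among the placebos
    J = panel.shape[1]
    scores = []
    for j in range(J):
        donors = np.delete(panel, j, axis=1)
        scores.append(np.abs(gap_series(panel[:, j], donors, pre)).mean())
    rank = sum(s >= scores[treated_idx] for s in scores)
    return rank / J
```

A p-value of 1/J, the minimum possible with J units, means no placebo produced a post-period gap as large as the treated protocol’s.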

References

Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.

Aayush Agrawal - Causal inference with Synthetic Control using Python and SparseSC

01 - Introduction To Causality — Causal Inference for the Brave and True


STIP Analysis of Operations and Incentive Mechanisms

The below research report is also available in document format here.

Introduction

The following analysis presents an overview of the final STIP fund allocation across different incentive mechanisms, focusing on any differences in growth trends and their sustainability for perp DEX and spot DEX protocols, while comparing activity on Arbitrum against other relevant ecosystems. Moreover, this analysis covers broader themes and recurring developments that have emerged through STIP applications, updates, recipients’ performance, and discussions with recipient teams.

TL;DR

  • Following changes made to incentive allocations throughout the STIP, the most popular high-level mechanisms utilized were standard liquidity incentives (~30% of total allocation), fee rebates (~25% of total allocation), trading/points/usage-based programs (~12% of total allocation), liquidity incentives for native token(s) in pools on partner protocols (~8% of total allocation), and liquidity incentives with optional/required long-term/perpetual capital locking (~8% of total allocation).
  • Most spot and perp DEX STIP recipients’ top-line metrics fell notably after the STIP ended and are currently around levels seen in September 2023. There are a few outperformers, generally younger, differentiated protocols that outgrew the market during the STIP and have successfully managed to maintain activity and capital in the long term.
  • Excluding the outperformers, the absolute change in TVL/Volume achieved per ARB utilized to increase these metrics directly has widely converged between protocols in the same verticals.
  • Overall, during the STIP, Arbitrum’s market share growth across major blockchains peaked at ~0 percentage points for TVL, ~5pp for spot volume, ~12pp for perp volume, and ~0pp for loans outstanding. The market shares are currently at around September 2023 values, except for TVL, which is down from ~6% to ~4%.

Key Insights – The STIP’s Impact

Until now, we’ve examined how protocols within the perp DEX and spot DEX verticals have performed relative to each other within the same verticals. This offers insights into, e.g., any high-level differences in the effectiveness of different incentive mechanisms and growth themes for protocols across different maturities and sizes. However, this doesn’t allow us to say anything about the STIP’s effectiveness in maintaining or growing usage in aggregate. For example, suppose market conditions had drastically deteriorated during the STIP due to a systematic shock, causing all projects’ metrics to decrease throughout the program. Naively looking only at the performance of Arbitrum protocols would lead us to conclude that the program had been a failure, when what might actually have happened was that the total market shrank while the protocols’ share of it increased, which most community members would likely consider a success.

Everything else equal, it seems that the STIP successfully catalyzed a notable increase in Arbitrum’s DeFi market share across all major blockchains at the beginning of the incentive program. More specifically, it seems that ARB incentives for perp and spot DEXs, the two largest allocations, have been vast enough to meaningfully capture more of the total activity during the incentive period. However, activity began reverting in the latter half of the program, with Arbitrum’s market shares for spot volume, perp volume, and loans outstanding currently hovering around September 2023 levels. Based on the below graph, incentives were enough to sustain Arbitrum’s TVL market share at a steady level of ~6% for most of the program, but the ecosystem began losing capital relative to the rest of the market in the middle of February and currently has a ~4% market share.

Source: Artemis. Note: Major Blockchains include, where relevant, Aevo, Aptos, Avalanche C-Chain, Base, BNB Chain, Blast, dYdX, Ethereum, Fantom, Gnosis Chain, Hyperliquid, Near, Optimism, Osmosis, Polygon PoS, Scroll, Solana, StarkNet, Sui, zkSync Era

When comparing Arbitrum’s figures against Ethereum, Optimism, and Base, the trends are almost identical to those shown above, except for perp volume, which has sustained notably better. One major driver for this has likely been the successful bootstrapping of one of the newer perp DEXs on Arbitrum while protocols on the other blockchains have stagnated. This showcases Arbitrum’s strength within the perps vertical compared to the other major blockchains in the Ethereum ecosystem.

Source: Artemis

To summarize, all of the analyzed protocols saw their top-line metrics increase during the STIP, but in the months following the program’s end, figures trended back toward September 2023 values. There was some variability in how much capital/volume each protocol had managed to capture per ARB spent at the end of the STIP, but in the long term, these multiples tended to converge to a tight range. There are a few exceptions to this—protocols that are on the younger side and generally offer differentiated products. These protocols have successfully reached notably higher “steady states” compared to the beginning of the program, with incentives likely amplifying market penetration deriving from intrinsic drivers and on a few occasions, leading to more robust collaboration between the outperformers and other Arbitrum protocols, creating additional synergies. Although the data isn’t shown in this report, the money market vertical generally showcased similar trends as the perp and spot DEX verticals.

While a handful of protocols outperformed by sustainably growing activity, the overall increases in Arbitrum’s market shares across TVL and the major DeFi categories have largely reverted. In other words, the STIP doesn’t seem to have led to sustainable market capture in aggregate. However, there are other indirect returns that could be considered as well. For example:

  • Incentives are an implicit avenue for the DAO to convert ARB in the treasury to ETH through the sequencer margin as activity increases during incentive programs, although some might argue that this isn’t the most effective way to diversify the treasury.
  • If designed correctly, incentives could theoretically increase existing users’ loyalty and goodwill toward Arbitrum. This is quite intangible and difficult to measure.
  • Incentives might attract new builders and protocols to the ecosystem. Although this is difficult to measure over such a short time period, there are tangible examples of protocols migrating to the ecosystem, with Kwenta and Curve Lending both launching on Arbitrum with onboarding incentives. However, through our conversations with several protocols, it has also become clear that the incentive application process might be too complex for many smaller teams, and some projects have decided not to even consider launching on Arbitrum because they feel that receiving funds requires too much politicking.
  • Intuitively, since incentives clearly increase activity, protocols benefit from earning more revenue, and it makes sense for Arbitrum protocols to benefit as well, given that they are an integral part of the ecosystem. In contrast, the feedback we’ve received from some teams is that because meeting KPIs is such an important factor in being considered a successful STIP recipient, some projects have had to decrease their native fees to a minimum. It seems fair to say that in such cases, growth objectively isn’t sustainable, and it might also hurt other protocols within the same vertical since they have to match a fee structure that isn’t profitable.

Lastly, it could be argued that had the STIP not happened, Arbitrum’s market shares across different metrics would be worse than they are now. Such a counterfactual analysis carries many complexities and requires subjective interpretation, meaning that no single result can be considered the objective truth. Nevertheless, Blockworks Research has released two analyses employing the Synthetic Control causal inference method, which compare the performance of perp DEX and spot DEX STIP recipients during the program against an estimate of what their performance would have been had the STIP not taken place.

Operational Observations

Throughout our analysis, certain wider-reaching themes and recurring developments have presented themselves, which are covered below. To begin with, incentive programs that ended notably earlier than others (i.e., February and early March) generally experienced vast capital/user outflows relative to other, similar STIP recipients when their programs ended in the latter half of March 2024. This makes sense rationally. The cost of capital within the Arbitrum ecosystem heightens during the STIP because protocols provide boosted yields and lowered fees through incentives. When one protocol stops allocating incentives, users and capital move to other protocols that are still providing heightened returns and lower expenses.

As such, allocating incentives to a concentrated group of protocols or tiering incentives across protocols within the same vertical might lead to unwanted outcomes. It’s likely that the protocols distributing incentives end up largely capturing capital and users from other projects within the ecosystem that aren’t distributing incentives, meaning that activity mostly rotates around from protocol to protocol within Arbitrum instead of bringing in new usage from foreign ecosystems.

Related to this, resulting yields across liquidity provision opportunities have had some notable variations across protocols, even within the same verticals and similar pools. Naturally, some opportunities are riskier than others because of, for example, inherently different mechanisms or less-tested smart contracts, and should thus theoretically offer higher yields to reward users for taking on additional risk. Having said that, if protocols aim to minimize capturing capital and users from other similar projects within the ecosystem and instead bring in new users from foreign ecosystems, it might make sense to benchmark yields based on what similar protocols and products outside of Arbitrum are offering and apply slightly higher target yield intervals for STIP participants. As previously discussed, projects must naturally have the freedom to fine-tune their incentive distributions depending on, e.g., market conditions, the need to bootstrap new products, and protocol-specific needs, but especially for medium- and large-sized projects that aren’t bootstrapping new pools or products, it might be sensible to set targets.

The point here is that if similar opportunities within the ecosystem offer similar returns somewhat constantly, users within the ecosystem are disincentivized to change protocols purely based on returns. Theoretically, if growth within a vertical were to stagnate at certain yield thresholds, this implies that existing usage is exhausted at those levels and the prevalent market conditions, while the marginal new user requires higher returns to migrate to the ecosystem. In this case, it would be sensible to increase the specific vertical’s/product type’s threshold yield. Rationally, incentives shouldn’t sustainably expand existing users’ usage behavior or willingness to put capital at risk, meaning that it might make sense to add some sustainability-related KPIs to the program and more heavily prioritize structures that are likely to get new users to migrate to the ecosystem and create long-term activity.

It is worth pointing out that some standardization is already taking place, with notable perp DEX incentive recipients agreeing to rebate a maximum of 75% of trading fees. On the topic of standardization and program structure clarity, it’s exceptionally difficult to follow what the total, final number of ARB received by protocols with partnership allocations has been. Some projects have received a notably larger number of tokens than their initial allocations viewed in isolation would suggest. Specifically, if partnership allocations aren’t adjusted for, comparing STIP recipients’ historical performance could lead to skewed results.

Many protocols missed several bi-weekly reports or didn’t post them at all, and around 35% of all STIP recipients didn’t post a final report. In the bi-weekly reports, only a handful of projects discussed protocol-related events and developments that might have explained growth that had materialized. Instead, it was more usual for projects to only present some high-level KPIs and how they planned to use incentives in the coming two weeks across different allocation buckets, meaning that it was sometimes difficult to understand drastic changes in figures purely from information provided on the forum. It might be worth considering decreasing the reporting frequency to a monthly or even lower cadence, so that protocols have more time to prepare their reports, can cover more relevant information, and can hopefully direct most of their focus toward growing their products.

Protocols rarely provided rigorous justifications for the amount of incentives they requested when applying for the STIP. Rather, final allocations were generally the result of back-and-forth between protocols and the community, often settling on something akin to “we feel like this ask is too big/small”. To make allocation requests more quantifiable and comparable across medium- and large-sized applicants, metrics such as TVL/volume/users/etc. could be normalized relative to the requested ARB allocation, creating a high-level metric that could be used to compare projects within the same verticals more easily. There are naturally additional factors that should be considered when deciding allocations as well, but multiples could be an efficient way to sanity-check protocols’ requests relative to each other.
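As a minimal sketch of the multiple-based sanity check described above (the applicant names and figures below are invented for illustration), each request can be normalized by the chosen metric to compare asks within a vertical:

```python
def allocation_multiples(applicants):
    """For each applicant, compute the chosen metric (e.g., TVL in USD)
    per ARB requested, enabling like-for-like comparisons within a vertical."""
    return {
        name: tvl_usd / arb_requested
        for name, (tvl_usd, arb_requested) in applicants.items()
    }

# Hypothetical applicants within the same vertical
requests = {
    "dex_a": (20_000_000, 1_000_000),  # $20M TVL, 1M ARB ask
    "dex_b": (5_000_000, 1_000_000),   # $5M TVL, same ask
}
print(allocation_multiples(requests))  # {'dex_a': 20.0, 'dex_b': 5.0}
```

Here, dex_b asks for the same allocation while bringing a quarter of the TVL, which flags the request for closer scrutiny rather than automatically disqualifying it.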

The initially planned incentive period for the original STIP was ~3 months. This ended up being somewhat longer for Round 1 recipients, and somewhat shorter for Round 2 recipients because of the reasons mentioned at the beginning of this report. The average distribution period for relevant STIP Round 1 protocols that weren’t hacked and applied to the STIP-Bridge was 129 days, while the average distribution period for relevant STIP Round 2 protocols that weren’t hacked and applied to the STIP-Bridge was 83 days.

Some projects, even those having received incentives as part of Round 1, had a notable amount of their ARB allocations left when the STIP deadline began approaching. From a protocol’s perspective, it is naturally more beneficial to utilize as much of their incentive allocation as possible, while protocols don’t really gain anything from returning funds at the end of the period. As such, some protocols that had been more conservative in allocating ARB throughout the program arbitrarily cranked up their incentive distribution at the end of the period, which again theoretically affected the cost of capital structure within the ecosystem.

Finally, some projects that were eligible for incentives had planned to allocate ARB to products that hadn’t yet launched or to leverage incentive distribution mechanisms that hadn’t yet been put in place when the applications were submitted. Many of these projects then didn’t manage to launch the product or distribution mechanism, generally leading to the ARB allocation being redirected to some other incentive bucket instead of being sent back to the DAO. It might be sensible to require that the to-be-incentivized product has been live for X days before a protocol can request incentives for that product. Rysk has already set a great example here, withdrawing from the STIP-Bridge since the project is currently working on its v2 upgrade. Somewhat relatedly, certain protocols built on top of another protocol’s product incentivized usage while the underlying protocol’s product was being wound down to be replaced by a newer version. Directing incentives to a product that will be discontinued in the near term might not be the highest ROI opportunity for the DAO.

Final Allocation of STIP Incentives

50M ARB was initially earmarked for the STIP. However, following higher-than-expected demand for incentives by protocols and to distribute tokens across a larger number of projects, the initial allocation (Round 1) was accompanied by a Round 2 (a.k.a. the STIP Backfund). Round 2 distributed capital to all approved but not funded projects connected to the initial allocation, amounting to ~21M ARB. In other words, a total of ~71M ARB was allocated to the overall STIP program.

Most protocols funded through Round 1 began distributing incentives in early/the middle of November 2023, while Round 2 protocols generally initiated their incentive programs at the end of December 2023 and throughout January 2024. Initially, both programs were to end by January 31, 2024, but due to backfunded protocols receiving their streams with a delay, the timelines for both programs were extended to March 29, 2024.

Source: Arbitrum Forum, Arbiscan, Blockworks Research Analysis. Note: The data excludes protocols that interrupted their distribution during the STIP, have allowed users to earn ARB rewards after March 29, 2024 (note: protocols that have allowed users to collect rewards earned during the STIP after the deadline are included), are labeled as infrastructure, and have migrated from Arbitrum. The number of ARB also excludes incentives originating from protocols’ balance sheets.

During the program, nearly all projects modified their initially proposed incentive allocations, with some even implementing completely new, unmentioned mechanisms. Modifying allocations between disclosed incentive buckets should be expected as protocols need flexibility in the way they allocate incentives as their programs progress, depending on factors such as market conditions, bootstrapping needs, and perceived effectiveness. However, it was somewhat surprising to see several protocols introduce completely new incentive buckets that were not disclosed in any way in the original incentive applications or initial bi-weekly updates.

To gauge how much ARB has been utilized across different end goals, we’ve divided the distribution into four high-level groups depending on the type of user activity they primarily target. Proprietary TVL refers to incentive mechanisms that directly encouraged users to deposit liquidity into the protocol that distributed incentives. Partner TVL means that the protocol distributing incentives allocated ARB to another project’s liquidity pools. The volume category includes incentive mechanisms that directly encouraged users to move more volume through the distributing protocol’s platform. Mechanisms that don’t directly target any of these are grouped under a miscellaneous category.

Utilized incentive mechanisms have also been classified into different categories based on their high-level characteristics:

  • Standard liquidity incentives: ARB distributions and amounts across pools were decided by the protocol teams, and receiving rewards didn’t require anything beyond providing liquidity.
  • Liquidity incentives, allocation across pools activity based: similar to the above, but the allocation structure and amounts were decided by a predetermined formula instead of being controlled by a team on a week-by-week basis.
  • Liquidity incentives with optional/required long-term/perpetual capital locking: users could earn more rewards by locking capital with the distributing protocol or were required to do so to be eligible for rewards.
  • Liquidity incentives through integrated partner protocols: the distributing protocol allocated liquidity incentives to another protocol, and an increase in the latter’s TVL also directly increased the allocating protocol’s TVL.
  • Liquidity incentives requiring native token staking/LPing: a user had to acquire and either stake or LP the distributing protocol’s native token to be eligible for rewards.
  • Liquidity incentives for native token(s) outside platform: distributing protocols allocated incentives to another protocol’s liquidity pools, which didn’t directly increase the distributor’s TVL. An example of this is the distributing protocol incentivizing a spot DEX’s LPs that provided liquidity for the distributor’s governance token.
  • Incentives to proprietary infra/partnership protocols for discretionary use: mechanisms that, e.g., allocated ARB to projects to cover the costs of integrating with the distributing protocol, or allocated ARB to the distributing protocol’s partners that could freely decide how to distribute the rewards to their users.

Following the changes made throughout the STIP, the most popular mechanisms were standard liquidity incentives (~30% of total allocation), fee rebates (~25% of total allocation), trading/points/usage-based programs (~12% of total allocation), liquidity incentives for native token(s) outside platform (~8% of total allocation), and liquidity incentives with optional/required long-term/perpetual capital locking (~8% of total allocation). In aggregate, ~45% of ARB distributed was directly connected to increasing proprietary TVL, ~38% to increasing volume, ~9% to miscellaneous end goals, and ~8% to increasing partner TVL.

Source: Arbitrum Forum, Arbiscan, Blockworks Research Analysis. Note: The data excludes protocols that interrupted their distribution during the STIP, have allowed users to earn ARB rewards after March 29, 2024 (note: protocols that have allowed users to collect rewards earned during the STIP after the deadline are included), are labeled as infrastructure, and have migrated from Arbitrum. The number of ARB also excludes incentives originating from protocols’ balance sheets.

STIP Recipient Performance

We’ve chosen to focus this analysis on perp and spot DEXs as these groups were the two largest verticals to receive incentives at ~38% and ~15% of the total allocation, respectively. Moreover, it’s no secret that DeFi activity is one of Arbitrum’s main competitive strengths, with DeFi-related protocols historically accounting for over 25% of the blockchain’s sequencer revenue.

To gauge the sustainability of activity and stickiness of capital on a relative basis across different incentive mechanisms and verticals, the following sections present several normalized charts, where figures have been standardized to September 2023 beginning-of-month values. It’s important to note that it’s naturally easier for a smaller protocol to grow by, e.g., 2x, compared to a well-established protocol. However, the idea behind normalizing performance is to be able to compare how sustainable activity has been across protocols. In addition, we have strived to analyze the relative effectiveness, impact, and perhaps fairness of the incentive distributions by normalizing absolute changes in performance metrics by the ARB utilized to directly increase the relevant metrics.
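The standardization used throughout the charts can be sketched as follows (plain Python, with invented TVL figures): each series is indexed so its September 2023 beginning-of-month observation equals 1.0, making growth comparable across protocols of different sizes.

```python
def normalize(series):
    """Index a metric series to its first observation (e.g., Sept 1, 2023 = 1.0)."""
    base = series[0]
    return [value / base for value in series]

# Hypothetical monthly TVL figures in USD millions, September 2023 onward
tvl = [40.0, 44.0, 50.0, 38.0]
print(normalize(tvl))  # [1.0, 1.1, 1.25, 0.95]
```

A reading of 0.95 in the final month means the protocol sits 5% below its September 2023 starting level, regardless of its absolute size.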

The analyzed protocols are displayed in the following format: the relevant incentive mechanism(s) utilized; the size of the ARB allocation received (where Small: < 1M ARB; Medium: >= 1M ARB & <= 2M ARB; Large: > 2M ARB); and the round through which incentives were received. To not distort this analysis, we’ve only considered protocols that have distributed incentives continuously for over two months, haven’t been hacked since the beginning of September 2023, and have been operational before the beginning of September 2023. A few protocols were also excluded due to reliable performance data not being readily available.

Perp DEXs

TVL

Source: DefiLlama & Dune

Within the perp DEX vertical, one protocol outperformed others when looking at normalized TVL figures. This protocol is on the younger side compared to the peer group, and as mentioned earlier, it’s naturally easier to reach large relative growth numbers when initial figures are smaller. That is not to say that the result isn’t impressive, especially given that the TVL growth continued after the program ended and has now essentially stabilized. However, this growth has most likely mainly been driven by factors intrinsically connected to the protocol, such as finding product-market fit, business development efforts, native liquidity incentives, bootstrapping market makers, etc. Nevertheless, we consider this a great example of where incentives can be beneficial, with ARB tokens likely having amplified growth, which the protocol has managed to maintain post-STIP. One possible downside to consider is that some capital might have migrated from other Arbitrum-aligned perp DEXs, but increased competition on the supply side is generally beneficial for end users.

Source: DefiLlama & Dune

TVL development for the rest of the protocols within the perp DEX vertical follows a somewhat unified pattern. In general, liquidity began increasing around the time each protocol’s incentive program commenced, remained elevated during the program, but started decreasing as incentives drew to an end. It also seems that Round 2 protocols have been at a disadvantage, with liquidity trending downward until their incentive programs were initiated.

For half of the group, liquidity has dropped notably below levels where it was when the incentive program began, while for the other half, liquidity is slightly higher than what it was when the programs were initiated. Theoretically, each protocol has a baseline liquidity level that it can attract, depending on yields offered, perceived riskiness of returns, as well as LPs’ opportunity cost. On a high level, yields go down when incentives end since some trading volume is bound to migrate and immediate ARB rewards to LPs taper, inducing some LPs to move to other sources of yield that they perceive to be better.

Interestingly, the two protocols that used ARB to directly incentivize LPing on their platforms didn’t see as drastic drawdowns in their TVLs as experienced by the two protocols that used no incentives for proprietary TVL. Another factor to consider is that market conditions drastically improved during the STIP and asset prices shot up. The impact of this on perp DEXs depends on the protocol design, as some perp DEXs mainly rely on stablecoins for liquidity, while others’ TVLs consist mostly of volatile crypto assets. In ETH terms, every perp DEX’s TVL is down on a normalized basis, excluding the outperformer mentioned earlier.
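The ETH-denominated comparison mentioned above can be sketched as follows (plain Python, with invented TVL and price figures): dividing USD TVL by the ETH price strips out asset-price appreciation, so a TVL that merely tracks the ETH price shows up as flat.

```python
def to_eth_terms(tvl_usd, eth_price_usd):
    """Restate a USD-denominated TVL series in ETH terms."""
    return [tvl / price for tvl, price in zip(tvl_usd, eth_price_usd)]

# Hypothetical: USD TVL doubles while the ETH price also doubles
tvl_usd = [10_000_000, 20_000_000]
eth_price = [1_600, 3_200]
print(to_eth_terms(tvl_usd, eth_price))  # [6250.0, 6250.0] -> flat in ETH terms
```

In this invented example, the protocol's USD TVL doubled yet attracted no net new capital in ETH terms, which is the distinction the ETH-denominated charts are meant to surface.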

Source: DefiLlama, CoinGecko, Dune

Source: DefiLlama, CoinGecko, Dune

Three of the five perp DEXs analyzed used incentives to directly increase proprietary liquidity. Looking at TVL figures at the beginning of September 2023 normalized by ARB allocated for increasing proprietary liquidity throughout the program shows that there was some notable variation in how much ARB was used as liquidity incentives relative to TVL levels. If liquidity was purely driven by direct ARB incentives, projects with smaller starting TVL-per-ARB multiples should see their multiples expand more throughout the program than projects with larger starting multiples.

Source: DefiLlama, Arbitrum Forum, Arbiscan, Blockworks Research Analysis

The best performer reached ~$135 of additional 30D-MA liquidity per ARB spent compared to the beginning of its incentive program. TVL continued climbing after the program ended until the middle of April, and the metric stabilized at ~$175 at the end of June 2024. Excluding the best performer leaves a small sample size, but the long-term TVL growth per ARB spent across the other two protocols is quite similar, even though one utilized notably larger liquidity incentives relative to its TVL than the other and the utilized incentive mechanisms were quite different. This might indicate that as long as teams allocate incentives sensibly, the underlying mechanism doesn’t significantly matter since the return LPs require is similar across protocols. Simply put, opportunities with similar risk profiles should also have similar costs of capital.
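The growth-per-ARB figures used here can be computed as in this sketch (the TVL and ARB numbers below are invented, not actual protocol data):

```python
def growth_per_arb(metric_start_usd, metric_end_usd, arb_spent):
    """Absolute change in a USD metric per ARB directly used to incentivize it."""
    return (metric_end_usd - metric_start_usd) / arb_spent

# Hypothetical: TVL grows from $10M to $23.5M on 100k ARB of liquidity incentives
print(growth_per_arb(10_000_000, 23_500_000, 100_000))  # 135.0
```

The same formula applies to volume-based comparisons; negative values indicate the metric ended below its level at the start of the incentive program.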

Source: DefiLlama, Arbitrum Forum, Arbiscan, Blockworks Research Analysis

Volume

The volume trends for perp DEX STIP recipients follow the TVL trends quite closely, where the same outperformer managed to grow during its incentive program as well as maintain the increased activity after the program ended. Meanwhile, the rest of the protocols experienced notable uptrends in volume coinciding with the beginning of their incentive programs, but this trend began reverting in early April, converging towards September 2023 values in June 2024. Similarly to TVL, Round 2 recipients’ figures lagged behind Round 1 recipients’ but began catching up in the latter half of the STIP.

Source: DefiLlama & Dune

Source: DefiLlama, CoinGecko, Dune

The outperformer utilized the least ARB to directly incentivize trading volume relative to its volume at the beginning of September 2023, at 1 ARB per ~$21 of volume. This indicates that its outperformance in growth didn’t derive from an outsized allocation of incentives relative to volume when compared against other perp DEXs. For the rest of the perp DEXs, there is some spread in the volume-per-ARB multiple, ranging from ~$6 to ~$12.

Source: DefiLlama, Dune, Arbitrum Forum, Arbiscan, Blockworks Research Analysis

Source: DefiLlama, Dune, Arbitrum Forum, Arbiscan, Blockworks Research Analysis

As with TVL, there is some variability across absolute volume growth achieved at the end of the STIP per ARB directly used to incentivize activity, with increases for the incentive periods ranging between ~$10 and ~$39, excluding the outperformer which achieved volume growth of ~$93 per ARB spent. However, these figures began converging in the months following the STIP’s conclusion. At the end of June 2024, the absolute changes in volume compared to the beginning of each perp DEX’s incentive program per ARB spent were ~$6, ~$1, ~-$2, and ~-$21, excluding the outperformer, for which the figure was at ~$55. In other words, projects have generally achieved similar returns when looking at a longer time period.

Source: DefiLlama, Dune, Arbitrum Forum, Arbiscan, Blockworks Research Analysis

Spot DEXs

TVL

Within the spot DEX vertical, there was one outperformer in terms of relative TVL growth, and this growth has sustained post-incentives. Similarly to the outperformer within the perp DEX vertical, this protocol is on the smaller/younger side, meaning that only looking at relative growth can be misleading. Nevertheless, incentives have facilitated the protocol to achieve a sustainable market share increase, although this is unlikely to be the main driver for outperformance.

For the rest of the protocols, all of which are Round 1 recipients, relative TVL growth moved in tandem until the end of March 2024, when one spot DEX lost notably more liquidity compared to the rest of the group. This protocol is the only spot DEX that had a lower TVL at the end of June 2024 compared to the beginning of September 2023. Compared to perp DEXs, this vertical’s USD-denominated TVL is even more heavily driven by volatile crypto asset prices. As crypto prices generally increased in Q4 ‘23 and Q1 ‘24, USD-denominated TVL would have increased even with no asset inflows to spot DEXs. Looking at TVL denominated in ETH, only the outperformer’s figures are up from September 2023 values.

Source: DefiLlama & Token Terminal

Source: DefiLlama, CoinGecko, Token Terminal

All of the analyzed spot DEXs used all of their incentives to directly increase proprietary liquidity. The below two graphs are great examples of why it’s helpful to normalize high-level metrics by the ARB allocation size. As mentioned earlier, one spot DEX outperformed when it comes to relative TVL growth. However, the same protocol also received the largest ARB allocation relative to its TVL at the beginning of September 2023, with one ARB allocated per ~$4 of liquidity. In comparison, the other spot DEXs received one ARB per ~$19 to ~$71 of liquidity.

Source: DefiLlama, Token Terminal, Arbitrum Forum, Arbiscan, Blockworks Research Analysis

Performance relative to the allocation size varied widely between protocols at the end of the STIP, with Round 1 recipients having reached notably stronger results. However, three months later, all except for one protocol’s figures had converged to similar levels at ~$13, ~$17, and ~$20, while the underperformer was at ~-$25. In other words, the long-term benefit of one ARB spent has been quite similar across most spot DEXs. It’s worth noting that the underperformer utilized a predetermined formula based on factors other than just fees generated across pools, which might have been gameable and attracted mercenary capital, possibly explaining the drastic drawdown once the STIP ended. However, this isn’t something we can claim to be objectively true based solely on the data presented. In contrast, the other spot DEX that allocated LP incentives based on a predetermined formula did so purely based on fees generated. The below graph also exemplifies why purely examining growth figures achieved during the incentive program could be misleading. While one spot DEX outperformed when looking at values from the end of March, its TVL performance per ARB spent has actually been the weakest in the long term.

Source: DefiLlama, Token Terminal, Arbitrum Forum, Arbiscan, Blockworks Research Analysis

Volume

As mentioned earlier, no spot DEXs directly incentivized traders with ARB. Despite this, all protocols’ volume figures grew notably during the program, with two projects having outperformed and largely maintained volume after the program ended. The two other protocols also saw a clear uplift in volume during the program, generally coinciding with wider market trends, but the drawdown post-STIP has been more drastic on a relative basis than for the two outperformers, both of which are newer protocols. It should be noted that the active liquidity management protocol Gamma was exploited on January 4, 2024, which led to abnormally high volumes on that day for a few protocols analyzed here.

Source: DefiLlama, Token Terminal

Source: DefiLlama, Token Terminal

Again, market conditions in Q4 ‘23 and Q1 ‘24 were favorable for spot DEX volume, and relative growth should be expected for all projects when looking at USD-denominated values. Having said that, instead of simply benefitting from increased asset prices and more volatility, the two outperformers have likely grown by also increasing their market penetration, exceeding the market’s expansion. Looking at ETH-denominated volume, the two outperformers have largely maintained their relative growth, while volume for the two other projects has returned to September 2023 levels.

Source: DefiLlama, CoinGecko, Token Terminal

Source: DefiLlama, CoinGecko, Token Terminal


STIP Retroactive Analysis – Yield Aggregators TVL

The below research report is also available in document format here.

TL;DR

In H2 2023, Arbitrum launched the Short-Term Incentive Program (STIP) by distributing millions of ARB tokens to various protocols to drive user engagement. This report focuses on how the STIP impacted TVL in the yield aggregator vertical, specifically examining the performance of Gamma, Jones DAO, Solv Protocol, Stella, and Umami. By employing the Synthetic Control (SC) causal inference method to create a “synthetic” control group, we aimed to isolate the STIP’s effect from broader market trends. For each protocol, the analysis focuses on the median TVL in the period from the first day of the STIP to two weeks after the STIP ended, in an effort to include at least two weeks of persistence in the analysis.

Yield aggregator protocols utilized their STIP allocations to incentivize depositors, leading to positive impacts on their TVL. Our analysis yielded varied results: Solv Protocol, Gamma, and Jones DAO experienced increases in their median TVL, directly attributed to the STIP, of $18.6M, $16.5M, and $12.6M, respectively, from the start of the STIP to two weeks after its conclusion. Stella and Umami also benefited, with $2.4M and $1M in additional TVL, respectively, linked to STIP incentives. During the STIP period and the two weeks following its conclusion, the added TVL per dollar spent on incentives was approximately $18 for Gamma, $5 for Jones DAO, $103 for Solv Protocol, $11 for Stella, and $1 for Umami Finance.

While the STIP positively impacted all protocols when considering the median TVL of the corresponding period, a different picture emerges when looking at growth from the program’s start to two weeks after its end. In this “before and after” comparison, the STIP slightly negatively impacted TVL growth for Solv Protocol (-8%) and had a minimal positive impact for Stella and Jones DAO (1%), even though Stella had a large TVL increase. In contrast, Umami experienced the most significant growth due to the STIP, with a 120% increase (largely indirectly due to GMX’s grant), followed by Gamma with a 45% increase.

It becomes clear that sustained growth, or “stickiness,” is not correlated with each protocol’s success during the program. This is further underscored by comparing the growth with the TVL one month after the STIP, where all protocols experienced a much smaller increase. In every instance, the boost from incentives proved to be somewhat temporary, with only a small portion of the growth remaining a month after the STIP concluded.

The analysis indicates that direct incentives to LPs can yield substantial TVL growth and efficient use of funds. Flexibility in strategy also proved beneficial, as Umami Finance’s switch to direct ARB emissions significantly boosted their TVL. Stella’s balanced approach of splitting incentives between strategies and lending pools also led to a notable increase. Overall, the findings suggest that smaller protocols have more room for rapid growth when given substantial incentives, while larger protocols, like Solv Protocol, might benefit from a more proportional allocation to maximize efficiency. Protocols should reward users directly, uniformly over time, and transparently for providing liquidity while maintaining flexibility to adapt to feedback.

Our methodology and detailed results underscore the complexities of measuring such interventions in a volatile market, stressing the importance of comparative analysis to understand the true impact of incentive programs like STIP.

Context and Goals

In H2 2023, Arbitrum initiated a significant undertaking by distributing millions of ARB tokens to protocols as part of the Short-Term Incentive Program (STIP), aiming to spur user engagement. This program allocated varying amounts to diverse protocols across different verticals. Our objective is to gauge the efficacy of these recipient protocols in leveraging their STIP allocations to boost the usage of their products. The challenge lies in accurately isolating the impact of the STIP amidst a backdrop of various factors, including broader market conditions.

This report pertains to the yield aggregator vertical in particular. In this vertical, the STIP recipients were Gamma, Jones DAO, Solv Protocol, Stella and Umami. Stake DAO faced some KYC issues, which significantly delayed the start of the program. As a result, Stake DAO was only able to distribute incentives for three weeks concurrently with other protocols, and its distribution is still ongoing. The following table summarizes the amount of ARB tokens received and when they were used by each protocol.

For yield aggregators, TVL is a highly relevant metric. These protocols aim to facilitate “deploy and forget” strategies that minimize user interaction, making metrics like transactions or fees less pertinent. Throughout the report, the 7-day moving average (MA) TVL was used, so any mention of TVL should be understood as the 7-day MA TVL.
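As an illustration, the smoothing step can be sketched in a few lines of pandas (made-up numbers, not the report's dataset):

```python
import pandas as pd

# Hypothetical daily TVL series in USD millions (illustrative only).
tvl = pd.Series([10.0, 12.0, 11.0, 13.0, 14.0, 15.0, 16.0, 18.0],
                index=pd.date_range("2023-11-15", periods=8, freq="D"))

# 7-day moving average; min_periods=1 keeps the first days defined.
tvl_7d_ma = tvl.rolling(window=7, min_periods=1).mean()
print(round(tvl_7d_ma.iloc[-1], 2))  # mean of the last seven days ≈ 14.14
```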

We used a Causal Inference method called Synthetic Control (SC) to analyze our data. This technique helps us understand the effects of a specific event by comparing our variable of interest to a “synthetic” control group. Here’s a short breakdown:

  • Purpose: SC estimates the impact of a particular event or intervention using data over time.
  • How It Works: It creates a fake control group by combining data from similar but unaffected groups. This synthetic group mirrors the affected group before the event.
  • Why It Matters: By comparing the real outcomes with this synthetic control, we can see the isolated effect of the event.

In our analysis, we use data from other protocols to account for market trends. This way, we can better understand how protocols react to changes, like the implementation of the STIP, by comparing their performance against these market-influenced synthetic controls. The results pertain to the period from the start of each protocol’s use of the STIP until two weeks after the STIP had ended.

Results

Gamma

Gamma, a protocol specializing in active liquidity management and market-making strategies, offers non-custodial, automated, and active concentrated liquidity management services. Gamma is supported on fourteen different networks, including Arbitrum, where it launched on November 1, 2022.

The STIP aimed to distribute ARB tokens to liquidity providers (LPs) who participated in qualified Gamma vaults. These vaults were built on the liquidity pools of six supported AMMs: Uniswap V3, Sushiswap V3, Ramses, Camelot, Zyberswap, and Pancakeswap. 100% of the incentives were allocated to LPs.

The primary objective of the program was to enhance liquidity on the Arbitrum network by deploying incentives on three native AMMs (Ramses, Camelot, and Zyberswap) and three non-native AMMs (Uniswap, Sushiswap, and Pancakeswap). Gamma engaged in discussions with partner AMMs to identify suitable pools to incentivize, focusing on under-capitalized pools based on their analysis of trading activity on the Arbitrum network.

Gamma’s methodology for selecting pairs considered various factors. It prioritized pairs native to Arbitrum, which typically required more liquidity and were under-capitalized. Critical infrastructure pairs, such as WETH, ARB, WBTC, and stablecoins, were also a focus, given their regular use by most users. Additionally, Gamma considered pairs that aligned with the strengths of the AMM they were on, while avoiding pairs already incentivized by other parties or overcapitalized ones. The incentive structure was designed to ensure that AMMs did not work against each other inefficiently. No grant-matching funds were made available.

Gamma’s TVL grew from approximately $17.7M on November 15, 2023, when the STIP began, to around $43.4M at its peak in late January 2024. By the end of the STIP on March 20, 2024, the TVL was $38.0M. One week later, on March 27, the TVL was $35.3M, and two weeks after the STIP ended, on April 3, it dropped to $33.4M. By one month after the STIP concluded, on April 17, the TVL had decreased to $20.5M.

This represents a 114.2% increase in TVL from before the STIP to after it. Comparing the start of the STIP to the TVL one week after its end shows a 99.1% increase, while two weeks after the STIP ended, there was an 88.4% increase.

It’s worth noting that on January 4, 2024, Gamma temporarily halted deposits into their vaults due to an issue affecting four of the stable and LST vaults. Following OpenZeppelin’s investigation and their confirmation that Gamma’s mitigation was effective, deposits resumed on January 23rd.

The median impact of the STIP on Gamma’s TVL, from its start on November 15, 2023, to its end on March 20, 2024, was $16.2M. Including the TVL two weeks after the STIP concluded, the impact was $16.5M. Gamma received a total of 750k ARB, valued at approximately $900k (at $1.20 per ARB). This indicates that the STIP generated an average of $18.3 in TVL per dollar spent during its duration and the following two weeks. These results are statistically significant at the 95% level.
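The efficiency figure follows from straightforward arithmetic, which can be reproduced as a quick check using the values reported above:

```python
# Figures from the report: Gamma's median TVL impact and grant size.
median_tvl_impact = 16.5e6      # USD, STIP duration plus two weeks
arb_received = 750_000          # ARB
arb_price = 1.20                # USD per ARB, as assumed in the report

grant_value_usd = arb_received * arb_price            # $900k
tvl_per_dollar = median_tvl_impact / grant_value_usd
print(round(tvl_per_dollar, 1))  # → 18.3
```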

Jones DAO

Jones DAO, a yield, strategy, and liquidity protocol, offers vaults that provide easy access to different strategies, aiming to enhance liquidity and capital efficiency for DeFi through yield-bearing tokens. In the STIP, Jones DAO requested 2 million ARB tokens to be allocated as follows: 82.5% for user incentives in current vaults and 17.5% for user incentives in future vaults.

A significant portion of these incentives was allocated to GLP-related products, while GMX focused on V2 growth in its program. Since future products were not released in time, all the incentives were ultimately distributed to existing products. Jones DAO’s execution strategy aimed to distribute 100% of the ARB allocation directly to Jones Vault users and the lending strategies built upon these vaults.

The distribution of ARB tokens was designed to align with the current yield distribution methods of Jones strategies. For instance, if a vault distributed yield weekly, ARB tokens would also be distributed weekly. Conversely, if yield distribution was constant, ARB tokens would be streamed continuously.

Jones DAO concluded that reducing the relative percentage of rewards per category and focusing more on integrations within the Arbitrum ecosystem, rather than solely on native farms, could enhance capital efficiency.

Jones DAO’s TVL grew from approximately $15.6M on November 28, 2023, when the STIP began, to around $30.7M at its peak in late January 2024. By the end of the STIP on March 29, 2024, the TVL was $25.2M. One week later, on April 5, the TVL was $18.2M, and two weeks after the STIP ended, on April 12, it dropped to $14.4M. By one month after the STIP concluded, on April 29, the TVL had decreased to $12.8M.

This represents a 61.8% increase in TVL from before the STIP to after it. Comparing the start of the STIP to the TVL one week after its end shows a 16.5% increase, while two weeks after the STIP ended, there was a 7.6% decrease.

The median impact of the STIP on Jones DAO’s TVL, from its start on November 28, 2023, to its end on March 29, 2024, was $12.8M. Including the TVL two weeks after the STIP concluded, the impact was $12M. Jones DAO received a total of 2M ARB, valued at approximately $2.4M (at $1.20 per ARB). This indicates that the STIP generated an average of $5.25 in TVL per dollar spent during its duration and the following two weeks. These results are statistically significant at the 95% level.

Solv Protocol

Solv Protocol has launched innovative products like Vesting Vouchers, Bond Vouchers, and Fund SFT for on-chain funds. Solv V3 offers a transparent platform where global institutions and retail investors can access a variety of trusted crypto investments. It also supports fund managers in raising capital and establishing on-chain credibility.

In the STIP, Solv Protocol planned to issue multiple DeFi market-making funds designed to provide users with consistent and appealing returns in a controlled environment with relatively low risk, exemplified by the open-end GMX fund with a $20,000,000 capacity. To enhance yield returns and bootstrap token governance, Solv incorporated a plan for token emissions, with the addition of ARB tokens aiming to attract high-value Arbitrum users.

100% of the allocated ARB (150,000 ARB) was designated as extra incentives for fund products on Solv Arbitrum, split equally between Offchain/RWA funds and Onchain Delta Neutral Strategy funds. The ARB incentives were proportionally allocated among users based on their cumulative daily holdings across vaults on Arbitrum. These incentives were airdropped directly to Solv vault investors after the completion of each of the three distribution epochs.
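The pro-rata mechanics described can be sketched as follows. The holdings are hypothetical, and the equal split of the 150k ARB across the three epochs is an assumption made for illustration; the report does not state the per-epoch amounts:

```python
# Hypothetical cumulative daily holdings (vault-token days) per user.
holdings = {"user_a": 120.0, "user_b": 60.0, "user_c": 20.0}
epoch_incentives = 50_000.0  # ARB; assumes 150k ARB split equally over 3 epochs

# Each user's airdrop is proportional to their share of total holdings.
total = sum(holdings.values())
payouts = {user: epoch_incentives * h / total for user, h in holdings.items()}
print(payouts["user_a"])  # → 30000.0
```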

Additionally, Solv Protocol confirmed its commitment to grant matching with future token issuance.

Solv Protocol’s TVL grew from approximately $71.3M on January 1, 2024, when the STIP began, to around $122.6M at its peak in March 2024. By the end of the STIP on March 29, 2024, the TVL was $106.1M. One week later, on April 5, the TVL was $105.2M, and two weeks after the STIP ended, on April 12, it was $105.9M. By one month after the STIP concluded, on April 26, the TVL had decreased to $89.7M.

This represents a 48.8% increase in TVL from before the STIP to after it. Comparing the start of the STIP to the TVL one week after its end shows a 47.5% increase, while two weeks after the STIP ended, the increase remained steady at around 48.5%.

The median impact of the STIP on Solv Protocol’s TVL, from its start on January 1, 2024, to its end on March 29, 2024, was $25.1M. Including the TVL two weeks after the STIP concluded, the impact was $18.6M. Solv Protocol received a total of 150k ARB, valued at approximately $180k (at $1.20 per ARB). This indicates that the STIP generated an average of $103.5 in TVL per dollar spent during its duration and the following two weeks. These results are statistically significant at the 90% level.

Stella

Stella, a leveraged yield farming protocol on Arbitrum, offers 0% cost to borrow and enables leveraged strategies on yield sources like Uniswap DEXs, TraderJoe liquidity book, and the Pendle PT pools.

The protocol is divided into two parts: Stella Strategies (leveraged strategies) and Stella Lend (lending pools). A total of 186,000 ARB tokens were allocated as incentives, distributed between these two parts as follows:

  • Approximately 66,000 ARB tokens were designated for Stella Strategies, providing an additional 20% yield for profitable leveraged positions. This incentive was exclusive to profitable positions to prevent sybil attacks and encourage good behavior.
  • Around 120,000 ARB tokens were allocated to Stella Lending pools. The exact incentive amount for each pool was determined dynamically based on what seemed appropriate.

Stella aimed to stimulate both the strategy and lending sides, initiating a positive feedback loop for growth.

According to Stella, this experience highlighted the need to adjust protocol mechanics to make lending more attractive, as the borrow capacity was consistently maxed out while lending liquidity lagged. To address this, Stella implemented an “airdrop points sharing” system where lenders earned 50% of points from EigenLayer and LRT, enhancing the appeal of lending.

The ARB incentives were distributed innovatively. For Stella Strategies, the incentives were auto-deposited into the ARB lending pool on Stella with a linear vesting period of 30 days, preventing immediate dumping and allowing leverage users to earn additional lending yields over this period. This approach also helped bootstrap liquidity in the ARB lending pool, benefiting both the lending and leveraged farming sides.

Stella’s TVL grew from approximately $2.3M on November 3, 2023, when the STIP began, to around $9.3M at its peak in March 2024. By the end of the STIP on March 29, 2024, the TVL was $7.2M. One week later, on April 5, the TVL was $6.0M, and two weeks after the STIP ended, on April 12, it dropped to $5.1M. By one month after the STIP concluded, on April 29, the TVL had decreased to $3.8M.

This represents a 211.3% increase in TVL from before the STIP to after it. Comparing the start of the STIP to the TVL one week after its end shows a 158.9% increase, while two weeks after the STIP ended, the increase was 120.5%.

The median impact of the STIP on Stella’s TVL, from its start on November 3, 2023, to its end on March 29, 2024, was $2.5M. Including the TVL two weeks after the STIP concluded, the impact was $2.4M. Stella received a total of 186k ARB, valued at approximately $223k (at $1.20 per ARB). This indicates that the STIP generated an average of $10.8 in TVL per dollar spent during its duration and the following two weeks. These results are statistically significant at the 95% level.

Umami Finance

Umami Finance implemented an oARB emissions tool, inspired by Dolomite’s system, to distribute their STIP allocation. This tool allowed users to stake their Umami vault receipt tokens for oARB emissions, which were continuously emitted and could be vested on a first-come, first-served basis. The emitted oARB could be staked for a duration of up to four months, in weekly increments, paired with an equal amount of ARB. After the vesting period, users could obtain the underlying ARB at a discounted price, with the discount increasing by 2.5% per additional week staked, subject to change based on feedback. However, if the ARB rewards pool was depleted, all remaining oARB tokens would expire worthless. The staked ARB paired with oARB was deposited back into Umami to improve capital efficiency for farmers.
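As an illustration of the vesting mechanics, the discount schedule might look like the sketch below. The function name and the baseline discount are assumptions; the report only specifies a 2.5% increment per additional week staked, in weekly steps up to roughly four months:

```python
def oarb_discount(extra_weeks: int, base_discount: float = 0.025) -> float:
    """Hypothetical oARB redemption discount after `extra_weeks` staked.

    The report states only that the discount grows by 2.5% per
    additional week staked; the base discount here is an assumption.
    """
    capped = min(max(extra_weeks, 0), 16)  # ~four months of weekly steps
    return base_discount + 0.025 * capped

# Each extra week staked adds 2.5 percentage points of discount.
print(oarb_discount(0), oarb_discount(4))  # → 0.025 0.125
```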

The implementation of oARB emissions faced challenges, particularly with the availability of ARB from vesting contracts. While early depositors initially enjoyed high returns, issues arose towards the end of the period as more users opted for the non-ETH investment 40-week option, necessitating a tapering of emissions. Looking ahead, Umami Finance decided to adopt a direct incentive approach with dynamic incentives, instead of the oARB incentive mechanism.

Umami Finance also used GMX’s grant of 100,000 ARB, and so was able to delay ARB yield from the STIP and utilize direct ARB emissions with the GMX grant for 45 days. 702,775 ARB was distributed through the scaling GLP vaults with the oARB emissions program, which concluded on January 26th. After this, the remaining 47,225 tokens from the STIP together with GMX’s 100k ARB were allocated to GM Vault direct emissions.

Umami’s TVL grew from approximately $3.3M on November 13, 2023, when the STIP began, to around $11.2M at its peak in February 2024. By the end of the STIP on March 29, 2024, the TVL was $10.5M. One week later, on April 5, the TVL was $11.1M, and two weeks after the STIP ended, on April 12, it dropped to $10.6M. By one month after the STIP concluded, on April 29, the TVL was at $10.7M.

This represents a 220.8% increase in TVL from before the STIP to after it. Comparing the start of the STIP to the TVL one week after its end shows a 238.6% increase, while two weeks after the STIP ended, the increase was 223.7%.

The median impact of the STIP on Umami’s TVL, from its start on November 13, 2023, to its end on March 29, 2024, was $726.7k. Including the TVL two weeks after the STIP concluded, the impact was $1.2M. Umami Finance received a total of 750k ARB, valued at approximately $900k (at $1.20 per ARB). This indicates that the STIP generated an average of $1.2 in TVL per dollar spent during its duration and the following two weeks. These results are statistically significant at the 95% level.

Main Takeaways

Our analysis produced interesting results for the impact of the STIP on Gamma, Jones DAO, Solv Protocol, Stella and Umami. The table below summarizes these results. A further explanation of how this and all results were derived can be found in the Methodology section.

A summary of the key differentiators in incentive allocation is shown in the table below.

All yield aggregator protocols used the STIP allocation to provide direct incentives to their depositors, resulting in a positive impact on their TVL. Notably, these protocols vary significantly in the products they offer and their operational methods. While diversity exists in other DeFi verticals as well, it is particularly pronounced among yield aggregators. For instance, although classified as yield aggregators within the context of STIP and the Arbitrum DAO, entities like DefiLlama categorize them differently, including them in various verticals such as yield protocols, liquidity managers, RWA, and leveraged farming.

Smaller protocols like Stella (starting at $2.3M) and Umami ($3.3M) saw the largest TVL increases from the start of the STIP to their respective peak TVLs. However, Umami maintained its TVL increase in a much steadier fashion even two weeks after the STIP. While the STIP generated a 120% TVL growth for Umami, Stella only saw a 1% increase in this period. Despite this, with Umami’s grant size being four times that of Stella, the efficiency of the STIP in terms of median TVL added per dollar was much higher for Stella, at $10.83 compared to Umami’s $1.20. This median TVL considers the entire duration of the STIP plus two weeks post-STIP.

Gamma’s TVL growth from start to peak was the third largest, further supporting the idea that smaller protocols benefit proportionally more from the program. Nearly all of Gamma’s growth, 116%, can be attributed to the STIP, compared to its total growth of 145%. Even considering the growth from start to two weeks after the STIP’s end, the 45% increase attributed to the STIP is significant. Gamma’s median added TVL attributed to the STIP was $18.33 per dollar spent, higher than both Stella and Umami due to its efficient allocation.

Jones DAO started with a TVL similar to Gamma but experienced less growth from start to peak, with a 55% increase attributed to the STIP during this period. Notably, there was a decrease in TVL when comparing two weeks post-STIP to the pre-STIP period. However, the STIP still had a positive impact, albeit small, of 1%. The median TVL added by the STIP was $12.6M, resulting in $5.25 TVL per dollar, which is on the lower end for this group. The team concluded that focusing more on integrations within the Arbitrum ecosystem, rather than solely on native farms, could enhance capital efficiency.

Solv Protocol began the STIP with nearly five times the TVL of Jones DAO or Gamma, yet still saw a comparable increase of 48.6% during the STIP period and two weeks after. Although the absolute median TVL added was the largest, it wasn’t substantially higher than Gamma or Jones DAO when considering the higher starting point. Our analysis reveals an 8% TVL decrease attributed to the STIP in the period after two weeks and only a 2% increase at its peak TVL. This suggests that while the STIP initially boosted growth, it stagnated towards the end. This stagnation might be due to the uneven distribution method over three epochs, with the first batch of incentives released only on March 5, 2024, after three months of user participation. The two remaining batches were all distributed within March. Despite this, Solv Protocol achieved a median of $103.47 added TVL per dollar spent. Notably, Solv’s allocation was the smallest at 150k ARB, similar to Stella’s 186K ARB and much smaller than Jones DAO’s 2M ARB. A more proportional allocation might have enabled the protocol to perform better.

Direct incentives to LPs seem effective: Gamma, which allocated 100% of incentives directly to LPs, saw a significant TVL impact and a good TVL per dollar spent ratio. Diversified incentive allocation can also be beneficial: Stella, which split incentives between strategies and lending pools, achieved the highest TVL increase and a high STIP impact. This indicates that a balanced approach targeting different aspects of the protocol can yield positive results.

Flexibility and adaptability are crucial: Umami Finance’s experience demonstrates that adjusting strategies—such as switching from oARB emissions to direct ARB emissions—can lead to significantly better results. They saw the highest impact on their TVL after this change, indicating that direct emissions are much better received than strategies involving lock and vesting, like oARB. The substantial increase in TVL attributed to the STIP two weeks after its end, which includes the GMX grant, highlights this point. However, because Umami’s initial allocation was quite large, especially compared to similarly sized protocols like Stella, the efficiency measured by added TVL per dollar was relatively low.

The analysis suggests that smaller protocols might have more room for rapid growth when given significant incentives. However, while smaller protocols showed higher percentage overall growth, larger protocols often saw more significant absolute increases in TVL per dollar spent. This is an important distinction when evaluating the impact of incentives. A more proportional allocation of incentives might have enhanced the efficiency of distribution, as evidenced by Solv Protocol’s high added TVL per dollar compared to Umami’s lower efficiency.

Overall, the most successful approaches appear to be those that directly, uniformly over time and transparently reward users for providing liquidity, while maintaining the flexibility to adapt to changing market conditions or user behaviors. Protocols that can balance simplicity with strategic allocation across their ecosystem seem to achieve the best results in terms of TVL growth and efficient use of incentives.

It’s also interesting to note that most protocols’ TVL, normalized by ARB allocation, converges to values between $5 and $18 per ARB spent. Solv Protocol is an outlier due to its disproportionate allocation relative to its size. This indicates that, while the liquidity benefit of one ARB spent varies significantly throughout the program, the long-term benefit remains fairly consistent across most yield aggregators.

Methodology

TLDR: We employed an analytical approach known as the Synthetic Control (SC) method. The SC method is a statistical technique utilized to estimate causal effects resulting from binary treatments within observational panel (longitudinal) data. Regarded as a groundbreaking innovation in policy evaluation, this method has garnered significant attention in multiple fields. At its core, the SC method creates an artificial control group by aggregating untreated units in a manner that replicates the characteristics of the treated units before the intervention (treatment). This synthetic control serves as the counterfactual for a treatment unit, with the treatment effect estimate being the disparity between the observed outcome in the post-treatment period and that of the synthetic control. In the context of our analysis, this model incorporates market dynamics by leveraging data from other protocols (untreated units). Thus, changes in market conditions are expected to manifest in the metrics of other protocols, thereby inherently accounting for these external trends and allowing us to explore whether the reactions of the protocols in the analysis differ post-STIP implementation.

To achieve the described goals, we turned to causal inference. Knowing that “association is not causation”, the study of causal inference lies in techniques that try to figure out how to make association be causation. The classic notation of causality analysis revolves around a certain treatment T, which doesn’t need to be related to the medical field, but rather is a generalized term used to denote an intervention whose effect we want to study. We denote the treatment intake for unit i as T_i, which is 1 if unit i received the treatment and 0 otherwise. Each unit also has an observed outcome variable Y_i. This is our variable of interest, i.e., we want to understand what the influence of the treatment on this outcome was. The fundamental problem of causal inference is that one can never observe the same unit with and without treatment, so we express this in terms of potential outcomes. We are interested in what would have happened had some treatment been taken. It is common to call the potential outcome that happened the factual, and the one that didn’t happen the counterfactual. We will use the following notation:

Y_i(0) - the potential outcome for unit i without treatment;

Y_i(1) - the potential outcome for the same unit i with the treatment.

With these potential outcomes, we define the individual treatment effect to be τ_i = Y_i(1) − Y_i(0). Because of the fundamental problem of causal inference, we will never actually know the individual treatment effect, because only one of the potential outcomes is observed.

One technique used to tackle this is Difference-in-Differences (or diff-in-diff). It is commonly used to analyze the effect of macro interventions, such as the effect of immigration on unemployment or of law changes on crime rates, but also the impact of marketing campaigns on user engagement. There is always a period before and after the intervention, and the goal is to extract the impact of the intervention from a general trend. Let Y_D(T) be the potential outcome for treatment D in period T (0 for pre-intervention and 1 for post-intervention). Ideally, we would be able to observe the counterfactual and estimate the effect of an intervention as E[Y_1(1) − Y_0(1) | D=1], the causal effect being the outcome in the post-intervention period in the case of treatment minus the outcome in the same period in the case of no treatment. Naturally, E[Y_0(1) | D=1] is counterfactual, so it can’t be measured. If we take a simple before-and-after comparison, E[Y(1) | D=1] − E[Y(0) | D=1], we can’t really say anything about the effect of the intervention, because other external trends could be affecting that outcome.

The idea of diff-in-diff is to compare the treated group with an untreated group that didn’t get the intervention, replacing the missing counterfactual as E[Y_0(1) | D=1] ≈ E[Y(0) | D=1] + (E[Y(1) | D=0] − E[Y(0) | D=0]). We take the treated unit before the intervention and add a trend component to it, estimated using the control group. We are basically saying that the treated unit after the intervention, had it not been treated, would look like the treated unit before the treatment plus a growth factor that is the same as the growth of the control.
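With made-up numbers, the diff-in-diff counterfactual works as follows:

```python
# Illustrative pre/post outcome means for a treated and a control unit.
treated_pre, treated_post = 10.0, 18.0
control_pre, control_post = 8.0, 12.0

# Counterfactual: the treated unit's pre level plus the control's trend.
counterfactual_post = treated_pre + (control_post - control_pre)  # 14.0

# Diff-in-diff effect estimate: observed post minus counterfactual.
did_effect = treated_post - counterfactual_post
print(did_effect)  # → 4.0
```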

An important thing to note here is that this method assumes that the trends in the treatment and control are the same. If the growth trend from the treated unit is different from the trend of the control unit, diff-in-diff will be biased. So, instead of trying to find a single untreated unit that is very similar to the treated, we can forge our own as a combination of multiple untreated units, creating a synthetic control.

That is the intuitive idea behind using synthetic control for causal inference. Assume we have J + 1 units and that unit 1 is affected by an intervention. Units 2, …, J + 1 are a collection of untreated units that we will refer to as the “donor pool”. Our data spans T time periods, with T_0 periods before the intervention. For each unit j and each time t, we observe the outcome Y_jt. We define Y^N_jt as the potential outcome without intervention and Y^I_jt as the potential outcome with intervention. Then, the effect for the treated unit at time t, for t > T_0, is defined as τ_1t = Y^I_1t − Y^N_1t. Here Y^I_1t is factual, but Y^N_1t is not. The challenge lies in estimating Y^N_1t.

Source: 15 - Synthetic Control — Causal Inference for the Brave and True

Since the treatment effect is defined for each period, it doesn’t need to be instantaneous; it can accumulate or dissipate. The problem of estimating the treatment effect boils down to estimating what would have happened to the outcome of the treated unit, had it not been treated.

The most straightforward approach is to consider that a combination of units in the donor pool may approximate the characteristics of the treated unit better than any untreated unit alone. So we define the synthetic control as a weighted average of the units in the donor pool: given weights w_2, …, w_{J+1}, the synthetic control estimate of Y^N_1t is Ŷ^N_1t = Σ_{j=2}^{J+1} w_j Y_jt.

We can estimate the optimal weights with OLS, as in any typical linear regression, by minimizing the squared distance between the weighted average of the units in the donor pool and the treated unit over the pre-intervention period. This creates a “fake” unit that resembles the treated unit before the intervention, so we can see how it would have behaved in the post-intervention period.
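A minimal sketch of this unconstrained estimation, with illustrative numbers (not protocol data):

```python
import numpy as np

# Illustrative pre-intervention outcomes: rows are time periods,
# columns are the two donor-pool units.
donors_pre = np.array([[1.0, 2.0],
                       [2.0, 3.0],
                       [3.0, 5.0]])
treated_pre = np.array([1.5, 2.5, 4.0])

# Unconstrained OLS weights minimizing the pre-period squared distance
# between the weighted donor combination and the treated unit.
weights, *_ = np.linalg.lstsq(donors_pre, treated_pre, rcond=None)
print(weights)  # ≈ [0.5, 0.5]: here the treated unit is their average
```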

In the context of our analysis, this means that we can include all other yield aggregator protocols that did not receive the STIP in our donor pool and estimate a “fake”, synthetic, control yield aggregator protocol that follows the trend of any particular one we want to study in the period before it received the STIP. As mentioned before, the metric of interest chosen for this analysis was TVL; in particular, we calculated the 7-day moving average to smooth the data. We can then compare the behavior of our synthetic control with the factual outcome and estimate the impact of the STIP by taking the difference. We are essentially comparing what would have happened had the protocol not received the STIP with what actually happened.

However, regression sometimes leads to extrapolation, i.e., values outside the range of our initial data that may not make sense in our context. This happened when estimating our synthetic control, so we constrained the model to perform only interpolation. This means we restrict the weights to be positive and to sum to one, so that the synthetic control is a convex combination of the units in the donor pool. The treated unit is thus projected onto the convex hull defined by the untreated units. As a result, there probably won’t be a perfect match for the treated unit in the pre-intervention period, and the solution can be sparse, as a face of the convex hull will sometimes be defined by only a few units. This works well because we don’t want to overfit the data. We will never know with certainty what would have happened without the intervention; under the stated assumptions, however, we can draw statistical conclusions.

Formalizing interpolation, the synthetic control is still defined in the same way, Ŷ^N_1t = Σ_{j=2}^{J+1} w_j Y_jt. But now we use the weights w* that minimize the squared distance between the weighted average of the units in the donor pool and the treated unit over the pre-intervention period, Σ_{t=1}^{T_0} (Y_1t − Σ_{j=2}^{J+1} w_j Y_jt)², subject to the restriction that the w_j are positive and sum to one.

We obtain the optimal weights using quadratic programming with the described constraints on the pre-STIP period, and then use these weights to calculate the synthetic control for the total duration of interest. We initialized the optimization for each analysis with different starting weight vectors to avoid introducing bias into the model and getting stuck in local minima, selecting the run that minimized the squared difference in the pre-intervention period.
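A minimal sketch of this constrained, multi-start fit on synthetic data, using scipy's SLSQP solver as a stand-in for a dedicated quadratic-programming routine (all data and weights below are made up for illustration):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Illustrative pre-STIP data: T0 periods for J donor-pool protocols, with a
# treated protocol built (by construction) as a convex mix of the donors.
T0, J = 60, 5
donors = rng.normal(1.0, 0.2, size=(T0, J)).cumsum(axis=0) + 100.0
true_w = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
treated = donors @ true_w + rng.normal(0.0, 0.5, T0)

def loss(w):
    # Squared pre-period distance between treated unit and weighted donor average.
    return np.sum((treated - donors @ w) ** 2)

bounds = [(0.0, 1.0)] * J                                       # w_j >= 0
constraints = [{"type": "eq", "fun": lambda w: w.sum() - 1.0}]  # weights sum to one

# Multi-start fit: several random initial weight vectors to avoid local minima
# and initialization bias, keeping the run with the smallest pre-period loss.
best = None
for _ in range(10):
    w0 = rng.dirichlet(np.ones(J))  # random starting point on the simplex
    res = minimize(loss, w0, method="SLSQP", bounds=bounds, constraints=constraints)
    if best is None or res.fun < best.fun:
        best = res

print(np.round(best.x, 2))  # convex weights defining the synthetic control
```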

As an example, below is the resulting chart for Gamma, showing the factual TVL observed in Gamma and the synthetic control.

With the synthetic control, we can then estimate the effect of the STIP as the gap between the factual protocol TVL and the synthetic control, τ̂_1t = Y_1t − Ŷ^N_1t.

To understand whether the result is statistically significant and not just due to randomness, we use the idea of Fisher's Exact Test. We permute the treated and control units exhaustively: for each unit, we pretend it is the treated one while the others are the control. We create one synthetic control and effect estimate for each protocol, pretending the STIP was given to that protocol, to calculate the estimated impact of this treatment that didn't happen. If the impact in the protocol of interest is sufficiently large when compared to these fake treatments ("placebos"), we can say our result is statistically significant and there is indeed an observable impact of the STIP on the protocol's TVL. The idea is that if there was no STIP in the other protocols and we used the same model to pretend that there was, we wouldn't see any impact.
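The placebo procedure can be sketched as follows on made-up data; `synthetic_gap` is an illustrative helper (using scipy's SLSQP for the constrained fit), not code from the actual analysis:

```python
import numpy as np
from scipy.optimize import minimize

def synthetic_gap(Y, treated_idx, T0):
    """Per-period gap between one unit and its convex synthetic control."""
    donors = np.delete(Y, treated_idx, axis=1)
    treated = Y[:, treated_idx]
    m = donors.shape[1]
    res = minimize(
        lambda w: np.sum((treated[:T0] - donors[:T0] @ w) ** 2),
        np.full(m, 1.0 / m),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * m,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
    )
    return treated - donors @ res.x

rng = np.random.default_rng(1)
T, T0, J = 100, 60, 8
Y = rng.normal(0.0, 1.0, (T, J)).cumsum(axis=0) + 100.0
Y[T0:, 0] += 15.0  # unit 0 receives a positive post-period "treatment" effect

# Fisher-style placebo test: pretend each unit in turn is the treated one and
# compare the sizes of the post-period effects.
effects = [np.abs(synthetic_gap(Y, j, T0)[T0:]).mean() for j in range(J)]
rank = sum(e >= effects[0] for e in effects)
print(f"pseudo p-value: {rank}/{J}")
```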

References

Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.

Aayush Agrawal - Causal inference with Synthetic Control using Python and SparseSC

01 - Introduction To Causality — Causal Inference for the Brave and True


STIP Retroactive Analysis – Sequencer Revenue

The below research report is also available in document format here.

TL;DR

Our findings show that 43% of Arbitrum's revenue between November 2023 and the Dencun upgrade was attributable to the STIP, with $15.2M recouped from sequencer revenue against the $85.2M spent. Although the STIP had a positive short-term impact on market presence, its long-term effectiveness remains uncertain. The program likely helped maintain Arbitrum's prominence and market share amidst intensifying competition from other L2s, influencing, among other things, protocols' decisions on where to launch. However, the $60M net loss signals the need for better-structured incentive programs in the future.

Context and Goals

Starting November 2023, Arbitrum launched the Short-Term Incentive Program (STIP), distributing millions of ARB tokens to various protocols to boost user engagement. This initiative allocated different amounts to a wide range of protocols across multiple sectors. Previously, Blockworks Research examined several protocols within specific verticals — perp DEXs, spot DEXs, and yield aggregators — to measure the STIP’s impact on key metrics.

In this analysis, we aim to assess the STIP’s overall effect on the Arbitrum network by examining its impact on sequencer revenue. The primary goal of the STIP was to attract more users to the recipient protocols and the broader ecosystem, fostering growth and activity. An increase in sequencer revenue would indicate a successful incentive program, where the costs of ARB incentives are at least partially offset by sequencer revenue in ETH.

Due to the Dencun upgrade, which significantly reduced fees across all L2s, the expected costs and revenue underwent considerable changes. Consequently, we set March 13, 2024, as the cutoff date for our analysis. Most protocols completed their distribution by March 29, 2024, ensuring that our cutoff includes the bulk of the incentive distribution period. The analysis period starts from November 2, 2023, when the first protocol began distributing its allocation.

Results

Below is the monthly sequencer revenue for the major L2 networks considered in our analysis. One can see the increase in Arbitrum's sequencer revenue dominance during the STIP.

Zooming in to daily revenue, smoothed with a 30-day moving average, gives a more detailed view of its evolution over time, as shown in the following chart.

Given that most L2 networks have not been active as long as Arbitrum, we used data starting from August 1, 2023 (excluding Blast) in order to include more networks in our modeling of Arbitrum's revenue. The result of the synthetic control compared to Arbitrum's own revenue is shown in the chart below.

By comparing the synthetic control, which represents the expected sequencer revenue for Arbitrum without the STIP, to the actual sequencer revenue observed during the same period, we can determine the STIP’s impact.

To evaluate the total cumulative impact, we calculate the area under the gap curve, which totals $15.2M. Comparing this to the total revenue of $35.1M during the period, we conclude that 43% of the revenue is attributable to the STIP. However, the total spent on the STIP was 71M ARB, equating to $85.2M at an average price of $1.20 per ARB.
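To make the arithmetic concrete: the cumulative impact is the sum of daily gaps between factual and synthetic revenue, and the attribution share divides that by total revenue over the period. The daily figures below are made up; only the headline numbers are the report's:

```python
# Cumulative impact = area under the daily gap curve (factual minus synthetic
# sequencer revenue); with daily data this reduces to a sum of daily gaps.
daily_gap_usd = [120_000, 95_000, 140_000]  # illustrative daily gaps, not real data
cumulative_impact = sum(daily_gap_usd)      # 355,000 in this toy example

# Attribution share, using the report's headline figures:
impact_usd, total_revenue_usd = 15.2e6, 35.1e6
share = impact_usd / total_revenue_usd
print(f"{share:.0%}")  # prints "43%"
```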

Main Takeaways

The analysis concluded that 43% of Arbitrum's revenue between November 2023 and the Dencun upgrade was attributable to the STIP. However, while $85.2M was spent on the STIP, only $15.2M was directly recouped through sequencer revenue. It's important to note that the primary goal of the Arbitrum STIP was to foster the ecosystem; even though the immediate revenue did not cover the cost of the STIP, there may be other long-lasting positive effects on the network. Still, the $60M loss is considerable and calls for better preparation of such programs by the DAO in the future.

Given the significant traction other L2s have gained since earlier this year, Arbitrum's competitive landscape has become more intense. The STIP likely helped maintain Arbitrum's prominence and market share, particularly influencing new protocols' decisions on which network to launch and generally where to focus their efforts. Although the STIP boosted Arbitrum's market presence in the short term, its ability to drive long-term sustainable growth and expand market share is still unclear. Several common themes emerged in both the previous STIP retroactive analysis and our operational analysis, attributing protocols' individual success under the program to various factors, especially the incentive mechanism used. These insights should inform future programs to minimize losses and maximize the effectiveness of ARB spending.

Additionally, evaluating persistent metrics like the number of new users who remained active in the ecosystem after onboarding through the STIP would offer valuable insights for future research.

Methodology

TL;DR: We employed an analytical approach known as the Synthetic Control (SC) method. The SC method is a statistical technique utilized to estimate causal effects resulting from binary treatments within observational panel (longitudinal) data. Regarded as a groundbreaking innovation in policy evaluation, this method has garnered significant attention in multiple fields. At its core, the SC method creates an artificial control group by aggregating untreated units in a manner that replicates the characteristics of the treated units before the intervention (treatment). This synthetic control serves as the counterfactual for a treatment unit, with the treatment effect estimate being the disparity between the observed outcome in the post-treatment period and that of the synthetic control. In the context of our analysis, this model incorporates market dynamics by leveraging data from other protocols (untreated units). Thus, changes in market conditions are expected to manifest in the metrics of other protocols, thereby inherently accounting for these external trends and allowing us to explore whether the reactions of the protocols in the analysis differ post-STIP implementation.

To achieve the described goals, we turned to causal inference. Knowing that "association is not causation", the study of causal inference lies in techniques that try to figure out how to make association be causation. The classic notation of causality analysis revolves around a certain treatment T, which doesn't need to be related to the medical field; rather, it is a generalized term used to denote an intervention whose effect we want to study. We typically consider the treatment intake T_i for unit i, which is 1 if unit i received the treatment and 0 otherwise. There is also Y_i, the observed outcome variable for unit i. This is our variable of interest, i.e., we want to understand what the influence of the treatment on this outcome was. The fundamental problem of causal inference is that one can never observe the same unit both with and without treatment, so we express this in terms of potential outcomes: we are interested in what would have happened had some treatment been taken (or not). It is common to call the potential outcome that happened the factual, and the one that didn't happen the counterfactual. We will use the following notation:

Y_0i - the potential outcome for unit i without treatment

Y_1i - the potential outcome for the same unit i with the treatment.

With these potential outcomes, we define the individual treatment effect to be τ_i = Y_1i − Y_0i. Because of the fundamental problem of causal inference, we will never actually know the individual treatment effect, because only one of the potential outcomes is observed.

One technique used to tackle this is Difference-in-Differences (or diff-in-diff). It is commonly used to analyze the effect of macro interventions, such as the effect of immigration on unemployment or of law changes on crime rates, but also the impact of marketing campaigns on user engagement. There is always a period before and after the intervention, and the goal is to extract the impact of the intervention from a general trend. Let Y_D(T) be the potential outcome for treatment D in period T (0 for pre-intervention and 1 for post-intervention). Ideally, we would have the ability to observe the counterfactual and estimate the effect of an intervention as E[Y_1(1) − Y_0(1) | D=1], the causal effect being the outcome in the post-intervention period in the case of treatment minus the outcome in the same period in the case of no treatment. Naturally, Y_0(1) for the treated is counterfactual, so it can't be measured. If we instead take a before-and-after comparison, E[Y(1) | D=1] − E[Y(0) | D=1], we can't really say anything about the effect of the intervention, because other external trends could be affecting that outcome.

The idea of diff-in-diff is to compare the treated group with an untreated group that didn't get the intervention, replacing the missing counterfactual as E[Y_0(1) | D=1] ≈ E[Y(0) | D=1] + (E[Y(1) | D=0] − E[Y(0) | D=0]). We take the treated unit before the intervention and add a trend component to it, estimated using the control: E[Y(1) | D=0] − E[Y(0) | D=0]. We are basically saying that the treated unit after the intervention, had it not been treated, would look like the treated unit before the treatment plus a growth factor that is the same as the growth of the control.

An important thing to note here is that this method assumes that the trends in the treatment and control are the same. If the growth trend from the treated unit is different from the trend of the control unit, diff-in-diff will be biased. So, instead of trying to find a single untreated unit that is very similar to the treated, we can forge our own as a combination of multiple untreated units, creating a synthetic control.
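A toy numeric example of this counterfactual construction (all numbers are illustrative):

```python
# Diff-in-diff sketch: the counterfactual for the treated unit is its
# pre-period level plus the growth observed in the control group.
treated_pre, treated_post = 100.0, 150.0  # treated unit outcome before/after
control_pre, control_post = 80.0, 100.0   # control unit outcome before/after

control_trend = control_post - control_pre    # control grew by 20
counterfactual = treated_pre + control_trend  # 120: treated, had it not been treated
did_effect = treated_post - counterfactual    # estimated effect: 30
print(did_effect)
```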

That is the intuitive idea behind using synthetic control for causal inference. Assume we have J+1 units and unit 1 is affected by an intervention. Units 2, …, J+1 are a collection of untreated units that we will refer to as the "donor pool". Our data spans T time periods, with T_0 periods before the intervention. For each unit j and each time t, we observe the outcome Y_jt. We define Y^N_jt as the potential outcome without intervention and Y^I_jt as the potential outcome with intervention. Then, the effect for the treated unit at time t, for t > T_0, is defined as τ_1t = Y^I_1t − Y^N_1t. Here Y^I_1t is factual, but Y^N_1t is not. The challenge lies in estimating Y^N_1t.

Source: 15 - Synthetic Control — Causal Inference for the Brave and True

Since the treatment effect is defined for each period, it doesn't need to be instantaneous; it can accumulate or dissipate. The problem of estimating the treatment effect boils down to the problem of estimating what would have happened to the outcome of the treated unit, had it not been treated.

The most straightforward approach is to consider that a combination of units in the donor pool may approximate the characteristics of the treated unit better than any untreated unit alone. So we define the synthetic control as a weighted average of the units in the donor pool. Given weights W = (w_2, …, w_{J+1}), the synthetic control estimate of Y^N_1t is Ŷ^N_1t = Σ_{j=2}^{J+1} w_j Y_jt.

We can estimate the optimal weights with OLS, as in any typical linear regression, by minimizing the squared distance between the weighted average of the units in the donor pool and the treated unit over the pre-intervention period. This creates a "fake" unit that resembles the treated unit before the intervention, so we can see how it would have behaved in the post-intervention period.
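A minimal sketch of this unconstrained OLS fit on made-up data (note that, without constraints, the fitted weights need not lie between zero and one):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative pre-intervention data: T0 periods for J donor units, with the
# treated unit built from weights that an unconstrained fit can recover.
T0, J = 50, 4
donors_pre = rng.normal(100.0, 5.0, size=(T0, J))
treated_pre = donors_pre @ np.array([0.7, 0.5, -0.1, 0.0]) + rng.normal(0.0, 1.0, T0)

# Ordinary least squares: weights minimizing the squared pre-period distance
# between the treated unit and the weighted donor average.
w, *_ = np.linalg.lstsq(donors_pre, treated_pre, rcond=None)
synthetic_pre = donors_pre @ w

print(np.round(w, 2))  # unconstrained weights may be negative or exceed one
```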

In the context of our analysis, this means that we can include all other L2 networks that did not take part in the STIP in our donor pool and estimate a "fake", synthetic control L2 that follows the trend of Arbitrum in the period before receiving the STIP. As mentioned before, the metric of interest chosen for this analysis was sequencer revenue; in particular, we calculated the 30-day moving average to smooth the data. We can then compare the behavior of our synthetic control with the factual and estimate the impact of the STIP by taking the difference. We are essentially comparing what would have happened, had the network not received the STIP, with what actually happened.

However, regression can sometimes lead to extrapolation, i.e., values outside the range of our initial data that may not make sense in our context. This happened when estimating our synthetic control, so we constrained the model to do only interpolation. That is, we restrict the weights to be positive and sum to one, so that the synthetic control is a convex combination of the units in the donor pool. The treated unit is thus projected onto the convex hull defined by the untreated units. This means there probably won't be a perfect match of the treated unit in the pre-intervention period, and the weight vector can be sparse, as a face of the convex hull will sometimes be defined by only a few units. This works well because we don't want to overfit the data. It is understood that we will never be able to know with certainty what would have happened without the intervention, only that, under the stated assumptions, we can draw statistical conclusions.

Formalizing interpolation, the synthetic control is still defined as the weighted average Ŷ^N_1t = Σ_{j=2}^{J+1} w_j Y_jt. But now we use the weights W = (w_2, …, w_{J+1}) that minimize the squared distance between the weighted average of the units in the donor pool and the treated unit over the pre-intervention period, Σ_{t=1}^{T_0} (Y_1t − Σ_{j=2}^{J+1} w_j Y_jt)², subject to the restriction that the weights are positive and sum to one.

We obtain the optimal weights using quadratic programming with the described constraints on the pre-STIP period, and then use these weights to calculate the synthetic control for the total duration of interest. We initialized the optimization for each analysis with different starting weight vectors to avoid introducing bias into the model and getting stuck in local minima, selecting the run that minimized the squared difference in the pre-intervention period.

Below is the resulting chart for Arbitrum, showing the factual sequencer revenue observed in Arbitrum and the synthetic control.

With the synthetic control, we can then estimate the effect of the STIP as the gap between the factual revenue and the synthetic control, τ̂_1t = Y_1t − Ŷ^N_1t.

To understand whether the result is statistically significant and not just due to randomness, we use the idea of Fisher's Exact Test. We permute the treated and control units exhaustively: for each unit, we pretend it is the treated one while the others are the control. We create one synthetic control and effect estimate for each network, pretending the STIP was given to that network, to calculate the estimated impact of this treatment that didn't happen. If the impact in the network of interest is sufficiently large when compared to these fake treatments ("placebos"), we can say our result is statistically significant and there is indeed an observable impact of the STIP on Arbitrum's sequencer revenue. The idea is that if there was no STIP in the other networks and we used the same model to pretend that there was, we wouldn't see any impact.

References

Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.

Aayush Agrawal - Causal inference with Synthetic Control using Python and SparseSC

01 - Introduction To Causality — Causal Inference for the Brave and True


TBV Research

The below report is available in document format here.

Introduction

On Jun 3, 2024, cupojoseph proposed a pilot stage to fund R&D for the implementation and operation of a treasury-backed vault (TBV). He suggests that leveraging a TBV and issuing a stablecoin against ARB (and other treasury assets) in the vault could be a more effective funding mechanism for grants and other initiatives, noting that leveraging ARB to issue stablecoins could relieve ARB selling pressure.

After back-and-forth discourse amongst the community, both in favor of and against a DAO-owned TBV, members of the DAO (Entropy and L2Beat) expressed that funding R&D for this proposal would be redundant given the ARDC. Hence, Blockworks Research and Chaos Labs were assigned the task of outlining the operational structure and performing a risk analysis of operating a TBV.

Summary

In collaboration with Chaos Labs, we highlight operational requirements and suggest that an R&D committee, an oversight committee, and a risk management committee are essential for effectively operating a CDP or TBV. We also consider the operational setup and compare the high-level trade-offs between a simple framework and a more complex one.

Alongside the operational setup, we provide a risk analysis on leveraging ARB in a DAO-owned CDP, offer recommendations for how to properly set the relevant parameters and manage the risk, and share our recommendation and interpretations.

We ultimately conclude that operating and managing a CDP or TBV using ARB is highly capital-inefficient (especially taking into consideration the DAO's past spending) and is not worth the risk/reward. This is due to the highly volatile nature of ARB, the lack of available liquidity, and the sheer amount of capital required to adequately support the DAO's historical spending behavior. However, it is worth revisiting leveraging ARB in a debt position once market conditions and ARB liquidity have considerably matured.

That said, this preliminary work is sufficient for the DAO to gauge interest in this direction, and with an updated perspective on the nature of operating a CDP, we encourage members of the community and delegates to carefully review and open further discussions. If necessary, further research can, for example, outline the risks associated with managing a TBV, specifically, at different scales.

Overview of CDPs and TBVs

For educational purposes, here is an overview of existing protocols that the DAO could utilize. While some of these protocols are not active in the Arbitrum ecosystem, or are active in select parts, we recommend reviewing these CDP mechanisms to gain a more thorough understanding.

MakerDAO and DAI

Maker's main product is DAI, a stablecoin pegged to the U.S. dollar. DAI's supply is adjusted through a set of credit-manager smart contracts on Ethereum, creating a mechanism by which actors are incentivized to adjust the supply curve to keep the peg at $1 as market conditions and DAI demand fluctuate. Additionally, Maker allows users to mint DAI through Maker Vaults, where users lock up their assets as collateral, borrow DAI, and repay the debt plus accrued fees at a later date.

Maker uses several risk parameters for each collateral type:

  • Liquidation ratio: the minimum ratio of collateral value to debt per Vault (usually overcollateralized).
  • Debt ceiling: the max amount of DAI generated for a collateral type.
  • Stability fee: the fee that accrues on vault debt on an annual basis (e.g., 5% per year)

Maker’s mechanisms to maintain solvency are:

  • DAI issuance is influenced by stability fees, DAI savings rate, debt ceiling adjustments, and the Peg Stability Module.
  • Peg Stability Module: The Peg Stability Module (PSM) maintains the stability of DAI by enabling users to swap collateral tokens directly for DAI. Users can swap other stablecoins for DAI, which helps keep DAI fixed to $1. Importantly, the volume of trades the PSM can absorb is limited by the debt ceiling of the deposited collateral; once that debt threshold is crossed, the PSM can no longer trade stablecoins for DAI.
  • Liquidation: The closer a vault's collateral value gets to its debt, the more risk the vault takes on. The system therefore liquidates overly risky vaults, with liquidations performed through a gradual Dutch auction.
  • MKR Mint/Burn: If bad debt hits a certain threshold, the Flapper/Flopper smart contracts burn, mint, and sell MKR in order to recapitalize the MakerDAO protocol in times of insolvency.

All interactions with vaults are done through the DssCdpManager (CDP Manager) contract. Vaults may also be transferred between owners using this same contract.
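To make the interplay of these parameters concrete, here is a toy sketch of a vault health check and stability-fee accrual. The function names and values are illustrative, not actual MakerDAO parameters or contract interfaces:

```python
# Toy Maker-style vault health check (illustrative values only).
def vault_status(collateral_value: float, debt: float,
                 liquidation_ratio: float = 1.5) -> str:
    """A vault is safe while collateral value / debt stays at or above the
    liquidation ratio; below it, the vault becomes eligible for liquidation."""
    if debt == 0:
        return "safe"
    return "safe" if collateral_value / debt >= liquidation_ratio else "liquidatable"

def accrue_stability_fee(debt: float, annual_rate: float = 0.05,
                         years: float = 1.0) -> float:
    """Debt after the stability fee accrues (simple annual compounding)."""
    return debt * (1.0 + annual_rate) ** years

print(vault_status(collateral_value=180, debt=100))  # 1.8 >= 1.5 -> "safe"
print(vault_status(collateral_value=140, debt=100))  # 1.4 < 1.5 -> "liquidatable"
print(round(accrue_stability_fee(100), 2))           # 105.0 after one year at 5%
```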

Curve and crvUSD

While crvUSD is not live on Arbitrum as of now, Curve is a potential option should crvUSD expand to the Arbitrum network.

In a typical hard liquidation, when a borrower's loan becomes undercollateralized, the position is liquidated and the collateral converted all at once. Curve's lending-liquidating AMM algorithm (LLAMMA) is instead able to soft-liquidate: it gradually converts the collateral into stablecoins as the collateral's value decreases, preserving some of the borrower's original capital.

Curve only offers overcollateralized loans, with the loan-to-value ratio dependent on the risk of the collateral. Thus, depending on the collateral, there exists a set of bands defining the liquidation range within which soft liquidations can occur. Unlike other AMMs, the price offered by Curve's LLAMMA does not rely on the balance of assets in the pool, but rather on an external price oracle. As a result, if the oracle price increases/decreases by 1%, the LLAMMA price will increase/decrease by at least 1%.

Curve's stablecoin, crvUSD, is pegged through a basket of fiat-backed stablecoins, with stability pools and peg keepers to further uphold the price at $1. The stability pools for crvUSD hold the largest fiat-backed stablecoins: USDC, USDT, USDP, and TUSD. The Peg Keepers are similar to the AMO concept used by Frax Finance: smart contracts that perform algorithmic market operations to hold crvUSD at its peg. More specifically, peg keepers are assigned to stability pools and can perform market operations under certain circumstances. For example, only if the price of crvUSD in a stability pool is above $1 can the peg keeper mint and deposit crvUSD into the pool; adding single-sided crvUSD when the price is above $1 puts downward pressure on the price, moving it back toward the target. The reverse applies when the price of crvUSD is below the $1 threshold.
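The peg-keeper rule described above can be sketched as a simple conditional; the function name, return values, and thresholds are illustrative, not Curve's actual contract interface:

```python
# Simplified sketch of the peg-keeper rule (illustrative, not Curve's contracts).
def peg_keeper_action(pool_price: float) -> str:
    if pool_price > 1.0:
        # Minting and depositing crvUSD adds supply, pushing the price down.
        return "mint_and_deposit"
    if pool_price < 1.0:
        # Withdrawing and burning crvUSD removes supply, pushing the price up.
        return "withdraw_and_burn"
    return "no_op"

print(peg_keeper_action(1.02))  # prints "mint_and_deposit"
print(peg_keeper_action(0.98))  # prints "withdraw_and_burn"
```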

Finally, the other important detail to understand about Curve's lending mechanism is its variable interest rate. Borrowers pay a variable interest rate to the protocol based on their total outstanding debt. The interest rate depends on the current crvUSD price, the shape of the interest-rate curve (set via risk parameters), and the activity of peg keepers (debt ratio and target debt).

Aave and GHO

Aave's GHO stablecoin is a CDP-style protocol similar to Curve's crvUSD. It is a system of lending pools where users' deposits are funneled into a liquidity pool that borrowers can access when looking for a loan. Facilitators (such as Aave itself) can mint and burn GHO tokens; each is assigned a 'bucket' with a capacity representing the maximum amount of GHO it can generate. The total available supply of GHO is calculated from the capacities of all Facilitators, ensuring the system remains overcollateralized. New Facilitators are approved by the Aave DAO to ensure they operate within parameters.

To maintain its peg and system stability, GHO has stabilization mechanisms controlled by the Aave DAO, which manages borrow rates, collateral requirements, and other parameters for minting and burning GHO. There is also a discount model that gives discounted borrowing rates to stkAAVE holders.

Furthermore, Aave offers other risk-management features: isolation mode, siloed assets, DAO-elected permissioned roles, and traditional features like supply and borrow caps. Isolation mode limits exposure to riskier assets: positions collateralized by an isolated asset may only borrow certain types of stablecoins.

Open Dollar and OD

Open Dollar is a stablecoin CDP protocol built around non-fungible vaults, in which users stake collateral and borrow Open Dollar's OD stablecoin. As an Arbitrum-native protocol, Open Dollar strives to make OD Arbitrum's native stablecoin.

Similar to Aave and Maker, Open Dollar is also controlled by a governance entity and the ODG token. The OD stablecoin is overcollateralized and uses a managed-float regime similar to Maker's: the OD/USD exchange rate is determined by supply and demand, and the protocol restabilizes it through devaluation or revaluation. Notably, it accepts fewer forms of collateral (only ARB and ETH variants) compared to Maker and Aave.

Some of the listed CDP protocols have been battle-tested through new deployments, making them worth considering for exploration. Additionally, one of the listed protocols serves as an alternative mentioned in the original proposal.

These are very brief overviews of each protocol, meant to give the DAO a sense of the type of protocol it would use to leverage ARB in a TBV. Further research would be required to investigate which protocol is most suitable for the DAO, likely including close work with candidates for the relevant committees outlined below.

In order to effectively manage a CDP or TBV, the DAO would have to fund the relevant committees/roles responsible for tasks such as developing the necessary smart contracts for facilitating a debt position and maintaining the communication, health, and security of the DAO's position. These roles primarily comprise (but are not limited to) an R&D committee, an oversight committee, and a risk committee.

Relevant Committees

In this section, we'll outline the roles and responsibilities necessary for effectively managing a CDP/TBV. While we are not suggesting the DAO needs to create three separate committees, it is important to note the essential parts of executing this initiative, even if one entity ends up serving multiple roles.

R&D: Oversee the strategic and technical development of the CDP/TBV, and work closely with the DAO, other committees, or protocols to ensure the efficacy of the CDP/TBV’s strategy, management, and implementation. This includes and is not limited to:

  • Customizing relevant smart contracts to fit the needs and standards of the DAO.
  • Conducting security audits of smart contracts and the overall system
  • Maintaining technical implementation and pushing updates when necessary
  • Ensuring compatibility with DAO operations (e.g. outlining operational procedures for using a safety buffer to top up debt positions with treasury ARB)
  • Creating a dashboard or other visual representation for viewing important metrics, such as monitoring collateral and transaction history
  • Closely collaborating with other committees or protocols to lead further development of necessary contracts, tools, and strategies that enhance the position
  • Performing due diligence and reviewing applications that request funding from the treasury

Oversight Committee: The backbone of treasury operations, responsible for approving, sending, and receiving funds to and from the treasury and the CDP/TBV, overseeing the day-to-day operations of the treasury, implementing best security practices, and executing strategic decisions for capital in the treasury. This includes and is not limited to:

  • Implementing best security practices between the DAO’s treasury and CDP/TBV management such as procedures for responding to emergency incidents or unforeseen vulnerabilities
  • Delivering operational procedures for maintaining accountability (e.g. perform monthly/quarterly performance reviews)
  • Maintaining consistent communication with the risk committee, the DAO, and relevant protocols regarding adjustments to relevant parameters of the DAO’s position and other activities related to the debt position
  • Pre-approving transactions and privileges for the CDP/TBV manager
  • Relaying accurate and timely reporting of the TBV to the rest of the community on a weekly/monthly basis.
  • Maintaining transparent dashboards that display the DAO’s position in real-time
  • Collaborating with R&D committee for issuing stablecoins from the TBV
  • Monitoring debt and repayment from funded projects and enforcing penalties if needed

Risk Management: Manage the CDP/TBV position and set risk parameters to ensure solvency. In addition to the items below, in order to achieve incentive alignment between the DAO and the risk committee, the DAO and relevant committees would have to determine what responsibilities and/or privileges are delegated to the risk committee. It is possible the functions described here fall within the scope of the R&D committee and/or a treasury manager's role. Regardless, since a treasury management role has not been formalized, we felt it necessary to describe the risk portion of the vault. This includes and is not limited to:

  • Clear communication on how the stablecoin works and its benefits
  • Conducting ongoing CDP/TBV research, identifying potential risks, developing risk assessment frameworks and models of the economic tradeoff space, recommending the optimal strategy/implementation, and providing regular analysis and updates on the state of the market and the position
  • Working closely with existing protocols and their risk teams to determine parameters for leveraging ARB in a CDP/TBV
  • Continuously monitoring market conditions, protocol health, and ARB volatility/liquidity to gauge adjustments to the CDP/TBV and maintain solvency
  • Updating over-collateralization ratios, liquidation triggers, interest rates, and other relevant parameters and operations according to the Oversight and R&D committees’ recommendations
  • Ensuring liquidation mechanisms are operational and emergency shutdown procedures are effective
  • Closely communicating with other committees, the DAO, and relevant protocols
  • Sending and receiving funds to and from the DAO’s treasury
  • Around-the-clock maintenance and management

Operational Structure

Drawing inspiration from other DAOs, we note that while each committee is necessary for effectively managing a CDP or TBV, the operational setup can differ depending on the scale of the position. For example, the CDP could be managed through a multisig maintained by each committee. This is a simple setup in which multiple parties sign transactions, and it requires fewer interactions between the DAO and the position. However, requiring multiple signatures to approve transactions reduces the risk manager's autonomy to act swiftly when needed.

The framework we recommend is inspired by Karpatkey’s treasury management proposal and by Lido’s treasury management committee proposed by Steakhouse Financial. In Karpatkey’s proposal, the DAO retains full custody of the vault’s position and can withdraw and deposit through a governance process. In this setup, a trusted, experienced entity is appointed to manage the CDP or TBV, giving that entity more autonomy. To restrict a malicious manager from carrying out an attack and to incentivize performance, the setup also involves an oversight committee that establishes pre-approved transactions and is generally responsible for ensuring the vault manager is achieving their mandate. Both proposals also utilize onchain tools, essentially creating an environment tailored to their respective strategies and use cases, albeit with added complexity.

While this operational structure can be considered more complex and costly than simply deploying a multisig, it clearly defines roles, allows each party to specialize in their respective tasks, and grants each party more autonomy to operate efficiently.

Lastly, the DAO would have to work with relevant committees to establish an amount that is allocated to a safety module, which the vault manager can draw from when needed.

We’ve outlined high-level tradeoffs between a very basic structure and more professional structures. If further progress is made toward implementing a position, potential candidates with track records servicing these roles and operational structures in practice can provide their frameworks. This would allow the DAO to compare specific details about the operational structures of experienced candidates.

Additional work

Primarily, if the DAO is borrowing stablecoins against ARB and distributing those funds to teams, those teams would presumably sell the stablecoins to fund their respective initiatives. With this in mind, additional work for the DAO entails (1) developing a strategy for paying down the stablecoin debt.

Let’s say a viable strategy is using sequencer revenues. In that case, the DAO or its relevant committees could (2) establish relevant KPIs for funded projects, such as generating a certain amount of sequencer revenue. Intuitively, this model creates a framework for how to approach funding projects, because it is in the DAO’s interest to only fund projects that can generate sequencer revenues and thus pay down the vault’s debt.

Other work may involve:

  • Performing more research and an additional risk analysis on the potential stablecoin protocols used
  • Reaching out to existing protocols and collaborating on setting up a position specifically for the DAO
  • Ensuring the DAO has robust governance mechanisms in place to make adjustments and respond to market changes
  • Creating and maintaining governance policies for handling extreme market conditions and emergencies
  • Implementing insurance funds or hedging strategies to cover potential losses
  • Exploring the introduction of less volatile assets, such as ETH or wstETH, to the TBV

Risk Analysis

View the full risk analysis here.

In conjunction with Chaos Labs, we analyzed ARB and its market capitalization, trading volume, historical volatility, and historical liquidity to derive key parameters and thresholds for using ARB in a CDP. We also analyzed the ARB/USD oracle because price updates are crucial for maintaining a healthy vault.

We recommend an initial liquidation threshold (i.e., the point at which a CDP loan becomes liquidatable) at 66% of the dollar value of the ARB supplied. We also recommend a 10% liquidation bonus to incentivize liquidators, helping avoid undercollateralized positions and ensuring the CDP protocol avoids bad debt.

We recommend an initial loan of 25% of the value of ARB supplied to limit the liability of the DAO to supply more ARB collateral should the value drop. We also provide a more aggressive approach of taking an initial loan of 44% of the value of ARB supplied. In either case, we would need to set processes in place to manage the increased risk of getting the loan liquidated, and at a much less favorable ARB price in the latter approach.

We recommend an initial debt ceiling of $12m. This means that the DAO could borrow up to $12m safely against a minimum of $48m in ARB collateral at the outset.
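As an illustrative sketch (not the report’s model), the recommended parameters can be combined to see how much price headroom the conservative setup leaves before liquidation. The helper names and the spot price are hypothetical:

```python
# Illustrative CDP health math using the recommended parameters.
# All helper names and the assumed ARB price are hypothetical.

LIQ_THRESHOLD = 0.66        # loan becomes liquidatable at 66% of collateral value
INITIAL_LTV   = 0.25        # conservative initial loan-to-value
DEBT_CEILING  = 12_000_000  # USD

def max_initial_loan(collateral_usd: float) -> float:
    """Largest loan the 25% LTV allows, capped by the debt ceiling."""
    return min(collateral_usd * INITIAL_LTV, DEBT_CEILING)

def liquidation_price(debt_usd: float, collateral_arb: float) -> float:
    """ARB price at which the position hits the 66% liquidation threshold."""
    return debt_usd / (collateral_arb * LIQ_THRESHOLD)

# $48M of ARB collateral supports the full $12M ceiling at 25% LTV.
collateral_usd = 48_000_000
arb_price = 0.675                         # assumed spot price, for illustration only
collateral_arb = collateral_usd / arb_price
debt = max_initial_loan(collateral_usd)   # -> 12,000,000
liq_px = liquidation_price(debt, collateral_arb)

print(f"debt: ${debt:,.0f}, liquidation price: ${liq_px:.3f}")
```

At these settings, ARB could fall roughly 62% from the assumed price before the position becomes liquidatable, which is the headroom the conservative 25% LTV buys.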

We also recommend the DAO establish a ring-fenced safety buffer in ARB to cover any price drops in excess of what is forecast. More work is needed here to arrive at how much the DAO would need to allocate to a safety buffer fund. However, it is worth noting that the DAO should weigh the opportunity costs of deploying a safety buffer (i.e. ARB sitting in a safety buffer, in addition to ARB as collateral in a vault, is not being used for other initiatives).

Recommendations and Interpretations

For now, the ARDC covers parts of the R&D committee role (it does not develop or manage smart contracts), but given the current term ends in September, the DAO will need to revisit funding another ARDC term, or discuss contracting an entity that serves a similar role or is solely responsible for servicing a CDP or TBV.

As for the other roles, the oversight committee is the organization responsible for overseeing the vault manager, and the risk management committee is the entity responsible for managing the risks of the TBV, including updating parameters and topping up the debt position when necessary. Note: it is possible the risk management committee is also the treasury manager.

In terms of committee assignments and therefore initializing a CDP or TBV, we think the DAO should first take into consideration other initiatives to avoid introducing potential operational redundancies. For example, the Arbitrum Ventures Initiative is a work in progress that would establish a fund structure for the DAO within which an oversight committee is appointed and verticals for allocating funds to specific strategies, such as operating a treasury or a TBV, are established. There is also an open proposal by Karpatkey and discussions about creating an Arbitrum Strategic Treasury Management Group and a DAO Oversight Committee. In short, we believe structuring the broader framework and dedicating attention to these conversations takes precedence, because they would establish a formal foundation on top of which the DAO can install a TBV. They would provide closure on who is servicing which roles and what additional committees (if any) are necessary.

If the DAO wishes to proceed toward a TBV, we recommend opening applications for participants who want to service the committees. This would allow candidates to outline their previous experience and frameworks. The proposal should also outline the scale of the initiative and define relevant KPIs to measure performance and ROI. From here, the DAO can narrow its focus to which protocol to use for even further R&D.

In terms of the risks, given the aforementioned recommendations, it is clear that borrowing stablecoins against an ARB position is extremely risky and would not satisfy a majority of the DAO’s annual spend.

From 2023 through 2024, the DAO spent 433M ARB, about 25x greater than the recommended initial loan amount. Secondly, using the conservative settings from the analysis, applying the previous year’s spending implies the DAO would have to seed the position with roughly 1.6B ARB (~$1.08B) to fund past initiatives. This is approximately half of the ARB in the treasury today, and the DAO would only be able to borrow against 25% of that value. At this scale, the capital inefficiency of using ARB in a CDP or TBV becomes apparent.
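The scale argument can be checked with back-of-the-envelope arithmetic; the ARB price below is implied by the report’s own figures (1.6B ARB ≈ $1.08B) rather than a market quote:

```python
# Back-of-the-envelope check of the scale argument.
# The ARB price is implied by the report's own figures, not a live quote.

arb_price = 1.08e9 / 1.6e9                        # ~$0.675 per ARB
annual_spend_arb = 433e6
annual_spend_usd = annual_spend_arb * arb_price   # ~$292M

initial_debt_ceiling = 12e6
print(annual_spend_usd / initial_debt_ceiling)    # ~24x the recommended ceiling

# Funding that spend via a 25% LTV vault requires 4x the value in collateral.
required_collateral_usd = annual_spend_usd / 0.25          # ~$1.17B
required_collateral_arb = required_collateral_usd / arb_price  # ~1.7B ARB
```

The ~1.7B ARB result is on the order of the ~1.6B figure cited above, illustrating how much collateral must sit idle to borrow a fraction of its value.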

Even at a much lower annualized spend by the DAO, we still view this as an inefficient use of capital, especially considering the opportunity cost of funding other projects that don’t carry as much inherent risk. In other words, we do not feel confident the reward for leveraging ARB clearly outweighs the risks (and associated costs) of operationalizing a DAO-owned CDP or TBV.

In terms of next steps, as mentioned above, we believe establishing a broader framework (perhaps related to the AVI), or making progress on installing a legitimate treasury management framework, takes precedence. Once this structure is set, then the DAO and the relevant committees will be in a better position to reconsider the prospects of activating a treasury-owned CDP.

Final Thoughts

All in all, while operating a TBV is worth considering, especially as the DAO and underlying ecosystem mature, it introduces another plane of risks and costs that feels unnecessary given where the protocol is in its life cycle. Instead, we feel confident the DAO should focus on growth and prioritize initiatives that directly expand the use cases for the Arbitrum stack. In the grand scheme of rollups, it is still early, and it is not yet clear what fundamentals drive the performance of rollup ecosystems and their underlying governance tokens.

For Arbitrum, there are myriad developments and catalysts on the horizon (such as Timeboost, decentralized Timeboost, blob bidding strategies, economic policies for L3s, ecosystem funds, and more) that should provide more context for identifying strong business models and, in turn, performing the economic analysis for sustaining revenue and growth. Moreover, while the ARB price is an important feature of the DAO and protocol, it is a byproduct of success, not the driver, and we do not think initiatives directly addressing ARB price action properly prioritize the fundamental value drivers of the protocol at this moment.

3 Likes

Blockworks Research ARDC Deliverable Summaries

Short-form Case Study – GMX

  • Conclusions: The study suggests that future incentive programs should require stronger justifications, growth driver analyses, and consistent standards for handling conflicts of interest.

Short-form Case Study – JOJO

  • Conclusions: The case highlights the need for continuous monitoring, stricter grantee accountability, and better-defined incentive program structures to avoid unrealistic promises and ensure adherence to rules.

STIP-Bridge – Support Material for the Community

  • Conclusions:
    • Some protocols had high sybil ratios, indicating potential misuse of incentives.
    • Multiple projects displayed questionable practices in how they utilized ARB incentives.
    • Many protocols saw sharp declines in usage post-incentives. Reporting quality varied, with some projects failing to provide transparent updates.
    • Several projects are adjusting their incentive mechanisms for future rounds, aiming for more targeted and sustainable growth, while others have continued with similar structures despite mixed results.
    • Identified misuse of funds per STIP rules by Synapse, which eventually returned 750k ARB to the DAO in addition to withdrawing its 950k ARB STIP-Bridge request, ultimately saving the DAO 1.7M ARB (~$1.4M at the time).
    • Identified misuse of STIP funds by Furucombo, which led to the Arbitrum Foundation seeking to ban the team from receiving any future incentives.

STIP-Bridge (Extended Deadline Applicants) – Support Material for the Community

  • Additional support material for the community to use when reviewing STIP Bridge applicants who applied after the initial deadline of May 3, 2024 and have thus been automatically put up for a challenge vote on Snapshot.

STIP Retroactive Analysis – Perp DEX Volume

  • Conclusions:
    • STIP impacted trading volumes differently across perpetual DEXs. Vertex and GMX saw significant volume boosts, with Vertex benefiting most per dollar spent.
    • The analysis suggests that traditional fee rebates, as used by Vertex, GMX, and MUX Protocol, were more effective than gamified incentives employed by Gains Network and Vela.

STIP Retroactive Analysis – Spot DEX TVL

  • Conclusions:
    • STIP in H2 2023 significantly impacted TVL in spot DEXs, with WOOFi and Camelot seeing the highest gains.
    • Balancer and Trader Joe also benefited but with a more modest impact.
    • WOOFi excelled in liquidity-specific incentives despite its underperformance in overall TVL impact.

STIP Analysis of Operations and Incentive Mechanisms

  • Conclusions:
    • STIP was likely a reason why Arbitrum’s market share (with respect to certain metrics) did not diminish as new competitors emerged.
    • Major mechanisms included liquidity incentives (~30%) and fee rebates (~25%).
    • Most protocols saw initial growth during STIP, but metrics often reverted to pre-STIP levels afterward.
    • STIP initially boosted Arbitrum’s market share in most major DeFi metrics, but gains largely reverted post-program.
    • Market ultimately gravitated towards a baseline cost of capital, meaning future incentives should target ecosystem-wide yield targets
    • Protocol incentives should be a function of vertical-specific metrics and not be so open-ended

STIP Retroactive Analysis – Yield Aggregators TVL

  • Conclusions:
    • Targeting LPs directly and uniformly can be effective in boosting TVL.
    • Flexibility in incentive strategies, as shown by Umami’s pivot to direct ARB emissions, can enhance outcomes.
    • A more proportional distribution of incentives may lead to higher efficiency, as evidenced by Solv Protocol’s success.

STIP Retroactive Analysis – Sequencer Revenue

  • Conclusions:
    • Statistically attributed 43% of Arbitrum’s revenue between November 2023 and the Dencun upgrade to STIP, amounting to $15.2M in sequencer revenue against $85.2M spent on incentives.
    • Despite its positive short-term impact, the program resulted in a $60M net loss, highlighting the need for more effective future incentive programs and more attention to long-term sustainability and growth.
    • Future programs should focus on better-structured incentives and persistent user metrics to maximize the impact of ARB spending.
3 Likes
  1. Am I right in understanding that the overall conclusion is that STIP was a failed program?

  2. Should the sequencer get as much profit as it spent on incentives?

Timeboost Revenue and LP Impact Analysis

The below report is available in document format here

Although this was not requested by the DAO advocate for the ARDC, in light of the Timeboost proposal vote, Entropy’s request, and the general concern around the impact Timeboost may have on LPs, Blockworks Research has performed an analysis of the revenue Timeboost could drive to the DAO, as well as an attempt to explain the impact on DEX LPs. We estimate that Timeboost could have led to an additional $19M to $95M in annual DAO revenue, implying 33-162% growth over the previous twelve months. We believe Timeboost is a net positive for Arbitrum DAO value capture and that Arbitrum One LPs should still outperform LPs on competing chains. Additionally, we believe the DAO may want to consider a Timeboost Committee to closely monitor Timeboost activity and onchain market conditions, so that it can make informed decisions on adjustments to Timeboost’s parameters in the future.

Acknowledgements: We appreciate all the feedback we received from Derek and other members of the Offchain Labs team. Additional thanks to Andrea Canidio, Krzysztof Gogol, and Alex Nezlobin for discussions that helped shape our thinking.

Arbitrum’s Market Dominance in the Ethereum ecosystem

Historically, maintaining market share has been extraordinarily challenging in crypto due to airdrop farming, foundation grant dynamics, and other incentive programs. Users and founders are constantly being paid to switch to the next newest app or ecosystem, adding a challenging dimension to the user retention game. Despite the difficulties of Ethereum’s rollup centric roadmap and growing competition, Arbitrum remains a leading ecosystem by the most important KPIs.

Arbitrum’s market share of total crypto TVL has remained sizable, despite how gameable this metric is, especially in the context of new ecosystems where native tokens with low-float/high-FDV dynamics are particularly inflationary to TVL. In August, Arbitrum ranked as the second and third most used chain for perps and spot market trading, generating $20.7B and $34.1B in volumes, respectively.

Arbitrum stands out as the ETH-aligned L2 with the most liquidity. Despite Base’s distribution, Arbitrum still leads in trading activity, attracting more sophisticated participants, which we believe indicates Arbitrum is more attractive to asset issuers relative to other L2s.

Timeboost Revenue Projection

To estimate the revenue potential of Timeboost, which introduces an auction mechanism via the Express Lane (which effectively gives the winner of the auction a time advantage for transaction inclusion for multiple blocks), we take historical priority fees for select EVM L1s/L2s and calculate a time series of priority fees as a percentage of spot DEX volume.

We then take the 30-day rolling average of these time series, apply a blended average to Arbitrum One DEX volumes over the trailing 12 months, and discount the result by 50% given the inherent uncertainty in this new mechanism and the fact that this is a multivariate problem considering differences in block times, DEX volumes, market conditions, transaction costs, etc. We believe the following result serves as a conservative estimate.
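The pipeline above can be sketched roughly as follows; the input series, chain list, and trailing-volume figure are synthetic placeholders, not the report’s dataset:

```python
# Schematic of the projection method described above, on synthetic inputs.
# Chain names, the fee-share distribution, and the volume figure are placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
days = pd.date_range("2023-09-01", periods=365, freq="D")

# Per-chain priority fees as a share of spot DEX volume (synthetic).
fee_share = pd.DataFrame(
    {chain: rng.uniform(0.0002, 0.0015, len(days))
     for chain in ["ethereum", "optimism", "base"]},
    index=days,
)

# 1) 30-day rolling average of each chain's fee/volume ratio.
rolling = fee_share.rolling(30).mean()

# 2) Blend across chains, then apply to trailing-12-month Arbitrum DEX volume.
blended_ratio = rolling.mean(axis=1).iloc[-1]
arb_dex_volume_ttm = 150e9        # assumed trailing-12-month volume, USD

# 3) Discount by 50% for mechanism uncertainty.
projected_revenue = arb_dex_volume_ttm * blended_ratio * 0.5
print(f"projected annual Timeboost revenue: ${projected_revenue / 1e6:.1f}M")
```

The 50% haircut in step 3 is what makes the resulting range conservative: the raw blended ratio embeds each comparison chain’s block times, volumes, and MEV conditions, none of which map cleanly onto Arbitrum.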

According to this backtest, we estimate that Timeboost could have led to an additional $19M to $95M in annual DAO revenue, implying 33-162% growth over the previous twelve months.

Potential Impact on LPs

While Timeboost may be a boon for Arbitrum DAO revenue, this revenue will, at least partially, come at the expense of one of the largest stakeholders in the Arbitrum ecosystem, liquidity providers (LPs), in the form of Loss-Versus-Rebalancing (LVR). LVR quantifies the adverse-selection costs that LPs incur for quoting stale prices against market-makers on a centralized exchange (CEX), namely Binance. By introducing an express lane controller, a trader that owns the express lane can effectively win top-of-block slots, which should theoretically subject LPs to more frequent LVR for assets that are also listed on CEXs.

In a recent study, researchers attempted to quantify the arbitrage opportunities on Ethereum and different L2s (including Arbitrum One) using onchain data over a four-month period. The study derives Max Arbitrage Value (MAV), which differs from LVR in that LVR sums the aggregate per-block arbitrage opportunities over a period of time, while MAV sums only the unique per-block arbitrage opportunities.

Example: Imagine there is a $10 arbitrage that goes untaken for three blocks. MAV would equal $10 over that three-block period, while LVR would equal $30. According to this study, LVR overstates arbitrage opportunities (LP losses).
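A toy model of the distinction, treating MAV as counting only the new (incremental) opportunity appearing in each block:

```python
# Toy illustration of the MAV-vs-LVR distinction: a $10 mispricing
# persisting untaken for three blocks.

def lvr(per_block_opportunities):
    """LVR sums the arbitrage opportunity present in every block."""
    return sum(per_block_opportunities)

def mav(per_block_opportunities):
    """MAV counts each opportunity once: only the increase over the prior block."""
    total, prev = 0.0, 0.0
    for opp in per_block_opportunities:
        total += max(opp - prev, 0.0)
        prev = opp
    return total

blocks = [10.0, 10.0, 10.0]   # the same $10 opportunity, untaken for 3 blocks
print(lvr(blocks), mav(blocks))   # 30.0 10.0
```

This is why MAV reads as the tighter estimate of LP losses: a stale price that persists inflates LVR every block, but adds nothing new to MAV.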

Interestingly, if one compares the MAV in the chart above to the chain priority fee capture chart under the Timeboost Revenue Projection section, there are striking similarities (e.g., Ethereum MAV at 0.15% of DEX volume, while aggregate priority fees, including MEV-Boost bids, hovered around 0.15% from January 2024 to the end of April 2024). Additionally, our base case for Timeboost value capture at 0.0432% of DEX volumes lines up with Arbitrum’s MAV of 0.04%. This tells us that while Timeboost revenue will largely come from LPs through LVR, instead of that value being extracted by external parties (searchers), the DAO treasury will accumulate these proceeds and can in turn decide how to best allocate them (including redistributing a portion of this revenue back to LPs if it so chooses).

As mentioned by Camelot and other stakeholders, the concern around Timeboost is that the 200ms delay could exacerbate LP losses since block time is one of the main variables that impact LP profitability. A naive formula to calculate LVR imposed by longer (or shorter) block times is shown below:

LVR ∝ σ√t

Where: σ = asset volatility, f = pool swap fee (the minimum price discrepancy required before arbitrage is profitable), t = block time

Keeping all else equal (trade volume, liquidity, LP fees, market conditions, etc.), an LP moving from Ethereum with 12s block times to a faster chain like Optimism with 2s block times should experience roughly 59% less LVR (sqrt(2/12) = 0.408, a 59.2% decrease). With Arbitrum One’s current FCFS ordering policy, block times are effectively 0, meaning LP LVR on Arbitrum One should theoretically be nonexistent under the formula above. In practice, as the research referenced above demonstrates empirically, LVR currently exists on Arbitrum One.

When looking at MAV as a percentage of DEX volumes for both Ethereum and Optimism (table above), we see a decrease of ~66% (0.15% → 0.05%), roughly matching the formula’s expectation. Again, this can be attributed to Ethereum having block times 6x longer than Optimism’s (12s vs 2s). Interestingly, the researchers did not find materially more MAV on Optimism (0.05%) than on Arbitrum One (0.04%), despite Optimism’s drastically longer block times (2s vs FCFS). With the Timeboost delay effectively creating a 200ms block time from the LP’s perspective, LPs on Arbitrum post-Timeboost would theoretically experience ~68% less LVR than LPs on Optimism (sqrt(0.2/2) = 0.316, a 68.4% decrease). While this suggests Arbitrum One would still be the best place to LP among competing L2s, we don’t have sufficient evidence to conclude that LVR under FCFS is materially better than under Timeboost non-express-lane transactions with a 200ms delay. Thus, estimating Timeboost’s impact on Arbitrum LVR is more nuanced than prevailing literature suggests.
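The square-root comparisons above reduce to one small calculation, assuming the naive LVR ∝ sqrt(block time) relationship:

```python
# The square-root scaling used in the block-time comparisons, assuming the
# naive LVR ∝ sqrt(block time) relationship (all else held equal).
from math import sqrt

def lvr_change(t_new: float, t_old: float) -> float:
    """Fractional LVR reduction when moving from t_old to t_new block times."""
    return 1 - sqrt(t_new / t_old)

print(f"Ethereum (12s) -> Optimism (2s): {lvr_change(2, 12):.1%} less LVR")
print(f"Optimism (2s) -> Timeboost (0.2s): {lvr_change(0.2, 2):.1%} less LVR")
```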

The researchers also found that the average MAV on Arbitrum One was higher than the average MAV on other chains, and that these arbitrage opportunities lasted for ~27 blocks (6.9s of decay), on average. This decay across rollups might sound perplexing: how could a $19.68 arbitrage opportunity persist for 6.9s, on average? On Base, the average MAV opportunity lasted for 420s! While the authors do not appear to explain these non-intuitive results, we believe these longer-than-expected price discrepancies are due to the costs and execution risks of capturing relatively small opportunities in the absence of MEV-Boost.

Additionally, this analysis was performed specifically on Uniswap v3 pools from December 31st to April 30th, 2024, and we believe the longer decay times could also be partially due to the reliance on Uniswap v3 liquidity and trading data, especially for Base and zkSync, which lacked Uniswap v3 liquidity compared to Arbitrum One and Optimism during this period. Furthermore, performing this analysis on Uniswap v2 pools might paint a more accurate picture of LVR, MAV, and decay across various block times, because passive v2 LPs are more likely to quote uniform liquidity at stale prices.

Applying this logic to Arbitrum One, due to its fast 250ms block times and FCFS ordering policy (where the primary advantage between arbitrageurs is speed), we hypothesize that arbitrageurs have lower assurances to take on the inventory risks and execution costs associated with capturing CEX-DEX arbitrage opportunities compared to L1, and therefore capture fewer of them. Instead, they wait until the CEX-DEX price discrepancy is wide enough to justify the costs and risks before executing their arbitrage trades. Due to these lower assurances, the current ordering policy could be lowering the ceiling on total Arbitrum One DEX volumes and thus on the swap fees available to LPs.
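This hypothesis can be phrased as a minimal decision rule; all thresholds below are illustrative, not measured:

```python
# Minimal model of the hypothesis above: under FCFS, an arbitrageur only
# executes once the CEX-DEX gap exceeds their cost-plus-risk threshold.
# All numbers are illustrative.

def should_arb(price_gap_bps: float, gas_cost_bps: float,
               inventory_risk_bps: float) -> bool:
    """Execute only when the expected edge covers costs plus the risk premium."""
    return price_gap_bps > gas_cost_bps + inventory_risk_bps

# With low execution assurances, the risk premium is higher, so small gaps
# persist untaken (longer MAV decay) even though they look like free money
# to an observer with guaranteed inclusion.
print(should_arb(price_gap_bps=3, gas_cost_bps=1, inventory_risk_bps=5))   # False
print(should_arb(price_gap_bps=8, gas_cost_bps=1, inventory_risk_bps=5))   # True
```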

Overall, our expectation that LPs should still find Arbitrum more attractive than other L2s is in line with LVR and AMM research. Canidio and Fritsch show that LVR decreases with block times, Adams finds LPs on chains with shorter block times earn 20% more fees from arbitrage than on mainnet, and Heimbach et al. state that a possible mitigation for non-atomic (CEX-DEX) arbitrage “would be to move DEX volume to Layer 2s such as Arbitrum and Optimism which already have shorter block times.” Note: while LVR on Arbitrum post-Timeboost will be much lower than LVR on Ethereum, Ethereum benefits from higher amounts of uninformed (retail) flow, a positive factor for LPs.

We believe it is possible that, in aggregate, (1) Timeboost has a neutral impact on LPs, and (2) given the expectation of additional trading activity and the very short 200ms slots available to extract LVR, Arbitrum LPs should still outperform LPs on competing chains. The open questions are: to what degree do 200ms slots increase LVR per block, how does that compare to LVR under today’s FCFS regime, and will the additional arbitrage volume generate more LP fees than on other L2s (potentially offsetting LVR in aggregate)?

Final Thoughts

Ultimately, despite valid concerns around MEV, we feel the conclusion that Timeboost is net negative for the ecosystem and its participants is far from straightforward. Another important consideration is that the research is primarily centered on non-atomic arbitrage and does not consider Timeboost’s impact on atomic arbitrage. Moreover, how the DAO manages Timeboost, grows the ecosystem to more retail users, and eventually decentralizes Timeboost will play a more significant role in its long-run success or failure.

Hence, there is an opportunity for the DAO to create a Timeboost Committee to closely monitor Timeboost activity and onchain market conditions, so that it can make informed decisions on adjusting Timeboost’s parameters and/or on how Timeboost revenues can be allocated or redistributed in a way that ensures both DAO sustainability and a healthy ecosystem for its LPs.

3 Likes

LTIPP Analysis

The below research report is also available in document format here.

The Arbitrum DAO established a Long Term Incentives Pilot Program (LTIPP) to test and ultimately develop an incentive framework. The program allocated roughly 30.65M ARB to protocols across the Arbitrum ecosystem, excluding previous STIP recipients. To date, a total of approximately 22.87M ARB has been distributed. Over 12 weeks, protocols distributed incentives to drive activity and growth. Part of the motivation was to assist protocols that did not receive STIP Round 1 incentives or completely missed the program. The outline for LTIPP can be found here.

Incentives are an important feature of growing ecosystems, fostering economic activity, and bootstrapping mindshare. Done correctly, incentives can be a powerful mechanism for driving adoption and alignment. Done incorrectly, incentives can be a waste of time and resources.

This retroactive analysis will provide a brief assessment of LTIPP. We collect data available here and here and review bi-weekly updates to analyze recipients, their strategies, and the impact of the incentives on high level growth metrics. In particular, we want to highlight outperformers and underperformers, and glean any best practices or lessons learned for protocols distributing ARB incentives in the future. This work is intended to be supplemental to already funded work (here and here). The overarching goal is to synthesize lessons learned that the DAO can reference as it begins thinking about future incentives programs–namely, the working group for incentives that is being actively discussed–especially as Timeboost introduces new conditions for trading and economic activity.

Summary

We attempted to evaluate the incentive strategies of protocols in the Lending, DEX, and Perps categories by analyzing key protocol-level metrics. We derived a metric for each category and measured its growth from before to during LTIPP to evaluate the effectiveness of each protocol’s incentive strategy. We then overlaid qualitative observations from applications and biweeklies. Below is a list of takeaways the DAO can keep in mind for future incentive programs.

  • LTIPP had a significant impact on DEX and Lending activity.
  • LTIPP curbed a downward spiral in sequencer revenues.
  • Protocols that formed partnerships during LTIPP generally saw better results.
  • Effective marketing, frequent communication, and responsiveness to user feedback are key.
  • For the DEX category, protocols that enhance LP experience (i.e. increase LP returns) outperformed.
  • LTIPP did not have a broad impact on Arbitrum’s perps ecosystem, and based on protocol-specific results, incentives were largely unsustainable.
  • Biweeklies suggest allocating more incentives to LPs rather than relying on referrals is a more effective strategy.
  • Complex incentives that involve purchasing vesting ARB can result in additional friction and hurt results.

Results

Below is a table showing the impact of incentives on protocol-level and global metrics. We also measured the increase in each metric per dollar spent on the category. We find that, for lending, DEXs, and perps, TVL market share grew by 25%, 17%, and -5%, respectively, and volume market share grew by 7%, 21%, and -11.38%, respectively. Finally, sequencer revenue market share grew by 6.54%. As one can see, while incentives had a positive impact on DEX activity, lending, and sequencer revenues, they did not have a positive impact on perps.

Market Share Growth

Lending

Arbitrum’s market share of lending deposits across all chains has been fairly consistent since July 2023, and LTIPP was able to further boost deposits. In fact, there was a significant change in median market share, from 2.63% in the month before LTIPP to 3.02% during it, effectively a 15% increase. LTIPP is estimated to have provided an additional $130M of TVL normalized by market share, putting the return at $30.92 of added TVL (normalized by market share) per dollar spent. It is relevant to add that part of this increase started a few days before LTIPP, so other factors could be introducing bias into the analysis; however, this growth was notably sustained throughout the duration of LTIPP.
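These figures can be cross-checked quickly; the implied incentive spend is derived from the report’s own numbers rather than stated directly:

```python
# Quick cross-check of the lending figures above. The implied dollar spend
# is backed out from the report's own numbers, not stated directly.

pre_share, during_share = 0.0263, 0.0302
growth = during_share / pre_share - 1
print(f"market share growth: {growth:.1%}")     # ~14.8%, i.e. the ~15% cited

added_tvl = 130e6          # added TVL normalized by market share, USD
tvl_per_dollar = 30.92     # reported return per dollar of incentives
implied_spend = added_tvl / tvl_per_dollar
print(f"implied incentive spend: ${implied_spend / 1e6:.2f}M")
```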

Arbitrum’s market share of borrow TVL was consistently larger than its market share of lending TVL, which may indicate that users on Arbitrum were more prone to participate actively in lending and borrowing rather than only lending idle capital. Utilization had been slightly decreasing until LTIPP, after which it seems to have stabilized. Borrow market share grew 7% during LTIPP, from 2.65% to 2.84%. LTIPP is estimated to have provided an added borrow TVL normalized by market share of $51M, putting the return at $12.08 of added borrows (normalized by market share) per dollar spent.

Amongst rollups, it is relevant to note that Arbitrum’s lending market share has been drastically decreasing since the end of 2023. While this could be the result of a number of factors, such as poor ARB market conditions, LTIPP has not been able to stop this trend, although it has perhaps slowed or stalled it. There was a significant decrease of 2.62 percentage points in median TVL market share, from 40.33% before LTIPP to 37.71% during it.

Borrow market share, however, had been trending similarly to deposit market share until March 2024, when it decreased to lower levels. Activity in this segment had been picking up before and during LTIPP, with a significant increase. In fact, rollups as a group have been increasing market share steadily since the start of 2023, and this accelerated throughout 2024, which may explain why Arbitrum’s market share of global lending markets has slightly increased even as it has decreased amongst rollups.

DEXs

There was a significant increase in DEX TVL market share during LTIPP, both when comparing Arbitrum against all chains and against other rollups. However, it is noticeable that Arbitrum’s market share amongst rollups has been drastically declining year-over-year. This is somewhat expected due to growing competition in this sector, but given the degree, it is nonetheless worth raising the flag. LTIPP seems to have been instrumental in curbing this decrease and shifting the trend upwards, perhaps suggesting that incentivizing DEX activity yields a better return on investment. Arbitrum’s global DEX TVL market share grew from 3.28% to 3.84% during the program, a significant increase of 17%.

This reflects $102.6M of added TVL normalized by market share during this period, totalling $38.64 of TVL added per dollar spent on incentives. This calculation considers the 2,300,000 ARB spent amongst participants in this vertical, which at $1.13 per ARB (price at June 2nd) totals $2,599,000. The analysis uses data until September 24th (the last date of distribution for most protocols) even though some protocols have leftover ARB and have applied for LTIPP extensions.
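As a sanity check, the cost-efficiency arithmetic above can be reproduced directly from the reported inputs. This is only a sketch; the small gap versus the reported $38.64 figure likely stems from rounding in intermediate values:

```python
# Reproduce the DEX cost-efficiency figure from the inputs reported above.
arb_spent = 2_300_000     # ARB distributed to DEX participants
arb_price = 1.13          # USD per ARB (price at June 2nd)
added_tvl = 102_600_000   # added TVL normalized by market share, in USD

cost_usd = arb_spent * arb_price
tvl_per_dollar = added_tvl / cost_usd

print(f"Incentive cost: ${cost_usd:,.0f}")             # $2,599,000
print(f"Added TVL per dollar: ${tvl_per_dollar:.2f}")  # ~ $39.48, vs the reported $38.64
```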

One interesting aspect to note when looking at Arbitrum’s TVL and volume market share is that volume market share tends to be significantly larger than TVL market share. Considering first Arbitrum’s global DEX market share, there is a sharp contrast between its median 3.84% market share in global TVL and its median 10.86% market share in global volume during the period of the program. Amongst rollups, Arbitrum’s DEX TVL represents 26.2% of all rollups while its volume is 44.29% of all rollups. This can indicate that Arbitrum DEXs are operating more efficiently than those on most other chains, likely a positive byproduct of 250ms blocktimes. A balance between these two metrics can indicate healthy activity, where capital deposited in DEXs is also actively traded.

Fluctuations in volume market share increased during the LTIPP, but a significant rise is still noticeable when compared to the 30 days before the program: Arbitrum’s global DEX volume market share grew from 8.99% to 10.86% (a 21% increase).

Perps

Arbitrum’s market share in the global perps market does not seem to have been affected by the LTIPP, as evidenced by the continuation of its slow decline, both in TVL and daily volume. In fact, there was a significant decrease in TVL market share when comparing the periods before and during the LTIPP, from 21.53% to 20.44%, perhaps suggesting that incentivizing perps activity is more challenging or less effective (especially in comparison to DEX activity). This difference of 1.09 percentage points represents a 5% decrease from the period pre-LTIPP.

The median daily volume market share before the LTIPP was around 19.91%, and it decreased to 17.64% during the program. This difference of 2.27 percentage points is a significant decrease, representing an 11.38% decline from previous values.

Amongst other rollups, however, it seems like the LTIPP might have been able to slightly increase or at least stop the decrease of Arbitrum’s market share, which might indicate that rollup users are more influenced by incentives and more likely to switch over when enough benefits are present. In fact, Arbitrum’s perps volume market share amongst rollups decreased only by 1.48% and the corresponding TVL increased by 4% in the same period, compared to an 11.38% and 5% decrease, respectively, in market share amongst all chains.

Sequencer Revenue

Arbitrum’s sequencer revenue relative to other main L2s declined sharply throughout 2023 and 2024. However, it appears Arbitrum’s incentive programs have had a noticeable impact on this metric - the decrease was curbed in November when the STIP started, and there was even a sustained increase throughout its duration. After the STIP, in March 2024, the decline continued and was again curbed by the LTIPP in June 2024. Arbitrum’s market share of total L2 revenue in the month before the LTIPP was 6.04% and grew to 12.58% in the months during the program. The total revenue for Arbitrum during this period, from June 2nd to September 24th, was $4,678,700, an increase of $2,431,912 in added revenue. A total of 30.65M ARB was allocated to the program, and with an ARB price of $1.13 at the start, the LTIPP’s dollar cost equates to $0.07 in added sequencer revenue per dollar spent on incentives.
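The same arithmetic, applied to sequencer revenue, reproduces the $0.07 figure. This sketch uses only the numbers reported above:

```python
# Reproduce the sequencer-revenue cost figure from the inputs reported above.
arb_allocated = 30_650_000   # total ARB allocated to the LTIPP
arb_price = 1.13             # USD per ARB at program start
added_revenue = 2_431_912    # added sequencer revenue, June 2nd to Sept 24th, USD

program_cost = arb_allocated * arb_price
revenue_per_dollar = added_revenue / program_cost

print(f"Program cost: ${program_cost:,.0f}")                   # $34,634,500
print(f"Added revenue per dollar: ${revenue_per_dollar:.2f}")  # $0.07
```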

Notably, Arbitrum deploys a first-come-first-serve (FCFS) sequencing policy, while other L2s, such as Optimism and Base, use a priority fee mechanism. This means chains like Optimism and Base capture more value from their activity because users and traders can pay the sequencer a higher fee to include their transactions, whereas on Arbitrum traders opportunistically spam the sequencer to get their transactions included without having to pay higher fees. These dynamics will change once Timeboost goes live, and we expect Timeboost to generate additional annual revenue ranging from $19M to $95M.

Lending Category

Lending had the highest number of LTIPP recipients (10), receiving a total of 3.717M ARB in aggregate (~12.13% of total LTIPP incentives). Most lending protocols distributed incentives via airdrops and/or LP incentives; however, strategies varied in which types of activities or pools were incentivized.

We looked at Aave, Alchemix, Compound Protocol, Gravita Protocol, Lumin Finance, Primex Finance, and Synonym Finance. We did not analyze Myso or Copra in our study because we were unable to pull the relevant metrics from their Dune dashboards per OBL’s guidelines or from DefiLlama. While we analyzed Covenant, we did not include them in our analysis because they used the LTIPP to bootstrap their protocol, and we cannot compare performance for protocols that had no metrics at the beginning of the program: ARB incentives would have an outsized (effectively infinite) impact on such protocols, causing their results to outperform the entire cohort and making for an unfair comparison.

Methodology

For each protocol, we collected borrow TVL and supply TVL data from before and throughout the duration of LTIPP, calculated the growth in both metrics from the 30 day median before LTIPP to the median throughout LTIPP, derived a borrow-adjusted TVL metric, and normalized this metric by total ARB claimed. The equations applied in this analysis are detailed below.

Borrow-adjusted supply growth is computed as:

G = g_s × (u_during / u_before)^sgn(g_s)

Where:

  • g_s is the supply growth,
  • u_during is the utilization ratio during the incentive period, u_during = Borrow TVL_during / Supply TVL_during,
  • u_before is the utilization ratio before the incentive period, u_before = Borrow TVL_before / Supply TVL_before,
  • sgn is the signum function, which returns +1 if x > 0, -1 if x < 0, and 0 if x = 0.

The metric is then normalized:

G_normalized = G / A

Where A is the amount of ARB claimed by the protocol.
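In code, the metric can be sketched as follows. This is our reconstruction from the description and the per-protocol figures reported below; the names are ours and the exact implementation may differ:

```python
import math

def borrow_adjusted_growth(supply_growth: float, util_before: float, util_during: float) -> float:
    """Borrow-adjusted supply growth: supply growth scaled by the change in the
    utilization (borrow/supply) ratio. The sign of the growth determines whether
    a falling ratio dampens or amplifies the figure (our reconstruction).
    The result would then be normalized by ARB claimed for cross-protocol comparison."""
    ratio_change = util_during / util_before
    sign = math.copysign(1, supply_growth) if supply_growth != 0 else 0
    return supply_growth * ratio_change ** sign

# Alchemix: +230.53% supply growth, utilization 40.05% -> 38.56%
result = borrow_adjusted_growth(2.3053, 0.4005, 0.3856)
print(round(result, 4))  # ~2.2195, i.e. roughly the 221.98% reported (rounding of inputs)
```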

While top-line TVL is an attractive vanity metric to optimize, the utilization of that capital is what ultimately determines the efficacy and success of lending protocols. We then ran a significance test on the normalized metric to deduce which protocols outperformed and underperformed, and reviewed their bi-weeklies to glean qualitative insights regarding which types of incentive strategies work and which do not.

Synonym Finance and Gravita fall in the lower 25% (Q1 - first quartile) of our data, growing normalized borrow-adjusted supply by 3.69% and -5.14%, respectively, meaning Gravita experienced negative growth. It is important to note that, although Synonym Finance’s growth was modest, it was statistically significant according to the applied test. In this context, low or negative borrow-adjusted supply growth implies that these protocols saw stable or decreased usage over the period analyzed, potentially signaling that users are migrating away, that incentives could have been allocated more effectively to boost utilization, or that challenges such as market conditions or protocol size are impacting usage.

Lumin Finance and Primex Finance fall in the higher 25% (Q3 - third quartile) of our data, indicating that they experienced significant positive utilization growth (745.66% and 443.92%, respectively). The high values suggest that they have seen a large increase in usage or adoption.

The box plot also shows a large spread between underperforming protocols and overperforming ones, indicating substantial variation in utilization growth across protocols in the dataset.

One could argue that driving TVL (lending supply and borrowing demand) does not necessarily indicate effective incentive spending, because short-term yield farmers can simply supply and borrow liquidity, earn risk-free ARB incentives (assuming the smart contracts are secure), and leave the ecosystem once yields dry up or become more attractive elsewhere. Another reason is that the impact of incentives is less likely to be significant for protocols that are more established and already have high utilization rates, absent meaningful adjustments to the protocol’s parameters.

For these cases, further research could analyze user activity and retention statistics, providing another layer of insight into the type of activity and breadth of growth. For example, Lampros was able to find the percentage of users who claimed and sold their ARB incentives. This type of analysis can glean insights on specific wallets’ behaviors and address questions such as what borrowers are doing with their capital.

Note: For lending, we are only analyzing the LTIPP data up until September 10th, as some protocols stopped updating their queries after this date. This may introduce a slight bias toward more positive results, since the analysis excludes the end of September.

Alchemix

Alchemix received 150k ARB and increased its supply by 230.53% during the LTIPP when compared to the prior 30 days. Throughout this time, the borrow-to-supply ratio decreased only slightly, from 40.05% to 38.56%. Borrow-adjusted supply growth for Alchemix was hence 221.98%. This sustained growth could indicate that post-incentive supply stayed at a higher level than pre-incentives, although this would have to be assessed in a later analysis.

Alchemix planned primarily to incentivize depositors and LPs on Ramses Exchange. The protocol intended to allocate between 33% to 50% of the grant to enhance yields for depositors, thereby attracting more users to Alchemix on Arbitrum. The remainder of the grant was to be directed towards providing liquidity incentives specifically for alUSD and alETH LPers. Distribution of ARB was to be facilitated through direct incentivization on Ramses DEX, allowing LPers to claim their ARB directly from staking contracts. On the depositor side, Alchemix converted ARB into underlying assets to repay debt, although selling ARB is generally avoided unless it enhances the product’s usability or the grant’s effectiveness.

Taking a look at bi-weeklies, it appears that their success may be attributable to a focus on education, marketing, and collaborations. Due to their novel product (self-repaying loans), they highlighted an effort to educate users and hosted Twitter spaces throughout the program, which seems to have brought more liquidity to alAsset pools. They also collaborated with other DeFi protocols, including Gearbox, JonesDAO, and Layer3, two of whom also received LTIPP rewards. Collaborating with other protocols likely increased yields for users and thus attracted more capital. As you will see throughout this report, collaborations are a common thread among successful programs, which intuitively makes sense within the context of a rollup ecosystem.

Primex

Primex Finance received 42k ARB, the lowest amount of incentives from the LTIPP. The median borrow-adjusted supply growth for the analyzed period was 166.22%. Supply alone grew by 27.77%, which was heavily amplified by borrow TVL growth of 646.35%. The borrow-to-supply ratio increased from 1.9% before the LTIPP to 11.11% during it.

However, there was a sharp decline in borrow volume throughout September, so it is important to highlight that our analysis only considers data until the 10th of September (after which a few protocols, namely Alchemix and Lumin Finance, did not provide updated data). This means it is possible that Primex would fall from its place as one of the top performers had the rest of the month been included, indicating that borrow TVL was not sticky.

Primex focused on distributing ARB directly to users. The protocol earmarked 21,000 ARB (50%) for Primex Lenders with the aim of driving TVL growth, while the other 21,000 ARB (50%) was allocated to Primex Traders to stimulate volume growth. Additionally, Primex planned to launch an Achievement System designed to motivate both lenders and traders to engage in competitive activities. This system should enable participants to earn points based on their activity levels, allowing them to track their standings on a publicly accessible leaderboard, thereby fostering a more active user community.

Looking at biweeklies, Primex completed only three (out of seven) bi-weekly reports. While we were unable to gather much information from their bi-weeklies compared to other protocols, they wrote a blog post about their incentives program, in which the top 200 traders and top 200 lenders during six week periods were rewarded 14k ARB.

It is possible this type of incentive strategy creates conditions that attract short-term opportunistic behavior, in which users find ways of inflating activity to receive rewards. In particular, incentivizing participation through a highly gamified system with a public leaderboard has proven to be less effective for long-term user retention, as demonstrated by our prior research published on the forum as well as the results for the perps category later in this report.

It seems as though the borrow growth was substantially larger than the supply growth, which could be attributed to the incentives strategy and the types of activities they induce, such as looping strategies to enhance yield farming returns. Perhaps due to this dynamic, ultimately borrow TVL was not sustained.

Synonym Finance

Synonym Finance received a grant of 500k ARB and exhibited a characteristic pattern of a short-term boost driven by incentives. During the analyzed period, there was a median supply growth of 16.88% and a borrow growth of 43.27%, making for a 20.69% borrow-adjusted supply growth. While the increase is significant, it ranks among the lower performers due to its relatively modest growth compared to other lending protocols, despite the substantial 400k ARB planned for deployment. This large allocation reduces its borrow-adjusted supply growth when normalized by incentives.

In its application, Synonym planned to combine its ARB incentives with the existing SYNO emissions according to its pre-established incentives schedule. The protocol focused its bonus ARB incentives on borrowers to encourage increased activity and efficient capital utilization. A total of 400K ARB was planned for deployment alongside SYNO emissions, with 60% allocated to borrowers and 40% to depositors. These incentives would be implemented across various markets, including USDC, ARB, and forthcoming LRT markets, with weekly epochs to monitor their effectiveness.

Looking at biweeklies, nothing stands out. While it is clear that ARB incentives drive activity and growth, that does not necessarily mean the activity and growth are sticky. Protocols have to maintain activity and growth through other means, such as having a great product, being highly engaged with the community, or integrating with other protocols. We were unable to observe qualitative factors due to the lack of information in the biweeklies. This lack of participation in, and adherence to, the reporting guidelines laid out by the LTIPP raises a broader question for the DAO: how can it hold protocols receiving ARB incentives accountable to reporting requirements?

Gravita

Gravita is the only lending protocol where we saw a decline in borrow-adjusted supply. Supply TVL decreased 25.97% during the LTIPP, and the borrow-to-supply ratio also decreased slightly, making for a total decrease of 28.29% in borrow-adjusted supply. Gravita’s borrow and supply TVL gradually decreased throughout the program, which is very unusual.

Gravita has outlined several incentivization mechanisms aimed at enhancing liquidity provisioning and increasing demand for its GRAI token. The protocol planned to provide competitive returns to liquidity providers and create more utility for GRAI by onboarding lending platforms and partnering with strategic platforms.

Looking at biweeklies and reviewing the documentation, the incentives program appears to be more complex and cumbersome than most, and also involves purchasing vesting ARB (goARB) at a discount ranging from a minimum of 20% up to a maximum of 100% after a 40-week vesting period. In other words, users can acquire ARB at zero cost if they vest fully over 40 weeks, or they can exit earlier at a smaller discount, with the discount vesting linearly. Given the additional friction from this design, users likely flock to other lending protocols that offer similar yields with less complexity; the friction could also dampen users’ propensity to participate, since users may simply prefer receiving ARB directly for using the protocol.
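Under this reading, the effective discount grows linearly with vesting time. A minimal sketch, where the linear schedule and parameter names are our assumptions:

```python
def goarb_discount(weeks_held: float, min_disc: float = 0.20,
                   max_disc: float = 1.00, vest_weeks: float = 40) -> float:
    """Effective discount on vesting ARB (goARB) after holding for `weeks_held` weeks.
    Assumes a linear schedule from 20% (immediate exit) to 100% (full 40-week vest);
    this is our interpretation of the mechanism described above."""
    t = min(max(weeks_held, 0.0), vest_weeks)  # clamp to the vesting window
    return min_disc + (max_disc - min_disc) * t / vest_weeks

print(round(goarb_discount(0), 2))   # 0.2  -> immediate exit, 20% discount
print(round(goarb_discount(20), 2))  # 0.6  -> halfway through the vest
print(round(goarb_discount(40), 2))  # 1.0  -> fully vested, ARB at zero cost
```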

It is worth noting that Gravita’s strategy using the goARB model has faced challenges, as evidenced by Umami’s previous unsuccessful attempt to implement a similar strategy during the STIP, which ultimately led to a shift in their approach. This had already been identified in our STIP retroactive analysis.

In phase 6, they terminated the program and noted “incentives are not sufficient to fix a ‘broken’ product,” while in the following phase they wrote, “overall a success in terms of expanding user base.” Based on the data in our results, evaluating these mixed signals from the bi-weeklies is not straightforward.

DEX Category

For the DEX category, we looked at Clipper, DODO, Gyroscope, Integral, Uniswap, and PancakeSwap. We did not analyze Poolside or Symbiosis in our study because we were unable to pull the relevant metrics from their Dune dashboard per OBL’s guidelines or DefiLlama. Some protocols exclusively showed incentivized pools in their Dune dashboard. However, for others, we needed to source data from DefiLlama. To maintain consistency and fairness across all protocols, we ultimately decided to rely on DefiLlama data for our analysis of this vertical, as comparing growth metrics from only incentivized pools against overall metrics would not provide a fair assessment.

Methodology

For each protocol, we collected trading volume and TVL data from before and throughout the duration of LTIPP, calculated the growth in both metrics from the 30 day median before LTIPP to the median throughout LTIPP, derived a volume-adjusted TVL metric, and normalized this metric by total ARB claimed. The equations applied in this analysis are detailed below.

Volume-adjusted TVL growth is computed as:

G = g_tvl × (r_during / r_before)^sgn(g_tvl)

Where:

  • g_tvl is the TVL growth,
  • r_during is the volume-to-TVL ratio during the incentive period, r_during = Volume_during / TVL_during,
  • r_before is the volume-to-TVL ratio before the incentive period, r_before = Volume_before / TVL_before,
  • sgn is the signum function, which returns +1 if x > 0, -1 if x < 0, and 0 if x = 0.

The metric is then normalized:

G_normalized = G / A

Where A is the amount of ARB claimed by the protocol.

We then ran a significance test on the normalized metric to deduce which protocols outperformed and underperformed, and then reviewed their bi-weeklies to glean qualitative insights regarding which types of incentive strategies work and which do not.

Sustained TVL growth that is not backed by volume can often lack significance, as it may quickly diminish once incentives are withdrawn. In DEXs, the primary goal of incentivizing liquidity is to create a flywheel effect that attracts more users - both traders and liquidity providers - due to enhanced protocol performance. Ideally, liquidity incentives initially draw in more LPs, boosting TVL, which in turn attracts traders seeking improved price execution. Increased trading activity generates higher fees, benefiting LPs through greater yields. Ultimately, while some liquidity may leave when incentives are removed, a portion should remain due to the enhanced yields, resulting in both elevated TVL and volume levels compared to pre-incentive metrics. This is why we developed the volume-adjusted TVL growth metric, which penalizes TVL growth when the volume-to-TVL ratio decreases and amplifies growth when the ratio increases.
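This metric can be sketched in code (our reconstruction; the function and parameter names are ours), deriving the ratio change from the growth in volume and TVL:

```python
import math

def volume_adjusted_tvl_growth(tvl_growth: float, volume_growth: float) -> float:
    """TVL growth scaled by the change in the volume-to-TVL ratio, penalizing
    growth when the ratio falls and amplifying it when it rises (our reconstruction).
    The ratio change r_during / r_before equals (1 + volume growth) / (1 + TVL growth)."""
    ratio_change = (1 + volume_growth) / (1 + tvl_growth)
    sign = math.copysign(1, tvl_growth) if tvl_growth != 0 else 0
    return tvl_growth * ratio_change ** sign

# Clipper: TVL +42.66%, volume +264.8%
print(round(volume_adjusted_tvl_growth(0.4266, 2.648), 4))      # ~1.0909, i.e. ~109.1%
# Gyroscope: TVL +10,715.10%, volume +773.42%
print(round(volume_adjusted_tvl_growth(107.1510, 7.7342), 4))   # ~8.653, i.e. ~865.3%
```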

When measuring the impact of claimed ARB amounts on volume-adjusted TVL for the DEX cohort, Gyroscope, PancakeSwap, Integral, and Clipper had significant results. Gyroscope was the best performer, followed by Clipper; Integral and PancakeSwap were at the median; and DODO had the worst performance, followed by Uniswap. Gyroscope stands out as an outlier amongst the participant DEXs.

Uniswap and DODO fall in the lower 25% (Q1 - first quartile) of our data, growing normalized volume-adjusted TVL by 0.65% and -0.14%, respectively, meaning DODO experienced negative growth. Additionally, Uniswap’s growth was not deemed statistically significant based on the applied test. The flat and negative volume-adjusted TVL growth implies that these protocols saw stable or decreased usage over the period analyzed, potentially signaling that users are migrating away, that incentives could have been allocated more effectively to boost utilization, or that challenges such as market conditions or protocol size are impacting usage.

Gyroscope and Clipper fall in the higher 25% (Q3 - third quartile) of our data, indicating that they experienced significant positive growth (173.07% and 20.81%, respectively). The high values suggest that they have seen a large increase in usage or adoption.

The box plot reveals that Gyroscope stands out as a clear outlier, exhibiting substantial growth compared to the rest of the group.

Gyroscope

Gyroscope received 100k ARB, the least amount of ARB incentives in the DEX category. Initial TVL and volume levels were very low compared to others in the vertical, creating fruitful conditions for a positive flywheel to take place (e.g. liquidity mining attracts LPs, which improves trade execution and thus increases trade volume, which in turn leads to more fees for LPs, increasing their return on capital, which should naturally attract more LPs). The goal would be to reach an equilibrium where, even when liquidity mining incentives are removed, the new equilibrium between liquidity and trade volume is higher, thus retaining LPs.

Median TVL on Gyroscope grew by $13.5M, $1.4M more than Uniswap’s TVL increase, which amounts to a 10,715.10% growth. Volume increased by 773.42%, resulting in a 90% decrease in the volume-to-TVL ratio. Our volume-adjusted TVL growth metric of 865.35% reflects a penalization of TVL growth due to the disproportionate increase in TVL that was not sustained by trading volume.

Gyroscope laid out plans to use its 100,000 ARB grant to incentivize liquidity pools (E-CLPs) in two main categories: 33.33% of the ARB would be allocated to LST E-CLPs (~33,000 ARB), and 66.67% would go to stablecoin E-CLPs (~67,000 ARB), distributed weekly. The goal was to enhance liquidity provision for both stablecoin and LST pools, including upcoming pools like GYD/USDC and GYD/DAI.

The funds would be used to incentivize LPs, boost capital efficiency with new, concentrated liquidity pools, and promote long-term liquidity stickiness by minimizing liquidity withdrawal friction. Gyroscope also introduced a points system to reward LPs and create a liquidity network using GYD as a settlement asset to generate organic yield for LPs. Additionally, they plan to build trust with the GYD stablecoin by promoting its automated risk-control features.

Looking at biweeklies, Gyroscope also received additional rewards, and subsequent TVL, from other programs such as Aave. This is likely why it was the best performer and an outlier. While this result could be considered an unfair evaluation (normalizing for multiple incentive programs might provide more balanced results), due to the composable nature of DeFi it is worth highlighting how significant collaboration and composability are for incentive programs. Instead of discounting partnerships, highlighting them captures a protocol’s business development efforts and product. In addition, protocols that worked together or had partnerships during the LTIPP tended to produce more successful results (e.g. Alchemix).

It is also worth noting that ARB rewards were also used to obtain vlAURA votes (i.e. a mechanism by which users can bribe the protocol to direct incentives to specific pools). This translated to higher incentives for LPs due to Balancer’s vote matching, making specific Gyroscope pools even more attractive places to LP. Again, this clever mechanism and strategy speaks to the power of composability and the ability for protocols to enter mutually beneficial partnerships.

Clipper

Clipper received 174.308k ARB, the second-smallest allocation amongst DEXs. Clipper’s TVL increased from around $1M to $1.4M during the program, a significant increase of 42.66%. Interestingly, Clipper’s daily volume increased even more than TVL, namely by 264.8%. This means the volume-to-TVL ratio more than doubled, from 7.26% to 18.56%.

This sustained increase in volume alongside TVL growth could generate stickier liquidity and healthier metrics for the protocol in the long run. The LTIPP provided a volume-adjusted TVL growth of 109.1%.

According to their application, the goal of the program was to grow the number of LPs and TVL by specifically targeting new LPs and new TVL. They incentivized migration from other chains to Arbitrum and pro-rated the allocation based on how early LPs moved their positions.

Looking at biweeklies, Clipper highlighted interviewing LPs about their experience to create a more compelling incentives structure. This level of attention to detail and to their users was not common among observed LTIPP recipients, at least according to bi-weekly reporting. One piece of feedback they received was to allocate more incentives to LPs acquired via referrals; however, the efficacy of referrals is unclear, because defending against Sybil attacks in referral schemes is not straightforward and referrals do not necessarily garner users who are genuinely interested in the protocol. This observation was highlighted here and in other programs’ bi-weeklies.

Aside from interviewing LPs, we could not identify any outstanding or unique strategies. Rather, the success of Clipper’s program can be attributed to basic factors, including having a clear case for incentives (i.e. a novel product), a clear direction for incentives (LPs), a nimble approach to adjusting the incentive strategy (gathering feedback and making adjustments), and a product that users (LPs) actually value. This perhaps speaks to the idea that simple strategies paired with good products are good strategies.

In fact, Clipper is a DEX that uses novel rebalancing strategies to generate higher returns for LPs. As such, Clipper is arguably better positioned to take advantage of the aforementioned flywheel effect. Moreover, this potentially points to a broader discussion regarding allocating ARB incentives to DEXs: namely, that the DAO should allocate incentives to DEXs that increase LP returns.

Dodo

Dodo received 350k ARB. Both TVL and volume declined sharply by the end of the program, reverting back to initial levels and nullifying the growth. Initial TVL was quite low, so larger growth conducive to a positive flywheel effect could have taken place. When considering median values before and for the full duration of the program, no growth was observed; in fact, we noticed a slight decrease of -2.41% in TVL and a slight increase of 2.09% in volume. For the initial levels of volume and TVL, the ARB allocation received was disproportionately large compared to other protocols in the vertical. When an incentive allocation is disproportionately large relative to the size of the protocol, we would expect incentives to drive disproportionately more volume and TVL, as shown by protocols that grew substantially from zero or low TVL. This suggests either that there is additional activity we have not accounted for (such as spending incentives on another protocol), or that the incentive allocation was particularly inefficient.

The stated objectives from their application were (1) to enhance liquidity, minimize slippage, and boost market-making activities for stablecoins, and (2) to accelerate bridging assets to Arbitrum and elevate Arbitrum’s onchain trading experience. To do so, there were four strategies: (1) boost TVL and trading volume through higher APR, (2) establish partnerships and reward addresses trading specific tokens, (3) incentivize innovative projects and drive liquidity for new assets, and (4) create a dPoints system to facilitate ARB distributions.

Looking at biweeklies, the bulk of Dodo’s incentives were allocated to solvBTC and liquid restaking tokens (weETH and ezETH). There was limited information regarding the choices for these decisions and adjustments throughout the program. Each of these pools is associated with relatively new assets (as planned). However, it is possible new assets are less palatable for LPs, if the rewards do not justify the additional exposure to newer assets. For what it’s worth, Pancakeswap added a solvBTC pool as part of their incentives strategy but quickly noted that growth in that pool was much slower than other pools and decided to shut the incentives off.

In addition, from an LP’s perspective, it is more profitable to LP in pools that also generate swap fees on top of ARB incentives. Thus, unless incentives can attract enough additional liquidity to tighten spreads to a competitive level, a pool that lacks sufficient liquidity or a mechanism that improves LP performance is unlikely to generate trade volume, and therefore the additional fees needed to offset the risk of passively making that market. In other words, if a pool still cannot compete with existing pools, another pool is likely a more effective allocation of incentives. See Gauntlet’s incentives strategy for allocating incentives to drive better price execution, and thus more trading activity, on specific Uniswap pools.

Moreover, when evaluating incentives for DEXs, it is important to consider the AMM’s design and the amount of incentives that would drive more competition. It is possible that taking a more opinionated stance on DEXs and liquidity within the Arbitrum ecosystem could generate more positive results.

Uniswap

Uniswap received 1 million ARB, more than double the allocation of the next highest recipient. Given that Uniswap had the largest TVL in the group, it is expected that its growth would be relatively modest. This is due to the proven phenomenon of diminishing returns on additional liquidity (e.g. here) and its impact on trading volume for several reasons.

First, when liquidity is already abundant, price execution is likely highly efficient compared to other onchain liquidity sources, leaving little room for improvement. Additionally, a portion of trades are often routed sub-optimally due to factors such as traders’ personal preferences, differences in user interfaces, or protocol fees. Lastly, adding liquidity to long tail assets does not necessarily imply more activity, unless, for example, price discovery for the token is outside of Arbitrum, or the underlying fundamentals for the token create conditions in which users have a reason to transact. As a result, even if enhanced price execution theoretically leads to more trades being routed to a specific exchange, the trading activity and volume depends on other factors, which can potentially limit total routable volume.

As a whole, median TVL grew by $12.1M or 4.35%, a non-significant amount when comparing the month before the LTIPP to the program period. This does not mean that individual pools did not perform well. Gauntlet shared some initial results that look positive for the incentivization of a select group of Uniswap pools on Arbitrum.

The 1M ARB allocation is, again, more than double the next-highest allocation, which penalizes our efficiency-focused metric. The volume-to-TVL ratio decreased slightly during the program, from 96.00% in the month before to 85.94% during. For a complete breakdown of Uniswap’s (previous) incentive program, see here. This should provide context for how Gauntlet distributes ARB incentives: they deploy a sophisticated strategy that optimizes for improving pools where more liquidity would yield better price execution.

However, while these results are promising at a protocol-specific level (i.e., ARB incentives were able to effectively raise TVL in specific Uniswap pools), the impact of ARB incentives on Uniswap was not significant in comparison to other DEX protocols. Intuitively, this makes sense: the larger a protocol is, the stronger the diminishing returns, i.e., 1 ARB has a larger impact on smaller protocols.

Furthermore, it is also intuitive that protocols like Uniswap, which already have strong PMF and significant cash flows, arguably do not fall into the category of protocols that need ARB incentives. Simply put, spending ARB on protocols with higher ROI is a much more effective way to generate successful results for an incentive program.

Another takeaway is that our methodology does not offer a fair comparison for larger protocols like Uniswap, and future incentive programs and analyses should differentiate between newer protocols and those with established PMF. This differentiation could boil down to the amount received, the type of support from the DAO, goals, or impact on the ecosystem.

Perps Category

For the perps category, we looked at Aark, LOGX, APX Finance, CVI Finance, and Okto. We did not analyze Synthetix because it appears they only used ARB rewards to bootstrap collateral in vaults and planned to go live with trading after LTIPP. Audits for the multi-collateral contracts were delayed for months, so trading never went live while LPs were issued incentives to stay. Synthetix launched trading right after LTIPP ended and filed for an extension to use the remaining ARB, but the request was denied. Hence, we excluded them from our study, and we urge the DAO to be more thorough, as it makes little sense to allocate incentives to protocols that are not ready for them. Although Pear Protocol ran a program according to their bi-weeklies, we did not include them in our analysis because we could not find their Dune dashboard per OBL’s guidelines or pull their data from DefiLlama. We are happy to add the data to the study if it is presented.

Methodology

For each protocol, we collected trading volume data from before and throughout the duration of LTIPP, calculated the growth in volume from the 30-day median before LTIPP to the median throughout LTIPP, and normalized this metric by total ARB claimed:

Normalized Growth = ((Median_during − Median_before) / Median_before) / log(A)

where A is the amount of ARB claimed by the protocol.

We then ran a significance test to deduce which protocols outperformed and underperformed, and reviewed their bi-weeklies to glean qualitative insights into which types of incentive strategies work and which do not.
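These steps can be sketched in Python (a simplified illustration under our stated assumptions; function and variable names are ours, and the U statistic is computed by brute-force pairwise comparison rather than the rank-based formulation used in standard libraries):

```python
import math
from statistics import median

def normalized_growth(pre_volumes, during_volumes, arb_claimed):
    """Growth from the pre-LTIPP median to the in-program median,
    normalized by the log of total ARB claimed (see Appendix)."""
    pre_med = median(pre_volumes)
    during_med = median(during_volumes)
    growth = (during_med - pre_med) / pre_med  # e.g. 0.47 => +47%
    return growth / math.log(arb_claimed)

def mann_whitney_u(before, during):
    """Mann-Whitney U statistic via pairwise comparison (ties count 0.5).
    Fine for small samples; use scipy.stats.mannwhitneyu for p-values."""
    u = 0.0
    for x in during:
        for y in before:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u
```

Dividing by log(claimed ARB) dampens, rather than fully removes, the advantage of smaller allocations, which is why the methodology still flags large recipients like Uniswap as penalized by this metric.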

When measuring the impact of claimed ARB amounts on volume for the perps cohort, APX Finance, CVI Finance, Okto and Contango all had a significant increase in trading volume. APX Finance was the best performer, followed by CVI Finance. Aark and LOGX were the worst performers according to volume growth normalized by ARB allocation.

LOGX and Aark fall in the bottom 25% of our data (below the first quartile, Q1), with normalized volume having decreased by 9.7% and 10.39%, respectively. Both protocols thus experienced negative growth, implying decreased usage over the period analyzed. This potentially signals that users are migrating away, that incentives could have been allocated more effectively to boost utilization, or that challenges such as market conditions or size are impacting their usage.

APX and CVI Finance fall in the top 25% of our data (above the third quartile, Q3), indicating significant positive growth (47.47% and 31.31%, respectively). These high values suggest a large increase in usage or adoption.

The box plot also shows a large spread between underperforming protocols and overperforming ones, indicating substantial variation in volume growth across protocols in the dataset.

APX

APX Finance received 525k ARB. The median daily volume observed was $22.5M, compared to $6M in the month before the program, an increase of 271.55% during LTIPP.

Although APX was a top performer, the sharp decline in volume during the final month of the program signals potentially unsustainable user behavior. This aligns with the earlier market share analysis, which suggests that LTIPP had minimal impact on the perps vertical.

According to their application, the focus of the program was to expand multichain liquidity between BSC and Arbitrum. They planned to distribute 225k ARB to cross-chain ALPs and LPs, targeting a 5x increase in Arbitrum ALP TVL, from $1.02M to $5.1M, by the end of LTIPP. They also did grant matching to encourage new traders.

Looking at bi-weeklies, they focused rewards on LP incentives and trading activity, and gave additional rewards for completing Galxe quests and trading/LPing on PancakeSwap, another instance in which a top performer leveraged collaborations. They also hosted a lottery in which one lucky trader won 5k ARB each week. It is also worth noting that when users staked their ALP tokens, they received ARB rewards from both APX and PancakeSwap. Thus, yields were higher relative to other protocols, which might have contributed to their outperformance.

They also focused on marketing throughout the program and highlighted that frequent communication was necessary to keep users interested. Other top performers in LTIPP (like Alchemix) noted this as well.

CVI

CVI Finance received 125k ARB, and median trading volume grew by 46.77%, when comparing the month before LTIPP to the months during the program. In line with APX, the decline of trading volume towards the end of the program is noticeable.

The stated objectives from their application were to (1) encourage trading activity, (2) amplify liquidity and market stability, and (3) foster widespread trading engagement. To do so, the plan was to dedicate a portion of incentives to LPs, a portion to trading competitions, a portion to trading rebates, and the remainder to ARB raffles based on trading volume.

Looking at bi-weeklies, they focused rewards on LP incentives. They even reduced the level of rebates to increase LP rewards, lowering LPs’ exposure to volatility, and incorporated trading competitions and raffles to boost activity. Beyond this, the bi-weeklies offered limited information.

Aark

Aark received 900k ARB, and when comparing the median daily trading volume of the month before LTIPP and the months during LTIPP, there was a 61.88% decrease. This does not necessarily mean that incentives caused a decrease in the protocol’s volume, but rather that it wasn’t effective in preventing it.

For the protocols in the perps category that we analyzed, Aark received a very large allocation of 900k ARB, which contributed to its place among the bottom performers in terms of volume growth normalized by allocation.

The stated objectives from their application were to (1) onboard non-Arbitrum users onto Aark, (2) focus rewards and retention campaigns on non-Arbitrum users, and (3) grow TVL via LSTs, LRTs, and RWAs. Their plan involved using an airdrop quest to acquire users and offering reduced fees to traders on Aark. They incentivized LPing assets unique to Aark, such as GM tokens, LSTs, single-sided ARB, BTC, and ETH. They also planned to incentivize affiliates to build a network of non-Arbitrum users through referrals.

Looking at bi-weeklies, they quickly recognized some sybil behavior in their referral program and decided to reallocate incentives to trading. They also adjusted their new-user acquisition strategy and partnered with StakeStone to strengthen LP incentives. Aark was very active in tweaking its strategy to optimize outcomes; adjustments were a common theme in their bi-weeklies. Oddly, the results did not improve. The underperformance is likely attributable to shortcomings across the perps category as a whole: even outperformers lose activity once incentives are fully distributed.

Interestingly, volume was steadily increasing into April. This can be explained by users farming $AARK ahead of the protocol’s TGE, which was delayed to June after initially being slated for April 1st. It is possible the expected return from AARK’s TGE exceeded the expected return from ARB incentives.

LOGX

LogX received 395k ARB and volume decreased by 54.28% during LTIPP when compared with the month before. Volume had been steadily decreasing since April, and the incentives from LTIPP weren’t able to stop this trend.

According to their application, LogX planned to use ARB incentives to acquire more CEX traders, increase trader engagement, and drive growth via referral programs. To retain and engage these traders, they planned weekly competitions, fee rebates, and bonuses, and also planned to incentivize referrals and affiliates.

Looking at bi-weeklies, they adjusted their original ARB grant usage to focus on trading competitions and trading fee rebates. They also added $LOGX rewards for the top trader on the leaderboard and incentivized affiliate partners on a custom, per-deal basis. In mid-August, they ran a massive 100k ARB and 60k LOGX trading campaign to reward traders on the leaderboard. However, volume remained spiky and continued to decline over the duration of LTIPP.

All in all, the perps category appears challenging for incentives. Some protocols actually saw a decrease in trading volume, and for the top performers that experienced a significant increase, the gains quickly reverted once incentives finished. Incentive strategies for perp exchanges are fairly consistent (e.g., fee rebates, trading competitions, referral programs, and volume-based rewards), yet results are consistently ineffective, especially when considering post-incentive activity. Intuitively, if all perps protocols offer fee rebates, trading competitions, and referral programs, users will simply use the one offering the highest reward at the lowest cost.

The DAO should consider more rigorous analysis of how to allocate ARB incentives to perps exchanges because, based on our methodology, doing so does not appear to be an effective use of capital. Perp protocols should consider investigating different incentive strategies: simply attracting new traders via fee rebates, trading competitions, and referrals often has positive short-term effects but seldom sustains them, and most of the time this activity quickly reverts once incentives have been distributed.

Conclusion

LTIPP had a positive impact on the Arbitrum ecosystem, broadly speaking. However, there are ways to improve the effectiveness of incentive programs on a per ARB (or USD) basis. Here is a list of observations that we believe the DAO can consider for upcoming incentives programs.

  • Protocols that formed partnerships during LTIPP generally saw better results. This could be further explored by allocating incentives to protocols that are composable and strengthen the set of existing protocols in the Arbitrum ecosystem, or to programs that explicitly target collaborations and leverage partnerships to more effectively enhance ecosystem-wide metrics. The DAO (or the committee running the program) could take more of a lead in thinking critically about which protocols are mutually beneficial.
  • For the DEX category, protocols that enhance LP experience did really well. Incentives should be allocated to protocols that enhance LP returns because the flywheel is more pronounced per ARB spent when a DEX is more capital-efficient and beneficial to LPs.
  • Further, DEXs should prioritize pools based on total revenue potential, including additional swap volume per unit of liquidity. Inefficient pools, or long tail (newer) assets, should be avoided in favor of better options. There may be value in DEX protocols funding research/advisory to help individual protocols optimize incentive allocation, similar to Uniswap’s partnership with Gauntlet.
  • Effective marketing, frequent communication, and responsiveness to user feedback are key. The DAO (or committee running the program) should take some lead on promoting LTIPP recipients’ applications and incentive strategies and fostering more engagement.
  • LTIPP significantly increased DEX activity on Arbitrum, and it appears the program saved sequencer revenue from its downward spiral. Hence, if the DAO can expect incentives to boost spot trading activity, it is prudent to focus incentive programs after Timeboost goes live. The main implications are that Timeboost is expected to increase sequencer revenues for the DAO and that increased activity will generate more insights about Timeboost and its impact on users.
  • LTIPP did not have a broad impact on Arbitrum’s perps ecosystem, and based on protocol-specific results, incentives were largely ineffective and/or unsustainable. It is possible that common perps strategies (e.g., fee rebates, trading competitions, referral programs, and volume-based rewards) are easier to game (e.g., users can wash trade, trade delta-neutral positions, find loopholes in referral programs, etc.). A more thorough analysis of the perps category would be useful for determining how the DAO should think about allocating ARB incentives to perps protocols.
  • Complex incentive designs that involve purchasing vesting ARB (e.g., goARB from Gravita) can add friction and push users toward protocols that offer similar features with less complexity.

Further Research Directions

As there are ongoing discussions about establishing an incentives group for future programs, this analysis is meant to be a checkpoint for thinking about best (and worst) practices from LTIPP. We look forward to more in-depth analysis from PYOR and Lampros.

As mentioned, we were unable to build a scalable way to monitor ARB distributions after protocols initially claimed them within the remainder of our first ARDC engagement. It would be prudent, and potentially very valuable, to track and analyze exactly how protocols distributed ARB. To get ahead of this problem in future incentive programs, further work should be done to standardize reporting requirements and collect the necessary information so that the DAO can easily monitor activity.

In addition, the analysis would benefit from incorporating a few weeks of post-incentive data to assess not only the efficiency of the incentives during the program but also their lasting impact. Comparing the key metrics one month before and one month after the incentive period would provide valuable insights into the sustainability of the growth.

It is also worth noting the difference between allocating incentives to DEXs vs. perps: spending ARB on DEXs appears to have a higher ROI than spending on perps. We therefore encourage the DAO to lean into DEXs. This might include more thorough analysis of DEX designs and developing frameworks for deciding which pools to allocate to, especially as Timeboost goes live, which should increase arbitrage activity across DEXs.

Appendix

Methodology

Given the non-uniform LTIPP data and the limited time left in ARDC v1, we split LTIPP recipients into cohorts by sector and focused our analysis on the top three sectors by number of recipients: Lending (10), DEX (8), and Perps (8). These sectors, which have the most participants, are also among the most important drivers of Arbitrum’s growth.

First, we aggregate incentives by sector and compare each sector’s market share against other chains and L2 ecosystems to measure the effectiveness of LTIPP more broadly. Before diving into LTIPP at the protocol level, we want to illustrate its impact on Arbitrum’s market share of important categories. We also review LTIPP’s impact on sequencer revenue. For each metric described in the results, we normalize by market share to avoid distortion from broader market dynamics. For example, in the case of DEXs, we determine the added TVL by multiplying the increase in market share by the total DEX TVL, allowing us to quantify the impact of the program independently of overall market fluctuations. This has been done in previous analyses, e.g. here. We calculate Arbitrum’s L2 revenue market share by considering its sequencer fees relative to those of other major L2s (Base, Blast, Linea, Optimism, Polygon zkEVM, Scroll, zkSync Era, Zora). We then compare the change in market share before and during LTIPP, and multiply this difference by total fees to get the added revenue, normalized by market share. To assess the impact per dollar spent on incentives, we use the ARB price of $1.13 on June 2nd, 2024, the program’s start date.
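A minimal sketch of this market-share normalization (illustrative function names are ours; the ARB price is the $1.13 program-start figure stated above):

```python
ARB_PRICE_AT_START = 1.13  # USD on June 2nd, 2024, per the methodology

def added_metric(share_before, share_during, sector_total):
    """Market-share-normalized gain: the change in Arbitrum's share of a
    sector, applied to the sector-wide total (e.g. total DEX TVL or total
    L2 sequencer fees), isolating the program's effect from market-wide moves."""
    return (share_during - share_before) * sector_total

def impact_per_dollar(added_usd, arb_incentives):
    """Added USD of the metric per USD of incentives spent,
    valuing ARB at the program-start price."""
    return added_usd / (arb_incentives * ARB_PRICE_AT_START)
```

For example, a two-percentage-point share gain against a $1B sector total implies $20M of added TVL attributable to the program, regardless of whether the overall market grew or shrank.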

Then, we collect protocol level metrics as prescribed by OpenBlock Labs and overlay each phase of ARB claims on charts to illustrate basic visualizations of the impact of incentives on protocol level growth metrics for each protocol. We did not track subsequent ARB distributions for each phase because protocols deployed different strategies for distributing incentives. Following flows for each protocol was not scalable within the time frame of our analysis.

For our analysis, we calculate the 30-day median before LTIPP and the median throughout LTIPP for each protocol-level metric. We perform a significance test (Mann-Whitney U test) to determine whether observed differences between the periods before and during the program are statistically significant.

We then normalized metrics by the log of total ARB claimed. We decided to use claim amounts for a couple of reasons. One, protocols distributed ARB very differently, making it more challenging to both scale the tracking of ARB and apply a consistent methodology to each protocol. Two, by normalizing growth by claim amount, we provide a lower bound for each protocol’s performance: if we assume protocols spent all the ARB they claimed, then any protocol that did not spend it all actually achieved more effective results than measured. Further analysis that precisely tracks the ARB spent could enhance these findings and also detect any misuse of funds (as in previous programs), both of which would be valuable to the DAO.

We take the distribution of protocol metric growth by category and make a boxplot to check for outperformers and underperformers. We then select the outlier protocols below the first quartile and above the third quartile (i.e., the bottom 25% and top 25%) and review their bi-weekly updates to observe qualitative factors that contributed to their performance.
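This selection step can be sketched as follows (a simplified illustration using Python's statistics.quantiles; the protocol names and growth figures are hypothetical):

```python
from statistics import quantiles

def flag_performers(growth_by_protocol):
    """Boxplot-style split: protocols at or below Q1 are flagged as
    underperformers, those at or above Q3 as outperformers."""
    values = sorted(growth_by_protocol.values())
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method
    under = [p for p, g in growth_by_protocol.items() if g <= q1]
    over = [p for p, g in growth_by_protocol.items() if g >= q3]
    return under, over

# Hypothetical normalized growth figures (%) for five protocols
sample = {"A": -10.4, "B": -9.7, "C": 5.0, "D": 31.3, "E": 47.5}
under, over = flag_performers(sample)  # under=["A"], over=["E"]
```

With only a handful of protocols per category, the quartile cutoffs are coarse, which is why the quantitative split is paired with a qualitative review of bi-weeklies rather than used on its own.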

References

Results and Analysis: Uniswap Arbitrum Liquidity Mining Program
STIP Analysis of Operations and Incentive Mechanisms
STIP Retroactive Analysis – Perp DEX Volume
Timeboost Revenue and LP Impact Analysis
https://x.com/gauntlet_xyz/status/1839294330911207782
