ARDC Research Deliverables

STIP Retroactive Analysis – Perp DEX Volume

The below research report is also available in document format here.

TL;DR

In H2 2023, Arbitrum launched the Short-Term Incentive Program (STIP) by distributing millions of ARB tokens to various protocols to drive user engagement. This report focuses on how the STIP impacted trading volumes in the perp DEX vertical, specifically examining the performance of Vertex, Gains Network, GMX, MUX Protocol, Vela Exchange, Perennial, and Jojo Exchange. By employing the Synthetic Control (SC) causal inference method to create a “synthetic” control group, we aimed to isolate the STIP’s effect from broader market trends. For an in-depth explanation of the utilized inference method and data, see the Methodology and Annex sections at the end of this report.

Our analysis yielded varied results: Vertex saw a significant 70% of its total trading volume in the analyzed period attributed to STIP, while GMX saw 43%. MUX Protocol also benefited, with 15% of its volume linked to STIP incentives. In contrast, our model predicts that Gains Network experienced 5% less volume than if there had been no STIP, and Vela Exchange showed no statistically significant impact. These outcomes seem to highlight that mainly utilizing traditional fee rebates, as done by Vertex, GMX, and MUX, was more effective in driving volume growth than the gamified incentives used by Gains Network and Vela Exchange.

Our methodology and detailed results underscore the complexities of measuring such interventions in a volatile market, stressing the importance of comparative analysis to understand the true impact of incentive programs like STIP.

Context and Goals

In H2 2023, Arbitrum initiated a significant undertaking by distributing millions of ARB tokens to protocols as part of the Short-Term Incentive Program (STIP), aiming to spur user engagement. This program allocated varying amounts to diverse protocols across different verticals. Our objective is to gauge the efficacy of these recipient protocols in leveraging their STIP allocations to boost the usage of their products. The challenge lies in isolating the impact of the STIP from other factors, including broader market conditions.

This report pertains to the perp DEX vertical in particular. In this vertical, the STIP recipients were GMX, Gains Network, Vertex, MUX Protocol, Vela Exchange, Perennial, and Jojo Exchange. The following table summarizes the amount of ARB tokens received and when they were used by each protocol.

To assess the impact of the STIP on perp DEX protocols, daily trading volume serves as a crucial metric. Despite varying approaches to liquidity—such as synthetic AMMs versus orderbook liquidity—the true measure of a protocol’s success lies in the volume traded on its platform.

We used a Causal Inference method called Synthetic Control (SC) to analyze our data. This technique helps us understand the effects of a specific event by comparing our variable of interest to a “synthetic” control group. Here’s a short breakdown:

  • Purpose: SC estimates the impact of a particular event or intervention using data over time.
  • How It Works: It creates a fake control group by combining data from similar but unaffected groups. This synthetic group mirrors the affected group before the event.
  • Why It Matters: By comparing the real outcomes with this synthetic control, we can see the isolated effect of the event.

In our analysis, we use data from other protocols to account for market trends. This way, we can better understand how protocols react to changes, like the implementation of the STIP, by comparing their performance against these market-influenced synthetic controls. The results pertain to the period from the start of each protocol’s use of the STIP until March 1st, 2024. Therefore, this analysis centers on the effectiveness of various incentive structures, rather than the sustainability of activity. The final date was chosen so that the results remain statistically significant at the 90% confidence level, as further described in the Methodology section.

Jojo Exchange was excluded from the analysis due to insufficient reliable data and an apparent pivot to Base. Blockworks Research has conducted a separate case study on Jojo, available on the governance forum. Perennial was also excluded because its v2 launch coincided with the start of the STIP, and the methodology we employed requires a comparable dataset from before the STIP’s implementation to accurately gauge its impact.

Results

Vertex

The total estimated impact of the STIP from November 8th, 2023 until March 1st, 2024 on Vertex’s daily volume is $32.8B. Vertex’s total volume in this period was $46.9B, so according to our analysis, 70% of the total volume can be attributed to the STIP. Since Vertex received a total of 3M ARB, valued at around $3.6M (at $1.20 per ARB), this means that the STIP created $9,111.67 in volume per dollar spent over the 115 days analyzed.

Gains Network

The total estimated impact of the STIP from December 29th, 2023 until March 1st, 2024 on Gains Network’s daily volume is minus $0.3B. Gains Network’s total volume in this period was $7.1B, so according to our analysis, a loss of 5% of total volume can be attributed to the STIP. Since Gains Network received a total of 4.5M ARB, valued at around $5.4M (at $1.20 per ARB), this means that the STIP caused a loss of $63.52 in volume per dollar spent over the 64 days analyzed.

GMX

The total estimated impact of the STIP from November 8th, 2023 until March 1st, 2024 on GMX’s daily volume is $10.5B. GMX’s total volume in this period was $24.4B, so according to our analysis, 43% of the total volume can be attributed to the STIP. Since GMX received a total of 12M ARB, valued at around $14.4M (at $1.20 per ARB), this means that the STIP created $731.67 in volume per dollar spent over the 115 days analyzed.

MUX Protocol

The total estimated impact of the STIP from November 16th, 2023 until March 1st, 2024 on MUX Protocol’s daily volume is $2.2B. MUX Protocol’s total volume in this period was $15.2B, so according to our analysis, 15% of the total volume can be attributed to the STIP. Since MUX Protocol received a total of 6M ARB, valued at around $7.2M (at $1.20 per ARB), this means that the STIP created $308.33 in volume per dollar spent over the 107 days analyzed.

Vela Exchange

The total estimated impact of the STIP from December 27th, 2023 until March 1st, 2024 on Vela Exchange’s daily volume was minus $456M, but this impact was not deemed statistically significant. So effectively, the analysis did not find the STIP to have any significant impact, positive or negative, on Vela Exchange’s daily volume. In this context, it means that the observed impact of the STIP on the protocol’s daily volume could simply be due to random fluctuations rather than a real effect.
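
The per-protocol figures above follow from simple arithmetic, sketched below in Python. The inputs are the rounded values quoted in this section, so the outputs differ slightly from the reported numbers, which were presumably computed from unrounded data (most visibly for Gains Network, where the impact is quoted only as minus $0.3B). The names `reported` and `summarize` are illustrative, and counting both the start and end dates is an assumption that happens to reproduce the stated 115, 64, and 107 days.

```python
from datetime import date

ARB_PRICE_USD = 1.20          # price per ARB used in the report
END = date(2024, 3, 1)        # last day of the analyzed period

# Rounded figures quoted above: (start date, STIP impact in USD,
# total volume in USD, ARB received).
reported = {
    "Vertex": (date(2023, 11, 8), 32.8e9, 46.9e9, 3.0e6),
    "Gains Network": (date(2023, 12, 29), -0.3e9, 7.1e9, 4.5e6),
    "GMX": (date(2023, 11, 8), 10.5e9, 24.4e9, 12.0e6),
    "MUX Protocol": (date(2023, 11, 16), 2.2e9, 15.2e9, 6.0e6),
}


def summarize(start, impact, volume, arb):
    """Return (days analyzed, share of volume from STIP, volume per dollar)."""
    days = (END - start).days + 1              # assumption: both endpoints counted
    share = impact / volume                    # fraction of volume attributed to STIP
    per_dollar = impact / (arb * ARB_PRICE_USD)
    return days, share, per_dollar


results = {name: summarize(*row) for name, row in reported.items()}
for name, (days, share, per_dollar) in results.items():
    print(f"{name}: {days} days, {share:.0%} of volume, "
          f"${per_dollar:,.2f} of volume per $ of ARB")
```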

Main Takeaways

Our analysis, conducted at a 90% confidence level, produced interesting results for the impact of the STIP on Vertex, Gains Network, GMX, and MUX Protocol. The analysis deemed the impact on Vela Exchange to not be statistically significant, which means that we couldn’t confidently say that the STIP caused a noticeable change in the protocol’s daily volume; the variations we see could just be due to market behavior. A further explanation of how this and all results were derived can be found in the Methodology and Annex sections.

To interpret and understand the results, it is important to have an overview of the incentive mechanisms utilized by the different protocols.

GMX utilized their STIP ARB incentives by focusing primarily on maximizing TVL to ensure adequate liquidity, which is crucial for providing a good trading experience given their AMM design. Additionally, they also reduced trading fees on the decentralized perpetual exchange to levels comparable to the VIP tiers of leading centralized exchanges. Traders on GMX v2 benefited from a rebate of up to 75% on open and close fees, thanks to the STIP incentives, attracting users with a minimal entry fee of 0.015%. In total, 4,984,768.84 ARB was distributed as trading incentives, with the remaining incentives including liquidity and grants incentives. To further boost engagement, GMX ran a two-week trading competition to attract new traders to the V2 platform, though they do not plan to use bridge incentives for future competitions.

Vertex focused their STIP ARB incentives on KPIs such as monthly trading volume, monthly active users, on-chain activity, and TVL. Their first round of incentives targeted two main areas: trading rewards and Elixir LP Pools. In total, 3 million ARB was allocated across 16 weekly epochs, with 2.55 million ARB dedicated to Vertex trading incentives and 450,000 ARB to Elixir liquidity incentives. Additionally, Vertex matched the STIP with a rewards program, offering dual incentives for trading with approximately 10 million VRTX tokens allocated to each epoch. The data indicates that providing trader rebates significantly boosts on-chain usage of perpetuals.

During the STIP period, Gains Network implemented quite different incentive streams through a points system, mainly rewarding traders for behaviors such as fees paid, absolute PnL, loyalty, and relative PnL. These rewards were distributed weekly, with different allocations for each category. While this gamification attracted engagement, it also led to sybil attempts, especially in the relative PnL category, where actors tried to game the system with delta-neutral positions to extract ARB from the reward pools. Consequently, the relative PnL category was dropped during the program. For the STIP campaign, Gains Network allocated 85% of incentives to trading and 15% to LP incentives, distributing a total of 3.825 million tokens in trading incentives. Additionally, Gains Network provided a partial match of 65,000 GNS tokens to LP incentives, to further boost the incentive program.

MUX Protocol offered rebates of up to 100% on open and close fees for all integrated protocols on the MUX aggregator. This strategy aimed to aggressively onboard more traders to Arbitrum. The total STIP amount was used in the Rebate Program, where traders who opened and closed positions through the MUX Aggregator received weekly ARB token rebates for fees incurred on MUX, GMX V1, GMX V2, and Gains positions on Arbitrum.

Vela Exchange ran a gamified trading competition, the Grand Prix. Throughout the Grand Prix, users competed in five themed rounds, each offering new challenges and opportunities to earn credits, the event’s currency. Liquidity providers and yield farmers benefited from limited-time events where their contributions to VLP minting earned them greater credit multipliers. The grant breakdown for the STIP campaign included 150,000 ARB for multi-chain and fiat onboarding, 500,000 ARB for developing social features and trading leagues, and 350,000 ARB for VLP vault rewards. To prevent wash trading, incentives in the trading leagues were capped based on fees earned and focused on PnL, with volume playing a secondary role.

Vertex saw its daily volume impacted the most among the perp DEXs analyzed. Our analysis attributes 70% of the project’s total volume to the STIP, even though it received a relatively small amount of ARB tokens. Vertex is followed by GMX, where 43% of the total volume was attributed to the STIP. GMX, however, received the largest amount of ARB tokens of any protocol, so the added volume per dollar spent is naturally smaller. The negative impact of the STIP on Gains Network could be attributed to the incentive mechanism having involved social trading competitions instead of fee rebates and direct rewards. Both Gains Network and Vela Exchange implemented gamified points systems instead of traditional trading rewards and fee rebates. According to our analysis, these strategies were less effective in boosting volume beyond the general market trend.

Having said that, it’s essential to acknowledge the limitations inherent in our models, which are only as reliable as the data available. Numerous factors can drastically influence outcomes, making it challenging to isolate the effects of a single intervention. This is particularly true in the crypto industry, where such factors can have a disproportionate effect. Other relevant secondary factors possibly contributing to the differing results among perp DEXs include traders’ mercenary activity and the cannibalization of trading volume.

Given these complexities, our results should be interpreted comparatively rather than absolutely. The SC methodology was uniformly applied across all protocols, allowing us to gauge the relative efficacy of the STIP allocation.

Appendix

Methodology

TL;DR: We employed an analytical approach known as the Synthetic Control (SC) method. The SC method is a statistical technique utilized to estimate causal effects resulting from binary treatments within observational panel (longitudinal) data. Regarded as a groundbreaking innovation in policy evaluation, this method has garnered significant attention in multiple fields. At its core, the SC method creates an artificial control group by aggregating untreated units in a manner that replicates the characteristics of the treated units before the intervention (treatment). This synthetic control serves as the counterfactual for a treatment unit, with the treatment effect estimate being the disparity between the observed outcome in the post-treatment period and that of the synthetic control. In the context of our analysis, this model incorporates market dynamics by leveraging data from other protocols (untreated units). Thus, changes in market conditions are expected to manifest in the metrics of other protocols, thereby inherently accounting for these external trends and allowing us to explore whether the reactions of the protocols in the analysis differ post-STIP implementation.

To achieve the described goals, we turned to causal inference. Knowing that “association is not causation”, the study of causal inference lies in techniques that try to figure out how to make association be causation. The classic notation of causality analysis revolves around a certain treatment $T$, which doesn’t need to be related to the medical field, but rather is a generalized term used to denote an intervention for which we want to study the effect. We typically consider $T_i$, the treatment intake for unit i, which is 1 if unit i received the treatment and 0 otherwise. Typically there is also a $Y_i$, the observed outcome variable for unit i. This is our variable of interest, i.e., we want to understand what the influence of the treatment on this outcome was. The fundamental problem of causal inference is that one can never observe the same unit with and without treatment, so we express this in terms of potential outcomes. We are interested in what would have happened in the case some treatment was taken. It is common to call the potential outcome that happened the factual, and the one that didn’t happen, the counterfactual. We will use the following notation:

$Y_{0i}$ - the potential outcome for unit i without treatment

$Y_{1i}$ - the potential outcome for the same unit i with the treatment.

With these potential outcomes, we define the individual treatment effect to be $Y_{1i} - Y_{0i}$. Because of the fundamental problem of causal inference, we will actually never know the individual treatment effect because only one of the potential outcomes is observed.

One technique used to tackle this is Difference-in-Difference (or diff-in-diff). It is commonly used to analyze the effect of macro interventions such as the effect of immigration on unemployment, the effect of law changes on crime rates, but also the impact of marketing campaigns on user engagement. There is always a period before and after the intervention and the goal is to extract the impact of the intervention from a general trend. Let $Y_D(T)$ be the potential outcome for treatment D on period T (0 for pre-intervention and 1 for post-intervention). Ideally, we would have the ability to observe the counterfactual and estimate the effect of an intervention as: $E[Y_1(1) - Y_0(1) \mid D = 1]$, the causal effect being the outcome in the period post-intervention in the case of a treatment minus the outcome in the same period in the case of no treatment. Naturally, $E[Y_0(1) \mid D = 1]$ is counterfactual so it can’t be measured. If we take a before and after comparison, $E[Y(1) \mid D = 1] - E[Y(0) \mid D = 1]$, where $Y(T)$ is the observed outcome in period T, we can’t really say anything about the effect of the intervention because there could be other external trends affecting that outcome.

The idea of diff-in-diff is to compare the treated group with an untreated group that didn’t get the intervention by replacing the missing counterfactual as such: $E[Y_0(1) \mid D = 1] = E[Y(0) \mid D = 1] + (E[Y(1) \mid D = 0] - E[Y(0) \mid D = 0])$. We take the treated unit before the intervention and add a trend component to it, which is estimated using the control, $E[Y(1) \mid D = 0] - E[Y(0) \mid D = 0]$. We are basically saying that the treated unit after the intervention, had it not been treated, would look like the treated unit before the treatment plus a growth factor that is the same as the growth of the control.
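
As a small worked example of the diff-in-diff counterfactual described above (treated unit before the intervention plus the growth observed in the control), here is a sketch in Python. All numbers are invented for illustration and the function name is hypothetical; nothing here comes from the report’s data.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Return (counterfactual, effect) under the parallel-trends assumption."""
    trend = control_post - control_pre       # growth observed in the control
    counterfactual = treated_pre + trend     # treated unit, had it not been treated
    effect = treated_post - counterfactual   # what is left after removing the trend
    return counterfactual, effect


# Hypothetical average daily volumes in $M, before and after an intervention.
counterfactual, effect = diff_in_diff(
    treated_pre=100.0, treated_post=180.0,
    control_pre=80.0, control_post=110.0,
)
naive_before_after = 180.0 - 100.0

print(counterfactual)      # treated unit's estimated volume without treatment
print(effect)              # diff-in-diff estimate
print(naive_before_after)  # before/after comparison, which includes the trend
```

In this made-up case the before and after comparison overstates the effect because part of the growth is shared with the control.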

An important thing to note here is that this method assumes that the trends in the treatment and control are the same. If the growth trend from the treated unit is different from the trend of the control unit, diff-in-diff will be biased. So, instead of trying to find a single untreated unit that is very similar to the treated, we can forge our own as a combination of multiple untreated units, creating a synthetic control.

That is the intuitive idea behind using synthetic control for causal inference. Assume we have $J+1$ units and unit 1 is affected by an intervention. Units $j = 2, \dots, J+1$ are a collection of untreated units, which we will refer to as the “donor pool”. Our data spans $T$ time periods, with $T_0$ periods before the intervention. For each unit j and each time t, we observe the outcome $Y_{jt}$. We define $Y^N_{jt}$ as the potential outcome without intervention and $Y^I_{jt}$ as the potential outcome with intervention. Then, the effect for the treated unit at time t, for $t > T_0$, is defined as $\tau_{1t} = Y^I_{1t} - Y^N_{1t}$. Here $Y^I_{1t}$ is factual but $Y^N_{1t}$ is not. The challenge lies in estimating $Y^N_{1t}$.

[Figure omitted. Source: 15 - Synthetic Control — Causal Inference for the Brave and True]

Since the treatment effect is defined for each period, it doesn’t need to be instantaneous, it can accumulate or dissipate. The problem of estimating the treatment effect boils down to the problem of estimating what would have happened to the outcome of the treated unit, had it not been treated.

The most straightforward approach is to consider that a combination of units in the donor pool may approximate the characteristics of the treated unit better than any untreated unit alone. So we define the synthetic control as a weighted average of the units in the control pool. Given the weights $W = (w_2, \dots, w_{J+1})$, the synthetic control estimate of $Y^N_{1t}$ is $\hat{Y}^N_{1t} = \sum_{j=2}^{J+1} w_j Y_{jt}$.

We can estimate the optimal weights with OLS like in any typical linear regression: we minimize the square distance between the weighted average of the units in the donor pool and the treated unit for the pre-intervention period. This creates a “fake” unit that resembles the treated unit before the intervention, so we can see how it would behave in the post-intervention period.

In the context of our analysis, this means that we can include all other perp DEX protocols that did not receive the STIP in our donor pool and estimate a “fake”, synthetic, control perp DEX protocol that follows the trend of any particular one we want to study in the period before receiving the STIP. As mentioned before, the metric of interest chosen for this analysis was daily volume and, in particular, we calculated the 7-day moving average to smooth the data. Then we can compare the behavior of our synthetic control with the factual and estimate the impact of the STIP by taking the difference. We are essentially comparing what would have happened, had the protocol not received the STIP with what actually happened.
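
The 7-day smoothing mentioned above can be sketched as follows. This assumes a trailing window (each value averages the current day and the six days before it); the report does not say whether the window was trailing or centered, and the volumes below are invented.

```python
def moving_average(values, window=7):
    """Trailing moving average; the first window-1 days produce no value."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical daily volumes in $M with a one-day spike on day 7.
daily_volume = [7, 7, 7, 7, 7, 7, 49, 7, 7]
smoothed = moving_average(daily_volume)
print(smoothed)
```

The single-day spike is spread across every window that contains it, which is the point of smoothing before fitting the synthetic control.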

However, sometimes regression leads to extrapolation, i.e., values that are outside of the range of our initial data and may not make sense in our context. This happened when estimating our synthetic control, so we constrained the model to do only interpolation. This means we restrict the weights to be non-negative and sum to one so that the synthetic control is a convex combination of the units in the donor pool. Hence, the treated unit is projected onto the convex hull defined by the untreated units. This means that there probably won’t be a perfect match of the treated unit in the pre-intervention period and that the weights can be sparse, as the face of the convex hull onto which the treated unit is projected will sometimes be defined by only a few units. This works well because we don’t want to overfit the data. It is understood that we will never be able to know with certainty what would have happened without the intervention, just that under the assumptions we can make statistical conclusions.

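Formalizing interpolation, the synthetic control is still defined in the same way by $\hat{Y}^N_{1t} = \sum_{j=2}^{J+1} w_j Y_{jt}$. But now we use the weights $W^* = (w^*_2, \dots, w^*_{J+1})$ that minimize the square distance between the weighted average of the units in the donor pool and the treated unit for the pre-intervention period, $\sum_{t=1}^{T_0} \left( Y_{1t} - \sum_{j=2}^{J+1} w_j Y_{jt} \right)^2$, subject to the restriction that $w_2, \dots, w_{J+1}$ are non-negative and sum to one.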

We get the optimal weights using quadratic programming optimization with the described constraints on the pre-STIP period and then use these weights to calculate the synthetic control for the total duration of time we are interested in. We initialized the optimization for each analysis with different starting weight vectors to avoid introducing bias in the model and getting stuck in local minima. We selected the one that minimized the square difference in the pre-intervention period.
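
Below is a minimal sketch of the constrained fit described above, in plain Python. It is not the code used for this report: instead of a quadratic programming solver it uses projected gradient descent onto the simplex, which solves the same small problem, and it keeps the best of several random starting points. The donor series, the treated series, and all function names are made up for illustration.

```python
import random


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_j >= 0, sum_j w_j = 1}."""
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(sorted(v, reverse=True), start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def sse(weights, donors, treated):
    """Sum of squared pre-intervention gaps between treated and synthetic unit."""
    return sum(
        (y - sum(w * x for w, x in zip(weights, row))) ** 2
        for row, y in zip(donors, treated)
    )


def fit_weights(donors, treated, n_starts=5, n_iter=2000, seed=0):
    """donors[t][j] is donor j on pre-intervention day t; treated[t] is the treated unit."""
    rng = random.Random(seed)
    n_donors = len(donors[0])
    # Conservative step: the gradient's Lipschitz constant is at most
    # twice the sum of squared donor values.
    step = 1.0 / (2.0 * sum(x * x for row in donors for x in row))
    best = None
    for _ in range(n_starts):
        w = project_to_simplex([rng.random() for _ in range(n_donors)])
        for _ in range(n_iter):
            resid = [y - sum(wj * x for wj, x in zip(w, row))
                     for row, y in zip(donors, treated)]
            grad = [-2.0 * sum(r * row[j] for r, row in zip(resid, donors))
                    for j in range(n_donors)]
            w = project_to_simplex([wj - step * g for wj, g in zip(w, grad)])
        if best is None or sse(w, donors, treated) < sse(best, donors, treated):
            best = w
    return best


# Hypothetical pre-intervention data: 6 days, 3 donors.
donors = [[1, 6, 2], [2, 5, 2], [3, 4, 5], [4, 3, 5], [5, 2, 2], [6, 1, 2]]
treated = [1.3, 2.0, 3.6, 4.3, 4.1, 4.8]   # built as 0.7 * donor 1 + 0.3 * donor 3

weights = fit_weights(donors, treated)
print([round(w, 3) for w in weights])
```

Because the toy treated series lies inside the convex hull of the donors, the fit recovers weights close to 0.7, 0 and 0.3, with the second donor dropping out entirely. With real data the pre-intervention match would generally be imperfect.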

As an example, below is the resulting chart for Vertex, showing the factual daily volume observed in Vertex and the synthetic control.

With the synthetic control, we can then estimate the effect of the STIP as the gap between the factual protocol daily volume and the synthetic control, $\hat{\tau}_{1t} = Y^I_{1t} - \hat{Y}^N_{1t}$.
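
To make the gap concrete, the sketch below applies a set of donor weights to post-intervention data and subtracts the result from the factual series. The weights and volumes are hypothetical. Summing the daily gaps into a total and dividing by the factual total is one plausible way to arrive at figures like those in the Results section, but that the report aggregates in exactly this way is an assumption.

```python
def synthetic_control(weights, donors):
    """Weighted average of donor values for each day; donors[t][j] is donor j on day t."""
    return [sum(w * x for w, x in zip(weights, row)) for row in donors]


weights = [0.7, 0.0, 0.3]                     # hypothetical fitted weights
post_donors = [[7.0, 1.0, 2.0],               # hypothetical post-intervention
               [8.0, 1.0, 3.0],               # donor volumes, one row per day
               [9.0, 2.0, 3.0]]
post_treated = [9.5, 10.5, 11.2]              # hypothetical factual volumes

synthetic = synthetic_control(weights, post_donors)
gaps = [y - s for y, s in zip(post_treated, synthetic)]   # daily effect estimates
total_effect = sum(gaps)                                  # cumulative effect
share = total_effect / sum(post_treated)                  # share of factual volume

print(gaps, total_effect, round(share, 3))
```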

To understand whether the result is statistically significant and not just a possible result we got due to randomness, we use the idea of Fisher’s Exact Test. We permute the treated and control units exhaustively by, for each unit, pretending it is the treated one while the others are the control. We create one synthetic control and effect estimates for each protocol, pretending that the STIP was given to another protocol to calculate the estimated impact for this treatment that didn’t happen. If the impact in the protocol of interest is sufficiently larger when compared to the other fake treatments (“placebos”), we can say our result is statistically significant and there is indeed an observable impact of the STIP on the protocol’s daily volume. The idea is that if there was no STIP in the other protocols and we used the same model to pretend that there was, we wouldn’t see any impact.

It is expected that the variance after the intervention will be higher than the variance before the intervention since the synthetic control is designed to minimize the difference in the pre-intervention period. This can be seen in the chart below. Some protocols don’t fit well at all even in the pre-intervention period when no convex combination matches them, so they were removed from the analysis by setting a threshold for pre-intervention error.
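
The placebo procedure can be summarized with a short sketch. It assumes a two-sided comparison on absolute effects and counts the treated unit itself in the denominator; the report does not spell out either convention, and the error threshold, unit names, and effect sizes below are invented.

```python
def placebo_p_value(treated_effect, placebo_effects):
    """Share of all units (treated included) with an effect at least as extreme."""
    effects = [treated_effect] + list(placebo_effects)
    as_extreme = sum(abs(e) >= abs(treated_effect) for e in effects)
    return as_extreme / len(effects)


# Hypothetical placebo runs: name -> (pre-intervention error, estimated effect).
placebos = {
    "donor_a": (0.4, 0.3), "donor_b": (0.5, -0.6), "donor_c": (0.3, 0.1),
    "donor_d": (0.7, 1.1), "donor_e": (0.2, -0.2), "donor_f": (0.6, 0.8),
    "donor_g": (0.4, -1.4), "donor_h": (0.5, 0.5), "donor_i": (0.3, -0.9),
    "donor_j": (3.0, 6.0),   # poor pre-intervention fit, so it is discarded
}
MAX_PRE_ERROR = 1.0          # hypothetical threshold for pre-intervention error

kept = [effect for pre_error, effect in placebos.values()
        if pre_error <= MAX_PRE_ERROR]
p_value = placebo_p_value(treated_effect=2.5, placebo_effects=kept)
significant = p_value <= 0.10    # 90% confidence level

print(len(kept), p_value, significant)
```

Under this convention the smallest attainable p-value is one divided by the number of units compared, so the size of the donor pool limits how strong a significance claim can be.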

With this test, we see that if we pretend the STIP was given to another protocol, we would almost never get an effect as extreme as the one we got with Vertex. For the other perp DEXs in the STIP, this was not always the case, especially after March 2024. For that reason, to maintain statistical significance at the 90% confidence level, we restricted the analysis to the impact observed until March 1st, 2024.

References

Hernán MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.

Aayush Agrawal - Causal inference with Synthetic Control using Python and SparseSC

01 - Introduction To Causality — Causal Inference for the Brave and True

Annex

The Annex describes the data used and shows intermediary charts used in the analysis.

The data gathered to evaluate the effect of the STIP on each protocol’s daily volume includes data on daily volume from multiple protocols across a few months. There is a balance between having a long enough timeline of historical data and enough protocols to compare with. For instance, GMX was launched before 2022 but we chose to use only data from 2023 to allow for a larger donor pool. Protocols that also received the STIP were dropped from the analysis. The 7-day moving average was used to smooth out the time series.

Vertex

Protocols used in the donor pool: ApeX Protocol (ethereum), APX Finance (bsc), Fulcrom (cronos), GooseFX (solana), Hyperliquid (hyperliquid), IPOR (ethereum), Level Finance (bsc), Morphex (fantom), MUX Protocol (bsc), MUX Protocol (avax), MUX Protocol (optimism), PancakeSwap Perps (bsc), Polynomial Trade (optimism), SpaceDex (bsc), dYdX, GMX (avax).

Gains Network

Protocols used in the donor pool: ApeX Protocol (ethereum), APX Finance (bsc), Fulcrom (cronos), Gains Network (polygon), GooseFX (solana), Hyperliquid (hyperliquid), IPOR (ethereum), KTX.Finance (bsc), Level Finance (bsc), Morphex (fantom), MUX Protocol (bsc), MUX Protocol (avax), MUX Protocol (optimism), PancakeSwap Perps (bsc), Polynomial Trade (optimism), SpaceDex (bsc), HoldStation DeFutures (era), dYdX, GMX (avax).

GMX

Protocols used in the donor pool: ApeX Protocol (ethereum), APX Finance (bsc), Fulcrom (cronos), GooseFX (solana), Hyperliquid (hyperliquid), IPOR (ethereum), Level Finance (bsc), Morphex (fantom), MUX Protocol (bsc), MUX Protocol (avax), MUX Protocol (optimism), PancakeSwap Perps (bsc), Polynomial Trade (optimism), SpaceDex (bsc), dYdX, GMX (avax).

MUX Protocol

Protocols used in the donor pool: ApeX Protocol (ethereum), APX Finance (bsc), Fulcrom (cronos), GooseFX (solana), Hyperliquid (hyperliquid), IPOR (ethereum), Level Finance (bsc), Morphex (fantom), MUX Protocol (bsc), MUX Protocol (avax), MUX Protocol (optimism), PancakeSwap Perps (bsc), Polynomial Trade (optimism), SpaceDex (bsc), dYdX, GMX (avax).

Vela Exchange

Protocols used in the donor pool: Aevo (ethereum), ApeX Protocol (ethereum), APX Finance (bsc), Based Markets (base), Beamex (moonbeam), BLEX (arbitrum), Drift (solana), Fulcrom (cronos), Gains Network (polygon), HMX (arbitrum), GooseFX (solana), Hyperliquid (hyperliquid), ImmortalX (celo), KiloEx (bsc), KTX.Finance (bsc), IPOR (ethereum), Level Finance (bsc), Level Finance (arbitrum), Morphex (fantom), MUX Protocol (bsc), MUX Protocol (avax), MUX Protocol (optimism), PancakeSwap Perps (bsc), Pinnako (era), Polynomial Trade (optimism), Synthetix (optimism), UniDex (optimism), SpaceDex (bsc), UniDex (era), UniDex (arbitrum), UniDex (fantom), UrDEX Finance (arbitrum), Vela Exchange (arbitrum), HoldStation DeFutures (era), dYdX, GMX (avax).
