PYOR - Arbitrum STIP, Backfund STIP, and LTIPP Incentive Efficacy Analysis: Preliminary Report

PYOR was chosen as one of the teams to evaluate Arbitrum’s incentive programs (STIP, Backfund STIP, and LTIPP) as part of the LTIPP research bounty selection process. More details on this initiative can be found here.

We are now publishing our first deliverable: a preliminary report comparing the performance of different segments across all three programs.

This report includes:

  • A summary of the performance of each segment across all three programs.
  • Key insights and common patterns observed across all programs.
  • ROI analysis.
  • Detailed data tables and visualizations to support our findings.

We analyzed multiple segments based on eight key metrics, with overall program performance serving as the benchmark for comparison.

Motive
The ArbitrumDAO has proposed the distribution of a total of 101.86 million ARB tokens to ecosystem protocols through three distinct programs: STIP, Backfund STIP, and LTIPP. Of this total, 95.95 million ARB tokens were claimed by grantee protocols and used for user incentives. The STIP program involved 30 grantees, Backfund included 26 grantees, and LTIPP had 88 grantees.

This study aims to assess the impact of these incentives on Arbitrum, identify the most influential segments, and derive insights that could help the DAO design more effective incentive programs in the future.

As LTIPP concluded in mid-September, there is insufficient post-incentive data to fully assess its long-term impact. This analysis will be included in our second report.

The report can be accessed here: PYOR - Arbitrum STIP, Backfund STIP, and LTIPP Incentive Efficacy Analysis Preliminary report.

Additionally, the data visualization dashboard can be found here: PYOR - Arbitrum Incentives Dashboard.

Our second and final report will focus on the categorization and analysis of incentivized wallet addresses, a qualitative analysis of segment leaders and their strategies, and will address any additional questions or requests from delegates and the Foundation. Please share any questions you would like us to include in our second and final report that haven’t been covered above.

I hate to say it, but this report is pretty far below the standard to which it was funded. For 10,000-20,000 ARB, a team of undergrads from a university blockchain club could have produced more rigorous work. I would suggest that in the future the DAO solicit undergraduate clubs from schools like Cal, Stanford, Princeton, Columbia, etc.

Composition Needs Work
First, with regard to the composition: the report does not follow a standard format such as APA or MLA. It uses no prose where prose is necessary. There is no logical flow in the document itself, nor in the presentation of the information; it is basically thrown together. None of your figures or tables are labeled, and they are not referenced in the body text.

Statistical Analysis Lacking
On the analysis side, there is essentially no statistical rigor. Zero inferential statistics are used, so no predictive conclusions can be drawn. Even in a field experiment like this, you could attempt to construct a roughly equivalent no-treatment control group. The decisions made around sample segmenting appear arbitrary and are not well explained. The variable "FAG" you created and defined is arbitrary, and the label itself is pretty offensive. You cannot simply claim that the delta between the two temporal periods is due to the causal force you ascribe; you need to put in the work if you want to build up that assertion.
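To illustrate the control-group point, here is a minimal difference-in-differences sketch in Python. All numbers are synthetic and purely illustrative (the "protocols" and fee levels are made up); it just shows why a raw pre/post delta overstates the treatment effect when the whole market is trending.

```python
import random
from statistics import mean

random.seed(2)
# Synthetic daily fees: incentivized vs. comparable non-incentivized
# protocols, before and during the program (all values illustrative).
treated_pre  = [random.gauss(100, 10) for _ in range(30)]
treated_post = [random.gauss(130, 10) for _ in range(30)]
control_pre  = [random.gauss(100, 10) for _ in range(30)]
control_post = [random.gauss(110, 10) for _ in range(30)]

# A naive pre/post delta attributes the whole change to the incentives...
naive = mean(treated_post) - mean(treated_pre)
# ...while difference-in-differences nets out the market-wide trend
# captured by the untreated group.
did = naive - (mean(control_post) - mean(control_pre))
print(f"naive delta: {naive:.1f}, DiD estimate: {did:.1f}")
```

Even this crude design forces the analyst to state what the counterfactual is, which the report never does.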

R Graphs
For your graphs in R, you didn't rename the variable labels in ggplot, and in some of them the nonlinear axes make the charts uninterpretable, for no apparent reason. In the first two graphs, your "line of best fit" (i.e., a bivariate linear regression of your log-transformed variables) lacks both an equation and error bars. It controls for no other variables, such as TVL, which would have made your analysis informative.

There appears to be no meaningful trend in the graph on the left (which I cannot reference by number, because you did not label it). I cannot say for certain that there is a null effect, because you did not present the equation or the statistical results underlying it. On the right graph, you fit a linear line of best fit to a clearly nonlinear function.
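For reference, here is a minimal sketch of the annotation that is missing, written in Python with synthetic data (the variable names and values are hypothetical; the same quantities fall out of `lm()` in R): fit the log-log regression, then report the fitted equation and the slope's standard error alongside the figure so readers can judge whether the trend is meaningful.

```python
import math
import random

random.seed(0)
# Hypothetical data: log grant size vs. log fees (synthetic, illustrative).
log_grant = [random.uniform(12, 16) for _ in range(40)]
log_fees = [0.6 * x + 2.0 + random.gauss(0, 0.5) for x in log_grant]

# Closed-form simple linear regression.
n = len(log_grant)
mx = sum(log_grant) / n
my = sum(log_fees) / n
sxx = sum((x - mx) ** 2 for x in log_grant)
sxy = sum((x - mx) * (y - my) for x, y in zip(log_grant, log_fees))
slope = sxy / sxx
intercept = my - slope * mx

# Residual standard error, and from it the slope's standard error.
resid = [y - (intercept + slope * x) for x, y in zip(log_grant, log_fees)]
s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))
se_slope = s / math.sqrt(sxx)

# This is the line that belongs in every figure caption:
print(f"log(fees) = {intercept:.2f} + {slope:.2f} * log(grant)"
      f"  (SE of slope: {se_slope:.3f}, n = {n})")
```

With the slope and its standard error printed next to the chart, a "no trend" claim becomes checkable instead of rhetorical.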

Inaccurate and Unqualified Claims
Your claim that grant sizes over 4mm ARB have no impact on fees appears to be unsupported, because, had you included error bars, they would show that your model cannot make meaningful predictions at that point given your sample size (it appears to be an n of 1?). I cannot tell, because the axes are poorly labeled and the graph is not explained. If you are missing observations, your model collapses down to one or a few data points, and the error will be so large that no conclusion can be reached.

Said another way, how many grants were there that received over 4mm ARB?

Many of the conclusions you draw later on are not sound. To be clear, this is all post-hoc reasoning, so your claims should be attenuated or qualified to reflect that fact.

I know nobody really cares or will take accountability to review these reports, but as composed this does not constitute a passing grade.

Hey @Arb4Ever,

Thank you for your feedback. This was our preliminary report, and it presents the performance insights we derived from the segment analysis.

About Composition:

  • Understood. The lack of labels on data tables and graphs will be addressed in the current and future reports.

Statistical Analysis Lacking:

  • The analysis in the report was intended to present actionable takeaways. That said, we are open to addressing all questions from the DAO and working on them.
  • We would be happy to get on a call with you to understand the gaps and any further requirements or questions. We have the required data and can perform those analyses, and we would also like to address your points regarding the R graphs. If you can spare half an hour this week, we would be glad to discuss this over a call. We are open to reworking this report itself if required.

Rest assured, we are still compiling the data profile for the LTIPP period; we will work on it and prepare the final report based on all the feedback we receive from delegates.

Again, we appreciate the effort you put into reviewing our work and providing feedback.

Happy to provide feedback.

I would use APA formatting for your final report:

https://owl.purdue.edu/owl/research_and_citation/apa_style/apa_formatting_and_style_guide/general_format.html

You probably want to at least perform an ANOVA, using your segments/buckets as the categorical IV and your standardized fee variable (or total fees) as the continuous DV. You appear to have all the data ready, so it is as simple as loading it into R and running the appropriate statistical tests. Report these statistics with each figure and in the text when you explain what they mean. Then build a table aggregating and summarizing your statistical results as you have, but indicate the inferential results using asterisks to mark the p-value thresholds reached across all the tests. This is a lot of work, but that is also why the grant is so substantial.
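To sketch the mechanics, here is a one-way ANOVA computed by hand in Python on synthetic segment data (the segment names and fee values are made up; in R this is simply `aov(fees ~ segment)`):

```python
import random
from statistics import mean

random.seed(1)
# Hypothetical segment buckets (categorical IV) mapped to standardized
# fee outcomes (continuous DV); all values are synthetic.
segments = {
    "DEX":     [random.gauss(1.0, 0.5) for _ in range(20)],
    "Lending": [random.gauss(0.4, 0.5) for _ in range(20)],
    "Gaming":  [random.gauss(0.1, 0.5) for _ in range(20)],
}

# One-way ANOVA: F = between-group mean square / within-group mean square.
groups = list(segments.values())
gm = [mean(g) for g in groups]
k = len(groups)
n = sum(len(g) for g in groups)
grand = mean(x for g in groups for x in g)
ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, gm))
ss_within = sum((x - m) ** 2 for g, m in zip(groups, gm) for x in g)
F = (ss_between / (k - 1)) / (ss_within / (n - k))
print(f"F({k - 1}, {n - k}) = {F:.2f}")
# The p-value comes from the F(k-1, n-k) distribution (pf() in R);
# mark the thresholds reached with asterisks in the summary table.
```

Once the F statistic and p-value are in hand, per-segment conclusions should only be drawn after a post-hoc test (e.g. Tukey HSD) rather than by eyeballing group means.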