Serious People: Proposed KPIs for Arbitrum Grant Programs (LTIPP)

Authors

@SeriousIan @SeriousTaylor @SeriousKeith

Proposed KPIs for Arbitrum Grant Programs

Abstract

The purpose of this post is to address some common concerns we have heard over the past few months about the STIP/LTIPP initiatives. The Serious People team has had many conversations during that time, and we feel we are now in a good position to summarize our findings and make a number of recommendations to move the ball forward, focusing on better cohesion among all the moving pieces currently at play.

Our goal is to streamline the application, the advisor work, and the council’s ultimate decision making so that they all line up with predetermined KPIs set forth and agreed upon by the DAO. This will give the DAO a standard KPI rubric to use for this initiative and to build on for future initiatives. Additionally, we need a framework that lets us compare projects on an apples-to-apples basis regardless of which program they are in.

With that, the first step for us is aligning on the KPIs that the DAO believes the incentive programs should aim to achieve. To do that, we must first identify the kinds of actions we want to incentivize, then find a way to convert each of them into a quantifiable number we can plug into a formula, allowing us to compare different programs by normalizing values and returns.

We also believe that the current system is set up to incentivise competition to get into the programs, but not competition for who can provide the most value when the program is running. If we can agree on the basic goals of the program, we can evaluate and weigh each achievement, compare that to the value of the tokens emitted for the program and find out what the return is for each program as a percent.

Please note - this is a starting point for everyone to build off of. We are not saying this is the exact answer or that we hold all the solutions. We welcome collaboration and constructive feedback!

Serious People’s Recommended KPI Formula: Return on Emissions (ROE)

We believe the best way to define a new standard and normalize returns is by using our Return on Emissions (ROE) metric. Simply put:

ROE = [(TVR - TVE) ÷ TVE] + 1

TVE = Total Value Emitted (the value of each token at the time of emission)

TVR = Total Value Returned (the value brought back to the Arbitrum DAO)

This formula tells us what is returned for every dollar spent on emissions. If you spend $100k of $ARB and bring $0 of value, you have provided a return of 0%, meaning that 100% of your emissions were harmful to the ecosystem. That said, depending on how we evaluate each of the factors that bring the ecosystem value, there is no upper limit on returns. Arbitrum and the protocols participating in the program should aim to bring more value to the ecosystem than they spend on incentives: for example, spend $100k of $ARB and bring $110k of value to the ecosystem.
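As a minimal sketch, the metric can be written as a small function. We assume the form that matches the statements above ($0 returned gives 0%, and $110k returned on $100k emitted gives 110%); the function name is our own placeholder, not part of any existing tooling.

```python
def return_on_emissions(tvr: float, tve: float) -> float:
    """Return ROE as a ratio: value returned per dollar of emissions.

    tvr: Total Value Returned, in USD.
    tve: Total Value Emitted, in USD (token value at the time of emission).
    """
    if tve <= 0:
        raise ValueError("TVE must be positive")
    # [(TVR - TVE) / TVE] + 1 is algebraically the same as TVR / TVE.
    return (tvr - tve) / tve + 1


print(f"{return_on_emissions(0, 100_000):.2%}")        # 0.00%
print(f"{return_on_emissions(110_000, 100_000):.2%}")  # 110.00%
```

Anything above 100% means the program returned more value than it emitted.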

So with that, what do we believe brings Arbitrum value? If we do a good job of capturing everything that brings value, this should be relevant for every program moving forward. We took a first cut, but would love to know what others think we are missing!

TVR Buckets

  • Users from other blockchains
  • New Defi users
  • Permanent users
  • Liquidity acquisition
  • Liquidity retention
  • New volume (for your respective protocol or what you are incentivizing)
  • Retained Volume
  • Sequencer fees generated by your protocol
  • Reasons to hold/buy $ARB
  • Innovation

As we are sure you can tell, some of these factors are objective and some are subjective. It will be difficult to have a perfect formula to evaluate each category, but the more complete our analysis is, the more accurately we can compare projects and incentive programs. Having these goals defined in advance gives participating protocols a direction on how to use the emissions that Arbitrum gives them, with the target they are shooting for well defined up front. It also gives the advisors metrics to aim for when giving feedback, and gives the council a clearer way to judge each proposal.

Ultimately, this will also ensure that the application advisors are giving advice against the same rubric the council is voting on. Without this standardization, participants could easily get different answers depending on when and to whom they speak, which would only cause headaches down the line and make projects feel shorted.

How do we calculate TVR?

User Metrics - New DeFi users, Users from other blockchains, and Permanent users

  • Identify a value for different types of users. For example, if we found that every new user to Arbitrum is worth $10 and a project brings 100 new users, the project nets $1,000 of TVR.

Liquidity - Acquisition and Retention

  • Identify how much the liquidity is worth and whether it stays on Arbitrum after the program ends. We can break this into how much liquidity was acquired, and its stickiness to account for retention.
  • For example, a project brings an average of $500k of liquidity for 90 days, and an average of $100k of that liquidity remains for the 90 days after the program. We could say $1k of liquidity for 1 day = $1 of TVR. The TVR would then be calculated as follows: (500 × $1 × 90) + (100 × $1 × 90) = $54,000.
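The liquidity example above can be sketched in a few lines. The rate of $1 of TVR per $1k of liquidity per day is the illustrative figure from the example, not an agreed value, and the function name is a placeholder.

```python
def liquidity_tvr(avg_liquidity_usd: float, days: int,
                  usd_per_1k_per_day: float = 1.0) -> float:
    """TVR credited for holding an average liquidity level over a period.

    usd_per_1k_per_day is an assumed rate: $1 of TVR for every $1k of
    liquidity held for one day, as in the example above.
    """
    return avg_liquidity_usd / 1_000 * days * usd_per_1k_per_day


during_program = liquidity_tvr(500_000, 90)  # acquisition
after_program = liquidity_tvr(100_000, 90)   # retention

print(during_program, after_program, during_program + after_program)
# 45000.0 9000.0 54000.0
```

Keeping acquisition and retention as separate terms leaves room to weight retained liquidity differently later if the DAO decides it is worth more.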

Volume - New Volume and Retained Volume

  • We establish a baseline volume number for any project looking to optimize on this KPI, ideally using a 30-90 day average. We then measure the increase in volume over the life of the program, and the retained volume once the program closes.
  • Using the baseline across all participants and looking at DAO wide goals, we set dollar values for increases in volume and retained volume, similar to Liquidity and Users.
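A rough sketch of the baseline comparison is below. All daily volume figures are made up for illustration, and a real baseline would use 30-90 days of data rather than three points. The dollar value assigned to each unit of uplift is left out on purpose, since that is what the DAO would still need to set.

```python
from statistics import mean


def volume_uplift(baseline_daily: list[float], period_daily: list[float]) -> float:
    """Average daily volume in a period minus the pre-program baseline average."""
    return mean(period_daily) - mean(baseline_daily)


# Hypothetical daily volumes in USD.
baseline = [100_000, 120_000, 110_000]  # before the program
program = [200_000, 220_000, 210_000]   # while incentives are live
post = [130_000, 150_000, 140_000]      # after the program closes

new_volume = volume_uplift(baseline, program)
retained_volume = volume_uplift(baseline, post)

print(new_volume, retained_volume)  # 100000 30000
```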

Sequencer Fees generated by your protocol

  • We believe this one is straightforward, but please confirm our understanding if we missed something. We can measure the fees generated from the protocol’s smart contracts.
  • We think this would be a great spot to pull in the Treasury working group on this as they are already tasked with figuring out the most effective way to spend sequencer fees, so they should have a good starting point on valuing these fees that we can work into ROE.

Holders

  • OpenBlock has some beautiful dashboards that were used for past incentives. We would recommend working with them on the following items:
    • Reason to buy/hold ARB
    • % of claimed rewards sold
    • Average duration held

Innovation

  • We would likely recommend keeping this separate from ROE unless the DAO can agree on a fair voting system for the council to use. We may be able to incentivise the kinds of innovation the DAO needs most more heavily than others to make this less subjective.

How do we gather all of this information?

  • We believe that we can gather all of this info by using the research bounty program that is already accounted for in Matt’s LTIPP outline, which has a budget of 200k ARB.
  • Serious People has already taken a first cut at how we would set the research bounties into buckets and what questions we would propose for each. You can see them here: Research Bounties for LTIPP
  • Different teams can solve different buckets and define the TVR calculation.
  • The more questions that are answered, the more we can add to our return analysis and the stronger this rubric can get over time.
  • Hopefully other DAOs start to see our work and this rubric can help the larger DeFi space!

It is important to note - even if we can only start off with a couple of these categories, it is still a massive step in the right direction in terms of normalizing and standardizing our process, and then we can continue to build and refine from there.

When should we be running these calculations on incentive programs?

  • We should ensure that each project’s dashboard at least captures TVE and the simple, more objective TVR metrics.
  • A team or working group should be put in place to analyze the program while it is running and monthly afterwards. It is imperative to look at multiple months past the program end to see how well value was retained. A spike in users is only helpful if we actually keep them active on chain after the program.
  • Some of the more subjective pieces will have to be held to the end to analyze once, like innovation for example.

Example project (EXP) KPI Evaluation

Inputs:

  1. EXP is accepted into the Arbitrum LTIPP with an allocation of 100k ARB tokens.
  2. EXP decides to take 20k of their $ARB to set up a referral program to get new users from other blockchains onto their platform. This program is very successful and brings 3,000 new users to Arbitrum through their platform.
  3. EXP decides to use the other 80k $ARB to bond into liquidity for their token. This program raises them $70k of liquidity and attracts an additional 75 users.

Assumptions:

  1. Their $ARB was worth $1 each when they emitted it, meaning that they spent $100k of ARB and their TVE was $100,000
  2. Each new user is worth $10

How would we calculate their Return to the DAO?

ROE = [(TVR - TVE) ÷ TVE] + 1

ROE = [(($70,000 + (3,075 × $10)) - $100,000) ÷ $100,000] + 1

ROE = [(($70,000 + $30,750) - $100,000) ÷ $100,000] + 1

ROE = [($100,750 - $100,000) ÷ $100,000] + 1

ROE = [$750 ÷ $100,000] + 1

ROE = 0.0075 + 1

ROE = 1.0075 or 100.75%

We would consider this program to be a success, as $100k was spent and over $100k was returned, meaning that they had a positive return of 0.75% to the DAO.
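The EXP arithmetic can be reproduced with a short script. It uses ROE = TVR ÷ TVE, which yields the 100.75% stated above. Like the example, it counts the $70k of liquidity at face value and values each user at the assumed $10; the constant names are placeholders.

```python
ARB_PRICE_USD = 1.0         # assumed price at the time of emission
VALUE_PER_USER_USD = 10.0   # assumed value of each new user

tve = 100_000 * ARB_PRICE_USD           # 100k ARB emitted
new_users = 3_000 + 75                  # referral program + bonding program
liquidity_raised_usd = 70_000

tvr = liquidity_raised_usd + new_users * VALUE_PER_USER_USD
roe = tvr / tve                         # same as [(TVR - TVE) / TVE] + 1

print(f"TVE=${tve:,.0f} TVR=${tvr:,.0f} ROE={roe:.2%}")
# TVE=$100,000 TVR=$100,750 ROE=100.75%
```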

Conclusion

We believe that taking this approach will perfectly complement the new changes to the program of having a council and advisors.

Aligning the goals of the program and how we compare project outcomes should have the following effects:

  • Easy alignment between participants in the program, the council, the advisors, and the DAO.
  • A framework that we can carry into the future and improve on, but quickly establishes a baseline KPI very similar to commonly used financial metrics like ROI or IRR.
  • Less collusion or favoritism
  • Better Outcomes as projects have a clear target to shoot for, rather than how the application is currently set up, where we ask projects to come up with their own KPIs.
  • Allow us to simplify the application so that we are getting only the information we need and less fluff / irrelevant information.

With that, we already took a cut at what we believe the updated application template should look like and have posted it here: Application Template Suggested updates

This will allow us to prioritize two things, in line with this overarching post:

  • Projects provide all of the relevant data needed to analyze KPIs and show it on their dashboard during the program.
  • Projects avoid wasting time creating their own KPIs. These are set by the DAO instead, as we should all be moving toward the same goals anyway.

We welcome all feedback! We want to build this together and collaborate. By no means are we saying we have the complete solution; we just wanted to share tangible outputs and recommendations from our conversations so we can keep pushing forward with the DAO!

Additionally, we want to thank everyone who has been jumping on the DAO-wide calls through Q3 and Q4 last year; your thoughts and feedback were imperative to our refinement process here. We would also like to specifically thank @Matt_StableLab and @AlexLumley for all their hard work, hopping on calls and teeing us up for success with these posts. Also @DisruptionJoe, the Treasury and Sustainability Working Group, and the Incentive Working Group have all been incredible resources!

Let’s make Arbitrum the strongest functioning DAO in DeFi!

31 Likes

Wow this is really well thought out, I’ve been working on quantifying impact (benefit cost ratios) for over a year now and this is one of the best stabs at it that i’ve seen.

This is where i can see things becoming most subjective. I’d love it if we have at least 2 application advisors independently reviewing proposals with the formula, so we can measure the variance between their subjective quantification.

Setting weights on the TVR buckets is another area I’d like to see more work on; how do you plan to go about creating a tip sheet for application advisors to quantify each bucket?

Fact-checking is another area; we will need to rely on self-reporting by projects, but we wouldn’t want the calculation to be too difficult for them, or to have poor accountability systems that encourage projects to exaggerate claims or that disadvantage more humble teams.

overall i commend this effort and really keen to see how it plays out in practice. i love how it creates a countervailing incentive for projects to ask for less money (easier to show positive emission) and also provides some basis by which we can increase or decrease amounts given to a project.

A major issue with Optimism RetroPGF or Gitcoin is that there is no notion of pricing impact; you get however much each voter decides you are subjectively worth. if nothing else, i expect this formula-based approach to set an upper limit or ceiling for projects beyond which they cannot get grant money.

13 Likes

Hi everyone, it’s Paul from OpenBlock Labs.

Thanks to the Serious People team for kickstarting the discussion!

We have also been working on an incentive allocation methodology based on the STIP data we’ve collected, which we will be presenting on the January 9th Working Group call. Our proposal incorporates data-driven feedback loops to dynamically adjust incentive allocations across categories and individual protocols.

In anticipation of our Jan. 9 presentation, we have attached a snippet of the methodology below. We have delineated grant buckets for each category, aiming to create incentives that are robust to the variability of DeFi yet objective in nature.

In the screenshot above, the variables represent important metrics, such as TVL growth, sustainability index (e.g. fees generated), and nominal TVL changes. Each grant bucket will have a different set of parameters, as different categories may emphasize different KPIs (DEXs should care more about volume and fees than TVL, lending protocols should focus on borrows, etc.).

Constructing a robust incentive model hinges significantly on the availability of high-quality data. OpenBlock’s efforts alongside the STIP have not only allowed us to start the conversation on a data-driven methodology for forthcoming incentive allocations, but have also equipped the community with vital insights to guide their decision-making when voting on future incentive budgets.

Below is a small snippet of insights we have captured to inform future incentives:

Protocol leaderboard:

Market share per category:

Category-level analysis on sell pressure:

Competitive intelligence across ecosystems:

We are excited to share the rest of the work on our upcoming call, and thanks again to Serious People for getting the public discussion started! It’s great to see more community engagement and the potential of our data to empower a diverse range of future initiatives.

13 Likes

Hey Devansh! Appreciate the kind words and thoughtful feedback :slightly_smiling_face:

Love your idea on doubling up on formula approaches, especially for the more subjective items. I think it would be powerful to have the council involved in this piece as well so there is a unified front on how we are valuing and measuring impact. Makes me very thankful for the pilot program approach, as we can take a solid first cut and then improve from there.

In regards to the TVR buckets, I couldn’t agree more - valuing some of those is going to be a challenge. I think that’s where the research grant can really shine for us though. We tried to set up those buckets to line up with some of the gaps we foresee in the formula. That way we can answer as much as possible in the formula and fund research to start figuring out a sound way to quantify the rest.

I also hear you on fact-checking and this is something that I have been going back and forth on a lot. I’m a huge believer in confirming data so I don’t see how we can avoid this. It does raise the questions though - should projects be required to track their own status, or is that putting too much burden on them? And do we actually save time as an ecosystem by forcing projects to try to do this, only to then have to fact-check and potentially fix it anyway? OpenBlock has great dashboarding, so maybe we should be pulling this up to the DAO level completely.

8 Likes

Hey @paulsengh, thank you for all of the feedback, and thanks to OpenBlock Labs for all of this great work!

I really like your take on separating grant buckets into categories. We were having a similar thought as it is pretty difficult to compare projects in different verticals with different goals.
Love this sneak peek and can’t wait to hop on your call on the 9th! We would love to hop on a call to discuss either before or after because it seems like your work has a lot of synergy with our approach.

5 Likes

Hi SeriousPeople :slightly_smiling_face:,

Thanks for the proposal. I find the logic and reasoning behind the equation to be elegant, simple, and robust.

However, I have a minor concern regarding the inclusion of ‘User Metrics’ in the TVR calculation. Defining ‘new users’ in web3 is challenging due to the pseudonymous nature of it, and it’s easy for one to inflate the metrics. Therefore, it might be better to have a weighting and assign a lower weight to User Metrics compared to other factors.

In addition to the items already mentioned in the buckets, I believe it would be beneficial to include protocol revenue as a metric. This directly influences other key metrics i.e. TVL, liquidity, volume, and naturally creates a flywheel effect.

A critical aspect to consider is the assignment of weight to the TVR components. My suggestion is that each sector should have different weights in their respective buckets, aligning with all projects within that sector. This is crucial since each sector has its unique characteristics - for instance, Lending doesn’t generate high volume, Perps don’t require substantial TVL, and gaming doesn’t need TVL.
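As a rough sketch of what sector-specific weights could look like (the weights, component names, and dollar figures below are made-up placeholders, not proposed values):

```python
# Hypothetical weights per sector; each TVR component is scaled before summing.
SECTOR_WEIGHTS = {
    "lending": {"users": 0.5, "liquidity": 1.0, "volume": 0.25},
    "perps": {"users": 0.5, "liquidity": 0.25, "volume": 1.0},
}


def weighted_tvr(sector: str, components: dict[str, float]) -> float:
    """Sum the unweighted TVR components (USD) using the sector's weights."""
    weights = SECTOR_WEIGHTS[sector]
    return sum(weights[name] * value for name, value in components.items())


# The same raw results score differently depending on the sector.
components = {"users": 10_000, "liquidity": 40_000, "volume": 20_000}

print(weighted_tvr("lending", components))  # 50000.0
print(weighted_tvr("perps", components))    # 35000.0
```

User metrics carry a lower weight in both sectors here, reflecting the concern above that they are easy to inflate.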

5 Likes

Hi SeriousPeople!
Good proposal for calculating efficiency.
I have a question about this formula:

Correct me if I have misunderstood, but I think that if only $500k in total is raised for the project, that works out to about $5.5555k per day. Multiplying 500 by 90 is therefore not quite correct; it would only be right if $500k were raised for the project every day.
And the second point - it may be worth considering how much ARB was paid out during this 90-day period and subtracting that amount to understand the real increase in liquidity.

5 Likes

Clearly, a lot of thinking has gone into this already!

I’d like to bring an angle so it can serve as feedback and to advance this further.

Quantification works great in well-defined domains e.g. we know now how to compare DEXes. However, quantification often performs poorly in poorly defined domains, or it can even be counterproductive as it reduces success to an artificially narrow set of metrics.

For example, European VCs are significantly funded by the EU, but are subject to constraints that reduce their ability to produce valuable startups compared to USA VCs, as Europeans need to fit everything into a formula, and said formula can’t account for the complexities startups face on the ground. Even USA VCs can often harm startups by forcing focus on certain metrics that look good short term but do not necessarily lead to a sustainable business. Quantification is full of perils.

As the proposal says, the idea is to compare apples to apples. My concern is how to avoid under-funding all the other fruits that can’t as easily demonstrate value and yet are fundamental for a healthy, thriving ecosystem.

A few examples to look out for:

  • supporting early stage vs more mature projects (early stage projects could have greater potential but will require more time to give ROE)
  • supporting well-identified use cases (e.g. NFT marketplace) vs innovation (the first offers no USP value to Arbitrum but offers fees short term, the latter could provide differentiation and an advantage to Arbitrum).
  • supporting financial applications vs culture building ones (ecosystems thrive on vibez and not just capital, hence why many people pick London and Paris instead of Dubai, as community flocks where there is a good atmosphere and not just where there is money).
  • Antisocial vs pro-social projects (some projects succeed at the expense of the ecosystem, while others generate value more broadly).

The above are just a few examples, but the peril is that the metrics are unable to address the complexity of an ecosystem, and in overly simplifying it, end up harming it and driving funding away from that which is valuable but harder to measure. Usually when a metric has been defined, that which is not quantifiable is systematically ignored (unfortunate human bias).

Now, I do believe metrics have a place. It would be great to think through which categories they can apply well to (e.g. DeFi) and which not (culture, collab tech, etc). And then create purposely defined KPIs per category.

What do you think?

One step could be understanding the research that has already been produced over the last decades on ecosystem health, city attractors, and more recently web3 communities (e.g. article link). A 2-3 month research project on this could save major headaches and complement more short-term initiatives that define categories for a first cycle.

12 Likes

Thank you for the kind words and feedback!

I like your idea on user metrics, and completely agree with your callout. While this is probably one of the most important categories for the DAO, it is also one of the most easily spoofed. It does seem like we are getting better and better as an industry at pulling accurate info on users off chain. We are hopeful a team can take this category as a grant and solve for these problems (it’s a large section in the research proposal we posted alongside this one). Then we can weigh this bucket as suggested, based on our confidence in the info that we are gathering!

I would love to hear the best way to calculate protocol revenue, as many protocols make revenue at different points in their “sale cycle”. I completely agree that separating projects into different categories will make the comparison much easier.

7 Likes

Hey @cp0x :smiley:

Love questions! So for this part we were just giving a more arbitrary example of how we could tie these things together. I agree with where your head is at - and we’ll need to work on a way to quantify each of these assuming the DAO agrees on the overall approach with the formula. I know OpenBlock has some fun ideas as well per their post - we’ll be discussing further on the 9th - you should join!!

In regards to your second point - maybe I’m missing something but I think we are accounting for that with the TVE component of the equation :thinking:

2 Likes

Appreciate the thoughts and different angle!

You bring up fair points on the more subjective and softer defined domains, would love to hear more!

When creating this post and looking at the best way to compare protocols, we knew that there are some areas that are easy and straight forward to calculate and some that aren’t. We did our best to leave our buckets as open as possible as we understand that some of these categories feel grimy to quantify.

With that, what specifically would you cut out, or how would you tangibly change the proposed KPIs to better accommodate this? We are also always happy to hop on a call and discuss with you / the larger working group! It’s definitely going to take some reps on this one to refine :muscle:

6 Likes

gm Serious People, and thanks for this initiative.

I strongly believe that some standardized KPIs will help to make any program more efficient, fair, and effective. I know the various teams of the different programs are looking at these aspects as well.

I’d like to raise 3 points as food for thought:

  1. Alignment with the high-level goals of the DAO. KPIs should be driven by some overarching goals based on where we want the Arbitrum DAO and ecosystem to be in 1, 3, 5 years. Plurality Labs has done something similar which helps a lot to drive decisions (and measure their success).

I suggest that each KPI should align with a specific goal.

  2. Innovation and culture. As @danielo pointed out, these are critical yet hard-to-measure factors, but they ultimately decide whether an ecosystem thrives or not. Ideally we should identify specific criteria that transcend financial metrics for projects (or aspects of projects) that fall into this category.

Ex: have you invented a new primitive that is being used by other protocols?
Is your protocol a fork of something else?

  3. Rising market. In a market where prices are rising, it’s easier to show returns, but returns should always be compared with the overall performance of the base assets.

For example, this analysis made in December doesn’t paint the best picture for the current STIP, even if the results are positive in absolute dollar value.

[source]

Thanks, and looking forward to this discussion.

7 Likes

Firstly, I’d like to express my gratitude to the Serious People team for their diligent work on developing the KPIs for the Arbitrum Grant Programs, like the Return on Emissions metric, to streamline the grant application and decision-making process. However, I share concerns about the potential subjectivity and variability in assessing the Total Value Returned (TVR). The difficulty in accurately defining and valuing metrics like ‘new users’ in the web3 environment could lead to skewed assessments. Moreover, I am concerned that over-reliance on quantifiable metrics might inadvertently sideline innovative or early-stage projects that are harder to measure but crucial for a diverse and thriving ecosystem. Therefore, while I support the initiative for more structured evaluation, I believe it’s essential to maintain flexibility and consider qualitative aspects that might not fit neatly into our current metrics but are vital for the long-term health and diversity of the Arbitrum ecosystem

2 Likes

I think this is the right way to think about the effectiveness, i wonder if theres really a good way to set the weights of the buckets tho, it’s gonna be totally different depending on what kind of protocol it is

1 Like

Yes, most likely you are right.
I meant that it seems right to me to account separately for the liquidity that came in during the grant period and the liquidity that remains afterwards.
Because during the grant period it (liquidity) can be huge, but perhaps no one likes the product itself, and users only came to make money during the grant period )

So the risk is that KPIs will never be able to handle the complexity of the DAO. That’s been very seriously researched by cyberneticians (see Ashby’s law of requisite variety) and is one of the reasons why OKRs are being criticized so much (they’re too mechanistic and so lead to antagonistic behaviour in their negotiation). The beyond budgeting movement also has contributed a lot here to understand how a small set of indicators can lead to serious problems.

Using the Viable System Model, the idea is that there is a monitoring system (system 3* in their framework for viability) that can both aggregate data from the Units being monitored AND go into said systems to perform audits. So the key here is that the KPIs are not directly (i.e automatically) tied to (re)allocation of funding to grant programmes, but rather serve to trigger audits (i.e. to trigger more in-depth reviews and sense making).

So ideally we want a mix of Health Metrics (general indicators of health, we did some research on this already via TogetherCrew), Growth Metrics (e.g liquidity, new users, etc), and Strategic Metrics (aligned with the positioning strategy and other strategic initiatives).
This matches the usual funding allocation in corporations that goes after

  • operations (keeping the business processes running day in and day out),
  • sustainability (things like culture, wellbeing programmes, DEI, etc etc that keep you healthy)
  • and strategic initiatives (@AlexLumley is advancing something in this regard. The key here is that a lot of Growth is based on foundations that were laid bit by bit showing no direct impact, and so only targeting direct growth metrics leads to short-termism and ultimately being outcompeted)

I’ll comment more on the metrics in the research proposal when it comes to executing on the work.

But I mention it here to suggest that

  1. the system needs to NOT be connected mechanistically to funding allocation but needs to have space for more variety management in reviews. That means we need to design said mechanism upfront and not leave it as a concern for later (as otherwise it tends to never happen and things backfire dramatically)
  2. the classification of which KPIs apply should work as a tagging system rather than rigid categories, as different initiatives will fit differently across buckets (and we don’t want to underallocate to those that do say 30-30-30 across 3 categories vs one that does only one category of KPIs).

So e.g. a programme can have a specific focus (RnDAO as a grant programme does research fellowships and venture building with a focus on Collab Tech), and then the grantees can have varied impact across KPIs (the fellow we selected to research rewards and compensation won’t generate much liquidity but addresses a strategic consideration about operational excellence, while the fellow that is researching a protocol for angel investment will likely end up having an impact on liquidity/capital attraction at some point (mid term) and short term impact on the strategic consideration for operational excellence).
As a grant programme, I want to show the aggregate impact of grantees by picking KPIs bottom-up (and potentially adding my own data when the standard ones are not fit for purpose). Then, as a programme, we can argue why that was a good choice, and the community/delegates can agree or not to give us more $$.

Let me know if those recommendations make sense :slight_smile:

5 Likes

I completely agree that this KPI should not be applied automatically, but should be a useful tool for human decision-making

1 Like

Very wise approach. I agree that we should spread the Beyond Budgeting approach.

4 Likes

Upon careful examination and contribution to the Serious People team’s Key Performance Indicators (KPIs) for Arbitrum Grant Programs, I recommend adding a specific metric: the onboarding of new developers, into the Total Value Returned (TVR) framework. This would effectively gauge the program’s ability to attract fresh talent into the Arbitrum ecosystem. Furthermore, it’s important to establish a rating system for each KPI based on its individual impact. Such a system would enable a more comprehensive and precise evaluation of each metric’s contribution to the overall goals of the grant program

3 Likes

Great point, the attraction of talent is a key success factor. And developers are an obvious focus.
I do wonder how to measure the success of attracting other key talents (developers alone rarely build successful, sustainable organizations. Cross-functional teams are needed, including business people, designers, etc., but then that’s harder to measure as they’re not deploying smart contracts directly).

3 Likes