Proposal - Delegate Incentive Program (DIP)

Thank you @SEEDGov for the great update to the DIP! We would also like to express our gratitude for the continuous effort put into operating the DIP and for the willingness to address feedback from delegates to improve the program.

We agree with @Englandzz_Curia and @KlausBrave that it’s important not to discourage participation from diverse delegates. We believe that some form of asynchronous participation should be considered for delegates who can’t attend the calls. One idea is to count proactive comments and discussion on the call notes as “participation”.

We think it’s important to pilot the scoring criteria and then make changes if the quality of comments drops significantly. We support the current scoring system as it is.

We look forward to seeing this updated DIP go into effect in the near future!

2 Likes

Mirroring the comments brought up by @PGov, we believe the current methodology for tracking comments on proposals is better than the one presented in this v1.1 program. Reducing the weight of comments posted after the Snapshot is live will likely lead to increased spam comments. Often, important questions and concerns are discussed while the Snapshot vote is live, especially on topics where one party has the responsibility or knowledge to respond, or where members need to gather data.

Other than that change to the program, we support this updated Delegate Incentive Program.

3 Likes

We agree on this. Today the program incentivizes proactivity and the generation of new initiatives through bonus points, although we are open to suggestions on how to improve on this point.

As for reducing rewards for basic activities, we believe that the proposed change in the weighting of CR and PF is a measure that addresses this scenario.

Additionally, it is worth mentioning, as we already stated in our mid-term report:

Compensating on-chain activity is necessary, as there has been a decrease in the amount of VP voting while the required quorum keeps growing. Compensating on-chain activity can help mitigate possible governance attacks.

We believe that there is an alignment between the interests of the DAO and those of the delegates when applying an ARB Cap to each Tier.

This partly motivated the introduction of the tiers, since a delegate who wishes to reach the highest tier will probably have to dedicate more time to the DAO. In this way, we are not directly demanding or incentivizing exclusivity, but rather providing fair compensation for the dedication of those delegates who reach the highest tier.

We believe that keeping the duration at six months, now that the program is no longer experimental, would undermine delegates’ stability and planning within the DAO. On the other hand, there is a clause that allows the DAO to make changes to the scoring methodology during the program, so after the mid-term report the DAO could easily review the program’s performance and push for changes.

This will depend exclusively on how long it takes the MSS to create the transaction and gather the signatures for it to be executed. Considering that it is a brand-new body, we do not have estimates of how long this could take; however, there is an “expectation” already expressed in the MSS proposal:

In the beginning, mere attendance will be enough to reach it, although as it is a new parameter, we will be watching how it evolves during the program.

We agree, although we have seen that on most occasions the call notes are not published in the forum. In any case, we support the idea of having calls in multiple timezones as a solution to this situation. @cliffton.eth @raam Is it possible to add an extra time slot to the calls?

Considering that several delegates have expressed concerns with this scoring methodology, we have proposed alternatives in the following post; we invite you to review it.

1 Like

We’ve listened to the concerns raised by some delegates (@Jojo, @BlockworksResearch, @pedrob, @Bobbay, @KlausBrave, @Bob-Rossi) regarding the weightings and the changes in the Proposal Feedback section, and we would like to share our perspective.

The current system, tested over the past six months, focuses on quantitatively analyzing delegates’ contributions. At the time, we believed this was the best way for delegates to verify their monthly scoring while also avoiding the introduction of subjectivity into the analysis of delegate participation. However, we understand that this might not be the optimal way to evaluate the feedback provided by delegates, as we have observed some drawbacks under this scheme:

  • Delegates are incentivized to force comments to avoid being penalized by the scoring system. This is problematic because not all proposals lend themselves to lengthy discussions, and we must also consider that delegates may have expertise in some areas while knowing little about others. Forcing their participation in such cases could result in inconsequential or repetitive feedback.

  • Additionally, as @pedrob has pointed out, the current system only weights proposals or discussions that have reached Snapshot, overlooking the rest of the discussions within Arbitrum DAO. This opens the door to speculation and uncertainty about which RFC might go to a vote and which might not.

We are aware that setting metrics for everything happening off-chain in governance is a significant challenge. However, we believe it is worth addressing as many of these issues as possible, as they form the foundation for maintaining incentivized governance. With this in mind, we propose two potential paths forward:

Option 1 - Introducing Delegates Incentive Program v1.5: a New Evaluation System for Delegates’ Feedback

We propose changing the way we evaluate feedback from a quantitative to a qualitative approach.
Instead of counting comments on proposals that reach Snapshot, we propose implementing a monthly analysis of the feedback provided by delegates, regardless of whether the proposal/discussion has reached Snapshot.

In this way, the Program Administrator would be responsible for creating a rubric that evaluates the value and timeliness of the feedback provided by delegates. The goal of this system is to:

  • Incentivize quality over quantity of feedback.
  • Extend the analysis across all contributions made by a delegate in the forum (instead of only considering those that reach Snapshot).
  • Avoid unnecessary or spam comments made solely to achieve a higher score.
  • Allow delegates to focus on contributing to proposals or discussions related to their areas of expertise.

Under this system, a delegate could achieve the same score with (for example) one significant contribution or with several smaller contributions. It also discourages actors who might try to take advantage of the program.

Evaluation Approach

This rubric assesses the overall feedback provided by the delegate throughout the month (from day 1 at 00:00 UTC to the last day of the month at 23:59:59 UTC), based on a summary of their participation in various proposals and discussions. The aim is to measure the consistency, quality, and overall impact of their contributions. We expect delegates to comment on and/or provide feedback on proposals and discussions both before and during the voting process. This feedback should aim to foster debate, improve the proposal, or clarify issues not explicitly addressed within it.

We trust the goodwill of the delegates to avoid meaningless/spam comments and ensure that all contributions are sensible.

  • Key point: Feedback or opinions that violate community rules will not be considered. Your interactions should contribute constructively to the discussions and the deliberation and improvement of the proposals.

Rubric Specifications

The parameter “Proposal Feedback” would be renamed to “Delegate’s Feedback” in this case, since we are analyzing the overall feedback provided by the delegate (not just proposals on Snapshot). It would maintain a maximum weight of 30%, and the score would be awarded based on the following rubric:

  • Relevance: Analyzes whether the delegate’s feedback throughout the month is relevant to the discussion.
  • Depth of Analysis: Evaluates the depth of the analysis provided by the delegate concerning the proposals or discussions. This serves as a metric to assess whether the delegate takes the time to think the discussion through and demonstrates attention to detail. Key elements include solid arguments, relevant questions, and thorough reasoning.
  • Timing: Considers when the delegate provides feedback, rewarding those who provide feedback earlier, as long as they meet the above criteria. Note that feedback will be considered as provided before on-chain/off-chain voting if it was published before the day voting starts at 00:00 UTC.
  • Clarity and Communication: A review of the clarity, structure, and overall readability of the delegate’s feedback. Clear and well-written feedback is rewarded.
  • Impact on Decision-Making: While the proposer ultimately decides whether to incorporate feedback, high-quality feedback from a delegate often influences the final proposal that goes to vote. This criterion evaluates whether the delegate’s feedback tends to drive changes in proposals/discussions.
  • Presence in Discussions: This is a more quantitative analysis, intended to reflect the effort of delegates who participate in most discussions. This parameter serves as a multiplier to the score obtained across the previous five criteria. Note that the weighting of participation in monthly discussions may not be linear across all of the DAO’s discussions; some proposals may carry more weight than others (special cases such as LTIPP/STIP, gaming, treasury, etc.).

Monthly Evaluation Process

1. Data Collection: At the end of the month, the complete set of contributions by each delegate across all discussions on the forum is reviewed.
2. Overall Evaluation: The rubric is used to assess the delegate’s overall performance on each criterion, based on a holistic view of their participation.
3. Score Assignment: A level from 1 to 4 is assigned to each criterion, based on the consistency and quality of the delegate’s contributions over the month. Each level maps to a percentage range that acts as the final score for that criterion.
4. Monthly Report: A qualitative and quantitative report summarizing the delegate’s performance over the month is then produced.

Scoring Methodology

Each rubric criterion has levels with an assigned percentage range, from 0 to 100%, depending on the level achieved.

The initial score is obtained by averaging the first five criteria, while the final score results from applying the “Presence in Discussions” multiplier to the initial average score.

For illustrative purposes, here’s an example:

  • Relevance: Level 3 - Scoring achieved = 65%

  • Depth of Analysis: Level 2 - Scoring achieved = 45%

  • Timing: Level 4 - Scoring achieved = 95%

  • Clarity and Communication: Level 2 - Scoring achieved = 40%

  • Impact on Decision-Making: Level 3 - Scoring achieved = 60%

Initial Score/Average: 61%

  • Presence in Discussions: Level 2 - Multiplier assigned: 1.15x

Final Score: 70.15% or 21.05/30 Delegates’ Feedback points.
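
To make the mechanics concrete, here is a minimal sketch of the calculation above in Python. The level-to-percentage values and the multiplier are copied from the worked example; the actual ranges per level would be defined in the Program Administrator’s rubric.

```python
# Minimal sketch of the Delegate's Feedback scoring illustrated above.
# Percentages per level are taken from the worked example, not the final rubric.

PF_MAX_POINTS = 30  # maximum weight of the "Delegate's Feedback" parameter

criteria = {
    "Relevance": 65,                  # Level 3
    "Depth of Analysis": 45,          # Level 2
    "Timing": 95,                     # Level 4
    "Clarity and Communication": 40,  # Level 2
    "Impact on Decision-Making": 60,  # Level 3
}
presence_multiplier = 1.15            # Presence in Discussions: Level 2

initial = sum(criteria.values()) / len(criteria)  # average of the five criteria -> 61%
final_pct = initial * presence_multiplier         # apply the multiplier -> 70.15%
points = final_pct / 100 * PF_MAX_POINTS          # convert to points -> ~21.05 of 30

print(f"Initial average: {initial:.0f}%")
print(f"Final score: {final_pct:.2f}% ({points:.1f}/{PF_MAX_POINTS} points)")
```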

Trade-offs

We are aware that this proposed option introduces trust assumptions regarding the Program Administrator’s criteria for evaluating feedback. We view this layer of subjectivity as inevitable until we can implement automated tools, such as the AI that Karma is developing, to assess the quality of delegate feedback. It is important to note that, as Program Administrators, after analyzing proposals and feedback for the last six months, we have gained experience that (we believe) will help us correctly identify constructive feedback.

At SEEDGov, we are committed to being as transparent as possible, as we have been thus far. Therefore, the rubric and the monthly report will always be publicly accessible to all interested parties. During this phase, feedback from Arbitrum DAO will also be crucial in helping us refine our evaluation criteria.

Considerations

This option also introduces modifications to the responsibilities and budget of the Program Administrators. Expanding the scope of the Delegate’s Feedback analysis will require more human resources to meet the objectives. More time and resources will need to be allocated to developing and training the AI that will eventually automate this process.

Additionally, the Karma Dashboard will require the development of new tools and, overall, a new section for calculating and displaying delegates’ qualitative contributions.

New budget details:

  • On the Program Administrator side: $16,000/month - $192,000/year (2 program administrators, both full-time instead of 1 part-time and 1 full-time + 1 data analyst part-time)
  • On the Dashboard and Tools provider side: $7,250/month - $87,000/year
  • Total administrative budget: $279,000/year (6.6% of the delegates’ incentives budget)

Source: Salaries are approximate and based on U.S. standards. We extracted data from this website.
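
For reference, a quick sketch reproducing the annual totals quoted above; the $4.2M/year delegate-incentive budget used for the percentage is the figure cited later in this thread.

```python
# Reproduce the Option 1 administrative budget figures quoted above
admin_monthly = 16_000      # Program Administrators (2 full-time)
dashboard_monthly = 7_250   # Dashboard and tools provider (Karma)

total_admin_yearly = (admin_monthly + dashboard_monthly) * 12   # 279,000
incentives_budget_yearly = 4_200_000  # $4.2M/year, cited elsewhere in the thread

share = total_admin_yearly / incentives_budget_yearly
print(f"Total administrative budget: ${total_admin_yearly:,}/year ({share:.1%})")
# -> Total administrative budget: $279,000/year (6.6%)
```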

Option 2 - Delegates Incentive Program v1.1: Maintain the current proposed system with some modifications

In this option, some of the feedback initially raised in this post is still considered, reducing the weight of Proposal Feedback and narrowing the gap between early and late comments. The Communicating Rationale parameter would also be slightly revalued, resulting in more gradual changes in weights.

  • Communicating Rationale (CR): Weight 15% (currently 10%)
  • Proposal Feedback (PF): Weight 25/15% (currently 30/15%)

We also suggest the following changes in PF stages:

  • 100% weight for comments in the Early Stage Feedback (ESF) phase: Comments provided until the day a proposal is sent to Snapshot (typically by Thursday at 00:00 UTC) will receive 100% weight.
  • 60% weight for Late Stage Feedback (LSF) comments: Comments provided during the voting period (starting Thursday at 00:00 UTC) will receive a 40% reduction in weight.
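
As a rough illustration of how these stage weights would apply, here is a minimal sketch, assuming a comment’s contribution is simply its base score scaled by the stage weight; the cutoff logic and example dates are illustrative, not part of the proposal.

```python
from datetime import datetime, timezone

# Hypothetical stage weights from the suggestion above
ESF_WEIGHT = 1.00  # Early Stage Feedback: before the proposal goes to Snapshot
LSF_WEIGHT = 0.60  # Late Stage Feedback: during the voting period

def pf_stage_weight(comment_time: datetime, voting_start: datetime) -> float:
    """Weight applied to a comment, assuming the cutoff is the start of voting at 00:00 UTC."""
    return ESF_WEIGHT if comment_time < voting_start else LSF_WEIGHT

voting_start = datetime(2024, 10, 3, 0, 0, tzinfo=timezone.utc)      # a Thursday, 00:00 UTC
early_comment = datetime(2024, 10, 1, 14, 30, tzinfo=timezone.utc)   # posted before Snapshot
late_comment = datetime(2024, 10, 4, 9, 0, tzinfo=timezone.utc)      # posted during the vote

print(pf_stage_weight(early_comment, voting_start))  # 1.0 -> full credit
print(pf_stage_weight(late_comment, voting_start))   # 0.6 -> 40% reduction
```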

Trade-offs

In this scenario, by maintaining a quantitative system tied only to discussions that reach Snapshot, we may still encounter some of the drawbacks raised at the beginning of this post. However, risks are partially mitigated by modifying certain parameters.

Additionally, the need for trust assumptions or subjectivity would be lower than with the rubric, which has its own pros and cons. While the administrator’s role is more prominent in one scheme than in the other, it is also true that Option 2 lacks qualitative assessment to some extent. This makes the program somewhat more vulnerable to malicious actors or to those contributing minimal value.

Conclusion

We look forward to receiving feedback from the DAO on this matter, as we believe it is essential to take the next step in the incentive program and professionalize the DAO to ensure alignment with ArbitrumDAO’s values.

We intend to present both options when the proposal goes to Snapshot, allowing us to gauge the DAO’s preferences. Note that both options could be treated as “experimental” during the first few months of the program, with the possibility of making adjustments as needed.

More pings for visibility: @Pgov @Blueweb @PennBlockchain @cp0x

6 Likes

Knowing that the DIP will potentially be a foundation of ARB staking, and therefore carry an ecosystem weight far higher than it does now, do you think SEED would be able to scale v1.5 in a situation where we have 10x the amount of activity we have today?

3 Likes

I find this a bit controversial.

  1. First, what do you mean by competing? What if they are other chains? What if they are projects on another chain or projects that work on several chains? I would like to know more.
  2. I can hardly imagine a person or organization that participates in only one DAO. Why is this so? Because this is work that takes time but is not paid like a full-time job, so delegates are forced to look for several DAOs.
  3. The experience gained in one DAO can be applied in another, and the more a delegate develops in this area (that is, the more projects they participate in), the better the quality of the proposals they can give.
  4. The comparison to Entropy seems incorrect to me - they will receive millions for their work, unlike delegates. It is unlikely that Arbitrum will agree to have 50 exclusive delegates with large salaries. The essence of the program is to attract more delegates, not to reduce their number due to budget constraints.
  5. And finally - last but not least. Delegates can say that they work exclusively, but work with other DAOs from another address. This is difficult to verify, and it will turn out that conscientious delegates lose the competition to dishonest ones.
8 Likes

Good job!
I agree that delegates’ feedback and comments should be timely for active discussion of the proposal.

It is difficult to know how subjective this approach will be, so I suggest, as last time, trying a test month or two before fully implementing this proposal to see how difficult it will be to evaluate delegates against the new criteria.

And based on the results of the test, delegates will understand more clearly how necessary it is to be involved in the discussions and what difficulties this may pose.

2 Likes

Thanks @SEEDGov for the proposal and it’s great to see all the feedback thus far!

We are in favour of incentivising delegates and fully understand that it’s a rather difficult task for which a standard optimal for both delegates and the DAO has yet to be found. That said, we believe there is currently a mismatch between the proposed budget (rewards for delegates), delegate requirements, and the subsequent benefit to the DAO.

Specifically,

  • Quality Control - V1.1 has no quality control over voting rationale and feedback (which v1.5 attempts to solve) so someone who provides significant contributions to a proposal via feedback is valued the same as someone who simply agrees with the proposal or provides little feedback.
  • Lack of VP Recognition - The program currently fails to account for delegates with larger VP and therefore, the added security they bring to the DAO and ecosystem from their voting participation. By not incorporating some sort of scale based on VP, delegates are not incentivized to seek out delegations, something that is fundamental to the security of the DAO.

So with a $4.2M/yr budget, the DAO is paying for voting participation that does not take into account a delegate’s VP, and thus the economic security they bring, as well as for feedback and rationale that is valued equally regardless of its impact.

Reaching the top tier of $7k USD/month is certainly attainable and not too hard over a one-year period, which we think is fine. However, given the proposed requirements for reaching the top tier, $6k - $7k USD/month is far too generous. $84k USD/yr is a significant amount of money and higher than most average salaries worldwide. By no means are we trying to discredit the effort and time delegates put into the Arbitrum DAO; we just struggle to see the justification for the cost versus the requirements as it stands.

Overall, we would love to see a combination of v1.1 and v1.5 where voting participation is still counted and feedback + rationale is graded. We would also like to see a metric that takes into account a delegate’s VP, and, lastly, we would like to see the monthly compensation revised down. With added quality control, participation metrics, and VP accounted for, a max monthly compensation of $5k USD ($60k/yr) seems more reasonable.

3 Likes

Thanks for the update on the proposal. Regarding the options, I believe 1.5 is the natural evolution of the program, so I don’t think it should be 1.1 OR 1.5, but rather 1.1 AND 1.5, with a few changes.

In the current program, no delegate got full value for commenting on proposals. That happens for a series of reasons that I won’t touch on here, but I believe a good path is to give it a bigger weight (1.1) AND incentivize deeper contributions (1.5) after some time.

I would like to suggest the following options:

  • Only 1.1
  • 1.1 AND 1.5, with 1.5 activated after 4 months (or another reasonable amount of time)
3 Likes

The updated proposal is more multi-dimensional and visually intuitive, although it looks more complex. That said, I’m pleased to see that delegates are expected to provide feedback on the quality, feasibility, practicality, and referenceability of proposals, as these are considered highly important metrics.

2 Likes

Hey @SEEDGov - FYI the Governance Calendar isn’t gated, so calls can be added by anyone who proposes to host :slight_smile:

Would you want an extra slot to be added to the Open Discussion of Proposal(s) - Bi-weekly Governance Call or the Arbitrum Reporting Governance Call? And are you suggesting that this extra slot should be a repetition or that alternating sessions should be hosted at different times?

It is also important to note that the monthly ‘Arbitrum Open Governance Call’ is now referred to as ‘Arbitrum Reporting Governance Call’ - see governance calendar here: ArbitrumDAO Governance Community Calendar

Perhaps this proposal can also get sentiment on what times best suit delegates.

4 Likes

I think @SEEDGov can provide us some numbers here.
While $7k is the ceiling, it is, indeed, just that: the ceiling. Skimming through the payments for each month (I am currently on the multisig), we quite rarely have someone reaching the cap, and the current month has one outlier (5k to L2BEAT) due to their proposal.

I quickly ran the numbers in Excel: for August, the average is around 3.5k ARB and the median is around 3.7k ARB, so 26% to 30% below the ceiling.

On a $7K ceiling, if we assume the distribution of events stays similar (it likely won’t exactly, but not by much), it means the bulk of delegates will fall between $4.9K and $5.1K per month.

Just throwing the numbers out there; it could be interesting to run the median and average of all compensation across the program to get a better idea, but a sense check says we rarely see a large number of delegates getting more than 70-75% of the top compensation here.

8 Likes

@WintermuteGovernance
I just want to add that L2BEAT received 5,000 in ARB, not in $.
Given the token rate, the $7,000 will probably never be reached.

3 Likes

This feels like a significant improvement and I support it. That being said, I’m starting to observe some perverse incentives at play, with certain comments being clearly AI generated.
This has often been an issue in other communities that reward participation, and we’ve been working on a solution that isn’t perfect but can help. Let me know if you want to discuss :slight_smile:

3 Likes

We believe that automating certain tasks and, in the future, possibly decentralizing the administration of the program could allow the program to scale up in the event of a significant increase in DAO activity.

This is what we want to solve as the program evolves and as you said, that is why we are proposing version 1.5. If this rubric works as expected we can use the same criteria to evaluate voting rationale.

We would like to make two clarifications in this regard. First, although the maximum budget is $4.2M/year, our metrics from the end-of-program report we are currently working on show that, during these six months, the delegates who were part of the program and received compensation earned an average Total Participation of 79.65. Considering the proposed cap of $7,000, this equates to an average monthly compensation per delegate of $5,575.50.
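
For clarity, the arithmetic behind that figure, assuming monthly compensation scales linearly with the Total Participation score up to the tier cap (as the numbers above imply):

```python
# Hedged sketch: monthly compensation assumed to be Total Participation (%) x tier cap
avg_total_participation = 79.65   # average TP reported for the six months
tier_cap_usd = 7_000              # proposed cap for the highest tier

avg_monthly_comp = avg_total_participation / 100 * tier_cap_usd
print(f"${avg_monthly_comp:,.2f}")  # $5,575.50
```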

In addition, it is important to consider the amount paid in delegate programs of other DAOs, which, in a way, function as competition:

On the other hand, our mid-term report contains some metrics on the voting participation of delegates who have been part of DIP 1.0.

As can be seen in the information above, during the first 3 months of the program, delegates participating in the program contributed an average of 65.64% of the votes in Tally. This indicates that while there is no criterion that directly incentivizes delegates with higher VP, the program has been able to incentivize large amounts of VP. However, we would be interested to hear feedback on how we can incorporate this metric into the framework in the future as we have proposed something similar in the past and there was not much consensus in the DAO on this.

Thanks for your feedback. The reality is that activating version 1.5 partway through the program could be inconvenient at the administrative level and also in terms of the need to adapt the Karma dashboard. We believe that, as you rightly say, version 1.5 is the natural evolution of the program and that, if necessary, we can always go back to version 1.1 (which would be easier than implementing v1.5 with the program already running).

Although we understand that the governance calendar is open, in this first experiment we only seek to incentivize those calls that are hosted by the Arbitrum Foundation or those that are not related to any incentivized working group. One idea would be to add an extra slot to the “Open Discussion of Proposal(s) - Bi-weekly Governance Call” and the “Arbitrum Reporting Governance Call” as a “repetition”, to give delegates from other timezones the possibility to participate. Do you think this can be coordinated? @entropy @raam @tane @KlausBrave @Englandzz_Curia

We have contemplated the situation you mentioned; in both v1.1 and v1.5 we intend to take a more proactive approach to AI-generated comments:

3 Likes

We are certainly in support of the quality-over-quantity path, but we do see a couple of areas that could be improved:

  1. As some delegates have pointed out, we have concerns regarding encouraging early over late replies. Since we are already going down the quality path, quality matters whenever a reply is posted. In fact, a “late” reply of good quality could be even more valuable, since it may raise points that others overlooked.

  2. Regarding the incentive distribution among the 3 tiers, do we have a general expectation for it - maybe 20% / 40% / 30%, with the rest not qualifying? Will the administrator have a pre-determined ratio among those 3 tiers, given that these are subjective calls?

But overall, we appreciate @SEEDGov for spearheading the initiative of quality over quantity. We believe this is the right path to go.

3 Likes

We agree on this; that’s why we proposed v1.5 to evaluate comments with response time as just one more criterion. This way we can better evaluate valuable contributions. You can see an example here:

We do not quite understand this question. Do you mean what level of participation we expect?

2 Likes

I totally agree with these clearly made points. Arbitrum needs more community participation, which is at a very slow pace today.

3 Likes

I liked the detailed approach to explaining the delegate incentive program. Including multiple criteria gives it dimension.

This is a catch-22 situation for small token holders.

3 Likes

For the 1.5 option, I believe the reviewers of the delegate feedback should be selected/elected in the open and have a term limit… probably quite a short one: 3 months, maybe even 1 month max.

2 Likes