We’ve listened to the concerns raised by some delegates (@Jojo, @BlockworksResearch, @pedrob, @Bobbay, @KlausBrave, @Bob-Rossi) regarding the weightings and the changes in the Proposal Feedback section, and we would like to share our perspective.
The current system, tested over the past six months, focuses on quantitatively analyzing delegates’ contributions. At the time, we believed this was the best way for delegates to verify their monthly scoring while avoiding the introduction of subjectivity into the analysis of delegate participation. However, we now understand that this may not be the optimal way to evaluate the feedback provided by delegates, as we have observed some drawbacks under this scheme:
- Delegates are incentivized to force comments simply to avoid being penalized by the scoring system. This is problematic because not all proposals lend themselves to lengthy discussion, and delegates may have expertise in some areas while knowing little about others. Forcing their participation in such cases can result in inconsequential or repetitive feedback.
- Additionally, as @pedrob has pointed out, the current system only weights proposals or discussions that have reached Snapshot, overlooking the rest of the discussions within Arbitrum DAO. This opens the door to speculation and uncertainty about which RFC might go to a vote and which might not.
We are aware that setting metrics for everything happening off-chain in governance is a significant challenge. However, we believe it is worth addressing as many of these issues as possible, as they form the foundation for maintaining incentivized governance. With this in mind, we propose two potential paths forward:
Option 1 - Introducing Delegates Incentive Program v1.5: a New Evaluation System for Delegates’ Feedback
We propose changing the way we collect feedback from a quantitative approach to a qualitative one.
Instead of counting comments on proposals that reach Snapshot, we propose implementing a monthly analysis of the feedback provided by delegates, regardless of whether the proposal/discussion has reached Snapshot.
In this way, the Program Administrator would be responsible for creating a rubric that evaluates the value and timeliness of the feedback provided by delegates. The goal of this system is to:
- Incentivize quality over quantity of feedback.
- Extend the analysis across all contributions made by a delegate in the forum (instead of only considering those that reach Snapshot).
- Avoid unnecessary or spam comments made solely to achieve a higher score.
- Allow delegates to focus on contributing to proposals or discussions related to their areas of expertise.
Under this system, a delegate could achieve the same score with, for example, a single substantial contribution or several smaller ones. It also discourages actors who might try to take advantage of the program.
Evaluation Approach
This rubric assesses the overall feedback provided by the delegate throughout the month (from day 1 at 00:00 UTC to the last day of the month at 23:59:59 UTC), based on a summary of their participation in various proposals and discussions. The aim is to measure the consistency, quality, and overall impact of their contributions. We expect delegates to comment on and/or provide feedback on proposals and discussions both before and during the voting process. This feedback should aim to foster debate, improve the proposal, or clarify issues not explicitly addressed within it.
We trust the goodwill of the delegates to avoid meaningless/spam comments and ensure that all contributions are sensible.
- Key point: Feedback or opinions that violate community rules will not be considered. Your interactions should contribute constructively to the discussion, deliberation, and improvement of the proposals.
Rubric Specifications
The parameter “Proposal Feedback” should be renamed to “Delegate’s Feedback” in this case, since we are analyzing the overall feedback provided by the delegate (not just proposals on Snapshot). It will maintain a maximum weight of 30%, and the score will be awarded based on the following rubric:
- Relevance: Analyzes whether the delegate’s feedback throughout the month is relevant to the discussion.
- Depth of Analysis: Evaluates the depth of the analysis provided by the delegate on proposals or discussions. This serves as a metric to assess whether the delegate takes the time to thoroughly consider the discussion and demonstrates attention to detail. Key elements include solid arguments, relevant questions, and thorough reasoning.
- Timing: Considers when the delegate provides feedback, rewarding earlier contributions, as long as they meet the above criteria. Note that feedback will be considered as provided before the on-chain/off-chain vote if it was published before 00:00 UTC on the day voting starts.
- Clarity and Communication: Reviews the clarity, structure, and overall readability of the delegate’s feedback. Clear and well-written feedback is rewarded.
- Impact on Decision-Making: While the proposer ultimately decides whether to incorporate feedback, high-quality feedback from a delegate often influences the final proposal that goes to vote. This criterion evaluates whether the delegate’s feedback tends to drive changes in proposals/discussions.
- Presence in Discussions: This is a more quantitative analysis, intended to reflect the effort of delegates who participate in most discussions. This parameter serves as a multiplier applied to the score obtained across the previous five criteria. Note that participation may not be weighted linearly across all of the DAO’s discussions; some proposals may carry more weight in the overall assessment (special cases such as LTIPP/STIP, gaming, treasury, etc.).
Monthly Evaluation Process
1. Data Collection: At the end of the month, the complete set of contributions by each delegate across all discussions on the forum is reviewed.
2. Overall Evaluation: The rubric is used to assess the delegate’s overall performance on each criterion, based on a holistic view of their participation.
3. Score Assignment: A level from 1 to 4 is assigned to each criterion, based on the consistency and quality of the delegate’s contributions over the month. Each level maps to a percentage range that determines the final score for that criterion.
4. Monthly Report: A qualitative and quantitative report summarizing the delegate’s performance over the month is then produced.
Scoring Methodology
Each rubric criterion has levels with an assigned percentage range, from 0 to 100%, depending on the level achieved.
The initial score is obtained by averaging the first five criteria, while the final score results from applying the “Presence in Discussions” multiplier to the initial average score.
For illustrative purposes, here’s an example:
- Relevance: Level 3 - Score achieved = 65%
- Depth of Analysis: Level 2 - Score achieved = 45%
- Timing: Level 4 - Score achieved = 95%
- Clarity and Communication: Level 2 - Score achieved = 40%
- Impact on Decision-Making: Level 3 - Score achieved = 60%
Initial Score/Average: 61%
- Presence in Discussions: Level 2 - Multiplier assigned: 1.15x
Final Score: 70.15%, or 21.05/30 Delegate’s Feedback points.
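To show how the numbers above combine, here is a minimal Python sketch of the calculation, assuming the level-to-percentage mapping has already been resolved by the Program Administrator (the function and variable names are illustrative only, not part of the program):

```python
# Minimal sketch (not the official implementation) of the Delegate's Feedback score.
# The level-to-percentage mapping is assumed to be defined by the Program Administrator;
# here the criterion percentages are taken directly as inputs, as in the worked example.

CRITERIA = [
    "Relevance",
    "Depth of Analysis",
    "Timing",
    "Clarity and Communication",
    "Impact on Decision-Making",
]

MAX_FEEDBACK_POINTS = 30  # maximum weight of the Delegate's Feedback parameter


def feedback_score(criterion_pct: dict[str, float], presence_multiplier: float) -> tuple[float, float]:
    """Return (final percentage, points out of MAX_FEEDBACK_POINTS)."""
    initial = sum(criterion_pct[c] for c in CRITERIA) / len(CRITERIA)  # average of the five criteria
    final_pct = initial * presence_multiplier  # apply the "Presence in Discussions" multiplier
    return final_pct, final_pct / 100 * MAX_FEEDBACK_POINTS


# Worked example from above: levels 3/2/4/2/3 plus a Level 2 presence multiplier of 1.15x.
example = {
    "Relevance": 65,
    "Depth of Analysis": 45,
    "Timing": 95,
    "Clarity and Communication": 40,
    "Impact on Decision-Making": 60,
}
pct, points = feedback_score(example, presence_multiplier=1.15)
print(f"Final Score: {pct:.2f}% -> {points:.3f}/30")  # Final Score: 70.15% -> 21.045/30
```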
Trade-offs
We are aware that this proposed option introduces trust assumptions regarding the Program Administrator’s criteria for evaluating feedback. We view this layer of subjectivity as inevitable until we can implement automated tools, such as the AI that Karma is developing, to assess the quality of delegate feedback. It is important to note that, as Program Administrators, after analyzing proposals and feedback for the last six months, we have gained experience that (we believe) will help us correctly identify constructive feedback.
At SEEDGov, we are committed to being as transparent as possible, as we have been thus far. Therefore, the rubric and the monthly report will always be publicly accessible to all interested parties. During this phase, feedback from Arbitrum DAO will also be crucial in helping us refine our evaluation criteria.
Considerations
This option also introduces modifications to the responsibilities and budget of the Program Administrators. Expanding the scope of the Delegate’s Feedback analysis will require more human resources to meet the objectives. More time and resources will need to be allocated to developing and training the AI that will eventually automate this process.
Additionally, the Karma Dashboard will require the development of new tools and, overall, a new section for calculating and displaying delegates’ qualitative contributions.
New budget details:
- On the Program Administrator side: $16,000/month - $192,000/year (2 full-time program administrators, instead of 1 part-time and 1 full-time, plus 1 part-time data analyst)
- On the Dashboard and Tools provider side: $7,250/month - $87,000/year
- Total administrative budget: $279,000/year (6.6% of the delegates’ incentives budget)
Source: Salaries are approximate and based on U.S. standards. We extracted data from this website.
Option 2 - Delegates Incentive Program v1.1: Maintain the current proposed system with some modifications
In this option, some of the feedback initially raised in this post is still considered, reducing the weight of Proposal Feedback and narrowing the gap between early and late comments. The Communicating Rationale parameter would also be slightly revalued, resulting in more gradual changes in weights.
- Communicating Rationale (CR): Weight 15% (currently 10%)
- Proposal Feedback (PF): Weight 25/15% (currently 30/15%)
We also suggest the following changes in PF stages:
- 100% weight for comments in the Early Stage Feedback (ESF) phase: Comments provided until the day a proposal is sent to Snapshot (typically by Thursday at 00:00 UTC) will receive 100% weight.
- 60% weight for Late Stage Feedback (LSF) comments: Comments provided during the voting period (starting Thursday at 00:00 UTC) will receive a 40% reduction in weight.
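To illustrate how these stage weights could translate into a Proposal Feedback score, here is a hypothetical Python sketch. The actual PF formula belongs to the current program and is not restated in this post; the sketch simply assumes PF points are proportional to the weighted share of Snapshot proposals a delegate commented on, and all names (Comment, pf_score, PF_MAX_POINTS) are illustrative only:

```python
# Hypothetical sketch of the Option 2 stage weights applied to Proposal Feedback (PF).
# Assumption: PF points scale with the weighted share of Snapshot proposals commented on.

from dataclasses import dataclass

ESF_WEIGHT = 1.0    # comment posted before the proposal is sent to Snapshot
LSF_WEIGHT = 0.6    # comment posted during the voting period (40% reduction)
PF_MAX_POINTS = 25  # proposed PF weight under Option 2 (down from 30)


@dataclass
class Comment:
    proposal: str
    early_stage: bool  # True if posted before voting started (ESF), False if during voting (LSF)


def pf_score(comments: list[Comment], total_proposals: int) -> float:
    """Weighted share of proposals commented on, scaled to PF_MAX_POINTS."""
    # Keep the best-weighted comment per proposal (an ESF comment beats an LSF one).
    best: dict[str, float] = {}
    for c in comments:
        weight = ESF_WEIGHT if c.early_stage else LSF_WEIGHT
        best[c.proposal] = max(best.get(c.proposal, 0.0), weight)
    return sum(best.values()) / total_proposals * PF_MAX_POINTS


# Example: a delegate comments on 3 of 4 proposals, one of them only after voting started.
comments = [
    Comment("Prop A", early_stage=True),
    Comment("Prop B", early_stage=True),
    Comment("Prop C", early_stage=False),
]
print(f"{pf_score(comments, total_proposals=4):.2f}/{PF_MAX_POINTS}")  # 16.25/25
```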
Trade-offs
In this scenario, by maintaining a quantitative system tied only to discussions that reach Snapshot, we may still encounter some of the drawbacks raised at the beginning of this post. However, risks are partially mitigated by modifying certain parameters.
Additionally, this option requires fewer trust assumptions and less subjectivity than the rubric, which has its own pros and cons. While the administrator’s role is more prominent in one scheme than in the other, it is also true that Option 2 largely lacks qualitative assessment, which makes the program somewhat more vulnerable to malicious actors or to those contributing minimal value.
Conclusion
We look forward to receiving feedback from the DAO on this matter, as we believe it is essential to take the next step in the incentive program and professionalize the DAO to ensure alignment with Arbitrum DAO’s values.
We intend to present both options when the proposal goes to Snapshot, allowing us to gauge the DAO’s preferences. Note that both options could be treated as “experimental” during the first few months of the program, with the possibility of making adjustments as needed.
More pings for visibility: @Pgov @Blueweb @PennBlockchain @cp0x