Hi @SEEDGov,
We recently reviewed the DIP results for February and would like more clarity on how scoring improvements can be made. While we understood the previous scoring system and how scores from 1 to 4 were calculated, we noticed that the scale has now expanded from a maximum of 4 to a maximum of 10. Although we are familiar with the general rubric, it remains unclear how to improve our scores within this expanded range.
For example, in the proposal “Increase resilience to outside attackers by updating DAO parameters to not count ‘Abstain’ votes in Quorum,” we received a score of 7 in most categories but only 3 for impact. However, the difference between a 7, an 8, or a 9 remains unclear. What specific factors differentiate these scores, and how can we adjust our approach to achieve a higher one? With the current evaluation method, it is difficult for delegates to learn from past assessments and improve future feedback. The lack of clear distinctions between score levels makes it challenging to identify the specific improvements needed to reach a higher rating, and, given the subjective nature of scoring, it can be difficult to justify why certain scores are assigned.
To address this, we would like to suggest exploring a normalized scoring approach: a method commonly used in academic assessment to promote fairness, consistency, and transparency by using the highest-performing response in each rubric category as the reference for a full score.
How Normalization Works
- Category-Specific Benchmarking: Instead of using a single delegate’s overall high score as the standard, this method identifies the highest-scoring delegate for each rubric category and uses their response as the reference point for a full score. For example, if Jojo scores 10 in depth of analysis while Curia receives a full score in clarity, then the benchmark for each rubric should be set by the delegate who performed best in that specific area. Alternatively, if SEEDGov (as the program manager) defines what constitutes a full score, or identifies specific gaps in the top-scoring responses, that could also serve as the benchmark; however, this would require clearly stating what is needed or lacking so that delegates understand how to achieve a higher score.
- Defining Key Scoring Factors: Document the specific elements that contributed to the top scores, such as depth of analysis, clarity, or relevance, to provide clear guidelines for other delegates.
- Standardized Evaluation: Assess other delegates’ scores relative to these category-specific benchmarks, ensuring a structured, transparent, and fair scoring system.
This approach ensures that each rubric category is measured against the highest standard in that specific criterion, rather than relying on a single overall top-scoring delegate. By doing so, it allows for a more precise and meaningful evaluation of delegate performance.
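To make the mechanics concrete, here is a minimal sketch of how per-category normalization could be computed. The delegate names and categories are taken from the examples above, but the raw numbers, the rounding rule, and the 10-point target scale are assumptions for illustration only; SEEDGov could of course adapt the exact scaling.

```python
# Minimal sketch of per-category normalized scoring (illustrative data only).
# Assumption: raw scores are on the current 1-10 rubric, and the top raw score
# in each category becomes the benchmark that maps to a full score of 10.

RAW_SCORES = {
    # delegate -> {rubric category -> raw score}  (numbers are made up)
    "Jojo":  {"depth_of_analysis": 10, "clarity": 7, "impact": 6},
    "Curia": {"depth_of_analysis": 8,  "clarity": 9, "impact": 5},
    "Us":    {"depth_of_analysis": 7,  "clarity": 7, "impact": 3},
}

FULL_SCORE = 10  # target scale after normalization


def category_benchmarks(raw_scores):
    """Highest raw score per rubric category across all delegates."""
    categories = {c for scores in raw_scores.values() for c in scores}
    return {
        c: max(scores[c] for scores in raw_scores.values())
        for c in categories
    }


def normalize(raw_scores):
    """Rescale every delegate so the per-category benchmark maps to FULL_SCORE."""
    benchmarks = category_benchmarks(raw_scores)
    return {
        delegate: {
            c: round(FULL_SCORE * score / benchmarks[c], 1)
            for c, score in scores.items()
        }
        for delegate, scores in raw_scores.items()
    }


if __name__ == "__main__":
    for delegate, scores in normalize(RAW_SCORES).items():
        print(delegate, scores)
    # e.g. "Us" would see impact = 10 * 3 / 6 = 5.0, making the gap to the
    # benchmark delegate explicit and traceable to a specific category.
```

Because each category has its own benchmark, a delegate can see exactly which category pulled their total down and by how much, rather than receiving a single opaque number.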
Additionally, this approach would allow delegates to compare their scores against the top-performing delegate in each category and understand:
- How their score was calculated
- What factors contributed to the highest score in each rubric category
- Where they are lacking compared to the highest-performing delegate in that category
Examples of Normalized Scoring in Other Industries
Normalized scoring is widely used in academia and standardized testing to improve fairness, consistency, and accuracy:
- University Grading: Many universities normalize exam scores to adjust for grading inconsistencies, ensuring more accurate and equitable distributions.
- Standardized Testing: Exams like the SAT and GRE use score normalization to adjust for differences in test difficulty, ensuring that scores remain comparable over time.
- Peer Evaluations: Online learning platforms like Coursera use normalized scoring in peer assessments to reduce grading bias and improve accuracy, achieving results comparable to expert evaluations.
Key Benefits of Normalized Scoring
- Fairness & Consistency: Eliminates bias from evaluator subjectivity or varying assessment difficulty, ensuring scores reflect actual performance.
- Transparency & Interpretability: Provides clear benchmarks for scoring, helping delegates understand how their scores were assigned and how to improve.
- Objective Benchmarking: Ensures comparability across different evaluators, reducing arbitrary differences in scoring.
We believe that, by integrating a normalized scoring system, DIP evaluations can become more structured, transparent, and actionable, benefiting both evaluators and delegates.
We’d love to hear your thoughts on whether this approach could enhance fairness and clarity in the DIP scoring process.