Open Source Observer x Arbitrum

Hi all,

We recently received a grant from the Foundation to measure the impact of open source contributions to Arbitrum’s growth and adoption. This is our hello world post and a bit of context on what we’ll be doing over the next few months here.

Who we are

Open Source Observer (OSO) is an analytics platform that helps ecosystem funds measure the impact of different forms of open source software contributions to the growth of their network.

We do this by maintaining a registry of open source projects’ GitHubs, package releases, dependencies, addresses across networks, contract deployments, and funding history. We regularly index on- and off-chain data about each project, generating event streams for everything from repo stars and sdk downloads to contract interactions and gas fees. Finally, we combine the datasets to offer powerful insights about various ecosystem health metrics, such as developer growth, retained users, and contribution to protocol revenues.

The data is accessible via an API and a public front end. Our goal is to be the most open and reliable source of impact metrics out there. We even want to open our warehousing and indexing infra to the community.

We got our start at Protocol Labs and were very active during the most recent RetroPGF round. We’re excited to expand to another ecosystem and hope to have thorough coverage of the OSS projects in Arbitrum very soon!

Objectives

Our project has three primary objectives between now and mid-February 2024:

  1. Launch an Arbitrum OSS directory with auto-validation and reward mechanisms for community submissions
  2. Grow the directory to include at least 200 well-known projects building on Arbitrum across various domains (e.g., DeFi, gaming, developer tooling, governance tooling) with up-to-date, fully indexed data from on- and off-chain sources
  3. Leverage the data to propose an initial 4-5 “impact pools” of projects’ contributions to specific ecosystem health metrics, e.g., new user growth, developer activity, sequencer fees.

Progress so far

The work has just kicked off, but we’ve already updated our data schema to begin accepting Arbitrum project data and are ironing out the details of the reward mechanisms for community submissions with Thank ARB. Soon we will be incentivizing a group of contributors to submit project data (see the docs here) so we can build out a full picture of all the projects building on and upstream of Arbitrum.

Getting involved

If you’re interested in following along, join our Telegram chat. If you have data skills and want to be an active contributor, please fill out this interest form and we’ll be in touch soon!

Update:

We recently completed our 2nd of 3 milestones:

  2. Grow the directory to include at least 200 well-known projects building on Arbitrum across various domains (e.g., DeFi, gaming, developer tooling, governance tooling) with up-to-date, fully indexed data from on- and off-chain sources

To mark the occasion, we released our first analysis on the state of OSS on Arbitrum, which you can find here and in this post on X. It covers a total of 345 projects building on Arbitrum. You can help us add / update information for projects by making a PR here.

Some highlights:

  1. We are currently tracking over 300 OSS projects and over 13,000 code artifacts that are making an impact on the Arbitrum ecosystem. These artifacts include both GitHub repos (~10,000) and smart contracts deployed on Arbitrum One (~3,000).
  2. Approximately 1,800 developers are actively engaged in these projects. This number aligns closely with the latest Electric Capital Developer Report. Our analysis, however, incorporates an additional 94 projects not currently captured in their registry.
  3. The number of active developers is 18% lower than the peak of around 2,200 in March 2023. It’s important to note, however, that this reduction is primarily concentrated in a few projects rather than a general decline across the ecosystem. In fact, the majority of projects have maintained a stable developer count over the past year.

This was just a small taste of what’s possible with the data we’re collecting. In early February, we will release a follow-up report that covers the onchain contributions of ~200 of the 300+ projects included here. At the end of February, we will leverage the data to propose an initial 4-5 “impact pools” that can inform future grantmaking efforts by Plurality Labs and the Arbitrum DAO.

You can explore the Python notebook and some static data dumps here. And again, check out the full report here.

Update

Earlier this week we completed our 3rd and final milestone for our grant:

  3. Leverage the data to propose an initial 4-5 “impact pools” of projects’ contributions to specific ecosystem health metrics, e.g., new user growth, developer activity, sequencer fees.

Here is our write-up on our initial Arbitrum impact pools. More exciting still, the data (and infra) needed to replicate and extend this analysis are completely open and ready for community members to work with. We believe that having a plurality of impact measurements and weighting coefficients is critical to a successful decentralized capital allocation mechanism.

Here is an excerpt from the report about how the impact metrics and pools are generated:

Methodology

Available impact metrics

Each pool is constructed from a series of relevant impact metrics that have been aggregated for each project (as of February 28, 2024).

Using Uniswap as an example, these indicators include:

project_name                                            Uniswap
first_commit_date                2018-03-07 21:59:23.000000 UTC
last_commit_date                 2024-02-27 04:25:00.000000 UTC
repos                                                      58.0
stars                                                   26540.0
forks                                                   24545.0
contributors                                             1061.0
contributors_6_months                                     123.0
new_contributors_6_months                                  69.0
avg_fulltime_devs_6_months                                  3.0
avg_active_devs_6_months                              19.666667
commits_6_months                                          659.0
issues_opened_6_months                                    468.0
issues_closed_6_months                                    111.0
pull_requests_opened_6_months                             990.0
pull_requests_merged_6_months                             673.0
num_contracts                                             187.0
first_txn_date                                       2021-08-01
total_txns                                           36432885.0
total_l2_gas                                   45079433321289.0
total_users                                           1350355.0
txns_6_months                                         8427926.0
l2_gas_6_months                                12187502928874.0
users_6_months                                         452206.0
new_user_count                                         131403.0
active_users                                           258476.0
high_frequency_users                                      140.0
more_active_users                                       47828.0
less_active_users                                      210508.0
multi_project_users                                    132928.0
retained_users                                         0.191413

Filtering eligible projects

The filters used in this post are pretty basic. We exclude any project whose first (public) contribution came after 2023-09-01, i.e., less than 6 months before the time of writing. We also limit each pool to the top 50 projects, which obviates the need for additional filtering.

In future iterations, more sophisticated filtering is recommended to ensure a higher level of quality control. For instance, we may only want to consider GitHub repos that have been starred by a Top 100 developer in the Arbitrum ecosystem.
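As a rough sketch of the eligibility filter described above (the column names mirror the metrics table; the projects and numbers below are made up for illustration):

```python
import pandas as pd

# Toy data with the same column names as the metrics table above.
df = pd.DataFrame({
    'project_name': ['A', 'B', 'C'],
    'first_commit_date': pd.to_datetime(['2021-05-01', '2023-10-15', '2019-01-20']),
    'txns_6_months': [1_000_000, 5_000, 250_000],
})

CUTOFF = pd.Timestamp('2023-09-01')

# Exclude projects whose first public contribution came after the cutoff,
# then keep only the top N projects by some metric (here N=2 for the toy data;
# the report uses the top 50).
eligible = df[df['first_commit_date'] < CUTOFF]
top = eligible.nlargest(2, 'txns_6_months')
print(top['project_name'].tolist())  # -> ['A', 'C']
```

Project B is dropped because its first commit postdates the cutoff, and the remainder are ranked by recent transaction volume.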

Normalizing the data

The first two steps give us, for each impact metric, a vector of projects and values. Next, we transform each metric so that its values approximate a normal distribution, which lets us compare projects’ relative performance.

For this report, we treated most indicators as lognormally distributed. The only indicators that already followed a roughly normal distribution were those related to developer counts and user retention.
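One common way to handle a lognormal metric (the exact transform lives in the accompanying notebook; the values below are made up) is to take logs and then standardize, so that heavy-tailed metrics like gas or transaction counts become comparable across projects:

```python
import numpy as np

# Made-up metric values spanning several orders of magnitude,
# as onchain metrics like gas or transaction counts often do.
values = np.array([1e3, 1e4, 1e5, 1e6, 1e7])

# Log-transform so the heavy-tailed metric looks roughly normal,
# then z-score so projects are scored by relative performance.
logged = np.log(values)
z = (logged - logged.mean()) / logged.std()
print(np.round(z, 2))  # -> [-1.41 -0.71  0.    0.71  1.41]
```

After this step every metric lives on the same standardized scale, which is what makes the weighted combination in the next section meaningful.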

Weighting metrics in the pool

Finally, we applied a weighting coefficient to each metric’s distribution function to determine how heavily it should be weighted in the pool. These can be viewed in the Python notebook that accompanies this report as statements like the following:

blockspace_pool = create_impact_pool(
    df[df['first_txn_date'] < DATE_FILTER],
    impact_vectors={
        'l2_gas_6_months': ('log', .5),
        'txns_6_months': ('log', .5)
    }
)

The example above has a weighting of 50/50 for two vectors related to blockspace (gas fees and transaction counts).
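For readers who want a feel for what `create_impact_pool` does without opening the notebook, here is a hypothetical reconstruction under the assumptions described above (log-transform where requested, standardize, then combine with the given weights); the real implementation may differ in its details, and the toy data is made up:

```python
import numpy as np
import pandas as pd

def create_impact_pool(df, impact_vectors):
    """Sketch of a pool constructor: for each metric, optionally
    log-transform, z-score, then sum the weighted scores into a
    single ranking. (Hypothetical; the real one is in the notebook.)"""
    scores = pd.Series(0.0, index=df.index)
    for metric, (transform, weight) in impact_vectors.items():
        values = df[metric].astype(float)
        if transform == 'log':
            values = np.log(values.clip(lower=1))  # guard against zeros
        z = (values - values.mean()) / values.std()
        scores += weight * z
    return df.assign(score=scores).sort_values('score', ascending=False)

# Toy usage with made-up numbers:
df = pd.DataFrame({
    'project_name': ['A', 'B', 'C'],
    'l2_gas_6_months': [1e12, 1e10, 1e11],
    'txns_6_months': [8e6, 4e4, 9e5],
})
pool = create_impact_pool(df, {
    'l2_gas_6_months': ('log', .5),
    'txns_6_months': ('log', .5),
})
print(pool['project_name'].tolist())  # -> ['A', 'C', 'B']
```

Because the weights sum to 1, the resulting score stays on the same standardized scale regardless of how many metrics go into the pool.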

You can explore the Python notebook and the static data used for this report here.

Sample Impact Pool

Here’s an example of the momentum pool. This pool benchmarks projects against a mix of positive developer and onchain user trends. It’s configured as:

momentum_pool = create_impact_pool(
    df[df['first_txn_date'] < DATE_FILTER],
    impact_vectors={
        'active_users': ('log', 1/3),
        'avg_active_devs_6_months': ('log', 1/6),        
        'commits_6_months': ('log', 1/3),        
        'issues_closed_6_months': ('log', 1/6)
    }   
)

And here is a screenshot of the top 50 projects in this pool:

Contributing

We welcome any feedback on this analysis, either here or via the Discord/Telegram/GitHub channels on our website. We are also seeking to grow a community of “impact data scientists” who can contribute directly to this type of analytical work and use it to improve capital allocation decisions. If you’d like to contribute to this mission, then we invite you to join our data collective. We will also be re-launching our public dashboarding interface for projects and collections of projects on opensource.observer soon! Finally, you can explore the Python notebook and the static data used for this report here.