Leaderboard #179

antoviaque · 2020-12-18T17:47:31Z

As a contributor, I would like to see my achievements and compare myself with other contributors, in order to celebrate my wins and remain motivated for even more contributions.

To consider:

How to include badges, like the badge created for contributions squashing warnings

antoviaque · 2020-12-18T17:52:42Z

@nasthagiri @nedbat @regisb @idegtiarov Following-up on an action item I took from the last contributor meetup, I've converted this card from the core committer program board into an issue to be able to comment on it. My action item was to add a mention of including badges there, which I've added to the description.

Btw it could be worth starting to specify what we want for the leaderboard. Something like what the OpenStack project has, i.e. https://www.stackalytics.com/ ?

regisb · 2020-12-18T18:15:28Z

Thanks for assigning this to me @antoviaque! I'm keen to work on this.

idegtiarov · 2020-12-18T19:18:23Z

I will take a look at this as well! Thanks for adding that ticket as a separate issue.

regisb · 2020-12-26T11:03:37Z

I am currently looking at the Discourse API documentation to fetch badge and user information. I would like to be able to fetch the following information:

Discourse badges
Discourse summary
Organization
Organization continent (Europe, Asia, North America, South America, Africa, Australia)
Github username
Number of merged pull requests in the edX organization
Age of the oldest merged PR
Core committer status

This is relatively easy to achieve, but there needs to be a bridge between Discourse and Github. For this, we can use the Discourse "Associated Accounts" (https://discuss.openedx.org/u/regis/preferences/account). Once we make that connection, we can use the Github API to fill in the remaining information.

The only remaining field is the organization. I do not know yet how we can consistently associate a user to an organization. I would like to be able to list (at least) all organizations from the Open edX marketplace. Automatically finding the organization associated to a certain Github profile is imprecise and inconsistent. Thus, I think our best bet is to define a custom Discourse user field. This could either be a free-text field or a dropdown: https://discuss.openedx.org/admin/customize/user_fields @nedbat do you think this would be acceptable?

EDIT: I'd also like to display the organization continent, but I don't have a clean solution for this. Ideas?

regisb · 2021-01-26T08:36:18Z

I have made some progress on this. The idea is to generate a webpage that will display community members along with the number of likes received on the forums, the count of merged Github PRs, and other cool "vanity" metrics that show how engaged they are in the community.

What I had in mind was to parse the Discourse bio summary and to gather extra information via hashtags. For instance, here's what I'd put in my bio:

Principal Tutor maintainer. Open edX core committer. @regisb on Github. Fond of my beautiful mountain village in the French Alps. 🍜 Chinese noodle enthusiast. #overhangio #corecommitter

The "corecommitter" and "overhangio" hashtags will be associated to my profile. The link to Github will also be parsed and the @regisb account name will be associated too. This means that it should be possible to expose the following information via a REST API:

{
  "username": "regis",
  "forums": {
    "likes_received": 223
  },
  "github": {
    "username": "regisb",
    "pr": {
      "merged_count": 104
    }
  },
  "tags": ["corecommitter", "overhangio"]
}
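A minimal sketch of the bio parsing described above, assuming hashtags are plain `#word` tokens and that the Github handle appears as "@name on Github". The regular expressions, the `parse_bio` helper and the output field names are illustrative assumptions, not the code that was actually written:

```python
import re

# Assumed patterns: "#tag" for tags, "@handle on Github" for the account name.
HASHTAG_RE = re.compile(r"#(\w+)")
GITHUB_RE = re.compile(r"@([A-Za-z0-9-]+) on Github", re.IGNORECASE)


def parse_bio(bio: str) -> dict:
    """Extract hashtags and a Github handle from a free-text Discourse bio."""
    tags = sorted({tag.lower() for tag in HASHTAG_RE.findall(bio)})
    match = GITHUB_RE.search(bio)
    return {
        "github_username": match.group(1) if match else None,
        "tags": tags,
    }


bio = (
    "Principal Tutor maintainer. Open edX core committer. @regisb on Github. "
    "#overhangio #corecommitter"
)
profile = parse_bio(bio)
print(profile)
```

A bio without a Github mention simply yields `None` for the username, so the remaining Github fields could be skipped for that user.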

Someone (other than me) will then be able to create a nice frontend where we can list and sort community members, search them by tags, etc.

Thoughts?

antoviaque · 2021-01-26T16:56:07Z

@regisb That sounds great! :)

One comment is that it might be useful to tie the data to a specific time period, so that we can show the number of PRs, likes, etc. over a specific year/month. This would allow newcomers to get to a better position faster, and encourage old-timers to keep contributing :)

antoviaque · 2021-03-09T06:15:03Z

FYI, on our side @symbolist might contribute some parts of this work -- though he would likely only become available from May.

@idegtiarov @regisb Still interested to also do a part of this work?

regisb · 2021-03-09T07:43:13Z

@idegtiarov @regisb Still interested to also do a part of this work?

Actually, I have already written most of the backend code. I just need to implement some caching to make sure that we don't crawl the Discourse API too frequently, while still guaranteeing that we have fresh results at all times.
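One common way to do this kind of caching is a small time-to-live (TTL) cache in front of the API calls. The sketch below is only an illustration under that assumption: the `TTLCache` class, the 10-minute TTL and the `fake_fetch` stand-in for a Discourse request are all hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Keep fetched values for ttl_seconds, then fetch them again."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: no API call
        value = fetch()  # missing or expired: call the API once
        self._store[key] = (now, value)
        return value


calls = []


def fake_fetch() -> dict:
    """Stand-in for a real Discourse API request."""
    calls.append(1)
    return {"likes_received": 223}


fake_now = [0.0]
cache = TTLCache(ttl_seconds=600, clock=lambda: fake_now[0])
cache.get("regis", fake_fetch)  # first call hits the API
cache.get("regis", fake_fetch)  # served from the cache
fake_now[0] = 601.0
cache.get("regis", fake_fetch)  # entry expired, API is hit again
print(len(calls))
```

The TTL is the trade-off knob: a longer value means fewer requests to Discourse but staler results.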

nedbat · 2021-03-09T19:06:03Z

@regisb Maybe we could develop this in the open so other people can help? :)

regisb · 2021-03-11T08:36:54Z

@nedbat Yes, but I wanted to get the code in a presentable state, first.

idegtiarov · 2021-03-23T15:57:54Z

We are going to investigate the Stackalytics service as a leaderboard option with one or a couple of our internal repositories. The work is planned to start in April.

regisb · 2021-04-08T15:36:30Z

Here's what I've got so far: https://github.com/openedx/oxct
It's hosted here: https://oxct.overhang.io/
(just leave a few minutes for the cache to warm up)
I encourage everyone to contribute and open pull requests in this repo 🤗

e0d · 2021-04-15T13:28:11Z

Adding to this thread, we already have an installation of the Grimoire Labs dashboard that I think can cover a bunch of the goals captured here.

It currently isn't public, but that should be easy enough to change.

The project aims to implement the community metrics proposed by CHAOSS.

I would love to have a single source of truth.
More extensive metrics than a leader board

I was going to give Regis a "Cook's tour" on a video call next week. If others are interested in joining, ping me on Slack.

regisb · 2021-04-22T13:16:04Z

We just came out of a conversation with @e0d who presented your Grimoire instance. It was really interesting, and I'd like to recap here a few points which are close to my heart:

Having lots of data, from different data sources (Discourse, Github, Slack...) is awesome. The fact that this data is centralized in a single data source (elasticsearch) makes it easy to create custom visualizations.
Kibana is also great: it allows us to generate visualizations on-the-fly and to explore the data.
I understand that some people love leaderboards, and thus that we need them, but we should also have a way to show off our contributions without necessarily comparing to each other. Thus, I would like to have a single page that says "Régis made X commits in the past year which fixed Y different bugs, received Z likes on the forums, etc." For me, both as an individual and an entrepreneur, this page would mean a lot more than a rank in a leaderboard.
Some people make contributions to Open edX that are extremely valuable, yet not captured in any of the currently available data sources. I'm thinking in particular of @sambapete who spends a lot of energy testing new releases and detecting issues. We must invent a new way of acknowledging these people's contributions: in the form of unique badges or Academy Award-like rewards, for instance.

antoviaque · 2021-04-25T02:30:55Z

@e0d Thanks for the presentation of Grimoire, that was really useful to see! I only knew it through Cauldron -- I had tried to run it on the edX github orgs some time ago, but it is a bit limited in the type of sources it can import there: https://cauldron.io/project/3820 . The setup you have seems much more powerful in that regard: https://openedx-metrics.herokuapp.com/ (CC @bradenmacdonald @nasthagiri as this might be useful to gather data about the core committer program, which you are looking at for a blog post about the program.)

Btw, would it be ok to post the recording of this meeting publicly here, in case others would like to watch it?

A few ideas/comments that I've found interesting from what you, @regisb @symbolist @idegtiarov @arbrandes mentioned, or reactions to the points you've made:

The idea of having a canonical database that aggregates the contribution data, and then developing and presenting multiple ways to represent that information, seems like a great approach. As you mentioned @regisb, some will want to compare, some will want to get only a summary of their own contributions -- it's good to allow multiple perspectives, and this will allow us to experiment with how to look at the data over time, keeping the process iterative rather than defining a single set of statistics once and for all, which could be more easily gamed.
Imho this advocates for the idea of not spending too much time trying to define and agree on a precise and definitive set of metrics upfront. We still want to define it, but I agree with @e0d that it would be reasonable to simply start with the CHAOSS metrics, which have the merit of being already defined and implemented -- then we can see what we get from that, and iterate by creating additional views?
From having played a bit with https://openedx-metrics.herokuapp.com/ it looks like an important preliminary step will be to improve the accuracy of the dataset. For example, the assignment to organizations currently seems to be a bit haphazard: on the list of all pull requests with the tag "open source contribution", most of the pull requests have an "Unknown" organization, or @pomegranited is listed as being from the Adelaide university.
+1 to "hours of effort" as one of the metrics we should try to capture. Like any metric, it will be imperfect, but it is indeed the one common "currency" we all wish we had more of, and the amount of our time that we spend on contributing to something is definitely representative of our level of involvement in that project. It's also one of the main types of commitments from the Declaration of Commitment to the Core Committer Program, so it would allow tracking that more easily. Also, since many providers are tracking their time on their side too, it would allow comparing what the tool measures with what is independently measured, and checking that the metrics actually match reality.
For community votes & karma -- we have some metrics on this through the "likes", which several of the tools we use readily support (discourse, github, etc.), and is already being used. Maybe getting that data and aggregating it too would be a good first step to measure karma?
More generally, community votes, nominations, etc. could be good to include as an additional source of information. Subjective opinions and votes are a useful complement to the rest of the data collected -- and could likely be made part of the dataset, too. However, I would be careful to not consider them necessarily more authoritative than the rest of the information -- a quiet developer who contributes a lot of work but doesn't talk much on the forums can be as (or more) important to the project as someone very visible and popular on the forums. Part of the goal with gathering the data is to contribute to dissipating perception bias and obfuscation, by showing actual numbers that reveal the actual work contributed -- if we consider this data secondary to popularity or visibility, this works against the meritocratic principles of open source imho.

Some people make contributions to Open edX that are extremely valuable, yet not captured in any of the currently available data sources. I'm thinking in particular of @sambapete who spends a lot of energy testing new releases and detecting issues. We must invent a new way of acknowledging these people's contributions: in the form of unique badges or Academy Award-like rewards, for instance.

+1 -- these might be things that we could be able to surface through tickets from bug reports, reports/likes on forums, maybe a role within the release working group? Badges are a good way too yes, maybe a stepped-up version of it could be a way to show the titles and responsibilities that any given person takes in the project?

e0d · 2021-04-26T12:55:29Z

I spent some time over the weekend deploying an upgraded instance of Grimoire Labs. It is currently consuming all of the data and I'll share a link once it's done.

The people data is a key place where we need some investment. I don't think it's a ton of work, but the way we are currently mapping people to organizations is pretty brittle and manual.
With the new deployment I've included "hatstall," the Grimoire user management frontend, based on Django.
I did a quick experiment moving ElasticSearch to the AWS managed service. TLDR; not a drop-in thing, reverted to Docker for now.
I've updated the deployment to pull in Discourse data.
I have a sample custom study working, which runs from a URL with a json file that matches records like so

[ { "conditions": [ { "field": "origin", "value": "https://github.com/edx/frontend-component-cookie-policy-banner" } ], "set_extra_fields": [ { "field": "my_namespace_foo", "value": "foo" }, { "field": "my_namespace_bar", "value": "bar" } ] } ]
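To make the matching semantics of such a file concrete, here is a small sketch of how rules in this shape could be applied to a record. It assumes that every condition in a rule must match by exact equality and that matching rules copy their extra fields onto the record; this is an illustration, not GrimoireLab's own implementation, and `apply_rules` is a hypothetical helper.

```python
import json

# The rule file from the comment above, embedded as a string for the example.
RULES_JSON = """
[{"conditions": [{"field": "origin",
                  "value": "https://github.com/edx/frontend-component-cookie-policy-banner"}],
  "set_extra_fields": [{"field": "my_namespace_foo", "value": "foo"},
                       {"field": "my_namespace_bar", "value": "bar"}]}]
"""
rules = json.loads(RULES_JSON)


def apply_rules(record: dict, rules: list) -> dict:
    """Return a copy of record with extra fields from every matching rule."""
    enriched = dict(record)
    for rule in rules:
        # Assumption: all conditions of a rule must hold (logical AND).
        if all(record.get(c["field"]) == c["value"] for c in rule["conditions"]):
            for extra in rule["set_extra_fields"]:
                enriched[extra["field"]] = extra["value"]
    return enriched


matching = {"origin": "https://github.com/edx/frontend-component-cookie-policy-banner"}
other = {"origin": "https://github.com/edx/some-other-repo"}  # made-up repository URL
print(apply_rules(matching, rules))
print(apply_rules(other, rules))
```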

I'm going to speak with someone from Bitergia later today, but my current thinking is that extending Grimoire could work well. For example, potentially creating a Transifex backend.

regisb · 2021-04-26T13:05:27Z

I'm going to speak with someone from Bitergia later today, but my current thinking is that extending Grimoire could work well. For example, potentially creating a Transifex backend.

This is a great idea!

e0d · 2021-04-29T11:57:11Z

Here are two examples of dashboards that are hosted by Bitergia for FINOS and Gitlab.

FINOS
Gitlab

symbolist · 2021-04-29T13:46:16Z

I have been taking a deeper look at the CHAOSS project this week. To help others who would like to quickly understand what it is about so that they can participate in this discussion, I compiled together some highlights from my investigation here: https://openedx.atlassian.net/wiki/spaces/COMM/pages/2696446382/CHAOSS

Imho this advocates for the idea of not spending too much time trying to define and agree on a precise and definitive set of metrics upfront. We still want to define it, but I agree with @e0d that it would be reasonable to simply start with the CHAOSS metrics, which have the merit of being already defined and implemented -- then we can see what we get from that, and iterate by creating additional views?

I agree with this approach as well. It gives us a concrete starting point that has already been thought about deeply by many experts in the area and has been in use by other communities. We may want to additionally slice and dice the data for specific goals but the framework supports that as well (and so it does not constrain us). Also for the sake of thoroughness, I did try to see if there were any competing standards or options but this seems to be the only comprehensive one.

From having played a bit with https://openedx-metrics.herokuapp.com/ it looks like an important preliminary step will be to improve the accuracy of the dataset. For example, the assignment to organizations currently seems to be a bit haphazard: on the list of all pull requests with the tag "open source contribution", most of the pull requests have an "Unknown" organization, or @pomegranited is listed as being from the Adelaide university.

SortingHat is the part of the suite which is responsible for managing identities. From looking at its documentation it looks like it should support what we want and we just need to look into configuring that (looks like @e0d has already installed the user interface "hatstall" for that):

"Sorting Hat maintains an SQL database of unique identities of communities members across (potentially) many different sources. Identities corresponding to the same real person can be merged in the same unique identity with a unique uuid. For each unique identity, a profile can be defined, with the name and other data shown for the corresponding person by default.

In addition, each unique identity can be related to one or more affiliations, for different time periods. This will usually correspond to different organizations in which the person was employed during those time periods."

https://www.researchgate.net/publication/331088184_SortingHat_Wizardry_on_Software_Project_Members has some more details.
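To illustrate the model described in that quote (one unique identity, several source accounts, affiliations bounded by time periods), here is a rough sketch. The class and field names are hypothetical and do not reflect SortingHat's actual schema; the sample person and organizations are made up.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Affiliation:
    organization: str
    start: date
    end: date


@dataclass
class UniqueIdentity:
    uuid: str
    name: str
    # Accounts from different sources that belong to the same real person.
    source_usernames: List[str] = field(default_factory=list)
    affiliations: List[Affiliation] = field(default_factory=list)

    def organization_on(self, day: date) -> Optional[str]:
        """Return the organization the person was affiliated with on a given day."""
        for aff in self.affiliations:
            if aff.start <= day <= aff.end:
                return aff.organization
        return None


identity = UniqueIdentity(
    uuid="0001",
    name="Example Contributor",
    source_usernames=["github:example", "discourse:example"],
    affiliations=[
        Affiliation("Org A", date(2019, 1, 1), date(2020, 6, 30)),
        Affiliation("Org B", date(2020, 7, 1), date(2021, 12, 31)),
    ],
)
print(identity.organization_on(date(2020, 1, 15)))
print(identity.organization_on(date(2021, 4, 29)))
```

With a lookup like this, a pull request can be attributed to the organization its author belonged to when it was merged, rather than to their current employer.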

@e0d

The people data is a key place where we need some investment. I don't think it's a ton of work, but the way we are currently mapping people to organizations is pretty brittle and manual.

Let me know if I can help with that. 🙂

To also start the conversation about the overall plan, if everyone is in agreement about this as a starting point, the next steps could be:

Make sure that the GrimoireLab instance is fully configured and ingesting data from all the sources it supports (happy to help with this).
Give everyone a chance to play around with it.
Gather recommendations about what initial set of metrics we should focus on for the CC program.
Set up dashboards for them.

e0d · 2021-05-07T12:06:51Z

I've made progress getting Grimoire upgraded and configured against the core data sources. An outstanding item is to configure authentication, which I can look at over the weekend. Without that it is not simply a matter of the data being available to everyone, but that anyone would be able to alter dashboards.

For CCs, I can send you a preview if you Slack me directly.

regisb · 2021-05-10T04:46:57Z

An outstanding item is to configure authentication, which I can look at over the weekend.

Is that even possible? I thought that authentication was only available in the commercial edition of Kibana?

e0d · 2021-05-14T14:50:56Z

Requiring login with a shared credential is possible; that's where we are right now. This is compatible with allowing read-only access to the views. This needs a little configuration change to work properly, but should be straightforward.

PM me if you want the credentials to view the data.

arbrandes · 2021-06-22T14:57:35Z

@antoviaque,

From that, it would then be iterative in any case, based on what we think is useful. Does that match your/others memory?

That sums up what I remember, yes.

pomegranited · 2021-06-23T10:14:26Z

@e0d @arbrandes I've added some suggestions and questions to your CHAOSS Cleanup spreadsheet, and would like to create a task to address some of these issues during our next sprint (30 June - 13 July). At a glance, I think "merging organizations" will be the easiest to do first since it's manual. But the others will require some (nice) contributions to sortinghat, like "sourcing organization for non-affiliated individuals from github".

What do we need to get started on this? I could start by creating a github Project for this work, and start adding issues so we can discuss requirements with everybody.

e0d · 2021-06-23T10:31:27Z

GitHub project sounds great. I do think it is OK to have folks in a pseudo organization, say, "individual". But we want to classify whomever we can when they are affiliated. There will be folks who are legitimately individuals. Do you have thoughts on which interventions will have the biggest quality impacts? I think focusing on CCs and key firms will touch the majority of contributions, for example.

pomegranited · 2021-06-23T10:31:45Z

@e0d question -- what are the source github projects included in this initial Grimoire deployment? Can we add non-edx repos like Tutor and the community-supported XBlocks?

e0d · 2021-06-23T10:35:47Z

One more thought: the merged orgs are a good example of the type of change that needs to be sticky. If we merge edX and edX inc. only for edX inc. to be recreated during the next identity analysis, that's an issue. I'm not yet sure where the two versions originated from. Do we need an aliases concept for orgs?
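One possible shape for such an aliases concept is a lookup table applied every time identities are re-analyzed, so that merges survive the next run. This is only a sketch of the idea; the table contents and the `canonical_org` helper are assumptions, not an existing SortingHat feature.

```python
# Hypothetical alias table: normalized spelling -> canonical organization name.
ORG_ALIASES = {
    "edx": "edX",
    "edx inc.": "edX",
}


def canonical_org(name: str) -> str:
    """Map any known spelling of an organization to its canonical name."""
    key = " ".join(name.split()).lower()  # collapse whitespace, ignore case
    return ORG_ALIASES.get(key, name.strip())


print(canonical_org("edX inc."))
print(canonical_org("  EdX Inc. "))
print(canonical_org("Some Other Org"))
```

Because the table is data rather than a one-off manual merge, rerunning the identity analysis would map "edX inc." back to "edX" instead of recreating it.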

e0d · 2021-06-23T10:44:28Z

Currently it's every public project in the edX and Open edX GitHub orgs. We can add other repos if that makes sense. I think we need to work out that definition.

pomegranited · 2021-06-23T10:48:28Z

@e0d

Do you have thoughts on which interventions will have the biggest quality
impacts? I think focusing on CCs and key firms will touch the majority of
contributions for example

Can we export the number of contributions that are being counted against each non-org individual, so we can sort them and ensure the highest numbers are affiliated somewhere if that's appropriate?
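As a rough sketch of that export, assuming the contribution records can be dumped as rows with an author and an organization field (the field names, the "Unknown"/"individual" labels and the sample rows are made up for illustration):

```python
from collections import Counter

# Made-up sample rows standing in for an export from the metrics instance.
contributions = [
    {"author": "alice", "organization": "Unknown"},
    {"author": "bob", "organization": "Unknown"},
    {"author": "alice", "organization": "Unknown"},
    {"author": "carol", "organization": "Org A"},
    {"author": "alice", "organization": "individual"},
]


def unaffiliated_counts(records, unknown_labels=("Unknown", "individual")):
    """Count contributions per author with no real organization, highest first."""
    counts = Counter(
        r["author"] for r in records if r["organization"] in unknown_labels
    )
    return counts.most_common()


ranking = unaffiliated_counts(contributions)
for author, count in ranking:
    print(author, count)
```

Working down this list from the top would fix the affiliations that affect the most contributions first.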

But yes, the CC people by definition will have the most contributions, so I've updated the "Recommended Organization" for all the core contributors I could identify.

pomegranited · 2021-07-07T07:44:55Z

FYI I've created a github project to track these issues and ideas: https://github.com/orgs/edx/projects/6

Can people confirm they can edit those cards? I haven't converted any to proper issues yet, but I think that's what we have to do to allow comments.

pomegranited · 2021-07-08T08:33:38Z

@e0d I've created #226 as the first issue to address, so we can start working on data cleanup without needing access to the edX Grimoire/SortingHat instance.

If anyone has suggestions or something specific they'd like to see out of that task, let me know?

CC @arbrandes @regisb @antoviaque

antoviaque · 2021-07-08T14:51:32Z

@pomegranited Thank you! 👍

https://github.com/orgs/edx/projects/6 Can people confirm they can edit those cards?

I confirm that I can edit them yes.

arbrandes · 2021-07-09T14:56:30Z

I confirm that I can edit them yes.

Same here.

sarina · 2024-05-17T17:48:44Z

Hi everyone, this issue hasn't been touched since June 2021. Was there any enthusiasm/capacity to pick up on this idea, or should we close the issue?

If we want to keep it I propose moving the issue to https://github.com/openedx/wg-coordination/issues since this issue doesn't pertain to an OEP.

sarina · 2024-05-17T20:45:34Z

Closing per #227 (comment)

antoviaque assigned nedbat, regisb, antoviaque and idegtiarov Dec 18, 2020

nasthagiri mentioned this issue May 4, 2021

Contributor's Meetup 2021-05-04 openedx/wg-coordination#27

Closed

arbrandes mentioned this issue May 4, 2021

Contributor's Meetup 2021-05-18 openedx/wg-coordination#28

Closed

arbrandes mentioned this issue Jun 29, 2021

Contributor's Meetup 2021-07-13 openedx/wg-coordination#33

Closed

arbrandes mentioned this issue Jul 13, 2021

Contributor's Meetup 2021-07-27 openedx/wg-coordination#34

Closed

arbrandes mentioned this issue Jul 27, 2021

Contributor's Meetup 2021-08-10 openedx/wg-coordination#35

Closed

arbrandes mentioned this issue Aug 10, 2021

Contributor's Meetup 2021-08-24 openedx/wg-coordination#36

Closed

pomegranited mentioned this issue Aug 30, 2021

Leaderboard: Track "hours of effort" #237

Closed

arbrandes mentioned this issue Aug 30, 2021

Contributor's Meetup 2021-09-07 openedx/wg-coordination#37

Closed

arbrandes mentioned this issue Sep 7, 2021

Contributor's Meetup 2021-09-21 openedx/wg-coordination#38

Closed

arbrandes mentioned this issue Sep 22, 2021

Contributor's Meetup 2021-10-05 openedx/wg-coordination#39

Closed

nizarmah mentioned this issue Oct 6, 2021

Contributor's Meetup 2021-10-19 openedx/wg-coordination#40

Closed

nizarmah mentioned this issue Oct 20, 2021

Contributor's Meetup 2021-11-02 openedx/wg-coordination#41

Closed

antoviaque mentioned this issue Nov 15, 2021

Contributor's Meetup 2021-11-16 openedx/wg-coordination#44

Closed

nizarmah mentioned this issue Nov 16, 2021

Contributor's Meetup 2021-11-30 openedx/wg-coordination#45

Closed

nizarmah mentioned this issue Dec 10, 2021

Contributor's Meetup 2021-12-21 openedx/wg-coordination#48

Closed

nizarmah mentioned this issue Jan 3, 2022

Contributor's Meetup 2022-01-11 openedx/wg-coordination#51

Closed

sarina added the inactive label May 17, 2024

sarina closed this as completed May 17, 2024
