Leaderboard #179

Closed
antoviaque opened this issue Dec 18, 2020 · 42 comments
Labels
inactive PR author has been unresponsive for several months

Comments

@antoviaque
Contributor

antoviaque commented Dec 18, 2020

As a contributor, I would like to see my achievements and compare myself with other contributors, in order to celebrate my wins and remain motivated for even more contributions.

To consider:

@antoviaque
Contributor Author

@nasthagiri @nedbat @regisb @idegtiarov Following up on an action item I took from the last contributor meetup, I've converted this card from the core committer program board into an issue so that we can comment on it. My action item was to add a mention of including badges there, which I've added to the description.

Btw it could be worth starting to specify what we want for the leaderboard. Something like what the OpenStack project has, i.e. https://www.stackalytics.com/ ?

@regisb

regisb commented Dec 18, 2020

Thanks for assigning this to me @antoviaque! I'm keen to work on this.

@idegtiarov
Contributor

I will take a look at this as well! Thanks for adding that ticket as a separate issue.

@regisb

regisb commented Dec 26, 2020

I am currently looking at the Discourse API documentation to fetch badge and user information. I would like to be able to fetch the following information:

  • Discourse badges
  • Discourse summary
  • Organization
  • Organization continent (Europe, Asia, North America, South America, Africa, Australia)
  • Github username
  • Number of merged pull requests in the edX organization
  • Age of the oldest merged PR
  • Core committer status

This is relatively easy to achieve, but there needs to be a bridge between Discourse and Github. For this, we can use the Discourse "Associated Accounts" (https://discuss.openedx.org/u/regis/preferences/account). Once we make that connection, we can use the Github API to fill in the remaining information.
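As a rough sketch of the Github side of that bridge: once a Github username is known, the merged PR count can be read from the `total_count` field of the Github issue search endpoint. The function names and the default `edx` organization below are illustrative assumptions, not code from this project, and the snippet only builds the request URL rather than calling the API.

```python
from urllib.parse import urlencode

GITHUB_SEARCH_URL = "https://api.github.com/search/issues"


def merged_pr_search_url(github_username: str, org: str = "edx") -> str:
    """Build a search URL matching merged PRs by one author in one org.

    Hypothetical helper: the org name is an assumption for illustration.
    """
    query = f"is:pr is:merged author:{github_username} org:{org}"
    # per_page=1 keeps the response small; only "total_count" is needed.
    return f"{GITHUB_SEARCH_URL}?{urlencode({'q': query, 'per_page': 1})}"


def merged_count_from_response(payload: dict) -> int:
    """Read the number of matching PRs from a decoded search response."""
    return int(payload.get("total_count", 0))
```

Fetching the URL (with an API token to avoid the low unauthenticated rate limit) and passing the decoded JSON body to `merged_count_from_response` would give the "merged pull requests" figure from the list above.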

The only remaining field is the organization. I do not know yet how we can consistently associate a user with an organization. I would like to be able to list (at least) all organizations from the Open edX marketplace. Automatically finding the organization associated with a given Github profile is imprecise and inconsistent. Thus, I think our best bet is to define a custom Discourse user field. This could either be a free-text field or a dropdown: https://discuss.openedx.org/admin/customize/user_fields @nedbat do you think this would be acceptable?

EDIT: I'd also like to display the organization continent, but I don't have a clean solution for this. Ideas?

@regisb

regisb commented Jan 26, 2021

I have made some progress on this. The idea is to generate a webpage that will display community members along with the number of likes received on the forums, the count of merged Github PRs, and other cool "vanity" metrics that show how engaged they are in the community.

What I had in mind was to parse the Discourse bio summary and to gather extra information via hashtags. For instance, here's what I'd put in my bio:

Principal Tutor maintainer. Open edX core committer. @regisb on Github. Fond of my beautiful mountain village in the French Alps. 🍜 Chinese noodle enthusiast. #overhangio #corecommitter

The "corecommitter" and "overhangio" hashtags will be associated with my profile. The link to Github will also be parsed, and the @regisb account name will be associated too. This means that it should be possible to expose the following information via a REST API:

{
  "username": "regis",
  "forums": {
    "likes_received": 223
  },
  "github": {
    "username": "regisb",
    "pr": {
      "merged_count": 104
    }
  },
  "tags": ["corecommitter", "overhangio"]
}
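A minimal sketch of the bio parsing described above, assuming that tags are written as `#hashtag` and that the Github account appears as "@name on Github". Both patterns and the `parse_bio` name are assumptions made for illustration; the real backend may parse the bio differently.

```python
import re

# Assumed conventions: "#tag" for tags, "@handle on Github" for the account.
HASHTAG_RE = re.compile(r"#(\w+)")
GITHUB_HANDLE_RE = re.compile(r"@([A-Za-z0-9-]+) on Github", re.IGNORECASE)


def parse_bio(bio: str) -> dict:
    """Extract tags and a Github username from a Discourse bio string."""
    tags = sorted(set(HASHTAG_RE.findall(bio)))
    match = GITHUB_HANDLE_RE.search(bio)
    return {
        "github_username": match.group(1) if match else None,
        "tags": tags,
    }


bio = "Open edX core committer. @regisb on Github. #overhangio #corecommitter"
profile = parse_bio(bio)
```

Here `profile` holds the `regisb` username and the two tags, which is the part of the payload above that comes from the bio alone.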

Someone (other than me) will then be able to create a nice frontend where we can list and sort community members, search them by tags, etc.

Thoughts?

@antoviaque
Contributor Author

@regisb That sounds great! :)

One comment is that it might be useful to tie the data to a specific time period, so that we can show the number of PRs, likes, etc. over a given year or month. This would let newcomers reach a better position faster, and encourage old-timers to keep contributing :)
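To illustrate the time-period idea, here is a small sketch that counts events (merged PRs, likes, and so on) inside a half-open date window. The `count_in_period` helper and the sample dates are made up for the example.

```python
from datetime import date


def count_in_period(event_dates, start, end):
    """Count events whose date falls in the window [start, end)."""
    return sum(1 for d in event_dates if start <= d < end)


# Hypothetical merge dates for one contributor.
merged = [date(2020, 11, 3), date(2021, 1, 15), date(2021, 2, 1)]
prs_in_2021 = count_in_period(merged, date(2021, 1, 1), date(2022, 1, 1))
```

Ranking on a window such as "the last 12 months" rather than on all-time totals is what would let a recent contributor overtake a long-standing but inactive one.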

@antoviaque
Contributor Author

FYI, on our side @symbolist might contribute some parts of this work -- though he would likely only become available from May.

@idegtiarov @regisb Still interested to also do a part of this work?

@regisb

regisb commented Mar 9, 2021

@idegtiarov @regisb Still interested to also do a part of this work?

Actually, I have already written most of the backend code. I just need to implement some caching to make sure that we don't crawl the Discourse API too frequently, while still guaranteeing that we have fresh results at all times.
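One common way to limit API calls while keeping results reasonably fresh is a time-to-live cache. The sketch below is only an illustration of that technique; the `TTLCache` class is hypothetical and is not the caching code used in the actual backend.

```python
import time


class TTLCache:
    """Cache values per key and refetch them once they are older than ttl."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self._store = {}  # key -> (timestamp, value)

    def get(self, key, fetch):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: no API call
        value = fetch()  # missing or stale: call the API once
        self._store[key] = (now, value)
        return value
```

The TTL is the trade-off knob: a longer value means fewer requests to Discourse, a shorter one means fresher numbers on the page.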

@nedbat
Contributor

nedbat commented Mar 9, 2021

@regisb Maybe we could develop this in the open so other people can help? :)

@regisb

regisb commented Mar 11, 2021

@nedbat Yes, but I wanted to get the code in a presentable state, first.

@idegtiarov
Contributor

We are going to investigate the Stackalytics service as a leaderboard option with one or a couple of our internal repositories. The work is planned to start in April.

@regisb

regisb commented Apr 8, 2021

Here's what I've got so far: https://github.com/openedx/oxct
It's hosted here: https://oxct.overhang.io/
(just leave a few minutes for the cache to warm up)
I encourage everyone to contribute and open pull requests in this repo 🤗

@e0d
Contributor

e0d commented Apr 15, 2021

Adding to this thread, we already have an installation of the Grimoire Labs dashboard that I think can cover a bunch of the goals captured here.

It currently isn't public, but that should be easy enough to change.

The project aims to implement the community metrics proposed by CHAOSS.

  • I would love to have a single source of truth.
  • More extensive metrics than a leader board

I was going to give Regis a "Cook's tour" on a video call next week. If others are interested in joining, ping me on Slack?

@regisb

regisb commented Apr 22, 2021

We just came out of a conversation with @e0d who presented your Grimoire instance. It was really interesting, and I'd like to recap here a few points which are close to my heart:

  1. Having lots of data, from different data sources (Discourse, Github, Slack...) is awesome. The fact that this data is centralized in a single data source (elasticsearch) makes it easy to create custom visualizations.
  2. Kibana is also great: it allows us to generate visualizations on-the-fly and to explore the data.
  3. I understand that some people love leaderboards, and thus that we need them, but we should also have a way to show off our contributions without necessarily comparing to each other. Thus, I would like to have a single page that says "Régis made X commits in the past year which fixed Y different bugs, received Z likes on the forums, etc." For me, both as an individual and an entrepreneur, this page would mean a lot more than a rank in a leaderboard.
  4. Some people make contributions to Open edX that are extremely valuable, yet not captured in any of the currently available data sources. I'm thinking in particular of @sambapete, who spends a lot of energy testing new releases and detecting issues. We must invent a new way of acknowledging these people's contributions: in the form of unique badges or Academy Award-like rewards, for instance.

@antoviaque
Contributor Author

@e0d Thanks for the presentation of Grimoire, that was really useful to see! I only knew it through Cauldron -- I had tried to run it on the edX github orgs some time ago, but it is a bit limited in the type of sources it can import there: https://cauldron.io/project/3820 . The setup you have seems much more powerful in that regard: https://openedx-metrics.herokuapp.com/ (CC @bradenmacdonald @nasthagiri as this might be useful to gather data about the core committer program, which you are looking at for a blog post about the program.)

Btw, would it be ok to post the recording of this meeting publicly here, in case others would like to watch it?

A few ideas/comments that I've found interesting from what you, @regisb @symbolist @idegtiarov @arbrandes mentioned, or reactions to the points you've made:

  • The idea of having a canonical database that aggregates the contribution data, and then developing and presenting multiple ways to represent that information, seems like a great approach. As you mentioned @regisb, some will want to compare, some will want to get only a summary of their own contributions -- it's good to allow multiple perspectives, and this will allow us to experiment with how to look at the data over time, keeping the process iterative rather than defining a single set of statistics once and for all, which could be more easily gamed.

  • Imho this advocates for the idea of not spending too much time trying to define and agree on a precise and definitive set of metrics upfront. We still want to define it, but I agree with @e0d that it would be reasonable to simply start with the CHAOSS metrics, which have the merit of being already defined and implemented -- then we can see what we get from that, and iterate by creating additional views?

  • From having played a bit with https://openedx-metrics.herokuapp.com/ it looks like an important preliminary step will be to improve the accuracy of the dataset. Currently the assignment to organizations seems to be a bit haphazard. For example, on the list of all pull requests with the tag "open source contribution", most of the pull requests have an "Unknown" organization, and @pomegranited is listed as being from the Adelaide university.

  • +1 to "hours of effort" as one of the metrics we should try to capture. Like any metric, it will be imperfect, but it is indeed the one common "currency" we all wish we had more of, and the amount of time that we spend on contributing to something is definitely representative of our level of involvement in that project. It's also one of the main types of commitments from the Declaration of Commitment to the Core Committer Program, so it would allow tracking that more easily. Also, since many providers are tracking their time on their side too, it would allow comparing what the tool measures with what is independently measured, and checking that the metrics actually match reality.

  • For community votes & karma -- we have some metrics on this through the "likes", which several of the tools we use readily support (discourse, github, etc.) and which are already being used. Maybe getting that data and aggregating it too would be a good first step to measure karma?

  • More generally, community votes, nominations, etc. could be good to include as an additional source of information. Subjective opinions and votes are a useful complement to the rest of the data collected -- and could likely be made part of the dataset, too. However, I would be careful to not consider them necessarily more authoritative than the rest of the information -- a quiet developer who contributes a lot of work but doesn't talk much on the forums can be as (or more) important to the project as someone very visible and popular on the forums. Part of the goal with gathering the data is to contribute to dissipating perception bias and obfuscation, by showing actual numbers that reveal the actual work contributed -- if we consider this data secondary to popularity or visibility, this works against the meritocratic principles of open source imho.

Some people make contributions to Open edX that are extremely valuable, yet not captured in any of the currently available data sources. I'm thinking in particular of @sambapete, who spends a lot of energy testing new releases and detecting issues. We must invent a new way of acknowledging these people's contributions: in the form of unique badges or Academy Award-like rewards, for instance.

+1 -- these might be things that we can surface through tickets from bug reports, reports/likes on forums, maybe a role within the release working group? Badges are a good way too, yes; maybe a stepped-up version could be a way to show the titles and responsibilities that any given person takes on in the project?

@e0d
Contributor

e0d commented Apr 26, 2021

I spent some time over the weekend deploying an upgraded instance of Grimoire Labs. It is currently consuming all of the data and I'll share a link once it's done.

  • The people data is a key place where we need some investment. I don't think it's a ton of work, but the way we are currently mapping people to organizations is pretty brittle and manual.
  • With the new deployment I've included "hatstall," the Grimoire user management frontend, based on Django.
  • I did a quick experiment moving ElasticSearch to the AWS managed service. TL;DR: not a drop-in thing, reverted to Docker for now.
  • I've updated the deployment to pull in Discourse data.
  • I have a sample custom study working, which runs from a URL with a JSON file that matches records like so:

[
  {
    "conditions": [
      { "field": "origin", "value": "https://github.com/edx/frontend-component-cookie-policy-banner" }
    ],
    "set_extra_fields": [
      { "field": "my_namespace_foo", "value": "foo" },
      { "field": "my_namespace_bar", "value": "bar" }
    ]
  }
]

  • I'm going to speak with someone from Bitergia later today, but my current thinking is that extending Grimoire could work well. For example, potentially creating a Transifex backend.
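One way to read the custom study JSON above: each rule lists `conditions` that a record must satisfy, and `set_extra_fields` to add to the record when they all match. The sketch below illustrates that reading only. It is not GrimoireLab's implementation, and `apply_rules` is a hypothetical name.

```python
def apply_rules(item, rules):
    """Return a copy of item with extra fields from every matching rule."""
    enriched = dict(item)
    for rule in rules:
        # A rule matches only if all of its conditions hold for the item.
        if all(item.get(c["field"]) == c["value"] for c in rule["conditions"]):
            for extra in rule["set_extra_fields"]:
                enriched[extra["field"]] = extra["value"]
    return enriched


rules = [
    {
        "conditions": [
            {
                "field": "origin",
                "value": "https://github.com/edx/frontend-component-cookie-policy-banner",
            }
        ],
        "set_extra_fields": [
            {"field": "my_namespace_foo", "value": "foo"},
            {"field": "my_namespace_bar", "value": "bar"},
        ],
    }
]

record = {"origin": "https://github.com/edx/frontend-component-cookie-policy-banner"}
enriched = apply_rules(record, rules)
```

Records from any other `origin` pass through unchanged, so rules like this could be used to tag repositories (for instance by team or project) without touching the source data.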

@regisb

regisb commented Apr 26, 2021

I'm going to speak with someone from Bitergia later today, but my current thinking is that extending Grimoire could work well. For example, potentially creating a Transifex backend.

This is a great idea!

@e0d
Contributor

e0d commented Apr 29, 2021

Here are two examples of dashboards that are hosted by Bitergia for FINOS and Gitlab.

FINOS
Gitlab

@symbolist

I have been taking a deeper look at the CHAOSS project this week. To help others who would like to quickly understand what it is about so that they can participate in this discussion, I compiled together some highlights from my investigation here: https://openedx.atlassian.net/wiki/spaces/COMM/pages/2696446382/CHAOSS

Imho this advocates for the idea of not spending too much time trying to define and agree on a precise and definitive set of metrics upfront. We still want to define it, but I agree with @e0d that it would be reasonable to simply start with the CHAOSS metrics, which have the merit of being already defined and implemented -- then we can see what we get from that, and iterate by creating additional views?

I agree with this approach as well. It gives us a concrete starting point that has already been thought about deeply by many experts in the area and has been in use by other communities. We may want to additionally slice and dice the data for specific goals but the framework supports that as well (and so it does not constrain us). Also for the sake of thoroughness, I did try to see if there were any competing standards or options but this seems to be the only comprehensive one.

From having played a bit with https://openedx-metrics.herokuapp.com/ it looks like an important preliminary step will be to improve the accuracy of the dataset. Currently the assignment to organizations seems to be a bit haphazard. For example, on the list of all pull requests with the tag "open source contribution", most of the pull requests have an "Unknown" organization, and @pomegranited is listed as being from the Adelaide university.

SortingHat is the part of the suite which is responsible for managing identities. From looking at its documentation it looks like it should support what we want and we just need to look into configuring that (looks like @e0d has already installed the user interface "hatstall" for that):

"Sorting Hat maintains an SQL database of unique identities of communities members across (potentially) many different sources. Identities corresponding to the same real person can be merged in the same unique identity with a unique uuid. For each unique identity, a profile can be defined, with the name and other data shown for the corresponding person by default.

In addition, each unique identity can be related to one or more affiliations, for different time periods. This will usually correspond to different organizations in which the person was employed during those time periods."

https://www.researchgate.net/publication/331088184_SortingHat_Wizardry_on_Software_Project_Members has some more details.
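The model described in that quote (one unique identity per person, merged from several source accounts, with affiliations bounded by time periods) can be pictured with the sketch below. The class and field names are assumptions for illustration and do not reflect SortingHat's actual schema or API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Affiliation:
    organization: str
    start: date
    end: date  # exclusive upper bound of the period


@dataclass
class UniqueIdentity:
    uuid: str
    name: str
    # Accounts from different sources merged into this one person.
    source_accounts: List[str] = field(default_factory=list)
    affiliations: List[Affiliation] = field(default_factory=list)

    def organization_on(self, day: date) -> Optional[str]:
        """Return the organization the person belonged to on a given day."""
        for aff in self.affiliations:
            if aff.start <= day < aff.end:
                return aff.organization
        return None


# Hypothetical person with two consecutive employers.
person = UniqueIdentity(
    uuid="0001",
    name="Example Contributor",
    source_accounts=["github:example", "discourse:example"],
    affiliations=[
        Affiliation("Org A", date(2018, 1, 1), date(2020, 1, 1)),
        Affiliation("Org B", date(2020, 1, 1), date(2100, 1, 1)),
    ],
)
```

Looking up the organization by contribution date, rather than by current employer, is what keeps older pull requests attributed to the organization the author worked for at the time.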

@e0d

The people data is a key place where we need some investment. I don't think it's a ton of work, but the way we are currently mapping people to organizations is pretty brittle and manual.

Let me know if I can help with that. 🙂

To also start the conversation about the overall plan, if everyone is in agreement about this as a starting point, the next steps could be:

  1. Make sure that the GrimoireLab instance is fully configured and ingesting data from all the sources it supports (happy to help with this).
  2. Give everyone a chance to play around with it.
  3. Gather recommendations about what initial set of metrics we should focus on for the CC program.
  4. Set up dashboards for them.

@e0d
Contributor

e0d commented May 7, 2021

I've made progress getting Grimoire upgraded and configured against the core data sources. An outstanding item is to configure authentication, which I can look at over the weekend. Without it, the problem is not just that the data would be visible to everyone, but that anyone could alter the dashboards.

For CCs, I can send you a preview if you Slack me directly.

@regisb

regisb commented May 10, 2021

An outstanding item is to configure authentication, which I can look at over the weekend.

Is that even possible? I thought that authentication was only available in the commercial edition of Kibana?

@e0d
Contributor

e0d commented May 14, 2021

Requiring login with a shared credential is possible, and that's where we are right now. This is compatible with allowing read-only access to the views. That probably needs a small configuration change to work, but it should be straightforward.

PM me if you want the credentials to view the data.

@arbrandes
Contributor

@antoviaque,

From that, it would then be iterative in any case, based on what we think is useful. Does that match your/others' memory?

That sums up what I remember, yes.

@pomegranited
Contributor

@e0d @arbrandes I've added some suggestions and questions to your CHAOSS Cleanup spreadsheet, and would like to create a task to address some of these issues during our next sprint (30 June - 13 July). At a glance, I think "merging organizations" will be the easiest to do first since it's manual. But the others will require some (nice) contributions to sortinghat, like "sourcing organization for non-affiliated individuals from github".

What do we need to get started on this? I could start by creating a github Project for this work, and start adding issues so we can discuss requirements with everybody.

@e0d
Contributor

e0d commented Jun 23, 2021 via email

@pomegranited
Contributor

@e0d question -- what are the source github projects included in this initial Grimoire deployment? Can we add non-edx repos like Tutor and the community-supported XBlocks?

@e0d
Contributor

e0d commented Jun 23, 2021 via email

@e0d
Contributor

e0d commented Jun 23, 2021 via email

@pomegranited
Contributor

@e0d

Do you have thoughts on which interventions will have the biggest quality impacts? I think focusing on CCs and key firms will touch the majority of contributions, for example.

Can we export the number of contributions that are being counted against each non-org individual, so we can sort them and ensure the highest numbers are affiliated somewhere if that's appropriate?
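As a sketch of what such an export could look like once contribution records are available as plain dictionaries: the `author` and `organization` field names, and treating a missing or "Unknown" organization as unaffiliated, are assumptions made for this example rather than the actual Grimoire data model.

```python
from collections import Counter


def unaffiliated_by_contributions(contributions):
    """List (author, count) pairs for unaffiliated authors, highest first."""
    counts = Counter(
        c["author"]
        for c in contributions
        if c.get("organization") in (None, "Unknown")
    )
    return counts.most_common()


# Hypothetical records for illustration.
sample = [
    {"author": "alice", "organization": "Unknown"},
    {"author": "bob", "organization": "Org A"},
    {"author": "alice", "organization": None},
    {"author": "carol", "organization": "Unknown"},
    {"author": "alice"},
]
ranking = unaffiliated_by_contributions(sample)
```

Working down such a ranking from the top would put the cleanup effort where it affects the most contributions first.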

But yes, the CC people by definition will have the most contributions, so I've updated the "Recommended Organization" for all the core contributors I could identify.

@pomegranited
Contributor

FYI I've created a github project to track these issues and ideas: https://github.com/orgs/edx/projects/6

Can people confirm they can edit those cards? I haven't converted any to proper issues yet, but I think that's what we have to do to allow comments.

@pomegranited
Contributor

@e0d I've created #226 as the first issue to address, so we start working on data cleanup without having to have access to the edX Grimoire/SortingHat instance.

If anyone has suggestions or something specific they'd like to see out of that task, let me know?

CC @arbrandes @regisb @antoviaque

@antoviaque
Contributor Author

@pomegranited Thank you! 👍

https://github.com/orgs/edx/projects/6 Can people confirm they can edit those cards?

I confirm that I can edit them yes.

@arbrandes
Contributor

I confirm that I can edit them yes.

Same here.

@sarina
Contributor

sarina commented May 17, 2024

Hi everyone, this issue hasn't been touched since June 2021. Was there any enthusiasm/capacity to pick up on this idea, or should we close the issue?

If we want to keep it I propose moving the issue to https://github.com/openedx/wg-coordination/issues since this issue doesn't pertain to an OEP.

@sarina sarina added the inactive PR author has been unresponsive for several months label May 17, 2024
@sarina
Contributor

sarina commented May 17, 2024

Closing per #227 (comment)

@sarina sarina closed this as completed May 17, 2024

9 participants