This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

Epic: repository metadata #43961

Open
4 of 6 tasks
camdencheek opened this issue Nov 4, 2022 · 3 comments

Comments

@camdencheek
Member

camdencheek commented Nov 4, 2022

This epic captures the current state of the "repository metadata" feature, which is available only as an experimental feature for now.

Definition

The "repository metadata" feature umbrella refers to the ability to add user-defined metadata to repositories and use that metadata throughout Sourcegraph.

Current state

Currently, the following features are available on an experimental basis:

  • Add key:value pairs to a repository through the GraphQL API. Both the key and value are arbitrary strings.
  • Add tags to a repository. A tag is implemented as a key:value pair with a null value.
  • Search repositories based on whether the repository has a given piece of metadata with the predicates repo:has(your_key:your_value) and repo:has.tag(your_tag).
  • Only site admins can tag repositories with metadata, and all repository metadata applies globally (no user-scoped metadata).
  • Metadata can only be added to repos. We have no concept of commit metadata, file metadata, etc.
  • GitHub topics have been ingested as tags on sourcegraph.com and I've done some light testing. Performance is great.
  • Repo metadata is stored in the database and used to filter repositories during repo resolution.
  • Documentation can be found here
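The semantics in the list above can be sketched as a small in-memory model. This is only an illustration, not Sourcegraph's implementation: the `RepoMetadata` class and its method names are hypothetical, the real feature stores metadata in the database, and the sketch assumes each key holds at most one value. It shows a tag as a key with a null value and both search predicates as strict-equality checks.

```python
from typing import Dict, Optional


class RepoMetadata:
    """Hypothetical in-memory model of one repository's metadata."""

    def __init__(self) -> None:
        # key -> value; a value of None marks the key as a tag.
        # Assumption: one value per key.
        self._pairs: Dict[str, Optional[str]] = {}

    def add_pair(self, key: str, value: Optional[str]) -> None:
        self._pairs[key] = value

    def add_tag(self, tag: str) -> None:
        # A tag is a key:value pair with a null value.
        self.add_pair(tag, None)

    def has(self, key: str, value: str) -> bool:
        # Models repo:has(key:value) with strict equality on key and value.
        return key in self._pairs and self._pairs[key] == value

    def has_tag(self, tag: str) -> bool:
        # Models repo:has.tag(tag): the key exists and its value is null.
        return tag in self._pairs and self._pairs[tag] is None
```

With this model, a repository given the pair `team:search` and the tag `go` would match `repo:has(team:search)` and `repo:has.tag(go)`, but not `repo:has.tag(team)`.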

Path to beta

These are the things I expect we'll want before we move this feature from experimental to beta:

Stretch goals

These are things that I don't think are strictly required for the feature to be considered complete and viable, but would be big value adds:

  • Automatic ingestion of GitHub topics as tags. This is the most commonly requested use case for repository metadata. We can build a script for this (I have a hacky one that I used to ingest tags for sourcegraph.com), but ideally this would go through the same flow as our normal repo fetching process.
  • Exposing repo metadata outside of search. I've heard this would be useful for batch changes and code insights.
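For the topic ingestion item above, the core transformation is small. The following is a minimal sketch, not the script used for sourcegraph.com: the function name `topics_to_tags` is hypothetical, and it assumes the input is a decoded GitHub topics payload of the form `{"names": [...]}`. Fetching that payload and writing the resulting pairs to Sourcegraph are left out.

```python
from typing import Dict, List, Optional, Tuple


def topics_to_tags(
    topics_payload: Dict[str, List[str]],
) -> List[Tuple[str, Optional[str]]]:
    """Convert a topics payload into (key, value) metadata pairs.

    Each topic becomes a tag, which is a key with a null value.
    """
    names = topics_payload.get("names", [])
    seen = set()
    tags: List[Tuple[str, Optional[str]]] = []
    for name in names:
        # Deduplicate while preserving the original order.
        if name not in seen:
            seen.add(name)
            tags.append((name, None))
    return tags
```

Each returned pair would then be written to the repository as a tag through whatever mechanism the ingestion flow uses.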

Possible future goals

These are things that would be cool, but are difficult or provide unclear value and would need to be investigated more before committing:

  • Non-string metadata. E.g. repo:has(stars > 1000) or repo:has(created.at < 1 year ago). This would require some serious search language design work and also likely some big indexing challenges.
  • Regex search over metadata. E.g. repo:has(team:/search-.*/) maybe. Again, there would be some language design considerations here, and part of why performance is great on the scale of sourcegraph.com is that we do strict equality. It's also unclear whether this would actually be valuable enough to justify.
  • More automated metadata ingestion. Things like repo:has(forked-from:github.com/sourcegraph/sourcegraph) or repo:has(committer:camdencheek). Most of these things are probably better as decoupled ingestion scripts, but it's possible there are some things we would like to do for our customers automatically.
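To illustrate the strict-equality point in the regex item above, here is an in-memory analogy. It is not how Sourcegraph stores or queries metadata: `MetadataIndex` and its methods are hypothetical, and the choice to match the regex against the whole value is an assumption. An exact match is a single lookup on the (key, value) pair, while a regex match has to visit every distinct value stored under the key.

```python
import re
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple


class MetadataIndex:
    """Hypothetical index from (key, value) pairs to repository names."""

    def __init__(self) -> None:
        self._by_pair: Dict[Tuple[str, Optional[str]], Set[str]] = defaultdict(set)

    def add(self, repo: str, key: str, value: Optional[str]) -> None:
        self._by_pair[(key, value)].add(repo)

    def match_exact(self, key: str, value: Optional[str]) -> Set[str]:
        # Strict equality: one dictionary lookup, independent of index size.
        return set(self._by_pair.get((key, value), set()))

    def match_regex(self, key: str, pattern: str) -> Set[str]:
        # Regex: every stored pair must be inspected.
        rx = re.compile(pattern)
        result: Set[str] = set()
        for (k, v), repos in self._by_pair.items():
            # Assumption: the pattern must match the entire value.
            if k == key and v is not None and rx.fullmatch(v):
                result |= repos
        return result
```

In this analogy, `match_exact("team", "search-core")` touches one entry, while `match_regex("team", r"search-.*")` scans all of them, and the cost of that scan grows with the amount of metadata stored.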
@camdencheek
Member Author

camdencheek commented Nov 4, 2022

The plan is for @tbliu98 to take over ownership of this feature from here. This is my attempt to consolidate some context in one place to make that easier.

@superhsu @malomarrec @Joelkw @mike-r-mclaughlin (people I remember having engaged with about this feature): Please take a look at this issue, specifically the "Path to beta" and "Stretch goals" sections. This issue was created based on my perspective of what customers actually need, but you're all closer to the customers than I am 🙂 Any feedback about scope or features would be appreciated.

@camdencheek
Member Author

The script I used to ingest GitHub topics as tags into sourcegraph.com is here. Note that the script will need to be updated to use the API to list repos. For sourcegraph.com, listing repos through the API was far too slow, so I had to resort to a manual export of repositories. The API should work fine for most customer instances though.

@albertocavalcante

Consider whether we want to allow non-site-admins to add metadata.

I would just add a +1 for user-scoped metadata.


3 participants