
As a programme specialist / data scientist, I want to know that I can trust this data #2

Open
binkymilk opened this issue Mar 6, 2023 · 8 comments

@binkymilk
Contributor

What I want:

As a programme specialist / data scientist, I want to know that I can trust this data

I will know that this works when

  • It's clear who created this dataset
  • It's clear when this dataset was created
  • It's clear how this dataset was created
  • It's clear how accurate this data is (when compared against another dataset, or the validation / holdout dataset)
  • It's clear where the training data / underlying data came from
  • It's clear who else has used this data
  • There's some supporting material (a peer-reviewed paper, an endorsement from another reputable organisation)
@binkymilk binkymilk converted this from a draft issue Mar 6, 2023
@binkymilk
Contributor Author

Questions:

  • For supporting materials, are these links to papers or press releases?
  • Is there any other info we want to add to both the model info and the dataset info?

Notes:

  • Include structure / schema of the datasets here

@butchtm
Collaborator

butchtm commented Mar 6, 2023

Fields to include in the catalog card:

  • who created this item
  • when this item was created
  • how this item was created - description? link for more documentation?
  • accuracy/metrics - applicable for models only?
  • training data source - description? links?
  • who else used the data - requires tracking of usage? Alternatively, self-reported usage, e.g. citations
  • supporting material - RRL (Related Research Literature), peer-reviewed papers, endorsements from other reputable organizations
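A minimal Python sketch of how the fields listed above could map onto one entry in the catalog JSON. The class name `CatalogCard`, its field names, and the sample values are hypothetical placeholders, not the project's actual schema; the open questions in the list (e.g. whether metrics apply to models only, or whether usage is tracked or self-reported) are modelled as optional fields.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class CatalogCard:
    """Hypothetical catalog card entry; all field names are illustrative assumptions."""
    name: str
    creator: str                    # who created this item (person or organization)
    date_created: str               # when it was created, as an ISO 8601 date string
    creation_description: str       # how it was created
    documentation_url: Optional[str] = None                   # link to more documentation
    metrics: dict = field(default_factory=dict)                # accuracy/metrics; may be models only
    training_data_source: Optional[str] = None                 # where the training/underlying data came from
    citations: list = field(default_factory=list)              # self-reported usage, e.g. citations
    supporting_materials: list = field(default_factory=list)   # RRL, papers, endorsements


def to_catalog_json(card: CatalogCard) -> str:
    """Serialize one card as it might appear in the catalog JSON contents."""
    return json.dumps(asdict(card), indent=2)


# Placeholder values only; not a real catalog entry.
card = CatalogCard(
    name="example-model",
    creator="Example Org",
    date_created="2023-01-01",
    creation_description="Placeholder description of the method.",
    supporting_materials=["https://example.org/paper"],
)
print(to_catalog_json(card))
```

Keeping the card as one flat record means the same object can feed both the catalog card page and the catalog JSON, which is where the notes below say these fields should appear.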

@butchtm
Collaborator

butchtm commented Mar 6, 2023

Notes:

  • Included in the catalog card page
  • Included in the catalog json contents

@ghost ghost moved this from Must to Backlog in unicef-ai4d-research-bank-project-planner Mar 8, 2023
@ghost ghost assigned binkymilk Mar 8, 2023
@ghost ghost moved this from Backlog to To Do in unicef-ai4d-research-bank-project-planner Mar 8, 2023
@ghost ghost moved this from To Do to Sprint 1 Milestones in unicef-ai4d-research-bank-project-planner Mar 8, 2023
@butchtm
Collaborator

butchtm commented Mar 20, 2023

  • Data preview - show HXL tags
  • Suggested HXL tags: location, adm1, adm2, adm3, adm4, name
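HXL marks up a table by adding a row of hashtags directly under the column headers. A minimal Python sketch of a data preview that does this, assuming the dataset is available as a list of row dicts and that a column-to-tag mapping exists; `hxl_preview` and the mapping are hypothetical. `#loc` and `#adm1` to `#adm4` are HXL hashtags and `+name` is an HXL attribute, but which tags the project actually uses is still open in this thread.

```python
import csv
import io

# Hypothetical mapping from dataset column names to HXL hashtags.
EXAMPLE_HXL_TAGS = {
    "location": "#loc",
    "adm1_name": "#adm1+name",
    "adm2_name": "#adm2+name",
}


def hxl_preview(rows, hxl_tags, n=5):
    """Return a CSV preview: header row, HXL hashtag row, then up to n data rows.

    rows: list of dicts that all share the same keys (column names).
    Columns with no entry in hxl_tags get an empty tag cell.
    """
    if not rows:
        return ""
    headers = list(rows[0].keys())
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(headers)
    writer.writerow([hxl_tags.get(h, "") for h in headers])
    for row in rows[:n]:
        writer.writerow([row[h] for h in headers])
    return out.getvalue()


sample = [{"adm1_name": "A", "adm2_name": "B", "population": "10"}]
print(hxl_preview(sample, EXAMPLE_HXL_TAGS))
```

Untagged columns (like `population` here) are kept with an empty tag cell rather than dropped, so the preview still shows the full table structure.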

@butchtm butchtm moved this from Sprint 1 Milestones to Sprint 2 Backlog in unicef-ai4d-research-bank-project-planner Mar 20, 2023
@ghost ghost moved this from Sprint 2 Backlog to Sprint 2 Milestones in unicef-ai4d-research-bank-project-planner Mar 21, 2023
This was referenced Apr 3, 2023
@AnthonyMockler

Quick User Test from Kath:

I will know this works when:

  • It is clear who created this dataset
      ◦ Is the organization the “creator” of the dataset?
  • It is clear when this dataset was created
      ◦ Is this the Year/Period or the Date Created?
  • It is clear how this dataset was created
      ◦ Not sure if I just missed it, but I couldn’t find specific information on how this dataset was created.
  • It is clear how accurate this data is
      ◦ There’s an evaluation metric and a link to a paper.
  • It is clear where the training data/underlying data came from
      ◦ For this specific model, there are details about the training dataset; however, I am unsure whether the requirement is to also specify the data source of the training dataset.
  • It is clear who else has used this data
      ◦ Not sure if I just missed it, but I couldn’t find this information.
  • There’s some supporting material
      ◦ I believe this should be under “Related Links”, but I only see links to resources for the model, not really supporting materials.

What I want:
As a data scientist / data analyst / programme specialist, I want to see a list of all models in the catalogue

I will know this works when:

  • There’s a web interface to provide me with a list of all the models / datasets in the catalogue
      ◦ I think the search catalogue was easy to find and gave me exactly what I was looking for.
  • The list is structured in some useful way
      ◦ It might be helpful to have an option to sort by year or alphabetically.
  • There’s sufficient metadata about the models for me to filter / choose the one I’m interested in
      ◦ The current metadata and filters seem to be enough.

butchtm pushed a commit to butchtm/unicef-ai4d-research-bank that referenced this issue Jun 5, 2023
tm-ger-chagas added a commit that referenced this issue Jun 21, 2023
@ghost

ghost commented Jul 20, 2023

We removed citations (who else has used this data) as a feature on June 22.

@ghost ghost closed this as completed Jul 20, 2023
@github-project-automation github-project-automation bot moved this from Sprint 8 Milestones to Wont Do in unicef-ai4d-research-bank-project-planner Jul 20, 2023
@ghost ghost reopened this Jul 20, 2023

3 participants