Add lead time to each labelled query and avg lead time for a case #1003

Open
DmitryKey opened this issue Apr 16, 2024 · 4 comments

@DmitryKey
Contributor

Is your feature request related to a problem? Please describe.
Usually the process in our labelling projects starts like this:

  1. Choose the labelling objective, set up a test project.
  2. Distribute tasks among a group of search experts (we use Excel to coordinate which cases are taken by whom).
  3. Label, learn-rinse-and-repeat, formulate labelling instructions.

Then, we proceed with scaling out the labelling process by involving a larger set of people (who might be domain experts, but not search experts).

At this point, knowing the average lead time per unit of work (a query), we know how much workforce to request in order to reach a specific deadline.

Describe the solution you'd like
Lead time is recorded at the query level and rolled up to the case level.
The lead time can be accessed via the Notebook feature to perform analytics, such as the distribution of lead times per annotator.
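
For concreteness, here is a minimal sketch of the kind of notebook analysis this would enable, assuming a hypothetical per-query export with `case_id`, `annotator`, and `lead_time_s` columns (these names are illustrative, not an existing Quepid schema):

```python
import pandas as pd

# Hypothetical per-query export; the column names are assumptions
# for illustration, not an existing Quepid schema.
df = pd.DataFrame({
    "case_id":     [1, 1, 1, 2, 2],
    "query_id":    [10, 11, 12, 20, 21],
    "annotator":   ["alice", "bob", "alice", "bob", "alice"],
    "lead_time_s": [42.0, 61.5, 38.2, 75.0, 50.3],
})

# Roll query-level lead times up to an average per case.
print(df.groupby("case_id")["lead_time_s"].mean())

# Distribution of lead times per annotator.
print(df.groupby("annotator")["lead_time_s"].describe())
```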

Describe alternatives you've considered
Recording this manually, but that is inaccurate and adds complexity to the labelling process.

Additional context
Label Studio (https://github.com/HumanSignal/label-studio) offers a lead time feature at the task level. The UI is written in React (I believe), so there is probably a way to adopt and adapt that UI logic for Quepid.

@epugh
Member

epugh commented Apr 29, 2024

@DmitryKey Have you looked at the Books infrastructure yet? You can merge various books together into new books, and then use that to populate a Case... There might be some nice operational things in there.

In terms of lead time, I'm wondering if the existing update_time and create_time fields that we create for the Judgement and QueryDocPair objects in the database would help you intuit this? I could imagine using the Python notebook, calling some APIs to get the data, and then doing some graphing/charting to predict how long it takes.
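
To make that concrete, here's a sketch of deriving a per-query lead-time proxy from those timestamps. Only the create_time/update_time fields come from the discussion above; the export shape itself is an assumption:

```python
import pandas as pd

# Hypothetical export of Judgement rows; only the create_time/update_time
# field names are mentioned in this thread, the rest of the shape is assumed.
judgements = pd.DataFrame({
    "query_id": [10, 10, 11],
    "create_time": pd.to_datetime([
        "2024-04-16 09:00:00", "2024-04-16 09:01:10", "2024-04-16 09:05:00",
    ]),
    "update_time": pd.to_datetime([
        "2024-04-16 09:00:45", "2024-04-16 09:02:05", "2024-04-16 09:06:30",
    ]),
})

# Proxy for per-query lead time: span from the first judgement created
# to the last judgement updated within that query.
per_query = judgements.groupby("query_id").agg(
    started=("create_time", "min"),
    finished=("update_time", "max"),
)
per_query["lead_time_s"] = (per_query["finished"] - per_query["started"]).dt.total_seconds()
print(per_query)
```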

@epugh
Member

epugh commented Apr 29, 2024

@DmitryKey I am going to be in Europe from May 1 to May 8, so we could pair on this analysis together if you want.

Right now on the home page we have this messaging:

[screenshot of the home page messaging]

What if we could predict how long till all ratings are done?

@DmitryKey
Contributor Author

Hey @epugh !
Great to hear! Let's be in contact about pairing up - I'd love to!

I think the update_time and create_time fields are a good start. Some thoughts:

  1. What is the definition of done for rating a query? Is it that all documents have been rated, or a period of inactivity? (See the inactivity-gap sketch after this list.)
  2. If an annotator revisits a particular query / document, does that mean we should add the newly observed time to the lead time?
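
For question 1, one option that avoids needing an explicit "done" signal is an inactivity-gap heuristic: sum only the gaps between consecutive judgement events that fall under some threshold, so long breaks don't inflate the lead time. A sketch, where the threshold and the event data are assumptions:

```python
from datetime import datetime, timedelta

# Hypothetical judgement timestamps for one query, in chronological order.
events = [
    datetime(2024, 4, 16, 9, 0, 0),
    datetime(2024, 4, 16, 9, 0, 40),
    datetime(2024, 4, 16, 9, 1, 30),
    datetime(2024, 4, 16, 13, 0, 0),   # annotator came back hours later
    datetime(2024, 4, 16, 13, 0, 50),
]

GAP = timedelta(minutes=5)  # inactivity threshold; the value is an assumption

# Sum only the gaps between consecutive events shorter than GAP,
# so the multi-hour break does not count as labelling time.
active = sum(
    (b - a for a, b in zip(events, events[1:]) if (b - a) <= GAP),
    timedelta(),
)
print(active)  # 0:02:20
```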

Predicting an ETA for ratings is a fantastic feature to have. It could start with a simple retrospective prediction ("so far it has taken this long per query / case, so we can extrapolate"). The rate may vary per annotator, but it should converge on an average across all annotators.
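
A retrospective prediction like that could start as simply as remaining work divided by observed throughput; a sketch with made-up numbers:

```python
# All figures below are illustrative assumptions.
rated_queries = 120
total_queries = 500
elapsed_hours = 8.0   # annotator-hours spent so far
annotators    = 3     # people rating in parallel

avg_hours_per_query = elapsed_hours / rated_queries
remaining_queries   = total_queries - rated_queries

# Wall-clock ETA, assuming annotators keep working in parallel
# at the average rate observed so far.
eta_hours = remaining_queries * avg_hours_per_query / annotators
print(f"~{eta_hours:.1f} hours until all ratings are done")
```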

@epugh
Member

epugh commented Dec 16, 2024

I was thinking about looking more into this... I've had some discussions about "Workflow" in Quepid, and about routing different types of docs to specific judges, etc.
