[DRAFT] [REFACTOR] argilla-server: move search logic to search service #5174

Draft
wants to merge 55 commits into
base: develop

Conversation

frascuchon
Member

@frascuchon frascuchon commented Jul 6, 2024

Description

This PR cleans up the search endpoint handlers and moves all of their logic to a Search service.

Update

The code will probably be moved under the search context module.

Type of change

  • Refactor (change restructuring the codebase without changing functionality)

How Has This Been Tested

Checklist

  • I added relevant documentation
  • I followed the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • I confirm my changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

jfcalvo and others added 30 commits July 1, 2024 12:31
…5013)

# Description

This PR is the first one related to the distribution task feature, adding
the following changes:
* Added `distribution` JSON column to `datasets` table:
* This column is non-nullable so a value is always required when a
dataset is created.
* By default old datasets will have the value `{"strategy": "overlap",
"min_submitted": 1}`.
* Added `distribution` attribute to `DatasetCreate` schema:
  * None is not a valid value.
* If no value is specified for this attribute
`DatasetOverlapDistributionCreate` with `min_submitted` to `1` is used.
* `DatasetOverlapDistributionCreate` only allows values greater than or
equal to `1` for the `min_submitted` attribute.
* Now the context `create_dataset` function is receiving a dictionary
instead of `DatasetCreate` schema.
* Moved dataset creation validations to a new `DatasetCreateValidator`
class.
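
The defaulting and validation rules listed above can be sketched roughly as follows. This is an illustration only: the dataclass is a stdlib stand-in for the `DatasetOverlapDistributionCreate` schema named in the list (its real implementation is not shown here), and `build_dataset_attributes` is a hypothetical helper, not a function from the codebase.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class DatasetOverlapDistributionCreate:
    """Stand-in for the schema named in the PR (assumed shape)."""

    strategy: str = "overlap"
    min_submitted: int = 1

    def __post_init__(self) -> None:
        # Only values greater than or equal to 1 are allowed.
        if self.min_submitted < 1:
            raise ValueError("min_submitted must be greater than or equal to 1")


def build_dataset_attributes(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: resolve `distribution` before creating a dataset."""
    if "distribution" not in payload:
        # No value specified: fall back to overlap with min_submitted = 1.
        distribution = DatasetOverlapDistributionCreate()
    elif payload["distribution"] is None:
        # None is not a valid value for the non-nullable column.
        raise ValueError("distribution cannot be None")
    else:
        distribution = DatasetOverlapDistributionCreate(**payload["distribution"])
    # Return a plain dictionary, since create_dataset now receives a dict.
    return {**payload, "distribution": asdict(distribution)}
```

For example, `build_dataset_attributes({"name": "my-dataset"})` fills in `{"strategy": "overlap", "min_submitted": 1}`, the same value that old datasets receive by default.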

Update of `distribution` attribute for datasets will be done in a
different issue.

Closes #5005 

**Type of change**

(Please delete options that are not relevant. Remember to title the PR
according to the type of change)

- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- [ ] Refactor (change restructuring the codebase without changing
functionality)
- [ ] Improvement (change adding some improvement to an existing
functionality)
- [ ] Documentation update

**How Has This Been Tested**

(Please describe the tests that you ran to verify your changes. And
ideally, reference `tests`)

- [x] Adding new tests and passing old ones.
- [x] Check that migration works as expected with old datasets and
SQLite.
- [x] Check that migration works as expected with old datasets and
PostgreSQL.

**Checklist**

- [ ] I added relevant documentation
- [ ] follows the style guidelines of this project
- [ ] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK)
(see text above)
- [ ] I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Paco Aranda <francis@argilla.io>
…nly (#5148)

# Description

Add changes to the `responses_submitted` relationship to avoid problems
with the existing `responses` relationship and to remove a warning
message that SQLAlchemy was reporting.

Refs #5000 

**Type of change**

- Improvement (change adding some improvement to an existing
functionality)

**How Has This Been Tested**

- [x] Warning is not showing anymore.
- [x] Tests are passing.

**Checklist**

- I added relevant documentation
- follows the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm my changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature
works
- I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)
# Description

This PR adds changes to the endpoints to get the dataset progress and
current user metrics in the following way:

## `GET /datasets/:dataset_id/progress`

I have changed the endpoint to support the new business logic behind the
distribution task. It now responds with only the `completed` and
`pending` record counts and uses `total` as the sum of the two.

Old response without distribution task:

```json
{
  "total": 8,
  "submitted": 2,
  "discarded": 2,
  "conflicting": 1,
  "pending": 3
}
```

New response with the changes from this PR supporting distribution task:

* The `completed` attribute will have the count of all the records with
status as `completed` for the dataset.
* The `pending` attribute will have the count of all the records with
status as `pending` for the dataset.
* The `total` attribute will have the sum of the `completed` and
`pending` attributes.

```json
{
  "total": 5,
  "completed": 2,
  "pending": 3
}
```
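
The rules for the new progress response can be sketched in a few lines. This is a minimal illustration that works on an in-memory list of record statuses; `dataset_progress` is a hypothetical name, and the real endpoint presumably counts records in the database instead.

```python
from collections import Counter
from typing import Dict, Iterable


def dataset_progress(record_statuses: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: build the progress payload from record statuses."""
    counts = Counter(record_statuses)
    completed = counts["completed"]
    pending = counts["pending"]
    # `total` is defined as the sum of the two counts.
    return {"total": completed + pending, "completed": completed, "pending": pending}


progress = dataset_progress(["completed"] * 2 + ["pending"] * 3)
print(progress)
```

With two completed and three pending records this yields a `total` of 5, matching the example response.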

@damianpumar some changes are required on the frontend to support this
new endpoint structure.

## `GET /me/datasets/:dataset_id/metrics`

Old response without distribution task:

```json
{
  "records": {
    "count": 7
  },
  "responses": {
    "count": 4,
    "submitted": 1,
    "discarded": 2,
    "draft": 1
  }
}
```

New response with the changes from this PR supporting distribution task:

* The `records` section has been removed because it is no longer
necessary.
* `responses` `count` section has been renamed to `total`.
* `pending` section has been added to the `responses` section.

```json
{
  "responses": {
    "total": 7,
    "submitted": 1,
    "discarded": 2,
    "draft": 1,
    "pending": 3
  }
}
```

The logic behind these attributes is the following:
* `total` is the sum of `submitted`, `discarded`, `draft` and `pending`
attribute values.
* `submitted` is the count of all responses belonging to the current
user in the specified dataset with `submitted` status.
* `discarded` is the count of all responses belonging to the current
user in the specified dataset with `discarded` status.
* `draft` is the count of all responses belonging to the current user in
the specified dataset with `draft` status.
* `pending` is the count of all records with `pending` status for the
dataset that have no responses belonging to the current user.
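
The attribute logic above can be sketched as follows. The function name and its two inputs are assumptions made for illustration: the statuses of the current user's responses, and the number of `pending` records the user has not responded to, both of which the real endpoint would obtain from queries.

```python
from collections import Counter
from typing import Dict, Iterable


def user_dataset_metrics(
    user_response_statuses: Iterable[str],
    pending_records_without_user_response: int,
) -> Dict[str, Dict[str, int]]:
    """Hypothetical helper: build the metrics payload for the current user."""
    counts = Counter(user_response_statuses)
    submitted = counts["submitted"]
    discarded = counts["discarded"]
    draft = counts["draft"]
    pending = pending_records_without_user_response
    return {
        "responses": {
            # `total` is the sum of the four values below.
            "total": submitted + discarded + draft + pending,
            "submitted": submitted,
            "discarded": discarded,
            "draft": draft,
            "pending": pending,
        }
    }


metrics = user_dataset_metrics(["submitted", "discarded", "discarded", "draft"], 3)
print(metrics)
```

Using the counts from the example response (1 submitted, 2 discarded, 1 draft, 3 pending) gives a `total` of 7.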

@damianpumar some changes are required on the frontend to support this
new endpoint structure as well.

Closes #5139 

**Type of change**

- Breaking change (fix or feature that would cause existing
functionality to not work as expected)

**How Has This Been Tested**

- [x] Modifying existing tests.
- [x] Running test suite with SQLite and PostgreSQL.

**Checklist**

- I added relevant documentation
- follows the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm my changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature
works
- I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Paco Aranda <francis@argilla.io>
Co-authored-by: Damián Pumar <damianpumar@gmail.com>
#5153)

<!-- Please include a summary of the changes and the related issue.
Please also include relevant motivation and context. List any
dependencies that are required for this change. -->

This PR is the first of a series of PRs for cleaning the
listing-records-related endpoints.

This PR removes the `GET /api/v1/me/datasets/:dataset_id/records`
endpoint, since its only client was the frontend application, which now
uses the equivalent search endpoint.

**Type of change**
<!-- Please delete options that are not relevant. Remember to title the
PR according to the type of change -->

- Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- Improvement (change adding some improvement to an existing
functionality)

**How Has This Been Tested**
<!-- Please add some reference about how your feature has been tested.
-->

**Checklist**
<!-- Please go over the list and make sure you've taken everything into
account -->

- I added relevant documentation
- follows the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm my changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature
works
- I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)
…aram (#5156)

# Description
<!-- Please include a summary of the changes and the related issue.
Please also include relevant motivation and context. List any
dependencies that are required for this change. -->

> [!NOTE]
> This PR must be merged after
#5153

This PR removes support for filtering using metadata as a query param:

- This filter is no longer available for list endpoints.
- The metadata filter can be defined as part of the request body for
search filters.

**Type of change**
<!-- Please delete options that are not relevant. Remember to title the
PR according to the type of change -->

- Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- Refactor (change restructuring the codebase without changing
functionality)

**How Has This Been Tested**
<!-- Please add some reference about how your feature has been tested.
-->

**Checklist**
<!-- Please go over the list and make sure you've taken everything into
account -->

- I added relevant documentation
- follows the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm my changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature
works
- I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)
…upport (#5163)

# Description
<!-- Please include a summary of the changes and the related issue.
Please also include relevant motivation and context. List any
dependencies that are required for this change. -->

> [!NOTE]
> This PR must be merged after
#5156

This PR removes support for filtering records with response_status query
param:

- This filter is removed for listing records endpoints
- The response status filter is available for search endpoints using the
filter request body.

**Type of change**
<!-- Please delete options that are not relevant. Remember to title the
PR according to the type of change -->

- Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- Refactor (change restructuring the codebase without changing
functionality)

**How Has This Been Tested**
<!-- Please add some reference about how your feature has been tested.
-->

**Checklist**
<!-- Please go over the list and make sure you've taken everything into
account -->

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm my changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature
works
- I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: José Francisco Calvo <jose@argilla.io>
…notated datasets (#5171)

# Description
<!-- Please include a summary of the changes and the related issue.
Please also include relevant motivation and context. List any
dependencies that are required for this change. -->

This PR changes the current validator when updating the distribution
task to allow updating the distribution task settings for datasets with
records without ANY response.
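
On one reading of the rule above, the update is allowed as long as no record in the dataset has received a response yet. A minimal sketch of such a check follows; the function name, its input and the error message are hypothetical and do not come from the codebase.

```python
from typing import Iterable


def validate_distribution_update(responses_per_record: Iterable[int]) -> None:
    """Hypothetical check: reject the update once any record has a response.

    `responses_per_record` holds the number of responses for each record in
    the dataset; an empty dataset trivially passes.
    """
    if any(count > 0 for count in responses_per_record):
        raise ValueError(
            "distribution settings cannot be updated for a dataset "
            "that already has responses"
        )


# Records exist but none has a response, so the update is allowed.
validate_distribution_update([0, 0, 0])
```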

cc @nataliaElv 

**Type of change**
<!-- Please delete options that are not relevant. Remember to title the
PR according to the type of change -->

- Improvement (change adding some improvement to an existing
functionality)

**How Has This Been Tested**
<!-- Please add some reference about how your feature has been tested.
-->

**Checklist**
<!-- Please go over the list and make sure you've taken everything into
account -->

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm my changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature
works
- I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)
…5166)

# Description
<!-- Please include a summary of the changes and the related issue.
Please also include relevant motivation and context. List any
dependencies that are required for this change. -->

This PR removes support for the `sort_by` query param in the list/search
records endpoints.

**Type of change**
<!-- Please delete options that are not relevant. Remember to title the
PR according to the type of change -->

- Breaking change (fix or feature that would cause existing
functionality to not work as expected)
- Refactor (change restructuring the codebase without changing
functionality)

**How Has This Been Tested**
<!-- Please add some reference about how your feature has been tested.
-->

**Checklist**
<!-- Please go over the list and make sure you've taken everything into
account -->

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm my changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature
works
- I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)
@frascuchon frascuchon changed the base branch from refactor/argilla-server/move-search-validator-to-proper-module to refactor/argilla-server/list-records-endpoint-using-db July 19, 2024 06:59

codecov bot commented Jul 19, 2024

Codecov Report

Attention: Patch coverage is 98.43750% with 3 lines in your changes missing coverage. Please review.

Project coverage is 91.09%. Comparing base (bca45ff) to head (edc0fe0).

| Files | Patch % | Lines |
|---|---|---|
| ...gilla-server/src/argilla_server/services/search.py | 97.72% | 2 Missing ⚠️ |
| ...lla-server/src/argilla_server/validators/search.py | 98.41% | 1 Missing ⚠️ |
Additional details and impacted files
@@                                    Coverage Diff                                     @@
##           refactor/argilla-server/list-records-endpoint-using-db    #5174      +/-   ##
==========================================================================================
- Coverage                                                   91.22%   91.09%   -0.14%     
==========================================================================================
  Files                                                         142      144       +2     
  Lines                                                        5746     5793      +47     
==========================================================================================
+ Hits                                                         5242     5277      +35     
- Misses                                                        504      516      +12     
| Flag | Coverage Δ |
|---|---|
| argilla-server | 91.09% <98.43%> (-0.14%) ⬇️ |
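
The figures in this report can be cross-checked with simple arithmetic. The sketch below is only a sanity check on the numbers quoted above, not a description of how Codecov computes them; in particular, the patch size of 192 lines is inferred from the 98.43750% figure and the 3 missing lines, and is not stated in the report.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as a percentage of covered lines."""
    return 100.0 * hits / lines


# Project coverage from the Hits and Lines rows of the diff table.
base = coverage_percent(5242, 5746)
head = coverage_percent(5277, 5793)

# Hits plus Misses should add up to Lines on both sides.
assert 5242 + 504 == 5746
assert 5277 + 516 == 5793

# Inferred patch size: 3 missing lines at 98.43750% patch coverage.
missing = 3
patch_lines = round(missing / (1 - 98.43750 / 100))
patch = coverage_percent(patch_lines - missing, patch_lines)

print(f"base={base:.4f}% head={head:.4f}% delta={head - base:+.2f}%")
print(f"patch lines={patch_lines} patch coverage={patch}%")
```

The base value comes out slightly above the reported 91.22%, which suggests the report truncates percentages rather than rounding them.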


@frascuchon frascuchon added this to the v2.1.0 milestone Jul 19, 2024
@frascuchon frascuchon added the type: refactor Indicates internal refactoring of the code-base label Jul 19, 2024
@frascuchon frascuchon marked this pull request as ready for review July 25, 2024 07:23
@frascuchon frascuchon requested a review from jfcalvo July 25, 2024 07:23
@frascuchon frascuchon marked this pull request as draft July 25, 2024 07:23
@frascuchon frascuchon modified the milestones: v2.1.0, v2.2.0 Sep 3, 2024
@frascuchon frascuchon changed the title [REFACTOR] argilla-server: move search logic to search service [DRAFT] [REFACTOR] argilla-server: move search logic to search service Sep 9, 2024
@jfcalvo jfcalvo modified the milestones: v2.2.0, v2.3.0 Sep 18, 2024
Base automatically changed from refactor/argilla-server/list-records-endpoint-using-db to develop September 23, 2024 14:40
@frascuchon frascuchon removed this from the v2.3.0 milestone Sep 30, 2024
Labels
type: refactor Indicates internal refactoring of the code-base
3 participants