Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configure Csv delimiter #716

Merged
merged 7 commits into from
Mar 23, 2023
Merged

Configure Csv delimiter #716

merged 7 commits into from
Mar 23, 2023

Conversation

alallema
Copy link
Contributor

Add the ability to provide a CSV delimiter for adding and updating CSV documents as in the specification

SDK requirements: meilisearch/integration-guides#251

@alallema alallema marked this pull request as ready for review March 21, 2023 10:41
@alallema alallema requested a review from brunoocasali March 21, 2023 10:41
meilisearch/index.py Outdated Show resolved Hide resolved
meilisearch/index.py Show resolved Hide resolved
meilisearch/index.py Outdated Show resolved Hide resolved
meilisearch/index.py Show resolved Hide resolved
meilisearch/index.py Show resolved Hide resolved
alallema and others added 2 commits March 21, 2023 14:46
Co-authored-by: Paul Sanders <psanders1@gmail.com>
Co-authored-by: Paul Sanders <psanders1@gmail.com>
Copy link
Member

@brunoocasali brunoocasali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments, if you apply the test ones please apply to both of the test cases :)

@@ -0,0 +1,54 @@
id;title;album;artist;genre;country;released;duration;released-timestamp;duration-float
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could reduce this csv file to a few lines :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! But thank @sanders41, the tests have been missing from the package build since #367. In case you wonder

Comment on lines +1549 to +1554
parameters = {}
if primary_key:
parameters["primaryKey"] = primary_key
if csv_delimiter:
parameters["csvDelimiter"] = csv_delimiter
if primary_key is None and csv_delimiter is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like that much to see these if conditions because they usually hide a code that will grow indefinitely, WDYT about this?

Suggested change
parameters = {}
if primary_key:
parameters["primaryKey"] = primary_key
if csv_delimiter:
parameters["csvDelimiter"] = csv_delimiter
if primary_key is None and csv_delimiter is None:
parameters = { "csvDelimiter": csv_delimiter, "primaryKey": primary_key }
parameters = dict((k, v) for k, v in parameters.items() if v)
if primary_key is None and csv_delimiter is None:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will work, but I would argue it is less readable and not obvious what you are doing. If you do go this direction you could clean it up some:

parameters = {k: v for k, v in parameters.items() if v}

With either direction it could also be worth it to short circuit so parameters don't get created and run if not needed:

    def _build_url(
        self,
        primary_key: Optional[str] = None,
        csv_delimiter: Optional[str] = None,
    ) -> str:
        if primary_key is None and csv_delimiter is None:
            return f"{self.config.paths.index}/{self.uid}/{self.config.paths.document}"

           parameters = { "csvDelimiter": csv_delimiter, "primaryKey": primary_key }
           parameters = {k: v for k, v in parameters.items() if v}

        return f"{self.config.paths.index}/{self.uid}/{self.config.paths.document}?{parse.urlencode(parameters)}"

Copy link
Contributor Author

@alallema alallema Mar 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with sanders41. In general, I prefer to avoid using the magic one-liners of Python to avoid complex understanding.
And we should not have to add a new one. But if we do, I will change it for your code!

tests/conftest.py Outdated Show resolved Hide resolved
index = empty_index("csv-delimiter")
response = index.add_documents_csv(songs_csv_custom_separator, csv_delimiter=";")
assert isinstance(response, TaskInfo)
assert response.task_uid is not None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, but all the other tests are like that, so I find it weird to remove it for this one.

assert isinstance(response, TaskInfo)
assert response.task_uid is not None
task = index.wait_for_task(response.task_uid)
assert task.status == "succeeded"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also the task status, if the wait did not work your test will fail anyway right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible for the task to complete, but the status to be failed right? I think this is what she is testing, that it was successful?

Copy link
Member

@brunoocasali brunoocasali Mar 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got the idea, and my point is the assertion has no practical value because if the task fails, the assert task.details["receivedDocuments"] == 53 will fail and also the following assertions after get_documents.

We have 6 assertions in this test case, but only the last two are checking what the use case wants to assert (+ the receivedDocuments assertion), which is to verify if the documents were indexed properly.

I know test code is supposed to be explicit instead of implicit, but in this case, I struggle to find good reasons to keep those assertions everywhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you read this:

def test_add_documents_csv_with_delimiter(empty_index, songs_csv_custom_separator):
    """Tests adding new documents to a clean index."""

    index = empty_index("csv-delimiter")
    response = index.add_documents_csv(songs_csv_custom_separator, csv_delimiter=";")
    task = index.wait_for_task(response.task_uid)
    assert task.details["receivedDocuments"] == 53

    documents = index.get_documents().results
    assert documents[1].country == "Europe"
    assert documents[4].artist == "Elton John"

Could you let me know if you missed the removed assertions?

@alallema alallema merged commit 98e4751 into bump-meilisearch-v1.1.0 Mar 23, 2023
@alallema alallema deleted the csv-delimiter branch March 23, 2023 12:44
@alallema alallema added the enhancement New feature or request label Mar 27, 2023
bors bot added a commit that referenced this pull request Apr 3, 2023
709: Changes related to the next Meilisearch release (v1.1.0) r=bidoubiwa a=meili-bot

Related to this issue: meilisearch/integration-guides#251

This PR:
- gathers the changes related to the next Meilisearch release (v1.1.0) so that this package is ready when the official release is out.
- should pass the tests against the [latest pre-release of Meilisearch](https://github.com/meilisearch/meilisearch/releases).
- might eventually contain test failures until the Meilisearch v1.1.0 is out.

⚠️ This PR should NOT be merged until the next release of Meilisearch (v1.1.0) is out.

_This PR is auto-generated for the [pre-release week](https://github.com/meilisearch/integration-guides/blob/main/resources/pre-release-week.md) purpose._

Done:
- #714 
- #716

726: Update version for the next release (v0.26.0) r=bidoubiwa a=meili-bot

This version makes this package compatible with Meilisearch v1.1 🎉
Check out the changelog of [Meilisearch v1.1](https://github.com/meilisearch/meilisearch/releases/tag/v1.1.0) for more information on the changes.

## ⚠️ Breaking changes

* Change error names from MeiliSerach to Meilisearch (#720) `@sanders41`

## 🚀 Enhancements

- Add the ability to provide a specific `csv-delimiter` when adding and updating documents in CSV format (#716) `@alallema`
- New method `client.multi_search()` provides the possibility to make multiple requests at once (#714) `@alallema`

Example:
```python
  client.multi_search(
    [
      {'indexUid': 'movies', 'q': 'pooh', 'limit': 5},
      {'indexUid': 'movies', 'q': 'nemo', 'limit': 5},
      {'indexUid': 'movie_ratings', 'q': 'us'}
    ]
  )
```


Thanks again to `@alallema` and `@sanders41!` 🎉

Co-authored-by: meili-bot <74670311+meili-bot@users.noreply.github.com>
Co-authored-by: alallema <amelie@meilisearch.com>
Co-authored-by: Amélie <alallema@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants