Configure Csv delimiter #716

alallema · 2023-03-20T17:43:45Z

Add the ability to provide a CSV delimiter when adding and updating CSV documents, as described in the specification.

SDK requirements: meilisearch/integration-guides#251

tests/index/test_index_document_meilisearch.py

meilisearch/index.py

Co-authored-by: Paul Sanders <psanders1@gmail.com>
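For context, here is a minimal sketch, using only Python's standard `csv` module, of why the delimiter has to be configurable. With the default comma, the semicolon-separated header of datasets/songs_custom_delimiter.csv (added in this PR) is read as a single field. This only illustrates CSV parsing in general. It does not show how Meilisearch parses CSV internally, and `split_header` is a hypothetical helper.

```python
import csv
import io
from typing import List

# Header line of the semicolon-separated test dataset added in this PR.
HEADER = "id;title;album;artist;genre;country;released;duration;released-timestamp;duration-float"


def split_header(line: str, delimiter: str = ",") -> List[str]:
    """Parse a single CSV line with the given delimiter and return its fields (hypothetical helper)."""
    reader = csv.reader(io.StringIO(line), delimiter=delimiter)
    return next(reader)


# With the default comma delimiter, the whole line is one field.
print(len(split_header(HEADER)))       # 1
# With ";" the line splits into the ten expected column names.
print(len(split_header(HEADER, ";")))  # 10
```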

brunoocasali

I left some comments. If you apply the test-related ones, please apply them to both test cases :)

brunoocasali · 2023-03-22T18:13:13Z

datasets/songs_custom_delimiter.csv

@@ -0,0 +1,54 @@
+id;title;album;artist;genre;country;released;duration;released-timestamp;duration-float


I think we could reduce this CSV file to a few lines :)

Yes! Though, in case you were wondering, thanks to @sanders41 the tests have been excluded from the package build since #367.

brunoocasali · 2023-03-22T18:18:53Z

meilisearch/index.py

+        parameters = {}
+        if primary_key:
+            parameters["primaryKey"] = primary_key
+        if csv_delimiter:
+            parameters["csvDelimiter"] = csv_delimiter
+        if primary_key is None and csv_delimiter is None:


I don't much like seeing these if conditions, because they usually hide code that will grow indefinitely. WDYT about this?

Suggested change

-        parameters = {}
-        if primary_key:
-            parameters["primaryKey"] = primary_key
-        if csv_delimiter:
-            parameters["csvDelimiter"] = csv_delimiter
-        if primary_key is None and csv_delimiter is None:
+        parameters = { "csvDelimiter": csv_delimiter, "primaryKey": primary_key }
+        parameters = dict((k, v) for k, v in parameters.items() if v)
+        if primary_key is None and csv_delimiter is None:

This will work, but I would argue it is less readable and it is not obvious what you are doing. If you do go in this direction, you could clean it up a bit:

parameters = {k: v for k, v in parameters.items() if v}

Either way, it could also be worth short-circuiting so that parameters isn't built and filtered when it isn't needed:

def _build_url(
    self,
    primary_key: Optional[str] = None,
    csv_delimiter: Optional[str] = None,
) -> str:
    if primary_key is None and csv_delimiter is None:
        return f"{self.config.paths.index}/{self.uid}/{self.config.paths.document}"
    parameters = {"csvDelimiter": csv_delimiter, "primaryKey": primary_key}
    parameters = {k: v for k, v in parameters.items() if v}
    return f"{self.config.paths.index}/{self.uid}/{self.config.paths.document}?{parse.urlencode(parameters)}"

I agree with sanders41. In general, I prefer to avoid Python's "magic" one-liners because they make the code harder to understand.
And we should not need to add a new one. But if we do, I will switch to your code!
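A sketch of one subtlety in the `_build_url` version proposed above: the short-circuit tests `is None`, while the filter drops every falsy value. So with `primary_key=None` and `csv_delimiter=""`, neither the early return nor the filter keeps anything, and the URL ends with a bare `?` (probably harmless, but untidy). The standalone function below is hypothetical, not the SDK's method. Its `base` argument stands in for `f"{self.config.paths.index}/{self.uid}/{self.config.paths.document}"`, and it keeps the explicit ifs while short-circuiting on the filtered result instead.

```python
from typing import Optional
from urllib import parse


def build_documents_url(
    base: str,
    primary_key: Optional[str] = None,
    csv_delimiter: Optional[str] = None,
) -> str:
    """Append primaryKey/csvDelimiter query parameters to a documents URL.

    Hypothetical standalone variant of the _build_url method discussed above.
    """
    parameters = {}
    if primary_key:
        parameters["primaryKey"] = primary_key
    if csv_delimiter:
        parameters["csvDelimiter"] = csv_delimiter
    # Short-circuit on the collected parameters rather than on `is None`,
    # so empty strings never produce a trailing "?".
    if not parameters:
        return base
    return f"{base}?{parse.urlencode(parameters)}"
```

For example, `build_documents_url("indexes/songs/documents", csv_delimiter=";")` yields `indexes/songs/documents?csvDelimiter=%3B`, because `urlencode` percent-encodes the semicolon.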

tests/conftest.py

brunoocasali · 2023-03-22T18:22:48Z

tests/index/test_index_document_meilisearch.py

+    index = empty_index("csv-delimiter")
+    response = index.add_documents_csv(songs_csv_custom_separator, csv_delimiter=";")
+    assert isinstance(response, TaskInfo)
+    assert response.task_uid is not None


Is this really needed?

Not really, but all the other tests are like that, so it would feel odd to remove it for this one.

brunoocasali · 2023-03-22T18:23:23Z

tests/index/test_index_document_meilisearch.py

+    assert isinstance(response, TaskInfo)
+    assert response.task_uid is not None
+    task = index.wait_for_task(response.task_uid)
+    assert task.status == "succeeded"


Same for the task status: if the wait did not work, your test would fail anyway, right?

Isn't it possible for the task to complete but with a failed status? I think that is what she is testing: that it succeeded.

I get the idea. My point is that the assertion has no practical value, because if the task fails, assert task.details["receivedDocuments"] == 53 will fail, and so will the assertions after get_documents.

We have 6 assertions in this test case, but only the last two (plus the receivedDocuments assertion) check what the use case is meant to verify: that the documents were indexed properly.

I know test code is supposed to be explicit instead of implicit, but in this case, I struggle to find good reasons to keep those assertions everywhere.

When you read this:

def test_add_documents_csv_with_delimiter(empty_index, songs_csv_custom_separator):
    """Tests adding new documents to a clean index."""
    index = empty_index("csv-delimiter")
    response = index.add_documents_csv(songs_csv_custom_separator, csv_delimiter=";")
    task = index.wait_for_task(response.task_uid)
    assert task.details["receivedDocuments"] == 53
    documents = index.get_documents().results
    assert documents[1].country == "Europe"
    assert documents[4].artist == "Elton John"

Would you miss the removed assertions?

709: Changes related to the next Meilisearch release (v1.1.0) r=bidoubiwa a=meili-bot

Related to this issue: meilisearch/integration-guides#251

This PR:
- gathers the changes related to the next Meilisearch release (v1.1.0) so that this package is ready when the official release is out.
- should pass the tests against the [latest pre-release of Meilisearch](https://github.com/meilisearch/meilisearch/releases).
- might eventually contain test failures until the Meilisearch v1.1.0 is out.

⚠️ This PR should NOT be merged until the next release of Meilisearch (v1.1.0) is out.

_This PR is auto-generated for the [pre-release week](https://github.com/meilisearch/integration-guides/blob/main/resources/pre-release-week.md) purpose._

Done:
- #714
- #716

726: Update version for the next release (v0.26.0) r=bidoubiwa a=meili-bot

This version makes this package compatible with Meilisearch v1.1 🎉
Check out the changelog of [Meilisearch v1.1](https://github.com/meilisearch/meilisearch/releases/tag/v1.1.0) for more information on the changes.

## ⚠️ Breaking changes

* Change error names from MeiliSerach to Meilisearch (#720) `@sanders41`

## 🚀 Enhancements

- Add the ability to provide a specific `csv-delimiter` when adding and updating documents in CSV format (#716) `@alallema`
- New method `client.multi_search()` provides the possibility to make multiple requests at once (#714) `@alallema`

Example:

```python
client.multi_search(
  [
    {'indexUid': 'movies', 'q': 'pooh', 'limit': 5},
    {'indexUid': 'movies', 'q': 'nemo', 'limit': 5},
    {'indexUid': 'movie_ratings', 'q': 'us'}
  ]
)
```

Thanks again to `@alallema` and `@sanders41`! 🎉

Co-authored-by: meili-bot <74670311+meili-bot@users.noreply.github.com>
Co-authored-by: alallema <amelie@meilisearch.com>
Co-authored-by: Amélie <alallema@users.noreply.github.com>

Add csv delimiter

a9eae23

alallema marked this pull request as ready for review March 21, 2023 10:41

alallema requested a review from brunoocasali March 21, 2023 10:41

alallema mentioned this pull request Mar 21, 2023

Changes related to the next Meilisearch release (v1.1.0) #709 (merged)

brunoocasali reviewed Mar 21, 2023


tests/index/test_index_document_meilisearch.py (resolved)

tests/index/test_index_document_meilisearch.py (resolved)

meilisearch/index.py (resolved)

Modify build_url method to handle csv_delimiter

bf25a80

sanders41 reviewed Mar 21, 2023


meilisearch/index.py (outdated, resolved)

meilisearch/index.py (resolved)

meilisearch/index.py (outdated, resolved)

meilisearch/index.py (resolved)

meilisearch/index.py (resolved)

alallema and others added 2 commits March 21, 2023 14:46

Update meilisearch/index.py

b1b9113

Co-authored-by: Paul Sanders <psanders1@gmail.com>

Update meilisearch/index.py

a221f94

Co-authored-by: Paul Sanders <psanders1@gmail.com>

sanders41 approved these changes Mar 21, 2023


Add check for content type if csv delimiter is used

e53b5f7
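The commit above adds a content-type check when a CSV delimiter is used. Its actual implementation is not shown in this thread, so the following is only a hypothetical sketch of such a guard. It assumes the delimiter is meaningful only for `text/csv` payloads, and the function name and the choice of `ValueError` are illustrative, not the SDK's API.

```python
from typing import Optional


def check_csv_delimiter(content_type: str, csv_delimiter: Optional[str]) -> None:
    """Reject a CSV delimiter sent with a non-CSV payload (hypothetical helper)."""
    # Assumption: only text/csv bodies can use a custom delimiter.
    if csv_delimiter is not None and content_type != "text/csv":
        raise ValueError(
            f"csv_delimiter can only be used with text/csv, got {content_type!r}"
        )
```

Failing early on the client side like this gives a clearer error than letting the server reject (or silently ignore) a parameter that does not apply to JSON or NDJSON payloads.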

alallema force-pushed the csv-delimiter branch from 24065cb to e53b5f7 March 21, 2023 14:14

alallema requested a review from brunoocasali March 21, 2023 14:42

alallema force-pushed the csv-delimiter branch from 25a1539 to b6739fd March 21, 2023 14:54

Adapt the tests for csv delimiter

5e6feaf

alallema force-pushed the csv-delimiter branch from b6739fd to 5e6feaf March 21, 2023 15:03

brunoocasali approved these changes Mar 22, 2023


Reduce dataset file

cb2212a

alallema force-pushed the csv-delimiter branch from 29fa41a to cb2212a March 23, 2023 10:10

alallema merged commit 98e4751 into bump-meilisearch-v1.1.0 Mar 23, 2023

alallema deleted the csv-delimiter branch March 23, 2023 12:44

alallema added the enhancement New feature or request label Mar 27, 2023

alallema mentioned this pull request Mar 29, 2023

Update version for the next release (v0.26.0) #726 (merged)
