
feat: Adds German compound words decomposition with new segmenter #303

Merged: 16 commits, merged on Sep 10, 2024


luflow (Contributor) commented on Aug 9, 2024

Pull Request

What does this PR do?

PR checklist

Please check if your PR fulfills the following requirements:

  • Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
  • Have you read the contributing guidelines?
  • Have you made sure that the title is accurate and descriptive of the changes?

luflow (Contributor, Author) commented on Aug 9, 2024

I assume this could be a very expensive algorithm, since every possible part length is checked against the dictionary?

Not sure if there is a better solution, but at least it's a first version for compound words :)
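To make the cost concern concrete, here is a minimal Rust sketch of the naive approach (an illustration, not the code from this PR): at each position, try every possible part length against a `HashSet`, longest first, and recurse on the remainder. The function names and the four-word dictionary are hypothetical. Each recursion level probes up to one lookup per remaining character, so even a successful split costs up to a quadratic number of lookups in the word length, and failed branches that force backtracking make it worse.

```rust
use std::collections::HashSet;

/// Naive decompounding (hypothetical sketch): returns the first complete split
/// found, longest parts first, plus the number of dictionary lookups it took.
fn decompose_naive<'a>(word: &'a str, dict: &HashSet<&str>) -> (Option<Vec<&'a str>>, usize) {
    let mut lookups = 0;
    let parts = split(word, dict, &mut lookups);
    (parts, lookups)
}

fn split<'a>(rest: &'a str, dict: &HashSet<&str>, lookups: &mut usize) -> Option<Vec<&'a str>> {
    if rest.is_empty() {
        return Some(Vec::new());
    }
    // Byte offsets that end a prefix of 1, 2, ..., n characters (UTF-8 safe).
    let mut ends: Vec<usize> = rest.char_indices().map(|(i, _)| i).skip(1).collect();
    ends.push(rest.len());
    // Every candidate length costs one hash lookup: this is the expensive part.
    for &end in ends.iter().rev() {
        *lookups += 1;
        if dict.contains(&rest[..end]) {
            if let Some(mut tail) = split(&rest[end..], dict, lookups) {
                tail.insert(0, &rest[..end]);
                return Some(tail);
            }
        }
    }
    None
}

fn main() {
    let dict: HashSet<&str> = ["donau", "dampf", "schiff", "fahrt"].iter().copied().collect();
    let (parts, lookups) = decompose_naive("donaudampfschifffahrt", &dict);
    println!("{:?} after {} lookups", parts, lookups);
}
```

For this 21-character word the split succeeds after 36 lookups, of which only four are hits.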

luflow (Contributor, Author) commented on Aug 10, 2024

Also, another open question: can we even use the dictionary?

The original author published it under the GNU GPL:
https://github.com/uschindler/german-decompounder/blob/master/NOTICE.txt

curquiza requested a review from ManyTheFish on August 12, 2024 08:08
luflow (Contributor, Author) commented on Aug 12, 2024

@curquiza @ManyTheFish I fixed the fmt and clippy issues, please rerun the checks.

ManyTheFish (Member) left a review comment:

Hello @luflow,

Could you add a feature flag for your implementation, as I suggested, please? Then add it as a default feature in the Cargo.toml file.

In terms of implementation, you chose to rely on a HashSet to split your words, but I don't think it's the best approach.
I highly suggest using an FstSegmenter like in the Thai tokenizer: it's a bit complex to build, but far more efficient in time and space. Alternatively, you could use an AhoCorasick automaton with the LeftmostLongest match kind.

Sorry for the delays!
Let me know if you have a question
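For comparison with the `HashSet` approach, the following std-only Rust sketch imitates the LeftmostLongest semantics suggested above: scanning left to right, it takes the longest dictionary word that starts at the current position and jumps past it. It is an illustrative stand-in, not the `aho-corasick` crate's API (a real automaton finds candidates in a single pass instead of probing substrings); the `max_len` bound, the function name, and grouping unmatched characters into gap segments are assumptions of this sketch.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for leftmost-longest matching: at each position take
/// the longest dictionary word (up to `max_len` chars) starting there.
/// Characters not covered by any word are grouped into gap segments.
fn segment_leftmost_longest<'a>(text: &'a str, dict: &HashSet<&str>, max_len: usize) -> Vec<&'a str> {
    let mut out = Vec::new();
    let mut gap_start: Option<usize> = None;
    let mut pos = 0;
    while pos < text.len() {
        // Byte offsets closing candidates of 1..=max_len characters.
        let ends: Vec<usize> = text[pos..]
            .char_indices()
            .skip(1)
            .map(|(i, _)| pos + i)
            .chain(std::iter::once(text.len()))
            .take(max_len)
            .collect();
        // Longest candidate first: this is what "LeftmostLongest" means here.
        let longest = ends.iter().rev().find(|&&end| dict.contains(&text[pos..end])).copied();
        match longest {
            Some(end) => {
                if let Some(start) = gap_start.take() {
                    out.push(&text[start..pos]);
                }
                out.push(&text[pos..end]);
                pos = end;
            }
            None => {
                gap_start.get_or_insert(pos);
                // Advance by one character, not one byte.
                pos += text[pos..].chars().next().map_or(1, char::len_utf8);
            }
        }
    }
    if let Some(start) = gap_start {
        out.push(&text[start..]);
    }
    out
}

fn main() {
    let dict: HashSet<&str> = ["haus", "tür", "schlüssel", "haustür"].iter().copied().collect();
    // The longest match wins at each position: "haustür" rather than "haus" + "tür".
    println!("{:?}", segment_leftmost_longest("haustürschlüssel", &dict, 16));
}
```

Because the longest match always wins, the order in which dictionary words are listed does not matter, which is the practical difference from a leftmost-first match kind.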

charabia/src/segmenter/mod.rs: three review threads resolved (one outdated)
luflow (Contributor, Author) commented on Aug 27, 2024

Hi @ManyTheFish!

Do you have any instructions for building the fst file? I could not find any material online, especially since the name FST is also used in other contexts, like R, for something totally different 🤣

Otherwise, the leftmost-longest matching also works with a word dictionary, if I understand it correctly?

ManyTheFish (Member) replied:

> Do you have any instructions to build the fst file? I could not find any material online - especially because FST is also used in other contexts like R but does something totally different 🤣

You can use the `fst-bin` CLI to build your dictionary from a source file. 😄

> Otherwise the leftmostmatch functionality also works with a word dictionary if i understand it correctly?

Yes, you can build it from an iterator over `str`, so it's convenient.
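Whichever structure is built from the word list, ordered-set builders such as the `fst` crate's `Set::from_iter` expect keys in lexicographic order, so the source dictionary usually needs a preprocessing pass. A minimal sketch, assuming a one-word-per-line source file and that the segmenter sees lowercased text (both are assumptions, not details from this PR; the function name is hypothetical):

```rust
/// Hypothetical preprocessing: one word per line in, sorted unique lowercase words out.
fn prepare_word_list(raw: &str) -> Vec<String> {
    let mut words: Vec<String> = raw
        .lines()
        .map(|line| line.trim().to_lowercase())
        .filter(|w| !w.is_empty())
        .collect();
    // `String` orders by UTF-8 bytes, which matches the byte-wise key order
    // an FST set builder requires.
    words.sort();
    // Drop duplicates so the keys are strictly increasing.
    words.dedup();
    words
}

fn main() {
    let raw = "Schiff\nhaus\nschiff\n\n  Dampf  \n";
    // The resulting list could then be fed to an ordered-set builder.
    println!("{:?}", prepare_word_list(raw));
}
```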

luflow requested a review from ManyTheFish on August 28, 2024 11:16
luflow (Contributor, Author) commented on Aug 28, 2024

@ManyTheFish I extended the FstSegmenter with two options: a minimum lemma length, and a way to stop the segmenter from emitting single letters. That keeps my dictionary even smaller and may also be useful for other languages later?

The dictionary is now also transformed into an FST file.

Let me know what you think :)
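The two options described above can be sketched as follows. The struct and field names (`DecompoundOptions`, `min_lemma_len`, `allow_single_letters`) are hypothetical and do not reflect charabia's actual `FstSegmenter` fields; the sketch uses a `HashSet` and a longest-first backtracking search instead of an FST, and falls back to the unsplit word when no complete decomposition satisfies the options.

```rust
use std::collections::HashSet;

/// Hypothetical options mirroring the two knobs described above.
struct DecompoundOptions {
    /// Ignore dictionary hits shorter than this many characters.
    min_lemma_len: usize,
    /// When false, never emit a one-character segment.
    allow_single_letters: bool,
}

/// Returns the parts, or the unsplit word if no valid decomposition exists.
fn decompound<'a>(word: &'a str, dict: &HashSet<&str>, opts: &DecompoundOptions) -> Vec<&'a str> {
    split_with_options(word, dict, opts).unwrap_or_else(|| vec![word])
}

fn split_with_options<'a>(rest: &'a str, dict: &HashSet<&str>, opts: &DecompoundOptions) -> Option<Vec<&'a str>> {
    if rest.is_empty() {
        return Some(Vec::new());
    }
    let mut ends: Vec<usize> = rest.char_indices().map(|(i, _)| i).skip(1).collect();
    ends.push(rest.len());
    // `ends[k]` closes a prefix of k + 1 characters; iterate longest first.
    for (k, &end) in ends.iter().enumerate().rev() {
        let n_chars = k + 1;
        let rejected = n_chars < opts.min_lemma_len || (n_chars == 1 && !opts.allow_single_letters);
        let piece = &rest[..end];
        if rejected || !dict.contains(piece) {
            continue;
        }
        if let Some(mut tail) = split_with_options(&rest[end..], dict, opts) {
            tail.insert(0, piece);
            return Some(tail);
        }
    }
    None
}

fn main() {
    let dict: HashSet<&str> = ["arbeit", "s", "amt"].iter().copied().collect();
    let loose = DecompoundOptions { min_lemma_len: 1, allow_single_letters: true };
    let strict = DecompoundOptions { min_lemma_len: 1, allow_single_letters: false };
    println!("{:?}", decompound("arbeitsamt", &dict, &loose)); // linking "s" kept as a part
    println!("{:?}", decompound("arbeitsamt", &dict, &strict)); // no valid split, word kept whole
}
```

With single letters disabled, a linking "s" such as the one in `arbeitsamt` can no longer be emitted on its own, so in this sketch the dictionary would need an entry like `arbeits` to split that word.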

luflow (Contributor, Author) commented on Sep 7, 2024

@ManyTheFish did you find time yet to look over the changes? Do you need anything else from my side? :)

ManyTheFish previously approved these changes on Sep 9, 2024

ManyTheFish (Member) left a review comment:

Hello @luflow,
sorry for the delay, LGTM!

bors merge

meili-bors bot added a commit that referenced this pull request Sep 9, 2024
303: feat: Adds German compound words decomposition with new segmenter r=ManyTheFish a=luflow

# Pull Request

## What does this PR do?
- Adds a first version of dictionary-based decomposition for German compound words (dictionary based on https://github.com/uschindler/german-decompounder/)
- Adds a benchmark with German sentences

## PR checklist
Please check if your PR fulfills the following requirements:
- [X] Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
- [X] Have you read the contributing guidelines?
- [X] Have you made sure that the title is accurate and descriptive of the changes?


Co-authored-by: Florian Ludwig <florian.ludwig@uninow.de>
Co-authored-by: Florian Ludwig <florian@krautnerds.de>
meili-bors bot (Contributor) commented on Sep 9, 2024

Build failed:

Co-authored-by: Many the fish <many@meilisearch.com>
luflow (Contributor, Author) commented on Sep 9, 2024

@ManyTheFish OK, I applied the suggestion :)

luflow requested a review from ManyTheFish on September 9, 2024 09:54
ManyTheFish (Member) commented:

Hello @luflow,

the tests and clippy are failing.

Could you make sure that:

  • cargo clippy
  • cargo test

run successfully on your machine, please?

I'll merge as soon as the tests pass 😃

luflow (Contributor, Author) commented on Sep 9, 2024

@ManyTheFish done 👍🏻

ManyTheFish (Member) left a review comment:

Nice!

Thank you for the contribution!

bors merge

meili-bors bot (Contributor) commented on Sep 10, 2024

Build succeeded:

meili-bors bot merged commit 38b8529 into meilisearch:main on Sep 10, 2024
4 checks passed
luflow deleted the feature/german-compound-words branch on September 10, 2024 20:54
spiceratops added a commit to spiceratops/k8s-gitops that referenced this pull request Jan 5, 2025
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [getmeili/meilisearch](https://redirect.github.com/meilisearch/meilisearch) | minor | `v1.10.3` -> `v1.12.1` |

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency Dashboard for more information.

---

### Release Notes

<details>
<summary>meilisearch/meilisearch (getmeili/meilisearch)</summary>

### [`v1.12.1`](https://redirect.github.com/meilisearch/meilisearch/releases/tag/v1.12.1)

[Compare
Source](https://redirect.github.com/meilisearch/meilisearch/compare/v1.12.0...v1.12.1)

#### Fixes

Adding an empty payload triggered a bug in the engine that made the
batch fail.
Fixed by [@&#8203;irevoire](https://redirect.github.com/irevoire) in
[https://github.com/meilisearch/meilisearch/pull/5192](https://redirect.github.com/meilisearch/meilisearch/pull/5192)

**Full Changelog**:
meilisearch/meilisearch@v1.12.0...v1.12.1

### [`v1.12.0`](https://redirect.github.com/meilisearch/meilisearch/releases/tag/v1.12.0): 🦗

[Compare
Source](https://redirect.github.com/meilisearch/meilisearch/compare/v1.11.3...v1.12.0)

Meilisearch v1.12 introduces significant indexing speed improvements,
almost halving the time required to index large datasets. This release
also introduces new settings to customize and potentially further
increase indexing speed.

🧰 All official Meilisearch integrations (including SDKs, clients, and
other tools) are compatible with this Meilisearch release. Integration
deployment happens between 4 and 48 hours after a new version becomes
available.

Some SDKs might not include all new features. Consult the project
repository for detailed information. Is a feature you need missing from
your chosen SDK? Create an issue letting us know you need it, or, for
open-source karma points, open a PR implementing it (we'll love you for
that ❤️).

### New features and updates 🔥

#### Improve indexing speed

Indexing time is improved across the board!

- Performance is maintained or better on smaller machines
- On bigger machines with multiple cores and good IO, Meilisearch v1.12
  is much faster than Meilisearch v1.11
  - More than twice as fast for raw document insertion tasks
  - More than 4x as fast for incrementally updating documents in a large
    database
  - Embedding generation is also up to 1.5x faster for some workloads

The new indexer also makes task cancellation faster.

Done by [@&#8203;dureuill](https://redirect.github.com/dureuill),
[@&#8203;ManyTheFish](https://redirect.github.com/ManyTheFish), and
[@&#8203;Kerollmops](https://redirect.github.com/Kerollmops) in
[#&#8203;4900](https://redirect.github.com/meilisearch/meilisearch/issues/4900).

#### New index settings: use `facetSearch` and `prefixSearch` to improve indexing speed

v1.12 introduces two new index settings: `facetSearch` and
`prefixSearch`.

Both settings allow you to skip parts of the indexing process. This
leads to significant improvements to indexing speed, but may negatively
impact search experience in some use cases.

Done by [@&#8203;ManyTheFish](https://redirect.github.com/ManyTheFish)
in
[#&#8203;5091](https://redirect.github.com/meilisearch/meilisearch/issues/5091)

##### `facetSearch`

Use this setting to toggle [facet
search](https://www.meilisearch.com/docs/learn/filtering_and_sorting/search_with_facet_filters#searching-facet-values):

```bash
curl \
  -X PUT 'http://localhost:7700/indexes/books/settings/facet-search' \
  -H 'Content-Type: application/json' \
  --data-binary 'true'
```

The default value for `facetSearch` is `true`. When set to `false`, this
setting disables facet search for all filterable attributes in an index.

##### `prefixSearch`

Use this setting to configure the ability to [search a word by
prefix](https://www.meilisearch.com/docs/learn/engine/prefix) on an
index:

```bash
curl \
  -X PUT 'http://localhost:7700/indexes/books/settings/prefix-search' \
  -H 'Content-Type: application/json' \
  --data-binary '"disabled"'
```

`prefixSearch` accepts one of the following values:

- `"indexingTime"`: enables prefix processing during indexing. This is
  the default Meilisearch behavior
- `"disabled"`: deactivates prefix search completely

Disabling prefix search means the query `he` will no longer match the
word `hello`. This may significantly impact search result relevancy, but
speeds up the indexing process.
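To illustrate what prefix matching means (the query `he` matching `hello`), here is a toy Rust sketch over an ordered word set: all words sharing a prefix occupy one contiguous range, so a range scan finds them. It only illustrates the concept and is not how Meilisearch builds its prefix data at indexing time; the function name is hypothetical. With `prefixSearch` set to `"disabled"`, only the exact-match lookup at the end would apply.

```rust
use std::collections::BTreeSet;
use std::ops::Bound;

/// All words starting with `prefix` form one contiguous run in sorted order,
/// so a range scan that stops at the first non-matching word finds them all.
fn words_with_prefix<'a>(words: &'a BTreeSet<String>, prefix: &str) -> Vec<&'a str> {
    words
        .range::<str, _>((Bound::Included(prefix), Bound::Unbounded))
        .take_while(|w| w.starts_with(prefix))
        .map(|w| w.as_str())
        .collect()
}

fn main() {
    let words: BTreeSet<String> = ["hello", "help", "hero", "world", "he"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    // Prefix search enabled: "he" also matches longer words.
    println!("{:?}", words_with_prefix(&words, "he"));
    // Prefix search disabled: only an exact match counts.
    println!("{}", words.contains("he"));
}
```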

#### New API route: `/batches`

The new `/batches` endpoint allows you to query information about task
batches.

`GET` `/batches` returns a list of batch objects:

```sh
curl  -X GET 'http://localhost:7700/batches'
```

This endpoint accepts the same parameters as the `GET` `/tasks` route,
allowing you to narrow down which batches you want to see. Parameters
used with `GET` `/batches` apply to the tasks, not the batches
themselves. For example, `GET /batches?uid=0` returns batches containing
tasks with a `taskUid` of `0`, not batches with a `batchUid` of `0`.
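The filtering semantics can be modeled in a few lines of Rust. The `Batch` struct and `batches_containing_task` function below are hypothetical simplifications (real batch objects do not expose a task uid list in this form); they only show that a `uid` filter on `/batches` selects batches by the tasks they contain:

```rust
/// Hypothetical, simplified batch: only the fields needed to show the filter semantics.
struct Batch {
    batch_uid: u32,
    task_uids: Vec<u32>,
}

/// Models `GET /batches?uid=<task_uid>`: the uid is matched against the tasks
/// inside each batch, not against the batch's own uid.
fn batches_containing_task(batches: &[Batch], task_uid: u32) -> Vec<u32> {
    batches
        .iter()
        .filter(|b| b.task_uids.contains(&task_uid))
        .map(|b| b.batch_uid)
        .collect()
}

fn main() {
    let batches = vec![
        Batch { batch_uid: 0, task_uids: vec![1, 2] },
        Batch { batch_uid: 1, task_uids: vec![0] },
    ];
    // Task 0 lives in batch 1, so batch 0 is not returned.
    println!("{:?}", batches_containing_task(&batches, 0));
}
```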

You may also query `GET` `/batches/:uid` to retrieve information about a
single batch object:

```sh
curl  -X GET 'http://localhost:7700/batches/BATCH_UID'
```

`/batches/:uid` does not accept any parameters.

Batch objects contain the following fields:

```json5
{
  "uid": 160,
  "progress": {
    "steps": [
      {
        "currentStep": "processing tasks",
        "finished": 0,
        "total": 2
      },
      {
        "currentStep": "indexing",
        "finished": 2,
        "total": 3
      },
      {
        "currentStep": "extracting words",
        "finished": 3,
        "total": 13
      },
      {
        "currentStep": "document",
        "finished": 12300,
        "total": 19546
      }
    ],
    "percentage": 37.986263
  },
  "details": {
    "receivedDocuments": 19547,
    "indexedDocuments": null
  },
  "stats": {
    "totalNbTasks": 1,
    "status": {
      "processing": 1
    },
    "types": {
      "documentAdditionOrUpdate": 1
    },
    "indexUids": {
      "mieli": 1
    }
  },
  "duration": null,
  "startedAt": "2024-12-12T09:44:34.124726733Z",
  "finishedAt": null
}
```

Additionally, task objects now include a new field, `batchUid`. Use this
field together with `/batches/:uid` to retrieve data on a specific
batch.

```json5
{
  "uid": 154,
  "batchUid": 142,
  "indexUid": "movies_test2",
  "status": "succeeded",
  "type": "documentAdditionOrUpdate",
  "canceledBy": null,
  "details": {
    "receivedDocuments": 1,
    "indexedDocuments": 1
  },
  "error": null,
  "duration": "PT0.027766819S",
  "enqueuedAt": "2024-12-02T14:07:34.974430765Z",
  "startedAt": "2024-12-02T14:07:34.99021667Z",
  "finishedAt": "2024-12-02T14:07:35.017983489Z"
}
```

Done by [@&#8203;irevoire](https://redirect.github.com/irevoire) in
[#&#8203;5060](https://redirect.github.com/meilisearch/meilisearch/issues/5060),
[#&#8203;5070](https://redirect.github.com/meilisearch/meilisearch/issues/5070),
[#&#8203;5080](https://redirect.github.com/meilisearch/meilisearch/issues/5080)

#### Other improvements

- New query parameter for `GET` `/tasks`: `reverse`. If `reverse` is set
to `true`, tasks will be returned in reversed order, from oldest to
newest tasks. Done by
[@&#8203;irevoire](https://redirect.github.com/irevoire) in
[#&#8203;5048](https://redirect.github.com/meilisearch/meilisearch/issues/5048)
- Phrase searches with `showMatchesPosition` set to `true` give a single
location for the whole phrase, by
[@&#8203;flevi29](https://redirect.github.com/flevi29) in
[#&#8203;4928](https://redirect.github.com/meilisearch/meilisearch/issues/4928)
- New Prometheus metrics by
[@&#8203;PedroTurik](https://redirect.github.com/PedroTurik) in
[#&#8203;5044](https://redirect.github.com/meilisearch/meilisearch/issues/5044)
- When a query finds matching terms in document fields with array
values, Meilisearch now includes an `indices` field in
`_matchesPosition` specifying which array elements contain the matches,
by
[@&#8203;LukasKalbertodt](https://redirect.github.com/LukasKalbertodt)
in
[#&#8203;5005](https://redirect.github.com/meilisearch/meilisearch/issues/5005)
- ⚠️ Breaking `vectorStore` change: field distribution no longer
contains `_vectors`. Its value used to be incorrect, and there is no
current use case for the fixed, most likely empty, value. Done as part
of
[#&#8203;4900](https://redirect.github.com/meilisearch/meilisearch/issues/4900)
- Improve error message by adding index name in
[#&#8203;5056](https://redirect.github.com/meilisearch/meilisearch/issues/5056)
by [@&#8203;airycanon](https://redirect.github.com/airycanon)

### Fixes 🐞

- Return appropriate error when primary key is greater than 512 bytes,
by [@&#8203;flevi29](https://redirect.github.com/flevi29) in
[#&#8203;4930](https://redirect.github.com/meilisearch/meilisearch/issues/4930)
- Fix issue where numbers were segmented in different ways depending on
tokenizer, by [@&#8203;dqkqd](https://redirect.github.com/dqkqd) in
[https://github.com/meilisearch/charabia/pull/311](https://redirect.github.com/meilisearch/charabia/pull/311)
- Fix pagination when embedding fails by
[@&#8203;dureuill](https://redirect.github.com/dureuill) in
[https://github.com/meilisearch/meilisearch/pull/5063](https://redirect.github.com/meilisearch/meilisearch/pull/5063)
- Fix issue causing Meilisearch to ignore stop words in some cases by
[@&#8203;ManyTheFish](https://redirect.github.com/ManyTheFish) in
[#&#8203;5062](https://redirect.github.com/meilisearch/meilisearch/issues/5062)
- Fix phrase search with `attributesToSearchOn` in
[#&#8203;5062](https://redirect.github.com/meilisearch/meilisearch/issues/5062)
by [@&#8203;ManyTheFish](https://redirect.github.com/ManyTheFish)

### Misc

- Dependencies updates
  - Update benchmarks to match the new crates subfolder by
    [@&#8203;Kerollmops](https://redirect.github.com/Kerollmops) in
    [#&#8203;5021](https://redirect.github.com/meilisearch/meilisearch/issues/5021)
  - Fix the benchmarks by
    [@&#8203;irevoire](https://redirect.github.com/irevoire) in
    [#&#8203;5037](https://redirect.github.com/meilisearch/meilisearch/issues/5037)
  - Bump Swatinem/rust-cache from 2.7.1 to 2.7.5 in
    [#&#8203;5030](https://redirect.github.com/meilisearch/meilisearch/issues/5030)
  - Update charabia v0.9.2 by
    [@&#8203;ManyTheFish](https://redirect.github.com/ManyTheFish) in
    [#&#8203;5098](https://redirect.github.com/meilisearch/meilisearch/issues/5098)
  - Update mini-dashboard to v0.2.16 version by
    [@&#8203;curquiza](https://redirect.github.com/curquiza) in
    [#&#8203;5102](https://redirect.github.com/meilisearch/meilisearch/issues/5102)
- CIs and tests
  - Improve performance of `delete_index.rs` by
    [@&#8203;DerTimonius](https://redirect.github.com/DerTimonius) in
    [#&#8203;4963](https://redirect.github.com/meilisearch/meilisearch/issues/4963)
  - Improve performance of `create_index.rs` by
    [@&#8203;DerTimonius](https://redirect.github.com/DerTimonius) in
    [#&#8203;4962](https://redirect.github.com/meilisearch/meilisearch/issues/4962)
  - Improve performance of `get_documents.rs` by
    [@&#8203;PedroTurik](https://redirect.github.com/PedroTurik) in
    [#&#8203;5025](https://redirect.github.com/meilisearch/meilisearch/issues/5025)
  - Improve performance of `formatted.rs` by
    [@&#8203;PedroTurik](https://redirect.github.com/PedroTurik) in
    [#&#8203;5043](https://redirect.github.com/meilisearch/meilisearch/issues/5043)
  - Fix the path used in the flaky tests CI by
    [@&#8203;Kerollmops](https://redirect.github.com/Kerollmops) in
    [#&#8203;5049](https://redirect.github.com/meilisearch/meilisearch/issues/5049)
- Misc
  - Rollback the Meilisearch Kawaii logo by
    [@&#8203;Kerollmops](https://redirect.github.com/Kerollmops) in
    [#&#8203;5017](https://redirect.github.com/meilisearch/meilisearch/issues/5017)
  - Add image source label to Dockerfile by
    [@&#8203;wuast94](https://redirect.github.com/wuast94) in
    [#&#8203;4990](https://redirect.github.com/meilisearch/meilisearch/issues/4990)
  - Hide code complexity into a subfolder by
    [@&#8203;Kerollmops](https://redirect.github.com/Kerollmops) in
    [#&#8203;5016](https://redirect.github.com/meilisearch/meilisearch/issues/5016)
  - Internal tool: implement offline upgrade from v1.10 to v1.11 by
    [@&#8203;irevoire](https://redirect.github.com/irevoire) in
    [#&#8203;5034](https://redirect.github.com/meilisearch/meilisearch/issues/5034)
  - Internal tool: implement offline upgrade from v1.11 to v1.12 by
    [@&#8203;ManyTheFish](https://redirect.github.com/ManyTheFish) in
    [#&#8203;5146](https://redirect.github.com/meilisearch/meilisearch/issues/5146)
  - Meilisearch is now able to retrieve Katakana words from a Hiragana
    query by [@&#8203;tats-u](https://redirect.github.com/tats-u) in
    [https://github.com/meilisearch/charabia/pull/312](https://redirect.github.com/meilisearch/charabia/pull/312)
  - Improve error handling when writing into LMDB by
    [@&#8203;Kerollmops](https://redirect.github.com/Kerollmops) in
    [https://github.com/meilisearch/meilisearch/pull/5089](https://redirect.github.com/meilisearch/meilisearch/pull/5089)

❤️ Thanks again to our external contributors:

- [Meilisearch](https://redirect.github.com/meilisearch/meilisearch):
[@&#8203;airycanon](https://redirect.github.com/airycanon),
[@&#8203;DerTimonius](https://redirect.github.com/DerTimonius),
[@&#8203;flevi29](https://redirect.github.com/flevi29),
[@&#8203;LukasKalbertodt](https://redirect.github.com/LukasKalbertodt),
[@&#8203;PedroTurik](https://redirect.github.com/PedroTurik),
[@&#8203;wuast94](https://redirect.github.com/wuast94)
- [Charabia](https://redirect.github.com/meilisearch/charabia):
[@&#8203;dqkqd](https://redirect.github.com/dqkqd),
[@&#8203;tats-u](https://redirect.github.com/tats-u)

### [`v1.11.3`](https://redirect.github.com/meilisearch/meilisearch/releases/tag/v1.11.3): 🐿️

[Compare
Source](https://redirect.github.com/meilisearch/meilisearch/compare/v1.11.2...v1.11.3)

#### What's Changed

- For REST/OpenAI/ollama autoembedders users: Retry if deserialization
of remote response failed by
[@&#8203;dureuill](https://redirect.github.com/dureuill) in
[https://github.com/meilisearch/meilisearch/pull/5058](https://redirect.github.com/meilisearch/meilisearch/pull/5058)

**Full Changelog**:
meilisearch/meilisearch@v1.11.2...v1.11.3

### [`v1.11.2`](https://redirect.github.com/meilisearch/meilisearch/releases/tag/v1.11.2): 🐿️

[Compare
Source](https://redirect.github.com/meilisearch/meilisearch/compare/v1.11.1...v1.11.2)

#### What's Changed

- Add timeout on read and write operations. by
[@&#8203;dureuill](https://redirect.github.com/dureuill) in
[https://github.com/meilisearch/meilisearch/pull/5051](https://redirect.github.com/meilisearch/meilisearch/pull/5051)

**Full Changelog**:
meilisearch/meilisearch@v1.11.1...v1.11.2

### [`v1.11.1`](https://redirect.github.com/meilisearch/meilisearch/releases/tag/v1.11.1): 🐿️

[Compare
Source](https://redirect.github.com/meilisearch/meilisearch/compare/v1.11.0...v1.11.1)

#### What's Changed

- Add 3s timeout to embedding requests made during search by
[@&#8203;dureuill](https://redirect.github.com/dureuill) in
[https://github.com/meilisearch/meilisearch/pull/5039](https://redirect.github.com/meilisearch/meilisearch/pull/5039)

**Full Changelog**:
meilisearch/meilisearch@v1.11.0...v1.11.1

### [`v1.11.0`](https://redirect.github.com/meilisearch/meilisearch/releases/tag/v1.11.0): 🐿️

[Compare
Source](https://redirect.github.com/meilisearch/meilisearch/compare/v1.10.3...v1.11.0)

Meilisearch v1.11 introduces AI-powered search performance improvements
thanks to binary quantization and various usage changes, all of which
are steps towards a future stabilization of the feature. We have also
improved federated search usage following user feedback.

🧰 All official Meilisearch integrations (including SDKs, clients, and
other tools) are compatible with this Meilisearch release. Integration
deployment happens between 4 and 48 hours after a new version becomes
available.

Some SDKs might not include all new features. Consult the project
repository for detailed information. Is a feature you need missing from
your chosen SDK? Create an issue letting us know you need it, or, for
open-source karma points, open a PR implementing it (we'll love you for
that ❤️).

### New features and updates 🔥

#### Experimental - AI-powered search improvements

This release is Meilisearch's first step towards stabilizing AI-powered
search and introduces a few breaking changes to its API. [Consult the
PRD for full usage
details.](https://www.notion.so/meilisearch/v1-11-AI-search-changes-0e37727193884a70999f254fa953ce6e)

Done by [@&#8203;dureuill](https://redirect.github.com/dureuill) in
[#&#8203;4906](https://redirect.github.com/meilisearch/meilisearch/issues/4906),
[#&#8203;4920](https://redirect.github.com/meilisearch/meilisearch/issues/4920),
[#&#8203;4892](https://redirect.github.com/meilisearch/meilisearch/issues/4892),
and
[#&#8203;4938](https://redirect.github.com/meilisearch/meilisearch/issues/4938).

##### ⚠️ Breaking changes

- When performing AI-powered searches, `hybrid.embedder` is now a
**mandatory** parameter in `GET` and `POST`
`/indexes/{:indexUid}/search`
- As a consequence, it is now **mandatory** to pass `hybrid` even for
pure semantic searches
- `embedder` is now a **mandatory** parameter in `GET` and `POST`
`/indexes/{:indexUid}/similar`
- Meilisearch now ignores `semanticRatio` and performs a pure semantic
search for queries that include `vector` but not `q`

##### Addition & improvements

- The default model for OpenAI is now `text-embedding-3-small` instead
of `text-embedding-ada-002`
- This release introduces a new embedder option:
`documentTemplateMaxBytes`. Meilisearch will truncate a document's
template text when it goes over the specified limit
- Fields in `documentTemplate` include a new `field.is_searchable`
property. The default document template now filters out both empty
fields and fields not in the searchable attributes list:

v1.11:

    {% for field in fields %}
      {% if field.is_searchable and not field.value == nil %}
        {{ field.name }}: {{ field.value }}\n
      {% endif %}
    {% endfor %}

v1.10:

    {% for field in fields %}
      {{ field.name }}: {{ field.value }}\n
    {% endfor %}

Embedders using the v1.10 document template will continue working as
before. The new default document template will only work with newly
created embedders.
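The `documentTemplateMaxBytes` limit mentioned above is expressed in bytes, and a naive byte cut can split a multi-byte UTF-8 character (common in German or Swedish text, for instance). The Rust sketch below backs off to the nearest character boundary. It is an illustrative assumption with a hypothetical function name; Meilisearch's own truncation rule may differ (it could, for example, cut at word boundaries).

```rust
/// Cut `text` to at most `max_bytes` bytes without splitting a UTF-8 character.
fn truncate_to_bytes(text: &str, max_bytes: usize) -> &str {
    if text.len() <= max_bytes {
        return text;
    }
    let mut end = max_bytes;
    // Offset 0 is always a char boundary, so this loop terminates.
    while !text.is_char_boundary(end) {
        end -= 1;
    }
    &text[..end]
}

fn main() {
    let rendered = "title: Größe\n";
    // A cut at byte 10 would fall inside "ö" (bytes 9 and 10), so it backs off to byte 9.
    println!("{:?}", truncate_to_bytes(rendered, 10));
}
```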

#### Vector database indexing performance improvements

v1.11 introduces a new embedder option, `binaryQuantized`:

```bash
curl \
  -X PATCH 'http://localhost:7700/indexes/movies/settings' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "embedders": {
      "image2text": {
        "binaryQuantized": true
      }
    }
  }'
```

Enable binary quantization to convert embeddings of floating point
numbers into embeddings of boolean values. This will negatively impact
the relevancy of AI-powered searches but significantly improve
performance for large collections of embeddings with more than 100
dimensions.

In our benchmarks, this reduced the size of the database by a factor of
10 and divided the indexing time by a factor of 6 with little impact on
search times.

> [!WARNING]
> Enabling this feature will update all of your vectors to contain only
> `1`s or `-1`s, significantly impacting relevancy.
>
> **You cannot revert this option once you enable it**. Before setting
> `binaryQuantized` to `true`, Meilisearch recommends testing it in a
> smaller or duplicate index in a development environment.

Done by [@&#8203;irevoire](https://redirect.github.com/irevoire) in
[#&#8203;4941](https://redirect.github.com/meilisearch/meilisearch/issues/4941).
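As a rough illustration of the general technique (not Meilisearch's storage code), binary quantization keeps one bit per dimension: its sign. For example, a 768-dimension `f32` embedding shrinks from 3072 bytes to 96 bytes, a 32x reduction for the raw vector alone. The sketch also decodes back to the `1` / `-1` values described in the warning above and compares quantized vectors with a Hamming distance. Function names are hypothetical, and mapping `0.0` to `-1` is an assumption of this sketch.

```rust
/// Keep only the sign of each dimension: bit set when the value is > 0.
/// (Treating 0.0 as negative is an assumption of this sketch.)
fn quantize(v: &[f32]) -> Vec<u64> {
    let mut bits = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            bits[i / 64] |= 1u64 << (i % 64);
        }
    }
    bits
}

/// Decode to the `1` / `-1` values a quantized vector effectively holds.
fn dequantize(bits: &[u64], dims: usize) -> Vec<f32> {
    (0..dims)
        .map(|i| if (bits[i / 64] >> (i % 64)) & 1 == 1 { 1.0 } else { -1.0 })
        .collect()
}

/// Number of dimensions whose signs differ; lower means more similar.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

fn main() {
    let a = [0.9_f32, -0.1, 0.3, -0.7];
    let b = [0.8_f32, 0.2, 0.4, -0.5];
    let (qa, qb) = (quantize(&a), quantize(&b));
    println!("{:?} -> {:?}", a, dequantize(&qa, a.len()));
    println!("signs differ in {} dimension(s)", hamming(&qa, &qb));
}
```

The magnitude information is lost entirely, which is why relevancy drops while storage and distance computations become much cheaper.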

#### Federated search improvements

##### Facet distribution and stats for federated searches

This release adds two new federated search options, `facetsByIndex` and
`mergeFacets`. These allow you to request a federated search for facet
distributions and stats data.

##### Facet information by index

To obtain facet distribution and stats for each separate index, use
`facetsByIndex` when querying the `POST` `/multi-search` endpoint:

```json5
POST /multi-search
{
  "federation": {
    "limit": 20,
    "offset": 0,
	"facetsByIndex": {
	  "movies": ["title", "id"],
	  "comics": ["title"],
	}
  },
  "queries": [
    {
      "q": "Batman",
      "indexUid": "movies"
    },
    {
      "q": "Batman",
      "indexUid": "comics"
    }
  ]
}
```

The multi-search response will include a new field, `facetsByIndex` with
facet data separated per index:

```json5
{
  "hits": […],
  …
  "facetsByIndex": {
      "movies": {
        "distribution": {
          "title": {
            "Batman returns": 1
          },
          "id": {
            "42": 1
          }
        },
        "stats": {
          "id": {
            "min": 42,
            "max": 42
          }
        }
      },
     …
  }
}
```

##### Merged facet information

To obtain facet distribution and stats for all indexes merged into a
single result, use both `facetsByIndex` and `mergeFacets` when querying the
`POST` `/multi-search` endpoint:

```json5
POST /multi-search
{
  "federation": {
    "limit": 20,
    "offset": 0,
    "facetsByIndex": {
      "movies": ["title", "id"],
      "comics": ["title"],
    },
    "mergeFacets": {
      "maxValuesPerFacet": 10,
    }
  },
  "queries": [
    {
      "q": "Batman",
      "indexUid": "movies"
    },
    {
      "q": "Batman",
      "indexUid": "comics"
    }
  ]
}
```

The response includes two new fields, `facetDistribution` and
`facetStats`:

```json5
{
  "hits": […],
  …
  "facetDistribution": {
    "title": {
      "Batman returns": 1,
      "Batman: the killing joke": …
    },
    "id": {
      "42": 1
    }
  },
  "facetStats": {
    "id": {
      "min": 42,
      "max": 42
    }
  }
}
```

Done by [@&#8203;dureuill](https://redirect.github.com/dureuill) in
[#&#8203;4929](https://redirect.github.com/meilisearch/meilisearch/issues/4929).
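A reasonable mental model of `mergeFacets` is that it combines the per-index distributions returned by `facetsByIndex`: counts for the same facet value are summed across indexes, and each facet is then capped at `maxValuesPerFacet`. The following Rust sketch is hypothetical: `Distribution` and `merge_facets` are illustrative names, and keeping values in alphabetical order before truncating (dropping from the end, never from the middle) is an assumption, since the real ordering depends on index settings.

```rust
use std::collections::{BTreeMap, HashMap};

/// Facet name -> (facet value -> document count), for one index.
type Distribution = HashMap<String, HashMap<String, u64>>;

/// Hypothetical merge: sum counts per value across indexes, keep values in
/// alphabetical order (BTreeMap), then drop values from the end beyond the cap.
fn merge_facets(per_index: &[Distribution], max_values_per_facet: usize) -> BTreeMap<String, BTreeMap<String, u64>> {
    let mut merged: BTreeMap<String, BTreeMap<String, u64>> = BTreeMap::new();
    for dist in per_index {
        for (facet, values) in dist {
            let counts = merged.entry(facet.clone()).or_default();
            for (value, count) in values {
                *counts.entry(value.clone()).or_insert(0) += count;
            }
        }
    }
    for counts in merged.values_mut() {
        // Truncate at the end of the ordered list, never in the middle.
        while counts.len() > max_values_per_facet {
            counts.pop_last();
        }
    }
    merged
}

fn main() {
    let movies: Distribution = HashMap::from([(
        "genre".to_string(),
        HashMap::from([("action".to_string(), 1), ("drama".to_string(), 2)]),
    )]);
    let comics: Distribution = HashMap::from([(
        "genre".to_string(),
        HashMap::from([("drama".to_string(), 1), ("horror".to_string(), 4)]),
    )]);
    println!("{:?}", merge_facets(&[movies, comics], 2));
}
```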

#### Experimental — New `STARTS WITH` filter operator

Enable the experimental feature to use the `STARTS WITH` filter
operator:

```bash
curl \
  -X PATCH 'http://localhost:7700/experimental-features/' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "containsFilter": true
  }'
```

Use the `STARTS WITH` operator when filtering:

```bash
curl \
  -X POST http://localhost:7700/indexes/movies/search \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "filter": "hero STARTS WITH spider"
  }'
```

🗣️ This is an experimental feature, and we need your help to improve it!
Share your thoughts and feedback on this [GitHub
discussion](https://redirect.github.com/orgs/meilisearch/discussions/763).

Done by [@&#8203;Kerollmops](https://redirect.github.com/Kerollmops) in
[#&#8203;4939](https://redirect.github.com/meilisearch/meilisearch/issues/4939).

#### Other improvements

- Language support and [localizedAttributes
settings](https://www.meilisearch.com/docs/reference/api/settings#localized-attributes)
by [@&#8203;ManyTheFish](https://redirect.github.com/ManyTheFish) in
[#&#8203;4937](https://redirect.github.com/meilisearch/meilisearch/issues/4937)
    -   Add ISO-639-1 variants
    -   Convert ISO-639-1 into ISO-639-3
- Add a German language tokenizer by
[@&#8203;luflow](https://redirect.github.com/luflow) in
[meilisearch/charabia#303](https://redirect.github.com/meilisearch/charabia/issues/303)
and in
[#&#8203;4945](https://redirect.github.com/meilisearch/meilisearch/issues/4945)
- Improve Turkish language support by
[@&#8203;tkhshtsh0917](https://redirect.github.com/tkhshtsh0917) in
[meilisearch/charabia#305](https://redirect.github.com/meilisearch/charabia/issues/305)
and in
[#&#8203;4957](https://redirect.github.com/meilisearch/meilisearch/issues/4957)
- Upgrade "batch failed" log to error level in
[#&#8203;4955](https://redirect.github.com/meilisearch/meilisearch/issues/4955)
by [@&#8203;dureuill](https://redirect.github.com/dureuill).
- Update the search UI: remove the forced capitalized fields, by
[@&#8203;curquiza](https://redirect.github.com/curquiza) in
[#&#8203;4993](https://redirect.github.com/meilisearch/meilisearch/issues/4993)

### Fixes 🐞

- ⚠️ When using federated search, `query.facets` was silently ignored at
the query level, but should not have been. It now returns the
appropriate error. Use `federation.facetsByIndex` instead if you want
facets to be applied during federated search.
- Prometheus `/metrics` now returns the route pattern instead of the real
route when reporting the total number of HTTP requests, by
[@&#8203;irevoire](https://redirect.github.com/irevoire) in
[#&#8203;4839](https://redirect.github.com/meilisearch/meilisearch/issues/4839)
- Facet values are now truncated at the end of the list when the number
of facet values is larger than `maxValuesPerFacet`. Previously, setting
`maxValuesPerFacet` to `2` could result in `["blue", "red", "yellow"]`
being truncated to `["blue", "yellow"]` instead of `["blue", "red"]`.
By [@&#8203;dureuill](https://redirect.github.com/dureuill) in
[#&#8203;4929](https://redirect.github.com/meilisearch/meilisearch/issues/4929)
- Improve the task cancellation when vectors are used, by
[@&#8203;irevoire](https://redirect.github.com/irevoire) in
[#&#8203;4971](https://redirect.github.com/meilisearch/meilisearch/issues/4971)
- Swedish support: the characters `å`, `ä`, `ö` are no longer normalized
to `a` and `o`. By
[@&#8203;ManyTheFish](https://redirect.github.com/ManyTheFish) in
[#&#8203;4945](https://redirect.github.com/meilisearch/meilisearch/issues/4945)
- Update rhai to fix an internal error when [updating documents with a
function](https://redirect.github.com/orgs/meilisearch/discussions/762)
(experimental) by
[@&#8203;irevoire](https://redirect.github.com/irevoire) in
[#&#8203;4960](https://redirect.github.com/meilisearch/meilisearch/issues/4960)
- Fix the incorrect experimental search queue size by
[@&#8203;irevoire](https://redirect.github.com/irevoire) in
[#&#8203;4992](https://redirect.github.com/meilisearch/meilisearch/issues/4992)
- Do not send empty edit document by function by
[@&#8203;irevoire](https://redirect.github.com/irevoire) in
[#&#8203;5001](https://redirect.github.com/meilisearch/meilisearch/issues/5001)
- Display vectors when no custom vectors were ever provided by
[@&#8203;dureuill](https://redirect.github.com/dureuill) in
[#&#8203;5008](https://redirect.github.com/meilisearch/meilisearch/issues/5008)

### Misc

-   Dependencies updates
    - Security dependency upgrade: bump quinn-proto from 0.11.3 to 0.11.8 by
      [@&#8203;dependabot](https://redirect.github.com/dependabot) in
      [#&#8203;4911](https://redirect.github.com/meilisearch/meilisearch/issues/4911)
-   CIs and tests
    - Make the tests run faster by
      [@&#8203;irevoire](https://redirect.github.com/irevoire) in
      [#&#8203;4808](https://redirect.github.com/meilisearch/meilisearch/issues/4808)
-   Documentation
    - Fix broken links in README by
      [@&#8203;iornstein](https://redirect.github.com/iornstein) in
      [#&#8203;4943](https://redirect.github.com/meilisearch/meilisearch/issues/4943)
-   Misc
    - Allow Meilitool to upgrade from v1.9 to v1.10 without a dump in some
      conditions, by [@&#8203;dureuill](https://redirect.github.com/dureuill)
      in
      [#&#8203;4912](https://redirect.github.com/meilisearch/meilisearch/issues/4912)
    - Fix bench by adding embedder by
      [@&#8203;dureuill](https://redirect.github.com/dureuill) in
      [#&#8203;4954](https://redirect.github.com/meilisearch/meilisearch/issues/4954)
    - Revamp analytics by
      [@&#8203;irevoire](https://redirect.github.com/irevoire) in
      [#&#8203;5011](https://redirect.github.com/meilisearch/meilisearch/issues/5011)

❤️ Thanks again to our external contributors:

- [Meilisearch](https://redirect.github.com/meilisearch/meilisearch):
[@&#8203;iornstein](https://redirect.github.com/iornstein).
- [Charabia](https://redirect.github.com/meilisearch/charabia):
[@&#8203;luflow](https://redirect.github.com/luflow),
[@&#8203;tkhshtsh0917](https://redirect.github.com/tkhshtsh0917).

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever the PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

This PR has been generated by [Renovate
Bot](https://redirect.github.com/renovatebot/renovate).

kodiakhq bot pushed a commit to cloudquery/cloudquery that referenced this pull request Jan 7, 2025
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [getmeili/meilisearch](https://togithub.com/meilisearch/meilisearch) | minor | `v1.1.0` -> `v1.12.1` |

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency Dashboard for more information.

---

### Release Notes

<details>
<summary>meilisearch/meilisearch (getmeili/meilisearch)</summary>

### [`v1.12.1`](https://togithub.com/meilisearch/meilisearch/releases/tag/v1.12.1)

[Compare Source](https://togithub.com/meilisearch/meilisearch/compare/v1.12.0...v1.12.1)

#### Fixes

Adding an empty payload triggered a bug in the engine that made the whole batch fail.
Fixed by [@&#8203;irevoire](https://togithub.com/irevoire) in [https://github.com/meilisearch/meilisearch/pull/5192](https://togithub.com/meilisearch/meilisearch/pull/5192)

**Full Changelog**: https://github.com/meilisearch/meilisearch/compare/v1.12.0...v1.12.1

### [`v1.12.0`](https://togithub.com/meilisearch/meilisearch/releases/tag/v1.12.0): 🦗

[Compare Source](https://togithub.com/meilisearch/meilisearch/compare/v1.11.3...v1.12.0)

Meilisearch v1.12 introduces significant indexing speed improvements, almost halving the time required to index large datasets. This release also introduces new settings to customize and potentially further increase indexing speed.

🧰 All official Meilisearch integrations (including SDKs, clients, and other tools) are compatible with this Meilisearch release. Integration deployment happens between 4 and 48 hours after a new version becomes available.

Some SDKs might not include all new features. Consult the project repository for detailed information. Is a feature you need missing from your chosen SDK? Create an issue letting us know you need it, or, for open-source karma points, open a PR implementing it (we'll love you for that ❤️).

### New features and updates 🔥

#### Improve indexing speed

Indexing time is improved across the board!

-   Performance is maintained or improved on smaller machines.
-   On bigger machines with multiple cores and good IO, Meilisearch v1.12 is much faster than Meilisearch v1.11:
    -   More than twice as fast for raw document insertion tasks.
    -   More than four times as fast for incrementally updating documents in a large database.
    -   Embedding generation is also up to 1.5 times faster for some workloads.

The new indexer also makes task cancellation faster.

Done by [@&#8203;dureuill](https://togithub.com/dureuill), [@&#8203;ManyTheFish](https://togithub.com/ManyTheFish), and [@&#8203;Kerollmops](https://togithub.com/Kerollmops) in [#&#8203;4900](https://togithub.com/meilisearch/meilisearch/issues/4900).

#### New index settings: use `facetSearch` and `prefixSearch` to improve indexing speed

v1.12 introduces two new index settings: `facetSearch` and `prefixSearch`.

Both settings allow you to skip parts of the indexing process. This leads to significant improvements to indexing speed, but may negatively impact search experience in some use cases.

Done by [@&#8203;ManyTheFish](https://togithub.com/ManyTheFish) in [#&#8203;5091](https://togithub.com/meilisearch/meilisearch/issues/5091)

##### `facetSearch`

Use this setting to toggle [facet search](https://www.meilisearch.com/docs/learn/filtering_and_sorting/search_with_facet_filters#searching-facet-values):

```bash
curl \
  -X PUT 'http://localhost:7700/indexes/books/settings/facet-search' \
  -H 'Content-Type: application/json' \
  --data-binary 'true'
```

The default value for `facetSearch` is `true`. When set to `false`, this setting disables facet search for all filterable attributes in an index.

##### `prefixSearch`

Use this setting to configure the ability to [search a word by prefix](https://www.meilisearch.com/docs/learn/engine/prefix) on an index:

```bash
curl \
  -X PUT 'http://localhost:7700/indexes/books/settings/prefix-search' \
  -H 'Content-Type: application/json' \
  --data-binary 'disabled'
```

`prefixSearch` accepts one of the following values:

-   `"indexingTime"`: enables prefix processing during indexing. This is the default Meilisearch behavior
-   `"disabled"`: deactivates prefix search completely

Disabling prefix search means the query `he` will no longer match the word `hello`. This may significantly impact search result relevancy, but speeds up the indexing process.
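
The difference between the two `prefixSearch` values can be pictured with a toy shell function. This is a sketch, not how Meilisearch matches words internally: `prefix_matches` is a hypothetical helper, and it ignores typo tolerance and ranking entirely.

```bash
# Toy model of prefix matching; not Meilisearch's actual implementation.
# Usage: prefix_matches QUERY WORD MODE, where MODE mirrors the two
# prefixSearch values: indexingTime or disabled.
prefix_matches() {
  query=$1
  word=$2
  mode=$3
  # An exact match succeeds whatever the prefixSearch setting is.
  if [ "$query" = "$word" ]; then
    return 0
  fi
  # With prefix search enabled, any word starting with the query matches.
  if [ "$mode" = "indexingTime" ]; then
    case "$word" in
      "$query"*) return 0 ;;
    esac
  fi
  return 1
}

prefix_matches he hello indexingTime && echo "indexingTime: he matches hello"
prefix_matches he hello disabled || echo "disabled: he does not match hello"
```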

#### New API route: `/batches`

The new `/batches` endpoint allows you to query information about task batches.

`GET` `/batches` returns a list of batch objects:

```sh
curl  -X GET 'http://localhost:7700/batches'
```

This endpoint accepts the same parameters as the `GET` `/tasks` route, allowing you to narrow down which batches you want to see. Parameters used with `GET` `/batches` apply to the tasks, not the batches themselves. For example, `GET /batches?uid=0` returns batches containing tasks with a `taskUid` of `0`, not batches with a `batchUid` of `0`.
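
To make the filtering rule concrete, here is a toy model in which each input line lists a batch uid followed by the uids of the tasks it contains. The hypothetical `batches_containing_task` helper returns the batches holding a given task, which is how a task filter on `GET /batches` behaves, rather than matching the batch uid itself. The input format is an assumption for illustration only.

```bash
# Toy model: each input line is "batchUid taskUid [taskUid ...]".
# Print the uid of every batch that contains the given task uid.
batches_containing_task() {
  awk -v task="$1" '{
    for (i = 2; i <= NF; i++) {
      if ($i == task) { print $1; next }
    }
  }'
}

# Batch 0 holds task 0, batch 1 holds tasks 1-3, batch 2 holds tasks 4-5.
printf '0 0\n1 1 2 3\n2 4 5\n' | batches_containing_task 3
# prints 1: the filter matched task 3, even though no batch has uid 3
```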

You may also query `GET` `/batches/:uid` to retrieve information about a single batch object:

```sh
curl  -X GET 'http://localhost:7700/batches/BATCH_UID'
```

`/batches/:uid` does not accept any parameters.

Batch objects contain the following fields:

```json5
{
  "uid": 160,
  "progress": {
    "steps": [
      {
        "currentStep": "processing tasks",
        "finished": 0,
        "total": 2
      },
      {
        "currentStep": "indexing",
        "finished": 2,
        "total": 3
      },
      {
        "currentStep": "extracting words",
        "finished": 3,
        "total": 13
      },
      {
        "currentStep": "document",
        "finished": 12300,
        "total": 19546
      }
    ],
    "percentage": 37.986263
  },
  "details": {
    "receivedDocuments": 19547,
    "indexedDocuments": null
  },
  "stats": {
    "totalNbTasks": 1,
    "status": {
      "processing": 1
    },
    "types": {
      "documentAdditionOrUpdate": 1
    },
    "indexUids": {
      "mieli": 1
    }
  },
  "duration": null,
  "startedAt": "2024-12-12T09:44:34.124726733Z",
  "finishedAt": null
}
```

Additionally, task objects now include a new field, `batchUid`. Use this field together with `/batches/:uid` to retrieve data on a specific batch.

```json5
{
  "uid": 154,
  "batchUid": 142,
  "indexUid": "movies_test2",
  "status": "succeeded",
  "type": "documentAdditionOrUpdate",
  "canceledBy": null,
  "details": {
    "receivedDocuments": 1,
    "indexedDocuments": 1
  },
  "error": null,
  "duration": "PT0.027766819S",
  "enqueuedAt": "2024-12-02T14:07:34.974430765Z",
  "startedAt": "2024-12-02T14:07:34.99021667Z",
  "finishedAt": "2024-12-02T14:07:35.017983489Z"
}
```
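
Following a task to its batch means reading `batchUid` and calling `/batches/:uid`. The sketch below does that step without a running server: the hypothetical `batch_url_for_task` helper extracts `batchUid` from a task object with `sed` and prints the URL you would then pass to `curl`. It assumes `batchUid` is a plain number and reports an error otherwise (for instance if the field is `null`). Parsing JSON with `sed` is fragile; a JSON tool such as `jq` is the safer choice for real responses.

```bash
# Hypothetical helper: read a task object (JSON text) and print the
# /batches/:uid URL of the batch that processed it.
# sed keeps the sketch dependency-free; prefer jq for real responses.
batch_url_for_task() {
  host=$1
  task_json=$2
  batch_uid=$(printf '%s' "$task_json" |
    sed -n 's/.*"batchUid": *\([0-9][0-9]*\).*/\1/p')
  if [ -z "$batch_uid" ]; then
    echo "no numeric batchUid in task" >&2
    return 1
  fi
  printf '%s/batches/%s\n' "$host" "$batch_uid"
}

batch_url_for_task 'http://localhost:7700' \
  '{"uid": 154, "batchUid": 142, "status": "succeeded"}'
# prints http://localhost:7700/batches/142
```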

Done by [@&#8203;irevoire](https://togithub.com/irevoire) in [#&#8203;5060](https://togithub.com/meilisearch/meilisearch/issues/5060), [#&#8203;5070](https://togithub.com/meilisearch/meilisearch/issues/5070), [#&#8203;5080](https://togithub.com/meilisearch/meilisearch/issues/5080)

#### Other improvements

-   New query parameter for `GET` `/tasks`: `reverse`. If `reverse` is set to `true`, tasks are returned in reverse order, from oldest to newest. Done by [@&#8203;irevoire](https://togithub.com/irevoire) in [#&#8203;5048](https://togithub.com/meilisearch/meilisearch/issues/5048)
-   Phrase searches with `showMatchesPosition` set to `true` give a single location for the whole phrase, by [@&#8203;flevi29](https://togithub.com/flevi29) in [#&#8203;4928](https://togithub.com/meilisearch/meilisearch/issues/4928)
-   New Prometheus metrics by [@&#8203;PedroTurik](https://togithub.com/PedroTurik) in [#&#8203;5044](https://togithub.com/meilisearch/meilisearch/issues/5044)
-   When a query finds matching terms in document fields with array values, Meilisearch now includes an `indices` field in `_matchesPosition` specifying which array elements contain the matches, by [@&#8203;LukasKalbertodt](https://togithub.com/LukasKalbertodt) in [#&#8203;5005](https://togithub.com/meilisearch/meilisearch/issues/5005)
-   ⚠️ Breaking `vectorStore` change: field distribution no longer contains `_vectors`. Its value used to be incorrect, and there is no current use case for the fixed, most likely empty, value. Done as part of [#&#8203;4900](https://togithub.com/meilisearch/meilisearch/issues/4900)
-   Improve error message by adding index name in [#&#8203;5056](https://togithub.com/meilisearch/meilisearch/issues/5056) by [@&#8203;airycanon](https://togithub.com/airycanon)

### Fixes 🐞

-   Return an appropriate error when the primary key is longer than 512 bytes, by [@&#8203;flevi29](https://togithub.com/flevi29) in [#&#8203;4930](https://togithub.com/meilisearch/meilisearch/issues/4930)
-   Fix issue where numbers were segmented in different ways depending on tokenizer, by [@&#8203;dqkqd](https://togithub.com/dqkqd) in [https://github.com/meilisearch/charabia/pull/311](https://togithub.com/meilisearch/charabia/pull/311)
-   Fix pagination when embedding fails by [@&#8203;dureuill](https://togithub.com/dureuill) in [https://github.com/meilisearch/meilisearch/pull/5063](https://togithub.com/meilisearch/meilisearch/pull/5063)
-   Fix issue causing Meilisearch to ignore stop words in some cases by [@&#8203;ManyTheFish](https://togithub.com/ManyTheFish) in [#&#8203;5062](https://togithub.com/meilisearch/meilisearch/issues/5062)
-   Fix phrase search with `attributesToSearchOn` in [#&#8203;5062](https://togithub.com/meilisearch/meilisearch/issues/5062) by [@&#8203;ManyTheFish](https://togithub.com/ManyTheFish)

### Misc

-   Dependencies updates
    -   Update benchmarks to match the new crates subfolder by [@&#8203;Kerollmops](https://togithub.com/Kerollmops) in [#&#8203;5021](https://togithub.com/meilisearch/meilisearch/issues/5021)
    -   Fix the benchmarks by [@&#8203;irevoire](https://togithub.com/irevoire) in [#&#8203;5037](https://togithub.com/meilisearch/meilisearch/issues/5037)
    -   Bump Swatinem/rust-cache from 2.7.1 to 2.7.5 in [#&#8203;5030](https://togithub.com/meilisearch/meilisearch/issues/5030)
    -   Update charabia to v0.9.2 by [@&#8203;ManyTheFish](https://togithub.com/ManyTheFish) in [#&#8203;5098](https://togithub.com/meilisearch/meilisearch/issues/5098)
    -   Update mini-dashboard to v0.2.16 by [@&#8203;curquiza](https://togithub.com/curquiza) in [#&#8203;5102](https://togithub.com/meilisearch/meilisearch/issues/5102)
-   CIs and tests
    -   Improve performance of `delete_index.rs` by [@&#8203;DerTimonius](https://togithub.com/DerTimonius) in [#&#8203;4963](https://togithub.com/meilisearch/meilisearch/issues/4963)
    -   Improve performance of `create_index.rs` by [@&#8203;DerTimonius](https://togithub.com/DerTimonius) in [#&#8203;4962](https://togithub.com/meilisearch/meilisearch/issues/4962)
    -   Improve performance of `get_documents.rs` by [@&#8203;PedroTurik](https://togithub.com/PedroTurik) in [#&#8203;5025](https://togithub.com/meilisearch/meilisearch/issues/5025)
    -   Improve performance of `formatted.rs` by [@&#8203;PedroTurik](https://togithub.com/PedroTurik) in [#&#8203;5043](https://togithub.com/meilisearch/meilisearch/issues/5043)
    -   Fix the path used in the flaky tests CI by [@&#8203;Kerollmops](https://togithub.com/Kerollmops) in [#&#8203;5049](https://togithub.com/meilisearch/meilisearch/issues/5049)
-   Misc
    -   Rollback the Meilisearch Kawaii logo by [@&#8203;Kerollmops](https://togithub.com/Kerollmops) in [#&#8203;5017](https://togithub.com/meilisearch/meilisearch/issues/5017)
    -   Add image source label to Dockerfile by [@&#8203;wuast94](https://togithub.com/wuast94) in [#&#8203;4990](https://togithub.com/meilisearch/meilisearch/issues/4990)
    -   Hide code complexity into a subfolder by [@&#8203;Kerollmops](https://togithub.com/Kerollmops) in [#&#8203;5016](https://togithub.com/meilisearch/meilisearch/issues/5016)
    -   Internal tool: implement offline upgrade from v1.10 to v1.11 by [@&#8203;irevoire](https://togithub.com/irevoire) in [#&#8203;5034](https://togithub.com/meilisearch/meilisearch/issues/5034)
    -   Internal tool: implement offline upgrade from v1.11 to v1.12 by [@&#8203;ManyTheFish](https://togithub.com/ManyTheFish) in  [#&#8203;5146](https://togithub.com/meilisearch/meilisearch/issues/5146)
    -   Meilisearch is now able to retrieve Katakana words from a Hiragana query by [@&#8203;tats-u](https://togithub.com/tats-u) in [https://github.com/meilisearch/charabia/pull/312](https://togithub.com/meilisearch/charabia/pull/312)
    -   Improve error handling when writing into LMDB by [@&#8203;Kerollmops](https://togithub.com/Kerollmops) in [https://github.com/meilisearch/meilisearch/pull/5089](https://togithub.com/meilisearch/meilisearch/pull/5089)

❤️ Thanks again to our external contributors:

-   [Meilisearch](https://togithub.com/meilisearch/meilisearch): [@&#8203;airycanon](https://togithub.com/airycanon), [@&#8203;DerTimonius](https://togithub.com/DerTimonius), [@&#8203;flevi29](https://togithub.com/flevi29), [@&#8203;LukasKalbertodt](https://togithub.com/LukasKalbertodt), [@&#8203;PedroTurik](https://togithub.com/PedroTurik), [@&#8203;wuast94](https://togithub.com/wuast94)
-   [Charabia](https://togithub.com/meilisearch/charabia): [@&#8203;dqkqd](https://togithub.com/dqkqd) [@&#8203;tats-u](https://togithub.com/tats-u)

### [`v1.11.3`](https://togithub.com/meilisearch/meilisearch/releases/tag/v1.11.3): 🐿️

[Compare Source](https://togithub.com/meilisearch/meilisearch/compare/v1.11.2...v1.11.3)

#### What's Changed

-   For users of REST, OpenAI, or Ollama autoembedders: retry if deserializing the remote response fails, by [@&#8203;dureuill](https://togithub.com/dureuill) in [https://github.com/meilisearch/meilisearch/pull/5058](https://togithub.com/meilisearch/meilisearch/pull/5058)

**Full Changelog**: https://github.com/meilisearch/meilisearch/compare/v1.11.2...v1.11.3

### [`v1.11.2`](https://togithub.com/meilisearch/meilisearch/releases/tag/v1.11.2): 🐿️

[Compare Source](https://togithub.com/meilisearch/meilisearch/compare/v1.11.1...v1.11.2)

#### What's Changed

-   Add a timeout on read and write operations by [@&#8203;dureuill](https://togithub.com/dureuill) in [https://github.com/meilisearch/meilisearch/pull/5051](https://togithub.com/meilisearch/meilisearch/pull/5051)

**Full Changelog**: https://github.com/meilisearch/meilisearch/compare/v1.11.1...v1.11.2

### [`v1.11.1`](https://togithub.com/meilisearch/meilisearch/releases/tag/v1.11.1): 🐿️

[Compare Source](https://togithub.com/meilisearch/meilisearch/compare/v1.11.0...v1.11.1)

#### What's Changed

-   Add 3s timeout to embedding requests made during search by [@&#8203;dureuill](https://togithub.com/dureuill) in [https://github.com/meilisearch/meilisearch/pull/5039](https://togithub.com/meilisearch/meilisearch/pull/5039)

**Full Changelog**: https://github.com/meilisearch/meilisearch/compare/v1.11.0...v1.11.1

### [`v1.11.0`](https://togithub.com/meilisearch/meilisearch/releases/tag/v1.11.0): 🐿️

[Compare Source](https://togithub.com/meilisearch/meilisearch/compare/v1.10.3...v1.11.0)

Meilisearch v1.11 introduces AI-powered search performance improvements thanks to binary quantization and various usage changes, all of which are steps towards a future stabilization of the feature. We have also improved federated search usage following user feedback.

🧰 All official Meilisearch integrations (including SDKs, clients, and other tools) are compatible with this Meilisearch release. Integration deployment happens between 4 and 48 hours after a new version becomes available.

Some SDKs might not include all new features. Consult the project repository for detailed information. Is a feature you need missing from your chosen SDK? Create an issue letting us know you need it, or, for open-source karma points, open a PR implementing it (we'll love you for that ❤️).

### New features and updates 🔥

#### Experimental - AI-powered search improvements

This release is Meilisearch's first step towards stabilizing AI-powered search and introduces a few breaking changes to its API. [Consult the PRD for full usage details.](https://www.notion.so/meilisearch/v1-11-AI-search-changes-0e37727193884a70999f254fa953ce6e)

Done by [@&#8203;dureuill](https://togithub.com/dureuill) in [#&#8203;4906](https://togithub.com/meilisearch/meilisearch/issues/4906), [#&#8203;4920](https://togithub.com/meilisearch/meilisearch/issues/4920), [#&#8203;4892](https://togithub.com/meilisearch/meilisearch/issues/4892), and [#&#8203;4938](https://togithub.com/meilisearch/meilisearch/issues/4938).

##### ⚠️ Breaking changes

-   When performing AI-powered searches, `hybrid.embedder` is now a **mandatory** parameter in `GET` and `POST` `/indexes/{:indexUid}/search`
-   As a consequence, it is now **mandatory** to pass `hybrid` even for pure semantic searches
-   `embedder` is now a **mandatory** parameter in `GET` and `POST` `/indexes/{:indexUid}/similar`
-   Meilisearch now ignores `semanticRatio` and performs a pure semantic search for queries that include `vector` but not `q`

##### Addition & improvements

-   The default model for OpenAI is now `text-embedding-3-small` instead of `text-embedding-ada-002`
-   This release introduces a new embedder option: `documentTemplateMaxBytes`. Meilisearch will truncate a document's template text when it goes over the specified limit
-   Fields in `documentTemplate` include a new `field.is_searchable` property. The default document template now filters out both empty fields and fields not in the searchable attributes list:

v1.11:

    {% for field in fields %}
      {% if field.is_searchable and not field.value == nil %}
        {{ field.name }}: {{ field.value }}\n
      {% endif %}
    {% endfor %}

v1.10:

    {% for field in fields %}
      {{ field.name }}: {{ field.value }}\n
    {% endfor %}

Embedders using the v1.10 document template will continue working as before. The new default document template will only work with newly created embedders.

#### Vector database indexing performance improvements

v1.11 introduces a new embedder option, `binaryQuantized`:

```bash
curl \
  -X PATCH 'http://localhost:7700/indexes/movies/settings' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "embedders": {
      "image2text": {
        "binaryQuantized": true
      }
    }
  }'
```

Enable binary quantization to convert embeddings of floating point numbers into embeddings of boolean values. This will negatively impact the relevancy of AI-powered searches but significantly improve performance in large collections with more than 100 dimensions.

In our benchmarks, this reduced the size of the database by a factor of 10 and divided the indexing time by a factor of 6 with little impact on search times.
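
As a rough illustration of what binary quantization does to a single vector, the toy function below maps each float to `1` or `-1`, matching the warning that quantized vectors contain only `1`s or `-1`s. The sign rule (positive values become `1`, zero and negative values become `-1`) and the hypothetical `binary_quantize` name are assumptions; Meilisearch's internal encoding may differ.

```bash
# Toy binary quantization: map each float component to 1 or -1.
# Assumption: values above zero become 1, zero and negatives become -1.
binary_quantize() {
  printf '%s\n' "$*" | awk '{
    for (i = 1; i <= NF; i++) {
      printf "%s%d", (i > 1 ? " " : ""), ($i > 0 ? 1 : -1)
    }
    printf "\n"
  }'
}

binary_quantize 0.12 -0.5 0.0 3.4
# prints: 1 -1 -1 1
```

Because every component collapses to its sign, two different vectors with the same sign pattern become identical, which is why relevancy suffers and why the change cannot be undone.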

> [!WARNING]
> Enabling this feature will update all of your vectors to contain only `1`s or `-1`s, significantly impacting relevancy.
>
> **You cannot revert this option once you enable it**. Before setting `binaryQuantized` to `true`, Meilisearch recommends testing it in a smaller or duplicate index in a development environment.

Done by [@&#8203;irevoire](https://togithub.com/irevoire) in [#&#8203;4941](https://togithub.com/meilisearch/meilisearch/issues/4941).

#### Federated search improvements

##### Facet distribution and stats for federated searches

This release adds two new federated search options, `facetsByIndex` and `mergeFacets`. These allow you to request a federated search for facet distributions and stats data.

##### Facet information by index

To obtain facet distribution and stats for each separate index, use `facetsByIndex` when querying the `POST` `/multi-search` endpoint:

```json5
POST /multi-search
{
  "federation": {
    "limit": 20,
    "offset": 0,
    "facetsByIndex": {
      "movies": ["title", "id"],
      "comics": ["title"]
    }
  },
  "queries": [
    {
      "q": "Batman",
      "indexUid": "movies"
    },
    {
      "q": "Batman",
      "indexUid": "comics"
    }
  ]
}
```

The multi-search response will include a new field, `facetsByIndex`, with facet data separated per index:

```json5
{
  "hits": […],
  …
  "facetsByIndex": {
      "movies": {
        "distribution": {
          "title": {
            "Batman returns": 1
          },
          "id": {
            "42": 1
          }
        },
        "stats": {
          "id": {
            "min": 42,
            "max": 42
          }
        }
      },
     …
  }
}
```

##### Merged facet information

To obtain facet distribution and stats for all indexes merged into a single result, use both `facetsByIndex` and `mergeFacets` when querying the `POST` `/multi-search` endpoint:

```json5
POST /multi-search
{
  "federation": {
    "limit": 20,
    "offset": 0,
    "facetsByIndex": {
      "movies": ["title", "id"],
      "comics": ["title"]
    },
    "mergeFacets": {
      "maxValuesPerFacet": 10
    }
  },
  "queries": [
    {
      "q": "Batman",
      "indexUid": "movies"
    },
    {
      "q": "Batman",
      "indexUid": "comics"
    }
  ]
}
```

The response includes two new fields, `facetDistribution` and `facetStats`:

```json5
{
  "hits": […],
  …
  "facetDistribution": {
    "title": {
      "Batman returns": 1,
      "Batman: the killing joke": …
    },
    "id": {
      "42": 1
    }
  },
  "facetStats": {
    "id": {
      "min": 42,
      "max": 42
    }
  }
}
```
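
Conceptually, `mergeFacets` combines the per-index distributions into one. The toy `awk` pipeline below sums the counts of identical facet values coming from several indexes. The tab-separated input format, the hypothetical `merge_facet_counts` name, and the summing rule are assumptions for illustration, not Meilisearch's implementation.

```bash
# Toy model of mergeFacets: sum the counts of identical (facet, value)
# pairs coming from several indexes. Input lines: facet<TAB>value<TAB>count.
merge_facet_counts() {
  awk 'BEGIN { FS = "\t" }
    { sum[$1 "\t" $2] += $3 }
    END { for (key in sum) print key "\t" sum[key] }' | sort
}

# Toy per-index results: "Batman returns" is counted in two inputs.
printf 'title\tBatman returns\t1\nid\t42\t1\ntitle\tBatman returns\t2\n' |
  merge_facet_counts
```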

Done by [@&#8203;dureuill](https://togithub.com/dureuill) in [#&#8203;4929](https://togithub.com/meilisearch/meilisearch/issues/4929).

#### Experimental — New `STARTS WITH` filter operator

Enable the experimental feature to use the `STARTS WITH` filter operator:

```bash
curl \
  -X PATCH 'http://localhost:7700/experimental-features/' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "containsFilter": true
  }'
```

Use the `STARTS WITH` operator when filtering:

```bash
curl \
  -X POST http://localhost:7700/indexes/movies/search \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "filter": "hero STARTS WITH spider"
  }'
```
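
The semantics of `STARTS WITH` can be approximated with a shell `case` pattern. The hypothetical `starts_with_filter` helper keeps only the values that begin with the given prefix. It compares bytes exactly, whereas Meilisearch normalizes filter values, so real matching may be more lenient (for example with letter case).

```bash
# Toy model of `hero STARTS WITH spider` applied to a list of hero values.
# Assumption: plain, case-sensitive comparison without normalization.
starts_with_filter() {
  prefix=$1
  shift
  for value in "$@"; do
    case "$value" in
      "$prefix"*) printf '%s\n' "$value" ;;
    esac
  done
}

starts_with_filter spider spiderman spider-woman superman
# prints spiderman and spider-woman, one per line
```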

🗣️ This is an experimental feature, and we need your help to improve it! Share your thoughts and feedback on this [GitHub discussion](https://togithub.com/orgs/meilisearch/discussions/763).

Done by [@&#8203;Kerollmops](https://togithub.com/Kerollmops) in [#&#8203;4939](https://togithub.com/meilisearch/meilisearch/issues/4939).

#### Other improvements

-   Language support and [localizedAttributes settings](https://www.meilisearch.com/docs/reference/api/settings#localized-attributes) by [@&#8203;ManyTheFish](https://togithub.com/ManyTheFish) in [#&#8203;4937](https://togithub.com/meilisearch/meilisearch/issues/4937)
    -   Add ISO-639-1 variants
    -   Convert ISO-639-1 into ISO-639-3
-   Add a German language tokenizer by [@&#8203;luflow](https://togithub.com/luflow) in [meilisearch/charabia#303](https://togithub.com/meilisearch/charabia/issues/303) and in [#&#8203;4945](https://togithub.com/meilisearch/meilisearch/issues/4945)
-   Improve Turkish language support by [@&#8203;tkhshtsh0917](https://togithub.com/tkhshtsh0917) in [meilisearch/charabia#305](https://togithub.com/meilisearch/charabia/issues/305) and in  [#&#8203;4957](https://togithub.com/meilisearch/meilisearch/issues/4957)
-   Upgrade "batch failed" log to error level in [#&#8203;4955](https://togithub.com/meilisearch/meilisearch/issues/4955) by [@&#8203;dureuill](https://togithub.com/dureuill).
-   Update the search UI: remove the forced capitalized fields, by [@&#8203;curquiza](https://togithub.com/curquiza) in [#&#8203;4993](https://togithub.com/meilisearch/meilisearch/issues/4993)

### Fixes 🐞

-   ⚠️ When using federated search, `query.facets` was silently ignored at the query level, but should not have been. It now returns the appropriate error. Use `federation.facetsByIndex` instead if you want facets to be applied during federated search.
-   Prometheus `/metrics` returns the route pattern instead of the real route when returning the HTTP requests total by [@&#8203;irevoire](https://togithub.com/irevoire) in [#&#8203;4839](https://togithub.com/meilisearch/meilisearch/issues/4839)
-   Truncate values at the end of a list of facet values when the number of facet values is larger than `maxValuesPerFacet`. For example, setting `maxValuesPerFacet` to `2` could previously truncate `["blue", "red", "yellow"]` to `["blue", "yellow"]` instead of `["blue", "red"]`. By [@&#8203;dureuill](https://togithub.com/dureuill) in [#&#8203;4929](https://togithub.com/meilisearch/meilisearch/issues/4929)
-   Improve the task cancellation when vectors are used, by [@&#8203;irevoire](https://togithub.com/irevoire) in [#&#8203;4971](https://togithub.com/meilisearch/meilisearch/issues/4971)
-   Swedish support: the characters `å`, `ä`, `ö` are no longer normalized to `a` and `o`. By [@&#8203;ManyTheFish](https://togithub.com/ManyTheFish) in [#&#8203;4945](https://togithub.com/meilisearch/meilisearch/issues/4945)
-   Update rhai to fix an internal error when [updating documents with a function](https://togithub.com/orgs/meilisearch/discussions/762) (experimental) by [@&#8203;irevoire](https://togithub.com/irevoire) in [#&#8203;4960](https://togithub.com/meilisearch/meilisearch/issues/4960)
-   Fix the incorrect experimental search queue size by [@&#8203;irevoire](https://togithub.com/irevoire) in [#&#8203;4992](https://togithub.com/meilisearch/meilisearch/issues/4992)
-   Do not send empty edit document by function by [@&#8203;irevoire](https://togithub.com/irevoire) in [#&#8203;5001](https://togithub.com/meilisearch/meilisearch/issues/5001)
-   Display vectors when no custom vectors were ever provided by [@&#8203;dureuill](https://togithub.com/dureuill) in [#&#8203;5008](https://togithub.com/meilisearch/meilisearch/issues/5008)
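
The corrected `maxValuesPerFacet` behavior in the fix list keeps the first values in their existing order and drops only the tail. A toy shell equivalent, using a hypothetical `truncate_facet_values` helper (not Meilisearch code):

```bash
# Toy model of the fixed behavior: keep the first MAX values in order
# and drop only the values at the end of the list.
# Usage: truncate_facet_values MAX VALUE...
truncate_facet_values() {
  max=$1
  shift
  printf '%s\n' "$@" | head -n "$max"
}

truncate_facet_values 2 blue red yellow
# prints blue and red, one per line
```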

### Misc

-   Dependencies updates
    -   Security dependency upgrade: bump quinn-proto from 0.11.3 to 0.11.8 by [@&#8203;dependabot](https://togithub.com/dependabot) in [#&#8203;4911](https://togithub.com/meilisearch/meilisearch/issues/4911)
-   CIs and tests
    -   Make the tests run faster by [@&#8203;irevoire](https://togithub.com/irevoire) in [#&#8203;4808](https://togithub.com/meilisearch/meilisearch/issues/4808)
-   Documentation
    -   Fix broken links in README by [@&#8203;iornstein](https://togithub.com/iornstein) in [#&#8203;4943](https://togithub.com/meilisearch/meilisearch/issues/4943)
-   Misc
    -   Allow Meilitool to upgrade from v1.9 to v1.10 without a dump in some conditions, by [@&#8203;dureuill](https://togithub.com/dureuill) in [#&#8203;4912](https://togithub.com/meilisearch/meilisearch/issues/4912)
    -   Fix bench by adding embedder by [@&#8203;dureuill](https://togithub.com/dureuill) in [#&#8203;4954](https://togithub.com/meilisearch/meilisearch/issues/4954)
    -   Revamp analytics by [@&#8203;irevoire](https://togithub.com/irevoire) in [#&#8203;5011](https://togithub.com/meilisearch/meilisearch/issues/5011)

❤️ Thanks again to our external contributors:

-   [Meilisearch](https://togithub.com/meilisearch/meilisearch): [@&#8203;iornstein](https://togithub.com/iornstein).
-   [Charabia](https://togithub.com/meilisearch/charabia): [@&#8203;luflow](https://togithub.com/luflow), [@&#8203;tkhshtsh0917](https://togithub.com/tkhshtsh0917).

### [`v1.10.3`](https://togithub.com/meilisearch/meilisearch/releases/tag/v1.10.3): 🦩

[Compare Source](https://togithub.com/meilisearch/meilisearch/compare/v1.10.2...v1.10.3)

#### Search improvements

This PR lets you configure two engine behaviors through experimental CLI flags:

-   The number of searches Meilisearch can process concurrently per core, with the [`--experimental-nb-searches-per-core`](https://togithub.com/orgs/meilisearch/discussions/784) CLI flag
-   The number of seconds after which Meilisearch considers a search irrelevant and drops it without processing it, with the [`--experimental-drop-search-after`](https://togithub.com/orgs/meilisearch/discussions/783) CLI flag

Done by [@&#8203;irevoire](https://togithub.com/irevoire) in [https://github.com/meilisearch/meilisearch/pull/5000](https://togithub.com/meilisearch/meilisearch/pull/5000)

**Full Changelog**: https://github.com/meilisearch/meilisearch/compare/v1.10.2...v1.10.3

### [`v1.10.2`](https://togithub.com/meilisearch/meilisearch/releases/tag/v1.10.2): 🦩

[Compare Source](https://togithub.com/meilisearch/meilisearch/compare/v1.10.1...v1.10.2)

#### Fixes 🦋

##### Activate the Swedish tokenization Pipeline

The Swedish tokenization pipeline was deactivated in previous versions. It is now activated when you specify the index language in the settings:

##### PATCH `/indexes/:index-name/settings`

```json
{
  "localizedAttributes": [ { "locales": ["swe"], "attributePatterns": ["*"] } ]
}
```

related PR: [#&#8203;4949](https://togithub.com/meilisearch/meilisearch/issues/4949)

### [`v1.10.1`](https://togithub.com/meilisearch/meilisearch/releases/tag/v1.10.1): 🦩

[Compare Source](https://togithub.com/meilisearch/meilisearch/compare/v1.10.0...v1.10.1)

#### Fixes 🦋

##### Better search handling under heavy loads

The following PRs should make Meilisearch behave better under heavy loads:

-   Only spawn one search queue in actix-web by [@&#8203;irevoire](https://togithub.com/irevoire) in [https://github.com/meilisearch/meilisearch/pull/4893](https://togithub.com/meilisearch/meilisearch/pull/4893)
-   Make sure the index scheduler never stops running by [@&#8203;irevoire](https://togithub.com/irevoire) in [https://github.com/meilisearch/meilisearch/pull/4896](https://togithub.com/meilisearch/meilisearch/pull/4896)
-   Explicitly drop the search permits by [@&#8203;irevoire](https://togithub.com/irevoire) in [https://github.com/meilisearch/meilisearch/pull/4898](https://togithub.com/meilisearch/meilisearch/pull/4898)
-   Stop trying to process searches after one minute by [@&#8203;irevoire](https://togithub.com/irevoire) in [https://github.com/meilisearch/meilisearch/pull/4899](https://togithub.com/meilisearch/meilisearch/pull/4899)

#### Speed improvement 🐎

Meilisearch can now autobatch document deletions together with document deletions by filter, which should unclog the task queue for users who rely heavily on these two operations.
However, Meilisearch still cannot autobatch document deletions by filter together with document additions.

-   Autobatch document deletion by filter by [@&#8203;irevoire](https://togithub.com/irevoire) in [https://github.com/meilisearch/meilisearch/pull/4901](https://togithub.com/meilisearch/meilisearch/pull/4901)
-   Do not fail the whole batch when a single document deletion by filter fails by [@&#8203;irevoire](https://togithub.com/irevoire) in [https://github.com/meilisearch/meilisearch/pull/4905](https://togithub.com/meilisearch/meilisearch/pull/4905)

**Full Changelog**: https://github.com/meilisearch/meilisearch/compare/v1.10.0...v1.10.1

### [`v1.10.0`](https://togithub.com/meilisearch/meilisearch/releases/tag/v1.10.0): 🦩

[Compare Source](https://togithub.com/meilisearch/meilisearch/compare/v1.9.1...v1.10.0)

Meilisearch v1.10 introduces federated search. This innovative feature allows you to receive a single list of results for multi-search requests. v1.10 also includes a setting to manually define which language or languages are present in your documents, and two new experimental features: the `CONTAINS` filter operator and the ability to update a subset of your dataset with a function.

🧰 All official Meilisearch integrations (including SDKs, clients, and other tools) are compatible with this Meilisearch release. Integration deployment happens 4 to 48 hours after a new version becomes available.

Some SDKs might not include all new features. Consult the project repository for detailed information. Is a feature you need missing from your chosen SDK? Create an issue letting us know you need it, or, for open-source karma points, open a PR implementing it (we'll love you for that ❤️).

### New features and updates 🔥

#### Federated search

Use the new `federation` parameter of the `/multi-search` route to return a single search result object:

```bash
curl \
  -X POST 'http://localhost:7700/multi-search' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "federation": {
      "offset": 5,
      "limit": 10
    },
    "queries": [
      {
        "q": "Batman",
        "indexUid": "movies"
      },
      {
        "q": "Batman",
        "indexUid": "comics"
      }
    ]
  }'
```

Response:

```json5
{
  "hits": [
    {
      "id": 42,
      "title": "Batman returns",
      "overview": "..",
      "_federation": {
        "indexUid": "movies",
        "queriesPosition": 0
      }
    },
    {
      "comicsId": "batman-killing-joke",
      "description": "..",
      "title": "Batman: the killing joke",
      "_federation": {
        "indexUid": "comics",
        "queriesPosition": 1
      }
    },
    …
  ],
  processingTimeMs: 0,
  limit: 20,
  offset: 0,
  estimatedTotalHits: 2,
  semanticHitCount: 0,
}
```

When performing a federated search, Meilisearch merges the results coming from different sources in descending ranking score order.

If `federation` is empty (`{}`), Meilisearch sets `offset` and `limit` to 0 and 20 respectively.

If `federation` is `null` or missing, multi-search returns one list of search result objects for each index.

##### Federated results relevancy

When performing federated searches, use `federationOptions` in the request's `queries` array to configure the relevancy and the weight of each index:

```bash
curl \
 -X POST 'http://localhost:7700/multi-search' \
 -H 'Content-Type: application/json' \
 --data-binary '{
  "federation": {},
  "queries": [
    {
      "q": "apple red",
      "indexUid": "fruits",
      "filter": "BOOSTED = true",
      "_showRankingScore": true,
      "federationOptions": {
        "weight": 3.0
      }
    },
    {
      "q": "apple red",
      "indexUid": "fruits",
      "_showRankingScore": true
    }
  ]
}'
```

`federationOptions` must be an object. It supports a single field, `weight`, which must be a positive floating-point number:

-   if `weight` < `1.0`, results from this index are **less** likely to appear in the results
-   if `weight` > `1.0`, results from this index are **more** likely to appear in the results
-   if not specified, `weight` defaults to `1.0`
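As a rough illustration of the merge order and of `weight`, the following shell sketch assumes that each hit's ranking score is multiplied by its query's `weight` before the hits from all queries are sorted together in descending order, and that `offset` and `limit` (defaulting to 0 and 20) are then applied to the merged list. The `federated_merge` function and its one-hit-per-line input format are hypothetical and not part of Meilisearch; the actual merge algorithm is described on the usage page.

```sh
#!/bin/sh
# Hypothetical helper (not a Meilisearch tool): federated_merge [OFFSET] [LIMIT]
# Reads one hit per line on stdin as "indexUid rankingScore weight documentId"
# (an input format invented for this sketch). ASSUMPTION: the weight multiplies
# the ranking score. Hits are sorted by weighted score, highest first, then
# OFFSET/LIMIT are applied (defaults 0 and 20, as for an empty `federation`).
federated_merge() {
  offset=${1:-0}
  limit=${2:-20}
  LC_ALL=C awk '{ printf "%.6f %s %s\n", $2 * $3, $1, $4 }' |
    LC_ALL=C sort -k1,1nr |
    awk -v off="$offset" -v lim="$limit" \
      'NR > off && NR <= off + lim { print $2, $3 }'
}

# Usage: the comics query has weight 3.0, the movies query the default 1.0.
printf '%s\n' \
  'movies 0.90 1.0 42' \
  'comics 0.50 3.0 batman-killing-joke' \
  'movies 0.70 1.0 7' |
  federated_merge 0 2
# prints:
# comics batman-killing-joke
# movies 42
```

With a weight of 3.0, the comics hit (0.50 × 3.0 = 1.5) outranks the movies hit scored 0.90, which is what "more likely to appear in the results" means in practice under this assumption.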

📖 Consult the [usage page](https://meilisearch.notion.site/v1-10-federated-search-698dfe36ab6b4668b044f735fb40f0b2) for more information about the merge algorithm.

Done by [@&#8203;dureuill](https://togithub.com/dureuill) in [#&#8203;4769](https://togithub.com/meilisearch/meilisearch/issues/4769).

#### Experimental: `CONTAINS` filter operator

Enable the `containsFilter` experimental feature to use the `CONTAINS` filter operator:

```bash
curl \
  -X PATCH 'http://localhost:7700/experimental-features/' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "containsFilter": true
  }'
```

`CONTAINS` filters results containing partial matches to the specified string, similar to a SQL `LIKE`:

```bash
curl \
  -X POST http://localhost:7700/indexes/movies/search \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "q": "super hero",
    "filter": "synopsis CONTAINS spider"
  }'
```
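The semantics of `synopsis CONTAINS spider` can be approximated locally with a substring test. This is a hypothetical shell analogy (`contains_filter` is not a Meilisearch tool), and it assumes case-insensitive matching; Meilisearch applies its own normalization, so its results may differ.

```sh
#!/bin/sh
# Hypothetical local analogy (not Meilisearch code): contains_filter NEEDLE
# Reads documents as "id|synopsis" lines and prints the id of every document
# whose synopsis contains NEEDLE as a substring, like SQL LIKE '%NEEDLE%'.
# ASSUMPTION: matching is case-insensitive.
contains_filter() {
  awk -F'|' -v needle="$1" \
    'index(tolower($2), tolower(needle)) > 0 { print $1 }'
}

printf '%s\n' \
  '1|Peter Parker becomes Spider-Man' \
  '2|A billionaire becomes Batman' \
  '3|Miles Morales and the spider-verse' |
  contains_filter spider
# prints:
# 1
# 3
```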

🗣️ This is an experimental feature, and we need your help to improve it! Share your thoughts and feedback on this [GitHub discussion](https://togithub.com/orgs/meilisearch/discussions/763).

Done by [@&#8203;irevoire](https://togithub.com/irevoire) in [#&#8203;4804](https://togithub.com/meilisearch/meilisearch/issues/4804).

#### Language settings

Use the new `localizedAttributes` index setting and the `locales` search parameter to explicitly set the languages used in document fields and in the search query itself. This is particularly useful for users of v1.9 and earlier who occasionally had to resort to alternative Meilisearch images due to language auto-detection issues in [Swedish](https://togithub.com/meilisearch/meilisearch/pull/4604) and [Japanese](https://togithub.com/meilisearch/meilisearch/pull/3882) datasets.

Done by [@&#8203;ManyTheFish](https://togithub.com/ManyTheFish) in [#&#8203;4819](https://togithub.com/meilisearch/meilisearch/issues/4819).

##### Set language during indexing with `localizedAttributes`

Use the newly introduced `localizedAttributes` setting to explicitly declare which languages correspond to which document fields:

```bash
curl \
  -X PATCH 'http://localhost:7700/indexes/movies/settings' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "localizedAttributes": [
      {"locales": ["jpn"], "attributePatterns": ["*_ja"]},
      {"locales": ["eng"], "attributePatterns": ["*_en"]},
      {"locales": ["cmn"], "attributePatterns": ["*_zh"]},
      {"locales": ["fra", "ita"], "attributePatterns": ["latin.*"]},
      {"locales": [], "attributePatterns": ["*"]}
    ]
  }'
```

`locales` is a list of ISO-639-3 language codes to assign to a pattern. The currently supported languages are: `epo`, `eng`, `rus`, `cmn`, `spa`, `por`, `ita`, `ben`, `fra`, `deu`, `ukr`, `kat`, `ara`, `hin`, `jpn`, `heb`, `yid`, `pol`, `amh`, `jav`, `kor`, `nob`, `dan`, `swe`, `fin`, `tur`, `nld`, `hun`, `ces`, `ell`, `bul`, `bel`, `mar`, `kan`, `ron`, `slv`, `hrv`, `srp`, `mkd`, `lit`, `lav`, `est`, `tam`, `vie`, `urd`, `tha`, `guj`, `uzb`, `pan`, `aze`, `ind`, `tel`, `pes`, `mal`, `ori`, `mya`, `nep`, `sin`, `khm`, `tuk`, `aka`, `zul`, `sna`, `afr`, `lat`, `slk`, `cat`, `tgl`, `hye`.

`attributePatterns` is a list of patterns. Each pattern can start or end with a `*` to match one or several attributes.

If an attribute matches several rules, only the first rule in the list will be applied. If the locales list is empty, then Meilisearch is allowed to auto-detect any language in the matching attributes.

These rules are applied to the `searchableAttributes`, the `filterableAttributes`, and the `sortableAttributes`.
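The first-match rule can be sketched in shell with `case` glob patterns. `resolve_locales` and its `pattern=locales` argument format are hypothetical, and shell globs accept `*` anywhere, whereas Meilisearch attribute patterns only start or end with `*`.

```sh
#!/bin/sh
# Hypothetical helper (not Meilisearch code): resolve_locales ATTRIBUTE RULE...
# Each RULE is written "pattern=locale1,locale2" (an encoding invented for this
# sketch). Rules are tried in order and the first matching pattern wins; an
# empty locale list means "auto-detect". Returns 1 if no rule matches.
resolve_locales() {
  attr=$1
  shift
  for rule in "$@"; do
    pattern=${rule%%=*}   # text before the first "="
    locales=${rule#*=}    # text after the first "="
    # $pattern is deliberately unquoted so that `*` acts as a wildcard.
    case $attr in
      $pattern)
        printf '%s\n' "$locales"
        return 0
        ;;
    esac
  done
  return 1
}

# Usage with the rules from the example above (quoted so `*` is not expanded):
resolve_locales 'latin.title_en' \
  '*_ja=jpn' '*_en=eng' '*_zh=cmn' 'latin.*=fra,ita' '*='
# prints: eng  (`*_en` is listed before `latin.*`, so it wins)
```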

##### Set language at search time with `locales`

The `/search` route accepts a new parameter, `locales`. Use it to define the language used in the current query:

```bash
curl \
  -X POST http://localhost:7700/indexes/movies/search \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "q": "進撃の巨人",
    "locales": ["jpn"]
  }'
```

The `locales` parameter overrides any `locales` defined in the index settings.

#### Experimental: Edit documents with a Rhai function

Use a [Rhai function](https://rhai.rs/) to edit documents in your database directly from Meilisearch:

First, activate the experimental feature:

```bash
curl \
  -X PATCH 'http://localhost:7700/experimental-features/' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "editDocumentsByFunction": true
  }'
```

Then query the `/documents/edit` route with the editing function:

```bash
curl http://localhost:7700/indexes/movies/documents/edit \
  -H 'content-type: application/json' \
  -d '{
   "function": "doc.title = `✨ ${doc.title.to_upper()} ✨`",
   "filter": "id > 3000"
  }'
```

`/documents/edit` accepts three parameters in its payload: `function`, `filter`, and `context`.

`function` must be a string with a Rhai function. `filter` must be a [filter expression](https://www.meilisearch.com/docs/learn/filtering_and_sorting/filter_expression_reference). `context` must be an object with data you want to make available to the editing function.

📖 More information [here](https://meilisearch.notion.site/Update-Documents-by-Function-0cff8fea7655436592e7c8a6de932062).

🗣️ This is an experimental feature and we need your help to improve it! Share your thoughts and feedback on this [GitHub discussion](https://togithub.com/orgs/meilisearch/discussions/762).

Done by [@&#8203;Kerollmops](https://togithub.com/Kerollmops) in [#&#8203;4626](https://togithub.com/meilisearch/meilisearch/issues/4626).

#### Experimental AI-powered search: quality of life improvements

In preparation for stabilizing this feature, this release applies several changes and quality-of-life improvements.

Done by [@&#8203;dureuill](https://togithub.com/dureuill) in [#&#8203;4801](https://togithub.com/meilisearch/meilisearch/issues/4801), [#&#8203;4815](https://togithub.com/meilisearch/meilisearch/issues/4815), [#&#8203;4818](https://togithub.com/meilisearch/meilisearch/issues/4818), [#&#8203;4822](https://togithub.com/meilisearch/meilisearch/issues/4822).

##### ⚠️ Breaking changes: Changing the parameters of the REST API

The previous parameters of the REST embedder were too numerous and confusing.

Removed parameters: `query`, `inputField`, `inputType`, `pathToEmbeddings`, and `embeddingObject`.
Replaced by:

-   `request`: A JSON value that represents the request Meilisearch sends to the remote embedder. The text to embed must be replaced by the placeholder value `"{{text}}"`.
-   `response`: A JSON value that represents a fragment of the response the remote embedder returns to Meilisearch. The embedding must be replaced by the placeholder value `"{{embedding}}"`.

The new v1.10 format, followed by the v1.9 format it replaces:

```json5
// v1.10 version ✅
{
  "source": "rest",
  "url": "https://localhost:10006",
  "request": {
    "model": "minillm",
    "prompt": "{{text}}"
  },
  "response": {
    "embedding": "{{embedding}}"
  }
}
```

```json5
// v1.9 version ❌
{
  "source": "rest",
  "url": "https://localhost:10006",
  "query": {
    "model": "minillm",
  },
  "inputField": ["prompt"],
  "inputType": "text",
  "embeddingObject": ["embedding"]
}
```
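To illustrate what the `"{{text}}"` placeholder means, this shell sketch renders a `request` template like the v1.10 one above by substituting the text to embed. `render_request` is hypothetical: Meilisearch performs this substitution internally, and the sketch assumes the text needs no JSON escaping (no double quotes, backslashes, or newlines).

```sh
#!/bin/sh
# Hypothetical helper (not Meilisearch code): render_request TEMPLATE TEXT
# Replaces the "{{text}}" placeholder in a `request` template with TEXT.
# ASSUMPTION: TEXT contains no double quotes, backslashes, or newlines.
render_request() {
  # Escape & and the | delimiter, which are special in the sed replacement below.
  text=$(printf '%s' "$2" | sed 's/[&|]/\\&/g')
  printf '%s\n' "$1" | sed "s|\"{{text}}\"|\"$text\"|"
}

render_request '{"model": "minillm", "prompt": "{{text}}"}' 'a red apple'
# prints: {"model": "minillm", "prompt": "a red apple"}
```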

> \[!CAUTION]
> This is a breaking change to the configuration of REST embedders.
> Importing a dump containing a REST embedder configuration will fail in v1.10 with an error: "Error: unknown field `query`, expected one of `source`, `model`, `revision`, `apiKey`, `dimensions`, `documentTemplate`, `url`, `request`, `response`, `distribution` at line 1 column 752".

Upgrade procedure:

1.  Remove embedders with source `"rest"`
2.  Update your [Meilisearch Cloud project](https://www.meilisearch.com/docs/learn/update_and_migration/updating#updating-meilisearch-cloud) or [self-hosted Meilisearch instance](https://www.meilisearch.com/docs/learn/update_and_migration/updating#updating-a-self-hosted-meilisearch-instance) as usual

##### Add custom headers to REST embedders

When the `source` of an embedder is set to `rest`, you may include an optional `headers` parameter. Use it to configure custom headers you want Meilisearch to include in the requests it sends to the embedder.

Embedding requests sent from Meilisearch to a remote REST embedder always contain two headers:

-   `Authorization: Bearer <apiKey>` (only if `apiKey` was provided)
-   `Content-Type: application/json`

When provided, `headers` should be a JSON object whose keys are the names of additional headers to send in requests and whose values are the values of these headers.

If `headers` is missing or `null` for a `rest` embedder, only `Authorization` and `Content-Type` are sent, as described above.

If `headers` contains `Authorization` or `Content-Type`, the declared values override the ones sent by default.

Using the `headers` parameter for any other `source` besides `rest` results in an `invalid_settings_embedder` error.
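The resulting header set can be sketched as follows. `build_headers` is a hypothetical helper, not part of Meilisearch, and comparing header names case-insensitively is an assumption of the sketch.

```sh
#!/bin/sh
# Hypothetical helper (not Meilisearch code): build_headers API_KEY [HEADER...]
# Prints the headers of a `rest` embedder request: the defaults first
# (Authorization only when API_KEY is non-empty), then custom "Name: value"
# headers. A custom header with the same name as an earlier one replaces its
# value in place. ASSUMPTION: names are compared case-insensitively.
build_headers() {
  api_key=$1
  shift
  {
    if [ -n "$api_key" ]; then
      printf 'Authorization: Bearer %s\n' "$api_key"
    fi
    printf '%s\n' 'Content-Type: application/json'
    for header in "$@"; do
      printf '%s\n' "$header"
    done
  } | awk -F': ' '
    {
      key = tolower($1)
      if (!(key in value)) order[++n] = key  # remember first position
      value[key] = $0                        # later lines override earlier ones
    }
    END { for (i = 1; i <= n; i++) print value[order[i]] }'
}

build_headers 'MY_API_KEY' 'X-Request-Source: meilisearch' \
  'Content-Type: application/json; charset=utf-8'
# prints:
# Authorization: Bearer MY_API_KEY
# Content-Type: application/json; charset=utf-8
# X-Request-Source: meilisearch
```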

##### Other quality-of-life improvements

📖 More details [here](https://meilisearch.notion.site/v1-10-AI-search-changes-737c9d7d010d4dd685582bf5dab579e2)

-   Add a `url` parameter to the OpenAI embedder. `url` should be a URL to the OpenAI embedding endpoint (including the `v1/embeddings` part). If `url` is missing or `null` for an `openAi` embedder, the default OpenAI embedding route is used (https://api.openai.com/v1/embeddings).
-   `dimensions` is now available as an optional parameter for `ollama` embedders. Previously it was only available for `rest`, `openAi`, and `userProvided` embedders.
-   Previously, `_vectors.embedder` was omitted for documents without at least one embedding for `embedder`. This was inconsistent and prevented users from checking the value of `regenerate`.
-   When a request to a REST embedder fails, the duration of the exponential backoff is now randomized up to twice its base duration.
-   When a text exceeds the maximum number of tokens of an OpenAI model, it is now truncated rather than embedded chunk by chunk.
-   Improve error message when indexing documents and embeddings are missing for a user-provided embedder
-   Improve error message when a model configuration cannot be loaded and its "architectures" field does not contain "BertModel"

#### ⚠️ Important change regarding the minimal Ubuntu version compatible with Meilisearch

Because GitHub Actions runners now enforce a Node version that is no longer compatible with Ubuntu 18.04, and we use GitHub Actions to build the binaries we provide, we had to raise the minimal Ubuntu version compatible with Meilisearch.

Meilisearch is now only compatible with Ubuntu 20.04 and later; it no longer supports Ubuntu 18.04.

Done by [@&#8203;curquiza](https://togithub.com/curquiza) in [#&#8203;4783](https://togithub.com/meilisearch/meilisearch/issues/4783).

#### Other improvements

-   Search speed optimization: implement intersection at the end of the search pipeline by [@&#8203;Kerollmops](https://togithub.com/Kerollmops) in [#&#8203;4717](https://togithub.com/meilisearch/meilisearch/issues/4717)
-   Indexing speed optimization: stop opening indexes to only check if they exist by [@&#8203;Karribalu](https://togithub.com/Karribalu) in [#&#8203;4787](https://togithub.com/meilisearch/meilisearch/issues/4787)
-   Improve tenant token error messages by [@&#8203;irevoire](https://togithub.com/irevoire) in [#&#8203;4724](https://togithub.com/meilisearch/meilisearch/issues/4724)
-   Add null byte as hard context separator by [@&#8203;LukasKalbertodt](https://togithub.com/LukasKalbertodt) in [meilisearch/charabia#295](https://togithub.com/meilisearch/charabia/issues/295)
-   Adds all [math symbols](https://www.compart.com/en/unicode/category/Sm) to the default separator list by [@&#8203;phillitrOSU](https://togithub.com/phillitrOSU) in [meilisearch/charabia#301](https://togithub.com/meilisearch/charabia/issues/301)
-   Errors emitted at the main level of the Meilisearch binary are now logged with level `ERROR` by [@&#8203;dureuill](https://togithub.com/dureuill) in [#&#8203;4835](https://togithub.com/meilisearch/meilisearch/issues/4835)

### Fixes 🐞

-   Fix invalid primary key for big numbers by [@&#8203;JWSong](https://togithub.com/JWSong) in [#&#8203;4725](https://togithub.com/meilisearch/meilisearch/issues/4725)
-   Fix wrong HTTP status and confusing error message on wrong payload by [@&#8203;Karribalu](https://togithub.com/Karribalu) in [#&#8203;4716](https://togithub.com/meilisearch/meilisearch/issues/4716)
-   Fix the missing geo distance when one or both of the lat/lng are strings by [@&#8203;irevoire](https://togithub.com/irevoire) in [#&#8203;4731](https://togithub.com/meilisearch/meilisearch/issues/4731)
-   Fix errors related to `OffsetDateTime`: use a fixed date format regardless of features by [@&#8203;dureuill](https://togithub.com/dureuill) in [#&#8203;4850](https://togithub.com/meilisearch/meilisearch/issues/4850)
-   Fix filter that doesn't return valid documents by [@&#8203;dureuill](https://togithub.com/dureuill) in [#&#8203;4864](https://togithub.com/meilisearch/meilisearch/issues/4864) & [#&#8203;4858](https://togithub.com/meilisearch/meilisearch/issues/4858)

### Misc

-   Dependencies updates
    -   Update most of the dependencies by [@&#8203;irevoire](https://togithub.com/irevoire) in [#&#8203;4786](https://togithub.com/meilisearch/meilisearch/issues/4786)
    -   Update yaup by [@&#8203;irevoire](https://togithub.com/irevoire) in [#&#8203;4703](https://togithub.com/meilisearch/meilisearch/issues/4703)
    -   Bump docker/build-push-action from 5 to 6 by [@&#8203;dependabot](https://togithub.com/dependabot) in [#&#8203;4758](https://togithub.com/meilisearch/meilisearch/issues/4758)
    -   Bump zerovec from 0.10.1 to 0.10.4 by [@&#8203;dependabot](https://togithub.com/dependabot) in [#&#8203;4785](https://togithub.com/meilisearch/meilisearch/issues/4785)
    -   Update rustls as much as possible by [@&#8203;irevoire](https://togithub.com/irevoire) in [#&#8203;4806](https://togithub.com/meilisearch/meilisearch/issues/4806)
-   CIs and tests
    -   Fix CI with Rust v1.79 by [@&#8203;dureuill](https://togithub.com/dureuill) in [#&#8203;4723](https://togithub.com/meilisearch/meilisearch/issues/4723)
    -   Fix flaky test by [@&#8203;irevoire](https://togithub.com/irevoire) in [#&#8203;4730](https://togithub.com/meilisearch/meilisearch/issues/4730)
    -   Specify the rust toolchain by [@&#8203;irevoire](https://togithub.com/irevoire) in [#&#8203;4706](https://togithub.com/meilisearch/meilisearch/issues/4706)
    -   Add `vX` Docker tag when publishing Docker image by [@&#8203;curquiza](https://togithub.com/curquiza) in [#&#8203;4761](https://togithub.com/meilisearch/meilisearch/issues/4761)
    -   Add search benchmarks by [@&#8203;dureuill](https://togithub.com/dureuill) in [#&#8203;4762](https://togithub.com/meilisearch/meilisearch/issues/4762)
    -   Add tests on the rest embedder by [@&#8203;irevoire](https://togithub.com/irevoire) and [@&#8203;dureuill](https://togithub.com/dureuill) in [#&#8203;4755](https://togithub.com/meilisearch/meilisearch/issues/4755)
    -   Add OpenAI tests by [@&#8203;dureuill](https://togithub.com/dureuill) in [#&#8203;4846](https://togithub.com/meilisearch/meilisearch/issues/4846)
-   Documentation
    -   Add june 11th webinar banner by [@&#8203;Strift](https://togithub.com/Strift) in [#&#8203;4691](https://togithub.com/meilisearch/meilisearch/issues/4691)
    -   Revert "Add june 11th webinar banner" by [@&#8203;curquiza](https://togithub.com/curquiza) in [#&#8203;4705](https://togithub.com/meilisearch/meilisearch/issues/4705)
    -   Update the README to link more demos by [@&#8203;Kerollmops](https://togithub.com/Kerollmops) in [#&#8203;4711](https://togithub.com/meilisearch/meilisearch/issues/4711)
    -   Update README.md by [@&#8203;Strift](https://togithub.com/Strift) in [#&#8203;4721](https://togithub.com/meilisearch/meilisearch/issues/4721)
    -   Change the Meilisearch logo to the kawaii version by [@&#8203;Kerollmops](https://togithub.com/Kerollmops) in [#&#8203;4778](https://togithub.com/meilisearch/meilisearch/issues/4778)
-   Misc
    -   New workload to ignore the initial compression phase by [@&#8203;Kerollmops](https://togithub.com/Kerollmops) in [#&#8203;4773](https://togithub.com/meilisearch/meilisearch/issues/4773)
    -   Rename the sortable into the filterable movies workload by [@&#8203;Kerollmops](https://togithub.com/Kerollmops) in [#&#8203;4774](https://togithub.com/meilisearch/meilisearch/issues/4774)
    -   Correct apk usages in Dockerfile by [@&#8203;PeterDaveHello](https://togithub.com/PeterDaveHello) in [#&#8203;4781](https://togithub.com/meilisearch/meilisearch/issues/4781)
    -   Make milli use edition 2021 by [@&#8203;hanbings](https://togithub.com/hanbings) in [#&#8203;4770](https://togithub.com/meilisearch/meilisearch/issues/4770)
    -   Allow `MEILI_NO_VERGEN` env var to skip vergen by [@&#8203;dureuill](https://togithub.com/dureuill) in [#&#8203;4812](https://togithub.com/meilisearch/meilisearch/issues/4812)

❤️ Thanks again to our external contributors:

-   [Meilisearch](https://togithub.com/meilisearch/meilisearch): [@&#8203;Karribalu](https://togithub.com/Karribalu), [@&#8203;hanbings](https://togithub.com/hanbings), [@&#8203;junhochoi](https://togithub.com/junhochoi), [@&#8203;JWSong](https://togithub.com/JWSong), [@&#8203;PeterDaveHello](https://togithub.com/PeterDaveHello).
-   [Charabia](https://togithub.com/meilisearch/charabia): [@&#8203;LukasKalbertodt](https://togithub.com/LukasKalbertodt), [@&#8203;phillitrOSU](https://togithub.com/phillitrOSU).

### [`v1.9.1`](https://togithub.com/meilisearch/meilisearch/releases/tag/v1.9.1): 🦎

[Compare Source](https://togithub.com/meilisearch/meilisearch/compare/v1.9.0...v1.9.1)

#### Fixes 🪲

-   Return an empty list of embeddings for embedders that have no document for an embedder. by [@&#8203;dureuill](https://togithub.com/dureuill) in [https://github.com/meilisearch/meilisearch/pull/4889](https://togithub.com/meilisearch/meilisearch/pull/4889)

This fixes an issue where dumps created for indexes with:

1.  A user-provided embedder
2.  At least one document that opts out of vectors for that user-provided embedder

would fail to import correctly.

#### Upgrade path to v1.10.0 🚀

If you are a Cloud user affected by this issue, please contact customer support so that we can perform the upgrade for you.

If you are an OSS user affected by this issue, perform the following operations:

1.  Upgrade from v1.9.0 to v1.9.1 without using a dump
2.  Upgrade to v1.10.0 using a dump created from v1.9.1

[**Full Changelog**](https://togithub.com/meilisearch/meilisearch/compare/v1.9.0...v1.9.1)

### [`v1.9.0`](https://togithub.com/meilisearch/meilisearch/releases/tag/v1.9.0): 🦎

[Compare Source](https://togithub.com/meilisearch/meilisearch/compare/v1.8.4...v1.9.0)

Meilisearch v1.9 includes performance improvements for hybrid search and the addition/updating of settings. This version benefits from multiple requested features, such as the new `frequency` matching strategy and the ability to retrieve similar documents.

🧰 All official Meilisearch integrations (including SDKs, clients, and other tools) are compatible with this Meilisearch release. Integration deployment happens 4 to 48 hours after a new version becomes available.

Some SDKs might not include all new features. Consult the project repository for detailed information. Is a feature you need missing from your chosen SDK? Create an issue letting us know you need it, or, for open-source karma points, open a PR implementing it (we'll love you for that ❤️).

### New features and updates 🔥

#### Hybrid search updates

This release introduces multiple [hybrid search updates](https://meilisearch.notion.site/v1-9-AI-search-changes-e90d6803eca8417aa70a1ac5d0225697#38e6d3adf40e4ef1be14a3c4be39df94).

Done by [@&#8203;dureuill](https://togithub.com/dureuill) and [@&#8203;irevoire](https://togithub.com/irevoire) in [#&#8203;4633](https://togithub.com/meilisearch/meilisearch/issues/4633) and [#&#8203;4649](https://togithub.com/meilisearch/meilisearch/issues/4649)

##### ⚠️ Breaking change: Empty `_vectors.embedder` arrays

Empty `_vectors.embedder` arrays are now interpreted as having no vector embedding.

Before v1.9, Meilisearch interpreted these as a single embedding of dimension 0. This change follows user feedback that the previous behavior was unexpected and unhelpful.

##### ⚠️ Breaking change: `_vectors` field no longer present in search results

When the experimental `vectorStore` feature is enabled, Meilisearch no longer includes `_vectors` in returned search results by default. This considerably improves performance.

Use the new `retrieveVectors` search parameter to display the `_vectors` field:

```sh
curl \
  -X POST 'http://localhost:7700/indexes/INDEX_NAME/search' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "q": "SEARCH QUERY",
    "retrieveVectors": true
  }'
```

##### ⚠️ Breaking change: Meilisearch no longer preserves the exact representation of embeddings appearing in `_vectors`

To save storage and run faster, Meilisearch no longer stores your vectors as-is. Meilisearch now returns floats in a canonicalized representation rather than in the user-provided representation.

For example, `3` may be represented as `3.0`.

##### Document `_vectors` accepts object values

The document `_vectors` field now accepts objects in addition to embedding arrays:

```json
{
  "id": 42,
  "_vectors": {
    "default": [0.1, 0.2],
    "text": {
      "embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
      "regenerate": false
    },
    "translation": {
      "embeddings": [0.1, 0.2, 0.3, 0.4],
      "regenerate": true
    }
  }
}
```

Each embedder object in `_vectors` may contain two fields: `embeddings` and `regenerate`.

If present, `embeddings` will replace this document's embeddings.

`regenerate` must be either `true` or `false`. If `regenerate: true`, Meilisearch will overwrite the document embeddings each time the document is updated in the future. If `regenerate: false`, Meilisearch will keep the last provided or generated embeddings even if the document is updated in the future.

This change allows importing embeddings to autoembedders as a one-shot process, by setting them as `regenerate: true`. This change also ensures embeddings are not regenerated when importing a dump created with Meilisearch v1.9.

Meilisearch v1.9.0 also improves performance when indexing and using hybrid search, avoiding useless operations and optimizing the important ones.

#### New feature: Ranking score threshold

Use `rankingScoreThreshold` to exclude search results with low ranking scores:

```bash
curl \
 -X POST 'http://localhost:7700/indexes/movies/search' \
 -H 'Content-Type: application/json' \
 --data-binary '{
    "q": "Badman dark returns 1",
    "showRankingScore": true,
    "limit": 5,
    "rankingScoreThreshold": 0.2
 }'
```

Meilisearch does not return any documents below the configured threshold. Excluded results do not count towards `estimatedTotalHits`, `totalHits`, and facet distribution.
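The filtering step alone can be sketched with `awk`. `apply_threshold` and its `documentId rankingScore` input format are hypothetical, keeping a score exactly equal to the threshold is an assumption, and the sketch does not model how Meilisearch estimates hit counts.

```sh
#!/bin/sh
# Hypothetical helper (not Meilisearch code): apply_threshold THRESHOLD
# Reads "documentId rankingScore" lines and prints the ids whose score is at
# least THRESHOLD. ASSUMPTION: a score equal to the threshold is kept.
apply_threshold() {
  awk -v t="$1" '$2 + 0 >= t + 0 { print $1 }'
}

printf '%s\n' '12 0.85' '7 0.15' '30 0.20' | apply_threshold 0.2
# prints:
# 12
# 30
```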

⚠️ For performance reasons, if the number of documents above `rankingScoreThreshold` is higher than `limit`, Meilisearch does not evaluate the ranking score of the remaining documents. Results ranking below the threshold are not immediately removed from the set of candidates. In this case, Meilisearch may overestimate the count of `estimatedTotalHits`, `totalHits` and facet distribution.

Done by [@&#8203;dureuill](https://togithub.com/dureuill) in [#&#8203;4666](https://togithub.com/meilisearch/meilisearch/issues/4666)

#### New feature: Get similar documents endpoint

This release introduces a new AI-powered search feature allowing you to send a document to Meilisearch and receive a list of similar documents in return.

Use the `/indexes/{indexUid}/similar` endpoint to query Meilisearch for related documents:

```sh
curl \
  -X POST 'http://localhost:7700/indexes/INDEX_NAME/similar' \
  -H 'Content-Type: application/json' \
  --data-binary '{
    "id": "23",
    "offset": 0,
    "limit": 2,
    "filter": "release_date > 1521763199",
    "embedder": "default",
    "attributesToRetrieve": [],
    "showRankingScore": false,
    "showRankingScoreDetails": false
  }'
```

-   `id`: string indicating the document needing similar results, required
-   `offset`: number of results to skip when paginating, optional, defaults to `0`
-   `limit`: number of results to display, optional, defaults to `20`
-   `filter`: string with a filter expression Meilisearch should apply to the results, optional, defaults to `null`
-   `embedder`: string indicating the embedder Meilisearch should use to retrieve similar documents, optional, defaults to `"default"`
-   `attributesToRetrieve`: array of strings ind

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "before 4am on the first day of the month" (UTC), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever the PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate).