Releases: meilisearch/meilisearch
v1.9.0-rc.3 🦎
Bug fixes
- Fix a Meilisearch freeze that could happen under heavy search loads by @dureuill in #4681. Note that this bug is already fixed in Meilisearch v1.8.2
Breaking changes
- The _vectors field is not returned anymore when retrieving documents; you must use the retrieveVector parameter instead
- When retrieving the _vectors field with the retrieveVector parameter, their embeddings are not returned "as-is"; they'll always be returned with the maximum precision
- When specifying or retrieving vectors, the userProvided field has been removed in favor of a new regenerate field that better represents your intent. When set to true, it means the embeddings will be regenerated on every change to the document (default behavior). If set to false, the embeddings will never be updated by the engine.
- Dumps with embeddings created from previous RCs cannot be imported into the new RC
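As an illustration of the regenerate field, here is a minimal Python sketch that attaches a user-supplied embedding to a document. The embedder name default, the embeddings key, and the attach_embedding helper are assumptions made for this example; they are not spelled out in these notes.

```python
import json


def attach_embedding(document, embedder, embedding, regenerate=False):
    """Return a copy of `document` carrying a user-supplied embedding.

    Assumption: each embedder entry under `_vectors` is an object holding
    the vector under `embeddings` next to the new `regenerate` flag.
    """
    updated = dict(document)
    vectors = dict(updated.get("_vectors", {}))
    vectors[embedder] = {
        "embeddings": list(embedding),
        # False: the engine should never overwrite this embedding.
        "regenerate": regenerate,
    }
    updated["_vectors"] = vectors
    return updated


original = {"id": 1, "title": "Batman"}
doc = attach_embedding(original, "default", [0.1, 0.2, 0.3])
print(json.dumps(doc, indent=2))
```

Setting regenerate to False matches the "never updated by the engine" case described above; leaving it at True would let the engine recompute the embedding whenever the document changes.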
Improvements
- Speed Up Filter ANDs operations by @Kerollmops in #4682
- Speedup the vector store and reduce the size of the database by @irevoire and @dureuill in #4649
- Define your distinct attributes at search time by @Kerollmops in #4693
Misc
- Fix ci tests by @ManyTheFish in #4685
Full Changelog: v1.9.0-rc.2...v1.9.0-rc.3
v1.8.2 🪼
v1.9.0-rc.2 🦎
Meilisearch v1.9 includes performance improvements for hybrid search and the addition/updating of settings. This version benefits from multiple requested features, such as the new frequency matching strategy and the ability to retrieve similar documents.
Speedup additional searchable Attributes by @Kerollmops in #4680
When adding new fields in the searchableAttributes setting, the engine will only index the additional attributes instead of recomputing all the searchable attributes.
Update Charabia v0.8.11 by @ManyTheFish in #4684
The words containing œ or æ will be retrieved using oe or ae, like Daemon <=> Dæmon.
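The equivalence above can be pictured with a small Python sketch. This is not Charabia's implementation; fold_ligatures is a hypothetical helper that only shows why Daemon and Dæmon end up matching each other.

```python
# Map each ligature to its two-letter spelling (illustrative table only).
LIGATURES = str.maketrans({"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE"})


def fold_ligatures(word):
    """Expand œ/æ ligatures, then case-fold so comparisons ignore case."""
    return word.translate(LIGATURES).casefold()


# Both spellings normalize to the same token.
same = fold_ligatures("Dæmon") == fold_ligatures("Daemon")
print(same)
```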
Misc
Fix: Test CI failing when enabling/disabling some features #4629
v1.9.0-rc.1 🦎
Meilisearch v1.9 includes performance improvements for hybrid search and the addition/updating of settings. This version benefits from multiple requested features, such as the new frequency matching strategy and the ability to retrieve similar documents.
New features and updates 🔥
Filter by score
To filter returned documents by their ranking score, a new rankingScoreThreshold parameter has been added to the search and similar routes.
When a rankingScoreThreshold is provided, the results of the search/similar request are modified in the following way:
- No document whose _rankingScore is under the rankingScoreThreshold is returned
- Any document encountered during the search that is under the threshold is removed from the set of candidates and won’t count towards the estimatedTotalHits, totalHits and the facet distribution.
Examples
request without score threshold:
POST /indexes/movies/search
{
"q": "Badman dark returns 1",
"showRankingScore": true,
"limit": 5
}
results:
{
"hits": [
{
"title": "Batman the dark knight returns: Part 1",
"id": "A",
"_rankingScore": 0.93430081300813
},
{
"title": "Batman the dark knight returns: Part 2",
"id": "B",
"_rankingScore": 0.6685627880184332
},
{
"title": "Badman",
"id": "E",
"_rankingScore": 0.25
},
{
"title": "Batman Returns",
"id": "C",
"_rankingScore": 0.11553030303030302
},
{
"title": "Batman",
"id": "D",
"_rankingScore": 0.11553030303030302
}
],
"query": "Badman dark returns 1",
"processingTimeMs": 11,
"limit": 5,
"offset": 0,
"estimatedTotalHits": 62
}
request with score threshold:
POST /indexes/movies/search
{
"q": "Badman dark returns 1",
"showRankingScore": true,
"limit": 5,
"rankingScoreThreshold": 0.2
}
results:
{
"hits": [
{
"title": "Batman the dark knight returns: Part 1",
"id": "A",
"_rankingScore": 0.93430081300813
},
{
"title": "Batman the dark knight returns: Part 2",
"id": "B",
"_rankingScore": 0.6685627880184332
},
{
"title": "Badman",
"id": "E",
"_rankingScore": 0.25
}
],
"query": "Badman dark returns 1",
"processingTimeMs": 11,
"limit": 5,
"offset": 0,
"estimatedTotalHits": 3
}
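The effect of the threshold on the two responses above can be sketched client-side in Python. This only mimics the visible result; the engine applies the threshold during the search itself, and apply_score_threshold is a hypothetical helper, not part of any SDK.

```python
def apply_score_threshold(hits, threshold):
    """Keep only the hits whose _rankingScore is not under the threshold."""
    return [hit for hit in hits if hit["_rankingScore"] >= threshold]


# Hits taken from the example response without a threshold.
hits = [
    {"id": "A", "_rankingScore": 0.93430081300813},
    {"id": "B", "_rankingScore": 0.6685627880184332},
    {"id": "E", "_rankingScore": 0.25},
    {"id": "C", "_rankingScore": 0.11553030303030302},
    {"id": "D", "_rankingScore": 0.11553030303030302},
]

kept = apply_score_threshold(hits, 0.2)
print([hit["id"] for hit in kept])
```

With a threshold of 0.2, documents C and D drop out, which is why the second response only lists A, B and E.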
Known limitations
If the search already finds limit hits above the rankingScoreThreshold, then the ranking score of the remaining documents is not evaluated, and so they are not removed from the set of candidates, even if their ranking score would be below the threshold.
As a result, in this configuration the estimatedTotalHits, totalHits and the facet distribution may overapproximate their true values.
Other improvements
Misc
See also the changelog for v1.9.0-rc.0
v1.9.0-rc.0 🦎
Meilisearch v1.9 includes performance improvements for hybrid search and the addition/updating of settings. This version benefits from multiple requested features, such as the new frequency matching strategy and the ability to retrieve similar documents.
New features and updates 🔥
Hybrid search improvements
Since we're focusing on AI innovation, this version introduces multiple improvements and changes related to hybrid search.
More detailed changelog here.
Done by @dureuill and @irevoire in #4633 and #4649
⚠️ Breaking changes of hybrid search usage
- Before v1.9, an empty array in _vectors.embedder was interpreted as a single embedding of dimension 0 when specifying embeddings in documents. In v1.9, it is interpreted as zero embeddings. The previous behavior was surprising and not useful.
Improvements
Meilisearch v1.9.0 improves performance when indexing and using hybrid search, avoiding useless operations and optimizing the important ones.
Get similar documents
To retrieve similar documents in your datasets, two new routes have been introduced:
- POST /indexes/:indexUid/similar, using parameters in the request body.
- GET /indexes/:indexUid/similar, using query URL parameters.
POST /indexes/:indexUid/similar
{
// Mandatory: the external id of the target document
"id": "23",
// Optional, defaults to 0: how many results to skip
"offset": 0,
// Optional, defaults to 20: how many results to display
"limit": 2,
// Optional, defaults to null: an additional filter for the returned documents
"filter": "release_date > 1521763199",
// Optional, defaults to the default embedder: name of the embedder to use
// for computing recommendations.
"embedder": "default",
// Optional, defaults to null: same as the search query parameter of the same name
"attributesToRetrieve": [],
// Both optional, defaults to false: allow displaying the ranking score
// (resp. detailed ranking score)
"showRankingScore": false,
"showRankingScoreDetails": false
}
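The commented body above is not valid JSON as written, so here is a hedged Python sketch that builds the same POST request with only the standard library. The base URL http://localhost:7700 and the movies index are borrowed from the other examples in these notes; build_similar_request is a hypothetical helper, and the request is constructed but not sent.

```python
import json
from urllib.request import Request


def build_similar_request(base_url, index_uid, document_id, **options):
    """Build (without sending) a POST request to the similar route.

    `options` carries the optional body fields such as limit or filter.
    """
    body = {"id": document_id, **options}
    return Request(
        url=f"{base_url}/indexes/{index_uid}/similar",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_similar_request(
    "http://localhost:7700",
    "movies",
    "23",
    limit=2,
    filter="release_date > 1521763199",
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it to a running instance.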
frequency matching strategy when searching
A frequency variant to the matchingStrategy search parameter has been added. This favors the least frequent query words when retrieving the documents.
curl \
-X POST 'http://localhost:7700/indexes/movies/search' \
-H 'Content-Type: application/json' \
--data-binary '{
"q": "chaval blanc",
"matchingStrategy": "frequency"
}'
The previously existing values for matchingStrategy are last and all (last is the default value).
Done by @ManyTheFish in #4667
Improve indexing speed when updating/adding settings
Meilisearch now limits operations when importing settings by avoiding useless writing operations in its internal database and by reducing disk usage.
Done by @irevoire and @Kerollmops in #4646, #4656 and #4631
Other improvements
- Prometheus experimental feature: Use HTTP path pattern instead of full path in metrics (#4619) @gh2k
⚠️ Remove exportPuffinReport experimental feature. Use logs routes and logs modes instead (#4655) @Kerollmops
Fixes 🐞
- When no searchable attributes are declared, all the fields have the same importance instead of being randomly given more importance. More information here (#4631) @irevoire
- Fix searchableAttributes behavior with nested fields when they were not explicitly defined. More information here (#4631) @irevoire
- Fix security issue in dependency: bump Rustls to non-vulnerable versions (#4622) @Kerollmops
Misc
- CIs and tests
- Documentation
- Misc
❤️ Thanks again to our external contributors:
v1.8.1 🪼
Fixes 🪲
- Index the _geo fields when changing the setting while there are already documents in the DB by @irevoire and @ManyTheFish in #4642
v1.8.0 🪼
Meilisearch v1.8 introduces new changes and optimizations related to the Hybrid search with the addition of new models and embedders like REST embedders and the Ollama model. This version also focuses on stability by adding more security around the search requests. Finally, we introduce the negative operator to exclude specific terms from a search query.
🧰 All official Meilisearch integrations (including SDKs, clients, and other tools) are compatible with this Meilisearch release. Integration deployment happens between 4 to 48 hours after a new version becomes available.
Some SDKs might not include all new features. Consult the project repository for detailed information. Is a feature you need missing from your chosen SDK? Create an issue letting us know you need it, or, for open-source karma points, open a PR implementing it (we'll love you for that ❤️).
New features and updates 🔥
Hybrid search
This release introduces a few changes to hybrid search: a new distribution embedder setting, support for two new embedder sources, and breaking changes to hybrid and semantic search ranking score.
🗣️ This is an experimental feature and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.
Done by @dureuill and @jakobklemm in #4456, #4537, #4509, #4548, #4549.
⚠️ Breaking changes: _semanticScore
To improve search response times and reduce bandwidth usage:
- Meilisearch no longer returns the vector field in the search response
- Meilisearch no longer returns the _semanticScore in the search response. Use _rankingScore in its place
- Meilisearch no longer displays the query vector and its value when "showRankingScoreDetails": true
New embedders: Ollama and generic REST embedder
Ollama model
Ollama is a framework for building and running language models locally. Configure it by supplying an embedder object to the /settings endpoint:
"default": {
"source": "ollama",
"url": "http://localhost:11434/api/embeddings", // optional, fetched from MEILI_OLLAMA_URL environment variable if missing
"apiKey": "<foobarbaz>", // optional
"model": "nomic-embed-text",
"documentTemplate": "A document titled '{{doc.title}}' whose description starts with {{doc.overview|truncatewords: 20}}"
}
Generic REST embedder
Meilisearch now also supports any embedder with a RESTful interface. Configure it by supplying an embedder object to the /settings endpoint:
"default": {
"source": "rest",
"url": "http://localhost:12345/api/v1/embed", //Mandatory, full URL to the embedding endpoint
"apiKey": "187HFLDH97CNHN", // Optional, passed as Bearer in the Authorization header
"dimensions": 512, // Optional, inferred with a dummy request if missing
"documentTemplate": "A document titled '{{doc.title}}' whose description starts with {{doc.overview|truncatewords: 20}}",
"inputField": ["data", "text"], // Optional, defaults to []
"inputType": "text", // Optional, either "text" or "textArray", defaults to text
"query": { // Optional, defaults to {}
"model": "MODEL_NAME",
"dimensions": 512
},
"pathToEmbeddings": ["data"], // Optional, defaults to []
"embeddingObject": ["embedding"] // Optional, defaults to []
}
New embedder setting: distribution
Use distribution to apply an affine transformation to the _rankingScore of semantic search results. This can help to compare the _rankingScore values of semantic and keyword search results and improve result ranking.
"default": {
"source": "huggingFace",
"model": "MODEL_NAME",
"distribution": { // describes the natural distribution of results
"mean": 0.7, // mean value
"sigma": 0.3 // variance
}
}
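To make "affine transformation" concrete, here is a Python sketch that rescales a score so that a distribution with the configured mean and sigma is mapped onto a target distribution. The target values (0.5 and 0.4) and the clamping to [0, 1] are assumptions chosen for illustration; these notes do not state which targets the engine uses, and shift_score is a hypothetical helper.

```python
def shift_score(score, mean, sigma, target_mean=0.5, target_sigma=0.4):
    """Apply the affine map score -> factor * score + offset.

    factor and offset are chosen so that `mean` lands on `target_mean`
    and a spread of `sigma` becomes a spread of `target_sigma`.
    """
    factor = target_sigma / sigma
    offset = target_mean - factor * mean
    # Keep the result inside the usual [0, 1] ranking score range.
    return min(1.0, max(0.0, factor * score + offset))


# With "mean": 0.7 and "sigma": 0.3 as in the settings above:
for raw in (0.4, 0.7, 1.0):
    print(raw, round(shift_score(raw, mean=0.7, sigma=0.3), 3))
```

A score equal to the configured mean lands on the target mean, and scores further away are stretched or compressed by the ratio of the two sigmas.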
Other hybrid search improvements
- Hide the API key in settings and task queue (#4533) @dureuill
- Return keyword search results even in case of a failure of the embedding when performing hybrid searches (#4548) @dureuill
- For hybrid or semantic search requests, add a semanticHitCount field at the top of the search response indicating the number of hits originating from the semantic search (#4548) @dureuill
New feature: Negative keywords
Search queries can now contain a negative keyword to exclude terms from the search. Use the - operator in front of a word or a phrase to make sure no document that contains those words is shown in the results:
curl \
-X POST 'http://localhost:7700/indexes/places/search' \
-H 'Content-Type: application/json' \
--data-binary '{ "q": "-escape room" }'
- -escape returns any document that does not contain escape
- -escape room returns documents containing room but not escape
- -"on demand" returns any document that does not contain "on demand"
Done by @Kerollmops in #4535.
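The three cases above can be sketched with a small Python parser that separates kept terms from excluded ones. This is a simplified illustration, not Meilisearch's query parser; split_query and its regular expression are hypothetical.

```python
import re

# An optional leading "-", followed by either a quoted phrase or a bare word.
TERM = re.compile(r'(-?)(?:"([^"]*)"|(\S+))')


def split_query(query):
    """Split a query string into (kept_terms, excluded_terms)."""
    kept, excluded = [], []
    for minus, phrase, word in TERM.findall(query):
        term = phrase or word
        if not term:
            continue  # skip empty quoted phrases such as ""
        (excluded if minus else kept).append(term)
    return kept, excluded


for query in ("-escape", "-escape room", '-"on demand"'):
    print(query, "=>", split_query(query))
```

Only a - placed directly in front of a word or an opening quote marks an exclusion here; a hyphen inside a word is left alone.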
Search robustness updates
Search cutoff
To avoid crashes and performance issues, Meilisearch now interrupts search requests that take more than 1500ms to complete.
Use the /settings endpoint to customize this value:
curl \
-X PATCH 'http://localhost:7700/indexes/movies/settings' \
-H 'Content-Type: application/json' \
--data-binary '{
"searchCutoffMs": 150
}'
The default value of the searchCutoffMs setting is null and corresponds to a 1500ms timeout.
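The relationship between the null setting and the actual timeout can be summarized in a few lines of Python. This is a client-side reading of the rule stated above; effective_cutoff_ms is a hypothetical helper and does not query a server.

```python
# Timeout applied when searchCutoffMs is left at null, per these notes.
DEFAULT_SEARCH_CUTOFF_MS = 1500


def effective_cutoff_ms(settings):
    """Return the timeout in milliseconds implied by an index settings object."""
    value = settings.get("searchCutoffMs")
    return DEFAULT_SEARCH_CUTOFF_MS if value is None else value


print(effective_cutoff_ms({"searchCutoffMs": None}))
print(effective_cutoff_ms({"searchCutoffMs": 150}))
```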
Concurrent search request limits
This release introduces a limit for concurrent search requests to prevent Meilisearch from consuming an unbounded amount of RAM and crashing.
The default number of requests in the queue is 1000. Relaunch your self-hosted instance with --experimental-search-queue-size to change this limit:
./meilisearch --experimental-search-queue-size 100
👉 This limit does NOT impact the search performance. It only affects the number of enqueued search requests to prevent security issues.
🗣️ This is an experimental feature and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.
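The idea of a bounded queue can be sketched in Python with the standard library. How Meilisearch treats requests that arrive when the queue is full is not described in these notes; this sketch simply assumes they are turned away, and SearchQueue is a hypothetical class.

```python
import queue


class SearchQueue:
    """A bounded waiting line for search requests."""

    def __init__(self, max_size=1000):
        self._pending = queue.Queue(maxsize=max_size)

    def try_enqueue(self, request):
        """Return True if the request was queued, False if the queue is full."""
        try:
            self._pending.put_nowait(request)
            return True
        except queue.Full:
            return False

    def __len__(self):
        return self._pending.qsize()


waiting = SearchQueue(max_size=2)
results = [waiting.try_enqueue({"q": f"query {i}"}) for i in range(3)]
print(results, len(waiting))
```

Because the queue never grows past max_size, memory used by waiting requests stays bounded regardless of how many searches arrive at once.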
Other improvements
- Increase indexing speed when updating settings (#4504) @ManyTheFish
- Update search logs: do not display hits in the search output for DEBUG log level (#4580) @irevoire
- The sortFacetValuesBy setting now impacts the /facet-search route (#4476) @Kerollmops
- Prometheus experimental feature: add status code label to the HTTP request counter (#4373) @rohankmr414
- Tokenizer improvements by bumping charabia to 0.8.8 (#4511) @6543
- Support markdown formatted code blocks
- Improve Korean segmentation to correctly use the context ID registered in the dictionary
- Add \t as recognized separator
- Make the pinyin-normalization optional - this can be reactivated by enabling the chinese-normalization-pinyin feature
Fixes 🐞
- Fix crash when putting empty separator (#4574) @ManyTheFish
- Stop crashing when panic occurs in thread pool (#4593) @Kerollmops
- Always show facet numbers in alpha order in the facet distribution (#4581) @Kerollmops
- Prometheus experimental feature: fix the HTTP request duration histogram bucket boundaries to follow the OpenTelemetry spec (#4530) @rohankmr414
- Hybrid search experimental feature: fix an error on Windows when generating embeddings (#4549) @dureuill
Misc
- Dependency updates
- CIs and tests
- Add automation to create openAPI issues (#4520) @curquiza
- Add tests to check when the field limit is reached (#4463) @irevoire
- Allow running benchmarks without sending results to the dashboard (#4475) @dureuill
- Create automation when creating GitHub milestones to create update-version issue (#4416) @curquiza
- Fix reason param when benches are triggered from a comment (#4483) @dureuill
- Documentation
- Fix milli link in contributing doc (#4499) @mohsen-alizadeh
- Fix some typos in comments (#4546) @redistay
- Remove repetitive words in Benchmark docs (#4526) @availhang
- Remove repetitive words in code-base comments (#4491) @shuangcui
- Update sprint_issue.md (#4516) @curquiza
- Add documentation for benchmarks (#4477) @dureuill
- Fix typos (#4542) @brunoocasali
- Misc
❤️ Thanks again to our external contributors:
v1.8.0-rc.2 🪼
What's Changed
- Remove useless analytics by @irevoire in #4578
- Stop crashing when panic occurs in thread pool by @Kerollmops in #4593
- Fix embedders api by @ManyTheFish in #4600
- Fix embeddings settings update by @ManyTheFish in #4597
v1.8.0-rc.1 🪼
What's Changed
- Avoid clearing db in transform by @ManyTheFish in #4504
- Update the search logs by @irevoire in #4580
- Always show facet numbers in alpha order in the facet distribution by @Kerollmops in #4581
- increase the default search time budget from 150ms to 1.5s by @irevoire in #4576
- Update charabia v0.8.9 by @ManyTheFish in #4583
- Remove pinyin normalization
- \t is now part of the default separators
v1.8.0-rc.0 🪼
Meilisearch v1.8 introduces new changes and optimizations related to the Hybrid search with the addition of new models and embedders like REST embedders and the Ollama model. This version also focuses on stability by adding more security around the search requests. Finally, we introduce the negative operator to exclude specific terms from a search query.
New features and improvements 🔥
Hybrid search improvements
Full description of hybrid search changes here.
🗣️ This is an experimental feature and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.
Done by @dureuill and @jakobklemm in #4456, #4537, #4509, #4548, #4549.
⚠️ Breaking changes of hybrid search usage
- To improve search response speed and reduce bandwidth use, Meilisearch no longer returns the query vector in the search response. The vector field will not be displayed.
- _semanticScore is no longer returned in the search response. The _rankingScore field has the same value as the _semanticScore, and should be used in its place. To get the _rankingScore value, add "showRankingScore": true to the search query.
- When adding "showRankingScoreDetails": true to a semantic search query, the vector and its value are no longer displayed to improve the search speed and bandwidth use.
New embedders: generic REST embedder and Ollama model
New embedder sources have been added:
- ollama source
- rest source
REST embedder
Meilisearch now supports any REST embedder. You can set them up with the following configuration:
"default": {
"source": "rest", // 👈 Use the REST source
"url": "http://localhost:12345/api/v1/embed",
// ☝️ Mandatory, full URL to the embedding endpoint
"apiKey": "187HFLDH97CNHN",
// ☝️ optional, will be passed as Bearer in the Authorization header
"dimensions": 512,
// ☝️ optional, inferred with a dummy request if missing
"documentTemplate": "blabla",
"inputField": ["data", "text"],
// ☝️ inject texts in data.text in the query
// Optional, defaults to []
"inputType": "text", // text or textArray
// ☝️ inject a single text
// Optional, defaults to text
"query": {
// A JSON object describing other fields to send in a query
// for example
"model": "name-of-your-model",
"dimensions": 512
},
// ☝️ A JSON object describing other fields to send in a query
// Optional, defaults to {}
"pathToEmbeddings": ["data"],
// ☝️ look at embeddings in "data" in the response
// Optional, defaults to []
"embeddingObject": ["embedding"]
// ☝️ look at the embedding inside of "embedding"
// Optional, defaults to []
}
Here is an example of setting up an OpenAI embedder with the rest source:
{
"source": "rest",
"apiKey": "<your-openai-api-key>",
"dimensions": 1536,
"url": "https://api.openai.com/v1/embeddings",
"query": {
"model": "text-embedding-ada-002"
},
"inputField": ["input"],
"inputType": "textArray",
"pathToEmbeddings": ["data"],
"embeddingObject": ["embedding"]
}
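To show how inputField, inputType, query, pathToEmbeddings and embeddingObject fit together, here is a Python sketch that builds a request body and reads embeddings back out of a response shaped like the OpenAI example. It is one reading of the comments in the configuration above, not the engine's code; all helper names are hypothetical, non-empty paths are assumed, and the response is taken to hold a list of objects at pathToEmbeddings.

```python
import copy


def set_path(target, path, value):
    """Store `value` at the nested key `path` inside the dict `target`."""
    for key in path[:-1]:
        target = target.setdefault(key, {})
    target[path[-1]] = value


def get_path(source, path):
    """Follow the nested key `path` inside `source` and return what is there."""
    for key in path:
        source = source[key]
    return source


def build_request_body(texts, input_field, input_type, query):
    """Start from the extra `query` fields, then inject the text(s)."""
    body = copy.deepcopy(query)
    payload = list(texts) if input_type == "textArray" else texts[0]
    set_path(body, input_field, payload)
    return body


def extract_embeddings(response, path_to_embeddings, embedding_object):
    """Locate the list of results, then the embedding inside each one."""
    items = get_path(response, path_to_embeddings)
    return [get_path(item, embedding_object) for item in items]


body = build_request_body(
    ["first text", "second text"],
    input_field=["input"],
    input_type="textArray",
    query={"model": "text-embedding-ada-002"},
)
fake_response = {"data": [{"embedding": [0.1, 0.2]}, {"embedding": [0.3, 0.4]}]}
vectors = extract_embeddings(fake_response, ["data"], ["embedding"])
print(body)
print(vectors)
```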
Ollama model
Here is how to set up the Ollama model:
"default": {
"source": "ollama", // 👈 Use the Ollama source
"url": "http://localhost:11434/api/embeddings",
// ☝️ optional, fetched from MEILI_OLLAMA_URL environment variable if missing
"apiKey": "<foobarbaz>",
// ☝️ optional
"model": "nomic-embed-text",
"documentTemplate": "blabla" // like for openAI and huggingFace sources
}
Expose the distribution shift setting
When setting an embedder, you can now set the distribution shift.
"default": {
"source": "huggingFace", // supported for any source
"model": "some/model",
"distribution": { // describes the natural distribution of results
"mean": 0.7, // mean value
"sigma": 0.3 // variance
}
}
The “distribution shift” is an affine transformation applied to the _rankingScore of a semantic search result with the aim of making the comparison to the _rankingScore of a keyword search result more meaningful.
Other hybrid search improvements
- Hide the API key in settings and task queue (#4533) @dureuill
- Return the keyword search results even in case of a failure of the embedding (#4548) @dureuill
- For hybrid or semantic search requests, add a semanticHitCount field at the top of the search response indicating the number of hits originating from the semantic search (#4548) @dureuill
Support negative keyword when searching
Search queries can now contain a negative keyword to exclude terms from the search. Use the - operator in front of a word or a phrase to make sure no document that contains those words is shown in the results.
- -escape returns a placeholder search without any document containing the escape word.
- -escape room returns only documents containing the room word but not the escape one.
- -"on demand" returns a placeholder search without any document containing the "on demand" phrase.
Done by @Kerollmops in #4535.
Search robustness improvements
Add a search cutoff
To avoid any crash and performance issues, Meilisearch now stops search requests lasting more than 150ms.
If you want to customize this value, you can update the searchCutoffMs setting (value in ms):
curl \
-X PATCH 'http://localhost:7700/indexes/movies/settings' \
-H 'Content-Type: application/json' \
--data-binary '{
"searchCutoffMs": 50
}'
The default value of the searchCutoffMs setting is null and corresponds to 150ms.
Limit concurrent search requests
Meilisearch now limits the number of search requests waiting to be processed to avoid consuming an unbounded amount of RAM and crashing. To do so, a queue of search requests waiting to be processed has been introduced.
👉 This change does NOT impact the search performance. It only limits the number of enqueued search requests to prevent any security issues.
The default number of requests in the queue is 1000.
To change this limit, use the experimental CLI flag:
./meilisearch --experimental-search-queue-size 100
🗣️ This is an experimental flag and we need your help to improve it! Share your thoughts and feedback on this GitHub discussion.
Other improvements
- The sortFacetValuesBy setting now impacts the /facet-search route (#4476) @Kerollmops
- Related to Prometheus experimental feature: add status code label to the HTTP request counter (#4373) @rohankmr414
- Tokenizer improvements by bumping charabia to 0.8.8 (#4511) @6543
- Support markdown formatted code blocks
- Improve Korean segmentation to correctly use the context ID registered in the dictionary
Fixes 🐞
- Related to Prometheus experimental feature: fix the HTTP request duration histogram bucket boundaries to follow the OpenTelemetry spec (#4530) @rohankmr414
- Related to Hybrid search experimental feature: fix an error on Windows when generating embeddings (#4549) @dureuill
Misc
- Dependencies upgrade
- CIs and tests
- Add automation to create openAPI issue (#4520) @curquiza
- Add tests when the field limit is reached (#4463) @irevoire
- Allow running benchmarks without sending results to the dashboard (#4475) @dureuill
- Create automation when creating Milestone to create update-version issue (#4416) @curquiza
- Fix reason param when benches are triggered from a comment (#4483) @dureuill
- Documentation
- Fix milli link in contributing doc (#4499) @mohsen-alizadeh
- Fix some typos in comments (#4546) @redistay
- Remove repetitive words in Benchmark docs (#4526) @availhang
- Remove repetitive words in code-base comments (#4491) @shuangcui
- Update sprint_issue.md (#4516) @curquiza
- Add documentation for benchmarks (#4477) @dureuill
- Fix typos (#4542) @brunoocasali
- Misc
❤️ Thanks again to our external contributors: