-
Notifications
You must be signed in to change notification settings - Fork 423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove fieldnorm #1134
Comments
Context: It's purpose is to compute the BM25 similarity metric (you can consider it a refined version of TfIdf). Until recently, producing fieldnorms was not optional in tantivy. In Quickwit, we do not support BM25 at the moment so this datastructure is perfectly useless. |
Since the options are set using https://docs.rs/tantivy/latest/tantivy/schema/struct.TextOptions.html#method.set_indexing_options which takes in the TextFieldIndexing configuration. So I believe the only change to not use fieldnorms should be like so: In case I am missing something important let me know, Ill take another look otherwise ill open a PR to fix this |
What would be nice is to be able to choose if we want a fieldnorm or not. By default, no fieldnorm. Can you add a parameter to configure this new option? |
@fmassot diff --git a/quickwit-doc-mapper/src/default_doc_mapper/field_mapping_entry.rs b/quickwit-doc-mapper/src/default_doc_mapper/field_mapping_entry.rs
index 7c6897d..49942fc 100644
--- a/quickwit-doc-mapper/src/default_doc_mapper/field_mapping_entry.rs
+++ b/quickwit-doc-mapper/src/default_doc_mapper/field_mapping_entry.rs
@@ -486,6 +486,8 @@ struct FieldMappingEntryForSerialization {
#[serde(skip_serializing_if = "Option::is_none")]
indexed: Option<bool>,
#[serde(skip_serializing_if = "Option::is_none")]
+ fieldnorms: Option<bool>,
+ #[serde(skip_serializing_if = "Option::is_none")]
tokenizer: Option<String>,
#[serde(skip_serializing_if = "Option::is_none")]
record: Option<IndexRecordOption>,
@@ -527,6 +529,7 @@ impl From<FieldMappingEntry> for FieldMappingEntryForSerialization {
let type_with_cardinality = value.mapping_type.type_with_cardinality();
let mut fast = false;
let mut indexed = None;
+ let mut fieldnorms = None;
let mut record = None;
let mut stored = false;
let mut tokenizer: Option<String> = None;
@@ -538,6 +541,7 @@ impl From<FieldMappingEntry> for FieldMappingEntryForSerialization {
record = Some(indexing_options.index_option());
} else {
...skipping 1 line
+ fieldnorms = Some(false);
}
}
FieldMappingType::I64(options, _)
@@ -561,6 +565,7 @@ impl From<FieldMappingEntry> for FieldMappingEntryForSerialization {
type_with_cardinality,
fast,
indexed,
+ fieldnorms,
record,
stored,
tokenizer,
@@ -602,7 +607,7 @@ impl FieldMappingEntryForSerialization {
if self.indexed.unwrap_or(true) {
let mut indexing_options = TextFieldIndexing::default();
if let Some(index_option) = self.record {
- indexing_options = indexing_options.set_index_option(index_option);
+ indexing_options = indexing_options.set_index_option(index_option).set_fieldnorms(self.fieldnorms.unwrap_or(false));
}
if let Some(tokenizer) = &self.tokenizer {
indexing_options = indexing_options.set_tokenizer(tokenizer);
@@ -796,6 +801,30 @@ mod tests {
Ok(())
}
...skipping 1 line
+ #[test]
+ fn test_deserialize_valid_fieldnorms() -> anyhow::Result<()> {
+ let result = serde_json::from_str::<FieldMappingEntry>(
+ r#"
+ {
+ "name": "my_field_name",
+ "type": "text",
+ "stored": true,
+ "indexed": true,
+ "fieldnorms": true,
+ "record": "basic",
+ "tokenizer": "english"
+ }"#);
+ match result.unwrap().mapping_type {
+ FieldMappingType::Text(options, _) => {
+ assert_eq!(options.is_stored(), true);
+ let index_options = options.get_indexing_options().unwrap();
+ assert_eq!(index_options.fieldnorms(), true);
+ }
+ _ => panic!("wrong property type"),
+ }
+ Ok(())
+ }
+
...skipping 1 line
fn test_error_on_text_with_invalid_options() -> anyhow::Result<()> {
let result = serde_json::from_str::<FieldMappingEntry>(
@@ -938,6 +967,7 @@ mod tests {
match result.mapping_type {
FieldMappingType::I64(options, cardinality) => {
assert_eq!(options.is_indexed(), true); // default
+ assert_eq!(options.fieldnorms(), false); // default
assert_eq!(options.is_fast(), false); // default
assert_eq!(options.is_stored(), true); // default
assert_eq!(cardinality, Cardinality::MultiValues); It seems to be passing all the tests and the custom test I wrote, however I am not sure. If it helps to make it more readable i will open a pull request but I am not sure if I am supposed to fork first or directly push by creating a separate branch. Thank you! |
I think you have the rights to create a branch, push it and open a PR. |
* Add option for fieldnorm * Updated backward compatibility tests. Closes #1134
Tantivy now makes it possible to not index fieldnorm. This is perfectly useless for logs so we might as well disable it in quickwit. This should make all index a few percent smaller.
The text was updated successfully, but these errors were encountered: