Skip to content

Commit

Permalink
feat(search): improve search index by chunked separated by h2 and h3 (#…
Browse files Browse the repository at this point in the history
  • Loading branch information
dillonzq authored Apr 29, 2020
1 parent 108679e commit 3096ff6
Show file tree
Hide file tree
Showing 8 changed files with 32 additions and 25 deletions.
2 changes: 1 addition & 1 deletion assets/js/theme.min.js

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion assets/js/theme.min.js.map

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions exampleSite/config.toml
Original file line number Diff line number Diff line change
Expand Up @@ -116,7 +116,7 @@ enableEmoji = true
# type of search engine ("lunr", "algolia")
type = "algolia"
# max index length of the chunked content
contentLength = 5000
contentLength = 4000
# placeholder of the search bar
placeholder = ""
# max number of results length
Expand Down Expand Up @@ -310,7 +310,7 @@ enableEmoji = true
# 搜索引擎的类型 ("lunr", "algolia")
type = "algolia"
# 文章内容最长索引长度
contentLength = 5000
contentLength = 4000
# 搜索框的占位提示语
placeholder = ""
# 最大结果数目
Expand Down Expand Up @@ -500,7 +500,7 @@ enableEmoji = true
# type of search engine ("lunr", "algolia")
type = "algolia"
# max index length of the chunked content
contentLength = 5000
contentLength = 4000
# placeholder of the search bar
placeholder = ""
# max number of results length
Expand Down
8 changes: 4 additions & 4 deletions exampleSite/content/posts/theme-documentation-basics.en.md
Original file line number Diff line number Diff line change
Expand Up @@ -222,7 +222,7 @@ Please open the code block below to view the complete sample configuration :(far
# type of search engine ("lunr", "algolia")
type = "lunr"
# max index length of the chunked content
contentLength = 5000
contentLength = 4000
# placeholder of the search bar
placeholder = ""
# {{< version 0.2.1 >}} max number of results length
Expand Down Expand Up @@ -972,7 +972,7 @@ Here is the search configuration in your [site configuration](#site-configuratio
# type of search engine ("lunr", "algolia")
type = "lunr"
# max index length of the chunked content
contentLength = 5000
contentLength = 4000
# placeholder of the search bar
placeholder = ""
# {{< version 0.2.1 >}} max number of results length
Expand All @@ -992,8 +992,8 @@ The following is a comparison of two search engines:
but high bandwidth and low performance (Especially for Chinese which needs a large segmentit library)
* `algolia`: high performance and low bandwidth, but need to synchronize `index.json` and limit for `contentLength`

{{< version 0.2.1 >}} The content of the post is separated by `h2` HTML tag to improve query performance and basically implement full-text search.
`contentLength` is used to limit the max index length of the part starting with `h2` HTML tag.
{{< version 0.2.3 >}} The content of the post is separated by `h2` and `h3` HTML tag to improve query performance and basically implement full-text search.
`contentLength` is used to limit the max index length of the part starting with `h2` and `h3` HTML tag.
{{< /admonition >}}

{{< admonition tip "Tips about algolia" >}}
Expand Down
8 changes: 4 additions & 4 deletions exampleSite/content/posts/theme-documentation-basics.fr.md
Original file line number Diff line number Diff line change
Expand Up @@ -227,7 +227,7 @@ Please open the code block below to view the complete sample configuration :(far
# type of search engine ("lunr", "algolia")
type = "lunr"
# max index length of the chunked content
contentLength = 5000
contentLength = 4000
# placeholder of the search bar
placeholder = ""
# {{< version 0.2.1 >}} max number of results length
Expand Down Expand Up @@ -977,7 +977,7 @@ Here is the search configuration in your [site configuration](#site-configuratio
# type of search engine ("lunr", "algolia")
type = "lunr"
# max index length of the chunked content
contentLength = 5000
contentLength = 4000
# placeholder of the search bar
placeholder = ""
# {{< version 0.2.1 >}} max number of results length
Expand All @@ -997,8 +997,8 @@ The following is a comparison of two search engines:
but high bandwidth and low performance (Especially for Chinese which needs a large segmentit library)
* `algolia`: high performance and low bandwidth, but need to synchronize `index.json` and limit for `contentLength`

{{< version 0.2.1 >}} The content of the post is separated by `h2` HTML tag to improve query performance and basically implement full-text search.
`contentLength` is used to limit the max index length of the part starting with `h2` HTML tag.
{{< version 0.2.3 >}} The content of the post is separated by `h2` and `h3` HTML tag to improve query performance and basically implement full-text search.
`contentLength` is used to limit the max index length of the part starting with `h2` and `h3` HTML tag.
{{< /admonition >}}

{{< admonition tip "Tips about algolia" >}}
Expand Down
8 changes: 4 additions & 4 deletions exampleSite/content/posts/theme-documentation-basics.zh-cn.md
Original file line number Diff line number Diff line change
Expand Up @@ -225,7 +225,7 @@ hugo
# 搜索引擎的类型 ("lunr", "algolia")
type = "lunr"
# 文章内容最长索引长度
contentLength = 5000
contentLength = 4000
# 搜索框的占位提示语
placeholder = ""
# 最大结果数目
Expand Down Expand Up @@ -977,7 +977,7 @@ defaultContentLanguage = "zh-cn"
# 搜索引擎的类型 ("lunr", "algolia")
type = "lunr"
# 文章内容最长索引长度
contentLength = 5000
contentLength = 4000
# 搜索框的占位提示语
placeholder = ""
# 最大结果数目
Expand All @@ -996,8 +996,8 @@ defaultContentLanguage = "zh-cn"
* `lunr`: 简单, 无需同步 `index.json`, 没有 `contentLength` 的限制, 但占用带宽大且性能低 (特别是中文需要一个较大的分词依赖库)
* `algolia`: 高性能并且占用带宽低, 但需要同步 `index.json` 且有 `contentLength` 的限制

{{< version 0.2.1 >}} 文章内容被 `h2` HTML 标签切分来提供查询效果并且基本实现全文搜索.
`contentLength` 用来限制 `h2` HTML 标签开头的内容部分的最大长度.
{{< version 0.2.3 >}} 文章内容被 `h2``h3` HTML 标签切分来提供查询效果并且基本实现全文搜索.
`contentLength` 用来限制 `h2` `h3` HTML 标签开头的内容部分的最大长度.
{{< /admonition >}}

{{< admonition tip "关于 algolia 的使用技巧" >}}
Expand Down
21 changes: 14 additions & 7 deletions layouts/index.json
Original file line number Diff line number Diff line change
Expand Up @@ -14,16 +14,23 @@
{{- end -}}
{{- $params := .Params | merge $.Site.Params.page -}}
{{- $content := dict "content" .Content "ruby" $params.ruby "fraction" $params.fraction "fontawesome" $params.fontawesome | partial "function/content.html" -}}
{{- range $i, $chunked := split $content "<h2 id=" -}}
{{- range $i, $contenti := split $content "<h2 id=" -}}
{{- if gt $i 0 -}}
{{- $chunked = printf "<h2 id=%s" $chunked -}}
{{- $contenti = printf "<h2 id=%s" $contenti -}}
{{- end -}}
{{- $chunked = $chunked | plainify | htmlUnescape | replace "\n" " " | replace "\t" " " | replaceRE " +" " " -}}
{{- if gt $.Site.Params.search.contentLength 0 -}}
{{- $chunked = substr $chunked 0 $.Site.Params.search.contentLength -}}
{{- range $j, $contentj := split $contenti "<h3 id=" -}}
{{- if gt $j 0 -}}
{{- $contentj = printf "<h3 id=%s" $contentj -}}
{{- end -}}
{{- $contentj = $contentj | plainify | htmlUnescape | replaceRE `[\n\t ]+` " " -}}
{{- if gt $.Site.Params.search.contentLength 0 -}}
{{- $contentj = substr $contentj 0 $.Site.Params.search.contentLength -}}
{{- end -}}
{{- if $contentj | and (ne $contentj " ") -}}
{{- $one := printf "%s:%d:%d" $uri $i $j | dict "content" $contentj "objectID" | merge $meta -}}
{{- $index = $index | append $one -}}
{{- end -}}
{{- end -}}
{{- $one := printf "%s:%d" $uri $i | dict "content" $chunked "objectID" | merge $meta -}}
{{- $index = $index | append $one -}}
{{- end -}}
{{- end -}}

Expand Down
2 changes: 1 addition & 1 deletion src/js/theme.js
Original file line number Diff line number Diff line change
Expand Up @@ -237,7 +237,7 @@ class Theme {
this._algoliaIndex
.search(query, {
offset: 0,
length: searchConfig.maxResultLength * 3,
length: searchConfig.maxResultLength * 10,
attributesToHighlight: ['title'],
attributesToSnippet: ['content:30'],
highlightPreTag: `<${searchConfig.highlightTag}>`,
Expand Down

0 comments on commit 3096ff6

Please sign in to comment.