This repository has been archived by the owner on Aug 12, 2021. It is now read-only.

As a user, I want the search engine to automatically handle common spelling variations so that I can more easily find titles with variant spelling. #81

Open
rlskoeser opened this issue Jun 18, 2018 · 3 comments

Comments

@rlskoeser
Contributor

rlskoeser commented Jun 18, 2018

Notes for testing

  • The following transformations are applied automatically, both when content is indexed and when keyword search terms are entered:
    • e at the end of a word is removed
    • vv is converted to w
    • ll is converted to l (double L to single)
    • j is converted to i
    • v is converted to u
  • The search results have a temporary section displaying search terms in context so you can get a better sense of how the search is working.
  • Try searching with both versions of a variant spelling (e.g. realme and realm, methode and method, or witch and vvitch) to see how they match the book content, and check which terms match in the highlighted section.

E.g., if I search for witch-craft, I would like results for “vvitch-craft” to appear.
Some common variations include i=j; vv=w; u=v. See: https://quod.lib.umich.edu/t/tcp/help/search-spelling.html
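
For anyone testing, here is a rough Python sketch of the transformations listed in the notes above. It is only an illustration of the rules, not the actual Solr analysis chain. The function name `normalize_spelling` is made up for this example, and the order of the steps is an assumption: `vv` has to be handled before `v`, otherwise it would become `uu` instead of `w`. Case is left untouched here.

```python
import re


def normalize_spelling(text: str) -> str:
    """Apply the spelling-variant rules from the testing notes.

    Hypothetical helper for illustration; the rule order is an assumption.
    """
    # vv -> w (must run before the v -> u rule below)
    text = text.replace("vv", "w")
    # ll -> l (double L to single)
    text = text.replace("ll", "l")
    # j -> i
    text = text.replace("j", "i")
    # v -> u
    text = text.replace("v", "u")
    # remove an e at the end of each word
    text = re.sub(r"e\b", "", text)
    return text


# Both spellings of each pair should normalize to the same string.
for variant, modern in [("realme", "realm"), ("methode", "method"), ("vvitch", "witch")]:
    print(variant, modern, normalize_spelling(variant) == normalize_spelling(modern))
```

Because the same function is applied to both the indexed text and the search term, a match only requires that the two normalized forms agree; the normalized form itself does not need to be a real word.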

@rlskoeser
Contributor Author

I think we should be able to do this with a Solr filter. I expect we'd need to use the same filter on the indexed content and on the search input. Maybe the pattern replace filter? https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-PatternReplaceFilter

Might also want to incorporate the unicode folding filter we used in Derrida.

Probably worth some investigation to see if anyone else has done something like this before.
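
A possible starting point, sketched in Python as the JSON structure a Solr field type definition would take. This assumes a chain of `solr.PatternReplaceFilterFactory` filters behind a standard tokenizer and a lowercase filter; the field type name `text_spelling_variants` and the helper names are hypothetical, and the unicode folding filter is left out. Defining a single `analyzer` (rather than separate index and query analyzers) means the same filters run on indexed content and on search input.

```python
import json

# Hypothetical field type name, chosen for this sketch only.
FIELD_TYPE_NAME = "text_spelling_variants"


def pattern_filter(pattern: str, replacement: str) -> dict:
    """Build one PatternReplaceFilterFactory entry (applied per token)."""
    return {
        "class": "solr.PatternReplaceFilterFactory",
        "pattern": pattern,
        "replacement": replacement,
        "replace": "all",
    }


def build_field_type() -> dict:
    """Return a field type definition with the spelling-variant filters."""
    return {
        "name": FIELD_TYPE_NAME,
        "class": "solr.TextField",
        # One shared analyzer: used at both index time and query time.
        "analyzer": {
            "tokenizer": {"class": "solr.StandardTokenizerFactory"},
            "filters": [
                {"class": "solr.LowerCaseFilterFactory"},
                pattern_filter("vv", "w"),  # before v -> u
                pattern_filter("ll", "l"),
                pattern_filter("j", "i"),
                pattern_filter("v", "u"),
                pattern_filter("e$", ""),   # trailing e on each token
            ],
        },
    }


print(json.dumps(build_field_type(), indent=2))
```

A char filter (`solr.PatternReplaceCharFilterFactory`) would be an alternative that rewrites the text before tokenizing; in that case the end-of-word pattern would need a word boundary rather than `$`.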

@rlskoeser rlskoeser changed the title from "As a user, I want the search engine to automatically handle for common spelling variations so that I can more easily find titles with variant spelling." to "As a user, I want the search engine to automatically handle common spelling variations so that I can more easily find titles with variant spelling." on Jul 2, 2018
@rlskoeser
Contributor Author

Schema API includes options for adding and replacing field types: https://lucene.apache.org/solr/guide/6_6/schema-api.html#SchemaAPI-ModifytheSchema
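
A minimal sketch of what a Schema API call could look like from Python, using only the standard library. The `add-field-type` and `replace-field-type` commands and the `/schema` endpoint come from the Solr documentation linked above; the function names, base URL, and core name are placeholders for illustration, and this is not how the project's SolrClient does it.

```python
import json
import urllib.request


def schema_request(solr_url: str, core: str, field_type: dict,
                   replace: bool = False) -> urllib.request.Request:
    """Build a POST request that adds or replaces a field type.

    Hypothetical helper; solr_url and core are supplied by the caller.
    """
    command = "replace-field-type" if replace else "add-field-type"
    body = json.dumps({command: field_type}).encode("utf-8")
    return urllib.request.Request(
        f"{solr_url.rstrip('/')}/{core}/schema",
        data=body,
        headers={"Content-type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request to a running Solr instance and decode the JSON reply."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would be something like `send(schema_request("http://localhost:8983/solr", "example_core", field_type))` against a local Solr, with `replace=True` once the field type already exists.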

@rlskoeser rlskoeser assigned rlskoeser and unassigned meg-codes Aug 6, 2018
meg-codes pushed a commit that referenced this issue Aug 13, 2018
…sorting (#112)

* Don't repeat book annotator names in Solr index

* Basic solr field customization with preliminary spelling variants #81

* Fix missed template variable on solr authors -> author rename

* Return empty schema field data for test solr mock

* Note about using char filter instead of pattern

* Fix branch specifier for customized version of SolrClient

* Add local solr config based on basic configs

* Update solr core create command to use local config directory

* Update travis-ci solr setup script to use local config directory

* Revise solr field type config to preserve case in facets

* Show search snippets to make field customization visible

* Revise solr config path in travis-ci solr setup script

* Correct travis-ci solr setup script solr config path

* Keep django-autocomplete-light on 3.2.x version for now

* Handle solr error on attempt to retrieve highlighted terms