Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore spaces and special chars in mod search #2709

Merged
merged 1 commit into from
Apr 25, 2019

Conversation

DasSkelett
Copy link
Member

@DasSkelett DasSkelett commented Mar 31, 2019

Problem

If you want to search the mod list (in GUI, commandline or consoleUI), you have to know the name/identifier/author/description exactly, even spaces and special character.
There are a lot of mods with spaces and special characters in their name, sometimes even in the identifier indexed in the CKAN.
Examples:

  • aati-flags (name) - AATI-Flags (identifier)
    (special char)
  • [1.3] Capsule Corporation Endeavour (Mars Base Camp)
    (special chars and spaces)
  • BetterBurnTime
    (no spaces)
  • and on and on

It's nearly impossible to find a mod on the first try right now, if you don't know the exact name including all spaces, hyphens, brackets, dots...

Solution

On startup, more precisely during the deserialization of CkanModules on load of registry, 'Searchables' are calculated for every mo and mod version, and saved during runtime in CkanModule. The don't get serialized!
Those 'Searchables' are

SearchableIdentifier
SearchableName
SearchableAbstract
SearchableDescription
SearchableAuthors (this one is a List<string>)

and represent special-characterless versions of their respective counterparts.
This means no spaces, no hyphens, no brackets and so on. Only alphanumerics remain.
They are calculated by removing each non-alphanumeric char from the source strings (regex [^a-zA-Z0-9].
If a mod search is done, the search term gets treated the same way.

This applies to all three interfaces, for all search types (author, description, name...).


Furthermore, I've improved the console search a bit. There's now also a --detail option (similar to the one in available, see #2286) to list more than just the identifier, including the compatibility for KSP versions of the last mod version.
The new --all option lists incompatible mods too. Without --detail, they're mixed up in the compatibles, but with they are printed as two separate lists.
Third, the --author <authorName> option allows to search for mods by a specific author. An empty search_term is possible, to list all the mods of an author.

So the search command can now range from
ckan search kerbal to
ckan search --all --detail --author linuxguru RecycledPartsFTmNImprovedAtomicRockets,
something for everyone.


tl;dr:

  • aatiflags or aati flags finds aati-flags
  • capsulecorporation endeavour Mars Base Camp finds [1.3] Capsule Corporation Endeavour (Mars Base Camp)
  • better burn time finds BetterBurnTime
  • ckan search --all --detail --author linuxguru RecycledPartsFTmNImprovedAtomicRock finds RecycledPartsFTmNImprovedAtomicRockets by LGG in commandline
  • ...

Closes #2708

@DasSkelett DasSkelett added Enhancement New features or functionality Pull request labels Mar 31, 2019
@politas
Copy link
Member

politas commented Apr 7, 2019

Could we speed this process up by adding fields to GUIMod with the filtered properties, so we aren't constantly applying multiple regexes to every line in GUIModList every time we refresh? This data isn't changing dynamically based on user input.

@DasSkelett
Copy link
Member Author

DasSkelett commented Apr 7, 2019

If we add it to GUImod, we can use the feature only for the GUI. But generally I support the idea.
Don't know if this makes sense in CkanModule? That would be used by every UI.

Also we would have to save it for the name, identifier, author and description, maybe as an array?

A wild idea: concatenate all 4 strings together in one field, and only have one search box for everything, looking if the search term is included in the concatenated string.

@yalov
Copy link
Contributor

yalov commented Apr 9, 2019

one search box and 3 checkboxes, instead of the 3 search boxes?

@yalov yalov mentioned this pull request Apr 19, 2019
@HebaruSan
Copy link
Member

HebaruSan commented Apr 19, 2019

  • Don't list out all punctuation characters in the regex, just negate the alphanums (see next bullet point's code citation for an example) (I guess you have to allow '@' as well to avoid breaking ConsoleUI's author search)
  • The regex should be static and compiled for performance:
    private static readonly Regex nonAlphaNumPrefix = new Regex("^[^a-zA-Z0-9]*", RegexOptions.Compiled);
  • It makes sense to precalculate as we do with GUIMod.Abbreviation, and I guess I'm OK with putting it in CkanModule
  • A concatenated string wouldn't work for GUI, since you couldn't use it to search just the name or author. How about a simple set of supplemental properties named to make their purpose clear (and obviously excluded from the JSON serialization)?
    • SearchableName
    • SearchableIdentifier
    • SearchableAbstract
    • SearchableDescription
    • SearchableAuthors
  • (Old issue) Does Cmdline completely fail to consider the Abstract? That's where most modules have their descriptions. And GUI ignores the description, looking only at the abstract?? Consistency would be nice here.

@DasSkelett
Copy link
Member Author

Good suggestions. When to calculate the 'Searchables'?
I'm not happy increasing the startup time even more. Maybe do it in the background, and deactivate the search boxes until it's done?

@HebaruSan
Copy link
Member

HebaruSan commented Apr 20, 2019

I wouldn't attempt to optimize this further until after you've measured the performance using a static, compiled regex. Some portion of the slowdown you saw before is due to re-creating and compiling the regex over a thousand times, and it could be a large portion.

Sorry, that probably wasn't clear: set the Searchable properties in the constructor and any post-deserialization hook as needed.

@DasSkelett DasSkelett added the In progress We're still working on this label Apr 21, 2019
@DasSkelett
Copy link
Member Author

DasSkelett commented Apr 22, 2019

Update: I do use a static, compiled, negated (non alphanumerics) regex now > performance is better.
Not noticable in GUI or consoleUI anymore, a time ckan search tells me 4.8s before this PR vs. 5.1s after.

The calculations are done in the constructor of CkanModule and on deserialisation of AvialableModule. Did I miss any possible source of CkanModules that needs the calculations added?

All UIs now search the abstract as well as the description.

@DasSkelett
Copy link
Member Author

I want to put the calculations in a separate function in CkanModule which also gets called in AvailableModule and maybe remove all but one of the regexes to simplify possible future changes, so hang on please.

@DasSkelett DasSkelett removed the In progress We're still working on this label Apr 23, 2019
@DasSkelett
Copy link
Member Author

DasSkelett commented Apr 23, 2019

The calculations are now in a separate method: CalculateSearchables().
Every use of the regex is also done via CkanModule.nonAlphaNums, the other ones were removed.
It's better if it's only in one place.
The calculations previously done in the deserialization of AvailableModule are now done via a call to CalculateSearchables() in the deserialization of CkanModule. That's where it belongs to, and is no problem, because a deserialization of AvailableModule implies a deserialization of CkanModule.

Core/Types/CkanModule.cs Outdated Show resolved Hide resolved
@DasSkelett
Copy link
Member Author

DasSkelett commented Apr 24, 2019

@HebaruSan we only print the identifier of the search results in the command line.

// Print each mod on a separate line.
foreach (CkanModule mod in matching_mods)
{
user.RaiseMessage(mod.identifier);
}

Wouldn't it make sense to list the mod names too? They are a bit user friendlier, I think.
I would propose like this:

[x] Science! (xscience)

@HebaruSan
Copy link
Member

Yes, that would be nicer. The issue that came up last time in #2286 is that people might be running the CmdLine client in scripts that expect the old output, so any new format should probably be controlled by a new command line option.

@DasSkelett
Copy link
Member Author

Okay. For consistency I would do it like #2286, via a --detail flag and with identifier, latest version and full name.
Is the abstract needed for the search command? It seems like you know what a mod does if you search for it.

@HebaruSan
Copy link
Member

Having the abstract could be nice if you're searching for a particular kind of mod rather than identifiers. E.g., you might want to find planet packs and then pick one based on its description.

@HebaruSan
Copy link
Member

As long as you're going all-out for this, should there also be a way to include incompatible mods in the CmdLine search? It might be nice to see that I could install more planet packs if I marked 1.4 as compatible, for example.

@DasSkelett
Copy link
Member Author

Support this. What about naming it --all? Also I think we should include the compatible ksp version for these.
By the way, is there a way to search for mods of a single author in the cmdline? Didn't find anything.
Could imagine it like ckan search "@HebaruSan astro", including the @. Due to the regex, the author could only be placed at the beginning of the search term.

@HebaruSan
Copy link
Member

Or ckan search --author Hebarusan astro, might as well take advantage of the command line parameter framework when we can.

@DasSkelett
Copy link
Member Author

I've updated the post with infos about all those shiny new options for the commandline search.

@HebaruSan HebaruSan added Core (ckan.dll) Issues affecting the core part of CKAN Cmdline Issues affecting the command line ConsoleUI Issues affecting the interactive console UI GUI Issues affecting the interactive GUI labels Apr 25, 2019
Cmdline/Action/Search.cs Outdated Show resolved Hide resolved
Also add a --detail option to the command line search which displays mod name, version, KSP version and abstract on top.
Add --all to allow the search of compatible and incompatible modules.
Add --author <authorName> to allow search by a specific mod author.
@HebaruSan HebaruSan merged commit df698b1 into KSP-CKAN:master Apr 25, 2019
@Olympic1 Olympic1 removed Enhancement New features or functionality Pull request labels Apr 25, 2019
@DasSkelett DasSkelett deleted the feature/search branch April 25, 2019 21:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Cmdline Issues affecting the command line ConsoleUI Issues affecting the interactive console UI Core (ckan.dll) Issues affecting the core part of CKAN GUI Issues affecting the interactive GUI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ignore spaces and hyphen in the filters by name
5 participants