Over-zealous extension splitting #146

Merged
merged 3 commits into from
Jan 30, 2022

Commits on Jan 30, 2022

  1. Add tests to demonstrate the PATH ext bug

    I'm not sure it is *actually* a bug, but the PATH algorithm's way
    of splitting extensions was over-zealous and in practice will split off
    more extensions than is probably desired. To fix this, we will need to
    add a heuristic, but this commit adds tests to demonstrate the problem.
    SethMMorton committed Jan 30, 2022
    961d3bb
  2. Add some limiting heuristics to the PATH suffix splitting

    The prior algorithm went as follows: Obtain ALL suffixes from the base
    component of the filename. Then, starting from the back, keep splitting
    off suffixes until one is encountered whose start matches the regular
    expression /\.\d/. Such a suffix was assumed to be the fractional part
    of a floating point number, and not an extension, and thus the
    splitting would stop at that point.
    
    Some input has been seen where the filenames are composed nearly entirely
    of Word.then.dot.and.then.dot. One entry amongst them contained
    Word.then.dot.5.then.dot. This caused this one entry to be treated
    differently from the rest of the entries due to the ".5", and the
    sorting order was not as expected.
    
    The new algorithm is as follows: Obtain a maximum of two suffixes. Keep
    these suffixes until one of them has a length greater than 4 or starts
    with the regular expression /\.\d/.
    
    This heuristic of course is not bullet-proof, but it will do a better
    job on most real-world filenames than the previous algorithm.
    SethMMorton committed Jan 30, 2022
    9aad50d
  3. Update changelog

    SethMMorton committed Jan 30, 2022
    4832c15
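
The difference between the two algorithms described in commit 2 can be illustrated with a small sketch. This is not the code from the pull request: the function names `split_old` and `split_new`, the helper `_split`, and the parameters `max_suffixes` and `max_len` are hypothetical, and the sketch assumes that the "length greater than 4" limit is counted without the leading dot and that "starts with a dot and a digit" means a literal dot.

```python
import re
from pathlib import PurePath

# A literal dot followed by a digit, e.g. the ".5" in "Word.then.dot.5".
DOT_DIGIT = re.compile(r"\.\d")


def _split(name, kept):
    """Return (base, kept suffixes), where the kept suffixes end the name."""
    cut = len(name) - sum(len(s) for s in kept)
    return name[:cut], kept


def split_old(name):
    """Sketch of the prior behavior: take every suffix, walking from the
    back, and stop only at one that looks like part of a decimal number."""
    kept = []
    for suffix in reversed(PurePath(name).suffixes):
        if DOT_DIGIT.match(suffix):
            break
        kept.insert(0, suffix)
    return _split(name, kept)


def split_new(name, max_suffixes=2, max_len=4):
    """Sketch of the new heuristic: keep at most two suffixes, and stop
    early at a suffix that is too long or looks like a decimal number.
    Assumption: max_len does not count the leading dot."""
    kept = []
    for suffix in reversed(PurePath(name).suffixes):
        too_long = len(suffix) - 1 > max_len
        if len(kept) >= max_suffixes or too_long or DOT_DIGIT.match(suffix):
            break
        kept.insert(0, suffix)
    return _split(name, kept)


for name in ("Word.then.dot.and.then.dot", "Word.then.dot.5.then.dot"):
    print(name, split_old(name), split_new(name))
```

Under these assumptions, the old approach splits the first name down to `Word` but stops at `Word.then.dot.5` for the second, which is the inconsistency the commit message describes. The new heuristic keeps only `.then` and `.dot` in both cases, so the two names are handled alike.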