Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Case-insensitive matching without external crates #1988

Closed
clarfonthey opened this issue May 2, 2017 · 5 comments
Closed

Case-insensitive matching without external crates #1988

clarfonthey opened this issue May 2, 2017 · 5 comments
Labels
T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC.

Comments

@clarfonthey
Copy link

I see a fair amount of code that does something to the extent of:

match s.to_lowercase().as_ref() {
    "string1" | "string2" => ...,
    "something" | "else" => ...,
}

And it'd be really nice if we could provide a way to do this with libstd or even libcore without requiring a crate like regex to do the heavy lifting.

We already have eq_ascii_ignore_case but no equivalent with full unicode support, and there seems to be no easy way to do efficient, case-insensitive matching without allocating short of doing something ridiculous like:

let cs = s.chars().map(char::to_lowercase);
match cs.next() {
    Some('s') | Some('S') => match cs.next() {
        // ...
    },
    Some('e') | Some('E') => match cs.next() {
        // ...
    }
    None => ...,
}
@BurntSushi
Copy link
Member

See also: https://crates.io/crates/caseless

I suspect this is something that needs to be cultivated on crates.io before it could make it into std.

@clarfonthey
Copy link
Author

I agree! I will say that the caseless crate does suffer the same problem that eq_ascii_ignore_case does; it only works on comparing two strings, and not multiple strings. It's definitely a good start though!

I have a feeling that something like this would require a macro instead of just a function, although I'm sure that there are plenty of solutions.

@SimonSapin
Copy link
Contributor

I see a fair amount of code that

That requires Unicode case folding? Can you point to some examples?

Anyway, I just came up with this:

pub fn iter_eq_any_of<A, B>(mut a: A, bs: &mut [Option<B>]) -> Option<usize>
where A: Iterator, B: Iterator, A::Item: PartialEq<B::Item> {
    // Could be `bs.len()` if the caller pinky-swears to never pass `None` in the initial slice.
    let mut b_count = bs.iter().filter(|b| b.is_some()).count();
    loop {
        let a_item = a.next();
        for (i, maybe_b) in bs.iter_mut().enumerate() {
            if let Some(ref mut b) = *maybe_b {
                match (&a_item, b.next()) {
                    (&None, None) => {
                        // Reached the end of two iterators without finding a difference:
                        // they match.
                        return Some(i)
                    }
                    (&Some(ref x), Some(ref y)) if x == y => {
                        // Matching so far, continue with the next b.
                        continue
                    }
                    _ => {
                        // Lexical borrows :(
                    }
                }
            } else {
                continue
            }
            // Found a difference, stop looking in this b.
            *maybe_b = None;
            b_count -= 1;
            if b_count == 0 {
                // Found a difference in every b.
                // (This is an optimization, the algorithm works without tracking b_count.)
                return None
            }
        }
        if a_item.is_none() {
            // Reached the end of a without reaching the end of a matching b.
            return None
        }
    }
}

#[test]
fn it_works() {
    extern crate caseless;

    // Allocates a String
    match &*caseless::default_case_fold_str("String2") {
        "string1" | "string2" => {},
        "something" | "else" => panic!(),
        _ => panic!(),
    }

    // No heap allocation
    match iter_eq_any_of(
        caseless::Caseless::default_case_fold("String1".chars()),
        &mut [
            // These are known to be fixed points (default_case_fold(x) == x)
            // In the general case each of them would need to be case-folded too.
            // A procedural macro could do this at compile-time.
            Some("string1".chars()),
            Some("string2".chars()),
            Some("something".chars()),
            Some("else".chars()),
        ]
    ) {
        Some(0) | Some(1) => {},
        Some(2) | Some(3) => panic!(),
        _ => panic!(),
    }
}

Left as an exercise to the reader: use procedural-masquarade to make the latter code look like the former.

This seems very niche, though. Or at least the solution is convoluted enough that I’m not convinced it belongs in std.

iter_eq_any_of might be a candidate for https://github.com/bluss/rust-itertools.

@withoutboats withoutboats added T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC. labels May 14, 2017
@Centril
Copy link
Contributor

Centril commented Apr 26, 2018

Triage ping: Any progress on this?

@SimonSapin
Copy link
Contributor

I’m going to close this since there is no concrete proposal, and I feel that the set of stated constraints make this niche enough and would require a contrived enough API that whatever we could come up with may not belong in the standard library.

To push this further I’d recommend discussing on internals.rlo and/or developing on crates.io, then possibly opening a formal RFC later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

No branches or pull requests

5 participants