Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added logic to properly treat underscores, dashes and spaces. Fixes #318 #341

Merged
merged 7 commits into from
Oct 29, 2014

Conversation

nzubair
Copy link
Contributor

@nzubair nzubair commented Oct 22, 2014

This PR addresses behavior observed in issues #318.

Primarily, handling of underscores and dashes which do not connect words. They either have a space before or after them, or both, as was the case for this bug.

Additionally, the regex in FromPascalCase was not handling spaces in a string properly. Amended the regex to handle the spaces. It can be improved, given my lack of mastery of .Net regular expression.

Finally, a few test cases were added to test different combinations of spaces, dashes and underscores.

The regex can be refined to exclude leading and trailing spaces from the
results of pascalCaseWordBoundary.Split(). For now, word.Trim() is added
to discard the unneeded space.
word.ToCharArray().All(Char.IsUpper) && word.Length > 1
? word
: word.ToLower())
word.Trim().ToCharArray().All(Char.IsUpper) && word.Length > 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In consideration of performance, can you do the Trim() once.

I know you didn't introduced it, but can you also remove ToCharArray(), it should work without it anyway.

@mexx
Copy link
Collaborator

mexx commented Oct 22, 2014

Thanks for the PR. Please add it to the release notes as mentioned in CONTRIBUTING.md.

It would be cool if you would address the comments I've made.

@nzubair
Copy link
Contributor Author

nzubair commented Oct 22, 2014

Thanks for the feedback Max. I will address your comments today.

@nzubair
Copy link
Contributor Author

nzubair commented Oct 23, 2014

@mexx As requested, reduced the number of Trim()s, simplified the regex and updated the release_notes.md.

However, I was not able to remove ToCharArray(). MSBuild spat out the following error:

StringHumanizeExtensions.cs(35,21): error CS1061: 'string' does not contain a definition for 'All' and no extension method ' All' accepting a first argument of type 'string' could be found (are you missing a using directive or an assembly reference? ) [D:\Work\Dev\Projects\nzh\Src\Humanizer\Humanizer.csproj]

@MehdiK
Copy link
Member

MehdiK commented Oct 26, 2014

Thanks for the contribution @nzubair.

I think dash shouldn't be removed as part of humanization. For example "left-handed people".Humanize() should still return "left-handed people" (I understand there is no real rule around this and it could be quite subjective but I think leaving it in makes more sense). The reason underscore is being removed is that underscore normally doesn't make sense in a human readable sentence and removing it normally results into a more human readable sentence. This behavior is highlighted by the name of the covering test (CanHumanizeStringWithUnderscores). You've added a few test cases to that test that are checking for dashes which don't fit there, firstly because of the method name and more importantly because I don't think we should remove dashes.

As per #318, as I mentioned in the comments, "Humanize should leave the text alone (because as far as it can see it's already human-readable)" regardless of how many times it's called on it.

@mexx
Copy link
Collaborator

mexx commented Oct 26, 2014

@MehdiK We already remove dashes, look at the implementation.
So maybe the problem is with that wrong handling?

@nzubair
Copy link
Contributor Author

nzubair commented Oct 26, 2014

Hi @MehdiK, Thank you for taking time to review the PR and for providing feedback.

However, I'm a bit confused as what you are describing contradicts the behavior of Humanizer at present.

Dashes
In the current implementation, dashes and underscores are removed regardless of where they are and how they are connecting words. "left-handed people" will turn into "left handed people". For #318, TEST 1 - THIS IS A TEST turns into TEST 1[space][space][space]THIS IS A TEST. This is happening at L54 (check) and L14(replace). It's a simple substitution without logic.

Spaces (and impact on all uppercase strings)
Additionally, the FromPascalCase does not handle spaces. So handling of all caps strings is incorrect. The check for all caps at L51 only works if the string is a single word, otherwise it returns false. Once that check fails, L57 calls FromPascalCase on all caps string. The regular expression does not deal with spaces. Therefore, .Split() produces a single result, with the entire string, which is then title cased and returned. This produces Test 1 This is a test from TEST 1 THIS IS A TEST.

Can you please let me know if my interpretation of the expected behavior is incorrect?

Thanks again for your time.

@MehdiK
Copy link
Member

MehdiK commented Oct 29, 2014

Thanks @mexx and @nzubair for the correction. You're right. That is the current behaviour! I will check this out again later today (and merge).

P.S. That method was the very first method implemented in the library and hasn't been touched (apart from a RegEx rewrite) since!

@MehdiK MehdiK merged commit 0e6edfe into Humanizr:master Oct 29, 2014
@nzubair nzubair deleted the StringExtensions branch October 30, 2014 00:14
@MehdiK
Copy link
Member

MehdiK commented Oct 31, 2014

This is now released to NuGet as v1.30.0. Thanks for the great contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants