Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Word diff "ignores" newline and extra spaces #30

Closed
matanox opened this issue Aug 6, 2014 · 5 comments
Closed

Word diff "ignores" newline and extra spaces #30

matanox opened this issue Aug 6, 2014 · 5 comments

Comments

@matanox
Copy link

matanox commented Aug 6, 2014

Not sure if this is a feature or part of design, but whereas a line diff will treat a new line character (\n at least) as a potential difference, word diff will not. Indeed it's not a word, and I can see the inherent abstract consistency of this treatment. But it may come to mind providing control over what characters delimit words and what characters don't, for word diff. Anyway please don't break the API, the default should behave as it does now..

Having said that, the problem is more severe than new line handling, e.g. any non-word delimiter such as a space character, is just swallowed during the word diff process, and does not show as a difference in the word diff result. I think this aspect should be reconsidered for improvement (while keeping default behavior as is).

@matanox matanox changed the title Word diff "ignores" newline Word diff "ignores" newline and extra spaces Aug 6, 2014
@papandreou
Copy link
Contributor

Sounds like something I needed a while ago, and luckily I sent a PR (#22 :).

Try diffWordsWithSpace.

@matanox
Copy link
Author

matanox commented Aug 8, 2014

@papandreou don't WordDiff and WordWithSpaceDiff just use the same tokenization?
I must have missed something there when reviewing the source yesterday, and still am.

@papandreou
Copy link
Contributor

Yeah, it's a little obscure. They use the same tokenization regexp, but WordDiff is based on a Diff instance where ignoreWhitespace is set to true, which WordWithSpaceDiff leaves out:

https://github.com/kpdecker/jsdiff/blob/master/diff.js#L155-L156

The Diff constructor sets it on the instance here: https://github.com/kpdecker/jsdiff/blob/master/diff.js#L41-L43

And uses it when deciding whether to consider two all whitespace fragments equal here: https://github.com/kpdecker/jsdiff/blob/master/diff.js#L137-L143

@matanox
Copy link
Author

matanox commented Aug 9, 2014

Oh thanks, I ended up switching to https://github.com/cubicdaiya/node-dtl, which lets me do my own tokenization without touching the code, but it's great to know this would also work!

@matanox
Copy link
Author

matanox commented Aug 9, 2014

Could be nice adding this to the readme documentation..

@matanox matanox closed this as completed Aug 9, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants