Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Space missing when opening tag at end of a line #329

Closed
yapper-git opened this issue Nov 30, 2015 · 12 comments
Closed

Space missing when opening tag at end of a line #329

yapper-git opened this issue Nov 30, 2015 · 12 comments
Assignees
Labels
Milestone

Comments

@yapper-git
Copy link

I use Tidy in a project, but I got a bug.
Below is a minimal input code example.

<p>This a minimal example for<strong>
TidyTidyTidyTidyTidyTidyTidyTidyTidy</strong> is another nice open source tool :)</p>

The problem is that a line ends with <strong>, but the white space (here newline) is in the `\n'.

All browsers displays a space between for and TidyTidyTidyTidyTidyTidyTidyTidyTidy (thanks to the newline).
While if you convert this HTML file with tidy (with default configuration).
Tidy outputs :

<!DOCTYPE html>
<html>
<head>
<meta name="generator" content=
"HTML Tidy for HTML5 for Linux version 5.1.25">
<title></title>
</head>
<body>
<p>This a minimal example
for<strong>TidyTidyTidyTidyTidyTidyTidyTidyTidy</strong> is another
nice open source tool :)</p>
</body>
</html>

The whitespace disappeared...

@geoffmcl
Copy link
Contributor

geoffmcl commented Dec 1, 2015

@yapper-git thank you for moving the bug here... This is certainly where it may receive attention!

You might want to update your tidy output to show this problem still exists in current htacg tidy, current master branch, version 5.1.28 plus. Showing a 2009 output does not help the case.

I called this a feature request purely because tidy has been doing this forever! So it is a quite a fundamental change in the parsing...

The problem is, in the current parsing, in certain circumstances, tidy hides newlines, quietly swallowing them, so the caller does not know they even exist. So the fix is deeper than you think.

But that said, does not mean it is not a bug. I, for one, agree 100% that all the browser tests I have tried on your sample shows a space between for and the first Tidy..., and one of the strong aims of tidy is to render a tidied output that will look the same in a browser!

As stated, thanks for the report, and I, or others, will look into this soonest. But please update the output to reflect current tidy, and maybe remove the . which does not seem to be in the input... thanks...

@geoffmcl geoffmcl added the Bug label Dec 1, 2015
@geoffmcl geoffmcl added this to the 5.1 milestone Dec 1, 2015
@yapper-git
Copy link
Author

You might want to update your tidy output to show this problem still exists in current htacg tidy, current master branch, version 5.1.28 plus. Showing a 2009 output does not help the case.

I updated the output above (with 5.1.25 version installed on ArchLinux).

@geoffmcl
Copy link
Contributor

geoffmcl commented Dec 3, 2015

@yapper-git thanks for updating the sample... as promised will look at this soonest...

OT: Just out of interest where did you get the tidy 5.1.25 version?

@yapper-git
Copy link
Author

@geoffmcl : I get tidy 5.1.25 from AUR.

@geoffmcl
Copy link
Contributor

geoffmcl commented Dec 3, 2015

@yapper-git thanks for the info. It is good to see your AUR packages really keeps up with Tidy release tags... thanks to @arthru - seems [s]he updated that almost the same date as I pushed that tag.

I really wonder how to get my Ubuntu 14.04 LTS package updated? It still holds a 2009 release of tidy!!! YUCK! But that is OT here.

I have started to look at this, but as suspected it becomes how to know to move a space, in this case a newline, to back before the <strong> tag?

There was another case of moving a space - yes in this sample <p>Move<b> bold</b> space</p> gets correctly tidied to <p>Move <b>bold</b> space</p>. The space was kept, and moved...

That seems to be the same principal here, except it is a newline. And as I suspected, changing that sample space after the <b> to a newline gives me a bad result <p>Move<b>bold</b> space</p>!

Just about out of time today, but I am on its tail ;=))

@geoffmcl geoffmcl self-assigned this Dec 3, 2015
geoffmcl added a commit that referenced this issue Dec 9, 2015
geoffmcl added a commit that referenced this issue Dec 9, 2015
@geoffmcl
Copy link
Contributor

geoffmcl commented Dec 9, 2015

@yapper-git have pushed an experimental fix to the issue-329 branch...

$ cd tidy-html5
$ git pull
$ git checkout issue-329
$ cd build/cmake and build...

As suspected, after the <strong> is decoded, any following newline was quietly discarded...

The fix is quite profound in that if it is a newline, and the mode is not IgnoreWhitespace, save this to the lexer - well actually it later gets changed into a space.

Maybe this improves this case and does not break previous behaviour... still in testing...

If you have the chance, please test... thanks...

@geoffmcl geoffmcl modified the milestones: 5.2, 5.1 Dec 11, 2015
geoffmcl added a commit that referenced this issue Jan 9, 2016
geoffmcl added a commit that referenced this issue Jan 9, 2016
@geoffmcl
Copy link
Contributor

Have just been alerted to a possible problem with this fix... still to be fully tested...

The case involves the textarea element... need to test 3 cases -

<textarea>1: line 1
line 2
</textarea>
<textarea>
2: line 1
line 2
</textarea>

and the extreme case -

<textarea>


3: line 1
line 2
</textarea>

Any help with testing, comments, ideas... very welcome...

@geoffmcl
Copy link
Contributor

As suspected the behaviour is quite different - using only default configuration...

Input (input5\in_329-3.html)

<textarea>1: line 1
line 2
</textarea>
<textarea>
2: line 1
line 2
</textarea>
and the extreme case -
<textarea>


3: line 1
line 2
</textarea>

Current Tidy - Output

<meta name="generator" content=
"HTML Tidy for HTML5 for Windows version 5.1.32">
<textarea>1: line 1
line 2</textarea>

<textarea>2: line 1
line 2</textarea>
 and the extreme case -
<textarea>

3: line 1
line 2
</textarea>

issue-329 Tidy - Output

<meta name="generator" content=
"HTML Tidy for HTML5 for Windows version 5.1.34Exp">
<textarea>1: line 1
line 2</textarea>

<textarea>
2: line 1
line 2</textarea>
 and the extreme case -
<textarea>


3: line 1
line 2
</textarea>

This is a lot of difference!

Now is it bad, or wrong, or better, perfect? It does seem to respect what the users wrote, but is that is what is needed? Sometime tidy concatinates, sometimes expands, like with wrapping... What is best? Seek comments on that...

This fix works wonders on getting the following snippet correctly rendered as <p>a <em>b></p>, so some parts of it must stay!

<p>a<em>
b</em></p>

But it does set back the schedule for merging this issue-329 branch to master, until this is further discussed and clarified.

@geoffmcl
Copy link
Contributor

@yapper-git as explained above I have coded a simple fix for this, in lexer.c, but at present it has other consequences which must then be addressed as well...

                 {
                      c = TY_(ReadChar)(doc->docIn);

 -                    if (c != '\n' && c != '\f')
 +                    if ((c == '\n') && (mode != IgnoreWhitespace)) /* Issue #329 - Can NOT afford to lose this newline */
 +                        TY_(UngetChar)(c, doc->docIn);  /* Issue #329 - make sure the newline is maintained for now */
 +                    else if (c != '\n' && c != '\f')
                          TY_(UngetChar)(c, doc->docIn);

                      lexer->waswhite = yes;  /* to swallow leading whitespace */

This is an important bug, and must be fully fixed... and I must also clean up the issue-329 branch I created, but @balthisar has shown me how to do this, but yet to get to it...

However, we are currently considering a new release 5.2 shortly, and it is unlikely that I will get to fully fixing this before then, so am moving the milestone to 5.3...

If you or others can look into this, then maybe it could still make it into 5.2...

@geoffmcl geoffmcl modified the milestones: 5.3, 5.2 Mar 24, 2016
@yapper-git
Copy link
Author

Sorry I never entered in tidy code and I have no time for that currently…
Thx for your work!

@geoffmcl
Copy link
Contributor

@yapper-git eventually got around to pushing the above fix to master, and have now deleted the issue-329 branch...

I hope you, or others, can pull master, version 5.3.18 plus, and test this fix... it certainly seems to work on your minimal example...

Maybe this can eventually be closed... thanks...

@geoffmcl
Copy link
Contributor

This seems to have been fixed, so closing this issue...

Please feel free to reopen, or post a new issue... thanks...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants