-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ExtractBoilerpipeText to remove headers as well. #253 #256
Conversation
Codecov Report
@@ Coverage Diff @@
## master #256 +/- ##
==========================================
- Coverage 70.52% 70.35% -0.17%
==========================================
Files 41 41
Lines 1038 1039 +1
Branches 191 191
==========================================
- Hits 732 731 -1
- Misses 240 242 +2
Partials 66 66
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much cleaner output!
i.e.
(20091219,www.davidsuzuki.org,http://www.davidsuzuki.org/_pvw706D01C3/Oceans/PNCIMA/Unique-Features-Within-PNCIMA.asp,Here are more of the many unique features within PNCIMA: Endangered Species •There are 32 species listed as Endangered, Threatened, and Special Concern by the Committee on the Status of Endangered Wildlife in Canada (COSEWIC) that live in or migrate through the PNCIMA. North Pacific Right Whales •They are the most endangered whale in the world ...
I'm happy to merge if this looks good to you as well @ruebot. |
Do we need to update any documentation? |
Nope, it's a straight change (just removes the http headers from the boilerplate removal, which is what we should have been doing all along) - tested w/ the documentation script and works well. |
Yes it does resolve. I thought the solution was something else, but this seemed more straight-forward. |
Ok. In the future, please be more explicit because as it stands now, it is very unclear. We need to know how and why the PR resolves the issues, and if it differs from the issue created, why. |
GitHub issue(s):
#253
What does this Pull Request do?
Alters ExtractBoilerpipeText to remove headers from text output as well.
How should this be tested?
The test I created + Travis and Codecov should cover it.
Additional Notes:
I attempted to provide a .keepHeader option, but to do so would require moving the .apply call from the object and that would require documentation changes. This seemed too much to accommodate a fairly marginal use-case.
This option can be provided by returning a class with .keepHeader and .text attributes. I do not think it's too necessary.
Interested parties
@ianmilligan1
Thanks in advance for your help with the Archives Unleashed Toolkit!