Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reimplement full-text scraping #563

Merged
merged 3 commits into from
Dec 24, 2019
Merged

Reimplement full-text scraping #563

merged 3 commits into from
Dec 24, 2019

Conversation

DriverXX
Copy link
Contributor

I have reimplemented full-text scraping using this library:
https://github.com/andreskrey/readability.php

What do you think about this?

@Grotax
Copy link
Member

Grotax commented Nov 10, 2019

Do you plan to continue working on this?

@DriverXX
Copy link
Contributor Author

I use it everyday, so I at least plan to keep it working and fix the bugs.
I would do it anyway even if this patch won't be accepted

@stale
Copy link

stale bot commented Dec 1, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label Dec 1, 2019
@DriverXX
Copy link
Contributor Author

DriverXX commented Dec 1, 2019

It has been marked as stale, please let me know if you are not interested in this and I will let it expire

@stale stale bot removed the stale label Dec 1, 2019
@Grotax
Copy link
Member

Grotax commented Dec 1, 2019

Oh I'm interested I thought you would fix the complains of the bots first :)

Signed-off-by: Gioele Falcetti <thegio.f@gmail.com>
Signed-off-by: Gioele Falcetti <thegio.f@gmail.com>
Signed-off-by: Gioele Falcetti <thegio.f@gmail.com>
@DriverXX
Copy link
Contributor Author

DriverXX commented Dec 1, 2019

Ops... sorry, I hadn't noticed them.
It should be ok now :D

@Grotax
Copy link
Member

Grotax commented Dec 7, 2019

I took a look at this and to me it looks good.

Why did you go for readability? Seems like it's not well maintained anymore.

Maybe considering https://github.com/j0k3r/graby would be a good idea.

@DriverXX
Copy link
Contributor Author

DriverXX commented Dec 9, 2019

I chose readability.php because it is a port of Mozilla's Readability.js which works really well on Firefox, and when I started using it, a few months ago, it was still maintained.

Honestly I prefer readability.php because it's simpler, graby has a huge amount of dependency, but I'll give it a try to check how well it works and I'll let you know.
It should not be difficult to switch between them.

@DriverXX
Copy link
Contributor Author

I tried https://github.com/j0k3r/graby and I noticed that readability.php works better on the feeds that I use.
Moreover, as I already said, graby as a huge amount of dependency and some of them are not maintained better than readability.php.
For these reasons I will keep using readability.

But if you prefer to use graby in the official version of news, I have the code ready (https://github.com/DriverXX/news/tree/graby) and I can edit my pull request to use it, instead of readability.

Please, let me know if you want to use graby, and I will edit my PR.

@Grotax
Copy link
Member

Grotax commented Dec 19, 2019

No it's alright you have checked it and made a reasonable decision. I'm fine with it.

It also looks not too complex to me so I think we will go with it.

If it breaks at some point we might decide to remove it again, if no one is willing to fix it.

@Grotax Grotax merged commit 6673cbc into nextcloud:master Dec 24, 2019
@nachoparker
Copy link
Member

awesome. Many thanks

@Grotax Grotax mentioned this pull request Dec 26, 2019
Grotax added a commit that referenced this pull request Dec 27, 2019
Changed
- Minimum PHP version is now 7.2
- Reimplement full-text scraping #563
- Update for nextcloud 18 #593 #583
- Update httpLastModified from the feed response #594

Signed-off-by: Benjamin Brahmer <info@b-brahmer.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants