Can't get content from zhihu.com #2
Comments
Please let me know if you need further information.
This may be the same issue as #1. Did you check whether wrapping the output in
solves the issue? Wrapping it in the above HTML code produced this output for me: https://dankito.net/test/zhihu-output.html. Article.getContent() returns the content in a <div> rather than in <html> because Readability.js does the same. Another issue is that the images aren't displayed. You can use it in this way:
Do you think the above output is OK and solves the issue?
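The wrapping suggested above can be sketched roughly as follows. This is only an illustration, not the snippet from the original comment: it assumes the wrapper is a plain HTML document that declares a UTF-8 charset, and `HtmlWrapper` and `wrapInHtmlBody` are hypothetical names that are not part of Readability4J. The fragment string stands in for whatever Article.getContent() returns.

```java
final class HtmlWrapper {
    private HtmlWrapper() {}

    /**
     * Wraps an extracted content fragment (e.g. the <div> returned by
     * Article.getContent()) in a full HTML document.
     * Assumption: declaring UTF-8 in <head> is what lets a browser
     * render non-ASCII text such as Chinese correctly.
     */
    static String wrapInHtmlBody(String contentFragment) {
        return "<html>\n"
             + "<head>\n"
             + "  <meta charset=\"utf-8\"/>\n"
             + "</head>\n"
             + "<body>\n"
             + contentFragment + "\n"
             + "</body>\n"
             + "</html>";
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the extracted article content.
        String fragment = "<div><p>\u77e5\u4e4e\u4e13\u680f</p></div>";
        System.out.println(wrapInHtmlBody(fragment));
    }
}
```

Saving the resulting string to a file as UTF-8 and opening it in a browser is one way to check whether the text then displays correctly.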
Thanks for your reply. I think the Mozilla library's output comes with the tag, but your explanation is very helpful, so I'll close the issue. Looking forward to a getContentWrappedInHtmlBody method ^^
Just released version 1.0.1. Article now has the method
Hi, Readability4J is a nice library. I found a website that doesn't work with Readability4J; please check it. Thank you.
https://zhuanlan.zhihu.com/p/22049205
Readability4J can't get the content from this URL, but Mozilla's Readability.js works with it. Please look into this, thank you.