Blogspot article text scrape? #2789
The metadata post processor is probably what you are looking for, more or less. It will dump the text into the "content" field, although at least with this example it seems to break the encoding. Something like this should work:
"postprocessors": [
    {
        "name": "metadata",
        "mode": "json",
        "whitelist": ["blogger"]
    }
],
How can I use that? Can it be converted to a command-line argument?
You need a config file to properly use post processors; that's not something that can really be done with command-line arguments alone. The settings from #2789 (comment) are just …

To achieve that, you need the changes from 5038893 and something like the following as a post processor:
{
"postprocessors": [
{
"name": "metadata",
"mode": "custom",
"format": "{post[content]}",
"event": "post",
"filename": "{post[date]:%Y-%m-%d} {post[title]}.txt"
}
]
}
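The `format` and `filename` values in the config above are format strings with replacement fields such as `{post[title]}`. As a rough illustration, assuming they expand the way Python's `str.format` does (gallery-dl's own formatter builds on this syntax but adds extensions of its own, so treat this as an approximation), the sketch below shows how one post's metadata would turn into a file name and a file body. The `post` dictionary is hypothetical sample data, not real gallery-dl output.

```python
from datetime import datetime

# Hypothetical metadata for a single Blogger post (assumption: illustrative values only).
post = {
    "date": datetime(2022, 7, 30, 12, 0),
    "title": "Example post",
    "content": "<p>Article text</p>",
}

# The same format strings used by the "custom" metadata post processor config.
filename_fmt = "{post[date]:%Y-%m-%d} {post[title]}.txt"
content_fmt = "{post[content]}"

# str.format looks up post["date"] and passes "%Y-%m-%d" to datetime.__format__,
# which applies it via strftime; post["title"] and post["content"] are inserted as-is.
filename = filename_fmt.format(post=post)
body = content_fmt.format(post=post)

print(filename)  # 2022-07-30 Example post.txt
print(body)      # <p>Article text</p>
```

Each post therefore ends up in its own text file named after its date and title, with the post's content as the file body.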
Is it possible to also get the text of the blog post? By default it only downloads the images/videos, but it would be good to get the article text as well.