Specifying selectors for extracting links. #217
Comments
I just recommended using a custom driver in the other issue! Yes, these are all good points! The example you have is the current best option; however, it would be fairly easy to add a custom selector via the command line. These are all good suggestions. For now you can use the driver script you have, and we'll update this ticket once we have a chance to add this! The tool is still pre-1.0.0, so a few things are changing, like the switch to ES modules, but we hope to have a stable driver format in place soon!
We are impacted by this issue as well at Kiwix; we have a website to ZIM relying on Should we also develop a custom driver or would you recommend that we make a PR to add selectors via cmdline as suggested?
Hi @benoit74, I'd suggest that a PR to add selectors via a command-line argument would be the better, more flexible approach here. It shouldn't be too difficult: it would just be a matter of checking whether the argument was provided (perhaps as a JSON string) and, if so, applying the settings by overwriting the selectors default argument to
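The check described in this comment could look roughly like the sketch below. It is only an illustration, not the crawler's actual code: the function name `resolveSelectors`, the `DEFAULT_SELECTORS` constant, and the `{ selector, extract, isAttribute }` object shape are all assumptions made for the example.

```javascript
// Hypothetical default, mirroring the "extract href from a[href]" behaviour
// discussed in this issue. The object shape is an assumption.
const DEFAULT_SELECTORS = [
  { selector: "a[href]", extract: "href", isAttribute: false },
];

// Hypothetical helper: return the default selectors unless a JSON string
// was passed on the command line, in which case that overrides the default.
function resolveSelectors(jsonArg) {
  if (jsonArg === undefined || jsonArg === "") {
    return DEFAULT_SELECTORS;
  }

  let parsed;
  try {
    parsed = JSON.parse(jsonArg);
  } catch (e) {
    throw new Error(`Invalid JSON for link selectors: ${e.message}`);
  }

  if (!Array.isArray(parsed) || parsed.length === 0) {
    throw new Error("Link selectors must be a non-empty JSON array");
  }

  return parsed.map((entry) => {
    if (!entry || typeof entry.selector !== "string" || !entry.selector) {
      throw new Error("Each link selector needs a non-empty 'selector' string");
    }
    return {
      selector: entry.selector,
      // Assumption: fall back to reading the href property when unspecified.
      extract: entry.extract ?? "href",
      isAttribute: Boolean(entry.isAttribute),
    };
  });
}
```

With this sketch, `resolveSelectors('[{"selector":"area[href]"}]')` would replace the default list with a single entry for `<area>` links, while calling it with no argument keeps the default.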
Thank you @tw4l for the detailed suggestions. Just for the record, the work on this from Kiwix has been postponed to "later", and since that might mean "months", anyone who wants to contribute to this issue should feel free; we will not collide on this. If we do start work on this, I will post here first.
…aram - selectors are of the form [css selector]->[property to use] or [css selector]->@[attribute to use], default being 'a[href]->href' - fixes #217
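As a rough illustration of the selector format named in this commit message, a parser for such strings might look like the following. This is a sketch under assumptions, not the code from the commit: the function name `parseLinkSelector` and the returned field names are hypothetical, and rejecting a string without `->` is a choice made for the example.

```javascript
// Default quoted in the commit message.
const DEFAULT_SELECTOR = "a[href]->href";

// Hypothetical parser for "[css selector]->[property]" or
// "[css selector]->@[attribute]" strings.
function parseLinkSelector(spec = DEFAULT_SELECTOR) {
  // Split on the last "->" so the CSS part may contain anything before it.
  const idx = spec.lastIndexOf("->");
  if (idx === -1) {
    throw new Error(`Missing "->" in link selector: ${spec}`);
  }

  const selector = spec.slice(0, idx).trim();
  let extract = spec.slice(idx + 2).trim();

  // A leading "@" means "read an attribute" rather than a DOM property.
  const isAttribute = extract.startsWith("@");
  if (isAttribute) {
    extract = extract.slice(1);
  }

  if (!selector || !extract) {
    throw new Error(`Incomplete link selector: ${spec}`);
  }

  return { selector, extract, isAttribute };
}
```

For the `<area>` case raised in this issue, a string such as `area[href]->@href` would parse into a selector of `area[href]` that reads the `href` attribute.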
I came across a site which uses an <area> tag with an href attribute to create links with a non-standard shape. I don't know if this is the correct way to approach this, but I was able to capture these links by implementing the following custom driver.

However, I did not see anything in the documentation hinting at this and it required reading through the source code to even determine that the driver is what I should be looking into.
Furthermore, I've noticed that defaultDriver.js has changed significantly over time, so it is not clear to me whether this approach will remain valid in the long run. And to emphasize that point, it is worth mentioning that this driver works in 0.7.1 but breaks in 0.8.0-beta.1 (though I realize that fixing it just requires changing module.exports = to export default).

Would you consider implementing an easier way to configure the link extraction selectors? Or, if a custom driver is the recommended approach, is this documented somewhere?