Support for generating unique selector from multiple elements? #45

Closed
ScrapeFlare opened this issue Dec 1, 2020 · 10 comments
@ScrapeFlare

Is such a feature on the roadmap?

@fczbkk
Owner

fczbkk commented Dec 2, 2020

@ScrapeFlare I'm not sure I understand what you mean. Can you please give me an example?

@jribbens

It sounds like they might mean something I would find helpful, which is a way to say "here is a given selector, which matches one or more elements, and also here is a specific one of those elements - please take the existing selector and, if necessary, add to it so that it only matches that one element".

@fczbkk
Owner

fczbkk commented Mar 15, 2021

@jribbens Can you please provide me with an example of what selectors you're trying to get?

@jribbens

Well, perhaps you might pass link[rel=stylesheet] and indicate a specific one of the link elements, and the module would extend the selector to link[rel=stylesheet][href="main.css"] or something similar, rather than the current behaviour, where it may return something as vague as link if there is only one link element.

Basically my use case for the module requires thinking across time, as the HTML file may change. Now obviously there is no way at all to guarantee that a selector created for a previous version of the file will still relate to "the same" element in the new version (whatever we might mean by that exactly), but there are ways we can make that more likely, and those include avoiding making the selector too short and avoiding too much reliance on nth-child, for example.

@AfrazHussain

AfrazHussain commented Jun 1, 2021

If I'm not wrong, I think what they're trying to suggest is a feature like getMultiSelector([elements]) from the optimal-select library, because that is basically what I'm looking for as well.

I don't like using the optimal-select library's getMultiSelector because it doesn't honor the options that are passed into the function (issue: autarc/optimal-select#39), so it would be nice to have this in this library too.

I definitely wouldn't mind working with you on this feature, so please do hit me up.

@fczbkk
Owner

fczbkk commented Jun 2, 2021

@AfrazHussain I'm not sure this is the same as what @jribbens describes. Anyway, it is easy to create a function that returns a list of selectors for a list of provided elements. Just use this:

```javascript
function getMultipleCssSelectors (elements = [], options = {}) {
  return elements.map((element) => getCssSelector(element, options))
}
```

@AfrazHussain

@fczbkk Thanks for your reply, and thank you for this library.

From what I understood, the idea isn't to create a list of unique selectors, one per element in the array, but rather to create a single selector that matches all of the selected elements.

To give you an example, suppose I want to scrape all the item names on this website. If I pass one element to getCssSelector(element, options), it will probably return a selector like a[title='Acer Aspire ES1-572 Black']. That's good for that one item name, but if I want to select all the item names with a single selector, it would probably be a.title.

The Webscraper extension does this very nicely where you can select a single element, or select multiple elements by holding the control key, and it will give a unique selector for the multiple items that you've selected.

Please let me know if some of this doesn't make sense and I'll be happy to clarify. :) Again, thanks for your reply. 👍
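To make the a.title example concrete, here is a minimal sketch of one way to derive a single selector from several elements: keep the tag name if they all share it, plus any class names they all share. The helper name commonSelector is hypothetical and is not part of css-selector-generator or optimal-select. The sketch assumes objects that behave like DOM elements (a tagName and a classList).

```javascript
// Sketch only: build one selector from what the selected elements have in common.
// `commonSelector` is a hypothetical helper, not an API of any library in this thread.
function commonSelector(elements) {
  if (elements.length === 0) return null;

  const [first, ...rest] = elements;

  // Keep the tag name only if every element has the same one.
  const tag = first.tagName.toLowerCase();
  const sameTag = rest.every((el) => el.tagName.toLowerCase() === tag);

  // Keep only the class names that appear on every element.
  // Class names are used verbatim here; in a browser, CSS.escape(name)
  // would be the safer choice for unusual class names.
  const sharedClasses = Array.from(first.classList).filter((name) =>
    rest.every((el) => el.classList.contains(name))
  );

  // Nothing in common: no single selector can be built this way.
  if (!sameTag && sharedClasses.length === 0) return null;

  return (sameTag ? tag : '') + sharedClasses.map((name) => '.' + name).join('');
}
```

Note that the result is not checked against the page, so it may also match elements that were not selected. A fuller version would run it through document.querySelectorAll and compare the matches with the input.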

@jribbens

jribbens commented Jun 2, 2021

I can't speak to what @ScrapeFlare was wanting, but what I'm doing is creating a selector that uniquely identifies a single element on the page, but does so in a way that means we have a decent chance of the selector still indicating that same element even if the page then changes somewhat (e.g. if we come back to the page a month later).

As it stands, if you have, say, only one image on a page then this library will return a very short but very fragile selector of just img. If someone adds another image to the page, the selector is now useless. What I need is a way to say things like "if the element has an id, always include it in the selector; if the element has a src or href, always include those in the selector even if it is already unique without them". (This is what I thought the whitelist option might do, but it turns out it doesn't.)

I've written my own library to do this now, but I just thought I'd clarify.
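The "always include id, src and href" rule described above could be sketched roughly as follows. This is not the library mentioned in the comment, whose code is not shown in this thread; stableSelector, quoteAttributeValue and the default attribute list are hypothetical names chosen for illustration. It assumes an object with tagName and getAttribute, as a DOM element has.

```javascript
// Sketch only: hypothetical helpers, not part of css-selector-generator.

// Wrap a value in double quotes for use in a CSS attribute selector.
// Only quotes and backslashes are escaped; other special characters
// (e.g. newlines) are not handled in this sketch.
function quoteAttributeValue(value) {
  return '"' + String(value).replace(/["\\]/g, '\\$&') + '"';
}

// Build a selector that always includes the listed attributes when the
// element has them, even if a shorter selector would already be unique.
function stableSelector(element, alwaysInclude = ['id', 'src', 'href']) {
  let selector = element.tagName.toLowerCase();
  for (const name of alwaysInclude) {
    const value = element.getAttribute(name);
    if (value !== null && value !== '') {
      // The id is written as [id="..."] rather than #id so that no
      // identifier escaping is needed.
      selector += '[' + name + '=' + quoteAttributeValue(value) + ']';
    }
  }
  return selector;
}
```

For a lone image with src="logo.png", this gives img[src="logo.png"] rather than just img, so adding a second image to the page later does not make the selector ambiguous. It does not by itself guarantee uniqueness, so it would still need a fallback for elements that share the same attributes.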

@fczbkk
Owner

fczbkk commented Jun 2, 2021

@jribbens Thanks for the explanation, I get it now.

fczbkk self-assigned this Aug 15, 2021

@fczbkk
Owner

fczbkk commented Aug 15, 2021

I had an idea for how to implement this. It required a lot more rewriting than I originally thought, but it seems to be working.

Since v3.2.0 you can pass an array of elements as input instead of a single element. The library will produce a single selector matching all of the elements at once if possible; otherwise it will produce a selector for each element and join them with commas.

More info in documentation:
https://github.com/fczbkk/css-selector-generator#multi-element-selector

fczbkk closed this as completed Aug 15, 2021