Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document find, findall methods using visit() #188

Closed
alimanfoo opened this issue Nov 19, 2017 · 18 comments · Fixed by #1241
Closed

Document find, findall methods using visit() #188

alimanfoo opened this issue Nov 19, 2017 · 18 comments · Fixed by #1241
Assignees
Labels
enhancement New features or improvements good-first-issue Good place to get started as a new contributor. help wanted Issue could use help from someone with familiarity on the topic

Comments

@alimanfoo
Copy link
Member

alimanfoo commented Nov 19, 2017

Maybe useful to have methods "find" and "findall" on Group class to search for members (descendants?) matching some name query.


2022 Update: See the examples below on how to implement the method using visit(). These should be included in the documentation.

@jakirkham
Copy link
Member

Should be pretty doable with a visitor pattern.

@alimanfoo alimanfoo added the enhancement New features or improvements label Nov 21, 2017
@joshmoore joshmoore added the help wanted Issue could use help from someone with familiarity on the topic label Sep 22, 2021
@ShristiJain
Copy link

I would like to work on this issue. Can I get more information on this and some guidance on how to get started.

@joshmoore
Copy link
Member

Hi @ShristiJain. I think based on the age that this is going to be a case of there being a solution in place. For example, see the visit functions here: https://zarr.readthedocs.io/en/stable/api/hierarchy.html#zarr.hierarchy.Group.array_keys

Probably the best way forward is to try to write a very small example here on the issue that shows how one would print only groups matching a given name as @alimanfoo suggested.

From that point, we either add a method with your code, or if it's simple enough, maybe we just include your example directly in the documentation. For example in the tutorial: https://github.com/zarr-developers/zarr-python/blob/master/docs/tutorial.rst

@ShristiJain
Copy link

Thank you @joshmoore , I understand your point now. Actually I am an Outreachy Applicant from India and I was looking for some issues that I can work on and record on the Outreachy site. I am very eager to be a part of this community. I would request you if you could suggest any issue on which I can work.
I am proficient in JAVA and Python and know android studio too, I am open to learning new things and technologies and am wiling to contribute to the project titled "Build registry of Zarr codecs".
Thank you in advance!

@jakirkham
Copy link
Member

FWIW here are the different visit methods, which could be used to implement this. Would try creating a simple dataset locally and then see if you can write code using one of those methods to do this

@caviere
Copy link
Contributor

caviere commented Oct 17, 2022

Hi I'm Weddy, an outreachy applicant. I would like to work on this issue.

@jakirkham
Copy link
Member

Hi Weddy, that would be great! 😄

Would recommend trying to write this in terms of the visit methods and check for keys that match (possibly using re).

@caviere
Copy link
Contributor

caviere commented Oct 17, 2022

Got it! Thank you.

@caviere
Copy link
Contributor

caviere commented Oct 22, 2022

Hi @joshmoore @MSanKeys963 @jakirkham I just wanted to give you an update. Apologies that i did not tag joshmoore and MSanKeys963 while requesting to be assigned the issue earlier. I am currently trying to incorporate both re.search and re.findall together in the group class that I've created. As of now my regex is printing none which I am trying to resolve. Thank you

@jakirkham
Copy link
Member

@caviere, no worries. Have assigned you 🙂

Would encourage sending a PR with what you have (even if it is incomplete and has bugs). It's possible we can spot things and share tips that would save you some time 😉

@caviere
Copy link
Contributor

caviere commented Oct 23, 2022

@jakirkham thank you. I've been able to make a small progress. The code is now printing the name of the groups. If it is okay with you, I'll send it on or before tuesday for your review, guidance and approval.

@jakirkham
Copy link
Member

Yep sounds great! Thank you 🙏

@caviere
Copy link
Contributor

caviere commented Oct 27, 2022

@jakirkham @joshmoore after going over it several times, I think the best option is for users to use visit rather than add methods for find and findall since visit already provides enough to quickly implement these two, I can't find a single situation where find and findall might provide additional benefit.

Consider the following:

root = zarr.group()
foo = root.create_group("foo")
bar = root.create_group("bar")
root.create_group("aaa").create_group("bbb").create_group("ccc").create_group("aaa")

If we print tree we get:

/
 ├── aaa
 │   └── bbb
 │       └── ccc
 │           └── aaa
 ├── bar
 └── foo

To implement find, that is the first path that has pattern aaa, we do the following. Since we only want the first match, we return a non-None value to stop further iteration.

pattern = re.compile("aaa")
found = None

def find(path):
    nonlocal found
    if pattern.search(path) is not None:
        found = path
        return True

root.visit(find)
print(found)
# ['aaa']

To implement findall, we gather all the results into a list:

pattern = re.compile("aaa")
found = []

def findall(path):
    if pattern.search(path) is not None:
        found.append(path)

root.visit(findall)
print(found)
# ['aaa', 'aaa/bbb', 'aaa/bbb/ccc', 'aaa/bbb/ccc/aaa']

In some cases, we only want to search on the last part of the path, in this case, we can filter out the prefix using a greedy regex:

prefix_pattern = re.compile(r".*/")
pattern = re.compile("aaa")
found = [

def findall(path):
    match = prefix_pattern.match(path)
    if match is None:
        name = path
    else:
        _, end = match.span()
        name = path[end:]

    if pattern.search(name) is not None:
        found.append(path)
    return None

root.visit(findall)
print(found)
# ['aaa', 'aaa/bbb/ccc/aaa']

@jakirkham since I'd like to make a contribution, is it possible for you to consider a couple of linting errors I found and fixed while I was working on this, here's my pull request #1226

@joshmoore
Copy link
Member

Also my apologies for sending my PR two days late.

No problems at all.

Consider the following:

This is a great write up. Thank you.

after going over it several times, I think the best option is for users to use visit rather than add methods for find and findall

Motion to close then? (auto-destruct initiated...)

@jakirkham
Copy link
Member

Thanks for the writeup! We could always include these as examples in the docs. Perhaps they provide a good starting place for users wanting similar functionality or just generally wanting to learn about visit* methods

@joshmoore
Copy link
Member

Good idea!

@caviere
Copy link
Contributor

caviere commented Oct 31, 2022

yes, it is okay if it's closed

@joshmoore joshmoore added the good-first-issue Good place to get started as a new contributor. label Oct 31, 2022
@joshmoore joshmoore changed the title find, findall Document find, findall methods using visit() Oct 31, 2022
@caviere
Copy link
Contributor

caviere commented Nov 2, 2022

@joshmoore I have started working on the documentation here #1241 , there are a couple of errors I am trying to fix:

  • on running make html it says there is an indentation error
  • on running python -m pytest -v --doctest zarr, it says Expected is 'aaa' but Got is aaa, yet I have it as 'aaa'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New features or improvements good-first-issue Good place to get started as a new contributor. help wanted Issue could use help from someone with familiarity on the topic
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants