-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add get_name patch/spool method #471
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #471 +/- ##
=======================================
Coverage 99.85% 99.85%
=======================================
Files 115 115
Lines 9507 9548 +41
=======================================
+ Hits 9493 9534 +41
Misses 14 14
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@ahmadtourei, mind taking this for a spin to see if it behaves how you wanted ? |
dascore/utils/patch.py
Outdated
Examples | ||
-------- | ||
>>> import dascore as dc | ||
>>> from dascore.utils.patch import get_patch_names | ||
>>> patch = dc.get_example_patch() | ||
>>> name = get_patch_names(patch) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name
is:
0 DAS_______random__2017_09_18__2017_09_18T00_00_07
Name: network, dtype: object
We get DAS_______random__2017_09_18__2017_09_18T00_00_07
using name[0]
because the function returns a <class 'pandas.core.series.Series'>
. Is there a specific reason that it returns a pandas series with Name: network, dtype: object
(a little confusing) instead of the names in string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for the function to work with a spool it needs to return multiple names. I avoided returning multiple types (based on if the input is a Patch or not) from get_patch_names
as that is a bit of an anti-pattern.
I understand the confusion though. I will add get_patch_name
as a method of Patch
that will do this. I think with the different name (get_patch_name
vs get_patch_names
) it will be clear.
patch_data: pd.DataFrame | dc.Patch | dc.BaseSpool, | ||
prefix="DAS", | ||
attrs=("network", "station", "tag"), | ||
coords=("time",), | ||
sep="__", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need test(s) for optional inputs . For example, get_patch_names(patch, patch)
does not raise an error.
tests/conftest.py
Outdated
@pytest.fixture(scope="session") | ||
def random_spool_directory(tmp_path_factory): | ||
"""A directory with a few patch files.""" | ||
out = Path(tmp_path_factory.mktemp("one_file_file_spool")) | ||
spool = ex.get_example_spool("random_das") | ||
out_path = ex.spool_to_directory(spool, path=out) | ||
return dc.spool(out_path).update() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have this as an example in dascore.examples as all of the example spools are memory spools.
dascore/utils/patch.py
Outdated
def _get_filename(path_ser): | ||
"""Get the file name from a path series.""" | ||
ser = path_ser.astype(str) | ||
file_names = [x[-1].split(".")[0] for x in ser.str.split("/")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we introduce an optional argument (such as keep_extension) so the user can decide whether to get patch's name with or without file extension? I can work on this as I already implemented this in my version #461. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I have posted my comments below.
One general comment that I have is because this can be a very useful function, it would be great to create a shortcut for it (something like patch.name
and spool.name
so the user can quickly get name(s) in string format. Then we can include it in the Shortcuts section of dascore/docs/tutorial/patch.qmd
. I assume users with limited DASCore experience cannot quickly locate it in dascore.utils.patch
.
I agree. I think it should be a patch/spool method. Because there are optional parameters though, probably best not to make it an attribute (because then the users are always stuck with the defaults). |
Thanks @ahmadtourei for the great review. I have addressed all of your suggestions/concerns. Anything else you want to see in here? I also included a fix for the doc build so I am keen to merge this soon to see if the fix works. |
dascore/examples.py
Outdated
@@ -451,6 +451,24 @@ def random_spool(time_gap=0, length=3, time_min=np.datetime64("2020-01-03"), **k | |||
return dc.spool(out) | |||
|
|||
|
|||
@register_func(EXAMPLE_SPOOLS, key="radom_directory_das") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*random
The "name" of the patch, which is the filename that would be used if the patch were saved to disk by DASCore, can be generated via the [`Spool.get_patch_names`](`dascore.BaseSpool.get_patch_names`) to get a series of the names of the managed patches, or [`Patch.get_patch_name`](`dascore.Patch.get_patch_name`) to get a string of the patch name. | ||
|
||
```python | ||
import dascore as dc | ||
|
||
patch = dc.get_example_patch() | ||
spool = dc.get_example_spool() | ||
|
||
patch_name = patch.get_patch_name() | ||
patch_names = spool.get_patch_names() | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great!
Thanks, Derrick, for your work on this PR! It looks great. The only new comment I have is a typo I listed below. |
Description
This PR implements a
get_name
method for the patch and the spool to return a series of names. The names are either generated by columns provided to the function (which has sensible defaults to maintain the old behavior) or by existing columns such asname
orpath
.This PR supersede #461.
Checklist
I have (if applicable):