Allow passing in pandas dataframes to x2sys_cross #591

Merged: weiji14 merged 21 commits into master from x2sys_cross_dataframes on Sep 10, 2020


@weiji14 weiji14 commented Sep 8, 2020

Description of proposed changes

Run crossover analysis directly on pandas.DataFrame inputs instead of having to write to tab-separated value (TSV) files first!

Example code:

import os
from tempfile import TemporaryDirectory

import pandas as pd
import pygmt

dataframe: pd.DataFrame = pygmt.datasets.load_sample_bathymetry()
dataframe.columns = ["x", "y", "z"]  # longitude, latitude, bathymetry

os.environ["X2SYS_HOME"] = os.getcwd()

with TemporaryDirectory(prefix="X2SYS", dir=os.environ["X2SYS_HOME"]) as tmpdir:
    tag = os.path.basename(tmpdir)
    pygmt.x2sys_init(tag=tag, fmtfile="xyz", suffix="xyz", force=True)
    output: pd.DataFrame = pygmt.x2sys_cross(tracks=[dataframe], tag=tag, coe="i", verbose="i")

This isn't a trivial thing to implement, because:

  • x2sys requires those TSV files in quite a specific format (especially for the datetime columns)
  • The TSV files cannot be in an arbitrary directory like /tmp; they must be stored either in the current working directory or in one of the locations listed in the TAG_paths.txt file.
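
To make the datetime point concrete, here is a small standard-library sketch of what the date_format string used by this PR's temporary-file helper ("%Y-%m-%dT%H:%M:%S.%fZ") produces. The sample timestamp is made up, and whether x2sys accepts other layouts is not checked here.

```python
from datetime import datetime

# Format string passed as date_format when the DataFrame is written to TSV
DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Hypothetical sample timestamp, for illustration only
timestamp = datetime(2020, 9, 8, 12, 30, 0)

# %f always expands to six digits of microseconds
formatted = timestamp.strftime(DATE_FORMAT)
print(formatted)  # 2020-09-08T12:30:00.000000Z

# The same format string parses the text back into the original value
assert datetime.strptime(formatted, DATE_FORMAT) == timestamp
```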

Support for pandas DataFrame inputs into x2sys_cross was left out of the original implementation at #546 because we wanted to wait for GenericMappingTools/gmt#3717. But seeing as it's not a trivial matter, this is an interim solution.

Fixes #

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If adding new functionality, add an example to docstrings or tutorials.

Implemented by storing pandas.DataFrame data in a temporary file and passing this intermediate file to x2sys_cross. Need to do some regex file parsing to get the right file extension (suffix) for this to work.
@weiji14 weiji14 added the enhancement Improving an existing feature label Sep 8, 2020
So that the tests will pass on macOS and Windows too.
Because Windows (and macOS?) might not support opening same temporary file twice.
@vercel vercel bot temporarily deployed to Preview September 9, 2020 21:55 Inactive
@weiji14 weiji14 marked this pull request as draft September 9, 2020 23:53
@weiji14 weiji14 marked this pull request as ready for review September 9, 2020 23:53
pygmt/x2sys.py Outdated
Comment on lines 296 to 302
)  # e.g. "-Dxyz -Etsv -I1/1"
try:
    # 1st try to match file extension after -E
    suffix = re.search(pattern=r"-E(\S*)", string=lastline).group(1)
except AttributeError:  # 'NoneType' object has no attribute 'group'
    # 2nd try to match file extension after -D
    suffix = re.search(pattern=r"-D(\S*)", string=lastline).group(1)
Member Author

Is there a better way to check if -Exyz is in the string, and if not, fall back to parsing from -Dxyz?

Member

@seisman seisman Sep 10, 2020

What about this one?

lastline = "-Dxyz -Etsv -I1/1"
#lastline = "-Dxyz -I1/1"

for item in lastline.split():
    for key in ['-E', '-D']:
        if item.startswith(key):
            suffix = item[2:]
            break
print(suffix)

Note: this code may be wrong, but something along these lines might work.
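
One caveat with the loop above: break only leaves the inner loop, so the last matching item wins and the result depends on the order of the flags in the string. A possible variant that always prefers -E and falls back to -D is sketched below. The function name parse_suffix is hypothetical, and this is not necessarily what was merged.

```python
import re
from typing import Optional


def parse_suffix(lastline: str) -> Optional[str]:
    """Return the file extension given after -E, else after -D, else None."""
    # Flags are tried in priority order, so -E wins wherever it appears
    for flag in ("-E", "-D"):
        match = re.search(rf"(?:^|\s){flag}(\S+)", lastline)
        if match:
            return match.group(1)
    return None


print(parse_suffix("-Dxyz -Etsv -I1/1"))  # tsv
print(parse_suffix("-Dxyz -I1/1"))  # xyz
```

Returning None when neither flag is present leaves it to the caller to decide whether to raise an error or use a default suffix.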

Member Author

Hmm, this gives me some ideas. I'll play around with it, thanks!

Also rename 'result' to 'table' to prevent pylint complaining about R0914: Too many local variables (16/15) (too-many-locals)
Member Author

@weiji14 weiji14 left a comment

Ok ready for review! I understand that this isn't easy to review properly, but would really appreciate getting this in for v0.2.0 tomorrow as I'll be using it for my PhD research. There are two unit tests added, one for internal crossovers (i.e. on 1 track) and one for external crossovers (2 tracks). I'll add a tutorial example for this over the weekend to explain things better.

Comment on lines +30 to +40
try:
    tmpfilename = f"track-{unique_name()[:7]}.{suffix}"
    track.to_csv(
        path_or_buf=tmpfilename,
        sep="\t",
        index=False,
        date_format="%Y-%m-%dT%H:%M:%S.%fZ",
    )
    yield tmpfilename
finally:
    os.remove(tmpfilename)
Member Author

The original implementation using GMTTempFile/NamedTemporaryFile didn't work because of some permissions issues (on macOS/Windows), which is why this try-finally block is used.
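
For readers without pandas at hand, the same write, yield, delete pattern can be sketched with the standard library alone. Everything here is an assumption for illustration: tempfile_from_rows and its directory parameter are hypothetical, uuid4 stands in for unique_name(), and csv.writer stands in for DataFrame.to_csv. The point is that the file is fully written and closed before its name is handed to the consumer, which avoids having the same file open twice.

```python
import csv
import os
import uuid
from contextlib import contextmanager


@contextmanager
def tempfile_from_rows(rows, suffix="tsv", directory="."):
    """Write rows to a uniquely named TSV file, yield its path, then delete it."""
    # A short random name avoids clashes between tracks in the same directory
    filename = os.path.join(directory, f"track-{uuid.uuid4().hex[:7]}.{suffix}")
    try:
        # The with-block closes the file before the path is yielded
        with open(filename, "w", newline="") as handle:
            writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
            writer.writerows(rows)
        yield filename
    finally:
        # Clean up even if the consumer raised an exception
        if os.path.exists(filename):
            os.remove(filename)
```

The directory argument defaults to the current working directory, matching the constraint that the track files cannot live in an arbitrary location.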

Member

The code quality looks good. As you're the one who develops and uses these functions, we have to trust you. 😄

Just one suggestion: add a comment to the code explaining why you use unique_name here.

That was my first question when I read your code, before I saw your comment here.

Member Author

> The code quality looks good. As you're the one who develops and uses these functions, we have to trust you. 😄

It's all Paul's work done a decade ago, I'm just wrapping it in Python so more people can use it easily 😃 You won't believe how many 'crossover analysis' tools have been written again and again, but that's another story.

> Just one suggestion: add a comment to the code explaining why you use unique_name here.
>
> That was my first question when I read your code, before I saw your comment here.

Ok, will do.

@weiji14 weiji14 merged commit 6deb388 into master Sep 10, 2020
@weiji14 weiji14 deleted the x2sys_cross_dataframes branch September 10, 2020 05:08
weiji14 added a commit to weiji14/deepicedrain that referenced this pull request Sep 13, 2020
Bumps [pygmt](https://github.com/GenericMappingTools/pygmt) from 0.1.2-36-g4939ee2a to 0.2.0.
  - [Release notes](https://github.com/GenericMappingTools/pygmt/releases)
  - [Changelog](https://github.com/GenericMappingTools/pygmt/blob/master/doc/changes.rst)
  - [Commits](GenericMappingTools/pygmt@v0.1.2-36-g4939ee2a...v0.2.0)

This includes several enhancements such as 'Sensible array outputs for pygmt info' (GenericMappingTools/pygmt#575) and 'Allow passing in pandas dataframes to x2sys_cross' (GenericMappingTools/pygmt#591) that will make our crossover analysis work and figure generation easier! Also edited GitHub Actions workflow to only run Docker build on Pull Requests when ready to review or when review is requested (i.e. not when PR is in draft mode).
weiji14 added a commit to weiji14/deepicedrain that referenced this pull request Sep 15, 2020