

A few more clean-up re-organization items #11

Open
eeholmes opened this issue Apr 5, 2022 · 2 comments

Comments

@eeholmes
Member

eeholmes commented Apr 5, 2022

  • Move DataDownload into contributors?

Since contributors is the sandbox and DataDownload was/is a group sandbox, maybe we should move it to contributors for now? I still have some things in DataDownload to clean up, but I'm not going to do that immediately.

  • Where should we have shared data?

The following are types of data that would be good to have in a shared space. None of these are huge.

  • Shared geojson files for different regions of interest.
  • ATL03 data (in a format that is not too large and is easy to read in) for good test cases. Those working on photon classification algorithms just need a set of ATL03 data to work on, and it would be good to have a consistent set of test cases. I don't know what a good format is. I think what we want to save will be a geopandas dataframe? Or at least a pandas dataframe.
@iled
Member

iled commented Apr 5, 2022

  • Move DataDownload into contributors?

Since contributors is the sandbox and DataDownload was/is a group sandbox, maybe we should move it to contributors for now? I still have some things in DataDownload to clean up, but I'm not going to do that immediately.

I think so, too.

  • Where should we have shared data?

As suggested in #6, I think a ./data directory would be a fine place for that purpose.

The following are types of data that would be good to have in a shared space. None of these are huge.

  • Shared geojson files for different regions of interest.

The example given in #6 was about exactly that: ROIs.
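To make the shared-ROI idea concrete, here is a minimal sketch, assuming ROIs are stored as small GeoJSON files under ./data. The helper names `write_roi` and `read_roi_bounds`, the rectangular shape, and the file name `example_roi.geojson` are all hypothetical; only the GeoJSON structure itself (RFC 7946: `[longitude, latitude]` positions, closed counterclockwise exterior ring) is standard.

```python
import json
import tempfile
from pathlib import Path


def write_roi(path, name, lon_min, lat_min, lon_max, lat_max):
    """Write a rectangular region of interest as a GeoJSON FeatureCollection."""
    # GeoJSON (RFC 7946) stores positions as [longitude, latitude];
    # the exterior ring is listed counterclockwise and closed by
    # repeating the first vertex.
    ring = [
        [lon_min, lat_min],
        [lon_max, lat_min],
        [lon_max, lat_max],
        [lon_min, lat_max],
        [lon_min, lat_min],
    ]
    feature = {
        "type": "Feature",
        "properties": {"name": name},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }
    collection = {"type": "FeatureCollection", "features": [feature]}
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # e.g. create ./data if missing
    path.write_text(json.dumps(collection, indent=2))
    return path


def read_roi_bounds(path):
    """Return (lon_min, lat_min, lon_max, lat_max) of the first feature's exterior ring."""
    data = json.loads(Path(path).read_text())
    ring = data["features"][0]["geometry"]["coordinates"][0]
    lons = [lon for lon, _ in ring]
    lats = [lat for _, lat in ring]
    return min(lons), min(lats), max(lons), max(lats)


if __name__ == "__main__":
    # Hypothetical ROI written to a temporary stand-in for ./data
    tmp = Path(tempfile.mkdtemp())
    roi_file = write_roi(tmp / "data" / "example_roi.geojson", "example", -10.0, 60.0, -9.0, 61.0)
    print(read_roi_bounds(roi_file))
```

Because this is plain GeoJSON, the same file should also load with `geopandas.read_file` for anyone already working in geopandas.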

  • ATL03 data (in a format that is not too large and is easy to read in) for good test cases. Those working on photon classification algorithms just need a set of ATL03 data to work on, and it would be good to have a consistent set of test cases. I don't know what a good format is. I think what we want to save will be a geopandas dataframe? Or at least a pandas dataframe.

I think it makes sense to have a common place for things that are likely to be reused multiple times, and test-case subsets are one such example. Data specific to a single example could instead live in, e.g., ./examples/data.

I am also not sure about the best format. Perhaps something based on HDF? I don't know what's more typical in the ICESat world. Note that (geo)pandas dataframes are not file formats; they are data structures that only exist while code is running. I guess they could be dumped in binary form (e.g., to a pickle), but I think that would not be very portable.
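As a contrast to pickling, here is a minimal sketch of saving a photon subset to a portable, text-based file and reading it back. It swaps in CSV rather than the HDF option suggested above, purely because CSV needs only the standard library; the column names `lat_ph`, `lon_ph`, `h_ph` are assumed (modeled on ATL03 photon-height variables), and the helper names and toy values are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# Assumed columns, named after ATL03 photon-height variables; the real
# subset would carry whatever fields the classification work needs.
FIELDS = ["lat_ph", "lon_ph", "h_ph"]


def write_photons_csv(path, rows):
    """Write a list of photon dicts to a plain CSV file with a header row."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return path


def read_photons_csv(path):
    """Read the CSV back, converting every column to float."""
    with Path(path).open(newline="") as f:
        return [{key: float(value) for key, value in row.items()} for row in csv.DictReader(f)]


if __name__ == "__main__":
    # Synthetic toy photons, not real ATL03 measurements
    photons = [
        {"lat_ph": 60.5, "lon_ph": -9.25, "h_ph": 12.375},
        {"lat_ph": 60.5001, "lon_ph": -9.2501, "h_ph": -0.5},
    ]
    out = write_photons_csv(Path(tempfile.mkdtemp()) / "data" / "atl03_subset.csv", photons)
    print(read_photons_csv(out) == photons)
```

With pandas, `DataFrame.to_csv(path, index=False)` and `pandas.read_csv` give the same round trip; `DataFrame.to_hdf` is the HDF5 route, but it requires the PyTables package.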

@iled
Member

iled commented Apr 5, 2022

I can do the refactoring of DataDownload and the linked notebooks later this week or over the weekend.
