Plenary: Cloud computing and cloud-optimized data formats #35

Open
JessicaS11 opened this issue May 1, 2024 · 3 comments
Assignees
Labels
All plenary session for all attendees

Comments

@JessicaS11

JessicaS11 commented May 1, 2024

Lead: Aimee Barciauskas
Date: 19/08/2024
Start Time: 1300
Duration: 45
Description:

Details

Learning Outcomes

  • outcome 1
  • outcome 2
  • outcome 3

People Developing the Tutorial (content creation, helpers, teachers)

Summary Description

  • Why should we care about cloud-optimized formats (now)?
  • What does it mean to be cloud-optimized?
  • Cloud formats and cloud computing
  • Demo of ICESat-2 in Parquet format using lonboard

Dependencies (things people should know in advance of the tutorial)

Technical Needs (GPUs? Large file storage? Unique libraries?)

@JessicaS11 JessicaS11 added ICESat-2 icesat-2 specific event Schedule labels May 1, 2024
@JessicaS11 JessicaS11 changed the title Tutorial: ICESat-2 Data Overview Tutorial: Cloud computing and cloud-optimized data formats May 1, 2024
@jomey jomey removed the Schedule label May 1, 2024
@JessicaS11 JessicaS11 added All plenary session for all attendees and removed ICESat-2 icesat-2 specific event labels May 9, 2024
@JessicaS11 JessicaS11 changed the title Tutorial: Cloud computing and cloud-optimized data formats Plenary: Cloud computing and cloud-optimized data formats May 9, 2024
@abarciauskas-bgse

abarciauskas-bgse commented Jul 25, 2024

Outline

Preamble

We shouldn't have to think about formats, so this tutorial will hopefully be obsolete in the next 5 years. But we have a long way to go, so we want to share with you what cloud-optimized means and why you should care, so you can help us get there.

  1. Why should you care: if your science requires multiple files and may be memory intensive when working with those files in memory, and accessing them is slow, you should know what to look for to explain why, and perhaps advocate for things to be better!
  2. What does it mean to be cloud-optimized
  3. Cloud-optimized vs. cloud-native, and the formats you may see ICESat-2 products in:
    • HDF5 and cloud-optimized HDF5
    • Zarr (cloud-native, multidimensional; applies to higher-level products)
    • GeoParquet
  4. Brief introduction to cloud computing: you are already using cloud computing on CryoCloud! CryoCloud provides in-region ("collocated") compute, i.e. compute running in the same cloud region as the data. Other frameworks for parallel computing with cloud-optimized formats include Dask/Coiled and Cubed, and there are many serverless frameworks for parallel computing as well.
  5. Demo: use SlideRule to output ICESat-2 data as GeoParquet and demonstrate it with lonboard.
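To make "cloud-optimized" concrete for item 2: a reader can find the byte offsets of just the pieces it needs from a small index or metadata block, then fetch only those pieces with HTTP range requests instead of downloading the whole file. The sketch below assumes a hypothetical layout (a header of known size followed by uncompressed float64 chunks stored back to back); real cloud-optimized HDF5 and Zarr keep a chunk index and usually compress chunks, so offsets come from metadata rather than arithmetic, but the principle is the same. `HEADER_BYTES`, `CHUNK_LEN`, and the helper names are illustrative only.

```python
"""Map an array slice to the HTTP byte ranges a cloud-optimized reader would request.

Hypothetical layout (an assumption, not the actual HDF5 or Zarr encoding):
HEADER_BYTES of metadata, then uncompressed float64 chunks of CHUNK_LEN
elements each, stored contiguously.
"""

ITEM_BYTES = 8        # size of one float64
HEADER_BYTES = 4096   # hypothetical header/metadata size
CHUNK_LEN = 10_000    # hypothetical number of elements per chunk


def chunks_for_slice(start: int, stop: int, chunk_len: int = CHUNK_LEN) -> range:
    """Return the indices of the chunks that overlap elements [start, stop)."""
    if not 0 <= start < stop:
        raise ValueError("need 0 <= start < stop")
    return range(start // chunk_len, (stop - 1) // chunk_len + 1)


def byte_range(chunk_index: int, chunk_len: int = CHUNK_LEN) -> tuple[int, int]:
    """Inclusive (first, last) byte offsets of one chunk, as used in a Range header."""
    chunk_bytes = chunk_len * ITEM_BYTES
    first = HEADER_BYTES + chunk_index * chunk_bytes
    return first, first + chunk_bytes - 1


def range_headers(start: int, stop: int) -> list[str]:
    """One 'Range' header value per chunk needed; no whole-file download."""
    return [f"bytes={a}-{b}" for a, b in map(byte_range, chunks_for_slice(start, stop))]


if __name__ == "__main__":
    # Elements 25_000..34_999 live in chunks 2 and 3 only.
    for header in range_headers(25_000, 35_000):
        print(header)
```

Run as a script, it prints two Range header values, `bytes=164096-244095` and `bytes=244096-324095`: two small requests instead of one large download. A format without an index the reader can use this way forces it to fetch much more of the file.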

3 learning outcomes

  • what does it mean to be cloud optimized
  • some things to look for when data access is slow
  • understand what cloud optimized hdf5, zarr, and geoparquet are
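One thing to look for when access is slow is chunking. If each chunk costs one request, very small chunks mean many round trips, while very large chunks mean fetching bytes you never use. The hypothetical helper below counts requests and bytes fetched for a contiguous 1-D read under different chunk sizes. It assumes one GET per chunk touched, whole-chunk reads, and no compression, so it is a back-of-the-envelope model, not a benchmark of any real storage service.

```python
"""Rough cost of reading elements [start, stop) of a chunked 1-D float64 array.

Assumptions (illustrative only): one GET request per chunk touched, every
touched chunk is read in full, and no compression.
"""
from dataclasses import dataclass

ITEM_BYTES = 8  # size of one float64


@dataclass
class ReadCost:
    chunk_len: int
    requests: int
    bytes_read: int
    bytes_wanted: int

    @property
    def overread(self) -> float:
        """Fraction of fetched bytes that the caller did not ask for."""
        return 1 - self.bytes_wanted / self.bytes_read


def read_cost(start: int, stop: int, chunk_len: int) -> ReadCost:
    if not 0 <= start < stop:
        raise ValueError("need 0 <= start < stop")
    first_chunk = start // chunk_len
    last_chunk = (stop - 1) // chunk_len
    requests = last_chunk - first_chunk + 1
    return ReadCost(
        chunk_len=chunk_len,
        requests=requests,
        bytes_read=requests * chunk_len * ITEM_BYTES,
        bytes_wanted=(stop - start) * ITEM_BYTES,
    )


if __name__ == "__main__":
    # Same 50_000-element read, three chunkings: tiny chunks mean many round
    # trips; huge chunks mean downloading data you then throw away.
    for chunk_len in (100, 10_000, 1_000_000):
        c = read_cost(123_456, 173_456, chunk_len)
        print(f"chunk_len={c.chunk_len:>9}: {c.requests:>4} requests, "
              f"{c.bytes_read / 1e6:.2f} MB fetched, {c.overread:.0%} unused")
```

For this read, 100-element chunks need 501 requests, 10,000-element chunks need 6 requests with about 17% over-read, and a single 1,000,000-element chunk needs 1 request but discards 95% of what it downloads. Which end hurts more depends on request latency versus bandwidth, which is why "how is this file chunked?" is a good first question when things are slow.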

@scottyhq
Contributor

use sliderule to output in parquet demonstrate icesat-2 in geoparquet with lonboard

I'll help out to make sure this notebook from last year can work with lonboard on CryoCloud JupyterHub
https://icesat-2-2023.hackweek.io/tutorials/sliderule/parquet-s3.html

@abarciauskas-bgse

As a point of comparison, my colleague Sean went through the process of creating Parquet without SlideRule, and it is much more complicated: https://github.com/developmentseed/icesat-parquet/blob/main/atl08_earthaccess.ipynb. It may be worth making that point so participants are motivated to use SlideRule.
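One reason the non-SlideRule route takes more work is that ATL08 granules store each ground track in its own HDF5 group (gt1l, gt1r, ..., gt3r). Producing a single Parquet table therefore means reading several arrays per beam and stacking them with a beam column. The stdlib-only sketch below shows just that reshaping step. It assumes the granule has already been read (for example with h5py) into a dict keyed by dataset path. The paths follow ATL08's `land_segments` layout but should be checked against the product's data dictionary, and `flatten_beams` is a hypothetical helper, not code from the linked notebook.

```python
"""Flatten per-beam ATL08 arrays into one row-per-segment table.

Assumption: the granule has already been read (e.g. with h5py) into a dict
mapping HDF5 dataset paths to equal-length lists per beam. Paths mirror
ATL08's /gtXY/land_segments layout; verify them against the data dictionary.
"""

BEAMS = ("gt1l", "gt1r", "gt2l", "gt2r", "gt3l", "gt3r")

# Output column name -> dataset path relative to the beam group.
FIELDS = {
    "latitude": "land_segments/latitude",
    "longitude": "land_segments/longitude",
    "h_te_best_fit": "land_segments/terrain/h_te_best_fit",
    "h_canopy": "land_segments/canopy/h_canopy",
}


def flatten_beams(granule: dict[str, list[float]]) -> dict[str, list]:
    """Stack every beam's arrays into columns, adding a 'beam' column.

    Beams absent from the granule are skipped; ragged arrays raise ValueError.
    """
    columns: dict[str, list] = {"beam": [], **{name: [] for name in FIELDS}}
    for beam in BEAMS:
        if f"/{beam}/{FIELDS['latitude']}" not in granule:
            continue
        arrays = {name: granule[f"/{beam}/{path}"] for name, path in FIELDS.items()}
        n = len(arrays["latitude"])
        if any(len(values) != n for values in arrays.values()):
            raise ValueError(f"ragged arrays in beam {beam}")
        columns["beam"].extend([beam] * n)
        for name, values in arrays.items():
            columns[name].extend(values)
    return columns


if __name__ == "__main__":
    # Toy values for two beams only; these are NOT real ICESat-2 data.
    toy = {
        "/gt1l/land_segments/latitude": [60.10, 60.11],
        "/gt1l/land_segments/longitude": [-150.0, -150.0],
        "/gt1l/land_segments/terrain/h_te_best_fit": [120.5, 121.0],
        "/gt1l/land_segments/canopy/h_canopy": [8.2, 7.9],
        "/gt2l/land_segments/latitude": [60.20],
        "/gt2l/land_segments/longitude": [-149.9],
        "/gt2l/land_segments/terrain/h_te_best_fit": [98.4],
        "/gt2l/land_segments/canopy/h_canopy": [3.1],
    }
    print(flatten_beams(toy)["beam"])
```

A real conversion still has to mask fill values, build point geometries (e.g. `geopandas.points_from_xy`), and write the file (e.g. `GeoDataFrame.to_parquet`), repeated for every granule. That repetition is the part SlideRule's Parquet output spares participants.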

Projects
Status: MONDAY - 19-08-2024