I am a principal scientist at a Korean astronomy institute, particularly interested in applying Big Data technologies to astronomical problems.
I have found two issues when trying to run eazy-py on my Spark cluster.
[1] Local file access for filters and parameters
When running programs in the cloud, we do not have a local file system; instead we have a "bucket", i.e. cloud object storage.
Hence, all filters and SED parameters need to be "in-memory" objects or "cloud-storable" objects.
The current approach of using symbolic links is not friendly for running eazy-py on a cloud or big-data platform.
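To make the request concrete, here is a minimal sketch of one way the inputs could be staged from object storage instead of relying on symlinked local files. It assumes an fsspec-compatible bucket URL; the bucket path, scratch directory, and file names are placeholders, and the final FilterFile call is only an illustration of handing the staged path to eazy-py's existing file-based reader.

```python
# Sketch only (assumptions): fsspec is available, and the bucket URL,
# scratch directory, and file names below are hypothetical placeholders.
# Idea: copy filter/parameter files from object storage into node-local
# scratch so no symbolic links or pre-existing local paths are needed.
import os
import fsspec

BUCKET = "s3://my-eazy-bucket/eazy-inputs"   # hypothetical bucket path
SCRATCH = "/tmp/eazy_inputs"                 # node-local scratch space

def stage_from_bucket(name, bucket=BUCKET, scratch=SCRATCH):
    """Copy one input file (e.g. FILTER.RES.latest, zphot.param) from the
    bucket to local scratch and return the local path."""
    os.makedirs(scratch, exist_ok=True)
    local_path = os.path.join(scratch, name)
    with fsspec.open(f"{bucket}/{name}", "rb") as remote, \
         open(local_path, "wb") as local:
        local.write(remote.read())
    return local_path

# The staged file can then be passed to the existing file-based readers, e.g.
# (assuming FilterFile accepts a local path as its first argument):
# import eazy.filters
# res = eazy.filters.FilterFile(stage_from_bucket("FILTER.RES.latest"))
```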
[2] Hard-wired single-node + multi-thread optimization
Unfortunately, I have found that many astronomical tools are hard-optimized for "single node" + "multi-thread".
This kind of optimization is not good for writing scalable code.
A simple single-thread, one-object-at-a-time SED fitting architecture, rather than loading thousands of objects and running them across multiple threads,
would be enough to massively parallelize the code into thousands or millions of simultaneous tasks on a multi-node cluster with hundreds of nodes using a big-data platform.
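As an illustration of that one-object-at-a-time architecture, here is a hedged PySpark sketch. The `fit_single_object` function is hypothetical (it stands in for a single-threaded per-source fit), and the catalog schema, column names, and parquet paths are placeholders, not part of eazy-py.

```python
# Sketch only: `fit_single_object` is a hypothetical single-threaded routine
# that fits one source's photometry; column names and parquet paths are
# placeholders. Spark handles the parallelism across nodes.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sed-fitting").getOrCreate()

def fit_single_object(row):
    """Fit one object with one thread; no shared state, no multithreading."""
    # ... call the per-object SED fitting routine here ...
    return (row["id"], 0.0, 0.0)  # placeholder result (id, z_phot, chi2)

catalog = spark.read.parquet("s3://my-bucket/photometry.parquet")  # placeholder
results = (catalog.rdd
           .map(fit_single_object)
           .toDF(["id", "z_phot", "chi2"]))
results.write.parquet("s3://my-bucket/zphot_results.parquet")
```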
===
I do not know whether this can be applied here, but single-node + multi-thread optimization is not ideal for either a simple single-thread run or a massive multi-node run.
Thank you for the feedback @shongscience. I agree that the file I/O and multi-threading are quite naive in the current version of eazy-py, so I'd be very interested in any suggestions on how to improve things to work more efficiently in cloud / cluster environments.