two requests about eazypy architecture for cloud computing and scalability #39

Open
shongscience opened this issue Feb 21, 2024 · 2 comments

Comments

@shongscience

I am a principal scientist at a Korean astronomy institute, especially interested in applying big data technologies to astronomical problems.

I found two issues when trying to run eazypy on my Spark cluster.

[1] Local file access for filters and parameters
When running programs in the cloud, there is no local file system, only a "bucket" (cloud object storage).
Hence, all filters and SED parameters need to be in-memory objects or objects that can be stored in cloud storage.

Your approach of using symbolic links is not well suited to running eazypy in the cloud or on a big data platform.
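
As a rough illustration of request [1], a loader could accept a path, raw bytes fetched from a bucket, or an already-open stream, so that nothing depends on the local file system. This is only a sketch under stated assumptions: `open_text` and `parse_params` are hypothetical names, not eazy-py functions, and the `KEY value  # comment` layout is assumed purely for illustration.

```python
import io
from typing import IO, Callable, Dict, Union

# A source may be a path, raw bytes (e.g. downloaded from a bucket),
# or a text stream that is already open.
Source = Union[str, bytes, IO[str]]


def open_text(source: Source, opener: Callable[[str], IO[str]] = open) -> IO[str]:
    """Return a text stream for the given source (hypothetical helper)."""
    if isinstance(source, bytes):
        # In-memory content: no file system access at all.
        return io.StringIO(source.decode("utf-8"))
    if isinstance(source, str):
        # A path: `opener` defaults to the built-in open, but a caller
        # could pass a function that reads from cloud storage instead.
        return opener(source)
    return source


def parse_params(source: Source, opener: Callable[[str], IO[str]] = open) -> Dict[str, str]:
    """Parse 'KEY value  # comment' lines into a dict (assumed format)."""
    params: Dict[str, str] = {}
    # Note: the stream is closed on exit, including one passed in by the caller.
    with open_text(source, opener) as stream:
        for line in stream:
            line = line.split("#", 1)[0].strip()  # drop comments and padding
            if not line:
                continue
            parts = line.split(None, 1)  # split on any run of whitespace
            params[parts[0]] = parts[1] if len(parts) > 1 else ""
    return params


# Bytes as they might arrive from object storage; no local file is needed.
raw = b"Z_MIN 0.01\nZ_MAX  6.0  # upper limit\n"
params = parse_params(raw)
```

With this shape, the same parsing code serves local runs (pass a path) and cluster runs (pass bytes or a custom `opener`).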

[2] Hard-wired single-node, multi-threaded optimization
Unfortunately, I have found that many astronomical tools are heavily optimized for a single node with multiple threads.

This specific optimization makes it hard to write scalable code.

A simple architecture that fits one SED at a time in a single thread, rather than loading thousands of objects and running them across multiple threads,
could be enough to parallelize the code massively, across thousands or millions of simultaneous threads on a cluster of hundreds of nodes using a big data platform.
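
The single-thread, one-object-at-a-time idea in [2] could look roughly like the sketch below. It is not eazy-py's fitting algorithm: `fit_one` is a hypothetical name, and the toy fit only scales each template by its weighted least-squares amplitude and keeps the lowest chi-squared. The point is the shape of the function: it holds no shared state and spawns no threads, so any parallel runner can map it over a catalog.

```python
from functools import partial
from typing import Dict, Sequence, Tuple

# (object id, fluxes, flux errors): one catalog row (assumed layout).
Obj = Tuple[str, Sequence[float], Sequence[float]]
Templates = Dict[str, Sequence[float]]


def fit_one(obj: Obj, templates: Templates) -> Tuple[str, str, float]:
    """Fit a single object; return (id, best template name, chi-squared)."""
    obj_id, fluxes, errors = obj
    weights = [1.0 / (e * e) for e in errors]  # inverse-variance weights
    best_name, best_chi2 = "", float("inf")
    for name, model in templates.items():
        # Weighted least-squares amplitude for this template.
        num = sum(w * f * m for w, f, m in zip(weights, fluxes, model))
        den = sum(w * m * m for w, m in zip(weights, model))
        amp = num / den
        chi2 = sum(w * (f - amp * m) ** 2
                   for w, f, m in zip(weights, fluxes, model))
        if chi2 < best_chi2:
            best_name, best_chi2 = name, chi2
    return obj_id, best_name, best_chi2


# Toy templates and catalog, made up for illustration only.
TEMPLATES: Templates = {"flat": [1.0, 1.0, 1.0], "red": [1.0, 2.0, 3.0]}
catalog = [
    ("obj-1", [2.0, 4.0, 6.0], [1.0, 1.0, 1.0]),
    ("obj-2", [5.0, 5.0, 5.0], [1.0, 1.0, 1.0]),
]

# Serial run with the built-in map.
results = list(map(partial(fit_one, templates=TEMPLATES), catalog))
```

Because `fit_one` takes one object and returns one result, the same function could in principle be handed to `concurrent.futures.ProcessPoolExecutor.map` on a single machine or to a Spark `RDD.map` on a cluster, with the templates shipped to the workers once.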

===
I do not know whether this can be applied, but the single-node, multi-thread optimization suits neither a simple single-threaded run nor a massive multi-node run.

@gbrammer
Owner

gbrammer commented Jul 4, 2024

Thank you for the feedback @shongscience. I agree that the file I/O and multi-threading are quite naive in the current version of eazy-py, so I'd be very interested in any suggestions on how to improve things to work more efficiently in cloud / cluster environments.

@gbrammer
Owner

gbrammer commented Sep 27, 2024

See #46 for new behavior to avoid the symbolic links. Updates to multithreading TBD.
