I am a principal scientist at a Korean astronomy institute, particularly interested in applying Big Data technologies to astronomical problems.
I have found two issues when trying to run eazy-py on my Spark cluster.
[1] Local file access for filters and parameters
When running programs in the cloud, we do not have a local file system; instead we have a "bucket", i.e. cloud object storage.
Hence, all filters and SED parameters need to be "in-memory" objects or "cloud-storable" objects.
The current approach of using symbolic links is not friendly for running eazy-py on a cloud or big-data platform.
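To make the request concrete, here is a minimal sketch of one way the inputs could be staged from object storage instead of relying on symlinked local files. It assumes an fsspec-compatible bucket URL; the bucket path, scratch directory, and file names are placeholders, and the final FilterFile call is only an illustration of handing the staged path to eazy-py's existing file-based reader.

```python
# Sketch only (assumptions): fsspec is available, and the bucket URL,
# scratch directory, and file names below are hypothetical placeholders.
# Idea: copy filter/parameter files from object storage into node-local
# scratch so no symbolic links or pre-existing local paths are needed.
import os
import fsspec

BUCKET = "s3://my-eazy-bucket/eazy-inputs"   # hypothetical bucket path
SCRATCH = "/tmp/eazy_inputs"                 # node-local scratch space

def stage_from_bucket(name, bucket=BUCKET, scratch=SCRATCH):
    """Copy one input file (e.g. FILTER.RES.latest, zphot.param) from the
    bucket to local scratch and return the local path."""
    os.makedirs(scratch, exist_ok=True)
    local_path = os.path.join(scratch, name)
    with fsspec.open(f"{bucket}/{name}", "rb") as remote, \
         open(local_path, "wb") as local:
        local.write(remote.read())
    return local_path

# The staged file can then be passed to the existing file-based readers, e.g.
# (assuming FilterFile accepts a local path as its first argument):
# import eazy.filters
# res = eazy.filters.FilterFile(stage_from_bucket("FILTER.RES.latest"))
```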
[2] Hard-wired single-node + multi-thread optimization
Unfortunately, I have found that many astronomical tools are hard-optimized for "single node" + "multi-thread".
This kind of optimization is not good for writing scalable code.
A simple single-thread, one-object-at-a-time SED fitting architecture, rather than loading thousands of objects and running them across multiple threads,
would be enough to massively parallelize the code into thousands or millions of simultaneous tasks on a multi-node cluster with hundreds of nodes using a big-data platform.
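As an illustration of that one-object-at-a-time architecture, here is a hedged PySpark sketch. The `fit_single_object` function is hypothetical (it stands in for a single-threaded per-source fit), and the catalog schema, column names, and parquet paths are placeholders, not part of eazy-py.

```python
# Sketch only: `fit_single_object` is a hypothetical single-threaded routine
# that fits one source's photometry; column names and parquet paths are
# placeholders. Spark handles the parallelism across nodes.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sed-fitting").getOrCreate()

def fit_single_object(row):
    """Fit one object with one thread; no shared state, no multithreading."""
    # ... call the per-object SED fitting routine here ...
    return (row["id"], 0.0, 0.0)  # placeholder result (id, z_phot, chi2)

catalog = spark.read.parquet("s3://my-bucket/photometry.parquet")  # placeholder
results = (catalog.rdd
           .map(fit_single_object)
           .toDF(["id", "z_phot", "chi2"]))
results.write.parquet("s3://my-bucket/zphot_results.parquet")
```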
===
I do not know whether this can be applied here, but single-node + multi-thread optimization is not ideal for either a simple single-thread run or a massive multi-node run.
Thank you for the feedback @shongscience. I agree that the file I/O and multi-threading are quite naive in the current version of eazy-py, so I'd be very interested in any suggestions on how to improve things to work more efficiently in cloud / cluster environments.