Materialization of DFT field data on the master only #1797

Closed
ahoenselaar opened this issue Oct 22, 2021 · 3 comments · Fixed by #1855
Comments

@ahoenselaar
Contributor

DFT fields returned to the user are gathered in full on each process. This can result in massive memory allocations, especially when multiple Meep processes are running on the same physical node.

In many scenarios, the DFT field data is not needed on each worker/process but only on the master. Being able to materialize DFT fields on the master only could alleviate many out-of-memory situations.
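To make the memory argument concrete, here is a rough back-of-the-envelope sketch. The helper name, the grid size, the frequency count, and the process count are all hypothetical numbers chosen for illustration, and it assumes complex double-precision values (16 bytes each); it does not call any Meep API.

```python
def gathered_bytes_per_node(n_points, n_freqs, n_components,
                            procs_per_node, master_only,
                            bytes_per_value=16):
    """Estimate bytes held on one node by fully gathered DFT field copies.

    Hypothetical helper for illustration only. Assumes every gathered copy
    stores n_points * n_freqs * n_components complex128 values, and that in
    the master-only case this node is the one hosting the master process.
    """
    one_copy = n_points * n_freqs * n_components * bytes_per_value
    copies = 1 if master_only else procs_per_node
    return one_copy * copies


# Assumed example: a 200^3 DFT region, 10 frequencies, 3 field components,
# and 32 Meep processes sharing one physical node.
n_points = 200 ** 3
replicated = gathered_bytes_per_node(n_points, 10, 3, 32, master_only=False)
on_master = gathered_bytes_per_node(n_points, 10, 3, 32, master_only=True)

print(f"gathered on every process: {replicated / 1e9:.2f} GB per node")
print(f"gathered on master only:   {on_master / 1e9:.2f} GB per node")
```

Under these assumptions the per-node footprint of the gathered data drops by a factor equal to the number of processes on the node.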

@stevengj
Collaborator

stevengj commented Oct 23, 2021

As much as possible, it would be nice to avoid collecting the DFT field data on any single process at all. That ultimately won't scale: if the problem gets big enough, the full DFT fields will exceed the memory available to any one process. It is better to leave the data distributed and to compute with it in that form, e.g. as we do for the near-to-far transformation.

@stevengj
Collaborator

stevengj commented Nov 3, 2021

In particular, the adjoint solver should never call get_dft — it should compute a distributed dot product of the forward and adjoint fields.
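A minimal sketch of the idea of a distributed dot product, in plain Python with no MPI and no Meep calls. The processes are simulated by a list of chunks, all function names are hypothetical, and the final `sum` stands in for a sum-reduction across processes. The point is that only one scalar per process has to be communicated, never the field arrays themselves.

```python
def split_into_chunks(values, n_procs):
    """Split a list into n_procs contiguous chunks.

    Stands in for the way each process owns only its part of the fields.
    """
    base, extra = divmod(len(values), n_procs)
    chunks, start = [], 0
    for rank in range(n_procs):
        stop = start + base + (1 if rank < extra else 0)
        chunks.append(values[start:stop])
        start = stop
    return chunks


def local_dot(forward_chunk, adjoint_chunk):
    """Partial dot product over the field points owned by one process."""
    return sum(f * a for f, a in zip(forward_chunk, adjoint_chunk))


def distributed_dot(forward_chunks, adjoint_chunks):
    """Combine per-process partial sums into the global dot product."""
    partials = [local_dot(f, a) for f, a in zip(forward_chunks, adjoint_chunks)]
    # In a real parallel run this would be a sum-reduction of one scalar
    # per process rather than a Python sum over a list.
    return sum(partials)


# Made-up field values, purely for illustration.
forward = [complex(i, -i) for i in range(10)]
adjoint = [complex(1, i) for i in range(10)]

reference = sum(f * a for f, a in zip(forward, adjoint))
result = distributed_dot(split_into_chunks(forward, 4),
                         split_into_chunks(adjoint, 4))
```

Here `result` matches `reference`, the dot product computed from the full arrays, although no simulated process ever holds more than its own chunk.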

@stevengj
Collaborator

stevengj commented Nov 10, 2021

The outline I have in mind is:

  1. Distribute the computation of the "dot product" between the forward and adjoint fields that gives the gradient. This way, you won't need to collect the DFT fields on any process. However, the degrees of freedom and the gradients (just 2d arrays for 2d material grids) will still be replicated — these are much smaller than the DFT fields for 2d material grids, though!
  2. In the long run, one could have a distributed version of the CCSA algorithm, so that each process only stores a portion of the degrees of freedom (e.g. you break the material grid into "chunks" according to process boundaries, and chunks that are not needed are not stored locally) and the CCSA algorithm operates in parallel on distributed data. This way, from the user perspective it will be the same as now — you just write one "serial" Meep script that happens to run in parallel — but it will scale better to huge problems (e.g. volumetric degrees of freedom for 3d printing).
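Step 1 of the outline can be sketched as follows, again in plain Python with simulated processes and hypothetical names. Each process accumulates contributions from the field points it owns into a full-size copy of the (small) gradient array, and the copies are then summed elementwise. Taking the real part of the product of forward and adjoint values is only a stand-in for the actual adjoint integrand, which is not specified in this thread.

```python
def local_gradient(n_design, local_points):
    """Gradient contribution from the field points owned by one process.

    local_points is a list of (design_index, forward_value, adjoint_value)
    tuples. The returned array has the full design size, i.e. the gradient
    is replicated while the fields stay distributed.
    """
    grad = [0.0] * n_design
    for design_idx, fwd, adj in local_points:
        # Placeholder integrand: assumed for illustration only.
        grad[design_idx] += (fwd * adj).real
    return grad


def sum_reduce(arrays):
    """Elementwise sum of per-process arrays (stands in for an MPI sum-reduce)."""
    return [sum(vals) for vals in zip(*arrays)]


# Two simulated processes and three design variables, with made-up values.
rank0_points = [(0, 1 + 1j, 2 + 0j), (1, 0 + 1j, 0 + 1j)]
rank1_points = [(1, 3 + 0j, 1 + 0j), (2, 1 - 1j, 1 + 1j)]

gradient = sum_reduce([local_gradient(3, rank0_points),
                       local_gradient(3, rank1_points)])
```

Only the gradient array, which has the size of the design grid, crosses process boundaries. In step 2, each process would additionally keep only the slice of `grad` that overlaps its own chunk.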
