Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incorrect backward EigenmodeCoefficient gradients and update test case #1593

Merged
merged 3 commits into from
Sep 30, 2021

Conversation

ianwilliamson
Copy link
Contributor

@ianwilliamson ianwilliamson commented Jun 4, 2021

This PR continues from #1592 to update the test for the Meep-JAX wrapper to check the gradients from mpa.EigenmodeCoefficients with forward=False. In particular, this PR extends the straight waveguide test case with an additional source on the RHS of the waveguide and with two additional mpa.EigenmodeCoefficients such that both the forward and backward traveling waves on each side of the waveguide are monitored and can be included in the loss function.

Currently, the cases that check the gradient of the transmission from right to left are failing (selected via excite_port_idx param of the test case), presumably because of #1585. Note that another option for exposing the issue described in #1585 is to check the gradient of reflection from the straight waveguide against finite differences (although this quantity should be close to zero).

@ianwilliamson
Copy link
Contributor Author

ianwilliamson commented Jun 4, 2021

As a starting point, it seems that the fix suggested in #1585 of adding:

kpoint_func=lambda a,b: mp.Vector3(x=-1)

to the constructor of EigenmodeCoefficient does not seem to have any effect. That was an error on my part.

@ianwilliamson ianwilliamson force-pushed the adjoint-backward-grad-fix branch 5 times, most recently from b7dacf3 to 4ea9ee5 Compare June 5, 2021 01:56
@ianwilliamson
Copy link
Contributor Author

The issue described in #1585 should be fixed with this.

With this PR, I am proposing a slight refactoring where a private _direction_vector is defined upon registering the mode monitors in the forward simulation. This direction vector is then used both for pulling out the coefficients and for setting up the adjoint source (flipping the direction vector).

@ianwilliamson ianwilliamson changed the title WIP: Update test case to check gradients for backward oriented monitors Update test case to check gradients for backward oriented monitors Jun 5, 2021
@ianwilliamson ianwilliamson changed the title Update test case to check gradients for backward oriented monitors Fix incorrect backward EigenmodeCoefficient gradients and update test case Jun 5, 2021
@ianwilliamson ianwilliamson requested a review from smartalecH June 5, 2021 02:13
@smartalecH
Copy link
Collaborator

Rather than make the hack more elegant, why not focus on fixing the actual bug?

@ianwilliamson
Copy link
Contributor Author

Rather than make the hack more elegant, why not focus on fixing the actual bug?

Is there some reason that it's not advisable to define the forward orientation as being along the negative direction of an axis? In any case, this approach at least leads to correct gradients until someone is able to look into this more.

Comment on lines 133 to 166
if self._monitor.normal_direction == 0:
direction_vector = mp.Vector3(x=1)
elif self._monitor.normal_direction == 1:
direction_vector = mp.Vector3(y=1)
elif self._monitor.normal_direction == 2:
direction_vector = mp.Vector3(z=1)
Copy link
Collaborator

@smartalecH smartalecH Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this preclude us from passing arbitrary kpoints for oblique waveguides?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reintroduced the kpoint_func param. WDYT?

Copy link
Collaborator

@smartalecH smartalecH Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems right from a high level -- thanks for looking into this for me.

Have you created a test for it by chance? Or are all the tests just forward/backward (or equally up/down)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test case updated in this PR checks forward and backward gradients via the forward parameter of EigenmodeCoefficient, but does not have a case where kpoint_func is specified. I've personally not used the oblique waveguide capability but if you have something in mind let me know.

@smartalecH
Copy link
Collaborator

Is there some reason that it's not advisable to define the forward orientation as being along the negative direction of an axis?

In theory, no, but if there's a bug causing the issue we should probably fix the bug. Especially since it might lead to fixing other issues.

I'm a little worried this approach actually limits the user (i.e. me) from defining arbitrary modes (e.g. diffraction modes, oblique waveguide modes, etc.)

@ianwilliamson
Copy link
Contributor Author

In theory, no, but if there's a bug causing the issue we should probably fix the bug.

Sure, I agree, but the concern here is that incorrect results are being produced... so we should fix that first.

I'm a little worried this approach actually limits the user (i.e. me) from defining arbitrary modes (e.g. diffraction modes, oblique waveguide modes, etc.)

Are you saying that you want the ability override the forward parameter by passing in a kpoint_func? That's fine. I think we could support that while still fixing the issue with the gradients.

@ianwilliamson
Copy link
Contributor Author

I think we could support that while still fixing the issue with the gradients.

I've reintroduced the kpoint_func parameter of EigenmodeCoefficient. Let me know what you think!

@ianwilliamson ianwilliamson force-pushed the adjoint-backward-grad-fix branch 2 times, most recently from 2646e70 to 7a540be Compare June 7, 2021 19:47
kpoint_func = self.kpoint_func
overlap_idx = self.kpoint_func_overlap_idx
else:
kpoint_func = lambda *not_used: self._direction_vector
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right — the k point is dimensionful, so the default guess is scaled by fcen*(refractive index).

For comparison, this is the default guess in mpb.cpp:

kmatch = frequency * sqrt(real(get_eps(cen, frequency)) * real(get_mu(cen, frequency)));

if self.kpoint_func:
eig_kpoint = self.kpoint_func(time_src.frequency, self.mode)
else:
eig_kpoint = self._direction_vector
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite the right guess, since it has the wrong units. It would be better to use mpb.cpp's guess for this, which is obtained by passing (0.0,0.0,0.0) for the k-point. To tell it to flip the sign, we could pass (-0.0,-0.0,-0.0) and use the sign of zero:

diff --git a/src/mpb.cpp b/src/mpb.cpp
index c41e3c2a..5a68ca7b 100644
--- a/src/mpb.cpp
+++ b/src/mpb.cpp
@@ -484,7 +484,7 @@ void *fields::get_eigenmode(double frequency, direction d, const volume where, c
   // which we automatically pick if kmatch == 0.
   if (match_frequency && kmatch == 0) {
     vec cen = eig_vol.center();
-    kmatch = frequency * sqrt(real(get_eps(cen, frequency)) * real(get_mu(cen, frequency)));
+    kmatch = copysign(frequency * sqrt(real(get_eps(cen, frequency)) * real(get_mu(cen, frequency))), kmatch);
     if (d == NO_DIRECTION) {
       for (int i = 0; i < 3; ++i)
         k[i] = dot_product(R[i], kdir) * kmatch; // kdir*kmatch in reciprocal basis

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the sign is being lost in kmatch, so you'd have to look at _kpoint.in_direction(d) as the second argument of copysign

@ianwilliamson ianwilliamson force-pushed the adjoint-backward-grad-fix branch from 7a540be to 0e34127 Compare June 10, 2021 16:49
@ianwilliamson
Copy link
Contributor Author

Okay, following up on this... it seems that the copysign() approach while passing mp.Vector3(0.0, 0.0, 0.0) and mp.Vector3(-0.0, -0.0, -0.0) for forward=True and forward=False doesn't work. I added a print statement in mpb.cpp to print out kmatch and it seem to never change sign. Is that expected? Maybe we need to copy the sign from another quantity in that function?

@stevengj
Copy link
Collaborator

Maybe it is easier to just pass a boolean flag.

@ianwilliamson
Copy link
Contributor Author

Maybe it is easier to just pass a boolean flag.

I agree. In the future it would probably be ideal to add a boolean flag, rather than having the user pass +/- zero, to specify the direction in MPB.

Looks like the sign is being lost in kmatch, so you'd have to look at _kpoint.in_direction(d) as the second argument of copysign

I can confirm that the suggestion above in #1593 (comment) successfully sets the sign of kmatch from the input _kpoint. However, I am still unable to get the correct gradients.

I noticed that when |kpoint| == 0, MPB requires you to use direction=mp.AUTOMATIC, rather than direction=mp.NO_DIRECTION. In the approach that I used before, where I obtained correct gradients in both directions, I was using direction=mp.NO_DIRECTION. In that case, I was also passing a non-zero k-vector. This seems to suggest that there is more going on down in that MPB function, beyond selecting the sign of kmatch, that we need to obtain correct gradients along both directions.

Any other thoughts? I would be in favor of using my original approach (passing a vector scaled by the frequency to specify the direction) in the near term if this is going to require a bunch of hacking in the lower level MPB code. I realize that there are some edge cases around users picking potentially weird materials or different units, but right now the adjoint module is producing incorrect gradients and I think that is probably a bigger issue.

@stevengj
Copy link
Collaborator

stevengj commented Jun 11, 2021

I can confirm that the suggestion above in #1593 (comment) successfully sets the sign of kmatch from the input _kpoint. However, I am still unable to get the correct gradients.

Hmm, looking at the code it looks like the reason might be that it is then computing the group velocity in the wrong direction, since kdir is now opposite to the direction of k = kmatch * kdir:

maxwell_ucross_op(W[0], W[1], mdata, kdir); // W[1] = (dTheta/dk) W[0]

Maybe it would be better to flip the sign of kdir, which corresponds more closely to what happens if you pass an initial-guess kpoint in the opposite direction. That is, instead of the copysign line I suggested above, after this line do:

if (std::signbit(_kpoint.in_direction(d)))
    kdir[d - X] = -1; // -0.0 was passed, flip k direction

@ianwilliamson
Copy link
Contributor Author

Maybe it would be better to flip the sign of kdir, which corresponds more closely to what happens if you pass an initial-guess kpoint in the opposite direction.

kdir[d - X] = -1;

This (assignment of a negative value) seems to cause a SIGSEGV error.

@oskooi
Copy link
Collaborator

oskooi commented Jun 12, 2021

Using gdb on the latest commit in this branch and running the unit test python/tests/test_adjoint_solver.py produces this backtrace:

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x00007ffff624cf78 in meep::vec::operator-= (this=0x80, a=...) at ../src/meep/vec.hpp:544
544	    LOOP_OVER_DIRECTIONS(dim, d) t[d] -= a.t[d];
(gdb) bt
#0  0x00007ffff624cf78 in meep::vec::operator-= (this=0x80, a=...) at ../src/meep/vec.hpp:544
#1  0x00007ffff5d30f7a in meep::fields::add_eigenmode_source (this=0x20b0d30, c0=meep::Dielectric, src=..., 
    d=meep::X, where=..., eig_vol=..., band_num=1, kpoint=..., match_frequency=true, parity=0, resolution=0, 
    eigensolver_tol=9.9999999999999998e-13, amp=..., A=0x0, dp=0x0) at mpb.cpp:816

The line causing the segmentation fault is mpb.cpp:816 which in the master branch is:

global_eigenmode_data->center -= where.center();

But what is really causing the segfault is this line in python/adjoint/objective.py:

_KPOINT_NEG_DIR = mp.Vector3(-0.0, 0.0, 0.0)

Changing this manually to mp.Vector3(-1.0, 0.0, 0.0) and everything runs fine. I think the problem is that a negative zero floating point is not being recognized as signed but rather unsigned which is causing the error.

@ianwilliamson ianwilliamson force-pushed the adjoint-backward-grad-fix branch from 28757c1 to d3abd81 Compare June 23, 2021 00:49
src/mpb.cpp Outdated
@@ -485,6 +485,8 @@ void *fields::get_eigenmode(double frequency, direction d, const volume where, c
if (match_frequency && kmatch == 0) {
vec cen = eig_vol.center();
kmatch = frequency * sqrt(real(get_eps(cen, frequency)) * real(get_mu(cen, frequency)));
if (std::signbit(kpoint.in_direction(d)))
kdir[d - X] = -1; // -0.0 was passed, flip k direction
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead, try putting this before the kdir[d - X] = 1; line

@ianwilliamson ianwilliamson force-pushed the adjoint-backward-grad-fix branch from d3abd81 to 2d46ab8 Compare June 23, 2021 03:46
@ianwilliamson
Copy link
Contributor Author

This is still failing on the check of the gradients in the backward direction using the suggested approach of flipping the sign of kmatch and passing direction=mp.AUTOMATIC from Python.

I was previously able to get the gradients to be correct in both directions only when direction=mp.NO_DIRECTION and a unit vector in the propagation direction was passed for kpoint. This seems to indicate that there is some additional stuff going on in the direction=mp.NO_DIRECTION branch of mpb.cpp that results in the correct fields.

@stevengj
Copy link
Collaborator

Actually, there is a more fundamental problem, it seems. If you look at the principle of equivalence, which is described in this line (which comes from equation 4.4 in this paper, then if you're trying to make a backwards mode we should really flip the sign of nhat.

Maybe, before this line, add a line

if (global_eigenmode_data->group_velocity < 0)
    amp *= -1; // equivalent to flipping the direction of nhat.

@ianwilliamson ianwilliamson force-pushed the adjoint-backward-grad-fix branch from 2d46ab8 to 08d3e7f Compare June 25, 2021 15:49
@ianwilliamson
Copy link
Contributor Author

This is still failing (currently along both directions) with the most recently suggested approach of changing the sign of amp based on the group velocity direction. To summarize, I am currently:

  • Using direction = mp.AUTOMATIC from Python for both the source construction and getting the eigenmode coeffs
  • Passing +/- mp.Vector3(0.0, 0.0, 0.0) to specify the direction of the mode
  • Flipping the sign of kmatch based on _kpoint.in_direction(d)
  • Flipping the sign of amp based on global_eigenmode_data->group_velocity < 0

These changes are all pushed if you want to take a look and confirm that I'm not doing something unintended. As I mentioned above, I was only able to obtain correct gradients when using direction = mp.NO_DIRECTION from Python.

@stevengj
Copy link
Collaborator

stevengj commented Jul 1, 2021

Would be good to visualize the mode that is being launched here, to verify that it is going in the backward direction, or if fundamentally the mode that is being launched is incorrect.

@oskooi
Copy link
Collaborator

oskooi commented Aug 26, 2021

If we plot the Hz and Ez scalar fields for just one of the two EigenModeSources with eig_kpoint=mp.Vector3(-1, 0, 0) (i.e., from the right to the left with the source on the right side of the design region) then it does seem that the backward-propagating mode is being launched correctly.

Hz
Hz_5

Ez
Ez_5

@oskooi
Copy link
Collaborator

oskooi commented Aug 26, 2021

Switching the source to:

    mp.EigenModeSource(
      mp.GaussianSource(frequency=fmean, fwidth=fmean * gaussian_rel_width),
      eig_band=1,
      direction=mp.AUTOMATIC,
      size=mp.Vector3(0, wg_width + 2 * wg_padding, 0),
      center=[sx / 2 - pml_width - source_to_pml, 0, 0],
    ),

and turning verbosity on for the MPB eigenmode solver produces:

MPB solved for frequency_1(-2.0249,0,0) = 0.645161 after 1 iters
Newton step: group velocity v=-0.283247, kmatch=-2.0249
Dominant planewave for band 1: (-2.024903,0.000000,0.000000)

The field profiles do show a backward-propagating mode but the phase of Ez has shifted by π.

Ez
Ez_new

Hz
Hz_new

edit The Hz field is tiny compared to the Ez:

max(Hz): 1.3032941913182494e-11
max(Ez): 0.0001109920849557966

@stevengj
Copy link
Collaborator

Two things:

  1. In your test, above, @oskooi, you are plotting both Ez and Hz. However, one of those is "numerical noise" from the mode solver (and hence its sign is not meaningful), since the eigenmode should be purely polarized. The symptom should be that one of these two fields has vastly smaller amplitude than the other.

  2. @ianwilliamson, in your test, we want to isolate whether the problem is the eigenmode source, the eigenmode coefficient, or the adjoint source constructed from the eigenmode coefficient. So, to start with, in your "mirror-flipped" simulation, I would use the flipped eigenmode source (with kx=-0.0), but use an "ordinary" eigenmode coefficient (i.e. with the default eig_kpoint, and just request the mode coefficient in the "backwards" direction, which is a different element of the coefficient output).

@ianwilliamson ianwilliamson force-pushed the adjoint-backward-grad-fix branch from 7910d73 to ee7e823 Compare September 28, 2021 22:35
… test to check gradients in both forward and backward directions

Flip sign of `amp` based on group velocity direction

Original fix
@ianwilliamson ianwilliamson force-pushed the adjoint-backward-grad-fix branch from ee7e823 to a579d4d Compare September 28, 2021 22:36
@ianwilliamson
Copy link
Contributor Author

I've reverted this PR to use my original approach, which was to pass an appropriate signed unit vector as the k-point to MPB when defining the source and when evaluating the eigenmode coefficients. As @stevengj pointed out above, this approach has the disadvantage of not being scaled correctly / not having the appropriate units. However, I personally believe this is an acceptable trade off in order to fix the more immediate issue with the gradients being incorrect. In practice, MPB usually finds the correct mode. Perhaps we can fork off a separate issue to focus on the work of figuring out a more long term solution with what's going on down in the lower level MPB code.

WDYT?

@codecov-commenter
Copy link

codecov-commenter commented Sep 28, 2021

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 80.95238% with 4 lines in your changes missing coverage. Please review.

Project coverage is 74.42%. Comparing base (553e6e7) to head (75af0a0).
Report is 383 commits behind head on master.

Files with missing lines Patch % Lines
python/adjoint/objective.py 80.95% 4 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1593      +/-   ##
==========================================
+ Coverage   74.39%   74.42%   +0.02%     
==========================================
  Files          13       13              
  Lines        4581     4586       +5     
==========================================
+ Hits         3408     3413       +5     
  Misses       1173     1173              
Files with missing lines Coverage Δ
python/adjoint/objective.py 91.76% <80.95%> (+0.85%) ⬆️

... and 2 files with indirect coverage changes

@stevengj
Copy link
Collaborator

Test failure looks relevant:

FAIL: test_adjoint_solver_eigenmode (__main__.TestAdjointSolver)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/meep/meep/build/meep-1.21.0-beta/_build/sub/python/../../../python/tests/test_adjoint_solver.py", line 234, in test_adjoint_solver_eigenmode
    self.assertClose(adjsol_obj,S12_unperturbed,epsilon=1e-6)
  File "/home/runner/work/meep/meep/build/meep-1.21.0-beta/python/tests/utils.py", line 28, in assertClose
    self.assertLessEqual(diff_norm, epsilon * np.maximum(x_norm, y_norm), msg)
AssertionError: 0.00012112395228314199 not less than or equal to 0.00012045757274762158 : 

@oskooi
Copy link
Collaborator

oskooi commented Sep 29, 2021

The failing test occurs for only the single-precision floating point case. We can just probably lower the tolerance (epsilon=1e-6) slightly and it should be fine.

python/tests/test_adjoint_jax.py Outdated Show resolved Hide resolved
python/adjoint/objective.py Outdated Show resolved Hide resolved
python/adjoint/objective.py Outdated Show resolved Hide resolved
…cient creation comprehension in test, fix design resolution calculation in test
@ianwilliamson
Copy link
Contributor Author

The failing test occurs for only the single-precision floating point case. We can just probably lower the tolerance (epsilon=1e-6) slightly and it should be fine.

That test seems to be flaky.

@stevengj stevengj merged commit 9ec7ba7 into NanoComp:master Sep 30, 2021
@ianwilliamson ianwilliamson deleted the adjoint-backward-grad-fix branch October 1, 2021 16:00
mawc2019 pushed a commit to mawc2019/meep that referenced this pull request Nov 3, 2021
…st case (NanoComp#1593)

* Fix backward gradients in EigenmodeCoefficient and update JAX adjoint test to check gradients in both forward and backward directions

Flip sign of `amp` based on group velocity direction

Original fix

* Scale kpoint by abs(center_frequency) and apply yapf formatting

* Fix parenthesis in center frequency calculation, fix eigenmode coefficient creation comprehension in test, fix design resolution calculation in test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants