Apply minimum linewidth constraint only to a binary design of the waveguide mode converter #32
Regarding the spikes in the convergence plot: this is a normal feature of nonlinear optimization algorithms. Sometimes, they take too large a step, which turns out to make the objective much worse, and then the algorithm has to somehow "backtrack" and take a smaller step (which in CCSA happens by increasing a penalty term). This kind of thing is especially prominent in badly conditioned problems (with second derivatives in some directions much larger than in others), corresponding to optimizing along a "narrow ridge" or valley. |
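The overshoot-and-backtrack behavior described above can be reproduced on a toy problem. The sketch below is not CCSA: it uses plain gradient descent on a badly conditioned quadratic (curvature 100 times larger in one direction) and swaps in a simple step-halving rule as a stand-in for CCSA's penalty increase. All names and numbers are illustrative assumptions.

```python
def f(p):
    # Badly conditioned quadratic: curvature 1 along x, 100 along y
    x, y = p
    return 0.5 * (x * x + 100.0 * y * y)


def grad(p):
    x, y = p
    return (x, 100.0 * y)


def descend(p, step, iters, backtrack):
    """Gradient descent; optionally halve the step whenever f gets worse."""
    history = [f(p)]
    for _ in range(iters):
        g = grad(p)
        s = step
        trial = (p[0] - s * g[0], p[1] - s * g[1])
        if backtrack:
            # Reject steps that make the objective worse and retry smaller
            while f(trial) > f(p):
                s *= 0.5
                trial = (p[0] - s * g[0], p[1] - s * g[1])
        p = trial
        history.append(f(p))
    return history


# A step slightly too large for the stiff (y) direction
naive = descend((1.0, 1.0), step=0.021, iters=20, backtrack=False)
safe = descend((1.0, 1.0), step=0.021, iters=20, backtrack=True)
print("without backtracking:", naive[0], "->", naive[-1])
print("with backtracking:   ", safe[0], "->", safe[-1])
```

Without the backtracking rule the objective grows because the step overshoots along the stiff direction; with it, every accepted step is at least as good as the previous point.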
So even though your design is "pretty" binary, I'm a little worried we are declaring victory too early. Primarily because the linewidth constraints aren't truly in effect. The optimizer is working hard to satisfy them, but never gets to a point where it can return to improving performance. Take a look at some of our convergence plots in our paper (where "larger is better", unlike the current example where "smaller is better"): Notice how the optimizer rebounds? This means two things: (1) the GLC constraints are satisfied; (2) the performance is actually being optimized with the constraints active. Also notice the lower values of
Good point. For a minimum lengthscale constraint of 90 nm (results shown below), it does seem that the optimizer improves the objective function in the final epoch (in which the constraint is activated). The final design also shows decent broadband performance. As a separate issue, when I measure the minimum lengthscale of the 90 nm design using @mawc2019's ruler in this repository, I get a value of 63 nm. Since the design pixels are 10 nm in length, this means there is roughly a three-pixel mismatch between the
1. reflectance into mode 1
2. transmittance into mode 2
CSV file of final design: |
When I used another version of |
Ok so we are making progress. But as discussed, there are a few things to consider: (1) in the last epoch, when we "calibrate" the dummy parameter ( (2) this is the "fun part" of TO! Until somebody automates this part (any takers?) then each problem requires a bit of detective work to nail down the proper hyperparameters. For example, in addition to the dummy parameter, it's always good to look at the gradients of the FOMs and the constraint functions. The gradients will tell you a lot about what's going on... When trying to nail down a proper
|
The underestimated minimum length scale given by If the left and right sides of the pattern are extended as follows, the minimum length scale estimated by
In the current version of |
By enlarging the mask at the edge of the design region from one pixel to the filter radius, the measured lengthscale of the final designs (using @mawc2019's ruler) are now more consistent with the imposed lengthscale constraint. Based on this change as well as adjustment of the constraint-function hyperparameters, I have added designs with lengthscale constraints of 50 nm, 60 nm, 70 nm, 80 nm, and 90 nm. The measured lengthscale is consistently larger than the imposed constraint except for the 80 nm case for which the measured lengthscale is 75 nm (which is within one pixel dimension). The performance metrics for each design are added to the |
Would be good to compare your objective function max(R + 1-T) to the same quantity for Ian's — ideally we should be doing at least as well as Ian's structure on this metric, since he's optimizing a somewhat different objective. If the max(R + 1-T) data is not easily accessible for Ian, you can bound it above by max(R + 1-T) ≤ max(R) + max(1-T). |
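A small sketch of the bound mentioned above. The per-wavelength values are made up for illustration and are not data from this PR or from Ian's design; the function names are likewise hypothetical.

```python
# Hypothetical per-wavelength reflectance and transmittance (six wavelengths)
R = [0.010, 0.025, 0.015, 0.030, 0.012, 0.020]
T = [0.950, 0.930, 0.960, 0.940, 0.955, 0.925]


def worst_case_objective(R, T):
    # Exact metric: maximum over wavelengths of R + 1 - T
    return max(r + 1.0 - t for r, t in zip(R, T))


def upper_bound(R, T):
    # Bound from separately reported worst cases: max(R) + max(1 - T)
    return max(R) + max(1.0 - t for t in T)


exact = worst_case_objective(R, T)
bound = upper_bound(R, T)
print(f"max(R + 1 - T) = {exact:.4f}, upper bound = {bound:.4f}")
```

Since max(1-T) equals 1 minus the worst-case transmittance, the bound only needs the worst-case reflection and worst-case transmission values. It is tight only when both worst cases occur at the same wavelength.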
It should be possible to compute this quantity since it depends on the reflection and transmission that are calculated by the script checked into this repository. Another option for comparison here would be the quantities that the script in this repository reports: worst case transmission and worst case reflection. |
As discussed, those flat regions in the beginning are odd. They could be due to the subpixel smoothing... they could be due to the damping factor, or some permutation of the two? Also, those hard flat regions at the end of the optimization are somewhat discouraging... my experience has taught me to expect something much more smooth and organic (like an asymptotic convergence etc.) So as discussed, we should look at
Specifying an initial value for the epigraph variable based on the objective function and the GLC constraints seems to make the final design worse compared to setting the epigraph variable based on the objective function alone (at least for the test case involving a minimum lengthscale of 70 nm). Why does this happen? As shown in the plot of the epigraph variable and objective function (R+1-T) vs. iteration number (second figure below), this is probably because the value of the GLC constraint is two orders of magnitude larger than the objective function (~100 vs. ~1). With such a large disparity in magnitude, the optimizer is mainly working to satisfy the GLC constraint rather than minimizing the objective function.
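The scaling issue can be written out as a sketch, assuming a standard epigraph form. How the script actually couples the GLC constraints to the epigraph variable is not shown in this thread, and the symbols $\rho$, $t$, $f_k$, and $g_j$ are notation introduced here for illustration:

$$
\min_{\rho,\,t}\; t \quad \text{s.t.} \quad f_k(\rho) - t \le 0,\quad k = 1,\dots,6, \qquad f_k(\rho) = R(\lambda_k) + 1 - T(\lambda_k)
$$

The two initializations being compared would then be

$$
t_0^{\text{obj}} = \max_k f_k(\rho_0) \sim 1, \qquad t_0^{\text{obj+GLC}} = \max\Big(\max_k f_k(\rho_0),\; \max_j g_j(\rho_0)\Big) \sim 100,
$$

where $g_j$ denotes the GLC constraint values. Under this reading, the second choice starts every objective constraint with a slack of roughly 99 ($f_k - t_0 \approx 1 - 100$), so none of them is active at the start of the epoch.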
I tried modifying the setup such that damping and subpixel smoothing were applied separately rather than together but this did not seem to affect the results. Specifically, I applied damping and no subpixel smoothing for the first two epochs (β = 8, 16) in which the design is mostly greyscale and subpixel smoothing (and no damping) for the remaining epochs (β = 32, 64, 128, and 256) in which the design is mostly binarized.
reflectance/transmittance spectra of final design (performance is worse than reference design in this PR)
|
I maintain that the initial flat region is due to the incorrect adjoint gradients around the initial guess, i.e., |
Even with a random rather than a uniform initial structure, the "flat region" at the start of the first epoch is still present. |
The best thing to do at this point is to closely look at all the outputs (gradients, indicator functions, FOM values, The fact that the dummy parameter started to go down, and then flat-lines shows that something is going on. Looking at all the data is the only way to figure out what that something is. |
Perhaps my previous test on a random structure was not representative. What is your random seed? I can test the gradients related to your random structure.
As discussed, we could try "scheduling" the I still think there is something else going on here... |
We also discussed the importance of checking the gradient as we go... there are some obvious checks you can perform just by looking at it (weird nans, asymmetries, etc) but it's also good to do some actual checks (e.g. with a directional derivative) to gauge things quantitatively. |
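A minimal sketch of such a quantitative check: compare a central finite-difference directional derivative against the inner product of the gradient with the same direction. The `objective` and `gradient` functions below are stand-ins (a simple weighted quadratic and its analytic gradient); in practice they would be the simulated FOM and its adjoint gradient, which are not reproduced here.

```python
import math
import random


def objective(x):
    # Stand-in smooth objective (assumption, not the actual FOM)
    return sum((i + 1) * (xi - 0.5) ** 2 for i, xi in enumerate(x))


def gradient(x):
    # Analytic gradient of the stand-in, playing the role of the adjoint gradient
    return [2.0 * (i + 1) * (xi - 0.5) for i, xi in enumerate(x)]


def directional_derivative_error(f, g, x, direction, h=1e-5):
    """Relative mismatch between finite-difference and gradient-based
    directional derivatives along `direction`."""
    xp = [xi + h * di for xi, di in zip(x, direction)]
    xm = [xi - h * di for xi, di in zip(x, direction)]
    fd = (f(xp) - f(xm)) / (2.0 * h)  # central difference
    ad = sum(gi * di for gi, di in zip(g(x), direction))
    return abs(fd - ad) / max(abs(fd), abs(ad), 1e-30)


rng = random.Random(0)
x0 = [rng.random() for _ in range(16)]
d = [rng.uniform(-1.0, 1.0) for _ in range(16)]

# The "obvious" check first: no NaNs in the gradient
has_nan = any(math.isnan(v) for v in gradient(x0))
rel_err = directional_derivative_error(objective, gradient, x0, d)
print("NaNs in gradient:", has_nan, "| relative error:", rel_err)
```

A relative error that stays large as `h` is varied over a few orders of magnitude points to a problem in the gradient rather than in the finite-difference step.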
Another thing to do is to set |
There seems to be an inconsistency between the final design (binarized) and its gradient. While the We would expect the gradient map to be mostly nonzero along the boundaries where the design transitions discontinuously from a weight of 0 to 1.
1. final design
2. gradient for a single wavelength [needs to be rotated 90° CW to be consistent with the design in (1)]
3. objective function history (maximum over six wavelengths)
Try looking at the gradient before backpropagating the smoothing filter Maybe superimpose it on the structure (and make sure your image rotations are consistent etc) |
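One way to make the comparison quantitative, sketched in plain Python with hypothetical helper names: rotate the stored gradient 90° clockwise so it matches the design orientation, mark the pixels that sit on a 0/1 boundary, and measure how much of the gradient magnitude falls on them. With NumPy arrays the rotation would be `np.rot90(grad, k=-1)`. The tiny arrays are illustrative, not data from this PR.

```python
def rotate_cw(a):
    # Rotate a 2D list 90 degrees clockwise
    return [list(row) for row in zip(*a[::-1])]


def boundary_mask(design):
    """True where a pixel has a 4-neighbor with a different binary value."""
    rows, cols = len(design), len(design[0])
    mask = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    if design[ni][nj] != design[i][j]:
                        mask[i][j] = True
    return mask


def boundary_fraction(design, stored_grad):
    """Fraction of total |gradient| located on boundary pixels."""
    grad = rotate_cw(stored_grad)
    if len(grad) != len(design) or len(grad[0]) != len(design[0]):
        raise ValueError("rotated gradient does not match design shape")
    mask = boundary_mask(design)
    total = sum(abs(v) for row in grad for v in row)
    on_edge = sum(abs(grad[i][j])
                  for i in range(len(grad))
                  for j in range(len(grad[0])) if mask[i][j])
    return on_edge / total if total > 0 else 0.0


# Illustrative 3x4 binary design and a gradient stored in the rotated frame
design = [[0, 0, 1, 1]] * 3
stored_grad = [[0, 0, 0], [1, 1, 1], [1, 1, 1], [0, 0, 0]]
print("fraction of |gradient| on boundaries:", boundary_fraction(design, stored_grad))
```

A fraction close to 1 would match the expectation that the gradient is concentrated along the 0-to-1 transitions; a low value suggests either a rotation mismatch or a gradient that is nonzero in the interior.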
Five major changes which taken together seem to produce better results:
|
I have added an additional set of results for designs with imposed minimum feature sizes of 100 nm, 125 nm, 150 nm, 175 nm, 200 nm, and 225 nm. The results in the These results, however, are a bit inconsistent: e.g., the design with a minimum feature size of 200 nm has a measured feature size of 325 nm and significantly outperforms the design with a 175 nm feature size, whose measured feature size is 175 nm. This suggests that the results can probably be improved.
It looks like we are able to generate some useful designs for the broadband waveguide mode converter. Thanks to @mochen4 for assistance in getting this to work. Some additional tuning is likely required to improve the results further.
This required three main changes to the original setup of the multiwavelength minimax topology optimization:
1. Ensuring that the initial design of the final epoch (in which the minimum linewidth constraint is applied) is binary. We realized that trying to apply the linewidth constraint to a greyscale design will immediately ruin the performance and produce a poor final design. The change involved increasing the number of epochs to include three new β values of 64, 128, and 256.
2. Turning on subpixel smoothing in the MaterialGrid due to the binarized designs.
3. Applying damping to the MaterialGrid to penalize intermediate values.
The final design using a minimum linewidth constraint of 50 nm (shown below as an image and added to this repository as a CSV file) generated using these changes shows decent performance for the reflectance and transmittance across the six wavelengths:
1. reflectance into mode 1: wavelength (μm), reflectance, reflectance (dB)
2. transmittance into mode 2: wavelength (μm), transmittance, transmittance (dB)
Some notes regarding this data. The optimization history shown below (maximum of the objective function $R+1-T$ over the six wavelengths vs. iteration number) shows a performance degradation in the final epoch (iterations 300-400). If we inspect the designs at the end of the last two epochs (β of 128 and 256), they differ only slightly. This indicates that even small changes to the design are enough to affect its performance. Also, there are unexpected "spikes" present in the optimization history. Could these be due to the inner iterations of the MMA algorithm?
It's possible that these results can be further improved by adjusting the two hyperparameters used in the minimum linewidth constraint function as well as the β scheduling.
The next steps are to generate designs for additional values of the minimum linewidths beyond 50 nm.
cc @smartalecH