Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updating to Nicil v2.1 #115

Merged
merged 21 commits into from
May 12, 2021
Merged

Conversation

jameswurster
Copy link
Contributor

@jameswurster jameswurster commented Mar 16, 2021

This commit updates the non-ideal MHD library to Nicil v2.1 (see Appendix A of Wurster (2021).
Notes:

  1. I have remove the pre-processor CMACIONIZE & logical use_CMacIonize since they used n_electronT which has been depreciated. However, this pre-processor and logical did nothing aside from print n_electronT; however, this appears to never be calculated.

2a) This inexplicably fails the MPI pipeline test; it appears that there is a memory allocation issue upon execution, so I don't think it has anything to do with my changes. The code passes the MPI test suite when run locally on my Mac. Edit: when tested at pull request, the MPI test passed
2b) In test_nonidealmhd, we get different values for dt_courant for MPI=yes vs MPI=no. I am adding this to the open issue

  1. Nicil requires real(kind=8) to run (or even inexplicably to compile). I have added a dummy file that will be used if Phantom is being run in single precision. Why is single precision even an option anymore? I suspect that a few of the equations of state will give nonsensical answers if run in real(kind=4) (although they do compile)

@jameswurster
Copy link
Contributor Author

nonideal test suite yields inconsistent results between my mac, my repo, and this repo. This issue is necessarily challenging to track down, so a work in progress.

@jameswurster
Copy link
Contributor Author

The MPI failure appears to be due to the cluster and not the code. Note that the MPI tests pass on my mac.

The issue outlined in my initial comment 2 was a result of the race condition that existed in the calculation of divcurlB. After further investigation, the error appeared both using and not using MPI. When we calculate divcurlB after we calculate density, then all tests consistently pass, with MPI and not-MPI providing the same result. The error caused by the race condition can be up to a few percent. At the moment, we divcurlB is always calculated after density, but I have left in the original code currently turned off with a switch so that we can use the fast calculation if required. divcurlB is required for non-ideal MHD and for diagnostics, so it might be worth using the new routine only when using non-ideal MHD.

The attached graph shows the result of the wave-dampening test, where the symbols represent calculating divcurlB in densityiterate with the race condition (once); calculating divcurlB in densityiterate, but calling the subroutine twice in a row (twice); calculating divcurlB after densityiterate as in the current version pushed to the repo (mhd).

wave_damp

Copy link
Owner

@danieljprice danieljprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the update here. However, we need to fix the race condition in a more elegant way.

One other idea here is that it might be nice to include NICIL in phantom using the git submodules feature -- i.e. the subdirectory is just the NICIL git repo updated to a particular tag. This would save having to have copies of NICIL code in the phantom repo and would make it easier to stay in sync.

There's a lot going on in this pull request so would be good to separate some of the issues if possible...

@@ -1608,7 +1590,7 @@ subroutine store_results(icall,cell,getdv,getdb,realviscosity,stressmax,xyzh,&
!
! store div B, curl B and related quantities
!
if (mhd .and. iamgasi) then
if (mhd .and. iamgasi .and. mhd_racecondition) then
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh, surely we can do without a flag for a bug...

@@ -0,0 +1,228 @@
!----------------------------------------------------------------------!
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like overkill, surely would be better to just send dummy variables to/from nicil and translate back to double/single precision in phantom rather than creating a whole dummy interface?

the point is not that we want to use single precision in phantom, but that precision in phantom can be changed (e.g. to real*16 in principle also)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but it inexplicably does not compile when I exclude the dummy interface. I get many
Warning: Possible change of value in conversion from REAL(8) to REAL(4) at (1) [-Wconversion]
errors before it just stops. Annoyingly, these errors persist for many subroutines in Phantom itself and the code still compiles. If I remove a few of the offending lines, then the compilation will just stop on earlier ones. These lines have nothing to do with the nicil-phantom interface, and there are no kind-declarations within nicil itself (although Nicil's Makefile does force it to be compiled in double precision.
Any advice on how to fix this more cleanly is appreciated.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretty sure these warnings would just be masking another problem. However, happy to leave it for the moment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I've tried to track it down, but with no luck.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem solved (had to force some parameters to be real*8); commit made and the dummy file has been deleted.

build/Makefile Outdated
@@ -1781,7 +1785,7 @@ SOURCES= physcon.f90 ${CONFIG} ${SRCKERNEL} io.F90 units.f90 boundary.f90 \
mpi_dens.F90 mpi_force.F90 stack.F90 mpi_derivs.F90 kdtree.F90 linklist_kdtree.F90 ${SRCTURB} \
${SRCPHOTO} ${SRCINJECT} ${SRCKROME} memory.F90 ${SRCREADWRITE_DUMPS} \
quitdump.f90 ptmass.F90 \
readwrite_infile.F90 dens.F90 force.F90 utils_deriv.f90 deriv.F90 energies.F90 sort_particles.F90 \
readwrite_infile.F90 dens.F90 mhd_divcurlB.F90 force.F90 utils_deriv.f90 deriv.F90 energies.F90 sort_particles.F90 \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is mhd_divcurlB.f90 ??

@@ -2785,7 +2783,7 @@ subroutine finish_cell_and_store_results(icall,cell,fxyzu,xyzh,vxyzu,poten,dt,dv
endif
#endif
if (mhd_nonideal) then
call nimhd_get_dudt(dudtnonideal,etaohmi,etaambii,rhoi,curlBi,Bxyzi(1:3))
call nicil_get_dudt_nimhd(dudtnonideal,etaohmi,etaambii,rhoi,curlBi,Bxyzi)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason this belongs in the nicil library rather than being a phantom routine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just always has been a Nicil subroutine. Anything used to calculate non-ideal MHD terms in a method-independent way is included in nicil itself.

! this needs to be performed after the density is updated to prevent race
! conditions. For simplicity, we have copied dens.F90, and stripped out
! everything that is not related to the calculation of divcurlB.
! To calculate divcurlB in dens (permitting the race condition), set
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit yuck. The point is that we do density iterations, so that after a couple of iterations the density is already known inside compute_density, so the density is initially just a guess, but on the 2nd or 3rd iteration it is known to the accuracy of tolh. We should NOT need a completely separate routine to evaluate div B and curl B here...


end subroutine store_results
!--------------------------------------------------------------------------

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a LOT of repeated code in the above, I cannot merge this...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reply covers all previous comments about the bug & new routine.

I remember that you, Terry and I discussed the calculation of divcurlB when we made the switch to evolving B/rho rather than B. At the time, we agreed that we could continue to include the calculation in dens since the error introduced would be small. The comment in dens implies that the error is due to hi not being converged, but the issue is that hj might not have been updated yet, so we are using densities at inconsistent times. Since this is a race condition, the results are not reproducible.

Indeed, the error seem to be less than a percent, and only appeared when I updated nicil since it was failing the testsuite, and I figured it would be better to delve into the problem rather than simply increase the tolerance on the relevant tests.

Due to the small error, and that divcurlB is used only for diagnostics, the revised cleaning timestep and non-ideal MHD, I left the original code in with a flag so we can use the fast code if we don't need accuracy. We might even want to set the flag such that the old version is used except for only non-ideal MHD.

I would be happy to find a more elegant solution, but at the moment, I don't know what that is. If we want the accuracy, then we need to calculate divcurlB after the density update is complete, and that is what the new subroutine does. Given the MPI commands, it was simpler to use dens as a template rather than starting from scratch; I understand this is yuck, but I don't see a way around it if we want to maintain accuracy. This new structure -- update density, update divcurlB, calculate forces -- is the same as what is implemented in sphNG.

I am open to suggestions on how to clean this up.

@danieljprice
Copy link
Owner

mpi failure is fixed, but issues pending from review are blocking merge at present

@jameswurster
Copy link
Contributor Author

Re git modules:
We can do this if you like. Nicil is not often updated, so it's not a problem for me to ensure that the current version is in the phantom repo. We can make it cleaner if you would like and only include nicil.F90 and put this file in src/main.

Re 'A lot going on':
Agreed, but several issues appeared as the pipelines were failing, so it is hard to disentangle the issues. Also, it seems as though I can only have one pull request open at a time, and all new commits are automatically added to the request.

@dmentipl
Copy link
Contributor

Also, it seems as though I can only have one pull request open at a time, and all new commits are automatically added to the request.

You can create multiple branches on your fork and pull request each of them as required. Then any new commits will go only to the pull request for that branch. (In general, it is seen as good practice to always work on branches rather than master.)

@jameswurster
Copy link
Contributor Author

Commit f016c85 failed due to internal git issues, not my code.
In the recent updates, I have removed mhd_divcurlB and instead, if MHD=yes, we call densityiterate twice. I have added in logicals so if it is called twice, then only half the calculations are performed on each call. As requested, this is much cleaner since we are not duplicating text. Currently, the controlling logicals are variables, but the code can be slightly modified for them to be parameters if MHD=no. Also, the accurate calculation of divcurlB will be done for MHD=yes, but we can switch this to NONIDEALMHD=yes if you prefer.

@jameswurster
Copy link
Contributor Author

jameswurster commented Apr 21, 2021

PS: commit ea06b49 fixes the issue in test_sedov where dtclean was the controlling timestep (see issue #4).

Copy link
Owner

@danieljprice danieljprice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just one final change here and I think we're good to merge. Namely, I think fast_divcurlb should be true by default for ideal MHD, since this should have almost no effect. Happy if you want to make it false for non-ideal MHD where we use curl B directly in the induction equation.

@@ -0,0 +1,228 @@
!----------------------------------------------------------------------!
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretty sure these warnings would just be masking another problem. However, happy to leave it for the moment

@@ -149,6 +149,14 @@ subroutine derivs(icall,npart,nactive,xyzh,vxyzu,fxyzu,fext,divcurlv,divcurlB,&
if (icall==1) then
call densityiterate(1,npart,nactive,xyzh,vxyzu,divcurlv,divcurlB,Bevol,&
stressmax,fxyzu,fext,alphaind,gradh,rad,radprop,dvdx)
if (.not. fast_divcurlB) then
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is definitely better than cutting-and-pasting code from the densityiterate routine, if you really really think we need the .not.fast_divcurlB option then this seems like the least-worst solution

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll modify it so it's only true for non-ideal MHD. Despite it being a very small error, I would prefer to leave this in for non-ideal MHD, since I will (hopefully) shortly be running some non-ideal sims in Phantom with Ian.

if (use_massfrac) then
call write_inopt(massfrac_X, 'massfrac_X', 'Hydrogen mass fraction',iunit)
call write_inopt(massfrac_Y, 'massfrac_Y', 'Helium mass fraction',iunit)
endif
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity, are the mass fractions still parameters for the thermal ionisation and if so how are they set?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

massfrac have been removed, since it was typically calculated based upon abundances set within nicil. In the newer version, instead of proxy ions whose mass is calculated based upon abundances of a few species, we explicitly model the ionisiation of several chemical species (see appendix A of Wurster 2021).

#else
logical, parameter :: mhd = .false.
logical, parameter :: fast_divcurlB = .true. ! do not toggle; multiple calculations controlled by this flag, and this is faster when excluding MHD
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest we NOT make fast_divcurlb false by default in MHD. The only thing this is used for in ideal MHD is the divergence cleaning where it does not matter if we get div b wrong by a few percent. Would be better to do this only if non-ideal MHD is used, which is a much more restricted use-case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

@jameswurster
Copy link
Contributor Author

Requested changes have been made, and all tests pass.
(somehow, running the test suite on my mac did not pick up on the bug I accidentally committed on the first try)

@danieljprice danieljprice merged commit 5862ae7 into danieljprice:master May 12, 2021
@danieljprice
Copy link
Owner

thanks for the update @jameswurster!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants