Add support for Pymatgen StructureGraph with different local_env strategies #18
Conversation
Remove Zero Padding from model
This commit provides support for using `from pymatgen.analysis.graphs import StructureGraph` when generating the crystal graphs.
As the codebase already depends on pymatgen, and pymatgen is unlikely to go anywhere, I think these are helpful improvements that increase the functionality of the code and make great use of #16. I am not sure whether @txie-93 actively maintains this codebase, so for the PR to be merged I think the pre-trained models would need to be re-trained -- it would also be interesting to see what effect zero-padding had on model performance. Is this feasible?
@CompRhys Thanks for the comment! I would be more than happy to do this, but unfortunately I'm still trying to debug one remaining issue with my implementation. Everything, to the best of my knowledge, is working correctly with the new implementation. That being said, I don't think the PR will be accepted automatically -- it says only those with write access can merge the PR, which presumably means only if @txie-93 were to approve it. Context: with the pymatgen-based neighbor search, training is running much more slowly than it should. Perhaps I just need to tweak some of the settings. Hopefully that's all it is.
@arosen93 I would guess it's an issue with caching in the dataloaders? Have you run it through a profiler? The need to re-train the models comes from the fact that, if I were @txie-93, I wouldn't merge extensions to the code unless the PR kept the same functionality and the default model remained as described in his original paper (not that zero padding is mentioned in the paper). This is why my PR was a request for comment [RFC]: I wanted to know whether in principle he would merge the changes before re-training the models.
@CompRhys That might be it. Thanks for the suggestion. I haven't run it through a profiler (admittedly, I have not used a profiler before) but can investigate further on that. Noted about re-training the models. Yes, it'd definitely be good to make sure things are kept the same there.
@arosen93 I would expect using pymatgen to get the neighbours to just be straight-up slower than having a fixed number of neighbours, so there may not be any issue with your code -- it may just be the penalty you have to pay. If caching is working correctly you will only pay this cost once, on the first epoch, as the data sets here are typically small enough to hold everything in RAM, I would think. (Removing zero-padding probably also slows the code down vs. dense multiplications, but I never tested this.)
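For context, a minimal sketch of the per-item caching being referred to, assuming a `CIFData`-style `Dataset` whose `__getitem__` builds the graph with pymatgen (the class below is illustrative, not the repository's actual code):

```python
import functools

from pymatgen.core import Structure
from torch.utils.data import Dataset


class CachedCIFData(Dataset):
    """Illustrative stand-in for CIFData: the pymatgen neighbour search runs once per item."""

    def __init__(self, cif_paths, radius=8.0):
        self.cif_paths = cif_paths
        self.radius = radius

    def __len__(self):
        return len(self.cif_paths)

    @functools.lru_cache(maxsize=None)  # cache results so only the first epoch pays the pymatgen cost
    def __getitem__(self, idx):
        crystal = Structure.from_file(self.cif_paths[idx])
        # The expensive step: neighbour search via pymatgen.
        all_nbrs = crystal.get_all_neighbors(self.radius)
        # A real implementation would tensorise the crystal graph here before returning it.
        return all_nbrs
```

With a cache like this, the pymatgen cost is only paid on the first epoch -- provided the cached graphs actually stay in memory for the rest of the run.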
@CompRhys Yes, the pymatgen nearest neighbor searching is definitely slower, so there's an up-front penalty for that. In my tests, though, the start of every new epoch is extremely slow -- it's not just epoch 0, sadly. I'm running the profiler right now to see if that leads to any ideas. I also did some quick tests with the removal of the zero-padding you implemented.
Some values were starting at index 0 and others at index 1
@CompRhys -- After running it through the profiler, I believe you are right. The issue is in the dataloader, specifically in the pymatgen neighbor-finding calls made when each crystal graph is generated.
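For anyone wanting to reproduce this kind of check, a hedged sketch of timing the data pipeline with the standard-library profiler (the dummy dataset below is a placeholder for the real `CIFData` object):

```python
import cProfile
import pstats

import torch
from torch.utils.data import DataLoader, Dataset


class DummyDataset(Dataset):
    """Stand-in for CIFData; swap in the real dataset to profile graph generation."""

    def __len__(self):
        return 256

    def __getitem__(self, idx):
        return torch.zeros(4)  # the real __getitem__ would build the crystal graph here


loader = DataLoader(DummyDataset(), batch_size=64, num_workers=0)  # single process so the profiler sees all the work

profiler = cProfile.Profile()
profiler.enable()
for _ in loader:  # one pass over the data; the cost of interest is in __getitem__, not the model
    pass
profiler.disable()

pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)  # look for pymatgen neighbour-search calls near the top
```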
If I understand correctly, changing the neighbor-finding method from the default fixed-radius search to one of the pymatgen strategies is what's responsible for the slowdown?
Hey @pgg1610! Yeah, all the Pymatgen nearest neighbor methods are slow compared to the simple fixed-radius neighbor call used by default, so there's an up-front cost there. Technically, the dataloader does load the data before calling the neural network, but the data stored in memory doesn't seem to persist between epochs. An example timing line from one of the runs: Epoch: [1][1/45] Time 31.617 (31.617)
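For a rough sense of the gap being described, a minimal sketch comparing the default fixed-radius search with a `local_env` strategy on a single (arbitrary, toy) structure:

```python
import time

from pymatgen.analysis.local_env import CrystalNN
from pymatgen.core import Lattice, Structure

# Toy rocksalt structure, purely for illustration.
structure = Structure.from_spacegroup(
    "Fm-3m", Lattice.cubic(4.2), ["Na", "Cl"], [[0, 0, 0], [0.5, 0.5, 0.5]]
)

t0 = time.perf_counter()
_ = structure.get_all_neighbors(8.0)  # simple fixed-radius search (the cgcnn default)
t1 = time.perf_counter()

cnn = CrystalNN()
_ = [cnn.get_nn_info(structure, i) for i in range(len(structure))]  # local_env strategy
t2 = time.perf_counter()

print(f"get_all_neighbors: {t1 - t0:.4f} s, CrystalNN: {t2 - t1:.4f} s")
```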
@CompRhys -- I'm starting to think that's along the lines of what the issue is. As you can see from the log file I posted before, even when using the simpler default neighbor search the epochs still start slowly. The corresponding profiler results are also attached and indicate that calls to the dataloader take up a lot of the time -- more so than the forward/backward calls to the neural network in the case of the pymatgen-based methods. I know this isn't your problem to fix, so I appreciate the suggestions you've provided. Hopefully I am close to finding the fix.
I want to thank you again for taking the time to help me troubleshoot this. I ended up figuring it out. As @CompRhys correctly surmised from the beginning, the issue I ran into was entirely due to a caching problem, with the crystal graphs being flushed from memory. It turns out I did indeed have a memory overload, particularly when the job was split over several workers where each worker has only a few GB of memory. This caused all the feature data to get dumped from memory and regenerated at the start of each new epoch, when all the training data was needed again. The issue has nothing to do with my implementation of the new `nn_method` option itself.

Should anyone run into this problem in the future, there are a few ways to tackle it. The first is to just request more memory if possible. Another is to use a full node but fewer CPU workers, so each worker has more memory; of course, this slows down the initial file I/O fairly substantially. Another option that I am exploring is saving all the generated crystal graph data to disk (e.g. as .pkl files) so it can be read back rather than regenerated.
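A rough sketch of that on-disk caching idea (the flags added in the commits below are `save_torch`/`clean_torch`; the helper here is only an illustration, not the PR's actual code):

```python
import os
import pickle


def load_or_build_graph(cif_id, cache_dir, build_fn):
    """Return the crystal graph for `cif_id`, reading a cached .pkl if one exists.

    `build_fn` stands in for whatever expensive routine builds the graph (e.g. the
    pymatgen neighbour search in CIFData); it is only called on a cache miss.
    """
    os.makedirs(cache_dir, exist_ok=True)
    cache_path = os.path.join(cache_dir, f"{cif_id}.pkl")
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)  # cheap: read the pre-built graph from disk
    graph = build_fn(cif_id)        # expensive: build the graph once
    with open(cache_path, "wb") as fh:
        pickle.dump(graph, fh)
    return graph
```

This way, even if the in-memory cache is flushed (or the job is restarted), the graphs are regenerated from disk rather than recomputed with pymatgen.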
New flag `save_torch` in `CIFData` allows crystal graphs to be saved as .pkl files, so if the memory is flushed they can be read directly from the .pkl files rather than regenerated. The `clean_torch` flag, if set to True, will remove all existing .pkl files before the run begins.
Previously, predict.py computed the MAE on the entire data folder. Now, predict.py computes the MAE on the train, val, and test sets separately and outputs 3 CSV files
Return train and val only if specified by user
Use --train-val-test in predict.py to get train/val/test stats
Notify user if .json files are being read
I'm closing this PR because I ended up making a bunch of commits unrelated to the original PR (apologies...). I can make a new branch of just the relevant commits if ever needed in the future.
Currently, the approach to generating the crystal graph is based on a simple constant cutoff radius, which is not necessarily ideal. It would be great if cgcnn were able to use Pymatgen's `pymatgen.analysis.graphs.StructureGraph` class to generate a crystal graph via one of its several `pymatgen.analysis.local_env` strategies.

This PR provides support for a new argument in `CIFData` within `data.py` to allow for the use of Pymatgen's `StructureGraph`. One can now provide the `nn_method` argument to `CIFData`, which must be the name (as a string) of a valid `pymatgen.analysis.local_env.NearNeighbors` object -- for example, `nn_method='CrystalNN'`. The new `nn_method` argument respects the user-specified `max_num_nbr` and `radius`. In other words, the number of neighbors will never exceed `max_num_nbr`, and no neighbor will have a distance greater than `radius` from the central atom.

This PR is built on PR #16 of @CompRhys to allow for the removal of zero padding, which is particularly important in this case because each environment will not have the same number of neighbors as every other environment.