Skip neighbor list for very small systems #4070
@@ -69,8 +69,10 @@ class OPENMM_EXPORT_COMMON NonbondedUtilities {
* @param exclusionList for each atom, specifies the list of other atoms whose interactions should be excluded
* @param kernel the code to evaluate the interaction
* @param forceGroup the force group in which the interaction should be calculated
* @param useNeighborList specifies whether a neighbor list should be used to optimize this interaction. This is
* should be viewed as only a suggestion. Even when it is false, a neighbor list be used anyway.
"This is should be" -> "This should be"
Same with the other places where the typo was copied/pasted.
"a neighbor list be used anyway." -> "a neighbor list will/may/might be used anyway."
More concise for the first sentence: "This should only be viewed as a suggestion."
Thanks. I fixed the typos. |
Does the skipping also remove the PME evaluations and other things caused by use of a cutoff? That would reduce the kernel launch and command encoding latency. |
It has no effect on the reciprocal space calculation. That's completely separate. |
That's part of my conclusion about removing the nearest neighbor list. Water with PME and a cutoff could never exceed 1,000 ns/day. Water with brute force nonbonded and no PME/RF/LJPME is now bottlenecked at 4,000 ns/day. Only going half the way blunts the performance gain - by how much, I don't know. |
RX 6600 on Windows. Not launching
|
@peastman : Is there a way to select this dynamically, or allow the choice of algorithm to be set via a Platform parameter? |
It depends on the GPU. Larger ones will have more compute power, so the synchronization latency bottleneck appears at larger system sizes. |
The current implementation does pick it dynamically based on the size of the system. That's by far the strongest influence on which one is faster. There's a range of intermediate sizes from about 3000-6000 particles where other factors can swing it one way or the other, but I'm not sure it's worth trying to do anything more complicated to speed up that narrow range of sizes, especially given the risk of a large slowdown if we choose wrong. See #4065 (comment). |
RX 6600 on Ubuntu 20.04. Cutoff for this optimization is more in line with other GPUs at around 3000 atoms.
|
For a sufficiently small system, simple Ewald should be faster than PME. It only involves two kernels for the reciprocal space calculation. |
The problem was a synchronization latency bottleneck, limiting to 1,000 ns/day. After removing that, the next
|
I've lost track of what you're trying to do. If you don't want long range interactions, just specify |
What I work with isn’t periodic systems. For example, if you have a nanomechanical gear, it’s one solid piece. No repeating crystal structure, no solvent molecules. Simulating an infinite lattice of mechanical gears is unnecessary to simulate one gear spinning and may even introduce artifacts. For me, cutoffs and neighbor lists are roughly synonymous. PME and RF are algorithms for implementing cutoffs, which have this troublesome aspect called periodicity. I need either:
|
Cutoffs, neighbor lists, and periodic boundary conditions are three different things. PME is a method for calculating the infinite interactions in a periodic system. If you want a periodic system but not to include any interactions beyond the cutoff, specify
Neighbor lists are a structure used internally to accelerate the evaluation of interactions with a cutoff. They don't affect the results, just how long it takes to compute them. |
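To illustrate the point that a neighbor list changes only the cost and not the result, here is a minimal, non-periodic sketch. It is not OpenMM code: `Vec3`, `pairEnergy`, `energyAllPairs`, `buildNeighborList`, `energyFromNeighborList`, and the `padding` parameter are all hypothetical names invented for this example, and the toy list builder is itself O(N²), whereas a real implementation would use spatial binning.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical types and functions for illustration only (not the OpenMM API).
struct Vec3 { double x, y, z; };

double dist2(const Vec3& a, const Vec3& b) {
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Toy pair interaction: 1/r. Any pairwise function would do.
double pairEnergy(double r2) { return 1.0 / std::sqrt(r2); }

// Brute force: test every pair against the cutoff.
double energyAllPairs(const std::vector<Vec3>& pos, double cutoff) {
    double energy = 0.0;
    for (std::size_t i = 0; i < pos.size(); ++i)
        for (std::size_t j = i + 1; j < pos.size(); ++j) {
            double r2 = dist2(pos[i], pos[j]);
            if (r2 < cutoff * cutoff)
                energy += pairEnergy(r2);
        }
    return energy;
}

// Record candidate pairs within cutoff + padding. The padding lets the
// list be reused for several steps while particles move slightly.
std::vector<std::pair<std::size_t, std::size_t>> buildNeighborList(
        const std::vector<Vec3>& pos, double cutoff, double padding) {
    std::vector<std::pair<std::size_t, std::size_t>> pairs;
    double limit2 = (cutoff + padding) * (cutoff + padding);
    for (std::size_t i = 0; i < pos.size(); ++i)
        for (std::size_t j = i + 1; j < pos.size(); ++j)
            if (dist2(pos[i], pos[j]) < limit2)
                pairs.emplace_back(i, j);
    return pairs;
}

// Evaluate only the listed pairs, still applying the true cutoff.
double energyFromNeighborList(
        const std::vector<Vec3>& pos,
        const std::vector<std::pair<std::size_t, std::size_t>>& pairs,
        double cutoff) {
    double energy = 0.0;
    for (const auto& p : pairs) {
        double r2 = dist2(pos[p.first], pos[p.second]);
        if (r2 < cutoff * cutoff)
            energy += pairEnergy(r2);
    }
    return energy;
}
```

Both functions apply the same cutoff to the same pairs, so they return the same energy; the list only reduces how many pairs have to be examined on each evaluation.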
- Port optimization from openmm/openmm#4070 to HIP for compatibility with upcoming OpenMM 8.1 release - It may be possible to revert some of the changes in amd@08c967d, which was optimizing for small systems as well
Skip neighbor list for very small systems openmm#4070 Store bounding box sizes in half precision openmm@2ae50f9 Use large blocks to optimize building the neighbor list openmm@3955033 Improved sorting of blocks when building neighbor list openmm@796ffaa Fixed bug in large blocks optimization with triclinic boxes openmm@4c10732 Optimize sorting of non-uniformly distributed data openmm@71d9bb1
Skip neighbor list for very small systems openmm#4070 Store bounding box sizes in half precision openmm@2ae50f9 Use large blocks to optimize building the neighbor list openmm@3955033 Improved sorting of blocks when building neighbor list openmm@796ffaa Fixed bug in large blocks optimization with triclinic boxes openmm@4c10732 Optimize sorting of non-uniformly distributed data openmm@71d9bb1 Co-authored-by: bdenhollander <44237618+bdenhollander@users.noreply.github.com>
For very small systems, building the neighbor list takes more time than it saves. It's better to skip that step and just compute all interactions. Based on #4065 (comment), I settled on the simple heuristic of skipping the neighbor list for systems of up to 3000 atoms. That's for NonbondedForce. If you have a CustomNonbondedForce I set the limit lower at 2000 atoms. Many CustomNonbondedForces are inexpensive, but since we don't know for sure how expensive a particular one will be, it seemed safer to be a little bit more conservative. If you have any other force that involves a neighbor list, like GBSAOBCForce or AmoebaMultipoleForce, it always builds the neighbor list.
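The heuristic described above can be sketched roughly as follows. This is only an illustration of the decision rule, assuming the thresholds quoted in the description (3000 and 2000 atoms); the names `ForceKind`, `kMaxAtomsNonbonded`, `kMaxAtomsCustomNonbonded`, and `shouldUseNeighborList` are hypothetical and do not come from the OpenMM source.

```cpp
#include <cstddef>

// Hypothetical names for illustration; not taken from the OpenMM code base.
enum class ForceKind { Nonbonded, CustomNonbonded, Other };

// Thresholds as stated in the PR description.
constexpr std::size_t kMaxAtomsNonbonded = 3000;
constexpr std::size_t kMaxAtomsCustomNonbonded = 2000;

// Returns true when a neighbor list should be built for a force that uses
// a cutoff, and false when computing all interactions directly is expected
// to be cheaper.
bool shouldUseNeighborList(ForceKind kind, std::size_t numAtoms) {
    switch (kind) {
        case ForceKind::Nonbonded:
            return numAtoms > kMaxAtomsNonbonded;
        case ForceKind::CustomNonbonded:
            // More conservative, since the cost of a custom expression is unknown.
            return numAtoms > kMaxAtomsCustomNonbonded;
        case ForceKind::Other:
            // Forces such as GBSAOBCForce or AmoebaMultipoleForce always build the list.
            return true;
    }
    return true;
}
```

For example, `shouldUseNeighborList(ForceKind::Nonbonded, 2500)` is false under these assumptions, while the same system size with `ForceKind::CustomNonbonded` still builds the list.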