Skip neighbor list for very small systems #4070

Merged (4 commits) on May 23, 2023

peastman
Member

For very small systems, building the neighbor list takes more time than it saves. It's better to skip that step and just compute all interactions. Based on #4065 (comment), I settled on the simple heuristic of skipping the neighbor list for systems of up to 3000 atoms. That's for NonbondedForce. If you have a CustomNonbondedForce I set the limit lower at 2000 atoms. Many CustomNonbondedForces are inexpensive, but since we don't know for sure how expensive a particular one will be, it seemed safer to be a little bit more conservative. If you have any other force that involves a neighbor list, like GBSAOBCForce or AmoebaMultipoleForce, it always builds the neighbor list.
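The heuristic described above can be sketched in a few lines. This is only an illustration of the decision rule, not the code from this PR: the function name `builds_neighbor_list` and the `SKIP_LIMITS` table are hypothetical, and treating "up to 3000 atoms" as an inclusive bound is an assumption.

```python
# Hypothetical sketch of the size heuristic described in this PR.
# The thresholds come from the description; the names and the inclusive
# comparison are assumptions made for illustration.
SKIP_LIMITS = {
    "NonbondedForce": 3000,        # skip the neighbor list up to this many atoms
    "CustomNonbondedForce": 2000,  # more conservative: cost of the force is unknown
}


def builds_neighbor_list(force_kind: str, num_atoms: int) -> bool:
    """Return True if a neighbor list would be built under the heuristic."""
    limit = SKIP_LIMITS.get(force_kind)
    if limit is None:
        # Any other force that uses a neighbor list (for example GBSAOBCForce
        # or AmoebaMultipoleForce) always builds it.
        return True
    return num_atoms > limit


if __name__ == "__main__":
    for kind, n in [("NonbondedForce", 2661), ("CustomNonbondedForce", 2661),
                    ("GBSAOBCForce", 500)]:
        print(kind, n, builds_neighbor_list(kind, n))
```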

@@ -69,8 +69,10 @@ class OPENMM_EXPORT_COMMON NonbondedUtilities {
* @param exclusionList for each atom, specifies the list of other atoms whose interactions should be excluded
* @param kernel the code to evaluate the interaction
* @param forceGroup the force group in which the interaction should be calculated
* @param useNeighborList specifies whether a neighbor list should be used to optimize this interaction. This is
* should be viewed as only a suggestion. Even when it is false, a neighbor list be used anyway.
Contributor

"This is should be" -> "This should be"

Contributor

Same with the other places where the typo was copied/pasted.

Contributor

"a neighbor list be used anyway." -> "a neighbor list will/may/might be used anyway."

Contributor

More concise for the first sentence: "This should only be viewed as a suggestion."

@peastman
Member Author

Thanks. I fixed the typos.

@philipturner
Contributor

Does the skipping also remove the PME evaluations and other things caused by use of a cutoff? That would reduce the kernel launch and command encoding latency.

@peastman
Member Author

It has no effect on the reciprocal space calculation. That's completely separate.

@philipturner
Contributor

That's part of my conclusion about removing the nearest neighbor list. Water with PME and a cutoff could never exceed 1,000 ns/day. Water with brute force nonbonded and no PME/RF/LJPME is now bottlenecked at 4,000 ns/day. Only going half the way blunts the performance gain - by how much, I don't know.

@bdenhollander
Contributor

RX 6600 on Windows. Not launching findBlocksWithInteractions means no waiting for enqueueReadBuffer(), so the atom count below which this optimization helps is 5000-6000.

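| Width | Atoms | Cutoff | Time (Neighbor List) | Time (All Interactions) | Ratio (All / Neighbor List) |
|-------|-------|--------|----------------------|-------------------------|-----------------------------|
| 3 | 2661 | 0.6 | 2.17 | 0.86 | 0.39 |
| 3 | 2661 | 1 | 2.22 | 0.87 | 0.39 |
| 3 | 2661 | 1.5 | 2.27 | 0.86 | 0.38 |
| 3.5 | 4158 | 0.6 | 2.32 | 1.36 | 0.59 |
| 3.5 | 4158 | 1 | 2.46 | 1.33 | 0.54 |
| 3.5 | 4158 | 1.5 | 2.75 | 1.33 | 0.48 |
| 3.75 | 5085 | 0.6 | 2.43 | 1.79 | 0.74 |
| 3.75 | 5085 | 1 | 2.60 | 1.70 | 0.65 |
| 3.75 | 5085 | 1.5 | 2.96 | 1.70 | 0.57 |
| 4 | 6282 | 0.6 | 2.43 | 2.44 | 1.00 |
| 4 | 6282 | 1 | 2.74 | 2.37 | 0.86 |
| 4 | 6282 | 1.5 | 3.28 | 2.37 | 0.72 |
| 5 | 12255 | 0.6 | 3.48 | 8.06 | 2.32 |
| 5 | 12255 | 1 | 3.66 | 7.37 | 2.02 |
| 5 | 12255 | 1.5 | 5.79 | 7.38 | 1.28 |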
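One way to read a crossover out of the RX 6600 on Windows timings above is to find the largest system size at which computing all interactions is faster for every tested cutoff. The sketch below does that with the numbers copied from the table; the function name and the "faster for every cutoff" criterion are assumptions chosen for illustration, not something the benchmark itself defines.

```python
from collections import defaultdict

# (atoms, cutoff, time with neighbor list, time with all interactions),
# copied from the RX 6600 on Windows table above.
ROWS = [
    (2661, 0.6, 2.17, 0.86), (2661, 1.0, 2.22, 0.87), (2661, 1.5, 2.27, 0.86),
    (4158, 0.6, 2.32, 1.36), (4158, 1.0, 2.46, 1.33), (4158, 1.5, 2.75, 1.33),
    (5085, 0.6, 2.43, 1.79), (5085, 1.0, 2.60, 1.70), (5085, 1.5, 2.96, 1.70),
    (6282, 0.6, 2.43, 2.44), (6282, 1.0, 2.74, 2.37), (6282, 1.5, 3.28, 2.37),
    (12255, 0.6, 3.48, 8.06), (12255, 1.0, 3.66, 7.37), (12255, 1.5, 5.79, 7.38),
]


def largest_size_where_skipping_always_wins(rows):
    """Largest atom count where all-interactions beats the neighbor list
    at every cutoff (ratio = all / neighbor list < 1). None if no size does."""
    ratios = defaultdict(list)
    for atoms, _cutoff, t_neighbor, t_all in rows:
        ratios[atoms].append(t_all / t_neighbor)
    winners = [atoms for atoms, rs in ratios.items() if all(r < 1.0 for r in rs)]
    return max(winners) if winners else None


if __name__ == "__main__":
    print(largest_size_where_skipping_always_wins(ROWS))
```

With these numbers the function returns 5085: at 6282 atoms the 0.6 cutoff case is already marginally slower without the neighbor list (2.44 vs 2.43), which matches the 5000-6000 range quoted in the comment.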

@jchodera
Member

@peastman : Is there a way to select this dynamically, or allow the choice of algorithm to be set via a Platform parameter?

@philipturner
Contributor

philipturner commented May 15, 2023

It depends on the GPU. Larger ones will have more compute power, so the synchronization latency bottleneck appears at larger system sizes.

@peastman
Member Author

The current implementation does pick it dynamically based on the size of the system. That's by far the strongest influence on which one is faster. There's a range of intermediate sizes from about 3000-6000 particles where other factors can swing it one way or the other, but I'm not sure it's worth trying to do anything more complicated to speed up that narrow range of sizes, especially given the risk of a large slowdown if we choose wrong. See #4065 (comment).

@bdenhollander
Contributor

RX 6600 on Ubuntu 20.04. The atom count below which this optimization helps is more in line with other GPUs, at around 3000 atoms.

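| Width | Atoms | Cutoff | Time (Neighbor List) | Time (All Interactions) | Ratio (All / Neighbor List) |
|-------|-------|--------|----------------------|-------------------------|-----------------------------|
| 3 | 2661 | 0.6 | 1.39 | 1.34 | 0.96 |
| 3 | 2661 | 1 | 1.45 | 1.30 | 0.89 |
| 3 | 2661 | 1.5 | 1.48 | 1.33 | 0.90 |
| 3.5 | 4158 | 0.6 | 1.56 | 1.93 | 1.24 |
| 3.5 | 4158 | 1 | 1.62 | 1.78 | 1.10 |
| 3.5 | 4158 | 1.5 | 1.94 | 1.76 | 0.91 |
| 3.75 | 5085 | 0.6 | 1.69 | 2.39 | 1.42 |
| 3.75 | 5085 | 1 | 1.82 | 2.20 | 1.21 |
| 3.75 | 5085 | 1.5 | 2.30 | 2.25 | 0.98 |
| 4 | 6282 | 0.6 | 1.73 | 2.98 | 1.72 |
| 4 | 6282 | 1 | 1.98 | 2.84 | 1.43 |
| 4 | 6282 | 1.5 | 2.88 | 2.93 | 1.02 |
| 5 | 12255 | 0.6 | 2.85 | 8.62 | 3.03 |
| 5 | 12255 | 1 | 3.18 | 7.81 | 2.45 |
| 5 | 12255 | 1.5 | 4.69 | 7.82 | 1.66 |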

@peastman
Member Author

> Water with PME and a cutoff could never exceed 1,000 ns/day.

For a sufficiently small system, simple Ewald should be faster than PME. It only involves two kernels for the reciprocal space calculation.

@philipturner
Contributor

The problem was a synchronization latency bottleneck, limiting to 1,000 ns/day. After removing that, the next $O(1)$ bottleneck is command encoding latency. Without PME, the ceiling is 4,000 ns/day. With PME, it is probably 3,000 ns/day. I need a real-world test to find the ceiling when you:

  • remove the cutoff/neighbor list, but
  • don't remove the (now unnecessary) methods for computing forces beyond the (now removed) cutoff
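To make the relationship between a fixed per-step latency and an ns/day ceiling concrete, here is a small conversion sketch. The function names are hypothetical, and the 2 fs timestep in the example is an assumption: the comment above does not state which timestep the quoted ceilings were measured with.

```python
SECONDS_PER_DAY = 86_400.0
NS_PER_FS = 1e-6


def ns_per_day(step_time_s: float, timestep_fs: float) -> float:
    """Simulated nanoseconds per day given wall-clock seconds per step."""
    steps_per_day = SECONDS_PER_DAY / step_time_s
    return steps_per_day * timestep_fs * NS_PER_FS


def max_step_time_s(target_ns_per_day: float, timestep_fs: float) -> float:
    """Largest wall-clock time per step that still reaches the target rate."""
    return SECONDS_PER_DAY * timestep_fs * NS_PER_FS / target_ns_per_day


if __name__ == "__main__":
    timestep_fs = 2.0  # assumed for illustration only
    for ceiling in (1000.0, 3000.0, 4000.0):
        budget_us = max_step_time_s(ceiling, timestep_fs) * 1e6
        print(f"{ceiling:.0f} ns/day -> {budget_us:.1f} us per step")
```

Because the per-step budget shrinks in inverse proportion to the target rate, any fixed cost per step (a synchronization wait, command encoding) sets a hard ceiling regardless of system size.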

@peastman
Member Author

I've lost track of what you're trying to do. If you don't want long range interactions, just specify CutoffPeriodic instead of PME. What do you mean about the cutoff being removed? In a periodic system, there are infinitely many interactions. You either completely ignore all the ones beyond the cutoff, or you use Ewald summation or something similar to divide it into short and long range parts and compute them both.

@philipturner
Contributor

What I work with isn’t periodic systems. For example, if you have a nanomechanical gear, it’s one solid piece. No repeating crystal structure, no solvent molecules. Simulating an infinite lattice of mechanical gears is unnecessary to simulate one gear spinning and may even introduce artifacts.

For me, cutoffs and neighbor lists are roughly synonymous. PME and RF are algorithms for implementing cutoffs, which have this troublesome aspect called periodicity. I need either:

  • No cutoff, no periodic
  • Cutoff to reduce compute cost, no periodic

@peastman
Member Author

Cutoffs, neighbor lists, and periodic boundary conditions are three different things. PME is a method for calculating the infinite interactions in a periodic system. If you want a periodic system but not to include any interactions beyond the cutoff, specify CutoffPeriodic. If you want a non-periodic system but still to exclude interactions beyond the cutoff, specify CutoffNonPeriodic. If you want a non-periodic system and to include all interactions regardless of distance, specify NoCutoff.

Neighbor lists are a structure used internally to accelerate the evaluation of interactions with a cutoff. They don't affect the results, just how long it takes to compute them.
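The mapping in the comment above can be written as a small lookup. This is a sketch only: `choose_nonbonded_method` is a hypothetical helper that returns the option names mentioned in the comment as strings, and it does not call OpenMM itself. The neighbor list does not appear as an input because, as noted, it is an internal acceleration structure that does not change the results.

```python
def choose_nonbonded_method(periodic: bool, truncate_at_cutoff: bool) -> str:
    """Map the two independent choices onto the option names from the comment.

    periodic           -- whether the system uses periodic boundary conditions
    truncate_at_cutoff -- whether interactions beyond the cutoff are ignored
    """
    if periodic:
        # Periodic systems have infinitely many interactions: either drop
        # everything beyond the cutoff, or use an Ewald-type method such as
        # PME to include the long-range part as well.
        return "CutoffPeriodic" if truncate_at_cutoff else "PME"
    # Non-periodic systems: either truncate at the cutoff or compute
    # every pair regardless of distance.
    return "CutoffNonPeriodic" if truncate_at_cutoff else "NoCutoff"


if __name__ == "__main__":
    # A single isolated object such as a nanomechanical gear, all pairs included:
    print(choose_nonbonded_method(periodic=False, truncate_at_cutoff=False))
    # The same object with a cutoff to reduce compute cost:
    print(choose_nonbonded_method(periodic=False, truncate_at_cutoff=True))
```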

peastman merged commit 655518c into openmm:master May 23, 2023
peastman deleted the neighbors branch May 23, 2023 00:05
bdenhollander mentioned this pull request Sep 2, 2023
bdenhollander added a commit to bdenhollander/openmm-hip that referenced this pull request Sep 4, 2023
- Port optimization from openmm/openmm#4070 to HIP for compatibility with upcoming OpenMM 8.1 release
- It may be possible to revert some of the changes in amd@08c967d, which was optimizing for small systems as well
ex-rzr pushed a commit to ex-rzr/openmm that referenced this pull request Aug 24, 2024
Skip neighbor list for very small systems

    openmm#4070

Store bounding box sizes in half precision

    openmm@2ae50f9

Use large blocks to optimize building the neighbor list

    openmm@3955033

Improved sorting of blocks when building neighbor list

    openmm@796ffaa

Fixed bug in large blocks optimization with triclinic boxes

    openmm@4c10732

Optimize sorting of non-uniformly distributed data

    openmm@71d9bb1
ex-rzr added a commit to ex-rzr/openmm that referenced this pull request Aug 25, 2024
Co-authored-by: bdenhollander <44237618+bdenhollander@users.noreply.github.com>
ex-rzr added a commit to ex-rzr/openmm that referenced this pull request Aug 27, 2024
Co-authored-by: bdenhollander <44237618+bdenhollander@users.noreply.github.com>
ex-rzr added a commit to ex-rzr/openmm that referenced this pull request Sep 1, 2024
Co-authored-by: bdenhollander <44237618+bdenhollander@users.noreply.github.com>
ex-rzr added a commit to ex-rzr/openmm that referenced this pull request Sep 5, 2024
Co-authored-by: bdenhollander <44237618+bdenhollander@users.noreply.github.com>