Skip neighbor list for very small systems #4070

peastman · 2023-05-14T02:01:19Z

For very small systems, building the neighbor list takes more time than it saves. It's better to skip that step and just compute all interactions. Based on #4065 (comment), I settled on the simple heuristic of skipping the neighbor list for systems of up to 3000 atoms. That's for NonbondedForce. If you have a CustomNonbondedForce I set the limit lower at 2000 atoms. Many CustomNonbondedForces are inexpensive, but since we don't know for sure how expensive a particular one will be, it seemed safer to be a little bit more conservative. If you have any other force that involves a neighbor list, like GBSAOBCForce or AmoebaMultipoleForce, it always builds the neighbor list.
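The heuristic described above can be sketched roughly as follows. This is only an illustration in Python, not the code from this PR (which lives in the C++ platform code). The function name `should_use_neighbor_list` and the string labels are hypothetical, "up to" is read as inclusive, and combining several forces by letting any one of them request the list is an assumption.

```python
# Illustrative sketch only: these names are hypothetical and are not part of
# OpenMM's API.

# Atom-count limits quoted in the PR description ("up to" read as inclusive).
SKIP_LIMITS = {
    "NonbondedForce": 3000,
    "CustomNonbondedForce": 2000,
}


def should_use_neighbor_list(force_kinds, num_atoms):
    """Return True if a neighbor list should be built for this system.

    force_kinds: names of the forces that would normally use a neighbor list.
    Assumption: the list is built as soon as any single force asks for it.
    """
    for kind in force_kinds:
        limit = SKIP_LIMITS.get(kind)
        if limit is None:
            # Any other force (e.g. GBSAOBCForce) always builds the list.
            return True
        if num_atoms > limit:
            return True
    return False


print(should_use_neighbor_list(["NonbondedForce"], 2500))        # False
print(should_use_neighbor_list(["CustomNonbondedForce"], 2500))  # True
```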

philipturner · 2023-05-14T06:26:26Z

platforms/common/include/openmm/common/NonbondedUtilities.h

@@ -69,8 +69,10 @@ class OPENMM_EXPORT_COMMON NonbondedUtilities {
     * @param exclusionList  for each atom, specifies the list of other atoms whose interactions should be excluded
     * @param kernel         the code to evaluate the interaction
     * @param forceGroup     the force group in which the interaction should be calculated
+     * @param useNeighborList  specifies whether a neighbor list should be used to optimize this interaction.  This is
+     *                         should be viewed as only a suggestion.  Even when it is false, a neighbor list be used anyway.


"This is should be" -> "This should be"

Same with the other places where the typo was copied/pasted.

"a neighbor list be used anyway." -> "a neighbor list will/may/might be used anyway."

More concise for the first sentence: "This should only be viewed as a suggestion."

peastman · 2023-05-14T15:18:14Z

Thanks. I fixed the typos.

philipturner · 2023-05-14T22:44:43Z

Does the skipping also remove the PME evaluations and other things caused by use of a cutoff? That would reduce the kernel launch and command encoding latency.

peastman · 2023-05-14T23:29:31Z

It has no effect on the reciprocal space calculation. That's completely separate.

philipturner · 2023-05-15T00:17:44Z

That's part of my conclusion about removing the nearest neighbor list. Water with PME and a cutoff could never exceed 1,000 ns/day. Water with brute force nonbonded and no PME/RF/LJPME is now bottlenecked at 4,000 ns/day. Only going half the way blunts the performance gain - by how much, I don't know.

bdenhollander · 2023-05-15T22:06:25Z

RX 6600 on Windows. Not launching findBlocksWithInteractions means no waiting for enqueueReadBuffer(), so the system size below which this optimization helps is 5000-6000 atoms.

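| Width | Atoms | Cutoff | Time (Neighbor List) | Time (All Interactions) | Ratio |
|---|---|---|---|---|---|
| 3 | 2661 | 0.6 | 2.17 | 0.86 | 0.39 |
| 3 | 2661 | 1 | 2.22 | 0.87 | 0.39 |
| 3 | 2661 | 1.5 | 2.27 | 0.86 | 0.38 |
| 3.5 | 4158 | 0.6 | 2.32 | 1.36 | 0.59 |
| 3.5 | 4158 | 1 | 2.46 | 1.33 | 0.54 |
| 3.5 | 4158 | 1.5 | 2.75 | 1.33 | 0.48 |
| 3.75 | 5085 | 0.6 | 2.43 | 1.79 | 0.74 |
| 3.75 | 5085 | 1 | 2.60 | 1.70 | 0.65 |
| 3.75 | 5085 | 1.5 | 2.96 | 1.70 | 0.57 |
| 4 | 6282 | 0.6 | 2.43 | 2.44 | 1.00 |
| 4 | 6282 | 1 | 2.74 | 2.37 | 0.86 |
| 4 | 6282 | 1.5 | 3.28 | 2.37 | 0.72 |
| 5 | 12255 | 0.6 | 3.48 | 8.06 | 2.32 |
| 5 | 12255 | 1 | 3.66 | 7.37 | 2.02 |
| 5 | 12255 | 1.5 | 5.79 | 7.38 | 1.28 |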

jchodera · 2023-05-15T22:14:26Z

@peastman : Is there a way to select this dynamically, or allow the choice of algorithm to be set via a Platform parameter?

philipturner · 2023-05-15T22:31:02Z

It depends on the GPU. Larger ones will have more compute power, so the synchronization latency bottleneck appears at larger system sizes.

peastman · 2023-05-15T22:38:15Z

The current implementation does pick it dynamically based on the size of the system. That's by far the strongest influence on which one is faster. There's a range of intermediate sizes from about 3000-6000 particles where other factors can swing it one way or the other, but I'm not sure it's worth trying to do anything more complicated to speed up that narrow range of sizes, especially given the risk of a large slowdown if we choose wrong. See #4065 (comment).

bdenhollander · 2023-05-16T00:42:09Z

RX 6600 on Ubuntu 20.04. The threshold for this optimization is more in line with other GPUs, at around 3000 atoms.

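| Width | Atoms | Cutoff | Time (Neighbor List) | Time (All Interactions) | Ratio |
|---|---|---|---|---|---|
| 3 | 2661 | 0.6 | 1.39 | 1.34 | 0.96 |
| 3 | 2661 | 1 | 1.45 | 1.30 | 0.89 |
| 3 | 2661 | 1.5 | 1.48 | 1.33 | 0.90 |
| 3.5 | 4158 | 0.6 | 1.56 | 1.93 | 1.24 |
| 3.5 | 4158 | 1 | 1.62 | 1.78 | 1.10 |
| 3.5 | 4158 | 1.5 | 1.94 | 1.76 | 0.91 |
| 3.75 | 5085 | 0.6 | 1.69 | 2.39 | 1.42 |
| 3.75 | 5085 | 1 | 1.82 | 2.20 | 1.21 |
| 3.75 | 5085 | 1.5 | 2.30 | 2.25 | 0.98 |
| 4 | 6282 | 0.6 | 1.73 | 2.98 | 1.72 |
| 4 | 6282 | 1 | 1.98 | 2.84 | 1.43 |
| 4 | 6282 | 1.5 | 2.88 | 2.93 | 1.02 |
| 5 | 12255 | 0.6 | 2.85 | 8.62 | 3.03 |
| 5 | 12255 | 1 | 3.18 | 7.81 | 2.45 |
| 5 | 12255 | 1.5 | 4.69 | 7.82 | 1.66 |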
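One way to read the two RX 6600 tables above is to ask for the largest tested system size at which computing all interactions beats the neighbor list for every tested cutoff. The sketch below does that with the numbers copied from the tables; the helper name `largest_size_where_skipping_wins` is made up for illustration, and the "faster at every cutoff" criterion is just one possible choice.

```python
# Rows copied from the two tables above:
# (atoms, cutoff, time with neighbor list, time with all interactions).
WINDOWS = [
    (2661, 0.6, 2.17, 0.86), (2661, 1.0, 2.22, 0.87), (2661, 1.5, 2.27, 0.86),
    (4158, 0.6, 2.32, 1.36), (4158, 1.0, 2.46, 1.33), (4158, 1.5, 2.75, 1.33),
    (5085, 0.6, 2.43, 1.79), (5085, 1.0, 2.60, 1.70), (5085, 1.5, 2.96, 1.70),
    (6282, 0.6, 2.43, 2.44), (6282, 1.0, 2.74, 2.37), (6282, 1.5, 3.28, 2.37),
    (12255, 0.6, 3.48, 8.06), (12255, 1.0, 3.66, 7.37), (12255, 1.5, 5.79, 7.38),
]
UBUNTU = [
    (2661, 0.6, 1.39, 1.34), (2661, 1.0, 1.45, 1.30), (2661, 1.5, 1.48, 1.33),
    (4158, 0.6, 1.56, 1.93), (4158, 1.0, 1.62, 1.78), (4158, 1.5, 1.94, 1.76),
    (5085, 0.6, 1.69, 2.39), (5085, 1.0, 1.82, 2.20), (5085, 1.5, 2.30, 2.25),
    (6282, 0.6, 1.73, 2.98), (6282, 1.0, 1.98, 2.84), (6282, 1.5, 2.88, 2.93),
    (12255, 0.6, 2.85, 8.62), (12255, 1.0, 3.18, 7.81), (12255, 1.5, 4.69, 7.82),
]


def largest_size_where_skipping_wins(rows):
    """Largest atom count where all-interactions is faster at every cutoff."""
    by_atoms = {}
    for atoms, _cutoff, t_list, t_all in rows:
        by_atoms.setdefault(atoms, []).append(t_all < t_list)
    winners = [atoms for atoms, flags in by_atoms.items() if all(flags)]
    return max(winners) if winners else None


print(largest_size_where_skipping_wins(WINDOWS))  # 5085
print(largest_size_where_skipping_wins(UBUNTU))   # 2661
```

Under this criterion the Windows numbers put the crossover between 5085 and 6282 atoms and the Ubuntu numbers put it between 2661 and 4158 atoms, which matches the ranges quoted in the two comments.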

peastman · 2023-05-16T02:02:35Z

> Water with PME and a cutoff could never exceed 1,000 ns/day.

For a sufficiently small system, simple Ewald should be faster than PME. It only involves two kernels for the reciprocal space calculation.

philipturner · 2023-05-16T02:16:51Z

The problem was a synchronization latency bottleneck, limiting to 1,000 ns/day. After removing that, the next $O(1)$ bottleneck is command encoding latency. Without PME, the ceiling is 4,000 ns/day. With PME, it is probably 3,000 ns/day. I need a real-world test to find the ceiling when you:

- remove the cutoff/neighbor list, but
- don't remove the (now unnecessary) methods for computing forces beyond the (now removed) cutoff
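To make the ns/day ceilings above more concrete, a fixed per-step latency translates into a throughput limit by simple arithmetic. The sketch below is only that arithmetic; the 4 fs timestep is an assumption chosen for illustration (the thread does not state the timestep used), and the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400
FS_TO_NS = 1e-6  # 1 femtosecond = 1e-6 nanoseconds


def ns_per_day(step_seconds, timestep_fs):
    """Simulated nanoseconds per day for a given wall-clock time per step."""
    steps_per_day = SECONDS_PER_DAY / step_seconds
    return steps_per_day * timestep_fs * FS_TO_NS


def max_step_seconds(target_ns_per_day, timestep_fs):
    """Largest wall-clock time per step that still reaches the target rate."""
    return SECONDS_PER_DAY * timestep_fs * FS_TO_NS / target_ns_per_day


# ASSUMPTION: 4 fs timestep, used only to illustrate the conversion.
TIMESTEP_FS = 4.0
for ceiling in (1000, 3000, 4000):
    budget_us = max_step_seconds(ceiling, TIMESTEP_FS) * 1e6
    print(f"{ceiling} ns/day allows about {budget_us:.1f} microseconds per step")
```

With that assumed timestep, the whole per-step budget is a few hundred microseconds at most, which is why fixed costs such as synchronization and command encoding dominate for tiny systems.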

peastman · 2023-05-16T03:17:48Z

I've lost track of what you're trying to do. If you don't want long range interactions, just specify CutoffPeriodic instead of PME. What do you mean about the cutoff being removed? In a periodic system, there are infinitely many interactions. You either completely ignore all the ones beyond the cutoff, or you use Ewald summation or something similar to divide it into short and long range parts and compute them both.

philipturner · 2023-05-16T03:55:09Z

What I work with isn’t periodic systems. For example, if you have a nanomechanical gear, it’s one solid piece. No repeating crystal structure, no solvent molecules. Simulating an infinite lattice of mechanical gears is unnecessary to simulate one gear spinning and may even introduce artifacts.

For me, cutoffs and neighbor lists are roughly synonymous. PME and RF are algorithms for implementing cutoffs, which have this troublesome aspect called periodicity. I need either:

- No cutoff, no periodic
- Cutoff to reduce compute cost, no periodic

peastman · 2023-05-16T16:16:55Z

Cutoffs, neighbor lists, and periodic boundary conditions are three different things. PME is a method for calculating the infinite interactions in a periodic system. If you want a periodic system but not to include any interactions beyond the cutoff, specify CutoffPeriodic. If you want a non-periodic system but still to exclude interactions beyond the cutoff, specify CutoffNonPeriodic. If you want a non-periodic system and to include all interactions regardless of distance, specify NoCutoff.
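The choices listed above can be summarized as a small lookup. This is a sketch for orientation only: the helper `choose_nonbonded_method` is hypothetical, and it returns the method names as strings rather than touching OpenMM. In the Python API the same names exist as constants on `NonbondedForce` (for example `NonbondedForce.CutoffNonPeriodic`), which are passed to `setNonbondedMethod`.

```python
# Hypothetical helper, not part of OpenMM. It maps the two independent
# questions (periodic or not, include interactions beyond the cutoff or not)
# to the method names mentioned in this thread.
METHODS = {
    # (periodic, include_beyond_cutoff): method name
    (False, True): "NoCutoff",            # non-periodic, all interactions
    (False, False): "CutoffNonPeriodic",  # non-periodic, truncated at the cutoff
    (True, False): "CutoffPeriodic",      # periodic, truncated at the cutoff
    (True, True): "PME",                  # periodic, short + long range parts
}


def choose_nonbonded_method(periodic, include_beyond_cutoff):
    """Return the name of the nonbonded method for the given requirements."""
    return METHODS[(bool(periodic), bool(include_beyond_cutoff))]


# A single, non-periodic object where a cutoff is wanted only to save time:
print(choose_nonbonded_method(periodic=False, include_beyond_cutoff=False))
# CutoffNonPeriodic
```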

Neighbor lists are a structure used internally to accelerate the evaluation of interactions with a cutoff. They don't affect the results, just how long it takes to compute them.
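A toy example may help show why a neighbor list changes only the cost and not the result. The sketch below uses a simple cell list in a non-periodic box, which is not how OpenMM's GPU neighbor list is implemented (that one works on blocks of atoms); it only demonstrates that a spatial acceleration structure finds exactly the same pairs within the cutoff as a brute-force loop. All names are made up for illustration.

```python
import itertools
import math
import random


def pairs_brute_force(points, cutoff):
    """All pairs (i, j), i < j, closer than the cutoff: O(N^2) distance checks."""
    return {
        (i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
        if math.dist(points[i], points[j]) < cutoff
    }


def pairs_cell_list(points, cutoff):
    """Same pairs, found by binning points into cubic cells of edge `cutoff`."""
    cells = {}
    for index, p in enumerate(points):
        key = tuple(math.floor(c / cutoff) for c in p)
        cells.setdefault(key, []).append(index)
    found = set()
    for (cx, cy, cz), members in cells.items():
        # Two points closer than the cutoff lie in the same or adjacent cells.
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j and math.dist(points[i], points[j]) < cutoff:
                        found.add((i, j))
    return found


random.seed(0)
points = [tuple(random.uniform(0.0, 5.0) for _ in range(3)) for _ in range(300)]
# Both approaches yield the identical set of interacting pairs.
assert pairs_brute_force(points, 1.0) == pairs_cell_list(points, 1.0)
```

For a few hundred particles the brute-force loop is already cheap, which is the regime this PR targets: the bookkeeping for the acceleration structure can cost more than the distance checks it avoids.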

- Port optimization from openmm/openmm#4070 to HIP for compatibility with upcoming OpenMM 8.1 release
- It may be possible to revert some of the changes in amd@08c967d, which was optimizing for small systems as well

- Skip neighbor list for very small systems openmm#4070
- Store bounding box sizes in half precision openmm@2ae50f9
- Use large blocks to optimize building the neighbor list openmm@3955033
- Improved sorting of blocks when building neighbor list openmm@796ffaa
- Fixed bug in large blocks optimization with triclinic boxes openmm@4c10732
- Optimize sorting of non-uniformly distributed data openmm@71d9bb1

Co-authored-by: bdenhollander <44237618+bdenhollander@users.noreply.github.com>

Skip neighbor list for very small systems

d2d9c61

philipturner reviewed May 14, 2023

peastman added 2 commits May 14, 2023 08:14

Fixed typos

72cc39e

Don't skip box size check when not using neighbor list

f784b52

Made test larger to ensure it uses neighbor list

0628dc1

peastman merged commit 655518c into openmm:master May 23, 2023

peastman deleted the neighbors branch May 23, 2023 00:05

bdenhollander mentioned this pull request Sep 2, 2023

Planning for 8.1 #4182

Closed

This was referenced Sep 4, 2023

Port skip neighbor list for very small systems amd/openmm-hip#9

Open

Valid to skip neighbor list for more forces? #4227

Closed

ex-rzr mentioned this pull request Oct 14, 2023

Fix exclusion tiles sorting on AMD GCN/CDNA (64 threads per wave) in OpenCL #4268

Merged

bdenhollander mentioned this pull request Oct 28, 2023

findBlocksWithInteractions optimization idea #3966

Closed

peastman mentioned this pull request Nov 6, 2023

[Question] Computation speed doesn't seem to be improved from v8.0.0 to 8.1.0b on CUDA Platform #4300

Closed
