set the default number of threads to one for multi-threading #1745

Merged
merged 4 commits on Sep 9, 2021
2 changes: 1 addition & 1 deletion doc/docs/Build_From_Source.md
@@ -250,7 +250,7 @@ By default, Meep's configure script tries to guess the gcc `-march` flag for the

**`--with-openmp`**
-This flag enables some experimental support for [OpenMP](https://en.wikipedia.org/wiki/OpenMP) multithreading parallelism on multi-core machines (*instead* of MPI, or in addition to MPI if you have multiple processor cores per MPI process). Currently, only multi-frequency [`near2far`](Python_User_Interface.md#near-to-far-field-spectra) calculations are sped up this way, but in the future this [may be expanded](https://github.com/NanoComp/meep/issues/228) with additional OpenMP parallelism. When you run Meep, you can first set the `OMP_NUM_THREADS` environment variable to the number of threads you want OpenMP to use.
+This flag enables some experimental support for [OpenMP](https://en.wikipedia.org/wiki/OpenMP) multi-threading parallelism on multi-core machines (*instead* of MPI, or in addition to MPI if you have multiple processor cores per MPI process). When you run Meep, you should first set the environment variable `OMP_NUM_THREADS` to the number of threads you want OpenMP to use (the default is a single thread).

### Floating-Point Precision of the Fields and Materials Arrays

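As a quick sanity check of the `OMP_NUM_THREADS` behavior described above, a small standalone program (a hypothetical sketch, not part of this PR; `check_threads.cpp` is an illustrative name) can print the thread count the OpenMP runtime will use:

```cpp
// check_threads.cpp -- print how many threads OpenMP will use.
// Build: g++ -fopenmp check_threads.cpp -o check_threads
// Run:   OMP_NUM_THREADS=4 ./check_threads
#include <cstdio>
#include <omp.h>

int main() {
  // omp_get_max_threads() honors OMP_NUM_THREADS when it is set;
  // otherwise it reports the runtime's default (typically all cores).
  std::printf("OpenMP will use up to %d thread(s)\n", omp_get_max_threads());
  return 0;
}
```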
2 changes: 1 addition & 1 deletion doc/docs/Parallel_Meep.md
@@ -53,7 +53,7 @@ However, there is an alternative strategy for parallelization. If you have many

The `divide_parallel_processes` feature can be useful for large supercomputers which typically restrict the total number of jobs that can be executed but do not restrict the size of each job, or for large-scale optimization where many separate simulations are coupled by an optimization algorithm. Note that when using this feature with the [Python interface](Python_User_Interface.md), only the output of the subgroup belonging to the master process of the entire simulation is shown in the standard output. (In C++, the master process from *every* subgroup prints to standard output.)

-Meep also supports [thread-level parallelism](https://en.wikipedia.org/wiki/Task_parallelism) (i.e., multi-threading) on a single, shared-memory, multi-core machine for multi-frequency [near-to-far field](Python_User_Interface.md#near-to-far-field-spectra) computations. Meep does not currently use thread-level parallelism for the time stepping although this feature may be added in the future (see [Issue \#228](https://github.com/NanoComp/meep/issues/228)).
+Meep also supports [thread-level parallelism](https://en.wikipedia.org/wiki/Task_parallelism) (i.e., multi-threading) on a single, shared-memory, multi-core machine for the field updates during time stepping as well as multi-frequency [near-to-far field](Python_User_Interface.md#near-to-far-field-spectra) computations. To use this feature, you will need to [compile Meep from source](Build_From_Source.md#meep) using the `--with-openmp` flag and set the environment variable `OMP_NUM_THREADS` at runtime, e.g., `$ env OMP_NUM_THREADS=2 mpirun -np 2 python foo.py`.

### Optimization Studies of Parallel Simulations

2 changes: 1 addition & 1 deletion python/simulation.py
@@ -40,7 +40,7 @@

verbosity = Verbosity(mp.cvar, 'meep', 1)

-mp.set_zero_subnormals(True)
+mp.setup()

# Send output from Meep, ctlgeom, and MPB to Python's stdout
mp.set_meep_printf_callback(mp.py_master_printf_wrap)
3 changes: 3 additions & 0 deletions src/meep.hpp
@@ -2379,6 +2379,9 @@ class binary_partition {
// control whether CPU flushes subnormal values; see mympi.cpp
void set_zero_subnormals(bool iszero);

+// initialize various properties of the simulation
+void setup();

} /* namespace meep */

#endif /* MEEP_H */
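For context, user code does not need to call `setup()` explicitly: as the `src/mympi.cpp` hunk below shows, the `initialize` constructor now invokes it. A minimal sketch of a C++ Meep program that picks up the new defaults (the actual simulation body is elided):

```cpp
// Sketch: meep::initialize now calls setup(), which flushes subnormals
// to zero and defaults OpenMP to one thread unless OMP_NUM_THREADS is set.
#include <meep.hpp>
using namespace meep;

int main(int argc, char **argv) {
  initialize mpi(argc, argv); // invokes setup() internally
  // ... build the structure and fields, then time-step ...
  return 0;
}
```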
27 changes: 13 additions & 14 deletions src/meep/vec.hpp
@@ -186,20 +186,20 @@ component first_field_component(field_type ft);
loop_ibound++) \
LOOP_OVER_IVECS(gv, loop_notowned_is, loop_notowned_ie, idx)

-/* The following work identically to the LOOP_* macros above,
-but employ shared memory-parallelism using OpenMP. Mainly
-used in step_generic and a few other time-critical loops. */
+/* The following loop macros work identically to the LOOP_* macros above,
+but employ shared memory-parallelism using OpenMP. These loops are mainly
+used in step_generic.cpp and a few other time-critical loops.

-/* for the parallel implementation, we introduce 2 dummy loops, one at the begininnging
-and one at the end, in order to "trick" openMP to allow us to define our localy variables
-without having to change any other code in the main codebase. We can proceed to do
+For the parallel implementation, we introduce two dummy loops, one at the beginning
+and one at the end, in order to "trick" OpenMP to allow us to define our local variables
+without having to change any other code anywhere else. We can then proceed to do
a collapse over all three main loops. */

#define CHUNK_OPENMP _Pragma("omp parallel for")

// the most generic use case where the user
// can specify a custom clause
-#define PLOOP_OVER_IVECS_C(gv, is, ie, idx, clause) \
+#define PLOOP_OVER_IVECS_C(gv, is, ie, idx, clause)                                               \
for(ptrdiff_t loop_is1 = (is).yucky_val(0), loop_is2 = (is).yucky_val(1), \
loop_is3 = (is).yucky_val(2), loop_n1 = ((ie).yucky_val(0) - loop_is1) / 2 + 1, \
loop_n2 = ((ie).yucky_val(1) - loop_is2) / 2 + 1, \
Expand All @@ -213,7 +213,7 @@ for(ptrdiff_t loop_is1 = (is).yucky_val(0), loop_is2 = (is).yucky_val(1),
(is - (gv).little_corner()).yucky_val(1) / 2 * loop_s2 + \
(is - (gv).little_corner()).yucky_val(2) / 2 * loop_s3, \
dummy_first=0;dummy_first<1;dummy_first++) \
-_Pragma(clause) \
+_Pragma(clause)                                                                                   \
for (ptrdiff_t loop_i1 = 0; loop_i1 < loop_n1; loop_i1++) \
for (ptrdiff_t loop_i2 = 0; loop_i2 < loop_n2; loop_i2++) \
for (ptrdiff_t loop_i3 = 0; loop_i3 < loop_n3; loop_i3++) \
@@ -223,20 +223,19 @@ _Pragma(clause) \
// For the main timestepping events, we know
// we want to do a simple collapse
#define PLOOP_OVER_IVECS(gv, is, ie, idx) \
-/*master_printf("Entered ploop\n");*/ \
PLOOP_OVER_IVECS_C(gv, is, ie, idx, "omp parallel for collapse(3)")

-#define PLOOP_OVER_VOL(gv, c, idx) \
-PLOOP_OVER_IVECS(gv, (gv).little_corner() + (gv).iyee_shift(c), \
+#define PLOOP_OVER_VOL(gv, c, idx)                                                                \
+  PLOOP_OVER_IVECS(gv, (gv).little_corner() + (gv).iyee_shift(c),                                 \
(gv).big_corner() + (gv).iyee_shift(c), idx)

-#define PLOOP_OVER_VOL_OWNED(gv, c, idx) \
+#define PLOOP_OVER_VOL_OWNED(gv, c, idx)                                                          \
PLOOP_OVER_IVECS(gv, (gv).little_owned_corner(c), (gv).big_corner(), idx)

-#define PLOOP_OVER_VOL_OWNED0(gv, c, idx) \
+#define PLOOP_OVER_VOL_OWNED0(gv, c, idx)                                                         \
PLOOP_OVER_IVECS(gv, (gv).little_owned_corner0(c), (gv).big_corner(), idx)

-#define PLOOP_OVER_VOL_NOTOWNED(gv, c, idx) \
+#define PLOOP_OVER_VOL_NOTOWNED(gv, c, idx)                                                       \
for (ivec loop_notowned_is((gv).dim, 0), loop_notowned_ie((gv).dim, 0); \
loop_notowned_is == zero_ivec((gv).dim);) \
for (int loop_ibound = 0; \
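The dummy-loop trick described in the comment above is easier to see in isolation. Here is a minimal, hypothetical sketch (not Meep's actual macro; `SKETCH_PLOOP` and its names are invented): a single-iteration outer loop declares the locals the macro needs, OpenMP collapses the three real loops, and a single-iteration inner loop computes the flat index so that a user statement can follow the macro directly:

```cpp
// ploop_sketch.cpp -- illustration of the dummy-loop trick.
// Build: g++ -fopenmp ploop_sketch.cpp -o ploop_sketch
#include <cstdio>

#define SKETCH_PLOOP(n1, n2, n3, idx)                                  \
  for (long loop_n1 = (n1), loop_n2 = (n2), loop_n3 = (n3),            \
            dummy_first = 0;                                           \
       dummy_first < 1; dummy_first++)                                 \
    _Pragma("omp parallel for collapse(3)")                            \
    for (long i1 = 0; i1 < loop_n1; i1++)                              \
      for (long i2 = 0; i2 < loop_n2; i2++)                            \
        for (long i3 = 0; i3 < loop_n3; i3++)                          \
          for (long idx = (i1 * loop_n2 + i2) * loop_n3 + i3,          \
                    dummy_last = 0;                                    \
               dummy_last < 1; dummy_last++)

int main() {
  double f[2 * 3 * 4];
  SKETCH_PLOOP(2, 3, 4, idx) { f[idx] = 0.5 * idx; } // body attaches to the inner dummy loop
  std::printf("f[7] = %g\n", f[7]);
  return 0;
}
```

Because the caller's `{ ... }` attaches to the innermost single-iteration loop, existing call sites compile unchanged, which is the point of the trick.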
10 changes: 9 additions & 1 deletion src/mympi.cpp
@@ -182,6 +182,14 @@ void set_zero_subnormals(bool iszero)
_set_zero_subnormals(iszero); // This has to be done in every thread for OpenMP.
}

+void setup() {
+  set_zero_subnormals(true);
+#ifdef _OPENMP
+  if (getenv("OMP_NUM_THREADS") == NULL)
+    omp_set_num_threads(1);
+#endif
+}

initialize::initialize(int &argc, char **&argv) {
#ifdef HAVE_MPI
#ifdef _OPENMP
@@ -206,8 +214,8 @@ initialize::initialize(int &argc, char **&argv) {
#ifdef IGNORE_SIGFPE
signal(SIGFPE, SIG_IGN);
#endif
-set_zero_subnormals(true);
t_start = wall_time();
+setup();
}

initialize::~initialize() {
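One design note on the new `setup()`: it checks `getenv("OMP_NUM_THREADS")` before calling `omp_set_num_threads(1)` because an unconditional call would override a user's environment setting. A hypothetical standalone sketch of that precedence (not PR code):

```cpp
// precedence_sketch.cpp -- why setup() checks the environment first.
// Build: g++ -fopenmp precedence_sketch.cpp -o precedence_sketch
#include <cstdio>
#include <cstdlib>
#include <omp.h>

int main() {
  std::printf("before: %d thread(s)\n", omp_get_max_threads()); // honors OMP_NUM_THREADS if set
  if (std::getenv("OMP_NUM_THREADS") == NULL)
    omp_set_num_threads(1); // fall back to a single thread, as in setup()
  std::printf("after:  %d thread(s)\n", omp_get_max_threads());
  return 0;
}
```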