
Integration of the double-linked node list ParticleSet structure to 'master' #1034

Merged: 250 commits from integrate_nodes_to_master into 2.x on Jan 6, 2022

Conversation

CKehl (Contributor) commented Jun 1, 2021

Major effort to include the double-linked node list as a collection data structure for ParticleSets. This is a necessary prelude to the particle.add(...) function extension (as well as particle.split(...)) for internal features like fragmentation.

CKehl (Contributor, Author) commented Jun 1, 2021

@erikvansebille wow - ehm. I don't know what's going on here with the HDF5 version of the reference files for the notebook tests, but it seems those test files need to be fixed again. I didn't really change anything on the NetCDF writer of the ParticleSet here (i.e. what was the last pull request integrated into this branch?), but it seems some package (e.g. netcdf4 or xarray) has changed on conda-forge recently, causing this issue.

CKehl added a commit that referenced this pull request Jun 1, 2021
…the test-failure in 'test_advection.py', failing the Linux and MacOS tests
CKehl added a commit that referenced this pull request Jun 1, 2021
…rary version drawn from conda-forge to circumvent the file creation error.
CKehl added a commit that referenced this pull request Jun 3, 2021
…endencies, which is necessary for double-linked node lists to run
CKehl added a commit that referenced this pull request Jun 3, 2021
…s for the nodal structure. The Python-based node list will come by time, as we first need to (re-)implement the C-compilation interface with the multi-level compilation
CKehl added a commit that referenced this pull request Jun 7, 2021
…all packages so it can be used from installed parcels packages.
CKehl added a commit that referenced this pull request Jun 7, 2021
CKehl added a commit that referenced this pull request Jun 7, 2021
…library-C interface, to facilitate binding ctypes functions of pre-compiled libraries, circumventing the jit code generator for ctypes portions that don't need to be bound 'just-in-time'. Could also be used for ParcelsRandom functions.
CKehl added a commit that referenced this pull request Jun 7, 2021
CKehl added a commit that referenced this pull request Jun 7, 2021
CKehl added a commit that referenced this pull request Jun 7, 2021
…ssage passing interface MPI) service to exchange and synchronize ID generation across MPI instances
CKehl added a commit that referenced this pull request Jun 7, 2021
…ging services into the tools-package init-function
CKehl added a commit that referenced this pull request Jun 7, 2021
CKehl (Contributor, Author) commented Jul 7, 2021

@erikvansebille please have a look at the changes in "tests/test_particle_sets.py" and at the new layout of the tests. I needed to make quite a few general changes to the test layout to actually work with the double-linked list. In that context, the id generator and the C-library register must each be a context-exclusive, unique and consistent instance to guarantee the correct functioning of the list itself, which means they need to be defined globally in each simulation script. If one only ever has a single particle set, they can be created internally. That said, some tests for example create 2 particle sets in order to merge them, and those need to share the same global id generator and library register. Managing ids in an MPI context is done in the backend (that was actually quite a complex thing to make work, but it works), so multiple MPI instances are handled internally, but within one script they need to be defined by the user - or, in the case of the tests, for each test individually. Overall, those tests give you a first clue of how working with those double-linked list particle sets will look - see the sketch below.
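For illustration, a minimal sketch of that pattern. This is hedged: the idgen and c_lib_register keyword arguments and the '...' placeholders are assumptions based on the test layout, not the verbatim API; GenerateID_Service, SequentialIdGenerator, LibraryRegisterC and pset1.add(pset2) are as discussed in this thread.

    # one global id generator and C-library register per script,
    # shared by every node-based particle set in this context
    idgen = GenerateID_Service(SequentialIdGenerator)
    idgen.setDepthLimits(0., 1.0)
    idgen.setTimeLine(0.0, 1.0)
    c_lib_register = LibraryRegisterC()

    # both sets draw ids from the same generator and share the same
    # compiled-node library, so they can later be merged
    pset1 = pset_type['nodes']['pset'](..., idgen=idgen, c_lib_register=c_lib_register)
    pset2 = pset_type['nodes']['pset'](..., idgen=idgen, c_lib_register=c_lib_register)

    pset1.add(pset2)  # works because ids are unique across both sets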

@CKehl CKehl marked this pull request as draft July 7, 2021 18:55
CKehl (Contributor, Author) commented Jul 7, 2021


PS: this is still a draft, so don't bother commenting on all the commented-out code lines. We're still some steps away from merging the PR, and we will also need to talk a bit about what we want to include here and what we want to treat in follow-up PRs.

erikvansebille (Member) commented

Thanks for that heads-up, @CKehl. I see that the code for adding ParticleSets will now become much more complex for the nodes-list. Note that this is not something we only use for the ParticleSet unit tests; it is, for example, also how we add new particles in repeated-release mode. And users may also want to combine ParticleSets in their scripts. Is there no way to make this easier/automatic with the ID generator?

CKehl (Contributor, Author) commented Jul 8, 2021

> Thanks for that heads-up, @CKehl. I see that the code for adding ParticleSets will now become much more complex for the nodes-list. Note that this is not something we only use for the ParticleSet unit tests; it is, for example, also how we add new particles in repeated-release mode. And users may also want to combine ParticleSets in their scripts. Is there no way to make this easier/automatic with the ID generator?

Well, perhaps to clarify, in case that didn't come through in the review: all the new id-generator machinery and the C-library register are obviously only required for the node-based list. So, for users that just take AoS or SoA (the default), nothing changes. I hope that was already apparent from the code.

Now, for the node-based list, we need those global-instance objects - there, they really are a must-have. For the index-based collections like arrays, the ids really don't matter at all: they are only of interest when writing the files (to separate trajectories on file read and during data analysis), and with arrays you'd never get an error if particles have duplicate ids. That's just how that structure works. For node-based lists, the id is the one-and-only structuring element and must be provably unique under any circumstance - there is no index to otherwise distinguish and address particles. Basically: not having unique ids, or having duplicate ids, causes infinite loops. So: the id generator is a pre-requirement for node-based lists (a small illustrative sketch follows below).
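To make that concrete, here is a hedged, plain-Python stand-in (not the actual parcels node class) for how a node list is addressed purely by id:

    class Node:
        """Stand-in for the C-backed node: the id is the ONLY handle."""
        def __init__(self, pid):
            self.id = pid
            self.prev = None
            self.next = None

    def find(head, pid):
        # every access walks the links and compares ids - there is no index;
        # with duplicate ids, this can resolve to the wrong node, and the
        # broken links that result are what cause the infinite loops
        node = head
        while node is not None and node.id != pid:
            node = node.next
        return node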

The C-library register should be a globally coherent instance, and in certain setups it needs to be exactly that. What the register does is allow compilation and loading of external libraries that have nothing to do with the dynamic code generator - the best example being the fixed structure of a node in a C library. Now, all nodes that (potentially) need to be connected with one another (i.e. all nodes in the same code context) need to share the same compiled-library context. Meaning: if you load, say, 'libnode.so' for 2 node instances in two separate contexts, those nodes cannot be connected in C. Sure, we could hypothetically parse all nodes on kernel execution, copy them into an array, and copy them back later, but that would ruin any performance gain we got from the node-based arrangement. So: the structure instances in the C library of a node need to share the same context, because they need to be connected with one another. Now, one could think of giving each particle set its own C-library register without the need to define this externally - the issue then is that one cannot directly merge 2 particle sets, because the set being merged into cannot access the C-stored data of the set being merged. So, if one wants to provide a merging or adding process the way it works now, with pset1.add(pset2), those two psets need to share the same C-library register. Again, as with the id generator, I previously had that as a global variable - more on that below.
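As a hedged illustration of that sharing requirement, a toy stand-in (plain ctypes, not the actual LibraryRegisterC implementation):

    import ctypes

    class SimpleLibRegister:
        """Toy stand-in for LibraryRegisterC: one handle per library path."""
        def __init__(self):
            self._libs = {}

        def load(self, path):
            # load each library only once; every caller gets the SAME handle,
            # so all node structs live in one compiled-library context and
            # can be linked to each other in C
            if path not in self._libs:
                self._libs[path] = ctypes.CDLL(path)
            return self._libs[path]

        def clear(self):
            self._libs.clear()

If pset1 and pset2 each held their own register, two separate ctypes.CDLL('libnode.so') handles would be loaded, and nodes from one could not be connected to nodes from the other.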

In my own branch, I had arranged those things as global variables. That worked in that experimental case, but first of all, maintaining global variables is error-prone for others. The way it is now, I was able to integrate some asserts to check whether those objects were provided - defensive programming style. The other reason for changing this is exactly that I, for example, never spawn more than one particle set - but some users may want to, and in that case the global variable really, really doesn't work. For the C-library register: I never had more than one particle set and never needed to merge sets before; with that merge step of particle sets, it's important to define the C-library instance outside each individual particle set's context. Lastly, the id generator is not just 'one generator': there is a sequential, a spatial and a spatiotemporal generator for creating ids. One could even implement one based on @qubixes' hash function from the particle-particle interaction. With a global variable, you can't really change the type of generator (or: it's very complicated, to say the least). Also, letting users define the id generator lets them choose the limits (in terms of depth and time, at least) to get more distinctive ids that preserve id-closeness of close-by particles. So, in the end, it's those 4 lines for the id generator, which I think isn't really asking too much from people who want to use the structure - of course, given that this is well documented in some notebook or example script.

So, that explains the whole code section that looks like:

    idgen = None
    c_lib_register = None

    [...]

    pset = None
    if pset_mode != 'nodes':
        pset = pset_type[pset_mode]['pset'](...)
    else:
        # node-based lists need one shared id generator and C-library
        # register, created once per script (or here: per test)
        idgen = GenerateID_Service(SequentialIdGenerator)
        idgen.setDepthLimits(0., 1.0)  # depth limits, for more distinctive ids
        idgen.setTimeLine(0.0, 1.0)    # time limits, for more distinctive ids
        c_lib_register = LibraryRegisterC()
        pset = pset_type[pset_mode]['pset'](...)

Lastly, there are the closing sections like:

    [...]
    # tear down in this order: the particle set first, then the id
    # generator, then the C-library register the nodes depended on
    del pset
    if idgen is not None:
        idgen.close()
        del idgen
    if c_lib_register is not None:
        c_lib_register.clear()
        del c_lib_register

They are also only needed for the node-based list. The reason: until now, parcels could rely on Python's garbage collector to allocate and free memory. That garbage collector is something in managed languages (like Python) that cannot be easily steered: it keeps a global memory pool for your software and kills unreferenced objects in an arbitrary order when processing ends. With array structures that is easy - they are contiguous memory blocks that can be thrown away in one go. With nodes, that automatic garbage collection doesn't work, because the order in which objects are deleted from memory matters. Example: when you remove a node in the middle of the list, its removal function needs to tell its previous and next objects to update their links, because their next-neighbour partner changes. Obviously, that requires the prev and next objects to still be valid by the time the middle object is deleted from memory. If the garbage collector throws away those elements in arbitrary order, the neighbour-link updates will fail, because the object one wants to update has itself already been deleted.
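A hedged, plain-Python sketch of that constraint (again a stand-in, not the actual implementation): unlinking touches both neighbours, so removal has to proceed in a controlled order:

    class Node:
        def __init__(self, pid, prev=None, next=None):
            self.id, self.prev, self.next = pid, prev, next

        def unlink(self):
            # both neighbours are dereferenced here - if the garbage
            # collector had already destroyed them (arbitrary order!),
            # these link updates would fail or corrupt the list
            if self.prev is not None:
                self.prev.next = self.next
            if self.next is not None:
                self.next.prev = self.prev
            self.prev = self.next = None

    def teardown(head):
        # manual, in-order removal from the front: each node is unlinked
        # while its remaining neighbour is still valid
        while head is not None:
            nxt = head.next
            head.unlink()
            head = nxt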

So, the TL;DR on the del code: we need to steer the in-order removal of objects manually, and thus cannot rely on the automatic garbage collector anymore. The same goes for the orderly tear-down of the id generator and the C-library register, which is why those functions need to be called in exactly that order. Again: explicitly deleting particle set objects is generally good practice, but it is not required for SoA and AoS - for nodes, it needs to be there.

CKehl added a commit that referenced this pull request Jul 8, 2021
CKehl added a commit that referenced this pull request Jul 14, 2021
…ration when reusing kernels from particle sets that use different particle objects.
CKehl added a commit that referenced this pull request Jul 14, 2021
…torial field sampling and checked that tests run well through
CKehl added a commit that referenced this pull request Jul 14, 2021
…torial field sampling and checked that tests run well through
CKehl added a commit that referenced this pull request Jul 15, 2021
…the test-failure in 'test_advection.py', failing the Linux and MacOS tests
CKehl added a commit that referenced this pull request Jul 15, 2021
…rary version drawn from conda-forge to circumvent the file creation error.
CKehl added a commit that referenced this pull request Jul 15, 2021
…endencies, which is necessary for double-linked node lists to run
CKehl added a commit that referenced this pull request Jul 15, 2021
…s for the nodal structure. The Python-based node list will come by time, as we first need to (re-)implement the C-compilation interface with the multi-level compilation
CKehl and others added 15 commits January 4, 2022 18:05
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
…on in the ParticleCollectionInterableNodes to prevent infinite loops in list-comprehensions of very long, highly dynamic node-based particle sets.
…rdown and errors in the ID distribution for nodes.
…ipt: it adds particles via a new (same) collection. In that case, internally particle data are extracted from one collection and inserted into the target. In that case, we ALWAYS need to generate a new ID for the particle, because its original ID (and index) are removed as soon as the original collection is destroyed. FIXED.
… and removing, as well as splitting and merging, of non-indexable particle sets does not preserve the element order, which needs to be accounted for in the assert-statements.
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
@CKehl CKehl force-pushed the integrate_nodes_to_master branch from 6c3d434 to a041ba6 on January 4, 2022 17:08
erikvansebille (Member) commented

> @erikvansebille are you through with all the files and comments? I did not see any questions or remarks on the message_service.py, for example, or on some of the tests, so I thought it would be good to get your confirmation for the merge into 2.x first.

Yes, as I mentioned on 22 December, I have now gone through all the files. I just didn't have comments on each. Note that there are still a few open comments for you to address; I just went through them today and wrote an extra message under those where I think you haven't addressed them yet.

With the massive changes in this PR, we're really pushing the boundaries of what is still feasible and tractable in GitHub's tools...

CKehl and others added 8 commits January 5, 2022 14:56
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
…nd disabling the typed writing for nodes, as it didn't work the way it was intended.
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
…se for baseparticleset.py BaseParticleFile::_create_trajectory_records(coords).
@CKehl CKehl merged commit 3e71958 into 2.x Jan 6, 2022