
Integration of the double-linked node list ParticleSet structure to 'master' #1034

Merged: 250 commits from integrate_nodes_to_master into 2.x on Jan 6, 2022

Conversation

CKehl (Contributor) commented Jun 1, 2021

Major effort to include the double-linked node list as a collection data structure for ParticleSets. This is a necessary prelude to the particle.add(...) function extension (as well as particle.split(...)) for internal features like fragmentation.

CKehl (Contributor, Author) commented Jun 1, 2021

@erikvansebille wow - ehm. I don't know what's going on here with the HDF5 version of the reference files for the notebook tests, but it seems those test files need to be fixed again. I didn't really change anything on the NetCDF writer of the ParticleSet here (i.e. what was the last pull request integrated into this branch?), but it seems some package (e.g. netcdf4 or xarray) has changed on conda-forge recently, causing this issue.

CKehl added a commit that referenced this pull request Jun 1, 2021
…the test-failure in 'test_advection.py', failing the Linux and MacOS tests
CKehl added a commit that referenced this pull request Jun 1, 2021
…rary version drawn from conda-forge to circumvent the file creation error.
CKehl added a commit that referenced this pull request Jun 3, 2021
…endencies, which is necessary for double-linked node lists to run
CKehl added a commit that referenced this pull request Jun 3, 2021
…s for the nodal structure. The Python-based node list will come by time, as we first need to (re-)implement the C-compilation interface with the multi-level compilation
CKehl added a commit that referenced this pull request Jun 7, 2021
…all packages so it can be used from installed parcels packages.
CKehl added a commit that referenced this pull request Jun 7, 2021
CKehl added a commit that referenced this pull request Jun 7, 2021
…library-C interface, to facilitate binding ctypes functions of pre-compiled libraries, circumventing the jit code generator for ctypes portions that don't need to be bound 'just-in-time'. Could also be used for ParcelsRandom functions.
CKehl added a commit that referenced this pull request Jun 7, 2021
CKehl added a commit that referenced this pull request Jun 7, 2021
CKehl added a commit that referenced this pull request Jun 7, 2021
…ssage passing interface MPI) service to exchange and synchronize ID generation across MPI instances
CKehl added a commit that referenced this pull request Jun 7, 2021
…ging services into the tools-package init-function
CKehl added a commit that referenced this pull request Jun 7, 2021
CKehl (Contributor, Author) commented Jul 7, 2021

@erikvansebille please have a look at the changes in "tests/test_particle_sets.py" and at the new layout of the tests. I needed to make quite a few general changes to the test layout to actually work with the double-linked list. In that context, the id generator and the C-library register must each be a context-exclusive, unique and consistent instance to guarantee the correct functioning of the list itself, which means they need to be defined globally in each simulation script. If one only ever has a single particle set, they can be created internally. That said, some tests for example create 2 particle sets in order to merge them, and those need to share the same global id generator and library register. Managing ids in an MPI context is done in the backend (that was actually quite a complex thing to make work, but it works), so multiple MPI instances are handled internally, but within one script they need to be defined by the user - or, in the case of the tests, for each test individually. Overall, those tests give you a first clue of how working with those double-linked list particle sets will look - see the sketch below.
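For illustration, a minimal sketch of that pattern. This is hedged: the idgen and c_lib_register keyword arguments and the '...' placeholders are assumptions based on the test layout, not the verbatim API; GenerateID_Service, SequentialIdGenerator, LibraryRegisterC and pset1.add(pset2) are as discussed in this thread.

    # one global id generator and C-library register per script,
    # shared by every node-based particle set in this context
    idgen = GenerateID_Service(SequentialIdGenerator)
    idgen.setDepthLimits(0., 1.0)
    idgen.setTimeLine(0.0, 1.0)
    c_lib_register = LibraryRegisterC()

    # both sets draw ids from the same generator and share the same
    # compiled-node library, so they can later be merged
    pset1 = pset_type['nodes']['pset'](..., idgen=idgen, c_lib_register=c_lib_register)
    pset2 = pset_type['nodes']['pset'](..., idgen=idgen, c_lib_register=c_lib_register)

    pset1.add(pset2)  # works because ids are unique across both sets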

@CKehl CKehl marked this pull request as draft July 7, 2021 18:55
CKehl (Contributor, Author) commented Jul 7, 2021


PS: this is still a draft, so don't bother commenting on all the commented-out code lines. We're still some steps away from merging the PR, and we will also need to talk a bit about what we want to include here and what we want to treat in follow-up PRs.

erikvansebille (Member) commented

Thanks for that heads-up, @CKehl. I see that the code for adding ParticleSets will now become much more complex for the nodes-list. Note that this is not something we only use for the ParticleSet unit tests; it is, for example, also how we add new particles in repeated-release mode. And users may also want to combine ParticleSets in their scripts. Is there no way to make this easier/automatic with the ID generator?

CKehl (Contributor, Author) commented Jul 8, 2021

> Thanks for that heads-up, @CKehl. I see that the code for adding ParticleSets will now become much more complex for the nodes-list. Note that this is not something we only use for the ParticleSet unit tests; it is, for example, also how we add new particles in repeated-release mode. And users may also want to combine ParticleSets in their scripts. Is there no way to make this easier/automatic with the ID generator?

Well, perhaps to clarify, in case that didn't come through in the review: all the new id-generator machinery and the C-library register are obviously only required for the node-based list. So, for users that just take AoS or SoA (the default), nothing changes. I hope that was already apparent from the code.

Now, for the node-based list, we need those global-instance objects - there, they really are a must-have. For the index-based collections like arrays, the ids really don't matter at all: they are only of interest when writing the files (to separate trajectories on file read and during data analysis), and with arrays you'd never get an error if particles have duplicate ids. That's just how that structure works. For node-based lists, the id is the one-and-only structuring element and must be provably unique under any circumstance - there is no index to otherwise distinguish and address particles. Basically: not having unique ids, or having duplicate ids, causes infinite loops. So: the id generator is a pre-requirement for node-based lists (a small illustrative sketch follows below).
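To make that concrete, here is a hedged, plain-Python stand-in (not the actual parcels node class) for how a node list is addressed purely by id:

    class Node:
        """Stand-in for the C-backed node: the id is the ONLY handle."""
        def __init__(self, pid):
            self.id = pid
            self.prev = None
            self.next = None

    def find(head, pid):
        # every access walks the links and compares ids - there is no index;
        # with duplicate ids, this can resolve to the wrong node, and the
        # broken links that result are what cause the infinite loops
        node = head
        while node is not None and node.id != pid:
            node = node.next
        return node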

The C-library register should be a globally coherent instance, and in certain setups it needs to be exactly that. What the register does is allow compilation and loading of external libraries that have nothing to do with the dynamic code generator - the best example being the fixed structure of a node in a C library. Now, all nodes that (potentially) need to be connected with one another (i.e. all nodes in the same code context) need to share the same compiled-library context. Meaning: if you load, say, 'libnode.so' for 2 node instances in two separate contexts, those nodes cannot be connected in C. Sure, we could hypothetically parse all nodes on kernel execution, copy them into an array, and copy them back later, but that would ruin any performance gain we got from the node-based arrangement. So: the structure instances in the C library of a node need to share the same context, because they need to be connected with one another. Now, one could think of giving each particle set its own C-library register without the need to define this externally - the issue then is that one cannot directly merge 2 particle sets, because the set being merged into cannot access the C-stored data of the set being merged. So, if one wants to provide a merging or adding process the way it works now, with pset1.add(pset2), those two psets need to share the same C-library register. Again, as with the id generator, I previously had that as a global variable - more on that below.
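As a hedged illustration of that sharing requirement, a toy stand-in (plain ctypes, not the actual LibraryRegisterC implementation):

    import ctypes

    class SimpleLibRegister:
        """Toy stand-in for LibraryRegisterC: one handle per library path."""
        def __init__(self):
            self._libs = {}

        def load(self, path):
            # load each library only once; every caller gets the SAME handle,
            # so all node structs live in one compiled-library context and
            # can be linked to each other in C
            if path not in self._libs:
                self._libs[path] = ctypes.CDLL(path)
            return self._libs[path]

        def clear(self):
            self._libs.clear()

If pset1 and pset2 each held their own register, two separate ctypes.CDLL('libnode.so') handles would be loaded, and nodes from one could not be connected to nodes from the other.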

In my own branch, I had arranged those things as global variables. That worked in that experimental case, but first of all, maintaining global variables is error-prone for others. The way it is now, I was able to integrate some asserts to check whether those objects were provided - defensive programming style. The other reason for changing this is exactly that I, for example, never spawn more than one particle set - but some users may want to, and in that case the global variable really, really doesn't work. For the C-library register: I never had more than one particle set and never needed to merge sets before; with that merge step of particle sets, it's important to define the C-library instance outside each individual particle set's context. Lastly, the id generator is not just 'one generator': there is a sequential, a spatial and a spatiotemporal generator for creating ids. One could even implement one based on @qubixes' hash function from the particle-particle interaction. With a global variable, you can't really change the type of generator (or: it's very complicated, to say the least). Also, letting users define the id generator lets them choose the limits (in terms of depth and time, at least) to get more distinctive ids that preserve id-closeness of close-by particles. So, in the end, it's those 4 lines for the id generator, which I think isn't really asking too much from people who want to use the structure - of course, given that this is well documented in some notebook or example script.

So, that explains the whole code section that looks like:

    idgen = None
    c_lib_register = None

    [...]

    pset = None
    if pset_mode != 'nodes':
        pset = pset_type[pset_mode]['pset'](...)
    else:
        # node-based lists need one shared id generator and C-library
        # register, created once per script (or here: per test)
        idgen = GenerateID_Service(SequentialIdGenerator)
        idgen.setDepthLimits(0., 1.0)  # depth limits, for more distinctive ids
        idgen.setTimeLine(0.0, 1.0)    # time limits, for more distinctive ids
        c_lib_register = LibraryRegisterC()
        pset = pset_type[pset_mode]['pset'](...)

Lastly, there are the closing sections like:

    [...]
    # tear down in this order: the particle set first, then the id
    # generator, then the C-library register the nodes depended on
    del pset
    if idgen is not None:
        idgen.close()
        del idgen
    if c_lib_register is not None:
        c_lib_register.clear()
        del c_lib_register

They are also only needed for the node-based list. The reason: until now, parcels could rely on Python's garbage collector to allocate and free memory. That garbage collector is something in managed languages (like Python) that cannot be easily steered: it keeps a global memory pool for your software and kills unreferenced objects in an arbitrary order when processing ends. With array structures that is easy - they are contiguous memory blocks that can be thrown away in one go. With nodes, that automatic garbage collection doesn't work, because the order in which objects are deleted from memory matters. Example: when you remove a node in the middle of the list, its removal function needs to tell its previous and next objects to update their links, because their next-neighbour partner changes. Obviously, that requires the prev and next objects to still be valid by the time the middle object is deleted from memory. If the garbage collector throws away those elements in arbitrary order, the neighbour-link updates will fail, because the object one wants to update has itself already been deleted.
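A hedged, plain-Python sketch of that constraint (again a stand-in, not the actual implementation): unlinking touches both neighbours, so removal has to proceed in a controlled order:

    class Node:
        def __init__(self, pid, prev=None, next=None):
            self.id, self.prev, self.next = pid, prev, next

        def unlink(self):
            # both neighbours are dereferenced here - if the garbage
            # collector had already destroyed them (arbitrary order!),
            # these link updates would fail or corrupt the list
            if self.prev is not None:
                self.prev.next = self.next
            if self.next is not None:
                self.next.prev = self.prev
            self.prev = self.next = None

    def teardown(head):
        # manual, in-order removal from the front: each node is unlinked
        # while its remaining neighbour is still valid
        while head is not None:
            nxt = head.next
            head.unlink()
            head = nxt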

So, the TL;DR on the del code: we need to steer the in-order removal of objects manually, and thus cannot rely on the automatic garbage collector anymore. The same goes for the orderly tear-down of the id generator and the C-library register, which is why those functions need to be called in exactly that order. Again: explicitly deleting particle set objects is generally good practice, but it is not required for SoA and AoS - for nodes, it needs to be there.

CKehl added a commit that referenced this pull request Jul 8, 2021
CKehl added a commit that referenced this pull request Jul 14, 2021
…ration when reusing kernels from particle sets that use different particle objects.
CKehl added a commit that referenced this pull request Jul 14, 2021
…torial field sampling and checked that tests run well through
CKehl added a commit that referenced this pull request Jul 14, 2021
…torial field sampling and checked that tests run well through
CKehl added a commit that referenced this pull request Jul 15, 2021
…the test-failure in 'test_advection.py', failing the Linux and MacOS tests
CKehl added a commit that referenced this pull request Jul 15, 2021
…rary version drawn from conda-forge to circumvent the file creation error.
CKehl added a commit that referenced this pull request Jul 15, 2021
…endencies, which is necessary for double-linked node lists to run
CKehl added a commit that referenced this pull request Jul 15, 2021
…s for the nodal structure. The Python-based node list will come by time, as we first need to (re-)implement the C-compilation interface with the multi-level compilation
CKehl and others added 15 commits January 4, 2022 18:05
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
…on in the ParticleCollectionInterableNodes to prevent infinite loops in list-comprehensions of very long, highly dynamic node-based particle sets.
…rdown and errors in the ID distribution for nodes.
…ipt: it adds particles via a new (same) collection. In that case, internally particle data are extracted from one collection and inserted into the target. In that case, we ALWAYS need to generate a new ID for the particle, because its original ID (and index) are removed as soon as the original collection is destroyed. FIXED.
… and removing, as well as splitting and merging, of non-indexable particle sets does not preserve the element order, which needs to be accounted for in the assert-statements.
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
@CKehl CKehl force-pushed the integrate_nodes_to_master branch from 6c3d434 to a041ba6 on January 4, 2022 17:08
erikvansebille (Member) commented

> @erikvansebille are you through with all the files and comments? I did not see any questions or remarks on the message_service.py, for example, or on some of the tests, so I thought it would be good to get your confirmation for the merge into 2.x first.

Yes, as I mentioned on 22 December, I have now gone through all the files. I just didn't have comments on each. Note that there are still a few open comments for you to address; I just went through them today and wrote an extra message under those where I think you haven't addressed them yet.

With the massive changes in this PR, we're really pushing the boundaries of what is still feasible and tractable in GitHub's tools...

CKehl and others added 8 commits January 5, 2022 14:56
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
…nd disabling the typed writing for nodes, as it didn't work the way it was intended.
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
…se for baseparticleset.py BaseParticleFile::_create_trajectory_records(coords).
@CKehl CKehl merged commit 3e71958 into 2.x Jan 6, 2022