Integration of the double-linked node list ParticleSet structure to 'master' #1034
Conversation
@erikvansebille wow - ehm. I don't know what's going on here with the HDF5 version of those reference files for the notebook tests, but it seems those test files need to be fixed again. I didn't really change anything in the NetCDF writer of the ParticleSet here (i.e. what was the last pull request integrated into the branch?), but it seems some package (e.g. …
…the test-failure in 'test_advection.py', failing the Linux and MacOS tests
…rary version drawn from conda-forge to circumvent the file creation error.
…endencies, which is necessary for double-linked node lists to run
…s for the nodal structure. The Python-based node list will come in time, as we first need to (re-)implement the C-compilation interface with the multi-level compilation
…all packages so it can be used from installed parcels packages.
…llow multi-stage library compilation.
…library-C interface, to facilitate binding ctypes functions of pre-compiled libraries, circumventing the JIT code generator for ctypes portions that don't need to be bound 'just-in-time'. Could also be used for ParcelsRandom functions.
…acilitate a globally-unique ID generator
…ssage passing interface MPI) service to exchange and synchronize ID generation across MPI instances
…ging services into the tools-package init-function
@erikvansebille please have a look at the changes in "tests/test_particle_sets.py" and at the new layout of the tests. I needed to make quite some general changes to the test layout to actually work with the double-linked list. In that context, the id generator and C-library register must be a context-exclusive, unique and consistent instance to guarantee correct functioning of the list itself, which means those need to be defined globally in each simulation script. If one only ever has a single particle set, then they can be created internally. That said, some tests for example create 2 particle sets in order to merge them, and those need to share the same global id generator and library register. Managing ids in an MPI context is done in the backend (that was actually quite a complex thing to make work, but it works), so multiple MPI instances are handled internally, but within one script they need to be defined by the user - or, in the case of the tests, for each test individually. Overall, those tests give you a first clue of what working with those double-linked list particle sets will look like.
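For illustration, a minimal sketch of that test layout as described above - note that the generator/register class names, their import path and the `idgen` / `c_lib_register` keyword arguments are placeholders here, not necessarily the exact API of this PR:

```python
import numpy as np
from parcels import FieldSet, ParticleSet, ScipyParticle
# Placeholder import - the actual module path and class names may differ in this PR:
from parcels.tools import SpatioTemporalIdGenerator, LibraryRegisterC

# One id generator and one C-library register, shared by *every* node-based
# ParticleSet created in the same script (or the same test).
idgen = SpatioTemporalIdGenerator()
c_lib_register = LibraryRegisterC()

fieldset = FieldSet.from_data({'U': np.zeros((2, 2)), 'V': np.zeros((2, 2))},
                              {'lon': [0., 1.], 'lat': [0., 1.]})

# Both sets receive the *same* generator and register instances, otherwise
# their nodes could not be linked (and hence not merged) on the C side.
pset_a = ParticleSet(fieldset, pclass=ScipyParticle, lon=[0.1], lat=[0.1],
                     idgen=idgen, c_lib_register=c_lib_register)
pset_b = ParticleSet(fieldset, pclass=ScipyParticle, lon=[0.9], lat=[0.9],
                     idgen=idgen, c_lib_register=c_lib_register)

pset_a.add(pset_b)  # merging works because both share the same id/library context
```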
PS: this is still a draft, so don't bother commenting on all the commented-out code lines. We're still some steps away from merging the PR, and we will even need to talk a bit about what we wanna include here and what we wanna treat in follow-up PRs.
Thanks for that heads-up, @CKehl. I see that the code for adding ParticleSets will now become much more complex for the nodes-list. Note that this is not something we only use for the ParticleSet unit tests; it is for example also how we add new particles in repeated-release mode. And users may also want to combine ParticleSets in their scripts. Is there no way to make this easier/automatic with the ID generator?
Well, so, perhaps to clarify if that didn't come through in the review: all the new id generator things and the C-library register are obviously only required for the node-based list. So, for users that just take AoS or SoA (default), nothing changes. Hope that already was apparent from the code.

Now, for the node-based list, we need those global-instance objects, and there they really are a must-have. For the index-based collections like arrays, the ids really don't matter at all - they are actually just of interest when writing the files (to separate trajectories on file read and data analysis), but you'd never get an error if particles have duplicate ids with arrays. It's just how that structure works. For node-based lists, the id is the one-and-only structuring element and must be provably unique under any circumstance - there is no index to otherwise distinguish and address particles. Basically: not having that, or having duplicate ids, causes infinite loops. So: id generator - pre-requirement for node-based lists.

The C-library register should be a globally coherent instance, and in certain setups it really needs to be globally coherent. What the register does is allow compilation and loading of external libraries that have nothing to do with the dynamic code generator - best example: the fixed structure of a node in a C library. Now, all nodes that (potentially) need to be connected with one another (i.e. all nodes in the same code context) need to share the same compiled-library context. Meaning: if you load, say, 'libnode.so' for 2 node instances in two separate contexts, those nodes cannot be connected in C. Sure, we could hypothetically parse all nodes on kernel execution, copy them into an array, and copy them back later, but that is ... well, let's just say: we do not wanna do that cause it will ruin any performance gain we got from a node-based arrangement. So: those structure-instances in the C library of a node need to share the same context because they need to be connected with one another. Now, one could think of giving each particle set its own C-library register without the need to define this externally - the issue then: one cannot directly merge 2 particle sets, because the one merging-into cannot access the C-stored data of the one that is being merged. So, if one wants to provide a merging- or adding process the way it is now with …

Now, in my own branch I arranged those things as 'global variables'. That worked in that experimental case, but first of all maintaining global variables by others is error-prone. The way it is now, I was able to integrate some asserts to check if those objects are not provided - defensive programming style. The other reason for changing this is exactly that I, for example, don't spawn more than one particle set - but some users may want to. And in that case, the global variable really, really doesn't work. For the C-library register, I never had more than one particle set and never needed to merge them before. With that merge-step of particle sets, it's important to define the C-library instance outside each one's individual particle set context.

Lastly, the id generator is not just 'one generator': there is a sequential, a spatial and a spatiotemporal generator to generate ids. One could even implement one based on @qubixes' hash-function from the particle-particle interaction. With a global variable, you can't really change the type of generator (or: it's very complicated, to say the least).
Also, letting the user define the id generator lets him/her choose the limits (in terms of depth and time, at least) to get more distinctive ids that preserve id-closeness of close-by particles. So, in the end, it's those 4 lines for the id generator, which isn't really asking too much, I think, of people who want to use the structure - of course given that this is well-documented in some notebook or example script. So, that explains the whole code section that looks like:
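Roughly of this shape - a hedged sketch, where the generator class and the two setter methods are assumed names rather than the confirmed API of this PR:

```python
# Placeholder names - the exact generator class and its setters may differ in the PR.
from parcels.tools import SpatioTemporalIdGenerator

idgen = SpatioTemporalIdGenerator()
idgen.setDepthLimits(0.0, 1000.0)   # depth range used to discretise the spatial part of the id
idgen.setTimeLine(0.0, 86400.0)     # time range used to discretise the temporal part of the id
```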
Lastly, there are the closing sections like:
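Continuing the sketch above, the closing section amounts to an explicit, ordered teardown at the end of the script; the method names below are again placeholders, not the confirmed API:

```python
# Placeholder teardown sketch - explicit, ordered cleanup instead of relying
# on Python's garbage collector (the actual method names may differ):
pset_a.delete()          # unlink and free the nodes of each particle set first
pset_b.delete()
idgen.close()            # then release the (possibly MPI-backed) id service
c_lib_register.clear()   # and unload the shared node library last
```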
They are also just needed for the node-based list. Reason: until now, parcels could rely on Python's garbage collector to allocate and free memory. That garbage collector is something in managed languages (like Python) that cannot be easily steered: it just has some global memory pool that your software uses, and it just kills unreferenced objects in an arbitrary order when the processing ends. With array structures that is easy - there are contiguous memory blocks that can be thrown away in one go. With nodes, that automatic garbage collection doesn't work, because the order in which objects are deleted from memory matters. Example: when you remove a node in the middle of the list, then its removal function needs to tell its previous and next objects to update their links, cause their next-neighbour partner changes. Obviously, that requires those objects … So, TL;DR on the …
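To illustrate that ordering requirement with a toy example (plain Python, not the C-backed Node of this PR): removing a node must actively re-wire its neighbours before the object itself goes away, which is exactly the guarantee an arbitrary-order garbage collection cannot give.

```python
class Node:
    """Toy doubly-linked node - illustrative only, not the PR's Node class."""
    def __init__(self, id):
        self.id = id
        self.prev = None
        self.next = None

    def unlink(self):
        # Re-wire the neighbours *before* this node disappears; if prev/next
        # were already collected (arbitrary order), the list would be broken.
        if self.prev is not None:
            self.prev.next = self.next
        if self.next is not None:
            self.next.prev = self.prev
        self.prev = self.next = None


# Build a three-node list a <-> b <-> c, then remove the middle node.
a, b, c = Node(1), Node(2), Node(3)
a.next, b.prev = b, a
b.next, c.prev = c, b

b.unlink()  # explicit, ordered removal of the middle node
assert a.next is c and c.prev is a
```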
…l, old flake8 parser did not flag up.
…ration when reusing kernels from particle sets that use different particle objects.
…torial field sampling and checked that tests run well through
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
…on in the ParticleCollectionInterableNodes to prevent infinite loops in list-comprehensions of very long, highly dynamic node-based particle sets.
…rdown and errors in the ID distribution for nodes.
…a dead-instance generator for IDs.
…ipt: it adds particles via a new (same) collection. In that case, particle data are internally extracted from one collection and inserted into the target, and we ALWAYS need to generate a new ID for the particle, because its original ID (and index) are removed as soon as the original collection is destroyed. FIXED.
… and removing, as well as splitting and merging, of non-indexable particle sets does not preserve the element order, which needs to be accounted for in the assert-statements.
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
…rocedures and tqdm progress bars.
Yes, as I mentioned on 22 December, I have now gone through all the files. I just didn't have comments on each. Note that there are still a few open comments for you to address; I just went through them today and wrote an extra message under those where I think you haven't addressed them yet. With the massive changes in this PR, we're really pushing the boundaries of what is still feasible and tractable in GitHub tools...
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
…nd disabling the typed writing for nodes, as it didn't work the way it was intended.
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
…se for baseparticleset.py BaseParticleFile::_create_trajectory_records(coords).
…torial_sampling.py.
Major effort to now include the double-linked node list as collection data structure for ParticleSets. Necessary prelude to the `particle.add(..)` function extension (as well as `particle.split(...)`) for internal features like fragmentation.