Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: IDAKLU / casadi serialization error - Object written in version 2 but can only read in version 1 #3100

Closed
Mrzhang-hub opened this issue Jul 4, 2023 · 17 comments · Fixed by #3637
Assignees
Labels
bug Something isn't working in-progress Assigned in the core dev monthly meeting

Comments

@Mrzhang-hub
Copy link

PyBaMM Version

23.4.1

Python Version

3.9

Describe the bug

RuntimeError:C:..\casadi\core\serialization_stream.cpp:355:Assertion"load_version==v" failed:ProtoFunction failed. Object written in version 2 but can only read in version 1.

Steps to Reproduce

import pybamm
import numpy as np
model = pybamm.lithium_ion.SPM()
geometry = model.default_geometry
param = model.default_parameter_values
param.process_model(model)
param.process_geometry(geometry)
mesh = pybamm.Mesh(geometry,model.default_submesh_types,model.default_var_pts)
disc = pybamm.Discretisation(mesh,model.default_spatial_methods)
disc.process_model(model)
t_eval = np.linspace(0,3600,100)
if pybamm.have_idaklu():
     klu_sol = pybamm.IDAKLUSolver(atol=1e-8,rtol=1e-8).solve(model,t_eval)
plot = pybamm.Quickplot(klu_sol)
plot.dynamic_plot()

Relevant log output

No response

@Mrzhang-hub Mrzhang-hub added the bug Something isn't working label Jul 4, 2023
@brosaplanella
Copy link
Sponsor Member

Strange, it works at my end (even with version 23.4.1). Could be that when you upgraded to the newer PyBaMM version, casadi didn't. Can you try upgrading both PyBaMM and Casadi to their respective latest versions?

@Mrzhang-hub
Copy link
Author

I have upgrade pybamm and casadi to their newest version.(pybamm=23.5 casadi=3.6.3) But the problem still exist. Is that has something to do with the version of c++?

@Mrzhang-hub
Copy link
Author

I downgraded pybamm to version 22.10 and casadi to version 3.5.5, which solved the problem.

@valentinsulzer
Copy link
Member

Reopening as suggested in #3193 (comment)

We need to find out why the newer pybamm or casadi version breaks this

@brosaplanella brosaplanella added the in-progress Assigned in the core dev monthly meeting label Sep 4, 2023
@jsbrittain
Copy link
Contributor

I'm having some trouble recreating this issue, even with the specific verison of PyBaMM and python mentioned, but I do think I know where the problem is coming from.

The idaklu solver has a Python component and a C++ component, both of which link to casadi. The problem appears to be that under certain circumstances these two components seem to be getting out-of-sync with each other, causing the above reported serialisation error. This doesn't typically happen since both components usually link to the same libraries which are downloaded as part of the PyBaMM install.

Simply rerunning the install procedure on the latest version of PyBaMM should be enough to update PyBaMM and resolve this. But if that doesn't work then I would recommend performing a clean install of PyBaMM, i.e. remove the previous version first, which removes any compiled idaklu code, allowing a fresh install to compile and link using the provided libraries.

If that still doesn't work then it is possible that there are environment variables pointing to a separate version of casadi on the computer (which could be getting picked up during installation) - in that case the user can remove or upgrade that version of casadi to match the version provided by PyBaMM, and then rerun the PyBaMM install scripts.

As a side note, this specific problem seems isolated to updates in casadi's serialization format (which are not likely to change that often). I can't suggest too much more without a reliable test-case. The error thrown by casadi ('RuntimeError') is quite generic so intercepting it in an attempt to provide a more directed set of instructions to the user may not be reliable. On top of that v23.4.1 still used tox, so the whole install system has been overhauled since then.

I think the best thing to do at this point is to close this issue, noting the above, and refer back in the future if we run into similar problems again.

@jsbrittain jsbrittain changed the title [Bug]: IDAKLU solver error [Bug]: IDAKLU / casadi serialization error - Object written in version 2 but can only read in version 1 Dec 1, 2023
@jsbrittain
Copy link
Contributor

Reopening as the issue has reemerged, including in pip installs of pybamm. See #3193 for related discussion.

@jsbrittain jsbrittain reopened this Dec 1, 2023
@jsbrittain
Copy link
Contributor

Some progress - Given the new information I have managed to recreate the serialisation error on my (working) machine by taking the following actions:

  1. Removed/renamed ~/local to remove any existing (local) casadi installs.
  2. Build casadi from source following their instructions (this required casadi version 3.5.5 which builds libcasadi.3.6.dylib, not version 3.6.0 which actually builds libcasadi.3.7.dylib.
  3. Reinstall PyBaMM using nox (pybamm-requires then dev)
  4. Test the installation using the script provided in Idaklu error #3193

After that I receive the Assertion "load_version==v" failed: DeSerialization of ProtoFunction failed. Object written in version 2 but can only read in version 1. error. This only impacts the new build/install, my previous installation still works fine.

So it seems that this error is in-fact due to a mismatch in the version of casadi used in Python conflicting with the version that IDAKLU uses to link at build time. Removing casadi installs outside of PyBaMM prior to install should resolve this, but I acknowledge that this is not an ideal solution and the installer should instead prioritise the python install over the system install to maintain synchrony. Further, it is not clear to me how or why this is impacting the wheels (did the wheel itself build using two casadi versions?). Now that we have a reprodicuble testbed for examining the issue we should be able to make further progress.

@agriyakhetarpal
Copy link
Member

agriyakhetarpal commented Dec 1, 2023

Tested this on my end and I think the error does not appear when a pair of casadi 3.6.0 and subsequent patch releases (3.6.X) is used for linkage and runtime, so it is ominous that there were some changes between the minor releases, i.e., between 3.5.X and 3.6. We should note that the casadi developers either delete or yank all older releases on PyPI; I am unable to install casadi==3.5.5 via pip, but this case might not be true for the vcpkg registry we have set up for casadi, which was one of the hypotheses on where the error might have been coming from. From the discussion in #3193 it seems that this was just on Windows; however, @pipliggins seems to have faced the error on macOS (not sure if vcpkg as a package manager is available on macOS?). Also, maybe the casadi developers deleted the 3.5.5 release from vcpkg as well and it led to an unreleased or nightly released version of casadi being used?

I can confirm that through the changes in the CMakeLists.txt file in #3569, pip uses the correct installation of casadi from the build-time dependency table to provide to CMake as a part of the linkage procedure. If we can get to the bottom of this, we should probably add some notes in the release workflow to ensure that the build-time and run-time versions of casadi are provided as the same for users and that the vcpkg registry is updated periodically. I am not sure what led to the creation of a vcpkg registry in the first place (I assume there could have been a discrepancy in the package's CMake configuration), but with versions >=3.6 and above it might be a good idea to try linking with the Python casadi on Windows again and see if that works, we might be able to put the registry repository to rest after that.

@jsbrittain – on a side note, does the serialisation error disappear if you manually install casadi==3.5.5 in a virtual environment wherein PyBaMM has been installed (not sure how to do so but maybe it's present in a wheelhouse elsewhere of PyPI?) and try to instantiate the IDAKLU solver, i.e., thereby having the same version as the one you had installed from source and linked at the time of compilation (3.5.5)?

P.S. Since we are not sure of how many users have been affected and not sure of the fact whether it is just for all Windows users or it is only for a smaller portion of users across all platforms (though perhaps if it were for the former, we would have received several times more complaints), should we create a patch 23.9.1 release?

@jsbrittain
Copy link
Contributor

I think I may have resolved the install problem by editing this line of CMakeLists.txt to read find_package(casadi CONFIG PATHS ${CASADI_DIR} REQUIRED NO_DEFAULT_PATH) (i.e. add NO_DEFAULT_PATH). I believe it was previously still checking system paths for casadi first, even after locating the site-packages version. @pipliggins could you check this on your machine as-well please? @agriyakhetarpal If we can confirm this it would be worth adding to your PR #3569 .

@pipliggins
Copy link
Contributor

@jsbrittain @agriyakhetarpal The CMakeLists.txt fix works for me too!

@agriyakhetarpal
Copy link
Member

Thanks, adding the necessary change(s) to #3569! I would suggest we should keep this issue open for the time being.

agriyakhetarpal added a commit to agriyakhetarpal/PyBaMM that referenced this issue Dec 1, 2023
agriyakhetarpal added a commit to agriyakhetarpal/PyBaMM that referenced this issue Dec 1, 2023
…for alternative `casadi` installations

Co-Authored-By: jsbrittain <98161205+jsbrittain@users.noreply.github.com>
@agriyakhetarpal
Copy link
Member

I have looked into this in the past week and this week; I think I can see where the issue comes from while building the Windows wheels. By examining the logs from one of the workflow runs for my fork with increased build verbosity, this is the line in specific:

-- Downloading https://github.com/casadi/casadi/archive/nightly-ship.tar.gz -> casadi-casadi-nightly-ship.tar.gz...
-- Extracting source C:/vcpkg/downloads/casadi-casadi-nightly-ship.tar.gz

which suggests that vcpkg is indeed downloading an older release of casadi, which I subsequently traced to these lines of pybamm-team/casadi-vcpkg-registry:

https://github.com/pybamm-team/casadi-vcpkg-registry/blob/70f49f3c22fee4874fb8a36ef1a559f2c185ef1f/ports/casadi/portfile.cmake#L1-L7

and the corresponding release is very old, from March 2022: https://github.com/casadi/casadi/releases/tag/nightly-ship. I assume to fix this issue we need to update the registry to download the 3.6.3 release. I managed to do that on my fork at agriyakhetarpal/casadi-vcpkg-registry and triggered a workflow run: the Windows wheels were built with 3.6.3 here: https://github.com/agriyakhetarpal/PyBaMM/actions/runs/7127859881/job/19408666456

as shown by the lines:

-- Downloading https://github.com/casadi/casadi/archive/3.6.3.tar.gz -> casadi-casadi-3.6.3.tar.gz...
-- Extracting source C:/vcpkg/downloads/casadi-casadi-3.6.3.tar.gz

I am not sure if this will fix the serialisation error but I suppose it is worth a try. @chmabaur, could you try installing the Windows wheel from here: https://github.com/agriyakhetarpal/PyBaMM/actions/runs/7127859881 and see if the error disappears?

If it does resolve everything, I can put together a PR from my fork to the registry and put up a note in the release workflow and other appropriate places to ensure that the registry is bumped to a suitable casadi version whenever we build the wheels. Another method would be to pin the build-time casadi version in pyproject.toml while keeping the run-time version without a pin.


P.S. I did try using the Python casadi instead of the vcpkg casadi on Windows while building the wheel, but it had failed with an error trace that I do not understand fully: https://github.com/agriyakhetarpal/PyBaMM/actions/runs/7126976800. I would still suggest that if moving to a Python-based installation for linkage is possible, we should try that – it removes the burden of maintenance of the registry altogether on our end

@jsbrittain
Copy link
Contributor

@agriyakhetarpal Removing the registry definitely sounds like the most maintainable solution going forwards. I took a quick look at the error from your Windows wheel build with python-casadi and, comparing it to the PyPI package listings, it looks as though there may be a misreferenced file in the casadi cmake targets (libcasadi.lib is requested, but only casadi.lib exists); I'll need to get hold of a Windows machine to see if this actually resolves the build issue, but even if it does that would mean an upstream change in the casadi codebase (we might be able to hotfix it at our end in the meantime, but that wouldn't be stable for subsequent releases). Have you tried this with the latest casadi release (3.6.4)?

@agriyakhetarpal
Copy link
Member

I haven't tried it with the latest casadi release yet since they don't seem to provide any minor version compatibility of sorts (based on the many, many changes between 3.6.3 and 3.6.4 as noted in the release notes); but there should be no harm in trying that—I will update my fork of the registry again, try both the Python and vcpkg casadi and get back to this—maybe they fixed this on their end. I had chosen 3.6.3 specifically to synchronise the versions for the build-time and run-time dependencies, but I suppose pip downloads the latest version anyway unless we don't pin them.

agriyakhetarpal added a commit to agriyakhetarpal/PyBaMM that referenced this issue Dec 8, 2023
agriyakhetarpal added a commit to agriyakhetarpal/PyBaMM that referenced this issue Dec 8, 2023
@agriyakhetarpal
Copy link
Member

Okay, I tried it with version 3.6.4 with both the Python casadi on and the vcpkg casadi on these branches respectively in my fork: agriyakhetarpal/try-windows-python-casadi-3.6.4-linkage and agriyakhetarpal/try-windows-vcpkg-casadi-3.6.4-linkage.

The Python casadi workflow failed again with the same error as the earlier 3.6.3 one but the vcpkg one passed – I am not familiar enough with vcpkg yet to know how to adjust the name of the .lib file it looks for in the Python casadi directory or to know how to write a .patch file for the same task.

@chmabaur – in addition to the wheels linked against casadi 3.6.3 in the link above, I would request you to try these wheels too: https://github.com/agriyakhetarpal/PyBaMM/actions/runs/7137563916. I hope at least one of these two sets of wheels resolves the serialisation error you have been receiving. (Perhaps we can look into doing a patch release if the error is found to be prevalent across a larger section of Windows users, now that we know where it comes from).

@chmabaur
Copy link
Contributor

chmabaur commented Dec 8, 2023

@agriyakhetarpal Thanks a lot for your efforts to solve this issue! I have tested both wheel files in separate virtual environments on Windows for python version 3.9. I am glad to say that in both cases the serialization error disappeared.

I am not able to follow your discussions above, but I hope it is ok that pip show prints the casadi version 3.6.4 in both cases.

@agriyakhetarpal
Copy link
Member

Thanks for confirming! Yes, pip show casadi displaying 3.6.4 as the output is fine. I guess the gist is that the casadi has to be 3.6.X for the serialisation to work properly and a difference between the patch versions at runtime should be okay, but linking and running with different minor versions (say, 3.5.X vs 3.6) might cause issues as has been the case here.

Tagging @martinjrobins to request for write access to the casadi vcpkg registry (I think merging a PR from my end there will require a further, redundant update to the git-trees so it would be great to directly push) and then this issue can be closed with an update to the baseline commit in the vcpkg configuration file.

P.S. I will open a new, separate issue to explore the possibility of moving away from the registry and getting back to using the Python casadi again. It looks like it can be fixed with a patch on our end, but I don't have enough experience to comment about that procedure haha

agriyakhetarpal added a commit to pybamm-team/casadi-vcpkg-registry that referenced this issue Dec 19, 2023
See pybamm-team/PyBaMM#3100 for more details and discussion.

This fixes the serialisation error coming from a CasADi ProtoFunction which was due to an old version being linked on Windows wheels.
agriyakhetarpal added a commit to pybamm-team/casadi-vcpkg-registry that referenced this issue Dec 19, 2023
See pybamm-team/PyBaMM#3100 for more details.

This commti updates the git-tree to accommodate the version bump to `casadi` `3.6.4`.
agriyakhetarpal added a commit to agriyakhetarpal/PyBaMM that referenced this issue Dec 19, 2023
agriyakhetarpal added a commit to agriyakhetarpal/PyBaMM that referenced this issue Dec 19, 2023
@agriyakhetarpal agriyakhetarpal self-assigned this Dec 19, 2023
js1tr3 pushed a commit to js1tr3/PyBaMM that referenced this issue Aug 12, 2024
…for alternative `casadi` installations

Co-Authored-By: jsbrittain <98161205+jsbrittain@users.noreply.github.com>
js1tr3 pushed a commit to js1tr3/PyBaMM that referenced this issue Aug 12, 2024
js1tr3 pushed a commit to js1tr3/PyBaMM that referenced this issue Aug 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working in-progress Assigned in the core dev monthly meeting
Development

Successfully merging a pull request may close this issue.

7 participants