-
Notifications
You must be signed in to change notification settings - Fork 15.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
python CopyFrom segmentation fault on version 4.24.0 #13485
Comments
I've seen this issue with python 3.9 and 3.11 as well. It's blocking our project's CI runs. |
Not sure why but I can NOT reproduce in google3 internal CL with
Nor opensource on my linux:
|
Just a thought, could it be that the protos that we're intending to use are "compiled" with a previous protoc version, and the issue is that this new version throws in that case? In our project, it was failing when trying to use some protos from a TensorFlow dependency (which was released before this new protobuf release from yesterday). |
#13485 PiperOrigin-RevId: 555287499
Can you help to verify:
3, Does the segment fault still raise for struct_pb2.Struct()
|
This might actually be possibly how it's happening, as I could not produce a simple repro outside of our internal product.
I am working to answer your other questions, but I can confirm (1) that we do not depend on TensorFlow. |
Confirming for (3) that this does not fail with |
Thank you for these info. As both of you mentioned that the code is depending on old code gen, can you try if upgrade these gen files with newest protoc to see if it can resolve the issue? |
This gives us some more insights, that might inform debugging this issue. We use Upgrading to the latest version of |
Are you asking this just to validate that the issue is not present if we do that? Or as a proposed solution for users encountering this issue? Note that a lot of times (like in our case with the TensorFlow dependency) we're depending on a package that is not owned by us, generally, I don't thin it's feasible to require that from every user that gets broken by this change. Rather, this would restrict projects from using newer versions of protobuf libraries unless all dependencies also migrate to the latest version. For validation, given that the protos are from dependencies we don't own, I expect you would have more context on what a "packaged" version of a proto would look like, or how to produce and use it, so it should be easier for you to find a way to produce some minimal dependency with the older version, and use it with a newer version, using virtualenv or something like this? |
Understand that you may not able to upgrade the protoc version. The "produce some minimal dependency with the older version" under design currently. Mert mentioned "This failure started with us picking up version 4.24.0 with yesterday's release. Reverting back to any version prior fixes the issue." Can you help to confirm that if 4.23.4 do not produce the segment fault and also make sure 4.23.4 is using upb python.
If 4.23.4 works, it will make us easier to identify which change may trigger the issue |
For us, version 4.23.4 works fine. Our work around was to restrict the protobuf dependency to be < 4.24: tensorflow/tensorboard@734988d, and I can see in the logs of our CI runs that indeed, protobuf==4.23.4 was used. |
If it is using protoc 4.21.10, I think it is not related to old gen code. protoc 4.21.10 and 4.24.0 do not have differences on upb python generated code. Just compared with runtime 4.23.4 and 4.24.0: However we still do not have enough info to repo which makes us hard to go forward.... As Mert said "I could not produce a simple repro outside of our internal product." Wondering if Adrian can provide a repo? |
+1'ing this that 4.23.4 worked fine on our end. This code is stressed daily, and the immediate hours after 4.24.0 release is when failures started happening. @anandolee, sounds like you have a good clue about what might be causing the crash. |
I am having this problem too. |
@anandolee, please let us know how we can help with getting a fix for this issue. We are happy to test private releases on our end to verify a hypothetical fix. |
…ffers/protobuf#13485 We should go forward with memory copy once able to repo PiperOrigin-RevId: 557178075
…ffers/protobuf#13485 We will go forward with memory copy in 25.0 PiperOrigin-RevId: 557178075
We can rollback the change to serialize/parse wrap in 24.1 release to temporary unblock projects. But main will keep the same to use memory copy and 25.0 release will back to memory copy. This means the bug will return in 25.0 if we don't find an adequate repro in time. Wondering if you can continue help us with repro for this issue? Also, can you help to verify if protocolbuffers/upb#1486 work? |
@anandolee, I can try to verify if that fix works. Can you share instructions on how to install a private release containing that fix? |
OK, we will test if 24.1 fixes this bug and that can be the verification that this is the root cause of the crash. |
@anandolee We have recently tested the crashing test on version 4.24.1 and I can confirm that the crash does not repro with this CopyFrom optimization disabled. You said that 4.25.0 will re-introduce this memory copy. Does that mean that the bug is slated to be re-introduced as-is, or are you working on an implementation that won't seg fault? |
We are hoping to figure out the root cause and fix it in the intervening time. If you can try to repro against protobuf mainline that would be helpful |
Happy to help, I'm assuming you're suggesting building the upb backend locally and testing against that. Are there dev instructions anywhere to guide with that? |
Good news, I have a repro 🎉 . Turns out, the issue isn't to do with copying one empty initialized message into another. It's calling Here is what I have:
I have built this via Then, I execute the following test:
You should get a segmentation fault on 4.24.0, and successful execution on any other version. |
Thank you Mert! I can reproduce the error with your new example! Will work on a real fix! |
Thanks for the update Jie and Matt! |
…eep copy from memory fix protocolbuffers/protobuf#13485 PiperOrigin-RevId: 559643466
…eep copy from memory fix #13485 PiperOrigin-RevId: 559643466
…eep copy from memory fix protocolbuffers/protobuf#13485 PiperOrigin-RevId: 559643466
…eep copy from memory fix #13485 PiperOrigin-RevId: 559643466
…eep copy from memory fix protocolbuffers/protobuf#13485 PiperOrigin-RevId: 559643466
…eep copy from memory fix protocolbuffers/protobuf#13485 PiperOrigin-RevId: 559870202
…nstead of deep copy from memory fix #13485 PiperOrigin-RevId: 559867160
…nstead of deep copy from memory fix #13485 PiperOrigin-RevId: 559867160
…nstead of deep copy from memory fix #13485 PiperOrigin-RevId: 559867160
…nstead of deep copy from memory fix #13485 PiperOrigin-RevId: 559888172
The issue should been fixed by #1514 |
@anandolee Thank you, this is great to hear! |
protocolbuffers/protobuf#13485 was fixed with protocolbuffers/upb#1514, so this shouldn't be an issue anymore. If CI passes successfully with this change, then it should be fine to merge.
…sion 2.15.0 Adrian RC (10): Restricts protobuf dependency to < 4.24 due to protocolbuffers/protobuf#13485 (#6538) Drop support for Python 3.8 Add 2.14.0 relnotes to RELEASE.md Bump version at HEAD to 2.15.0a0 Adds explicit dependency on `six` package. (#6581) Updates script to sync TF compat protos into TB repo. Updates compat TF protos by running our update.sh script. Pin TF version used by CI to 2.15.0.dev20231011 (closest to their branch cut) Add 2.15.0 relnotes to RELEASE.md TensorBoard 2.15.0 Brian Dubois (14): Tweak some patch and upgrade documentation. (#6520) Change return type of list_hyperparameters() operation. (#6537) Hparams: Generate metric_infos from data provider session groups. (#6539) Hparams: Fix metric info generation for sessions without run names. (#6541) Hparams: Generate metric values for data provider-based session groups. (#6543) Hparams: Fix internal Builder failure for test. (#6545) Hparams: Allow sessions without name to pull all run,tag combinations as metrics. (#6546) Filter metric values in /list_session_groups. (#6550) Import Fix: Remove DebuggerState from AppState. (#6554) Hparams: Support excluding metric information in HTTP requests. (#6556) Migrate hparams to use gtag.js-based logging for Google Analytics. (#6586) Hparams: Generate map of runs to hparams by matching with session names. (#6600) Give fixed width to runs table's checkbox and color columns. (#6637) Chore: Remove markdown freewisdom reference. (#6640) James (8): MDC Migration: Apply tb theme to new mat components (#6562) Data Table: enable data table by default (#6563) HParams: Default Runs table sorting to ascending (#6587) HParams: Make icons buttons in data table header (#6594) HParams: Do not scroll regex filter in the horizontal direction (#6596) Chore: Remove wheel from requirements.txt (#6570) Add data-id attribute to RunsDataTable rows for internal tests (#6610) MDC Migration: Merging feature branch with all new Angular Components Styled as desired. (#6631) Joyce (1): Set Token Permissions for github workflows (#6583) Leo Gallucci (1): Relax google-auth-oauthlib dependency to breaking changes (#6609) Nick Groszewski (1): Update data server build step to run with ubuntu 20.04 and change platform tag (#6636) Riley Jones (24): HParams: Create hparams data source to fetch data from the hparams plugin (#6535) HParams: Create hparams effects to trigger data to be fetched from the hparams plugin (#6540) HParams: Create filter dialog component (#6493) Bug Fix: Start using expect_angular_legacy_material_checkbox (#6547) Hparam: Add selectors to retrieve data written by the new hparams effects (#6544) HParams: Add new state property to store hparam and metric filters (#6553) Hparams: refactor arguments provided to `getDashboardRuns` (#6555) Chore: Change all copybara expects to legacy variants (#6549) Bug Fix: Update the port scanning logic to properly skip ports in use (#6560) HParams: Allow runs in the time series dashboard to be filtered by hparam (#6488) HParams: Stop fetching hparams data using run effects (#6561) Chore: Fix internal sync (#6566) Bug Fix change run effects to use combine latest instead of forkjoin (#6568) Revert HParams: Stop fetching hparams data using run effects (#6561) (#6571) Suggested Cards: Create data source for persisting card interactions to localStorage (#6439) HParams: Add effect to remove hparam filters when the related column is removed (#6572) HParams: Bug fix - removing a column closes the filter modal (#6589) HParams: Align sub context menus to the top of the button which opened them (#6590) HParams: Enable `hparamsInTimeSeries` by default (#6565) Chore: Fix CI breakage (#6593) Bug Fix: update the range input widget to only show the slider when min and max are different (#6591) Improve styling of context menus (#6592) HParams: Bug fix when adding table columns copy them rather than mutate them (#6597) Hparams: Fix repainting bug (#6642) Stanley Bileschi (1): Remove markdown test for deprecated behavior. (#6634) Yating (14): Hparams: Change `read_hyperparameters()` return type (#6542) Hparams: Add `include_in_result` field and implement support for `hparams_to_include` (#6559) Hparams: Support `limit` for `DataProvider.list_hyperparameters()`. (#6569) Hparams: Set interval domain fields for float summary hparams. (#6574) Add `last_value` to `ScalarTimeSeries` interface. (#6579) Hparams: Apply limit to hparams retrieved from protos with `_hparams_/experiment` tag. (#6577) Hparams: Populate `differs` field for TF summary hparams. (#6582) Hparams: fix the bug where the last hparam was discarded when there is no limit (#6584) Hparams: Sort and apply limit to hparams from runs. (#6588) Hparams: Sort by `differs` then by `name`. (#6601) Hparams: fix documentation of session and session group (#6613) Image Summary: Officially start using vectorized EncodePng Op (#6611) fix typos (#6622) Hparams: Update docstrings (#6633) jameshollyer (2): Add 2.14.1 relnotes to RELEASE.md fix release notes mistake
What version of protobuf and what language are you using?
Version: v4.24.0
Language: Python upb
What operating system (Linux, Windows, ...) and version?
Repros on Linux and MacOS
What runtime / compiler are you using (e.g., python version or gcc version)
python 3.8
Installed Apple clang version 14.0.3 (clang-1403.0.22.14.1)
What did you do?
We have a proto message that fundamentally looks like this:
And we have a piece of python that tries to copy an empty version of one of these messages into the other:
This is resulting on a segmentation fault on exactly the
CopyFrom
line.This failure started with us picking up version 4.24.0 with yesterday's release. Reverting back to any version prior fixes the issue.
Unfortunately building the above example proto as an MVP did not repro the segmentation fault for me. If there are any diagnostics I can provide that would make root causing this easier, happy to help.
What did you expect to see
No seg fault, behavior as before.
What did you see instead?
Segmentation fault failure on
CopyFrom
Make sure you include information that can help us debug (full error message, exception listing, stack trace, logs).
Anything else we should know about your project / environment
The text was updated successfully, but these errors were encountered: