Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-119004: fix a crash in equality testing between OrderedDict #121329

Merged
merged 19 commits into from
Sep 23, 2024

Conversation

picnixz
Copy link
Contributor

@picnixz picnixz commented Jul 3, 2024

This patch is based on #119004 (comment) but some bits needed to be added here and there.


📚 Documentation preview 📚: https://cpython-previews--121329.org.readthedocs.build/

@rhettinger
Copy link
Contributor

rhettinger commented Jul 3, 2024

I don't see the need to change the pure Python version. Since it won't segfault, there is no need for this elaborate defense against something that doesn't arise in normal practice.

@picnixz
Copy link
Contributor Author

picnixz commented Jul 3, 2024

Since it won't segfault, there is need for this elaborate defense against something that doesn't arise in normal practice.

Well.. without those changes, the tests would be broken (no exception would be raised actually). And the C implementation says:

This implementation is necessarily explicitly equivalent to the pure Python
OrderedDict class in Lib/collections/__init__.py.

@rhettinger
Copy link
Contributor

rhettinger commented Jul 3, 2024

Well.. without those changes, the tests would be broke

You can either make the new tests for the C version only, or you can make the C version emulate what the Python version does (no exception).

no exception would be raised actually

I'm fine with that. We made that decision long ago and it has worked out fine in practice.

Our only goal in this PR is make sure the C version won't segfault. There should be no other user visible API change. Nor should the pure python reference implementation change.

This implementation is necessarily explicitly equivalent to the pure Python
OrderedDict class in Lib/collections/init.py.

Add, "except where it is necessary to protect the C code from a segfault."

@rhettinger rhettinger self-assigned this Jul 3, 2024
@picnixz
Copy link
Contributor Author

picnixz commented Jul 4, 2024

you can make the C version emulate what the Python version does (no exception).

With this approach, this means I need duplicate the items buffer the entire nodes in advance otherwise calling clear() in between would free nodes that are no more available... I'm worried about performances but I'll try to benchmark and see the impact.

Our only goal in this PR is make sure the C version won't segfault. There should be no other user visible API change.

Actually, the Python implementation does segfault (well it's a segfault in the form of an AttributeError but it's essentially the same issue I think). The Python implementation of __delitem__ does:

    def __delitem__(self, key, dict_delitem=dict.__delitem__):
        'od.__delitem__(y) <==> del od[y]'
        # Deleting an existing item uses self.__map to find the link which gets
        # removed by updating the links in the predecessor and successor nodes.
        dict_delitem(self, key)
        link = self.__map.pop(key)
        link_prev = link.prev
        link_next = link.next
        link_prev.next = link_next
        link_next.prev = link_prev
        link.prev = None  # <- this has side-effects in __reversed__
        link.next = None  # <- this has side-effects in __iter__

And in __eq__ we do:

	return dict.__eq__(self, other) and all(map(_eq, self, other)))

The following test:

        count = 0
        class ChangeExternItem:
            def __eq__(self, other):
                nonlocal count
                if count == 1:
                    del lhs[self]
                count += 1
                return True
            def __hash__(self):
                return -1

        l1, r1 = ChangeExternItem(), ChangeExternItem()
        lhs = self.OrderedDict(dict.fromkeys((0, l1)))
        rhs = self.OrderedDict(dict.fromkeys((0, r1)))
		# AttributeError: 'NoneType' object has no attribute 'key'
		lhs == rhs

Now, the first dict.__eq__(lhs, rhs) call would trigger del lhs[self] which itself triggers OrderedDict.__delitem__, so in __iter__, after you're done getting the key of some item (here, l1), you would then do .next. However this one will be None (so it's not self.__root) and you would end up doing yield curr.key which fails!

So I still think that the Python implementation should be protected the same the C implementation is. I didn't catch the exception in .clear() (don't know why though) but if you delete a key during the iteration, then you end up having some error like that.

In summary the Python implementation has also a bug which can be prevented using the current guards.

@rhettinger
Copy link
Contributor

rhettinger commented Jul 4, 2024

Actually, the Python implementation does segfault

I was not able to reproduce this. The code in the referenced issue terminated normally, giving the output:

0
1
cleared l
True

(well it's a segfault in the form of an AttributeError but it's essentially the same issue I think).

That is not as severe as a segfault. An exception is a normal way for functions and methods to die when fed bogus inputs.

I'll look at this more when I get back from vacation. In the mean time, think about the least invasive, least behavior changing way to avoid the C code segfault. You've explored state tracking (an approach we intentionally decided not to do); now look for less impactful approaches.

Keep in mind, OrderedDict is mostly obsolete and not much used anymore. We're trying to leave it as stable as possible. We definitely don't want a segfault, but we want to avoid redesign and behavior changes. This is doubly true in this case where we're solving a problem that doesn't seem to have ever arisen in practice.

@picnixz
Copy link
Contributor Author

picnixz commented Jul 5, 2024

I was not able to reproduce this. The code in the referenced issue terminated normally, giving the output:

Yes, that's what I thought as well. Actually, the .clear() is not the issue here (probably because how it's actually implemented) but the problem is with the test example I gave with the del.

That is not as severe as a segfault

Maybe I should have phrased differently in the sense that the reason why such AttributeError arises is the same reason why it segfaults (at least for this specific test case I put).

You've explored state tracking (an approach we intentionally decided not to do); now look for less impactful approaches.

I see that I forgot to put that in my previous comment but I've thought of copying the keys before calling == but it comes at a high cost which I'd prefer not to do (i,e, making the C version equivalent to list(self.keys()) == list(other.keys())). And even if I were to this, I don't know what to return as a result of the comparison. For instance, if calling == has a side effect which changes a value that would previously be compared equal, what should I do? in some sense, the keys are equal and beforehand we already verified that the values are equal, but now it's no more the case, so what should self == other return in this case? By catching any mutation, at least I don't need to answer that question (or make a choice).

I'll try to come up with other ideas (it's something I didn't thought of, but maybe I could try to see how dicts are compared in general as well and try to apply that logic here, or see how other languages deal with that issue).

Keep in mind, OrderedDict is mostly obsolete and not much used anymore. We're trying to leave it as stable as possible

Oh don't worry about it, I think it should be the focus. I can suggest fixing the C implementation but leaving the Python implementation untouched and add a comment saying that mutating the operands may result in undesired side-effects and unexpected exceptions (at least, people would know that we know about the behaviour). One thing I would like to know is whether you want the Python implementation to possibly behave differently than the C implementation (at least, in the handling of errors).

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @rhettinger: no need to change the Python code.

Objects/odictobject.c Outdated Show resolved Hide resolved
Objects/odictobject.c Outdated Show resolved Hide resolved
Objects/odictobject.c Outdated Show resolved Hide resolved
Objects/odictobject.c Outdated Show resolved Hide resolved
Objects/odictobject.c Outdated Show resolved Hide resolved
Copy link
Contributor Author

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I've left the messages since they were also used in other functions of this module. Changing them at one place does not feel right to me, so I think we should change them simultaneously if needed.

Doc/library/collections.rst Outdated Show resolved Hide resolved
Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks quite overcomplicated, especially tests. There are 10 tests for the C implementation, and complex helper class with many attributes and methods. Usually only 1-2 tests are added for similar bugfixes. I andrestand that you want to test many different cases, but are they all necessary? Would not test be simpler if you inline the code of setup() and teardown()?

Doc/library/collections.rst Outdated Show resolved Hide resolved
Lib/collections/__init__.py Outdated Show resolved Hide resolved
Objects/odictobject.c Outdated Show resolved Hide resolved
Objects/odictobject.c Outdated Show resolved Hide resolved
@picnixz
Copy link
Contributor Author

picnixz commented Jul 22, 2024

Would not test be simpler if you inline the code of setup() and teardown()?

I'll make a commit with this approach and see how it renders. But yes, it might be easier to review.

EDIT: It's much simpler now.

I andrestand that you want to test many different cases, but are they all necessary

I think I can remove the asymmetric cases now (or the symmetric ones maybe).

@picnixz
Copy link
Contributor Author

picnixz commented Aug 1, 2024

@rhettinger Friendly ping in case you forgot about this PR

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

…5MoUu.rst

Co-authored-by: Kirill Podoprigora <kirill.bast9@mail.ru>
@rhettinger
Copy link
Contributor

At the sprint, I discussed this briefly with Eric Snow (the original author). He concurred that the pure python code should not change, and he had no objections to adding the state counter for the C version.

@rhettinger rhettinger merged commit 38a887d into python:main Sep 23, 2024
37 of 38 checks passed
@picnixz picnixz deleted the fix-ordered-dict branch September 25, 2024 09:36
@picnixz
Copy link
Contributor Author

picnixz commented Sep 25, 2024

Hum, do we backport it to 3.12 and 3.13? (AFAICT, this is a bug fix so there should be a backport)

@serhiy-storchaka serhiy-storchaka added needs backport to 3.12 bug and security fixes needs backport to 3.13 bugs and security fixes labels Sep 25, 2024
@miss-islington-app
Copy link

Thanks @picnixz for the PR, and @rhettinger for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Thanks @picnixz for the PR, and @rhettinger for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 25, 2024
pythonGH-121329)

(cherry picked from commit 38a887d)

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
@bedevere-app
Copy link

bedevere-app bot commented Sep 25, 2024

GH-124507 is a backport of this pull request to the 3.13 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Sep 25, 2024
pythonGH-121329)

(cherry picked from commit 38a887d)

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Sep 25, 2024
@bedevere-app
Copy link

bedevere-app bot commented Sep 25, 2024

GH-124508 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 bug and security fixes label Sep 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants