-
-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds get_new_ids and at_addrs functions #36
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please revert the changes to generated images (and the one .dot file) in docs/? These clutter the diff and make it difficult to review the PR.
I also did not notice any tests for the new functions? Currently objgraph has 100% test coverage and I do not want to lose that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I finally understand what this is for, I quite like the idea. Thanks for sharing it!
The PR will need some massaging to be mergeable. I'll try to help.
objgraph.py
Outdated
__version__ = '3.3.1.dev0' | ||
__date__ = '2017-12-28' | ||
__version__ = '3.3.2.dev0' | ||
__date__ = '2018-01-08' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New function are new features, so it'll be 3.4.0.dev0, according to SemVer.
docs/objgraph.txt
Outdated
@@ -19,6 +19,7 @@ Statistics | |||
|
|||
.. autofunction:: show_growth([limit=10, peak_stats={}, shortnames=True, file=sys.stdout]) | |||
|
|||
.. autofunction:: get_ids([skip_update=False, limit=10, sortby='deltas']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh my, I don't remember how my own documentation works! I totally forgot to include growth()
and perhaps a few other new functions in here!
objgraph.py
Outdated
weakref 3807 3864 +57 +57 | ||
dict 6892 6947 +73 +55 | ||
frame 34 70 +53 +36 | ||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you have some tests! Sorry I missed them on the first glance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
API-wise I'm not very happy about functions that both print and return values.
Despite having read the description, I don't quite get how this one differs from growth()
/show_growth()
.
Also, the name (get_ids
) is too generic, and doesn't describe what the function does.
(I'm afraid my criticism is not very constructive at this point. I'll have to think about this more.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, so this function also observes (or tries to observe) the object churn! Interesting.
Given how object IDs can be reused by Python, are the numbers accurate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion for the function name: get_new_ids()
. It could also return just the NEW_IDs, for simplicity's sake.
I'm thinking the three parameters for tracking state are a bit unwieldy (e.g. you cannot do OLD_IDS = CURRENT_IDS; CURRENT_IDS = defaultdict(set)
), so how about one _state={}
parameter that you initialize like this:
def get_new_ids(..., _state={}):
...
_state['old'] = old = _state.get('current', defaultdict(set))
_state['current'] = current = defaultdict(set)
_state['new'] = new = defaultdict(set)
...
return new
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given how object IDs can be reused by Python, are the numbers accurate?
I'm convinced now that given the purpose of this function that doesn't matter. Objects that were freed and reallocated at the same memory address (while maintaining the same type name) aren't contributing to memory leaks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we change the code to:
_state['old'] = old = _state.get('current', defaultdict(set))
Then the id of _state['old'] will change to the id of _state['current'].
This would also require a new dict be created every time for _state['current'] with new sets under every class_name key.
I want to minimize the creation of new ids to store this info, so I prefer the old_dict[class_name].clear(), old_dict[class_name].update(current_dict[class_name])
current_dict[class_name].clear(), current_dict[class_name].add(new_id)
pattern that way we re-use dictionaries and sets that are already reserved in memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a very good point!
objgraph.py
Outdated
|
||
if ``skip_update`` is True, the sets of [OLD_IDS, CURRENT_IDS, NEW_IDS] | ||
will be returned from when the function was last run without examining the | ||
objects currently iin memory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo: iin -> in.
objgraph.py
Outdated
(width, 'Type', 'Old_ids', 'Current_ids', 'New_ids', 'Count_Deltas')) | ||
print('='*(width+13*4)) | ||
for row in rows: | ||
row_class, old, current, new, delta = row |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do for row_class, old, current, new, delta in rows:
directly.
objgraph.py
Outdated
for k, v in CURRENT_IDS.items(): | ||
OLD_IDS[k].update(v) | ||
for k in CURRENT_IDS.keys(): | ||
CURRENT_IDS[k].clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OLD_IDS.clear()
OLD_IDS.update(CURRENT_IDS)
CURRENT_IDS.clear()
might be simpler than the three loops.
objgraph.py
Outdated
for o in objects: | ||
CURRENT_IDS[type(o).__name__].add(id(o)) | ||
for k in NEW_IDS.keys(): | ||
NEW_IDS[k].clear() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or just NEW_IDS.clear()
.
objgraph.py
Outdated
for k in NEW_IDS.keys(): | ||
NEW_IDS[k].clear() | ||
rows = [] | ||
for class_name in CURRENT_IDS.keys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for class_name in CURRENT_IDS:
would skip generating an unnecessary intermediate list object on Python 2.
objgraph.py
Outdated
rows = [] | ||
for class_name in CURRENT_IDS.keys(): | ||
new_ids_set = CURRENT_IDS[class_name] - OLD_IDS[class_name] | ||
NEW_IDS[class_name].update(new_ids_set) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're clearing NEW_IDs before this loop, and class names will not repeat, so you can do NEW_IDS[...] = new_ids_set
and skip the update.
objgraph.py
Outdated
>>> [old, current, new_ids] = get_ids() | ||
new_lists = at_addrs(new_ids['list']) | ||
for new_list in new_lists: | ||
call show_chain on each new_list object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doctests need a >>>
in front of every statement, and a ...
in front of every continuation line of a compound statement such as a for
loop.
Doctests are executed by the test runner, and so cannot contain pseudocode.
Any other updates? |
@mgedmin Any other updates? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry! I'm trying to find the time to look at the code again.
objgraph.py
Outdated
# remove the key from our dicts if we don't have any old or | ||
# curent class_name objects | ||
del old_ids[class_name] | ||
del current_ids[class_name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're modifying a dictionary while iterating over it. This can cause problems (RuntimeError: dictionary changed size during iteration).
objgraph.py
Outdated
rows.sort(key=lambda row: row[index_by_sortby[sortby]], reverse=True) | ||
if limit: | ||
rows = rows[:limit] | ||
width = max(len(row[0]) for row in rows) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could raise ValueError: max() arg is an empty sequence if rows is empty (which is probably unlikely, but could happen if e.g. a user accidentally passes limit=0
).
…eption if bad limit is passed in
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. I've a couple of small suggestions, and I'll wait a couple of days before merging so you could implement them if you wish to do so.
Thank you for your patience!
objgraph.py
Outdated
for class_name in current_ids: | ||
current_ids[class_name].clear() | ||
for o in objects: | ||
class_name = type(o).__name__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think this should support long names, but that can be added later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(TBH it doesn't handle old-style classes on Python 2 well either. _short_typename
is there for a reason.)
objgraph.py
Outdated
new_ids[class_name].update(new_ids_set) | ||
num_new = len(new_ids_set) | ||
num_delta = num_current - num_old | ||
row = [class_name, num_old, num_current, num_new, num_delta] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make this a tuple.
objgraph.py
Outdated
del current_ids[key] | ||
del new_ids[key] | ||
index_by_sortby = {'old': 1, 'current': 2, 'new': 3, 'deltas': 4} | ||
rows.sort(key=lambda row: row[index_by_sortby[sortby]], reverse=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could simplify this to rows.sort(key=operator.itemgetter(index_by_sortby[sortby]), reverse=True)
.
objgraph.py
Outdated
def at_addrs(address_set): | ||
"""Returns a list of objects for a given set of memory addresses. | ||
|
||
The reverse of [id(obj1), id(obj2), ...] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please mention that the objects are returned in an arbitrary order.
…and at_addrs docstring tweak
@mgedmin I made the updates that you suggested. Can you merge them in and post the new version on pypi? |
Thank you! |
objgraph 3.4.0 is out on PyPI. |
Thanks! |
In conducting a memory leak investigation, I found it helpful to look at my new objects created between calls to show_growth. So I made two new functions to help users out here.
get_new_ids is like show_growth, except that it stores dictionaries which hold sets of object address ids.
So now one can grab a set of all the new 'list' ids.
at_addrs: Once one has the new 'list' ids, one can call at_addrs to go from a set of ids to the objects at those addresses. One can then make graphs of the back references on those new objects.