Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds get_new_ids and at_addrs functions #36

Merged
merged 10 commits into from
Feb 13, 2018
Merged

Conversation

spacether
Copy link
Contributor

@spacether spacether commented Jan 8, 2018

In conducting a memory leak investigation, I found it helpful to look at my new objects created between calls to show_growth. So I made two new functions to help users out here.

get_new_ids is like show_growth, except that it stores dictionaries which hold sets of object address ids.
So now one can grab a set of all the new 'list' ids.

at_addrs: Once one has the new 'list' ids, one can call at_addrs to go from a set of ids to the objects at those addresses. One can then make graphs of the back references on those new objects.

Copy link
Owner

@mgedmin mgedmin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please revert the changes to generated images (and the one .dot file) in docs/? These clutter the diff and make it difficult to review the PR.

I also did not notice any tests for the new functions? Currently objgraph has 100% test coverage and I do not want to lose that.

Copy link
Owner

@mgedmin mgedmin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I finally understand what this is for, I quite like the idea. Thanks for sharing it!

The PR will need some massaging to be mergeable. I'll try to help.

objgraph.py Outdated
__version__ = '3.3.1.dev0'
__date__ = '2017-12-28'
__version__ = '3.3.2.dev0'
__date__ = '2018-01-08'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New function are new features, so it'll be 3.4.0.dev0, according to SemVer.

@@ -19,6 +19,7 @@ Statistics

.. autofunction:: show_growth([limit=10, peak_stats={}, shortnames=True, file=sys.stdout])

.. autofunction:: get_ids([skip_update=False, limit=10, sortby='deltas'])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my, I don't remember how my own documentation works! I totally forgot to include growth() and perhaps a few other new functions in here!

objgraph.py Outdated
weakref 3807 3864 +57 +57
dict 6892 6947 +73 +55
frame 34 70 +53 +36
...
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you have some tests! Sorry I missed them on the first glance.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API-wise I'm not very happy about functions that both print and return values.

Despite having read the description, I don't quite get how this one differs from growth()/show_growth().

Also, the name (get_ids) is too generic, and doesn't describe what the function does.

(I'm afraid my criticism is not very constructive at this point. I'll have to think about this more.)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so this function also observes (or tries to observe) the object churn! Interesting.

Given how object IDs can be reused by Python, are the numbers accurate?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion for the function name: get_new_ids(). It could also return just the NEW_IDs, for simplicity's sake.

I'm thinking the three parameters for tracking state are a bit unwieldy (e.g. you cannot do OLD_IDS = CURRENT_IDS; CURRENT_IDS = defaultdict(set)), so how about one _state={} parameter that you initialize like this:

def get_new_ids(..., _state={}):
    ...
    _state['old'] = old = _state.get('current', defaultdict(set))
    _state['current'] = current = defaultdict(set)
    _state['new'] = new = defaultdict(set)
    ...
    return new

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given how object IDs can be reused by Python, are the numbers accurate?

I'm convinced now that given the purpose of this function that doesn't matter. Objects that were freed and reallocated at the same memory address (while maintaining the same type name) aren't contributing to memory leaks.

Copy link
Contributor Author

@spacether spacether Jan 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we change the code to:
_state['old'] = old = _state.get('current', defaultdict(set))
Then the id of _state['old'] will change to the id of _state['current'].
This would also require a new dict be created every time for _state['current'] with new sets under every class_name key.

I want to minimize the creation of new ids to store this info, so I prefer the old_dict[class_name].clear(), old_dict[class_name].update(current_dict[class_name])
current_dict[class_name].clear(), current_dict[class_name].add(new_id)
pattern that way we re-use dictionaries and sets that are already reserved in memory.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a very good point!

objgraph.py Outdated

if ``skip_update`` is True, the sets of [OLD_IDS, CURRENT_IDS, NEW_IDS]
will be returned from when the function was last run without examining the
objects currently iin memory.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: iin -> in.

objgraph.py Outdated
(width, 'Type', 'Old_ids', 'Current_ids', 'New_ids', 'Count_Deltas'))
print('='*(width+13*4))
for row in rows:
row_class, old, current, new, delta = row
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do for row_class, old, current, new, delta in rows: directly.

objgraph.py Outdated
for k, v in CURRENT_IDS.items():
OLD_IDS[k].update(v)
for k in CURRENT_IDS.keys():
CURRENT_IDS[k].clear()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OLD_IDS.clear()
OLD_IDS.update(CURRENT_IDS)
CURRENT_IDS.clear()

might be simpler than the three loops.

objgraph.py Outdated
for o in objects:
CURRENT_IDS[type(o).__name__].add(id(o))
for k in NEW_IDS.keys():
NEW_IDS[k].clear()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just NEW_IDS.clear().

objgraph.py Outdated
for k in NEW_IDS.keys():
NEW_IDS[k].clear()
rows = []
for class_name in CURRENT_IDS.keys():
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for class_name in CURRENT_IDS: would skip generating an unnecessary intermediate list object on Python 2.

objgraph.py Outdated
rows = []
for class_name in CURRENT_IDS.keys():
new_ids_set = CURRENT_IDS[class_name] - OLD_IDS[class_name]
NEW_IDS[class_name].update(new_ids_set)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're clearing NEW_IDs before this loop, and class names will not repeat, so you can do NEW_IDS[...] = new_ids_set and skip the update.

objgraph.py Outdated
>>> [old, current, new_ids] = get_ids()
new_lists = at_addrs(new_ids['list'])
for new_list in new_lists:
call show_chain on each new_list object
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doctests need a >>> in front of every statement, and a ... in front of every continuation line of a compound statement such as a for loop.

Doctests are executed by the test runner, and so cannot contain pseudocode.

@spacether spacether changed the title Adds get_ids and at_addrs functions Adds get_new_ids and at_addrs functions Jan 23, 2018
@spacether
Copy link
Contributor Author

@mgedmin

  • I reverted my changes so there are now no new generated images in the docs folder
  • Version number is fixed
  • Function name is changed from get_ids to get_new_ids
  • Object ids are now stored in a _state argument in get_new_ids
  • Working tests added to the docstrings of get_new_ids and at_addrs

Any other updates?

@spacether
Copy link
Contributor Author

@mgedmin Any other updates?

Copy link
Owner

@mgedmin mgedmin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry! I'm trying to find the time to look at the code again.

objgraph.py Outdated
# remove the key from our dicts if we don't have any old or
# curent class_name objects
del old_ids[class_name]
del current_ids[class_name]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're modifying a dictionary while iterating over it. This can cause problems (RuntimeError: dictionary changed size during iteration).

objgraph.py Outdated
rows.sort(key=lambda row: row[index_by_sortby[sortby]], reverse=True)
if limit:
rows = rows[:limit]
width = max(len(row[0]) for row in rows)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could raise ValueError: max() arg is an empty sequence if rows is empty (which is probably unlikely, but could happen if e.g. a user accidentally passes limit=0).

@spacether
Copy link
Contributor Author

@mgedmin

  • Key deletion has been moved outside of the iteration loop
  • limit = None or >= 0 may now be passed in to get_new_ids

Copy link
Owner

@mgedmin mgedmin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. I've a couple of small suggestions, and I'll wait a couple of days before merging so you could implement them if you wish to do so.

Thank you for your patience!

objgraph.py Outdated
for class_name in current_ids:
current_ids[class_name].clear()
for o in objects:
class_name = type(o).__name__
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this should support long names, but that can be added later.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(TBH it doesn't handle old-style classes on Python 2 well either. _short_typename is there for a reason.)

objgraph.py Outdated
new_ids[class_name].update(new_ids_set)
num_new = len(new_ids_set)
num_delta = num_current - num_old
row = [class_name, num_old, num_current, num_new, num_delta]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make this a tuple.

objgraph.py Outdated
del current_ids[key]
del new_ids[key]
index_by_sortby = {'old': 1, 'current': 2, 'new': 3, 'deltas': 4}
rows.sort(key=lambda row: row[index_by_sortby[sortby]], reverse=True)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could simplify this to rows.sort(key=operator.itemgetter(index_by_sortby[sortby]), reverse=True).

objgraph.py Outdated
def at_addrs(address_set):
"""Returns a list of objects for a given set of memory addresses.

The reverse of [id(obj1), id(obj2), ...]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please mention that the objects are returned in an arbitrary order.

@spacether
Copy link
Contributor Author

@mgedmin I made the updates that you suggested. Can you merge them in and post the new version on pypi?

@mgedmin mgedmin merged commit c8a8015 into mgedmin:master Feb 13, 2018
@mgedmin
Copy link
Owner

mgedmin commented Feb 13, 2018

Thank you!

@mgedmin
Copy link
Owner

mgedmin commented Feb 13, 2018

objgraph 3.4.0 is out on PyPI.

@spacether
Copy link
Contributor Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants