Adds get_new_ids and at_addrs functions #36

spacether · 2018-01-08T23:00:18Z

In conducting a memory leak investigation, I found it helpful to look at my new objects created between calls to show_growth. So I made two new functions to help users out here.

get_new_ids is like show_growth, except that it stores dictionaries which hold sets of object address ids.
So now one can grab a set of all the new 'list' ids.

at_addrs: Once one has the new 'list' ids, one can call at_addrs to go from a set of ids to the objects at those addresses. One can then make graphs of the back references on those new objects.
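The workflow described above can be sketched with nothing but the standard `gc` module. This is a simplified illustration under stated assumptions, not the code in this PR: `snapshot_ids` and `objects_at` are hypothetical stand-ins for `get_new_ids` and `at_addrs`, and the `leaked` list merely simulates a leak.

```python
import gc
from collections import defaultdict


def snapshot_ids():
    """Map each type name to the set of id()s of gc-tracked objects of that type."""
    gc.collect()  # clear existing cyclic garbage before taking the snapshot
    ids = defaultdict(set)
    for obj in gc.get_objects():
        ids[type(obj).__name__].add(id(obj))
    return ids


def objects_at(address_set):
    """Return the gc-tracked objects whose id() is in address_set (arbitrary order)."""
    return [obj for obj in gc.get_objects() if id(obj) in address_set]


before = snapshot_ids()
leaked = [[n] for n in range(5)]   # simulated leak: five inner lists plus the outer list
after = snapshot_ids()

# ids of 'list' objects that exist now but did not exist at the first snapshot
new_list_ids = after['list'] - before['list']
new_lists = objects_at(new_list_ids)   # candidates for a back-reference graph
```

Only objects tracked by the garbage collector show up in `gc.get_objects()`, so a sketch like this sees containers such as lists and dicts but not, for example, plain integers.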


mgedmin

Could you please revert the changes to generated images (and the one .dot file) in docs/? These clutter the diff and make it difficult to review the PR.

I also did not notice any tests for the new functions? Currently objgraph has 100% test coverage and I do not want to lose that.

mgedmin

Now that I finally understand what this is for, I quite like the idea. Thanks for sharing it!

The PR will need some massaging to be mergeable. I'll try to help.

mgedmin · 2018-01-09T10:24:14Z

objgraph.py

-__version__ = '3.3.1.dev0'
-__date__ = '2017-12-28'
+__version__ = '3.3.2.dev0'
+__date__ = '2018-01-08'


New functions are new features, so it'll be 3.4.0.dev0, according to SemVer.

mgedmin · 2018-01-09T10:25:52Z

docs/objgraph.txt

@@ -19,6 +19,7 @@ Statistics

 .. autofunction:: show_growth([limit=10, peak_stats={}, shortnames=True, file=sys.stdout])

+.. autofunction:: get_ids([skip_update=False, limit=10, sortby='deltas'])


Oh my, I don't remember how my own documentation works! I totally forgot to include growth() and perhaps a few other new functions in here!

mgedmin · 2018-01-09T10:27:43Z

objgraph.py

+        weakref                    3807         3864          +57          +57
+        dict                       6892         6947          +73          +55
+        frame                        34           70          +53          +36
+        ...


Ah, you have some tests! Sorry I missed them on the first glance.

API-wise I'm not very happy about functions that both print and return values.

Despite having read the description, I don't quite get how this one differs from growth()/show_growth().

Also, the name (get_ids) is too generic, and doesn't describe what the function does.

(I'm afraid my criticism is not very constructive at this point. I'll have to think about this more.)

Ah, so this function also observes (or tries to observe) the object churn! Interesting.

Given how object IDs can be reused by Python, are the numbers accurate?

Suggestion for the function name: get_new_ids(). It could also return just the NEW_IDs, for simplicity's sake.

I'm thinking the three parameters for tracking state are a bit unwieldy (e.g. you cannot do OLD_IDS = CURRENT_IDS; CURRENT_IDS = defaultdict(set)), so how about one _state={} parameter that you initialize like this:

    def get_new_ids(..., _state={}):
        ...
        _state['old'] = old = _state.get('current', defaultdict(set))
        _state['current'] = current = defaultdict(set)
        _state['new'] = new = defaultdict(set)
        ...
        return new
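As a runnable illustration of the single `_state={}` idea, here is a minimal sketch. The function name `new_since_last_call` and its `names` argument are hypothetical; it tracks plain strings rather than object ids, purely to show how a mutable default argument carries state from one call to the next.

```python
def new_since_last_call(names, _state={}):
    """Return the names that were not present on the previous call.

    The default dict is created once, when the function is defined, so it
    persists between calls; pass a fresh dict to start from scratch.
    """
    _state['old'] = old = _state.get('current', set())
    _state['current'] = current = set(names)
    _state['new'] = new = current - old
    return new


first = new_since_last_call(['list', 'dict'])     # everything is new on the first call
second = new_since_last_call(['dict', 'tuple'])   # only 'tuple' was not seen before
```

Passing an explicit dict, as in `new_since_last_call(names, _state={})`, gives an independent tracker, which is also convenient in tests.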

Given how object IDs can be reused by Python, are the numbers accurate?

I'm convinced now that given the purpose of this function that doesn't matter. Objects that were freed and reallocated at the same memory address (while maintaining the same type name) aren't contributing to memory leaks.

If we change the code to:
_state['old'] = old = _state.get('current', defaultdict(set))
Then the id of _state['old'] will change to the id of _state['current'].
This would also require a new dict be created every time for _state['current'] with new sets under every class_name key.

I want to minimize the creation of new ids just to store this info, so I prefer this pattern:
old_dict[class_name].clear(), old_dict[class_name].update(current_dict[class_name]),
current_dict[class_name].clear(), current_dict[class_name].add(new_id).
That way we re-use the dictionaries and sets that are already reserved in memory.
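The in-place pattern argued for here can be sketched as follows. `roll_in_place` is a hypothetical helper and the integer ids are made up; the point is only that the set objects keep their identity, so the bookkeeping itself allocates no new containers.

```python
from collections import defaultdict


def roll_in_place(old_ids, current_ids):
    """Move the current ids into old_ids, reusing every existing set object."""
    for class_name, ids in current_ids.items():
        old_ids[class_name].clear()
        old_ids[class_name].update(ids)
        ids.clear()   # ready to be refilled on the next scan


old_ids = defaultdict(set, {'list': {1, 2}})
current_ids = defaultdict(set, {'list': {2, 3}})
old_set, current_set = old_ids['list'], current_ids['list']

roll_in_place(old_ids, current_ids)
```

After the call, `old_ids['list']` is still the same set object as before and now holds `{2, 3}`. Class names that appear only in `old_ids` are left untouched by this sketch and would need separate handling.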

That is a very good point!

mgedmin · 2018-01-09T10:32:11Z

objgraph.py

+
+    if ``skip_update`` is True, the sets of [OLD_IDS, CURRENT_IDS, NEW_IDS]
+    will be returned from when the function was last run without examining the
+    objects currently iin memory.


Typo: iin -> in.

mgedmin · 2018-01-09T10:34:30Z

objgraph.py

+          (width, 'Type', 'Old_ids', 'Current_ids', 'New_ids', 'Count_Deltas'))
+    print('='*(width+13*4))
+    for row in rows:
+        row_class, old, current, new, delta = row


You can do for row_class, old, current, new, delta in rows: directly.

mgedmin · 2018-01-09T10:37:13Z

objgraph.py

+    for k, v in CURRENT_IDS.items():
+        OLD_IDS[k].update(v)
+    for k in CURRENT_IDS.keys():
+        CURRENT_IDS[k].clear()


    OLD_IDS.clear()
    OLD_IDS.update(CURRENT_IDS)
    CURRENT_IDS.clear()

might be simpler than the three loops.

mgedmin · 2018-01-09T10:39:00Z

objgraph.py

+    for o in objects:
+        CURRENT_IDS[type(o).__name__].add(id(o))
+    for k in NEW_IDS.keys():
+        NEW_IDS[k].clear()


Or just NEW_IDS.clear().

mgedmin · 2018-01-09T10:40:07Z

objgraph.py

+    for k in NEW_IDS.keys():
+        NEW_IDS[k].clear()
+    rows = []
+    for class_name in CURRENT_IDS.keys():


for class_name in CURRENT_IDS: would skip generating an unnecessary intermediate list object on Python 2.

mgedmin · 2018-01-09T10:40:51Z

objgraph.py

+    rows = []
+    for class_name in CURRENT_IDS.keys():
+        new_ids_set = CURRENT_IDS[class_name] - OLD_IDS[class_name]
+        NEW_IDS[class_name].update(new_ids_set)


You're clearing NEW_IDS before this loop, and class names will not repeat, so you can do NEW_IDS[...] = new_ids_set and skip the update.

mgedmin · 2018-01-09T10:43:20Z

objgraph.py

+        >>> [old, current, new_ids] = get_ids()
+        new_lists = at_addrs(new_ids['list'])
+        for new_list in new_lists:
+            call show_chain on each new_list object


Doctests need a >>> in front of every statement, and a ... in front of every continuation line of a compound statement such as a for loop.

Doctests are executed by the test runner, and so cannot contain pseudocode.
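For reference, a well-formed doctest with a compound statement looks like the sketch below. The function `double_all` is a made-up example, not part of objgraph; the doctest is run explicitly through `DocTestFinder` and `DocTestRunner` so the snippet does not depend on how the module is executed.

```python
import doctest


def double_all(values):
    """Return a new list with every value doubled.

    >>> doubled = double_all([1, 2, 3])
    >>> for value in doubled:
    ...     print(value)
    2
    4
    6
    """
    return [2 * value for value in values]


# Collect and run the examples found in the docstring above.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(double_all, name='double_all',
                        globs={'double_all': double_all}):
    runner.run(test)
results = runner.summarize(verbose=False)   # (failed, attempted)
```

Every statement starts with `>>>`, the body of the `for` loop continues with `...`, and the expected output follows without any prefix.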


spacether · 2018-01-23T21:33:28Z

@mgedmin

I reverted my changes so there are now no new generated images in the docs folder
Version number is fixed
Function name is changed from get_ids to get_new_ids
Object ids are now stored in a _state argument in get_new_ids
Working tests added to the docstrings of get_new_ids and at_addrs

Any other updates?

spacether · 2018-01-29T19:30:39Z

@mgedmin Any other updates?

mgedmin

Sorry! I'm trying to find the time to look at the code again.

mgedmin · 2018-01-30T09:37:51Z

objgraph.py

+            # remove the key from our dicts if we don't have any old or
+            # curent class_name objects
+            del old_ids[class_name]
+            del current_ids[class_name]


You're modifying a dictionary while iterating over it. This can cause problems (RuntimeError: dictionary changed size during iteration).
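A small sketch of the problem and one common way around it, assuming CPython 3. Both function names are hypothetical; the safe version collects the keys to delete first and removes them after the loop has finished.

```python
def drop_empty_unsafe(ids_by_class):
    """Deletes inside the loop; raises RuntimeError once a key has been removed."""
    for name in ids_by_class:
        if not ids_by_class[name]:
            del ids_by_class[name]


def drop_empty(ids_by_class):
    """Remove class names whose id set is empty without mutating during iteration."""
    empty = [name for name, ids in ids_by_class.items() if not ids]
    for name in empty:
        del ids_by_class[name]
    return ids_by_class


cleaned = drop_empty({'list': {1}, 'dict': set()})
```

Iterating over a snapshot such as `list(ids_by_class)` is another option; collecting the keys separately makes the intent a little more explicit.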

mgedmin · 2018-01-30T09:38:56Z

objgraph.py

+    rows.sort(key=lambda row: row[index_by_sortby[sortby]], reverse=True)
+    if limit:
+        rows = rows[:limit]
+    width = max(len(row[0]) for row in rows)


This could raise ValueError: max() arg is an empty sequence if rows is empty (which is probably unlikely, but could happen if e.g. a user accidentally passes limit=0).
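One way to make the width computation safe for an empty `rows` list is sketched below. `column_width` and its `minimum` parameter are hypothetical, and this is an alternative guard rather than a description of what the PR ended up doing.

```python
def column_width(rows, minimum=len('Type')):
    """Width of the widest type name in rows, never less than `minimum`.

    Seeding max() with `minimum` means it always receives at least one
    value, so an empty `rows` no longer raises ValueError.
    """
    return max([minimum] + [len(row[0]) for row in rows])


empty_width = column_width([])                       # falls back to the header width
wide_width = column_width([('defaultdict', 3, 5)])   # length of the longest type name
```

On Python 3 alone, `max(..., default=minimum)` would also avoid the exception, although it returns `minimum` only when the sequence is empty.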


spacether · 2018-01-30T22:36:27Z

@mgedmin

Key deletion has been moved outside of the iteration loop
limit=None or limit >= 0 may now be passed to get_new_ids

mgedmin

Looks good to me. I've a couple of small suggestions, and I'll wait a couple of days before merging so you could implement them if you wish to do so.

Thank you for your patience!

mgedmin · 2018-02-05T11:16:08Z

objgraph.py

+    for class_name in current_ids:
+        current_ids[class_name].clear()
+    for o in objects:
+        class_name = type(o).__name__


I still think this should support long names, but that can be added later.

(TBH it doesn't handle old-style classes on Python 2 well either. _short_typename is there for a reason.)
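To illustrate the difference between short and long names, here is a rough Python 3 sketch. `short_typename` and `long_typename` are hypothetical helpers written for this example; they are not objgraph's `_short_typename`, and they do not attempt to handle Python 2 old-style classes.

```python
from fractions import Fraction


def short_typename(obj):
    """Bare class name, e.g. 'Fraction'."""
    return type(obj).__name__


def long_typename(obj):
    """Class name qualified by its module, e.g. 'fractions.Fraction'.

    Builtin types are left unqualified so that 'list' stays 'list'.
    """
    cls = type(obj)
    module = getattr(cls, '__module__', None)
    if module and module not in ('builtins', '__builtin__'):
        return '%s.%s' % (module, cls.__name__)
    return cls.__name__


half = Fraction(1, 2)
names = (short_typename(half), long_typename(half), long_typename([]))
```

Long names matter when two unrelated modules define classes with the same short name, since their ids would otherwise be merged under one key.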

mgedmin · 2018-02-05T11:19:36Z

objgraph.py

+        new_ids[class_name].update(new_ids_set)
+        num_new = len(new_ids_set)
+        num_delta = num_current - num_old
+        row = [class_name, num_old, num_current, num_new, num_delta]


I would make this a tuple.

mgedmin · 2018-02-05T11:20:31Z

objgraph.py

+        del current_ids[key]
+        del new_ids[key]
+    index_by_sortby = {'old': 1, 'current': 2, 'new': 3, 'deltas': 4}
+    rows.sort(key=lambda row: row[index_by_sortby[sortby]], reverse=True)


You could simplify this to rows.sort(key=operator.itemgetter(index_by_sortby[sortby]), reverse=True).
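A self-contained sketch of the `itemgetter` version, using the three sample rows quoted earlier in this review. `sort_rows` is a hypothetical wrapper added only so the example can be run on its own.

```python
import operator

INDEX_BY_SORTBY = {'old': 1, 'current': 2, 'new': 3, 'deltas': 4}


def sort_rows(rows, sortby='deltas'):
    """Return rows sorted in descending order by the chosen column."""
    key = operator.itemgetter(INDEX_BY_SORTBY[sortby])
    return sorted(rows, key=key, reverse=True)


# (type name, old, current, new, delta)
rows = [
    ('dict', 6892, 6947, 73, 55),
    ('frame', 34, 70, 53, 36),
    ('weakref', 3807, 3864, 57, 57),
]

by_delta = [row[0] for row in sort_rows(rows)]
by_new = [row[0] for row in sort_rows(rows, sortby='new')]
```

`operator.itemgetter(n)` returns a callable equivalent to `lambda row: row[n]`, so the behaviour is unchanged.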

mgedmin · 2018-02-05T11:21:48Z

objgraph.py

+def at_addrs(address_set):
+    """Returns a list of objects for a given set of memory addresses.
+
+    The reverse of [id(obj1), id(obj2), ...]


Please mention that the objects are returned in an arbitrary order.


spacether · 2018-02-13T06:18:07Z

@mgedmin I made the updates that you suggested. Can you merge them in and post the new version on PyPI?

mgedmin · 2018-02-13T12:33:59Z

Thank you!

mgedmin · 2018-02-13T15:29:54Z

objgraph 3.4.0 is out on PyPI.

spacether · 2018-02-14T02:09:28Z

Thanks!

Justin Black added 3 commits January 8, 2018 14:54

Adds get_ids and at_addrs functions

a131262

Adds fixes to get the tests to pass, collections added, *unpacking re…

c2b815d

…moved

Adds linting fix

2670b64

mgedmin requested changes Jan 9, 2018


mgedmin reviewed Jan 9, 2018


mgedmin added the enhancement label Jan 9, 2018

spacether force-pushed the master branch from 663bef1 to 2670b64 Compare January 23, 2018 17:57

Justin Black added 3 commits January 23, 2018 10:22

Updates version number, changes func name to get_new_ids, get_new_ids…

5ca2db8

… returns only NEW_IDS

Adds tests in the docstrongs for at_addrs and get_new_ids

215dd79

Adds linting fix to get_new_ids

10e4948

spacether changed the title ~~Adds get_ids and at_addrs functions~~ Adds get_new_ids and at_addrs functions Jan 23, 2018

mgedmin reviewed Jan 30, 2018


spacether added 2 commits January 30, 2018 14:17

Removes keys from dictionaries outside of loop, raises ValueError exc…

b9fa9bf

…eption if bad limit is passed in

Adds code to allow limit=None and limit>=0

09acc37

mgedmin approved these changes Feb 5, 2018


spacether added 2 commits February 12, 2018 21:59

Adds suggested updates: tuple, shortnames argument, itemgetter sort, …

7e12f99

…and at_addrs docstring tweak

Fixes class_name defintion in get_new_ids

e5a0226

mgedmin merged commit c8a8015 into mgedmin:master Feb 13, 2018
