oneshot / as_dict can cause Process instances to return incorrect results for certain methods #1373
Mmm... this is interesting. It appears this is important, and it also appears I've used bad judgement regarding #1111 and #1110. At first glance it appears this can be solved simply by using a lock inside `oneshot()`.
What do you mean by "non-equal Process instances"?
The problem I have with the code you pasted is that it's confusing. Do you think you can come up with a smaller/simpler script which reproduces the problem? Also, it's not clear to me how this can be a problem in single-threaded applications. Do you have a test case for that as well?
Mmmm no, I think this is incorrect. The cache is not global. It is implemented as a decorator which is used to decorate those functions which are meant to be cached when entering `oneshot()`.
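For illustration, here is a minimal sketch of such an "activated" memoizing decorator; this is an illustration of the failure mode, not psutil's exact code. Note how the cache is keyed on the decorated function alone, with no reference to `self`: once activated, every instance shares the same slot, which is exactly the cross-instance leak described in this report.

```python
import functools

def memoize_when_activated(fun):
    # One cache dict per decorated function, created at decoration time
    # and therefore shared by every instance of the class.
    cache = {}

    @functools.wraps(fun)
    def wrapper(self):
        if not wrapper.cache_activated:
            return fun(self)
        try:
            ret = cache[fun]            # key ignores `self` entirely
        except KeyError:
            ret = cache[fun] = fun(self)
        return ret

    def cache_activate():
        wrapper.cache_activated = True

    def cache_deactivate():
        wrapper.cache_activated = False
        cache.clear()

    wrapper.cache_activated = False
    wrapper.cache_activate = cache_activate
    wrapper.cache_deactivate = cache_deactivate
    return wrapper
```

With this shape, a call like `ProcessA.ppid()` made while the cache is active stores its result in the shared dict, and a subsequent `ProcessB.ppid()` returns that same value unchanged.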
I've cherry-picked the change.
This is just me being a little over-precise; it basically means different instances of the `Process` class:

```python
ProcessA = psutil.Process(4)
ProcessB = psutil.Process(9)
ProcessC = psutil.Process(4)
```

Here `ProcessA` and `ProcessB` are non-equal (they refer to different processes), whilst `ProcessA` and `ProcessC` are equal, yet are still distinct instances.
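For what it's worth, a quick interactive check of the distinction (assuming PIDs 4 and 9 exist on the system; psutil treats two `Process` objects with the same PID, and the same creation time, as equal):

```python
>>> import psutil
>>> ProcessA = psutil.Process(4)
>>> ProcessB = psutil.Process(9)
>>> ProcessC = psutil.Process(4)
>>> ProcessA == ProcessB   # different PIDs: non-equal
False
>>> ProcessA == ProcessC   # same PID and creation time: equal
True
>>> ProcessA is ProcessC   # but still distinct instances
False
```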
Hmm… tricky. I tried to keep the code short and simple; I'll see if I can simplify it further and/or explain it in some more detail. Let's start with the single-threaded reproduce code:

```python
from __future__ import unicode_literals
from __future__ import print_function
import sys
import psutil

def ReproduceSingleThreaded():
    print("psutil version: ", psutil.version_info)
    print("Python version: ", sys.version_info[0:4])
    # To run this experiment, we need two Processes with different parents;
    # get them below, and make sure their ppid properties are different.
    # Change the following two lines to specify processes running on the current system
    ProcessA = psutil.Process(848)
    ProcessB = psutil.Process(1832)
    ProcessAparent = ProcessA.ppid()
    ProcessBparent = ProcessB.ppid()
    if ProcessAparent == ProcessBparent: raise Exception("Input processes have the same parent")
    # Output some information and
    # state our expected behavior - the return value of ProcessA.ppid() should not change
    print("Process A - {0:d} {1}, parent PID is {2:d}".format(ProcessA.pid, ProcessA.name(), ProcessAparent))
    print("Process B - {0:d} {1}, parent PID is {2:d}".format(ProcessB.pid, ProcessB.name(), ProcessBparent))
    print("A Process' parent should never change (on Windows, and should be reasonably fixed on other platforms); the PID of Process A's parent should always be {0:d}".format(ProcessAparent))
    print("Running single-threaded experiment . . .")
    print()
    print("Beginning oneshot cache for Process B")
    with ProcessB.oneshot():
        # with ProcessA.oneshot(): produces the same results
        ProcessBparent2 = ProcessB.ppid()
        print("Read Process B's parent as {0:d}".format(ProcessBparent2))
        ProcessAparent2 = ProcessA.ppid()
        print("Read Process A's parent as {0:d}".format(ProcessAparent2))
    print("Ending oneshot cache")
    print("Reading Process A's parent again-")
    ProcessAparent3 = ProcessA.ppid()
    print("Read Process A's parent as {0:d}".format(ProcessAparent3))
    print("Experiment complete")
    print()
    if ProcessAparent2 != ProcessAparent:
        print(" - Process A's parent changed from {1:d} to {0:d} - ".format(ProcessAparent2, ProcessAparent))
    else:
        print("Process A's parent remained the same value of {0:d}".format(ProcessAparent2))
```

Be sure to change the assignments of `ProcessA` and `ProcessB` to specify two processes, with different parents, running on the current system.
This script, and the others, use the `ppid` method to demonstrate the problem. The multithreaded reproduce functions posted are just an extension on this idea: whilst one thread is inside `oneshot`, calls made on other `Process` instances, from any thread, are served from the same cache.

```python
from __future__ import unicode_literals
from __future__ import print_function
import sys
import threading
import psutil

def ReproduceMultithreaded():
    print("psutil version: ", psutil.version_info)
    print("Python version: ", sys.version_info[0:4])
    # To run this experiment, we need two Processes with different parents;
    # get them below, and make sure their ppid properties are different.
    # Change the following two lines to specify processes running on the current system
    ProcessA = psutil.Process(848)
    ProcessB = psutil.Process(1832)
    ProcessAparent = ProcessA.ppid()
    ProcessBparent = ProcessB.ppid()
    if ProcessAparent == ProcessBparent: raise Exception("Input processes have the same parent")
    # Set up the second thread, passing it the Process to query and some synchronization primitives
    synchronizeA = threading.Event()
    synchronizeB = threading.Event()
    ConflictThread = threading.Thread(
        name="Conflict Thread",
        target=RunConflictThread,
        args=(ProcessB, synchronizeA, synchronizeB),
    )
    # Output some information and
    # state our expected behavior - the return value of ProcessA.ppid() should not change
    print("Process A - {0:d} {1}, parent PID is {2:d}".format(ProcessA.pid, ProcessA.name(), ProcessAparent))
    print("Process B - {0:d} {1}, parent PID is {2:d}".format(ProcessB.pid, ProcessB.name(), ProcessBparent))
    print("A Process' parent should never change (on Windows, and should be reasonably fixed on other platforms); the PID of Process A's parent should always be {0:d}".format(ProcessAparent))
    print("Running multithreaded experiment . . .")
    print()
    print("Main Thread: Starting Conflict Thread and passing control to it")
    ConflictThread.start()
    synchronizeB.wait()
    synchronizeB.clear()
    print("Main Thread: Received back control; reading details of Process A, without using a oneshot cache")
    ProcessAparent2 = ProcessA.ppid()
    print("Main Thread: Process A {1:d} {2} parent PID is {0:d}".format(ProcessAparent2, ProcessA.pid, ProcessA.name()))
    print("Performing cleanup")
    print("Main Thread: Signalling Conflict Thread to end")
    synchronizeA.set()
    ConflictThread.join()
    print("Ended Conflict Thread, experiment complete")
    print()
    if ProcessAparent2 != ProcessAparent:
        print(" - Process A's parent changed from {1:d} to {0:d} - ".format(ProcessAparent2, ProcessAparent))
    else:
        print("Process A's parent remained the same value of {0:d}".format(ProcessAparent2))

def RunConflictThread(ProcessB, synchronizeA, synchronizeB):
    print("Conflict Thread: Received control")
    print("Conflict Thread: Beginning oneshot cache for reading details of Process B")
    with ProcessB.oneshot():
        ProcessBparent = ProcessB.ppid()
        print("Conflict Thread: Read Process B's parent PID, which is {0:d}".format(ProcessBparent))
        print("Conflict Thread: Simulating a thread interrupt, and handing back control to Main Thread whilst this thread is still within oneshot")
        synchronizeB.set()
        synchronizeA.wait()
        synchronizeA.clear()
        print("Conflict Thread: Received back control; ending oneshot cache")
    print("Conflict Thread: Cache ended, exiting")
```

As before, be sure to change the assignments of `ProcessA` and `ProcessB` to specify two processes, with different parents, running on the current system.
After some setup, the code that actually runs the experiment to reproduce the problem starts after the “Running multithreaded experiment” line. Here we have another thread, “Conflict Thread”, that enters a `oneshot` block for Process B and then hands control back to the Main Thread whilst still inside it; the Main Thread then reads Process A's parent PID, without using `oneshot` itself, and receives Process B's cached value instead of Process A's actual parent.
Quite possibly! My understanding of Python in-detail is a little limited. The cache may not be strictly global/static, but I think it may be behaving as if it is; in other words, I believe it is being shared by all `Process` instances.
OK, I can reproduce the problem with a single-threaded app:

```python
import psutil

pids = psutil.pids()
a = psutil.Process(pids.pop())
b = psutil.Process(pids.pop())

print("without oneshot()")
print(a.ppid())
print(b.ppid())

with b.oneshot():
    print("with oneshot()")
    print(a.ppid())
    print(b.ppid())
```

It prints (on Linux) different values for the two processes without `oneshot()`, but the same value for both inside it, so `ppid()` changes depending on whether `oneshot()` is in effect.
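For reference, the same reproduction written as a self-checking test; a sketch which assumes the two sampled processes happen to have different parents:

```python
import psutil

pids = psutil.pids()
a = psutil.Process(pids.pop())
b = psutil.Process(pids.pop())

ppid_a = a.ppid()
ppid_b = b.ppid()
assert ppid_a != ppid_b, "pick two processes with different parents"

with b.oneshot():
    # Correct behaviour: each instance still reports its own parent.
    assert a.ppid() == ppid_a
    assert b.ppid() == ppid_b
```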
Nice. Let us know how it goes, and if you'd like me to check anything. I made a quick multithreaded version of your script, if it's useful:

```python
from __future__ import print_function
import threading
import psutil

pids = psutil.pids()
aid = pids.pop()
bid = pids.pop()

def RunOtherThread():
    b = psutil.Process(bid)
    print("{0:d} {1} parent is {2:d}".format(b.pid, b.name(), b.ppid()))

print("Without oneshot . . .")
a = psutil.Process(aid)
OtherThread = threading.Thread(target=RunOtherThread)
print("{0:d} {1} parent is {2:d}".format(a.pid, a.name(), a.ppid()))
OtherThread.start()
OtherThread.join()

print("With oneshot . . .")
a = psutil.Process(aid)
OtherThread = threading.Thread(target=RunOtherThread)
with a.oneshot():
    print("{0:d} {1} parent is {2:d}".format(a.pid, a.name(), a.ppid()))
    OtherThread.start()
    OtherThread.join()
```
OK, after all you were right. The problem is the `oneshot` cache being shared across different `Process` instances.
OK this was addressed in 2cdf81d. I will fix the thread-safety problem in a different branch/commit.

Nice. I've given 2cdf81d (cherry-picked into 5.4.8) a few tests myself and it looks like it works. All of my multithreading tests now succeed, too.
Closely related to Issue #1111, but the problem's a little more extensive, and not just a simple case of `oneshot` being thread-unsafe. I also thought a “more formal” report/investigation into the problem and its effects was due; hopefully this'll be of some use to other developers.

See also <https://issues.apache.org/jira/browse/AURORA-1939>, <https://github.com/apache/aurora/commit/cdc5b8efd5bb86d38f73cca6d91903078b120333>, and Issue #1110.
The Problem
Whilst using `oneshot`, and by extension `as_dict`, calling certain methods on multiple non-equal `Process` instances will return incorrect results. This will typically occur in multithreaded applications, but the phenomenon can be reproduced in single-threaded code as well. The problem will occur even if `oneshot`/`as_dict` is only used in one place in one thread of a multithreaded application, so long as the application can access different `Process` instances at the same time that the `oneshot`/`as_dict` is in effect. For single-threaded code, when inside `oneshot`, attempts to access affected methods on multiple different `Process` instances will return incorrect results, including if those attempts are made further down the call stack of such code, in some place distant from where `oneshot` was called.

Affected Methods
An incomplete list of methods that I believe are affected; I haven't delved far into platform-specific psutil code:

- `ppid`
- `uids`
- `parent`
- `children`
- `username` (POSIX platforms only)
- `cpu_times`
- `cpu_percent`
- `memory_info`
- `memory_percent`
If any of these are called on multiple different `Process` instances whilst `oneshot` or `as_dict` is in effect / executing, regardless of where it is in effect / executing, they will return incorrect results.

Reproduce
Call one of the `Reproduce*` functions in the following script to run one of the reproduce experiments; they should finish with a message being displayed regarding a change in a process' parent PID, which should not normally happen, and not at all under Windows. The `FindHeterogeneousProcesses` function can be ignored/replaced; it just finds two suitable processes to run the experiments with.

I've run these tests on Windows (7) and Linux (Ubuntu), with both Python 2.7 and 3, against the latest release of `psutil`, 5.4.8 at time of writing.

Cause
I believe the cause of this problem is the `oneshot` cache being global, rather than per-`Process`-instance. When an affected method, such as `ppid`, is called on a particular `Process` instance, the result is stored in the global `oneshot` cache. As a result, whilst the `oneshot` cache is active, any further calls to the affected method, such as `ppid`, on any `Process` instance, will return the globally cached value, instead of the value for that particular `Process` instance.

I believe the “cache is active” variable is also global, which will switch the cache on for all `Process` instances, rather than for just the `Process` for which `oneshot`/`as_dict` was called.

Recommendations

If not planning to fix the issue, add two warnings to the documentation for `oneshot` and `as_dict`.

One warning should be that these methods are not safe for use in multithreaded code. I'd be wary of using the term “not thread safe” in such a warning, as most will then assume that a lock around all calls to `oneshot`/`as_dict` will solve the problem. It won't: `oneshot`/`as_dict` will affect the return values of the affected methods regardless of whether they are called from within another `oneshot`/`as_dict` or not, as demonstrated in the reproduce script.

The other warning should be that, whilst in `oneshot`, only one `Process` instance (or multiple instances that are all equal) should be dealt with, and that applications should be careful that they're not accidentally dealing with another instance somewhere further down the call stack of their code. This would include being wary of firing any event handlers / Observer-pattern code from within `oneshot`, especially (but not exclusively) if it may cause re-entrancy.

If planning to fix the problem, assuming my diagnosis is correct (I haven't used Python for very long), it should be a simple case of making the `oneshot` cache per-instance instead of global, along with the “cache is active” variable.
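For what it's worth, a minimal sketch of the per-instance variant this recommendation points at (an illustration of the idea, not psutil's actual implementation): store the cache, and the “cache is active” state, on the instance itself.

```python
import functools

def memoize_when_activated(fun):
    # Cache fun(self), but only while this particular instance's
    # cache has been activated.
    @functools.wraps(fun)
    def wrapper(self):
        try:
            return self._cache[fun]      # per-instance lookup
        except AttributeError:           # cache not activated for self
            return fun(self)
        except KeyError:                 # activated, but not yet stored
            ret = self._cache[fun] = fun(self)
            return ret

    def cache_activate(self):
        self._cache = {}                 # attribute presence doubles as the "active" flag

    def cache_deactivate(self):
        del self._cache

    wrapper.cache_activate = cache_activate
    wrapper.cache_deactivate = cache_deactivate
    return wrapper
```

`oneshot()` would then activate and deactivate the cache for `self` only, so one instance's cache can never serve another instance's values.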