Benchmarks in production applications #102
Comments
This is great! @markshannon will write something about the kinds of numbers we'd like to see. |
I think there are a few things that would be useful.
|
@markshannon For the perf breakdown, do you mean something like the % of time spent in every opcode? |
Here's a progress report from @pablogsal:
|
No. Just the usual perf output: %time in each C function, etc. |
Hi, I spotted this project and had a thought, since it was said that there is a need for a "production"-like application. Locust is quite a popular framework for performance testing, an alternative to JMeter / Gatling / k6. It has a master/worker concept and uses requests/urllib3, gevent, and ZMQ, so I guess it represents "normal" usage of the basic libraries. Apart from that, there is some "basic" code that uses lists and so on. People can add their own code on top of it; for example, here is a bunch of plugins: https://github.com/SvenskaSpel/locust-plugins

I guess any optimization in the underlying libraries, like requests/SSL, would bring gains that show up as the ability to generate more Virtual Users per worker/CPU. Normally, when too many Virtual Users are generated on a single CPU, it gets exhausted and context switching affects response times, giving unrealistic values. That is the point at which more workers have to be added to the pool.

This could work hand in hand with a very basic Flask application that just serves as a testing point and returns "hello world", although you could embed some "real code" underneath for a more realistic scenario. I can imagine it saving and loading data using Mongo, or running simple SQL against some other DB. The logs would interact with the disk, giving you I/O stats.

Depending on the scenario, you could observe CPU usage on the Locust side: with the same load, usage should drop after optimization. At the same time, optimization on the server side would probably give a slightly higher RPS and/or faster response times for the user; it is also possible that CPU usage will change. I'm not sure how memory allocation would change; I guess the driver will be the number of concurrent sessions.

I'm not sure if this is something that you are looking for. |
It looks like Locust is mainly intended for app developers to be able to create flexible testing scenarios for their web apps (I presume by creating synthetic loads). That's not exactly what we're looking for here -- we're looking for apps that are already developed for which we can measure their performance under a load already defined. So if you have a particular app that you don't mind sharing and you have developed a Locust-based synthetic load for it, we would love to be able to compare how your app performs under Python 3.10 and 3.11. But if you have some other way of stress-testing your app we would be just as happy -- as long as your app can run under 3.10 and 3.11 (or, hopefully, what's in main, but I don't want to press my luck :-). |
We have a FastAPI microservice (py3.10) in production for which we've developed tests in locust to repeat the requests we get in production. |
I would be super interested if you could do those benchmarks! In particular comparing 3.10 and 3.11 in a real world app would be huge. I am prepared to be disappointed, because usually real-world performance is not just about CPU usage of the language the main app is written in... But nevertheless I would love to hear from you. |
All times are in milliseconds.
Python 3.10.7:
Python 3.11.0rc2:
Memory usage was almost the same, but CPU usage looked slightly lower; I can't be more specific without a graph. Nonetheless, this is an impressive result considering that this service is very database intensive (around 4-10 Redis gets, 1-6 Redis sets, 1-5 Solr queries, and 1 TF Serving call). It also has a lot of list and dict comprehensions, as well as multiple dicts with 700K and 1M elements, which I think are responsible for the performance improvement. I'll see if I can share the source code by obfuscating some parts. It might be better if I replace the database calls completely with |
Fun fact: I had rewritten this service in Rust (with actix-web and async for all database calls) and got 17-18 ms median latency with it. I think this makes the 3.11 improvement more impressive if we consider the Python overhead compared to Rust. |
Thanks for the quick results! It sounds like 3-5% faster, which for a real-world app is not easy. |
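For anyone repeating this kind of comparison, here is a minimal sketch of how a "3-5% faster" figure can be derived from raw latency samples. The sample values and the helper names `summarize` and `relative_change` are hypothetical; they are not the numbers or the tooling from the report above.

```python
from statistics import median, quantiles


def summarize(samples_ms):
    """Return median and 95th-percentile latency for a list of samples (ms)."""
    # quantiles(n=20) yields 19 cut points; the last one is the 95th percentile.
    p95 = quantiles(samples_ms, n=20)[-1]
    return {"median": median(samples_ms), "p95": p95}


def relative_change(baseline, candidate):
    """Fractional change of candidate vs. baseline; negative means faster."""
    return (candidate - baseline) / baseline


# Hypothetical samples, NOT the measurements reported in this thread.
py310_ms = [20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 40.0]
py311_ms = [19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 38.0]

base = summarize(py310_ms)
cand = summarize(py311_ms)
change = relative_change(base["median"], cand["median"])
print(f"median: {base['median']:.1f} ms -> {cand['median']:.1f} ms ({change:+.1%})")
```

Comparing medians (and a tail percentile) rather than means keeps a few slow outlier requests from dominating the result, which matters for a service that spends much of its time waiting on databases.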
Might be interesting to gather 3.11 stats, if you're comfortable compiling Python from source. This basically dumps tons of internal counters that allow us to see how well the interpreter is handling your program (without giving us access to the source of the program itself). Basically, configure |
Sure, I'll try it tomorrow. |
Hmm, I will try to ask my company's (LINE/NAVER) machine-learning team whether they can share bytecode execution information. (I am sure they have enough workload for the APAC area, especially for Korea/Japan.) |
I built Python by modifying this Dockerfile, which is used in the official py3.11.0 Docker builds. I just added Here is the output after 15k requests:

Execution counts: execution counts for all instructions

Pair counts: pair counts for top 100 pairs

Predecessor/Successor Pairs: top 3 predecessors and successors of each opcode
Successors and predecessors are listed for: ASYNC_GEN_WRAP, BEFORE_ASYNC_WITH, BEFORE_WITH, BINARY_OP, BINARY_OP_ADAPTIVE, BINARY_OP_ADD_FLOAT, BINARY_OP_ADD_INT, BINARY_OP_ADD_UNICODE, BINARY_OP_INPLACE_ADD_UNICODE, BINARY_OP_MULTIPLY_FLOAT, BINARY_OP_MULTIPLY_INT, BINARY_OP_SUBTRACT_FLOAT, BINARY_OP_SUBTRACT_INT, BINARY_SUBSCR, BINARY_SUBSCR_ADAPTIVE, BINARY_SUBSCR_DICT, BINARY_SUBSCR_GETITEM, BINARY_SUBSCR_LIST_INT, BINARY_SUBSCR_TUPLE_INT, BUILD_CONST_KEY_MAP, BUILD_LIST, BUILD_MAP, BUILD_SET, BUILD_SLICE, BUILD_STRING, BUILD_TUPLE, CALL, CALL_ADAPTIVE, CALL_FUNCTION_EX, CALL_PY_EXACT_ARGS, CALL_PY_WITH_DEFAULTS, CHECK_EXC_MATCH, COMPARE_OP, COMPARE_OP_ADAPTIVE, COMPARE_OP_FLOAT_JUMP, COMPARE_OP_INT_JUMP, COMPARE_OP_STR_JUMP, CONTAINS_OP, COPY, COPY_FREE_VARS, DELETE_ATTR, DELETE_FAST, DELETE_NAME, DELETE_SUBSCR, DICT_MERGE, DICT_UPDATE, END_ASYNC_FOR, EXTENDED_ARG, EXTENDED_ARG_QUICK, FORMAT_VALUE, FOR_ITER, GET_AITER, GET_ANEXT, GET_AWAITABLE, GET_ITER, GET_YIELD_FROM_ITER, IMPORT_FROM, IMPORT_NAME, IMPORT_STAR, IS_OP, JUMP_BACKWARD, JUMP_BACKWARD_NO_INTERRUPT, JUMP_BACKWARD_QUICK, JUMP_FORWARD, JUMP_IF_FALSE_OR_POP, JUMP_IF_TRUE_OR_POP, KW_NAMES, LIST_APPEND, LIST_EXTEND, LIST_TO_TUPLE, LOAD_ATTR, LOAD_ATTR_ADAPTIVE, LOAD_ATTR_INSTANCE_VALUE, LOAD_ATTR_MODULE, LOAD_ATTR_SLOT, LOAD_ATTR_WITH_HINT, LOAD_BUILD_CLASS, LOAD_CLASSDEREF, LOAD_CLOSURE, LOAD_CONST, LOAD_CONST__LOAD_FAST, LOAD_DEREF, LOAD_FAST, LOAD_FAST__LOAD_CONST, LOAD_FAST__LOAD_FAST, LOAD_GLOBAL, LOAD_GLOBAL_ADAPTIVE, LOAD_GLOBAL_BUILTIN, LOAD_GLOBAL_MODULE, LOAD_METHOD, LOAD_METHOD_ADAPTIVE, LOAD_METHOD_CLASS, LOAD_METHOD_MODULE, LOAD_METHOD_NO_DICT, LOAD_METHOD_WITH_DICT, LOAD_METHOD_WITH_VALUES, LOAD_NAME, MAKE_CELL, MAKE_FUNCTION, MAP_ADD, NOP, POP_EXCEPT, POP_JUMP_BACKWARD_IF_FALSE, POP_JUMP_BACKWARD_IF_NOT_NONE, POP_JUMP_BACKWARD_IF_TRUE, POP_JUMP_FORWARD_IF_FALSE, POP_JUMP_FORWARD_IF_NONE, POP_JUMP_FORWARD_IF_NOT_NONE, POP_JUMP_FORWARD_IF_TRUE, POP_TOP, PRECALL, PRECALL_ADAPTIVE, PRECALL_BOUND_METHOD, PRECALL_BUILTIN_CLASS, PRECALL_BUILTIN_FAST_WITH_KEYWORDS, PRECALL_METHOD_DESCRIPTOR_FAST_WITH_KEYWORDS, PRECALL_NO_KW_BUILTIN_FAST, PRECALL_NO_KW_BUILTIN_O, PRECALL_NO_KW_ISINSTANCE, PRECALL_NO_KW_LEN, PRECALL_NO_KW_LIST_APPEND, PRECALL_NO_KW_METHOD_DESCRIPTOR_FAST, PRECALL_NO_KW_METHOD_DESCRIPTOR_NOARGS, PRECALL_NO_KW_METHOD_DESCRIPTOR_O, PRECALL_NO_KW_STR_1, PRECALL_NO_KW_TUPLE_1, PRECALL_NO_KW_TYPE_1, PRECALL_PYFUNC, PUSH_EXC_INFO, PUSH_NULL, RAISE_VARARGS, RERAISE, RESUME, RESUME_QUICK, RETURN_GENERATOR, RETURN_VALUE, SEND, SETUP_ANNOTATIONS, SET_ADD, SET_UPDATE, STORE_ATTR, STORE_ATTR_ADAPTIVE, STORE_ATTR_INSTANCE_VALUE, STORE_ATTR_SLOT, STORE_ATTR_WITH_HINT, STORE_DEREF, STORE_FAST, STORE_FAST__LOAD_FAST, STORE_FAST__STORE_FAST, STORE_GLOBAL, STORE_NAME, STORE_SUBSCR, STORE_SUBSCR_ADAPTIVE, STORE_SUBSCR_DICT, STORE_SUBSCR_LIST_INT, SWAP, UNARY_INVERT, UNARY_NEGATIVE, UNARY_NOT, UNARY_POSITIVE, UNPACK_SEQUENCE, UNPACK_SEQUENCE_ADAPTIVE, UNPACK_SEQUENCE_LIST, UNPACK_SEQUENCE_TUPLE, UNPACK_SEQUENCE_TWO_TUPLE, WITH_EXCEPT_START, YIELD_VALUE

Specialization stats: specialization stats by family
Specialization attempts are listed for the families: BINARY_SUBSCR, STORE_SUBSCR, UNPACK_SEQUENCE, FOR_ITER, STORE_ATTR, LOAD_ATTR, COMPARE_OP, LOAD_GLOBAL, BINARY_OP, LOAD_METHOD, PRECALL, CALL

Specialization effectiveness: specialization effectiveness

Call stats: inlined calls and frame stats

Object stats: allocations, frees and dict materializations

Stats gathered on: 2022-10-31 |
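To give a rough feel for what a "pair count" is, the sketch below counts adjacent opcode pairs in one function using the standard `dis` module. This is an assumption-laden illustration, not how the stats build works: it counts pairs statically over the compiled bytecode, while the table above counts pairs as they were executed, and `dis` shows unspecialized opcode names by default. The function `example` is a hypothetical workload.

```python
import dis
from collections import Counter


def static_pair_counts(func):
    """Count adjacent (opcode, next opcode) pairs in a function's bytecode.

    This is a static count over the compiled code, so the numbers are not
    comparable with counts gathered while a program is running.
    """
    names = [ins.opname for ins in dis.get_instructions(func)]
    return Counter(zip(names, names[1:]))


def example(values):  # hypothetical workload, not from the service above
    total = 0
    for v in values:
        total += v * 2
    return total


pairs = static_pair_counts(example)
for (first, second), count in pairs.most_common(5):
    print(f"{first:>20} -> {second:<20} {count}")
```

The exact opcode names printed depend on the Python version used to run the sketch.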
We should gather these results somewhere. Presumably in the |
After some talk with @gvanrossum, we discussed the possibility of gathering benchmarks from big production-grade applications. I talked with my employer (Bloomberg) and we have a couple of very large, performance-critical Python applications we could use as a performance target (as another source of data). I discussed the possibility of getting some engineer hours from the teams in charge of these applications to help build some kind of automated system that runs benchmarks based on production data with different versions of CPython (so we can compare different commits in the main branch, or different proposals).
I am opening this issue to first gather what kind of "requirements" or data we are interested in, so we can talk internally about how to prepare and build this. For instance, are we just looking at "time it takes per request", or "time per opcode", or something like that?
As a note, the code of the application itself cannot be made public for obvious reasons but (after checking with the legal department) we can possibly discuss the general nature of the applications if there are questions.
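As one possible reading of "time it takes per request", here is a minimal sketch of a harness that replays a list of requests against a handler and records per-call wall-clock time, so the same script can be run under different CPython builds and the outputs compared. Everything named here (`time_requests`, `fake_handler`, the request shape) is hypothetical; the real applications are not public and would be driven by production data.

```python
import sys
import time
from statistics import median


def time_requests(handler, requests, warmup=10):
    """Call handler(request) for each request; return per-call times in ms."""
    for req in requests[:warmup]:  # untimed warm-up calls before measuring
        handler(req)
    timings = []
    for req in requests:
        start = time.perf_counter()
        handler(req)
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def fake_handler(request):  # hypothetical stand-in for the real application
    return {key: value * 2 for key, value in request.items()}


requests = [{"a": i, "b": i + 1} for i in range(1000)]
timings = time_requests(fake_handler, requests)
version = sys.version.split()[0]
print(f"{version}: median {median(timings):.4f} ms over {len(timings)} requests")
```

Printing the interpreter version alongside the result makes it harder to mix up runs when comparing several builds.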