Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need to rewrite cython extensions in pure C #97

Closed
asvetlov opened this issue Jun 20, 2017 · 5 comments
Closed

Need to rewrite cython extensions in pure C #97

asvetlov opened this issue Jun 20, 2017 · 5 comments

Comments

@asvetlov
Copy link
Member

The reason is multidict.add(key, val) is ten times slower than dict[key] = val.

This is because multidict stores data internally as a list of cythonized _Item objects.
But creation of python (ever cythonized) object is too expensive for our use case.

The solution is using C structs for internal data but Cython has no support for visiting values stored in these structures: tp_visit and tp_clear slots.

Thus for sake of speed we need pure C implementation.

@mind1m
Copy link

mind1m commented Jun 30, 2017

Update.
First we are going to do the MVP version:

  • rewrite multidict._Pair as a C struct in cython
  • multidict._items becomes C array instead of list
  • add corresponding methods to manipulate array size (start with 32 elements)
  • use PyObject* for Pair elements
  • use incref/decref for Pair elements
  • allow adding only simple types as elements (see _PyObject_GC_MAY_BE_TRACKED)

@samuelcolvin
Copy link
Member

Given that python had now agreed to guarantee insertion order for standards dicts in python 3.6+ can't multidict just use a states standard dictionary not a list as it's core datastructures and improve performance?

@asvetlov
Copy link
Member Author

dict keeps insertion order but doesn't allow multiple keys.
HTTP standard requires the feature, it was the main reason for multidict package creation.
Very many python web frameworks have multidicts in some form: Django, Flask, Pyramid etc etc.

Sure, the library can be reimplemented by borrowing CPython compact dict ideas but it is another level of complication (need a C level coding anyway).

Assuming that usual HTTP headers count is limited (10-30-50 at least) the sequential scan is as fast as hash table lookup, everything should fit into CPU cache.

The current problem is Python list usage as internal storage, switching to C array can help to drop the bottleneck I hope.

@samuelcolvin
Copy link
Member

Makes sense, I was wondering if you could use a dict to avoid sequential scan but I see that it might not make much difference.

@asvetlov
Copy link
Member Author

Duplicate of #249

@asvetlov asvetlov marked this as a duplicate of #249 Nov 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants