Change default cython flags to have smaller compiled files #2276
Comments
I noticed there is a version difference, however, even with |
Hello @peterroelants and thanks for reporting. |
I uninstalled the conda pydantic and installed it again with Could it be that cython is not skipped? |
Can you try cloning the repo and building it with the On Linux the compiled files are big because a lot of optimizations are done to improve perf but at the cost of quite some space. |
Note that the files in the conda build are the same, it also contains the What is the sequence of commands I need to run to build the repo? |
You might be right that it's related to the cython flags, see this comment: cython/cython#2102 (comment) |
I finally got my hands on an Ubuntu machine and tested with default
(env) ~/pydantic-test/pydantic$ du -sh build
13M build
(env) ~/pydantic-test/pydantic$ du -sh build-original
74M build-original
You can try it out yourself by just cloning it, change in Now I don't know if we should make the change directly in pydantic as |
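For anyone who wants to repeat the build-size comparison above without `du`, here is a minimal Python sketch. The function names `directory_size_bytes` and `human_readable` are made up for illustration. It sums apparent file sizes, so the numbers can differ slightly from `du`, which reports disk usage.

```python
import os


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root (apparent size, not disk blocks)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so that linked files are not counted twice.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def human_readable(num_bytes):
    """Format a byte count with binary units, in the style of `du -h`."""
    size = float(num_bytes)
    for unit in ("B", "K", "M", "G"):
        if size < 1024 or unit == "G":
            return f"{size:.1f}{unit}"
        size /= 1024
```

Calling `human_readable(directory_size_bytes("build"))` on each build directory gives figures that can be compared side by side.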
Thanks for verifying. I see that However, I haven't been able to reproduce your build results. |
This is currently in "feedback wanted" state. What kind of feedback is wanted? The big binaries are problematic enough in that they warranted a special warning here: https://awslabs.github.io/aws-lambda-powertools-python/utilities/parser/ Even if the optimized binaries were 2x faster, they wouldn't be worth the size increase in modern workloads (lambdas, containers etc) - and people needing to extract the last bit of performance should be able to compile their own version of pydantic |
This has already been discussed here: #1325. The .so files were already an issue in AWS Lambda back then, but now the situation is even worse as the files have grown a lot. (I actually already hit the limits in one of my AWS Lambda deployments and after investigating why the package was so big I found this discussion.) In order for FastAPI to become the next recommended framework for serverless APIs, this major dependency should be more lightweight (i.e. smaller). Installing packages straight from a Git repository doesn't sound right. |
First of all, I'm surprised and disappointed by how impassioned people get without taking the time to think about a workaround for a problem. Indignation needs to be matched by investigation. Remember pydantic isn't developed or sponsored by some big organisation, I don't get paid to develop it (well, what I get from GitHub sponsors represents a lot less than the minimum wage). Myself and others worked hard to compile pydantic because some people cared a lot about performance. If when installing pydantic you care more about space than performance, spend 3 minutes reading the docs on pip and you'll notice it's trivial to install pydantic without downloading binaries:
I believe pydantic is around 2x slower when not compiled. In terms of reducing the size of the binaries and fixing this issue, please can someone who's concerned about this compile pydantic with different flags, run the benchmarks and provide a breakdown here of performance vs. size for different options. |
Even better, you can custom compile it again during pip install with preferred flags to optimize size vs speed. For example:
This will result in a pydantic install of around 7MB. Note that this will need build tools (e.g. I only discovered this option myself last week, and was in the process of writing a post on it since it seems to be a generally unknown pip install option. |
At least don't get me wrong. I love Pydantic and I see a lot more potential in it than just being a main dependency of FastAPI. Actually I see huge momentum and potential when it comes to defining any kind of contracts between components and services when building larger microservice architectures. With Pydantic you can easily share the DTOs between services and clients and thereby avoid writing a lot of boilerplate and validation code yourself. Of course FastAPI is great as well, as it provides asyncio, but an even bigger reason why I'm about to switch to FastAPI is that it comes with Pydantic. I can imagine that FastAPI benefits from a speed-optimized Pydantic. But when we find projects from big players, like AWS, who use Pydantic because of its typed approach to data structures and validation, we realize that this really is not only some subcomponent of FastAPI but a very interesting library on its own. So don't be surprised if you suddenly get a sponsor. This discussion also shows that there are projects that value a smaller disk footprint, so the feedback is really valid. Thanks for the tip about --no-binary. However, this approach isn't perfect either, as bigger projects have a lot of other dependencies as well and they may not want to prevent pip from installing binaries for every component. Pip resolves version dependencies best when provided with the list of all components at once, so it's not optimal to install Pydantic first in a special way and then the rest of the packages in the usual way. |
@jvuori If I understand it correctly, you can still install multiple packages using the |
@peterroelants Nice! And that can also be added to requirements.txt as its own line. This is very much needed with toolchains like AWS SAM, which install packages purely based on requirements.txt, so you may not be able to affect any extra parameters passed to pip. |
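To illustrate the point discussed above, that `--no-binary` can be limited to selected packages while everything else is still installed from wheels in the same pip invocation, here is a small sketch. The helper name `pip_install_command` and the package list are hypothetical. It only builds the argument list, which could then be passed to `subprocess.run`.

```python
import sys


def pip_install_command(requirements, no_binary=()):
    """Build a `pip install` argument list that skips wheels only for `no_binary` packages."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if no_binary:
        # pip accepts a comma-separated list of package names for --no-binary.
        cmd += ["--no-binary", ",".join(no_binary)]
    cmd += list(requirements)
    return cmd
```

For example, `pip_install_command(["pydantic", "fastapi"], no_binary=["pydantic"])` asks pip to build pydantic from source while the other package can still come from a wheel, and all requirements are resolved together.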
@peterroelants this is a very good tip, hopefully people will find it when googling about this problem |
Very happy to accept a PR to add some notes about this to the install page of the docs. |
@samuelcolvin I've run some tests and these are the results:
These are run on the validation benchmark task. What is strange to me is that "
|
Additionally I also updated the documentation and suggested a small change to |
@peterroelants how did you manage to have a 6.4M installation package with cython? I did exactly the same and got 40M. The only way I can get ~5M is on windows. Anything in Linux pushes me up to 40M. What am I doing wrong here? |
@alexanderluiscampino That's weird, I hope I didn't make a mistake. I ran my tests on a |
Hi @peterroelants, here are my size results. I tested every possible form in this thread and always get 40M. The results below are for the command:
suggested by you. But whichever way I go about it, always get 40M. I'm running on Docker as well,
|
Weird, I remember checking the Let me try to reproduce my results, and maybe clean up the testing scripts for sharing. It will probably take me a few days to find the time. |
@alexanderluiscampino I tried reproducing my results and I have a Dockerfile for you to try at https://gist.github.com/peterroelants/e344ac416948296f7fcdc84a20ce6eb5 For me this results in a working Python environment with Pydantic env of 6.4M:
|
Thank you for your dedication @peterroelants! I managed to replicate your results using your Dockerfile. I'm assuming the difference between your method and mine is that you do a clean installation of pip and cython from the beginning, whereas I started with an existing pip installation. Somehow, when I ran the pydantic installation script (w/ cython) it wasn't using it, and defaulted to the normal way of installing it. |
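One way to check whether an install actually ended up with compiled extension modules, as opposed to plain `.py` files, is to look at the file each module is loaded from. This is a minimal sketch using only the standard library. The helper names are made up for illustration, and it is not how pydantic itself reports its compiled status.

```python
import importlib.util
from importlib.machinery import EXTENSION_SUFFIXES


def is_extension_file(path):
    """Return True if the path ends with a suffix this interpreter loads as a C extension."""
    return path.endswith(tuple(EXTENSION_SUFFIXES))


def module_is_compiled(module_name):
    """Return True if the named module would be loaded from a compiled extension file."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # The parent package is not installed at all.
        return False
    origin = getattr(spec, "origin", None)
    return bool(origin) and is_extension_file(origin)
```

Assuming pydantic v1 is installed, `module_is_compiled("pydantic.fields")` should distinguish a binary install from a pure-Python one.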
@alexanderluiscampino I can't replicate this, the docker build doesn't complete successfully and fails on processing the install of pydantic:
It seems to blow up on compiling the pydantic.typing extension every time for me. Did you change anything in the gist posted by @peterroelants to get it to work? I've tried manually setting different versions of pip, setuptools and cython but hit the same issue each time. I cannot figure out why this would work for others and not me (I am on a MacBook with an M1 chip, but it's Docker so 🤷 ) |
@dotorg-richard you mention you are running on a Macbook with M1 chip, this means that the base image pulled will be of a different architecture. The Can you try the following?: Make sure you have the latest Docker version and fulfil the system requirements as documented in "Docker Desktop for Apple silicon". And then build the exact same image again using docker buildx:
I'm curious if this would solve your problem. If not, could you share the results of running the following on your M1 machine:
|
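The architecture question raised above can also be checked from inside the container with Python's standard library. This is only a sketch. It is not the command requested in the comment above, and `describe_platform` is a made-up helper name.

```python
import platform


def describe_platform():
    """Collect the OS, CPU architecture and Python version of the running interpreter."""
    return {
        "system": platform.system(),    # e.g. "Linux" inside a Docker container
        "machine": platform.machine(),  # CPU architecture as reported by the OS
        "python": platform.python_version(),
    }
```

Printing `describe_platform()` in both the failing and the working environment shows whether they run on the same architecture.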
@peterroelants I was feeling confident realizing I was still running the older Docker Preview and not the latest release but after installing the latest release and following the directions to install I still get the exact same issue as above, failing in the exact same place. The output you requested is:
|
That's weird, I can build the exact same Docker container I posted without any issues. And apparently so can @alexanderluiscampino? Note from the output |
@peterroelants I can confirm, able to build it with the dockerfile provided, running on windows. |
@dotorg-richard I'm curious if a different base image would resolve your issue. Could you try building the following Alpine-based-image: https://gist.github.com/peterroelants/e344ac416948296f7fcdc84a20ce6eb5#file-python-alpine-dockerfile with:
|
@peterroelants thank you so much I'm happy to report that worked! Hopefully others with the M1 chip can use this alternate base image. Package size report is 6.2MB! |
That's great to hear! Interesting that the Debian based image resulted in issues on the M1, while the Alpine based images work. |
To clarify a bit, the excessive size is almost entirely due to the debug info, which is usually included by default (see the output of e.g. Perhaps given the large size overhead, pydantic could disable debug info by default? The debug info is useful for gdb debugging, crashes, benchmarking etc., but most library users probably won't have a use for it. If disabling debuginfo entirely is not desirable, a middle ground can be to reduce the debug level from
The size comparison is:
|
@bluetech How did you test this (since pydantic overwrites cflags in setup.py afaik)? Btw, depending on the system I'm running on I get different results when building Pydantic:
|
@peterroelants I cloned pydantic, edited the line you linked to, ran
That is to be expected, but the differences do not seem too big, the major thing is the debug info. |
+1 for adding -g0 to CFLAGS |
Bumping this issue - any plans on this? |
This will be solved in V2, we're stopping using cython completely. Instead the core validation logic is written in rust, meaning:
|
Although looking at this again, I don't think I had realised that this could be solved by setting a debug flag, sorry. Maybe we can fix this in v1.10.3 as I know that will be used for a while. |
For anyone else looking for a post-install solution (i.e., you are fine with downloading the large binaries and you just want to save space), you can simply run |
Just to bump the discussion - we're also waiting (albeit passively) for some news on this issue. Thank you! |
happy to accept a fix for this for v1.10.3, otherwise I'll try to get to it. In terms of ETA for V1.10.3, I had hoped to get it done over the last few weeks but COVID followed by bad cold combined with sick toddler has pretty much wiped me out. I'll devote what time I have to it over the Christmas "break", but giving a firm deadline seems foolish at this point. Sorry for the delay. |
Sorry to hear that, get well soon! No worries about the timeline, we'll be happy to just get a notification whenever this issue is referenced in the eventual patch release. Thank you and take care. |
Previously, pydantic used the default Python CFLAGS which include `-g` (debug level 2). This is good for debugging at the C level, but it significantly increases the size of the C extension shared library, and is probably not needed by the vast majority of pydantic users. Thus, it seems a better tradeoff to turn debug info off. This can be overridden when building pydantic from source (not from PyPI wheel) by using `CFLAGS='-O3 -g'`. This change reduces the pydantic binary on cp310-linux-x86_64 from 31MB (12MB wheel) to 8.9MB (3MB wheel). Fixes pydantic#2276
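The override behaviour described in the commit message can be sketched as follows. This is a minimal illustration and not necessarily the code in pydantic's `setup.py`. The function name `build_cflags` is hypothetical. The idea is to put the size-friendly defaults first and append any user-supplied `CFLAGS`, relying on GCC's rule that the last `-O` and `-g` option given is the effective one.

```python
import os


def build_cflags(environ=None):
    """Return compiler flags: size-friendly defaults followed by any user CFLAGS."""
    env = os.environ if environ is None else environ
    # Defaults: full optimisation (-O3) and no debug info (-g0).
    defaults = "-O3 -g0"
    user_flags = env.get("CFLAGS", "").strip()
    # User flags come last so they take precedence over the defaults.
    return f"{defaults} {user_flags}".strip()
```

With no `CFLAGS` set, this yields `-O3 -g0`. With `CFLAGS='-O3 -g'` it yields `-O3 -g0 -O3 -g`, so debug info is turned back on for anyone building from source who wants it.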
Pip install results in a much larger lib artefact compared to conda install (both in a conda environment):
pip install pydantic: 80M
conda install pydantic -c conda-forge: 6.6M
I don't know if this is expected or not. I encountered this while trying to minimize the size of a Docker container I'm building, and was surprised that pydantic took up 80M when installed with pip.
I wasn't sure to file this as bug or not, but given the extreme difference in size I thought there might be something going wrong with the pip install.
I've added a full list of the files in site-packages:
Pip install file sizes:
Conda install file sizes:
Bug
Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())" from conda install:
Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())" from pip install:
conda info: