
Lazily import pandas to speedup non-pandas use #2

Merged (1 commit) on Feb 17, 2023


@hugovk (Contributor) commented Feb 14, 2023

I'm using pytablewriter in some CLIs and noticed they were sometimes slow, so I profiled them.

Here's how to use python -X importtime and tuna to identify bottlenecks: https://medium.com/alan/how-we-improved-our-python-backend-start-up-time-2c33cd4873c8

For example, with this test script, saved as pandastest.py:

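```python
from pytablewriter import MarkdownTableWriter


def main():
    # Mixed, partly messy data: pytablewriter infers a type for each column
    writer = MarkdownTableWriter(
        table_name="example_table",
        headers=["int", "float", "str", "bool", "mix", "time"],
        value_matrix=[
            [0,   0.1,      "hoge", True,   0,      "2017-01-01 03:04:05+0900"],
            [2,   "-2.23",  "foo",  False,  None,   "2017-12-23 45:01:23+0900"],
            [3,   0,        "bar",  "true",  "inf", "2017-03-03 33:44:55+0900"],
            [-10, -9.9,     "",     "FALSE", "nan", "2017-01-01 00:00:00+0900"],
        ],
    )
    # dumps() renders the table and returns it as a string
    return writer.dumps()


if __name__ == "__main__":
    # Allows running the script directly, e.g. `time python pandastest.py`
    print(main())
```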

Then run:

```shell
python -m pip install tuna
python -X importtime -c "import pandastest; pandastest.main()" 2> pandastest-master.log
tuna pandastest-master.log
```

This shows that most of the import time comes from pandas:

[tuna screenshots: import-time profile before this PR]

Importing pandas takes almost half a second, 78.1% of the total 0.63 s.

This is not surprising: pandas and its dependency NumPy are big libraries.
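As an alternative to tuna's graphical view, the raw `-X importtime` log can also be summarised as text. This is a minimal sketch, assuming CPython's log format, in which each stderr line reads `import time: self [us] | cumulative | imported package`. The helper name `slowest_imports` is hypothetical, and the stdlib `json` module stands in for `pandastest` so the example runs without extra packages.

```python
import subprocess
import sys


def slowest_imports(statement: str, top: int = 5) -> list[tuple[str, int]]:
    """Run `statement` in a fresh interpreter with -X importtime and return
    the `top` modules with the largest cumulative import time (microseconds)."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        capture_output=True,
        text=True,
        check=True,
    )
    timings: list[tuple[str, int]] = []
    for line in result.stderr.splitlines():
        # Skip non-importtime output and the "self [us] | cumulative" header row
        if not line.startswith("import time:") or "self [us]" in line:
            continue
        _self_us, cumulative_us, name = line[len("import time:"):].split("|")
        # int() tolerates the padding spaces; strip() removes nesting indentation
        timings.append((name.strip(), int(cumulative_us)))
    # Cumulative time includes nested imports, so big dependencies float to the top
    timings.sort(key=lambda item: item[1], reverse=True)
    return timings[:top]


if __name__ == "__main__":
    for name, cumulative_us in slowest_imports("import json"):
        print(f"{cumulative_us / 1e6:8.3f} s  {name}")
```

Passing `"import pandastest; pandastest.main()"` instead would list the same hot spots that tuna draws, with pandas near the top before this change.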

However, this PR imports pandas lazily, that is, only when it's actually needed. This gives a big speedup for all the non-pandas use cases, which cover many of pytablewriter's formats:

[tuna screenshot: import-time profile after this PR]

Now the whole program takes only 0.164 s, down from 0.63 s before, which is very noticeable on the command line.
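The general lazy-import pattern can be sketched as follows. This is an illustration only, not pytablewriter's actual code: `to_markdown` and `to_dataframe` are hypothetical names. The expensive `import pandas` moves from module level into the one function that needs it, so callers who never use the pandas path never pay for the import.

```python
import sys


def to_markdown(headers, rows):
    """Render a simple Markdown table using only the standard library."""
    lines = ["| " + " | ".join(headers) + " |"]
    lines.append("|" + "|".join("---" for _ in headers) + "|")
    for row in rows:
        lines.append("| " + " | ".join(str(value) for value in row) + " |")
    return "\n".join(lines)


def to_dataframe(headers, rows):
    """Build a pandas DataFrame; pandas is imported only when this runs."""
    # Lazy import: Python caches loaded modules in sys.modules, so after the
    # first call this statement is only a cheap lookup.
    import pandas as pd

    return pd.DataFrame(rows, columns=headers)


if __name__ == "__main__":
    print(to_markdown(["int", "str"], [[0, "hoge"], [2, "foo"]]))
    # Reports whether the Markdown-only path caused pandas to be loaded
    print("pandas" in sys.modules)
```

The trade-off is that the first pandas-backed call now pays the import cost instead of program start-up. In the numbers above, the saving (0.63 s − 0.164 s ≈ 0.47 s) is close to the roughly 0.49 s (78.1% of 0.63 s) that the pandas import accounted for.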

Another quick before-and-after comparison (before first, then after):

```console
$ time python pandastest.py
python3 pandastest.py  0.95s user 2.02s system 432% cpu 0.687 total
$ time python pandastest.py
python3 pandastest.py  0.10s user 0.03s system 57% cpu 0.223 total
```
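To repeat a comparison like this without the shell's `time`, and to catch pandas creeping back into the import path later, a small helper can time fresh interpreters and report whether a given module ended up in `sys.modules`. This is a sketch under stated assumptions: `measure_import` is a hypothetical name, and the stdlib `json` module stands in for pytablewriter so it runs anywhere. With pytablewriter installed, you would pass `"import pytablewriter"` and `"pandas"` instead.

```python
import subprocess
import sys
import time


def measure_import(statement: str, heavy_module: str, runs: int = 5) -> tuple[float, bool]:
    """Run `statement` in fresh interpreters and return (best wall time in
    seconds, whether `heavy_module` was loaded as a side effect)."""
    # The child process prints True/False for the presence of heavy_module
    code = f"{statement}\nimport sys\nprint({heavy_module!r} in sys.modules)"
    best = float("inf")
    loaded = False
    for _ in range(runs):
        start = time.perf_counter()
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            check=True,
        )
        # Keep the fastest run, which is least disturbed by other system load
        best = min(best, time.perf_counter() - start)
        loaded = result.stdout.strip() == "True"
    return best, loaded


if __name__ == "__main__":
    seconds, loaded = measure_import("import json", "json.decoder")
    print(f"{seconds:.3f} s, json.decoder loaded: {loaded}")
```

Taking the minimum over several runs is a common way to reduce noise in start-up timings; the wall time includes interpreter start-up itself, just as `time` does.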

@thombashi (Owner) left a comment

Great analysis.
Thank you for your contribution.

@thombashi merged commit b162600 into thombashi:master on Feb 17, 2023
@hugovk deleted the lazy-pandas branch on February 17, 2023 21:41
thombashi added a commit to thombashi/pytablewriter that referenced this pull request Feb 18, 2023