Lazily import pandas to speedup non-pandas use #2

hugovk · 2023-02-14T20:30:11Z

I'm using pytablewriter in some CLIs and noticed they were sometimes slow, so I profiled them.

Here's how to use python -X importtime and tuna to identify bottlenecks: https://medium.com/alan/how-we-improved-our-python-backend-start-up-time-2c33cd4873c8

For example with this test script:

from pytablewriter import MarkdownTableWriter

def main():
    writer = MarkdownTableWriter(
        table_name="example_table",
        headers=["int", "float", "str", "bool", "mix", "time"],
        value_matrix=[
            [0,   0.1,      "hoge", True,   0,      "2017-01-01 03:04:05+0900"],
            [2,   "-2.23",  "foo",  False,  None,   "2017-12-23 45:01:23+0900"],
            [3,   0,        "bar",  "true",  "inf", "2017-03-03 33:44:55+0900"],
            [-10, -9.9,     "",     "FALSE", "nan", "2017-01-01 00:00:00+0900"],
        ],
    )
    return writer.dumps()

Then run:

python -m pip install tuna
python -X importtime -c "import pandastest; pandastest.main()" 2> pandastest-master.log
tuna pandastest-master.log

The tuna output shows that most of the import time comes from pandas:

It takes almost half a second to import pandas, 78.1% of the total 0.63s time.

This is not surprising: pandas and its dependency NumPy are big libraries.
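For a text-only view of the same log, without tuna, a short script along these lines could list the slowest imports. It is only a sketch: the helper names `parse_importtime` and `slowest` are hypothetical, and the numbers in `SAMPLE_LOG` are made up for illustration, not measurements from this PR.

```python
def parse_importtime(log_text):
    """Parse `python -X importtime` output into (module, self_us, cumulative_us) tuples."""
    prefix = "import time:"
    entries = []
    for line in log_text.splitlines():
        if not line.startswith(prefix):
            continue  # not an importtime line (e.g. other stderr output)
        parts = line[len(prefix):].split("|")
        if len(parts) != 3:
            continue
        self_us, cumulative_us, name = (part.strip() for part in parts)
        if not self_us.isdigit():
            continue  # header line: "self [us] | cumulative | imported package"
        entries.append((name, int(self_us), int(cumulative_us)))
    return entries


def slowest(entries, n=5):
    """Return the n entries with the largest cumulative import time."""
    return sorted(entries, key=lambda entry: entry[2], reverse=True)[:n]


# Illustrative log with invented timings (microseconds).
SAMPLE_LOG = """\
import time: self [us] | cumulative | imported package
import time:       150 |        150 |   numpy
import time:       300 |        450 | pandas
import time:        50 |         50 | pytablewriter
"""

for name, self_us, cumulative_us in slowest(parse_importtime(SAMPLE_LOG), n=2):
    print(f"{cumulative_us:>8} us  {name}")
```

To use it on a real log, pass the contents of pandastest-master.log to `parse_importtime` instead of `SAMPLE_LOG`.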

However, with this PR, if we lazily import pandas (that is, only import it when needed), we get a big speedup for all the non-pandas use cases, which cover many of pytablewriter's formats:

Now it only takes 0.164s for the whole program, a huge improvement over the 0.63s before, and very noticeable on the command line.
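The general pattern behind a lazy import can be sketched as below. This is a minimal illustration, not the actual change in this PR: the function `to_rows` is hypothetical, and the only idea it shows is moving `import pandas` from module level into the code path that needs it.

```python
import sys


def to_rows(data):
    """Return table rows as a list of lists; pandas is imported only if needed."""
    # Fast path: plain Python sequences never touch pandas.
    if isinstance(data, (list, tuple)):
        return [list(row) for row in data]

    # Deferred import: the cost of importing pandas is paid only on this path.
    import pandas as pd

    if isinstance(data, pd.DataFrame):
        return data.values.tolist()
    raise TypeError(f"unsupported input type: {type(data).__name__}")


rows = to_rows([(1, "a"), (2, "b")])
print(rows)
# Shows whether pandas was loaded by this process so far.
print("pandas" in sys.modules)
```

The trade-off is that the first call that does need pandas pays the import cost at that point rather than at start-up; later calls reuse the module cached in `sys.modules`.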

Another quick comparison, before (first run) and after (second run):

$ time python pandastest.py
python3 pandastest.py  0.95s user 2.02s system 432% cpu 0.687 total
$ time python pandastest.py
python3 pandastest.py  0.10s user 0.03s system 57% cpu 0.223 total

thombashi

Great analysis.
Thank you for your contribution.

thombashi/tabledata#2

Lazily import pandas to speedup non-pandas use

c50adb3

thombashi approved these changes Feb 17, 2023

thombashi merged commit b162600 into thombashi:master Feb 17, 2023

hugovk deleted the lazy-pandas branch February 17, 2023 21:41

thombashi added a commit to thombashi/pytablewriter that referenced this pull request Feb 18, 2023

Reduce import time for non-pandas use

56c5b8b

thombashi/tabledata#2
