
Indexing benchmarking #1851

Merged (7 commits) on Jan 24, 2018
Conversation

fujiisoup (Member)

Just added some benchmarks for basic, outer, and vectorized indexing and assignments.

@@ -34,6 +34,9 @@ nosetests.xml
.cache
.ropeproject/

# asv environments
.asv
fujiisoup (Member, Author)

Is there any problem with adding .asv to .gitignore?
The large number of files in this directory makes my GUI git client freeze.

@@ -29,3 +29,12 @@ def randn(shape, frac_nan=None, chunks=None):
    x.flat[inds] = np.nan

    return x


def randint(low, high=None, size=None, frac_minus=None):
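The diff only shows the new helper's signature; its body is elided here. By analogy with how `randn()` injects NaNs via `frac_nan` just above, a plausible (hypothetical, not taken from the PR) implementation might look like:

```python
import numpy as np

def randint(low, high=None, size=None, frac_minus=None):
    # Hypothetical body: draw random integers, then overwrite a fraction
    # `frac_minus` of the entries with -1, mirroring how randn() uses
    # `frac_nan` to inject NaNs.
    x = np.random.randint(low, high, size)
    if frac_minus is not None:
        inds = np.random.choice(x.size, int(x.size * frac_minus), replace=False)
        x.flat[inds] = -1
    return x
```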
Member

Can we use a seed for all these random numbers? That should decrease the variance for these test results.

fujiisoup (Member, Author)

See line 9.

jhamman was careful enough :)

Member

A global seed for all benchmarks isn't a great idea -- it means that results will vary depending upon whether we run the full benchmark suite or not, and depending upon the order in which benchmark tests are run. It would be better to set a seed for each test.
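To make the per-test suggestion concrete: asv runs a benchmark class's `setup()` before each benchmark, so seeding there keeps results reproducible regardless of suite composition or test order. A minimal hypothetical sketch (class and sizes are illustrative, not from the PR):

```python
import numpy as np

class Indexing:
    def setup(self):
        # Per-test seed set in setup(), which asv calls before each
        # benchmark -- unlike a module-level seed, results no longer
        # depend on which other benchmarks ran first.
        np.random.seed(123)
        self.x = np.random.randn(1000, 500)

    def time_basic_indexing(self):
        self.x[::5, 100:200]
```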

fujiisoup (Member, Author)

> it means that results will vary depending upon whether we run the full benchmark suite or not, and depending upon the order in which benchmark tests are run.

Thanks for the details.
Understood.
Done.



def time_indexing_basic():
    for ind in basic_indexes:
Member

Rather than a loop, another option would be to separate these cases into separate benchmarks. That would give more granular information.
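One way to get that granularity without writing many near-identical functions is asv's parameterization: each value in `params` becomes its own timed case. A hypothetical sketch (the index cases here are illustrative, not the PR's actual `basic_indexes`):

```python
import numpy as np

class IndexingCases:
    # Each entry in `params` is benchmarked separately, so asv reports
    # one timing per indexing kind instead of one aggregate over a loop.
    params = ['basic', 'outer']
    param_names = ['kind']

    def setup(self, kind):
        self.x = np.arange(10000).reshape(100, 100)
        self.indexers = {
            'basic': (slice(0, 50), slice(None, None, 2)),
            'outer': (np.arange(0, 100, 7), slice(None)),
        }

    def time_indexing(self, kind):
        self.x[self.indexers[kind]]
```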



try:
    ds_dask = ds.chunk({'x': 100, 'y': 50, 't': 50})
Member

Another option is to use a subclass, like I did in #1847
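The subclassing pattern works because asv re-runs every inherited `time_*` benchmark against whatever `setup()` provides; a subclass only has to override `setup()`. A hypothetical numpy-only sketch of the pattern (in the real PR the subclass would build a dask-backed dataset with `ds.chunk(...)` instead):

```python
import numpy as np

class IndexingBase:
    def setup(self):
        self.x = np.random.randn(200, 100)

    def time_basic_indexing(self):
        self.x[::3, 10:50]

class IndexingWithNaN(IndexingBase):
    # Overriding only setup() re-runs every time_* benchmark from the
    # base class against the modified data -- no benchmark bodies are
    # duplicated.
    def setup(self):
        super().setup()
        self.x[::10] = np.nan
```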


def time_indexing_basic_dask():
    for ind in basic_indexes:
        ds_dask.isel(**ind)
Member

It's probably a good idea to call .load() on these to load them into numpy. Otherwise it only measures the time to construct the dask graph, not the time to evaluate it. Usually evaluation time dominates.
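To illustrate why the `.load()` matters without depending on dask here, a tiny stand-in for a lazy array shows the distinction being timed (in the actual benchmark this would be `ds_dask.isel(**ind).load()`):

```python
import numpy as np

class LazyArray:
    # Minimal stand-in for a dask-backed array: indexing only records
    # deferred work (like building a dask graph); load() evaluates it,
    # which is the part that usually dominates the runtime.
    def __init__(self, compute):
        self._compute = compute

    def isel(self, key):
        return LazyArray(lambda: self._compute()[key])

    def load(self):
        return self._compute()

lazy = LazyArray(lambda: np.arange(10000).reshape(100, 100))

def time_indexing_lazy():
    # Without the .load(), this would only time "graph" construction.
    lazy.isel(slice(0, 50)).load()
```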

fujiisoup (Member, Author)
Thanks. All done.

fujiisoup (Member, Author)

This might belong in a separate issue, but I think we should have a clear link to our benchmark results page
(maybe in our docs, the README, or the PR template?).

jhamman (Member) commented Jan 24, 2018

@fujiisoup - I don't see any problem with including a link in the docs (probably in the testing section) and/or in the README. It's maybe too soon to include it in the PR template, since we just don't have very good coverage yet.

jhamman (Member) left a comment

@fujiisoup - glad to see the ASV setup getting used. Everything here looks good to me.

@fujiisoup fujiisoup merged commit 04974b9 into pydata:master Jan 24, 2018