ARROW-79: [Python] Add benchmarks #44

Closed
wants to merge 2 commits into from
3 changes: 3 additions & 0 deletions python/.gitignore
@@ -35,3 +35,6 @@ dist
# coverage
.coverage
coverage.xml

# benchmark working dir
.asv
73 changes: 73 additions & 0 deletions python/asv.conf.json
@@ -0,0 +1,73 @@
{
// The version of the config file format. Do not change, unless
// you know what you are doing.
"version": 1,

// The name of the project being benchmarked
"project": "pyarrow",

// The project's homepage
"project_url": "https://arrow.apache.org/",

// The URL or local path of the source code repository for the
// project being benchmarked
"repo": "https://github.com/apache/arrow/",

// List of branches to benchmark. If not provided, defaults to "master"
// (for git) or "tip" (for mercurial).
// "branches": ["master"], // for git
// "branches": ["tip"], // for mercurial

// The DVCS being used. If not set, it will be automatically
// determined from "repo" by looking at the protocol in the URL
// (if remote), or by looking for special directories, such as
// ".git" (if local).
"dvcs": "git",

// The tool to use to create environments. May be "conda",
// "virtualenv" or other value depending on the plugins in use.
// If missing or the empty string, the tool will be automatically
// determined by looking for tools on the PATH environment
// variable.
"environment_type": "virtualenv",

// The base URL to show a commit for the project.
"show_commit_url": "https://github.com/apache/arrow/commit/",

// The Pythons you'd like to test against. If not provided, defaults
// to the current version of Python used to run `asv`.
// "pythons": ["2.7", "3.3"],

// The matrix of dependencies to test. Each key is the name of a
// package (in PyPI) and the values are version numbers. An empty
// list indicates to just test against the default (latest)
// version.
// "matrix": {
// "numpy": ["1.6", "1.7"]
// },

// The directory (relative to the current directory) that benchmarks are
// stored in. If not provided, defaults to "benchmarks"
"benchmark_dir": "benchmarks",

// The directory (relative to the current directory) to cache the Python
// environments in. If not provided, defaults to "env"
"env_dir": ".asv/env",


// The directory (relative to the current directory) that raw benchmark
// results are stored in. If not provided, defaults to "results".
"results_dir": ".asv/results",

// The directory (relative to the current directory) that the html tree
// should be written to. If not provided, defaults to "html".
"html_dir": "build/benchmarks/html",

// The number of characters to retain in the commit hashes.
// "hash_length": 8,

// `asv` will cache wheels of the recent builds in each
// environment, making them faster to install next time. This is
// the number of builds to keep, per environment.
// "wheel_cache_size": 0
}
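Note that this config is JSON with `//` comments, which Python's standard `json` module does not accept. The following is a minimal sketch, not asv's own loader, of how such a file could be read for inspection. The helper names `strip_line_comments` and `load_commented_json` and the `SAMPLE` text are hypothetical. The sketch only strips `//` comments outside string literals; it does not handle the trailing comma that the file above leaves after `"html_dir"`.

```python
import json


def strip_line_comments(text):
    """Remove // comments that appear outside JSON string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            # Inside a string: copy verbatim and track escapes and the end quote.
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line, keeping the newline itself.
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def load_commented_json(text):
    """Parse JSON text that may contain // line comments."""
    return json.loads(strip_line_comments(text))


# Hypothetical excerpt in the style of asv.conf.json (no trailing comma).
SAMPLE = """
{
    // The name of the project being benchmarked
    "project": "pyarrow",
    "repo": "https://github.com/apache/arrow/",
    // "branches": ["master"], // for git
    "env_dir": ".asv/env"
}
"""

config = load_commented_json(SAMPLE)
print(config["repo"])
```

The string-aware scan matters here because URLs such as the `repo` value contain `//` themselves, so a naive line-based strip would truncate them.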
17 changes: 17 additions & 0 deletions python/benchmarks/__init__.py
@@ -0,0 +1,17 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

38 changes: 38 additions & 0 deletions python/benchmarks/array.py
@@ -0,0 +1,38 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

import pyarrow

class Conversions(object):
    params = (1, 10 ** 5, 10 ** 6, 10 ** 7)

    def time_from_pylist(self, n):
        pyarrow.from_pylist(list(range(n)))

    def peakmem_from_pylist(self, n):
        pyarrow.from_pylist(list(range(n)))

class ScalarAccess(object):
    params = (1, 10 ** 5, 10 ** 6, 10 ** 7)

    # asv looks for a lowercase ``setup`` method; ``setUp`` is not called.
    def setup(self, n):
        self._array = pyarrow.from_pylist(list(range(n)))

    def time_as_py(self, n):
        for i in range(n):
            self._array[i].as_py()
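
For readers unfamiliar with the calling convention these classes rely on: asv, as far as its documented conventions go, instantiates the class, calls `setup` with the current entry of `params`, and then times each `time_*` method with the same argument. The following is a rough sketch of that flow using only `timeit`; it is not asv's implementation. `ListScalarAccess` and `run_benchmarks` are hypothetical names, and a plain list stands in for the pyarrow array so the example runs without pyarrow installed.

```python
import timeit


class ListScalarAccess(object):
    """Stand-in benchmark with the same shape as ScalarAccess, on a plain list."""
    params = (1, 10 ** 3)

    def setup(self, n):
        # Build the fixture outside the timed region.
        self._data = list(range(n))

    def time_getitem(self, n):
        for i in range(n):
            self._data[i]


def run_benchmarks(cls, repeat=3):
    """Call setup(n), then time every time_* method, for each parameter."""
    results = {}
    for n in cls.params:
        for name in sorted(dir(cls)):
            if not name.startswith("time_"):
                continue
            bench = cls()
            setup = getattr(bench, "setup", None)
            if setup is not None:
                setup(n)
            method = getattr(bench, name)
            timer = timeit.Timer(lambda: method(n))
            # Keep the best of several single-shot runs.
            results[(name, n)] = min(timer.repeat(repeat=repeat, number=1))
    return results


results = run_benchmarks(ListScalarAccess)
for (name, n), seconds in sorted(results.items()):
    print("%s(%d): %.6f s" % (name, n, seconds))
```

Keeping fixture construction in `setup` means the timing covers element access only, not the cost of building the input.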

11 changes: 11 additions & 0 deletions python/doc/Benchmarks.md
@@ -0,0 +1,11 @@
## Benchmark Requirements

The benchmarks are run using [asv][1], which is also their only requirement.

## Running the benchmarks

To run the benchmarks, call `asv run --python=same`. The plain `asv run`
command does not work at the moment because asv cannot handle Python
packages in subdirectories of a repository.
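
If the run needs to be scripted, a small wrapper along these lines may help. It is a sketch only: the helper names `asv_run_command` and `run_benchmarks` are hypothetical, and the default working directory `python` is an assumption based on where `asv.conf.json` is added in this change.

```python
import shutil
import subprocess


def asv_run_command(python="same"):
    """Build the asv invocation described above as an argument list."""
    return ["asv", "run", "--python=%s" % python]


def run_benchmarks(cwd="python"):
    """Run the benchmarks from `cwd`; return None if asv is not installed."""
    if shutil.which("asv") is None:
        return None
    # Assumption: `cwd` is the directory that contains asv.conf.json.
    return subprocess.call(asv_run_command(), cwd=cwd)


print(" ".join(asv_run_command()))
```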

[1]: https://asv.readthedocs.org/