SPARK-1426: Make MLlib work with NumPy versions older than 1.7
MLlib currently requires NumPy 1.7 because it uses the copyto function (http://docs.scipy.org/doc/numpy/reference/generated/numpy.copyto.html) to copy vector and matrix data into the serialization buffer.
Replace it with slice assignment as a fallback that works on older NumPy versions.

Author: Sandeep <sandeep@techaddict.me>

Closes #391 from techaddict/1426 and squashes the following commits:

d365962 [Sandeep] SPARK-1426: Make MLlib work with NumPy versions older than 1.7 Currently it requires NumPy 1.7 due to using the copyto method (http://docs.scipy.org/doc/numpy/reference/generated/numpy.copyto.html) for extracting data out of an array. Replace it with a fallback
techaddict authored and mateiz committed Apr 15, 2014
1 parent c99bcb7 commit df36091
Showing 4 changed files with 16 additions and 16 deletions.
9 changes: 4 additions & 5 deletions docs/mllib-guide.md
@@ -11,7 +11,7 @@ namely, binary classification, regression, clustering and collaborative
 filtering, as well as an underlying gradient descent optimization primitive.

 # Available Methods
-The following links provide a detailed explanation of the methods and usage examples for each of them:
+The following links provide a detailed explanation of the methods and usage examples for each of them:

 * <a href="mllib-classification-regression.html">Classification and Regression</a>
 * Binary Classification
@@ -33,10 +33,9 @@ The following links provide a detailed explanation of the methods and usage exam

 # Dependencies
 MLlib uses the [jblas](https://github.com/mikiobraun/jblas) linear algebra library, which itself
-depends on native Fortran routines. You may need to install the
+depends on native Fortran routines. You may need to install the
 [gfortran runtime library](https://github.com/mikiobraun/jblas/wiki/Missing-Libraries)
-if it is not already present on your nodes. MLlib will throw a linking error if it cannot
+if it is not already present on your nodes. MLlib will throw a linking error if it cannot
 detect these libraries automatically.

-To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.7 or newer.
-
+To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4 or newer.
6 changes: 3 additions & 3 deletions docs/python-programming-guide.md
@@ -100,8 +100,8 @@ $ MASTER=local[4] ./bin/pyspark

 ## IPython

-It is also possible to launch PySpark in [IPython](http://ipython.org), the
-enhanced Python interpreter. PySpark works with IPython 1.0.0 and later. To
+It is also possible to launch PySpark in [IPython](http://ipython.org), the
+enhanced Python interpreter. PySpark works with IPython 1.0.0 and later. To
 use IPython, set the `IPYTHON` variable to `1` when running `bin/pyspark`:

 {% highlight bash %}
@@ -153,7 +153,7 @@ Many of the methods also contain [doctests](http://docs.python.org/2/library/doc
 # Libraries

 [MLlib](mllib-guide.html) is also available in PySpark. To use it, you'll need
-[NumPy](http://www.numpy.org) version 1.7 or newer. The [MLlib guide](mllib-guide.html) contains
+[NumPy](http://www.numpy.org) version 1.4 or newer. The [MLlib guide](mllib-guide.html) contains
 some example applications.

 # Where to Go from Here
6 changes: 3 additions & 3 deletions python/pyspark/mllib/__init__.py
@@ -19,8 +19,8 @@
 Python bindings for MLlib.
 """

-# MLlib currently needs and NumPy 1.7+, so complain if lower
+# MLlib currently needs and NumPy 1.4+, so complain if lower

 import numpy
-if numpy.version.version < '1.7':
-    raise Exception("MLlib requires NumPy 1.7+")
+if numpy.version.version < '1.4':
+    raise Exception("MLlib requires NumPy 1.4+")
11 changes: 6 additions & 5 deletions python/pyspark/mllib/_common.py
@@ -15,7 +15,7 @@
 # limitations under the License.
 #

-from numpy import ndarray, copyto, float64, int64, int32, ones, array_equal, array, dot, shape, complex, issubdtype
+from numpy import ndarray, float64, int64, int32, ones, array_equal, array, dot, shape, complex, issubdtype
 from pyspark import SparkContext, RDD
 import numpy as np

@@ -72,8 +72,8 @@ def _serialize_double_vector(v):
     header = ndarray(shape=[2], buffer=ba, dtype="int64")
     header[0] = 1
     header[1] = length
-    copyto(ndarray(shape=[length], buffer=ba, offset=16,
-            dtype="float64"), v)
+    arr_mid = ndarray(shape=[length], buffer=ba, offset=16, dtype="float64")
+    arr_mid[...] = v
     return ba

 def _deserialize_double_vector(ba):
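
The replacement in this hunk can be tried in isolation. The sketch below is a standalone approximation, not the MLlib code itself: the buffer size (16 header bytes plus 8 bytes per element) and the reader function are assumptions inferred from the header layout and offset visible in the diff.

```python
import numpy as np


def serialize_double_vector(v):
    """Pack a 1-D array as [int64 tag=1][int64 length][float64 data]."""
    v = np.asarray(v, dtype=np.float64)
    length = v.shape[0]
    # Assumed size: two int64 header fields, then one float64 per element.
    ba = bytearray(16 + 8 * length)
    header = np.ndarray(shape=[2], buffer=ba, dtype="int64")
    header[0] = 1
    header[1] = length
    # View onto the payload region of the same bytearray.
    payload = np.ndarray(shape=[length], buffer=ba, offset=16, dtype="float64")
    # Ellipsis assignment copies element-wise into the view, which is what
    # numpy.copyto(payload, v) did, without requiring NumPy 1.7.
    payload[...] = v
    return ba


def deserialize_double_vector(ba):
    """Hypothetical reader for the layout written above."""
    header = np.frombuffer(ba, dtype="int64", count=2)
    if header[0] != 1:
        raise TypeError("not a serialized double vector")
    data = np.frombuffer(ba, dtype="float64", count=int(header[1]), offset=16)
    return data.copy()
```

Because `payload` is a view on the bytearray rather than a copy, assigning through it writes directly into the bytes that are returned.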
@@ -112,8 +112,9 @@ def _serialize_double_matrix(m):
         header[0] = 2
         header[1] = rows
         header[2] = cols
-        copyto(ndarray(shape=[rows, cols], buffer=ba, offset=24,
-                       dtype="float64", order='C'), m)
+        arr_mid = ndarray(shape=[rows, cols], buffer=ba, offset=24,
+                          dtype="float64", order='C')
+        arr_mid[...] = m
         return ba
     else:
         raise TypeError("_serialize_double_matrix called on a "
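
For the matrix case, the point worth checking is that slice assignment into a C-ordered view writes row-major bytes even when the input has a different memory layout. The following is a rough sketch under assumptions: the function name, the buffer size (24 header bytes plus 8 bytes per element) and the error message are illustrative and not taken from MLlib.

```python
import numpy as np


def serialize_double_matrix(m):
    """Pack a 2-D array as [int64 tag=2][int64 rows][int64 cols][row-major data]."""
    m = np.asarray(m, dtype=np.float64)
    if m.ndim != 2:
        raise TypeError("expected a 2-D array")
    rows, cols = m.shape
    # Assumed size: three int64 header fields, then one float64 per element.
    ba = bytearray(24 + 8 * rows * cols)
    header = np.ndarray(shape=[3], buffer=ba, dtype="int64")
    header[0] = 2
    header[1] = rows
    header[2] = cols
    payload = np.ndarray(shape=[rows, cols], buffer=ba, offset=24,
                         dtype="float64", order="C")
    # Assignment goes by index, not by memory order, so a Fortran-ordered
    # or otherwise strided input still lands in the buffer row by row.
    payload[...] = m
    return ba
```

A Fortran-ordered input such as `np.asfortranarray(np.arange(6.0).reshape(2, 3))` should therefore serialize to the same payload bytes as its C-ordered equivalent.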