extend parallelization example
svenkreiss committed May 22, 2017
1 parent a856e42 commit 1b2287c
Showing 1 changed file with 17 additions and 6 deletions.
docs/sphinx/parallel.rst
@@ -8,17 +8,22 @@ Pysparkling supports parallelizations on the local machine and across clusters
of computers.


Threads and Processes
Processes and Threads
---------------------

Single machine parallelization either with
Single machine parallelization with
``concurrent.futures.ThreadPoolExecutor``,
``concurrent.futures.ProcessPoolExecutor`` and
``multiprocessing.Pool`` is supported.
``concurrent.futures.ProcessPoolExecutor`` or
``multiprocessing.Pool`` is supported. Use ``cloudpickle`` instead of ``pickle`` for
serialization to support lambda functions (and more) for data transformations.
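A minimal sketch of why the serializer matters, using the standard library plus ``cloudpickle``. It assumes (this page does not confirm it) that a pool is only driven through its ``map`` method, which ``ThreadPoolExecutor``, ``ProcessPoolExecutor`` and ``multiprocessing.Pool`` all provide. ``map_partitions`` and ``run_task`` are hypothetical helpers for illustration, not pysparkling's API. The standard ``pickle`` module stores a function by its importable name, so a builtin such as ``abs`` round-trips but a lambda does not; ``cloudpickle.dumps`` serializes the lambda by value, and the bytes can be read back with plain ``pickle.loads`` as long as ``cloudpickle`` is installed wherever they are loaded.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor

try:
    import cloudpickle  # third-party package: pip install cloudpickle
except ImportError:
    cloudpickle = None


def can_serialize(obj, dumps):
    """Return True if ``dumps`` serializes ``obj`` without raising."""
    try:
        dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def run_task(payload):
    """Hypothetical worker: rebuild the function with plain pickle.loads, apply it."""
    func_bytes, partition = payload
    func = pickle.loads(func_bytes)
    return [func(x) for x in partition]


def map_partitions(pool, func, partitions, serializer):
    """Hypothetical driver: serialize ``func`` once, fan partitions out via pool.map."""
    func_bytes = serializer(func)
    tasks = [(func_bytes, partition) for partition in partitions]
    # list() because Executor.map returns an iterator while Pool.map returns a list.
    return list(pool.map(run_task, tasks))


partitions = [[-1, 2], [-3]]

with ThreadPoolExecutor(4) as pool:
    # A builtin is pickled by reference (builtins.abs), so plain pickle works.
    print(map_partitions(pool, abs, partitions, pickle.dumps))  # [[1, 2], [3]]

# A lambda has no importable name, so plain pickle rejects it.
print(can_serialize(lambda x: x * 10, pickle.dumps))  # False

if cloudpickle is not None:
    with ThreadPoolExecutor(4) as pool:
        # cloudpickle serializes the lambda's code by value instead.
        print(map_partitions(pool, lambda x: x * 10, partitions, cloudpickle.dumps))
        # [[-10, 20], [-30]]
```

A thread pool keeps the sketch easy to run. ``ProcessPoolExecutor(4)`` or ``multiprocessing.Pool(4)`` can be passed instead, in which case the calling code should sit under an ``if __name__ == "__main__":`` guard on platforms that start worker processes with ``spawn``.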


.. code-block:: python

    import cloudpickle
    import concurrent.futures
    import pysparkling
    sc = pysparkling.Context(
        pool=concurrent.futures.ProcessPoolExecutor(4),
        serializer=cloudpickle.dumps,
@@ -27,8 +32,14 @@ Single machine parallelization either with
Experimental
------------

The following are experimental notes. Most of them don't even contain examples of how to make
use of these techniques with pysparkling.

ipcluster and IPython.parallel
------------------------------
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Local test setup:

@@ -77,7 +88,7 @@ https://ipython.org/ipython-doc/dev/parallel/parallel_process.html#using-ipclust


StarCluster
-----------
~~~~~~~~~~~

Setting up StarCluster was an experiment. However, it does not integrate well
with the rest of our EC2 infrastructure, so we switched to a Chef-based setup