
Exception when running any Spark Notebook after long period of inactivity #108

Closed
stvoutsin opened this issue May 28, 2020 · 3 comments
Assignees: stvoutsin
Labels: bug (Something isn't working)

Comments

@stvoutsin (Collaborator)

Symptoms:

Occasionally, after we have run some Spark jobs via Zeppelin notebooks, attempting to run the same notebooks a few days or weeks later raises an exception, and every Spark job fails until the Zeppelin instance is restarted.

Logs

Fail to execute line 7: count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-237895005518745498.py", line 375, in <module>
  File "<stdin>", line 7, in <module>
  File "/home/fedora/spark/python/lib/pyspark.zip/pyspark/context.py", line 513, in parallelize
    return self.parallelize([], numSlices).mapPartitionsWithIndex(f)
  File "/home/fedora/spark/python/lib/pyspark.zip/pyspark/context.py", line 527, in parallelize
    jrdd = self._serialize_to_jvm(c, serializer, reader_func, createRDDServer)
  File "/home/fedora/spark/python/lib/pyspark.zip/pyspark/context.py", line 556, in _serialize_to_jvm
    tempFile = NamedTemporaryFile(delete=False, dir=self._temp_dir)
  File "/usr/lib64/python2.7/tempfile.py", line 475, in NamedTemporaryFile
    (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags)
  File "/usr/lib64/python2.7/tempfile.py", line 244, in _mkstemp_inner
    fd = _os.open(file, flags, 0600)
OSError: [Errno 2] No such file or directory: '/tmp/spark-temp/spark-f8efede5-5ffe-4b5d-a916-3197bd1f1232/pyspark-4a1de947-7798-4aae-9d81-91baa6ee2a1f/tmpoeV_tI'

How to Reproduce

Run a Spark job on the current Zeppelin/Hadoop prototype, leave idle for > 2 weeks, then attempt to run the same job again.
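The traceback points at line 7 of the notebook paragraph, count = sc.parallelize(xrange(0, NUM_SAMPLES)), which looks like the standard PySpark pi-estimation example. A minimal sketch of such a job (Python 2, matching the xrange call and python2.7 paths in the traceback; NUM_SAMPLES and the filter body are assumed):

    from random import random

    NUM_SAMPLES = 1000000

    def inside(_):
        # Sample a random point in the unit square and test whether it
        # falls inside the quarter circle of radius 1.
        x, y = random(), random()
        return x * x + y * y < 1.0

    # `sc` is the SparkContext provided by the Zeppelin pyspark interpreter.
    count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
              .filter(inside) \
              .count()

    print("Pi is roughly %f" % (4.0 * count / NUM_SAMPLES))

Any paragraph that calls sc.parallelize on a Python collection should hit the same failure, since PySpark writes the data to a temp file before the job itself runs (see the _serialize_to_jvm frame in the traceback).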

This may be related to issue #83.

stvoutsin self-assigned this on May 28, 2020
stvoutsin added the bug (Something isn't working) label on May 28, 2020
@stvoutsin (Collaborator, Author)

This is being worked on here:
stvoutsin@87b2e3d

It looks like the problem is that Zeppelin / Spark jobs store their working files under /tmp, which is periodically cleaned up, so the Zeppelin interpreter later expects a directory that no longer exists.
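If that is right, the quickest way to confirm it on a live interpreter is to check where the running SparkContext keeps its scratch space. A debugging sketch (it pokes at private PySpark attributes, so treat it as illustrative only):

    # Run in a pyspark paragraph on the affected Zeppelin instance.
    # spark.local.dir unset (None) means Spark falls back to its default, /tmp.
    print(sc._conf.get("spark.local.dir"))
    # The per-context temp directory that the traceback failed to write into;
    # if this path no longer exists on disk, the /tmp cleaner has removed it.
    print(sc._temp_dir)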

The commit above describes a potential fix: using a different directory as the Spark local dir for Zeppelin.
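For illustration, the general shape of that fix is to point spark.local.dir away from /tmp, e.g. in Spark's conf/spark-defaults.conf (the path below is hypothetical; the directory actually chosen in the commit may differ):

    # Hypothetical location: keep Spark/PySpark scratch files out of the
    # periodically-cleaned /tmp.
    spark.local.dir /var/spark/local

Note that a directory excluded from the automatic /tmp cleanup then needs its own housekeeping, which is where the follow-up discussion below comes in.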

stvoutsin changed the title from "Exception when running any Notebook example after long period of inactivity" to "Exception when running any Spark Notebook after long period of inactivity" on May 28, 2020
@Zarquan (Collaborator)

Zarquan commented Jun 3, 2020

Since we have successfully identified the bug, should we close this one and create new issues: one for making it easier to flush the stale state of a notebook, one for placing temp files in a separate directory managed by us, and one for creating the tools to manage a user's temp files?

@stvoutsin (Collaborator, Author)

Closing this task; it will be addressed in issue #112.

Zarquan added a commit that referenced this issue Jun 18, 2020