
Exception when running any Spark Notebook after long period of inactivity #108

Closed
stvoutsin opened this issue May 28, 2020 · 3 comments
Assignees: stvoutsin
Labels: bug (Something isn't working)

Comments

@stvoutsin (Collaborator)

Symptoms:

Occasionally, after we have run some Spark jobs via Zeppelin notebooks, attempting to run the same notebooks a few days or weeks later raises an exception, and every Spark job fails until the Zeppelin instance is restarted.

Logs

Fail to execute line 7: count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-237895005518745498.py", line 375, in <module>
  File "<stdin>", line 7, in <module>
  File "/home/fedora/spark/python/lib/pyspark.zip/pyspark/context.py", line 513, in parallelize
    return self.parallelize([], numSlices).mapPartitionsWithIndex(f)
  File "/home/fedora/spark/python/lib/pyspark.zip/pyspark/context.py", line 527, in parallelize
    jrdd = self._serialize_to_jvm(c, serializer, reader_func, createRDDServer)
  File "/home/fedora/spark/python/lib/pyspark.zip/pyspark/context.py", line 556, in _serialize_to_jvm
    tempFile = NamedTemporaryFile(delete=False, dir=self._temp_dir)
  File "/usr/lib64/python2.7/tempfile.py", line 475, in NamedTemporaryFile
    (fd, name) = _mkstemp_inner(dir, prefix, suffix, flags)
  File "/usr/lib64/python2.7/tempfile.py", line 244, in _mkstemp_inner
    fd = _os.open(file, flags, 0600)
OSError: [Errno 2] No such file or directory: '/tmp/spark-temp/spark-f8efede5-5ffe-4b5d-a916-3197bd1f1232/pyspark-4a1de947-7798-4aae-9d81-91baa6ee2a1f/tmpoeV_tI'

How to Reproduce

Run a Spark job on the current Zeppelin/Hadoop prototype, leave idle for > 2 weeks, then attempt to run the same job again.
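The traceback points at line 7 of the notebook paragraph, count = sc.parallelize(xrange(0, NUM_SAMPLES)), which looks like the standard PySpark pi-estimation example. A minimal sketch of such a job (Python 2, matching the xrange call and python2.7 paths in the traceback; NUM_SAMPLES and the filter body are assumed):

    from random import random

    NUM_SAMPLES = 1000000

    def inside(_):
        # Sample a random point in the unit square and test whether it
        # falls inside the quarter circle of radius 1.
        x, y = random(), random()
        return x * x + y * y < 1.0

    # `sc` is the SparkContext provided by the Zeppelin pyspark interpreter.
    count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
              .filter(inside) \
              .count()

    print("Pi is roughly %f" % (4.0 * count / NUM_SAMPLES))

Any paragraph that calls sc.parallelize on a Python collection should hit the same failure, since PySpark writes the data to a temp file before the job itself runs (see the _serialize_to_jvm frame in the traceback).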

This may be related to issue #83.

stvoutsin self-assigned this on May 28, 2020
stvoutsin added the bug (Something isn't working) label on May 28, 2020
@stvoutsin (Collaborator, Author)

This is being worked on here:
stvoutsin@87b2e3d

It looks like the problem is that Zeppelin / Spark jobs store their working files under /tmp, which is periodically cleaned up, so the Zeppelin interpreter later expects a directory that no longer exists.
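If that is right, the quickest way to confirm it on a live interpreter is to check where the running SparkContext keeps its scratch space. A debugging sketch (it pokes at private PySpark attributes, so treat it as illustrative only):

    # Run in a pyspark paragraph on the affected Zeppelin instance.
    # spark.local.dir unset (None) means Spark falls back to its default, /tmp.
    print(sc._conf.get("spark.local.dir"))
    # The per-context temp directory that the traceback failed to write into;
    # if this path no longer exists on disk, the /tmp cleaner has removed it.
    print(sc._temp_dir)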

The commit above describes a potential fix: using a different directory as the Spark local dir for Zeppelin.
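For illustration, the general shape of that fix is to point spark.local.dir away from /tmp, e.g. in Spark's conf/spark-defaults.conf (the path below is hypothetical; the directory actually chosen in the commit may differ):

    # Hypothetical location: keep Spark/PySpark scratch files out of the
    # periodically-cleaned /tmp.
    spark.local.dir /var/spark/local

Note that a directory excluded from the automatic /tmp cleanup then needs its own housekeeping, which is where the follow-up discussion below comes in.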

stvoutsin changed the title from "Exception when running any Notebook example after long period of inactivity" to "Exception when running any Spark Notebook after long period of inactivity" on May 28, 2020
@Zarquan (Collaborator)

Zarquan commented Jun 3, 2020

Since we have successfully identified the bug, should we close this one and create new issues: one for making it easier to flush the stale state of a notebook, one for placing temp files in a separate directory managed by us, and one for creating the tools to manage a user's temp files?

@stvoutsin (Collaborator, Author)

Closing this task; it will be addressed in issue #112.

Zarquan added a commit that referenced this issue Jun 18, 2020