Debugging Spark


It is possible to connect a remote debugger to a Spark process.

To connect a debugger to the driver

Append the following to your spark-submit (or gatk-launch) options, replacing 5005 with a different available port if necessary:

  --driver-java-options -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005
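
For context, a complete invocation might look like the sketch below. The jar name and main class are hypothetical placeholders, not part of the original instructions; substitute your own spark-submit (or gatk-launch) arguments:

  # A sketch: launch with the driver suspended until a debugger attaches.
  # my-app.jar and org.example.MySparkTool are hypothetical placeholders.
  spark-submit \
    --driver-java-options "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" \
    --class org.example.MySparkTool \
    my-app.jar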

This will suspend the driver until it receives a remote connection from IntelliJ.

Configure a new IntelliJ remote debugging configuration as follows:

  • Select Run -> Edit Configurations
  • Hit the + to add a new configuration.
  • Choose Remote
    • Set Mode to Attach
    • Set Host to your driver node name, e.g. dataflow01.broadinstitute.org
    • Set Port to whatever port you used before
  • Click OK
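
If you'd rather attach from the command line than from IntelliJ, jdb can attach to the same JDWP socket. A minimal sketch, assuming the host and port used above:

  # Attach jdb to the suspended driver (replace host and port with your own).
  jdb -attach dataflow01.broadinstitute.org:5005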

Now start your Spark tool and then run your debug configuration.

To debug an executor

Add the following to your gatk-launch command:

  --num-executors 1 --executor-cores 1 --conf "spark.executor.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=n,address=wm1b0-8ab.broadinstitute.org:5005,suspend=n"

Replace the given address with your local computer's address and port. (IntelliJ's remote debug configuration screen will show you the address if you're not sure what it is.)

(It's important to set --num-executors to 1, or each executor will try to connect to your debugger, causing problems.)
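
Putting the pieces together, a complete command might look like the sketch below. It is shown with plain spark-submit; the jar name, main class, and address are hypothetical placeholders (the options themselves are the ones given above):

  # A sketch: a single executor dialing back to a debugger listening on your machine.
  # my-app.jar, org.example.MySparkTool, and my-laptop.example.org are placeholders.
  spark-submit \
    --num-executors 1 --executor-cores 1 \
    --conf "spark.executor.extraJavaOptions=-agentlib:jdwp=transport=dt_socket,server=n,address=my-laptop.example.org:5005,suspend=n" \
    --class org.example.MySparkTool \
    my-app.jar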

Note that this will not suspend the executor (suspending it would crash the Spark program when run). Instead, set the Mode in your run configuration to Listen. Start your debug configuration before you start the Spark program, and it will wait for a connection from the executor.
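
The same listening setup works from the command line if you prefer jdb over IntelliJ. A minimal sketch, using the port from the example above:

  # Listen for the executor's incoming JDWP connection; start this first,
  # then launch the Spark program.
  jdb -listen 5005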