
Management of working directories #124

Closed
AndreaGuarracino opened this issue Feb 11, 2023 · 3 comments

Comments

@AndreaGuarracino

AndreaGuarracino commented Feb 11, 2023

Hi, I have installed verkko with conda (conda install -c conda-forge -c bioconda -c defaults verkko) and launched a job with sbatch (SLURM job scheduler) on our cluster:

# I put spaces in the command to make it more readable here on GitHub

sbatch -p workers -c 48 --wrap "hostname && \
cd /scratch && \
\time -v verkko \
    -d /scratch/HG00673 --threads 48 \
    --hifi /lizardfs/guarracino/HPRC/raw_data/HG00673/PacBio_HiFi/*.ccs.trim.fq.gz \
    --nano /lizardfs/guarracino/HPRC/raw_data/HG00673/nanopore/*.fastq.gz \
    --hap-kmers /lizardfs/guarracino/HPRC/verkko/meryl/HG00673/maternal_compress.k30.hapmer.meryl \
                /lizardfs/guarracino/HPRC/verkko/meryl/HG00673/paternal_compress.k30.hapmer.meryl \
                trio"

Useful information for understanding the problem below:

  • /lizardfs is where we store our data and it's shared by all nodes on our cluster.
  • /scratch is on a fast SSD. We use it as the working directory for writing temporary files or for writing final files before moving them to /lizardfs. Each node has its own SSD.

So, the command first changes into /scratch and then runs verkko with /scratch/HG00673 as the output directory for its intermediate and final results.

After ~7 days of running, I got this error:

...
[Tue Feb  7 04:38:02 2023]
rule generateConsensus:
    input: 7-consensus/packages/part014.cnspack, 7-consensus/packages.tigName_to_ID.map, 7-consensus/packages.report
    output: 7-consensus/packages/part014.fasta
    log: 7-consensus/packages/part014.err
    jobid: 1029
    reason: Missing output files: 7-consensus/packages/part014.fasta
    wildcards: nnnn=014
    threads: 8
    resources: tmpdir=/tmp, job_id=14, n_cpus=8, mem_gb=35, time_h=24

[Tue Feb  7 04:48:31 2023]
Finished job 1024.
1022 of 1039 steps (98%) done
Select jobs to execute...
Traceback (most recent call last):
  File "/home/guarracino/.conda/envs/andrea/lib/python3.11/site-packages/snakemake/__init__.py", line 757, in snakemake
    success = workflow.execute(
              ^^^^^^^^^^^^^^^^^
  File "/home/guarracino/.conda/envs/andrea/lib/python3.11/site-packages/snakemake/workflow.py", line 1089, in execute
    raise e
  File "/home/guarracino/.conda/envs/andrea/lib/python3.11/site-packages/snakemake/workflow.py", line 1085, in execute
    success = self.scheduler.schedule()
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/guarracino/.conda/envs/andrea/lib/python3.11/site-packages/snakemake/scheduler.py", line 571, in schedule
    run = self.job_selector(needrun)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/guarracino/.conda/envs/andrea/lib/python3.11/site-packages/snakemake/scheduler.py", line 835, in job_selector_ilp
    self._solve_ilp(prob)
  File "/home/guarracino/.conda/envs/andrea/lib/python3.11/site-packages/snakemake/scheduler.py", line 884, in _solve_ilp
    prob.solve(solver)
  File "/home/guarracino/.conda/envs/andrea/lib/python3.11/site-packages/pulp/pulp.py", line 1913, in solve
    status = solver.actualSolve(self, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/guarracino/.conda/envs/andrea/lib/python3.11/site-packages/pulp/apis/coin_api.py", line 137, in actualSolve
    return self.solve_CBC(lp, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/guarracino/.conda/envs/andrea/lib/python3.11/site-packages/pulp/apis/coin_api.py", line 153, in solve_CBC
    vs, variablesNames, constraintsNames, objectiveName = lp.writeMPS(
                                                          ^^^^^^^^^^^^
  File "/home/guarracino/.conda/envs/andrea/lib/python3.11/site-packages/pulp/pulp.py", line 1782, in writeMPS
    return mpslp.writeMPS(self, filename, mpsSense=mpsSense, rename=rename, mip=mip)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/guarracino/.conda/envs/andrea/lib/python3.11/site-packages/pulp/mps_lp.py", line 250, in writeMPS
    with open(filename, "w") as f:
OSError: [Errno 28] No space left on device
Command exited with non-zero status 1
...

At that moment, /home was completely full. This suggests that some space in /home is being used during execution. If this is correct:

  • may I ask for help in understanding what is being written and where it is written?
  • and, more importantly, how can I control this process by specifying the directory in which to write these temporary files? (a quick check of the default temporary directory is sketched below)
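
For context, snakemake and pulp are Python tools; when pulp's CBC backend writes its temporary MPS file (the open() call in the traceback above), the destination presumably follows Python's standard temporary-directory resolution, which consults the TMPDIR, TEMP and TMP environment variables before falling back to /tmp. Under that assumption, a quick way to check where temporary files would land on a compute node:

# Sketch of a check, assuming snakemake/pulp rely on Python's tempfile defaults:
# prints the directory Python will use for temporary files on this node.
python -c 'import tempfile; print(tempfile.gettempdir())'
# tempfile.gettempdir() checks TMPDIR, TEMP and TMP, then falls back to /tmp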

Full log:
slurm-122917.txt

@skoren
Member

skoren commented Feb 13, 2023

Verkko shouldn't be writing anything outside of its home folder. This seems to be snakemake/conda intermediates, which may be related to: snakemake/snakemake#1003 and nextstrain/ncov#830.

Perhaps you can try setting TMPDIR before launching verkko.sh and see if that uses the appropriate location instead.
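
For example, a minimal sketch of the submission above with TMPDIR pointed at the node-local SSD (the /scratch/HG00673/tmp path is only an illustration; any writable per-node directory would do):

# Hypothetical variant of the original submission: set TMPDIR inside the wrapped
# command so temporary files from snakemake/pulp land on the node-local /scratch.
sbatch -p workers -c 48 --wrap "hostname && \
cd /scratch && \
mkdir -p /scratch/HG00673/tmp && \
export TMPDIR=/scratch/HG00673/tmp && \
\time -v verkko -d /scratch/HG00673 --threads 48 ..."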

@skoren
Member

skoren commented Feb 28, 2023

Was this resolved?

@AndreaGuarracino
Author

I "solved" the issue by making sure I have a bit of free space in /home. With a few Gigabytes free I was able to run 8 instances of verkko together. About conda/snakemake writing stuff in /home, it might be related to some specific aspects of our cluster.

Thank you for the quick support!
