Update documentation
npcarter committed Aug 20, 2024
1 parent 5c25ffd commit 2ee3a77
Showing 5 changed files with 237 additions and 90 deletions.
1 change: 1 addition & 0 deletions index.html
@@ -96,6 +96,7 @@ <h1 class="websitetitle"><a href="https://eddyrivaslab.github.io/">Eddy and Riva
<h2>Available HOWTOs</h2>
<ul>
<li><a href="https://eddyrivaslab.github.io/pages/cluster-computing-in-the-eddy-and-rivas-labs.html">Eddy and Rivas Lab Cluster Resources and how to Access Them</a></li>
<li><a href="https://eddyrivaslab.github.io/pages/leaving-the-lab.html">Leaving the Lab</a></li>
<li><a href="https://eddyrivaslab.github.io/pages/modifying-this-website.html">Modifying This Website</a></li>
<li><a href="https://eddyrivaslab.github.io/pages/my-jobs-arent-running.html">My Jobs Aren't Running</a></li>
<li><a href="https://eddyrivaslab.github.io/pages/running-jobs-on-our-cluster.html">Running Jobs on Our RC Machines</a></li>
143 changes: 84 additions & 59 deletions pages/cluster-computing-in-the-eddy-and-rivas-labs.html
@@ -98,7 +98,7 @@ <h2>Overview</h2>
When you log in, that's where you'll land. You have 100GB of space
here. </p>
<p>Our <em>lab storage</em> is <code>/n/eddy_lab/</code>. We have 400TB of what RC calls
Tier 1 storage. </p>
Tier 1 storage, which is fast but expensive. </p>
<p>Both your home directory and our lab storage are backed up nightly to
what RC calls <em>snapshots</em>, and periodically to what RC calls <em>disaster
recovery</em> (DR) backups.</p>
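<p>If you want a rough check on how much space you're using, ordinary
commands work from any login node (a quick sketch; RC may also provide
its own quota-reporting tools):</p>
<div class="highlight"><pre><span></span><code># size of your home directory (can be slow on large trees)
du -sh ~

# overall usage of the lab's Tier 1 filesystem
df -h /n/eddy_lab
</code></pre></div>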
@@ -109,44 +109,17 @@ <h2>Overview</h2>
machine using <code>samba</code>. (Warning: a samba mount is slow, and may
sometimes be flaky; don't rely on it except for lightweight tasks.)
Instructions are below.</p>
<p>RC also provides <em>shared scratch storage</em> for us in
<code>/n/holyscratch01/eddy_lab</code>. You have write access here, so at any
time you can create your own temp directory(s). Best practice is to
use a directory of your own, in
<code>/n/holyscratch01/eddy_lab/Users/&lt;username&gt;</code>. We have a 50TB
allocation. This space can't be remote mounted, isn't backed up, and
is automatically deleted after 90 days.</p>
<p>RC also provides <em>shared scratch storage</em>, which is very fast but not backed up. Files on the scratch storage that are older than 90 days are automatically deleted, and RC strongly frowns on playing tricks to make files look younger than they are. Because RC occasionally moves the scratch storage to different devices, the easiest way to access it is through the <code>$SCRATCH</code> environment variable, which is defined on all RC machines. Our lab has an <code>eddy_lab</code> directory on the scratch space with a 50TB quota, which contains a <code>Users</code> directory, so <code>$SCRATCH/eddy_lab/Users/&lt;yourusername&gt;</code> will point to your directory on the scratch space.<span class="marginnote">The Users directory was pre-populated with space for a set of usernames at some point in the past. If your username wasn't included, you'll have to email RC to get a directory created for you.</span></p>
<p>The scratch space is intended for temporary data, so it's a great place to put input or output files from jobs, particularly if you intend to post-process your outputs to extract a smaller amount of data from them.</p>
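<p>For example, a minimal sketch of staging a job's working files there
(it assumes your <code>Users</code> directory already exists, per the margin note
above; the directory and file names are just placeholders):</p>
<div class="highlight"><pre><span></span><code># a convenience variable pointing at your personal scratch directory
MYSCRATCH=$SCRATCH/eddy_lab/Users/$USER

# stage a temporary working directory for one run
mkdir -p $MYSCRATCH/run1
cp big_input.fa $MYSCRATCH/run1/
</code></pre></div>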
<p>You can read
<a href="https://docs.rc.fas.harvard.edu/kb/cluster-storage/">more documentation on how RC storage works</a>.</p>
<p>We have three compute partitions dedicated to our lab (the <code>-p</code>, for
partition, will make sense when you learn how to launch compute jobs
with the <code>slurm</code> scheduler):</p>
<ul>
<li>
<p><strong>-p eddy:</strong> 640 cores, 16 nodes (40 cores/node). We use this partition for most of
our computing.</p>
</li>
<li>
<p><strong>-p eddy_gpu:</strong>
4 GPU nodes [holyb0909,holyb0910,holygpu2c0923,holygpu2c1121].
Each holyb node has 4 <a href="https://www.nvidia.com/en-us/data-center/v100/">NVIDIA Tesla V100 NVLINK GPUs</a>
with 32G VRAM, 2 16-core Xeon CPUs, and 192G RAM [installed 2018].
Each holygpu2c node has 8 <a href="https://www.nvidia.com/en-us/data-center/a40/">NVIDIA Ampere A40 GPUs</a>
with 48G VRAM, 2 24-core Xeon CPUs, and 768G RAM [installed 2022].</p>
</li>
</ul>
<p>We are awaiting one more GPU node with 4 <a href="https://www.nvidia.com/en-us/data-center/hgx/">NVIDIA HGX A100 GPUs</a>
with 80G VRAM, 2 24-core AMD CPUs, and 1024G RAM [shipping expected Nov 2022].</p>
<p>We use this partition for GPU-enabled machine learning stuff, TensorFlow and the like.</p>
<ul>
<li><strong>-p eddy_hmmer:</strong> 576 cores in 16 nodes. These are older cores
(circa 2016). We use this partition for long-running or large jobs, to
keep them from getting in people's way on <code>-p eddy</code>.</li>
</ul>
<p>We are awaiting installation of another 1536 CPU cores (in 24 nodes,
64 cores/node) [expected fall 2022].</p>
<p>All of our lab's computing equipment is in the eddy partition, which has 1,872 cores. Most of our machines have 8GB of RAM per core. In addition, we have three GPU-equipped machines that are part of the partition: holygpu2c0923, holygpu2c1121, and holygpu7c0920.<span class="marginnote">The "holy" at the beginning of our machine names refers to their location in the Holyoke data center.</span></p>
<p>Each holygpu2c node has 8 <a href="https://www.nvidia.com/en-us/data-center/a40/">NVIDIA Ampere A40 GPUs</a>
with 48G VRAM [installed 2022]. </p>
<p>The holygpu7 node has 4 <a href="https://www.nvidia.com/en-us/data-center/hgx/">NVIDIA HGX A100 GPUs</a>
with 80G VRAM [installed 2023]. </p>
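<p>To land on one of the GPU nodes you ask slurm for a GPU with
<code>--gres</code>; a minimal sketch for an interactive session (the partition
name follows the description above, and the GPU count is just an
example):</p>
<div class="highlight"><pre><span></span><code> srun -p eddy --gres=gpu:1 --pty bash
</code></pre></div>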
<p>We can also use Harvard-wide shared partitions on the RC cluster. <code>-p
shared</code> is 17,952 cores (in 375 nodes), for example. RC has
shared</code> is 19,104 cores (in 399 nodes), for example (as of Jan 2023). RC has
<a href="https://docs.rc.fas.harvard.edu/kb/running-jobs/#Slurm_partitions">much more documentation on available partitions</a>.</p>
<h2>Accessing the cluster</h2>
<h3>logging on, first time</h3>
@@ -197,20 +170,20 @@ <h3>configuring an ssh host alias</h3>

<p>You still have to authenticate by password and OpenAuth code, though.</p>
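<p>(The full example for this section isn't shown in this diff hunk; a
minimal sketch of such an alias entry, with a placeholder username,
looks like this:)</p>
<div class="highlight"><pre><span></span><code>Host ody
    User &lt;yourusername&gt;
    HostName login.rc.fas.harvard.edu
</code></pre></div>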
<h3>configuring single sign-on scp access</h3>
<p>It can get tedious to have to authenticate every time you <code>ssh</code> to RC,
especially if you're using ssh-based tools like <code>scp</code> to copy
individual files back and forth. You can streamline this using
<p>Even better, but a little more complicated: you can make it so you
only have to authenticate once, and every ssh or scp after that is
passwordless. To do this, I use
<a href="https://docs.rc.fas.harvard.edu/kb/using-ssh-controlmaster-for-single-sign-on/">SSH ControlMaster for single sign-on</a>,
to open a single <code>ssh</code> connection that you authenticate once, and all
subsequent <code>ssh</code>-based traffic to RC goes via that connection.</p>
<p>RC's
<a href="https://docs.rc.fas.harvard.edu/kb/using-ssh-controlmaster-for-single-sign-on/">instructions are here</a>
but briefly:</p>
<ul>
<li>Add another hostname alias to your <code>.ssh/config</code> file. Mine is
called <strong>odx</strong>:</li>
<li>Replace the above hostname alias in your <code>.ssh/config</code> file with
something like this:</li>
</ul>
<div class="highlight"><pre><span></span><code>Host odx
<div class="highlight"><pre><span></span><code>Host ody
User seddy
HostName login.rc.fas.harvard.edu
ControlMaster auto
@@ -221,19 +194,19 @@
<ul>
<li>Add some aliases to your <code>.bashrc</code> file:</li>
</ul>
<div class="highlight"><pre><span></span><code> <span class="nb">alias</span> odx-start<span class="o">=</span><span class="s1">&#39;ssh -Y -o ServerAliveInterval=30 -fN odx&#39;</span>
<span class="nb">alias</span> odx-stop<span class="o">=</span><span class="s1">&#39;ssh -O stop odx&#39;</span>
<span class="nb">alias</span> odx-kill<span class="o">=</span><span class="s1">&#39;ssh -O exit odx&#39;</span>
<div class="highlight"><pre><span></span><code> <span class="nb">alias</span> ody-start<span class="o">=</span><span class="s1">&#39;ssh -Y -o ServerAliveInterval=30 -fN ody&#39;</span>
<span class="nb">alias</span> ody-stop<span class="o">=</span><span class="s1">&#39;ssh -O stop ody&#39;</span>
<span class="nb">alias</span> ody-kill<span class="o">=</span><span class="s1">&#39;ssh -O exit ody&#39;</span>
</code></pre></div>

<p>Now you can launch a session with:</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="c">% odx-start</span><span class="w"></span>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="c">% ody-start</span><span class="w"></span>
</code></pre></div>

<p>It'll ask you to authenticate. After you do this, all your ssh-based
commands (in any terminal window) will work without further
authentication. To stop the connection, do</p>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="c">% odx-stop</span><span class="w"></span>
<div class="highlight"><pre><span></span><code><span class="w"> </span><span class="c">% ody-stop</span><span class="w"></span>
</code></pre></div>
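<p>If you're not sure whether the master connection is still up, ssh's
control commands can tell you; for example:</p>
<div class="highlight"><pre><span></span><code>    % ssh -O check ody
</code></pre></div>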

<p>If you forget to stop it, no big deal, the connection will eventually
@@ -395,17 +368,7 @@ <h3>writing an sbatch script</h3>
format. An example that (stupidly) loads gcc and just calls
<code>hostname</code>, so the output will be the name of the compute node the
script ran on:</p>
<div class="highlight"><table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre><span class="normal"> 1</span>
<span class="normal"> 2</span>
<span class="normal"> 3</span>
<span class="normal"> 4</span>
<span class="normal"> 5</span>
<span class="normal"> 6</span>
<span class="normal"> 7</span>
<span class="normal"> 8</span>
<span class="normal"> 9</span>
<span class="normal">10</span>
<span class="normal">11</span></pre></div></td><td class="code"><div><pre><span></span><code><span class="ch">#!/bin/bash</span>
<div class="highlight"><pre><span></span><code><span class="ch">#!/bin/bash</span>
<span class="c1">#SBATCH -c 1 # Number of cores/threads</span>
<span class="c1">#SBATCH -N 1 # Ensure that all cores are on one machine</span>
<span class="c1">#SBATCH -t 6-00:00 # Runtime in D-HH:MM</span>
@@ -416,7 +379,7 @@ <h3>writing an sbatch script</h3>

module load gcc
hostname
</code></pre></div>

<p>Save this to a file (<code>foo.sh</code> for example) and submit it with <code>sbatch</code>:</p>
<div class="highlight"><pre><span></span><code> sbatch foo.sh
@@ -468,6 +431,68 @@ <h3>etiquette</h3>
<p>You can also add <code>--nice 1000</code> to your <code>sbatch</code> command, to downgrade
your running priority in the queue, which helps let other people's
jobs get run before yours.</p>
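<p>For example, a sketch using the <code>foo.sh</code> script from above (the
<code>--nice=1000</code> form avoids any ambiguity about the optional argument):</p>
<div class="highlight"><pre><span></span><code> sbatch --nice=1000 foo.sh
</code></pre></div>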
<script type="text/javascript">if (!document.getElementById('mathjaxscript_pelican_#%@#$@#')) {
var align = "center",
indent = "0em",
linebreak = "false";

if (false) {
align = (screen.width < 768) ? "left" : align;
indent = (screen.width < 768) ? "0em" : indent;
linebreak = (screen.width < 768) ? 'true' : linebreak;
}

var mathjaxscript = document.createElement('script');
mathjaxscript.id = 'mathjaxscript_pelican_#%@#$@#';
mathjaxscript.type = 'text/javascript';
mathjaxscript.src = 'https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=TeX-AMS-MML_HTMLorMML';

var configscript = document.createElement('script');
configscript.type = 'text/x-mathjax-config';
configscript[(window.opera ? "innerHTML" : "text")] =
"MathJax.Hub.Config({" +
" config: ['MMLorHTML.js']," +
" TeX: { extensions: ['AMSmath.js','AMSsymbols.js','noErrors.js','noUndefined.js'], equationNumbers: { autoNumber: 'none' } }," +
" jax: ['input/TeX','input/MathML','output/HTML-CSS']," +
" extensions: ['tex2jax.js','mml2jax.js','MathMenu.js','MathZoom.js']," +
" displayAlign: '"+ align +"'," +
" displayIndent: '"+ indent +"'," +
" showMathMenu: true," +
" messageStyle: 'normal'," +
" tex2jax: { " +
" inlineMath: [ ['\\\\(','\\\\)'] ], " +
" displayMath: [ ['$$','$$'] ]," +
" processEscapes: true," +
" preview: 'TeX'," +
" }, " +
" 'HTML-CSS': { " +
" availableFonts: ['STIX', 'TeX']," +
" preferredFont: 'STIX'," +
" styles: { '.MathJax_Display, .MathJax .mo, .MathJax .mi, .MathJax .mn': {color: 'inherit ! important'} }," +
" linebreaks: { automatic: "+ linebreak +", width: '90% container' }," +
" }, " +
"}); " +
"if ('default' !== 'default') {" +
"MathJax.Hub.Register.StartupHook('HTML-CSS Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax['HTML-CSS'].FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"MathJax.Hub.Register.StartupHook('SVG Jax Ready',function () {" +
"var VARIANT = MathJax.OutputJax.SVG.FONTDATA.VARIANT;" +
"VARIANT['normal'].fonts.unshift('MathJax_default');" +
"VARIANT['bold'].fonts.unshift('MathJax_default-bold');" +
"VARIANT['italic'].fonts.unshift('MathJax_default-italic');" +
"VARIANT['-tex-mathit'].fonts.unshift('MathJax_default-italic');" +
"});" +
"}";

(document.body || document.getElementsByTagName('head')[0]).appendChild(configscript);
(document.body || document.getElementsByTagName('head')[0]).appendChild(mathjaxscript);
}
</script>

</article>
<footer>Powered by <a href="https://getpelican.com/">Pelican</a>. Site theme is a modified version of <a href="https://github.com/andrewheiss/ath-tufte-pelican">ath-tufte-pelican</a>.</footer>