
Setting up Galaxy wrappers on PhenoMeNal Galaxy Container


Adding the tool to tool_conf.xml

This file places the tool in the left sidebar (where all Galaxy tools are shown). Insert the tool in the most appropriate <section>...</section>, or add a new section if needed. For example, here's the ramid tool in the "Fluxomics" section:

<section name="Fluxomics" id="pheno-fluxomics">
  <tool file="phenomenal/fluxomics/ramid/ramid.xml"/>
...

Note that <tool file="..."/> points to the path for the tool wrapper relative to the Galaxy tools folder.
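
As a sketch, a complete section entry could look like the following (the second wrapper path is an illustrative assumption, showing only that one section can hold several tools):

<section name="Fluxomics" id="pheno-fluxomics">
  <tool file="phenomenal/fluxomics/ramid/ramid.xml"/>
  <tool file="phenomenal/fluxomics/midcor/midcor.xml"/>  <!-- hypothetical second tool -->
</section>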

Tool CPU and memory usage requests and limits

The CPU and memory made available to your tool by PhenoMeNal are limited to default values. To set more appropriate values, you need to perform the following steps. This feature has been available since our Galaxy container build 111 (remember: the last number of the image tag is the build number).

Add the tool to container mapping

If your tool existed in the previous scheme, which didn't consider resources, it might already be listed here; search for its id in the file mentioned below.

In the config/phenomenal_tools2container.yaml file, add the association of the tool to the container that will be used to run it. More than one tool can be assigned to the same container; for instance, many XCMS Galaxy tools will probably use the same XCMS container.

Option 1: add new entry (if the desired container is not available)

...
- tools_id:
   - my-tool-id
  docker_repo_override: container-registry.phenomenal-h2020.eu
  docker_owner_override: phnmnl
  docker_image_override: my-tool-id-container
  docker_tag_override: v0.4_cv0.3.11
  max_pod_retrials: 3
...

Option 2: add your tool id to an existing container definition

...
- tools_id:
   - other-tool-id
   - my-tool-id
  docker_repo_override: container-registry.phenomenal-h2020.eu
  docker_owner_override: phnmnl
  docker_image_override: many-tools-container
  docker_tag_override: v0.4_cv0.3.11
  max_pod_retrials: 3
...

Add the tool to the job_conf.xml file

In the job_conf.xml file you will specify the resource usage profile for your tool. The Galaxy-Kubernetes runner will ensure the tool doesn't exceed its configured CPU and memory usage; moreover, it will use this information to schedule the tool to run on a node with adequate computing resources (you wouldn't want your tool to run out of memory, right? :-) ).

1.- Choose the right resource usage profile based on your knowledge of how much CPU and memory a standard run of the tool requires. For some tools, requirements vary with input dataset size; in those cases, consider an "average" dataset size. Remember: these are presets for your users, but should they need to, they can override the settings from the Galaxy interface (e.g., for an abnormally large dataset). Avoid choosing a profile "bigger" than strictly required: requesting too much memory or CPU might delay (or even prevent) the execution of a tool, depending on how busy the Kubernetes cluster is and how much memory and CPU the individual nodes in the cluster have.

Rule of thumb: pick a resource profile with only slightly more memory and CPU than minimally needed for a normal tool execution. Users can change CPU and memory usage in the interface if needed, though ideally that should not be necessary.

The resource profiles currently available are:

| Category | CPU Request | CPU Limit | RAM Request | RAM Limit |
|----------|-------------|-----------|-------------|-----------|
| tiny     | 0.1         | 0.5       | 300 MB      | 600 MB    |
| small    | 0.4         | 0.8       | 500 MB      | 900 MB    |
| medium   | 0.7         | 2         | 800 MB      | 2 GB      |
| large    | 1.5         | 4         | 1.8 GB      | 5 GB      |
| xlarge   | 4           | 8         | 8 GB        | 16 GB     |
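
As a worked example: a tool that typically peaks around 700 MB of RAM while using one full CPU core is best served by the medium profile (0.7 CPU / 800 MB requested, 2 CPU / 2 GB limits). Tiny's 600 MB RAM limit would get the run killed, and while small's 900 MB limit would accommodate the memory, its 0.8 CPU limit would throttle the computation.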

What are requests and limits?

Request: predicted usage, used at scheduling time to choose a node with sufficient resources to run your tool.

Limit: a constraint on the amount of resource the tool will be allowed to use.

For example, suppose a tool requests 500 MB of RAM and has a limit of 1 GB. It will be scheduled on a node with at least 500 MB of RAM free; if it ends up using more than 1 GB of RAM, it will be killed. CPU limits are normally softer, in the sense that the job is throttled so that it only uses up to the limit amount of CPU.
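
For orientation, this is roughly how such a request/limit pair is expressed in a Kubernetes container spec (an illustrative sketch, not taken from the PhenoMeNal configuration; the CPU figures are borrowed from the small profile above):

# fragment of a Kubernetes container spec (illustrative only)
resources:
  requests:
    cpu: "0.4"       # CPU request, used at scheduling time
    memory: "500Mi"  # RAM request from the example above
  limits:
    cpu: "0.8"       # exceeding this throttles the container
    memory: "1Gi"    # exceeding this gets the container killed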

To understand in more detail how CPU/memory requests and limits work, you can read the Kubernetes documentation on the topic.

2.- Assign the tool a dynamic destination bound to a resource usage category

To use the small resource usage profile with your tool (with id my-tool-id), add the following line to the job_conf.xml file, inside the <tools>...</tools> section:

<tool id="my-tool-id" destination="dynamic-k8s-small" resources="all"/>

For the medium profile you would use the dynamic-k8s-medium destination. In general, the destination name is of the form dynamic-k8s-xxx, where xxx is one of the profile names listed in the table above (you can also find them in the job_conf.xml file by looking at the <destination> tags). Note that it is key that the tool id matches the id in the tool's XML wrapper. Also, remember that any other <tool id="my-tool-id" .../> entry for the same tool id needs to be removed or replaced by the one being inserted.
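
As a sketch, assuming a second, hypothetical tool my-big-tool-id that needs the large profile, the relevant part of the <tools> section could look like:

<tools>
  <tool id="my-tool-id" destination="dynamic-k8s-small" resources="all"/>
  <!-- hypothetical second tool bound to the large profile -->
  <tool id="my-big-tool-id" destination="dynamic-k8s-large" resources="all"/>
</tools>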

Deprecated: original scheme

These are the instructions for adding tools to the job_conf file that were valid until the Bucetin release (17.08). After that release, the newer implementation explained above should be used. This approach is now deprecated and left here only for reference.

Two pieces are added to the file job_conf.xml residing in config/ in the container-galaxy-k8s-runtime setup:

1.- container destination: this is the container where the tool wrapper expects to run, to be added within the <destinations>...</destinations> placeholder.

An example of this would be, for a tool called ramid using a container named container-registry.phenomenal-h2020.eu/phnmnl/ramid:latest:

<destination id="ramid-container" runner="k8s">
   <param id="docker_repo_override">container-registry.phenomenal-h2020.eu</param>
   <param id="docker_owner_override">phnmnl</param>
   <param id="docker_image_override">ramid</param>
   <param id="docker_tag_override">latest</param>
   <param id="max_pod_retrials">3</param>
   <param id="docker_enabled">true</param>
</destination>

2.- tool to destination binding: this matches the tool, through its ID, to the previously added container.

An example for the same ramid tool would be (within the <tools>...</tools> placeholder):

<tool id="ramid" destination="ramid-container"/>