
Comparing changes

base repository: HubSpot/Singularity
base: logfetch-0.20.2
head repository: HubSpot/Singularity
compare: master

Commits on Jan 22, 2015

  1. 3669146

Commits on Jan 23, 2015

  1. progress on new status page (ssalinas, 11bfb1c)

Commits on Jan 30, 2015

  1. 77295e0

Commits on Mar 5, 2015

  1. add new handlebar helpers (benrlodge, 02c2bf4)
  2. update colors (benrlodge, 852e3b0)
  3. c234fa9
  4. cd3bb26
  5. more updates (benrlodge, 1879479)

Commits on Mar 6, 2015

  1. clean up charts and styles (benrlodge, 49b3874)
  2. a5403a0

Commits on Mar 31, 2015

  1. c3c1266
  2. c568edd

Commits on Apr 1, 2015

  1. 91facca
  2. 1a4ac85

Commits on Apr 2, 2015

  1. a745be7

Commits on Apr 3, 2015

  1. 96ae799
  2. e746275

Commits on Apr 6, 2015

  1. Merge branch 'lb_cleanups' into hs_staging (tpetr, ffa64a9)
  2. f5b647e
  3. 8120df3
  4. bfcc01a

Commits on Apr 7, 2015

  1. a622525
  2. fec0bea
  3. 9fdca00
  4. 6642494
  5. a2d5803
  6. 2cf3f35

Commits on Apr 8, 2015

  1. MVP + test (wsorenson, 261ab99)

Commits on Apr 9, 2015

  1. 82851c1

Commits on Apr 10, 2015

  1. Merge pull request #519 from HubSpot/client_request_history_support: SingularityClient request history support (gchomatas, fa1c25c)

Commits on Apr 13, 2015

  1. Merge branch 'add_cols_slaves' into hs_staging (tpetr, 98e7502)
  2. Merge branch 'fix-cleanup-bug' into hs_staging (tpetr, 4195186)
  3. 2e8c4a9
  4. 003dee2

Commits on Apr 14, 2015

  1. Merge branch 'fix-cleanup-bug' into hs_staging (tpetr, d843c6f)
  2. 58286f8
  3. af013e7

Commits on Apr 15, 2015

  1. 709340e
  2. 6bf644a
  3. d1a6196

Commits on Apr 16, 2015

  1. 6fb603f
  2. 2c2f16c

Commits on Apr 17, 2015

  1. fbcc343
  2. ba40701
  3. 028b9c2

Commits on Apr 20, 2015

  1. Merge branch 'healthcheck2' into hs_staging (tpetr, d802c31)

Commits on Apr 21, 2015

  1. 66c62c6

Commits on Apr 22, 2015

  1. progress on new status page (ssalinas authored, tpetr committed, 8efc6fe)
  2. add new handlebar helpers (benrlodge authored, tpetr committed, dd1a95c)
  3. update colors (benrlodge authored, tpetr committed, a0b0a7a)
Showing 1,569 changed files with 180,478 additions and 37,019 deletions.
File renamed without changes.
2 changes: 2 additions & 0 deletions .blazar.yaml
@@ -0,0 +1,2 @@
env:
  SET_VERSION_OVERRIDE: "1.5.0-$GIT_BRANCH-SNAPSHOT"
16 changes: 16 additions & 0 deletions .bookignore
@@ -0,0 +1,16 @@
cookbook
eclipse
EmbedSingularityExample
mysql
node_modules
scripts
Singularity*
target
vagrant
.travis.yml
compose-dev.yml
docker-compose.yml
pom.xml
dev
*.sh
*.py
File renamed without changes.
22 changes: 22 additions & 0 deletions .github/workflows/semgrep.yml
@@ -0,0 +1,22 @@
on:
  pull_request: {}
  push:
    branches:
      - main
      - master
    paths:
      - .github/workflows/semgrep.yml
  schedule:
    - cron: '0 0 * * 0'
name: Semgrep
jobs:
  semgrep:
    name: Scan
    runs-on: ubuntu-20.04
    env:
      SEMGREP_APP_TOKEN: ${{ secrets.SEMGREP_APP_TOKEN }}
    container:
      image: returntocorp/semgrep
    steps:
      - uses: actions/checkout@v3
      - run: semgrep ci
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,3 +1,4 @@
_book
*.orig
target
.DS_Store
@@ -9,6 +10,7 @@ target-eclipse
bin
*~
JavaConsole.java
.checkstyle
*.iml
.idea
*.pyc
@@ -22,6 +24,7 @@ SingularityUI/bower_components/*
SingularityUI/brunchlog
SingularityUI/app/env.coffee

node_modules/*
.vagrant
vagrant/Berksfile.lock
SingularityUI/.vee
File renamed without changes.
14 changes: 2 additions & 12 deletions .travis.yml
@@ -1,18 +1,8 @@
language: java
jdk:
- oraclejdk7
- openjdk7
- oraclejdk8
install: mvn -Pbuild-swagger-documentation -DskipTests=true -B -q -fae install
script: mvn -B -q -fae verify

git:
depth: 100

sudo: false
- openjdk8

script: mvn -B -DskipSingularityWebUI verify
cache:
directories:
- $HOME/.m2
- SingularityUI/bower_components
- SingularityUI/node_modules
14 changes: 14 additions & 0 deletions Docs/about/adopters.md
@@ -0,0 +1,14 @@
# Adopters

These organizations proudly use Singularity:

- [HubSpot](http://www.hubspot.com/)
- [Groupon](http://www.groupon.com/)
- [OpenTable](http://www.opentable.com/)
- [EverTrue](http://www.evertrue.com/)
- [Grepsr](http://www.grepsr.com/)
- [Nitro](http://www.gonitro.com/)
- [Bdmreco](https://bdmreco.io/)
- [Captify Technologies Ltd](https://www.captifytechnologies.com/)

If you're using Singularity and aren't on this list, feel free to submit a Pull Request.
106 changes: 106 additions & 0 deletions Docs/about/how-it-works.md
@@ -0,0 +1,106 @@
## What is Singularity
**Singularity** is a platform that enables deploying and running services and scheduled jobs in the cloud or data centers. Combined with Apache Mesos, it provides efficient management of the life cycle of the underlying processes and effective use of cluster resources.

![HubSpot PaaS](../images/HubSpot_PaaS.png)

Singularity is an essential part of the HubSpot Platform and is ideal for deploying micro-services. It is optimized to manage thousands of concurrently running processes in hundreds of servers.

## How it Works
Singularity is an [**Apache Mesos framework**](http://mesos.apache.org/documentation/latest/frameworks/). It runs as a *task scheduler* on top of **Mesos Clusters** taking advantage of Apache Mesos' scalability, fault-tolerance, and resource isolation. [Apache Mesos](http://mesos.apache.org/documentation/latest/architecture/) is a cluster manager that simplifies the complexity of running different types of applications on a shared pool of servers. In Mesos terminology, *Mesos applications* that use the Mesos APIs to schedule tasks in a cluster are called [*frameworks*](http://mesos.apache.org/documentation/latest/app-framework-development-guide/).

![Mesos Frameworks](../images/Mesos_Frameworks.png)

There are different types of frameworks and most frameworks concentrate on a specific type of task (e.g. long-running vs scheduled cron-type jobs) or on supporting a specific domain and relevant technology (e.g. data processing with Hadoop jobs vs data processing with Spark).

Singularity tries to be more generic by combining **long-running tasks** and **job scheduling** functionality in one framework to support many of the common process types that developers need to deploy every day to build modern web applications and services. While Mesos allows multiple frameworks to run in parallel, it greatly simplifies the PaaS architecture by having a consistent and uniform set of abstractions and APIs for handling deployments across the organization. Additionally, it reduces the amount of framework boilerplate that must be supported - as all Mesos frameworks must keep state, handle failures, and properly interact with the Mesos APIs. These are the main reasons HubSpot engineers initiated the development of a new framework. As of this moment, Singularity supports the following process types:

- **Web Services**. These are long running processes which expose an API and may run with multiple load balanced instances. Singularity supports automatic configurable health checking of the instances at the process and API endpoint level as well as load balancing. Singularity will automatically restart these tasks when they fail or exit.
- **Workers**. These are long running processes, similar to web services, but do not expose an API. *Queue consumers* are a common type of worker processes. Singularity does automatic health checking, cool-down and restart of worker instances.
- **Scheduled (CRON-type) Jobs**. These are tasks that periodically run according to a provided CRON schedule. Scheduled jobs will not be restarted when they fail unless instructed to do so. Singularity will run them again on the next scheduling cycle.
- **On-Demand Processes**. These are manually run processes that will be deployed and ready to run but Singularity will not automatically run them. Users can start them through an API call or using the Singularity Web UI, which allows them to pass command line parameters on-demand.

## Singularity Components
Mesos frameworks have two major components. A **scheduler component** that registers with the **Mesos master** to be offered resources and an **executor component** that is launched on cluster agent nodes by the **Mesos agent process** to run the framework tasks.

The *Mesos master* determines how many resources are offered to each framework and the *framework scheduler* selects which of the offered resources to use to run the required tasks. Mesos agents do not directly run the tasks but delegate the running to the appropriate *executor* that has knowledge about the nature of the allocated task and the special handling that might be required.

![Singularity Components](../images/framework_components.png)

As depicted in the figure, Singularity implements the two basic framework components as well as a few more to solve common complex / tedious problems such as task cleanup and log tailing / archiving without requiring developers to implement it for each task they want to run:

### Singularity Scheduler
The scheduler is the core of Singularity: a [DropWizard](http://www.dropwizard.io/) API that implements the Mesos Scheduler Driver. The scheduler matches client deploy requests to Mesos resource offers and acts as a web service offering a JSON REST API for accepting deploy requests.

Clients use the Singularity API to register the type of deployable item that they want to run (web service, worker, cron job) and the corresponding runtime settings (cron schedule, # of instances, whether instances are load balanced, rack awareness, etc.).

After a deployable item (a **request**, in API terms) has been registered, clients can post *Deploy requests* for that item. Deploy requests contain information about the command to run, the executor to use, executor specific data, required cpu, memory and port resources, health check URLs and a variety of other runtime configuration options. The Singularity scheduler will then attempt to match Mesos offers (which in turn include resources as well as rack information and what else is running on agent hosts) with its list of *Deploy requests* that have yet to be fulfilled.

<a name="deploys"/>

Rollback of failed deploys, health checking and load balancing are also part of the advanced functionality the Singularity Scheduler offers. A new deploy for a long-running service will run as shown in the diagram below.

![Singularity Deploy](../images/deploy.png)

When a service or worker instance fails in a new deploy, the Singularity scheduler will rollback all instances to the version running before the deploy, keeping the deploys always consistent. After the scheduler makes sure that a Mesos task (corresponding to a service instance) has entered the TASK_RUNNING state it will use the provided health check URL and the specified health check timeout settings to perform health checks. If health checks go well, the next step is to perform load balancing of service instances. Load balancing is attempted only if the corresponding deployable item has been defined to be *loadBalanced*. To perform load balancing between service instances, Singularity supports a rich integration with a specific Load Balancer API. Singularity will post requests to the Load Balancer API to add the newly deployed service instances and to remove those that were previously running. Check [Integration with Load Balancers](../development/load-balancer-integration.md) to learn more. Singularity also provides generic webhooks which allow third party integrations, which can be registered to follow request, deploy, or task updates.

<a name="placement"/>

#### Agent Placement

When matching a Mesos resource offer to a deploy, Singularity can use one of several strategies to determine if the host in the offer is appropriate for the task in question, or `placement` in Singularity terms. Available placement strategies are:

- `GREEDY`: uses whatever agents are available
- `SEPARATE_BY_DEPLOY`/`SEPARATE`: ensures no two instances/tasks of the same request *and* deploy id are ever placed on the same agent
- `SEPARATE_BY_REQUEST`: ensures no two tasks belonging to the same request (regardless of deploy id) are placed on the same host
- `OPTIMISTIC`: attempts to spread out tasks but may schedule some on the same agent
- `SPREAD_ALL_AGENTS`: ensures the task is running on every agent. Same behaviour as `SEPARATE_BY_DEPLOY`, but autoscales the Request so that the instance count matches the number of agents.

Agent placement can also be impacted by agent attributes. There are three scenarios that Singularity supports:

1. *Specific Agents -> For a certain request, only run it on agents with matching attributes* - In this case, you would specify `requiredAgentAttributes` in the json for your request, and the tasks for that request would only be scheduled on agents that have all of those attributes.

2. *Reserved Agents -> Reserve an agent for specific requests, only run those requests on those agents* - In your Singularity config, specify the `reserveAgentsWithAttributes` field. Singularity will then only schedule tasks on agents with those attributes if the request's required attributes also match those.

3. *Test Group of Agents -> Reserve an agent for specific requests, but don't restrict the requests to that agent* - In your Singularity config, specify the `reserveAgentsWithAttributes` field as in the previous example. But, in the request json, specify the `allowedAgentAttributes` field. Then, the request will be allowed to run elsewhere in the cluster, but will also have the matching attributes to run on the reserved agent.
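As a rough sketch, the first scenario could look like the following in the request JSON. The `requiredAgentAttributes` field name comes from the text above; the attribute name and value (and the assumption that the field is a simple attribute-to-value map) are only illustrative:

```json
{
  "id": "TestService",
  "requestType": "SERVICE",
  "requiredAgentAttributes": {
    "rackid": "us-east-1e"
  }
}
```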

#### Singularity Scheduler Dependencies
The Singularity scheduler uses ZooKeeper as a distributed replication log to maintain state and keep track of registered deployable items, the active deploys for these items and the running tasks that fulfill the deploys. As shown in the drawing, the same ZooKeeper quorum utilized by Mesos masters and agents can be reused for Singularity.

Since ZooKeeper is not meant to handle large quantities of data, Singularity can optionally (and this is recommended for any real usage) use a database (MySQL or PostgreSQL) to periodically offload historical data from ZooKeeper and keep records of deployable item changes, deploy request history, and the history of all launched tasks.
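A minimal sketch of what these two dependencies might look like in the scheduler's configuration yaml. The property names below (a `zookeeper` block with a `quorum`, and a Dropwizard-style `database` block) are assumptions for illustration and should be checked against the configuration reference; all values are placeholders:

```yaml
zookeeper:
  quorum: zk1:2181,zk2:2181,zk3:2181  # can be the same quorum the Mesos masters use

database:  # optional, but recommended for any real usage
  driverClass: com.mysql.jdbc.Driver
  user: singularity
  password: example
  url: jdbc:mysql://localhost:3306/singularity
```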

In production environments Singularity should be run in high-availability mode by running multiple instances of the Singularity Scheduler component. As depicted in the drawing, exactly one instance is active at any time, with all other instances waiting in standby mode. While only one instance is registered for receiving resource offers, all instances can process API requests. Singularity uses ZooKeeper to perform leader election and maintain a single leader. Because all instances can change state, Singularity internally uses queues which are consumed by the Singularity leader to make calls to Mesos.

#### Singularity UI
The [*Singularity UI*](ui.md) is a single page static web application served from the Singularity Scheduler that uses the Singularity API to present information about deployed items.

It is a fully-featured application which provides historical as well as active task information. It allows users to view task logs and interact directly with tasks and deploy requests.

<a name="optional-components"/>

### Optional Agent Components

#### Singularity Executor
Users can opt for the default Mesos executor, the Docker container executor, or the Singularity executor. Like the other executors, the Singularity executor is executed directly by the Mesos agent process for each task that executes on an agent. The requests sent to the executor contain all the required data for setting up the running environment like the command to execute, environment variables, executable artifact URLs, application configuration files, etc. The Singularity executor provides some advanced (configurable) features:

- **Custom Fetcher** Downloads and extracts artifacts over HTTP, directly from S3, or using the S3 Downloader component.
- **Log Rotation** Sets up logrotate for specified log files inside the task directory.
- **Task Sandbox Cleanup**. Can cleanup large (uninteresting) application files but leave important logs and debugging files.
- **Graceful Task Killing**. Can send SIGTERM and escalate to SIGKILL for graceful shutdown of tasks.
- **Environment Setup and Runner Script**. Provides for setup of environment variables and corresponding bash script to run the command.

#### S3 Uploader
The S3 uploader reliably uploads rotated task log files to S3 for archiving. These logs can then be downloaded directly from the Singularity UI.

#### S3 Downloader
The S3 downloader downloads and extracts artifacts from S3 outside of the context of an executor - this is useful to avoid using the memory (page cache) of the executor process and also downloads from S3 without pre-generating expiring URIs (a bad idea inside Mesos.)

#### Singularity Executor Cleanup
While the Mesos agent has the ability to garbage collect tasks, the cleanup process maintains consistent state with other Singularity services (like the uploader and log watcher). This is a utility that is meant to run on each agent via cron (e.g. once per hour) and will clean the sandbox of finished or failed tasks that the Singularity executor failed to clean.

#### Log Watcher
The log watcher is an experimental service that provides log tailing and streaming / forwarding of executor task log lines to third party services like *fluentd* or *logstash* to support real-time log viewing and searching.

#### OOM Killer
The **Out of Memory process Killer** is an experimental service that replaces the default memory limit checking supported by Mesos and **Linux Kernel CGROUPS**. The intention of the OOM Killer is to provide more consistent task notification when tasks are killed. It is also an attempt to work around Linux Kernel issues with CGROUP OOMs and also prevents the CGROUP OOM killer from killing tasks due to page cache overages.


75 changes: 75 additions & 0 deletions Docs/about/requests-and-deploys.md
@@ -0,0 +1,75 @@
# Requests and Deploys

Singularity uses the concepts of a `Request` and `Deploy` to keep track of changes to a particular task or set of related tasks.

A `Request` can be thought of as the high level information for a single project or deployable item. For example, the name and owner's email for a single web service, or the name and cron schedule for a single scheduled job.

A `Deploy` can then be thought of as the specific configuration or version of the running code for that deployable item.

To illustrate the differences between and usages of these two concepts, we will walk through an example using Singularity to run a single web service.

## The `TestService` Example

### Creating a `Request`
You have a new web service called `TestService` that you want to run via Singularity. The first thing you need to do is create a `Request` for `TestService`. To create this request, you would `POST` json over http to the Singularity API ([`/api/requests`](../reference/api.html)) or create a request via the new request page in the Singularity UI. Example json:

```json
{
  "id": "TestService",
  "requestType": "SERVICE",
  "owners": ["me@test.com"],
  "instances": 1
}
```

This `Request` now holds high level information about your web service. Singularity knows it is a long running `SERVICE` named `TestService` and that you want to run `1` instance of this service.
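For example, a request like the one above could be registered with a plain HTTP call. This is only a sketch: it assumes a scheduler reachable at `localhost:7099` under the `/singularity/api` base path used elsewhere in these docs:

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"id": "TestService", "requestType": "SERVICE", "owners": ["me@test.com"], "instances": 1}' \
  http://localhost:7099/singularity/api/requests
```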

### Creating a `Deploy`

Now you want `TestService` to actually run. To do this, you need to create a `Deploy` for the `TestService` `Request`. This deploy will let Singularity know all of the information necessary to actually build and launch a task. This information includes things like the command to run, environment variables to set, the location of any artifacts to download, or the resources that should be allocated to a task. You would create this deploy by `POST`ing json to the Singularity API's deploy endpoint ([`/api/deploys`](../reference/api.html)), or creating a new deploy in the Singularity UI. Example json:

```json
{
  "deploy": {
    "requestId": "TestService",
    "id": "5",
    "resources": {
      "cpus": 1,
      "memoryMb": 128,
      "diskMb": 1024,
      "numPorts": 2
    },
    "command": "java -Ddw.server.applicationConnectors[0].port=$PORT0 -Ddw.server.adminConnectors[0].port=$PORT1 -jar singularitytest-1.0-SNAPSHOT.jar server example.yml",
    "uris": [
      "https://github.com/HubSpot/singularity-test-service/releases/download/1.0/singularitytest-1.0-SNAPSHOT.jar",
      "https://github.com/HubSpot/singularity-test-service/releases/download/1.0/example.yml"
    ],
    "healthcheck": {
      "uri": "/"
    }
  }
}
```

Posting this to Singularity creates a `PendingDeploy`. Singularity will then try to build and launch tasks using the information provided in the `Deploy` json (i.e. a task with those artifacts that needs 1 cpu, 128MB of memory, etc and is run with that command). Singularity will also know to only build and launch `1` of these tasks based on the number of instances you set in the `Request` earlier. Once that task is launched and Singularity determines it is healthy, the `Deploy` succeeds and the `Deploy` json you provided is now the `ActiveDeploy`.
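Posting the deploy follows the same pattern as registering the request. A sketch, assuming a scheduler at `localhost:7099` and the deploy JSON above saved to a local file:

```bash
# deploy.json holds the deploy object shown above
curl -X POST \
  -H "Content-Type: application/json" \
  -d @deploy.json \
  http://localhost:7099/singularity/api/deploys
```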

### A New `Deploy`

Let's say some changes were made to the `TestService` code and you want to run the new version in Singularity, maybe with a bit more memory as well. You would create a new `Deploy` json with information about the new code to run, and updated memory value and `POST` that to the Singularity API ([`/api/deploys`](../reference/api.html)), or create a new deploy in the Singularity UI.

Singularity sees a new `Deploy` has been started and makes that the `PendingDeploy` for the `TestService` `Request`. Singularity will then try to build and launch `1` new task with the configuration specified in the new `PendingDeploy` (note that the task from the previous `Deploy` is still running and its settings are unchanged). Once the task is launched and determined to be healthy, that new `Deploy` now becomes the `ActiveDeploy` and the task from the old deploy is shut down.

### Updating the Request

Now, for example, you notice `TestService` is getting more traffic than expected and you want to run `3` instances instead of just `1`. You can `PUT` a new number of instances over http to the Singularity API ([`/api/requests/request/{requestId}/scale`](../reference/api.html)), or click `Scale` on the Singularity UI request page and enter a new number of instances. Example json:

```json
{
  "id": "TestService",
  "requestType": "SERVICE",
  "owners": ["me@test.com"],
  "instances": 3
}
```

Singularity sees this update to the request and will try to build and launch `2` additional tasks so that there are `3` total instances running. The configuration/resources/command/etc. for these new tasks will be determined from the current `ActiveDeploy`. In other words, there is no need to re-deploy `TestService` to scale the number of instances up or down. This information is saved separately from the more detailed information in the deploy. If you were to create another new deploy after scaling, Singularity would then build and launch `3` new tasks for that new deploy.
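A sketch of the scale call against the endpoint mentioned above. The exact payload accepted by the scale endpoint is an assumption here; the simple form below only sets the new instance count:

```bash
curl -X PUT \
  -H "Content-Type: application/json" \
  -d '{"instances": 3}' \
  http://localhost:7099/singularity/api/requests/request/TestService/scale
```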
45 changes: 45 additions & 0 deletions Docs/about/ui.md
@@ -0,0 +1,45 @@
### Singularity UI
The *Singularity UI* is a single page web app that uses the Singularity API to present information about deployed items.

![Singularity UI status screen](../images/singularity_ui_status.png)

It allows searching and display of information about active, paused, cooled down, cleaning, and pending items.

![Singularity UI Deployed items Screen](../images/singularity_ui_requests.png)

Selecting a specific deployed item will display information about its active running tasks, all historical tasks from past deploys, a list of all executed deploys and a list of all updates to item settings (owners, instances, etc.)

![Singularity UI Single Deployed Item Screen](../images/singularity_ui_request.png)

Selecting an active task will display information about the task and provide access to application files and task logs.

![Singularity UI Active Task Screen](../images/singularity_ui_active_task.png)

Historical information about past task executions is also available, and the Singularity UI allows users to directly retrieve the archived log files.

![Singularity UI Historical Task Screen](../images/singularity_ui_historical_task.png)

Clicking on a deploy id will show all tasks associated with that deploy and any failure information if the deploy has failed.

![Singularity UI Deploy](../images/singularity_ui_deploy.png)

A dashboard with the user's deployed and favored items is available.

![Singularity UI Dashboard](../images/singularity_ui_dashboard.png)

The Singularity UI contains functionality for performing certain actions on registered deployable items and their tasks:

- **Remove a deployed item**. All running tasks (e.g. the service instances if it is a web service) are terminated and the item is unregistered from Singularity. Historical logs are not removed and will be connected with the item if it is re-registered to Singularity at a later stage.
- **Pause a deployed item**. All running tasks are stopped but the item is not removed. Deploys of paused items are not possible. Users can un-pause the item to restart its tasks and be able to deploy.
- **Manually run** a *Scheduled Job* or *On-Demand* item
- **Kill a running task**. Tasks corresponding to instances of *web service* or *worker* items will be re-started instantly (if possible), likely on another agent. Scheduled tasks will behave as if the task failed and may be rescheduled to run in the future depending on whether or not the job item has *numRetriesOnFailure* set.
- **Decommission an agent**, which means that all tasks running on the specific agent will be migrated to other agents.
- **Decommission a *logical rack***, meaning that all agent hosts in the rack will be decommissioned. The *rackid* attribute can be used when running the Mesos agent process to specify which rack the agent belongs to. For example, when running in AWS a rack could correspond to the availability zone (e.g. `/usr/local/sbin/mesos-agent --attributes=rackid:us-east-1e`).

![Singularity UI Agents screen](../images/singularity_ui_agents.png)

![Singularity UI Racks screen](../images/singularity_ui_racks.png)

For all displayed information, access is provided to the API payloads from which views are created. This can greatly help debugging of deploys and can be used by developers who create tools on top of the Singularity API.

![SingularityUI Task Status JSON output](../images/singularity_ui_json.png)
10 changes: 0 additions & 10 deletions Docs/adopters.md

This file was deleted.

98 changes: 0 additions & 98 deletions Docs/containers.md

This file was deleted.

94 changes: 0 additions & 94 deletions Docs/details.md

This file was deleted.

41 changes: 4 additions & 37 deletions Docs/development/basepom.md
@@ -25,13 +25,7 @@ settings are kept at default value.
#### `project.build.targetJdk`

Controls the JDK level to which the code is compiled. Singularity uses
*1.7* (JDK 7).

#### `project.jdk7.home`

To ensure a stable build independent of the JDK used, the singularity
build enforces using a JDK7 class library if a newer JDK (JDK8 or
newer) is used. See below for *[Compilation using JDK8 or newer]*.
*1.8* (JDK 8).

#### `basepom.check.skip-license`

@@ -61,10 +55,10 @@ suitable for new projects. It may be necessary to override some
dependency versions when converting legacy codes or when a third-party
library requires a fixed version of a dependency.

For Singularity, dropwizard 0.7.x enforces the following versions:
For Singularity, dropwizard 1.0.x enforces the following versions:

* `dep.jackson.version`, `dep.jackson.core.version`, `dep.jackson.databind.version` - Enforces *2.3.2*, because dw uses Jackson 2.3.x.
* `dep.jetty.version` - Enforces *9.0.7.v20131107* because dw uses Jetty 9.0.x.
* `dep.jackson.version`, `dep.jackson.core.version`, `dep.jackson.databind.version` - Enforces *2.7.9*, because dw uses Jackson 2.7.x.
* `dep.jetty.version` - Enforces *9.3.9.v20160517* because dw uses Jetty 9.3.x.

## Notes on dependencies

@@ -145,30 +139,3 @@ optional arguments:
-h, --help show this help message and exit
-v, --version show the application version and exit
```


## Compilation using JDK8 or newer

Singularity targets JDK7 but can be compiled with any JDK starting
with JDK7. Especially, the code base can be built with JDK8 (but will
generate code that can be run on JDK7). However, one of the biggest
gotchas here is that the compiler will use the Java 8 runtime library
(bundled with the JDK8) to build code that is supposed to run on JDK7
with the JDK7 runtime library. As the JDK8 runtime contains a newer
version of the runtime, the bindings in the java code will match JDK8
but may or may not match JDK7.

Therefore the Singularity build enforces that a JDK7 is installed on
the machine if JDK8 or newer is used to compile Singularity and its
home location must be set as an environment variable, `JAVA7_HOME`.

### Compiling with Travis CI

As Travis only supports a single installed JDK at a time; when the
build detects that it runs in the Travis environment (by checking the
`TRAVIS` environment variable), it will set the jdk7 home to be the
same as the current JDK. Therefore, the Travis build to JDK8 or newer
does actually build against the JDK8 runtime. As Travis is only used
as a smoke test to see whether the code base builds (and not for
continous delivery), this is not a problem.

61 changes: 61 additions & 0 deletions Docs/development/developing-with-docker.md
@@ -0,0 +1,61 @@
## Setup

For developing or testing out Singularity with Docker, you will need to install [docker](https://docs.docker.com/installation/) and [docker-compose](https://docs.docker.com/compose/#installation-and-set-up).

## Example cluster with Docker Compose

Run `docker-compose pull` first to get all of the needed images. *Note: This may take a few minutes*

Then simply run `docker-compose up` and it will start containers for...
- mesos master
- mesos agent (docker/mesos containerizers enabled)
- zookeeper
- Singularity
- [Baragon Service](https://github.com/HubSpot/Baragon) for load balancer management
- [Baragon Agent](https://github.com/HubSpot/Baragon) + Nginx as a load balancer

...and the following UIs will be available:
- Singularity UI => [http://localhost:7099/singularity](http://localhost:7099/singularity)
- Baragon UI => [http://localhost:8080/baragon/v2/ui](http://localhost:8080/baragon/v2/ui)

*if using [boot2docker](http://boot2docker.io/) or another vm, replace `localhost` with the ip of your vm*

The docker-compose example cluster will always run off of the most recent release tag.

## Developing With Docker

### `dev`

In the root of this project is a `dev` wrapper script to make developing easier. It will run using images from the current snapshot version. You can do the following:

```
./dev pull # Get the latest images from docker hub
./dev start # start mesos cluster in background
./dev attach # start mesos cluster and watch output in console
./dev restart # stop all containers and restart in background
./dev rebuild # stop all containers, rebuild Singularity and docker images, then start in background
./dev rebuild attach # rebuild and watch output when started
./dev remove # remove stopped containers
./dev stop # stop all containers
./dev kill # kill all containers (ungraceful term)
```

The output from the dev script will give you information about where the SingularityUI can be reached.

### Building new images

Singularity uses the docker-maven-plugin for building its images. There are a few images related to Singularity:

- `hubspot/singularityservice` - The Singularity scheduler itself
- `hubspot/singularityexecutoragent` - A mesos agent with java/logrotate and the custom SingularityExecutor installed
- `hubspot/singularitybase` - A base image for `singularityexecutoragent` that takes care of installing java/logrotate/etc on top of the mesos agent image (not built with maven plugin)

### Logs and Entering Containers

If you are not attached to the docker-compose process, you can check the output of your containers using `docker logs`. Start by checking `docker ps` to see what containers are running. Generally they will have names like `singularity_(process name)`. From there you can run `docker logs (name)` to see the stdout for that container.

Need to see more than stdout? You can also get a shell inside the container and poke around. Once you know the name of your container, you can run `docker exec -it (name) /bin/bash` to get an interactive shell inside the running container.

### Integration Tests

The SingularityServiceIntegrationTests module will run tests on a cluster consisting of a singularity scheduler, zk instance, mesos master, and three mesos agents. These will run during the `integration-test` lifecycle phase.
61 changes: 0 additions & 61 deletions Docs/development/docker.md

This file was deleted.

56 changes: 0 additions & 56 deletions Docs/development/lbs.md

This file was deleted.

56 changes: 56 additions & 0 deletions Docs/development/load-balancer-integration.md
@@ -0,0 +1,56 @@
# Load Balancers

Singularity supports integration with a Load Balancer API (LB API) like [Baragon](https://github.com/HubSpot/Baragon) for the purpose of coordinating deploys and normal task operations.

## Requirements

- Provide a `loadBalancerUri` in the configuration yaml (see the sketch below)
- On request creation, set `loadBalanced` to true
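A minimal sketch of the first setting; the property name comes from the list above, while the Baragon host and path shown are only illustrative:

```yaml
# Scheduler configuration yaml (the Baragon host and path are placeholders)
loadBalancerUri: http://baragon.example.com:8080/baragon/v2/request
```

On the request side, `"loadBalanced": true` is simply added to the request JSON shown elsewhere in these docs.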

## How it works

### Request

Singularity POSTs a LoadBalancerRequest (LBR) with an id (LBR ID) and tasks to add and/or remove from load balancers. Singularity expects the LB API to be asynchronous and store state about operations using the provided LBR ID, which the LB API relies on Singularity to supply and should not infer context from. Singularity expects the LB API to respond to all LBR (POST, GET, or DELETE) with a LoadBalancerResponse JSON object which has the following fields:

- LoadBalancerState (LBS) (one of FAILED, WAITING, SUCCESS, CANCELING, CANCELED, INVALID_REQUEST_NOOP)
- LoadBalancerRequestId (echos back the LBR ID)

Singularity makes a POST request to start a change of state (add or remove tasks from load balancers), but can handle any LBS response from any request.
Singularity makes a DELETE request to request a cancel of a previously requested POST.
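For reference, a LoadBalancerResponse body might look like the following. This is a sketch only: the JSON field names are assumptions, and the only things fixed by the contract above are that a state and the echoed-back LBR ID must be present:

```json
{
  "loadBalancerState": "SUCCESS",
  "loadBalancerRequestId": "TestService-5"
}
```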

### Edge cases

- Singularity may make multiple POST or DELETE requests to the same LBR ID, especially if the LB API responds with a failure status code or does not respond quickly enough (configurable in Singularity.)
- Singularity may make a DELETE request to an LBR ID which has already succeeded, in which case the LB API should return SUCCESS, which is the state of the LBR, not the response to the DELETE request.
- Singularity may make a POST request to add a task on a different LBR ID than the LBR ID it uses to remove that task.
- Singularity assumes that CANCELED = FAILED in terms of the final result of the LBR.
- Singularity may request removal of a task that is not actually load balanced or has already been removed from load balancers. In this case, the LB API should return a status code of SUCCESS to indicate that all is well.

## When it works

### Deploys

Once all of the tasks in a new deploy are healthy (have entered TASK_RUNNING and have passed healthchecks, IF present), Singularity posts a LBR to add the new deploy tasks and to remove all other tasks associated with this request. The LBR ID is requestId-deployId.

Singularity will poll (a GET on the LBR) until it receives either a SUCCESS, FAILED, or CANCELED state. While Singularity is polling, it may exceed the deadline allowed for the deploy, in which case it will make a DELETE request to the LBR, which signifies its desire to cancel the LB update. Regardless, Singularity will continue to poll until it receives one of the 3 final states and will update the deploy accordingly.

### New Tasks

When a new task starts for reasons other than a new deploy, such as to replace a failed task or due to scaling, Singularity will attempt to add that task to the LB API as part of its NewTaskChecker process. The NewTaskChecker ensures that new tasks eventually make it into TASK_RUNNING and are added to the LB API (if loadBalanced is true for that Request.) Once a new task has passed healthchecks (if necessary), Singularity will make a POST request to the LBR with the LBR ID taskId-ADD.

Singularity will poll the GET request for this LBR until it receives a final state. If it gets a CANCELED or FAILED it will kill the new task (the scheduler should request a new one to take its place.)

### Task failures

When a task fails, is lost, or killed, Singularity will add it to a queue to ensure that it is removed from the LB API if it was ever added in the first place.

Singularity will make a POST request using the LBR ID taskId-REMOVE. It will continue to make GET and POST requests to this LBR ID until it is successfully removed (SUCCESS.)

### Graceful task cleanup (decommissions, bounces)

Singularity will attempt to gracefully decommission tasks from the LB API when those tasks are being killed from within Singularity.

Singularity will make a POST request using the LBR ID taskId-REMOVE. It will continue to make GET and POST requests to this LBR ID until it is successfully removed (SUCCESS.)

Singularity will not kill the task until it is removed from the LB API.
14 changes: 0 additions & 14 deletions Docs/development/maven.md

This file was deleted.

99 changes: 44 additions & 55 deletions Docs/development/ui.md
@@ -4,7 +4,7 @@ This document is intended for people who want to work on SingularityUI independe

If you're here just looking for information on how to build SingularityUI, please note that the Maven build process is configured to automatically build and package the UI into SingularityService.

The compiled static files are placed in [`../SingularityService/target/generated-resources/static/`](../SingularityService/target/generated-resources/static/).
The compiled static files are placed in `../SingularityService/target/generated-resources/static/`.

## Contents

@@ -18,54 +18,46 @@ The compiled static files are placed in [`../SingularityService/target/generated

## Developer overview

SingularityUI is a static app that relies on SingularityService for its data.
SingularityUI is a single page webapp that relies on SingularityService for its data.

The app is built using Brunch (i.e. compiling CoffeeScript, etc), with Bower being used to manage its dependencies (e.g. jQuery, Backbone).
The app is built using Gulp (i.e. compiling ES6, etc), with npm being used to manage its dependencies (e.g. React, Bootstrap).

We recommend you familiarise yourself with the following if you haven't used them before:
We recommend you familiarize yourself with the following if you haven't used them before:

* [CoffeeScript](http://coffeescript.org/) is the language used for all logic.
* [ES6](http://es6-features.org/#Constants) is the language used for all logic.
* [Stylus](http://learnboost.github.io/stylus/) is used instead of CSS.
* [Handlebars](http://handlebarsjs.com/) is the templating library.
* [Backbone](http://backbonejs.org/) acts as the front-end framework.
* [React](https://facebook.github.io/react/) acts as the front-end framework and templating library.
* [Redux](http://redux.js.org/docs/introduction/) provides state managment.
* [Bootstrap](http://getbootstrap.com/) provides standard responsive components, and [react-bootstrap](https://react-bootstrap.github.io/) provides react versions of said components.

## Set-up

You will need to have the following:

* [nodejs](http://nodejs.org/) with [npm](https://www.npmjs.org/).
* [Brunch](http://brunch.io) & [Bower](http://bower.io), installable via `npm install -g brunch bower`.
* [gulp](http://gulpjs.com/), installable via `npm install --global gulp-cli`.

Below are some commands you might find useful when working on SingularityUI:

```bash
# cd Singularity/SingularityUI
# Install NPM deps, Bower deps, and do one production build
# Install NPM deps
npm install

# Install just the Bower dependencies
bower install
# Build the app
gulp build

# Remove dependencies (reinstall them using 'npm install')
rm -rf node_modules bower_components

# Build SingularityUI. '--production' (optional) optimises the output files
brunch build [--production]

# Watch the project and build it when there are changes
brunch watch

# Same as above, but also start an HTTP server that serves the static files. '-p <number>' (optional) specifies what port it runs on
brunch watch --server [-p 3333]
# Serve the app locally at localhost:3334 and rebuild whenever files are changed.
gulp serve
```

When you first start, run `npm install` to download all the dependencies. Once that's done, you're ready to roll!

## Developing locally

So far you have SingularityUI with all its dependencies installed. You're able to run SingularityUI and have it served using `brunch watch --server`. What we need now is a running SingularityService to pull information from.
So far you have SingularityUI with all its dependencies installed. You're able to run SingularityUI and have it served using `gulp serve`. What we need now is a running SingularityService to pull information from.

If you don't have one already (e.g. your team might be running one you can use), you can easily run your own via [Docker](docker.md). If running via docker, it is helpful to add the host that docker is running on to your `/etc/hosts` file as `docker` so we can reference it by hostname. If using `boot2docker` this is your `boot2docker ip`. We will reference the hostname as `docker` in the examples below.
If you don't have one already (e.g. your team might be running one you can use), you can easily run your own via [Docker](developing-with-docker.md). If running via docker, it is helpful to add the host that docker is running on to your `/etc/hosts` file as `docker` so we can reference it by hostname. If using `boot2docker` this is your `boot2docker ip`. We will reference the hostname as `docker` in the examples below.


Once the cluster is up and running, the API's root is available at [`http://docker/singularity/api`](http://docker/singularity/api) by default.
@@ -95,8 +87,8 @@ name: SingularityUI

routes:

# Redirect static assets to local brunch server (assuming it is on port 3333)
".*/static/.*": "http://localhost:3333/"
# Redirect static assets to local server (assuming it is on port 3334)
".*/static/.*": "http://localhost:3334/"

# Redirect any API calls to the QA Singularity service (the slash after the domain is necessary)
".*/api/.*": "http://docker/"
@@ -124,11 +116,11 @@ Assuming you used the second command, you can now access SingularityUI by going
If you're confused as to what's going on here, all your requests are being processed by vee so that:

* Requests to `localhost:4001/singularity/api` are sent to the server at `docker`.
* All other requests, including static files, are sent to the Brunch server running locally.
* All other requests, including static files, are sent to the gulp server running locally.

### Connecting to the API

So far you have SingularityUI being served by Brunch, and SingularityService running somewhere. If you have a proxy like vee running too, please replace the ports/URIs that follow with the ones you're using for the proxy.
So far you have SingularityUI being served by gulp, and SingularityService running somewhere. If you have a proxy like vee running too, please replace the ports/URIs that follow with the ones you're using for the proxy.

Open up SingularityUI in your browser by going to [`http://localhost:3333`](http://localhost:3333).

@@ -137,10 +129,10 @@ You'll be prompted to input an API root. This is the service that SingularityUI
You can change the value of this at any point by typing the following in your JS console:

```javascript
localStorage.set('apiRootOverride', 'http://docker/singularity/api')
localStorage.setItem('apiRootOverride', 'http://docker/singularity/api')
```

And there you go! You should at this point have SingularityUI running in your browser with it connected to SingularityService. Just let Brunch watch and compile your files as you work and try it out in your browser.
And there you go! You should at this point have SingularityUI running in your browser with it connected to SingularityService. Just let gulp watch and compile your files as you work and try it out in your browser.

While we're on the topic of localStorage overrides, another useful one you can use disables the auto-refresh which will stop the page re-rendering so you can properly inspect the DOM:

@@ -150,42 +142,39 @@ localStorage.setItem('suppressRefresh', true)

## Code structure

As mentioned before, SingularityUI uses [Backbone](http://backbonejs.org/). If you're not familiar with how it does things, please look into it and familiarise yourself with Views, Models, Collections, and the event-based interaction between them all.
As mentioned before, SingularityUI uses [React](https://facebook.github.io/react/) and [Redux](http://redux.js.org/docs/introduction/). If you're not familiar with how they do things, please look into them and familiarize yourself with React's lifecycle and the Redux store and dispatch.

What follows is a run-down of how things work in Singularity, using the Webhooks page as an example.

What follows is a run-down of how things work in Singularity, using the Slaves page as an example.
First you request `/singularity/webhooks`. This triggers [our router](../SingularityUI/app/router.jsx) to fire up [`Webhooks`](../SingularityUI/app/components/webhooks/webhooks.jsx).

First you request `/singularity/slaves`. This triggers [our router](../SingularityUI/app/router.coffee) to fire up [`SlavesController`](../SingularityUI/app/controllers/Slaves.coffee).
When the Webhooks component is called, the initial action occurs in the `connect()` function call at the bottom of the page.

The controller bootstraps the things we need for the requested page. First, it creates 3 collections--one for each API endpoint we're going to hit.
First, `connect()` calls `mapStateToProps()`. Though it is called with the redux store and the component's own props, the Webhooks page doesn't have props passed into it. This returns props that are obtained from the redux store, such as API calls.

Afterwards, it creates 3 instances of [`SimpleSubview`](../SingularityUI/app/views/simpleSubview.coffee) and gives each one a template to render and a collection to use for data.
Then `connect()` calles `mapDispatchToProps()`. This is called with the redux dispatch and the component's own props, and returns props are functions which can perform actions.

`SimpleSubview` is a reusable class that renders its template in response to change events from the collection you gave it. For the slaves page, when one of the collections receives a response from the service `SimpleSubview` renders the required table and nothing more, therefore giving it ownership over one component of the page, with the collection telling it when to render it.
`connect()` combines the outputs of `mapStateToProps()` and `mapDispatchToProps()` into one object and passes that in as props to the component the result of `connect()` is called with.

The controller also creates a [`SlavesView`](../SingularityUI/app/views/slaves.coffee) and assigns it as its primary view using `Controller.setView(View)`. This view is special because it represents the entire page being rendered. We feed it with references to our sub-views so that it can embed them into itself.
The result of `connect()` is called with the [`rootComponent`](../SingularityUI/app/rootComponent.jsx). `rootComponent` sets up automatically refreshing the page and can display a 404 page if the component sets a `notFound` prop. The passed-in `refresh()` function fetches the page data from the API (in some cases in which initial data needs to not be fetched again, an `initialize()` function is also passed in to perform only initial calls). While the `rootComponent` is waiting for this to finish, the loading animation is displayed.

Finally, we tell the app to render this main view of ours and to start all of the collection fetches, which will eventually trigger the subview renders when completed.
Finally, once the API call does complete, `rootComponent` takes the props provided by `connect()` and passes them into the Webhooks component itself, which will render the table of webhooks that you see.

Everything else is standard [Backbone](http://backbonejs.org/)-structured code. Please refer to the official docs for how to do things like respond to UI events, etc.
Everything else is standard [React](https://facebook.github.io/react/)-structured code. Please refer to the official docs for how to do things like respond to UI events, etc.

To summarise:
* A controller bootstraps everything (collections, models, views) for a page.
* If there is more than one collection/model involved, we split the view up into subviews in order to keep things modular and easy to change/render. A primary view glues everything together.
* If there is one/no collection/model being used, we just use the primary view for everything.
* Use Backbone conventions wherever possible. Try to rely on events, not callbacks.
To summarize:
* React Router bootstraps everything for the page.
* All API calls necessary for rendering the page are performed in the primary component's `refresh()` or `initialize()` function.
* Use React conventions wherever possible. Try to rely on props, not component state.

### Useful links

There are some libraries/classes in SingularityUI which you should be aware of if you plan on working on it:

* [Application](../SingularityUI/app/application.coffee) is responsible for a lot of global behaviour, including error-handling.
* [Router](../SingularityUI/app/router.coffee) points requests to their respective controllers.
* [Utils](../SingularityUI/app/utils.coffee) contains a bunch of reusable static functions.
* The base classes. The various components extend the following:
* [Model](../SingularityUI/app/models/model.coffee)
* [Collection](../SingularityUI/app/collections/collection.coffee) & [PaginableCollection](../SingularityUI/app/collections/PaginableCollection.coffee)
* [View](../SingularityUI/app/views/view.coffee)
* [Controller](../SingularityUI/app/controllers/Controller.coffee)
* Reusable components:
* [SimpleSubview](../SingularityUI/app/views/simpleSubview.coffee)
* [ExpandableTableSubview](../SingularityUI/app/views/expandableTableSubview.coffee)
* [Initialize](../SingularityUI/app/initialize.jsx) sets up the Redux store, user settings, and the router. It also prompts the user for API root if necessary.
* [Router](../SingularityUI/app/router.jsx) points requests to their respective components.
* [Application](../SingularityUI/app/components/common/Application.jsx) provides the navigation header and global search functionality.
* [Base](../SingularityUI/app/actions/api/base.es6) is responsible for API call behavior, including error-handling.
* [Utils](../SingularityUI/app/utils.jsx) contains a bunch of reusable static functions.
* [UITable](../SingularityUI/app/components/common/table/UITable.jsx) is a comprehensive table component that provides sorting, pagination and other utilities. Data is provided by child [Column](../SingularityUI/app/components/common/table/Column.jsx) components.
* [FormModal](../SingularityUI/app/components/common/modal/FormModal.jsx) provides a base for most of Singularity's modals, such as the Run Now and Pause Request modals.
202 changes: 202 additions & 0 deletions Docs/features/auth.md
@@ -0,0 +1,202 @@
# Singularity Webhook Auth

Singularity contains a few options for authentication and authorization. The most reliable of these is webhook auth. The setup for Singularity's webhook auth is based on the [webhook token auth](https://kubernetes.io/docs/admin/authentication/#webhook-token-authentication) provided in Kubernetes.

- [Group Only Auth](#group-only-auth)
- [Groups and Scopes Auth](#groups-and-scopes-auth)

### Group Only Auth

When webhook auth is configured, Singularity will look for an `Authorization` header on each API call. Singularity will then make a `GET` call to the configured `webhookAuth.authVerificationUrl` with the same `Authorization` header sent. The system expects a response which describes the user as JSON in a format like:

```json
{
  "user": {
    "id": "required-user-id",
    "name": "Optional User Name",
    "email": "useremail@example.com",
    "groups": [
      "engineering",
      "singularity-admin"
    ],
    "authenticated": true
  },
  "error": "Optional exception message"
}
```

The authenticating system should return `"authenticated": true` for the user if the user was successfully authenticated, as well as any groups the user is part of (for later use with authorization). If there was an exception/error while authenticating, Singularity can display that to the user if it is returned in the `error` field.

### Configuring Webhook Authentication and Authorization Groups

To enable webhook auth in Singularity, there are two main sections of the configuration yaml to update. An example is shown below, with explanations following.

```yaml
auth:
enabled: true
authMode: GROUPS
authenticators:
- WEBHOOK
adminGroups:
- singularity-admin
requiredGroups:
- engineering
jitaGroups:
- engineering-jita
defaultReadOnlyGroups:
- engineering-ro
globalReadOnlyGroups:
- engineering-leaders-ro

webhookAuth:
authVerificationUrl: https://my-auth-domain.com/singularity
```
***Verification URL***
The `webhookAuth.authVerificationUrl` will be sent a `GET` with the `Authorization` header provided, as described in the section above.

***Auth Config***

- `enabled` - defaults to `false`. If set to `true`, auth will be enforced
- `authenticators` - a list of authentication types to use. For webhook auth, this is simply `- WEBHOOK`
- `adminGroups` - If a user is part of these groups they are allowed admin actions in Singularity (actions on agents, view all requests, etc)
- `requiredGroups` - A user must be part of at least one of these groups in order to access Singularity
- `jitaGroups` - Groups that can be allowed access to any SingularityRequest (but not admin actions)
- `defaultReadOnlyGroups` - If read only groups are not set for a SingularityRequest, these groups are used for read access
- `globalReadOnlyGroups` - These groups are allowed read access to requests regardless of what groups are set on the SingularityRequest

***SingularityRequest Fields***

You can configure access to individual SingularityRequests using the `group`, `readWriteGroups`, and `readOnlyGroups` fields.

- `group` - The primary group for this request. A user in this group is allowed read/write access
- `readWriteGroups` - alternate groups that are also allowed read-write access
- `readOnlyGroups` - alternative groups that are only allowed read access
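
For example, a request json using these fields might look like the following (a minimal sketch; the request id and group names are hypothetical and only the access-related fields are shown):

```json
{
  "id": "my-service",
  "requestType": "SERVICE",
  "group": "engineering",
  "readWriteGroups": ["deploy-tools"],
  "readOnlyGroups": ["support"]
}
```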

## Groups and Scopes Auth

A newer update to Singularity auth in version 1.3.0 contains some additional options inside the `auth` section of your config yaml:

```
auth:
enabled: true
authMode: GROUPS_SCOPES
```

This enables a more granular mode of checking auth, verifying scopes for groups as well as scopes on users. In order to use this mode of auth, you need a slightly different response format from your webhook auth. You can use one of two options:

```
auth:
authResponseParser: RAW
```

In this format, the response from your webhook auth url will conform to the shape of the `SingularityUser` object like:

```
{
"id": "id",
"name": "name",
"email": "user@test.com",
"groups": ["group1", "group2"],
"scopes": ["SINGULARITY_READONLY"],
"authenticated": true
}
```

or with `authResponseParser` set to `WRAPPED` you can conform to the shape of the `SingularityUserPermissionsResponse` object:

```
{
"user": { # present if no error
"id": "id",
"name": "name",
"email": "user@test.com",
"groups": ["group1", "group2"],
"scopes": ["SINGULARITY_READONLY"],
"authenticated": true
},
"error": "" # present if auth failed
}
```

This would grant the user read-only privileges only for requests belonging to the groups `group1` and `group2`.

### Scopes

You can customize the strings used to specify scopes. Defaults are shown below:

```
auth:
scopes:
admin:
- SINGULARITY_ADMIN
write:
- SINGULARITY_WRITE
read:
- SINGULARITY_READONLY
```

### Default and Global Groups

Several other parameters are also available to allow certain permissions globally to all users:

- `defaultReadOnlyGroups` - Users in these groups are allowed read access to a request as long as 1) the user has the readonly scope and 2) no `readOnlyGroups` are specified on the request json. `readOnlyGroups` on the request serve to override the `defaultReadOnlyGroups`
- `globalReadOnlyGroups` - These groups are allowed readonly access to all requests in any group assuming they have the readonly scope. Useful for things like bots performing automation across all things in Singularity
- `globalReadWriteGroups` - Similar to `globalReadOnlyGroups` but for the read/write permissions and scopes
- `jitaGroups` - These groups will be allowed all access, but any action taken that the user would normally not be able to perform will be logged at WARN level. Note, the user still must have the appropriate scope to perform actions


### Admin Actions

All webhook configurations and actions on agents require admin-level credentials.


### Example Config

```
auth:
enabled: true
authMode: GROUPS_SCOPES
authResponseParser: RAW
authenticators:
- WEBHOOK
jitaGroups:
- perm-jita-singularity-rw
- perm-jita-singularity-ro
defaultReadOnlyGroups:
- sgy-read
globalReadOnlyGroups:
- perm-singularity-ro
globalReadWriteGroups:
- perm-singularity-rw
webhookAuth:
authVerificationUrl: https://something.com/user/permissions
```
## Token Auth
Singularity also supports token based authentication by adding `TOKEN` to the list of configured authenticators in `auth.authenticators` in your config yaml.
You can create a token as an admin via an api call to:
`POST {appRoot}/auth/token` with a body like:
```
{
"token": "new-token",
"user": {
"id": "id",
"name": "name",
"email": "user@test.com",
"groups": ["group1", "group2"],
"scopes": ["SINGULARITY_READONLY"],
"authenticated": true
}
}
```
You can then utilize this token by including a header of `Authorization: Token new-token` on each request to the Singularity API.
You can clear all tokens for a given user with a call like `DELETE {appRoot}/auth/{user.name}`.
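
For example, a subsequent API call using that token would include the header like so (the endpoint here is just an illustration):

```
GET {appRoot}/api/state
Authorization: Token new-token
```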
137 changes: 137 additions & 0 deletions Docs/features/canary-deploys.md
@@ -0,0 +1,137 @@
## Canary Deploys

As of `1.5.0`, a new implementation of canary deploys replaces the old incremental deploys in Singularity. Some initial notes on the update:

- Previous behavior will be preserved if using `incrementalDeploys` and not specifying a `canarySettings` object in your SingularityDeploy json
- Default behavior remains unchanged for deploys, settings can be explicitly enabled per deploy

### Starting a Canary Deploy

To run a canary deploy, simply specify a `canarySettings` object in your SingularityDeploy json (defaults with canary disabled are shown in the json below):

```
{
"deploy": {
"canarySettings": {
"enableCanaryDeploy":false,
"instanceGroupSize":1,
"acceptanceMode":"NONE",
"waitMillisBetweenGroups":0,
"allowedTasksFailuresPerGroup":0,
"canaryCycleCount":3
},
...
}
}
```

- `acceptanceMode` - defaults to `NONE`
- `NONE` - No additional checks are run against a deploy
- `TIMED` - Wait a set amount of time between deploy steps. Relevant if `enableCanaryDeploy` is `true`
- `CHECKS` - Run all bound implementations of `DeployAcceptanceHook` (see more info below) after each deploy step. Applies to all tasks at once if `enableCanaryDeploy` is `false` and will run on each individual canary step if `enableCanaryDeploy` is `true`
- `enableCanaryDeploy` - Defaults to `false`. If `true`, enables a step-wise deploy and use of the other canary settings fields (see below for more details). If `false`, performs a normal atomic deploy where all new instances are spun up and all old ones are taken down once the new ones are healthy.
- _Load balancer note_: If `false`, all new instances are added and all old instances removed from the LB in one atomic operation during the deploy. If `true`, new instances will be added to the load balancer alongside old ones, and old ones will be cleaned up after the deploy has fully succeeded.
- `instanceGroupSize` - The number of instances to start per canary group. e.g. if set to `1`, the canary deploy will start `1` instance -> health/acceptance check -> spin down `1` old instance -> start `1` new -> etc
- `waitMillisBetweenGroups` - If `acceptanceMode` is set to `TIMED`, wait this long between groups of new instances of size `instanceGroupSize` (e.g. launch `1`, wait 10 minutes, launch `1` and so on)
- `canaryCycleCount` - Run this many rounds of canary steps before skipping to the full request scale. e.g. if deploying a request of scale `10` and `canaryCycleCount` is set to `3`, 3 instances will be launched one at a time, then the remaining 7 will be launched all at once in a final step
- `allowedTasksFailuresPerGroup` - Replaces the global configuration for allowed task failures in a deploy. For each canary deploy step, this many tasks are allowed to fail and retry before the deploy is considered to have failed
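
For example, a hypothetical canary deploy that launches 2 instances per step, waits 10 minutes between steps, and allows 1 task failure per step might use settings like:

```
{
  "deploy": {
    "canarySettings": {
      "enableCanaryDeploy": true,
      "instanceGroupSize": 2,
      "acceptanceMode": "TIMED",
      "waitMillisBetweenGroups": 600000,
      "allowedTasksFailuresPerGroup": 1,
      "canaryCycleCount": 3
    },
    ...
  }
}
```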

### Custom Deploy Hooks

Applies when `acceptanceMode` is set to `CHECKS`. For those extending SingularityService, you can bind any number of additional implementations of `DeployAcceptanceHook` in guice modules like:

```
Multibinder
.newSetBinder(binder, DeployAcceptanceHook.class)
.addBinding()
.to(MyAcceptanceHook.class);
```

Each implementation of an acceptance hook should look like:

```java
public class MyAcceptanceHook implements DeployAcceptanceHook {

@Inject
public MyAcceptanceHook() {}

@Override
public boolean isFailOnUncaughtException() {
// If `true` an uncaught exception fails a deploy,
// if `false` the deploy can still succeed. Useful for testing
return false;
}

@Override
public String getName() {
// Should be unique per hook
return "My-Test-Hook";
}

@Override
public DeployAcceptanceResult getAcceptanceResult(
SingularityRequest request, // request object. Reflects any updates made during deploy
SingularityDeploy deploy, // Full deploy json object
SingularityPendingDeploy pendingDeploy, // Pending deploy state
Collection<SingularityTaskId> activeTasksForPendingDeploy, // Tasks that are part of the current pending deploy
Collection<SingularityTaskId> inactiveTasksForPendingDeploy, // Tasks from the pending deploy which may have shut down or crashed
Collection<SingularityTaskId> otherActiveTasksForRequest // Tasks from other deploys (e.g. the previous active one)
) {
// Do stuff here
return new DeployAcceptanceResult(
DeployAcceptanceState.SUCCEEDED,
"Test hook passed"
);
}
}
```

The `canarySettings` object determines when `getAcceptanceResult` is called and the state tasks will be in at that point:
- If `enableCanaryDeploy` is set to `false`, the state of tasks will be:
- All new tasks in `activeTasksForPendingDeploy` are launched and health checked
- If the deploy is load balanced, tasks in `otherActiveTasksForRequest` are no longer in the load balancer. Only the new deploy tasks in `activeTasksForPendingDeploy` are active in the load balancer
- *Note* - Singularity will re-add the old tasks back to the load balancer if deploy acceptance checks fail
- If `enableCanaryDeploy` is set to `true`, `getAcceptanceResult` is called after each deploy step
- `activeTasksForPendingDeploy` contains _all_ active tasks launched so far, not just those for the current canary step. These tasks are in running state and have passed initial health checks
- If load balanced, all tasks in `activeTasksForPendingDeploy` as well as all in `otherActiveTasksForRequest` are active in the load balancer at once

#### Available Data For Hooks

Since hooks are compiled into the Singularity jar as extensions of SingularityService, all classes available in guice are also available to the hook. In particular:
- `TaskManager` - On the leader most calls here will be in memory lookups and can be used to fetch the full data for a task (ports, environment, etc)
- `AsyncHttpClient` - Singularity's default http client
- `@Singularity ObjectMapper` - pre-configured object mapper for Singularity objects
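
As a rough sketch, a hook that injects `TaskManager` to inspect the newly launched tasks might look like the following (the `getTask` lookup call and the verification logic are illustrative assumptions, not a prescribed API):

```java
public class PortCheckAcceptanceHook implements DeployAcceptanceHook {
  private final TaskManager taskManager;

  @Inject
  public PortCheckAcceptanceHook(TaskManager taskManager) {
    this.taskManager = taskManager;
  }

  @Override
  public boolean isFailOnUncaughtException() {
    return false;
  }

  @Override
  public String getName() {
    return "Port-Check-Hook";
  }

  @Override
  public DeployAcceptanceResult getAcceptanceResult(
    SingularityRequest request,
    SingularityDeploy deploy,
    SingularityPendingDeploy pendingDeploy,
    Collection<SingularityTaskId> activeTasksForPendingDeploy,
    Collection<SingularityTaskId> inactiveTasksForPendingDeploy,
    Collection<SingularityTaskId> otherActiveTasksForRequest
  ) {
    for (SingularityTaskId taskId : activeTasksForPendingDeploy) {
      // Fetch the full task data (ports, environment, etc) for each new task.
      // getTask(...) is an assumed lookup method here; replace with whatever
      // lookup and verification logic your hook actually needs.
      taskManager.getTask(taskId);
    }
    return new DeployAcceptanceResult(
      DeployAcceptanceState.SUCCEEDED,
      "All new tasks verified"
    );
  }
}
```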

### Incremental Deploys (Deprecated)

_Deprecated_: behavior will be preserved, but prefer using the newer `canarySettings` documented above. Incremental deploys are essentially equivalent to using an `acceptanceMode` of `TIMED`.

As of `0.5.0` Singularity supports an incremental deploy for finer-grained control when rolling out new changes. This deploy is enabled via a few extra fields on the `SingularityDeploy` object when starting a deploy:

- `deployInstanceCountPerStep`: Deploy this many instances at a time until the total instance count for the request is reached (`Optional<Integer>`, default is all instances at once)
- `deployStepWaitTimeMs`: Wait this many milliseconds between deploy steps before continuing to deploy the next `deployInstanceCountPerStep` instances (`Optional<Integer>`, default is 0, i.e. continue immediately)
- `autoAdvanceDeploySteps`: automatically advance to the next target instance count after `deployStepWaitTimeMs` milliseconds (`Optional<Boolean>`, defaults to `true`). If this is `false`, then manual confirmation will be needed to move to the next target instance count. This can be done via the UI.


#### Example

`TestService` is currently running `3` instances. During the next deploy, you want to replace only `1` of these instances at a time and have Singularity wait at least a minute after deploying one so you can verify that everything works as expected. The following fields can be added to the deploy json to accomplish this:

```
deployInstanceCountPerStep: 1
deployStepWaitTimeMs: 60000
autoAdvanceDeploySteps: true
```

When the deploy starts, Singularity will start `1` (`deployInstanceCountPerStep`) instance from the new deploy (The `3` old instances will still be running). Once the new task is determined to be healthy a few things happen:

- Singularity will add the instance from the new deploy to the load balancer (if applicable)
- Singularity will shut down `1` (`deployInstanceCountPerStep`) of the instances from the old deploy after removing it from the load balancer (if applicable)
- Singularity will start counting down the `60000 ms` until it launches the next `deployInstanceCountPerStep` instances

Once the `deployStepWaitTimeMs` of wait time has elapsed, Singularity will start this process again, launching a second task for the new deploy, waiting until it is healthy, then shutting down a task from the old deploy. This will continue until the deploy fails, the deploy is cancelled, or all instances are part of the new deploy and it succeeds.

A few more things to note about the incremental deploy process:
- If the deploy fails or is cancelled, Singularity replaces any missing instances from the old deploy and makes sure they are healthy before shutting down active/healthy instances from the new deploy. (i.e. you will never be under capacity)
- At any time, it is possible to advance the deploy to another target instance count via the UI or API. In other words, you can skip the remaining `deployStepWaitTimeMs`, skip steps of the deploy, or even decrease the instance count to roll back a step.

26 changes: 26 additions & 0 deletions Docs/features/custom-ports.md
@@ -0,0 +1,26 @@
### Choosing Custom Ports

As of release `0.4.10`, you can specify an index for the dynamically allocated port Singularity should use when healthchecking and when adding the service to the load balancer. Previously, Singularity would always use the first dynamically allocated port.

To change the healthcheck port, simply add:

```json
"healthcheck": {
"portIndex": 1 # or another integer
}
```

to your `SingularityDeploy` object. This will tell Singularity to use the dynamically allocated port at index 1 (i.e. the second allocated port) when performing a health check. Alternatively, you can specify a specific port to use for the healthcheck with an option like:
```json
"healthcheck": {
  "portNumber": 80 # or another integer
}
```

Similarly, you can also specify the port index to use for the load balancer by specifying:

```json
"loadBalancerPortIndex": 1 # or another integer
```

in your `SingularityDeploy` object. Keep in mind the dynamically allocated ports will be available to your process as environment variables in the format `PORT{index}` (e.g. `PORT0=32091` for a first dynamically allocated port of 32091).
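
Putting the two together, a deploy json that health checks on the second allocated port and load balances on the first might contain fields like (a sketch; only the relevant fields are shown):

```json
{
  "healthcheck": {
    "portIndex": 1
  },
  "loadBalancerPortIndex": 0
}
```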
26 changes: 26 additions & 0 deletions Docs/features/disaster-detection.md
@@ -0,0 +1,26 @@
### Disaster Detection

Singularity can be configured to automatically detect 'disaster' scenarios based on a number of indicators and can react in such a way as to limit further stress on the cluster while it is recovering.

Disaster detection can be enabled by adding `enabled: true` to the `disasterDetection` portion of your Singularity config yaml. There are a number of other fields, explained below, that control the behavior and thresholds for disaster detection.

- `enabled` - set to `true` to start running disaster detection and collecting stats about lost tasks, lost agents, and task lag
- `runEveryMillis` - Run the poller on this interval (defaults to every `30` seconds)

**Task Lag**
- `checkLateTasks` - Use late tasks (aka task lag) as a metric for determining if a disaster is in progress (defaults to `true`)
- `criticalAvgTaskLagMillis` - If the average time past due for all pending tasks is greater than this, a disaster is in progress (likely due to a severe lack of resources in the cluster), defaults to `4 minutes (240000)`
- `criticalOverdueTaskPortion` - If the portion of tasks that are considered overdue is this fraction of the total running tasks in the cluster, a disaster is in progress. Defaults to `0.1`, i.e. one tenth of tasks pending and overdue.

**Lost Agents**
- `checkLostAgents` - Use lost agents as a metric for determining if a disaster is in progress. Disaster detection only counts agents that have transitioned from `ACTIVE` to `DEAD`. Agents that are gracefully decommissioned and removed won't trigger a disaster. (defaults to `true`)
- `criticalLostAgentPortion` - If, during the past run of the poller, this portion of the total _active_ agents in the cluster have transitioned from `ACTIVE` to `DEAD`, a disaster is in progress. Defaults to `0.2`, or one fifth of the agents in the cluster

**Lost Tasks**
- `checkLostTasks` - Use lost tasks as a metric for determining if a disaster is in progress (defaults to `true`)
- `lostTaskReasons` - Consider status updates matching these reasons towards the lost tasks for disaster detection. This is a list of mesos `Reason` enum values (`org.apache.mesos.Protos.TaskStatus.Reason`) and defaults to `[REASON_INVALID_OFFERS, REASON_AGENT_UNKNOWN, REASON_AGENT_REMOVED, REASON_AGENT_RESTARTED, REASON_MASTER_DISCONNECTED]`
- `criticalLostTaskPortion` - If this portion of the total _active_ tasks in the cluster have transitioned to `LOST` for one of the above reasons in the last run of the poller, a disaster is in progress. Defaults to `0.2`

### Disabled Actions

Singularity also supports globally disabling certain actions, which can aid in maintenance or cluster recovery after an outage. These can be added and removed manually on the `/disasters` UI page. You can also specify a list of actions to automatically disable when a disaster is detected by setting `disableActionsOnDisaster` in the `disasterDetection` portion of your Singularity config yaml. When a disaster is detected, any action specified will automatically be disabled, and will be enabled again when the disaster has cleared. If during runtime you want to stop the disaster detector from disabling actions (for example, it keeps detecting a false positive), you can disable the automated actions in the UI or `POST` to the `/api/disasters/disable` endpoint.
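
For example, a config yaml section enabling disaster detection with the default thresholds described above might look like the following (values are illustrative; `disableActionsOnDisaster` is omitted here since the valid action names depend on your Singularity version):

```yaml
disasterDetection:
  enabled: true
  runEveryMillis: 30000          # run the poller every 30 seconds
  checkLateTasks: true
  criticalAvgTaskLagMillis: 240000
  criticalOverdueTaskPortion: 0.1
  checkLostAgents: true
  criticalLostAgentPortion: 0.2
  checkLostTasks: true
  criticalLostTaskPortion: 0.2
```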
135 changes: 135 additions & 0 deletions Docs/features/expiring-actions.md
@@ -0,0 +1,135 @@
### Expiring Actions

Released in `0.4.9`

### Action expiration + additional action metadata
Some actions in Singularity now have the concept of expiration (as in, giving up after a certain period of time). Corresponding endpoints have been updated to accept more information about action expiration and action metadata.

#### Rack and agent operations
- `/racks/rack/{rackId}/decommission`
- `/racks/rack/{rackId}/freeze`
- `/racks/rack/{rackId}/activate`
- `/agents/agent/{agentId}/decommission`
- `/agents/agent/{agentId}/freeze`
- `/agents/agent/{agentId}/activate`

These URLs accept a JSON object with this format:

| name | type | required | description |
|------|------|----------|-------------|
| message | string | optional | A message to show to users about why this action was taken |

#### Request bounce
- `/requests/request/{requestId}/bounce`

This URL accepts a JSON object with this format:

| name | type | required | description |
|------|------|----------|-------------|
| skipHealthchecks | boolean | optional | Instruct replacement tasks for this bounce only to skip healthchecks |
| durationMillis | long | optional | The number of milliseconds to wait before reversing the effects of this action (letting it expire) |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |
| incremental | boolean | optional | If present and set to true, old tasks will be killed as soon as replacement tasks are available, instead of waiting for all replacement tasks to be healthy |
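
For example, an incremental bounce that expires after ten minutes could be requested with a body like (values are illustrative):

```json
{
  "incremental": true,
  "durationMillis": 600000,
  "message": "Rolling restart to pick up new config"
}
```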

#### Scheduling a request to run immediately
- `/requests/request/{requestId}/run`

This URL accepts a JSON object with this format:

| name | type | required | description |
|------|------|----------|-------------|
| runId | string | optional | An id to associate with this request which will be associated with the corresponding launched tasks |
| skipHealthchecks | boolean | optional | If set to true, healthchecks will be skipped for this task run |
| commandLineArgs | Array[string] | optional | Command line arguments to be passed to the task |
| message | string | optional | A message to show to users about why this action was taken |

#### Unpausing a request
- `/requests/request/{requestId}/unpause`

This URL accepts a JSON object with this format:

| name | type | required | description |
|------|------|----------|-------------|
| skipHealthchecks | boolean | optional | If set to true, instructs new tasks that are scheduled immediately while unpausing to skip healthchecks |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |

#### Exit request cooldown
- `/requests/request/{requestId}/exit-cooldown`

This URL accepts a JSON object with this format:

| name | type | required | description |
|------|------|----------|-------------|
| skipHealthchecks | boolean | optional | Instruct new tasks that are scheduled immediately while executing cooldown to skip healthchecks |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |

#### Deleting a request
- `/requests/request/{requestId}`

This URL accepts a JSON object with this format:

| name | type | required | description |
|------|------|----------|-------------|
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |

#### Killing a task
- `/tasks/task/{taskId}`

This URL accepts a JSON object with this format:

| name | type | required | description |
|------|------|----------|-------------|
| waitForReplacementTask | boolean | optional | If set to true, treats this task kill as a bounce - launching another task and waiting for it to become healthy |
| override | boolean | optional | If set to true, instructs the executor to attempt to immediately kill the task, rather than waiting gracefully |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |

#### Scaling requests
- `/requests/request/{requestId}/scale` (previously `/requests/request/{requestId}/instances`)

This URL accepts a JSON object with this format:

| name | type | required | description |
|------|------|----------|-------------|
| skipHealthchecks | boolean | optional | If set to true, healthchecks will be skipped while scaling this request (only) |
| durationMillis | long | optional | The number of milliseconds to wait before reversing the effects of this action (letting it expire) |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |
| instances | int | optional | The number of instances to scale to |
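
For example, to temporarily scale a request to 5 instances and let the change expire after an hour (values are illustrative):

```json
{
  "instances": 5,
  "durationMillis": 3600000,
  "message": "Scaling up for peak traffic"
}
```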

#### Pausing a request
- `/requests/request/{requestId}/pause`

This URL accepts a JSON object with this format:

| name | type | required | description |
|------|------|----------|-------------|
| durationMillis | long | optional | The number of milliseconds to wait before reversing the effects of this action (letting it expire) |
| killTasks | boolean | optional | If set to false, tasks will be allowed to finish instead of killed immediately |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |

**NOTE:** The `user` field has been removed from this object.

#### Disabling request healthchecks
- `/requests/request/{requestId}/skip-healthchecks`

This URL accepts a JSON object with this format:

| name | type | required | description |
|------|------|----------|-------------|
| skipHealthchecks | boolean | optional | If set to true, healthchecks will be skipped for all tasks for this request until reversed |
| durationMillis | long | optional | The number of milliseconds to wait before reversing the effects of this action (letting it expire) |
| message | string | optional | A message to show to users about why this action was taken |
| actionId | string | optional | An id to associate with this action for metadata purposes |

### New endpoints for cancelling actions
These endpoints were added in order to support cancelling certain actions:
- `DELETE /requests/request/{requestId}/scale` -- Cancel an expiring scale
- `DELETE /requests/request/{requestId}/skip-healthchecks` -- Cancel an expiring skip healthchecks override
- `DELETE /requests/request/{requestId}/pause` -- Cancel (unpause) an expiring pause
- `DELETE /requests/request/{requestId}/bounce` -- Cancel a bounce
47 changes: 47 additions & 0 deletions Docs/features/mesos-1.md
@@ -0,0 +1,47 @@
Upgrading Singularity to use Mesos `1.x`
========================================

Starting with release version `0.18.0`, Singularity will use the mesos http api to communicate with the mesos master via the [mesos rx-java library](https://github.com/mesosphere/mesos-rxjava). Documentation on upgrading mesos itself can be found on the [mesos website](http://mesos.apache.org/documentation/latest/upgrades/). In order to upgrade Singularity there are a few things to keep in mind beyond the scope of a normal release:

### Mesos Version Selection

As of mesos `1.2`, the mesos master will no longer accept registrations from mesos agents running `0.x.x` versions. As a result, we have chosen to release Singularity `0.18.0` built against mesos `1.1.2`, allowing for a smoother upgrade path for users.

For future `1.x` version upgrades, less overall change should be needed, due to the fact that we are now using the http api and do not depend on native libraries being installed.

### Singularity Executor Updates

If you are running the custom Singularity Executor, we recommend updating mesos on your agents _before_ updating the Singularity Executor. We have found in our testing that the older executor (built against `0.x`) can run smoothly on mesos `1.1`, but the inverse is not always true.

### Singularity Service Configuration Updates

The configuration to connect to the mesos master is the only field that has changed with the 1.x upgrade. The new `mesos.master` field in the configuration yaml is now a comma-separated list of mesos master `host:port` values. Singularity will randomly select from the list when searching for a master (1.x masters will automatically redirect requests to the leading master), trying other hosts in the list if it is not successful.

For example an old configuration of:

```yaml
mesos:
master: zk://my-zk-hostname.com:2181/mesos/singularity

```

Would now become:

```yaml
mesos:
master: my-mesos-master-host.com:5050

```

### SingularityClient Considerations

As part of the mesos 1 update, the `org.apache.mesos:mesos` library is now pulling in a newer version of protobuf. This can cause issues for users using any other protobuf version. As a result, we have refactored the models in `SingularityBase` such that `SingularityBase` and the `SingularityClient` no longer have a dependency on `org.apache.mesos:mesos`.

For users of the java client, this means that a few of the previously accessible methods on the `SingularityTask` object may not be present. All information from the mesos `TaskInfo` protos is still being saved as json for later usage, but only the parts needed by Singularity internals are mapped to POJO fields, with the remainder being caught by jackson's `@JsonAnyGetter`/`@JsonAnySetter`. Extra fields on objects are available as a `Map<String, Object>` under `getAllOtherFields` on the objects.

See [#1648](https://github.com/HubSpot/Singularity/pull/1648) for more details.

### Other Mesos Considerations

- The `--work_dir` flag _must_ be set on all mesos agents or they will error on startup
- Internally the slave -> agent rename is being used, but all API endpoints and fields still reference slave as they did before. Singularity will tackle the slave -> agent rename in future versions
50 changes: 50 additions & 0 deletions Docs/features/shell-commands.md
@@ -0,0 +1,50 @@
### Shell Commands

As of release `0.4.6`, when using the SingularityExecutor you have the ability to run pre-configured shell commands from the UI, eliminating the need to ssh to the mesos agent.

The following example shows how to configure SingularityService and the SingularityExecutor for the simple command `lsof`.


#### Configuration

First, add a configuration section to SingularityService's config; this will allow the shell command to appear as an option in the UI. The section is nested under the `ui` item and looks like:

```yaml
ui:
shellCommands:
- name: lsof
description: List open files, including all sockets.
```
`shellCommands` is a list of objects with fields of `name` and `description`. Now, in the UI there will be a Shell Commands section available for running tasks, like the following:

![Shell Commands in UI](../images/singularity_ui_shell_commands.png)


Next, configure the SingularityExecutor to properly run the command. Add a `shellCommands` section under the `executor` section of the config similar to the following:

```yaml
executor:
shellCommands:
- name: lsof
command: ["/usr/sbin/lsof", "-P", "-p", "{PID}"]
```

A few important notes about the `shellCommands` objects:
- `name` must match the name of the command from the ui config
- `command` is an array of strings that form the shell command. Each argument should be a separate array item
- `{PID}` and `{USER}` are special items and will be replaced with the current process pid or the current user before the command is executed. Placeholder values can be changed by specifying `shellCommandPidPlaceholder` or `shellCommandUserPlaceholder` under the `executor` section of the config.
- If the process you are operating on is not the parent pid (e.g. using a wrapper script), you can instead pull the process pid from a file. The file to pull from is specified by `shellCommandPidFile` under the `executor` section of the config and defaults to `.task-pid` (found in the root of the sandbox)
- You can optionally prefix all commands specified with an array of command strings specified in `shellCommandPrefix`. This is useful for things like being sure to execute a switch user command for each command to execute.
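
For example, an executor config using these options might look like the following (the `sudo` prefix is just an illustration; the placeholder and pid file values shown are the documented defaults):

```yaml
executor:
  shellCommandPrefix: ["sudo", "-u", "{USER}"]
  shellCommandPidPlaceholder: "{PID}"
  shellCommandUserPlaceholder: "{USER}"
  shellCommandPidFile: .task-pid
  shellCommands:
    - name: lsof
      command: ["/usr/sbin/lsof", "-P", "-p", "{PID}"]
```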

#### Running a Shell Command

Now that it is configured, navigate to a running task in the Singularity UI. Near the bottom there is a Shell Commands section that can be expanded. In that section, select the shell command you want from the dropdown and hit `Run`. An option of *Redirect to command output upon success* is specified by default, which will wait for the command to return and redirect you to view the output file.

You should see a message similar to the following indicating that the command has been sent to the executor to be run.

![Command Queued](../images/singularity_ui_command_queued.png)

Once that completes, you will be redirected to the output of your command (saved as a file in the task sandbox). Back on the task detail page you will also be able to view a history of shell commands run for that process:

![Command History](../images/singularity_ui_command_history.png)
23 changes: 23 additions & 0 deletions Docs/features/task-search.md
@@ -0,0 +1,23 @@
#### Task Search

As of `0.5.0`, Singularity has better support for searching historical tasks. A global task search endpoint was added:

`/api/history/tasks` -> Retrieve the history sorted by startedAt for all inactive tasks.

The above endpoint, as well as `/api/history/request/{requestId}/tasks`, now takes additional query parameters:

- `requestId`: Optional request id to match (only for `/api/history/tasks` endpoint as it is already specified in the path for `/request/{requestId}/tasks`)
- `deployId`: Optional deploy id to match
- `host`: Optional host (agent host name) to match
- `lastTaskStatus`: Optional [`ExtendedTaskState`](../reference/api-docs/models#model-ExtendedTaskState) to match
- `startedAfter`: Optionally match only tasks started after this time (13 digit unix timestamp)
- `startedBefore`: Optionally match only tasks started before this time (13 digit unix timestamp)
- `orderDirection`: Sort direction (by `startedAt`), can be ASC or DESC, defaults to DESC (newest tasks first)
- `count`: Maximum number of items to return, defaults to 100 and has a maximum value of 1000
- `page`: Page of items to view (e.g. page 1 is the first `count` items, page 2 is the next `count` items), defaults to 1
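
For example, a query for the ten most recently started failed tasks of a particular deploy might look like (ids and values are illustrative):

```
GET /api/history/request/my-request/tasks?deployId=deploy_2&lastTaskStatus=TASK_FAILED&orderDirection=DESC&count=10&page=1
```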

For clusters using a database that have a large number of tasks in the history, a relevant configuration option, `taskHistoryQueryUsesZkFirst`, has been added to the base Singularity Configuration. This option can be used to prefer either efficiency or exact ordering when searching through task history; it defaults to `false`.

- When `false`, the setting will prefer correct ordering. This may require multiple database calls, since Singularity needs to determine the overall order of items based on persisted (in the database) and non-persisted (still in zookeeper) tasks. The overall search may be less efficient, but the ordering is guaranteed to be correct.

- When `true`, the setting will prefer efficiency. In this case, it is assumed that all task histories in zookeeper (not yet persisted) come before those in the database (persisted). This results in faster results and fewer queries, but ordering is not guaranteed to be correct between persisted and non-persisted items.