Singularity (Service) is configured by DropWizard via a YAML file referenced on the command line. Top-level configuration elements reside at the root of the configuration file alongside DropWizard configuration.
enum / string [GREEDY, OPTIMISTIC, SEPARATE (deprecated), SEPARATE_BY_DEPLOY, SEPARATE_BY_REQUEST, SPREAD_ALL_AGENTS]
defaultValueForKillTasksOfPausedRequests
true
When a request is paused, the API allows the tasks of that request to optionally not be killed. If that parameter is not set in the pause request, this value is used
boolean
deltaAfterWhichTasksAreLateMillis
30000 (30 seconds)
The amount of time after a task's schedule time that Singularity will classify it (in state API and dashboard) as a late task
long
deployHealthyBySeconds
120
Default amount of time to allow pending deploys to run for before transitioning them into active deploys. If more than this time passes before a deploy can be considered healthy (all of its tasks either make it to TASK_RUNNING or pass healthchecks), then the deploy will be rejected
long
killNonLongRunningTasksInCleanupAfterSeconds
86400 (1 day)
Kills scheduled and one-off tasks after this amount of time if they have been scheduled for cleaning (for example, a new deploy succeeds or the underlying agent is decommissioned)
long
hostname
null
Hostname of this Singularity instance
string
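As a rough sketch of how these root-level settings sit next to DropWizard's own configuration, the fragment below sets a few of the parameters listed above. The hostname and port are hypothetical placeholders, and the remaining values simply restate the listed defaults.

```yaml
# DropWizard's own configuration lives at the root of the same file.
server:
  applicationConnectors:
    - type: http
      port: 7099            # illustrative port, not a documented default

# Singularity top-level settings, also at the root.
hostname: singularity-1.example.com              # hypothetical host
defaultValueForKillTasksOfPausedRequests: true
deltaAfterWhichTasksAreLateMillis: 30000         # 30 seconds
deployHealthyBySeconds: 120
killNonLongRunningTasksInCleanupAfterSeconds: 86400   # 1 day
```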
Healthchecks and New Task Checks
Parameter
Default
Description
Type
considerTaskHealthyAfterRunningForSeconds
5
Tasks which make it to TASK_RUNNING and run for at least this long (that are not health-checked) are considered healthy
long
healthcheckIntervalSeconds
5
Default amount of time to wait in between attempting task healthchecks
int
healthcheckTimeoutSeconds
5
Default amount of time to wait for healthchecks to return before considering them failed
int
killAfterTasksDoNotRunDefaultSeconds
600 (10 minutes)
Amount of time after which new tasks (that are not part of a deploy) will be killed if they do not enter TASK_RUNNING
long
healthcheckMaxRetries
Default max number of times to retry a failed healthcheck for a task before considering the task to be unhealthy
int
startupDelaySeconds
By default, wait this long before starting any healthchecks on a task
int
startupTimeoutSeconds
45
If a healthchecked task has not responded with a valid HTTP response within startupTimeoutSeconds, consider it unhealthy
int
startupIntervalSeconds
2
In the startup period (before a valid http response has been received) wait this long between healthcheck attempts
int
healthcheckFailureStatusCodes
[]
If any of these status codes is received during a healthcheck, immediately consider the task unhealthy and do not retry the check
List
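The healthcheck parameters above could be combined as in the following sketch. Values for `healthcheckMaxRetries`, `startupDelaySeconds`, and `healthcheckFailureStatusCodes` are illustrative assumptions, since the table gives no default (or an empty list) for them; the rest restate the listed defaults.

```yaml
considerTaskHealthyAfterRunningForSeconds: 5
killAfterTasksDoNotRunDefaultSeconds: 600   # 10 minutes

healthcheckIntervalSeconds: 5
healthcheckTimeoutSeconds: 5
healthcheckMaxRetries: 3          # illustrative value
startupDelaySeconds: 10           # illustrative value
startupTimeoutSeconds: 45
startupIntervalSeconds: 2

# Fail fast on these codes instead of retrying (illustrative choice).
healthcheckFailureStatusCodes:
  - 503
```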
Deploys
Parameter
Default
Description
Type
defaultDeployStepWaitTimeMs
0
If using an incremental deploy, wait this long between deploy steps if not specified in the deploy
int
defaultDeployMaxTaskRetries
0
Allow this many tasks to fail and be retried before failing a new deploy
int
allowDeployOfPausedRequests
false
If true, paused requests can be deployed without unpausing or starting new tasks at deploy time
boolean
Limits
Parameter
Default
Description
Type
maxDeployIdSize
50
Deploy ids over this size will cause deploy requests to fail with 400
int
maxRequestIdSize
100
Request ids over this size will cause new requests to fail with 400
int
Cooldown
Cooldown is divided into 2 types, fast and slow. These are essentially two sets of differing thresholds for cooldown, meant to act quickly for cases where there are rapid failures, but still provide a notification/signal for cases where there are slow but repeated failures
Parameter
Default
Description
Type
fastFailureCooldownCount/slowFailureCooldownCount
3/5
The number of sequential failures after which a request is placed into system cooldown
int
fastFailureCooldownMs/slowFailureCooldownMs
30000/600000
The time window during which ...CooldownCount failures must occur
If there are no failures after this time period, the request will exit cooldown
int
cooldownMinScheduleSeconds
120
When a request enters cooldown, new tasks are delayed by at least this long
long
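To make the fast/slow split concrete, here is a sketch that writes out the paired defaults from the table as separate keys, assuming each slash-separated pair maps to the fast and slow setting in that order.

```yaml
# Fast cooldown: 3 sequential failures within 30 seconds.
fastFailureCooldownCount: 3
fastFailureCooldownMs: 30000

# Slow cooldown: 5 sequential failures within 10 minutes.
slowFailureCooldownCount: 5
slowFailureCooldownMs: 600000

# While in cooldown, new tasks are delayed by at least 2 minutes.
cooldownMinScheduleSeconds: 120
```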
Load Balancer API
Parameter
Default
Description
Type
loadBalancerQueryParams
null
Additional query parameters to pass to the Load Balancer API
Map<String, String>
loadBalancerRequestTimeoutMillis
2000
The timeout for making API calls to the Load Balancer API (these will be retried)
long
loadBalancerUri
null
The URI of the Load Balancer API (Baragon)
string
deleteRemovedRequestsFromLoadBalancer
false
If a request is removed from Singularity, issue a DELETE to the load balancer for that service
boolean
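A minimal sketch of a load balancer setup might look like the following. The URI and the query parameter name and value are hypothetical placeholders; substitute whatever your Baragon deployment actually expects.

```yaml
loadBalancerUri: "http://baragon.example.com:8080/baragon/v2/request"  # hypothetical URI
loadBalancerRequestTimeoutMillis: 2000
loadBalancerQueryParams:
  authkey: "<your-auth-key>"        # hypothetical parameter name and value
deleteRemovedRequestsFromLoadBalancer: true
```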
User Interface
Parameter
Default
Description
Type
sandboxDefaultsToTaskId
false
If true, the Singularity API will return the sandbox view of root/taskId when queried without a path (Useful when using SingularityExecutor)
boolean
enableCorsFilter
false
If true, provides a Bundle which will enable CORS
boolean
Internal Scheduler Configuration
These settings are less likely to be changed, but were included in the configuration instead of hardcoding values.
Pollers
Parameter
Default
Description
Type
checkDeploysEverySeconds
5
Check the status (health) of pending deploys, promoting them to active or removing them on this interval
long
checkNewTasksEverySeconds
5
Check the health of new (non-deployed, non-healthchecked) tasks to make sure they eventually get to running on this interval
long
checkSchedulerEverySeconds
5
Runs scheduler checks (processes decommissions and pending queue) on this interval (these tasks also run when an offer is received)
long
checkWebhooksEveryMillis
10000 (10 seconds)
Will check for and send new queued webhooks on this interval
long
cleanupEverySeconds
5
Will clean up request, task, and other queues on this interval
long
persistHistoryEverySeconds
3600 (1 hour)
Moves stale historical task data from ZooKeeper into the database. Setting this to 0 disables history persistence
long
saveStateEverySeconds
60
State about this Singularity instance is saved (available over API) on this interval
long
checkJobsEveryMillis
600000 (10 mins)
Check for jobs running longer than the expected time on this interval
long
checkExpiringUserActionEveryMillis
45000
Check for expiring actions that should be expired on this interval
long
Mesos
Parameter
Default
Description
Type
checkReconcileWhenRunningEveryMillis
30000 (30 seconds)
When reconciling tasks, will re-request task updates on this interval until reconciliation finishes
long
startNewReconcileEverySeconds
600 (10 minutes)
Starts a new reconciliation cycle (if one is not currently running) on this interval (A relatively costly operation that detects updates Mesos failed to deliver)
long
askDriverToKillTasksAgainAfterMillis
300000 (5 minutes)
Amount of time to wait before instructing Mesos to kill a task which has been killed by Singularity but is still running
long
Thread Pools
Parameter
Default
Description
Type
checkNewTasksScheduledThreads
3
Max number of threads to use to check new tasks
int
healthcheckStartThreads
3
Max number of threads to use to start healthchecks
int
logFetchMaxThreads
15
Max number of threads to use to fetch log directories from Mesos REST API
int
Operational
Parameter
Default
Description
Type
closeWaitSeconds
5
Will wait at least this many seconds when shutting down thread pools
long
compressLargeDataObjects
true
Will compress larger objects inside of ZooKeeper and the database
boolean
maxHealthcheckResponseBodyBytes
8192
Number of bytes to save from healthcheck responses (displayed in UI)
int
maxQueuedUpdatesPerWebhook
50
Max number of updates to queue for a given webhook url, after which some webhooks will not be delivered
int
zookeeperAsyncTimeout
5000
Milliseconds for ZooKeeper timeout. Calls to ZooKeeper which take over this timeout will cause the operations to fail and Singularity to abort
long
cacheStateForMillis
30000 (30 seconds)
Amount of time to cache internal state for when requested over API
long
sandboxHttpTimeoutMillis
5000 (5 seconds)
Sandbox HTTP calls will timeout after this amount of time (fetching logs for emails / UI)
newTaskCheckerBaseDelaySeconds
1
Added to the amount of time to wait before checking a new task
long
allowTestResourceCalls
false
If true, allows calls to be made to the test resource, which can test internal methods
boolean
deleteDeploysFromZkWhenNoDatabaseAfterHours
336 (14 days)
Delete deploys from zk when they are older than this if we are not using a database
long
maxStaleDeploysPerRequestInZkWhenNoDatabase
infinite (disabled)
Delete oldest deploys from zk when there are more than this number for a given request, if we're not already persisting them to a database
int
deleteStaleRequestsFromZkWhenNoDatabaseAfterHours
336 (14 days)
Delete stale requests after this amount of time if we are not using a database
long
maxRequestsWithHistoryInZkWhenNoDatabase
infinite (disabled)
Delete history of oldest requests from zk when there are more than this number of requests, if we're not already persisting them to a database
int
deleteTasksFromZkWhenNoDatabaseAfterHours
168 (7 days)
Delete old tasks from zk after this amount of time if we are not using a database
long
maxStaleTasksPerRequestInZkWhenNoDatabase
infinite (disabled)
Delete oldest tasks from zk when there are more than this number for a given request, if we're not already persisting them to a database
int
taskPersistAfterStartupBufferMillis
60000ms (1 min)
Wait this long after a task starts before persisting it in history
long
deleteDeadAgentsAfterHours
168 (7 days)
Remove dead agents from the list after this amount of time
long
deleteUndeliverableWebhooksAfterHours
168 (7 days)
Delete (and stop retrying) failed webhooks after this amount of time
long
waitForListeners
true
If true, the event system waits for all listeners to finish processing an event.
boolean
warnIfScheduledJobIsRunningForAtLeastMillis
86400000 (1 day)
Warn if a scheduled job has been running for this long
long
warnIfScheduledJobIsRunningPastNextRunPct
200
Warn if a scheduled job has run this much past its next scheduled run time (e.g. 200 => ran through next two run times)
int
pendingDeployHoldTaskDuringDecommissionMillis
600000ms (10 minutes)
Don't kill tasks on a decommissioning agent that are part of a pending deploy for this amount of time to allow the deploy to complete
long
defaultBounceExpirationMinutes
60
Expire a bounce after this many minutes if an expiration is not provided in the request to bounce
int
cacheOffers
false
Hold on to unused offers for up to cacheOffersForMillis
boolean
cacheOffersForMillis
If cacheOffers is true, decline offers after this amount of time if they have not been used
long
offerCacheSize
The maximum number of offers to cache at once
int
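The three offer-caching settings work together, so a sketch of enabling them might look like this. No defaults are listed above for `cacheOffersForMillis` or `offerCacheSize`, so both values here are illustrative assumptions.

```yaml
cacheOffers: true
cacheOffersForMillis: 60000   # illustrative: decline unused offers after 1 minute
offerCacheSize: 125           # illustrative: cache at most this many offers
```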
Mesos Configuration
These settings should live under the "mesos" field inside the root configuration.
Framework
Parameter
Default
Description
Type
master
null
A comma-separated list of Mesos masters, each in the form http(s)://user:password@host:port. The user and password are optional, and http is used if no protocol is provided
String
frameworkName
null
String
frameworkId
null
String
frameworkFailoverTimeout
0.0
double
frameworkRole
null
Specify framework's desired role when Singularity registers with the master
String
checkpoint
true
boolean
credentialPrincipal
Used to enable authorization based on the authenticated principal
String
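As a sketch, the framework settings nest under the `mesos` field like so. The master hosts, framework name and ID, role, principal, and failover timeout are hypothetical values chosen for illustration.

```yaml
mesos:
  # Two masters; the second omits the protocol, so http is assumed.
  master: "http://mesos-master-1.example.com:5050,mesos-master-2.example.com:5050"
  frameworkName: Singularity          # hypothetical name
  frameworkId: Singularity            # hypothetical ID
  frameworkFailoverTimeout: 3600.0    # illustrative value
  frameworkRole: singularity          # hypothetical role
  checkpoint: true
  credentialPrincipal: singularity    # hypothetical principal
```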
Resource Limits
Parameter
Default
Description
Type
defaultCpus
1
Number of CPUs to request for a task if none are specified
int
defaultMemory
64
MB of memory to request for a task if none is specified
int
defaultDisk
1024
MB of disk to request for a task if none is specified
int
maxNumInstancesPerRequest
25
Max instances (tasks) to allow for a request (requests using over this will return a 400)
int
maxNumCpusPerInstance
50
Max number of CPUs allowed on a given task
int
maxNumCpusPerRequest
900
Max number of CPUs allowed for a given request (cpus per task * task instance)
int
maxMemoryMbPerInstance
24000
Max MB of memory allowed on a given task
int
maxMemoryMbPerRequest
450000
Max MB of memory allowed for a given request (memoryMb per task * task instances)
int
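Assuming the resource limits sit under the same `mesos` field as the framework settings (as the section introduction suggests), a sketch that restates the listed defaults would be:

```yaml
mesos:
  # Per-task defaults when a request does not specify resources.
  defaultCpus: 1
  defaultMemory: 64        # MB
  defaultDisk: 1024        # MB

  # Upper bounds; requests exceeding these are rejected.
  maxNumInstancesPerRequest: 25
  maxNumCpusPerInstance: 50
  maxNumCpusPerRequest: 900
  maxMemoryMbPerInstance: 24000
  maxMemoryMbPerRequest: 450000
```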
Racks
Parameter
Default
Description
Type
rackIdAttributeKey
rackid
The Mesos agent attribute to denote a rack
string
defaultRackId
DEFAULT
The rackId to assign to an agent if no rackId attribute value is present
string
Agents
Parameter
Default
Description
Type
agentHttpPort
5051
The port to talk to agents on
int
agentHttpsPort
absent
The HTTPS port to talk to agents on
Integer (Optional)
Offers
Parameter
Default
Description
Type
allocatedResourceWeight
0.5
This portion of an offer's score depends on the amount of resources currently allocated by mesos on the mesos agent
double
inUseResourceWeight
0.5
This portion of an offer's score depends on the currently used resources on a mesos agent as reported by the agent statistics endpoint
double
cpuWeight
0.4
The weight the agent's cpu carries when scoring an offer
double
memWeight
0.4
The weight the agent's memory carries when scoring an offer
double
diskWeight
0.2
The weight the agent's disk carries when scoring an offer
These settings should live under the "network" field of the root configuration.
Parameter
Default
Description
Type
defaultPortMapping
false
If no port mapping is provided, map all Mesos-provided ports to the host
boolean
History Purging
These settings live under the "historyPurging" field in the root configuration
Parameter
Default
Description
Type
deleteTaskHistoryAfterDays
365
Purge tasks older than this many days
int
deleteTaskHistoryAfterTasksPerRequest
10000
Purge oldest tasks when there are more than this many associated with a single request
int
deleteTaskHistoryBytesInsteadOfEntireRow
true
Only delete the taskHistoryBytes instead of the entire record of the task (e.g. to save space)
boolean
checkTaskHistoryEveryHours
24
Run the purge every x hours
int
enabled
false
Should we run the database purge
boolean
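A sketch of enabling the purge, assuming the field is spelled `historyPurging` to match the section name, and otherwise restating the listed defaults:

```yaml
historyPurging:
  enabled: true                                   # the purge is off by default
  checkTaskHistoryEveryHours: 24
  deleteTaskHistoryAfterDays: 365
  deleteTaskHistoryAfterTasksPerRequest: 10000
  deleteTaskHistoryBytesInsteadOfEntireRow: true
```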
S3
These settings live under the "s3" field in the root configuration. If using the SingularityS3Uploader, this section will need to be provided in order to view lists of and download s3 logs from the SingularityUI.
Parameter
Default
Description
Type
maxS3Thread
3
Max threads to run for fetching logs from s3
int
waitForS3ListSeconds
5
Timeout in seconds for fetching list of s3 logs
int
waitForS3LinksSeconds
1
Timeout in seconds for creating new s3 links
int
expireS3LinksAfterMillis
86400000 (1 day)
Expire generated s3 log links after this amount of time
long
s3Bucket
S3 bucket to search for logs
String
groupOverrides
Extra s3 configurations provided such that individual requests may use separate s3 buckets. Each S3GroupOverrideConfiguration has a name specified by the Map key and consists of an s3Bucket, s3AccessKey, and s3SecretKey
Map<String, S3GroupOverrideConfiguration>
s3KeyFormat
Search for logs with keys in this format, should be the same as the key format set in the SingularityS3Uploader
String
s3AccessKey
AWS access key for the specified s3 bucket
String
s3SecretKey
AWS secret key for the specified s3 bucket
String
missingTaskDefaultS3SearchPeriodMillis
259200000ms (3 days)
Search back over this time period for s3 logs when no task data is found
long
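The following sketch shows how the `s3` block and a single group override might be laid out. Bucket names, the group name, and the credential placeholders are hypothetical; the key format is left as a placeholder because it must match whatever is configured in the SingularityS3Uploader.

```yaml
s3:
  s3Bucket: my-singularity-logs            # hypothetical bucket
  s3AccessKey: "<aws-access-key>"
  s3SecretKey: "<aws-secret-key>"
  s3KeyFormat: "<same key format as the SingularityS3Uploader>"
  expireS3LinksAfterMillis: 86400000       # 1 day

  # Requests in this group read logs from a separate bucket.
  groupOverrides:
    analytics:                             # hypothetical group name (the Map key)
      s3Bucket: my-analytics-logs          # hypothetical bucket
      s3AccessKey: "<analytics-access-key>"
      s3SecretKey: "<analytics-secret-key>"
```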
Sentry
These settings live under the "sentry" field in the root config and enable Singularity error reporting to sentry.
Parameter
Default
Description
Type
dsn
Sentry DSN (Data Source Name)
String
prefix
""
Prefix string for event culprit naming and messages
String
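A short sketch of the `sentry` block follows. The DSN is a placeholder to be replaced with the value from your Sentry project, and the prefix is an illustrative choice.

```yaml
sentry:
  dsn: "<your-sentry-dsn>"      # placeholder; copy from your Sentry project settings
  prefix: "singularity-prod"    # illustrative prefix for culprit names and messages
```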
SMTP
These settings live under the "smtp" field in the root config.
Rate limit email sending after this many notifications have been sent in rateLimitPeriodMillis
int
rateLimitPeriodMillis
600000 (10 mins)
Time period for rateLimitAfterNotifications
long
rateLimitCooldownMillis
3600000 (1 hour)
Cooldown time before rate limiting is removed
long
taskEmailTailFiles
[stdout, stderr]
Send the tail of these files in messages about tasks
List<String>
emails
See below
See below
Map<EmailType, List<EmailDestination>>
subjectPrefix
unset
String prepended to the email subject line
String
ssl
false
Connect to SMTP host over ssl
boolean
You may need libmail-java installed on your Singularity master host in order to connect to your smtp server.
Emails List
The emails list determines what emails to send notifications to and for what events. You can specify a map of EmailType to a list of EmailDestinations
EmailType corresponds to different events that could trigger emails such as TASK_LOST or TASK_FAILED
EmailDestination corresponds to one of OWNERS (as listed on the Singularity Request), ACTION_TAKER (user who triggered the action causing the email update), or ADMINS (specified in config as seen above)
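Put together, the map of EmailType to EmailDestinations might be sketched as below, using only the event types and destinations named above. The subject prefix and the choice of destinations per event are illustrative.

```yaml
smtp:
  subjectPrefix: "[singularity] "    # illustrative prefix
  ssl: true
  taskEmailTailFiles:
    - stdout
    - stderr
  rateLimitPeriodMillis: 600000      # 10 minutes
  rateLimitCooldownMillis: 3600000   # 1 hour

  # EmailType -> list of EmailDestination
  emails:
    TASK_LOST:
      - OWNERS
      - ADMINS
    TASK_FAILED:
      - OWNERS
      - ACTION_TAKER
```

With this layout, a lost task notifies the request owners and the configured admins, while a failed task notifies the owners and whoever triggered the action.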
Generate link to this log for running tasks on the request page
String
finishedTaskLogPath
stdout
Generate link to this log for finished tasks on the request page
String
hideNewDeployButton
false
Don't show the 'New Deploy' button
boolean
hideNewRequestButton
false
Don't show the 'New Request' button
boolean
rootUrlMode
INDEX_CATCHALL
INDEX_CATCHALL: UI is served off of / using a catchall resource. UI_REDIRECT: UI is served off of /ui, path and index redirects there. DISABLED: UI is served off of /ui and the root resource is not served at all