Configuration
By default, microservices will look for /etc/moira/<servicename>.yml, but you can change this location by passing your path as the command-line parameter --config.
Important: An incorrect configuration format may cause the service to crash at startup.
On this page you can find examples of configuration files for Moira microservices.
Filter
# Redis configuration depends on fields specified in redis config section:
# 1. Use field `master_name` to enable Redis Sentinel support
# 2. Specify two or more `addrs` to enable cluster support
# 3. Otherwise, standalone configuration is enabled
redis:
# Sentinel master name
master_name: ""
# address list, format: {host1_name:port},{ip:port}
addrs: "localhost:6379"
# Redis username
username: "username"
# Redis password
password: "password"
# Redis Sentinel username
sentinel_username: "sentinel_username"
# Redis Sentinel password
sentinel_password: "sentinel_password"
# Moira will delete metrics older than this value from Redis. Large values will lead to various problems everywhere.
# See https://github.com/moira-alert/moira/pull/519
metrics_ttl: 3h
# Dial timeout for establishing new connections
# Default is 500 milliseconds
dial_timeout: 1s
# Timeout for socket reads. If reached, commands will fail
# with a timeout instead of blocking. Default is 3 seconds.
# Skip this setting or set 0 for default.
read_timeout: 5s
# Timeout for socket writes. If reached, commands will fail
# with a timeout instead of blocking. Default is ReadTimeout.
# Skip this setting or set 0 for default.
write_timeout: 5s
# Enables read-only commands on slave nodes
# The flag does not work without `route_randomly` or `route_by_latency` set to true
read_only: true
# Allows routing read-only commands to the **random** master or slave node
# It automatically enables ReadOnly
route_randomly: true
# Allows routing read-only commands to the **closest** master or slave node
# It automatically enables ReadOnly
route_by_latency: false
# Minimum backoff between retries. Used to calculate exponential backoff. Default value is 0
min_retry_backoff: 1s
# Maximum backoff between retries. Used to calculate exponential backoff. Default value is 0
max_retry_backoff: 10s
# Max amount of attempts to find alive node (in Redis cluster)
max_redirects: 4
telemetry:
# Common port for all telemetry data: Prometheus scraping, pprof, etc.
listen: ":8091"
pprof:
# If true, pprof will be enabled on common telemetry port.
enabled: false
graphite:
# If true, graphite sender will be enabled.
enabled: true
# If true, runtime stats will be captured and sent to graphite. Note: Capturing runtime stats requires a stop-the-world call once per configured "graphite.interval" (https://golang.org/src/runtime/mstats.go)
runtime_stats: false
# Graphite relay URI, format: ip:port
uri: "graphite-relay:2003"
# Moira metrics prefix. Use 'prefix: {hostname}' to use hostname autoresolver.
prefix: DevOps.moira
# Metrics sending interval
interval: 60s
filter:
# Metrics listener uri
listen: ":2003"
# Retentions config file path. Simply use your original storage-schemas.conf or create new if you're using Moira without existing Graphite installation.
retention_config: /etc/moira/storage-schemas.conf
# Number of metrics to cache before checking them.
# Note: As this value increases, Redis CPU usage decreases.
# Normally, this value must be an order of magnitude less than graphite.prefix.filter.recevied.matching.count | nonNegativeDerivative() | scaleToSeconds(1)
# For example: with 100 matching metrics, set cache_capacity to 10. With 1000 matching metrics, increase cache_capacity up to 100.
cache_capacity: 10
# Defines the number of threads used to match incoming Graphite metrics.
# Defaults to the number of processor cores on the Moira host when unset or set to 0.
max_parallel_matches: 0
# Period in which patterns will be reloaded from Redis
patterns_update_period: 1s
# Compatibility with different versions of graphite supported (carbon and clickhouse)
# By default moira is compatible with carbon, but can be configured to be compatible with clickhouse
graphite_compatibility:
# Controls how regexes in tag matching are treated
# If false (default value), regex will match start of the string strictly. 'tag~=foo' is equivalent to 'tag~=^foo.*'
# If true, regex will match start of the string loosely. 'tag~=foo' is equivalent to 'tag~=.*foo.*'
allow_regex_loose_start_match: false
# Controls how absent tags are treated
# If true (default value), empty tags in regexes will be matched
# If false, empty tags will be discarded
allow_regex_match_empty: true
# Time after which the batch of metrics is forced to be saved, default is 1s
batch_forced_save_timeout: 1s
# Defines the configuration for pattern storage
pattern_storage:
# Determines the size of the pattern matching cache
# It is best to set this to at least the number of triggers
pattern_matching_cache_size: 1000
log:
log_file: stdout
log_level: info
storage-schemas.conf is a Graphite carbon configuration file that should match the similarly named file in your Graphite installation.
Checker
# Redis configuration depends on fields specified in redis config section:
# 1. Use field `master_name` to enable Redis Sentinel support
# 2. Specify two or more `addrs` to enable cluster support
# 3. Otherwise, standalone configuration is enabled
redis:
# Sentinel master name
master_name: ""
# address list, format: {host1_name:port},{ip:port}
addrs: "localhost:6379"
# Redis username
username: "username"
# Redis password
password: "password"
# Redis Sentinel username
sentinel_username: "sentinel_username"
# Redis Sentinel password
sentinel_password: "sentinel_password"
# Moira will delete metrics older than this value from Redis. Large values will lead to various problems everywhere.
# See https://github.com/moira-alert/moira/pull/519
metrics_ttl: 3h
# Dial timeout for establishing new connections
# Default is 500 milliseconds
dial_timeout: 1s
# Timeout for socket reads. If reached, commands will fail
# with a timeout instead of blocking. Default is 3 seconds.
# Skip this setting or set 0 for default.
read_timeout: 5s
# Timeout for socket writes. If reached, commands will fail
# with a timeout instead of blocking. Default is ReadTimeout.
# Skip this setting or set 0 for default.
write_timeout: 5s
# Enables read-only commands on slave nodes
# The flag does not work without `route_randomly` or `route_by_latency` set to true
read_only: true
# Allows routing read-only commands to the **random** master or slave node
# It automatically enables ReadOnly
route_randomly: true
# Allows routing read-only commands to the **closest** master or slave node
# It automatically enables ReadOnly
route_by_latency: false
# Minimum backoff between retries. Used to calculate exponential backoff. Default value is 0
min_retry_backoff: 1s
# Maximum backoff between retries. Used to calculate exponential backoff. Default value is 0
max_retry_backoff: 10s
# Max amount of attempts to find alive node (in Redis cluster)
max_redirects: 4
telemetry:
# Common port for all telemetry data: Prometheus scraping, pprof, etc.
listen: ":8091"
pprof:
# If true, pprof will be enabled on common telemetry port.
enabled: false
graphite:
# If true, graphite sender will be enabled.
enabled: true
# If true, runtime stats will be captured and sent to graphite. Note: Capturing runtime stats requires a stop-the-world call once per configured "graphite.interval" (https://golang.org/src/runtime/mstats.go)
runtime_stats: false
# Graphite relay URI, format: ip:port
uri: "graphite-relay:2003"
# Moira metrics prefix. Use 'prefix: {hostname}' to use hostname autoresolver.
prefix: DevOps.moira
# Metrics sending interval
interval: 60s
checker:
# Minimum period between trigger re-checks. Note: Reducing this value increases CPU and memory usage
# Used to be `checker.check_interval` before v2.10
metric_event_trigger_check_interval: 30s
# Moira 2.4 introduced a new entity: the Lazy Trigger. This is a regular trigger without any subscriptions.
# By default, Moira treats all triggers equally regardless of the number of subscriptions.
# You can change this behaviour using the option below. This can reduce CPU usage on your server.
# The lazy triggers checker works if lazy_triggers_check_interval > check_interval. We recommend setting it to 10m.
lazy_triggers_check_interval: 10m
# Period to cancel forced checks for all triggers if no metrics were received (greater than 'local.check_interval')
stop_checking_interval: 3600s
# Time after which the check is considered critically slow. Such checks are logged for debug purposes
# default: 1h
critical_time_of_check: 10m
local:
# Period for every non-lazy trigger to perform a check on
# Used to be `checker.nodata_check_interval` before v2.10
check_interval: 60s
# Defaults to the number of processor cores on the Moira host when unset or set to 0.
max_parallel_checks: 512
# This section configures the list of graphite remote triggers sources.
# See https://moira.readthedocs.io/en/latest/installation/configuration.html#graphite-remote-triggers-checker for further information
# Used to be `remote` and contain only one source before v2.10
graphite_remote:
- # Unique cluster id (no other graphite_remote cluster should have the same one)
# Use `default` for compatibility with old triggers without cluster_id
# Should not be changed if there are any triggers using it
cluster_id: default
# Cluster name to be displayed in UI
cluster_name: Graphite Remote
# URL of Graphite HTTP API: graphite-web, carbonapi, etc.
# Specify full URL including '/render'
url: "http://graphite.example.com/render"
# Auth username. Only Basic-auth supported
user: graphite_admin
# Auth password. Only Basic-auth supported
password: verySecurePassword
# Minimum period between trigger re-checks.
# Note: Reducing this value increases CPU and memory usage and puts extra load on the Graphite HTTP API
check_interval: 60s
# Don't fetch metrics older than this value from remote storage
metrics_ttl: 168h
# Maximum timeout for HTTP-request made to Graphite HTTP API
timeout: 60s
# Defaults to the number of processor cores on the Moira host when unset or set to 0.
max_parallel_checks: 0
# From 2.14.0
# Retries configuration for requests, that fetch data.
# Library used for calculating exponential backoff retries: https://pkg.go.dev/github.com/cenkalti/backoff/v4
retries:
# Initial interval for retry attempt
initial_interval: 60s
# Used to calc interval between retries: RandomizedRetryInterval = RetryInterval *
# * (random value from range [1 - randomization_factor, 1 + randomization_factor])
randomization_factor: 0.5
# For calculating next: RetryInterval = RetryInterval * multiplier
multiplier: 1.5
# Caps RetryInterval (NOT RandomizedRetryInterval)
max_interval: 120s
# Stop retrying once max_retries_count retries have been performed.
# At least one of (max_retries_count, max_elapsed_time) should be specified
max_retries_count: 3
# Stop retrying if the time elapsed since the first try exceeds max_elapsed_time
# At least one of (max_retries_count, max_elapsed_time) should be specified
max_elapsed_time: 360s
# From 2.14.0
# Maximum timeout for healthcheck requests
heathcheck_timeout: 60s
# From 2.14.0
# Retries configuration for healthcheck requests (same fields as for retries)
heathcheck_retries:
# Initial interval for retry attempt
initial_interval: 10s
# Used to calc interval between retries: RandomizedRetryInterval = RetryInterval *
# * (random value from range [1 - randomization_factor, 1 + randomization_factor])
randomization_factor: 0.5
# For calculating next: RetryInterval = RetryInterval * multiplier
multiplier: 1.5
# Caps RetryInterval (NOT RandomizedRetryInterval)
max_interval: 60s
# Stop retrying once max_retries_count retries have been performed.
# At least one of (max_retries_count, max_elapsed_time) should be specified
max_retries_count: 3
# Stop retrying if the time elapsed since the first try exceeds max_elapsed_time
# At least one of (max_retries_count, max_elapsed_time) should be specified
max_elapsed_time: 200s
# This section configures the list of prometheus remote triggers sources.
# See https://moira.readthedocs.io/en/latest/installation/configuration.html#prometheus-remote-triggers-checker for further information
# Used to be `prometheus` and contain only one source before v2.10
prometheus_remote:
- # Unique cluster id (no other prometheus_remote cluster should have the same one)
# Use `default` for compatibility with old triggers without cluster_id
# Should not be changed if there are any triggers using it
cluster_id: default
# Cluster name to be displayed in UI
cluster_name: Prometheus Remote
# URL of Prometheus HTTP API: Prometheus or VMSelect
# Only domain name must be specified, no URL-path
url: https://prometheus.example.com
# Auth username. Only Basic-auth supported
user: prometheus_admin
# Auth password. Only Basic-auth supported
password: verySecurePassword
# Minimum period between trigger re-checks.
# Note: Reducing this value increases CPU and memory usage and puts extra load on the Prometheus HTTP API
check_interval: "60s"
# Maximum timeout for HTTP-request made to Prometheus HTTP API
timeout: "10s"
# Number of times failed request should be retried
retries: 3
# Delay between retries
retry_timeout: "3s"
# Don't fetch metrics older than this value from remote storage
metrics_ttl: "168h"
# Defaults to the number of processor cores on the Moira host when unset or set to 0.
max_parallel_checks: 0
log:
log_file: stdout
log_level: info
Remote Triggers Checker
One of Moira's key features is its independence from Graphite. Some Graphite queries are very inefficient. Tools like Seyren multiply this effect by making many inefficient queries every minute and overloading your cluster. Moira relies on the incoming metric stream and has its own fast cache for recent data.
Enabling the Remote Triggers Checker allows users to create triggers that rely on Graphite storage instead of the Redis DB.
Warning
Use this feature with caution, because it can create an extra load on Graphite HTTP API.
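As a rough illustration, a single Graphite remote source could be declared as in the sketch below. The keys are the ones listed in the Checker configuration above; the host name is a placeholder, and the two-space indentation is an assumption because nesting is not shown explicitly on this page. The comments on the retry settings work through the backoff formulas quoted in the configuration comments and ignore `max_elapsed_time`.

```yaml
# Sketch only: one Graphite remote source (placeholder URL, assumed indentation).
graphite_remote:
  - cluster_id: default
    cluster_name: Graphite Remote
    # Full URL including '/render'
    url: "http://graphite.example.com/render"
    check_interval: 60s
    metrics_ttl: 168h
    timeout: 60s
    # Retry settings are documented as available from 2.14.0
    retries:
      initial_interval: 60s
      randomization_factor: 0.5
      multiplier: 1.5
      max_interval: 120s
      max_retries_count: 3
      # RetryInterval per attempt: 60s, then 60s * 1.5 = 90s,
      # then 90s * 1.5 = 135s, which max_interval caps to 120s.
      # With randomization_factor 0.5 the actual waits fall in
      # [30s, 90s], [45s, 135s] and [60s, 180s] respectively.
```

A longer `check_interval` is the simplest way to limit the extra load on the Graphite HTTP API mentioned in the warning.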
Lazy Triggers Checker
Moira 2.4 introduced a new entity: the Lazy Trigger. This is a regular trigger
without any subscriptions. By default, Moira treats all triggers equally
regardless of the number of subscriptions. You can change this behaviour
using the lazy_triggers_check_interval option in the checker section. This can
reduce CPU usage on your server. The lazy triggers checker works if
lazy_triggers_check_interval > check_interval. We recommend setting
it to 10m (10 minutes).
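A minimal sketch of the relevant part of the checker section is shown below, using the values from the Checker configuration above. The indentation is an assumption. This page does not spell out which interval the comparison refers to, but the recommended 10m exceeds both the 30s `metric_event_trigger_check_interval` and the 60s `check_interval` used in the example configuration.

```yaml
# Sketch only: lazy triggers are checked less often than regular ones.
checker:
  # Formerly `checker.check_interval` (before v2.10)
  metric_event_trigger_check_interval: 30s
  # Must be greater than the regular check interval for the lazy checker to work
  lazy_triggers_check_interval: 10m
```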
Prometheus Checker
Moira 2.9 added the Prometheus remote metric source. It works like the Graphite remote metric source, but uses Prometheus metrics and PromQL instead. It queries the Prometheus API or the VictoriaMetrics VMSelect API.
The Prometheus Checker can be configured to use retries when trying to fetch metrics. We recommend using 3 retries with a retry timeout of 10s.
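The recommendation above could look like the following sketch of a single Prometheus remote source. The keys come from the Checker configuration on this page; the host name is a placeholder and the indentation is an assumption.

```yaml
# Sketch only: one Prometheus remote source with the recommended retry settings.
prometheus_remote:
  - cluster_id: default
    cluster_name: Prometheus Remote
    # Domain only, no URL path
    url: https://prometheus.example.com
    check_interval: 60s
    # Maximum timeout for a single HTTP request
    timeout: 10s
    # Recommended: 3 retries with a 10s delay between them
    retries: 3
    retry_timeout: 10s
    metrics_ttl: 168h
```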
Notifier
# Redis configuration depends on fields specified in redis config section:
# 1. Use field `master_name` to enable Redis Sentinel support
# 2. Specify two or more `addrs` to enable cluster support
# 3. Otherwise, standalone configuration is enabled
redis:
# Sentinel master name
master_name: ""
# address list, format: {host1_name:port},{ip:port}
addrs: "localhost:6379"
# Redis username
username: "username"
# Redis password
password: "password"
# Redis Sentinel username
sentinel_username: "sentinel_username"
# Redis Sentinel password
sentinel_password: "sentinel_password"
# Moira will delete metrics older than this value from Redis. Large values will lead to various problems everywhere.
# See https://github.com/moira-alert/moira/pull/519
metrics_ttl: 3h
# Dial timeout for establishing new connections
# Default is 500 milliseconds
dial_timeout: 1s
# Timeout for socket reads. If reached, commands will fail
# with a timeout instead of blocking. Default is 3 seconds.
# Skip this setting or set 0 for default.
read_timeout: 5s
# Timeout for socket writes. If reached, commands will fail
# with a timeout instead of blocking. Default is ReadTimeout.
# Skip this setting or set 0 for default.
write_timeout: 5s
# Enables read-only commands on slave nodes
# The flag does not work without `route_randomly` or `route_by_latency` set to true
# Notifier requires master and slaves to synchronize very quickly, so it is safer to turn this flag off
read_only: false
# Allows routing read-only commands to the **random** master or slave node
# It automatically enables ReadOnly
route_randomly: false
# Allows routing read-only commands to the **closest** master or slave node
# It automatically enables ReadOnly
route_by_latency: false
# Minimum backoff between retries. Used to calculate exponential backoff. Default value is 0.
min_retry_backoff: 1s
# Maximum backoff between retries. Used to calculate exponential backoff. Default value is 0.
max_retry_backoff: 10s
# Max amount of attempts to find alive node (in Redis cluster)
max_redirects: 4
telemetry:
# Common port for all telemetry data: Prometheus scraping, pprof, etc.
listen: ":8091"
pprof:
# If true, pprof will be enabled on common telemetry port.
enabled: false
graphite:
# If true, graphite sender will be enabled.
enabled: true
# If true, runtime stats will be captured and sent to graphite. Note: Capturing runtime stats requires a stop-the-world call once per configured "graphite.interval" (https://golang.org/src/runtime/mstats.go)
runtime_stats: false
# Graphite relay URI, format: ip:port
uri: "graphite-relay:2003"
# Moira metrics prefix. Use 'prefix: {hostname}' to use hostname autoresolver.
prefix: DevOps.moira
# Metrics sending interval
interval: 60s
notifier:
# Soft timeout to start retrying to send notification after single failed attempt
sender_timeout: 10s
# Hard timeout to stop retrying to send notification after multiple failed attempts
resending_timeout: "1:00"
# Delay before performing one more send attempt
rescheduling_delay: 60s
# Web-UI uri prefix for trigger links in notifications. For example: with 'http://localhost' every notification will contain link like 'http://localhost/trigger/triggerId'
front_uri: "https://moira.example.com"
# Timezone to use to convert ticks. Default is UTC. See https://golang.org/pkg/time/#LoadLocation for more details.
timezone: Europe/Moscow
# Format for email sender. Default is "15:04 02.01.2006". See https://golang.org/pkg/time/#Time.Format for more details about golang time formatting.
date_time_format: "15:04 02.01.2006"
# Amount of messages notifier reads from Redis per iteration, -1 for unlimited
read_batch_size: -1
# List of senders, every element has required "sender_type" field (one of ["pushover", "slack", "mail", "telegram", "twilio sms", "twilio voice", "script", "discord", "selfstate", "webhook", "opsgenie", "victorops", "pagerduty", "msteams", "mattermost"]) and "contact_type"
# It is possible to create several senders with the same "sender_type", but different "contact_type"
# Every type of sender has additional config fields
senders:
# sender_type characterizes the specific structure of the sender in the code
- sender_type: msteams
# contact_type matches 1-1 sender with the contact template type in the api web config, uniquely identifies the sender
contact_type: msteams
# The maximum number of events to send to your channel: -1 for unlimited, or any positive value to limit events
max_events: -1
- sender_type: pushover
contact_type: pushover
# Api token for your pushover channel, for more info see https://pushover.net/api#registration
api_token: ...
- sender_type: slack
contact_type: slack
# Api token for your moira notifications slack user, for more info see https://get.slack.help/hc/en-us/articles/215770388-Create-and-regenerate-API-tokens
api_token: ...
# If true, notification will be sent with state-specific icon, for more info see https://moira.readthedocs.io/en/latest/installation/configuration.html#slack-icons.
use_emoji: true
# Used if `emoji_map` contains no emoji for the metric state
default_emoji: ':moira-state-ok:'
# Sets the correspondence between the emoji used and the state of the metrics in the notification in Slack
# Default fill is shown below
emoji_map:
'OK': ':moira-state-ok:'
'WARN': ':moira-state-warn:'
'ERROR': ':moira-state-error:'
'NODATA': ':moira-state-nodata:'
'EXCEPTION': ':moira-state-exception:'
'TEST': ':moira-state-test:'
- sender_type: mattermost
contact_type: mattermost
# Sets the url to the Mattermost API for the client
url: ...
# Controls whether a client verifies the server's certificate chain and host name
# If it is true, crypto/tls accepts any certificate presented by the server and any host name in that certificate
insecure_tls: false
# Api token for moira notifications Mattermost client
api_token: ...
# If true, notification will be sent with state-specific icon
use_emoji: true
# Used if `emoji_map` contains no emoji for the metric state
default_emoji: ':moira-state-ok:'
# Sets the correspondence between the emoji used and the state of the metrics in the notification in Mattermost
# Default fill is shown below
emoji_map:
'OK': ':moira-state-ok:'
'WARN': ':moira-state-warn:'
'ERROR': ':moira-state-error:'
'NODATA': ':moira-state-nodata:'
'EXCEPTION': ':moira-state-exception:'
'TEST': ':moira-state-test:'
- sender_type: telegram
contact_type: telegram
# Api token for your telegram bot. For more info about creating a bot and getting a token, see https://core.telegram.org/bots#3-how-do-i-create-a-bot
api_token: ...
- sender_type: mail
contact_type: mail
mail_from: ...
smtp_host: ...
smtp_port: ...
# Skip SMTP server certificate chain validation if true
insecure_tls: false
# Uses "mail_from" if empty
smtp_user: ...
smtp_pass: ...
# Email template file path (standard Go templates). By default, the 'Fancy' template is used (see screenshot below). If empty, the built-in template with no markup or styles is used.
template_file: '/etc/moira/fancy-template.html'
- sender_type: twilio voice
contact_type: twilio voice
api_asid: ...
api_authtoken: ...
api_fromphone: ...
# URL that responds with TwiML config for voice message generation, see https://www.twilio.com/docs/api/twiml/voice-overview
voiceurl: ...
append_message: true
- sender_type: twilio sms
contact_type: twilio sms
api_asid: ...
api_authtoken: ...
api_fromphone: ...
# Script and webhook senders support additional templated parameters:
# ${contact_id} contact ID
# ${contact_value} contact value (as specified by user via web UI)
# ${contact_type} contact type (as specified in web UI config file)
# ${trigger_id} trigger ID
- sender_type: script
contact_type: script
# Executable path. File must exist on all machines where notifier is running.
# You can use templated parameters here (see above), they will be replaced with appropriate values.
exec: ...
- sender_type: webhook
contact_type: webhook
# URL to send POST request (you can use templated parameters, see above)
url: ...
timeout: ...
# Basic authorization parameters (if required)
user: ...
password: ...
# The optional field body allows you to set the JSON request body using Go templates; the available fields are .Contact.Type and .Contact.Value
body: '{ "value": {{ .Contact.Value }}, "text": "test" }'
# headers is a field that allows you to set headers in the "key: value" format
headers:
test: test-header
# Example of creating multiple webhook type senders
- sender_type: webhook
contact_type: webhook2
url: ...
timeout: ...
- sender_type: pagerduty
contact_type: pagerduty
- sender_type: opsgenie
contact_type: opsgenie
api_key: ...
- sender_type: victorops
contact_type: victorops
routing_url: ...
- sender_type: discord
contact_type: discord
token: ...
# Self state monitor configuration section. Note: No inner subscriptions are required. Moira will use its notification mechanism to send messages.
moira_selfstate:
enabled: true
# If true, Moira selfstate will check that the remote triggers checker works properly and notify the admin if the remote checker fails
# See https://moira.readthedocs.io/en/latest/installation/configuration.html#graphite-remote-triggers-checker for further information
remote_triggers_enabled: false
# Max Redis disconnect delay to send alert when reached
redis_disconect_delay: 60s
# Max Filter metrics receive delay to send alert when reached
last_metric_received_delay: 120s
# Max Checker checks perform delay to send alert when reached
last_check_delay: 120s
# Max Remote triggers Checker checks perform delay to send alert when reached
# See https://moira.readthedocs.io/en/latest/installation/configuration.html#graphite-remote-triggers-checker for further information
last_remote_check_delay: 300s
# Self state monitor alerting interval
notice_interval: 300s
# Maximum contact events storage interval
notification_history:
ttl: "48h"
# Self state monitor check interval (default 10s)
check_interval: 10s
# Maximum number of failed send attempts allowed; if exceeded, an error appears in the logs (default 3)
max_fail_attempt_to_send_available: 3
# Contact list for Self state monitor alerts, use this like delivery channels in web-ui
contacts:
- type: mail
value: devopsteam@example.com
log:
log_file: stdout
log_level: info
# notification sets the configuration setting for sending notifications
notification:
# resave_time is the time by which the timestamp of notifications with triggers or metrics on Maintenance is incremented
# Important! Do not set this time too long, as this parameter sets the delay in sending notifications
resave_time: 30s
# Used to determine whether a notification is delayed: it is delayed if the difference between creation time and sending time is greater than delayed_time
delayed_time: 1m
# transaction_timeout defines the timeout between fetch notifications transactions
transaction_timeout: 100ms
# transaction_max_retries defines the maximum number of attempts to make a transaction
transaction_max_retries: 10
# transaction_heuristic_limit is the maximum allowable limit; once it is exceeded, all notifications are taken without a limit
transaction_heuristic_limit: 10000
# This section configures the list of graphite remote triggers sources.
# See https://moira.readthedocs.io/en/latest/installation/configuration.html#graphite-remote-triggers-checker for further information
# Used to be `remote` and contain only one source before v2.10
graphite_remote:
- # Unique cluster id (no other graphite_remote cluster should have the same one)
# Use `default` for compatibility with old triggers without cluster_id
# Should not be changed if there are any triggers using it
cluster_id: default
# Cluster name to be displayed in UI
cluster_name: Graphite Remote
# URL of Graphite HTTP API: graphite-web, carbonapi, etc.
# Specify full URL including '/render'
url: "http://graphite.example.com/render"
# Auth username. Only Basic-auth supported
user: graphite_admin
# Auth password. Only Basic-auth supported
password: verySecurePassword
# Minimum period between trigger re-checks.
# Note: Reducing this value increases CPU and memory usage and puts extra load on the Graphite HTTP API
check_interval: 60s
# Don't fetch metrics older than this value from remote storage
metrics_ttl: 168h
# Maximum timeout for HTTP-request made to Graphite HTTP API
timeout: 60s
# From 2.14.0
# Retries configuration for requests, that fetch data.
# Library used for calculating exponential backoff retries: https://pkg.go.dev/github.com/cenkalti/backoff/v4
retries:
# Initial interval for retry attempt
initial_interval: 60s
# Used to calc interval between retries: RandomizedRetryInterval = RetryInterval *
# * (random value from range [1 - randomization_factor, 1 + randomization_factor])
randomization_factor: 0.5
# For calculating next: RetryInterval = RetryInterval * multiplier
multiplier: 1.5
# Caps RetryInterval (NOT RandomizedRetryInterval)
max_interval: 120s
# Stop retrying once max_retries_count retries have been performed
# At least one of (max_retries_count, max_elapsed_time) should be specified
max_retries_count: 3
# Stop retrying if the time elapsed since the first try exceeds max_elapsed_time
# At least one of (max_retries_count, max_elapsed_time) should be specified
max_elapsed_time: 360s
# From 2.14.0
# Maximum timeout for healthcheck requests
heathcheck_timeout: 60s
# From 2.14.0
# Retries configuration for healthcheck requests (same fields as for retries)
heathcheck_retries:
# Initial interval for retry attempt
initial_interval: 10s
# Used to calc interval between retries: RandomizedRetryInterval = RetryInterval *
# * (random value from range [1 - randomization_factor, 1 + randomization_factor])
randomization_factor: 0.5
# For calculating next: RetryInterval = RetryInterval * multiplier
multiplier: 1.5
# Caps RetryInterval (NOT RandomizedRetryInterval)
max_interval: 60s
# Stop retrying once max_retries_count retries have been performed
# At least one of (max_retries_count, max_elapsed_time) should be specified
max_retries_count: 3
# Stop retrying if the time elapsed since the first try exceeds max_elapsed_time
# At least one of (max_retries_count, max_elapsed_time) should be specified
max_elapsed_time: 200s
# This section configures the list of prometheus remote triggers sources.
# See https://moira.readthedocs.io/en/latest/installation/configuration.html#prometheus-remote-triggers-checker for further information
# Used to be `prometheus` and contain only one source before v2.10
prometheus_remote:
- # Unique cluster id (no other prometheus_remote cluster should have the same one)
# Use `default` for compatibility with old triggers without cluster_id
# Should not be changed if there are any triggers using it
cluster_id: default
# Cluster name to be displayed in UI
cluster_name: Prometheus Remote
# URL of Prometheus HTTP API: Prometheus or VMSelect
# Only domain name must be specified, no URL-path
url: https://prometheus.example.com
# Auth username. Only Basic-auth supported
user: prometheus_admin
# Auth password. Only Basic-auth supported
password: verySecurePassword
# Minimum period between trigger re-checks.
# Note: Reducing this value increases CPU and memory usage and puts extra load on the Prometheus HTTP API
check_interval: "60s"
# Maximum timeout for HTTP-request made to Prometheus HTTP API
timeout: "10s"
# Number of times failed request should be retried
retries: 3
# Delay between retries
retry_timeout: "3s"
# Don't fetch metrics older than this value from remote storage
metrics_ttl: "168h"
Slack icons

By default Slack sender won’t change default icon configured for your bot. To use state-specific icons in notifications:

Download and unzip notification icons
Add icons from the ..icons/slack directory as custom emojis according to their filenames to Slack
Set use_emoji to true for the Slack sender section in the notifier configuration file
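For reference, a Slack sender entry with state-specific icons enabled might look like the sketch below. The keys and emoji names are taken from the Notifier configuration above; the nesting under `notifier` and the indentation are assumptions, and the API token is left elided as in the original.

```yaml
# Sketch only: Slack sender with state-specific icons (assumed indentation).
notifier:
  senders:
    - sender_type: slack
      contact_type: slack
      # Token elided, as in the configuration above
      api_token: ...
      use_emoji: true
      # Fallback when a state has no entry in emoji_map
      default_emoji: ':moira-state-ok:'
      emoji_map:
        'OK': ':moira-state-ok:'
        'WARN': ':moira-state-warn:'
        'ERROR': ':moira-state-error:'
        'NODATA': ':moira-state-nodata:'
        'EXCEPTION': ':moira-state-exception:'
        'TEST': ':moira-state-test:'
```

Each emoji name in the map has to exist as a custom emoji in your Slack workspace, which is what the icon upload step provides.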
Email Template
By default mail sender will use ‘Fancy’ template:

Self State Monitor
If self state monitor is enabled, Moira will periodically check the Redis connection, the number of incoming metrics in the Moira-Filter and the number of triggers to be checked by Moira-Checker.
See Self State Monitor for more details.
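As a hedged sketch, enabling the self state monitor with a single mail contact might look like this. All keys and values are copied from the Notifier configuration above (including the spelling of `redis_disconect_delay`); placing `moira_selfstate` under `notifier` and the indentation are assumptions, since nesting is not shown explicitly on this page.

```yaml
# Sketch only: self state monitor with one mail contact (assumed nesting).
notifier:
  moira_selfstate:
    enabled: true
    # Alert when Redis has been unreachable for this long
    redis_disconect_delay: 60s
    # Alert when Filter has received no metrics for this long
    last_metric_received_delay: 120s
    # Alert when Checker has performed no checks for this long
    last_check_delay: 120s
    # Interval between repeated alerts
    notice_interval: 300s
    # How often the monitor runs its checks
    check_interval: 10s
    contacts:
      - type: mail
        value: devopsteam@example.com
```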
API and Web
# Redis configuration depends on fields specified in redis config section:
# 1. Use field `master_name` to enable Redis Sentinel support
# 2. Specify two or more `addrs` to enable cluster support
# 3. Otherwise, standalone configuration is enabled
redis:
# Sentinel master name
master_name: ""
# address list, format: {host1_name:port},{ip:port}
addrs: "localhost:6379"
# Redis username
username: "username"
# Redis password
password: "password"
# Redis Sentinel username
sentinel_username: "sentinel_username"
# Redis Sentinel password
sentinel_password: "sentinel_password"
# Moira will delete metrics older than this value from Redis. Large values will lead to various problems everywhere.
# See https://github.com/moira-alert/moira/pull/519
metrics_ttl: 3h
# Dial timeout for establishing new connections
# Default is 500 milliseconds
dial_timeout: 1s
# Timeout for socket reads. If reached, commands will fail
# with a timeout instead of blocking. Default is 3 seconds.
# Skip this setting or set 0 for default.
read_timeout: 5s
# Timeout for socket writes. If reached, commands will fail
# with a timeout instead of blocking. Default is ReadTimeout.
# Skip this setting or set 0 for default.
write_timeout: 5s
# Enables read-only commands on slave nodes
# The flag does not work without `route_randomly` or `route_by_latency` set to true
read_only: true
# Allows routing read-only commands to the **random** master or slave node
# It automatically enables ReadOnly
route_randomly: true
# Allows routing read-only commands to the **closest** master or slave node
# It automatically enables ReadOnly
route_by_latency: false
# Minimum backoff between retries. Used to calculate exponential backoff. Default value is 0
min_retry_backoff: 1s
# Maximum backoff between retries. Used to calculate exponential backoff. Default value is 0
max_retry_backoff: 10s
# Max amount of attempts to find alive node (in Redis cluster)
max_redirects: 4
telemetry:
# Common port for all telemetry data: Prometheus scraping, pprof, etc.
listen: ":8091"
pprof:
# If true, pprof will be enabled on common telemetry port.
enabled: false
graphite:
# If true, graphite sender will be enabled.
enabled: true
# If true, runtime stats will be captured and sent to graphite. Note: capturing runtime stats calls stoptheworld() once per configured "graphite.interval" (https://golang.org/src/runtime/mstats.go)
runtime_stats: false
# Graphite relay URI, format: ip:port
uri: "graphite-relay:2003"
# Moira metrics prefix. Use 'prefix: {hostname}' to use hostname autoresolver.
prefix: DevOps.moira
# Metrics sending interval
interval: 60s
api:
# Api local network address. Default is ':8081' so api will be available at http://moira.company.com:8081/api
listen: ":8081"
# If true, CORS for cross-domain requests will be enabled. This option can be used only for debugging purposes.
enable_cors: false
# Web_UI config file path. If file not found, api will return 404 in response to "api/config"
web_config_path: "/etc/moira/web.json"
# Configuration of user roles
authorization:
# If disabled (default), every user has access to every resource (except for subscriptions and contacts, which are only available to their owners)
# If enabled, some resources are limited to admins-only, and admins have access to users' subscriptions and contacts
enabled: true
# List of users who have admin rights
# User login is extracted from the `x-webauth-user` header
admin_list:
- alice
- bob
# Configurable limits to some Moira entities
limits:
# Limits applied to pager
pager:
# Default ttl for pager
ttl: "30m"
# Limits applied to trigger
trigger:
# Max amount of characters allowed in trigger name
# We do not recommend greatly increasing this limit because it can cause failures when sending alerts
max_name_size: 200
# Limits applied to team
team:
# Max amount of characters allowed in team name
max_name_size: 100
# Max amount of characters allowed in team description
max_description_size: 1000
web:
# Moira administrator email address
supportEmail: "devops@example.com"
# List of enabled contact types
contacts_template:
- type: mail
# logo_uri is an optional field that specifies the path to the file with the contact's logo
# If you do not specify a value for this field, the default logo option will be selected
# All logos are stored in the web2.0 repository, the value can be one of: [discord-logo.svg, facebook-logo.svg, mail-logo.svg, mattermost-logo.svg, msteams-logo.svg, opsgenie-logo.svg,
# pagerduty-logo.svg, phone-logo.svg, pushover-logo.svg, slack-logo.svg, telegram-logo.svg, twilio-logo.svg, twitter-logo.svg, viber-logo.svg, victorops-logo.svg,
# webhook-logo.svg, whatsapp-logo.svg, sms-logo.svg]
logo_uri: mail-logo.svg
label: E-mail
validation: "^.+@.+\\..+$"
- type: msteams
logo_uri: msteams-logo.svg
label: Microsoft Teams
- type: pushover
logo_uri: pushover-logo.svg
label: Pushover
placeholder: "Pushover user key"
- type: slack
logo_uri: slack-logo.svg
label: Slack
validation: "^[@#][a-zA-Z0-9-_]+"
placeholder: "Slack #channel or @user"
- type: telegram
logo_uri: telegram-logo.svg
label: Telegram
placeholder: "#public_channel, %private_channel, @username or group"
help: |
### To make things work you should:
### In personal chat:
- start conversation with bot [@YourMoiraBot](https://t.me/YourMoiraBot);
- execute command `/start`;
- type your login in above field as `@login`.
### In group chat (with or without topics):
- invite bot [@YourMoiraBot](https://t.me/YourMoiraBot) into chat;
- execute command `/start@YourMoiraBot`;
- bot will send you chat name, you should type it without extra characters in above field.
### In public channel:
- add bot [@YourMoiraBot](https://t.me/YourMoiraBot) into channel;
- promote bot as channel administrator;
- type channel name in above field as `#channel`.
### In private channel:
- add bot [@YourMoiraBot](https://t.me/YourMoiraBot) into the channel;
- promote bot as channel administrator;
- open your private channel on the [web](https://web.telegram.org/#/im);
- get channel id from URL (e.g., `https://web.telegram.org/#/im?p=c1494975744_17340166617136722341`) between `c` and `_`;
- type channel id in the above field as `%1494975744`.
- type: twilio sms
logo_uri: twilio-logo.svg
label: Twilio SMS
validation: "^\\+79\\d{9}$"
placeholder: "Phone number format +79*********"
- type: twilio voice
logo_uri: twilio-logo.svg
label: Twilio voice
validation: "^\\+79\\d{9}$"
placeholder: "Phone number format +79*********"
- type: webhook
logo_uri: webhook-logo.svg
label: My Webhook
validation: "^(http|https):\\/\\/.*(example.com|example.org)(:[0-9]{2,5})?\\/"
placeholder: "https://example.com/webhooks/moira"
help: "### Domains whitelist:\n - example.com\n - example.org"
- type: pagerduty
logo_uri: pagerduty-logo.svg
label: PagerDuty
placeholder: "Integration key"
- type: opsgenie
logo_uri: opsgenie-logo.svg
label: OpsGenie
placeholder: "Responder Name or ID"
- type: victorops
logo_uri: victorops-logo.svg
label: VictorOps
placeholder: "Routing key"
- type: discord
logo_uri: discord-logo.svg
label: Discord
placeholder: "Discord channel (eg: general-text) or user (eg: @user)"
# Feature flags settings
feature_flags:
# Sets whether sending graphics in subscriptions is enabled by default
is_plotting_default_on: true
# Sets whether sending graphics is available
is_plotting_available: true
# Sets whether it is allowed to create subscriptions to all tags
is_subscription_to_all_tags_available: true
# Enables read-only mode in Moira, which disables state-changing requests
is_readonly_enabled: false
# Sets special celebration theme in Moira. Default is "", which means no celebration mode.
# Currently available: [new_year]
celebration_mode: new_year
# Sentry settings for the frontend
sentry:
# Sets the dsn key
dsn: https://public@sentry.example.com/
# Sets the platform where the frontend is deployed
platform: prod
log:
log_file: stdout
log_level: info
# This section configures the list of graphite remote triggers sources.
# See https://moira.readthedocs.io/en/latest/installation/configuration.html#graphite-remote-triggers-checker for further information
# Before v2.10 this section was named `remote` and contained only one source
graphite_remote:
- # Unique cluster id (no other graphite_remote cluster should have the same one)
# Use `default` for compatibility with old triggers without cluster_id
# Should not be changed if there are any triggers using it
cluster_id: default
# Cluster name to be displayed in UI
cluster_name: Graphite Remote
# URL of Graphite HTTP API: graphite-web, carbonapi, etc.
# Specify full URL including '/render'
url: "http://graphite.example.com/render"
# Auth username. Only Basic-auth supported
user: graphite_admin
# Auth password. Only Basic-auth supported
password: verySecurePassword
# Minimal period to perform triggers re-check.
# Note: Reducing this value increases CPU and memory usage and puts extra load on the Graphite HTTP API
check_interval: 60s
# Don't fetch metrics older than this value from remote storage
metrics_ttl: 168h
# Maximum timeout for HTTP-request made to Graphite HTTP API
timeout: 60s
# From 2.14.0
# Retries configuration for requests, that fetch data.
# Library used for calculating exponential backoff retries: https://pkg.go.dev/github.com/cenkalti/backoff/v4
retries:
# Initial interval for retry attempt
initial_interval: 60s
# Used to calc interval between retries: RandomizedRetryInterval = RetryInterval *
# (random value from range [1 - randomization_factor, 1 + randomization_factor])
randomization_factor: 0.5
# For calculating next: RetryInterval = RetryInterval * multiplier
multiplier: 1.5
# Caps RetryInterval (NOT RandomizedRetryInterval)
max_interval: 120s
# Stop retrying once max_retries_count retries have been performed.
# At least one of (max_retries_count, max_elapsed_time) should be specified
max_retries_count: 3
# Stop retrying if the time passed since the first try exceeds max_elapsed_time.
# At least one of (max_retries_count, max_elapsed_time) should be specified
max_elapsed_time: 360s
# From 2.14.0
# Maximum timeout for healthcheck requests
heathcheck_timeout: 60s
# From 2.14.0
# Retries configuration for healthcheck requests (same fields as for retries)
heathcheck_retries:
# Initial interval for retry attempt
initial_interval: 10s
# Used to calc interval between retries: RandomizedRetryInterval = RetryInterval *
# (random value from range [1 - randomization_factor, 1 + randomization_factor])
randomization_factor: 0.5
# For calculating next: RetryInterval = RetryInterval * multiplier
multiplier: 1.5
# Caps RetryInterval (NOT RandomizedRetryInterval)
max_interval: 60s
# Stop retrying once max_retries_count retries have been performed.
# At least one of (max_retries_count, max_elapsed_time) should be specified
max_retries_count: 3
# Stop retrying if the time passed since the first try exceeds max_elapsed_time.
# At least one of (max_retries_count, max_elapsed_time) should be specified
max_elapsed_time: 200s
# This section configures the list of prometheus remote triggers sources.
# See https://moira.readthedocs.io/en/latest/installation/configuration.html#prometheus-remote-triggers-checker for further information
# Before v2.10 this section was named `prometheus` and contained only one source
prometheus_remote:
- # Unique cluster id (no other prometheus_remote cluster should have the same one)
# Use `default` for compatibility with old triggers without cluster_id
# Should not be changed if there are any triggers using it
cluster_id: default
# Cluster name to be displayed in UI
cluster_name: Prometheus Remote
# URL of Prometheus HTTP API: Prometheus or VMSelect
# Only domain name must be specified, no URL-path
url: https://prometheus.example.com
# Auth username. Only Basic-auth supported
user: prometheus_admin
# Auth password. Only Basic-auth supported
password: verySecurePassword
# Minimal period to perform triggers re-check.
# Note: Reducing this value increases CPU and memory usage and puts extra load on the Prometheus HTTP API
check_interval: "60s"
# Maximum timeout for HTTP-request made to Prometheus HTTP API
timeout: "10s"
# Number of times failed request should be retried
retries: 3
# Delay between retries
retry_timeout: "3s"
# Don't fetch metrics older than this value from remote storage
metrics_ttl: "168h"
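
The comments in the graphite_remote retries block describe the backoff arithmetic in words. As a rough illustration only, the Python sketch below applies those formulas to the sample values to show the base interval and the randomized range for each retry. The `Retries` class and `retry_schedule` function are hypothetical names introduced for this example; they are not part of Moira or of the backoff library, and the sketch deliberately ignores max_elapsed_time and the duration of the requests themselves.

```python
from dataclasses import dataclass


@dataclass
class Retries:
    """Hypothetical container mirroring the `retries` fields (durations in seconds)."""
    initial_interval: float
    randomization_factor: float
    multiplier: float
    max_interval: float
    max_retries_count: int


def retry_schedule(cfg):
    """Return one (base, low, high) tuple per retry.

    base is RetryInterval; low and high bound RandomizedRetryInterval.
    """
    schedule = []
    interval = cfg.initial_interval
    for _ in range(cfg.max_retries_count):
        low = interval * (1 - cfg.randomization_factor)
        high = interval * (1 + cfg.randomization_factor)
        schedule.append((interval, low, high))
        # The cap applies to RetryInterval, not to the randomized value.
        interval = min(interval * cfg.multiplier, cfg.max_interval)
    return schedule


# Sample values from the graphite_remote `retries` block.
graphite_retries = Retries(
    initial_interval=60,
    randomization_factor=0.5,
    multiplier=1.5,
    max_interval=120,
    max_retries_count=3,
)

for n, (base, low, high) in enumerate(retry_schedule(graphite_retries), start=1):
    print(f"retry {n}: base {base:g}s, randomized between {low:g}s and {high:g}s")
```

With the sample values this gives base intervals of 60s, 90s and 120s (the third would be 135s without max_interval), and randomized ranges of 30s to 90s, 45s to 135s and 60s to 180s. Because the cap applies only to the base interval, an actual delay can exceed max_interval.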
Web contact fields:
type (any unique string), required - contact type: pushover, slack, mail, script, telegram, twilio sms, twilio voice, etc.;
label, required - contact type label, shown in the select control of the add/edit contact form;
validation - regular expression used to validate the contact value in the add/edit contact form;
placeholder - hint shown in the input field;
help - help text in Markdown markup.
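
To get a feel for what the sample validation expressions accept, they can be tried outside Moira. The sketch below is a minimal Python check, not the Web UI's actual code: the `VALIDATION` table and the `is_valid` helper are hypothetical, and it assumes the pattern is applied as a search over the entered value. The patterns are written as the regex engine sees them, that is, after YAML has unescaped the double-quoted strings (`\\.` in the config file becomes `\.`).

```python
import re

# Patterns copied from the sample contacts_template, after YAML unescaping.
VALIDATION = {
    "mail": r"^.+@.+\..+$",
    "slack": r"^[@#][a-zA-Z0-9-_]+",
    "twilio sms": r"^\+79\d{9}$",
}


def is_valid(contact_type, value):
    """Return True if the value passes the pattern configured for its type.

    Types without a `validation` field accept any value.
    """
    pattern = VALIDATION.get(contact_type)
    if pattern is None:
        return True
    return re.search(pattern, value) is not None


print(is_valid("mail", "devops@example.com"))   # True
print(is_valid("slack", "alerts"))              # False: no leading # or @
print(is_valid("twilio sms", "+79001234567"))   # True
print(is_valid("slack", "#alerts channel"))     # True: pattern has no trailing $
```

The last call shows that a pattern without a trailing `$` only constrains the beginning of the value, so add `$` if the whole input must match.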

Remote API¶
By default, Web uses the local API server (both containers run on the same host). To make Web talk to an API running on a remote server, set the container environment variable MOIRA_API_URI to the required URI:
MOIRA_API_URI: remoteapi.domain:8081