Moira Documentation¶
Overview¶
Moira is a real-time alerting tool, based on Graphite data.
Key Features¶
Graphite storage independence
Some Graphite queries are very inefficient. Tools like Seyren multiply this effect by repeating these inefficient queries every minute and overloading your cluster. Moira relies on the incoming metric stream instead, and has its own fast cache for recent data.
Support for (almost) all Graphite functions
The Graphite function library (carbonapi) is embedded directly into Moira source code. You can use any supported function and get predictable results, just like in your Graphite or Grafana dashboards.
Support for custom expressions
If a simple warning/error threshold is not enough, you can write flexible govaluate expressions to calculate trigger state based on metric data.
Tags for triggers and subscriptions
When several teams/services share one monitoring tool, it is essential to provide some way of filtering triggers and subscriptions in the UI. Moira has a flexible tag system.
Extendable notification channels
Moira supports email, Slack, Pushover and many other notification channels out of the box. You can also write your own plugin in Go and rebuild the Moira Notifier microservice.
Alarm fatigue protection
Sometimes one of your triggers goes mad and switches back and forth between states, sending you hundreds of notifications. You may end up ignoring and deleting all of these messages, accidentally deleting one that is actually important. Moira tries to protect you with a feature called throttling. It’s simple: if one of your triggers starts to send over 10 messages per hour, Moira limits this trigger to one message per 30 minutes. Alerts from this trigger are not lost: they are combined and packaged into a single message.
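The throttling rule can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not Moira's implementation: the `Throttle` class and its method names are made up, it counts incoming events rather than sent messages, and it releases held events only when the next event arrives (a real service would use a timer).

```python
from collections import deque


class Throttle:
    """Hypothetical per-trigger throttle: more than `limit` events per
    `window` seconds switches to at most one delivery per `interval`."""

    def __init__(self, limit=10, window=3600, interval=1800):
        self.limit = limit
        self.window = window
        self.interval = interval
        self.seen = deque()        # timestamps of events inside the window
        self.pending = []          # events held back while throttled
        self.last_delivery = None  # time of the last delivered message

    def offer(self, now, event):
        """Register an event; return the events to deliver in one message."""
        # Forget events that have fallen out of the sliding window.
        while self.seen and now - self.seen[0] >= self.window:
            self.seen.popleft()
        self.seen.append(now)
        self.pending.append(event)

        throttled = len(self.seen) > self.limit
        too_soon = (
            self.last_delivery is not None
            and now - self.last_delivery < self.interval
        )
        if throttled and too_soon:
            return []  # hold the event; it is packaged with later ones
        batch, self.pending = self.pending, []
        self.last_delivery = now
        return batch
```

With the defaults, the first ten events in an hour are delivered one by one; later events are held and then delivered together once 30 minutes have passed since the last message.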
Limitations¶
By default, Moira stores metric history for one hour. This ensures performance under heavy load. You can tweak this in the config file, but note that performance will degrade.
In order to reduce database load, Moira checks every single trigger at most once every 5 seconds. Your metrics probably arrive once a minute, so you are unlikely to notice this limitation. You can also tweak this in the config file.
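As a rough illustration of the second limitation, the sketch below skips a check when the same trigger was checked less than 5 seconds ago. The `CheckScheduler` name and its API are hypothetical and only show the idea; they are not taken from Moira.

```python
class CheckScheduler:
    """Hypothetical helper: allow a check for a trigger at most once
    per `min_interval` seconds."""

    def __init__(self, min_interval=5.0):
        self.min_interval = min_interval
        self.last_check = {}  # trigger id -> time of the last allowed check

    def should_check(self, trigger_id, now):
        last = self.last_check.get(trigger_id)
        if last is not None and now - last < self.min_interval:
            return False  # checked too recently, skip this one
        self.last_check[trigger_id] = now
        return True
```

Each trigger is tracked separately, so a burst of metrics for one trigger does not delay checks of another.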
Microservices¶
In the spirit of the Graphite architecture, Moira consists of several loosely coupled microservices. You are welcome to replace them or add new ones.
Filter¶
Filter is a lightweight service responsible for receiving lots of metric data in Graphite format. It filters received data and saves only metrics that match at least one user trigger. This reduces load on all other parts of Moira.
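The idea behind Filter can be sketched as follows. This is a simplified illustration, not Moira's matcher: the function names are made up, and only two Graphite pattern features are handled (`*` within one dot-separated node and `{a,b}` alternation).

```python
import re


def pattern_to_regex(pattern):
    """Translate a Graphite-style pattern into a compiled regex.

    Supports '*' (any characters within one node) and '{a,b}'
    alternation; everything else is matched literally. An unclosed
    '{' raises ValueError.
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            parts.append(r"[^.]*")
        elif ch == "{":
            end = pattern.index("}", i)
            options = pattern[i + 1:end].split(",")
            parts.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
            i = end
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts))


def keep_matching(lines, patterns):
    """Keep only 'name value timestamp' lines whose name matches a pattern."""
    regexes = [pattern_to_regex(p) for p in patterns]
    kept = []
    for line in lines:
        name = line.split(" ", 1)[0]
        if any(r.fullmatch(name) for r in regexes):
            kept.append(line)
    return kept
```

Everything that matches no trigger pattern is dropped before it reaches storage, which is what keeps the load on the other services low.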
Checker¶
Checker is an application with embedded Graphite functions. Checker watches for incoming metric values and performs checks according to saved trigger settings. When the state of any trigger changes, Checker generates an event.
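A minimal sketch of that check-and-compare loop, assuming a simple rising warning/error threshold. The function names, the tuple layout of an event and the choice of OK as the starting state for unseen metrics are all assumptions made for the example.

```python
def state_for(value, warn, error):
    """Rising thresholds (assumes warn <= error): higher values are worse."""
    if value >= error:
        return "ERROR"
    if value >= warn:
        return "WARN"
    return "OK"


def check(values, last_states, warn, error):
    """Return (new_states, events) for the latest value of each metric.

    An event is a (metric, old_state, new_state) tuple and is produced
    only when the state actually changes.
    """
    new_states, events = {}, []
    for metric in sorted(values):
        state = state_for(values[metric], warn, error)
        old = last_states.get(metric, "OK")  # assumption: unseen metrics start as OK
        if state != old:
            events.append((metric, old, state))
        new_states[metric] = state
    return new_states, events
```

Repeating a check with unchanged values produces no events, so downstream services only see state transitions.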
Notifier¶
Notifier is an application that watches for generated events. Notifier is responsible for scheduling and sending notifications, observing quiet hours, retrying failed notifications, etc.
API¶
API is an application that serves as a backend for UI.
Release Notes¶
2.0¶
Version 2.0 is fully rewritten in Go instead of Python. This means lower CPU load in Checker and API microservices, but also changes the list of supported Graphite functions.
We also introduce a new UI based on React. It is not backwards-compatible with the old API, but the new API supports both the old and the new UI.
Breaking Changes¶
- New structure of installation/configuration files.
- New Advanced mode expression format. Moira 2.0 supports govaluate expressions instead of Python expressions. Use moira-cli -convert-expressions to convert.
- API methods URLs do not have trailing slashes anymore.
- API /notification method returns valid JSON list instead of plain text.
- ttl parameter in API calls is always a number instead of string.
- API PUT methods strictly separate create and update operations.
- There is no tag maintenance entity anymore.
- Error messages return valid JSON instead of plain text.
- Support for Graphite functions changed. See carbonapi compatibility list for details.
Other Improvements¶
- Internal Graphite metric names changed.
- Numerous bugs fixed. Some new were created :)
2.1¶
- Throw an exception if any target except the first one resolves to more than one metric.
- Fix Moira version detection in CI builds.
- Add user login information to API request logs.
- Fix long interval between creating a new trigger and getting data into that trigger.
2.2¶
- Add Redis Sentinel support.
- Increase new metric event processing speed by adding a cache on metric patterns.
- Update carbonapi (new functions: map, reduce, delay; updated: asPercent).
- Optimize reading metrics while checking trigger (removed unnecessary Redis transaction).
- Add domain autoresolving for self-metrics sending to Graphite.
- Fix concurrent read/write from expression cache.
- Re-enable Markdown in Slack sender.
- Optimize internal metric collection.
- Replace pseudotags with ordinary checkboxes in Web UI (but not on backend yet).
- Fix bug that allowed to create pseudotags (ERROR, etc.) as ordinary tags.
- Add metrics for each trigger handling time.
- Translate pagination.
- Make sorting by status the default option on trigger page.
- Hide tag list on trigger edit page.
- Sort tags alphabetically everywhere.
- Highlight metric row on mouse hover.
- Automatically add tags from search bar when creating new trigger.
- Add metric name to “Trigger has same timeseries names” error message.
- Update event names in case trigger name had changed.
- Fix bug in triggers with multiple targets. Metrics from targets T2, T3, … were not deleted properly.
- Fix old-style configuration files in platform-specific packages.
- Fix bug that prevented non-integer timestamps from processing.
- Fix logo image background.
- Fix sorting on -s and 0s.
- Fix UI glitch while setting maintenance time.
- Fix retention scheme parsing for some rare cases with comments.
2.3¶
- Add API methods: DELETE /notification/all and DELETE /event/all moira-alert/moira#73.
- Add notifier config option: DateTime format for email sender moira-alert/moira#74.
- Add Graphite-API support for remote triggers moira-alert/moira#75. See more: Remote Triggers Checker. Thanks to @errx.
- Fix newlines in trigger description body for web and email sender moira-alert/moira#76.
- Add option to enable runtime metrics in Graphite-section of configuration moira-alert/moira#79.
- Add new fancy email template 🎂 moira-alert/moira#82.
- Change default trigger state to TTLState option instead of NODATA moira-alert/moira#83.
- Refactor maintenance logic moira-alert/moira#87. See more: Maintenance.
- Add basic false NODATA protection moira-alert/moira#90. See more: Self State Monitor.
- Prohibit removal of contact with assigned subscriptions found moira-alert/moira#91.
- Make trigger exception messages more descriptive moira-alert/moira#92.
- Make filter cache capacity configurable moira-alert/moira#93. See more Filter Configuration.
- Fix incorrect behavior in which the trigger did not return from the EXCEPTION state moira-alert/moira#94.
- Remove deprecated pseudo-tags, use checkboxes instead moira-alert/moira#95. See more: Ignore Specific States Transitions.
- Allow to use single-valued thresholds (ex. only WARN or only ERROR) moira-alert/moira#96.
- Reduce the useless CPU usage in Moira-Filter moira-alert/moira#98. Thanks to @errx.
- Add concurrent matching workers in Moira-Filter moira-alert/moira#99. Thanks to @errx.
- Update Carbonapi to 1.0.0-rc.0 moira-alert/moira#101.
- Improve checker performance moira-alert/moira#103.
- Add Markdown support in contact edit modal view moira-alert/web2.0#138.
- Fix default timezone in trigger moira-alert/web2.0#173.
- Add ability to type negative numbers in simple trigger edit mode moira-alert/web2.0#169.
- Fix trailing whitespaces in tag search bar moira-alert/web2.0#139.
- Update Moira Client 2.3.4.
- Update Moira Trigger Role 2.3.
Important
Redis DB conversion is desirable.
Moira 2.3 has some structure changes in Redis DB. It will work out of the box, but we recommend running the converter once Moira is updated.
moira-cli -update --config=/etc/moira/cli.yml
redis:
  host: localhost
  port: "6379"
  dbid: 0
log_file: stdout
log_level: debug
If you would like to downgrade back to Moira 2.2, you should run the CLI converter.
moira-cli -downgrade --config=/etc/moira/cli.yml
Both cases require Moira-Cli v.2.3, which you can find on the Release Page.
2.3.1¶
- Fix last_remote_check_delay option in Notifier configuration moira-alert/moira#114.
2.4.0¶
- Timeseries graphs in notifications moira-alert/moira#148. See more Plotting.
- Add api method GET trigger/{{triggerId}}/render to implement timeseries plotting in api moira-alert/moira#137.
- Add maintenance for a whole trigger. Add new api method PUT trigger/{{triggerId}}/setMaintenance. PUT trigger/{{triggerId}}/maintenance is deprecated now moira-alert/moira#138, moira-alert/web2.0#199.
- Add extra maintenance intervals: 14 and 30 days moira-alert/web2.0#198.
- Add option to mute notifications about new metrics in the trigger moira-alert/moira#120. See more: Dealing with NODATA.
- Allow user to remove all NODATA metrics from trigger moira-alert/moira#124.
- Check Lazy triggers (triggers without any subscriptions) less frequently moira-alert/moira#131. See more Lazy Triggers Checker.
- Run single NODATA checker worker at single moment moira-alert/moira#129.
- Avoid throttling of remote-triggers when trigger switches to EXCEPTION and back to OK moira-alert/moira#121.
- Consider the status of the trigger when rendering the trigger status indicator moira-alert/web2.0#195.
- Replace useless trigger export button with “Duplicate” moira-alert/web2.0#189.
- Add Moira-Notifier toggle on Hidden Pages moira-alert/web2.0#191. Please, read Self State Monitor first.
- Show contact type icon on Hidden Pages moira-alert/web2.0#196.
- Show TTL and TTLState in Advanced mode moira-alert/web2.0#197.
- Throw an exception if first target is no longer valid moira-alert/moira#122.
- Refactor cli. Remove old converters, which were written before Moira 2.2 moira-alert/moira#139.
- Update golang to version 1.11.2 moira-alert/moira#147.
- Flush trigger events when removing the trigger moira-alert/moira#116.
- Remove redundant Graphite-metrics that counted the time of check of each single trigger moira-alert/moira#117.
- Add api method GET trigger/search to implement full-text trigger search in api, GET trigger/page is deprecated now moira-alert/moira#125.
- Fix Redis leakages: some data was not removed properly from Redis storage moira-alert/moira#129.
- Fix bug in trigger schedule due to which triggers were considered suppressed between 23:59:00 and 00:00:59 moira-alert/moira#127.
- Fix bug in trigger when specific schedule time didn’t work if start time was bigger than end time moira-alert/moira#119.
- Fix bug in Create and test button when adding a new subscription moira-alert/web2.0#194.
- Fix bug that increases updated last checks count when user creates or updates a trigger from api (or web) moira-alert/moira#146.
- Fix bug which allowed to use other people’s contacts in your subscriptions moira-alert/moira#145.
- Fix bug that allowed to create and use an empty tag in subscriptions and triggers moira-alert/moira#144.
- Fix bug when senders didn’t resolve EXCEPTION state moira-alert/moira#156.
- Update Moira Client 2.4.
- Update Moira Trigger Role 2.4.
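Two of the schedule fixes above concern windows that touch or cross midnight. The sketch below shows one way to test whether a time falls into a window whose start is later than its end; it is an illustration with a hypothetical `in_schedule` function, not the code from the linked pull requests.

```python
def in_schedule(minute_of_day, start, end):
    """Return True if minute_of_day (0..1439) is inside [start, end].

    Both bounds are inclusive minutes since midnight. When start > end
    the window is treated as wrapping past midnight, e.g. 22:00-06:00.
    """
    if start <= end:
        return start <= minute_of_day <= end
    return minute_of_day >= start or minute_of_day <= end
```

For a window of 22:00 to 06:00 (1320 to 360), `in_schedule(30, 1320, 360)` is true at 00:30, while noon (720) falls outside the window.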
Important
Redis DB conversion is required.
Moira 2.4 has some structure changes in Redis DB. It will work out of the box, but lazy triggers will still be checked every time new metrics arrive.
You can upgrade from Moira 2.2 or 2.3 by passing the corresponding version in the --from-version flag.
moira-cli --config=/etc/moira/cli.yml --update --from-version=2.2/2.3
If you would like to downgrade back to Moira 2.2 or 2.3, you should run the CLI converter.
moira-cli --config=/etc/moira/cli.yml --downgrade --to-version=2.2/2.3
Both cases require Moira-Cli v.2.4, which you can find on the Release Page.
2.5.0¶
Upgrading¶
Warning
This release is not compatible with Redis version below 3.2, please upgrade your Redis instance.
Warning
It is not possible to upgrade from Moira 2.2 to Moira 2.5 directly. To upgrade Moira from version 2.2 or older to 2.5, please first run moira-cli version 2.4 (see important note).
Please update your web configuration according to the following rules:
- Add label (new required field). You can see default type and label field mappings in default WEB UI configuration.
- Rename title to placeholder.
See WEB UI configuration guide for more information.
Incompatible changes¶
- Removed deprecated method PUT trigger/{{triggerId}}/maintenance. Use PUT trigger/{{triggerId}}/setMaintenance instead (request body has not changed).
- Removed deprecated version 2.2 related conversion code. Now if you want to upgrade Moira from version 2.2 or older use moira-cli version 2.4 (see important note).
- Fixed contact types configuration. It was hardcoded in the web UI, and now it is configurable via config file (as it should have been originally).
- Renamed the title field to more semantically correct placeholder in web UI config.
- Added a new required label field to web UI config, which is used as a contact label in web UI instead of type.
- Removed ERROR_VALUE and WARN_VALUE as valid variables in expression. Old triggers with these variables will still work, but you can not update these triggers until you delete ERROR_VALUE and WARN_VALUE variables from the expression.
- Dropped support for all old browsers. Only last 2 major versions of Chrome, Firefox, Safari (all mobile and desktop) and Edge (only desktop) are supported.
New features¶
- Added Graphite tags support #142.
- Reworked trigger search input control in web UI. Full-text search is now available, as well as the old tag filters #185.
- Added Webhook sender #123. For more info see documentation.
- Added information who and when turned on maintenance mode. You can see it as a hint in web UI near the metric, and in metric alert message #192.
- Added a meaningful title to all Moira web pages #177.
- Added environment variable that customizes api URL path for web UI Docker image #173.
- Added new variables to script sender. Variable ${trigger_name} is now deprecated, removed from documentation and will be removed in future versions of Moira #228. For more information about new variables and script configuration see documentation.
Bug fixes and improvements¶
- Limited connection count in Redis connection pool, added a separate pool for remote locks, added ConnectionsLimit config field in Redis configuration #163.
- Prohibited saving trigger with both expression and warn_value + error_value. If you set both of these fields, API will return 400 status code. Web UI saves only fields that are displayed in the open tab (simple or advanced mode) #172.
- Fixed handling incoming metrics with Windows-style line breaks (\r\n) #268.
- Fixed checking of triggers that do not have any metrics stored in remote or local storage #166.
- Fixed execution of self-checks: do it only once, regardless of how many notifier instances are running #186.
- Fixed response code on invalid requests to update or create trigger (was 500, now 400) #323.
- Combined telegram alert and plot in one message #248.
- Added icons in Slack notifications #180. See more: Slack icons.
- Got rid of the old ugly mail template, now we use only the new email template #181.
- Fixed bug that turned old pseudo-tags ERROR, DEGRADATION, HIGH DEGRADATION to subscription settings checkboxes #184.
- Fixed advanced schedule in subscriptions #162.
- Fixed multibyte splitting bug in graph titles #179.
- Fixed sending a message “This metric changed its state…” if a state does not change during maintenance interval #328.
- Removed useless broken links in test and self-state notifications #178.
- Fixed symbols counting bug in telegram messages #248.
- Changed mobile detection logic from “get window width” to “parse user agent and detect mobile browser” #218.
- Fixed 500 status code when trying to update subscription if one of the subscribed triggers was removed #271.
- Properly encoded parameters that are passed in a web to API requests #174.
- Fixed layout with long words or URLs in name and description on the trigger web page #176.
- Fixed showing tags that exist in the user local browser storage, but don’t exist in server-side #175.
- Fixed external loader on non-existing trigger page in a mobile version of the web #168.
- Removed cancel button and restyled delete button in subscription modal #221.
- Prohibited creating simple mode trigger with several targets via API #171.
- Fixed data source toggle that was missing from simple edit trigger mode #236.
- Fixed rising/falling mode selector when switching between simple and advanced modes #172.
- Limited Twilio SMS sender to 5 events per SMS #237.
- Fixed Pushover message priority calculation when sending over five events #237.
- Added contact type icon in choose subscription contact combobox #219.
- Changed remove subscription contact icon #220.
- Improved plotting conditions to render non-empty timeseries only #197.
- Upgraded NPM dependencies for security reasons #194.
- Added log message describing the reason why self-state monitor disabled notifications #323.
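Several fixes in these release notes concern parsing of incoming metric lines (Windows-style line breaks, non-integer timestamps). A tolerant parser for one line of the Graphite plaintext format might look like the sketch below; `parse_metric_line` is a hypothetical name and the truncation of fractional timestamps is an assumption, not a description of Moira's behavior.

```python
def parse_metric_line(raw):
    """Parse b'name value timestamp' terminated by '\\n' or '\\r\\n'.

    Returns (name, value, timestamp). Fractional timestamps are
    truncated to whole seconds (an assumption for this sketch).
    Malformed lines raise ValueError.
    """
    text = raw.decode("ascii").rstrip("\r\n")
    name, value, timestamp = text.split(" ")
    return name, float(value), int(float(timestamp))
```

Stripping both `\r` and `\n` before splitting is what makes the same code accept Unix and Windows line endings.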
2.5.1¶
Upgrading¶
The web config has been moved into the API config. Please read API and Web to see the changes and merge the two configs. The old web config is no longer needed.
New features¶
- Added ability to subscribe for all triggers without specifying tags #236.
- Added ability to send markdown for email, fix markdown formatting in slack senders #353.
- Added new senders: Discord, VictorOps, PagerDuty, OpsGenie.
- ⚡️✨💫🔥🔥🔥 Graphs now support emojis #333.
- Y-axis graph now uses algorithm to define “beautiful” ticks #217.
Bug fixes and improvements¶
- Added support for magic -1 timestamp #426.
- Fixed incorrect timezone in maintenance notification text #356.
- Dependency management switched to Go modules mechanism #423.
- Linter was switched to GolangCI Lint #436.
- Go version was switched to 1.13.1 #435.
- Alerts that contain NODATA now use the timestamp of NODATA detection instead of the time when data was lost #355.
- Readiness and liveness probe delays were increased in the helm chart to fit long trigger indexing in the database #2.
- API now exits with error if unable to index triggers for full-text search #327.
- Deleting tags that are used in existing subscriptions is now disallowed #344.
2.6.2¶
What’s Changed¶
- Changed naming for feature docker images by @litleleprikon in https://github.com/moira-alert/moira/pull/440
- Added subscription transfer and contact deletion to CLI by @Nixolay in https://github.com/moira-alert/moira/pull/443
- Added Msteams support by @imavroukakis in https://github.com/moira-alert/moira/pull/432
- Upgraded golangci-lint to v1.21.0 by @titusjaka in https://github.com/moira-alert/moira/pull/439
- Fixed memory leak in Scorch-type index in Bleve by @Nixolay in https://github.com/moira-alert/moira/pull/444
- Made image cli by @Nixolay in https://github.com/moira-alert/moira/pull/454
- Checked nil pointer by @Nixolay in https://github.com/moira-alert/moira/pull/455
- Fixed nil pointer dereference in notifier by @litleleprikon in https://github.com/moira-alert/moira/pull/452
- Added twimlets support for Twilio sender by @prizov in https://github.com/moira-alert/moira/pull/450
- Panic in filter by @Nixolay in https://github.com/moira-alert/moira/pull/467
- Added cleaning moira from fired users by @Nixolay in https://github.com/moira-alert/moira/pull/458
- Cleaned up metrics interface by @Pliner in https://github.com/moira-alert/moira/pull/475
- Run goimports by @Pliner in https://github.com/moira-alert/moira/pull/478
- Metrics facade by @Pliner in https://github.com/moira-alert/moira/pull/477
- Reworked Counter.Inc and drop NewMetersCollection from Registry by @Pliner in https://github.com/moira-alert/moira/pull/479
- Set up metrics prefix at start up by @Pliner in https://github.com/moira-alert/moira/pull/480
- Added CODEOWNERS by @Pliner in https://github.com/moira-alert/moira/pull/483
- Extracted telemetry config and setup telemetry server by @Pliner in https://github.com/moira-alert/moira/pull/482
- Added Prometheus endpoint for internal metrics by @Pliner in https://github.com/moira-alert/moira/pull/474
- Fixed incorrect parsing of multiple equal signs in label by @idoqo in https://github.com/moira-alert/moira/pull/490
- Added community guides by @litleleprikon in https://github.com/moira-alert/moira/pull/485
- Moved plot boundaries by @A1bemuth in https://github.com/moira-alert/moira/pull/495
- Reduced nodes slice capacity by @A1bemuth in https://github.com/moira-alert/moira/pull/496
- Implemented stable pagination by @litleleprikon in https://github.com/moira-alert/moira/pull/498
- Enabled pprof heap handler by @A1bemuth in https://github.com/moira-alert/moira/pull/497
- Handled pager params on /page url by @litleleprikon in https://github.com/moira-alert/moira/pull/502
- Made filter optimizations by @A1bemuth in https://github.com/moira-alert/moira/pull/503
- Upgraded to Go 1.14 by @beevee in https://github.com/moira-alert/moira/pull/504
- Updated golang/mock to 1.4.1 by @A1bemuth in https://github.com/moira-alert/moira/pull/507
- Updated GolangCI Lint version by @litleleprikon in https://github.com/moira-alert/moira/pull/509
- Added reviewdog by @litleleprikon in https://github.com/moira-alert/moira/pull/468
- Improved docker-compose by @litleleprikon in https://github.com/moira-alert/moira/pull/515
- Fixed pattern matching in filter by @beevee in https://github.com/moira-alert/moira/pull/517
- Added slave replicas read by @A1bemuth in https://github.com/moira-alert/moira/pull/510
- Added cap on metric fetching to prevent checker OOM by @beevee in https://github.com/moira-alert/moira/pull/519
- Added metrics for every possible state transition by @beevee in https://github.com/moira-alert/moira/pull/527
- Metrics mixed up in graph legend by @Nixolay in https://github.com/moira-alert/moira/pull/526
- Test on a non-primary database by @Nixolay in https://github.com/moira-alert/moira/pull/529
- Added templates to trigger description #484 by @Nixolay in https://github.com/moira-alert/moira/pull/487
- Added limit to FetchNotifications function for read notifications from db by @ifireice in https://github.com/moira-alert/moira/pull/505
- Improved advanced mode by @litleleprikon in https://github.com/moira-alert/moira/pull/470
- Rewrote self-state check by @Nixolay in https://github.com/moira-alert/moira/pull/417
- Changed regexp submatch index by @litleleprikon in https://github.com/moira-alert/moira/pull/534
- Added alone metrics to get trigger reply by @litleleprikon in https://github.com/moira-alert/moira/pull/538
- Rewrote validation for empty targets by @litleleprikon in https://github.com/moira-alert/moira/pull/540
- Now we save all evaluated metrics, not only from T1 target, to compare them when saving the trigger by @borovskyav in https://github.com/moira-alert/moira/pull/541
- Do not use t1: prefix in trigger alerts that have only one target by @borovskyav in https://github.com/moira-alert/moira/pull/539
- Fixed alone metrics check error message by @borovskyav in https://github.com/moira-alert/moira/pull/543
- Improved performance of check by @litleleprikon in https://github.com/moira-alert/moira/pull/542
- Added test selfstate by @Nixolay in https://github.com/moira-alert/moira/pull/546
- Improved alone metrics error message by @litleleprikon in https://github.com/moira-alert/moira/pull/547
- Allowed stale read for pattern metrics by @litleleprikon in https://github.com/moira-alert/moira/pull/549
- Changed default metric name to T1 by @litleleprikon in https://github.com/moira-alert/moira/pull/548
- Denied usage of asterisk pattern by @litleleprikon in https://github.com/moira-alert/moira/pull/555
- Added debug exception by @Nixolay in https://github.com/moira-alert/moira/pull/552
- Fixed goroutines leak in filter by @litleleprikon in https://github.com/moira-alert/moira/pull/562
- Added tagging current master branch with latest tag by @beevee in https://github.com/moira-alert/moira/pull/565
- Fixed incorrect shutdown and conflict of data types during output in MetricsMatcher. by @JIexa24 in https://github.com/moira-alert/moira/pull/566
- Fixed multiple connections closing in Moira-Filter caused by PR-562. by @JIexa24 in https://github.com/moira-alert/moira/pull/570
- Updated golang by @Nixolay in https://github.com/moira-alert/moira/pull/571
- Added return ‘not found’ when rendering non-existing trigger chart by @idoqo in https://github.com/moira-alert/moira/pull/572
- Added more linters by @zhelyabuzhsky in https://github.com/moira-alert/moira/pull/573
- Fixed full-text search if the text is in uppercase by @zhelyabuzhsky in https://github.com/moira-alert/moira/pull/574
- Added private channels support by @zhelyabuzhsky in https://github.com/moira-alert/moira/pull/578
- Updated Go to 1.15.2 by @zhelyabuzhsky in https://github.com/moira-alert/moira/pull/579
- Fixed Telegram group chat response message by @zhelyabuzhsky in https://github.com/moira-alert/moira/pull/582
- Improved sendAsAlbum Telegram function by @zhelyabuzhsky in https://github.com/moira-alert/moira/pull/581
- Fixed sending of plots in notifications by @zhelyabuzhsky in https://github.com/moira-alert/moira/pull/580
- Added support of Slack user-group mentioning in the alert message by @ArXa1L in https://github.com/moira-alert/moira/pull/585
- Fixed CarbonAPI pow function by @zhelyabuzhsky in https://github.com/moira-alert/moira/pull/586
- Bumped golangci-lint version by @beevee in https://github.com/moira-alert/moira/pull/587
- Updated templates by @Nixolay in https://github.com/moira-alert/moira/pull/536
- Removed “parse” post message argument by @ArXa1L in https://github.com/moira-alert/moira/pull/588
- Fixed Telegram group chat response message by @zhelyabuzhsky in https://github.com/moira-alert/moira/pull/589
- Made responding only to messages beginning with /start in Telegram by @beevee in https://github.com/moira-alert/moira/pull/590
- Allowed targets be single if it not declared by @litleleprikon in https://github.com/moira-alert/moira/pull/554
- Marked all dangerous Graphite functions as such by @Nixolay in https://github.com/moira-alert/moira/pull/531
- Improved logging by @androndo in https://github.com/moira-alert/moira/pull/599
- Cleaned last check on trigger update by @litleleprikon in https://github.com/moira-alert/moira/pull/596
- Removed populate check in trigger update by @litleleprikon in https://github.com/moira-alert/moira/pull/602
- Added error logging in notifier by @litleleprikon in https://github.com/moira-alert/moira/pull/604
- Cloned logger by @androndo in https://github.com/moira-alert/moira/pull/605
- Fixed logging place by @litleleprikon in https://github.com/moira-alert/moira/pull/606
- Updated Slack client by @androndo in https://github.com/moira-alert/moira/pull/608
- Switched to github actions instead of travis CI by @litleleprikon in https://github.com/moira-alert/moira/pull/610
- Detailed logs by @androndo in https://github.com/moira-alert/moira/pull/600
- Disabled excluded logs if plots by @androndo in https://github.com/moira-alert/moira/pull/612
- Changed trigger/check method to PUT and body params by @androndo in https://github.com/moira-alert/moira/pull/611
- Detected broken contacts by @androndo in https://github.com/moira-alert/moira/pull/615
- Added metrics export by @litleleprikon in https://github.com/moira-alert/moira/pull/613
- Made expression not in uppercase only by @balalay12 in https://github.com/moira-alert/moira/pull/622
- Added team subscriptions and contacts by @litleleprikon in https://github.com/moira-alert/moira/pull/537
- Added pager deletion by @litleleprikon in https://github.com/moira-alert/moira/pull/623
- Added strings methods to templating functions by @androndo in https://github.com/moira-alert/moira/pull/624
- Fixed api bugs by @litleleprikon in https://github.com/moira-alert/moira/pull/628
- Fixed http schema escaping when build url in webhook sender by @androndo in https://github.com/moira-alert/moira/pull/627
- Moved coverage to codecov by @androndo in https://github.com/moira-alert/moira/pull/630
- Fixed checker bugs by @litleleprikon in https://github.com/moira-alert/moira/pull/621
- Fixed prepare test by @litleleprikon in https://github.com/moira-alert/moira/pull/644
- Simplified ConvertForCheck function by @litleleprikon in https://github.com/moira-alert/moira/pull/645
- Bumped go to 1.16.5 by @litleleprikon in https://github.com/moira-alert/moira/pull/642
- Improved speed of metrics matching in filter by @KiskachiMaria in https://github.com/moira-alert/moira/pull/682
- Added performance test for matching of tagged metrics by @dmitryanchikov in https://github.com/moira-alert/moira/pull/686
- Updated module github.com/golang/snappy by @zhelyabuzhsky in https://github.com/moira-alert/moira/pull/698
- Changed Kontur logo by @zhelyabuzhsky in https://github.com/moira-alert/moira/pull/704
- Added shared test configuration for GoLand by @zhelyabuzhsky in https://github.com/moira-alert/moira/pull/703
- Updated bleve package by @zhelyabuzhsky in https://github.com/moira-alert/moira/pull/706
- Fixed redis port exposing by @zhelyabuzhsky in https://github.com/moira-alert/moira/pull/711
- Added automaxprocs package to filter by @zhelyabuzhsky in https://github.com/moira-alert/moira/pull/712
- Decreased level of logging for broken contact errors to ‘warning’ by @dmitryanchikov in https://github.com/moira-alert/moira/pull/716
- Added automaxprocs package to api, checker, cli, notifier by @zhelyabuzhsky in https://github.com/moira-alert/moira/pull/719
- Added CodeQL analysis by @zhelyabuzhsky in https://github.com/moira-alert/moira/pull/705
- Fixed plotting error in notifier by @dmitryanchikov in https://github.com/moira-alert/moira/pull/724
2.7.0¶
Upgrading¶
Important
Redis config section update is required.
Please update redis section in your config files according to the examples.
Important
Redis DB conversion is required.
Moira 2.7 has some structure changes in Redis DB.
You can upgrade from Moira 2.6.2 by passing the corresponding version in the --from-version flag.
moira-cli --config=/etc/moira/cli.yml --update --from-version=2.6
If you would like to downgrade back to Moira 2.6.2, you should run CLI-converter.
moira-cli --config=/etc/moira/cli.yml --downgrade --to-version=2.6
Both cases require Moira-Cli v.2.7, which you can find on the Release Page.
What’s Changed¶
- Added Redis Cluster support by @zhelyabuzhsky in https://github.com/moira-alert/moira/pull/696
2.7.1¶
What’s Changed¶
- Updated golang to 1.17.7 by @zhelyabuzhsky in https://github.com/moira-alert/moira/pull/732
- Extended list of broken contact errors by @dmitryanchikov in https://github.com/moira-alert/moira/pull/729
- Improved speed of tagged metrics matching in filter by @KiskachiMaria in https://github.com/moira-alert/moira/pull/726
Installation¶
Manual Installation¶
Tip
To get Moira running quickly, try the Docker version.
You need to install the following components before running Moira microservices:
Build Moira Microservices¶
go get -u github.com/moira-alert/moira
cd $GOPATH/src/github.com/moira-alert/moira
make build
You will find binaries in $GOPATH/src/github.com/moira-alert/moira/build.
Download Web UI Application¶
https://github.com/moira-alert/web2.0/releases/latest
Download and unpack the .tar.gz file into the Nginx static files directory (e.g. /var/local/www/moira).
Configure¶
1. If you need to override default settings, place configuration files somewhere on your disk (e.g. /etc/moira/). You can dive into Configuration syntax on a separate page.
2. Place the nginx configuration file in the proper location (e.g. /etc/nginx/conf.d/moira.conf):
server {
    listen 127.0.0.1:80;
    location / {
        root /var/local/www/moira;
        index index.html;
        try_files $uri $uri/ /index.html;
    }
    location /api/ {
        proxy_pass http://127.0.0.1:8081;
    }
}
- If you need to override UI settings, edit the web.json file. You can find its location in the API configuration.
Run¶
- Run nginx and redis-server
- Run microservices
$GOPATH/src/github.com/moira-alert/moira/build/cache
$GOPATH/src/github.com/moira-alert/moira/build/checker
$GOPATH/src/github.com/moira-alert/moira/build/notifier
$GOPATH/src/github.com/moira-alert/moira/build/api
Now you need to feed your metrics to Moira (see Feeding Metrics to Moira) on port 2003 and to create alerts in UI (see User Guide).
Docker¶
You can quickly test a local Moira installation using Docker containers from Docker Hub and docker-compose file in documentation repository.
git clone https://github.com/moira-alert/docker-compose.git
cd docker-compose
docker-compose pull
docker-compose up
Containers are preconfigured to serve the Web UI at localhost:8080 and accept Graphite metrics at localhost:2003.
RPM and DEB Packages¶
All stable versions of Moira components are tagged on GitHub. For every tag, we automatically build RPM and DEB packages. You can download these packages on each repository release page:
Configuration¶
By default, microservices will look for /etc/moira/<servicename>.yml, but you can change this location by passing your path as the command-line parameter --config.
On this page you can find examples of configuration files for Moira microservices.
Filter¶
# Redis configuration depends on fields specified in redis config section:
# 1. Use field `master_name` to enable Redis Sentinel support
# 2. Specify two or more `addrs` to enable cluster support
# 3. Otherwise, standalone configuration is enabled
redis:
# Sentinel master name
master_name: ""
# address list, format: {host1_name:port},{ip:port}
addrs: "localhost:6379"
# Redis username
username: "username"
# Redis password
password: "password"
# Moira will delete metrics older than this value from Redis. Large values will lead to various problems everywhere.
# See https://github.com/moira-alert/moira/pull/519
metrics_ttl: 3h
# Dial timeout for establishing new connections
# Default is 500 milliseconds
dial_timeout: 1s
# Timeout for socket reads. If reached, commands will fail
# with a timeout instead of blocking. Default is 3 seconds.
# Skip this setting or set 0 for default.
read_timeout: 5s
# Timeout for socket writes. If reached, commands will fail
# with a timeout instead of blocking. Default is ReadTimeout.
# Skip this setting or set 0 for default.
write_timeout: 5s
telemetry:
# Common port for all telemetry data: Prometheus scraping, pprof, etc.
listen: ":8091"
pprof:
# If true, pprof will be enabled on common telemetry port.
enabled: false
graphite:
# If true, graphite sender will be enabled.
enabled: true
# If true, runtime stats will be captured and sent to graphite. Note: capturing runtime stats calls stoptheworld() once per configured "graphite.interval" (https://golang.org/src/runtime/mstats.go)
runtime_stats: false
# Graphite relay URI, format: ip:port
uri: "graphite-relay:2003"
# Moira metrics prefix. Use 'prefix: {hostname}' to use hostname autoresolver.
prefix: DevOps.moira
# Metrics sending interval
interval: 60s
filter:
# Metrics listener uri
listen: ":2003"
# Retentions config file path. Simply use your original storage-schemas.conf or create new if you're using Moira without existing Graphite installation.
retention_config: /etc/moira/storage-schemas.conf
# Number of metrics to cache before checking them.
# Note: As this value increases, Redis CPU usage decreases.
# Normally, this value must be an order of magnitude less than graphite.prefix.filter.recevied.matching.count | nonNegativeDerivative() | scaleToSeconds(1)
# For example: with 100 matching metrics, set cache_capacity to 10. With 1000 matching metrics, increase cache_capacity up to 100.
cache_capacity: 10
# Defines number of threads to match incoming graphite-metrics.
# Equals to the number of processor cores found on Moira host by default or when variable is defined as 0.
max_parallel_matches: 0
# Period in which patterns will be reloaded from Redis
patterns_update_period: 1s
log:
log_file: stdout
log_level: info
storage-schemas.conf is graphite carbon configuration file that should match similarly-named file in your Graphite installation.
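The comments at the top of the redis section describe how the connection mode is chosen from master_name and addrs. The rule can be sketched as follows. This is an illustration only: the function name redis_mode and the returned labels are hypothetical and are not part of Moira.

```python
def redis_mode(master_name, addrs):
    """Pick a Redis connection mode from the `redis` config fields.

    Hypothetical helper that mirrors the three rules in the config comments.
    """
    # addrs is a comma-separated list such as "host1:6379,10.0.0.2:6379"
    hosts = [addr.strip() for addr in addrs.split(",") if addr.strip()]
    if master_name:
        return "sentinel"    # rule 1: master_name enables Sentinel support
    if len(hosts) >= 2:
        return "cluster"     # rule 2: two or more addrs enable cluster support
    return "standalone"      # rule 3: everything else


print(redis_mode("", "localhost:6379"))
```

With the example configuration above (empty master_name and a single address), the sketch selects the standalone mode.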
Checker¶
# Redis configuration depends on fields specified in redis config section:
# 1. Use field `master_name` to enable Redis Sentinel support
# 2. Specify two or more `addrs` to enable cluster support
# 3. Otherwise, standalone configuration is enabled
redis:
# Sentinel master name
master_name: ""
# address list, format: {host1_name:port},{ip:port}
addrs: "localhost:6379"
# Redis username
username: "username"
# Redis password
password: "password"
# Moira will delete metrics older than this value from Redis. Large values will lead to various problems everywhere.
# See https://github.com/moira-alert/moira/pull/519
metrics_ttl: 3h
# Dial timeout for establishing new connections
# Default is 500 milliseconds
dial_timeout: 1s
# Timeout for socket reads. If reached, commands will fail
# with a timeout instead of blocking. Default is 3 seconds.
# Skip this setting or set 0 for default.
read_timeout: 5s
# Timeout for socket writes. If reached, commands will fail
# with a timeout instead of blocking. Default is ReadTimeout.
# Skip this setting or set 0 for default.
write_timeout: 5s
telemetry:
# Common port for all telemetry data: Prometheus scraping, pprof, etc.
listen: ":8091"
pprof:
# If true, pprof will be enabled on common telemetry port.
enabled: false
graphite:
# If true, graphite sender will be enabled.
enabled: true
# If true, runtime stats will be captured and sent to graphite. Note: capturing runtime stats calls stoptheworld() once per configured "graphite.interval" (https://golang.org/src/runtime/mstats.go)
runtime_stats: false
# Graphite relay URI, format: ip:port
uri: "graphite-relay:2003"
# Moira metrics prefix. Use 'prefix: {hostname}' to use hostname autoresolver.
prefix: DevOps.moira
# Metrics sending interval
interval: 60s
checker:
# Period for every trigger to perform forced check on
nodata_check_interval: 60s
# Min period to perform triggers re-check. Note: Reducing this value increases CPU and memory usage
check_interval: 10s
# Moira 2.4 added a new entity, the Lazy Trigger: a regular trigger without any subscription.
# By default Moira treats all triggers equally, regardless of the number of subscriptions.
# You can change this behaviour using the option below. This can reduce CPU usage on your server.
# The lazy triggers checker works if lazy_triggers_check_interval > check_interval. We recommend setting it to 10m.
lazy_triggers_check_interval: 10m
# Period for every trigger to cancel forced check (greater than 'NoDataCheckInterval') if no metrics were received
stop_checking_interval: 30s
# Equals to the number of processor cores found on Moira host by default or when variable is defined as 0.
max_parallel_checks: 0
# Is related with remote triggers (see remote section)
# Equals to the number of processor cores found on Moira host by default or when variable is defined as 0.
max_parallel_remote_checks: 0
# This section configures remote triggers Checker.
# See https://moira.readthedocs.io/en/latest/installation/configuration.html#remote-triggers-checker for further information
remote:
enabled: false
# URL of Graphite HTTP API: graphite-web, carbonapi, etc.
# Specify full URL including '/render'
url: "http://graphite.example.com/render"
# Auth username. Only Basic-auth supported
user: devops
# Auth password. Only Basic-auth supported
password: verySecurePassword
# Min period to perform triggers re-check.
# Note: Reducing this value increases CPU and memory usage and puts extra load on Graphite HTTP API
check_interval: 60s
# Don't fetch metrics older than this value from remote storage
metrics_ttl: 168h
# Maximum timeout for HTTP-request made to Graphite HTTP API
timeout: 60s
log:
log_file: stdout
log_level: info
Remote Triggers Checker¶
One of Moira's key features is Graphite independence. Some Graphite queries are very ineffective. Tools like Seyren multiply this effect every minute making lots of ineffective queries and overloading your cluster. Moira relies on the incoming metric stream, and has its own fast cache for recent data.
Enabling the Remote Triggers Checker allows users to create triggers that rely on Graphite storage instead of the Redis DB.
Warning
Use this feature with caution, because it can create an extra load on Graphite HTTP API.
Lazy Triggers Checker¶
Moira 2.4 added a new entity, the Lazy Trigger: a regular trigger without any subscription. By default Moira treats all triggers equally, regardless of the number of subscriptions. You can change this behaviour using the lazy_triggers_check_interval option in the checker section. This can reduce CPU usage on your server. The lazy triggers checker works if lazy_triggers_check_interval > check_interval. We recommend setting it to 10m (10 minutes).
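The rule above can be sketched as a small function. It is a minimal illustration, assuming that a trigger with zero subscriptions counts as lazy. The name effective_check_interval is hypothetical and does not come from the Moira source.

```python
from datetime import timedelta


def effective_check_interval(subscription_count, check_interval, lazy_interval):
    """Return how often a trigger would be re-checked under the lazy rule."""
    is_lazy = subscription_count == 0
    # The lazy checker only applies when its interval exceeds the regular one.
    if is_lazy and lazy_interval > check_interval:
        return lazy_interval
    return check_interval


regular = timedelta(seconds=10)   # check_interval: 10s
lazy = timedelta(minutes=10)      # lazy_triggers_check_interval: 10m
print(effective_check_interval(0, regular, lazy))
print(effective_check_interval(3, regular, lazy))
```

With the values from the Checker example, a trigger without subscriptions would be re-checked every 10 minutes, and a trigger with subscriptions every 10 seconds.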
Notifier¶
# Redis configuration depends on fields specified in redis config section:
# 1. Use field `master_name` to enable Redis Sentinel support
# 2. Specify two or more `addrs` to enable cluster support
# 3. Otherwise, standalone configuration is enabled
redis:
# Sentinel master name
master_name: ""
# address list, format: {host1_name:port},{ip:port}
addrs: "localhost:6379"
# Redis username
username: "username"
# Redis password
password: "password"
# Moira will delete metrics older than this value from Redis. Large values will lead to various problems everywhere.
# See https://github.com/moira-alert/moira/pull/519
metrics_ttl: 3h
# Dial timeout for establishing new connections
# Default is 500 milliseconds
dial_timeout: 1s
# Timeout for socket reads. If reached, commands will fail
# with a timeout instead of blocking. Default is 3 seconds.
# Skip this setting or set 0 for default.
read_timeout: 5s
# Timeout for socket writes. If reached, commands will fail
# with a timeout instead of blocking. Default is ReadTimeout.
# Skip this setting or set 0 for default.
write_timeout: 5s
telemetry:
# Common port for all telemetry data: Prometheus scraping, pprof, etc.
listen: ":8091"
pprof:
# If true, pprof will be enabled on common telemetry port.
enabled: false
graphite:
# If true, graphite sender will be enabled.
enabled: true
# If true, runtime stats will be captured and sent to graphite. Note: capturing runtime stats calls stoptheworld() once per configured "graphite.interval" (https://golang.org/src/runtime/mstats.go)
runtime_stats: false
# Graphite relay URI, format: ip:port
uri: "graphite-relay:2003"
# Moira metrics prefix. Use 'prefix: {hostname}' to use hostname autoresolver.
prefix: DevOps.moira
# Metrics sending interval
interval: 60s
notifier:
# Soft timeout to start retrying to send notification after single failed attempt
sender_timeout: 10s
# Hard timeout to stop retrying to send notification after multiple failed attempts
resending_timeout: "1:00"
# Web-UI uri prefix for trigger links in notifications. For example: with 'http://localhost' every notification will contain link like 'http://localhost/trigger/triggerId'
front_uri: "https://moira.example.com"
# Timezone to use to convert ticks. Default is UTC. See https://golang.org/pkg/time/#LoadLocation for more details.
timezone: Europe/Moscow
# Format for email sender. Default is "15:04 02.01.2006". See https://golang.org/pkg/time/#Time.Format for more details about golang time formatting.
date_time_format: "15:04 02.01.2006"
# Amount of messages notifier reads from Redis per iteration, -1 for unlimited
read_batch_size: -1
# List of senders, every element has "type" field (one of ["pushover", "slack", "mail", "telegram", "twilio sms", "twilio voice", "script"])
# Every type of sender has additional config fields
senders:
- type: msteams
#the max amount of events you want to be sent to your channel, -1 for unlimited, any other positive value to limit events
max_events: -1
- type: pushover
# Api token for your pushover channel, for more info see https://pushover.net/api#registration
api_token: ...
- type: slack
# Api token for your moira notifications slack user, for more info see https://get.slack.help/hc/en-us/articles/215770388-Create-and-regenerate-API-tokens
api_token: ...
# If true, notification will be sent with state-specific icon, for more info see https://moira.readthedocs.io/en/latest/installation/configuration.html#slack-icons.
use_emoji: false
- type: telegram
# Api token for your telegram bot, for more info about creating bot and get token see https://core.telegram.org/bots#3-how-do-i-create-a-bot
api_token: ...
- type: mail
mail_from: ...
smtp_host: ...
smtp_port: ...
# Skip SMTP server certificate chain validation if true
insecure_tls: false
# Uses "mail_from" if empty
smtp_user: ...
smtp_pass: ...
# Email template file path (standard Go templates). By default the 'Fancy' template is used (see screenshot below). If empty, the built-in template with no markup or styles is used.
template_file: '/etc/moira/fancy-template.html'
- type: twilio voice
api_asid: ...
api_authtoken: ...
api_fromphone: ...
# URL that responds with TwiML config for voice message generation, see https://www.twilio.com/docs/api/twiml/voice-overview
voiceurl: ...
append_message: true
- type: twilio sms
api_asid: ...
api_authtoken: ...
api_fromphone: ...
# Script and webhook senders support additional templated parameters:
# ${contact_id} contact ID
# ${contact_value} contact value (as specified by user via web UI)
# ${contact_type} contact type (as specified in web UI config file)
# ${trigger_id} trigger ID
- type: script
name: ...
# Executable path. File must exist on all machines where notifier is running.
# You can use templated parameters here (see above), they will be replaced with appropriate values.
exec: ...
- type: webhook
name: ...
# URL to send POST request (you can use templated parameters, see above)
url: ...
timeout: ...
# Basic authorization parameters (if required)
user: ...
password: ...
- type: pagerduty
- type: opsgenie
api_key: ...
- type: victorops
routing_url: ...
- type: discord
token: ...
# Self state monitor configuration section. Note: No inner subscriptions are required. Moira will use its notification mechanism to send messages.
moira_selfstate:
enabled: true
# If true, Moira selfstate will check remote triggers checker works properly and notify admin if remote checker fails
# See https://moira.readthedocs.io/en/latest/installation/configuration.html#remote-triggers-checker for further information
remote_triggers_enabled: false
# Max Redis disconnect delay to send alert when reached
redis_disconect_delay: 60s
# Max Filter metrics receive delay to send alert when reached
last_metric_received_delay: 120s
# Max Checker checks perform delay to send alert when reached
last_check_delay: 120s
# Max Remote triggers Checker checks perform delay to send alert when reached
# See https://moira.readthedocs.io/en/latest/installation/configuration.html#remote-triggers-checker for further information
last_remote_check_delay: 300s
# Self state monitor alerting interval
notice_interval: 300s
# Contact list for Self state monitor alerts, use this like delivery channels in web-ui
contacts:
- type: mail
value: devopsteam@example.com
log:
log_file: stdout
log_level: info
# This section configures remote triggers Checker.
# See https://moira.readthedocs.io/en/latest/installation/configuration.html#remote-triggers-checker for further information
remote:
enabled: false
# URL of Graphite HTTP API: graphite-web, carbonapi, etc.
# Specify full URL including '/render'
url: "http://graphite.example.com/render"
# Auth username. Only Basic-auth supported
user: devops
# Auth password. Only Basic-auth supported
password: verySecurePassword
# Min period to perform triggers re-check.
# Note: Reducing this value increases CPU and memory usage and puts extra load on Graphite HTTP API
check_interval: 60s
# Don't fetch metrics older than this value from remote storage
metrics_ttl: 168h
# Maximum timeout for HTTP-request made to Graphite HTTP API
timeout: 60s
Slack icons¶

By default Slack sender won’t change default icon configured for your bot. To use state-specific icons in notifications:

- Download and unzip notification icons
- Add icons from the ..icons/slack directory to Slack as custom emojis named after their filenames
- Set use_emoji to true for the Slack sender section in the notifier configuration file
Self State Monitor¶
If self state monitor is enabled, Moira will periodically check the Redis connection, the number of incoming metrics in the Moira-Filter and the number of triggers to be checked by Moira-Checker.
See Self State Monitor for more details.
API and Web¶
# Redis configuration depends on fields specified in redis config section:
# 1. Use field `master_name` to enable Redis Sentinel support
# 2. Specify two or more `addrs` to enable cluster support
# 3. Otherwise, standalone configuration is enabled
redis:
# Sentinel master name
master_name: ""
# address list, format: {host1_name:port},{ip:port}
addrs: "localhost:6379"
# Redis username
username: "username"
# Redis password
password: "password"
# Moira will delete metrics older than this value from Redis. Large values will lead to various problems everywhere.
# See https://github.com/moira-alert/moira/pull/519
metrics_ttl: 3h
# Dial timeout for establishing new connections
# Default is 500 milliseconds
dial_timeout: 1s
# Timeout for socket reads. If reached, commands will fail
# with a timeout instead of blocking. Default is 3 seconds.
# Skip this setting or set 0 for default.
read_timeout: 5s
# Timeout for socket writes. If reached, commands will fail
# with a timeout instead of blocking. Default is ReadTimeout.
# Skip this setting or set 0 for default.
write_timeout: 5s
telemetry:
# Common port for all telemetry data: Prometheus scraping, pprof, etc.
listen: ":8091"
pprof:
# If true, pprof will be enabled on common telemetry port.
enabled: false
graphite:
# If true, graphite sender will be enabled.
enabled: true
# If true, runtime stats will be captured and sent to graphite. Note: capturing runtime stats calls stoptheworld() once per configured "graphite.interval" (https://golang.org/src/runtime/mstats.go)
runtime_stats: false
# Graphite relay URI, format: ip:port
uri: "graphite-relay:2003"
# Moira metrics prefix. Use 'prefix: {hostname}' to use hostname autoresolver.
prefix: DevOps.moira
# Metrics sending interval
interval: 60s
api:
# Api local network address. Default is ':8081' so api will be available at http://moira.company.com:8081/api
listen: ":8081"
# If true, CORS for cross-domain requests will be enabled. This option can be used only for debugging purposes.
enable_cors: false
# Web_UI config file path. If file not found, api will return 404 in response to "api/config"
web_config_path: "/etc/moira/web.json"
web:
# Moira administrator email address
supportEmail: "devops@example.com"
# List of enabled contact types
contacts:
- type: mail
label: E-mail
validation: "^.+@.+\\..+$"
- type: msteams
label: Microsoft Teams
- type: pushover
label: Pushover
placeholder: "Pushover user key"
- type: slack
label: Slack
validation: "^[@#][a-zA-Z0-9-_]+"
placeholder: "Slack #channel or @user"
- type: telegram
label: Telegram
placeholder: "#public_channel, %private_channel, @username or group"
help: |
### To make things work you should:
### In personal chat:
- start conversation with bot [@YourMoiraBot](https://t.me/YourMoiraBot);
- execute command `/start`;
- type your login in above field as `@login`.
### In group chat:
- invite bot [@YourMoiraBot](https://t.me/YourMoiraBot) into chat;
- execute command `/start@YourMoiraBot`;
- bot will send you chat name, you should type it without extra characters in above field.
### In public channel:
- add bot [@YourMoiraBot](https://t.me/YourMoiraBot) into channel;
- promote bot as channel administrator;
- type channel name in above field as `#channel`.
### In private channel:
- add bot [@YourMoiraBot](https://t.me/YourMoiraBot) into the channel;
- promote bot as channel administrator;
- open your private channel on the [web](https://web.telegram.org/#/im);
- get channel id from URL (e.g., `https://web.telegram.org/#/im?p=c1494975744_17340166617136722341`) between `c` and `_`;
- type channel id in the above field as `%1494975744`.
- type: twilio sms
label: Twilio SMS
validation: "^\\+79\\d{9}$"
placeholder: "Phone number format +79*********"
- type: twilio voice
label: Twilio voice
validation: "^\\+79\\d{9}$"
placeholder: "Phone number format +79*********"
- type: webhook
label: My Webhook
validation: "^(http|https):\\/\\/.*(example.com|example.org)(:[0-9]{2,5})?\\/"
placeholder: "https://example.com/webhooks/moira"
help: "### Domains whitelist:\n - example.com\n - example.org"
- type: pagerduty
label: PagerDuty
placeholder: "Integration key"
- type: opsgenie
label: OpsGenie
placeholder: "Responder Name or ID"
- type: victorops
label: VictorOps
placeholder: "Routing key"
- type: discord
label: Discord
placeholder: "Discord channel (eg: general-text) or user (eg: @user)"
log:
log_file: stdout
log_level: info
# This section configures remote triggers Checker.
# See https://moira.readthedocs.io/en/latest/installation/configuration.html#remote-triggers-checker for further information
remote:
enabled: false
# URL of Graphite HTTP API: graphite-web, carbonapi, etc.
# Specify full URL including '/render'
url: "http://graphite.example.com/render"
# Auth username. Only Basic-auth supported
user: devops
# Auth password. Only Basic-auth supported
password: verySecurePassword
# Min period to perform triggers re-check.
# Note: Reducing this value increases CPU and memory usage and puts extra load on Graphite HTTP API
check_interval: 60s
# Don't fetch metrics older than this value from remote storage
metrics_ttl: 168h
# Maximum timeout for HTTP-request made to Graphite HTTP API
timeout: 60s
Web contact fields:
- type (required, any unique string): contact type, e.g. pushover, slack, mail, script, telegram, twilio sms, twilio voice;
- label (required): contact type label, shown in the select control of the add/edit contact form;
- validation: regular expression used to validate the contact value in the add/edit contact form;
- placeholder: hint shown in the input field;
- help: help text in Markdown markup.

Remote API¶
By default, Web uses local API server (both containers are running on the same host). But if you need to reconfigure Web to interact with API running on remote server then simply set container environment variable MOIRA_API_URI equal to required URI:
MOIRA_API_URI: remoteapi.domain:8081
Webhooks and Custom Scripts¶
Moira has two special kinds of senders: webhooks and scripts.
Script runs an executable on the same machine that runs notifier instances. Webhook makes POST requests to a specified URL. Scripts and webhooks are a flexible way to add integrations with services that are not supported in Moira natively.
You may want to add several different scripts for users to choose from. Next section describes how to implement this.
Scripts¶
You can specify executable path and arguments in the notifier configuration file (see Configuration for details).
Add a separate section for each script:
- type: script
name: jira
exec: /usr/bin/post_to_jira --project=${contact_value}
- type: script
name: irc
exec: /opt/myscripts/irc_adapter ${contact_value} ${trigger_id}
Then, in web UI configuration:
{
"contacts": [
{
"type": "jira",
"label": "Create JIRA issue",
"placeholder": "Project name"
},
{
"type": "irc",
"label": "Post to IRC channel",
"placeholder": "Channel name"
}
],
...
}
Templated Parameters¶
You may have noticed that we use templated parameters like ${contact_value} in configuration examples. You can use these parameters in script as well as webhook contacts. Notifier will replace them with actual values extracted from the event.
Parameter | Value |
---|---|
${contact_id} | Contact ID |
${contact_value} | Contact value, as specified by user via web UI |
${contact_type} | Contact type, as specified in web UI config file |
${trigger_id} | Trigger ID |
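The substitution described above can be sketched as follows. This is an illustration of the idea, not Moira's actual implementation: the function render_template is hypothetical, and leaving unknown parameters untouched is an assumption.

```python
import re

# Matches ${name} placeholders such as ${contact_value}.
_PARAM = re.compile(r"\$\{(\w+)\}")


def render_template(template, params):
    """Replace ${name} placeholders in a single pass.

    Unknown names are left as-is (an assumption made for this sketch).
    """
    return _PARAM.sub(lambda m: params.get(m.group(1), m.group(0)), template)


command = render_template(
    "/opt/myscripts/irc_adapter ${contact_value} ${trigger_id}",
    {"contact_value": "#devops", "trigger_id": "triggerID"},
)
print(command)  # /opt/myscripts/irc_adapter #devops triggerID
```

A single regex pass is used so that a substituted value containing text like ${trigger_id} is not expanded a second time.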
Webhooks¶
On each event, Moira will make a POST request to the URL specified in notifier configuration file with the following JSON payload.
Attribute | Type | Description |
---|---|---|
trigger | Trigger | Trigger data |
events | Event Array | List of events |
contact | Contact | Contact data |
plot | String | Base64 string containing trigger plot |
throttled | Bool | True if notifications are throttled |
Fields Description¶
Trigger¶
Attribute | Type | Description |
---|---|---|
id | String | Trigger ID |
name | String | Trigger name |
description | String | Trigger description |
tags | String Array | List of trigger tags |
Event¶
Attribute | Type | Description |
---|---|---|
metric | String | Metric name |
value | Float64 | Metric value |
timestamp | Int64 | Event timestamp |
trigger_event | Bool | Event type |
state | String | Current metric state |
old_state | String | Previous metric state |
Contact¶
Attribute | Type | Description |
---|---|---|
type | String | Contact type |
value | String | Contact value |
id | String | Contact ID |
user | String | Contact Author |
HTTP Headers¶
Name | Value |
---|---|
User-Agent | Moira |
Content-Type | application/json |
Example¶
{
"trigger": {
"id": "triggerID",
"name": "triggerName",
"description": "triggerDescription",
"tags": [
"triggerTag1",
"triggerTag2"
]
},
"events": [
{
"metric": "metricName1",
"value": 0,
"timestamp": 499165200,
"trigger_event": false,
"state": "OK",
"old_state": "ERROR"
},
{
"metric": "triggerName",
"value": 0,
"timestamp": 1445412480,
"trigger_event": true,
"state": "OK",
"old_state": "ERROR"
},
{
"metric": "metricName2",
"value": 0,
"timestamp": -446145720,
"trigger_event": false,
"state": "OK",
"old_state": "ERROR"
}
],
"contact": {
"type": "webhookContactName",
"value": "https://localhost/webhooks/moira",
"id": "9728adae-1487-4e5b-80f6-8496f59b223e",
"user": "author"
},
"plot": "",
"throttled": false
}
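A receiving service might read this payload as sketched below. The function summarize_webhook and its output format are hypothetical. Only the field names come from the tables above.

```python
import json


def summarize_webhook(body):
    """Turn a Moira webhook payload (JSON text) into one line per event."""
    payload = json.loads(body)
    trigger_name = payload["trigger"]["name"]
    lines = []
    for event in payload["events"]:
        lines.append(
            f'{trigger_name}: {event["metric"]} '
            f'{event["old_state"]} -> {event["state"]}'
        )
    return lines


example = json.dumps({
    "trigger": {"id": "triggerID", "name": "triggerName",
                "description": "", "tags": ["triggerTag1"]},
    "events": [{"metric": "metricName1", "value": 0, "timestamp": 1445412480,
                "trigger_event": False, "state": "OK", "old_state": "ERROR"}],
    "contact": {"type": "webhookContactName", "value": "https://localhost/webhooks/moira",
                "id": "9728adae-1487-4e5b-80f6-8496f59b223e", "user": "author"},
    "plot": "",
    "throttled": False,
})
print(summarize_webhook(example))  # ['triggerName: metricName1 ERROR -> OK']
```

In a real handler, the body would come from the POST request, and the handler would also need to cope with missing fields.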
Feeding Metrics to Moira¶
Moira needs to keep its own local copy of your metric data to improve performance and reduce load on your existing graphite cluster. This means data needs to be duplicated from your existing stream and sent to your existing cluster and to your Moira installation.
Unfortunately, the carbon-relay shipped with Graphite does not support duplication of data to multiple backends, so you need to use a more feature-rich carbon relay such as carbon-c-relay.
The following is a basic example configuration which defines two clusters and sends all metrics to both at once. One cluster is Moira installation, and the other uses consistent hashing across a three node cluster of Carbon servers.
cluster moira
forward
moira-host:2003
;
cluster graphite
carbon_ch
127.0.0.1:2006=a
127.0.0.1:2007=b
127.0.0.1:2008=c
;
match *
send to
moira
graphite
;
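To check that metrics reach Moira, you can send a test data point by hand. The sketch below assumes the listener accepts the Graphite plaintext protocol (one "name value timestamp" line per data point). The host name moira-host is taken from the relay example above, and the helper names are hypothetical.

```python
import socket
import time


def format_metric(name, value, timestamp):
    """Format one data point as a Graphite plaintext protocol line."""
    return f"{name} {value} {int(timestamp)}\n"


def send_metric(host, port, name, value, timestamp=None):
    """Open a TCP connection and send a single data point."""
    ts = time.time() if timestamp is None else timestamp
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(format_metric(name, value, ts).encode("ascii"))


print(format_metric("DevOps.my_server.hdd.freespace_mbytes", 42000, 1445412480), end="")
# To actually send it (requires a reachable listener):
# send_metric("moira-host", 2003, "DevOps.my_server.hdd.freespace_mbytes", 42000)
```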
Security¶
Typically, internal DevOps tools like Graphite are deployed in intranet without any external access, so you can skip authentication and leave everything accessible to everyone. But powerful Moira features, like separate subscriptions for tags, work best when you have some way to tell users apart.
Moira doesn’t provide any authentication mechanism. It is hard to find
one that fits all situations. Instead, Moira accepts X-WebAuth-User
header with some user id, like login name. You are free to set up any
reverse proxy and configure it to provide this header.
If you don’t, Moira will assume that user id is “anonymous”.
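The fallback behaviour can be pictured with a short sketch. It only illustrates the rule stated above: the function resolve_user is hypothetical and is not how Moira is implemented.

```python
def resolve_user(headers):
    """Return the user id from X-WebAuth-User, or "anonymous" if absent."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-webauth-user" and value:
            return value
    return "anonymous"


print(resolve_user({"X-WebAuth-User": "jdoe"}))   # jdoe
print(resolve_user({"Accept": "*/*"}))            # anonymous
```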
Warning
Even if you do provide the authentication header, please note that most parts of Moira are readable and writable by every user, and there is no meaningful authorization in Moira. This is by design, because Moira is an internal DevOps tool. Separating users is a convenience, not a protection feature.
Example of Nginx Configuration¶
Assuming that Moira UI static files are in /var/www/moira and the API is running on port 8081:
server {
auth_basic "Moira";
auth_basic_user_file /etc/nginx/htpasswd;
listen 0.0.0.0:80 default_server;
location / {
root /var/www/moira;
index index.html;
try_files $uri $uri/ /index.html;
}
location /api/ {
proxy_pass http://127.0.0.1:8081;
proxy_set_header X-WebAuth-User $remote_user;
}
}
Look at auth_basic_module if you need more details of Nginx basic authentication.
Webhooks and Custom Scripts¶
When configuring Webhooks and Custom Scripts, note that
${contact_value}
is substituted with user input value from web UI.
It means that a malicious user can potentially run anything or make
arbitrary web requests on a server that runs Moira notifier.
Always make sure you can trust your users if you use ${contact_value}
templated parameter in your scripts or webhooks.
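If you cannot fully trust your users, one defensive option is to validate the value inside your own script before using it. The sketch below is an assumption, not a Moira feature: the allow-list pattern and the name is_safe_contact_value are hypothetical and should be adapted to what your contacts actually look like.

```python
import re

# Allow only a conservative set of characters, e.g. "#channel" or "PROJECT-1".
SAFE_VALUE = re.compile(r"[A-Za-z0-9_.#@-]{1,64}")


def is_safe_contact_value(value):
    """Accept the value only if every character is on the allow-list."""
    return SAFE_VALUE.fullmatch(value) is not None


print(is_safe_contact_value("#devops"))        # True
print(is_safe_contact_value("x; rm -rf /"))    # False
```

An allow-list is used instead of escaping because it rejects shell metacharacters, whitespace and URL separators in one place.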
User Guide¶
This user guide is based on a number of real-life scenarios, from simple and universal to complicated and specific.
Simple Threshold Trigger¶
Let’s say you measure how much free space is left on your HDD and store
this value as DevOps.my_server.hdd.freespace_mbytes
in Graphite.
Maybe you want to get an email when you have less than 50 GB left (it’s not
a big problem), and a Pushover notification when you have less than
1 GB left (you really need to delete something asap).
You can easily accomplish this by adding a trigger in Moira’s Simple Mode:

Graphite Target¶
You can specify a single metric like we did here: DevOps.my_server.hdd.freespace_mbytes.
You can also specify multiple metrics like DevOps.*.hdd.freespace_mbytes. All metrics will be monitored separately, and you will get separate notifications for each metric.
You can even use Graphite functions like movingAverage(DevOps.my_server.hdd.freespace_mbytes, 10). Moira understands everything that Graphite itself understands. See the appropriate documentation.
Thresholds¶
In simple mode you need to set at least one of the two threshold values: WARNING and ERROR.
In our example we set both. Lower values are bad, so we set the warning threshold greater than the error threshold. In this case, Moira will consider any value less than 50000 a warning and less than 1000 an error, which is what we want.
In other cases, you may need to consider large values a problem. Then you should make the error threshold greater than the warning threshold and select the Watch for value rising option.
See also
You can set only one threshold. For example, you can set WARNING equal to 50000, omit ERROR and select Watch for value falling.
In this case you will receive only WARNING messages when free space goes under 50GB and never receive ERROR messages.
You can also do vice versa: set ERROR and omit WARNING.
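The threshold rules in this section can be summarized in a short sketch. It is an illustration only: the function classify is hypothetical, and treating a value exactly equal to a threshold as not crossed is an assumption based on the "less than" wording above.

```python
def classify(value, warn=None, error=None, falling=True):
    """Map a metric value to OK, WARN or ERROR.

    falling=True  corresponds to "Watch for value falling" (low values are bad),
    falling=False corresponds to "Watch for value rising" (high values are bad).
    Either threshold may be omitted by passing None.
    """
    def crossed(threshold):
        if threshold is None:
            return False
        return value < threshold if falling else value > threshold

    if crossed(error):
        return "ERROR"
    if crossed(warn):
        return "WARN"
    return "OK"


# Free space example from this page: WARNING at 50000 MB, ERROR at 1000 MB.
print(classify(60000, warn=50000, error=1000))  # OK
print(classify(42000, warn=50000, error=1000))  # WARN
print(classify(500, warn=50000, error=1000))    # ERROR
```

The error threshold is checked first, so a value that crosses both thresholds is reported as ERROR.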
Tags¶
In Moira, you cannot subscribe to a single trigger. Instead, you should categorize your triggers by tags and subscribe to a tag. It may look like overkill here, but when you have dozens of triggers, you are much better off with tags, because you don’t have to enter your contact information over and over again. Tags also help to filter information on main screen:

You can add as many tags as you want.
Subscriptions¶
Proceed to the Setting Up Your Subscriptions page to learn how to set up a subscription to your trigger.
Setting Up Your Subscriptions¶
By now you should have at least one trigger saved. If you don’t, go back to the Simple Threshold Trigger page.
First, add some delivery channels:

If your Moira installation is configured with separate user accounts, you will see only your channels and subscriptions on this page. Otherwise, every user will see the same page with the same channels and subscriptions.
Consult Security page for instructions on separating user accounts.
Once you have at least one channel, you can create a subscription.
Press the + Add subscription button:

Plotting¶
Moira has two polling approaches:
- Local triggers are best for analyzing real-time metrics
- Remote triggers allow wider time windows, fetching historical data directly from Graphite
Accordingly, the time range of the graph attached to a notification depends on the trigger type:
- A notification based on events generated by a local trigger contains a graph with time series for the last 30 minutes, whether it was throttled or scheduled earlier because of the subscription’s own time limits.
- A notification based on events generated by a remote trigger contains a graph with time series covering at least 30 minutes before the last event occurred. Otherwise, the times of the first and last events form the window.

Tags¶
Add required tags into subscription to receive notifications from triggers with these tags.
Matching rule is: Notification will be sent if trigger contains ALL of selected tags.
For example:
If subscription has only one tag, you will receive notifications from any trigger with this tag.
Create Trigger1 with tags:
["DevOps", "Moira-duty"]
Create Trigger2 with tags:
["DevOps"]
Create Subscription1 with tags:
["DevOps"]
By using Subscription1 you will receive events for both Trigger1 and Trigger2.
If subscription has multiple tags, you will receive notifications only from triggers which include all these tags.
Create Subscription2 with tags:
["DevOps", "Moira-duty"]
By using Subscription2 you will receive events only for Trigger1
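The matching rule amounts to a subset test: a subscription fires when its tag set is contained in the trigger's tag set. A minimal Python sketch of that rule, using the tags from the example above (the function name subscription_matches is hypothetical, not part of Moira):

```python
def subscription_matches(trigger_tags, subscription_tags):
    """True if the trigger carries every tag selected in the subscription."""
    return set(subscription_tags) <= set(trigger_tags)


triggers = {
    "Trigger1": ["DevOps", "Moira-duty"],
    "Trigger2": ["DevOps"],
}
subscriptions = {
    "Subscription1": ["DevOps"],
    "Subscription2": ["DevOps", "Moira-duty"],
}

for sub_name, sub_tags in subscriptions.items():
    matched = [name for name, tags in triggers.items()
               if subscription_matches(tags, sub_tags)]
    print(sub_name, "receives events from", matched)
```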
Ignore Specific States Transitions¶
You can also reduce the number of notifications by ignoring unnecessary events. To do so, use the following check boxes:
Send notifications when triggers degraded only
Only the following state transitions will produce notifications:
- OK → WARN
- OK → ERROR
- OK → NODATA
- WARN → ERROR
- WARN → NODATA
- ERROR → NODATA
Do not send WARN notifications
The following state transitions will be ignored:
- OK → WARN
- WARN → OK
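Both check boxes can be read as filters over (old state, new state) pairs. The sketch below is an assumption-laden illustration: the severity order OK < WARN < ERROR < NODATA is inferred from the transition lists above, and the function and parameter names are hypothetical.

```python
# Severity order inferred from the documented transition lists (assumption).
SEVERITY = {"OK": 0, "WARN": 1, "ERROR": 2, "NODATA": 3}


def should_notify(old, new, degraded_only=False, ignore_warn=False):
    """Decide whether a state transition produces a notification."""
    # "Send notifications when triggers degraded only"
    if degraded_only and SEVERITY[new] <= SEVERITY[old]:
        return False
    # "Do not send WARN notifications": drop OK -> WARN and WARN -> OK
    if ignore_warn and {old, new} == {"OK", "WARN"}:
        return False
    return True


print(should_notify("OK", "ERROR", degraded_only=True))   # True
print(should_notify("ERROR", "OK", degraded_only=True))   # False
print(should_notify("WARN", "OK", ignore_warn=True))      # False
print(should_notify("WARN", "ERROR", ignore_warn=True))   # True
```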
Create and Test¶
You can just save your subscription, but if you want to be 100% sure it works, you should immediately test it. A dummy notification message will arrive shortly.
Efficient Triggers¶
To use Moira efficiently, you should understand its underlying design decisions.
We often notice that when new users create their first triggers, they set thresholds at random, or by intuition. It happens because when you configure your first 24/7/365 automated monitoring system, you don’t really know how your system works. If you have at least hundreds of metrics, it’s impossible to watch all of them with your eyes. What are the limits of your system? How often does your system reach critical resource consumption during a day? Should you immediately react when metric X reaches value N, or is it a fluctuation that passes by itself?
Later, when you learn to understand your system, you will need to adjust your triggers. And that’s when you need to understand Moira.
States¶
Unlike many other tools providing several distinct level systems like “priority” and “severity”, Moira supports a single set of states. Every state has a well-defined meaning, and you should use these states accordingly.
OK¶
This is a basic state, in which all your metrics must spend most of their time. Just like you keep your autotests green, you should keep your metrics green.
WARN¶
This state means that you should do something to prevent ERRORs in the future. Not immediately: maybe you should order more hardware from your vendor, or plan to optimize code in the next iteration. You can configure less intrusive delivery channels here, like email.
Metrics can be in this state for days or even weeks.
ERROR¶
This is a critical condition that requires immediate intervention. Your datacenter is on fire. All application processes shut down. There is no disk space left on your database server to process million-dollar transactions. These notifications are important enough to wake you up at night. You can still configure schedules to assign shifts to several engineers, though (see Schedules). You should configure more intrusive delivery channels here, like Pushover.
Metrics should not be in this state for more than several hours.
Moira will send you reminders every 24 hours if some of your metrics remain in this state.
If a delivery channel supports high-priority messages (like Pushover does), Moira will try to use them for ERRORs.
NODATA¶
This state means that Moira hasn’t been receiving data points for a metric for some time. See Dealing with NODATA for details. This state is considered as bad as an ERROR in Moira (because it can actually be an ERROR - we don’t receive any data, so we don’t know for sure). It may be even worse than an ERROR, because users tend to ignore metrics in this state and leave them hanging in the web interface, greatly increasing the chance to miss something actually important. You should delete old unused metrics from Moira when they stop providing data points:

In the beginning every metric is in this state. You will receive one NODATA → OK notification when the first data point arrives.
Moira will send you reminders every 24 hours if some of your metrics remain in this state.
Moira will set NODATA state only for known metrics - i.e. for metrics that have sent at least one data point to Moira.
EXCEPTION¶
This is an error inside Moira. Unless you have bad syntax in your Advanced Mode Trigger, this has nothing to do with your metric state. You should try to fix or update Moira, or contact Moira developers (see Contact Moira Developers).
Dealing With False Positives¶
Sometimes it’s hard to maintain strict rule of keeping your metrics green, if your triggers switch OK → ERROR → OK → ERROR for short periods of time several times a day. It can lead to alarm fatigue and missing actual failures.
There is no single recipe for eliminating false positives, but here are some tips.
Use Graphite Functions¶
Graphite provides tons of useful functions to process data, and Moira understands all of them. For example:
- If you are experiencing peaks on your graphs that lead to unnecessary state switches, you can alleviate these peaks with movingAverage or movingMedian.
- If you are interested in aggregate 10-minute values, not single minute values, use movingSum.
- If you want zeros instead of missing data points, use transformNull. Also, keepLastValue is useful when dealing with missing points.
- Avoid functions that show and hide metrics, like averageAbove. Moira does not consider hidden metrics to be in NODATA state. Instead, Moira retains the last state that the metric had when it was visible.
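To see why smoothing helps, consider a series with a single spike. The sketch below is a simplified stand-in for Graphite's movingAverage and movingMedian, not their actual implementation (for instance, it averages over a shorter window at the start of the series). The sample values and the threshold of 75 are made up.

```python
from statistics import median


def moving_average(values, window):
    """Average each point with up to window-1 preceding points."""
    result = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        result.append(sum(chunk) / len(chunk))
    return result


def moving_median(values, window):
    """Median of each point and up to window-1 preceding points."""
    return [median(values[max(0, i - window + 1): i + 1])
            for i in range(len(values))]


cpu = [40, 42, 41, 95, 43, 40]  # one short spike (made-up data)
threshold = 75

print(max(cpu) > threshold)                     # raw series crosses the threshold
print(max(moving_average(cpu, 3)) > threshold)  # smoothed series does not
print(moving_median(cpu, 3))
```

The median suppresses an isolated spike completely, while the average only dampens it; which one fits depends on whether short spikes matter for the metric.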
Draw First, Monitor Later¶
Always draw a graph of the target(s) you are planning to monitor. Use the generic Graphite web interface or something like Grafana. Look for minimum and maximum values. Notice how often and for how long the graph crosses your planned thresholds. Try to correlate the graph with previous system failures. Then, copy and paste the corrected target to Moira.
Of course, you may and should remove any functions that make no sense in Moira (like sortByTotal) or can generate unwanted side effects (like averageAbove).
Schedules¶
Moira provides two ways of defining allowed time intervals for notifications.
Subscription Schedule¶
If a metric is not that important to wake you up in the middle of the night, you can set a schedule for subscription:

Notifications generated by this subscription will arrive only on weekdays, from 08:00 to 17:59 local time.
If an event happens on weekend, you will receive a notification at 08:00 on Monday. So notifications are not skipped, you just receive them later. Events will still appear on the event history page at the time when they happened (see Current State and Event History).
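The delaying behavior can be sketched as a function that maps an event time to a delivery time. This is only an illustration of the schedule described above (weekdays, 08:00 to 17:59), not Moira's scheduler. The names are hypothetical and naive local datetimes are assumed.

```python
from datetime import datetime, time, timedelta

ALLOWED_DAYS = {0, 1, 2, 3, 4}         # Monday..Friday
START, END = time(8, 0), time(17, 59)  # allowed interval, minute precision


def delivery_time(event_time):
    """Return event_time if it is allowed, else the start of the next allowed interval."""
    minute = event_time.time().replace(second=0, microsecond=0)
    if event_time.weekday() in ALLOWED_DAYS:
        if START <= minute <= END:
            return event_time                                # deliver immediately
        if minute < START:
            return datetime.combine(event_time.date(), START)  # later the same day
    # Too late today, or a weekend: move to the next allowed day.
    day = event_time.date() + timedelta(days=1)
    while day.weekday() not in ALLOWED_DAYS:
        day += timedelta(days=1)
    return datetime.combine(day, START)


# 2020-02-01 is a Saturday, so delivery moves to Monday 08:00.
print(delivery_time(datetime(2020, 2, 1, 13, 0)))
```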
Trigger Watch Time¶
Let’s say, you have a popular website, that serves over 1000 page views per second during a day. You can set up a trigger to notify you when you have less than 50 page views per second - obviously, something is wrong. You also need to disable this trigger for the night, because at night all of your users sleep, and this metric is irrelevant.
Of course, you can set up a subscription schedule - but your history will become riddled with false night “events”, and you will still receive notifications in the morning. In this case, you need to set up a trigger watch time:

No events will be recorded for this trigger outside of watch time - you will receive no notifications, and the event history page will be empty (see Current State and Event History).
Current State and Event History¶
By clicking on a saved trigger, you can see current state and event history of this trigger.
Current State¶
Moira shows current state, current value and time of last event for every separate metric that matches the trigger.

Event History¶
On this tab you can see a chronologically sorted list of events for each separate metric. Each event includes time, old and new values. Please, note that the left (old) value is taken from the previous event, and does not represent metric value just before the event.

Throttling¶
Throttling is a distinctive and controversial feature of Moira. If you are experiencing a delay or any other strange behavior of notifications, chances are, it is because of throttling.
To understand throttling, imagine two triggers:
- Send notification if CPU load on any of your servers is more than 75%.
- Send notification if there is a fire in your server room.
It is a busy day, your servers are overloaded, and you are receiving a ton of notifications about CPU load. Probably, you already have several dozens of notifications in your inbox. You will likely delete all of them at once, and you probably won’t notice that one of these hundreds of letters was about a fire in your server room.
So, the problem is: one misconfigured trigger spoils everything by spamming your inbox with irrelevant notifications. Moira provides a protection mechanism called throttling. Simple rules:
- If a trigger sends more than 10 notifications per 1 hour, limit this trigger to 1 message per 30 minutes.
- If a trigger sends more than 20 notifications per 3 hours, limit this trigger to 1 message per 1 hour.
It works like this:
- First notification is delivered immediately.
- Second notification is delivered immediately.
- …
- Tenth notification is delivered immediately, and you get a warning: “Please, fix your system or tune this trigger to generate less events.”
- Next notifications are delayed so that you receive one message per 30 minutes/1 hour. Nothing is lost, you just receive one message with a pack of events. Every message contains a warning: “Please, fix your system or tune this trigger to generate less events.”
Moira will enable and disable throttling automatically based on frequency of events.
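The two throttling rules can be expressed as a small function over the timestamps of recent notifications. This is a sketch of the rules as stated, not Moira's code: the function name is hypothetical, timestamps are plain seconds, and applying the stricter one-hour limit when both rules match is an assumption.

```python
HOUR = 3600


def throttle_interval(sent_timestamps, now):
    """Return the minimum number of seconds between messages (0 means no throttling)."""
    last_hour = sum(1 for t in sent_timestamps if now - t <= HOUR)
    last_three_hours = sum(1 for t in sent_timestamps if now - t <= 3 * HOUR)
    if last_three_hours > 20:
        return HOUR          # at most one message per hour
    if last_hour > 10:
        return HOUR // 2     # at most one message per 30 minutes
    return 0


ten = [i * 60 for i in range(10)]       # 10 notifications, one per minute
eleven = [i * 60 for i in range(11)]    # the 11th exceeds the hourly limit
print(throttle_interval(ten, now=600))     # 0
print(throttle_interval(eleven, now=600))  # 1800
```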
Disabling Throttling¶
There are four ways to disable throttling for a specific trigger:
- Obey the warning message. That is, fix your system to generate fewer events. Or change trigger thresholds. Or use Graphite functions like movingAverage to remove spikes from your metric graph. This is the best method to deal with throttling.
- Enable Maintenance mode for some of your metrics. This will temporarily disable checking of a metric and give you time to fix the system:

- Manually reset throttling for your trigger. This basically means that you’ve fixed the system and would like to resume operation normally. It won’t help if your trigger is still spamming notifications:

- Entirely disable throttling for a subscription. This is not recommended, unless you really know what you are doing:

Dealing with NODATA¶
If you have a simple trigger (like the one described in Simple Threshold Trigger), you probably know what happens when a metric has a very high or a very low value. Free disk space is too low? You get a notification.
But what if your metric has no value? Literally, what if Moira is not receiving any data for your metric? How can you know, whether you have enough disk space left or not? In this case, a trigger setting defines the behavior:

When Moira hasn’t received data for a metric for longer than the configured delay (600 seconds by default), it will set a special NODATA state for this metric. You can choose any other state or change the delay here. For example, if you have an error metric, and no data means no errors, you should set this to OK.
Note
Checkbox Mute new metrics notifications defines whether Moira should notify you about new metrics or mute those notifications.
If the box is unchecked, Moira will send you a NODATA → OK event for every new metric in the trigger.
Muting notifications about new metrics can be useful if you have a trigger with lots of metrics in it.
You can also select DEL here to automatically delete all metrics that no longer provide data. A simple use case is when you often rename metrics and Moira quickly becomes flooded with old irrelevant metric names.
Warning
DEL is a dangerous setting, you can easily miss a real notification if your system stops sending metric data.
You will receive notifications when your metric goes in and out of NODATA state, just like any other state.
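The setting described in this section boils down to a timeout check. The sketch below uses hypothetical names (ttl_action, ttl, ttl_state) and plain Unix timestamps. It only illustrates the decision, including the DEL option, and is not Moira's implementation.

```python
def ttl_action(last_data_timestamp, now, ttl=600, ttl_state="NODATA"):
    """Return the state to apply when data is missing, or None if data is fresh.

    ttl_state may be any state ("NODATA", "OK", "WARN", "ERROR") or "DEL",
    which stands for removing the metric from the trigger.
    """
    if now - last_data_timestamp <= ttl:
        return None  # data arrived recently enough, nothing to do
    return ttl_state


print(ttl_action(last_data_timestamp=0, now=300))                    # None
print(ttl_action(last_data_timestamp=0, now=601))                    # NODATA
print(ttl_action(last_data_timestamp=0, now=601, ttl_state="OK"))    # OK
print(ttl_action(last_data_timestamp=0, now=601, ttl_state="DEL"))   # DEL
```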
Advanced Mode Trigger¶
Sometimes a simple trigger (Simple Threshold Trigger) doesn’t provide enough flexibility for your task.
For example, you may want to receive a notification when 5% of user requests take up more than a second to process, but only if there are more than 100 requests per minute. Usually, you will have two separate metrics for this:
- Nginx.requests.process_time.p95 - 95th percentile of request processing time in milliseconds
- Nginx.requests.count - request count per minute
Maybe you can construct a monstrous Graphite expression to reflect this combination, but Moira’s Advanced Mode is better:

You can use any govaluate expression with predefined constants here:
- t1, t2, … are values from your targets
- OK, WARN, ERROR, NODATA are states that must be the result of evaluation
- PREV_STATE is equal to previously set state, and allows you to prevent frequent state changes
Note
Only T1 target can resolve into multiple metrics in Advanced Mode. T2, T3, … must resolve to single metrics. Moira will calculate expression separately for every metric in T1.
Any incorrect expressions or bad syntax will result in EXCEPTION trigger state.
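The decision logic for the request latency example can be modeled in Python as shown below. This is not govaluate syntax and not what Moira executes. It only illustrates how t1, t2 and PREV_STATE combine. The 800 ms recovery threshold used for hysteresis is a made-up value.

```python
def evaluate(t1, t2, prev_state):
    """t1: 95th percentile of processing time in ms, t2: requests per minute."""
    if t2 <= 100:
        return "OK"      # too little traffic for the percentile to be meaningful
    if t1 > 1000:
        return "ERROR"   # more than 5% of requests take over a second
    # Hysteresis (assumption): once in ERROR, stay there until latency
    # drops below a lower recovery threshold, to avoid flapping.
    if prev_state == "ERROR" and t1 > 800:
        return "ERROR"
    return "OK"


print(evaluate(t1=1200, t2=500, prev_state="OK"))     # ERROR
print(evaluate(t1=1200, t2=50, prev_state="OK"))      # OK
print(evaluate(t1=900, t2=500, prev_state="ERROR"))   # ERROR
print(evaluate(t1=900, t2=500, prev_state="OK"))      # OK
```

Checking the previous state is what keeps a value hovering around 1000 ms from producing a new event on every check.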
Templates¶
Moira supports Go templates (the text/template package), which implement data-driven templates for generating textual output. For information about how to program the templates themselves, see the text/template documentation.
Data you can use:
Example:
https://grafana.yourhost.com/some-dashboard{{ range $i, $v := .Events }}{{ if ne $i 0 }}&{{ else }}?{{ end }}var-host={{ $v.Metric }}{{ end }}
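To make the output of this template concrete, the same loop can be written in Python: the first event contributes ?var-host=… and each later event contributes &var-host=…. The function name and the metric names below are hypothetical.

```python
def dashboard_url(base, metrics):
    """Append one var-host query parameter per event metric, as the template does."""
    parts = []
    for index, metric in enumerate(metrics):
        separator = "?" if index == 0 else "&"
        parts.append(f"{separator}var-host={metric}")
    return base + "".join(parts)


url = dashboard_url(
    "https://grafana.yourhost.com/some-dashboard",
    ["web1", "web2"],  # stand-ins for the Metric field of each event
)
print(url)
# https://grafana.yourhost.com/some-dashboard?var-host=web1&var-host=web2
```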
Data source¶
If Remote Triggers Checker is enabled, you can choose between following Data Sources:
Redis - Moira database. By default Redis stores data for only several hours. It covers most use cases where you need real-time alerting.
Graphite - remote Graphite-like HTTP API. It should be used only when you need to get metrics for a long period.
Warning
Please, use this Data Source with caution. It may cause extra load on Graphite HTTP API.
Important
Please, keep in mind that functions in Remote and Local triggers can work differently. To avoid this, make sure you use Carbonapi with the same revision as in Moira. The latest Carbonapi revision is listed in the changelog.
Maintenance¶
Maintenance is a proper way to mute alerting on specific metrics or triggers. It can be useful during planned work. E.g., you are going to move server from one data center to another and don’t want Moira to disturb you.

Examples¶
When you switch a metric or trigger into maintenance, Moira will mute all state changes during that period. After maintenance ends, you will receive a notification for every metric whose state differs from its state before maintenance.
Example 1. Maintenance metric, alert will not be sent¶
- metric awesomeMetric1 is in OK state;
- Rick switches metric into maintenance for an hour;
- within the hour metric changes its state several times: OK → WARN, WARN → ERROR, ERROR → OK;
- after one-hour maintenance ends, metric is in OK state;
- Moira checks if metric state changed during maintenance:
  - awesomeMetric1 state before maintenance: OK;
  - awesomeMetric1 state after maintenance: OK;
- nothing to notify about: the state remained the same as it was before the maintenance period.
Example 2. Maintenance metric, alert will be sent¶
- metric awesomeMetric2 is in OK state;
- Rick switches metric into maintenance for an hour;
- within the hour metric changes its state several times: OK → WARN, WARN → ERROR, ERROR → OK, OK → ERROR;
- after one-hour maintenance ends, metric is in ERROR state;
- Moira checks if metric state changed during maintenance:
  - awesomeMetric2 state before maintenance: OK;
  - awesomeMetric2 state after maintenance: ERROR;
- Moira sends message to user: the state has changed from that which was before the maintenance period.
Example 3. Maintenance trigger, alert will be sent¶
- metric awesomeMetric1 is in WARN state;
- metric awesomeMetric2 is in OK state;
- Rick switches the trigger with these metrics into maintenance for an hour;
- within the hour metric awesomeMetric2 changes its state several times: OK → WARN, WARN → ERROR, ERROR → OK, OK → ERROR;
- after one-hour maintenance ends, awesomeMetric2 is in ERROR state;
- Moira checks if metric state changed during maintenance:
  - awesomeMetric1 state before maintenance: WARN;
  - awesomeMetric1 state after maintenance: WARN;
  - awesomeMetric2 state before maintenance: OK;
  - awesomeMetric2 state after maintenance: ERROR;
- Moira sends message about awesomeMetric2 metric to user: the state has changed from that which was before the maintenance period.
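All three examples follow one rule: intermediate transitions are ignored, and only the states before and after maintenance are compared per metric. A minimal sketch of that comparison, with a hypothetical function name and plain dictionaries standing in for Moira's stored states:

```python
def changed_after_maintenance(before, after):
    """Return {metric: (old_state, new_state)} for metrics whose state differs."""
    return {
        metric: (old_state, after[metric])
        for metric, old_state in before.items()
        if after[metric] != old_state
    }


# Example 3: only awesomeMetric2 ends up in a different state.
before = {"awesomeMetric1": "WARN", "awesomeMetric2": "OK"}
after = {"awesomeMetric1": "WARN", "awesomeMetric2": "ERROR"}
print(changed_after_maintenance(before, after))
# {'awesomeMetric2': ('OK', 'ERROR')}
```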
Self State Monitor¶
Self State Monitor is a built-in mechanism designed to protect end users from false NODATA notifications and to notify the administrator about issues in Moira and/or Graphite systems.
Why Self State Monitor¶
It is possible for the Graphite Relay, Redis DB or Moira-Filter service to break down. As a result, Moira doesn’t receive any metrics from Graphite and has no data on which it could check the state of the triggers. According to the Moira logic, it should switch triggers to NODATA state and send alert messages to users.
To handle this situation properly, we recommend turning on the Self State Monitor. In this case, Moira will prevent itself from sending alert messages to end users but notify administrators of the existing problem.
Warning
When Self State Monitor detects a problem, it disables all notifications to end users and does not turn them back on without manual intervention.
Please, read this manual before using Self State Monitor in production.
See also
For a better understanding, look at the architecture of the Moira microservices.
When Self State Monitor Helps¶
Self State Monitor checks for these situations:
- If there is no connection between Moira and Redis for longer than redis_disconect_delay.
- If Moira-Filter receives no metrics for longer than last_metric_received_delay.
- If Moira-Checker checks no triggers for longer than last_check_delay.
See also
All the above configuration parameters can be found in the Moira-Notifier section on configuration page.
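The three checks share one shape: compare the time since the last healthy observation with a configured delay. The sketch below reuses the parameter names from the list above. Everything else (the function name, the delay values in seconds, the problem descriptions) is hypothetical.

```python
def self_state_problems(now, last_redis_ok, last_metric_received, last_check, config):
    """Return a list of detected problems; an empty list means Moira looks healthy."""
    problems = []
    if now - last_redis_ok > config["redis_disconect_delay"]:
        problems.append("no connection to Redis")
    if now - last_metric_received > config["last_metric_received_delay"]:
        problems.append("Moira-Filter receives no metrics")
    if now - last_check > config["last_check_delay"]:
        problems.append("Moira-Checker checks no triggers")
    return problems


# Delay values in seconds are made up for the example.
config = {
    "redis_disconect_delay": 60,
    "last_metric_received_delay": 120,
    "last_check_delay": 120,
}
print(self_state_problems(now=1000, last_redis_ok=990,
                          last_metric_received=700, last_check=995,
                          config=config))
# ['Moira-Filter receives no metrics']
```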
How Self State Monitor Works¶
When you turn Self State Monitor on, it works this way:
Self State Monitor checks Moira state every 10 seconds.
Something breaks down. It can be Graphite-Relay, the connection to Redis DB or a crashed Moira-Filter docker container.
Self State Monitor sends an alarm message to the administrator with a description of the issue.
Here is an example of the message:
Self State Monitor turns the Moira-Notifier service off, switching it to ERROR state.
Note
When Moira-Notifier switches to ERROR state, it mutes all messages to end users and only alerts administrators about Moira health issues. You need to fix existing problems and then manually switch Moira-Notifier back to OK using API.
When Moira-Notifier is not in OK state, Moira will show you an error in Web UI:
Turn Moira Notifier On and Off¶
You can reveal the current Moira-Notifier state or change it on a hidden /notifications page.

Warning
Please, note this toggle changes Moira-Notifier state, not user notifications preferences.
When you disable notifications with this toggle, Moira-Notifier stops sending messages to all users.
Development¶
All services use Redis database to store and exchange data. Therefore, it is important to maintain an accurate description of data storage formats and conventions.
Following topics describe database structure, running tests, developing notification plugins, etc.
Architecture¶
Terminology¶
Pattern¶
A Graphite pattern is a single dot-separated metric name, possibly containing one or more wildcards.
Examples:
server.web*.load
server.web{1,2,3}.load
server.web1.load
Target¶
A Graphite target is one or more patterns, possibly combined using Graphite functions.
Examples:
averageSeries(server.web*.load)
Metric¶
A metric is a single time-series that is a result of parsing some Graphite target.
Some targets produce a single metric, for example:
server.web1.load
highestCurrent(server.web*.load)
Some targets produce several metrics, for example:
movingAverage(server.web*.load, 10)
State¶
Moira stores separate state for every metric. Each metric can be in only one state at any moment:
Trigger¶
Trigger is a configuration that tells Moira which metrics to watch for. Triggers consist of:
- Name. This is just for convenience, user can enter anything here.
- Description. Longer text that gets included in notification to delivery channels that support long texts.
- One or more targets.
- WARN and ERROR value limits, or a govaluate expression to calculate state.
- One or more tags.
- TTL value. When new data doesn’t arrive for TTL seconds, the metric will switch to the State set by the user.
- Check schedule. For example, a trigger can be set to check only during business hours.
Last Check¶
When Moira checks a trigger, it stores the following information on each metric:
- Current value.
- Current timestamp.
- Current state.
Trigger Event¶
When Moira checks a trigger, if any of the metric states change, Moira generates an event. Events consist of:
- Trigger ID.
- Metric name (as given by parsed target).
- New state.
- Previous state.
- Current timestamp.
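Event generation can be pictured as a diff between the last check and the newly computed states. The sketch below uses plain dictionaries with hypothetical field names. It is not Moira's storage format. Treating a metric that was never checked as NODATA follows the description of the NODATA state earlier in this documentation.

```python
def generate_events(trigger_id, last_check, new_states, timestamp):
    """Compare stored metric states with new ones and emit one event per change."""
    events = []
    for metric, new_state in new_states.items():
        old_state = last_check.get(metric, "NODATA")  # unseen metrics start as NODATA
        if old_state != new_state:
            events.append({
                "trigger_id": trigger_id,
                "metric": metric,
                "state": new_state,
                "old_state": old_state,
                "timestamp": timestamp,
            })
    return events


last_check = {"server.web1.load": "OK", "server.web2.load": "OK"}
new_states = {"server.web1.load": "OK", "server.web2.load": "ERROR",
              "server.web3.load": "OK"}
for event in generate_events("trigger-1", last_check, new_states, 1580000000):
    print(event["metric"], event["old_state"], "->", event["state"])
```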
Tags¶
Tags are simple string markers for grouping of triggers and configuring subscriptions.
Subscription¶
Moira generates notifications for an event only if trigger tags match any of the user-created subscriptions. Each subscription consists of:
- One or more tags.
- Contact information.
- Quiet time schedule.
Dataflow¶
Filter and Check Incoming Metrics¶
When a user adds a new trigger, Moira parses patterns from targets and saves them to the moira-pattern-list key in Redis. Filter rereads this list every second.
When a metric value arrives, Filter checks the metric name against the list of patterns. Matching metrics are saved to moira-metric:<metricname> keys in Redis. The Redis pub/sub mechanism is used to inform Checker of an incoming metric value that should be checked as soon as possible.
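The filtering step can be illustrated by translating each pattern into a regular expression and testing incoming metric names against it. This is a simplified sketch, not the matching algorithm Moira-Filter uses. It handles only the * and {a,b,c} wildcards shown in the Pattern examples, and the function names are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Translate a Graphite pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        char = pattern[i]
        if char == "*":
            parts.append("[^.]*")             # a wildcard never crosses a dot
        elif char == "{":
            end = pattern.index("}", i)       # raises ValueError if unbalanced
            options = pattern[i + 1:end].split(",")
            parts.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
            i = end
        else:
            parts.append(re.escape(char))
        i += 1
    return re.compile("".join(parts))


def matching_patterns(metric_name, patterns):
    """Return the patterns that the incoming metric name matches."""
    return [p for p in patterns if pattern_to_regex(p).fullmatch(metric_name)]


patterns = ["server.web*.load", "server.web{1,2,3}.load", "server.web1.load"]
print(matching_patterns("server.web2.load", patterns))
print(matching_patterns("server.db1.load", patterns))
```

A real filter would compile the patterns once when the list is reread rather than on every incoming metric.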
Checker metrics handler reads triggers by pattern from moira-pattern-triggers:<pattern> and adds trigger_id to the Redis set moira-triggers-to-check. NODATA Checker adds all triggers to the Redis set moira-triggers-to-check once per nodata_check_interval setting.
Remote Triggers Checker gets all remote trigger IDs and adds them to the Redis set moira-remote-triggers-to-check once per remote_check_interval setting.
Checker pops trigger_id from moira-triggers-to-check and starts the checking procedure. Remote Triggers Checker does the same, but pops trigger_id from moira-remote-triggers-to-check and starts a remote check, which involves the remote Graphite HTTP API.
A trigger target can contain one or multiple metrics, so results are written per metric. The moira-metric-last-check:<trigger_id> Redis key contains the last check JSON with metric states.
When a metric changes its state, a new event is written to the moira-trigger-events Redis key. This happens only if the value timestamp falls inside the time period allowed by the trigger schedule.
If a metric has been in NODATA or ERROR state for a long period, every 24 hours Moira will issue an additional reminder event.
Trigger switches to EXCEPTION state, if any exception occurs during trigger checking.
Process Trigger Events¶
Notifier constantly pulls new events from the moira-trigger-events Redis key and schedules notifications according to subscription schedule and throttling rules. If a trigger contains all of the tags in a subscription, and only in this case, a notification is created for this subscription.
Subscription schedule delays notifications of occurred event to the beginning of next allowed time interval. Note that this differs from trigger schedule, which suppresses event generation entirely.
Throttling rules will delay notifications:
- If there are more than 10 events per hour, a notification will be sent at most once per 30 minutes.
- If there are more than 20 events per 3 hours, a notification will be sent at most once per hour.
Scheduled notifications are written to the moira-notifier-notifications Redis key.
Process Notifications¶
Notifier constantly pulls scheduled notifications from the moira-notifier-notifications Redis key. It calls the sender for the corresponding contact type and writes the notification back to Redis in case of a sender error.
UI¶
UI is a static web application built with RetailUI based on React.
Install dependencies.
yarn build
Starts dev server on port 9000. You’ll have to run yarn fakeapi in a separate terminal to provide mock API data. Mock API server starts on port 9002.
yarn start --env.API=local
Starts dev server with a proxy to your API service. Make sure you set up a local Moira API service and add its URL to webpack.config.js in the devServer.proxy block.
yarn storybook
Starts Storybook on port 9001.
yarn lint
ESLint check. Recommended to run before commit.
yarn flow
Starts Flow server for checking types. You can also run yarn flow.status for status, yarn flow.check for an error report, and yarn flow.coverage.html to export an HTML report with a cute UI.
Backend¶
Project Setup¶
Backend microservices are written in Go (with module support; the minimum required Go version is 1.11). This is how you can get started writing your own:
- Create a fork of the project from GitHub
- Create a base directory to check out the project into, e.g. /Development/go/moira/
- Set your GOPATH to that directory and change into it
export GOPATH=/Development/go/moira/
cd $GOPATH
- Get GoConvey for tests
go get github.com/smartystreets/goconvey
- Export your GOPATH’s bin directory
export PATH=$PATH:$GOPATH/bin
- Check out your Moira clone into the root of the GOPATH
git clone https://github.com/<your username>/moira $GOPATH/src/github.com/moira-alert/moira
- Go into $GOPATH/src/github.com/moira-alert/moira
- Launch GoConvey
goconvey
You are now ready to start hacking. GoConvey will keep running in the background, scanning for new tests to execute. If you don’t want to run the entire suite, start GoConvey from the root of the directory you will be developing your code in.
Alternatively you can run a quick test on the terminal with
go test -v
Writing Your Own Notification Sender¶
First, look at built-in senders:
- senders/slack
- senders/pushover
- senders/mail
All of them implement interface Sender from interfaces.go.
Please, note that scheduling and throttling require senders to support packing several events into one message.
You should include your new sender in the RegisterSenders method of notifier/registrator.go with the appropriate type.
Senders have access to their settings in common config, which is passed to the Init method.
Contact Moira Developers¶
The best way to contact us is to visit our Telegram chat. We usually reply within a day, but sometimes immediately :)
Google Summer of Code¶
Here is the ideas page for Google Summer of Code 2020.
We encourage interested students to contact mentors to discuss these ideas or propose new ones. The Moira team would appreciate your contributions.
About Moira¶
Moira is a realtime alerting system based on Graphite data.
Its key features are:
- storage independence
- simple and advanced trigger syntax
- tags for triggers and subscriptions
- extendable notification channels
- alarm fatigue protection
See overview for more.

Moira is written in Go, the web UI is written in JavaScript.
The source code is licensed under MIT.
Mentors¶
- Alexey Kirpichnikov (beevee)
- Arkady Borovsky (borovskyav)
- Emil Sharifullin (litleleprikon)
- Andrey Kolkov (androndo)
- Sorokin Vladimir (sorovlad)
Ideas¶
Health checks for delivery channels and contacts¶
Explanation. Moira’s users are able to set up new delivery channels and contacts to be used with those channels. However Moira doesn’t check if the channel configuration is valid and alerts can be actually sent. A user may provide a non-existent Slack user name, block Moira’s bot in Telegram, etc. As a result, such user wouldn’t be able to receive alerts. The bad thing is that sometimes invalid configuration would cause Moira’s bots to be banned for a certain period of time. This effectively means a denial-of-service for alerts which is highly undesirable.
The aim of this project is to implement health checks when delivery channels and contacts are set up. To do so, one should enhance the delivery channel and contact setup flow: send a test alert, verify that it’s received, and refuse to save the configuration if verification fails. Certain modifications of the web UI may be required.

Code reference. See contact API source code and subscription API source code.
Required skills. Go skills to add health checks, a bit of JavaScript and React to tune the web UI.
Expected outcome. Health checks are implemented and released.
Mentors. Arkady Borovsky (borovskyav@kontur.ru), Emil Sharifullin (e.sharifullin@kontur.ru).
RESTify Moira’s API¶
Explanation. Moira is designed to be an API-first solution, and all alerting setup must be possible via the HTTP API. Unfortunately, Moira’s API does not currently follow all the principles of REST. Some HTTP methods are not used correctly, and some URL paths do not describe resources properly. Additionally, some endpoints return data whose schema is overcomplicated and contains wrong attributes. A good solution for this type of issue would be to use the JSON API standard.
The aim of this project is to identify the API methods that do not follow these principles and change them according to the REST and JSON API principles.
Code reference. See Moira’s API source code and the OpenAPI description source code.
Required skills. General Go skills. Familiarity with OpenAPI or Swagger. Knowledge of REST and JSON API principles.
Expected outcome. Moira’s API is RESTful. OpenAPI description is up to date. Moira’s frontend is changed to follow the API changes (optional).
Mentors. Emil Sharifullin (e.sharifullin@kontur.ru), Andrey Kolkov (androndo@kontur.ru).
Moira’s Business Metrics¶
Explanation. Moira is a large and complicated piece of software that operates on a huge amount of data. Sometimes, for statistics and troubleshooting, we need metrics that tell us more precisely how much load Moira is carrying. Examples of such metrics are: the number of triggers with tagged metrics, the number of triggers with a huge number of metrics, the number of triggers with and without subscriptions, etc.
To achieve this goal we can create a new microservice that collects this data from storage and exports it to Graphite, or add these metrics to existing services.
Code reference. See Moira’s backend source code.
Required skills. Go skills. Familiarity with graphite would be a plus.
Expected outcome. A new service that exports business metrics is created, or business metrics export is added to existing services.
Mentors. Alexey Kirpichnikov (alexkir@kontur.ru), Andrey Kolkov (androndo@kontur.ru).
Complete Moira’s mobile web version¶
Explanation. To provide the best user experience, Moira’s web UI was developed with care and is meant to be as minimalistic and laconic as possible. But there are still pages that do not look good in the mobile version of the web interface. Examples of such pages are:
- Main page and navigation on it
- Subscriptions page
- Trigger page and trigger edit pages
- Teams page
The aim of this project is to add these pages to the mobile version of Moira’s web UI and build a UI with the best user experience in a mobile environment.
Code reference. See Moira’s web UI source code.
Required skills. JavaScript, TypeScript and React skills. Knowledge of UX will be a plus.
Expected outcome. Moira’s mobile web UI allows users to use the pages listed above.
Mentors. Sorokin Vladimir (v_sorokin@kontur.ru), Arkady Borovsky (borovskyav@kontur.ru).
Noisy trigger analysis tools¶
Explanation. On-call engineers are badly affected by noisy triggers that generate alerts multiple times a day. Attention to alerts drops sharply, and the chance of missing an important alert grows. One badly configured flapping trigger can affect the entire workflow. Our documentation contains an entire page dedicated to this problem with some tips on mitigation. But we can do more.
The aim of this project is to help Moira users identify noisy triggers. To do so, one should research and define a metric of trigger noisiness, and then create a UI page that shows the worst triggers to the user.
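One possible starting point for a noisiness metric, shown here only as a sketch, is the number of state changes per trigger within a time window. The `StateEvent` and `TriggerNoise` types and the `NoisiestTriggers` function are hypothetical; defining a better metric is part of the project itself.

```go
package main

import (
	"fmt"
	"sort"
)

// StateEvent is a hypothetical record of a single trigger state change.
type StateEvent struct {
	TriggerID string
	Timestamp int64 // Unix time in seconds
}

// TriggerNoise pairs a trigger with the number of its state changes.
type TriggerNoise struct {
	TriggerID string
	Changes   int
}

// NoisiestTriggers counts state changes per trigger within [from, to]
// and returns the triggers ordered from most to least noisy.
// Ties are broken by trigger ID to keep the order deterministic.
func NoisiestTriggers(events []StateEvent, from, to int64) []TriggerNoise {
	counts := make(map[string]int)
	for _, e := range events {
		if e.Timestamp >= from && e.Timestamp <= to {
			counts[e.TriggerID]++
		}
	}

	result := make([]TriggerNoise, 0, len(counts))
	for id, n := range counts {
		result = append(result, TriggerNoise{TriggerID: id, Changes: n})
	}
	sort.Slice(result, func(i, j int) bool {
		if result[i].Changes != result[j].Changes {
			return result[i].Changes > result[j].Changes
		}
		return result[i].TriggerID < result[j].TriggerID
	})
	return result
}

func main() {
	events := []StateEvent{
		{"flapping", 100}, {"flapping", 160}, {"flapping", 220},
		{"stable", 130},
	}
	for _, t := range NoisiestTriggers(events, 0, 3600) {
		fmt.Printf("%s: %d state changes\n", t.TriggerID, t.Changes)
	}
}
```

A plain count treats every state change equally; a more refined metric might weight changes that actually produced notifications, or take tags into account.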
Code reference. See Moira’s backend source code and Moira’s web UI source code.
Required skills. Basic Go and JavaScript skills.
Expected outcome. Moira’s web UI allows users to see a list of noisy triggers, optionally filtered by tags.
Mentors. Alexey Kirpichnikov (alexkir@kontur.ru), Emil Sharifullin (e.sharifullin@kontur.ru).
Done in previous years¶
Warning
The following projects are no longer available.
OpenAPI description of Moira’s API¶
Done in 2020 by Michael Okoko.
Explanation. Moira’s web UI is nice and widely used. However, users don’t always want to create triggers, subscriptions, and contacts manually. They would like to be able to automate routine tasks with tools like Ansible, which they already use to bootstrap database and application clusters. For this kind of automation, Moira should have a well-documented API and a number of client libraries for all popular languages. At this point, Moira doesn’t have any API documentation. To use the API, one should study Moira’s source code or an existing client library’s source code to understand how the API works and reverse-engineer the contracts of its methods.
The aim of this project is to provide always up-to-date documentation of Moira’s API and a few client libraries. To do so, one should create an OpenAPI description of the API, generate a number of client libraries for popular programming languages with Swagger tools, and set up a process so that the documentation and the clients are updated when a new API version is released.
Code reference. See Moira’s API source code and the Python client library source code.
Required skills. General Go or Python skills. Familiarity with OpenAPI or Swagger would be a plus.
Expected outcome. Moira’s documentation has a link to human-readable API documentation. Client libraries are released (not required). There’s a process in place to update the documentation and the clients on API changes.
Mentors. Emil Sharifullin (e.sharifullin@kontur.ru), Alexey Kirpichnikov (alexkir@kontur.ru).
Flow to TypeScript migration¶
Done in 2020 by Gilevich Petr.
Explanation. Nowadays, Moira’s web UI is written in JavaScript and Flow is used as a type checker. Although we love Flow dearly, TypeScript is adopted widely and has a bigger community. This makes TypeScript a better choice for Moira’s web UI development.
The aim of this project is to migrate Moira’s web UI source code from Flow to TypeScript. To do so, one should analyze the code base, propose a migration strategy, actually rewrite the code, and change the build process if needed.
Code reference. See Moira’s web UI source code.
Required skills. JavaScript and TypeScript skills. Familiarity with Flow would be a plus.
Expected outcome. Moira’s web UI source code is migrated to TypeScript. A new major version of Moira’s web UI is released.
Mentors. Alexey Kirpichnikov (alexkir@kontur.ru), Nikolay Kudrin (n.kudrin@kontur.ru).
Support for additional delivery channels¶
Done in 2019 by Aswin.
Explanation. Moira supports a number of delivery channels such as email, Slack, Telegram, etc. to inform users that a certain trigger was activated (see Setting Up Your Subscriptions).
The aim of this project is to provide support for a number of additional delivery channels. To do so, one should talk to the community and research possible channels to be added, contribute corresponding senders, and tune the web UI to allow users to create subscriptions using the new channels.

Code reference. See email sender source code or Pushover sender source code.
Required skills. Go skills to add senders, a bit of JavaScript and React to tune the web UI.
Expected outcome. Some qualitative or quantitative data on channel popularity is collected. Several delivery channels are added to Moira and released.
Mentors. Alexey Kirpichnikov (alexkir@kontur.ru), Alexander Sushko (sushko@kontur.ru).
Overview¶
Moira is a real-time alerting tool, based on Graphite data.
Key Features¶
Graphite storage independence
Some Graphite queries are very ineffective. Tools like Seyren multiply this effect every minute making lots of ineffective queries and overloading your cluster. Moira relies on the incoming metric stream, and has its own fast cache for recent data.
Support for (almost) all Graphite functions
Graphite function library (carbonapi) is embedded directly into Moira source code. You can use any function and get predictable results, like in your Graphite or Grafana dashboards.
Support for custom expressions
If simple warning/error threshold is not enough, you can write flexible govaluate expressions to calculate trigger state based on metric data.
Tags for triggers and subscriptions
When several teams/services share one monitoring tool, it is essential to provide some way of filtering triggers and subscriptions in the UI. Moira has a flexible tag system.
Extendable notification channels
Moira supports email, Slack, Pushover and many other channels of notification out-of-the-box. But you can always write your own plugin in Go and rebuild Moira Notifier microservice.
Alarm fatigue protection
Sometimes one of your triggers goes mad and switches back and forth between states, sending you hundreds of notifications. Sometimes you just ignore and delete all messages, accidentally also deleting one that is actually important. Moira tries to protect you with a feature called throttling. It’s simple: if one of your triggers starts to send over 10 messages per hour, Moira limits this trigger to one message per 30 minutes. Alerts from this trigger are combined, and not lost - just packaged into a single message.
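The throttling rule described above can be sketched as a small function. This is only an illustration under stated assumptions: the constants mirror the numbers in the text (10 messages per hour, one message per 30 minutes), while the `NextSendTime` function and the exact boundary handling are hypothetical and do not describe Moira’s actual implementation.

```go
package main

import "fmt"

const (
	throttleWindow   = int64(3600) // one hour, in seconds
	throttleLimit    = 10          // messages allowed per window before throttling
	throttledSpacing = int64(1800) // 30 minutes between messages when throttled
)

// NextSendTime returns the earliest Unix time at which the next message
// for a trigger may be sent. sent holds the Unix timestamps of messages
// already sent for this trigger. Assumption of this sketch: throttling
// starts once throttleLimit messages were sent within the last hour.
func NextSendTime(sent []int64, now int64) int64 {
	recent := 0
	var last int64
	for _, ts := range sent {
		if ts > now-throttleWindow && ts <= now {
			recent++
		}
		if ts > last {
			last = ts
		}
	}
	if recent < throttleLimit {
		return now // below the limit: send immediately
	}
	// Over the limit: wait until 30 minutes after the last message.
	next := last + throttledSpacing
	if next < now {
		return now
	}
	return next
}

func main() {
	sent := []int64{1000, 1010, 1020, 1030, 1040, 1050, 1060, 1070, 1080, 1090}
	fmt.Println(NextSendTime(sent, 1100))
}
```

Events that arrive before the returned time would be accumulated and delivered together in a single message, which is how alerts are combined rather than lost.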
Limitations¶
By default, Moira stores metric history for one hour. This ensures performance under heavy load. You can tweak this in config file, but note that performance will degrade.
In order to reduce database load, Moira checks every single trigger at most once every 5 seconds. Probably, your metrics arrive once every minute, so you really won’t notice this limitation. You can also tweak this in config file.
Microservices¶
In the spirit of Graphite architecture, Moira consists of several loosely coupled microservices. You are welcome to replace them or to add new ones.
Filter¶
Filter is a lightweight service responsible for receiving large volumes of metric data in Graphite format. It filters the received data and saves only metrics that match at least one user trigger. This reduces load on all other parts of Moira.
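The idea can be sketched as follows: parse each incoming line of the Graphite plaintext protocol and keep the metric only if its name matches a pattern used by some trigger. This is a simplified illustration, not Moira’s Filter code. The `Metric` type and the helper functions are hypothetical, and the matcher supports only per-node `*`, `?` and character classes via Go’s `path.Match`, not the full Graphite pattern syntax.

```go
package main

import (
	"fmt"
	"path"
	"strconv"
	"strings"
)

// Metric is a hypothetical parsed data point.
type Metric struct {
	Name      string
	Value     float64
	Timestamp int64 // Unix time in seconds
}

// ParseLine parses one Graphite plaintext line: "<path> <value> <timestamp>".
func ParseLine(line string) (Metric, error) {
	fields := strings.Fields(line)
	if len(fields) != 3 {
		return Metric{}, fmt.Errorf("expected 3 fields, got %d", len(fields))
	}
	value, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return Metric{}, fmt.Errorf("bad value: %w", err)
	}
	ts, err := strconv.ParseInt(fields[2], 10, 64)
	if err != nil {
		return Metric{}, fmt.Errorf("bad timestamp: %w", err)
	}
	return Metric{Name: fields[0], Value: value, Timestamp: ts}, nil
}

// MatchesPattern compares a metric name with a pattern node by node,
// so that a wildcard never crosses a dot.
func MatchesPattern(pattern, name string) bool {
	pNodes := strings.Split(pattern, ".")
	nNodes := strings.Split(name, ".")
	if len(pNodes) != len(nNodes) {
		return false
	}
	for i := range pNodes {
		ok, err := path.Match(pNodes[i], nNodes[i])
		if err != nil || !ok {
			return false
		}
	}
	return true
}

// ShouldKeep reports whether the metric name matches any trigger pattern.
func ShouldKeep(patterns []string, name string) bool {
	for _, p := range patterns {
		if MatchesPattern(p, name) {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{"servers.*.cpu"}
	for _, line := range []string{
		"servers.web1.cpu 0.75 1700000000",
		"servers.web1.disk 0.10 1700000000",
	} {
		m, err := ParseLine(line)
		if err != nil {
			continue // malformed lines are dropped in this sketch
		}
		fmt.Println(m.Name, ShouldKeep(patterns, m.Name))
	}
}
```

Dropping unmatched metrics this early means the other services never have to store or inspect data that no trigger uses.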
Checker¶
Checker is an application with embedded Graphite functions. Checker watches for incoming metric values and performs checks according to saved trigger settings. When the state of any trigger changes, Checker generates an event.
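For a simple warning/error threshold, the check-and-emit logic might look like the sketch below. The state names, the `Event` type, and the assumption that higher values are worse are all illustrative choices for this example; they do not describe Checker’s actual code or its handling of custom expressions.

```go
package main

import "fmt"

// State is a hypothetical trigger state used in this sketch.
type State string

const (
	StateOK    State = "OK"
	StateWarn  State = "WARN"
	StateError State = "ERROR"
)

// Event is a hypothetical record generated when a trigger changes state.
type Event struct {
	TriggerID string
	OldState  State
	NewState  State
	Timestamp int64 // Unix time in seconds
}

// Evaluate maps a metric value to a state, assuming rising thresholds
// (warn <= errLevel, higher values are worse).
func Evaluate(value, warn, errLevel float64) State {
	switch {
	case value >= errLevel:
		return StateError
	case value >= warn:
		return StateWarn
	default:
		return StateOK
	}
}

// Check evaluates the new value and returns the new state together with
// an event. The event is nil when the state did not change.
func Check(triggerID string, prev State, value, warn, errLevel float64, ts int64) (State, *Event) {
	next := Evaluate(value, warn, errLevel)
	if next == prev {
		return next, nil
	}
	return next, &Event{TriggerID: triggerID, OldState: prev, NewState: next, Timestamp: ts}
}

func main() {
	state := StateOK
	for i, v := range []float64{10, 85, 95, 96, 20} {
		var ev *Event
		state, ev = Check("cpu-load", state, v, 80, 90, int64(1700000000+60*i))
		if ev != nil {
			fmt.Printf("%s: %s -> %s\n", ev.TriggerID, ev.OldState, ev.NewState)
		}
	}
}
```

Only state transitions produce events, so a metric that stays above the error threshold generates one event rather than one per check.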
Notifier¶
Notifier is an application that watches for generated events. Notifier is responsible for scheduling and sending notifications, observing quiet hours, retrying failed notifications, etc.
API¶
API is an application that serves as a backend for the UI.