Alerts

When something goes wrong (perhaps a REST endpoint you rely on goes down, or an integration starts taking five minutes when you expect it to take five seconds) your incident response team should be alerted right away. It should never be the case that customers are calling you to inform you that your integrations are down. Your team should be able to proactively, or at least quickly reactively, respond to issues with your integrations.

With properly configured monitoring and alerting you can put your mind at ease - no news is good news!

Prismatic alert monitors are configurable. You can choose from a variety of alert triggers, including elevated log levels, long execution times, and failed executions, and you can notify your integration team via email and SMS, or via Slack, PagerDuty, OpsGenie, or any other notification system you choose by using webhooks.

Terminology

  • An alert group is a set of users to notify (by email or SMS) and webhooks to invoke when an instance does something noteworthy or unexpected, like failing to run to completion.
  • An alert trigger is a noteworthy or unexpected event that causes an alert monitor to fire. An alert trigger may fire if an instance takes longer than expected or logs error or warning messages unexpectedly. You can also trigger on positive things, like successful instance runs or when you set an instance to enabled. A full list of alert triggers is below.
  • An alert monitor is a combination of an alert group and some alert triggers, and is configured for an instance. You add an alert monitor to an instance, specify when the monitor should be triggered, and which alert groups should be notified in the event of a trigger.
  • An alert event is created when an alert trigger causes an alert monitor to fire. For example, one alert event might notify the DevOps team at 07:30 AM that an instance failed to run. If the instance is scheduled to run every 15 minutes, another event would be created 15 minutes later if the issue hadn't been resolved.

Alert triggers

Many events can trigger an alert monitor:

  • Execution Completed: This will trigger an alert upon a successful run of an instance. You could use this to notify customers when an instance runs to completion.
  • Execution Duration Matched or Exceeded: Does your integration normally take 5 seconds? Do you want to be alerted if it takes longer than 10 seconds? Specify the maximum number of seconds you expect an instance to take, after which you'd like to be notified.
  • Execution Failed: This will trigger an alert upon a failed run of an instance.
  • Execution Failed, Retry Pending: This will trigger an alert if an instance fails to run, but a retry has been queued.
  • Execution Overdue: Do you expect your integration to run every X minutes? This will trigger an alert if X minutes pass without the instance running.
  • Execution Started: This will trigger an alert when an execution of an instance starts.
  • Instance Disabled: This will trigger if an instance is disabled.
  • Instance Enabled: This will trigger if an instance is enabled. You might want to use this to notify project managers when an instance is ready for a customer.
  • Instance Removed: This will trigger if an instance is deleted.
  • Log Level Matched or Exceeded: Are error or warn log lines expected in standard execution of your integration? Presumably not. Specify a log level (error or warn), and if log lines are written that match or exceed that log level, an alert is triggered.
  • Connection Threw an Exception: This will trigger if a connection that is part of your deployed instance threw an exception, generally indicating that credentials in the connection are expired or invalid, or that the API that you are connecting to is down.

Some triggers, like Instance {Disabled, Enabled, Removed}, are instance-specific, meaning they're not tied to a particular flow. The others are flow-specific, meaning you can set up alert monitors to trigger when events happen in specific flows.

For More Information: Log Levels

Alert webhooks

In addition to email and SMS notifications, you can configure alert monitors to invoke a webhook URL with a payload of your choice. An alert webhook could be used to send alert info to the PagerDuty or OpsGenie APIs, your own DevOps alert endpoint, or any other alerting service with an HTTP-based API.

Creating alert webhooks

To create or modify a webhook endpoint, click into the Settings page and select the Alert Webhooks tab. Click the + Add alert webhook button, then enter a name, URL, and payload template for your alert webhook.

Alert webhooks are meant to be general enough that they can be used by multiple alert monitors, and their payload templates help with that. Within the Payload Template section you can enter certain keywords, which are replaced when an alert monitor fires with information about the alert monitor, instance, trigger, and monitor URL.

  • $SUBJECT - The string literal "Prismatic.io Alert"
  • $NAME - The name of the alert monitor that was triggered
  • $INSTANCE - The name of the instance that had the triggered alert monitor
  • $INSTANCE_ID - The global identifier of the instance (the SW5z.... portion of the URL when you open the instance)
  • $EXECUTION_ID - The global identifier of the running execution
  • $CUSTOMER - The name of the customer the instance is deployed to
  • $CUSTOMER_EXTERNAL_ID - The external ID of the customer the instance is deployed to
  • $FLOW - The name of the flow that was running when the alert monitor triggered
  • $TRIGGER - The name of the alert trigger (like "Execution Failed")
  • $STEP - The name of the step within the integration that triggered the alert monitor
  • $URL - A URL that will navigate to the specific alert monitor that was triggered
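To get a feel for how a payload template reads once its keywords are replaced, the substitution can be approximated locally. The sketch below is only an approximation that assumes plain textual replacement; the `render_preview` helper and every sample value are hypothetical, and Prismatic performs the real substitution when an alert monitor fires.

```python
import json
from string import Template

# Hypothetical sample values; Prismatic supplies the real ones at alert time.
SAMPLE_VALUES = {
    "SUBJECT": "Prismatic.io Alert",
    "NAME": "Alert on Error",
    "INSTANCE": "Acme Sync",
    "INSTANCE_ID": "example-instance-id",
    "FLOW": "Import Orders",
    "TRIGGER": "Execution Failed",
    "URL": "https://example.com/monitor",
}


def render_preview(template: str, values: dict[str, str]) -> dict:
    """Replace $KEYWORDS in the template and confirm the result is valid JSON."""
    rendered = Template(template).safe_substitute(values)
    return json.loads(rendered)  # raises ValueError if the template is not valid JSON


template = (
    '{"text": "$NAME triggered - $INSTANCE ($INSTANCE_ID) '
    'failed in flow $FLOW. See $URL"}'
)
preview = render_preview(template, SAMPLE_VALUES)
print(preview["text"])
```

Parsing the rendered text as JSON is a cheap way to catch a missing quote or brace in a template before it is attached to a monitor.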

After creating the alert webhook, you can modify the name, URL, or alert payload, and optionally add HTTP headers. Headers are frequently used for passing an authorization token to a webhook.

Editing existing alert webhooks

To modify an existing alert webhook, click Settings on the left-hand sidebar and then select the Alert Webhooks tab. Click into an existing alert webhook. In this screen, you can modify the name by clicking the name at the top of the page. You can modify the webhook template, payload template, or URL from the Details tab, and you can also add optional HTTP headers if your webhook requires authorization tokens, etc.

Deleting alert webhooks

To delete an alert webhook, open the Settings page from the left-hand sidebar. Click the Alert Webhooks tab and select an alert webhook. Within the alert webhook's page, click Delete Alert Webhook. Confirm deletion by clicking Remove alert webhook.

Sending incidents to PagerDuty with alert webhooks

Many operations teams prefer to use an incident response service like PagerDuty to track production issues. Alert webhooks can be configured to generate PagerDuty incidents by invoking PagerDuty's API.

To send alerts to PagerDuty, point an alert webhook at https://events.pagerduty.com/v2/enqueue and then configure a payload template that contains PagerDuty API's required fields:

{
  "routing_key": "YOUR-PAGERDUTY-KEY",
  "event_action": "trigger",
  "links": [{ "href": "$URL", "text": "Link to Prismatic alert monitor" }],
  "payload": {
    "summary": "$NAME triggered - $INSTANCE failed to run.",
    "severity": "error",
    "source": "$SUBJECT"
  }
}

Additional fields listed in PagerDuty's docs can be added to the payload template to give the PagerDuty incident more context. No special headers are required for this alert webhook since the PagerDuty key is passed in as part of the payload. When an alert monitor using this alert webhook fires, an incident is created in PagerDuty.
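Before wiring the webhook to a monitor, it can help to confirm that your routing key works by sending a hand-built event of the same shape. The following is a minimal sketch using only the Python standard library; the helper names `build_test_event` and `send_event` and the sample values are hypothetical, and the `send_event` call is left commented out because it would create a real incident.

```python
import json
import urllib.request

PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def build_test_event(routing_key: str, monitor_name: str,
                     instance_name: str, monitor_url: str) -> dict:
    """Build an event shaped like the payload template, with keywords filled in."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "links": [{"href": monitor_url, "text": "Link to Prismatic alert monitor"}],
        "payload": {
            "summary": f"{monitor_name} triggered - {instance_name} failed to run.",
            "severity": "error",
            "source": "Prismatic.io Alert",
        },
    }


def send_event(event: dict) -> int:
    """POST the event to PagerDuty and return the HTTP status code."""
    request = urllib.request.Request(
        PAGERDUTY_EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


event = build_test_event(
    "YOUR-PAGERDUTY-KEY", "Alert on Error", "Acme Sync", "https://example.com/monitor"
)
print(json.dumps(event, indent=2))
# send_event(event)  # uncomment to create a real test incident
```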

Sending notifications to Slack with alert webhooks

Many operations teams use Slack to notify themselves of production issues. Prismatic alert webhooks can be configured to send messages to a Slack channel.

To send alerts as messages to Slack, first generate a new Slack webhook:

  1. Navigate to https://api.slack.com/apps
  2. Click Create New App, adding an app to your workspace.
  3. Under Add features and functionality select Incoming Webhooks
  4. Activate Incoming Webhooks and then Add New Webhook to Workspace
  5. Take note of the Webhook URL. It should be of the form https://hooks.slack.com/services/foo/bar/baz

Use the Slack webhook URL that you generated in a Prismatic alert webhook, and configure the payload template to read similar to this:

{
  "text": "$NAME triggered - $INSTANCE failed to run. See $URL"
}

No special headers are required for this alert webhook. When an alert monitor that uses the alert webhook next fires, a message will be sent to your Slack channel.
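If you would like to check the Slack webhook URL before an alert monitor fires, you could post a test message to it yourself. This is a rough sketch using the Python standard library; `build_slack_request` is a hypothetical helper, the URL is the placeholder from the steps above, and the line that actually sends the message is commented out.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST request carrying the same JSON shape as the payload template."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_slack_request(
    "https://hooks.slack.com/services/foo/bar/baz",  # placeholder webhook URL
    "Test alert - Acme Sync failed to run. See https://example.com/monitor",
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request, timeout=10)  # uncomment to post to Slack
```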

Alert groups

You will likely want to alert the same group of people if integration X fails and if integration Y fails. To do that, you can create an alert group that can be assigned to multiple alert monitors. That way, if you hire a new DevOps engineer, you can quickly add them to the DevOps alert group and they'll automatically be added to each alert monitor the DevOps group is attached to.

Note that you can add both organization team members and customer users to alert groups. If you wish to notify customers when alerts trigger, for the sake of reusability we recommend creating an alert group per customer, and alert group(s) for your team. You can then attach your team's alert group(s) to all alert monitors, and your customer's alert group to the monitors only for their instances.

Creating alert groups

Click Settings on the left-hand sidebar, and select the Alert Groups tab. Click the + Add alert group button on the upper-right and give your alert group a name (e.g. "Progix DevOps Team"). From there, you can enumerate users to be notified and webhooks to be invoked upon an alert being triggered.

Editing existing alert groups

To modify an existing alert group, return to the screen you saw when you created the group by clicking Settings on the left-hand sidebar and then selecting the Alert Groups tab. Click into an existing alert group. Within this screen, you can modify the name of the group by clicking the group's name at the top of the page. You can also modify the list of users and webhooks associated with the group.

Deleting alert groups

To delete an alert group, click the Settings link on the left-hand sidebar. Then, click the Alert Groups tab and select an alert group. Scroll to the bottom of the alert group's page and click Delete alert group. Click Remove alert group to confirm deletion.

Alert monitors

An alert monitor is a combination of an alert group (users and webhooks) and an alert trigger that is configured for an instance. When you add an alert monitor to an instance, you specify when the monitor should be triggered, and which alert group(s) should be notified in the event of a trigger firing.

Alert monitors cannot be bound to preprocess flows

Note that if your instances are configured to use a shared endpoint and a preprocess flow, an alert monitor cannot be assigned to the preprocess flow since the preprocess flow runs independently of any deployed instance.

Creating an alert monitor

After selecting an instance from a customer's Instances tab or the Instances link on the left-hand sidebar, click the instance's Monitors tab. Click the + Add alert monitor button on the top-right of the screen. Specify a name for the monitor and select a trigger. If you are in a customer's Instances tab, you'll also need to specify the instance.

After creating the alert monitor you will find yourself in the monitor's Details tab. Within this tab, you can add additional triggers to your alert monitor within the Triggers card. You can also choose the groups or users to notify and webhooks to trigger when an alert trigger fires.

Alerting on connection errors

You can set up an alert monitor to notify you if a connection in an instance becomes invalid (e.g. credentials have expired or been revoked, an API is down, etc.). To alert on connection errors, create a new alert monitor and select Connection Threw an Exception as the trigger.

This is especially useful with OAuth 2.0 connections. You can be alerted if refreshing your access token fails for any reason, and the alert message sent to you or your team members will direct you straight to the relevant logs.

Editing existing alert monitors

To modify an existing alert monitor, click Instances on the left-hand sidebar and then select an instance. Under the instance's Monitors tab, select a monitor. This will bring you to the same screen you saw when you created the monitor, where you can modify who is notified, and under what conditions, from the Details tab.

Clearing a triggered alert monitor

If multiple team members are notified by an alert event, it's important for the team to know if the event has been addressed. By marking an alert monitor as "cleared", your team member acknowledges the event and indicates that they are working to resolve the issue.

Click the Monitors link on the left-hand sidebar. Select one or more triggered monitors, then click the clear icon to clear your selected events.

Deleting an alert monitor

Click Customers from the left-hand sidebar and select a customer. Under the customer's Instances tab, select an instance and then click Monitors. Click into an alert monitor and open the Details tab. Scroll to the bottom of the page. Click Delete Monitor and confirm deletion by clicking Remove monitor.

Alert events

An alert event is created when an alert monitor is triggered. When an event is created, any users in the monitor's associated alert groups receive a notification (email or SMS) with a link to the event. Your team members can indicate that the issue has been acknowledged and is being addressed by clearing the alert event.

Viewing alert events

The easiest way to view an alert event is to click the link that is sent in the alert event email/SMS.

Alternatively, after clicking the Instances link on the left-hand sidebar, you will be presented with a list of all instances. Each instance has an indicator in the lower-right that shows whether any alert monitors have been triggered but not yet cleared.

If you click an instance with triggered monitors and then select the Monitors tab, you can view currently triggered monitors.

Clicking a triggered monitor will bring you to the monitor's Details tab. Then you can click the Events tab to see exactly what happened.

From there, clicking a specific alert event will bring up logs from just before and after the event on the bottom of the page.

For More Information: Log Retention

Programmatically creating alert monitors

An alert monitor allows you to be notified when something happens in a particular instance's flow. Most commonly, alert monitors are used to notify you when an execution fails.

Let's look at how to create alert monitors programmatically for all instances.

An example script that creates alert monitors for all flows of all instances is available in the examples repository.

List instances programmatically

First, we need to get a list of all of our customers' instances, along with their flows, the customer each instance is deployed to, and any monitors that currently exist. A full query to the Prismatic GraphQL API might look like this:

query myGetInstancesQuery($cursor: String) {
  instances(
    isSystem: false
    enabled: true
    sortBy: { direction: ASC, field: CREATED_AT }
    after: $cursor
  ) {
    nodes {
      id
      name
      flowConfigs {
        nodes {
          id
          flow {
            name
          }
          monitors {
            nodes {
              id
              name
              groups {
                nodes {
                  id
                }
              }
            }
          }
        }
      }
      customer {
        id
        name
      }
    }
    pageInfo {
      hasNextPage
      endCursor
    }
  }
}

Note three important details about this query:

  • isSystem: false will ensure that we exclude test instances that are used in the integration designer.
  • enabled: true will ensure that we only get instances that are currently enabled.
  • A combination of sortBy: { direction: ASC, field: CREATED_AT }, after: $cursor, and pageInfo at the end of the query will allow us to paginate through the results.

If you'd like to see an example of how to paginate through results, check out the example script which implements the query above.
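For readers who prefer Python, the cursor loop can be sketched roughly as follows. This is not the example script itself; `run_query` is a hypothetical callable that sends a query and its variables to the GraphQL API and returns the response's `data` object, and `fake_run_query` stands in for it here so the sketch runs without network access.

```python
from typing import Callable, Iterator, Optional

# Hypothetical transport: (query, variables) -> the "data" object of the response.
RunQuery = Callable[[str, dict], dict]


def iter_instances(run_query: RunQuery, query: str) -> Iterator[dict]:
    """Yield every instance node, following endCursor until hasNextPage is false."""
    cursor: Optional[str] = None
    while True:
        data = run_query(query, {"cursor": cursor})
        page = data["instances"]
        yield from page["nodes"]
        if not page["pageInfo"]["hasNextPage"]:
            break
        cursor = page["pageInfo"]["endCursor"]


def fake_run_query(query: str, variables: dict) -> dict:
    """Stand-in for a real API call: serves two canned pages keyed by cursor."""
    pages = {
        None: {
            "nodes": [{"id": "1"}, {"id": "2"}],
            "pageInfo": {"hasNextPage": True, "endCursor": "abc"},
        },
        "abc": {
            "nodes": [{"id": "3"}],
            "pageInfo": {"hasNextPage": False, "endCursor": None},
        },
    }
    return {"instances": pages[variables["cursor"]]}


instance_ids = [node["id"] for node in iter_instances(fake_run_query, "query { ... }")]
print(instance_ids)  # ['1', '2', '3']
```

Passing the transport in as a function keeps the pagination logic independent of whichever HTTP client and authentication you use.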

Fetch info about alert triggers

Next, we need to fetch information about the alert monitor we want to create. Alert monitors can be triggered by a number of events, including when an execution fails, when an execution succeeds, or when an execution takes longer than a certain amount of time.

You can fetch a list of all available alert triggers with the following query:

{
  alertTriggers {
    nodes {
      id
      name
    }
  }
}

The id returned by this query is what you'll use to create the alert monitor. An example of this query is available in the example script.
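Once the response is in hand, picking out the trigger you want is a simple lookup by name. A minimal Python sketch, assuming the response's `data` object has the shape of the query above; the `find_trigger_id` helper and the IDs in `sample_data` are made up for illustration.

```python
def find_trigger_id(data: dict, trigger_name: str) -> str:
    """Return the id of the alert trigger with the given name."""
    for node in data["alertTriggers"]["nodes"]:
        if node["name"] == trigger_name:
            return node["id"]
    raise LookupError(f"No alert trigger named {trigger_name!r}")


# Hypothetical response data; real IDs come from the API.
sample_data = {
    "alertTriggers": {
        "nodes": [
            {"id": "trigger-id-1", "name": "Execution Completed"},
            {"id": "trigger-id-2", "name": "Execution Failed"},
        ]
    }
}

trigger_id = find_trigger_id(sample_data, "Execution Failed")
print(trigger_id)  # trigger-id-2
```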

Fetch info about a user who should be notified

Assuming you want to notify a user when an alert is triggered, you'll need to fetch information about that user. That user's email address must be registered in your Prismatic organization in order to send emails to the user.

You can fetch a user by email address with a query like this:

query myGetUsersByEmail($email: String!) {
  users(email: $email) {
    nodes {
      id
      name
      email
    }
  }
}

Note that this query may return zero or one users, so you'll need to check the length of the nodes array to determine whether a user was found, as the example script does. The user's id is important here, as it's what you'll use to create the alert monitor.
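The length check might look something like this in Python. It is a small sketch, assuming the response's `data` object matches the query above; `extract_user_id` and the sample values are hypothetical.

```python
from typing import Optional


def extract_user_id(data: dict) -> Optional[str]:
    """Return the first matching user's id, or None when no user was found."""
    nodes = data["users"]["nodes"]
    return nodes[0]["id"] if nodes else None


# Hypothetical responses: one where the user exists, one where it does not.
found = {"users": {"nodes": [{"id": "user-id-1", "name": "Pat", "email": "pat@example.com"}]}}
missing = {"users": {"nodes": []}}

print(extract_user_id(found))    # user-id-1
print(extract_user_id(missing))  # None
```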

Create the alert monitor

Finally, we can loop over all instances and their flows and create alert monitors. For each flow of an instance, we'll check if an alert monitor already exists for that flow. If not, we can create one with the createAlertMonitor mutation:

mutation myCreateAlertMonitor(
  $name: String!
  $instanceId: ID!
  $flowConfigId: ID!
  $triggerId: ID!
  $userId: ID!
) {
  createAlertMonitor(
    input: {
      name: $name
      instance: $instanceId
      flowConfig: $flowConfigId
      triggers: [$triggerId]
      users: [$userId]
    }
  ) {
    alertMonitor {
      id
    }
    errors {
      field
      messages
    }
  }
}

This mutation takes a number of variables:

  • name is the name of the alert monitor. To ensure your script is idempotent (doesn't create multiple monitors that all do the same thing), you can follow a specific naming scheme like [Generated] Alert on Error - FLOW NAME.
  • instanceId is the ID of the instance you want to create the alert monitor for. We got that when we listed instances programmatically.
  • flowConfigId is the ID of the flow you want to create the alert monitor for. We also got that when we listed instances programmatically.
  • triggerId is the ID of the alert trigger you want to use. We got that when we fetched info about alert triggers.
  • userId is the ID of the user you want to notify. We got that when we fetched info about a user who should be notified.

Catching errors that may crop up when creating the alert monitor is important - you can look for errors in the errors field of the response. See the end of the example script here.
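Putting the pieces together, the loop over instances and flows could be sketched in Python as below. This is an illustration rather than the example script: `run_query` is a hypothetical callable that sends a query and variables to the GraphQL API and returns the response's `data` object, `ensure_monitors` is a made-up helper, and the sample data and `fake_run_query` exist only so the sketch runs offline.

```python
from typing import Callable

# Hypothetical transport: (query, variables) -> the "data" object of the response.
RunQuery = Callable[[str, dict], dict]
NAME_PREFIX = "[Generated] Alert on Error - "


def ensure_monitors(run_query: RunQuery, mutation: str, instances: list[dict],
                    trigger_id: str, user_id: str) -> list[str]:
    """Create a monitor for each flow that lacks one; return the names created."""
    created = []
    for instance in instances:
        for flow_config in instance["flowConfigs"]["nodes"]:
            name = NAME_PREFIX + flow_config["flow"]["name"]
            existing = {m["name"] for m in flow_config["monitors"]["nodes"]}
            if name in existing:
                continue  # already created on a previous run
            data = run_query(mutation, {
                "name": name,
                "instanceId": instance["id"],
                "flowConfigId": flow_config["id"],
                "triggerId": trigger_id,
                "userId": user_id,
            })
            errors = data["createAlertMonitor"]["errors"]
            if errors:
                raise RuntimeError(f"Could not create {name!r}: {errors}")
            created.append(name)
    return created


# Hypothetical data shaped like the instances query result.
instances = [{
    "id": "instance-1",
    "name": "Acme Sync",
    "flowConfigs": {"nodes": [
        {"id": "fc-1", "flow": {"name": "Import Orders"},
         "monitors": {"nodes": [{"id": "m-1", "name": NAME_PREFIX + "Import Orders"}]}},
        {"id": "fc-2", "flow": {"name": "Export Invoices"}, "monitors": {"nodes": []}},
    ]},
}]
calls = []


def fake_run_query(query: str, variables: dict) -> dict:
    """Stand-in for the API: records the variables and reports success."""
    calls.append(variables)
    return {"createAlertMonitor": {"alertMonitor": {"id": "new-monitor"}, "errors": []}}


created = ensure_monitors(fake_run_query, "mutation { ... }", instances, "trigger-1", "user-1")
print(created)  # ['[Generated] Alert on Error - Export Invoices']
```

Because existing monitors are matched by the generated name, running the sketch a second time against refreshed instance data would create nothing new.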