Replay Failed Executions
Replaying failed executions programmatically
Errors are almost inevitable in software integrations.
- A third-party API might be unavailable when your instance tries to access it. That might be remedied through instance retry, but retry doesn't handle the scenario that the API is down for hours or days at a time.
- A third-party might start sending you data in a new format that you didn't expect.
- Your instance may encounter some other edge-case that it doesn't handle gracefully.
Whatever the reason for the error, it's handy to be able to re-run your instance with the exact same incoming data when the third-party API is back up and running, or after you've fixed your integration to better handle the data coming in.
Replay allows you to run a previous execution's data through your instance again, and it's easy to do through Prismatic's GraphQL API.
Querying for failed executions
First, let's query an instance for failed executions. You can find an instance's ID in your browser's URL bar when you view the instance, or you can query for it via the API.
We'll search only for executions that have error_Isnull: false
(in other words, it had an error).
We also only want original executions (and not replays), so we'll include replayForExecution_Isnull: true
.
Finally, let's fetch replays
for those original executions that have since succeeded with replays(error_Isnull: true)
:
query getFailedExecutions($instanceId: ID!, $startCursor: String) {
executionResults(
instance: $instanceId
error_Isnull: false
replayForExecution_Isnull: true
after: $startCursor
) {
nodes {
id
startedAt
replays(error_Isnull: true) {
nodes {
id
startedAt
}
}
error
}
pageInfo {
hasNextPage
endCursor
}
}
}
{
"instanceId": "SW5zdGFuY2U6ZGVkZDQ3ZjQtNmQ4OC00NjJmLWE5YmYtNWM1OGNiMTg0MDAy",
"startCursor": ""
}
If we have multiple pages of executions (more than 100), we can use the endCursor
we get back as the startCursor
to get the next page of results.
The GraphQL API will respond with any failed executions for that instance along with any replays that have succeeded since the original execution failed:
{
"data": {
"instance": {
"executionResults": {
"nodes": [
{
"id": "SW5zdGFuY2VFeGVjdXRpb25SZXN1bHQ6NjBkZDliOWMtOGIyOS00NDQyLWFkNDctMjZkZTg5Y2NlNWM5",
"startedAt": "2023-07-26T17:18:15.886806+00:00",
"replays": {
"nodes": []
},
"error": "Unable to connect to API"
},
{
"id": "SW5zdGFuY2VFeGVjdXRpb25SZXN1bHQ6MmYxNTcxZTktNDVmOS00Mzc2LTg2OGUtMTJkNjZkNDhiNzRl",
"startedAt": "2023-07-26T17:18:13.800335+00:00",
"replays": {
"nodes": [
{
"id": "SW5zdGFuY2VFeGVjdXRpb25SZXN1bHQ6N2U1MDBkZTgtY2ZmYS00NWY5LWI0OGYtNGU1YjU2YWMzMzFh",
"startedAt": "2023-07-26T17:28:10.003443+00:00"
}
]
},
"error": "Unable to connect to API"
}
]
}
}
}
}
Replaying failed executions programmatically
Now, with the IDs of the executions that failed in hand, we can issue a replayExecution mutation for each one that does not have a replay that succeeded:
mutation myReplayExecution($executionId: ID!) {
replayExecution(input: {id: $executionId}) {
instanceExecutionResult {
id
}
errors {
field
messages
}
}
}
{
"executionId": "SW5zdGFuY2VFeGVjdXRpb25SZXN1bHQ6NjBkZDliOWMtOGIyOS00NDQyLWFkNDctMjZkZTg5Y2NlNWM5"
}
That mutation will return the ID of the new execution that occurs. You can the query the API with that new execution to verify that it runs successfully, or do further debugging if it does not.
For an example script that wraps the above GraphQL calls, see our examples GitHub repo.
For more information on the Prismatic API, see the API docs.