Replaying Failed Executions
Errors are almost inevitable in software integrations.
- A third-party API might be unavailable when your instance tries to access it. Instance retry might remedy that, but retry doesn't help when the API is down for hours or days at a time.
- A third party might start sending you data in a new format that you didn't expect.
- Your instance may encounter some other edge case that it doesn't handle gracefully.
Whatever the reason for the error, it's handy to be able to re-run your instance with the exact same incoming data when the third-party API is back up and running, or after you've fixed your integration to better handle the data coming in.
Replay allows you to run a previous execution's data through your instance again, and it's easy to do through Prismatic's GraphQL API.
Querying for failed executions
First, let's query an instance for failed executions. You can find an instance's ID in your browser's URL bar when you view the instance, or you can query for it via the API.
We'll search only for executions where error_Isnull: false (in other words, executions that had an error), sort the results from oldest to most recent, and limit the query to a specific time range:
    query getFailedExecutions($instanceId: ID!, $rangeStart: DateTime, $rangeEnd: DateTime) {
      instance(id: $instanceId) {
        executionResults(
          error_Isnull: false
          sortBy: {field: STARTED_AT, direction: ASC}
          startedAt_Gte: $rangeStart
          startedAt_Lte: $rangeEnd
        ) {
          nodes {
            id
            error
            startedAt
          }
        }
      }
    }

    {
      "instanceId": "SW5zdGFuY2U6ZGVkZDQ3ZjQtNmQ4OC00NjJmLWE5YmYtNWM1OGNiMTg0MDAy",
      "rangeStart": "2022-12-30T22:08:19.506159+00:00",
      "rangeEnd": "2023-12-30T22:10:00.000000+00:00"
    }
The GraphQL API will respond with any failed executions for that instance within that time range, like this:
    {
      "data": {
        "instance": {
          "executionResults": {
            "nodes": [
              {
                "id": "SW5zdGFuY2VFeGVjdXRpb25SZXN1bHQ6NjBkZDliOWMtOGIyOS00NDQyLWFkNDctMjZkZTg5Y2NlNWM5",
                "error": "Error: Cannot connect to Acme. Connection timeout.",
                "startedAt": "2022-12-30T22:08:19.506159+00:00"
              },
              {
                "id": "SW5zdGFuY2VFeGVjdXRpb25SZXN1bHQ6YjFjZDczMjUtM2NiMi00MWRkLWIxMzAtNmEzNGNmNWQwZTcw",
                "error": "Error: Cannot connect to Acme. Connection timeout.",
                "startedAt": "2022-12-30T22:08:44.171613+00:00"
              },
              {
                "id": "SW5zdGFuY2VFeGVjdXRpb25SZXN1bHQ6NzcwN2YzYjUtZjE1Mi00ZGIzLWExYWItNzM0NTUwODlmOGQx",
                "error": "Error: Cannot connect to Acme. Connection timeout.",
                "startedAt": "2022-12-30T22:08:49.344558+00:00"
              }
            ]
          }
        }
      }
    }
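If you want to run this query from a script rather than a GraphQL client, the following is a minimal Python sketch using only the standard library. It reuses the query above unchanged. The endpoint URL, the bearer-token Authorization header, and the helper names post_graphql and failed_execution_ids are assumptions made for illustration and are not taken from this article, so check them against the API docs before relying on them.

```python
import json
import urllib.request

# Assumption: the endpoint URL and bearer-token auth are not stated in this article.
API_URL = "https://app.prismatic.io/api"

# The same query shown above, unchanged.
FAILED_EXECUTIONS_QUERY = """
query getFailedExecutions($instanceId: ID!, $rangeStart: DateTime, $rangeEnd: DateTime) {
  instance(id: $instanceId) {
    executionResults(
      error_Isnull: false
      sortBy: {field: STARTED_AT, direction: ASC}
      startedAt_Gte: $rangeStart
      startedAt_Lte: $rangeEnd
    ) {
      nodes { id error startedAt }
    }
  }
}
"""


def post_graphql(query, variables, token, url=API_URL):
    """Hypothetical helper: send one GraphQL request, return the decoded JSON."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def failed_execution_ids(response):
    """Extract execution IDs from a getFailedExecutions response, preserving order."""
    nodes = response["data"]["instance"]["executionResults"]["nodes"]
    return [node["id"] for node in nodes]
```

Passing the response shown above to failed_execution_ids yields the three execution IDs in the order the API returned them, which is oldest first because of the sortBy argument.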
If we wanted to find all failed executions for all customers for a particular integration, we could have used the executionResults query with our integration ID as a parameter.
Replaying failed executions programmatically
Now, with the IDs of the executions that failed in hand, we can issue a replayExecution mutation for each one:
    mutation myReplayExecution($executionId: ID!) {
      replayExecution(input: {id: $executionId}) {
        instanceExecutionResult {
          id
        }
        errors {
          field
          messages
        }
      }
    }

    {
      "executionId": "SW5zdGFuY2VFeGVjdXRpb25SZXN1bHQ6NjBkZDliOWMtOGIyOS00NDQyLWFkNDctMjZkZTg5Y2NlNWM5"
    }
That mutation returns the ID of the new execution. You can then query the API with that new execution ID to verify that it ran successfully, or to debug further if it did not.
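Issuing the mutation once per failed execution is easy to script. The sketch below loops over a list of execution IDs and records which replays were accepted and which came back with errors. The function replay_executions and its send parameter are hypothetical names introduced for illustration: send stands for whatever function you use to post a GraphQL request and return the decoded JSON response. The sketch also assumes that a successful mutation returns an empty or null errors field, which this article does not state.

```python
# The same mutation shown above, unchanged.
REPLAY_MUTATION = """
mutation myReplayExecution($executionId: ID!) {
  replayExecution(input: {id: $executionId}) {
    instanceExecutionResult { id }
    errors { field messages }
  }
}
"""


def replay_executions(execution_ids, send):
    """Replay each execution and return (replayed, failed).

    `send(query, variables)` is a hypothetical caller-supplied function that
    posts a GraphQL request and returns the decoded JSON response.
    """
    replayed = {}  # original execution ID -> new execution ID
    failed = {}    # original execution ID -> errors reported by the mutation
    for execution_id in execution_ids:
        response = send(REPLAY_MUTATION, {"executionId": execution_id})
        payload = response["data"]["replayExecution"]
        # Assumption: `errors` is empty or null when the replay was accepted.
        if payload.get("errors"):
            failed[execution_id] = payload["errors"]
        else:
            replayed[execution_id] = payload["instanceExecutionResult"]["id"]
    return replayed, failed
```

Keeping the original-to-new ID mapping makes it straightforward to check each new execution afterward. The sketch does not handle transport failures or top-level GraphQL errors; a production script would need both.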
For more information on the Prismatic API, see the API docs.