Looping Over Files

What We Will Accomplish

In this tutorial we will build an integration that downloads and processes files stored in Google Cloud Storage.

For this integration, assume that some third-party service writes a timestamped file to a Google Cloud Platform (GCP) storage bucket whenever a user authenticates against one of its services. Our integration will examine the files written to the bucket and announce via Slack who logged in and when.

Our integration will take advantage of the loop component to process files one by one.

We'll configure our integration to run every five minutes. On each run, it will do the following:

  • Look for files in the unprocessed/ directory of our GCP Storage bucket
  • Loop over each file that we found:
    • Download the file
    • Deserialize the JSON contained in the file
    • Use a code component to format a Slack message based on the deserialized contents
    • Post the generated message to Slack
    • Move the file from the unprocessed/ directory to a processed/ directory
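The steps above can be sketched in plain JavaScript. This is only a rough mental model, not how the integration platform runs things: the `Map` stands in for the storage bucket, the `sentMessages` array stands in for Slack, and the `processUnprocessedFiles` name and the sample entries are made up for illustration.

```javascript
// Hypothetical in-memory stand-in for the bucket: object name -> file contents.
const bucket = new Map([
  [
    "unprocessed/2020-10-22T15-30-55.521Z",
    '{"username": "taylor", "site": "https://api.progix.io"}',
  ],
  [
    "processed/2020-10-21T09-12-03.114Z",
    '{"username": "taylor", "site": "https://api.progix.io"}',
  ],
]);

// Hypothetical stand-in for Slack: collect messages instead of posting them.
const sentMessages = [];

function processUnprocessedFiles(bucket, postMessage) {
  // Look for files in the unprocessed/ directory. The names are copied into
  // an array first because the bucket is modified inside the loop.
  const fileNames = [...bucket.keys()].filter((name) =>
    name.startsWith("unprocessed/")
  );

  for (const fileName of fileNames) {
    const contents = bucket.get(fileName); // download the file
    const { username, site } = JSON.parse(contents); // deserialize the JSON
    const loginTime = fileName.replace("unprocessed/", "");
    postMessage(`${username} logged into ${site} at ${loginTime}`);

    // "Move" the file: write it under processed/, then remove the original.
    bucket.set(fileName.replace("unprocessed/", "processed/"), contents);
    bucket.delete(fileName);
  }
  return fileNames.length;
}

const processedCount = processUnprocessedFiles(bucket, (message) =>
  sentMessages.push(message)
);
console.log(processedCount, sentMessages);
```

Because each processed file leaves the unprocessed/ directory, a later run of the same function finds nothing left to do for it.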
Note: If you do not have a Google Cloud Platform (GCP) account, you can use Dropbox, Box, AWS S3, Azure Blob Storage, etc., instead; they all have similar list files, download file, and move file actions.

If you would like to view the YAML definition of this example integration, it's available on GitHub.

Set Up Some Required Config Variables

Our integration is going to interact with Google Cloud Storage and Slack. To keep the integration configurable, let's create three required config variables:

  • bucketName will represent the name of the GCP bucket where files are stored.
  • gcpProjectId will represent the ID of the GCP project that owns the GCP bucket.
  • slackWebhookUrl will represent a Slack webhook URL (see generating a Slack webhook URL).
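As a rough illustration of what "required" means here, the sketch below checks a plain object for the three variables. The platform manages config variables itself, so this is only a mental model; the `missingConfigVars` helper and the sample values are hypothetical.

```javascript
// Names of the three required config variables from the list above.
const REQUIRED_CONFIG_VARS = ["bucketName", "gcpProjectId", "slackWebhookUrl"];

// Hypothetical helper: returns the names of required variables that are
// missing or blank in the given config object.
function missingConfigVars(config) {
  return REQUIRED_CONFIG_VARS.filter(
    (name) => typeof config[name] !== "string" || config[name].trim() === ""
  );
}

// Made-up sample values, for illustration only.
const exampleConfig = {
  bucketName: "example-login-events",
  gcpProjectId: "example-project-123",
  slackWebhookUrl: "",
};

const missing = missingConfigVars(exampleConfig);
console.log(missing); // names of variables that still need a value
```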

Create Our Loop

The first two steps we'll add to our integration will (1) list files in the GCP storage bucket, and (2) create an empty loop iterating over those files.

First, we'll add a step to list files in our GCP storage bucket.

I've already created a Google Cloud Storage bucket and service account to test with, following GCP's documentation.

We'll configure our action to point to our bucket and account, and under prefix we'll enter unprocessed/ so we only loop over files in the unprocessed/ directory.
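To make the effect of the prefix concrete, here is a minimal sketch of prefix matching over a list of object names. It assumes the prefix is applied as a plain string match against the start of each object name, which is how Cloud Storage prefixes generally behave. The `listWithPrefix` helper and the sample names are made up. The sketch also skips names that end in a slash, since a zero-byte "folder" placeholder object may appear in listings if the directory was created through the Cloud console.

```javascript
// Hypothetical helper: keep object names that start with the prefix,
// ignoring folder placeholder entries such as "unprocessed/" itself.
function listWithPrefix(objectNames, prefix) {
  return objectNames.filter(
    (name) => name.startsWith(prefix) && !name.endsWith("/")
  );
}

// Made-up object names, for illustration only.
const objectNames = [
  "unprocessed/",
  "unprocessed/2020-10-22T15-30-55.521Z",
  "unprocessed-archive/2020-10-01T08-00-00.000Z",
  "processed/2020-10-21T09-12-03.114Z",
];

const toProcess = listWithPrefix(objectNames, "unprocessed/");
console.log(toProcess);
```

The trailing slash in the prefix matters in this sketch: without it, names under unprocessed-archive/ would match as well.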

Next, we'll add a loop step. Under items we will reference the list of files that our previous step returned.

Add Tasks to the Loop

Our loop is now configured to run once for each file that was found in the unprocessed/ directory in our GCP bucket. Our loop will contain five steps, which the following sections add one at a time.

Download the File We're Currently Looping Over

First, we'll download the file we're currently looping over. The item that we're currently processing from our items is accessible using the currentItem key. For example, if there's a file named unprocessed/2020-10-22T15-30-55.521Z in our bucket, loopOverEachFile.currentItem would be equal to "unprocessed/2020-10-22T15-30-55.521Z".
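One way to picture currentItem is as the loop variable of an ordinary for loop, exposed to the steps inside the loop under the loop step's name. The sketch below is a mental model only, not the platform's implementation; `runLoop` and the second file name are hypothetical.

```javascript
// Hypothetical stand-in for the loop step: runs `body` once per item and
// exposes the item as `currentItem` under the loop step's name.
function runLoop(loopStepName, items, body) {
  const results = [];
  for (const currentItem of items) {
    const stepOutputs = { [loopStepName]: { currentItem } };
    results.push(body(stepOutputs));
  }
  return results;
}

// Made-up list of files, as the list files step might return it.
const files = [
  "unprocessed/2020-10-22T15-30-55.521Z",
  "unprocessed/2020-10-22T15-42-10.007Z",
];

// Each iteration sees exactly one file name as loopOverEachFile.currentItem.
const seen = runLoop(
  "loopOverEachFile",
  files,
  ({ loopOverEachFile }) => loopOverEachFile.currentItem
);
console.log(seen);
```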

Parse the JSON from the Current File

Next, we'll add a Deserialize JSON step to process the file we pulled down. The file we downloaded is a text file containing some JSON:

{
  "username": "taylor",
  "site": "https://api.progix.io"
}

This step will make those JSON keys accessible for subsequent steps.
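In plain JavaScript, deserializing amounts to a call to JSON.parse. The sketch below assumes the downloaded contents may arrive as bytes rather than text, so it decodes them first; the `deserializeUserData` function here is a hypothetical stand-in that borrows the step's name, not the component's actual code.

```javascript
// Sample file contents as bytes, matching the JSON shown above.
const fileContents = Buffer.from(
  '{"username": "taylor", "site": "https://api.progix.io"}',
  "utf8"
);

// Hypothetical stand-in for the Deserialize JSON step.
function deserializeUserData(contents) {
  // Decode bytes to text if needed, then parse. JSON.parse throws a
  // SyntaxError if the file does not contain valid JSON.
  const text = Buffer.isBuffer(contents)
    ? contents.toString("utf8")
    : String(contents);
  return JSON.parse(text);
}

const results = deserializeUserData(fileContents);
console.log(results.username);
console.log(results.site);
```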

Generate Slack Message and Outfile Name

Next, we need a helper function to generate two things:

  1. The Slack message we're going to send
  2. The path where we're going to move the log file after processing (in the processed/ directory)

Let's add a code component and enter this code to execute:

module.exports = async (
  { logger },
  {
    loopOverEachFile: { currentItem: fileName },
    deserializeUserData: {
      results: { username, site },
    },
  }
) => {
  // The file name doubles as the login timestamp once the directory prefix is removed.
  const loginTime = fileName.replace("unprocessed/", "");
  return {
    data: {
      slackMessage: `${username} logged into ${site} at ${loginTime}`,
      // Same file name, but under processed/ instead of unprocessed/.
      outFileName: fileName.replace("unprocessed/", "processed/"),
    },
  };
};

Note that this returns an object with two values: slackMessage containing a message to send, and outFileName that contains something like processed/2020-10-22T15-30-55.521Z.
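To sanity-check this logic outside the platform, the same transformation can be pulled into a plain function and called with sample values. The `buildOutputs` and `toIsoTimestamp` names are hypothetical. The second helper is an optional extra: it assumes file names always follow the pattern shown in this tutorial, where the time portion uses hyphens in place of colons, and converts the suffix into a standard ISO 8601 string.

```javascript
// Hypothetical plain-function version of the code component's logic.
function buildOutputs(fileName, username, site) {
  const loginTime = fileName.replace("unprocessed/", "");
  return {
    slackMessage: `${username} logged into ${site} at ${loginTime}`,
    outFileName: fileName.replace("unprocessed/", "processed/"),
  };
}

// Optional, hypothetical helper: "2020-10-22T15-30-55.521Z" uses hyphens in
// the time portion; swap them for colons to get an ISO 8601 timestamp.
function toIsoTimestamp(loginTime) {
  return loginTime.replace(/T(\d{2})-(\d{2})-(\d{2})/, "T$1:$2:$3");
}

const outputs = buildOutputs(
  "unprocessed/2020-10-22T15-30-55.521Z",
  "taylor",
  "https://api.progix.io"
);
console.log(outputs.slackMessage);
console.log(outputs.outFileName);

const isoTime = toIsoTimestamp("2020-10-22T15-30-55.521Z");
console.log(isoTime);
```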

Send a Slack Message

Next, we'll send the Slack message that we generated in the previous step. We'll do that by adding a Slack - Send Message step to our integration. If you do not have a Slack channel, you can try sending an email instead with SendGrid or an SMS with Twilio.
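For context, a Slack incoming webhook accepts an HTTP POST with a JSON body that contains a text field. The sketch below builds such a request without sending it. It illustrates the kind of request a step like this plausibly makes, not the component's actual code; the `buildSlackWebhookRequest` helper and the placeholder URL are hypothetical.

```javascript
// Hypothetical helper: builds the URL and fetch options for posting a
// message to a Slack incoming webhook.
function buildSlackWebhookRequest(webhookUrl, message) {
  return {
    url: webhookUrl,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text: message }),
    },
  };
}

// Placeholder only; a real value would come from the slackWebhookUrl config variable.
const PLACEHOLDER_WEBHOOK_URL =
  "https://hooks.slack.com/services/EXAMPLE/EXAMPLE/EXAMPLE";

const request = buildSlackWebhookRequest(
  PLACEHOLDER_WEBHOOK_URL,
  "taylor logged into https://api.progix.io at 2020-10-22T15-30-55.521Z"
);
console.log(request.options.body);

// To actually send it (Node 18+ ships a global fetch), inside an async function:
// const response = await fetch(request.url, request.options);
```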

Move the File to a Processed Directory

Finally, we'll get the processed file out of the way by moving it from unprocessed/ to processed/, so that the next run does not pick it up again. To do that we'll add a GCP - Move File step to our integration.

For the source file name, we'll reference our loop's currentItem key again. For the destination file name, we'll reference our code component's results.outFileName output.

Conclusion

That's it! At this point we have an integration that loops over files in a directory, processes them, and sends alerts based on their contents. This integration can be published, and instances of this integration can be configured and deployed to customers.