
Looping Over Files

What We Will Accomplish

In this tutorial we will build an integration that downloads and processes files stored in Google Cloud Storage.

For this integration, assume that a third-party service writes a timestamped file to a Google Cloud Platform (GCP) storage bucket whenever a user authenticates against one of its services. Our integration will examine the files written to the bucket and announce via Slack who logged in and when.

Our integration will take advantage of the loop component to process files one by one.

We'll configure our integration to run every five minutes. On each run, it will:

  • Look for files in the unprocessed/ directory of our Google Cloud Storage bucket
  • Loop over each file that we found:
    • Download the file
    • Deserialize the JSON contained in the file
    • Use a code component to format a Slack message based on the deserialized contents
    • Post the generated message to Slack
    • Move the file from the unprocessed/ directory to a processed/ directory
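
The per-run flow above can be modeled in plain JavaScript to make the order of operations concrete. This is a hypothetical sketch, not Prismatic's runtime: the storage and Slack operations are passed in as functions (listFiles, downloadFile, postMessage, and moveFile are illustrative names), and an in-memory Map stands in for the bucket so the logic can run locally.

```javascript
// Hypothetical model of one scheduled run; this is not Prismatic's runtime.
// Storage and Slack operations are injected so the flow can run locally.
async function processRun({ listFiles, downloadFile, postMessage, moveFile }) {
  const fileNames = await listFiles("unprocessed/");
  const sent = [];
  // Process files one at a time, as the loop step does.
  for (const fileName of fileNames) {
    const raw = await downloadFile(fileName); // download the file
    const { username, site } = JSON.parse(raw); // deserialize its JSON
    const loginTime = fileName.replace("unprocessed/", "").replace(".json", "");
    const message = `${username} logged into ${site} at ${loginTime}.`;
    await postMessage(message); // post to Slack
    await moveFile(fileName, fileName.replace("unprocessed/", "processed/"));
    sent.push(message);
  }
  return sent;
}

// In-memory stand-ins: a Map plays the bucket (object name -> contents),
// and posted messages are collected in an array instead of sent to Slack.
function makeInMemoryDeps(bucket, posted) {
  return {
    listFiles: async (prefix) => [...bucket.keys()].filter((name) => name.startsWith(prefix)),
    downloadFile: async (name) => bucket.get(name),
    postMessage: async (text) => {
      posted.push(text);
    },
    moveFile: async (from, to) => {
      bucket.set(to, bucket.get(from));
      bucket.delete(from);
    },
  };
}

const bucket = new Map([
  ["unprocessed/20210322_163522.json", JSON.stringify({ username: "alice", site: "example.com" })],
]);
const posted = [];
processRun(makeInMemoryDeps(bucket, posted)).then(() => {
  console.log(posted); // one message about alice's login
  console.log([...bucket.keys()]); // the file now lives under processed/
});
```

Injecting the four operations keeps the control flow separate from any particular storage provider, which mirrors how the same integration could target Dropbox or Amazon S3 instead.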
Note: If you don't have a Google Cloud Platform (GCP) account, you can use Dropbox, Box, Amazon S3, Azure Blob Storage, or a similar service instead. Each offers comparable actions for listing, downloading, and moving files.

If you would like to view the YAML definition of this example integration, it's available on GitHub.

Set Up Some Config Variables

Our integration will interact with Google Cloud Storage and Slack. To keep it configurable, let's create four config variables:

  • Storage Bucket Name will represent the name of the Google Cloud Storage bucket where files are stored.
  • Project ID will represent the ID of the GCP project that owns the GCP bucket.
  • Private Key Pair will be a set of credentials that allow the integration to interact with files in Google Cloud Storage.
  • Slack webhook will represent a Slack webhook URL (see generating a Slack webhook URL).
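
Catching a mistyped value early is cheaper than debugging a failed run. The sketch below checks a hypothetical plain object holding the four values; the keys (storageBucketName, projectId, privateKeyPair, slackWebhook) are illustrative names, not identifiers defined by Prismatic, and the bucket-name rule is a simplified version of Cloud Storage's naming requirements.

```javascript
// Hypothetical config-variable values; the keys are illustrative names,
// not identifiers defined by Prismatic.
function validateConfig(config) {
  const errors = [];
  // Simplified Cloud Storage bucket-name rule: 3-63 characters of lowercase
  // letters, digits, dots, dashes, or underscores, starting and ending with
  // a letter or digit. (The full rules allow longer dotted names.)
  if (!/^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$/.test(config.storageBucketName ?? "")) {
    errors.push("storageBucketName is not a valid bucket name");
  }
  if (!config.projectId) {
    errors.push("projectId is required");
  }
  if (!config.privateKeyPair) {
    errors.push("privateKeyPair is required");
  }
  // Slack incoming-webhook URLs are served from hooks.slack.com.
  if (!String(config.slackWebhook ?? "").startsWith("https://hooks.slack.com/")) {
    errors.push("slackWebhook does not look like a Slack webhook URL");
  }
  return errors;
}

console.log(
  validateConfig({
    storageBucketName: "login-events",
    projectId: "my-project",
    privateKeyPair: "<service account key>",
    slackWebhook: "https://hooks.slack.com/services/T000/B000/XXXX",
  })
); // []
```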

Create Our Loop

The first two steps we'll add to our integration will (1) list files in the GCP storage bucket, and (2) create an empty loop iterating over those files.

First, we'll add a step to list files in our GCP storage bucket.

I've already created a Google Cloud Storage bucket and service account to test with, following GCP's documentation.

We'll configure the action to point to our bucket and account and to use the Private Key Pair config variable for credentials. Under prefix, we'll enter unprocessed/ so that we loop over only the files in the unprocessed/ directory.
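
Cloud Storage object names are flat: unprocessed/ is simply the leading part of each object's name, and the prefix option filters on it. The sketch below mimics that filtering over a plain array of made-up object names. It also skips an object named exactly unprocessed/, since creating a folder in the Cloud console can leave such a zero-byte placeholder; whether the list step actually returns one is an assumption to check against your own bucket.

```javascript
// Mimics listing with a prefix over a flat list of object names.
// "unprocessed/" is just a name prefix, not a real directory.
function listWithPrefix(objectNames, prefix) {
  return objectNames.filter(
    (name) =>
      name.startsWith(prefix) &&
      name !== prefix // skip a possible zero-byte "folder" placeholder object
  );
}

const objectNames = [
  "unprocessed/",
  "unprocessed/20210322_163522.json",
  "unprocessed/20210322_170001.json",
  "processed/20210321_090000.json",
];

// Only the two JSON files under unprocessed/ remain.
console.log(listWithPrefix(objectNames, "unprocessed/"));
```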

Next, we'll add a loop step. Under items, we'll reference the list of files output by the previous step.
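
Conceptually, the loop step runs its inner steps once per element of items, one after another, exposing the element being processed as currentItem. The sketch below models that with a hypothetical runLoop helper; it illustrates the idea only and is not how Prismatic implements loops. Collecting each iteration's result into an array is a choice made for this model.

```javascript
// Hypothetical model of a loop step: run the inner steps once per item,
// sequentially, exposing the item being processed as currentItem.
async function runLoop(items, innerSteps) {
  const results = [];
  for (const currentItem of items) {
    // Each iteration sees only its own currentItem and finishes
    // before the next one starts.
    results.push(await innerSteps({ currentItem }));
  }
  return results;
}

const files = ["unprocessed/20210322_163522.json", "unprocessed/20210322_170001.json"];

runLoop(files, async ({ currentItem }) => `processing ${currentItem}`).then((out) =>
  console.log(out)
);
```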

Add Tasks to the Loop

Our loop is now configured to run once for each file that was found in the unprocessed/ directory in our GCP bucket. Our loop will contain five steps:

Download the File We're Currently Looping Over

First, we'll download the file we're currently looping over. The item currently being processed is available through the loop step's currentItem key. For example, if there's a file named unprocessed/20210322_163522.json in our bucket, then on the iteration that processes it, loopOverEachFile.currentItem equals "unprocessed/20210322_163522.json".
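
A downloaded file arrives as raw bytes, and its JSON has to be deserialized before the username and site fields can be used. A minimal sketch, assuming each file holds a UTF-8 JSON object with username and site string fields (the fields the code component destructures); parseLoginRecord is a hypothetical helper name.

```javascript
// Hypothetical helper: turn downloaded bytes into a login record.
// Assumes UTF-8 JSON shaped like { "username": ..., "site": ... }.
function parseLoginRecord(fileName, contents) {
  const text = Buffer.isBuffer(contents) ? contents.toString("utf8") : String(contents);
  let record;
  try {
    record = JSON.parse(text);
  } catch (err) {
    throw new Error(`${fileName} does not contain valid JSON: ${err.message}`);
  }
  // Fail loudly instead of posting "undefined logged into undefined".
  if (
    record === null ||
    typeof record !== "object" ||
    typeof record.username !== "string" ||
    typeof record.site !== "string"
  ) {
    throw new Error(`${fileName} is missing username or site`);
  }
  return { username: record.username, site: record.site };
}

const bytes = Buffer.from('{"username":"alice","site":"example.com"}');
console.log(parseLoginRecord("unprocessed/20210322_163522.json", bytes));
```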

Generate Slack Message and Outfile Name

Next, we need a helper function to generate two things:

  1. The Slack message we're going to send
  2. The path where we're going to move the log file after processing (in the processed/ directory)

Let's add a code component and enter this code to execute:

module.exports = async (
  { logger },
  {
    loopOverEachFile: { currentItem: fileName },
    downloadLogFile: {
      results: { username, site },
    },
  }
) => {
  // Strip the directory prefix and extension to recover the timestamp,
  // e.g. "unprocessed/20210322_163522.json" -> "20210322_163522"
  const loginTime = fileName.replace("unprocessed/", "").replace(".json", "");
  return {
    data: {
      slackMessage: `${username} logged into ${site} at ${loginTime}.`,
      // Same file name, moved under the processed/ directory
      outFileName: fileName.replace("unprocessed/", "processed/"),
    },
  };
};

This returns an object with two values: slackMessage, which contains the message to send, and outFileName, which contains the destination path, such as processed/20210322_163522.json.
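
Because the code component is an ordinary async function, it can be exercised outside Prismatic by calling it with hand-built stand-ins for its two arguments. A sketch, assuming the step names loopOverEachFile and downloadLogFile used in the code component and a made-up user; the function body is copied from the step and bound to a local name, generateMessage, for illustration.

```javascript
// Copy of the code component, bound to a local name so it can be called directly.
const generateMessage = async (
  { logger },
  {
    loopOverEachFile: { currentItem: fileName },
    downloadLogFile: {
      results: { username, site },
    },
  }
) => {
  const loginTime = fileName.replace("unprocessed/", "").replace(".json", "");
  return {
    data: {
      slackMessage: `${username} logged into ${site} at ${loginTime}.`,
      outFileName: fileName.replace("unprocessed/", "processed/"),
    },
  };
};

// Stand-ins for the context and step results the platform would pass in.
const context = { logger: console };
const stepResults = {
  loopOverEachFile: { currentItem: "unprocessed/20210322_163522.json" },
  downloadLogFile: { results: { username: "alice", site: "example.com" } },
};

generateMessage(context, stepResults).then(({ data }) => {
  console.log(data.slackMessage); // alice logged into example.com at 20210322_163522.
  console.log(data.outFileName); // processed/20210322_163522.json
});
```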

Send a Slack Message

Next, we'll send the Slack message generated in the previous step by adding a Slack - Send Message step to our integration. If you don't have a Slack channel, you can send an email with SendGrid or an SMS with Twilio instead.
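
The Slack step hides the HTTP details, but with an incoming webhook (the Slack webhook config variable) sending a message amounts to an HTTPS POST of a JSON body with a text field. A minimal sketch using the global fetch available in Node 18+; sendSlackMessage is a hypothetical name, not the Slack component's API, and fetch is a parameter so it can be replaced with a stub for local testing.

```javascript
// Post a plain-text message to a Slack incoming webhook.
// fetchImpl defaults to the global fetch (Node 18+) and can be replaced by a stub.
async function sendSlackMessage(webhookUrl, text, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (!response.ok) {
    // Slack returns a short error string in the response body on failure.
    throw new Error(`Slack webhook returned ${response.status}: ${await response.text()}`);
  }
}

// Stub that records the request instead of calling Slack.
const calls = [];
const stubFetch = async (url, init) => {
  calls.push({ url, init });
  return { ok: true, status: 200, text: async () => "ok" };
};

sendSlackMessage("https://hooks.slack.com/services/EXAMPLE", "hello", stubFetch).then(() =>
  console.log(calls[0].init.body) // {"text":"hello"}
);
// With a real webhook URL: await sendSlackMessage(webhookUrl, message);
```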

Move the File to a Processed Directory

Finally, we'll move the downloaded file out of the way, from unprocessed/ to processed/, so that the next run doesn't list and process it again. To do that, we'll add a GCP - Move File step to our integration.

For the source file name, we'll reference our loop's currentItem key again. For the destination file name, we'll reference our code component's results.outFileName output.
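
In object stores, a move is typically carried out as a copy to the new name followed by a delete of the old one rather than an in-place rename. The sketch below models that over an in-memory Map (a stand-in for the bucket, not the GCP component) and shows why the order matters here: copying first means that if the delete never happens, the file stays in unprocessed/ and is processed again on the next run (a duplicate Slack message) instead of being lost.

```javascript
// In-memory stand-in for a bucket: object name -> contents.
// A "move" is modeled as copy-then-delete.
function moveObject(bucket, sourceName, destinationName) {
  if (!bucket.has(sourceName)) {
    throw new Error(`No such object: ${sourceName}`);
  }
  if (sourceName === destinationName) {
    return; // nothing to do; deleting here would lose the file
  }
  // Copy first, so a failure before the delete leaves a duplicate, not a loss.
  bucket.set(destinationName, bucket.get(sourceName));
  bucket.delete(sourceName);
}

const bucket = new Map([
  ["unprocessed/20210322_163522.json", '{"username":"alice","site":"example.com"}'],
]);
const source = "unprocessed/20210322_163522.json";
moveObject(bucket, source, source.replace("unprocessed/", "processed/"));
console.log([...bucket.keys()]); // [ 'processed/20210322_163522.json' ]
```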

Conclusion

That's it! At this point we have an integration that loops over files in a directory, processes them, and sends alerts based on their contents. This integration can be published, and instances of this integration can be configured and deployed to customers.