Looping Over Files
In this tutorial we will build an integration that downloads and processes files stored in Google Cloud Storage.
For this integration, assume that some third-party service writes a timestamped file to a Google Cloud Platform (GCP) storage bucket whenever a user authenticates against one of their services. Our integration will examine the files written to the bucket, and will announce via Slack who logged in, and when.
Our integration will take advantage of the loop component to process files one by one.
We'll configure our integration to run every five minutes, and our integration will do the following:
- Look for files in the unprocessed/ directory of our GCP Storage bucket
- Loop over each file that we found:
  - Download the file
  - Deserialize the JSON contained in the file
  - Use a code component to format a Slack message based on the deserialized contents
  - Post the generated message to Slack
  - Move the file from the unprocessed/ directory to a processed/ directory
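The flow above can be sketched in plain JavaScript to make the order of operations concrete. This is only an illustration under stated assumptions: the `bucket` map stands in for the real storage bucket, `sentMessages` stands in for Slack, and the sample file contents (`alice`, `example.com`) are made up. In the real integration each of these steps is a separate component rather than hand-written code.

```javascript
// Hypothetical in-memory stand-in for the bucket: object name -> file contents.
const bucket = new Map([
  ["unprocessed/20210322_163522.json", '{"username":"alice","site":"example.com"}'],
  ["processed/20210321_101500.json", '{"username":"bob","site":"example.com"}'],
]);

// Collects messages instead of posting them to Slack.
const sentMessages = [];

function processUnprocessedFiles() {
  // List files under the unprocessed/ prefix.
  const files = [...bucket.keys()].filter((name) => name.startsWith("unprocessed/"));

  // Loop over each file that was found.
  for (const fileName of files) {
    const contents = bucket.get(fileName); // download the file
    const { username, site } = JSON.parse(contents); // deserialize the JSON

    // Format the message; the timestamp comes from the file name.
    const loginTime = fileName.replace("unprocessed/", "").replace(".json", "");
    sentMessages.push(`${username} logged into ${site} at ${loginTime}.`);

    // Move the file from unprocessed/ to processed/.
    bucket.delete(fileName);
    bucket.set(fileName.replace("unprocessed/", "processed/"), contents);
  }
  return files.length;
}
```

Because a processed file no longer matches the unprocessed/ prefix, running this again five minutes later would not announce the same login twice.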
Note: If you do not have a Google Cloud Platform (GCP) account, you can use Dropbox, Box, Amazon S3, Azure Blob Storage, etc., instead. They all have similar list files, download file, and move file actions.
If you would like to view the YAML definition of this example integration, it's available on GitHub.
Set Up Some Config Variables
Our integration is going to interact with Google Cloud Storage and Slack. For the sake of a more configurable integration, let's create four config variables for our integration:
- Storage Bucket Name will represent the name of the Google Cloud Storage bucket where files are stored.
- Project ID will represent the ID of the GCP project that owns the GCP bucket.
- Private Key Pair will be a set of credentials that allow the integration to interact with files in Google Cloud Storage.
- Slack webhook will represent a Slack webhook URL (see generating a Slack webhook URL).

Create Our Loop
The first two steps we'll add to our integration will (1) list files in the GCP storage bucket, and (2) create an empty loop iterating over those files.
First, we'll add a step to list files in our GCP storage bucket.
I've already created a Google Cloud Storage bucket and service account to test with, following GCP's documentation.
We'll configure our action to point to our bucket and account and use the credential config variable we created.
Under prefix we'll enter unprocessed/ so we only loop over files in the unprocessed/ directory.
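Cloud Storage object names are flat strings, so a "directory" is really just a shared name prefix. The sketch below illustrates why the trailing slash in the prefix matters. The object names and the `listWithPrefix` helper are hypothetical and only mimic prefix matching; the real listing is done by the list files action.

```javascript
// Hypothetical object names in a bucket.
const objectNames = [
  "unprocessed/20210322_163522.json",
  "unprocessed-archive/20200101_000000.json",
  "processed/20210321_101500.json",
];

// Returns the names that start with the given prefix.
function listWithPrefix(names, prefix) {
  return names.filter((name) => name.startsWith(prefix));
}

// With the trailing slash, only files "inside" unprocessed/ match.
const withSlash = listWithPrefix(objectNames, "unprocessed/");

// Without it, sibling prefixes such as unprocessed-archive/ match too.
const withoutSlash = listWithPrefix(objectNames, "unprocessed");

console.log(withSlash.length, withoutSlash.length); // prints "1 2"
```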

Next, we'll add a loop step. Under items we will reference the list of files our previous step output:

Add Tasks to the Loop
Our loop is now configured to run once for each file that was found in the unprocessed/ directory in our GCP bucket.
Our loop will contain five steps:
Download the File We're Currently Looping Over
First, we'll download the file we're currently looping over.
The item that we're currently processing from our items is accessible using the currentItem key.
For example, if there's a file named unprocessed/20210322_163522.json in our bucket, loopOverEachFile.currentItem would be equal to "unprocessed/20210322_163522.json".
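As a rough mental model, the loop hands each listed item to its inner steps under the `currentItem` key. The sketch below is not the platform's implementation; the `runLoop` helper and the `listedFiles` array are hypothetical and exist only to show what the inner steps see on each iteration.

```javascript
// Hypothetical output of the list files step.
const listedFiles = [
  "unprocessed/20210322_163522.json",
  "unprocessed/20210322_170104.json",
];

// Runs `steps` once per item, exposing the item as `currentItem`.
function runLoop(items, steps) {
  return items.map((currentItem) => steps({ currentItem }));
}

// Each iteration sees exactly one file name.
const seenByInnerSteps = runLoop(listedFiles, ({ currentItem }) => currentItem);
```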

Generate Slack Message and Outfile Name
Next, we need a helper function to generate two things:
- The Slack message we're going to send
- The path where we're going to move the log file after processing (in the processed/ directory)
Let's add a code component and enter this code to execute:
module.exports = async (
  { logger },
  {
    loopOverEachFile: { currentItem: fileName },
    downloadLogFile: {
      results: { username, site },
    },
  }
) => {
  // The file name encodes the login timestamp, e.g. "20210322_163522".
  const loginTime = fileName.replace("unprocessed/", "").replace(".json", "");
  return {
    data: {
      slackMessage: `${username} logged into ${site} at ${loginTime}.`,
      // Same file name, but under processed/ instead of unprocessed/.
      outFileName: fileName.replace("unprocessed/", "processed/"),
    },
  };
};
Note that this returns an object with two values: slackMessage, containing a message to send, and outFileName, which contains something like processed/20210322_163522.json.
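To check the string handling outside the integration, the same logic can be pulled into a plain function and run with sample input. This is a sketch: `buildOutputs` is a hypothetical helper name, and the `alice` / `example.com` values are made-up file contents.

```javascript
// Mirrors the code component's logic as a plain, synchronous function.
function buildOutputs(fileName, { username, site }) {
  const loginTime = fileName.replace("unprocessed/", "").replace(".json", "");
  return {
    slackMessage: `${username} logged into ${site} at ${loginTime}.`,
    outFileName: fileName.replace("unprocessed/", "processed/"),
  };
}

const sample = buildOutputs("unprocessed/20210322_163522.json", {
  username: "alice",
  site: "example.com",
});

console.log(sample.slackMessage); // prints "alice logged into example.com at 20210322_163522."
console.log(sample.outFileName); // prints "processed/20210322_163522.json"
```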
Send a Slack Message
Next, we'll send the Slack message that we generated in the previous step. We'll do that by adding a Slack - Send Message step to our integration. If you do not have a Slack channel, you can try sending an email instead with SendGrid or an SMS with Twilio.
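For context, a Slack incoming webhook accepts an HTTP POST with a JSON body containing a `text` field. The sketch below shows roughly what such a request looks like if written by hand. The helper names are hypothetical, it assumes a runtime with a global `fetch` (such as Node 18 or later), and the Send Message step handles all of this for you in the integration.

```javascript
// Builds the request for a Slack incoming webhook without sending it.
function buildSlackRequest(webhookUrl, message) {
  return {
    url: webhookUrl,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text: message }),
    },
  };
}

// Sends the message; assumes a global fetch is available.
async function postToSlack(webhookUrl, message) {
  const { url, options } = buildSlackRequest(webhookUrl, message);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Slack webhook returned HTTP ${response.status}`);
  }
}
```

Splitting request construction from sending keeps the payload easy to inspect without making a network call.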

Move the File to a Processed Directory
Finally, we'll move the file that we downloaded out of the way by moving it from unprocessed/ to processed/.
To do that we'll add a GCP - Move File step to our integration.
For the source file name, we'll reference our loop's currentItem key again.
For the destination file name, we'll reference our code component's results.outFileName output.

Conclusion
That's it! At this point we have an integration that loops over files in a directory, processes them, and sends alerts based on their contents. This integration can be published, and instances of this integration can be configured and deployed to customers.