Incrementally Load New Files in Azure Data Factory by Looking Up Latest Modified Date in Destination Folder

This is a common business scenario, but it turns out that it takes quite a bit of work in Azure Data Factory to implement. The goal is to look at the destination folder, find the file with the latest modified date, and then use that date as the starting point for copying new files from the source folder. I did not come up with this approach myself; unfortunately, I misplaced the link to the original post, so I cannot properly credit the author.

The details are in the video, but at a high level the steps are the following:

  1. Use a Get Metadata activity to list all files in the destination folder
  2. Use a For Each activity to iterate over this list and compare each file's modified date with the value stored in a variable
  3. If the file's modified date is greater than the variable's value, update the variable with that date
  4. Use the variable in the Copy Activity’s Filter by Last Modified field to filter out all files that have already been copied
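
The logic of these four steps can be sketched outside of Data Factory in a few lines of Python. This is only an illustration of the idea, not the pipeline itself: `FileInfo`, `latest_modified`, `files_to_copy`, the `INITIAL` starting value, and the sample file names and dates are all hypothetical. It also assumes that the start of the Filter by Last Modified range is inclusive, so the sketch uses `>=` for the final filter.

```python
from datetime import datetime, timezone
from typing import Iterable, List, NamedTuple


class FileInfo(NamedTuple):
    """Hypothetical stand-in for one file's name and modified date."""
    name: str
    last_modified: datetime


def latest_modified(files: Iterable[FileInfo], initial: datetime) -> datetime:
    """Steps 1-3: walk the list and keep the largest modified date seen."""
    max_time = initial  # plays the role of the pipeline variable
    for f in files:
        if f.last_modified > max_time:
            max_time = f.last_modified
    return max_time


def files_to_copy(
    source: Iterable[FileInfo],
    destination: Iterable[FileInfo],
    initial: datetime,
) -> List[FileInfo]:
    """Step 4: keep source files modified at or after the watermark."""
    watermark = latest_modified(destination, initial)
    # Assumption: the filter's start bound is inclusive, hence >=.
    return [f for f in source if f.last_modified >= watermark]


# Hypothetical sample data, all timestamps in UTC.
INITIAL = datetime(1900, 1, 1, tzinfo=timezone.utc)
destination = [
    FileInfo("a.csv", datetime(2021, 3, 1, 8, 0, tzinfo=timezone.utc)),
    FileInfo("b.csv", datetime(2021, 3, 2, 9, 30, tzinfo=timezone.utc)),
]
source = [
    FileInfo("a.csv", datetime(2021, 3, 1, 7, 0, tzinfo=timezone.utc)),
    FileInfo("b.csv", datetime(2021, 3, 2, 9, 0, tzinfo=timezone.utc)),
    FileInfo("c.csv", datetime(2021, 3, 3, 10, 0, tzinfo=timezone.utc)),
]

new_files = files_to_copy(source, destination, INITIAL)
print([f.name for f in new_files])  # prints ['c.csv']
```

Note that the watermark comes from the modified dates of the files in the destination, which is what the pipeline compares against the source files' modified dates. With an empty destination the watermark stays at the initial value, so every source file is selected.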

Variables:

GetFileList:

For Each:

@activity('GetFileList').output.childItems

The two activities in For Each:

Details:

If Condition:

@greater(activity('GetFileMetadata').output.lastModified, variables('maxTime'))

And finally writing into the variable:

@activity('GetFileMetadata').output.lastModified
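
The If Condition compares the modified date with the variable using `greater`. If, as assumed here, both values are strings in the same ISO 8601 UTC format and are compared character by character, then the string order matches the chronological order. The short Python check below illustrates that property only; the timestamps and the `parse` helper are hypothetical, and it does not call Data Factory.

```python
from datetime import datetime

# Hypothetical timestamps in one fixed ISO 8601 UTC format.
stamps = [
    "2021-03-01T08:15:00Z",
    "2020-12-31T23:59:59Z",
    "2021-03-01T08:14:59Z",
]


def parse(ts: str) -> datetime:
    """Parse the fixed-format timestamp into a datetime for comparison."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


# Largest value by plain string comparison vs. by actual time.
max_by_string = max(stamps)
max_by_time = max(stamps, key=parse)

# Sorting as strings gives the same order as sorting as datetimes.
same_order = sorted(stamps) == sorted(stamps, key=parse)

print(max_by_string == max_by_time, same_order)
```

This only holds when every value uses the same format and time zone, which is also why the variable's starting value should be a timestamp in that format that is earlier than any real file.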
