Azure Data Factory CI/CD using GitHub Actions
In this guide, we show how to do continuous integration and delivery in Azure Data Factory with GitHub Actions.
Many online resources cover implementing continuous integration and continuous deployment for Azure Data Factory using Azure DevOps:
Continuous integration and delivery - Azure Data Factory | Microsoft Learn
Automated publishing for continuous integration and delivery - Azure Data Factory | Microsoft Learn
Deploy linked ARM templates with VSTS - Azure Data Factory & Azure Synapse | Microsoft Learn
Azure Data Factory (ADF)— Continuous integration and delivery (CI/CD) | Medium
There are also some articles and GitHub code resources on using GitHub and GitHub Actions to implement ADF CI/CD.
This guide describes in detail how to implement a simple CI/CD workflow with GitHub Actions that promotes ADF data pipelines from one environment to another.
Configure GitHub Repository Settings
After creating and initializing a new repository, go to the Settings tab to configure the access and protection options that establish the repository's operating boundaries.
Under Collaborators and teams > Manage access > Add teams:
ADF Engineering Team, granted the Maintain role
Admin Team, granted the Admin role
Everyone Team, granted the Read role
Under Collaborators and teams > Moderation options > Interaction limits, restrict repository interaction as follows:
Enable Limit to existing users
Enable Limit to repository collaborators
Under Collaborators and teams > Moderation options > Code review limits:
Enable Limit to users explicitly granted read or higher access
Under Code and automation > Branches, add a rule for the main branch with the following settings enabled:
Require approvals (Required number of approvals before merging: x). Start with more reviewers, then gradually transition to peer review to speed up deployment into production
Require review from Code Owners
Restrict who can dismiss pull request reviews (Add the Admin Team. This is particularly useful for out-of-the-norm scenarios)
Allow specified actors to bypass required pull requests (Add the Admin Team. This is particularly useful for out-of-the-norm scenarios)
Require approval of the most recent reviewable push
Require status checks to pass before merging and Require branches to be up to date before merging
Require conversation resolution before merging
Lock branch
Do not allow bypassing the above settings
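The same rule set can also be expressed as a request body for GitHub's REST endpoint for updating branch protection (PUT /repos/{owner}/{repo}/branches/{branch}/protection). The following is a minimal sketch, not a definitive configuration: the field names reflect my understanding of that endpoint and should be checked against the current GitHub REST documentation, the team slug "admin-team" is a hypothetical placeholder, and the list of status check names is left empty because this guide does not name them.

```python
import json


def team_only(team_slug: str) -> dict:
    """Return an actor list that contains a single team and nothing else."""
    return {"users": [], "teams": [team_slug], "apps": []}


def branch_protection_payload(approvals: int, admin_team_slug: str) -> dict:
    """Build a request body that mirrors the branch rule settings listed above."""
    return {
        # Require status checks to pass; strict=True also requires the branch
        # to be up to date before merging. Check names are left empty here.
        "required_status_checks": {"strict": True, "contexts": []},
        # Do not allow bypassing the above settings.
        "enforce_admins": True,
        "required_pull_request_reviews": {
            "required_approving_review_count": approvals,
            "require_code_owner_reviews": True,
            "require_last_push_approval": True,
            # Only the Admin Team may dismiss reviews or bypass pull requests.
            "dismissal_restrictions": team_only(admin_team_slug),
            "bypass_pull_request_allowances": team_only(admin_team_slug),
        },
        # No push restrictions beyond the review requirements.
        "restrictions": None,
        "required_conversation_resolution": True,
        "lock_branch": True,
    }


if __name__ == "__main__":
    # "admin-team" is a hypothetical team slug used only for illustration.
    print(json.dumps(branch_protection_payload(2, "admin-team"), indent=2))
```

Keeping the payload in a small function makes the required number of approvals easy to lower later, which fits the advice to start with more reviewers and relax the rule over time.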
Under Security > Secrets and variables, set the Secrets:
AZURE_TENANT_ID (obtained in the steps below)
AZURE_SUBSCRIPTION_ID (obtained in the steps below)
AZURE_RESOURCE_GROUP (the resource group that contains your ADF)
AZURE_USER_ASSIGNED_MANAGED_IDENTITY_CLIENT_ID
AZURE_DATA_FACTORY_NAME_DEV
AZURE_DATA_FACTORY_NAME_PROD
etc.
Under Security > Secrets and variables, set the Variables:
ACTIONS_RUNNER_DEBUG: false
ACTIONS_STEP_DEBUG: false
These two flags are useful during initial setup. Set them to true to get detailed logs from GitHub Actions runs. See Enabling debug logging - GitHub Docs
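A misspelled or empty secret name is easy to miss until a deployment fails, so a small pre-flight check can help. The sketch below assumes the workflow step maps each secret into an environment variable of the same name (GitHub Actions does not do this automatically); the function name missing_settings is hypothetical.

```python
import os

# Secret names from the list above, spelled with underscores throughout.
REQUIRED_SECRETS = [
    "AZURE_TENANT_ID",
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_RESOURCE_GROUP",
    "AZURE_USER_ASSIGNED_MANAGED_IDENTITY_CLIENT_ID",
    "AZURE_DATA_FACTORY_NAME_DEV",
    "AZURE_DATA_FACTORY_NAME_PROD",
]


def missing_settings(env, required=REQUIRED_SECRETS):
    """Return the required names that are absent or blank in env, in list order."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings(os.environ)
    if missing:
        # A non-zero exit makes the workflow step fail early.
        raise SystemExit("Missing or empty settings: " + ", ".join(missing))
    print("All required settings are present.")
```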
Create folders and files required for GitHub Repository for CI/CD
Create these files in the repository:
“.github/workflows/main.yml”
“.github/workflows/validatepullrequest.yml”
“.github/pull_request_template”
“.github/CODEOWNERS”
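If you prefer to create these placeholders from a script rather than by hand, a minimal sketch follows. It only creates empty files at the paths listed above and never overwrites existing ones; the function name scaffold is hypothetical, and the file contents still have to be written separately.

```python
from pathlib import Path

# Paths taken from the list above, relative to the repository root.
REPO_FILES = [
    ".github/workflows/main.yml",
    ".github/workflows/validatepullrequest.yml",
    ".github/pull_request_template",
    ".github/CODEOWNERS",
]


def scaffold(repo_root):
    """Create any missing files as empty placeholders and return the new paths."""
    created = []
    root = Path(repo_root)
    for relative in REPO_FILES:
        target = root / relative
        # Create .github/ and .github/workflows/ as needed.
        target.parent.mkdir(parents=True, exist_ok=True)
        if not target.exists():
            target.touch()
            created.append(target)
    return created


if __name__ == "__main__":
    for path in scaffold("."):
        print(f"created {path}")
```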
Set up User Assigned Managed Identity for GitHub Actions to deploy ARM Templates
Create credentials for ARM deployment when code is pushed to main branch:
In the Azure portal, search for Managed Identities and create a new User Assigned Managed Identity
Once it is created, go to Federated credentials and select Add Credential
Under Federated credential scenario, select “GitHub Actions deploying Azure resources - Configure a GitHub issued token to impersonate this application and deploy to Azure”
Fill in the Organization and Repository
For Entity, select “Branch”
For Branch, select “main”
Create credentials for ARM deployment when pull request is raised:
Repeat steps 1 to 6.
Then edit Subject Identifier to put in:
“repo:<your-org>/<your-repo>:pull_request”
In the managed identity, go to Settings > Properties and copy these values into the secrets created earlier in the GitHub repository (under Security > Secrets and variables):
Tenant ID
Client ID
Subscription ID
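The two federated credentials differ only in their subject identifier, which must match the subject claim of the token GitHub issues for the workflow run. The sketch below builds both forms: the branch form is what the portal generates when Entity is set to “Branch”, and the pull request form is the value entered manually above. The function name and the contoso/adf-repo values are hypothetical.

```python
def federated_subject(org: str, repo: str, entity: str, value: str = "") -> str:
    """Build the subject identifier for a GitHub federated credential."""
    prefix = f"repo:{org}/{repo}:"
    if entity == "branch":
        if not value:
            raise ValueError("a branch name is required for the branch entity")
        # Used when code is pushed to a branch such as main.
        return f"{prefix}ref:refs/heads/{value}"
    if entity == "pull_request":
        # Used when the workflow is triggered by a pull request.
        return f"{prefix}pull_request"
    raise ValueError(f"unsupported entity: {entity}")


if __name__ == "__main__":
    # Hypothetical organization and repository names.
    print(federated_subject("contoso", "adf-repo", "branch", "main"))
    print(federated_subject("contoso", "adf-repo", "pull_request"))
```

If the subject in the credential does not match the token exactly, the Azure login step fails, so comparing the two strings is a quick first check when troubleshooting.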
Set up Azure Data Factory for CI/CD
Set up GitHub Actions to run the CI/CD workflow
ARM templates themselves have limits that affect the ADF CI/CD workflow: 256 parameters, 256 variables, 800 resources (including copy count), 64 output values, 10 unique locations per subscription/tenant/management group scope, 24,576 characters in a template expression, and a 4 MB file size for the template and template parameter files.
Setting up the ADF CI/CD workflow to use Linked ARM Templates may overcome the 4 MB file size limit, but it does not solve the 256-parameter limit.
A simpler way to stay within the limits is to design the architecture to provision multiple Azure Data Factory instances upfront.
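To see how close a factory is to these limits before a deployment fails, the exported template can be checked in the CI step. This is a rough sketch under stated assumptions: the limit values are the ones quoted in this guide plus the 256-parameter limit and should be confirmed against current Azure documentation, the function name limit_violations is hypothetical, and the resource count ignores copy loops, so it can undercount.

```python
import json

# Limits as quoted in the text above; confirm against current Azure docs.
LIMITS = {"parameters": 256, "variables": 256, "resources": 800, "outputs": 64}
MAX_TEMPLATE_BYTES = 4 * 1024 * 1024  # 4 MB


def limit_violations(template_text: str) -> list:
    """Return a message for each limit the ARM template text exceeds."""
    problems = []
    size = len(template_text.encode("utf-8"))
    if size > MAX_TEMPLATE_BYTES:
        problems.append(f"size: {size} bytes exceeds limit of {MAX_TEMPLATE_BYTES}")
    template = json.loads(template_text)
    for section, limit in LIMITS.items():
        # Counts top-level entries only; copy loops are not expanded.
        count = len(template.get(section, {}))
        if count > limit:
            problems.append(f"{section}: {count} exceeds limit of {limit}")
    return problems


if __name__ == "__main__":
    import sys

    # Usage: python check_limits.py <path to the exported ARM template>
    with open(sys.argv[1], encoding="utf-8") as handle:
        issues = limit_violations(handle.read())
    for issue in issues:
        print(issue)
    sys.exit(1 if issues else 0)
```

Tracking the parameter count over time gives early warning of when splitting pipelines across multiple Data Factory instances becomes necessary.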