Azure Data Lake Connection
Pyplan connects to Azure Data Lake Storage Gen2 (ADLS Gen2) using the Azure SDK for Python. This integration is outbound from Pyplan to Azure and supports both file upload and download operations.
For IT and infrastructure teams, the most important point is that access is controlled from the Azure Storage Account side: Pyplan reaches ADLS Gen2 over HTTPS and Azure should allow only the public IPs used by Pyplan NAT Gateways.
Reference architecture

Integration flow
- Pyplan runs inside Pyplan Cloud on AWS.
- Outbound traffic leaves Pyplan through public NAT Gateways.
- Azure Storage Account firewall and network rules allow access only from those registered public IPs.
- Pyplan authenticates against ADLS Gen2 using either an Azure AD Service Principal or a SAS token.
- Pyplan reads or writes files in the configured filesystem/container over HTTPS (port 443).
Network and security requirements
- Communication is outbound only: Pyplan -> Azure.
- Protocol: HTTPS.
- Port: 443.
- Azure Storage Account firewall/network rules must allow the public IPs used by Pyplan NAT Gateways. Request the corresponding IPs from the Pyplan team.
- Authentication can be done with:
  - Azure AD Service Principal
  - SAS token
- Access control is enforced on the Azure side through RBAC role assignments and ADLS Gen2 ACLs, scoped to the folders and operations Pyplan requires.
- Encryption at rest and audit capabilities remain managed within Azure.
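Before or after the firewall change, IT can check from inside a Pyplan process that outbound traffic on port 443 reaches the ADLS Gen2 endpoint. The sketch below uses only the Python standard library; the function name is illustrative and `stgexample` is the example account name used elsewhere on this page. It only tests TCP reachability: Azure Storage evaluates its firewall rules in the service itself, so a request from a non-allowed IP typically still connects and is then rejected with an HTTP 403 response.

```python
import socket


def can_open_tcp(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This proves outbound egress on the port only. It does not prove that the
    Storage Account firewall allows the caller's public IP.
    """
    try:
        # create_connection resolves the name and tries each returned address
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS resolution failures, refused connections and timeouts
        return False
```

For example, `can_open_tcp("stgexample.dfs.core.windows.net")` returning `False` points to an egress or DNS problem rather than an authentication one.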
Requirements
Default
- account_name: Storage Account name, used to build the endpoint https://<account_name>.dfs.core.windows.net/
- file_system: File System (container) name in ADLS Gen2.
- Enable the firewall/network rules in the Azure Storage Account. Request the corresponding IPs from the Pyplan team.
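The endpoint is derived from `account_name` alone, so a typo there is the most common cause of a failed connection. A minimal sketch of building the URL with an early check, based on Azure's naming rule for Storage Accounts (3 to 24 characters, lowercase letters and digits only); the helper name is hypothetical.

```python
import re

# Azure Storage Account names: 3-24 characters, lowercase letters and digits only
ACCOUNT_NAME_RE = re.compile(r"[a-z0-9]{3,24}")


def adls_endpoint(account_name: str) -> str:
    """Build the ADLS Gen2 (dfs) endpoint URL for a Storage Account."""
    if not ACCOUNT_NAME_RE.fullmatch(account_name):
        raise ValueError(f"invalid storage account name: {account_name!r}")
    return f"https://{account_name}.dfs.core.windows.net/"
```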
ClientSecretCredential
- tenant_id: Directory (tenant) ID of the Service Principal associated with the Data Lake.
- client_id: Application (client) ID of the Service Principal associated with the Data Lake.
- client_secret: Client secret of the Service Principal associated with the Data Lake.
SharedKeyCredential
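- sas_token: SAS token used to authenticate against ADLS Gen2. For SAS-based connections, use this option instead of ClientSecretCredential: the token itself authorizes the requests, so no Azure AD values (tenant_id, client_id, client_secret) are required.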
Authentication options for IT teams
Option 1: Azure AD Service Principal
Recommended when the customer wants centralized identity management in Azure.
- Register an application in Azure AD.
- Create a client secret or certificate for that application.
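- Grant the required permissions on the Storage Account and filesystem, for example the Storage Blob Data Reader role (read-only) or Storage Blob Data Contributor role (read/write), or ACLs on specific directories.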
- Share with Pyplan: tenant_id, client_id, client_secret, account_name, and file_system.
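If these values are kept in environment variables rather than hardcoded, a small loader can confirm that all of them are present before connecting. A minimal sketch, assuming the variable names below: the AZURE_* names are the ones read by azure-identity's EnvironmentCredential, while ADLS_ACCOUNT_NAME and ADLS_FILE_SYSTEM are hypothetical names chosen for this example.

```python
import os
from typing import Dict, Mapping, Optional

# Parameter name -> environment variable. The AZURE_* names match azure-identity's
# EnvironmentCredential; the ADLS_* names are hypothetical.
SETTINGS = {
    "tenant_id": "AZURE_TENANT_ID",
    "client_id": "AZURE_CLIENT_ID",
    "client_secret": "AZURE_CLIENT_SECRET",
    "account_name": "ADLS_ACCOUNT_NAME",
    "file_system": "ADLS_FILE_SYSTEM",
}


def load_service_principal_settings(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Collect the values shared with Pyplan, failing fast if any is missing."""
    source = os.environ if env is None else env
    missing = [var for var in SETTINGS.values() if not source.get(var)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    return {name: source[var] for name, var in SETTINGS.items()}
```

The returned dictionary feeds the `ClientSecretCredential(tenant_id, client_id, client_secret)` call and the endpoint URL in the ClientSecretCredential connection example.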
Option 2: SAS token
Recommended when the customer prefers scoped, time-bounded access to a specific storage resource.
- Generate a SAS token with the required permissions.
- Restrict scope and expiration according to the security policy.
- Share with Pyplan: account_name, file_system, and sas_token.
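A SAS token carries its own scope in its query string: `sp` lists the granted permissions and `se` the expiry time (UTC). A minimal sketch for checking a token before sharing it, using only the standard library; the function name is hypothetical and the sample token below is not a real one.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs


def inspect_sas(sas_token: str, now: Optional[datetime] = None) -> dict:
    """Read permissions (sp) and expiry (se) from a SAS token query string."""
    params = parse_qs(sas_token.lstrip("?"))  # also URL-decodes values
    permissions = params.get("sp", [""])[0]
    # se is ISO 8601, e.g. "2030-01-31T23:59:59Z" or a plain date
    expiry = datetime.fromisoformat(params["se"][0].replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        expiry = expiry.replace(tzinfo=timezone.utc)  # SAS times are UTC
    now = now or datetime.now(timezone.utc)
    return {"permissions": permissions, "expires_at": expiry, "expired": expiry <= now}
```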
What this integration enables
- Upload files from Pyplan to ADLS Gen2.
- Download files from ADLS Gen2 into Pyplan processes.
- Organize files in directories and containers.
- Keep Azure networking and access policies under customer control.
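Upload and download map onto the file-level client of the azure-storage-file-datalake package. A minimal sketch, assuming `service` is a `DataLakeServiceClient` built as in the connection examples below; the helper names are illustrative, and the methods used (`get_file_system_client`, `get_file_client`, `upload_data`, `download_file`) are part of that package.

```python
from typing import Any


def upload_bytes(service: Any, file_system: str, path: str, data: bytes) -> None:
    """Upload data to <file_system>/<path>, replacing any existing file."""
    fs_client = service.get_file_system_client(file_system)
    file_client = fs_client.get_file_client(path)
    # Upload the whole payload in one call; overwrite replaces an existing file
    file_client.upload_data(data, overwrite=True)


def download_bytes(service: Any, file_system: str, path: str) -> bytes:
    """Download the full content of <file_system>/<path>."""
    fs_client = service.get_file_system_client(file_system)
    file_client = fs_client.get_file_client(path)
    return file_client.download_file().readall()
```

Taking the client as a parameter keeps the helpers independent of the credential type, so they work the same with a Service Principal or a SAS token.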
Connection types by credential
Connection - ClientSecretCredential
Integration through a Service Principal, authenticated with its client ID and client secret.
from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient
# Example values: replace them with the ones from your Azure tenant
account_name = "stgexample"
client_id = '1ej6d366-5a17-1234-1e16-da015a30303d'
client_secret = 'Ama5Q~rTrmbmZGGzRAm5ieBUO6RsD23.qRRzRaum'
tenant_id = '777d4d4b-c777-6m5f-4j68-2230d441d7j2'
file_system = "data"
# Authenticate as the Service Principal through Azure AD
credential = ClientSecretCredential(tenant_id, client_id, client_secret)
# ADLS Gen2 is reached through the dfs endpoint of the Storage Account
account_url = "https://{}.dfs.core.windows.net/".format(account_name)
datalake_service = DataLakeServiceClient(
    account_url=account_url, credential=credential
)
# Return the client as this node's result
result = datalake_service
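Once the client exists, a typical next step is listing what is stored under a directory. A minimal sketch, assuming the `datalake_service` client created above; the helper name and the directory in the usage line are hypothetical, while `get_paths` and the `name`/`is_directory` attributes of its results come from azure-storage-file-datalake.

```python
from typing import Any, List


def list_files(service: Any, file_system: str, directory: str = "") -> List[str]:
    """Return the paths of all files (not directories) under a directory."""
    fs_client = service.get_file_system_client(file_system)
    # recursive=True walks every subdirectory; path=None means the whole file system
    paths = fs_client.get_paths(path=directory or None, recursive=True)
    return [p.name for p in paths if not p.is_directory]
```

Usage: `list_files(datalake_service, file_system, "raw/2024")`.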
Connection - SharedKeyCredential
Integration through a SAS token.
from azure.core.exceptions import ResourceExistsError
from azure.storage.filedatalake import DataLakeServiceClient
# Example values
account_name = "stgexample"
sas_token = "sas_token"
file_system_name = "data"
# The SAS token is appended to the account URL as its query string
# (a leading "?" copied with the token is stripped to avoid "??")
account_url = f"https://{account_name}.dfs.core.windows.net/?{sas_token.lstrip('?')}"
# Initialize DataLakeServiceClient with the SAS token; no credential object is needed
datalake_service_client = DataLakeServiceClient(account_url=account_url)
# Example: Create a new file system
try:
    file_system_client = datalake_service_client.create_file_system(file_system=file_system_name)
    print("File system created:", file_system_client.url)
except ResourceExistsError:
    print("File system already exists.")
# Example: List file systems
file_systems = datalake_service_client.list_file_systems()
print("List of file systems:")
for fs in file_systems:
    print(fs.name)
Connection - SharedKeyCredential with Azure Key Vault
Integration through a SAS token obtained from Azure Key Vault.
import json
from azure.core.exceptions import ResourceExistsError
from azure.identity import ClientSecretCredential
from azure.keyvault.secrets import SecretClient
from azure.storage.filedatalake import DataLakeServiceClient
# Example Values
account_name = "stgexample"
vault_name = "key_vault_name"
sas_secret_name = "sas_secret_name"
file_system_name = "data"
# Get the client ID, tenant ID, and client secret from a secret JSON file
with open("config.json", "r") as f:
    config = json.load(f)
client_id = config["azure"]["client_id"]
tenant_id = config["azure"]["tenant_id"]
client_secret = config["azure"]["client_secret"]
# Authenticate against Key Vault as the Service Principal, which needs
# permission to read secrets in this vault.
# Alternatively, DefaultAzureCredential (azure.identity) can read these values
# from environment variables or use a managed identity.
credential = ClientSecretCredential(tenant_id, client_id, client_secret)
# Initialize the Key Vault client
key_vault_uri = f"https://{vault_name}.vault.azure.net/"
secret_client = SecretClient(vault_url=key_vault_uri, credential=credential)
# Get the SAS token stored as a secret in Azure Key Vault
sas_token = secret_client.get_secret(sas_secret_name).value
account_url = f"https://{account_name}.dfs.core.windows.net/?{sas_token.lstrip('?')}"
# Initialize DataLakeServiceClient with the SAS token
datalake_service_client = DataLakeServiceClient(account_url=account_url)
# Example: Create a new file system
try:
    file_system_client = datalake_service_client.create_file_system(file_system=file_system_name)
    print("File system created:", file_system_client.url)
except ResourceExistsError:
    print("File system already exists.")
# Example: List file systems
file_systems = datalake_service_client.list_file_systems()
print("List of file systems:")
for fs in file_systems:
    print(fs.name)