Connect Azure Databricks to Synapse using Service Principal
Azure has recently added the ability to authenticate to Azure SQL Database and Azure Synapse using Azure Active Directory. This allows you to centrally manage identity to your database.
As per Microsoft documentation, Azure Active Directory authentication is a mechanism of connecting to Microsoft Azure Synapse and Azure SQL Database by using identities in Azure Active Directory (Azure AD). With Azure Active Directory authentication, you can centrally manage the identities of database users and other Microsoft services in one central location. Central ID management provides a single place to manage Synapse users and simplifies permission management.
See here for a great document explaining the details of using Azure Active Directory Authentication for authentication with SQL
Authenticating with your user name is practical when doing analysis and pulling data from external database sources, but not so much when you want to operationalize your pipeline. This is where an Azure Active Directory application registration (also called service principal) can be used to user accounts from execution accounts.
|Create an AAD App registration||You can follow this Microsoft doc|
|Create a secret for your app registration||This blogs covers the next 3 steps very well|
|Create an AAD Group||See above|
|Add the App registration to the AAD group as a member||See above|
|Add the AAD group to Azure SQL Active Directory Admin||You can follow this guide as well as the above|
In order to do the next part you will need a few keys:
App registration Application (client) ID
This was given to you when you initially created the secret. If you did not write it down, you can delete the key and create a new one.
Now that all the plumbing is done we're ready to connect Azure Databricks to Azure SQL Database. In this section we'll be using the keys we gathered to generate an access token which will be used to connect to Azure SQL Database.
Let's look at the building blocks first:
You will need to add the following libraries to your cluster:
//Instantiate the ADAL AuthenticationContext object val service = Executors.newFixedThreadPool(1) val context = new **AuthenticationContext**(authority, true, service);
|Constructor and Description|
|AuthenticationContext(String authority, boolean validateAuthority, ExecutorService service) Constructor to create the context with the address of the authority.|
//Get access token val ClientCred = new ClientCredential(ServicePrincipalId, ServicePrincipalPwd) val authResult = context.acquireToken(resourceAppIdURI, ClientCred, null) val accessToken = authResult.get().getAccessToken
Constructor and Description
ClientCredential(String clientId, String clientSecret) Constructor to create credential with client id and secret
public Future<AuthenticationResult> acquireToken(
Acquires security token from the authority.
resource - Identifier of the target resource that is the recipient of the requested token.
credential - object representing Private Key to use for token acquisition.
callback- optional callback object for non-blocking execution.
A Future object representing the AuthenticationResult of the call. It contains Access Token and the Access Token's expiration time. Refresh Token property will be null for this overload.
AuthenticationException - AuthenticationException
import com.microsoft.aad.adal4j.ClientCredential import com.microsoft.aad.adal4j.AuthenticationContext import java.util.concurrent.Executors val TenantId = "**<FILL HERE>**" val authority = "https://login.windows.net/" + TenantId val resourceAppIdURI = "https://database.windows.net/" val ServicePrincipalId = "**<FILL HERE>**" val ServicePrincipalPwd = "**<FILL HERE>**" //Instantiate the ADAL AuthenticationContext object val service = Executors.newFixedThreadPool(1) val context = new AuthenticationContext(authority, true, service); //Get access token val ClientCred = new ClientCredential(ServicePrincipalId, ServicePrincipalPwd) val authResult = context.acquireToken(resourceAppIdURI, ClientCred, null) val accessToken = authResult.get().getAccessToken
Once the token generated, you simply need to call the spark.read.sqlDB command and pass the token to authenticate.
import com.microsoft.azure.sqldb.spark.config.Config import com.microsoft.azure.sqldb.spark.connect.\_ val config = Config(Map( "url" -> "**<SERVER NAME>**.database.windows.net", "databaseName" -> "**<Database Name>**", "dbTable" -> "**<Table Name>**", "accessToken" -> **accessToken**, "hostNameInCertificate" -> "\*.database.windows.net", "encrypt" -> "true" )) val collection = spark.read.sqlDB(config) collection.show()
As you can see, once the token generated, it's really easy to query Azure SQL Database.