
Hive-DB connections using Kerberos Authentication from Mule 4

Written by Mahesh Sharma | Feb 5, 2023 6:30:00 PM

Apache Hive is a distributed, fault-tolerant data warehouse system that enables analytics at a massive scale. Like all data warehouses, it provides users with a centralized place to store information for easy analysis and data-driven decision-making. Hive specifically allows users to read, write, and manage petabytes of data using SQL.

Using MuleSoft to connect to Hive-DB unlocks connectivity for more systems within an organization’s application network, and does so faster and more securely than other methods.

This blog explains how to read Apache Hive-DB data with the MuleSoft Database Connector, using Kerberos authentication to authenticate with Hive-DB.

 

Prerequisites

You’ll need the following prerequisites to complete this process:

  • Anypoint Studio 7.0
  • MuleSoft Database Connector
  • Cloudera JDBC Driver (ClouderaHiveJDBC42-2.6.18.1021)
  • The following config files:
    • jaas.conf
    • krb5.conf
    • {MulesoftUser}.keytab

 

Note:
The krb5.conf and {MulesoftUser}.keytab files should be provided by the Hadoop/Hive team.

What the krb5.conf file looks like:
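A representative krb5.conf is shown below. The realm matches the KrbRealm used in the connection URL later in this post; the KDC hostnames are placeholders — use the values your Hadoop/Hive team provides.

```ini
[libdefaults]
  default_realm = ABCDATA.COM
  dns_lookup_kdc = false
  ticket_lifetime = 24h
  forwardable = true

[realms]
  ABCDATA.COM = {
    kdc = kdc1.abcdata.com
    admin_server = kdc1.abcdata.com
  }

[domain_realm]
  .abcdata.com = ABCDATA.COM
  abcdata.com = ABCDATA.COM
```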


jaas.conf
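A representative jaas.conf is shown below. The keytab path and principal are placeholders; the entry name must be Client, as the note below explains.

```ini
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/opt/mule/conf/MulesoftUser.keytab"
  principal="MulesoftUser@ABCDATA.COM"
  storeKey=true
  doNotPrompt=true;
};
```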

 

Note:
Kerberos authentication won’t work if the “Client” entry name above is changed or missing.

{MulesoftUser}.keytab

The {MulesoftUser}.keytab file is generated after a service user is created in AD for Hadoop Hive-DB access. It serves as the credential file for that user and is essential for authenticating with Kerberos.

Use this command to check the principals stored in the keytab:

klist -t -k MulesoftUser.keytab
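You should see output along these lines (KVNO, timestamp, and principal are illustrative):

```
Keytab name: FILE:MulesoftUser.keytab
KVNO Timestamp           Principal
---- ------------------- ------------------------------------------
   2 02/01/2023 10:15:04 MulesoftUser@ABCDATA.COM
```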

 

Workflow

The flow presented below is API-based and receives database queries from POST requests. It directs each request to the appropriate database cluster based on the query parameters; the response is then transformed and sent back to the caller.

Steps:

1. Add the Database Connector from the Mule Palette.

2. This adds the following dependency to pom.xml:
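The generated dependency typically looks like the following; the version number is illustrative and will match whatever Studio resolves for your project.

```xml
<dependency>
    <groupId>org.mule.connectors</groupId>
    <artifactId>mule-db-connector</artifactId>
    <version>1.14.5</version>
    <classifier>mule-plugin</classifier>
</dependency>
```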

3. Add the ClouderaHiveJDBC42-2.6.18.1021 driver to your organization’s repository manager (for example, Artifactory or Nexus).

Ref: https://stackoverflow.com/questions/24122382/how-to-configure-maven2-to-publish-to-artifactory

4. Add ClouderaHiveJDBC42-2.6.18.1021 to the POM dependencies and to the shared libraries.
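A sketch of both pieces in pom.xml is shown below. The groupId/artifactId are assumptions — use whatever coordinates you chose when publishing the driver to Artifactory/Nexus. The sharedLibrary entry in the mule-maven-plugin configuration is what makes the JDBC driver visible to the Database Connector at runtime.

```xml
<!-- Coordinates are illustrative; match your repository upload. -->
<dependency>
    <groupId>com.cloudera</groupId>
    <artifactId>HiveJDBC42</artifactId>
    <version>2.6.18.1021</version>
</dependency>

<!-- Expose the driver to the Database Connector as a shared library: -->
<plugin>
    <groupId>org.mule.tools.maven</groupId>
    <artifactId>mule-maven-plugin</artifactId>
    <configuration>
        <sharedLibraries>
            <sharedLibrary>
                <groupId>com.cloudera</groupId>
                <artifactId>HiveJDBC42</artifactId>
            </sharedLibrary>
        </sharedLibraries>
    </configuration>
</plugin>
```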

5. Add the following database configuration in global.xml:

URL pattern:

jdbc:hive2://hadoop-lr4-hiveserver1.anycast.prod.abc.com:10000;AuthMech=1;KrbRealm=ABCDATA.COM;KrbHostFQDN=hadoop-lr4-hiveserver1.anycast.prod.abc.com;KrbServiceName=hive;KrbAuthType=1
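The corresponding db:config in global.xml might look like the sketch below. The driver class name and reconnection values are assumptions — check the class name shipped with your Cloudera driver version.

```xml
<db:config name="Hive_Database_Config">
    <db:generic-connection
        url="jdbc:hive2://hadoop-lr4-hiveserver1.anycast.prod.abc.com:10000;AuthMech=1;KrbRealm=ABCDATA.COM;KrbHostFQDN=hadoop-lr4-hiveserver1.anycast.prod.abc.com;KrbServiceName=hive;KrbAuthType=1"
        driverClassName="com.cloudera.hive.jdbc.HS2Driver">
        <!-- Illustrative reconnection strategy; tune to your environment. -->
        <reconnection>
            <reconnect frequency="3000" count="3"/>
        </reconnection>
    </db:generic-connection>
</db:config>
```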

Note:
Please also add your reconnection strategy at this step.

 

The parameters in the URL break down as follows:

  • Host: hadoop-lr4-hiveserver1.anycast.prod.abc.com
  • Port: 10000
  • AuthMech=1: use Kerberos authentication
  • KrbRealm=ABCDATA.COM: the domain over which a Kerberos authentication server has the authority to authenticate a user, host, or service
  • KrbHostFQDN=hadoop-lr4-hiveserver1.anycast.prod.abc.com: the fully qualified domain name of the Kerberos host to connect to
  • KrbServiceName=hive: the Kerberos service principal name of the Hive server
  • KrbAuthType=1: create a LoginContext from a JAAS configuration and then use the Subject associated with it

6. Set up your APIkit Router and bind it to the appropriate sub-flow and database config per the above workflow.

7. Once the flow is complete, add the following properties to the run configuration.
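In Anypoint Studio these go under Run Configurations → Arguments → VM arguments. The file paths are placeholders; point them at the files from the prerequisites.

```ini
-Djava.security.krb5.conf=/path/to/krb5.conf
-Djava.security.auth.login.config=/path/to/jaas.conf
```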

Note:
Be sure to provide the appropriate krb5.conf and jaas.conf paths as per the above prerequisites.

8. Fire a query at your Hive-DB using Postman.
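The exact endpoint path and payload shape depend on your RAML/APIkit definition; a hypothetical request (local HTTP listener on port 8081, a /query resource taking a SQL string) might look like this:

```
curl -X POST "http://localhost:8081/api/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "SELECT * FROM sample_table LIMIT 10"}'
```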

 


To find out more email us at: salesforce@coforge.com