A secure Hadoop cluster is one that relies on Kerberos to authenticate users and services (such as HiveServer and HDFS). This setup also adds a layer of security for the data consumed and produced by the cluster: users and services authenticate against the configured Kerberos realm, so only those with valid credentials can interact with the cluster.
This document describes how to add support for a Cloudera secure cluster to JasperReports Server v5.6.
Note: This configuration can also be used as a reference for supporting similar environments for vendors such as Hortonworks.
Background
From the client's perspective there are two approaches to Kerberos authentication. In the first, the client uses the Kerberos tickets already available in the operating system for the current user. In the second, the client requests tickets itself and handles the whole authentication process.
This document describes the first approach: with the JasperReports Server v5.6 Linux installer, Tomcat runs under a non-privileged user that can log in over SSH and issue kinit to obtain Kerberos tickets.
This approach works well because the Hive JDBC driver expects the ticket to already be present in the user's credential cache, from which it is picked up during authentication.
Prerequisites
- Linux operating system
- JasperReports Server v5.6 Linux installer
- A non-privileged account to install JasperReports Server
- A modified version of the Hive connector (see details below)
- Network access from the JasperReports Server host to your Kerberos KDC
Setup and configuration
- Install JasperReports Server as described in the install manual (link to installation-steps-installer-distribution)
- Update the Hive connector's components (see details below)
- Restart Tomcat
- Create or update the file /etc/krb5.conf with the proper configuration for your Kerberos realm and credentials.
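A minimal /etc/krb5.conf sketch is shown below. The realm name CLOUDERA matches the examples later in this document; the KDC host names are hypothetical placeholders, not values from any real environment:

```ini
# Minimal /etc/krb5.conf sketch.
# CLOUDERA is the realm used in this document's examples;
# kdc.example.com is a hypothetical KDC host.
[libdefaults]
    default_realm = CLOUDERA
    ticket_lifetime = 24h
    renew_lifetime = 7d

[realms]
    CLOUDERA = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }

[domain_realm]
    .example.com = CLOUDERA
    example.com = CLOUDERA
```

Setting default_realm correctly matters most here, since kinit user@REALM and the Hive JDBC driver both resolve principals against it.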
Configuration
- Issue the following commands in your Linux CLI and enter your password when requested.

kinit user@REALM (make sure you have a Kerberos ticket granted to the current user)
klist (you should see output similar to the following)

Ticket cache: FILE:/tmp/krb5cc_506
Default principal: jaspersoft@CLOUDERA

Valid starting     Expires            Service principal
07/10/14 12:46:52  07/11/14 12:46:52  krbtgt/CLOUDERA@CLOUDERA
        renew until 07/17/14 12:46:52
- In JasperReports Server, configure a new Hive Data Source using the standard procedure.
- When entering the JDBCUrl use the following pattern:
jdbc:hive2://hiveserver2Name:10000/default;principal=user/hiveserver2Name@REALM
For example: jdbc:hive2://ec2-54-80-221-220.compute-1.amazonaws.com:10000/default;principal=hive/ip-10-113-130-114.ec2.internal@CLOUDERA
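The URL pattern above can be composed from its parts. A small shell sketch follows; the host names and principal are the hypothetical values from the example, and note that the principal is the HiveServer2 service principal, not the connecting user's:

```shell
#!/bin/sh
# Compose a Kerberos-enabled HiveServer2 JDBC URL from its parts.
# All values below are placeholders taken from the example above.
HIVE_HOST="ec2-54-80-221-220.compute-1.amazonaws.com"  # HiveServer2 host as seen by the client
HIVE_PORT=10000                                        # default HiveServer2 port
DATABASE="default"
# The server's Kerberos principal, usually hive/<internal-hostname>@<REALM>.
PRINCIPAL="hive/ip-10-113-130-114.ec2.internal@CLOUDERA"

JDBC_URL="jdbc:hive2://${HIVE_HOST}:${HIVE_PORT}/${DATABASE};principal=${PRINCIPAL}"
echo "$JDBC_URL"
```

Paste the resulting string into the JDBCUrl field of the data source dialog.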
Connector modification
In order to have the Hive JDBC driver create a Kerberos session, the Hive connector requires an additional component to be present: hive-shims. In this case we have to use a legacy version of this component, since the required class org.apache.hadoop.hive.shims.Hadoop20SShims is no longer present in Hive 0.11+.
- Download hive-shims-0.10.0-cdh4.3.0.jar or hive-shims-0.10.0.jar
- Copy the file to <apache-tomcat-path>/webapps/jasperserver-pro/WEB-INF/lib.
- Restart Tomcat
Drawbacks
From the customer's perspective, the described approach has some drawbacks.
- Normally Tomcat/JBoss (the application server) runs under a special account with login disabled. Admins will need to enable login for that account and issue kinit from a shell as that user, so that the JDBC driver can use the ticket from the cache.
- Tickets expire every n hours, depending on the KDC configuration. An admin will need to renew the ticket (kinit -R) so that JasperReports Server can still log in to the secure Hadoop cluster.
- The operating system where Tomcat (the application server) runs needs a properly configured /etc/krb5.conf; in particular, the default realm must be set correctly.
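One way to reduce the manual-renewal burden is to renew the ticket periodically from cron. A crontab sketch for the Tomcat user follows; the 8-hour schedule is a hypothetical choice, and kinit -R only succeeds while the ticket is still within its renewable lifetime:

```
# Renew the current user's Kerberos ticket every 8 hours (hypothetical schedule).
# kinit -R only works while the ticket is within its renewable lifetime.
0 */8 * * * /usr/bin/kinit -R
```

For fully unattended operation, a keytab-based kinit (kinit -kt /path/to/keytab principal) avoids the renewable-lifetime limit entirely; both approaches are sketches, not part of a vendor-documented procedure.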