Hadoop-Hive Data Sources

Unlike traditional databases, Hadoop supports huge amounts of data, often called big data. As of version 5.6, JasperReports Server supports three servers that process requests to a Hadoop cluster:

• Hive, also called Hive 1
• Hive 2
• Impala

Depending on whether you use Hive 1, Hive 2, or Impala, there are certain restrictions on accessing data in Hadoop. The original Hive 1 server has high latency, with access times ranging from about 30 seconds up to 2 minutes. Hive 2 is much faster, but still not as fast as a relational database. As a result, Hadoop-Hive data sources have the following limitations and guidelines for use in JasperReports Server:

• Hadoop-Hive data sources are not supported for OLAP connections.
• Hadoop-Hive data sources cannot be used directly in Domains. To use Hadoop-Hive in a Domain, see Big Data Connectors for Virtual Data Sources.
• Hadoop-Hive data sources are not suitable for creating reports interactively in the Ad Hoc Editor.
• Reports based on Hadoop-Hive are not suitable for dashboards.
• Filters and query-based input controls that rely on Hadoop-Hive data sources will be slow to populate the list of choices.
• You must configure your query limits and timeout to handle latency (see Configuring Ad Hoc).
• You must configure your JVM memory to handle the expected amount of data (see the JasperReports Server Installation Guide).

In general, reports based on Hadoop-Hive data sources are best suited to be run in the background from the repository. For very large reports, consider scheduling them to run at night so the output is available immediately when you need it during the day.

Hadoop-Impala data sources have much less latency, and allow interactivity with Ad Hoc views, filters, and dashboards. However, Hadoop-Impala data sources still have the following limitations:

• Hadoop-Impala data sources are not supported for OLAP connections.
• Hadoop-Impala data sources cannot be used directly in Domains. To use Hadoop-Impala in a Domain, see Big Data Connectors for Virtual Data Sources.
• Hadoop-Impala data sources can be used in Ad Hoc Topics, but they do not support query optimization.
• You must configure your query limits to handle big data (see Configuring Ad Hoc).
• You must configure your JVM memory to handle the expected amount of data (see the JasperReports Server Installation Guide).

To create a Hadoop-Hive 1, Hadoop-Hive 2, or Hadoop-Impala data source:

Log on as an administrator.

Click View > Repository, expand the folder tree, and right-click a folder to select Add Resource > Data Source from the context menu. Alternatively, you can select Create > Data Source from the main menu on any page and specify a folder location later. If you have installed the sample data, the suggested folder is Data Sources.

The New Data Source page appears.

In the Type field, select Hadoop-Hive Data Source.

The information on the page changes to reflect what’s needed to define a Hadoop-Hive data source.

[Figure: Hadoop-Hive Data Source Page]

Fill in the required fields, along with any optional information.

The JDBC URL depends on which type of server you are using:

Hive 1:	jdbc:hive://<hostname>:10000/default
Hive 2:	jdbc:hive2://<hostname>:10001/default
Impala:	jdbc:hive2://<hostname>:21050/;auth=noSasl
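
The three URL patterns above differ only in the scheme, the port, and the suffix, so they are easy to generate from a server type and a hostname. The following is a minimal sketch of that mapping, not part of JasperReports Server: the function name `build_jdbc_url`, the server-type keys, and the hostname `hadoop.example.com` are hypothetical, and the default ports are simply the ones shown in the list above, which your cluster may override.

```python
# Hypothetical helper: builds the JDBC URL to paste into the data source page.
# The templates mirror the three patterns listed in this section.
JDBC_URL_TEMPLATES = {
    "hive1": "jdbc:hive://{host}:{port}/default",
    "hive2": "jdbc:hive2://{host}:{port}/default",
    "impala": "jdbc:hive2://{host}:{port}/;auth=noSasl",
}

# Ports taken from the examples in this section; adjust them for your cluster.
DEFAULT_PORTS = {"hive1": 10000, "hive2": 10001, "impala": 21050}


def build_jdbc_url(server_type, host, port=None):
    """Return the JDBC URL for the given server type and hostname."""
    key = server_type.strip().lower()
    if key not in JDBC_URL_TEMPLATES:
        raise ValueError(f"unknown server type: {server_type!r}")
    if not host:
        raise ValueError("host must not be empty")
    if port is None:
        port = DEFAULT_PORTS[key]
    return JDBC_URL_TEMPLATES[key].format(host=host, port=port)


if __name__ == "__main__":
    # Placeholder hostname for illustration only.
    for server in ("hive1", "hive2", "impala"):
        print(server, build_jdbc_url(server, "hadoop.example.com"))
```

Note that Hive 2 and Impala share the `jdbc:hive2` scheme in these patterns, so only the port and the `;auth=noSasl` suffix distinguish them.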

When done, click Save. The data source appears in the repository.
