Hadoop HBase

Jaspersoft's Hadoop-HBase Connector

Works with iReport, JasperReports, and JasperReports Server

Target Audience

Users who want to integrate Jaspersoft's JasperReports Server with Hadoop using the HBase database. This document assumes that the user already has familiarity with Hadoop and HBase and already has data in Hadoop stored in HBase tables.

Background

JasperReports Server 4.x and earlier do not ship with built-in connectors to Hadoop-HBase. The resources provided here on JasperForge.org enable JasperReports Server to access Hadoop-HBase data. They are currently in a pre-release format and are therefore not yet supported by Jaspersoft Technical Support.

Tests were performed with HBase 0.90.x on distributions from Apache, Cloudera (CDH3 & CDH4), and IBM (BigInsights). The connector is likely to work with other versions, but they have not been tested.

The Jaspersoft HBase Connector requires the REST server (called Stargate). It is a standard HBase component. Even if you are not already using it, you have it available and it's easy to launch.

Summary

  1. Obtain and unzip the connector
  2. Deploy the Hadoop-HBase plugin to iReport
  3. Create a report
  4. Deploy the query executer to JasperReports Server
  5. Deploy the report to JasperReports Server
  6. Run the report

Details

Obtain and unzip the connector

Download the connector from the project. The following files are included:

WEB-INF/applicationContext-HBaseDataSource.xml
WEB-INF/bundles/HBaseDataSource.properties
WEB-INF/lib/ezmorph-1.0.6.jar
WEB-INF/lib/hadoop-core-0.20.2.jar
WEB-INF/lib/hbase-0.90.4.jar
WEB-INF/lib/HBaseDataSource-0.9.jar
WEB-INF/lib/HBaseDeserializer-0.9.jar
WEB-INF/lib/json-lib-2.4-jdk15.jar
WEB-INF/lib/libthrift-0.6.1.jar
WEB-INF/lib/zookeeper-3.3.2.jar
plugin/HBasePlugin-0.9.nbm
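
After unzipping, it can be useful to confirm that nothing is missing before continuing. The following is a minimal sketch, not part of the connector: the function name missing_connector_files and the idea of passing in the unzip directory are assumptions made for illustration, and the file list is simply copied from above.

```python
from pathlib import Path

# File list copied from the connector contents listed above.
EXPECTED_FILES = [
    "WEB-INF/applicationContext-HBaseDataSource.xml",
    "WEB-INF/bundles/HBaseDataSource.properties",
    "WEB-INF/lib/ezmorph-1.0.6.jar",
    "WEB-INF/lib/hadoop-core-0.20.2.jar",
    "WEB-INF/lib/hbase-0.90.4.jar",
    "WEB-INF/lib/HBaseDataSource-0.9.jar",
    "WEB-INF/lib/HBaseDeserializer-0.9.jar",
    "WEB-INF/lib/json-lib-2.4-jdk15.jar",
    "WEB-INF/lib/libthrift-0.6.1.jar",
    "WEB-INF/lib/zookeeper-3.3.2.jar",
    "plugin/HBasePlugin-0.9.nbm",
]


def missing_connector_files(unzip_dir):
    """Return the expected files that are absent under unzip_dir (hypothetical helper)."""
    root = Path(unzip_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means every file from the list above is present in the directory you unzipped into.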

Deploy the Hadoop-HBase plugin to iReport

  1. Choose the menu Tools → Plugins.
    From the tab "Downloaded" choose the plugin file from the unzipped connector (e.g. plugin/HBasePlugin-0.9.nbm).
    After installing the plugin you must restart iReport.
  2. Click the button "Report Datasources" to define a new connection to Hadoop HBase.
  3. Add a new datasource of type "HBase Connection"
  4. Set appropriate connection details:
    HBase REST Connection:
    Hostname: [hostname of REST server]
    Port: (default is 8080)
    Test the connection.
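
If the connection test in iReport fails, it can help to check the REST server independently. The sketch below assumes the REST server exposes its standard /version resource and can return JSON; the function names are hypothetical, and the hostname you pass in is whatever you entered in the connection dialog.

```python
import json
import urllib.request


def rest_version_url(hostname, port=8080):
    """Build the URL of the REST server's version resource (8080 is the default port)."""
    return f"http://{hostname}:{port}/version"


def check_rest_server(hostname, port=8080, timeout=5):
    """Fetch the version document from the REST server.

    Raises an exception (for example urllib.error.URLError) if the
    server cannot be reached at the given host and port.
    """
    request = urllib.request.Request(
        rest_version_url(hostname, port),
        headers={"Accept": "application/json"},  # ask for JSON rather than plain text
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling check_rest_server with your REST hostname should return a small dictionary of version details when the server is running; a connection error suggests the REST server is not started or the port is wrong.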

Create a report

Create a new report and set the query language to "HBaseQuery". The query language is a simple JSON-based syntax, documented in the Jaspersoft HBase Query Language reference. The following HBase shell commands create a sample table and data:

create 'blogposts', 'post', 'image'

put 'blogposts', 'post1', 'post:title', 'Hello World'
put 'blogposts', 'post1', 'post:author', 'Matt Dahlman'
put 'blogposts', 'post1', 'post:body', 'This is a blog post.'
put 'blogposts', 'post1', 'image:author', 'mdahlman.jpg'
put 'blogposts', 'post1', 'image:bodyimage1', 'world.png'
put 'blogposts', 'post1', 'post:num_replies', 0
put 'blogposts', 'post2', 'post:title', 'Everything'
put 'blogposts', 'post2', 'post:author', 'Baby K'
put 'blogposts', 'post2', 'post:body', 'It is impossible to conceive of the totality of everything thinkable.'
put 'blogposts', 'post2', 'image:author', 'BabyK.jpg'
put 'blogposts', 'post2', 'image:bodyimage1', 'Infinity.png'
put 'blogposts', 'post2', 'image:bodyimage2', 'Omega_plus_one.png'

With the above sample data inserted using the HBase shell, you can then create a report using the following query:

{
  "tableName"         : "blogposts",
  "deserializerClass" : "com.jaspersoft.hbase.deserialize.impl.ShellDeserializer"
}

The deserializerClass attribute is needed because HBase has no concept of data types or other metadata for fields. Therefore the deserializer is needed to determine the appropriate data type for the arrays of bytes being returned by HBase. The HBase connector includes DefaultDeserializer and ShellDeserializer. For more details refer to the Deserializer Documentation.
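
To see why a deserializer is needed, consider how the same number can be stored as different bytes. This is an illustrative sketch only, not the connector's code: it assumes that the HBase shell stores values as text and that a Java client using Bytes.toBytes on a long stores 8 big-endian bytes. The value 42 and the helper names are made up for the example.

```python
import struct

# Assumption: a shell `put` stores the text of the value.
shell_value = b"42"
# Assumption: a Java client storing a long writes 8 big-endian bytes.
client_value = struct.pack(">q", 42)


def as_text(raw):
    """Interpret the bytes as UTF-8 text."""
    return raw.decode("utf-8")


def as_long(raw):
    """Interpret the bytes as an 8-byte big-endian signed integer."""
    return struct.unpack(">q", raw)[0]


# Each value only makes sense with the matching interpretation.
number_from_shell = int(as_text(shell_value))
number_from_client = as_long(client_value)

# The wrong interpretation fails or yields nonsense:
# as_long(shell_value) raises struct.error because 2 bytes are not 8.
```

HBase itself returns only the raw bytes in both cases, so the query has to name a deserializer that matches how the data was written.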

The query above returns all records in the table blogposts. In most cases it's critical to filter the results. For example, we could get only posts by a certain author:

{
 "tableName"         : "blogposts",
 "deserializerClass" : "com.jaspersoft.hbase.deserialize.impl.ShellDeserializer",
 "filter"            : { "DependentColumnFilter" :
                                  { "family"     : "post",
                                    "qualifier"  : "author",
                                    "compareOp"  : "EQUAL",
                                    "comparator" : { "RegexStringComparator" :
                                                                    { "expr" : "Baby K" } 
                                                   }
                                  }
                       }
}
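
When the author changes from report to report, writing the nested JSON by hand is error-prone. As a sketch, the query above could be generated with a small helper; build_author_query is a hypothetical name, and the structure simply mirrors the sample query.

```python
import json


def build_author_query(author_regex, table="blogposts"):
    """Return the filter query above as a JSON string (hypothetical helper).

    author_regex is passed to RegexStringComparator, so it is treated
    as a regular expression rather than a literal string.
    """
    query = {
        "tableName": table,
        "deserializerClass": "com.jaspersoft.hbase.deserialize.impl.ShellDeserializer",
        "filter": {
            "DependentColumnFilter": {
                "family": "post",
                "qualifier": "author",
                "compareOp": "EQUAL",
                "comparator": {"RegexStringComparator": {"expr": author_regex}},
            }
        },
    }
    # json.dumps guarantees balanced braces and correct quoting.
    return json.dumps(query, indent=2)
```

The resulting text can be pasted into the report's query editor; generating it this way mainly protects against unbalanced braces and quoting mistakes.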

Deploy the query executer to JasperReports Server

  1. Copy the files in WEB-INF from step 1 into

    <appserver>/webapps/jasperserver-pro/WEB-INF

  2. Be sure to keep the folder structure when copying the files.
  3. Start or restart JasperReports Server.
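
The copy step can also be scripted. The sketch below merges the connector's WEB-INF folder into the web application's WEB-INF while keeping the folder structure. It requires Python 3.8 or later for dirs_exist_ok; the function name deploy_web_inf is hypothetical, and both directory arguments are placeholders for your own paths.

```python
import shutil
from pathlib import Path


def deploy_web_inf(connector_dir, webapp_dir):
    """Copy <connector_dir>/WEB-INF into <webapp_dir>/WEB-INF, keeping subfolders.

    Files already in the target are left in place unless the connector
    ships a file with the same relative path, which is overwritten.
    Returns the relative paths of the files that were copied.
    """
    source = Path(connector_dir) / "WEB-INF"
    target = Path(webapp_dir) / "WEB-INF"
    if not source.is_dir():
        raise FileNotFoundError(f"No WEB-INF folder found in {connector_dir}")
    # dirs_exist_ok=True merges into the existing WEB-INF instead of failing.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return sorted(p.relative_to(source).as_posix()
                  for p in source.rglob("*") if p.is_file())
```

Here webapp_dir would be the jasperserver-pro folder under your application server's webapps directory. Restart JasperReports Server afterwards, as in step 3.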