In this article we will show how to run reports on HBase using the open source Apache Phoenix JDBC driver. Phoenix is now a stable and performant solution, which "became a top-level Apache project in 2014. Apache Phoenix is included in the Hortonworks distribution for HDP 2.1 and above, is available as part of Cloudera labs, and is part of the Hadoop ecosystem." (Wikipedia article)
The driver will only work in TIBCO JasperReports® Server v6.2 or later as the minimum ANTLR library version required by Phoenix is available since that release.
Earlier versions of TIBCO Jaspersoft® Studio and TIBCO JasperReports® Library might work independently from the server as they have different ANTLR dependencies.
Please note that only standard / production reports will work in JasperReports Server , Domains will not work because of the limited JDBC metadata implementation in Phoenix (see our JDBC metadata requirements in the Platform Support Datasheet PDF vs. Phoenix implementation).
Installing Apache Phoenix
HBase master and region servers
Follow the download and installation steps on the Phoenix website, depending on your HBase version. For this article, we have tested Phoenix 3.3.1 with Cloudera CDH 4.x quickstart VM – which contained version 4.8.3 at the time of testing.
We had to switch the whole Hadoop cluster from JDK 6 to JDK 7, or the Phoenix libraries, compiled with Java 7, would not run at all.
Client side – in this case JasperReports Server, Jaspersoft Studio or any application using JasperReports Library
There are two flavours of Phoenix you can install:
- phoenix-[version]-client-hadoopX.jar – replace [version] with 3.3.1 and X with 1 or 2, based on your Hadoop version. This jar is standalone and includes its own version of HBase (+ dependencies)
- phoenix-[version]-client-without-hbase.jar – replace [version] with 3.3.1. This jar does not contain HBase nor Hadoop dependencies, which means you will have to provide your own jar files for those
Although option a is simpler, in our test we have used option b (client without HBase) because the HBase libraries included with Phoenix were not compatible with HBase installed on CDH 4.8.3.
For example, for JasperReports Server we have copied the following files to <js-webapp>/WEB-INF/lib:
From the the Phoenix download:
From one of the Hadoop cluster nodes (HBase master node on CDH in our test, but you should find similar files in your Hadoop distribution):
The paths above are normally symbolic links to the specific versions of the libraries. You can copy those with a version in the file name or those without, it will not make a difference.
The same files above will be required (plus probably /usr/lib/hadoop/lib/common-collections-3.2.1.jar) in the classpath, in order to use Phoenix in Jaspersoft Studio or any applications using the the JasperReports Library. See "To add a JAR file to a data adapter's classpath" in the documentation for more details.
Setting up the data source
In JasperReports Server, create a new Data Source as superuser. Assuming your ZooKeeper quorum server host name is cluster1, you could use:
Type: JDBC Data Source
JDBC Driver: select Other...
Then the following fields will be available.
JDBC Driver (required): org.apache.phoenix.jdbc.PhoenixDriver
URL (required): jdbc:phoenix:cluster1
User Name and Password are not required in a default configuration.
Building a sample report
Any reports with an SQL query compatible with Phoenix can be used with a Data Source configured as above. Here's some sample DDL and a query to get you started. You can load employee data from the Foodmart sample database (gzipped and attached to this article in tab-separated values as employee.tsv.gz) to a standard HBase table, then create a table in Phoenix which exposes that data via SQL.
While extracting the data to the tsv file, we generated a UUID value to be used as the primary key of the HBase table, as it wouldn't be recommended to use the employee ID column which is a sequential integer.
Run this command on the HBase master to create the table:
$ echo "create 'employee', 'cf1'" | hbase shell
In the same directory of the file employee.tsv, you can then load the data with these commands:
$ gunzip employee.tsv.gz $ hadoop jar /usr/lib/hbase/hbase.jar importtsv \ -Dimporttsv.columns=HBASE_ROW_KEY,cf1:employee_id,cf1:full_name \ ,cf1:first_name,cf1:last_name \ ,cf1:position_id,cf1:position_title \ ,cf1:store_id,cf1:department_id \ ,cf1:birth_date,cf1:hire_date \ ,cf1:end_date,cf1:salary \ ,cf1:supervisor_id,cf1:education_level \ ,cf1:marital_status,cf1:gender \ ,cf1:management_role employee employee.tsv
Connect to Phoenix via sqlline.py (part of the Phoenix package) directly on the HBase master or via any JDBC client configured to use Phoenix (e.g. SQuirreL), and run this DDL:
CREATE TABLE "public"."employee" ( pk VARCHAR PRIMARY KEY, "cf1"."employee_id" VARCHAR, "cf1"."full_name" VARCHAR, "cf1"."first_name" VARCHAR, "cf1"."last_name" VARCHAR, "cf1"."position_id" VARCHAR, "cf1"."position_title" VARCHAR, "cf1"."store_id" VARCHAR, "cf1"."department_id" VARCHAR, "cf1"."birth_date" VARCHAR, "cf1"."hire_date" VARCHAR, "cf1"."end_date" VARCHAR, "cf1"."salary" VARCHAR, "cf1"."supervisor_id" VARCHAR, "cf1"."education_level" VARCHAR, "cf1"."marital_status" VARCHAR, "cf1"."gender" VARCHAR, "cf1"."management_role" VARCHAR );
You should then be able to query the data in your report, for example:
SELECT * FROM "public"."employee" ORDER BY "employee_id" LIMIT 100;
For your convenience, a simple report is attached to this article as HBaseTest.jrxml, created with JSS 6.1.1.
Alternatively you can create the table in Phoenix directly without going through HBase first and then load the data via traditional SQL (using UPSERT, as INSERT and UPDATE are not available), or using bulk loading.
If you have trouble starting sqlline.py locally on the master, set this environmental variable and try again:
$ export HBASE_CONF_PATH=/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/hadoop-common.jar:/usr/lib/hadoop/hadoop-auth.jar:/usr/lib/hbase/hbase.jar
You will need to change the paths above depending on your Hadoop distribution.
As in our case CDH was running in a test VM, we encountered this issue: http://stackoverflow.com/questions/18428722/hbase-java-client-unknown-ho...
That was worked around by adding localhost.localdomain to the existing /etc/hosts entry for cluster1, which was already pointing to the right IP address.
This should not be an issue in a Hadoop cluster where ZooKeeper is properly configured.