Jump to content
We've recently updated our Privacy Statement, available here ×
  • Jaspersoft HBase Query Language


    mdahlman

    Language Reference

    The Jaspersoft HBase Query Language is a JSON-style declarative language for specifying what data to retrieve from HBase. The connector converts this query into the appropriate API calls and uses the HBase REST Server interface (Stargate) to query the HBase instance.

    {   # The following parameters are mandatory
        "tableName" : "myTable",
        "deserializerClass" : "myDeserializer",


    Details Details

        # The following parameters are optional
         "filter" : { },

    Details

         "startRow/endRow" : { },

    Details

          "columnList" : { },

    Details

         "sortFields" : { },

    Details

         "rowsToProcess" : { },

    Details

         "batchSize" : { },

    Details

         "idField" : { },

    Details

          "alias" : { },

    Details

          "qualifiersGroup" : { }
    }

    Details

     

    tableName

    Specifies the table name. Exactly one table must be specified.

    # Hard-coded table name

    "tableName" : "myTable"

    # Collection name specified as a String Parameter

    "tableName" : $P{myTableParam}

    deserializerClass

    Specifies how the data will be deserialized into Java objects that the report engine can process. HBase has no data type metadata. Instead it stores all values simply as arrays of bytes. In order for a query and for a report engine to make sense of these arrays of bytes, there must be a some definition of how these bytes will be interpreted. If you are already using HBase then you are already doing this (either explicitly or implicitly).

    The HBase connector ships with two sample deserializer classes. The DefaultDeserializer uses Java's built in serialization. If you have inserted data into HBase using Java serialization, then choose this deserializer class.

    The ShellDeserializer correctly interprets data that has been inserted using the HBase shell. It includes logic to determine if a byte array represents a Long (Integer), Double, or String. This is practical for using the Jaspersoft HBase connector with HBase tutorials.

    More generally, HBase users have their own system for serializing data. To reuse your current Serialization/Deserialization (SerDe) logic, you must create a class which implements the Jaspersoft Deserializer interface. The deserialization may be based solely on interpreting the array of bytes, or the method of deserialization may be based on the table name, column family, or qualifier.

    # Java's built-in serialization

    "deserializerClass" : "com.jaspersoft.hbase.deserialize.impl.DefaultDeserializer"

    # Retrieving data input using the HBase shell

    "deserializerClass" : "com.jaspersoft.hbase.deserialize.impl.ShellDeserializer"

    # Using a custom deserializer class

    "deserializerClass" : "com.MyCompany.MyHBaseDeserializer"

    filter

    The filter provides methods for limiting what data is returned from the specified table. In principle any valid HBase filter may be used. In practice not all filter types are relevant to Business Intelligence queries. Refer to comprehensive HBase filter documentation here: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html

    Valid compareOp values: EQUAL, GREATER, GREATER_OR_EQUAL, LESS, LESS_OR_EQUAL, NO_OP, NOT_EQUAL Reference: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/CompareFilter.CompareOp.html

    Common comparators: BinaryComparator, SubstringComparator, RegexStringComparator Complete list: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/WritableByteArrayComparable.html

    Comparator syntax:

    # Use a binary value

    "BinaryComparator" : { "value" : "myBinaryValue" }

    # Use a regular expression

    "RegexStringComparator" : { "expr" : "myExpression" }

    # Search for a substring

    "SubstringComparator" : { "substr" : "myString" }

    Commonly used filters include the following:

    SingleColumnValueFilter

    Filter rows based on the value in a specified column.

    For example, return only Canadian customers:

    "filter" : {

        "SingleColumnValueFilter" : {

             "family" : "column_family_1",

             "qualifier" : "billing_address_country",

             "compareOp" : "EQUAL",

             "comparator" : {

                 "SubstringComparator" : {

                     "substr" : "Canada"

                 }

             }

        }

    }

    More information on this filter types is available: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SingleColumnValueFilter.html

    Row Filter

    Filter based on the row key. For example, return only rows which have a RowID starting with "2012" and ending with "X".

    "filter" : {

        "RowFilter" : {

             "compareOp" : "EQUAL",

             "comparator" : {

                 "RegexStringComparator" : {

                     "expr" : "2012.*X"

                 }

             }

         }

    }

    More information http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/RowFilter.html

    Filter List

    Multiple filters may be used together in a filter list. Filters are "ANDed" together or "ORed" together using MUST_PASS_ALL or MUST_PASS_ONE. Here we return only the rows matching the specified RowID format which are for Canadian customers.

    "filter" : {

        "FilterList" : {

            "operator" : "MUST_PASS_ALL",

            "rowFilters" : [  {   "RowFilter" : {

                                             "compareOp" : "EQUAL",

                                             "comparator" : { "RegexStringComparator" : { "expr" : "2012.*X" }  }

                                        }

                                    },

                                    {   "SingleColumnValueFilter" : {

                                             "family" : "schema",

                                             "qualifier" : "billing_address_country",

                                             "dropDependentColumn" : true,

                                             "compareOp" : "EQUAL",

                                             "comparator" : { "RegexStringComparator" : { "expr" : "Canada" }  }

                                        }

                                    }

                                ]

         }

    }

    More information: http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FilterList.html

    startRow/endRow

    These parameters may be used as a sort of specialized RowFilter. Performance is better when filtering with these options compared with the equivalent RowFilter.

    The startRow is the row to start scanner at or after (inclusive).

    The endRow is the row to stop scanner before (exclusive)

    # Return only the rows between the specified rowIDs

    # Note: row 1309913959672 is returned. row 1309913959675 is not returned.

    "startRow" : "1309913959672",

    "endRow" : "1309913959675"

    # Return all rows after the specified start row (includes the start row)

    "startRow" : "1309913959672"

    # Return all rows before the specified end row (excludes the end row)

    "endRow" : "1309913959675"

    columnList

    This allows the query to select which fields will be part of the result set. If no value is specified for 'findFields' then all fields will be returned. To return only a subset of the fields available in the returned documents, specify a comma separated list of fields in the form ColumnFamily:Qualifier.

    # Return only these four fields

    # Note: the query will return these four fields plus the row id in the field named "_id_"

    "columnList" : "schema:billing_address_country, schema:billing_address_city, schema:account_type, schema:assigned_user_id"

    sortFields

    Specifies the fields that will be used to sort. Specify a comma separated list of fields in the form ColumnFamily:Qualifier.

    "sortFields" : "schema:billing_address_country, schema:billing_address_city"

    rowsToProcess

    The rowsToProcess key sets the number of rows that will be processed to determine the list of fields. This applies only at edit at time in iReport. It has no effect on reports when they are executed. The connector uses a default value of 10 records if this option is not specified. If a value of "0" is specified then the Fields Provider will iterate through all records in result set.

    "rowsToProcess" : "50"

    batchSize

    This parameter is optional and determines the size of the batch that retrieves result from HBase per request

    "batchSize" : 90

    alias

    It's an optional entry that allows the user to rename field name for a better usability. It's expressed as a map where keys are the alias and values are the original field names.

    {

         "ALIAS_NAME" : "FIELD_NAME"

    }

    For instance:

    {

         "street" : "schema|billing_address_street"

    }

    qualifiersGroup

    This provides an important pivot feature. It makes it possible to transform wide rows of data into multiple "shorter" rows. It groups a set of columns by a regular expression, and it will output the column names as a field name with their corresponding values as another field. The field type of the column names is String and for the values is Object The syntax is as follows

    {

        "qualifiersExpression" : Regex expression

         "qualifierJrField" : Name of the pivot JR field for the columns

         "valueJrField" : Name of the pivot JR field for the column values

    }

    Query snippet:

    "qualifiersGroup" : {

         "qualifiersExpression" : "street|schema|billing_.*",

         "qualifierJrField" : "billing",

         "valueJrField" : "billingValue"

    }

    Example of transforming data:

    # Original HBase data:

    rowID:row1, order-2012-01-01:$50, order-2012-01-03:$99

    rowID:row2, order-2012-01-01:$25, order-2012-01-02:$66, order-2012-01-07:$130

    # Pivoted result set:

    rowID fieldName fieldValue

    row1 order-2012-01-01 $50

    row1 order-2012-01-03 $99

    row2 order-2012-01-01 $25

    row2 order-2012-01-02 $66

    row2 order-2012-01-07 $130

    # This pivoted data can be used more easily for reporting and analysis.

    idField

    By default the rowID for each record is returned in the field $F{_id_}. You may override this name using the idField parameter.

    "idField" : "newIDFieldName"

    Sample queries

    Basic

    Retrieve absolutely everything from a table

    {
        "tableName" : "accounts",
         "deserializerClass" : "com.jaspersoft.hbase.deserialize.impl.DefaultDeserializer",
    }

    Filters

    SingleColumnValueFilter (only customers in Canada)

    {
        "tableName": "accounts",
        "deserializerClass": "com.jaspersoft.hbase.deserialize.impl.DefaultDeserializer",
        "sortFields": "schema|billing_address_country, schema|billing_address_city",
        "filter": {
            "SingleColumnValueFilter": {
                "family": "schema",
                "qualifier": "billing_address_country",
                "compareOp": "EQUAL",
                "comparator": { "SubstringComparator": { "substr": "Canada" } }
             }
         }
    }

    Multiple SingleColumnValueFilters (Only a specified customer, only in the last 60 minutes)

    {
        "tableName": "transfer",
         "deserializerClass": "com.jaspersoft.hbase.deserialize.impl.ShellDeserializer",
         "filter": {
             "FilterList": {
                 "operator": "MUST_PASS_ALL",
                 "rowFilters": [ {   "SingleColumnValueFilter": {
                                               "family": "Info",
                                               "qualifier": "id",
                                               "compareOp": "EQUAL",
                                               "comparator": {  "SubstringComparator": { "substr": "$P{CUSTOMER}" }  }
                                            }
                                       },
                                       {   "SingleColumnValueFilter": {
                                               "family": "Info",
                                               "qualifier": "time",
                                               "compareOp": "GREATER",
                                               "comparator": { "BinaryComparator": { "value": "$P{ONE_HOUR_AGO}" } }
                                            }
                                       }
                 ]
             }
         }
    }

    Filter and Pivot

    {
    "tableName": "accounts",
    "deserializerClass": "com.jaspersoft.hbase.deserialize.impl.DefaultDeserializer",
    "filter": {

        "FilterList": {
       
        "operator": "MUST_PASS_ALL",
       
        "rowFilters": [
          
                            {   "QualifierFilter": {
         
                                     "compareOp": "EQUAL",
          
                                   "comparator": { "RegexStringComparator": { "expr": "billing_.*" } }
       
                                  }
                                  },
                                  {   "SingleColumnValueFilter": {
           
                                  "family": "schema",
           
                                  "qualifier": "billing_address_country",
           
                                  "dropDependentColumn": true,
           
                                  "compareOp": "EQUAL",
          
                                   "comparator": { "RegexStringComparator": { "expr": "Mexico" } }
       
                                  }
                                  }
                               ]
            }
        },
        "qualifiersGroup": {
            "qualifiersExpression": "street|schema|billing_.*",
            "qualifierJrField": "billing",
            "valueJrField": "billingValue"
        },
        "alias": {
             "street": "schema|billing_address_street",
             "customerID": "$P{CUSTOMER_NUMBER}"
         }
    }

     

     


    User Feedback

    Recommended Comments

    There are no comments to display.



    Guest
    This is now closed for further comments

×
×
  • Create New...