Please note: functionality described below may not appear in every edition of JaspersoftETL. For edition information, please visit this page.
Spark Job designer
The Spark Streaming framework is supported in Talend Real-time Big Data Platform and Talend Data Fabric. With these products, users can create Spark Streaming Jobs to handle data in motion as it is generated.
New messaging components, such as tKafkaInput, tKafkaOutput, tKinesisInput, and tKinesisOutput, are available in Spark Streaming Jobs.
The Spark window operations and the Spark checkpoints are supported by the Spark Streaming Jobs.
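Window operations aggregate data across several consecutive micro-batches rather than one at a time. The sketch below illustrates the semantics in plain Python, not the Spark API; the sample batches and the 3-batch window with a 1-batch slide interval are illustrative assumptions:

```python
from collections import deque

def windowed_counts(batches, window_length=3, slide_interval=1):
    """Conceptual sketch of a sliding window over micro-batches:
    each result aggregates the last `window_length` batches,
    emitted every `slide_interval` batches."""
    window = deque(maxlen=window_length)  # holds the most recent batches
    results = []
    for i, batch in enumerate(batches, start=1):
        window.append(batch)
        if i % slide_interval == 0:
            # aggregate (here: count records) over the whole window
            results.append(sum(len(b) for b in window))
    return results

# Example micro-batches of incoming records (illustrative data)
batches = [["a", "b"], ["c"], ["d", "e", "f"], ["g"]]
print(windowed_counts(batches))  # → [2, 3, 6, 5]
```

Once four batches have arrived, the oldest batch falls out of the window, so the last count covers only the three most recent batches.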
The tRestWebServiceOutput component has been added to allow users to send message streams to a given RESTful Web service in a Spark Streaming Job.
In Talend Administration Center, the Big Data Streaming Conductor module has been created to gather the script generation, deployment, and execution phases of Streaming Jobs into a single execution task.
A wide range of component families are available in Spark Batch and Spark Streaming Jobs for accomplishing many different specific tasks, such as:
The full-featured tMap component and other Processing components
Native connectors to Cassandra
Components for Spark SQL and Elasticsearch
Native RDBMS components
Spark Tuning properties
Hierarchical Avro files:
The tAvroInput component in a Spark Job can now handle the hierarchical Avro schema.
In a Spark Batch Job, the tAvroOutput component allows users to define a hierarchical key/value schema for the Avro file to be written.
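A hierarchical Avro schema expresses nesting by making a field's type itself a record. The sketch below shows one such schema as a Python dictionary; the record and field names (Customer, Address, and so on) are illustrative assumptions, not taken from the product:

```python
import json

# Hierarchical Avro schema: the "address" field is itself a nested record.
# Record and field names here are illustrative assumptions.
schema = {
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "name", "type": "string"},
        {"name": "address", "type": {
            "type": "record",
            "name": "Address",
            "fields": [
                {"name": "city", "type": "string"},
                {"name": "zip", "type": ["null", "string"], "default": None},
            ],
        }},
    ],
}
print(json.dumps(schema, indent=2))
```

The union type `["null", "string"]` on the zip field is standard Avro notation for an optional value.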
In addition to the Spark components used for data integration, the new Machine learning components allow users to leverage Spark to make predictive analyses.
Discovery and reusability in Hadoop cluster metadata
An import wizard for Hadoop configuration has been created to import the configurations of various Hadoop services directly from different sources.
This wizard supports importing configurations directly from the Ambari or Cloudera Manager services.
When users import configurations from local files, this wizard supports all of the distributions already officially supported by Talend.
Users can create environment contexts out of the metadata of a connection to a Hadoop service such as HDFS or Hive. These contexts can then be used to define the parameters of the Hadoop connections to be created.
Upgraded support for NoSQL and Hadoop distributions
New versions of the following Hadoop distributions are supported:
Cloudera 5.4 (YARN)
MapR 4.1.0 (YARN)
Hortonworks Data Platform V2.2
Amazon EMR (Apache 2.4.0) and EMR 4.0.0 (Apache 2.6.0)
Cassandra CQL3 and the Datastax API are now available to the Cassandra components.
MongoDB 3.0.X is supported.
The MongoDB components now support more operations, such as Bulk write and Write concern, and can work with different authentication mechanisms, including Kerberos and SCRAM-SHA-1.
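The authentication mechanism and write concern can be expressed as standard MongoDB connection-string options. A minimal sketch of such a URI follows; the host, credentials, and database names are placeholders:

```python
from urllib.parse import urlencode

# Placeholder credentials, host, and database; authMechanism and w
# are standard MongoDB connection-string options.
options = {
    "authSource": "admin",
    "authMechanism": "SCRAM-SHA-1",  # client-side challenge/response auth
    "w": "majority",                 # write concern: ack from a majority
}
uri = "mongodb://appUser:secret@mongo.example.com:27017/sales?" + urlencode(options)
print(uri)
```

Any driver-based client can consume a URI in this form, which keeps the authentication and durability settings alongside the connection details.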
Hadoop components enhancements
Support for the Tez framework has been added to the Hive and Pig components.
The Hive components now support Kerberos authentication with HiveServer2 in Standalone mode.
In the Hive components, SSL connections to HiveServer2 in Standalone mode are now available for the following distributions:
Hortonworks Data Platform 2.0 +
Cloudera CDH4 +
Pivotal HD 2.0 +
Support for Kerberos has been added to the HBase components and the HBase connection in the Hive and Pig components.
The tHDFSOutputRaw component has been created to write data to HDFS in formats other than the Text and Sequence formats.
New messaging components in the Data Integration Jobs
New Kafka components have been created to help users visually create a topic and consume or publish messages.
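Conceptually, a Kafka topic is an append-only log, and each consumer group tracks its own read offset into that log. The in-memory stand-in below illustrates that model only; it is not the Kafka API, and all names are illustrative:

```python
from collections import defaultdict

class MiniTopic:
    """Conceptual illustration of Kafka's topic/offset model
    (an in-memory stand-in, not the Kafka API): the topic is an
    append-only log, and each consumer group keeps its own offset."""
    def __init__(self):
        self.log = []                    # append-only message log
        self.offsets = defaultdict(int)  # consumer group -> next offset

    def publish(self, message):
        self.log.append(message)

    def consume(self, group):
        # deliver every message this group has not yet seen
        start = self.offsets[group]
        self.offsets[group] = len(self.log)
        return self.log[start:]

topic = MiniTopic()
topic.publish("order-1")
topic.publish("order-2")
print(topic.consume("billing"))  # both messages
topic.publish("order-3")
print(topic.consume("billing"))  # only the new one
```

Because offsets are per group, a second group (say, "auditing") would still receive all three messages on its first consume, independently of "billing".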
Memory Test Execution
Users can now monitor JVM memory consumption and CPU usage via curve graphs during Job execution.
Enhanced look and feel and user experience
The Studio has undergone a number of changes to improve its look and feel and the overall user experience.
The Studio user interface and component icons have been redesigned.
The Studio login process has been streamlined:
For the first startup, fewer steps are required.
After the first startup, users can reopen the same project directly without seeing the login dialog.
Users can now create an output component by dragging and dropping from an input component on the design workspace.
Enhanced component search:
Users can now search for a component by providing a descriptive phrase as the search keywords, without having to know the component name. The maximum number of entries in the search result is configurable.
The Palette now shows recently used components to ease component search in Job design. The maximum number of components shown on the Recently Used list is configurable.
Java 8 support
The Studio now supports Java 8 only.
Teradata SCD (Slowly Changing Dimensions) is now supported through the new component tTeradataSCD.
Vertica ELT and SQL Templates
Vertica ELT is now supported through new dedicated components.
SQL Templates for Vertica are now supported.
tVerticaBulkExec and tVerticaOutputBulkExec now support bulk-loading data to a Vertica database based on the defined schema columns.
A number of components now support batch mode when using an existing connection.
tOracleSP now supports mapping the source database type to XMLTYPE.
PostgreSQL v9.x.x is now supported.
tSalesforceWaveBulkExec and tSalesforceWaveOutputBulkExec, which let you load data to Salesforce Analytics Cloud, have been added to the Studio.
tSalesforceWaveBulkExec and tSalesforceWaveOutputBulkExec support reusing an existing connection and retrieving the upload status.
tNetsuiteInput and tNetsuiteOutput, which let you read/write data from/to NetSuite, have been added to the Studio.
tNetsuiteOutput supports retrieving data rows with errors via a Reject row connection.
tMarketoInput supports three lead selector types when retrieving lead records from Marketo.
tMarketoOutput supports providing the status (CREATED, UPDATED, FAILED) and the optional error text.
Four tRedshift* load/unload components let you load/unload data from/to Amazon S3.
Seven tGoogleDrive* components let you create, upload, download, copy, delete, move and list files and folders on Google Drive.
Six tAmazonAurora* components let you read/write data from/to Amazon Aurora databases.
tS3Put now supports server-side encryption.
A static transport discovery mechanism is now supported for the ActiveMQ server in the tMomXXX components.
tMomInput and tMomRollback now support moving messages to the backout queue when the backout count reaches the configured threshold.