I am very happy to announce the general availability of the third generation of our flagship product, the Cask Data Application Platform (CDAP). Our developers and customers have been looking for easier ways to represent common use cases on Hadoop, such as batch ETL, real-time ETL, data quality assessment and model building, and CDAP v3.0 is a major step toward addressing these challenges. To make our customers and developers instantly productive on Hadoop, we have packaged valuable out-of-the-box functionality with CDAP.
The core new feature we are introducing with 3.0 is called Application Templates. Application Templates are implementations of Hadoop use cases that are reusable through configuration and extensible through plugins; they can easily be managed and run in CDAP. Included in this release are built-in Application Templates for performing ETL in batch and in real time. They are integrated with the UI and CLI, so ETL pipelines can be created and managed without writing code. ETL templates can be configured to:
- ingest data from various sources – Twitter, JMS, Kafka, DB, Streams and Datasets,
- perform transformations on the data including filtering, projection or any custom transform,
- persist the data to a variety of destinations – DB, Datasets, Streams and HDFS.
ETL templates are highly extensible, allowing developers to create and plug in new types of sources, transformations or sinks. Developers can also build completely new Application Templates if they choose to, and manage them in CDAP.
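To make the idea of "reusable through configuration, extensible through plugins" concrete, here is a minimal conceptual sketch in plain Python. It is not the CDAP API: the registries, plugin names (`List`, `Filter`, `Projection`, `Memory`, `Uppercase`) and the configuration layout are all hypothetical, chosen only to illustrate how a source, a chain of transforms and a sink can be wired together from a configuration.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]

# --- Hypothetical built-in plugins (illustration only, not CDAP classes) ---

def list_source(props: dict) -> Iterator[Record]:
    """Source: emit the records supplied in the configuration."""
    yield from props["records"]

def filter_transform(props: dict, records: Iterable[Record]) -> Iterator[Record]:
    """Transform: keep records whose field equals the configured value."""
    return (r for r in records if r.get(props["field"]) == props["equals"])

def projection_transform(props: dict, records: Iterable[Record]) -> Iterator[Record]:
    """Transform: keep only the configured fields of each record."""
    return ({k: r[k] for k in props["keep"] if k in r} for r in records)

def memory_sink(props: dict, records: Iterable[Record]) -> None:
    """Sink: append every record to a list passed in the configuration."""
    props["target"].extend(records)

# Plugin registries: the pipeline looks plugins up by name.
SOURCES = {"List": list_source}
TRANSFORMS = {"Filter": filter_transform, "Projection": projection_transform}
SINKS = {"Memory": memory_sink}

def run_pipeline(config: dict) -> None:
    """Build and run a source -> transforms -> sink pipeline from a config."""
    src = config["source"]
    records = SOURCES[src["name"]](src["properties"])
    for step in config.get("transforms", []):
        records = TRANSFORMS[step["name"]](step["properties"], records)
    sink = config["sink"]
    SINKS[sink["name"]](sink["properties"], records)

# --- Extending the pipeline: register a custom transform plugin ---

def uppercase_transform(props: dict, records: Iterable[Record]) -> Iterator[Record]:
    """Custom transform: upper-case the configured string field."""
    return ({**r, props["field"]: str(r[props["field"]]).upper()} for r in records)

TRANSFORMS["Uppercase"] = uppercase_transform

output: List[Record] = []
config = {
    "source": {"name": "List", "properties": {"records": [
        {"user": "a", "lang": "en", "text": "hello"},
        {"user": "b", "lang": "fr", "text": "salut"},
        {"user": "c", "lang": "en", "text": "hi"},
    ]}},
    "transforms": [
        {"name": "Filter", "properties": {"field": "lang", "equals": "en"}},
        {"name": "Projection", "properties": {"keep": ["user", "text"]}},
        {"name": "Uppercase", "properties": {"field": "text"}},
    ],
    "sink": {"name": "Memory", "properties": {"target": output}},
}
run_pipeline(config)
```

In this sketch, changing the pipeline means editing `config`, while adding a new capability means registering one more function, which mirrors the split between configuration and plugins described above.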
Next, we are introducing a slick new role-based UI that caters to the needs of developers, DevOps and administrators using CDAP. As an example, the operations section lets users create operational dashboards using application and system metrics from CDAP, as seen below.
Other major features that are part of this release include:
- Enhanced metrics and workflow support
- OLAP Cube dataset to perform complex data aggregations
- Fine-grained views of logs by run-id of CDAP programs
- Support for core Table datasets queryable from Hive
- Ability to attach a schema to streams to understand several data formats – syslog, Apache Common Log Format and any custom format
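As a rough illustration of what attaching a schema to raw stream events means, the following Python sketch parses one line in Apache Common Log Format into typed fields. It does not use any CDAP API; the field names and the `parse_clf` helper are assumptions made for this example.

```python
import re
from typing import Dict, Optional, Union

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<remote_host>\S+) (?P<remote_login>\S+) (?P<auth_user>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<content_length>\d+|-)'
)

# Hypothetical schema: field name -> type used to interpret the raw text.
SCHEMA = {
    "remote_host": str,
    "remote_login": str,
    "auth_user": str,
    "date": str,
    "request": str,
    "status": int,
    "content_length": int,
}

Value = Union[str, int, None]

def parse_clf(line: str) -> Dict[str, Optional[Value]]:
    """Apply the schema to one raw log line and return typed fields."""
    match = CLF_PATTERN.match(line)
    if match is None:
        raise ValueError("line is not in Common Log Format")
    event: Dict[str, Optional[Value]] = {}
    for name, field_type in SCHEMA.items():
        raw = match.group(name)
        # "-" marks a missing value in Common Log Format.
        event[name] = None if raw == "-" else field_type(raw)
    return event

line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
event = parse_clf(line)
```

Once events carry named, typed fields like these, they can be queried or aggregated rather than treated as opaque text.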
In addition, we are removing the APIs that were deprecated in the previous release and discontinuing support for Java 6 with this release. Check out our complete release notes for all the details.