CDAP Blog

Combining Hadoop and Spark in a Data Processing Pipeline

Tony Duarte

CDAP includes an Application Development Framework so that developers can build entire applications with existing Big Data technologies such as Apache Hadoop, Apache Spark, Apache HBase, Apache Hive and more. CDAP has been used by Fortune 50 customers to help them with data ingestion and data egress for their data lakes and to help them … Read more


Monitoring Key Hadoop Operational Statistics using CDAP

bhooshan

The Cask Data Application Platform (CDAP) is the first Unified Integration Platform for Big Data. It provides users with higher-level abstractions and APIs over complex, low-level systems for building Big Data applications. It does the heavy lifting involved in integrating various platforms in the Apache Hadoop ecosystem to provide a single end-to-end platform. To … Read more


Cask Tracker Enhanced: Metadata Taxonomy and Data Usage Analytics in CDAP 3.5

Yue Gao and Riwaz Poudyal

Cask Tracker is a self-service CDAP Extension that automatically captures rich metadata and provides users with visibility into how data is flowing into, out of, and within a Data Lake. Tracker was first introduced in CDAP v3.4. Tracker v0.2 has just been released along with CDAP 3.5 and packs a ton of new features. Dataset … Read more



Running Legacy MapReduce Jobs in CDAP

Rohit Sinha

The Cask Data Application Platform is an integrated developer platform for the Hadoop ecosystem. With CDAP, developers can address a broader set of batch and real-time use cases with easy-to-use abstractions. Developers can write MapReduce programs using CDAP and deploy them as CDAP applications easily, as explained in this guide. Running MapReduce programs inside CDAP has … Read more


CDAP Services for Apache Ambari

chrisg

Cask is excited to announce easy CDAP integration for Apache Ambari users. Previously, we introduced you to integration with Cloudera Manager. This post will familiarize you with integration with Apache Ambari, the open source provisioning system for HDP (Hortonworks Data Platform). Adding the CDAP service to Ambari: To install CDAP on a cluster managed by … Read more


CDAP Workflows: A closer look

Sagar Kapare

The Cask Data Application Platform (CDAP) is an open-source platform to build and deploy data applications on Apache Hadoop™. In a previous blog post we introduced Workflows, a core component of CDAP, in comparison with Apache Oozie. In this post we will discuss the CDAP Workflow engine in greater detail. CDAP Workflows are used to … Read more


Announcing CDAP 3.2 – Hydrator and much more!

bhooshan

We are excited to announce the Cask Data Application Platform (CDAP) 3.2 release. This release brings many enhancements to existing CDAP features and lays the foundation for upcoming, advanced features, all designed to further simplify data application development. Cask Hydrator: CDAP 3.2 introduces Cask Hydrator, a highly functional framework and UI to support self-service batch … Read more


CDAP Workflows: In Comparison with Apache Oozie

bhooshan

Apache Oozie is a workflow scheduler system to manage Apache Hadoop™ jobs. It is one of the most popular open-source workflow scheduler systems for Hadoop. Cask Data Application Platform (CDAP) is an open-source platform to build and deploy data applications on Hadoop. CDAP provides abstractions on top of Hadoop that enable developers to rapidly build, … Read more


AeroCask – Real-time Flight Data Analytics using CDAP

One of the many things that I love about Cask is the hackathons before every release. They are not only a way for us to dog-food new features in the CDAP platform but also an opportunity to let your imagination run loose and implement an integration with another system; or develop an interesting … Read more