CDAP Blog

Integrating CDAP with Microsoft Azure HDInsight

We recently announced the integration of CDAP with the Microsoft Azure HDInsight platform. This post will give a behind-the-scenes look at this integration. First, a bit about the integration itself. Azure HDInsight is an Apache Hadoop and Spark distribution powered by the cloud. This means that it handles any amount of data, scaling from terabytes … Read more


Cask Tracker Enhanced: Metadata Taxonomy and Data Usage Analytics in CDAP 3.5

Yue Gao and Riwaz Poudyal

Cask Tracker is a self-service CDAP Extension that automatically captures rich metadata and provides users with visibility into how data is flowing into, out of, and within a Data Lake. Tracker was first introduced in CDAP v3.4. Tracker v0.2 has just been released along with CDAP 3.5 and packs a ton of new features. Dataset … Read more


A Data Quality Application Template for CDAP

Shilpa Subrahmanyam

One of Cask’s core goals is making a reasonably experienced Java developer’s life much easier when building Hadoop applications. My summer project was aligned with the company’s effort to take this to the next level by lowering the barrier to entry for using Hadoop even further: Java proficiency not required. I spent my summer writing … Read more


Join us for the 2nd Big Data Application Meetup

Henry Saputra

Cask is proud to host the second Big Data Application Meetup on August 19, 2015 at Cask HQ in Palo Alto. By sponsoring and promoting knowledge-sharing and community-building through the Big Data Application Meetup, Cask continues to take the lead in promoting technologies and best practices used to build big data applications. For the second meetup, we have … Read more


AeroCask – Real-time Flight Data Analytics using CDAP

One of the many things that I love about Cask is the hackathons before every release. It is not only a way for us to dog-food new features in the CDAP platform but also an opportunity to let your imagination run loose and implement an integration with another system, or develop an interesting … Read more


A Look at Automating Cluster Creation in the Cloud with Coopr

davidb

Coopr is a cluster provisioning system designed to fully facilitate cluster lifecycle management in public and private clouds. In this blog, we will take an inside look at what happens when Coopr provisions a cluster. Deploying clusters can be time-consuming. For many system deployments, this work can be accomplished with a configuration management tool such … Read more


Multitenancy for Hadoop: Namespaces – Part II

bhooshan

In a previous blog, we introduced the concept of namespaces and how they help bring multitenancy to Apache Hadoop. We also briefly introduced the use of namespaces in CDAP, leaving out the implementation details. In this blog we’ll discuss some of the requirements that influenced the design of namespaces in CDAP, as well as … Read more


Hadoop Components Versions in Distros Matrix

The Apache Hadoop ecosystem is always evolving, with the major distributions constantly upgrading their included core Hadoop components. This can present a challenge when building any application which runs on top of Hadoop. When developing our open-source application framework, CDAP, we strive to maintain compatibility with all major Hadoop distributions. Building on our previous reference … Read more


CDAP 3.0 – From Zero to App in 5 minutes

The Cask Data Application Platform (CDAP) was created with the intent of empowering all developers to build data applications. It was, is, and always will be a developer platform: a platform with the mission to provide developers with simple access to powerful technology. CDAP has proven to significantly lower the barriers to building Hadoop … Read more