Twill, formerly Weave, accepted into the Apache Incubator

Please note: Continuuity is now known as Cask, and Continuuity Reactor is now known as the Cask Data Application Platform (CDAP).

For the past few years, applications have been generating hundreds of petabytes of data. Analyzing this data can create real business value, but until recently many businesses discarded it because it was either too hard to analyze or too expensive to store in traditional relational databases.

The answer to this problem is a free, open-source technology running on commodity hardware: Apache Hadoop. Hadoop makes it cheap to store Big Data and easy to extract valuable insights using a batch-driven analysis method called MapReduce. It can turn any web app into a data-driven app.

The first version of Hadoop had its drawbacks. MapReduce was the only thing you could run on your Hadoop cluster. What if you wanted to do real-time analysis or run a message-passing algorithm? Your cluster often had more compute capacity than your MapReduce jobs needed, but that spare capacity mostly sat idle.

The latest version of Hadoop (Hadoop 2.0) addressed this issue by separating cluster resource management from MapReduce. Its resource manager, YARN, lets you use the Hadoop cluster for any of your computing needs, including distributed testing, stress-load generation, or other types of analysis.

With YARN as its operating system, your Hadoop cluster turns into a collection of virtual machines. However, YARN’s interfaces are too low-level for rapid application development. Developers writing YARN applications typically find themselves writing the same boilerplate code over and over again for every application.

YARN is at the core of Continuuity Reactor, and we develop many YARN applications every day. We got tired of cutting and pasting code, so we distilled these common code patterns into a set of libraries named Twill (previously called Weave). Twill uses a programming model similar to Java threads, making it easy to write distributed applications.
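To illustrate what a "programming model similar to Java threads" means, here is a minimal local sketch. It is not Twill's API: the names DistributedRunnable, LocalRunner, LocalController and HelloRunnable are hypothetical, and every "instance" runs on a local thread instead of in a YARN container. The idea it shows is that you write an ordinary run() method, hand it to a runner that starts N copies, and get back a controller that manages the copies' lifecycle. For Twill's real interfaces, consult the project's own documentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

// Hypothetical thread-like unit of work (NOT Twill's API). Like
// java.lang.Runnable it has run(), plus a stop() hook for the controller.
interface DistributedRunnable extends Runnable {
    void stop();
}

// Hypothetical handle for managing the lifecycle of a started application.
final class LocalController {
    private final List<DistributedRunnable> instances;
    private final List<Thread> threads;

    LocalController(List<DistributedRunnable> instances, List<Thread> threads) {
        this.instances = instances;
        this.threads = threads;
    }

    // Signals every instance to stop, then waits for all of them to exit.
    void stopAndWait() throws InterruptedException {
        for (DistributedRunnable r : instances) {
            r.stop();
        }
        for (Thread t : threads) {
            t.join();
        }
    }
}

// Hypothetical runner: starts `count` copies of a runnable, each on its own
// local thread. In a YARN-based library, each copy would get its own container.
final class LocalRunner {
    static LocalController start(Supplier<DistributedRunnable> factory, int count) {
        List<DistributedRunnable> instances = new ArrayList<>();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            DistributedRunnable r = factory.get();
            Thread t = new Thread(r, "instance-" + i);
            instances.add(r);
            threads.add(t);
            t.start();
        }
        return new LocalController(instances, threads);
    }
}

// Example runnable: logs its thread name once, then idles until stopped.
final class HelloRunnable implements DistributedRunnable {
    static final Queue<String> LOG = new ConcurrentLinkedQueue<>();
    private volatile boolean running = true;

    @Override
    public void run() {
        LOG.add("hello from " + Thread.currentThread().getName());
        while (running) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    @Override
    public void stop() {
        running = false;
    }
}
```

Usage: `LocalController controller = LocalRunner.start(HelloRunnable::new, 3);` starts three instances, and `controller.stopAndWait();` shuts them all down. The application code only implements run() and stop(); placement, startup and shutdown stay in the runner and controller, which is the kind of boilerplate the post describes moving into a shared library.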

Twill has built-in support for real-time application logs and metrics collection, delegation token renewal, application lifecycle management, and network service discovery. This greatly reduces the pain that developers face when developing, debugging, deploying and monitoring distributed applications.

Our mission is to ignite the next generation of Big Data application development and make Hadoop accessible to all developers. We think the broader community of Java developers will greatly benefit from Twill. So, we open-sourced it. Today, we’re excited to announce that Twill has been accepted into the Apache Incubator. If you are a Java developer building Big Data applications, take it for a spin and help us make it better.
