Deploying CDAP packages from source via Coopr


Chris Gianelloni is a DevOps Engineer at Cask and works on automating all the things to improve developer productivity, including building a self-service cluster architecture with Coopr. Previously, he worked at several startups and as a consultant at large companies such as Apple and Yahoo. Chris has been an open source developer for over a decade and has contributed to multiple projects over his career.


Developing features for CDAP follows a workflow similar to that of many other projects. Developers keep a local checkout of the source, make modifications in a feature branch, build and test locally on their development machines, push their branch, and submit a pull request for code review. During this process, developers build CDAP clusters for testing in the cloud with Coopr, using public cookbooks from the Chef Supermarket, which are maintained by Cask’s operations team and shipped with Coopr’s provisioner. With a small wrapper cookbook, such as coopr_service_manager, Coopr can also control the starting and stopping of the various CDAP services.

Deploying CDAP on clusters using scripts (the old way)

Remote testing was done by developers using a collection of scripts which would build JAR artifacts of CDAP on the developer’s machine, copy the JARs to their remote cluster, update the init scripts to point to the new directory, and restart the CDAP services. Coopr was used to quickly create the clusters, which was beneficial, but was otherwise not leveraged. The scripts worked well enough and were sufficient for most cases. However, there were a few problems with this approach.

First, all builds were done on the developer’s machine from their local git working directory. Once the JARs were built, they still had to be uploaded to the remote cluster. This required the developer’s machine to remain online during the entire operation. No firing off a build of their branch and catching the train home. Developers had to wait for their clusters to update, which effectively limited them to testing a single branch or feature at a time.

Second, the updated init scripts were different from the package init scripts, which were checked into the CDAP source repository. This meant that any changes in the source init scripts were not reflected in the updated init scripts. As features in CDAP changed and the init scripts needed updates, this became a problem.

Finally, using these scripts would decouple the running CDAP version from the managed CDAP version that was installed via Coopr. This would present itself as a problem whenever a new version of CDAP was released. Developers couldn’t use Coopr’s ability to reconfigure a cluster or start and stop services without having the Coopr-managed CDAP packages get updated. The packages would replace the updated init scripts with release ones, breaking the testing and forcing the developer to put on their operations hat and dig around on their cluster to repair the problem.

Deploying CDAP on clusters using Coopr (the new hotness)

The primary obstacle to using Coopr for testing development branches was getting the code onto the clusters. The cdap cookbook only installed released packages from an APT or YUM repository. Released code was installed using the package resource type in Chef. 

package 'cdap-master' do
  action :install
  version node['cdap']['version']
end

What Cask needed was a way to build the packages from a source checkout and install them, in place of the released packages. Coopr would then be able to perform this work, offloading it to the cluster to be processed, rather than the developer’s machine. This frees the developer to work on other things within the same repository or against another cluster. Since we’re now building and installing native packages for the cluster’s platform, we’re using the actual init scripts, built from the repository. Changes in the code are reflected on the cluster. There’s no special glue to make things work.

To solve this issue, we wrote a wrapper cookbook, which consists of four tasks:

  • Set up build dependencies (Java, Maven, Node.js, rpm-build, fpm, etc.)
  • Check out Git repository
  • Build packages via Maven
  • Install packages from built artifacts

The first three tasks are straightforward to accomplish with normal Chef resources, so we won’t go into detail on how they’re implemented. The installation of packages from built artifacts is the more interesting piece of this cookbook. The cdap cookbook already has resources defined for our packages, so we only need to modify them before the cookbook uses them. This is done by getting a list of built packages, comparing them to the package resources already defined, and modifying the matching package resources to point to the new source location for each package.

Chef runs consist of a compile phase, where the cookbooks are evaluated and resources are defined and collected, and a converge phase, where the defined resources are applied to the machine. One challenge in this task is the need to gather information from files on disk that do not exist until the converge phase. Resources are parsed and created during the compile phase, before Chef has done any work on the machine. This means we cannot use the Maven build output in our resources at compile time, and must modify them at converge time.
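To make the timing issue concrete, here is a minimal plain-Ruby sketch of a two-phase run. It is a toy model under stated assumptions, not Chef's implementation: ToyResource, ToyRun, and simulate are hypothetical names invented for this illustration. It shows that a lookup made while resources are being declared sees no build output, while a block deferred to the converge pass can see it and update a resource declared later.

```ruby
# Toy model of a two-phase run. NOT Chef's implementation;
# ToyResource, ToyRun and simulate are hypothetical names.
ToyResource = Struct.new(:name, :source, :action)

class ToyRun
  attr_reader :log

  def initialize
    @collection = [] # resources in declaration order
    @log = []
  end

  # "Compile": declaring a resource only records it; nothing runs yet.
  def declare(name, &action)
    resource = ToyResource.new(name, nil, action)
    @collection << resource
    resource
  end

  def find(name)
    @collection.find { |r| r.name == name } || raise(KeyError, name)
  end

  # "Converge": apply each resource's action in declaration order.
  def converge
    @collection.each { |r| r.action.call(r) }
  end
end

def simulate
  run = ToyRun.new
  built = [] # stands in for package files produced on disk by the build

  # Compile time: the build has not run, so there is nothing to find.
  seen_at_compile = built.dup

  run.declare('maven-build') do |_r|
    built << 'cdap-master/target/cdap-master_2.8.0-1_all.deb'
  end
  run.declare('modify-package') do |_r|
    # Runs during converge, after the build, so the artifact is visible.
    run.find('cdap-master').source = built.first
  end
  run.declare('cdap-master') do |r|
    run.log << "install #{r.name} from #{r.source}"
  end

  run.converge
  { seen_at_compile: seen_at_compile, log: run.log }
end

puts simulate[:log]
```

The deferred block plays the role that a ruby_block resource plays in a real recipe: its body is only evaluated when the run reaches it during converge.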

# This block updates the package resources from the cdap cookbook.
# - identifies each package in the cdap repo
# - loops through packages and determines cdap cookbook's package resource name
# - adds source attribute to package's resource
# Example: cdap-master/target/cdap-master_2.8.0-1_all.deb adds source attribute
# to package[cdap-master] resource from cdap cookbook
ruby_block 'modify-cdap-package-resources' do
  block do
    # ASSUMPTION: the glob expressions are missing from this listing. The
    # checkout location and the patterns below are illustrative placeholders.
    checkout_dir = '/path/to/cdap'
    pkg_files =
      if node['platform_family'] == 'debian'
        ::Dir.glob("#{checkout_dir}/*/target/*.deb")
      elsif node['platform_family'] == 'rhel'
        ::Dir.glob("#{checkout_dir}/*/target/*.rpm")
      else
        []
      end
    pkg_files.each do |f|
      # <module>/target/<package file>: the module is third from the end
      p = f.split('/')[-3]
      p = 'cdap' if p == 'cdap-distributions'
      begin
        r = resources(package: p)
        r.source(f)
        r.provider(Chef::Provider::Package::Dpkg) if node['platform_family'] == 'debian'
      rescue Chef::Exceptions::ResourceNotFound
        Chef::Log.warn("No package[#{p}] found in the resources collection... skipping")
      end
    end
  end
end

include_recipe 'cdap::fullstack'

Let’s walk through this ruby_block resource. First, we identify the package files that have been built, by platform family. Next, we loop through each package file and identify the package name from its path. In CDAP, packages match their top-level module name. As with any good rule, there must be an exception: the next line sets the package name to “cdap” if the module is “cdap-distributions”, to match the Maven build output. The begin marks where we load the package resource from the collection and add the source attribute to it; if no matching resource is found, we log a warning instead. Finally, we include the fullstack recipe from the cdap cookbook. This final step is crucial, as it is what adds the package resources to the resource collection in the first place.
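The name derivation can be tried outside of Chef. The following is a small plain-Ruby sketch, assuming a hash as a stand-in for the resource collection; package_name_for and assign_sources are hypothetical helpers written for this illustration, and the /src/cdap prefix, the cdap-unknown module, and the exact file names are made-up examples.

```ruby
# Hypothetical helper mirroring the name-derivation lines of the ruby_block:
# the module directory is the third path component from the end.
def package_name_for(path)
  name = path.split('/')[-3]
  name == 'cdap-distributions' ? 'cdap' : name
end

# Stand-in for the resource collection: package name => source (nil until set).
# Returns the names that had no matching entry, which the real block would
# log as warnings and skip.
def assign_sources(collection, pkg_files)
  skipped = []
  pkg_files.each do |f|
    name = package_name_for(f)
    if collection.key?(name)
      collection[name] = f
    else
      skipped << name
    end
  end
  skipped
end

# Example paths are assumptions; only the <module>/target/<file> shape matters.
files = [
  '/src/cdap/cdap-master/target/cdap-master_2.8.0-1_all.deb',
  '/src/cdap/cdap-distributions/target/cdap_2.8.0-1_all.deb',
  '/src/cdap/cdap-unknown/target/cdap-unknown_2.8.0-1_all.deb'
]
collection = { 'cdap-master' => nil, 'cdap' => nil }
skipped = assign_sources(collection, files)

collection.each { |name, source| puts "#{name} <- #{source}" }
puts "skipped: #{skipped.join(', ')}"
```

Here the cdap-distributions module maps to the cdap package, and the module with no matching entry is reported rather than raising, which mirrors the rescue branch in the recipe.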

From here, the developer can use their new CDAP cluster for testing or for running applications, with CDAP built from any source branch.

