Over the last few years, the popularity of cloud-based software development has risen dramatically, along with the need to share development assets and resources within and across organizations. Containers and open source have simplified sharing and cloning both code and entire dev/test environments, taking the efficiency, collaboration and productivity of product engineering organizations to new levels. In this blog post we take an internal look at some of the challenges software engineers at Cask have to tackle, and at how cloud-based self-service tools on Google Cloud Platform have made our team more productive.
Self-service cluster provisioning for developers
Engineers at Cask develop complex distributed systems technologies, so developing and testing against real clusters is a critical and frequent part of our development process. To make cluster provisioning faster, easier and self-service, we use Coopr, an open source cluster management software developed by Cask that manages clusters on public and private clouds. Clusters created with Coopr use templates covering a variety of hardware and software stacks, from simple standalone LAMP-stack servers and traditional application servers like JBoss to full Apache Hadoop clusters composed of thousands of nodes. Clusters can be deployed across many cloud providers (Rackspace, Joyent, OpenStack, Amazon, GCP and the like) using common software configuration management (SCM) tools (Chef and scripts).
Cask Data Application Platform (CDAP) runs on several Hadoop distributions (CDH, HDP, MapR, EMR, HDInsight) and supports multiple versions of each. Continuous integration and testing are therefore critical for catching any change that breaks CDAP functionality on any of the many distributions we support.
Cask’s test infrastructure automatically provisions clusters and tests the latest CDAP code against all versions of the supported distros. Three pieces are critical to enable this:
- Automated cluster provisioning
- Test suites that capture the various integration testing scenarios
- A scheduling component to co-ordinate the tests
Our cloud-based dev/test environment uses Coopr for automated cluster provisioning, test suites written by our developers to run complex integration tests, and Atlassian Bamboo to schedule and run the tests. We have organized the tests into build plans, each running suites of tests against all supported distros. If the tests pass, the clusters are torn down; if they fail, the clusters are kept alive for 12 hours so we can identify the cause of the failure and fix it before the next run.
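The pass/fail teardown policy described above can be sketched as a loop over a build plan's distro matrix. This is a minimal, hypothetical Python sketch rather than Cask's actual implementation: the `run_build_plan` function, the `ClusterRun` record, the `run_suites` callback and the version labels are all illustrative assumptions, and the real provisioning (Coopr) and scheduling (Bamboo) steps are not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List

# Retention window for clusters whose tests failed (12 hours, as described above).
FAILURE_RETENTION = timedelta(hours=12)


@dataclass
class ClusterRun:
    """Hypothetical record of one test run on one distro/version cluster."""
    distro: str
    version: str
    passed: bool
    teardown_at: datetime


def run_build_plan(
    distros: Dict[str, List[str]],
    run_suites: Callable[[str, str], bool],
    now: datetime,
) -> List[ClusterRun]:
    """Run the test suites against every (distro, version) pair.

    Hypothetical sketch: cluster provisioning and job scheduling are omitted.
    """
    runs: List[ClusterRun] = []
    for distro, versions in distros.items():
        for version in versions:
            # A fresh cluster for this distro/version would be provisioned here.
            passed = run_suites(distro, version)
            # Passing clusters are torn down right away; failing ones are kept
            # alive for debugging until the retention window expires.
            teardown_at = now if passed else now + FAILURE_RETENTION
            runs.append(ClusterRun(distro, version, passed, teardown_at))
    return runs


if __name__ == "__main__":
    # Hypothetical matrix; the actual supported versions are not listed here.
    matrix = {"CDH": ["v1", "v2"], "HDP": ["v1"], "MapR": ["v1"]}

    # Stand-in for real integration test suites: pretend only HDP fails.
    def fake_suites(distro: str, version: str) -> bool:
        return distro != "HDP"

    for run in run_build_plan(matrix, fake_suites, datetime(2017, 1, 1)):
        status = "passed" if run.passed else "FAILED"
        print(f"{run.distro} {run.version}: {status}, teardown at {run.teardown_at}")
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps each teardown deadline deterministic and easy to test.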
The screenshot below shows one such build plan to test across all the distros.
Cask Internal Infrastructure
We run a number of internal infrastructure components for build systems, bug tracking, wiki and monitoring; each has different resource needs and must be able to scale up and down on demand. As more developers use this infrastructure, and at crunch time such as a product release, we have to be able to scale up resources to meet our needs.
The capabilities needed in our dev/test environment lend themselves to being operated and managed as cloud-based resources. When we set out to define the key requirements for our cloud infrastructure, we identified ease of use of the cloud APIs, VM performance (in terms of spin-up time), and the ability to cost-effectively compartmentalize projects as the important criteria for selecting a cloud provider. In our environment and experience over the past three years, Google Cloud Platform has scored high on all of these dimensions.
Since onboarding with GCP we have created 28,388 clusters and 52,227 nodes to date, an average of 25 clusters per day for development and testing, and we are ramping up further as we onboard new engineers and support new distro versions. If you are interested in trying out CDAP, download it or access it as a cloud sandbox from here.