Collecting metrics and providing access to metrics is a must-have requirement for any application platform — and is even more important when it comes to distributed systems. In this post we will examine some aspects of designing metrics systems for a distributed application platform and take a brief look at one built for the open source Cask Data Application Platform (CDAP).
Different people on a team use a metrics system for various purposes, but we can boil the requirements down to:
- Providing actionable information about the current state of the platform and its apps
- Helping to identify and debug issues in the platform and apps
- Providing useful insights on apps
- Delivering data for third-party metrics, alerting and monitoring systems
So, what is so special about a distributed, multi-tenant data application platform such as CDAP? Well, at least a few things:
- Metrics aggregation across distributed components of the platform and apps
- Rich context for metrics data points, such as:
- Namespace -> App -> Program -> ProgramRunId
- UserGroup -> User
- Namespace -> Dataset -> App -> Program -> ProgramRunId
- Rack -> Hostname -> Container
- Large volume of metrics data
- A rich metrics query API with visualization and integration capabilities
Metrics are emitted from many processes distributed across a cluster that can have tens to thousands of nodes, or even more. To analyze metrics data and extract useful insights from it, you need to aggregate it in a central place. In a distributed, potentially large cluster, failures of different kinds (node crashes, network downtime) happen more often than in a single-server setup. These must be handled by metrics emitters by design: losing metrics data makes metrics less useful, and showing incorrect data is even worse.
In an application platform, the runtime “knows” a lot about a metric’s origin (the metrics context): which application, and which component of the application, emits the metric, and so on. Given the distributed nature of the system, a large amount of information is available and useful, and thus needs to be collected: rack id, hostname, process id, container id, and so on. Add multitenancy to that and you get an even bigger context: namespace, user, etc. It is not unusual for the context from which a metric is emitted to contain tens of attributes, or tags, associated with it. Tags in the context may even form independent hierarchies, as in the example above.
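To make this concrete, here is a minimal sketch of a metric data point carrying such a tag context. The tag names follow the hierarchies listed above, but the class and values are illustrative, not CDAP's actual API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a single metric data point with its context attached
// as an ordered set of tags (one path through the Namespace -> App ->
// Program -> ProgramRunId hierarchy).
public class TaggedMetric {
    final String name;
    final long value;
    final Map<String, String> tags;

    TaggedMetric(String name, long value, Map<String, String> tags) {
        this.name = name;
        this.value = value;
        this.tags = tags;
    }

    public static void main(String[] args) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("namespace", "default");
        tags.put("app", "PurchaseHistory");
        tags.put("program", "PurchaseFlow");
        tags.put("run", "run-1234");
        TaggedMetric m = new TaggedMetric("records.processed", 42, tags);
        System.out.println(m.name + "=" + m.value + " " + m.tags);
    }
}
```

In a real system the same point would typically carry tags from several independent hierarchies at once (host and container as well as namespace and program).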
Metrics aggregation and processing have to scale well: if you combine a rich metrics context with the many processes running on a cluster, all emitting metrics, they can generate a large volume of metrics data even on a moderately-sized cluster with just tens of nodes.
Metrics data is very rich, so the querying interface must be sophisticated enough to let you get the most value from it. You should be able to slice and dice across different dimensions, apply aggregation functions, break a time series down into multiple series, and much more. Often, you want to be able to configure pre-aggregation of the data to enable faster queries; but you need to be careful: the trade-off is a dramatic increase in the amount of data stored.
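As a toy illustration of slicing across a dimension, the sketch below groups metric points by one tag (here "app") and sums their values; this is the kind of group-by a metrics query API performs server-side. The data layout is hypothetical, not CDAP's storage format:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative slice-and-dice: aggregate a flat list of tagged metric
// points along a single tag dimension.
public class SliceByTag {
    static class Point {
        final Map<String, String> tags;
        final long value;
        Point(Map<String, String> tags, long value) { this.tags = tags; this.value = value; }
    }

    // Group points by the given tag and sum values within each group.
    static Map<String, Long> sliceBy(String tag, List<Point> points) {
        return points.stream().collect(Collectors.groupingBy(
            p -> p.tags.getOrDefault(tag, "<none>"),
            Collectors.summingLong(p -> p.value)));
    }

    public static void main(String[] args) {
        List<Point> points = new ArrayList<>();
        points.add(new Point(Map.of("app", "Purchase", "host", "h1"), 10));
        points.add(new Point(Map.of("app", "Purchase", "host", "h2"), 5));
        points.add(new Point(Map.of("app", "Logs", "host", "h1"), 7));
        System.out.println(sliceBy("app", points).get("Purchase")); // 15
    }
}
```

Slicing the same points by "host" instead would yield per-host totals, which is why storing the full tag context pays off at query time.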
Developers and companies often have favorite metrics systems they already use, for example to keep all of their metrics in a single place. A platform’s metrics system should allow configuring data to be pushed to third-party systems for that purpose. Additionally, there are alerting and monitoring systems that might need to consume the same data. Having a pluggable mechanism helps to cover these requirements.
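A pluggable mechanism can be as simple as a sink interface that plugins implement. The interface below is hypothetical, just to show the shape such a plugin point might take; CDAP's actual plugin API may differ:

```java
import java.util.Map;

// Hypothetical plugin interface for pushing raw metrics records to
// third-party metrics, alerting, or monitoring systems. The platform would
// invoke each registered sink for every raw record it transports.
public interface MetricsSink {
    /** Called once per raw metric record, alongside built-in processing. */
    void publish(String metric, long timestampMillis, long value, Map<String, String> tags);
}
```

A Graphite or OpenTSDB integration would then be one small class implementing this interface, registered through configuration.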
Let’s now take a look at CDAP and its design to discuss some of the possible choices you can make while building such a system. Below is a schematic, high-level view of the CDAP Metrics System (taken, with polishing, from CDAP-760):
A process that emits metrics uses a hierarchical MetricsContext to acquire a Metric. A MetricsContext is initialized with a set of tags that carries all the available and useful information about the emitting context, such as the hostname where the process is running, the program id that the process belongs to, and so on.
As soon as more information becomes available about the context, a child context is created by augmenting the existing one. For instance, if a program uses a dataset, then the dataset client object will be initialized with a child metrics context that has at least one extra tag: the dataset id. When a dataset operation emits metrics, such as the number of bytes read, those metrics will carry that extra dataset id tag. Thus, passing the context around and refining it is easy and straightforward.
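The child-context idea can be sketched in a few lines: a child copies its parent's tags and adds its own, leaving the parent untouched. The class and method names here are illustrative; CDAP's real MetricsContext API differs in detail:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of hierarchical metrics contexts: contexts are immutable, and
// refining one means creating a child with one extra tag.
public class Ctx {
    private final Map<String, String> tags;

    Ctx(Map<String, String> tags) {
        this.tags = Collections.unmodifiableMap(new LinkedHashMap<>(tags));
    }

    /** Returns a new context carrying the parent's tags plus one extra tag. */
    Ctx child(String tag, String value) {
        Map<String, String> t = new LinkedHashMap<>(tags);
        t.put(tag, value);
        return new Ctx(t);
    }

    Map<String, String> tags() { return tags; }

    public static void main(String[] args) {
        Ctx program = new Ctx(Map.of("namespace", "default", "program", "PurchaseFlow"));
        // A dataset client would be handed a child context with a "dataset" tag:
        Ctx dataset = program.child("dataset", "purchases");
        System.out.println(dataset.tags());
    }
}
```

Because contexts are immutable, handing a child to a component can never pollute the parent's tags, which keeps the refinement safe to pass around.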
Metrics are aggregated for a small, configured amount of time (usually 1 second) in the emitting process before being flushed to Apache Kafka, which has a lot to offer for transporting metrics data. Kafka scales well and allows configuring different topics for different metrics data, for isolation and quality-of-service if needed. For example, a system metric that may be used for scheduling in workflows will have a higher processing priority than a user-defined custom metric that is used only for observation. Kafka also provides a durable store for metrics data until it is processed. Being durable, Kafka also allows you to fetch the most recent data and push it directly to clients (e.g., to a browser) if needed.
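The in-process pre-aggregation step can be sketched as follows: increments accumulate locally, and a periodic flush (every second, say) drains one record per metric, which is what would be handed to the transport. This is a simplified sketch; the Kafka producer call and scheduling are omitted:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Sketch of local aggregation before flushing: many increments within the
// flush interval collapse into a single value per metric, greatly reducing
// the number of records sent over the wire.
public class LocalAggregator {
    private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();

    void increment(String metric, long delta) {
        counters.computeIfAbsent(metric, k -> new LongAdder()).add(delta);
    }

    /** Drains current counters; a scheduled task would call this every second. */
    Map<String, Long> flush() {
        Map<String, Long> snapshot = new HashMap<>();
        for (String key : counters.keySet()) {
            LongAdder adder = counters.remove(key);
            if (adder != null) {
                snapshot.put(key, adder.sum());
            }
        }
        return snapshot;
    }
}
```

The trade-off of a longer interval is fewer records at the cost of coarser time resolution and a slightly larger window of data loss on process crash.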
Metrics records are consumed from Kafka by a scalable MetricsAggregator and then fed into the built-in CDAP Metrics System. In the future, MetricsAggregator will be a place where you can add plugins that can push metrics out, as this is where the “raw” metrics data is available prior to processing.
The built-in CDAP Metrics System is configured to pre-aggregate metrics data to make it available through the CDAP RESTful API and the CDAP UI. It is optimized to pre-aggregate only the data that will be queried. As CDAP evolves, the pre-aggregation configuration can be adjusted; if new features are added in the UI or in the RESTful APIs, the needed data will be pre-aggregated without interrupting the running programs and system services.
Metrics are persisted in HBase, a scalable, distributed columnar data store that is commonly used with Hadoop and is currently a requirement for CDAP. You can configure the data retention policy for its tables, which is very useful to ensure that old metrics data gets automatically purged. Usually you want to set different retention periods for different resolutions of aggregated metrics: most of the time, you don’t need to keep fine-grained 1-second data for more than a couple of days, while you may want to keep 1-hour aggregated data points for months.
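A retention policy of this kind boils down to a mapping from aggregation resolution to how long that data is kept; in HBase this maps naturally onto per-table TTL settings. The specific resolutions and periods below are illustrative, not CDAP's defaults:

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical retention policy: resolution (in seconds) -> retention period.
// Finer-grained tables keep data for shorter periods.
public class RetentionPolicy {
    static Map<Integer, Duration> defaults() {
        Map<Integer, Duration> retention = new LinkedHashMap<>();
        retention.put(1, Duration.ofDays(2));       // 1-second points: a couple of days
        retention.put(60, Duration.ofDays(30));     // 1-minute rollups: a month
        retention.put(3600, Duration.ofDays(365));  // 1-hour rollups: a year
        return retention;
    }

    public static void main(String[] args) {
        defaults().forEach((res, ttl) ->
            System.out.println(res + "s resolution kept for " + ttl.toDays() + " days"));
    }
}
```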
Both the processing and query services run in YARN with the help of Apache Twill and can be scaled independently as needed. For instance, based on the number of users actively looking at CDAP metrics, you may decide to increase the number of containers serving metrics queries. The CDAP Router in the diagram is a separate component that runs outside of YARN and routes incoming requests to the services running in YARN.
Finally, CDAP 3.0 (to be released very soon) comes with a nice Ops Dashboard on top of the built-in Metrics System to visualize metrics data. CDAP 3.0 will also include a new Cube Dataset that re-uses much of the built-in Metrics System and can be used as a transactional OLAP Cube in applications. Stay tuned, and please don’t forget to stop by and talk to our engineers at HBaseCon next week.
Distributed application platforms add extra requirements for collecting and providing access to metrics, particularly around richer metrics context, managing large volumes of data, and handling failures. There are a number of tools and technologies available to help you address these requirements, and I hope that this overview of CDAP’s Metrics System is helpful in making your own design decisions.