Open Database Connectivity (ODBC) is the de facto standard API for accessing data stored in relational databases. ODBC drivers let applications on a variety of platforms (especially non-Java ones) access relational databases independently of the database implementation and the operating system.
In this blog post, we walk through a simple use case that integrates CDAP datasets with Tableau using the CDAP ODBC driver. Datasets are a core abstraction in the Cask Data Application Platform (CDAP) for organizing, storing, and accessing data from multiple storage engines in a uniform manner. Instead of forcing users to manipulate data with low-level APIs, datasets provide higher-level abstractions and generic, reusable implementations of common data patterns. Datasets that CDAP provides out of the box include Time Partitioned FileSets, Cube, and TimeSeries. Datasets are also designed to be accessed (for both reads and writes) across multiple processing paradigms, real-time and batch alike, such as CDAP Flows, MapReduce, and Spark.
In addition, CDAP lets developers, data scientists, and SQL-savvy business analysts explore datasets with SQL. Because the platform supports SQL, users can also use the CDAP JDBC driver in their Java applications to access and manipulate this data programmatically. We recently added ODBC support to CDAP, which gives the much wider range of ODBC-capable applications seamless access to CDAP datasets. Let’s see it in action.
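Here is a rough sketch of what programmatic access through the JDBC driver can look like. The Java program below runs a SQL preview query against a dataset. Two details are assumptions drawn from CDAP documentation of this era, so check them against your CDAP version: the driver class name `co.cask.cdap.explore.jdbc.ExploreDriver` and the `jdbc:cdap://<router-host>:<router-port>` URL form. The program also assumes the naming shown in the example later in this post, where a dataset named `customers_ingest` is exposed to SQL as the table `dataset_customers_ingest`. The helper names `tableNameFor` and `previewQuery` are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;

final class CdapJdbcExample {
    private CdapJdbcExample() {}

    // Hypothetical helper: maps a simple dataset name to its SQL table name,
    // following the "dataset_" prefix seen in this post (customers_ingest -> dataset_customers_ingest).
    static String tableNameFor(String datasetName) {
        return "dataset_" + datasetName;
    }

    // Hypothetical helper: builds a small preview query. Only use trusted, constant
    // dataset names here, because the name is concatenated into the SQL text.
    static String previewQuery(String datasetName, int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        return "SELECT * FROM " + tableNameFor(datasetName) + " LIMIT " + limit;
    }

    public static void main(String[] args) throws SQLException, ClassNotFoundException {
        if (args.length < 1) {
            System.out.println("usage: CdapJdbcExample jdbc:cdap://<router-host>:<router-port>");
            return;
        }
        // Assumption: driver class name as documented for CDAP at the time; verify for your version.
        Class.forName("co.cask.cdap.explore.jdbc.ExploreDriver");
        String sql = previewQuery("customers_ingest", 10);
        // try-with-resources closes the result set, statement, and connection in reverse order.
        try (Connection conn = DriverManager.getConnection(args[0]);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            ResultSetMetaData meta = rs.getMetaData();
            int columns = meta.getColumnCount();
            while (rs.next()) {
                // Print each row as "column=value" pairs separated by " | ".
                StringBuilder line = new StringBuilder();
                for (int i = 1; i <= columns; i++) {
                    if (i > 1) {
                        line.append(" | ");
                    }
                    line.append(meta.getColumnName(i)).append('=').append(rs.getString(i));
                }
                System.out.println(line);
            }
        }
    }
}
```

The connection URL comes from the command line rather than being hard-coded. Router host and port differ between standalone and distributed deployments.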
The following example shows a typical Cask Hydrator pipeline that ingests customer data into a CDAP Table dataset. The pipeline reads a stream of events, each containing comma-separated customer information, from a CDAP Stream. It then parses each event to extract the fields and loads them into a Table dataset named “customers_ingest”.
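In Hydrator the parse step is configured rather than hand-written. The transformation it performs can still be sketched in plain Java: split one comma-separated event and map each value to a named field. The field names `id`, `name`, `email`, and `city` are hypothetical, because the post does not list the customer schema. `CustomerCsvParser` is an illustrative class, not Hydrator's implementation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CustomerCsvParser {
    // Hypothetical schema: the actual customer fields are not given in the post.
    static final List<String> FIELDS = List.of("id", "name", "email", "city");

    private CustomerCsvParser() {}

    // Splits one event body on commas and maps each trimmed value to its field name,
    // preserving field order in the returned map.
    static Map<String, String> parse(String line) {
        // A limit of -1 keeps trailing empty values, so "1,Ann,," still yields four fields.
        String[] values = line.split(",", -1);
        if (values.length != FIELDS.size()) {
            throw new IllegalArgumentException(
                "expected " + FIELDS.size() + " fields but got " + values.length + ": " + line);
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < values.length; i++) {
            row.put(FIELDS.get(i), values[i].trim());
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, String> row = parse("42, Jane Doe, jane@example.com, Palo Alto");
        // Assumption: a field such as "id" would serve as the row key when writing to the Table.
        System.out.println(row);
    }
}
```

This sketch splits naively on commas, so it does not handle quoted values that themselves contain commas. A production pipeline would rely on a proper CSV parser for that.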
The first step is to install the CDAP ODBC driver by following these instructions. Once the driver is installed, users can connect to CDAP from Tableau and select the “dataset_customers_ingest” table, which is how the “customers_ingest” dataset is exposed to SQL clients. Once the table is connected, users can immediately explore its data, as shown below: