The Cask Data Application Platform (CDAP) is an open-source platform to build and deploy data applications on Apache Hadoop™. As of version 3.0, it includes a slick new user interface to help users deploy, manage and monitor their data applications. This UI provides real-time updates from the CDAP backend.
Initiating too many HTTP requests from a web browser to a server is costly, and we wanted to avoid that when developing the new CDAP UI. So we decided to use WebSockets and chose SockJS as the framework. The CDAP front end communicates with a NodeJS server (a node-proxy), which in turn communicates with the backend Java server: the client web browser opens a socket connection to the node-proxy and sends messages, which are routed as HTTP calls to the backend Java server’s RESTful endpoints.
To make this communication with the node-proxy simpler on the client side, we created a service layer that fetches data to be used in a controller for each feature.
- A request is constructed with a URL and, if necessary, a JSON body.
- If SSL or security is enabled, additional headers are added for authorization and authentication.
- The request is sent as a message through WebSockets.
- The node-proxy receives the message and makes the request to the backend server.
- Once the node-proxy receives the response, it routes it through the socket back to the client.
- The response is identified by its request URL and is then routed to the appropriate listener (controller).
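The steps above can be sketched in a few lines of JavaScript. The message shape and helper names here are assumptions for illustration, not CDAP’s actual wire format:

```javascript
// Map of request URL -> callback registered by a controller.
var listeners = {};

// Build the message sent over the socket; auth headers are attached
// only when SSL/security is enabled and a token is available.
function buildMessage(url, body, authToken) {
  var message = { action: 'REQUEST', resource: { url: url } };
  if (body) {
    message.resource.body = body;
  }
  if (authToken) {
    message.resource.headers = { Authorization: 'Bearer ' + authToken };
  }
  return message;
}

// When the node-proxy replies, the response is identified by its request
// URL and routed to the listener (controller) that asked for it.
function onSocketResponse(reply) {
  var callback = listeners[reply.resource.url];
  if (callback) {
    callback(reply.response);
  }
}
```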
The biggest advantage of this approach is that no polling mechanism lives in the client. The client merely uses the service layer to make a request or poll for data, and receives it without any additional concerns; no HTTP calls are made from the client at all.
We can further enhance this to become a pseudo-push pattern where the node-proxy can compare the change in the server’s response for two subsequent requests. If there is no change in the data, no message is sent to the client. As a result, a message is sent to the client only when new data is available, rather than the client asking for new data repeatedly.
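A minimal sketch of that pseudo-push comparison (the helper and its storage are assumptions): the proxy keeps the last serialized payload per resource and forwards a poll result only when it differs from the previous one.

```javascript
// Last serialized payload seen per resource URL.
var lastSeen = {};

// Forward `payload` to the client via `send` only if it changed since
// the previous poll of the same resource; returns whether it was sent.
function maybeForward(resourceUrl, payload, send) {
  var serialized = JSON.stringify(payload);
  if (lastSeen[resourceUrl] === serialized) {
    return false; // no change: stay silent, the client hears nothing
  }
  lastSeen[resourceUrl] = serialized;
  send({ resource: resourceUrl, response: payload });
  return true;
}
```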
The overall idea here is to send messages from the client to the node-proxy, which recognizes each message and performs the appropriate action (a request or a poll). These messages are named ‘REQUEST’, ‘POLL’, and ‘POLL-STOP’.
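A sketch of how the node-proxy might dispatch on those three message types. The helper names are assumptions: fetchBackend stands in for the HTTP call to the backend’s RESTful endpoint, and schedule/cancel wrap setInterval/clearInterval so polling can be stopped per resource.

```javascript
// Create a dispatcher bound to a backend fetcher and a scheduler.
function createDispatcher(fetchBackend, schedule, cancel) {
  var polls = {}; // resource URL -> interval handle

  return function dispatch(message, send) {
    var url = message.resource.url;
    switch (message.action) {
      case 'REQUEST': // one-shot: fetch once, reply once
        fetchBackend(message.resource, function (response) {
          send({ resource: message.resource, response: response });
        });
        break;
      case 'POLL': // fetch repeatedly until told to stop
        polls[url] = schedule(function () {
          fetchBackend(message.resource, function (response) {
            send({ resource: message.resource, response: response });
          });
        });
        break;
      case 'POLL-STOP': // stop polling this resource
        cancel(polls[url]);
        delete polls[url];
        break;
    }
  };
}
```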
To give a brief overview of what is underneath:
MyDataSource is an Angular factory that acts as a wrapper to send out messages and route responses to the appropriate controllers. Internally, MyDataSource maintains a map of controller IDs to callback functions. It returns a DataSource constructor function, which is instantiated in each controller.
One caveat of not managing the polling or request mechanism on the client is that the controller’s $scope must be passed to MyDataSource. This way, when a user navigates to a different state or the controller is destroyed, the entry is automatically removed from the map and a POLL-STOP message is sent to the node-proxy to stop polling that particular resource.
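A sketch of that lifecycle hook. The scope.$on(‘$destroy’, …) event is standard Angular; the DataSource internals here are assumptions. When the controller’s scope is destroyed, its callbacks are dropped and a POLL-STOP goes out for each polled resource.

```javascript
// One DataSource instance per controller; `socketSend` pushes a
// message to the node-proxy over the WebSocket connection.
function DataSource(scope, socketSend) {
  var self = this;
  this.send = socketSend;
  this.bindings = {}; // resource URL -> callback for this controller

  // When the scope dies (state change or controller teardown), stop
  // every poll this controller started and drop its callbacks.
  scope.$on('$destroy', function () {
    Object.keys(self.bindings).forEach(function (url) {
      self.send({ action: 'POLL-STOP', resource: { url: url } });
    });
    self.bindings = {};
  });
}

// Register a callback and ask the node-proxy to start polling.
DataSource.prototype.poll = function (url, callback) {
  this.bindings[url] = callback;
  this.send({ action: 'POLL', resource: { url: url } });
};
```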
Modeling the data layer for reusability
The above setup worked great when we used MyDataSource on a small scale (five to six features). However, as the web app grew larger, instantiating MyDataSource and constructing the URL in every controller was no longer reusable! This is where Angular’s ‘$resource’ comes into play. The Angular documentation describes it as:
“…a factory which creates a resource object that lets you interact with RESTful server-side data sources.”
The need for $resource boils down to two simple reasons:
- Re-usability: services that can be reused anywhere.
- Better API visibility: we see one service and know all the APIs that are being used.
Using $resource initially seemed like a great idea, but then we snapped back to reality and faced one big question:
“How would we integrate MyDataSource, which uses WebSockets under the hood and doesn’t really support promises, with Angular’s $resource?”
This question can be broken into two different problems:
- $resource is solely used for making HTTP calls (the underlying protocol is entirely different: HTTP versus WebSockets).
- $resource has its APIs closely tied to promises (although callbacks are possible, promises are generally preferred because they chain better).
The first problem could be easily solved by the awesomeness (!) of Angular: the decorator pattern. An Angular decorator can intercept a third-party service or factory before it is used and override or modify its original behavior with custom logic.
Since $resource uses $http under the hood, it was quite simple to decorate $http with our own logic to use MyDataSource or otherwise delegate to the regular $http service. This potentially acts as a switch to decide whether to use MyDataSource factory or the $http service based on the request.
All that was required was to add an additional ‘options’ property to the $http config passed by any service that uses $http (in this case, $resource). If ‘options’ is present, it indicates that the request should be routed through MyDataSource instead of the regular $http service.
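A sketch of that switch (decorateHttp and the myDataSource service name are assumptions; in an Angular app it would be registered via $provide.decorator(‘$http’, …)): the wrapper inspects the config, and with ‘options’ present it routes through the WebSocket data source, otherwise it delegates to the original $http.

```javascript
// $delegate is the original $http service; myDataSource is our
// WebSocket-backed factory (a hypothetical name for illustration).
function decorateHttp($delegate, myDataSource) {
  function http(config) {
    if (config.options) {
      return myDataSource.request(config); // goes out over the socket
    }
    return $delegate(config); // plain HTTP, exactly as before
  }
  // Preserve $http helpers such as get/post so existing callers keep working.
  Object.keys($delegate).forEach(function (key) {
    http[key] = $delegate[key];
  });
  return http;
}

// In an Angular app:
// $provide.decorator('$http', ['$delegate', 'myDataSource', decorateHttp]);
```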
Polling, however, needs a callback that is invoked multiple times. Though we can use $resource, it internally relies on the ‘then’ handler of the promise returned by the $http service. We initially considered a reactive programming approach with the RxJS library, which implements observables; however, observables don’t expose the same API as a promise.
The caveat here is that a promise, once resolved, will not resolve again; that’s part of the promise specification, and subsequent resolves simply do nothing. Internally, until a promise is resolved, it maintains a list of handlers to call back. Once resolved, the list of handlers is reset and the listeners won’t be called again.
To solve the second problem, we need a modification that changes this behavior. One solution is to take a promise implementation (let’s call it MyPromise) and introduce an ‘observable’ flag, passed as an additional argument to the promise constructor function. If the flag is set to true, the promise never resets its list of handlers or moves its state to FULFILLED when resolving. This allows us to resolve the promise, and invoke the subsequent ‘then’ handlers, multiple times.
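A stripped-down sketch of such a promise (a simplification: real $q handlers fire asynchronously and support rejection and chained values, all omitted here for brevity):

```javascript
// A promise-like object whose handlers survive resolution when the
// `observable` flag is set.
function MyPromise(executor, observable) {
  var handlers = [];
  var self = this;

  this.then = function (onFulfilled) {
    handlers.push(onFulfilled);
    return self; // allow chaining on the same observable promise
  };

  function resolve(value) {
    handlers.forEach(function (h) { h(value); });
    if (!observable) {
      handlers = []; // ordinary promise: fire once, then reset
    }
  }

  executor(resolve);
}
```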
Now that we have an ‘observable’ promise and a decorated $http service, it’s just a matter of stitching them together to work with WebSockets. We apply the decorator in our main Angular module’s config function and implement the request and poll APIs of MyDataSource to return a MyPromise. This ensures:
- We have the same promise API as the one internally used in $http; and
- The promise now resolves multiple times.
Beyond this, we need to use $resource and write our own service to be used across different controllers. This code snippet demonstrates how we pass an ‘options’ property to the $http config from the $resource service:
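A minimal sketch (the action names and the shape of ‘options’ are assumptions): the object below is what would be passed as the third argument to $resource(‘/apps/:appId’, {}, actions), and the decorated $http checks each action’s config for ‘options’.

```javascript
// Actions for a hypothetical app resource; `options` is the extra flag
// our decorated $http looks for to route a call over the WebSocket.
var appResourceActions = {
  get:  { method: 'GET', options: { useSocket: true } },             // one-shot REQUEST
  poll: { method: 'GET', options: { useSocket: true, poll: true } }  // repeated POLL
};
```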
Voila! Now we have a service API that can be injected into different controllers and used without instantiation: no ‘new’ operator required.
In summary, the setup described above eases the use of WebSockets in a large web application built on Angular, and helps us manage the data layer of our Angular application in a better way. If you would like to see end-to-end usage of $resource with WebSockets, take a look at this git repo. For further reference, check out our CDAP UI project, where we replicate the above setup in production.