Visualization of real-time data using Databricks and Grafana

In today’s fast-paced digital world, the ability to monitor and analyze data in (near) real-time has become essential for businesses across industries. Whether it’s tracking user activity on a website, monitoring sensor data in IoT devices, or analyzing financial transactions, having access to timely insights can make all the difference in driving informed decision-making and staying ahead of the competition.

We’ll delve into the challenges of handling streaming data, the architecture of our solution, and the step-by-step implementation process. By the end of this post, you’ll have a clear understanding of how to harness the power of these tools together to gain valuable insights from your data in real-time.

Challenge

Imagine you’re part of a team responsible for monitoring and analyzing data from a rapidly evolving API or sensor. This data is critical for making timely decisions and responding quickly to changing conditions. However, the sheer volume and velocity of the data present significant challenges. Traditional data processing and visualization tools struggle to keep pace with the influx of streaming data, leading to delays in insights and missed opportunities.

Solution

To address these challenges, we turned to two industry-leading platforms: Databricks and Grafana. Databricks provides a scalable and efficient environment for processing and analyzing large volumes of data, while Grafana offers powerful visualization capabilities for creating interactive dashboards and monitoring the health of our system.

In the following sections, we’ll walk you through our journey of ingesting data from the API into Databricks, processing it in near real-time, and visualizing the insights using Grafana. We’ll discuss the architecture of our solution, the implementation details, and the benefits it brings to our organization.

Architecture

Figure 1: Architecture

Organizations with a Grafana Enterprise subscription have the option to establish a direct connection between Grafana and Databricks. This allows Grafana to query Databricks directly for data and visualize it in near real-time without any intermediate steps. While this approach offers simplicity and convenience, it may require an additional investment in Grafana Enterprise, which can be expensive for some organizations.

In the first step, a Python script pulls the data from the REST API endpoint. The script runs in an infinite loop in the Azure Databricks environment and requests the data every 15 minutes. We used a REST API here, but the same approach works with any type of streaming data. Next, the response from the API is processed and transformed. The data is then saved in an Azure storage account, and a Delta table is created on top of it. Finally, we configure the connection between Grafana and Databricks and query the data from the tables.

Implementation

First, to pull the data from the API, a simple Python script is executed in an infinite loop in Azure Databricks. The code for this is outlined in the figure below.

Figure 2: Infinite loop to keep pulling the data from the API (15-minute interval)
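As a rough illustration of this kind of loop, a minimal sketch could look like the following. It is not the exact script from the figure: the function name poll, the task callback, and the max_iterations and sleep parameters are assumptions introduced here so the loop can be stopped and tested. Only the 15-minute interval comes from the post.

```python
import time
from typing import Callable, Optional

POLL_INTERVAL_SECONDS = 15 * 60  # 15 minutes, as described in the post


def poll(task: Callable[[], None],
         interval_seconds: float = POLL_INTERVAL_SECONDS,
         max_iterations: Optional[int] = None,
         sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `task` repeatedly, pausing `interval_seconds` after each run.

    With max_iterations=None the loop never ends, which matches the
    infinite loop described in the post. A finite value (hypothetical
    parameter) is useful for testing. Returns the number of attempts made.
    """
    attempts = 0
    while max_iterations is None or attempts < max_iterations:
        try:
            task()
        except Exception as exc:
            # Assumption: a failed pull should not kill the loop;
            # log it and try again at the next interval.
            print(f"Pull failed, retrying at next interval: {exc}")
        attempts += 1
        sleep(interval_seconds)
    return attempts
```

In a notebook, calling poll(my_pull_function) with the defaults would keep pulling every 15 minutes until the job is cancelled, where my_pull_function stands for whatever function fetches and stores one batch.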

This code uses some helper functions to extract the data via the REST API.

Figure 3: Helper functions to send request and process response
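Helpers of this kind might look roughly like the sketch below. The names send_request and process_response, the JSON response shape, and the added ingested_at column are all assumptions made for illustration. The sketch uses only the Python standard library, whereas the original helpers may well rely on a different HTTP client.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Any, Dict, List


def send_request(url: str, timeout_seconds: float = 30.0) -> bytes:
    """Send a GET request and return the raw response body.

    urlopen raises an HTTPError for 4xx/5xx responses, so a failed
    call surfaces as an exception in the caller.
    """
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return response.read()


def process_response(body: bytes) -> List[Dict[str, Any]]:
    """Parse a JSON body into a list of records.

    Assumption: the API returns either a JSON array of objects or a
    single object. Each record gets an `ingested_at` UTC timestamp so
    that rows can later be plotted over time.
    """
    payload = json.loads(body)
    records = payload if isinstance(payload, list) else [payload]
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [{**record, "ingested_at": ingested_at} for record in records]
```

The resulting list of dictionaries can then be turned into a DataFrame before it is written to storage.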

This script also calls the function save_df_to_delta(…), which saves the data in Delta format.

Figure 4: save_df_to_delta function
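The parameters of save_df_to_delta are not spelled out in the text, so the version below is only a guess at its shape. The path and mode parameters and the default of appending are assumptions. The write itself uses the standard Spark DataFrameWriter chain of format, mode and save.

```python
def save_df_to_delta(df, path: str, mode: str = "append") -> None:
    """Write a Spark DataFrame to `path` in Delta format.

    Hypothetical signature: the original function's parameters are not
    shown in the post. `df` is expected to be a pyspark.sql.DataFrame,
    and `path` a location in the Azure storage account.
    """
    # "append" adds each new batch to the existing Delta files, so the
    # table grows with every pull instead of being overwritten.
    df.write.format("delta").mode(mode).save(path)
```

Appending suits a polling setup where each pull adds new rows. If every pull returns a full snapshot instead, mode="overwrite" may be the better fit.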

Based on the Delta files, we can create a Delta table that Grafana can query by running the following statement:
CREATE TABLE your_schema_name.your_table_name USING DELTA LOCATION '/path/to/root/delta/files';

Create Azure Managed Grafana:

In this context, we use Azure Managed Grafana (AMG). From the Azure portal, create an Azure Managed Grafana instance (full guide via this link).

Once AMG is set up, click on the URI (Endpoint) from the home screen to open the Grafana UI.

Figure 5: Link to Azure Managed Grafana UI

Go to the Azure Managed Grafana home page and choose Databricks as a data source: Home menu -> Connect data -> search for Databricks:

Figure 6: Set up Databricks as data source in Grafana

The connection can be configured manually by following the instructions on the official Grafana website:
https://grafana.com/docs/grafana/latest/administration/data-source-management/

Alternatively, it can be configured via a YAML file (for a locally installed Grafana):

Figure 7: YAML config example

Once the connection details are filled in, click Save & test. Create a new dashboard and add a new visualization. Choose the schema, database and table. We can now query the data from Databricks by writing SQL queries in the query editor and create live graphs as desired.

Figure 8: Example query

Grafana refreshes the data at a configured interval, so when the table in Databricks is updated, the update is reflected in Grafana in near real time.

Figure 9: End result graph

Conclusion

In today’s data-driven landscape, the ability to monitor and analyze data in near real-time is crucial for businesses seeking to gain a competitive edge. This blog post explored the challenges of handling streaming data and presented an effective architecture for visualizing near real-time data using Databricks and Grafana.

This architecture involved establishing a direct connection between Grafana and Databricks, offering simplicity and convenience for organizations with a Grafana Enterprise subscription. By leveraging this direct connection, data could be queried and visualized in real-time without any intermediate steps, empowering businesses to make timely decisions based on fresh insights.

Tom Thevelein

Technical Lead @ Aivix